Sahil Agarwal

Data Engineer

South Delhi, Delhi, India · 13 yrs 7 mos experience

Key Highlights

  • Reduced ETL pipeline runtime from 36 hours to 4 hours.
  • Built a highly available data pipeline feeding a Redis cluster that serves 75,000 QPS.
  • Revamped the architecture of a news aggregator for improved performance.
Stackforce AI infers this person is a Data Engineer with expertise in Healthcare and AdTech sectors.

Skills

Core Skills

Data Engineering · Big Data · Software Development

Other Skills

AWS · AWS Glue · AWS Lambda · Amazon EKS · Amazon Web Services (AWS) · Analytics · Apache Spark · Automation · Back-End Web Development · CSS · Continuous Delivery (CD) · Data Architecture · Data Modeling · Data Warehousing · Databases

About

Yet another software developer, trying to live by the "Non-Zero Days" rules.

Experience

Total Experience: 13 yrs 7 mos
Average Tenure: 2 yrs 3 mos
Current Experience: 3 yrs 6 mos

H1

Senior Data Engineer

Dec 2022 – Present · 3 yrs 6 mos · Hyderabad, Telangana, India · Remote

  • H1 is a healthcare data company that provides insights to life sciences companies, academic medical institutions, healthcare systems, healthcare professionals and payors.
  • Data Delivery – Worked on compute-intensive ETL pipelines (Spark, Scala, Databricks, Airflow, EMR, EKS) that deliver data to clients via AWS S3, Azure Blob Storage, SFTP, etc.
  • Created documentation and tests, automated manual tasks, and replaced per-client solutions with configuration-based ones. This reduced the team's workload and sped up onboarding of new clients.
  • Made code-level and Spark configuration optimizations that reduced pipeline runtime from 36 hours to 4 hours.
Scala in Spark · Docker · Extract, Transform, Load (ETL) · Data Warehousing · Amazon EKS · Scala · +9 more

GroundTruth

Data Engineer

Dec 2019 – Dec 2022 · 3 yrs · Gurugram, Haryana, India

  • GroundTruth is a global location-technology (adtech) company that uses data to deliver insights across 100 million places and points of interest in 21 countries.
  • Status Log – Created an API service and a Python client to access an RDS database (AWS Lambda, API Gateway, AWS SAM).
  • Dragonglass – Built a highly available data pipeline (AWS SAM) to ingest external data: an S3 event triggers an AWS Lambda function, which starts an AWS Glue job (Apache Spark). The job populates an Amazon ElastiCache (Redis) cluster that serves 75,000 QPS with a 40 ms response time.
  • GTID – Built a pipeline (Spark, AWS EMR) that reconciles multiple external and internal datasets and stores the resulting mappings in Amazon ElastiCache (Redis). The API on top of this store serves 40,000 QPS.
  • Brand Insights – Developed the backend for a client portal using GraphQL, Spring Boot and Java.
  • IP accumulator – Created a job that extracts and processes user IPs from an Athena table (Airflow, Spark).
  • IF – Real-time forecasting of campaign delivery based on user filters. Built a framework to fetch and analyze Theta Sketches and compare their estimated cardinalities against the source data (AWS Athena).
Problem Solving · AWS Lambda · Extract, Transform, Load (ETL) · Data Warehousing · Big Data · Data Engineering · +8 more

The Smart Cube

2 roles

Software Engineer

Promoted

May 2018 – Nov 2019 · 1 yr 6 mos · Noida, Uttar Pradesh, India

  • The Smart Cube is a finance and strategy consulting company that uses technology to provide research and analytics services to corporations and firms worldwide.
  • Started on a client-facing project, working directly with the client to build a web-based technology solution.
  • Server-side development (server selector, APIs) focused on scalability using message queues (Flask, RabbitMQ).
  • Wrote browser automation tests using Selenium and Python.
  • Revamped the architecture (Django) of a news aggregator covering web crawling, news analysis and sentiment analysis:
    ◦ Redesigned the MySQL database, eliminating 25 redundant tables.
    ◦ Moved all stored procedures and views from MSSQL to Python (Django ORM).
    ◦ Wrote Bash scripts for CI/CD.
  • Built and deployed APIs and microservices to streamline access to internal tools (Django REST Framework, MySQL).
  • Built a crawling framework covering over 1,500 websites (Scrapy, Selenium, Python, Django, XPath).
  • Built a forecasting engine that was used to compete in the M4 Competition (www.m4.unic.ac.cy).
Problem Solving · Databases · Automation · Analytics · Continuous Delivery (CD) · Amazon Web Services (AWS)

Associate Engineer

May 2017 – May 2018 · 1 yr · Noida, Uttar Pradesh, India

Problem Solving · Databases · Automation · Analytics · Continuous Delivery (CD)

Snapdeal

Technology Intern

Jan 2017 – May 2017 · 4 mos · Gurugram, Haryana, India

  • My work at Snapdeal was primarily in search-related technologies. Information retrieval (IR), or "search", has been an established field of research and practice for decades, with relatively mature literature and tooling. Lucene, Solr and Elasticsearch are all examples of robust open-source software that allow search applications to be bootstrapped quickly.
  • Implemented index sorting of documents and early termination of queries in Apache Solr, increasing query response speed by 540% (tested on 30 million documents).
  • Migrated the Apache Solr index from Solr 4 to Solr 6.
Problem Solving

Craftsvilla

Technology Intern

Jun 2016 – Aug 2016 · 2 mos · Mumbai, India

  • Worked with the API development team at Craftsvilla.
Problem Solving

Make ‘n’ live

Technology Intern

Jun 2015 – Jul 2015 · 1 mo · Greater Delhi Area

  • The company is an internet startup providing interior solutions to apartment owners in high-rise buildings. I worked on digital marketing and on optimizing the company's database.

Rotaract

Vice President

Aug 2014 – Nov 2015 · 1 yr 3 mos · Greater Delhi Area

  • Vice President of the Rotaract Club of Apeejay Stya University.

Abhay Finance Solutions Pvt. Ltd.

Summer Intern

Jun 2014 – Jul 2014 · 1 mo · Greater Delhi Area

  • Learnt the fundamentals of financial investment, how the stock market works, and its key indicators.
  • Trained in several types of stock market analysis and approaches to analyzing the market.

Interact Club

Member

Mar 2008 – Mar 2011 · 3 yrs · Apeejay School, Sheikh Sarai, New Delhi, India

  • Interact is Rotary International's service club for young people ages 12 to 18. The clubs are sponsored by individual Rotary clubs, which provide support and guidance, but they are self-governing and self-supporting.

Education

Apeejay Stya University

Bachelor of Technology (B.Tech.) — Computer Science and Engineering

Jan 2014 – Jan 2017

Apeejay School, Sheikh Sarai

High School — Science

Jan 2000 – Jan 2012
