Sarthak Madan

Data Engineer

Delhi, India · 3 yrs 10 mos experience

Most Likely To Switch

Highly Stable

Key Highlights

Reduced ETL runtimes by 90% using PySpark.
Implemented cost-saving strategies, lowering cloud spend by 25%.
Enhanced data quality, preventing 40% of downstream issues.

Stackforce AI infers this person is a Data Engineer specializing in cloud-native data pipeline development.

Skills

Core Skills

Apache Airflow · AWS · Data Quality Engineering · PySpark · ETL · Data Analysis · Data Engineering

Other Skills

AWS Step Functions · Amazon EC2 · Amazon Elastic MapReduce (EMR) · Amazon Redshift · Amazon S3 · Amazon Web Services (AWS) · Analytical Skills · ArcGIS · AWS S3 · Cloud Applications · Databases · ELT · Extract, Transform, Load (ETL) · Geographic Information Systems (GIS) · Git

About

Data Engineer with 3+ years of experience building scalable, cloud-native data pipelines using PySpark, Airflow, and AWS. I specialize in transforming legacy ETL workflows into automated, high-performance architectures that reliably process datasets of 100M+ records with minimal latency.

I’ve delivered major efficiency wins: cutting ETL runtimes by 90%, reducing cloud spend through EMR/S3 optimization, and improving data quality with validation frameworks that prevent 35–40% of downstream issues.

My strengths include workflow orchestration, cost-efficient cloud design, performance tuning, data quality engineering, and end-to-end pipeline ownership. I enjoy solving complex data problems, modernizing systems, and building reliable pipelines that scale.

Experience

Total Experience: 3 yrs 10 mos

Average Tenure: 3 yrs 10 mos

Current Experience: 3 yrs 10 mos

Precisely

4 roles

Data Engineer 2

Promoted

Jul 2025 – Present · 10 mos

Automated deployment of Airflow DAGs using CI/CD, reducing release cycles from weeks to days.
Implemented S3 lifecycle policies and optimized Parquet storage, lowering monthly cloud costs by 25%.
Built and maintained a 120M+ record data product using Python, SQL, Airflow, S3, and Redshift.
Added data validation and monitoring workflows, reducing data issues by 40%.
Migrated orchestration from AWS Step Functions → Airflow, improving workflow speed and reliability by 20%.
Performed AWS cost analysis across EMR and S3, identifying idle compute and reducing cloud spend by 18%.

Amazon Redshift · Extract, Transform, Load (ETL) · Git · AWS S3 · Apache Airflow · Data Analysis · +9 more

Data Engineer 1

Promoted

Jul 2024 – Jul 2025 · 1 yr

Built and optimized PySpark ETL pipelines on AWS EMR, reducing runtime by 90%.
Modernized legacy ETL into modular Python pipelines and Airflow DAGs, cutting manual effort by 30%.
Migrated workflows from AWS Data Pipeline → Step Functions, improving orchestration reliability.
Designed Tableau dashboards on Redshift for real-time data quality insights.
Improved query performance via partitioning, clustering, and SQL tuning across multiple datasets.

Microsoft Office · Microsoft Excel · ArcGIS · Amazon Web Services (AWS) · Linux · SQL · +25 more

Associate Software Engineer

Jul 2022 – Jul 2024 · 2 yrs

Built Spark-based auditing tools, reducing audit time by 75%.
Optimized ETL for 150M+ records, improving transformation performance by 80%.
Reengineered Redshift SQL with Python integration, reducing processing time by 30%.
Implemented automated data quality rules (null checks, duplicates, schema validation), reducing issues by 35%.

Amazon Web Services (AWS) · Linux · SQL · Python (Programming Language) · ELT · Databases · +11 more