Sachin Singh

Software Engineer

Gurugram, Haryana, India · 2 yrs 9 mos experience

Most Likely To Switch · Highly Stable

Key Highlights

Designed scalable ETL pipelines using PySpark and Databricks.
Migrated data storage to Delta Lake, reducing costs by 25%.
Automated data validation, cutting manual checks by 90%.

Stackforce AI infers this person is a Data Engineering specialist in the SaaS industry.

Skills

Core Skills

Data Engineering · Big Data

Other Skills

Apache Airflow · Apache Spark · Automation · Cloud Storage · Data Migration · Data Reconciliation · Data Warehousing · Databricks · Delta Lake · Extract, Transform, Load (ETL) · Java · NoSQL · PySpark · SFTP · Spark

About

Hi, I’m Sachin Singh, a Software/Data Engineer with 2+ years of experience designing and optimizing scalable data pipelines, ETL workflows, and data lake solutions. I focus on building reliable big data systems that support analytics, reporting, and business intelligence.

I have hands-on experience with PySpark, Airflow, Kafka, and Databricks. My work includes migrating pipelines to cloud platforms, using Delta Lake for efficient storage and incremental processing, and applying Spark performance optimizations to handle growing data volumes with speed and reliability.

🚀 Key Skills:
💻 Programming: Python, SQL, Java
📊 Data Engineering: ETL/ELT Pipelines, Data Modeling, Delta Lake
⚡ Big Data: PySpark, Spark SQL, Hadoop, Spark Streaming
☁️ Cloud Platforms: Azure Databricks, ADF, ADLS | AWS S3, Glue
🛠️ Orchestration & Streaming: Apache Airflow, Kafka
🐳 Tools & DevOps: Docker, Git, Linux

I enjoy solving data challenges and continuously learning new technologies in the modern data stack. I look forward to connecting with professionals in the data space and exploring opportunities to solve complex challenges together.

Experience

Total Experience: 2 yrs 9 mos

Average Tenure: 2 yrs 9 mos

Current Experience: 2 yrs 9 mos

Spectramedix

2 roles

Software Engineer

Promoted

May 2025 – Present · 1 yr · Gurugram, Haryana, India · Hybrid

Designed and optimized scalable ETL pipelines using PySpark and Databricks to process diverse datasets (JSON, CSV, Parquet, XML).
Applied Spark performance tuning (caching, partitioning, broadcast joins), reducing runtimes by 30% and improving cluster utilization.
Migrated data storage to Parquet/Delta Lake, enabling faster queries and reducing costs by 25%.
Implemented SCD1 and SCD2 logic to support accurate historical and current reporting.
Orchestrated complex workflows using Apache Airflow, improving reliability through dependency management, retries, and scheduling.
Leveraged Delta Lake MERGE (upsert) operations for efficient incremental updates and transactional consistency.
Resolved Spark data duplication issues, cutting storage needs by 50% and boosting pipeline performance by 70%.

PySpark · Databricks · Apache Airflow · Delta Lake · Data Engineering · Big Data

Jr. Software Engineer

Jul 2023 – Apr 2025 · 1 yr 9 mos · Gurugram, Haryana, India · Hybrid

Supported the migration of legacy pipelines to Databricks and cloud storage (ADLS, S3), improving scalability and cutting runtimes by 20%.
Built data ingestion pipelines to load flat files into the data lake, enabling downstream analytics.
Automated reconciliation and feed validation with Spark, reducing manual checks by 90%.
Developed a Java-based SFTP file validation solution to ensure consistent ingestion of upstream data.
Performed Spark SQL analysis and production fixes, maintaining data quality across environments.
Delivered performance improvements in Spark pipelines, reducing execution times by 20% on large cloud datasets.

Databricks · Spark SQL · Automation · Java · Data Engineering · Big Data