Navnoor Singh

Software Engineer

India · 4 yrs 9 mos experience

Highly Stable

Key Highlights

Processed over 300 TB/day of data.
Reduced cloud costs by 40% through optimization.
Improved data pipeline performance significantly.

Stackforce AI infers this person is a Data Engineer with expertise in Big Data and cloud-native architectures.

Skills

Core Skills

Apache Spark · AWS

Other Skills

Apache Iceberg · Kafka · Kubernetes · Python · EMR · Airflow · Real-time analytics · ETL · REST · PySpark · Distributed Computing · Big Data · Apache Airflow · Extract, Transform, Load (ETL) · Hadoop

About

Performance-driven Data Engineer (4+ years of experience) specializing in building large-scale data pipelines, real-time streaming systems, and cloud-native data platforms.

Currently working as an SDE 3 (Big Data) at Baazi Games, where I design and optimize high-volume data systems powering gaming analytics, financial transactions, and user behavior insights.

I have hands-on experience processing 300+ TB/day of data, reducing cloud costs by 40%, and enabling faster BI insights through scalable Spark-based architectures on AWS.

🔹 Core Expertise:
• Apache Spark (batch & streaming)
• Apache Iceberg (Lakehouse architecture, partitioning, compaction)
• Kafka & Debezium (CDC pipelines)
• AWS (EMR, S3, Glue, Athena)
• Kubernetes & distributed systems

🔹 What I work on:
• Designing end-to-end CDC pipelines (MySQL/Postgres → Kafka → Iceberg)
• Building and optimizing real-time and batch data pipelines at scale
• Implementing high-performance Lakehouse architectures
• Tuning Spark jobs (AQE, joins, memory optimization)
• Ensuring data reliability, scalability, and fault tolerance

🔹 Impact Highlights:
• Processed 300+ TB/day of data across distributed systems
• Reduced cloud costs by 40% through optimization and efficient architecture
• Improved data pipeline performance and query latency for faster analytics

🔹 Interests:
I am deeply interested in distributed systems, real-time analytics, and next-generation data architectures inspired by companies like Netflix and Uber.

Always open to connecting and discussing data engineering, system design, and scalable data platforms.

Experience

Total Experience: 4 yrs 9 mos

Average Tenure: 2 yrs 2 mos

Current Experience: 4 mos

Baazi Games

SDE3 - Big Data

Jan 2026 – Present · 4 mos · Delhi, India · Hybrid

Big Data | Designing Lakehouse & CDC Pipelines | Spark | Iceberg | Kafka | AWS | Airflow | Real-time Analytics

Apache Spark · Apache Iceberg · Kafka · AWS · Kubernetes

Nielsen

Member of Technical Staff - 2

Feb 2025 – Jan 2026 · 11 mos · Gurugram, Haryana, India · Hybrid

◦ Spearheaded the migration of a legacy Informatica system to a modern Spark + Python/Polars stack, cutting licensing costs and boosting throughput.
◦ Saved $800k in Informatica license costs and minimized developer hours by building a framework and automatic DAG creator to accelerate job migration.
◦ Architected a metadata-driven Spark framework + AI agent, reducing migration timelines by 60%.
◦ Adopted EMR Serverless, realizing 40% cost savings on intermittent workloads.
◦ Built Python automation scripts to monitor EMR clusters and terminate idle resources, saving thousands in cloud spend.

Apache Spark · Python · EMR · Airflow · AWS

Airtel Digital

3 roles

Senior Software Engineer

Promoted

Dec 2023 – Feb 2025 · 1 yr 2 mos

◦ Reduced Spark job runtime by 37.5% (4h → 2.5h) on trillions of records, improving data availability.
◦ Scaled real-time data apps processing 300+ TB/day, powering personalization and BI analytics.
◦ Developed a Spark-on-Kubernetes operator to abstract infra complexity and increase pipeline reliability.
◦ Built a generic metadata-driven codebase, cutting new aggregation task development time by 50%.

Apache Spark · Kubernetes · Real-time analytics · AWS

Software Engineer

Aug 2021 – Jan 2024 · 2 yrs 5 mos

◦ Engineered clustering-based pipelines to infer work/home locations from telecom data for geospatial analytics.
◦ Designed a pipeline to calculate user transportation modes from CDR data, powering mobility insights.
◦ Built observability tools (Spark Listener + Airflow integration), reducing troubleshooting time.
◦ Standardized deployments with a custom Airflow operator, reducing manual errors by 80%.

ETL · REST