Shubam Sharma

Data Engineer

Pune, India · 4 yrs 8 mos experience
Highly Stable

Key Highlights

  • Achieved 70% cost savings in ETL processes.
  • Reached 97% accuracy matching patient records with payer data.
  • Built scalable geospatial data pipelines across 197 countries.
Stackforce AI infers this person is a Senior Data Engineer specializing in Geospatial and Healthcare data systems.

Skills

Core Skills

Spark, Scala, FastAPI, BigQuery

Other Skills

Java, AWS, Parquet, Protobuf, Kafka, Redis, Cassandra, Kubernetes, S3, OSM, Python, GCP, MSSQL, Docker, GitHub Actions

About

I work on Apple Maps' Geo Data Platform, mostly on Spark/Scala pipelines that process geospatial and probe data across 197 countries. Recently I've focused on cutting a 10+ hour ETL job down to 4.6 hours (70% cost savings), fixing a conflation bug that was generating 10 million false deltas per cycle, and building a pipeline to refresh airport geometry data from OSM that hadn't been touched since 2019.

Before this I spent nearly 4 years at FIGmd working on healthcare data. There I built Spark ETL jobs for patient-payer record matching, FastAPI services on GCP Cloud Run for medical record retrieval, and an event-driven reporting system with BigQuery and Pub/Sub.

I like working on systems where getting the data right actually matters, whether that's someone's navigation route or a patient's medical records.

Stack: Spark, Scala, Python, SQL, AWS (EMR/EKS/S3), GCP (BigQuery, Cloud Functions, Dataproc), Kafka, Cassandra, Redis, Parquet, Protobuf.

Open to Senior Data Engineer roles.

Experience

Total Experience: 4 yrs 8 mos
Average Tenure: 3 yrs 10 mos
Current Experience: 10 mos

Apple

Senior Data Engineer

Jul 2025 - Present · 10 mos · Hyderabad · On-site

  • Working in the Maps org on the Geo Data Platform team. My day-to-day work is building Spark/Scala pipelines that handle geospatial data ingestion, conflation, and validation for Apple Maps across 197 countries.
  • Some things I've worked on:
  • Optimized a probe data ETL pipeline (~100 TiB/month across 10 countries). Brought runtime down from 10+ hours to 4.6 hours and cut monthly costs by 70% through serialized persistence, S3 checkpointing, and a tiered executor config (200/500/1000 executors based on country volume).
  • Found and fixed a bounding box bug in building conflation that was creating 10.3 million false deltas per cycle. The one-line fix reduced that noise by 97%.
  • Fixed a turn restriction bug that was incorrectly blocking valid turns on highways in 3 countries. Traced through 500+ lines of Spark transformations to find the root cause in path selection logic.
  • Built a Spark pipeline to refresh a stale airport geometry resource (untouched since 2019) by extracting from 1.1B+ OSM ways. Expanded coverage from 15.6K to 23K airports across 238 countries.
  • Debugged and stabilized 3P address ingestion pipelines across multiple countries during the EMR to EKS migration. Fixed dynamic allocation issues, S3 URI scheme conflicts, and serialization bugs.
  • Extended the OSM validation framework with 10+ data quality checks and fixed MapRoulette upload logic for per-country projects.
  • Tech: Spark, Scala, Java, AWS (EMR, EKS, S3), Parquet, Protobuf, Kafka, Redis, Cassandra, Kubernetes

FIGmd, Inc.

Senior Software Engineer

Aug 2021 - Jun 2025 · 3 yrs 10 mos · Pune District · Remote

  • Worked on the data platform team building pipelines and services for healthcare data: clinical records, payer data, and reporting systems. The work mixed Spark/Scala for batch processing with Python/FastAPI for APIs.
  • Built a Spark/Scala ETL pipeline that matched patient records with payer data by pulling from GCS and MSSQL. Got it to 97% matching accuracy with configurable rules per practice, and cut runtime by 40% through partition tuning and predicate pushdown.
  • Developed FastAPI microservices deployed on GCP Cloud Run for medical record retrieval from Cassandra and GCS. These served clinical workflows with strict SLA requirements.
  • Built a fully serverless, event-driven reporting system using Cloud Functions, Pub/Sub, and BigQuery. Cloud Scheduler handled orchestration and Log Sinks provided observability.
  • Migrated legacy services from Python 3.6 to 3.12. Set up CI/CD pipelines and brought test coverage to 88%. Added structured logging and monitoring that cut MTTR by 35%.
  • Tech: Spark, Scala, Python, FastAPI, GCP (Cloud Run, BigQuery, Cloud Functions, Pub/Sub, GCS), Cassandra, MSSQL, Docker, GitHub Actions

Education

Army Institute of Technology (AIT), Pune

Bachelor of Engineering (BE), Computer Science

Jul 2017 - Jul 2021
