Rajat Ahuja

Senior Software Engineer

San Francisco, California, United States · 13 yrs 10 mos experience

Highly Stable

Key Highlights

Expert in building real-time data ingestion platforms.
Proven track record in cloud infrastructure migration.
Strong background in data engineering and machine learning.

Stackforce AI infers this person is a Data Engineering expert with strong cloud computing capabilities.

Skills

Core Skills

Data Engineering, Machine Learning, Cloud Computing

Other Skills

Flink, Web Crawl Data, Search, Audio/Video pipelines, indexing, Sampling pipelines, Performance measurement, Scalding, Spark, Dataflow, Kubernetes, Aurora, Google Cloud Platform, Hadoop, BigQuery

Experience

13 yrs 10 mos total experience · 3 yrs 5 mos average tenure

Current Experience

Apple

Senior Software Engineer

Aug 2023 – Aug 2025 · 2 yrs · San Francisco Bay Area · On-site

Building a generic real-time ingestion platform on top of Flink for Web Crawl Data that supports a variety of use cases, from Search to Audio/Video pipelines, for indexing and machine learning.
Worked on sampling pipelines for Siri data to measure the performance of features released across iPhones/iOS.

Flink, Web Crawl Data, Search, Audio/Video pipelines, indexing, machine learning, +2

Twitter

Senior Software Engineer

May 2018 – Jun 2023 · 5 yrs 1 mo · San Francisco Bay Area · On-site

Collaborated on data processing libraries such as Scalding, Spark, and Dataflow so that end-to-end jobs at Twitter can be run as simply as possible.
Integrated Scalding/Spark/Dataflow jobs with Twitter Data Discovery and state-managed services to run scheduled jobs on top of Kubernetes/Aurora.
Built an auto-tuning service for Scalding jobs that dynamically adjusts the container memory and container count a job needs based on the history of its previous scheduled runs, resulting in better Hadoop cluster utilization.
Led a project to migrate existing on-premises infrastructure to Google Cloud Platform for running batch jobs.
Collaborated with peers to develop an event-driven scheduler for replicating data from various data stores, including the Hadoop Distributed File System, Google Cloud Storage, key-value NoSQL stores, and BigQuery.
Developed a managed service to replicate data from BigQuery to our in-house key-value store in order to serve machine learning models in the serving path.
Consolidated all backend metrics into a single store that is used for visualization and for automatically optimizing Scala/Spark jobs.

Scalding, Spark, Dataflow, Kubernetes, Aurora, Google Cloud Platform, +4

Inmobi

Software Engineer

Apr 2012 – May 2018 · 6 yrs 1 mo

Implemented a new Spark join API (Bucket-by-Bucket Join) that joins MapReduce-partitioned datasets based on the available partitions.
It decreased the overall duration of some crucial jobs by 80%. The idea was inspired by the map-side join in MapReduce.
Built pipelines to measure how many people were seen near a point of interest (McDonald's, universities) in a certain time frame (hour, day, week) and within a certain radius (250 m, 500 m).
Collaborated on the creation of services that resolve a user's location from lat/long or IP to the Zip/City/State level in order to target advertisements efficiently.
Built geo-based location-hygiene pipelines to eliminate fraudulent, geo-inaccurate site IDs.

Spark, MapReduce, Geo-based location services, Data Engineering