Rajat Ahuja

Senior Software Engineer

San Francisco, California, United States · 13 yrs 10 mos experience
Highly Stable

Key Highlights

  • Expert in building real-time data ingestion platforms.
  • Proven track record in cloud infrastructure migration.
  • Strong background in data engineering and machine learning.
Stackforce AI infers this person is a Data Engineering expert with strong cloud computing capabilities.

Skills

Core Skills

Data Engineering · Machine Learning · Cloud Computing

Other Skills

Flink · Web Crawl Data · Search · Audio/Video pipelines · Indexing · Sampling pipelines · Performance measurement · Scalding · Spark · Dataflow · Kubernetes · Aurora · Google Cloud Platform · Hadoop · BigQuery

Experience

Total Experience: 13 yrs 10 mos
Average Tenure: 3 yrs 5 mos
Current Experience: --

Apple

Senior Software Engineer

Aug 2023 - Aug 2025 · 2 yrs · San Francisco Bay Area · On-site

  • Built a generic real-time ingestion platform on top of Flink for web crawl data, supporting a variety of use cases, from Search to audio/video pipelines, across indexing and machine learning.
  • Worked on sampling pipelines for Siri data to measure the performance of features released across iPhones/iOS.
Flink · Web Crawl Data · Search · Audio/Video pipelines · Indexing · Machine Learning

Twitter

Senior Software Engineer

May 2018 - Jun 2023 · 5 yrs 1 mo · San Francisco Bay Area · On-site

  • Collaborated on data processing libraries such as Scalding, Spark, and Dataflow so that end-to-end jobs at Twitter could be run as simply as possible.
  • Integrated Scalding/Spark/Dataflow jobs with Twitter Data Discovery and state-managed services to run scheduled jobs on top of Kubernetes/Aurora.
  • Built an auto-tuning service for Scalding jobs that dynamically adjusts the container memory and container count a job needs based on the history of previous scheduled runs, resulting in higher Hadoop cluster utilization.
  • Led a project to migrate existing on-premises infrastructure to Google Cloud Platform for running batch jobs.
  • Collaborated with peers to develop an event-driven scheduler for replicating data across data stores, including the Hadoop Distributed File System (HDFS), Google Cloud Storage, key-value NoSQL stores, and BigQuery.
  • Developed a managed service to replicate data from BigQuery to the in-house key-value store in order to serve machine learning models in the serving path.
  • Consolidated all backend metrics into a single store that is used for visualization and to automatically optimize Scala/Spark jobs.
Scalding · Spark · Dataflow · Kubernetes · Aurora · Google Cloud Platform

Inmobi

Software Engineer

Apr 2012 - May 2018 · 6 yrs 1 mo

  • Implemented a new Spark join API (Bucket By Bucket Join) that joins MapReduce-partitioned datasets based on the available partitions. It decreased the overall duration of some crucial jobs by 80%. The idea was inspired by the map-side join in MapReduce.
  • Built pipelines to measure how many people near a point of interest (McDonald's, universities) were seen in a certain time frame (hour, day, week) and within a certain radius (250 m, 500 m).
  • Collaborated on services that resolve a user's location from latitude/longitude or IP address to a ZIP code, city, and state so that advertisements can be targeted efficiently.
  • Built geo-based location-hygiene pipelines to eliminate fraudulent, geo-inaccurate site IDs.
Spark · MapReduce · Geo-based location services · Data Engineering

Amazon.com

Software Development Engineer

Jul 2011 - Mar 2012 · 8 mos

Education

IIIT Allahabad

B-Tech — IT

Jan 2007 - Jan 2011

Indian Institute of Information Technology Allahabad

B-Tech — Information Technology
