Nitin Pandey

Engineering Manager

Bengaluru, Karnataka, India · 9 yrs 9 mos experience
Most Likely To Switch · Highly Stable

Key Highlights

  • Over 10 years of experience in data engineering.
  • Expert in big data technologies and cloud platforms.
  • Proven track record of successful data migrations.
Stackforce AI infers this person is a Data Engineering expert with extensive experience in SaaS and AdTech industries.

Skills

Core Skills

Big Data · Data Engineering

Other Skills

AWS · Agile Methodologies · Airflow · Algorithms · Analytical Skills · Apache Beam · Apache Spark · BigQuery · Business Analysis · C · C++ · CSS · Coding · Data Analysis

About

Engineer with 10+ years working in the data domain.

Tech stack / frameworks used:

  • Cloud platforms – AWS, GCP, Databricks
  • Big data (OSS) – Apache Spark, Apache Kafka, Apache Hudi, Delta, Apache Airflow
  • Big data (CDH) – Hadoop, YARN, Hive, HDFS, MapReduce, Impala
  • Big data (GCP) – Apache Beam (Cloud Dataflow), BigQuery, Stackdriver
  • Big data (AWS) – Redshift, Athena, DynamoDB
  • Databricks – Delta Lake, Databricks Delta
  • Languages – Python, Scala, Java, shell scripting
  • Backend – Redis, Celery, Flask, MongoDB

Experience

Uber

3 roles

Engineering Manager II

Promoted

Jun 2024 – Present · 1 yr 9 mos

Staff Software Engineer

Jan 2024 – Jun 2024 · 5 mos

Senior Software Engineer

Apr 2022 – Dec 2023 · 1 yr 8 mos

MakeMyTrip

2 roles

Lead Data Engineer

Jun 2020 – Apr 2022 · 1 yr 10 mos

  • Data Migration
  • Created an M × N connector utility that reads from multiple sources (MySQL, SQL Server, Redshift, MongoDB, Kafka) and writes to multiple sinks (Delta Lake, Redshift), with support for run-time data transformations. Wrote an optimized Spark JDBC reader that enables parallel, distributed reads, with built-in offset management, metric logging, and monitoring for slow or failed jobs.
  • Redshift to Delta Migration
  • Migrated the data warehouse from AWS Redshift to Databricks Delta Lake. Added Delta support to all ingestion tools and ran parallel workloads in Delta, ensuring a smooth transition with no business impact or downtime.
Data Migration · Spark JDBC Reader · Delta Lake · Redshift · Data Transformation · Big Data +1
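A parallel, distributed JDBC read like the one described above typically works by splitting a numeric column's range into per-partition WHERE predicates, as Spark's JDBC source does with its `partitionColumn`/`lowerBound`/`upperBound`/`numPartitions` options. A minimal illustrative sketch of that boundary computation (not the actual utility, whose internals are not documented here):

```python
def jdbc_partition_predicates(column, lower, upper, num_partitions):
    """Split the range [lower, upper) into one WHERE predicate per
    partition, mirroring how Spark's JDBC reader parallelizes a read.
    The first and last predicates are left open-ended (and NULLs are
    swept into the first partition) so no rows are missed."""
    stride = (upper - lower) // num_partitions or 1
    predicates = []
    bound = lower
    for i in range(num_partitions):
        prev = bound
        bound += stride
        if i == 0:
            predicates.append(f"{column} < {bound} OR {column} IS NULL")
        elif i == num_partitions - 1:
            predicates.append(f"{column} >= {prev}")
        else:
            predicates.append(f"{column} >= {prev} AND {column} < {bound}")
    return predicates
```

Each predicate then drives one independent `SELECT ... WHERE <predicate>` query, so partitions can be fetched concurrently by separate executors.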

Senior Data Engineer

Mar 2019 – Jun 2020 · 1 yr 3 mos

  • Airflow Platform
  • Provided Airflow as a service for the entire organization. The setup was powered by a GitHub repo and Docker, making it easy for anyone to add or modify a DAG without managing Airflow itself. Currently runs over 300 DAGs across all lines of business.
  • Go-Memory
  • Created a platform that serves user interactions from the website/mobile app in real time with very low latency (<1 s). The platform lets consumers answer queries such as the last n hotel/flight searches by a user, gross revenue from a user over a time window, the last n bookings, and so on. It powers use cases like hotel rankings, notifications/alerts for a user's upcoming bookings, and per-user review analysis.
  • Goibibo’s Data Platform
  • Created the initial version of the data platform on Amazon Redshift. Loaded all data into Redshift and provided hourly refreshes for batch jobs. Built many ETL/ELT pipelines, set up a schema repository, contracts for data logging, and pipelines to ingest this data into Redshift. Wrote 75+ DAGs in 3 months.
Airflow · GitHub · Docker · ETL/ELT Pipelines · Data Engineering · Big Data
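The "last n searches/bookings per user" queries described above map naturally onto per-user bounded buffers. A toy in-memory sketch of that access pattern (the real platform's design is not documented here; names are illustrative):

```python
from collections import defaultdict, deque


class InteractionStore:
    """Toy per-user event store: keeps only the most recent
    `max_events` interactions per user, so 'last n' queries
    stay cheap and memory stays bounded."""

    def __init__(self, max_events=100):
        # deque(maxlen=...) silently evicts the oldest event on append
        self.events = defaultdict(lambda: deque(maxlen=max_events))

    def record(self, user_id, event):
        self.events[user_id].append(event)

    def last_n(self, user_id, n):
        """Return up to n most recent events, oldest first."""
        return list(self.events[user_id])[-n:]


store = InteractionStore(max_events=3)
for q in ["DEL-BOM", "BLR-GOI", "BOM-DXB", "BLR-DEL"]:
    store.record("u1", {"type": "flight_search", "query": q})
# only the 3 most recent searches survive; the oldest was evicted
```

A production system serving many consumers at <1 s latency would back this with a shared store (e.g. Redis lists with LPUSH/LTRIM) rather than process-local memory, but the bounded-buffer idea is the same.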

Equifax

Data Engineer

Oct 2018 – Mar 2019 · 5 mos · Bengaluru, Karnataka, India

  • Created the backend and data pipeline for OptimaHub, a platform that helps advertisers optimize spend and track ROI on each channel: social, search, SEO, SEM, display, etc. Deployed the solution on GCP using a big data technology stack.
  • Technologies/frameworks/languages used: Apache Beam (Cloud Dataflow), Dataproc, BigQuery, Cloud Storage, IAM
Apache Beam · GCP · BigQuery · Data Pipeline · Data Engineering · Big Data

ZS

2 roles

Senior Data Engineer

Dec 2017 – Feb 2018 · 2 mos

  • EDL – Common Components
  • Developed a framework providing reusable utilities to carry out the specific needs of a project. For example, an AWS S3 to HDFS copy can be done with one such utility without writing any additional code. Deployed many such utilities as part of the EDL backbone, which served as a platform hosting multiple projects. Technologies/frameworks/languages used:
  • Hadoop ecosystem – Hive, Impala, Apache Kafka, Spark; Python
  • Logging – Log4j, Logstash, Kibana, ELK
  • Configuration files – JSON
  • Team size: 6
Hadoop · AWS · Python · Kafka · Data Engineering · Big Data

Data Engineer

Jun 2015 – Nov 2017 · 2 yrs 5 mos

  • Received large data files containing patient data from different vendors and loaded them into a data lake. Multiple transformations and business rules were applied as part of the automated process developed. All parameters/properties were highly configurable. Provided functionality to create copies of data, modify them as required, and share them among colleagues. Technologies/frameworks/languages used:
  • Hadoop ecosystem – Hive, Impala, MapReduce, distcp, s3cmd, Oozie, CDH, HDFS, Hue, Kerberos; Python, Java, AWS, Redshift
  • Logging – Log4j, Logstash, Kibana, ELK
  • Team size: 6
Hadoop · AWS · Java · Python · Data Engineering · Big Data

Education

Army Institute of Technology (Pune University)

Bachelor of Engineering (B.E.) — Information Technology

Jan 2011 – Jan 2015

