Pappu Yadav

Software Engineer

Bengaluru, Karnataka, India · 10 yrs 8 mos experience

Key Highlights

  • Expert in building real-time ETL frameworks.
  • Proficient in Spark and data processing technologies.
  • Strong background in API development using Spring Boot.

Skills

Core Skills

Cloud Computing, Data Engineering, Real-time Data Processing, API Development, ETL Framework Development, Data Processing, ETL Pipeline Development

Other Skills

AWS, Amazon Redshift, Amazon Web Services (AWS), Apache Airflow, Apache Flink, Apache Kafka, Apache Pinot, Apache Spark, Apache Spark Streaming, Application Development, Back-End Web Development, Big Data, Big Data Analytics, Business Analysis, Computer Science

About

Platform Engineer skilled in big data technologies such as Spark, Hadoop, Hive, and Presto. Experienced in building ETL frameworks that process large volumes of data in real time with Spark Streaming and in batch with Spark, while guaranteeing exactly-once semantics. Developed generic frameworks such as a Reconciliation Engine in Spark Streaming that reconciles millions of records in real time. Experienced in writing complex SQL queries, building platform tools used by various business teams, and building REST APIs with Spring Boot.

Experience

Apollo.io

Senior Engineer

Feb 2024 – Present · 2 yrs 1 mo · India · Remote

Google Cloud Platform (GCP), Apache Airflow, Cloud Computing, Data Engineering

Simpl

Staff Engineer

May 2022 – Mar 2024 · 1 yr 10 mos · Bengaluru, Karnataka, India

  • Worked with the team to move from a batch approval model (T - 1 day) to a real-time approval system that approves users for transactions as they happen. Implemented APIs using the Spring Boot framework and the real-time pipeline using Flink.
  • Implemented real-time analytics support using Apache Pinot so merchants can view stats in real time.
  • Reduced load on Redshift by moving data deduplication from Redshift into the processing layer of the ETL.
  • Optimized run time for multiple Airflow DAG jobs.
  • Tech stack: AWS services, Redshift, Kafka, DynamoDB, Spring, Spark, Airflow, Flink, Apache Pinot, Databricks
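The deduplication-in-the-processing-layer bullet above can be sketched in plain Python (the production pipeline ran on Spark; the record shape and field names `id` and `updated_at` here are illustrative assumptions, not the actual schema):

```python
# Hypothetical sketch: dedupe records in the ETL processing layer so
# duplicates never reach Redshift. Field names are assumptions.

def dedupe_latest(records):
    """Keep only the most recent record per id, in first-seen id order."""
    latest = {}
    for rec in records:
        key = rec["id"]
        if key not in latest or rec["updated_at"] > latest[key]["updated_at"]:
            latest[key] = rec
    return list(latest.values())

batch = [
    {"id": 1, "updated_at": 10, "status": "pending"},
    {"id": 2, "updated_at": 11, "status": "ok"},
    {"id": 1, "updated_at": 12, "status": "approved"},  # newer duplicate of id 1
]
deduped = dedupe_latest(batch)  # two records survive; id 1 keeps "approved"
```

Doing this keep-latest pass before the load step means the warehouse only ever sees one row per key, which is what removes the dedup workload from Redshift.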
Continuous Integration and Continuous Delivery (CI/CD), Spring Boot, Data Integration, ETL Tools, Databricks, Containerization, +22 more

Airtel Africa Digital Labs

Technical Lead

Sep 2020 – Apr 2022 · 1 yr 7 mos · Gurugram, Haryana, India

  • Spark batch and streaming ETL framework: a generic framework to run any Spark ETL job; each job is configured via a JSON file with configurable sources and sinks.
  • Presto exactly-once with Spark Streaming: a framework to achieve exactly-once semantics (no duplicate data) for Presto queries over Hadoop ORC data.
  • Hive Metastore update: a framework to update the Hive metastore with newly added Hadoop partitions.
  • Generic recon framework in Spark Streaming: 1-1 mapping of records between any number of sources, with configurable recon rules, output schema, etc.; supports partial recon output, late-arriving records, and Hudi for partial recon.
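For the JSON-configured ETL framework described above, a job definition might look something like the following. This is a hypothetical shape, the actual field names and structure of the framework's config are not given in the source:

```json
{
  "job_name": "orders_daily_load",
  "mode": "batch",
  "source": { "type": "kafka", "topic": "orders", "format": "json" },
  "transform": { "sql": "SELECT order_id, amount, ts FROM source WHERE amount > 0" },
  "sink": { "type": "orc", "path": "hdfs://warehouse/orders", "partition_by": ["dt"] }
}
```

The point of such a design is that adding a new pipeline means writing a config file, not new Spark code: the framework reads the `source` and `sink` blocks and wires up the job generically.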
Continuous Integration and Continuous Delivery (CI/CD), Spring Boot, Data Integration, ETL Tools, Containerization, Big Data, +16 more

Mobileum

Senior Software Development Engineer

Jan 2019 – Sep 2020 · 1 yr 8 mos · Gurgaon, India

  • ETL pipeline in Spark Streaming: built an ETL pipeline from scratch to process real-time data using Spark Streaming.
  • HortonWorks POC: implemented changes in Spark jobs and the Hive Metastore for compatibility with the HortonWorks platform.
  • Trip metrics (subscriber roaming experience): real-time fusion of trips (roaming users) with events to generate scores for various services (SMS, call, data) using Spark Streaming.
  • Basic recon framework in Spark Streaming: 1-1 mapping of records within a configured sliding time window to accurately reconcile records from two data sources.
  • Custom compaction of ORC files: Spark job over ORC paths that merges many small files to improve Presto query performance and file-listing time.
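The 1-1 sliding-window recon bullet above can be sketched in plain Python (the real framework ran on Spark Streaming; field names `txn_id` and `ts`, and the window size, are assumptions for illustration):

```python
# Minimal sketch of 1-1 reconciliation between two sources within a
# sliding time window. Record shape is a hypothetical assumption.

WINDOW_SECONDS = 300  # match records whose timestamps differ by <= 5 minutes

def recon(source_a, source_b, window=WINDOW_SECONDS):
    """Return (matched_pairs, unmatched_a, unmatched_b).

    Each record from A matches at most one record from B with the same
    txn_id whose timestamp falls inside the window (1-1 mapping).
    """
    pending_b = {}  # txn_id -> not-yet-matched B records
    for rec in source_b:
        pending_b.setdefault(rec["txn_id"], []).append(rec)

    matched, unmatched_a = [], []
    for rec in source_a:
        candidates = pending_b.get(rec["txn_id"], [])
        hit = next((c for c in candidates if abs(c["ts"] - rec["ts"]) <= window), None)
        if hit is not None:
            candidates.remove(hit)  # consume the match: enforces 1-1 mapping
            matched.append((rec, hit))
        else:
            unmatched_a.append(rec)

    unmatched_b = [c for cs in pending_b.values() for c in cs]
    return matched, unmatched_a, unmatched_b

a = [{"txn_id": "t1", "ts": 100}, {"txn_id": "t2", "ts": 200}]
b = [{"txn_id": "t1", "ts": 150}, {"txn_id": "t3", "ts": 210}]
matched, miss_a, miss_b = recon(a, b)  # t1 matches; t2 and t3 are unmatched
```

Records left unmatched on either side are the recon breaks; in a streaming setting, late-arriving records would be held in the pending state until the window expires.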
Continuous Integration and Continuous Delivery (CI/CD), Spring Boot, Data Integration, ETL Tools, Containerization, Big Data, +16 more

Mobikwik

Software Developer

Aug 2017 – Dec 2018 · 1 yr 4 mos · Gurgaon, India

Spring Boot, Big Data, Computer Science, Python (Programming Language)

PayU

Software Developer

Jun 2015 – Aug 2017 · 2 yrs 2 mos · Gurgaon, India

  • Software Development
Spring Boot, Big Data, Computer Science, Python (Programming Language)

Samsung Research Institute

Summer Intern

Jun 2014 – Jul 2014 · 1 mo · Bengaluru Area, India

  • Software Development
Computer Science, Python (Programming Language)

Education

Delhi College of Engineering

Bachelor of Technology (BTech) — Computer Engineering

Jan 2011 – Jan 2015
