Pappu Yadav

Software Engineer

Bengaluru, Karnataka, India · 10 yrs 8 mos experience

Key Highlights

  • Expert in building real-time ETL frameworks.
  • Proficient in Spark and data processing technologies.
  • Strong background in API development using Spring Boot.

Skills

Core Skills

Cloud Computing, Data Engineering, Real-time Data Processing, API Development, ETL Framework Development, Data Processing, ETL Pipeline Development

Other Skills

AWS, Amazon Redshift, Amazon Web Services (AWS), Apache Airflow, Apache Flink, Apache Kafka, Apache Pinot, Apache Spark, Apache Spark Streaming, Application Development, Back-End Web Development, Big Data, Big Data Analytics, Business Analysis, Computer Science

About

Platform Engineer skilled in big data technologies such as Spark, Hadoop, Hive, and Presto. Experienced in building ETL frameworks that process large volumes of data in real time with Spark Streaming and in batch with Spark, while guaranteeing exactly-once semantics. Developed generic frameworks such as a Reconciliation Engine in Spark Streaming that reconciles millions of records in real time. Experienced in writing complex SQL queries, building platform tools used by various business teams, and building REST APIs with Spring Boot.

Experience

Apollo.io

Senior Engineer

Feb 2024 – Present · 2 yrs 1 mo · India · Remote

Google Cloud Platform (GCP), Apache Airflow, Cloud Computing, Data Engineering

Simpl

Staff Engineer

May 2022 – Mar 2024 · 1 yr 10 mos · Bengaluru, Karnataka, India

  • Worked with the team to move from a batch approval model (T - 1 day) to a real-time approval system that approves users for transactions as they happen. Implemented APIs using the Spring Boot framework and the real-time pipeline using Flink.
  • Implemented real-time analytics support using Apache Pinot so merchants can view stats in real time.
  • Reduced load on Redshift by moving data deduplication from Redshift into the processing layer of the ETL.
  • Optimized run time for multiple Airflow DAG jobs.
  • Tech stack: AWS services, Redshift, Kafka, DynamoDB, Spring, Spark, Airflow, Flink, Apache Pinot, Databricks
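The deduplication-in-the-processing-layer bullet above can be sketched in plain Python (the production pipeline ran on Spark; the record shape and field names `id` and `updated_at` here are illustrative assumptions, not the actual schema):

```python
# Hypothetical sketch: dedupe records in the ETL processing layer so
# duplicates never reach Redshift. Field names are assumptions.

def dedupe_latest(records):
    """Keep only the most recent record per id, in first-seen id order."""
    latest = {}
    for rec in records:
        key = rec["id"]
        if key not in latest or rec["updated_at"] > latest[key]["updated_at"]:
            latest[key] = rec
    return list(latest.values())

batch = [
    {"id": 1, "updated_at": 10, "status": "pending"},
    {"id": 2, "updated_at": 11, "status": "ok"},
    {"id": 1, "updated_at": 12, "status": "approved"},  # newer duplicate of id 1
]
deduped = dedupe_latest(batch)  # two records survive; id 1 keeps "approved"
```

Doing this keep-latest pass before the load step means the warehouse only ever sees one row per key, which is what removes the dedup workload from Redshift.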
Continuous Integration and Continuous Delivery (CI/CD), Spring Boot, Data Integration, ETL Tools, Databricks, Containerization, +22 more

Airtel Africa Digital Labs

Technical Lead

Sep 2020 – Apr 2022 · 1 yr 7 mos · Gurugram, Haryana, India

  • Spark batch and streaming ETL framework: a generic framework to run any Spark ETL job; each job is configured via a JSON file with configurable sources and sinks.
  • Presto exactly-once with Spark Streaming: a framework to achieve exactly-once semantics (no duplicate data) for Presto queries over Hadoop ORC data.
  • Hive Metastore update: a framework to update the Hive metastore with newly added Hadoop partitions.
  • Generic recon framework in Spark Streaming: 1-1 mapping of records between any number of sources, with configurable recon rules, output schema, etc.; supports partial recon output, late-arriving records, and Hudi for partial recon.
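For the JSON-configured ETL framework described above, a job definition might look something like the following. This is a hypothetical shape, the actual field names and structure of the framework's config are not given in the source:

```json
{
  "job_name": "orders_daily_load",
  "mode": "batch",
  "source": { "type": "kafka", "topic": "orders", "format": "json" },
  "transform": { "sql": "SELECT order_id, amount, ts FROM source WHERE amount > 0" },
  "sink": { "type": "orc", "path": "hdfs://warehouse/orders", "partition_by": ["dt"] }
}
```

The point of such a design is that adding a new pipeline means writing a config file, not new Spark code: the framework reads the `source` and `sink` blocks and wires up the job generically.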
Continuous Integration and Continuous Delivery (CI/CD), Spring Boot, Data Integration, ETL Tools, Containerization, Big Data, +16 more

Mobileum

Senior Software Development Engineer

Jan 2019 – Sep 2020 · 1 yr 8 mos · Gurgaon, India

  • ETL pipeline in Spark Streaming: built an ETL pipeline from scratch to process real-time data using Spark Streaming.
  • HortonWorks POC: implemented changes in Spark jobs and the Hive Metastore for compatibility with the HortonWorks platform.
  • Trip metrics (subscriber roaming experience): real-time fusion of trips (roaming users) with events to generate scores for various services (SMS, call, data) using Spark Streaming.
  • Basic recon framework in Spark Streaming: 1-1 mapping of records within a configured sliding time window to accurately reconcile records from two data sources.
  • Custom compaction of ORC files: Spark job over ORC paths that merges many small files to improve Presto query performance and file-listing time.
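The 1-1 sliding-window recon bullet above can be sketched in plain Python (the real framework ran on Spark Streaming; field names `txn_id` and `ts`, and the window size, are assumptions for illustration):

```python
# Minimal sketch of 1-1 reconciliation between two sources within a
# sliding time window. Record shape is a hypothetical assumption.

WINDOW_SECONDS = 300  # match records whose timestamps differ by <= 5 minutes

def recon(source_a, source_b, window=WINDOW_SECONDS):
    """Return (matched_pairs, unmatched_a, unmatched_b).

    Each record from A matches at most one record from B with the same
    txn_id whose timestamp falls inside the window (1-1 mapping).
    """
    pending_b = {}  # txn_id -> not-yet-matched B records
    for rec in source_b:
        pending_b.setdefault(rec["txn_id"], []).append(rec)

    matched, unmatched_a = [], []
    for rec in source_a:
        candidates = pending_b.get(rec["txn_id"], [])
        hit = next((c for c in candidates if abs(c["ts"] - rec["ts"]) <= window), None)
        if hit is not None:
            candidates.remove(hit)  # consume the match: enforces 1-1 mapping
            matched.append((rec, hit))
        else:
            unmatched_a.append(rec)

    unmatched_b = [c for cs in pending_b.values() for c in cs]
    return matched, unmatched_a, unmatched_b

a = [{"txn_id": "t1", "ts": 100}, {"txn_id": "t2", "ts": 200}]
b = [{"txn_id": "t1", "ts": 150}, {"txn_id": "t3", "ts": 210}]
matched, miss_a, miss_b = recon(a, b)  # t1 matches; t2 and t3 are unmatched
```

Records left unmatched on either side are the recon breaks; in a streaming setting, late-arriving records would be held in the pending state until the window expires.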
Continuous Integration and Continuous Delivery (CI/CD), Spring Boot, Data Integration, ETL Tools, Containerization, Big Data, +16 more

Mobikwik

Software Developer

Aug 2017 – Dec 2018 · 1 yr 4 mos · Gurgaon, India

Spring Boot, Big Data, Computer Science, Python (Programming Language)

PayU

Software Developer

Jun 2015 – Aug 2017 · 2 yrs 2 mos · Gurgaon, India

  • Software Development
Spring Boot, Big Data, Computer Science, Python (Programming Language)

Samsung Research Institute

Summer Intern

Jun 2014 – Jul 2014 · 1 mo · Bengaluru Area, India

  • Software Development
Computer Science, Python (Programming Language)

Education

Delhi College of Engineering

Bachelor of Technology (BTech) — Computer Engineering

Jan 2011 – Jan 2015
