Prakhar Agrawal — Data Engineer
Data Engineer at PayU with 5+ years of experience designing and operating production-grade data platforms in fintech.
What I've built and own:
→ Real-Time CDC Data Lake: Architected an end-to-end streaming pipeline (MySQL → Debezium → Kafka → PyFlink → Apache Iceberg on S3) ingesting 5+ TB/day of CDC events at 10K events/sec, with 2-3 minute end-to-end latency. Designed a two-job architecture (a stateless append job plus a stateful ROW_NUMBER dedup job backed by RocksDB) for fault isolation and replayability. Live in production, scaling to 20 tables.
→ Iceberg Health Framework: Built an automated 5-step optimization on Spark/EMR covering compaction by primary key, equality delete-file resolution, snapshot expiry, and orphan cleanup, with conflict-aware retries and SES-based HTML alerting. Also developed a sync-lag monitor that uses Iceberg per-file column stats for memory-efficient scans.
→ Business-Centric Data Mart: Led a cross-functional initiative across all PayU offices to design two star-schema marts that standardized previously undocumented business logic. Cut Redshift spend from $100K/month to $40-45K/month and enabled 10x traffic on the same infrastructure. Migrated 20K+ legacy queries via a SQLGlot script (90% automated) and built an AI-powered agent for ongoing onboarding.
→ ML Feature Pipelines: Built real-time feature systems for domestic and international fraud models using Spark Structured Streaming, Delta Lake on S3, and Redis for sub-millisecond serving.
→ Data Governance: Deployed OpenMetadata for org-wide discovery and lineage. Built a self-hosted Great Expectations framework during an audit for automated data quality validation.
Tech: Apache Flink (PyFlink), Spark Structured Streaming, Kafka, Debezium, Apache Iceberg, Delta Lake, Airflow, AWS (EMR, Redshift, MSK, S3, Glue, SES), Python, SQL, Redis, Cassandra, MySQL.
Education: M.Tech in Data Science and Analytics from IIIT Allahabad (Gold Medalist, 9.815 CGPA). Published in Journal of Intelligent and Fuzzy Systems (Feb 2023).
Open to senior data engineering / staff engineer / data platform roles. Reach out — sunshineprakhar@gmail.com.
Stackforce AI infers this person is a Fintech Data Engineer specializing in real-time data processing and cost optimization.
Location: Bengaluru, Karnataka, India
Experience: 6 yrs 9 mos
Skills
- Data Engineering
- Real-time Data Processing
- Data Management
- Data Governance
- Business Intelligence
- Machine Learning
Career Highlights
- Architected a real-time CDC pipeline processing over 5 TB/day.
- Reduced Redshift costs from $100K/month to $40-45K/month.
- Developed automated data quality validation frameworks.
Work Experience
PayU
Data Engineer - II (1 yr 2 mos)
Data Engineer - I (2 yrs 5 mos)
Axtria - Ingenious Insights
Analyst Intern - Decision Science (7 mos)
Indian Institute of Information Technology
Data Science & Analytics Placement Coordinator (1 yr 7 mos)
Teaching Assistant (1 yr 1 mo)
Accenture
Application Development Associate (1 yr 8 mos)
Education
Master of Technology - MTech at Indian Institute of Information Technology Allahabad
Bachelor of Engineering at Madhav Institute of Technology and Science, Gwalior
Higher Secondary at Jawahar Navodaya Vidyalaya - JNV
High School at Jawahar Navodaya Vidyalaya - JNV