Pralabh Kumar

Director of Engineering

Bengaluru, Karnataka, India · 17 yrs 2 mos experience


Key Highlights

Over 15 years of experience in data solutions.
Contributed significantly to open source Spark projects.
Led modernization efforts in Walmart's supply chain.

Stackforce AI infers this person is a seasoned expert in Big Data and Machine Learning for Retail and Finance sectors.

Skills

Core Skills

Machine Learning · Big Data · Apache Spark · Kubernetes · Performance Tuning · Real-time Processing · Model Management

Other Skills

BigQuery · C++ · Cassandra · Data Science · Deep Learning · Eclipse · HDFS · Hadoop · JDBC · JSP · Java · Kafka · Langgraph · Lucene · MCP

About

I have over 15 years of experience building scalable and reliable data solutions for projects in machine learning, analytics, and optimization. As a Director at Walmart, I lead a team of engineers across India and the US in the replenishment area. I am passionate about contributing to the open source community and enhancing the capabilities of Spark and its related projects (see my open source contributions below). I am a committer to Dr. Elephant, a performance monitoring and tuning tool for Spark applications, and I have also contributed to Spark, Livy (as a committer), Airflow, Pallet, and ElastAlert. Additionally, I have filed and resolved several issues and pull requests for Spark on GitHub, especially related to the Kubernetes and YARN deployment modes. I have also presented at several conferences and workshops on Spark and its best practices. My mission is to share my knowledge and expertise in big data technologies and help others achieve their data-driven goals.

Experience

Total Experience: 17 yrs 2 mos

Average Tenure: 1 yr 8 mos

Current Experience: 2 yrs 6 mos

Walmart Global Tech India

Director of Engineering

Nov 2023 – Present · 2 yrs 6 mos · Bengaluru, Karnataka, India · On-site

Building a Merchant Super AI Agent that interacts with sub-agents via A2A. This super agent helps merchants get simple answers to their questions about data.
The agent is built with LangGraph, ReAct, MCP, a semantic layer (Cube.js), and BigQuery (BQ).
Leading an effort to modernize Walmart's supply chain, specifically in the replenishment area known as Dynamic Flow.
Dynamic Flow consists of big data and machine learning applications that optimize outbound and inbound order traffic to reduce costs.

Machine Learning · Data Science · Big Data

Uber

Engineering

Mar 2023 – Nov 2023 · 8 mos · India · Hybrid

Engineering: Data, Platform, Spark
Led the effort to migrate the Uber batch p from Spark 2 to Spark 3.
Architected the platform to perform a contactless migration.
Worked with application and ML teams to upgrade their applications from Spark 2 to Spark 3.

Visa

Lead Engineer - Director

Jun 2021 – Mar 2023 · 1 yr 9 mos · India

Strata NY 2018 speaker page: https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/speaker/267532.html
Leading Apache Spark (K8s and YARN) efforts with a team across India and the USA.
Built Visa's Spark 3.0.1 and Spark 3.2.0 distributions from open source.
Deployed them on Visa's production clusters on YARN and K8s.
Built, developed, and supported the end-to-end Spark platform at Visa.
Troubleshooting and tuning Spark applications on YARN and K8s; developing tooling around Spark on K8s.
Actively contributing to the community.
Spark open source contributions:
https://issues.apache.org/jira/browse/SPARK-33782
https://issues.apache.org/jira/browse/SPARK-39755
https://issues.apache.org/jira/browse/SPARK-39965
https://issues.apache.org/jira/browse/SPARK-38854
https://issues.apache.org/jira/browse/SPARK-39179
https://issues.apache.org/jira/browse/SPARK-30537
https://issues.apache.org/jira/browse/SPARK-37181
https://issues.apache.org/jira/browse/SPARK-32161
https://issues.apache.org/jira/browse/SPARK-20199
https://issues.apache.org/jira/browse/SPARK-26462
https://issues.apache.org/jira/browse/SPARK-37491
https://issues.apache.org/jira/browse/SPARK-38879
https://issues.apache.org/jira/browse/SPARK-39029

Target

Principal Engineer - Director

Aug 2020 – Jun 2021 · 10 mos · Bengaluru, Karnataka, India

Technically leading a team of 15 people for vision computing at Target.
Architected the feature management system (Featuretron), which stores all the features generated at Target. It has batch and real-time components (HDFS, Cassandra, Spark, Kafka).
Architected and implemented the model management system, which stores and retrieves models at Target, including PySpark and deep learning models.
Working in the visual computing team to scale deep learning model scoring.
Architected scalable real-time product recognition using image embeddings (FaceNet). The solution recognizes images in real time (training in batch, inference in real time).

LinkedIn

Sr Engineer

Jan 2018 – Jun 2020 · 2 yrs 5 mos · Bengaluru Area, India

Open Source Contributions
YARN
https://issues.apache.org/jira/browse/YARN-9761
Spark
https://issues.apache.org/jira/browse/SPARK-20199
https://issues.apache.org/jira/browse/SPARK-26462
ElastAlert
https://github.com/Yelp/elastalert/issues/1079
Livy
https://issues.cloudera.org/browse/LIVY-337 (https://github.com/cloudera/livy/pull/334)
Pallet
http://code.google.com/p/pallet/
Leading Apache Spark efforts at LinkedIn Bangalore.
1. Leading the effort to create an Auto Tune framework that automatically tunes Spark and Hadoop jobs on the cluster. TuneIn uses a heuristics-based (rule-based) approach and an optimization-based approach (machine learning, PSO) to automatically tune Spark and Hadoop jobs.
Integrating TuneIn with the open source project Dr. Elephant (https://github.com/linkedin/dr-elephant).
The project was selected for the Spark and Strata conferences (2018).
Overall, auto-tuning saves 30% of cluster resources and increases developer productivity by 90%.
It has been widely deployed in LinkedIn's Hadoop clusters (1000s of nodes).
2. Committer to the Dr. Elephant project (https://github.com/linkedin/dr-elephant).
3. Architected and designed a framework that automatically converts Hive scripts into optimized Spark scripts. It has three components: a Recommender (a machine-learning-based approach to decide which scripts to convert to Spark), an Optimizer (a DAG-based optimizer that converts Hive scripts to Spark and optimizes them), and a Tester (which automatically compares the output of the legacy Hive script with that of the optimized Spark script).
4. Researched various machine-learning-based approaches to tuning Spark jobs and integrated them with AutoTuning.

Walmart

2 roles

Architect

Promoted

Feb 2017 – Jan 2018 · 11 mos

Sr Tech Lead - Data Specialist

Oct 2014 – Dec 2017 · 3 yrs 2 mos

Walmart Labs
Sr. Tech Lead, Oct 2014 – till date
Leading a team to create a real-time system that provides a unique ID to Walmart customers. Built the real-time system using Kafka, Spark, and Cassandra, with the ELK stack for visualization.
Worked on Apache Spark and H2O (Sparkling Water) to cluster consumers based on their similarity. Also benchmarked Spark MLlib against Sparkling Water.
Worked on Apache Zeppelin to create faster visualizations for data analytics. Also integrated an H2O interpreter into Apache Zeppelin and enhanced the SQL interpreter.
Worked on Hadoop and Hive to analyze customer data.
Received the David Glass Innovator of the Year award (2016). Also filed a patent.
Received COE and SPOT awards for the above work.

Motorola Mobility

Sr Hadoop Developer

Feb 2012 – Sep 2014 · 2 yrs 7 mos · India

1) Working on Hadoop, Hive, and HBase, mainly handling large datasets (terabytes) and writing MapReduce jobs and Hive queries for data analysis.
2) Working on Google Cloud Platform: BigQuery, Google App Engine, Google Cloud SQL, and Google Compute Engine. Set up a Hadoop cluster on GCE and performed big data analysis on it.
3) Working on user sentiment analysis using Weka (applied different algorithms and techniques such as AdaBoost, bagging, and SVM to improve classifier accuracy).
4) Working on trend analysis: given a corpus, devised an algorithm to find its trending topics.

Sears Holdings Corporation

Sr Hadoop Developer

May 2011 – Feb 2012 · 9 mos

Working on Hadoop, HBase, Hive, Java programming, and Ruby.

NextBio

Software Developer

May 2010 – May 2011 · 1 yr

Working on Solr, Lucene, Hadoop, data mining, and natural language processing.

University of Texas at Dallas

Research Assistant

Dec 2008 – May 2010 · 1 yr 5 mos

Doing research in machine learning, classification, and sequence tagging using MALLET (Machine Learning for Language Toolkit) and its application to semantic data.
For more information, see http://code.google.com/p/pallet/