Pralabh Kumar

Director of Engineering

Bengaluru, Karnataka, India · 17 yrs 1 mo experience
Most Likely To Switch

Key Highlights

  • Over 15 years of experience in data solutions.
  • Contributed significantly to open source Spark projects.
  • Led modernization efforts in Walmart's supply chain.
Stackforce AI infers this person is a seasoned expert in Big Data and Machine Learning for Retail and Finance sectors.

Skills

Core Skills

Machine Learning · Big Data · Apache Spark · Kubernetes · Performance Tuning · Real-time Processing · Model Management

Other Skills

BigQuery · C++ · Cassandra · Data Science · Deep Learning · Eclipse · HDFS · Hadoop · JDBC · JSP · Java · Kafka · LangGraph · Lucene · MCP

About

I have over 15 years of experience building scalable and reliable data solutions for projects in machine learning, analytics, and optimization. As a Director at Walmart, I lead a team of engineers across India and the US in the replenishment area. I am passionate about contributing to the open source community and enhancing the capabilities of Spark and its related projects (see my open source contributions below). I am a committer to Dr. Elephant, a performance monitoring and tuning tool for Spark applications, and I have also contributed to Spark, Livy (committer), Airflow, Pallet, and ElastAlert. Additionally, I have filed and resolved several issues and pull requests for Spark on GitHub, especially related to the Kubernetes and YARN deployment modes. I have also presented at several conferences and workshops on Spark and its best practices. My mission is to share my knowledge and expertise in big data technologies and help others achieve their data-driven goals.

Experience

Walmart Global Tech India

Director of Engineering

Nov 2023 - Present · 2 yrs 4 mos · Bengaluru, Karnataka, India · On-site

  • Building a Merchant Super AI Agent that interacts with sub-agents via A2A. The super agent helps answer merchants' questions about data in a simple fashion.
  • The agent is built with LangGraph, ReAct, MCP, a semantic layer (cube.js), and BQ. It interacts with other agents via A2A.
  • Leading an effort to modernize the Walmart supply chain, specifically an effort in Walmart's replenishment area called Dynamic Flow.
  • Dynamic Flow consists of big data and machine learning applications that optimize outbound and inbound order traffic to save money.
Machine Learning · Data Science · Big Data

Uber

Engineering

Mar 2023 - Nov 2023 · 8 mos · India · Hybrid

  • Engineering: data, platform, Spark
  • Led the effort to migrate the Uber batch p from Spark 2 to Spark 3.
  • Architected the platform to perform the contactless migration.
  • Worked with application and ML teams to upgrade their applications from Spark 2 to Spark 3.

Visa

Lead Engineer - Director

Jun 2021 - Mar 2023 · 1 yr 9 mos · India

  • https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/speaker/267532.html
  • Leading Apache Spark (K8s and YARN) efforts with a team across India and the USA.
  • Built Visa's Spark 3.0.1 and Spark 3.2.0 distributions from open source.
  • Deployed them on Visa production clusters on YARN and K8s.
  • Built, developed, and supported the end-to-end Spark platform at Visa.
  • Troubleshooting and tuning Spark applications on YARN and K8s. Developed tooling around Spark on K8s.
  • Actively contributing to the community.
  • Spark Open Source Contribution
  • https://issues.apache.org/jira/browse/SPARK-33782
  • https://issues.apache.org/jira/browse/SPARK-39755
  • https://issues.apache.org/jira/browse/SPARK-39965
  • https://issues.apache.org/jira/browse/SPARK-38854
  • https://issues.apache.org/jira/browse/SPARK-39179
  • https://issues.apache.org/jira/browse/SPARK-30537
  • https://issues.apache.org/jira/browse/SPARK-37181
  • https://issues.apache.org/jira/browse/SPARK-32161
  • https://issues.apache.org/jira/browse/SPARK-20199
  • https://issues.apache.org/jira/browse/SPARK-26462
  • https://issues.apache.org/jira/browse/SPARK-37491
  • https://issues.apache.org/jira/browse/SPARK-38879
  • https://issues.apache.org/jira/browse/SPARK-39029

Target

Principal Engineer - Director

Aug 2020 - Jun 2021 · 10 mos · Bengaluru, Karnataka, India

  • Technical lead for a team of 15 people working on vision computing at Target.
  • Architected the Feature Management System (Featuretron), which is responsible for storing all the features generated at Target. It has batch and real-time components (HDFS, Cassandra, Spark, Kafka).
  • Architected and implemented the Model Management System, which is responsible for storing and retrieving models at Target. It stores PySpark and deep learning models.
  • Working in the visual computing team to scale deep learning model scoring.
  • Architected scalable real-time product recognition using image embeddings (FaceNet). The solution recognizes images in real time (training in batch, inference in real time).

LinkedIn

Sr Engineer

Jan 2018 - Jun 2020 · 2 yrs 5 mos · Bengaluru Area, India

  • Open Source Contribution
  • YARN
  • https://issues.apache.org/jira/browse/YARN-9761
  • Spark
  • https://issues.apache.org/jira/browse/SPARK-20199
  • https://issues.apache.org/jira/browse/SPARK-26462
  • ElastAlert
  • https://github.com/Yelp/elastalert/issues/1079
  • Livy
  • https://issues.cloudera.org/browse/LIVY-337 (https://github.com/cloudera/livy/pull/334)
  • Pallet
  • http://code.google.com/p/pallet/
  • Leading Apache Spark efforts at LinkedIn Bangalore.
  • 1. Leading the effort to create the Auto Tune framework, which automatically tunes Spark and Hadoop jobs on the cluster. TuneIn uses a heuristics-based (rule-based) approach and an optimization-based approach (machine learning, PSO) to automatically tune Spark and Hadoop jobs.
  • Integrating TuneIn with the open source project Dr. Elephant (https://github.com/linkedin/dr-elephant).
  • The project was selected for the Spark and Strata conferences (2018).
  • Overall, auto-tuning gives a 30% saving in cluster resources and a 90% increase in developer productivity.
  • It has been widely deployed on LinkedIn Hadoop clusters (1000s of nodes).
  • 2. Committer to the Dr. Elephant project (https://github.com/linkedin/dr-elephant).
  • 3. Architected and designed a framework that automatically converts Hive scripts into optimized Spark scripts. It has three components: a Recommender (a machine learning based approach that decides which scripts to convert to Spark), an Optimizer (a DAG-based optimizer that converts and optimizes Hive scripts to Spark), and a Tester (automatically compares the output of the legacy Hive script with that of the optimized Spark script).
  • 4. Researched various machine learning based approaches to tuning Spark jobs and integrated them with AutoTuning.

Walmart

2 roles

Architect

Promoted

Feb 2017 - Jan 2018 · 11 mos

Sr Tech Lead - Data Specialist

Oct 2014 - Dec 2017 · 3 yrs 2 mos

  • Walmart Labs
  • Sr. Tech Lead, Oct 2014 - till date
  • Leading a team to create a real-time system that provides a unique ID to Walmart customers. Created the system using Kafka, Spark, and Cassandra. Used the ELK stack for visualization.
  • Worked on Apache Spark and H2O (Sparkling Water) to cluster consumers based on their similarity. Also benchmarked Spark MLlib against Sparkling Water.
  • Worked on Apache Zeppelin to create faster visualizations for data analytics. Also integrated an H2O interpreter into Apache Zeppelin and enhanced the SQL interpreter.
  • Worked on Hadoop and Hive to analyze customer data.
  • Received the David Glass Innovator of the Year 2016 award. Also filed a patent.
  • Received COE and SPOT awards for the above work.

Motorola Mobility

Sr Hadoop Developer

Feb 2012 - Sep 2014 · 2 yrs 7 mos · India

  • 1) Working on Hadoop, Hive, and HBase. Mainly handling large data sets (in terabytes) and writing MR jobs and Hive queries for data analysis.
  • 2) Working on Google Cloud Platform: BigQuery, Google App Engine, Google Cloud SQL, and Google Compute Engine. Set up a Hadoop cluster on GCE and ran big data analysis on it.
  • 3) Working on user sentiment analysis using Weka (tried different algorithms and techniques such as AdaBoost, Bagging, and SVM to improve classifier accuracy).
  • 4) Working on trend analysis: given a corpus, devising an algorithm to find its trending topics.

Sears Holdings Corporation

Sr Hadoop Developer

May 2011 - Feb 2012 · 9 mos

  • Working on Hadoop, HBase, Hive, Java programming, and Ruby.

NextBio

Software Developer

May 2010 - May 2011 · 1 yr

  • Working on Solr, Lucene, Hadoop, data mining, and natural language processing.

University of Texas at Dallas

Research Assistant

Dec 2008 - May 2010 · 1 yr 5 mos

  • Doing research in machine learning, classification, and sequence tagging using MALLET (MAchine Learning for LanguagE Toolkit) and its application to semantic data.
  • For more information please see http://code.google.com/p/pallet/

Education

The University of Texas at Dallas

Master's degree — Computer Science

Guru Gobind Singh Indraprastha University

B.tech — Computer Science

Doon Public School
