Soumabrata Chakraborty

Software Engineer

Bengaluru, Karnataka, India · 18 yrs 11 mos experience

Key Highlights

  • Expert in Hadoop ecosystem and Big Data processing.
  • Led significant data platform projects at Walmart and CDK Global.
  • Strong background in Java and RESTful microservices.

Skills

Core Skills

Hadoop · Spark · Java · JEE

Other Skills

Data Pipelines · Docker · Hibernate · Kafka · Machine Learning · Maven · Object-Oriented Programming (OOP) · Python · REST Services · RESTful Microservices · SOAP · SQL · Spring Boot · Unix · Web Services

About

Responsible for the design and implementation of a data platform based on the Hadoop ecosystem (HDFS, Hive, Spark, Kafka, etc.). Equally at home in the JEE world (RESTful microservices, Spring) and with Docker.

Experience

Walmart Global Tech

4 roles

Distinguished Software Engineer - ML

Promoted

Jul 2025 – Present · 8 mos

Principal Software Engineer - ML

Promoted

May 2021 – Jun 2025 · 4 yrs 1 mo

Staff Software Engineer - ML

May 2020 – Apr 2021 · 11 mos

Staff Software Engineer - Data Platforms

Jul 2018 – May 2020 · 1 yr 10 mos

  • Engineering lead for Walmart's data platform (Hadoop ecosystem), responsible for the platform's design and implementation.
  • The platform sits at the core of much of Walmart's Big Data processing, serving requirements across many engineering and business teams.
  • The platform was designed to:
      • Run on different flavours of Hadoop cluster (HDP/CloudBreak, DataProc, MapR)
      • Run on different cloud environments (on-premises, GCP, Azure)
      • Use different compute engines (Spark, and others via Beam)
      • Integrate with different databases and database types (several RDBMS, MongoDB, Cassandra, etc.)
      • Integrate with different filesystems and object stores (HDFS, GCS, ADLS, MapR-FS, Swift, etc.)
      • Integrate with messaging systems (Kafka)
      • Integrate with mainframe files at Walmart
      • Integrate with big data tools such as Sqoop and DistCp (customized for Walmart)
      • Support fully configurable data pipelines via simple YAML configuration for data engineers
      • Expose REST services for a drag-and-drop, canvas-based UI that lets business users create and deploy data pipelines
      • Handle all of the above across both batch and near-real-time workloads
      • Provide configurable cluster templates that set up and tear down cloud clusters for batch workloads
Hadoop · Spark · Kafka · REST Services · Data Pipelines
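The YAML-driven pipeline configuration mentioned in this role might look roughly like the sketch below. Every key name, connection string, and value here is a hypothetical illustration, not Walmart's actual schema:

```yaml
# Hypothetical pipeline definition; key names and values are illustrative only.
pipeline:
  name: sales-daily-load
  mode: batch                 # or: streaming
  source:
    type: jdbc
    url: jdbc:postgresql://example-host:5432/sales   # placeholder host
    table: orders
  transform:
    engine: spark
    sql: "SELECT order_id, amount FROM orders WHERE order_date = '${run_date}'"
  sink:
    type: hdfs
    path: /data/curated/orders/${run_date}
    format: avro
```

A declarative file like this lets data engineers describe source, transform, and sink without writing Spark code, which matches the configurable-pipeline and multi-backend goals listed above.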

CDK Global

2 roles

Principal Consultant

Promoted

Sep 2017 – Jul 2018 · 10 mos · Pune, Maharashtra, India

  • Senior Big Data Engineer responsible for the design and implementation of CDK's data platform based on the Hadoop ecosystem:
      • Avro schema registration using REST-based microservices
      • Data acquisition interfaces over REST (microservice) and FTP (Apache Flume), accepting data in various formats (text, JSON, or Avro)
      • Conversion to a common format (Avro) and publishing to the Kafka cluster
      • Downloading from the Kafka cluster to the data lake (an on-premises HDFS cluster)
      • Data curation framework for batch flows using Hive, Pig, Spark, and Spark SQL
      • Data curation framework for real-time flows using Kafka and Spark Streaming
      • Workflow management using Oozie
      • Distribution of data out of the lake to different sinks such as MongoDB (reporting cache) and file exports
  • Evaluated running Spark jobs on cloud platforms such as Amazon EMR and Microsoft Azure HDInsight, with comparative analysis of ease of migration, performance, pricing, etc.
  • Designed platform changes for more efficient cloud operation, such as separating the data and compute grids, and creating on-demand Spark clusters for peak batch loads alongside a smaller always-on cluster for ad-hoc querying and real-time flows.
  • Developed RESTful microservices for the reporting layer, serving data from MongoDB.
  • Led the adoption of Spring Boot and containerization (using Docker) for 100+ microservices, and the migration to the CoreOS environment.
  • Evaluated other stream-processing frameworks such as Kafka Connect and Kafka Streams.
  • Contributed to open-source software, including Hadoop, Confluent's Schema Registry, and a community Kafka Connect plugin for MongoDB.
  • Set up a GemFire in-memory data grid as a distributed cache with partitioned data, and a geo-distributed GemFire cache across data centers.
Hadoop · Kafka · Spark · RESTful Microservices · Spring Boot · Docker
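The acquisition flow described in this role (accept payloads in several formats, normalize them to a common form, then publish downstream) can be sketched as below. This is a simplified, hypothetical illustration: the format detection, envelope fields, and class names are assumptions, and the real pipeline serialized to Avro and published to Kafka rather than returning a `Map`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Simplified sketch of a data-acquisition step: detect an incoming payload's
 * format and wrap it in a common envelope before publishing downstream.
 * All names and fields here are illustrative assumptions, not CDK's schema.
 */
public class AcquisitionSketch {

    enum Format { JSON, TEXT }

    /** Rough format detection; a real system would use content-type metadata. */
    static Format detect(String payload) {
        String trimmed = payload.trim();
        return (trimmed.startsWith("{") || trimmed.startsWith("["))
                ? Format.JSON
                : Format.TEXT;
    }

    /** Wrap the payload in a common envelope (a stand-in for Avro serialization). */
    static Map<String, String> toEnvelope(String source, String payload) {
        Map<String, String> envelope = new LinkedHashMap<>();
        envelope.put("source", source);                 // e.g. "rest" or "ftp"
        envelope.put("format", detect(payload).name()); // detected input format
        envelope.put("body", payload);                  // raw payload, unmodified
        return envelope;
    }

    public static void main(String[] args) {
        System.out.println(toEnvelope("ftp", "{\"dealer\":42}"));
        System.out.println(toEnvelope("rest", "plain,csv,line"));
    }
}
```

Normalizing everything into one envelope at the edge is what lets the downstream curation frameworks treat all sources uniformly once the data reaches Kafka.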

Senior Consultant

Feb 2015 – Aug 2017 · 2 yrs 6 mos · Pune, Maharashtra, India

Amdocs

2 roles

Development Expert

Jul 2012 – Jan 2015 · 2 yrs 6 mos · Pune, Maharashtra, India

  • Senior engineer and tech lead in the Product R&D division, working on products such as the Amdocs Small Cell Rollout Solution, Discover Engine, and Amdocs Universal Activator.
  • Responsible for requirement analysis, design, development, and unit testing of product components.
  • Mentored junior engineers and provided technical guidance for the team.
  • Awarded Amdocs Innovator of the Year 2011.
  • Worked with the Java and JEE stack (EJB, JMS, JPA), SOAP-based web services, WebLogic, and WebSphere.

Senior Subject Matter Expert

Feb 2010 – Jun 2012 · 2 yrs 4 mos · Pune, Maharashtra, India

Java · JEE · SOAP · Web Services

Cognizant Technology Solutions

Programmer Analyst

Dec 2006 – Feb 2010 · 3 yrs 2 mos · Pune, Maharashtra, India

  • Worked in the Banking & Financial Services vertical for an investment banking client.
  • Responsible for requirements analysis, design, development, and unit testing of a trade-mismatch reconciliation platform.
  • Spent 5 months at the client location on requirements gathering and analysis for a new project.
  • Worked with Java, JSP, Servlets, Struts, Spring, and Hibernate.
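A trade-mismatch reconciliation like the one described above boils down to comparing trade records from two systems and flagging disagreements. The following is a toy sketch under assumed field names (`id`, `quantity`); the client's actual matching rules and data model are not part of this profile:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Toy sketch of trade-mismatch reconciliation: compare trades reported by two
 * systems by trade id and flag quantity differences or one-sided trades.
 * Field names and matching rules are illustrative assumptions.
 */
public class ReconSketch {

    record Trade(String id, long quantity) {}

    /** Return ids whose quantities disagree, plus ids present on only one side. */
    static List<String> mismatches(List<Trade> ours, List<Trade> theirs) {
        Map<String, Long> theirQty = new HashMap<>();
        for (Trade t : theirs) theirQty.put(t.id(), t.quantity());

        List<String> out = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (Trade t : ours) {
            seen.add(t.id());
            Long q = theirQty.get(t.id());
            if (q == null || q != t.quantity()) out.add(t.id()); // missing or differs
        }
        for (Trade t : theirs) {
            if (!seen.contains(t.id())) out.add(t.id());          // only on their side
        }
        return out;
    }

    public static void main(String[] args) {
        List<Trade> ours = List.of(new Trade("T1", 100), new Trade("T2", 50));
        List<Trade> theirs = List.of(new Trade("T1", 100), new Trade("T2", 75),
                                     new Trade("T3", 10));
        System.out.println(mismatches(ours, theirs)); // prints [T2, T3]
    }
}
```

Indexing one side by trade id keeps the comparison linear in the number of trades, which matters when reconciling large end-of-day trade files.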

Education

The Maharaja Sayajirao University of Baroda

Bachelor of Engineering (BE)

Jan 2002 – Jan 2006
