Soumabrata Chakraborty

Software Engineer

Bengaluru, Karnataka, India · 18 yrs 11 mos experience

Key Highlights

  • Expert in Hadoop ecosystem and Big Data processing.
  • Led significant data platform projects at Walmart and CDK Global.
  • Strong background in Java and RESTful microservices.

Skills

Core Skills

Hadoop · Spark · Java · JEE

Other Skills

Data Pipelines · Docker · Hibernate · Kafka · Machine Learning · Maven · Object-Oriented Programming (OOP) · Python · REST Services · RESTful Microservices · SOAP · SQL · Spring Boot · Unix · Web Services

About

Responsible for the design and implementation of a data platform based on the Hadoop ecosystem (HDFS, Hive, Spark, Kafka, etc.). Equally at home in the JEE world (RESTful microservices, Spring) and with Docker.

Experience

Walmart Global Tech

4 roles

Distinguished Software Engineer - ML

Promoted

Jul 2025 – Present · 8 mos

Principal Software Engineer - ML

Promoted

May 2021 – Jun 2025 · 4 yrs 1 mo

Staff Software Engineer - ML

May 2020 – Apr 2021 · 11 mos

Staff Software Engineer - Data Platforms

Jul 2018 – May 2020 · 1 yr 10 mos

  • Engineering lead for Walmart's data platform (Hadoop ecosystem), responsible for the platform's design and implementation.
  • The platform sits at the core of much of Walmart's Big Data processing, serving requirements across many engineering and business teams.
  • The platform was designed to:
      • Run on different flavours of Hadoop cluster (HDP/CloudBreak, DataProc, MapR)
      • Run on different cloud environments (on-premises, GCP, Azure)
      • Use different compute engines (Spark, and others via Beam)
      • Integrate with different databases and database types (several RDBMS, MongoDB, Cassandra, etc.)
      • Integrate with different filesystems and object stores (HDFS, GCS, ADLS, MapR-FS, Swift, etc.)
      • Integrate with messaging systems (Kafka)
      • Integrate with mainframe files at Walmart
      • Integrate with big data tools such as Sqoop and DistCp (customized for Walmart)
      • Support fully configurable data pipelines via simple YAML configuration for data engineers
      • Expose REST services for a drag-and-drop, canvas-based UI that lets business users create and deploy data pipelines
      • Handle all of the above across both batch and near-real-time workloads
      • Provide configurable cluster templates that set up and tear down cloud clusters for batch workloads
Hadoop · Spark · Kafka · REST Services · Data Pipelines
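The YAML-driven pipeline configuration mentioned in this role might look roughly like the sketch below. Every key name, connection string, and value here is a hypothetical illustration, not Walmart's actual schema:

```yaml
# Hypothetical pipeline definition; key names and values are illustrative only.
pipeline:
  name: sales-daily-load
  mode: batch                 # or: streaming
  source:
    type: jdbc
    url: jdbc:postgresql://example-host:5432/sales   # placeholder host
    table: orders
  transform:
    engine: spark
    sql: "SELECT order_id, amount FROM orders WHERE order_date = '${run_date}'"
  sink:
    type: hdfs
    path: /data/curated/orders/${run_date}
    format: avro
```

A declarative file like this lets data engineers describe source, transform, and sink without writing Spark code, which matches the configurable-pipeline and multi-backend goals listed above.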

CDK Global

2 roles

Principal Consultant

Promoted

Sep 2017 – Jul 2018 · 10 mos · Pune, Maharashtra, India

  • Senior Big Data Engineer responsible for the design and implementation of CDK's data platform based on the Hadoop ecosystem:
      • Avro schema registration using REST-based microservices
      • Data acquisition interfaces over REST (microservice) and FTP (Apache Flume), accepting data in various formats (text, JSON, or Avro)
      • Conversion to a common format (Avro) and publishing to the Kafka cluster
      • Downloading from the Kafka cluster to the data lake (an on-premises HDFS cluster)
      • Data curation framework for batch flows using Hive, Pig, Spark, and Spark SQL
      • Data curation framework for real-time flows using Kafka and Spark Streaming
      • Workflow management using Oozie
      • Distribution of data out of the lake to different sinks such as MongoDB (reporting cache) and file exports
  • Evaluated running Spark jobs on cloud platforms such as Amazon EMR and Microsoft Azure HDInsight, with comparative analysis of ease of migration, performance, pricing, etc.
  • Designed platform changes for more efficient cloud operation, such as separating the data and compute grids, and creating on-demand Spark clusters for peak batch loads alongside a smaller always-on cluster for ad-hoc querying and real-time flows.
  • Developed RESTful microservices for the reporting layer, serving data from MongoDB.
  • Led the adoption of Spring Boot and containerization (using Docker) for 100+ microservices, and the migration to the CoreOS environment.
  • Evaluated other stream-processing frameworks such as Kafka Connect and Kafka Streams.
  • Contributed to open-source software, including Hadoop, Confluent's Schema Registry, and a community Kafka Connect plugin for MongoDB.
  • Set up a GemFire in-memory data grid as a distributed cache with partitioned data, and a geo-distributed GemFire cache across data centers.
Hadoop · Kafka · Spark · RESTful Microservices · Spring Boot · Docker
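The acquisition flow described in this role (accept payloads in several formats, normalize them to a common form, then publish downstream) can be sketched as below. This is a simplified, hypothetical illustration: the format detection, envelope fields, and class names are assumptions, and the real pipeline serialized to Avro and published to Kafka rather than returning a `Map`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Simplified sketch of a data-acquisition step: detect an incoming payload's
 * format and wrap it in a common envelope before publishing downstream.
 * All names and fields here are illustrative assumptions, not CDK's schema.
 */
public class AcquisitionSketch {

    enum Format { JSON, TEXT }

    /** Rough format detection; a real system would use content-type metadata. */
    static Format detect(String payload) {
        String trimmed = payload.trim();
        return (trimmed.startsWith("{") || trimmed.startsWith("["))
                ? Format.JSON
                : Format.TEXT;
    }

    /** Wrap the payload in a common envelope (a stand-in for Avro serialization). */
    static Map<String, String> toEnvelope(String source, String payload) {
        Map<String, String> envelope = new LinkedHashMap<>();
        envelope.put("source", source);                 // e.g. "rest" or "ftp"
        envelope.put("format", detect(payload).name()); // detected input format
        envelope.put("body", payload);                  // raw payload, unmodified
        return envelope;
    }

    public static void main(String[] args) {
        System.out.println(toEnvelope("ftp", "{\"dealer\":42}"));
        System.out.println(toEnvelope("rest", "plain,csv,line"));
    }
}
```

Normalizing everything into one envelope at the edge is what lets the downstream curation frameworks treat all sources uniformly once the data reaches Kafka.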

Senior Consultant

Feb 2015 – Aug 2017 · 2 yrs 6 mos · Pune, Maharashtra, India

Amdocs

2 roles

Development Expert

Jul 2012 – Jan 2015 · 2 yrs 6 mos · Pune, Maharashtra, India

  • Senior engineer and tech lead in the Product R&D division, working on products such as the Amdocs Small Cell Rollout Solution, Discover Engine, and Amdocs Universal Activator.
  • Responsible for requirement analysis, design, development, and unit testing of product components.
  • Mentored junior engineers and provided technical guidance for the team.
  • Awarded Amdocs Innovator of the Year 2011.
  • Worked with the Java and JEE stack (EJB, JMS, JPA), SOAP-based web services, WebLogic, and WebSphere.

Senior Subject Matter Expert

Feb 2010 – Jun 2012 · 2 yrs 4 mos · Pune, Maharashtra, India

Java · JEE · SOAP · Web Services

Cognizant Technology Solutions

Programmer Analyst

Dec 2006 – Feb 2010 · 3 yrs 2 mos · Pune, Maharashtra, India

  • Worked in the Banking & Financial Services vertical for an investment banking client.
  • Responsible for requirements analysis, design, development, and unit testing of a trade-mismatch reconciliation platform.
  • Spent 5 months at the client location on requirements gathering and analysis for a new project.
  • Worked with Java, JSP, Servlets, Struts, Spring, and Hibernate.
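A trade-mismatch reconciliation like the one described above boils down to comparing trade records from two systems and flagging disagreements. The following is a toy sketch under assumed field names (`id`, `quantity`); the client's actual matching rules and data model are not part of this profile:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Toy sketch of trade-mismatch reconciliation: compare trades reported by two
 * systems by trade id and flag quantity differences or one-sided trades.
 * Field names and matching rules are illustrative assumptions.
 */
public class ReconSketch {

    record Trade(String id, long quantity) {}

    /** Return ids whose quantities disagree, plus ids present on only one side. */
    static List<String> mismatches(List<Trade> ours, List<Trade> theirs) {
        Map<String, Long> theirQty = new HashMap<>();
        for (Trade t : theirs) theirQty.put(t.id(), t.quantity());

        List<String> out = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (Trade t : ours) {
            seen.add(t.id());
            Long q = theirQty.get(t.id());
            if (q == null || q != t.quantity()) out.add(t.id()); // missing or differs
        }
        for (Trade t : theirs) {
            if (!seen.contains(t.id())) out.add(t.id());          // only on their side
        }
        return out;
    }

    public static void main(String[] args) {
        List<Trade> ours = List.of(new Trade("T1", 100), new Trade("T2", 50));
        List<Trade> theirs = List.of(new Trade("T1", 100), new Trade("T2", 75),
                                     new Trade("T3", 10));
        System.out.println(mismatches(ours, theirs)); // prints [T2, T3]
    }
}
```

Indexing one side by trade id keeps the comparison linear in the number of trades, which matters when reconciling large end-of-day trade files.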

Education

The Maharaja Sayajirao University of Baroda

Bachelor of Engineering (BE)

Jan 2002 – Jan 2006
