Samrat Deb

Software Engineer

Bengaluru, Karnataka, India · 7 yrs 11 mos experience

Key Highlights

  • Expert in Distributed Systems and Software Architecture.
  • Significant contributions to Apache Flink and Hadoop.
  • Proven track record in optimizing performance and scalability.

Skills

Core Skills

Apache Flink · Distributed Systems

Other Skills

Algorithms · Software Design · Software Infrastructure · Software as a Service (SaaS) · Kubernetes · Apache Hadoop · Apache Kafka · Go (Programming Language) · Amazon Web Services (AWS) · Data Structures · Object-Oriented Programming (OOP) · Big Data · Software Development Life Cycle (SDLC) · Django · Python

About

Looking to identify the problem and provide an efficient, simple solution using computer science fundamentals. Passionate about Distributed Systems, Software Architecture, and clean code.

Skills:
  • Computer Languages: Golang, Java, C++, Shell Scripting (Bash), Python, Scala
  • Programming Environment: IntelliJ, Emacs, Vim
  • Big Data Technologies (Hands-On Experience): Flink, ZooKeeper, Kafka, Spark, Hadoop, Docker
  • Data Center Management (Hands-On Experience): Kubernetes, YARN
  • Open Source Contributions: Apache Hudi, Apache Hadoop, Apache Flink

Experience

7 yrs 11 mos
Total Experience
3 yrs 5 mos
Average Tenure
1 yr 11 mos
Current Experience

Uber

Software Engineer (L5)

Aug 2025 – Present · 9 mos

Confluent

Senior Software Engineer

Jun 2024 – Present · 1 yr 11 mos · Bangalore Urban, Karnataka, India · Remote

  • Working on Confluent's on-prem offering for Apache Flink.
Software Design · Software Infrastructure · Software as a Service (SaaS) · Apache Flink · Distributed Systems · Kubernetes

Amazon Web Services (AWS)

SDE-II

May 2022 – Jun 2024 · 2 yrs 1 mo · Bangalore Urban, Karnataka, India · Hybrid

  • Integrated EMR Flink with Glue for a centralized Metastore, improving user access to metadata.
  • Implemented NUMA support in Hadoop YARN, boosting EMR job performance by 2–3%.
  • Kept Flink continuously up to date in EMR across releases EMR-6.8 through EMR-6.12.
  • Developed exclusive Zeppelin interpreter support for Flink 1.17, offered solely by AWS EMR.
Distributed Systems · Apache Flink · Apache Hadoop

The Apache Software Foundation

Contributor

Jan 2022 – Present · 4 yrs 4 mos

  • Introduced NUMA support for the Default Container Executor in Hadoop to optimize performance.
  • https://github.com/apache/hadoop/pull/4742
  • Created a persistent GlueCatalog that natively supports the Glue Metastore in Apache Flink (FLIP-277).
  • https://cwiki.apache.org/confluence/display/FLINK/FLIP-277%3A+Native+GlueCatalog+Support+in+Flink
  • Designed a Flink sink connector for Redshift (FLIP-307).
  • https://cwiki.apache.org/confluence/display/FLINK/FLIP-307%3A++Flink+Connector+Redshift
Algorithms · Apache Flink

Disney+ Hotstar

SDE-II

Jun 2018 – Apr 2022 · 3 yrs 10 mos · Bangalore Urban, Karnataka, India · Remote

  • Worked on two major projects:
  • 1. Bifrost: a stateless web application that tracks user activity on the Hotstar platform.
  • Collaborated on re-architecting the Bifrost application to meet scaling requirements.
  • Reduced network footprint by 22%, optimizing efficiency.
  • Enhanced the producer client to handle high concurrency of 50 million+ operations.
  • 2. CDCv2: built a new version of the data lake to support near-real-time ingestion.
  • Part of a 2–3 member team, led by @vinay, that built the entire CDC stack for Hotstar's growing OTT data.
  • Benchmarked the Apache Iceberg table format.
  • Collaborated on and built real-time data pipelines for analytics events in the newly established data lake.
  • Involved in the entire lifecycle, from design to implementation.
Distributed Systems · Apache Kafka · Go (Programming Language) · Kubernetes · Amazon Web Services (AWS)

Centre for Development of Advanced Computing

Summer Intern

May 2017 – Jul 2017 · 2 mos · Bangalore

  • The internship focused on SCADA and automation; the project aimed to build "SCADA system: A Scalable Storage and Processing." Data processing is handled by a Hadoop cluster set up over cloud storage: data is stored directly in HDFS and processed efficiently using MapReduce. Hive, a query-processing layer over Hadoop, then filters the data according to requirements, and the Human-Machine Interface (HMI) can access the filtered data directly by running Hive on a machine over the cloud. This design addresses the major concerns of redundancy and fault tolerance, limiting the loss of data accumulated over decades in a critical SCADA system.
Data Structures · Algorithms
