
Sushil Kumar Shivashankar

Engineering Manager

Bengaluru, Karnataka, India · 12 yrs 1 mo experience

Key Highlights

  • Over a decade of experience in Big Data and Distributed Systems.
  • Led multi-million dollar cloud optimization strategy at Uber.
  • Strong advocate for Open Source with contributions to major projects.

Skills

Core Skills

Engineering Management · Distributed Systems · Data Engineering · Cloud Computing

Other Skills

Algorithm Design · Amazon S3 · Amazon Web Services (AWS) · Apache Flink · Apache Kafka · Apache Mesos · Apache Spark · Apache ZooKeeper · Big Data · C · C++ · Cost Optimization · Data Governance · Data Infrastructure · Data Ingestion

About

An Engineer by choice, Architect by nature, and Leader at heart. I am an Engineering Manager and technology leader with over a decade of experience designing, scaling, and modernizing Big Data and Distributed Systems. My passion lies in building high-impact Data Lake, Data Warehouse, and OLAP platforms from scratch, handling petabytes of data for mission-critical use cases like funnel analysis, segmentation, and web analytics. I am a strong advocate for Open Source and have deep expertise in performance engineering, system design, and the fundamentals of distributed systems.

Currently, I lead two teams (Kafka and Flink) at Uber, where I am driving a multi-million dollar cloud optimization strategy using disaggregated storage, with a focus on platform modernization toward a unified Lakehouse architecture.

Previously, I served as a Software Development Manager at AWS EMR & Athena, where I was responsible for the development and growth of engines like Hadoop, Spark, Flink, and Presto/Trino, and for architecting next-gen S3A performance optimizations to enhance Lakehouse performance. Prior to that, I led platform architecture for Microsoft's Azure Data Platform (HDInsight), including driving its migration onto Kubernetes, and architected data platforms at scale for Ola Cabs and Flipkart.

Specialties: Distributed Systems, Data Lakehouse (Iceberg, Delta Lake, Hudi), Data Engineering, Stream Analytics, System Design, Scalability, Kubernetes, Algorithm Design, Cloud Computing (AWS, Azure), Hadoop, YARN, Spark, Flink, Kafka, Presto/Trino, Data Governance, Real-Time Analytics, Multi-Threading, Cost Optimization, Engineering Leadership.

Experience

Uber

Engineering Manager - 2

Aug 2025 – Present · 7 mos · Bengaluru, Karnataka, India

  • Leading two teams (Flink and Kafka) from Bengaluru.
Engineering Management · Apache Flink · Apache Kafka · Google Cloud Platform (GCP) · Cloud Computing · Data Streaming · +2 more

Amazon Web Services (AWS)

Software Development Manager

Apr 2022 – Aug 2025 · 3 yrs 4 mos · Bengaluru, Karnataka, India · On-site

  • Led the development and growth of the Flink, Hadoop, and Trino offerings for AWS EMR on EC2/EKS and for Athena.
  • [0] https://aws.amazon.com/blogs/big-data/optimize-amazon-emr-runtime-for-apache-spark-with-emr-s3a/
  • [1] https://docs.aws.amazon.com/emr/latest/ReleaseGuide/trino-ft.html
  • [2] https://docs.aws.amazon.com/emr/latest/ReleaseGuide/presto-spot-loss.html
  • [3] https://docs.aws.amazon.com/emr/latest/ReleaseGuide/presto-strict-mode.html
  • [4] https://aws.amazon.com/about-aws/whats-new/2023/11/data-lake-queries-amazon-athena-s3-express-one-zone/
  • [5] https://aws.amazon.com/blogs/big-data/run-trino-queries-2-7-times-faster-with-amazon-emr-6-15-0/
  • [6] https://aws.amazon.com/about-aws/whats-new/2024/08/amazon-s3-express-one-zone-storage-class-emr/
Data Infrastructure · Apache Spark · Data Pipelines · Mentoring · Team Development · Engineering Management · +2 more

Microsoft

3 roles

Senior Software Engineering Manager

Apr 2021 – Apr 2022 · 1 yr

  • Azure HDInsight
Data Infrastructure · Apache Spark · Data Pipelines · Mentoring · Team Development · Engineering Management · +2 more

Senior Software Engineer (Azure Data Platform)

Aug 2020 – Mar 2021 · 7 mos

  • Azure HDInsight
Data Infrastructure · Apache Spark · Data Pipelines · Data Engineering

Software Engineer - 2 (Azure Data Platform)

Apr 2018 – Aug 2020 · 2 yrs 4 mos

  • Initially part of the Cosmos Resource Management team, where the work focused mainly on Apache YARN running on 100k nodes.
  • Developed a framework to support Azure Cosmos DB, or any document-store vendor, as a backend for Apache YARN ATSv2, and contributed this work to OSS as YARN-9016.
  • https://issues.apache.org/jira/browse/YARN-9016
  • Built the infrastructure to support autoscale, based on load or schedule, for Azure HDInsight clusters, and later led the project to GA.
  • https://azure.microsoft.com/en-gb/updates/autoscale-for-azure-hdinsight-is-now-general-available/
  • Actively contributed patches to OSS Hadoop.
Data Infrastructure · Apache Spark · Data Engineering
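The load-or-schedule autoscale decision described above can be sketched roughly as follows. This is a minimal illustration, not HDInsight's actual API: the schedule format, the `MIN_NODES` floor, and the per-node capacity figure are all hypothetical.

```python
from datetime import datetime

MIN_NODES = 3  # illustrative cluster floor, not a real HDInsight default

def desired_nodes(pending_containers, capacity_per_node, schedule, now):
    """Pick a target cluster size: schedule windows win, else scale on load.

    schedule: list of (start_hour, end_hour, node_count) windows.
    """
    # Schedule-based: an explicit time window overrides load metrics.
    for start_hour, end_hour, count in schedule:
        if start_hour <= now.hour < end_hour:
            return max(count, MIN_NODES)
    # Load-based: enough nodes to place every pending YARN container.
    needed = -(-pending_containers // capacity_per_node)  # ceiling division
    return max(needed, MIN_NODES)

schedule = [(9, 18, 20)]  # business hours: pin the cluster at 20 nodes
print(desired_nodes(50, 8, schedule, datetime(2020, 1, 6, 10)))  # 20 (in window)
print(desired_nodes(50, 8, schedule, datetime(2020, 1, 6, 2)))   # 7  (ceil(50/8))
print(desired_nodes(0, 8, schedule, datetime(2020, 1, 6, 2)))    # 3  (floor)
```

In practice the load signal would come from YARN's pending-container metrics rather than a parameter, but the precedence (schedule first, then load, never below a floor) is the essence of the feature.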

The Apache Software Foundation

Open Source Contributor (Hadoop)

Apr 2019 – Apr 2020 · 1 yr · Bengaluru, Karnataka, India

Ola (ANI Technologies Pvt. Ltd.)

2 roles

Software Development Engineer 3 (Big Data Platform)

Apr 2017 – Apr 2018 · 1 yr

  • Built a real-time stream upserts framework end to end. Key features:
  • A continuously running pipeline listens to a topic on Apache Kafka and performs UPSERTs in real time on a distributed filesystem, i.e. an object store (S3) or block store (HDFS).
  • Users can query the data from the filesystem in parallel, as Hive external tables, via Presto / Tez / SparkSQL.
  • All UPSERTs are eventually consistent for readers and no locks are applied, i.e. READs and UPSERTs run in parallel.
  • Technologies/components used:
  • Apache YARN with node labels for compute.
  • Apache Beam for building the continuously running pipeline framework.
  • Apache Spark / Flink on YARN for pipeline orchestration and disaster recovery.
  • In-house Data Governance Warehouse metadata service to sanitize the messages read from Kafka.
  • Part of building KaaS (Kafka as a Service) with AuthN, AuthZ, audits, and multitenancy for all of Ola Engineering; developed a ConfigSVC end to end for orchestrating and managing Kafka clusters and clients.
Data Pipelines · Large Scale Events · Data Ingestion
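The lock-free read/upsert behaviour described above can be sketched with a minimal manifest-swap pattern. All names here are hypothetical and an in-memory dict stands in for S3/HDFS: writers publish a new immutable file set and atomically repoint a manifest, so readers always see a consistent snapshot and never block.

```python
import threading

class UpsertStore:
    """Toy model of lock-free upserts on an immutable object store.

    Writers never mutate published files; they write a new file set and
    atomically swap the manifest pointer. Readers resolve the manifest
    once, so reads and upserts proceed in parallel (eventual consistency).
    """

    def __init__(self):
        self._files = {}      # object store: path -> rows (immutable once written)
        self._manifest = []   # current list of live file paths
        self._version = 0
        self._write_lock = threading.Lock()  # serialises writers only, never readers

    def upsert(self, records):
        """Apply key -> value upserts by publishing a new merged file set."""
        with self._write_lock:
            merged = {}
            for path in self._manifest:       # fold in the current snapshot
                merged.update(self._files[path])
            merged.update(records)            # apply the upsert batch
            self._version += 1
            path = f"data/v{self._version}.parquet"
            self._files[path] = dict(merged)  # write a new immutable file
            self._manifest = [path]           # atomic pointer swap (one assignment)

    def read(self):
        """Lock-free read: resolve the manifest once, then read those files."""
        snapshot = list(self._manifest)
        out = {}
        for path in snapshot:
            out.update(self._files[path])
        return out

store = UpsertStore()
store.upsert({"ride:1": "created", "ride:2": "created"})
store.upsert({"ride:1": "completed"})  # later upsert wins for the same key
print(store.read())                    # {'ride:1': 'completed', 'ride:2': 'created'}
```

A production version would merge at file granularity rather than rewriting everything, but the core idea is the same: readers bind to an immutable snapshot, so no locks are needed between READ and UPSERT.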

Software Development Engineer 2 (Big Data Platform)

Apr 2016 – Mar 2017 · 11 mos

  • Built the Data Lake for Big Data from scratch for democratisation and discovery of data for analytics, covering Preparation -> Ingestion -> Cleaning -> Transformation/Processing -> Consumption.
  • Preparation - All transactional (entities) and non-transactional (events) systems' data in Ola are prepared by registering the metadata in the warehouse, with additional PIE (Processing, Ingestion and Execution times) semantics in each payload.
  • Ingestion:
  • Push based - All data payloads are pushed directly to a messaging system via an ingestion library.
  • Pull based - For MySQL, by reading binlogs and handling schema management automatically. For NoSQL DBs, I wrote a custom Sqoop from scratch for bulk-pulling data; it syncs the schema automatically with the warehouse metadata and stores the data in the filesystem, skipping the messaging queue. This was also used for backup and disaster recovery.
  • Cleaning - Also known as journalling: all entities/events are deduplicated and partitioned by date and hour on a particular datetime field, answering the "as of then" question for entities.
  • Transformation - Reconciling (snapshotting) the data periodically by fetching the changed data from the journal store, answering the "as of now" question for entities.
  • Consumption - Journal and snapshot data are exposed as external Hive tables and queried via the Hive/Tez/Spark/Presto engines.
  • Technologies/components used:
  • HDP-distro Hadoop 2.7 cluster managed by Ambari
  • Compute via YARN and storage via S3A
  • Hive, Tez, Presto and Spark for querying
  • Dropwizard for creating microservices
  • Customised version of the Maxwell binlog parser project from GitHub
  • Customised version of Secor, a Kafka-to-S3 project from GitHub
  • In-house config service for resolving application config for microservices on Marathon
  • Mesos, Docker, Marathon for microservice orchestration
  • Oozie/Azkaban for scheduling
Data Pipelines · Large Scale Events · Data Ingestion
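The journalling/snapshotting split above can be illustrated with a small sketch (field names and the hour-granularity partition key are hypothetical): the journal keeps every deduplicated change partitioned by time, answering "as of then", while the snapshot folds the journal into the latest state per entity, answering "as of now".

```python
from collections import defaultdict

def journal(events):
    """Deduplicate raw events and partition them by date/hour ('as of then')."""
    seen = set()
    parts = defaultdict(list)
    for ev in events:
        key = (ev["entity_id"], ev["ts"])  # dedup on entity + event time
        if key in seen:
            continue
        seen.add(key)
        parts[ev["ts"][:13]].append(ev)    # partition key: 'YYYY-MM-DDTHH'
    return dict(parts)

def snapshot(parts):
    """Reconcile the journal into the latest state per entity ('as of now')."""
    latest = {}
    for part in sorted(parts):             # replay partitions in time order
        for ev in parts[part]:
            cur = latest.get(ev["entity_id"])
            if cur is None or ev["ts"] > cur["ts"]:
                latest[ev["entity_id"]] = ev
    return latest

events = [
    {"entity_id": "booking:7", "ts": "2016-05-01T09:15", "status": "created"},
    {"entity_id": "booking:7", "ts": "2016-05-01T09:15", "status": "created"},  # dup
    {"entity_id": "booking:7", "ts": "2016-05-01T10:02", "status": "completed"},
]
parts = journal(events)
print(sorted(parts))                           # ['2016-05-01T09', '2016-05-01T10']
print(snapshot(parts)["booking:7"]["status"])  # completed
```

At scale these two steps would run as Hive/Spark jobs over date-hour partitions rather than in memory, but the dedup-then-reconcile flow is the same.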

Flipkart.com

Software Development Engineer 1 (Big Data Platform)

Jul 2014 – Mar 2016 · 1 yr 8 mos · Greater Bengaluru Area

  • Was part of developing a batch system to process system-generated logs (around 6-7 TB ingested on normal days and 50+ TB on sale days) as part of the Data Governance practice, enabling data analysts to run processing pipelines on them.
  • Generated canned reports by writing processing pipelines for funnel analysis, mining the logs for insights not available from the Omniture (SiteCatalyst) software.
  • Later served on the Infrastructure and Systems Engineering team of Flipkart's Data Platform, where we developed in-house cloud platform products using open-source Big Data technologies. Key features included porting compute systems (e.g. Spark, Storm) onto YARN, plus metering, billing, auditing, and security, similar to a private cloud. I developed an org-wide monitoring tool end to end for all components, contributed to the DC migration transferring PBs of data across data centers, and benchmarked a 1000+ node Hadoop cluster with my own custom benchmarks to handle Big Billion Day sales load. Open-sourced project: https://github.com/flipkart-incubator/BlueShift
Data Pipelines · Large Scale Events · Data Ingestion

RNS Institute of Technology (RNSIT)

Research Assistant

Jul 2013 – Mar 2014 · 8 mos · Bangalore

  • Worked as a research assistant under Prof. T Satish Kumar in my final year of undergrad. The research applied compiler optimisation to programs running on distributed/multicore systems, and we published two journal papers on it:
  • 1 - Optimizing Code by Selecting Compiler Flags using Parallel Genetic Algorithm on Multicore CPUs
  • 2 - Compiler Phase Ordering and Optimizing MPI Runtime Parameters using Heuristic Algorithms on SMPs

Sushira swara sangama

Freelance Musician

Jan 2006 – Apr 2008 · 2 yrs 3 mos · Bangalore Urban, Karnataka, India · On-site

  • Freelance Tabla Player for various artists and events in Bangalore.

Education

Visvesvaraya Technological University

Bachelor's Degree — Computer Science

Jan 2011 – Jan 2014

MS Ramaiah Polytechnic

Diploma — Computer Science

Jan 2008 – Jan 2011
