Mehul Batra

CTO

Hyderabad, Telangana, India · 7 yrs 7 mos experience

Highly Stable · AI Enabled

Key Highlights

Founding Engineer on DigitalOcean's Data Cloud team.
Led data infrastructure initiatives at Dream11 for 260M+ users.
Active contributor to Apache Fluss and open-source Big Data ecosystem.

Stackforce AI infers this person is a SaaS Data Engineering expert with a focus on scalable data infrastructure and analytics.

Skills

Core Skills

Lakehouse · Data Cloud · Data Infrastructure · Data Streaming · Data Analytics · Data Engineering · Machine Learning · Data Migration

Other Skills

Stream Processing · Distributed Systems · Vector Databases · Apache Iceberg · Kafka · Flink · Trino · Kubernetes · Spark · Load Balancing · Spring · MySQL · Redis · OpenSearch · Airflow

About

I'm a software engineer specializing in data infrastructure. I am currently a Founding Engineer on the Data Cloud team at DigitalOcean, building a simple, scalable, and reliable foundation for BI-to-AI workloads on the Agentic Inference Cloud, with ownership across hiring, architecture, and product definition. DigitalOcean crossed $1B in annualized run-rate revenue in December 2025, with AI customer ARR reaching $120M and growing 150% YoY, as it cements its position as the go-to cloud for AI-native and digital-native enterprises. Before DigitalOcean, I led data infrastructure initiatives at Dream11, the world's largest fantasy sports platform with 260M+ users, building self-service platforms and data-intensive applications that powered strategic decisions at scale. My mission is to build data systems that help companies generate real value from their data. I am an Apache Fluss PMC member and maintainer of the fluss-iceberg and fluss-flink modules, and I actively contribute to the open-source Big Data ecosystem. Beyond my professional work, I share my expertise through writing and talks on Big Data and Distributed Systems.

Experience

Total Experience: 7 yrs 7 mos

Average Tenure: 2 yrs 5 mos

Current Experience: 10 mos

The Apache Software Foundation

2 roles

Apache Fluss PPMC

Promoted

Feb 2026 – Present · 3 mos

Apache Fluss Committer

Jun 2025 – Feb 2026 · 8 mos

Fluss is a streaming storage system built for real-time analytics and AI. It can serve as the real-time data layer on top of a Lakehouse, forming a composable data system that covers both real-time and historical needs through a single unified system.
PRs authored: https://github.com/apache/fluss/pulls?q=is%3Apr+author%3AMehulBatra+
PRs reviewed: https://github.com/apache/fluss/pulls?q=is%3Apr+reviewed-by%3AMehulBatra

Lakehouse · Stream Processing

DigitalOcean

SSE-2

Jul 2025 – Present · 10 mos

● Founding Engineer, Data Cloud: responsible for building simple, scalable, and reliable data infrastructure for AI and analytics workloads, with ownership across hiring, architecture, and product definition.

Distributed Systems · Vector Databases · Data Cloud

Dream11

2 roles

SDE-3 (Data Platform)

Promoted

Aug 2023 – Jun 2025 · 1 yr 10 mos · Mumbai, Maharashtra, India

● Designed and executed the transition to a headless data architecture using Apache Iceberg, Kafka, and Flink. Led a cross-functional team of 15 engineers to deliver a resilient platform handling 9 PB of data, managing a 24k-table footprint with 8k active batch and real-time processing workflows.
● Replaced Redshift with an open compute platform (Trino) on Kubernetes to optimize resource consumption, cutting costs by 25% through elastic scaling. The platform runs almost 100k queries daily on frugal compute while maintaining FTE and performance SLAs.
● Built the Pelican gateway for the in-house Spark platform with 3 engineers. Implemented authentication, routing, load balancing, and state management, with Livy integration for automated deployments and metadata storage.
● Actively participated in the open-source data systems community and promoted knowledge-sharing standards within the team.

SDE-2 (Data Platform)

Jul 2021 – Aug 2023 · 2 yrs 1 mo · Mumbai, Maharashtra, India

● Built an in-house streaming platform (Spring, MySQL, Redis, Flink, Kafka) that makes streams accessible to everyone, with no prior stream-processing knowledge required. Reduced real-time pipeline creation time by 50% with pre-built operators and 1-click deployment for streams.
● Led the ELK-to-Flink/OpenSearch migration with 3 engineers, supporting real-time analytics for the Marketing, Product, and Revenue teams. Delivered sub-100ms search performance and a 50% cost reduction across 98 streams processing 8 TB weekly.
● Developed a collaborative, drag-and-drop ETL platform (Spring, MySQL, Airflow, Trino) for data warehouses and lakehouses with a team of 5, empowering data analysts, engineers, and scientists and eliminating all manual involvement in data model and feature creation.
● Processed TBs of data via batch and real-time stateful jobs built with Flink, Spark, S3, and MySQL to power Reward Shop (surfaced in the Dream11 app as DreamCoins).

Pitney Bowes

Software Engineer - ML Platform

Sep 2018 – Jun 2021 · 2 yrs 9 mos

● Created an event-driven delivery tracking platform using Kafka, Spark Streaming, and MongoDB. Implemented windowed aggregations with exactly-once processing guarantees, reducing latency from 2 hours to 1 minute. Designed a fault-tolerant topology with checkpointing and schema evolution strategies for backward compatibility.
● Designed and implemented a scalable feature engineering service for semi-structured MongoDB data to support machine learning workflows orchestrated via Airflow. Automated model training, inference, and deployment using Docker, FastAPI, S3, and Datadog monitoring, reducing manual intervention and simplifying data flow management for production-grade ML models.
● Engineered a self-serve anomaly detection system using FastAPI, MSSQL, Airflow, and Snowflake that lets service owners define dataset rules for anomaly detection. The system delivered batch alerts for breach scenarios, enabling proactive decision-making for new feature launches, improving operational efficiency, and reducing downtime.
● Worked on implementing a log-based Change Data Capture (CDC) architecture using HVR/Debezium to migrate data from MSSQL to Snowflake, optimizing OLAP workloads. Enabled seamless on-prem-to-cloud data flow with Snowflake's elastic scaling, resulting in a 50% reduction in query time for time-sensitive analytical use cases.