Pradip Thoke — CTO

An engineering leader with 18+ years of experience building scalable data platforms, AI-native systems, and high-performing teams across fast-growing startups and enterprises.

At Enlyft, I head engineering and lead the charge on building AI-native GTM intelligence products for global teams. My role spans scaling our Pune hub and driving innovation across the engineering function in platform scalability, modern data architecture, and AI/ML-powered capabilities.

Previously at Dream11, I led the journey of building a multi-petabyte, self-serve data platform from the ground up. Operating at 50+ billion events per day and 300M+ requests per minute at peak concurrency for 16M users, the platform powered analytics, ML, and real-time decisioning at massive scale. We embraced an in-house, open-source-first approach: implementing a Lakehouse (Apache Iceberg, Trino, Spark), a governed semantic layer, and self-serve streaming data products that made data accessible across the org.

Earlier in my career at Pitney Bowes, CDK Global, and Cognizant, I worked on designing and delivering data lakes, warehouses, and BI solutions, which gave me hands-on grounding in everything from Hadoop ecosystems to observability frameworks and data governance at enterprise scale.

What I’m most passionate about:
• Building and scaling engineering organizations with strong culture
• Architecting real-time, data-intensive, and AI-first platforms
• Driving data democratization through self-serve systems for analytics and ML
• Championing modern engineering practices: CI/CD, observability, automation, rapid iteration

I love working at the intersection of data, engineering, and leadership, bringing people and platforms together to turn data into business impact. Always open to exchanging ideas on data strategy, AI-first product building, and scaling engineering teams!

Stackforce AI infers this person is a SaaS expert specializing in scalable data platforms and AI-driven solutions.

Location: Pune, Maharashtra, India

Experience: 18 yrs 3 mos

Skills

AI-native Product Development
Engineering Org Scaling
Data Lakes
Data Warehousing
Data Architecture
Big Data Analytics
Hadoop
ETL
Business Intelligence

Career Highlights

Built multi-petabyte data platform at Dream11.
Led engineering teams to innovate AI-native products.
Expert in scalable data architecture and engineering practices.

Work Experience

Enlyft

Head of Engineering (2 yrs)

Dream11

VP - Data Platform & Engineering (2 yrs 10 mos)

AVP - Data Platform & Engineering (11 mos)

Director - Data Platform & Engineering (11 mos)

Architect - Data Platform & Engineering (1 yr 4 mos)

Pitney Bowes

Big Data Architect (1 yr 5 mos)

CDK Global

Technology Expert (Big Data, DW & BI) - Product Development (2 yrs 7 mos)

Cognizant Technology Solutions

Associate Projects - Data Engineering (3 yrs 4 mos)

Programmer Analyst - Data Engineering (1 yr 11 mos)

Programmer Analyst Trainee - Data Engineering (1 yr)

Education

Master of Science (MS) at Birla Institute of Technology and Science, Pilani

BE at Government College of Engineering, Karad, Maharashtra

Diploma at Maharashtra State Board of Technical Education - MSBTE, Mumbai

Secondary School at RJMV Secondary School, Umberkhede

AI ML Practitioner · AI Enabled

Other Skills

Agile Methodologies
Amazon Aurora
Amazon Redshift
Amazon Web Services (AWS)
Apache Druid
Apache Flink
Apache Kafka
Apache Spark
Apache Sqoop
Backend
Business Objects
Cassandra
Coaching & Mentoring
Cognos
Confluence

Experience

Total Experience: 18 yrs 3 mos
Average Tenure: 4 yrs
Current Experience: 2 yrs

Enlyft

Head of Engineering

Jun 2024 – Present · 2 yrs · Pune, Maharashtra, India · Hybrid

Leading the engineering org to build AI-native GTM intelligence products used by global teams. Scaling the Pune hub (doubled team strength) and driving major advances in platform scalability, data architecture, and ML-driven capabilities.
Build high-performing teams and engineering culture from the ground up
Deliver AI-first and AI-native systems across data, infrastructure, and application layers
Drive cross-functional execution aligned with product and business goals
Champion modern engineering practices: CI/CD, observability, automation, and rapid iteration to shorten time to production
Actively mentor and grow the next generation of tech leaders

Engineering Org Scaling · Technical Vision & Execution · Scalable Platform Architecture · Distributed Systems · AI-Native Product Development · Data Lakes · +9

Dream11

4 roles

VP - Data Platform & Engineering

Promoted

Jul 2021 – May 2024 · 2 yrs 10 mos · On-site

As head of Dream11's multi-petabyte in-house data platform, I played a pivotal role in shaping the company's data platform journey.
Dream11’s multi-tenant data platform handles billions of data points and processes hundreds of millions of requests per minute at its real-time layer. It is built entirely in-house, hosted on AWS, and adheres to open-source and open standards. We embraced a self-serve data platform/product philosophy, enabling seamless data acquisition, processing, and serving through a suite of automated data products across the entire data lifecycle, powering analytics, ML, and related end-user-facing applications.
Lakehouse implementation using Apache Iceberg, Spark, EMR, S3, Trino
Governed metrics layer: dimensional data modeling using dbt and Spark
Semantic layer to remove redundancy and improve standardization and speed of data access
Self-serve streaming platform to power streaming use cases across the org

Apache Spark · Apache Kafka · Apache Flink · Amazon Redshift · Apache Druid · Extract, Transform, Load (ETL) · +17

AVP - Data Platform & Engineering

Jul 2020 – Jun 2021 · 11 mos · On-site

Scaled up self-serve platforms and improved data accessibility for data democratization
Optimized systems to handle growing scale: dimensional modeling and an aggregation layer to improve analytics speed
Implemented a data lake built for scale, enabling querying of user behaviour and transactional data
Developed Python-based frameworks for ETL pipelines and scheduling
Implemented protocols for data security and PII data handling
Further scaled self-serve data platforms: ML feature store, data discovery, anomaly detection, ETL creation, and knowledge graphs for fraud detection
Built an ODS layer for critical transactional pipelines, serving critical downstream low-latency data needs
Built a real-time OLAP layer on top of Apache Druid
Set up an ML platform to scale ML initiatives across the org

Director - Data Platform & Engineering

Jul 2019 – Jun 2020 · 11 mos · On-site

Set the vision and strategy to build an in-house data platform addressing analytics flexibility, security, cost, and control over our own data
Built a self-serve clickstream analytics platform using Kafka, Flink, and REST APIs, achieving 99.99% uptime at 500M+ RPM at our real-time layer
Decommissioned legacy third-party SaaS applications
Built multiple self-serve data products: funnel analytics, events analytics, anomaly alerting, and user segmentation
Developed a data catalog / schema standardization platform to tackle data governance: data quality, data trust, and data discovery
Further matured the real-time analytics stack for concurrency, real-time ML, audience generation, and other low-latency applications

Architect - Data Platform & Engineering

Feb 2018 – Jun 2019 · 1 yr 4 mos · On-site

Joined as the first data engineer to set up the data platform from the ground up, with ownership of technical and people decisions and of growing the team toward the mission of building the entire Dream Sports data platform
Set up a data warehouse for analytics and built a self-serve data pipeline tool to move data from multiple sources (OLTP, NoSQL) to multiple sinks (OLTP, OLAP systems)
Built batch job processing frameworks powered by SQL, Spark, Cassandra, and Python for analytics, ML, and user-facing applications
Built streaming pipelines for real-time applications (fraud detection, anomaly detection, real-time operational analytics) powered by Kafka, KSQL, Kafka Connect, and the ELK stack
Built reverse ETL data pipelines to send data from the data platform to third-party applications

Pitney Bowes

Big Data Architect

Sep 2016 – Feb 2018 · 1 yr 5 mos · Pune Area, India · On-site

Pitney Bowes has a wide range of in-house products in mailing, shipping, and location intelligence. I was responsible for setting up the Data Lake and Data Warehouse platform for multiple products and for providing a common integrated data store for Big Data analytics
Designed and delivered large-scale Big Data solutions for Pitney Bowes, integrating internal and external data sources to create monetizable value using AWS Big Data tools like S3, EMR, Spark, Redshift, and Aurora
Collaborated with Business Unit teams and Data Scientists to gather solution requirements and translate them into architecture using the Data and Analytics Platform
Utilized SnapLogic's web based ETL tool to acquire data from multiple sources and store it in the Data Lake
Developed a data portal UI for publishing the data catalog of available datasets in the Data Lake
Designed processes to transform raw data from the Data Lake into processed data for data discovery, summarization, and Data Warehouse needs
Captured data from various OLTP systems and stored it in AWS S3 buckets
Managed IoT data from Smartlink devices through Kinesis streams, storing it in Elasticsearch and S3
Implemented ETL processes using AWS Data Pipeline and EMR, pushing data into RDS – Aurora Warehouse
Designed and implemented Data Warehouses for multiple subject areas using Aurora
Re-architected existing ETL and Database flat designs into proper dimensional modeling
Integrated multiple silo databases into a single common data warehouse as an Integrated Data Store (IDS)
Established standard guidelines for ETL and data modeling, providing training to the team
Presented architecture, design decisions, and work progress to higher management

Amazon Redshift · Snowflake · Apache Kafka · Elastic Stack (ELK) · Apache Spark · Data Lakes · +7

CDK Global

Technology Expert (Big Data, DW & BI) - Product Development

Nov 2013 – Jun 2016 · 2 yrs 7 mos · Pune, Maharashtra, India · On-site

CDK Global provides services to US auto dealers and maintains a data warehouse that delivers BI solutions for the digital advertising of each US dealer and OEM. The BI Open Platform initiative aimed to replace the existing Greenplum MPP-based data warehouse with a new Big Data technology stack
Designed the entire architecture for the CDK Data Lake platform
Evaluated Hadoop distributions (Hortonworks, Cloudera, MapR) for CDK and managed Hortonworks cluster
Conducted requirements gathering for the new platform
Determined the technology stack for each Data Lake layer (Acquisition, Processing, and Aggregation)
Established technical implementation and Hadoop tech stack guidelines for each component
Defined the architecture flow for the Hadoop Ecosystem
Designed real-time log capture pipelines using Logstash, Kafka, Camus, and Flume
Implemented the aggregation layer in HBase with Phoenix SQL for real-time API queries
Transformed traditional batch processing into real-time processing in Hadoop
Implemented a Hadoop application deployment process
Enabled continuous delivery for modularized Hadoop applications, ensuring rapid testing and deployment
Resolved concurrency issues using virtual containers, overcoming traditional technology limitations
Used Sqoop to transfer structured relational data from Greenplum to HDFS on a daily basis
Scheduled jobs with Oozie and chained them using Apache Falcon
Developed custom monitoring scripts for Oozie jobs to detect failures and delays
Implemented scripts to invoke Oozie workflows from Informatica
Wrote Pig programs in Pig Latin
Developed Java and Hive UDFs for custom functionality such as MD5 and DateTime conversion
Implemented data security using Apache Ranger
Administered the Hortonworks cluster to optimize performance
Provided training to team members within the organization on Hadoop ecosystems

Agile Methodologies · Hadoop · Shell Scripting · Apache Sqoop · Hive · Apache Spark · +10

Cognizant Technology Solutions

3 roles

Associate Projects - Data Engineering

Promoted

Jul 2010 – Nov 2013 · 3 yrs 4 mos · On-site

Onsite Tech Lead - JP Morgan Chase.
Data Warehouse & Business Intelligence professional; worked on traditional ETL, DW, and BI projects and transitioned to the modern data engineering space (Big Data and cloud)
Conducted requirements gathering, data analysis, and data modeling, and designed ETL architectures
Developed Informatica objects and created Teradata tables with stored procedures and views
Utilized Teradata utilities, performed performance tuning, and parameterized connections
Developed shell scripts, scheduled jobs, and converted PL/SQL code to Informatica
Managed daily and weekly deployments of Informatica, database, and scripts
Automated manual jobs and tracked production issues
Coordinated between client business leads and Cognizant offshore development team
Conducted continuous performance tuning and provided statistical reports
Prepared project plans, conducted proofs of concept (POCs), and facilitated communication
Guided team members in technical challenges and managed knowledge
Tracked effort, schedule, and represented the project in audits
Ensured adherence to standards and served as the primary escalation point for issues
Carried out Hadoop POCs to assess fit and to solve scalability problems of the traditional tech stack
Replaced Informatica ETLs with Hadoop as the compute layer