Pradip Thoke

CTO

Pune, Maharashtra, India · 18 yrs 3 mos experience
AI ML Practitioner · AI Enabled

Key Highlights

  • Built multi-petabyte data platform at Dream11.
  • Led engineering teams to innovate AI-native products.
  • Expert in scalable data architecture and engineering practices.
Stackforce AI infers this person is a SaaS expert specializing in scalable data platforms and AI-driven solutions.

Skills

Core Skills

AI-native Product Development · Engineering Org Scaling · Data Lakes · Data Warehousing · Data Architecture · Big Data Analytics · Hadoop · ETL · Business Intelligence

Other Skills

Agile Methodologies · Amazon Aurora · Amazon Redshift · Amazon Web Services (AWS) · Apache Druid · Apache Flink · Apache Kafka · Apache Spark · Apache Sqoop · Backend · Business Objects · Cassandra · Coaching & Mentoring · Cognos · Confluence

About

An engineering leader with 18+ years of experience building scalable data platforms, AI-native systems, and high-performing teams across fast-growing startups and enterprises.

At Enlyft, I head engineering and lead the charge on building AI-native GTM intelligence products for global teams. My role spans scaling our Pune hub and driving innovation across the engineering function in platform scalability, modern data architecture, and AI/ML-powered capabilities.

Previously at Dream11, I led the journey of building a multi-petabyte, self-serve data platform from the ground up. Operating at 50+ billion events per day and 300M+ requests per minute at peak concurrency for 16M users, the platform powered analytics, ML, and real-time decisioning at massive scale. We embraced an in-house, open-source-first approach - implementing a Lakehouse (Apache Iceberg, Trino, Spark), a governed semantic layer, and self-serve streaming data products that made data accessible across the org.

Earlier in my career at Pitney Bowes, CDK Global, and Cognizant, I worked on designing and delivering data lakes, warehouses, and BI solutions, which gave me hands-on grounding in everything from Hadoop ecosystems to observability frameworks and data governance at enterprise scale.

What I’m most passionate about:

  • Building and scaling engineering organizations with strong culture
  • Architecting real-time, data-intensive, and AI-first platforms
  • Driving data democratization through self-serve systems for analytics and ML
  • Championing modern engineering practices: CI/CD, observability, automation, rapid iteration

I love working at the intersection of data, engineering, and leadership - bringing people and platforms together to turn data into business impact. Always open to exchanging ideas on data strategy, AI-first product building, and scaling engineering teams!

Experience

Total Experience: 18 yrs 3 mos
Average Tenure: 4 yrs
Current Experience: 2 yrs

Enlyft

Head of Engineering

Jun 2024 - Present · 2 yrs · Pune, Maharashtra, India · Hybrid

  • Leading the engineering org to build AI-native GTM intelligence products used by global teams. Scaling the Pune hub (doubled team strength) and driving major advances in platform scalability, data architecture, and ML-driven capabilities.
  • Build high-performing teams and engineering culture from the ground up
  • Deliver AI-first and AI-native systems across data, infrastructure, and application layers
  • Drive cross-functional execution aligned with product and business goals
  • Champion modern engineering practices: CI/CD, observability, automation, and rapid iteration for faster time to production
  • Actively mentor and grow the next generation of tech leaders
Engineering Org Scaling · Technical Vision & Execution · Scalable Platform Architecture · Distributed Systems · AI-Native Product Development · Data Lakes · +9 more

Dream11

4 roles

VP - Data Platform & Engineering

Promoted

Jul 2021 - May 2024 · 2 yrs 10 mos · On-site

  • Headed Dream11's multi-petabyte in-house data platform, playing a pivotal role in shaping the company's data platform journey.
  • Dream11’s multi-tenant data platform operates at scale, handling billions of data points and processing hundreds of millions of requests per minute at its real-time layer. It is built entirely in-house, hosted on AWS, and adheres to open-source and open standards. We embraced a self-serve data platform/product philosophy, enabling seamless data acquisition, processing, and serving through a suite of automated data products across the entire data lifecycle, powering analytics, ML, and related end-user-facing applications
  • Lakehouse implementation using Apache Iceberg, Spark, EMR, S3, Trino
  • Governed metrics layer - Dimensional Data Modeling using DBT & Spark
  • Semantic layer to remove redundancy and improve standardization and speed of data access
  • Self-serve streaming platform to power streaming use cases across the org
Apache Spark · Apache Kafka · Apache Flink · Amazon Redshift · Apache Druid · Extract, Transform, Load (ETL) · +17 more

AVP - Data Platform & Engineering

Jul 2020 - Jun 2021 · 11 mos · On-site

  • Scaled up self-serve platforms and improved data accessibility for data democratization
  • Optimized systems to handle growing scale: dimensional modeling and an aggregation layer to improve analytics speed
  • Implemented a proper data lake for scale, enabling querying of user behaviour and transactional data
  • Developed Python-based frameworks for ETL pipelines and scheduling
  • Implemented protocols for data security and PII data handling
  • Scaled self-serve data platforms further: ML feature store, data discovery, anomaly detection, ETL creation, and knowledge graphs for fraud detection
  • Built an ODS layer for critical transactional pipelines, serving critical downstream low-latency data needs
  • Built a real-time OLAP layer on top of Apache Druid
  • Set up the ML platform to scale ML initiatives across the org

Director - Data Platform & Engineering

Jul 2019 - Jun 2020 · 11 mos · On-site

  • Set the vision and strategy to build an in-house data platform addressing analytics flexibility, security, cost, and control over our own data
  • Built a self-serve clickstream analytics platform using Kafka, Flink, and REST APIs, achieving 99.99% uptime at 500M+ RPM at our real-time layer
  • Decommissioned legacy third-party SaaS applications
  • Built multiple self-serve data products: funnel analytics, events analytics, anomaly alerting, and user segmentation
  • Developed a data catalog / schema standardization platform to tackle data governance: data quality, data trust, and data discovery
  • Further matured the real-time analytics stack for concurrency, real-time ML, audience generation, and other low-latency applications

Architect - Data Platform & Engineering

Feb 2018 - Jun 2019 · 1 yr 4 mos · On-site

  • Joined as the first data engineer to set up the data platform from the ground up, with ownership of technical and people decisions, and grew the team toward the mission of building the entire Dream Sports data platform
  • Set up a data warehouse for analytics and built a self-serve data pipeline tool to move data from multiple sources (OLTP, NoSQL) to multiple sinks (OLTP, OLAP systems)
  • Built batch job processing frameworks powered by SQL, Spark, Cassandra, and Python for analytics, ML, and user-facing applications
  • Built streaming pipelines for real-time applications (fraud detection, anomaly detection, real-time operational analytics) powered by Kafka, KSQL, Kafka Connect, and the ELK stack
  • Reverse ETL: data pipelines to send data from the data platform to third-party applications

Pitney Bowes

Big Data Architect

Sep 2016 - Feb 2018 · 1 yr 5 mos · Pune Area, India · On-site

  • Pitney Bowes has a wide range of in-house products in mailing, shipping, and location intelligence. I was responsible for setting up the Data Lake and Data Warehouse platform for these products and for providing a common integrated data store for Big Data analytics
  • Designed and delivered large-scale Big Data solutions for Pitney Bowes, integrating internal and external data sources to create monetizable value using AWS Big Data tools like S3, EMR, Spark, Redshift, and Aurora
  • Collaborated with Business Unit teams and Data Scientists to gather solution requirements and translate them into architecture using the Data and Analytics Platform
  • Utilized SnapLogic's web based ETL tool to acquire data from multiple sources and store it in the Data Lake
  • Developed a data portal UI for publishing the data catalog of available datasets in the Data Lake
  • Designed processes to transform raw data from the Data Lake into processed data for data discovery, summarization, and Data Warehouse needs
  • Captured data from various OLTP systems and stored it in AWS S3 buckets
  • Managed IoT data from Smartlink devices through Kinesis streams, storing it in Elasticsearch and S3
  • Implemented ETL processes using AWS Data Pipeline and EMR, pushing data into RDS – Aurora Warehouse
  • Designed and implemented Data Warehouses for multiple subject areas using Aurora
  • Re-architected existing ETL and Database flat designs into proper dimensional modeling
  • Integrated multiple silo databases into a single common data warehouse as an Integrated Data Store (IDS)
  • Established standard guidelines for ETL and data modeling, providing training to the team
  • Presented architecture, design decisions, and work progress to higher management
Amazon Redshift · Snowflake · Apache Kafka · Elastic Stack (ELK) · Apache Spark · Data Lakes · +7 more

CDK Global

Technology Expert (Big Data, DW & BI) - Product Development

Nov 2013 - Jun 2016 · 2 yrs 7 mos · Pune, Maharashtra, India · On-site

  • CDK Global provides services to US auto dealers and maintains a Data Warehouse that delivers BI solutions for digital advertising by US dealers and OEMs. The BI Open Platform initiative aimed to replace the existing Greenplum MPP-based data warehouse with a new Big Data technology stack
  • Designed the entire architecture for the CDK Data Lake platform
  • Evaluated Hadoop distributions (Hortonworks, Cloudera, MapR) for CDK and managed Hortonworks cluster
  • Conducted requirements gathering for the new platform
  • Determined the technology stack for each Data Lake layer (Acquisition, Processing, and Aggregation)
  • Established technical implementation and Hadoop tech stack guidelines for each component
  • Defined the architecture flow for the Hadoop Ecosystem
  • Designed real-time log capture pipelines using Logstash, Kafka, Camus, and Flume
  • Implemented the aggregation layer in HBase with Phoenix SQL for real-time API queries
  • Transformed traditional batch processing into real-time processing in Hadoop
  • Implemented a Hadoop application deployment process
  • Enabled continuous delivery for modularized Hadoop applications, ensuring rapid testing and deployment
  • Resolved concurrency issues using virtual containers, overcoming traditional technology limitations
  • Used Sqoop to transfer structured relational data from Greenplum to HDFS on a daily basis
  • Proficient in job scheduling with Oozie and chaining in Apache Falcon
  • Developed custom monitoring scripts for Oozie jobs to detect failures and delays
  • Implemented scripts to invoke Oozie workflows from Informatica
  • Created Pig programs using Pig Latin
  • Developed Java and Hive UDFs for custom functionality such as MD5 and DateTime conversion
  • Implemented data security using Apache Ranger
  • Administered the Hortonworks cluster to optimize performance
  • Provided training to team members within the organization on Hadoop ecosystems
Agile Methodologies · Hadoop · Shell Scripting · Apache Sqoop · Hive · Apache Spark · +10 more

Cognizant Technology Solutions

3 roles

Associate Projects - Data Engineering

Promoted

Jul 2010 - Nov 2013 · 3 yrs 4 mos · On-site

  • Onsite Tech Lead - JP Morgan Chase.
  • Data Warehouse & Business Intelligence professional; worked on traditional ETL, DW, and BI projects and transitioned to the modern data engineering space (Big Data and Cloud)
  • Conducted requirements gathering, data analysis, and data modeling, and designed ETL architectures
  • Developed Informatica objects and created Teradata tables with stored procedures and views
  • Utilized Teradata utilities, performed performance tuning, and parameterized connections
  • Developed shell scripts, scheduled jobs, and converted PL/SQL code to Informatica
  • Managed daily and weekly deployments of Informatica, database, and scripts
  • Automated manual jobs and tracked production issues
  • Coordinated between client business leads and Cognizant offshore development team
  • Conducted continuous performance tuning and provided statistical reports
  • Prepared project plans, conducted proofs of concept (POCs), and facilitated communication
  • Guided team members in technical challenges and managed knowledge
  • Tracked effort, schedule, and represented the project in audits
  • Ensured adherence to standards and served as the primary escalation point for issues
  • Carried out Hadoop POCs to assess fit and to solve scalability problems in the traditional tech stack
  • Replaced Informatica ETLs with Hadoop as the compute layer
Informatica · Oracle · Teradata · Shell Scripting · Extract, Transform, Load (ETL) · Cognos · +8 more

Programmer Analyst - Data Engineering

Promoted

Jul 2008 - Jun 2010 · 1 yr 11 mos · On-site

  • Senior Data Warehouse Developer in Banking and Financial domain for JP Morgan Chase
Informatica · Teradata Data Warehouse · Unix Shell Scripting · Onsite-Offshore co-ordination · Business Intelligence · Data Warehousing

Programmer Analyst Trainee - Data Engineering

Jun 2007 - Jun 2008 · 1 yr · On-site

  • Data Warehouse Developer - worked for Banking and Financial domain
Informatica · Unix Shell Scripting · Oracle · Business Intelligence · Business Objects · Data Warehousing

Education

Birla Institute of Technology and Science, Pilani

Master of Science (MS) — Software Engineering

Jan 2011 - Jan 2013

Government College of Engineering, Karad, Maharashtra

BE — Information Technology

Jan 2004 - Jan 2007

Maharashtra State Board of Technical Education - MSBTE, Mumbai

Diploma — Information Technology

Jan 2001 - Jan 2004

RJMV Secondary School, Umberkhede

Secondary School — High School/Secondary Diplomas and Certificates

Jan 1990 - Jan 2000
