Mohit Saxena

CTO

Palo Alto, California, United States
13 yrs 6 mos experience
Most Likely To Switch · AI Enabled

Key Highlights

  • Led AI innovations for AWS Analytics services.
  • Co-founded and expanded AWS Glue to global markets.
  • Authored 20+ papers and filed 16 patents.
Stackforce AI infers this person is a SaaS expert with a focus on big data analytics and AI-driven solutions.

Skills

Core Skills

AI Agents, Amazon Web Services (AWS), Apache Spark, Big Data Analytics, Database Systems

Other Skills

AWS Glue, Algorithms, Amazon EMR, C, C++, Data Integration, Distributed Systems, File Systems, Generative AI, HDFS, Hadoop, Leadership, Linux, Linux Kernel, Machine Learning

About

Passionate leader with proven experience in driving product roadmaps, scaling business and operations, growing people and geo-distributed functions, building cross-org partnerships, and leading technical innovations with AI/ML capabilities and agents for data analytics. Deep expertise in distributed storage/compute systems, query engines, and ML systems.

Led the vision, execution, and delivery of industry-differentiating AI capabilities for AWS Analytics services (AWS Glue, Amazon EMR, and Amazon SageMaker Notebooks) to optimize the experience of thousands of enterprise customers building big data analytics applications with Apache Spark, Amazon S3, data lakes, and warehouses.

  • 🔹 Spark troubleshooting agent reduces troubleshooting time from hours to minutes. 🎥 Blog: https://lnkd.in/gSJs2xQT
  • ✨ Converse via Cursor or Kiro, with no need to manually check logs and the Spark History Server across Analytics services.
  • ✨ Agent-driven notifications with Apache Airflow DAGs and Amazon EventBridge.
  • 🔹 Spark upgrades agent reduces upgrade time from months to weeks. 🎥 Blog: https://lnkd.in/gFSeeKhP
  • ✨ The agent analyzes Spark application and test code; automates compile/build, code rewrites, configuration updates, dependency upgrades, runtime validation on AWS Analytics services, and data-quality checks; and outputs an upgrade summary.
  • 🔹 Amazon Q Data Integration: integrate and transform data in English. 🎥 Blog: https://lnkd.in/gFAs4Bu8
  • 🔹 Launched the first fully managed remote MCP (Model Context Protocol) service for AWS Analytics, with AI tools for analytics agents: https://awslabs.github.io/mcp/servers/sagemaker-unified-studio-spark-upgrade-mcp-server
  • 🔹 Open source: Led open-sourcing efforts for the Kubeflow Spark History MCP Server project (https://lnkd.in/gfWD7mBD) and the MCP server for Analytics covering Glue, EMR, and Athena (https://lnkd.in/efSAsVan). Downloaded and used by thousands of users via PyPI.
  • 🔹 re:Invent talk on AI capabilities with AWS Analytics: https://www.youtube.com/watch?v=FM1K9fi8iTA
  • 🔹 Senior engineering leader for AWS services: Co-led the founding teams that drove the global launch of two new AWS Analytics services (AWS Glue and Lake Formation) from 2017 to 2020, and operated and grew the business from the ground up to hundreds of thousands of customers. Running the AWS Glue engineering org, growing the product and business from Glue 2.0 (2020) to Glue 5.0 (2025+). The engineering journey was presented at VLDB 2023: https://www.amazon.science/publications/the-story-of-aws-glue
  • 🔹 16 patents; 20+ papers at top-tier DB/systems conferences including VLDB, USENIX ATC, and EuroSys (1300+ citations); 20+ blogs on Analytics and AI.

Experience

Amazon Web Services (AWS)

3 roles

Head of Generative AI for Data Processing with AWS Glue and Amazon EMR

Promoted

Apr 2020 – Present · 5 yrs 11 mos

  • 🔹 Apache Spark troubleshooting agent for AWS Analytics (Glue, EMR, SageMaker): https://lnkd.in/gSJs2xQT
  • 🔹 Apache Spark upgrades agent for Amazon EMR: https://lnkd.in/gFSeeKhP
  • 🔹 Amazon Q and GenAI capabilities for Apache Spark in SageMaker Data Processing: https://www.youtube.com/watch?v=FM1K9fi8iTA
  • 🔹 Generative Data Integration with Amazon Q: https://aws.amazon.com/glue/amazon-q-integration-generative-ai
  • 🔹 Data Infrastructure & Runtime for serverless Apache Spark, Ray and Python Engines: https://www.amazon.science/publications/the-story-of-aws-glue
Strategic Leadership, AI Agents, Amazon Web Services (AWS)

Engineering Leader: AWS Glue Data Infrastructure & Runtime (Spark)

Promoted

Jan 2019 – Present · 7 yrs 2 mos

  • AWS Glue and Lake Formation: Responsible for overseeing the teams that manage the core query engine (Apache Spark) and data infrastructure of the leading serverless service offered on AWS for big data processing and data integration using Apache Spark, Ray, and Python. Led the service from GA to global expansion in 40+ geographical regions over the last 8+ years.
  • A few highlights of the team's work:
  • 🔹 Serverless auto-scaling for improved cost efficiency and infrastructure utilization (>2x) for distributed data processing using Apache Spark.
  • 🔹 Data governance and ACID transactions for data lakes built on Amazon S3 object storage with Apache Spark.
  • 🔹 Innovations such as 2.5x performance speedup for faster data processing with columnar vectorization, SIMD CPU instructions, and disaggregation of compute and storage with S3-backed distributed data shuffle.
  • 🔹 Open-sourced the AWS Glue Spark runtime DI libraries and S3 Shuffle Manager, available via Maven and GitHub. Used globally by thousands of users on a daily basis.
  • 🔹 First AWS Marketplace service for custom data connectors to cross-cloud data stores (Google BigQuery), SaaS applications (Salesforce), data warehouses (Snowflake) and AWS services (OpenSearch).
  • ✨ Open-source Cloud Shuffle Storage for Apache Spark: https://aws.amazon.com/blogs/big-data/introducing-the-cloud-shuffle-storage-plugin-for-apache-spark/
  • ✨ AWS Glue 5.0 with optimized Spark runtime and FGAC: https://aws.amazon.com/blogs/big-data/introducing-aws-glue-5-0-for-apache-spark/
  • ✨ AWS Glue 4.0: https://aws.amazon.com/blogs/aws/new-aws-glue-4-0-new-and-updated-engines-more-data-formats-and-more/
  • ✨ Introducing AWS Glue connector marketplace: https://aws.amazon.com/blogs/big-data/developing-testing-and-deploying-custom-connectors-for-your-data-stores-with-aws-glue/
Apache Spark, Big Data Analytics

Technical Lead - Sr. Software Engineer

Jan 2017 – Jan 2019 · 2 yrs

  • AWS Glue and Lake Formation

IBM Research

Research Staff Member

Jan 2013 – Jan 2017 · 4 yrs · Almaden, San Jose, California

  • ➢ Products: Cloudant NoSQL database (CDC to IBM Cloud Storage), IBM General Parallel File System (GPFS), DB2 BLU (in-memory column store)
  • ➢ Innovations: Led the design and development of prototypes for adaptive memory caching in Apache Spark and dual-erasure-coded storage for faster data recovery in HDFS. This work demonstrated up to a 70% performance speedup over open-source Spark and a 45% reduction in recovery time compared to single-coded storage systems such as Google Colossus FS, Facebook HDFS, and the Microsoft Azure storage system.
  • ➢ Filed 15+ patents on innovations that went into IBM products. Published papers at top-tier venues including USENIX FAST, USENIX HotStorage, and SIGMOD DaMoN.
Big Data Analytics, Database Systems

Hewlett-Packard Laboratories

Research Intern

May 2010 – Aug 2010 · 3 mos · Palo Alto, California

  • Designed and implemented a persistent, scalable, and ACID-compliant memory store.

Qualcomm Corporate R&D

Summer Intern

May 2007 – Jul 2007 · 2 mos · San Diego, California

  • Implemented a compiler that automatically generates stubs to serialize and deserialize message buffers for the LTE transport protocol.

Inria Research Labs

Research Intern

May 2005 – Jul 2005 · 2 mos · Rennes, France

  • Implemented a new fault diagnosis algorithm for distributed systems using Petri nets.

Education

University of Wisconsin-Madison

Doctor of Philosophy (PhD) — Computer Sciences

Indian Institute of Technology, Delhi

B.Tech. — Computer Sciences

Purdue University

MS — Computer Sciences
