Mehul Shah — CEO

In the past decade or more, two technology trends have intersected: the cloud, with its abundance of on-demand computing resources, and the ubiquity of data. This makes it cheap to learn from data and makes previously intractable problems feasible. In my work, I have leveraged this to build more efficient, smarter, and easier-to-use cloud data systems. I'm currently focusing on the most important things in life - family and pursuing my passions in cloud and data. At Google, I was VP of Engineering for Streams and Lakes - the data integration, streaming, and open source analytics services on Google Cloud. At AWS, I ran Search Services, which includes Amazon OpenSearch/Elasticsearch Service, the OpenSearch project, Open Distro, and Amazon CloudSearch. I also launched and ran two fast-growing cloud services, AWS Lake Formation and AWS Glue, and managed engineering teams in Amazon Redshift. Prior to Amazon, I was co-founder and CEO of Amiato (2011-2014), a managed ETL service in the cloud (acquired by Amazon). From 2004-2011, I was a principal scientist at HP Labs, where my work spanned large-scale data management, distributed systems, and energy-efficient computing. This work has been published in top-tier database and systems conferences and has won several awards. Prior to HP, I received my PhD from U.C. Berkeley (2004) for adding parallelism, fault-tolerance, and load-balancing to the TelegraphCQ data-stream processing system. In 1999, I worked on the IBM DB2/UDB database. I received an MEng in 1997 and a BS in Computer Science and Physics in 1996, both from MIT. In my spare time, I serve on the Sort Benchmark committee.

Stackforce AI infers this person is a leader in cloud data systems and analytics.

Location: Saratoga, California, United States

Experience: 29 yrs 1 mo

Career Highlights

Led engineering for Google Cloud's data integration services.
Co-founded a managed ETL service acquired by Amazon.
Pioneered scalable data management solutions at HP Labs.

Work Experience

Aryn

CEO and co-founder (3 yrs 9 mos)

Google

VP, Engineering, Streams and Lakes, Google Cloud Analytics (5 mos)

Amazon Web Services (AWS)

Director, GM, OpenSearch, Amazon CloudSearch Service, AWS Lake Formation (2 yrs)

Director, General Manager, AWS Lake Formation and AWS Glue (5 yrs 9 mos)

Sr. Manager, Amazon Redshift (1 yr 4 mos)

UC Berkeley Electrical Engineering & Computer Sciences (EECS)

Visiting Faculty (5 mos)

Amiato

CEO and co-founder (2 yrs 8 mos)

Hewlett-Packard Laboratories

Principal Research Scientist (7 yrs)

IBM Almaden Research Center

Research Intern (9 mos)

University of California, Berkeley

Graduate Student (7 yrs)

AT&T Labs, Inc.

Intern (7 mos)

Bell Laboratories

Intern (2 mos)

Education

PhD at University of California, Berkeley

MEng at Massachusetts Institute of Technology

BS at Massachusetts Institute of Technology

Montgomery Blair High School

Most Likely To Switch: Highly Stable

Skills

Other Skills

Distributed Systems, C++, Python, C, Java, Data Management, PostgreSQL, DB2, Scalability, Databases, Analytics, Cloud Computing, Programming, Software Development, Algorithms

Experience

Total Experience: 29 yrs 1 mo
Average Tenure: 3 yrs 1 mo
Current Experience: 3 yrs 9 mos

Aryn

CEO and co-founder

Aug 2022 – Present · 3 yrs 9 mos · Mountain View, California, United States · Hybrid

Google

VP, Engineering, Streams and Lakes, Google Cloud Analytics

Jan 2022 – Jun 2022 · 5 mos · Sunnyvale, California, United States

I ran engineering for "Streams and Lakes" - the data integration, streaming, and open source analytics services on Google Cloud. These include Dataproc, Dataflow, PubSub, Dataplex, DPMS, Data Catalog, Composer, Data Fusion, and more. Customers use these "first mile" services to move, prepare, organize, and analyze their data in data lakes and data warehouses.

Amazon Web Services (AWS)

3 roles

Director, GM, OpenSearch, Amazon CloudSearch Service, AWS Lake Formation

Promoted

Jan 2020 – Jan 2022 · 2 yrs

I ran the Amazon OpenSearch Service (successor to Amazon Elasticsearch Service), the OpenSearch project, Amazon CloudSearch Service, and AWS Lake Formation. I was responsible for go-to-market, product, engineering, and operations. I helped set strategy and oversaw the fork of OpenSearch from Elasticsearch.
I was part of analytics at AWS. We were re-architecting big data systems for the cloud, where resources are plentiful, data is abundant, and everything is a service. I had a chance to put my crazy ideas into practice and deliver them to thousands of customers.

Director, General Manager, AWS Lake Formation and AWS Glue

Jan 2015 – Oct 2020 · 5 yrs 9 mos

I launched and ran engineering, operations, and go-to-market for two fast-growing cloud services: AWS Lake Formation and AWS Glue. We are a multi-disciplinary team of distributed systems and database architects, front-end and UX specialists, and ML experts.
AWS Lake Formation is a new service that makes it easy to set up, secure, and manage Data Lakes. Data Lakes -- the evolution of enterprise data warehouses -- are curated repositories of structured and unstructured data that allow self-serve analytics for modern use cases: IoT, ML model training, data science, and more. AWS Lake Formation leverages cutting-edge ML techniques for data ingestion and cleaning, and simplifies data security and governance. We envision it as the locus of control for all data in an enterprise.
AWS Glue is a serverless data integration service that powers AWS Lake Formation. AWS Glue offers a centralized metadata service -- Data Catalog -- with crawlers that automatically extract and index metadata to enable data discovery. It also provides an ETL (extract, transform, and load) engine that automates much of the undifferentiated heavy lifting for moving and transforming data sets. At its core, it uses the Schema-lift technology from Amiato, which makes it easy to handle modern semi-structured data sets.

Sr. Manager, Amazon Redshift

May 2014 – Sep 2015 · 1 yr 4 mos

I led the development and public launch of two headlining features in Amazon Redshift: interleaved sort keys (Z-indexing) and user-defined functions (UDFs). Interleaved sort keys are an alternative to indexing and projections for columnar databases that allow fast search on tables across multiple dimensions. UDFs allow users to customize Redshift analyses in Python for modern big-data use cases.

UC Berkeley Electrical Engineering & Computer Sciences (EECS)

Visiting Faculty

Jan 2018 – Jun 2018 · 5 mos · Berkeley, CA

In the Spring 2018 semester, I taught the Introduction to Database Systems course, an upper-division course for juniors and seniors in the EECS department. The course had 450+ students enrolled and a staff of 10 TAs.

Amiato

CEO and co-founder

Sep 2011 – May 2014 · 2 yrs 8 mos · Palo Alto, CA

Amiato was a fully managed, real-time ETL cloud service. It bridged the gap between unstructured data and the world of structured business intelligence (BI) tools. Schema-lift was the technology powering Amiato. Schema-lift automatically infers the structure of semi-structured data (e.g. JSON logs), transforms it into tables, and loads data warehouses.
As CEO and co-founder, my responsibilities spanned fund-raising, recruiting, sales, go-to-market, and managing customer and partner relationships. I led the M&A process for our successful acquisition by Amazon. I raised our funding, which included notable investors: Andreessen-Horowitz, Ignition, and YC. I also helped build early prototypes and design Schema-lift.

Hewlett-Packard Laboratories

Principal Research Scientist

Sep 2004 – Sep 2011 · 7 yrs · Palo Alto, CA

My work at HP spanned large-scale data management, distributed computing, and energy efficiency. Below I list my most significant projects in reverse chronological order.
Armonia: Principal investigator for Armonia project -- a scalable, distributed, main-memory data management platform that offers strongly consistent low-latency operations and complex on-the-fly analytics. Applications include financial trading and social networking.
HP-KVS: Built a highly available, low-cost key-value service for the cloud. HP-KVS is an eventually consistent, erasure-coded, large object store that spans multiple geographies.
Sinfonia: A highly scalable, distributed transactional store for building data-center infrastructure applications. Built a large-scale distributed B-tree, clustered file-system, and group communication using Sinfonia. Basis for the Armonia project. Won best paper, SOSP 2007.
Energy-efficient systems: Characterized and optimized the energy use of computer systems as a whole, from storage to memory to compute. Inventor and maintainer of the JouleSort benchmark, the first holistic energy-efficiency benchmark, which has inspired efficient server designs and influenced other benchmarks. Investigated energy efficiency of DB workloads.
Other work: Designed software and hardware for non-volatile RAM technologies like NAND Flash and Memristors. Developed methods for long-term preservation of digital information.

IBM Almaden Research Center

Research Intern

Jan 1999 – Oct 1999 · 9 mos · Almaden

Investigated alternative strategies for implementing collection types in IBM DB2/UDB. Designed language extensions for querying collection types. Gained experience with administration and software development in DB2/UDB.

University of California, Berkeley

Graduate Student

Jan 1997 – Jan 2004 · 7 yrs · Berkeley, CA

Thesis: “Flux: A Mechanism for Building Highly-Available, Fault-Tolerant, Scalable Dataflows”. Within the TelegraphCQ system, my dissertation focused on making parallel CQ dataflows – computations that analyze high-throughput streaming data in real time – highly available, fault-tolerant, and automatically load-balancing.
Continuously Adaptive Continuous Queries (CACQ): Developed an adaptive query processing system that executes numerous long-running queries simultaneously over streaming data.
AMDB: A debugger and profiler for search indexes on non-traditional data types like audio and images. Designed UI for navigating high-fanout search trees. (Released open-source).

AT&T Labs, Inc.

Intern

Jun 1996 – Jan 1997 · 7 mos · Murray Hill, NJ

MEng Thesis (jointly done at MIT): "ReferralWeb: A Resource Location System Guided by Personal Relations." The first system to automatically discover and extract social networks by mining publicly available data on the web. ReferralWeb also automatically finds experts on user-specified topics and recommends paths in the extracted social graph to connect users with those experts.