Sameer Agarwal

Co-Founder

San Francisco, California, United States · 17 yrs 3 mos experience

Most Likely To Switch · AI Enabled

Key Highlights

Expert in building distributed systems at scale.
Co-founder of a leading applied AI research lab.
Recognized for performance optimization in Apache Spark.

Stackforce AI infers this person is a SaaS expert with a strong focus on distributed systems and applied AI.


Skills

Core Skills

Distributed Systems · Applied AI · Apache Spark · Cloud Computing

Other Skills

Observability · Statistical Anomalies · Real-time Data Processing · Data Analytics · Performance Optimization · Security · SQL · Deployment Automation · Pricing Strategy · Databases · Spark · Algorithms · Machine Learning · Python · Hadoop

About

Hi, I’m Sameer. I’m a CTO and systems builder working at the intersection of distributed systems, reliability, and applied AI. Over the last decade, I’ve worked on large-scale production systems across research and industry, including at Databricks and Facebook, and now as a co-founder of Deductive AI. Across very different environments, one pattern has repeated itself: as software systems grow more complex, failures don’t get rarer; they get harder to explain. I focus on incident-response problems that require reasoning across code, telemetry, configurations, and change history under pressure. At Deductive AI, I build reasoning-first systems that help engineering teams investigate failures faster and reduce cognitive load during incidents. I enjoy connecting with engineers and engineering leaders who think deeply about how complex systems behave at scale.

Experience

Total Experience: 17 yrs 3 mos

Average Tenure: 3 yrs 8 mos

Current Experience: 9 yrs 3 mos

Deductive AI

Co-Founder and CTO

Jun 2023 – Present · 2 yrs 10 mos · San Francisco Bay Area

Deductive AI is an applied AI research lab building AGI to enable self-healing software systems.
Our code-aware observability platform helps numerous companies root-cause and mitigate large-scale software outages by reasoning about distributed systems, code, and statistical anomalies in real time across unprecedented volumes of structured and unstructured data. We’re a team of engineers and researchers with decades of experience building and maintaining large-scale production systems at Databricks, Facebook, ThoughtSpot, Google, Splunk, and Amazon.

Distributed Systems · Applied AI · Observability · Statistical Anomalies

Facebook

Senior Staff Software Engineer

Jan 2018 – Jan 2023 · 5 yrs · Menlo Park, CA

Area Tech Lead for Large-Scale Data Analytics at Facebook. I worked with a team of 50+ engineers building distributed systems and databases that scale across geo-distributed clusters of hundreds of thousands of machines.
Received additional (discretionary) equity, typically reserved for the top 1% of employees at Facebook, in each of the five consecutive years of my tenure.

Distributed Systems · Data Analytics

The Apache Software Foundation

Apache Spark Committer

Jan 2017 – Present · 9 yrs 3 mos

Apache Spark is the largest open-source project in data processing, with a state-of-the-art execution engine built around speed, ease of use, and sophisticated analytics.
GitHub: https://github.com/apache/spark

Distributed Systems

Databricks

Founding Software Engineer

Jan 2014 – Jan 2018 · 4 yrs · San Francisco, CA

Joined as one of the first 10 engineers and led the open-source Apache Spark team (as tech lead and engineering manager), with a deep focus on performance, scalability, and security. Key contributions include:
1. Project Tungsten [1], SQL-Based Access Control [2], Cost-Based Query Optimizer [3], Approximate Queries [4], and several key query optimizations and features in Apache Spark across SQL, PySpark, and Spark Core.
2. Created Databricks Vault, Databricks' first automatic deployment engine that continuously and securely deployed our services for several hundred customers on several thousand machines.
3. Created Databricks' Pay-As-You-Go pricing infrastructure, which continuously synced, reported, and charged customers based on their usage while operating at the scale of several thousand machines and millions of dollars.
[1] https://spark-summit.org/eu-2016/events/sparks-performance-the-past-present-and-future
[2] https://docs.databricks.com/spark/latest/spark-sql/structured-data-access-controls.html
[3] https://spark-summit.org/2017/events/cost-based-optimizer-in-apache-spark-22/
[4] https://spark-summit.org/2015/events/blinkdb-ola-supporting-continuous-answers-in-sparksql

Apache Spark · Performance Optimization · Security