Sumit Sardana

Software Engineer

San Francisco, California, United States
8 yrs 1 mo experience
Highly Stable

Key Highlights

  • Architected scalable ML observability systems.
  • Achieved 4-5X speed improvements in data transformation.
  • Led development of enterprise-scale ML solutions.
Stackforce AI infers this person is a SaaS-focused engineer with expertise in AI/ML infrastructure and observability.

Skills

Core Skills

Software Design, Distributed Systems, Large Language Models (LLM), Data Ingestion, Machine Learning

Other Skills

AHV, Airflow, Algorithms, Apache Airflow, Apache DataSketches, Apache Iceberg, Apache Kafka, Apache Presto, Beats, C, C++, Clinical Data, Data Processing, Data Tagging, Data Visualization

About

🚀 Building Scalable AI/ML Infrastructure | Architecting High-Performance AI/ML Observability Systems

I'm a Senior Software Engineer passionate about designing and building scalable AI/ML systems with a strong focus on observability, data pipelines, and performance optimization. I thrive at the intersection of engineering and product, driving technical innovation while ensuring real-world impact.

What I Do

  • AI/ML Observability & Monitoring: Architected and implemented a scalable ML Monitoring platform, optimizing data transformation pipelines to enhance performance and cost efficiency.
  • Scalable Data Pipelines: Engineered high-performance ML Observability pipelines, achieving 4-5X speed & cost improvements in user data transformation.
  • LLM Observability & Metadata Management: Developed a robust metadata tracking layer to monitor LLM evaluation and operational status.
  • Optimized Query Execution for ML Metrics: Integrated core ML Observability metrics into a proprietary query execution engine, enhancing real-time monitoring capabilities.
  • SQL for AI Monitoring: Designed and implemented SQL-based solutions for efficient retrieval and analysis of observability data.
  • Data Sketching & Performance Benchmarking: Conducted in-depth benchmarking of quantile-based data sketches, comparing t-Digest and q-Digest for ML Observability. Analyzed algorithmic trade-offs, optimizing for accuracy, memory efficiency, and query performance in large-scale monitoring workloads.
  • Schema Design & Storage Efficiency: Created a normalized table schema for storing aggregated observability statistics at scale.

I love solving high-impact engineering challenges that push the boundaries of AI/ML infrastructure. Whether it's designing efficient query execution, optimizing large-scale data pipelines, or driving product decisions, I am always focused on building for scale and efficiency.

Experience

8 yrs 1 mo
Total Experience
2 yrs
Average Tenure
2 yrs 1 mo
Current Experience

Snowflake

2 roles

Senior Software Engineer

Jul 2025 – Present · 11 mos

Senior Software Engineer

May 2024 – Jul 2025 · 1 yr 2 mos

Software Design, Distributed Systems, Large Language Models (LLM), Data sketches

Truera

2 roles

Senior Software Engineer

Promoted

Aug 2023 – May 2024 · 9 mos · Bengaluru, Karnataka, India

  • (Acquired by Snowflake)
  • Built solutions for ML at enterprise scale - LLMs and traditional ML models. My key contributions include:
  • ML Observability metrics:
  • Led the effort to implement ML metrics computation at scale using a SQL-based query engine: Apache Presto/Trino backed by Apache Iceberg.
  • Implemented complex ML metrics using data sketches and user-defined functions (UDFs) in a Trino plugin.
  • Developed an extensive understanding of the query engine and its query planning.
  • Iceberg & Query Performance:
  • Analysed and improved the Iceberg data schema and metadata to get the best performance for our specific use case.
  • A major win: changing the schema to bypass TopN computation led to a ~60-70% improvement in performance.
  • LLM Observability:
  • Inspired by TruLens (open-source app by Truera), added the capability in Observability to track LLM traces with in-depth analysis of individual spans.
  • Made it easy to evaluate LLM performance by visualising feedback-function and performance metrics.
  • Monitoring RCA (Root Cause Analysis):
  • Implemented the end-to-end Monitoring -> Diagnostics RCA flow for issues discovered in production.
Software Design, Python, Distributed Systems, Large Language Models (LLM), Kubernetes, Data sketches, +6

Software Engineer

Aug 2022 – Aug 2023 · 1 yr · Bengaluru, Karnataka, India

Large Language Models (LLM), Data sketches

Nference

3 roles

Staff Engineer

Promoted

Oct 2021 – Aug 2022 · 10 mos

  • Early Engineer
  • Led the engineering efforts on nferX ML, data ingestion, augmented curation, molecular and clinical platforms:
  • Deep Model Builder
  • Built and led development of the Deep Model Builder platform, supporting the entire ML life-cycle: tagging text and image data, using the tagged data to train ML models on the go, then loading the models and inferring predictions.
  • + Led a team of 2 engineers.
  • + Handled MLOps complexity across training, loading and prediction, ensuring the load is well distributed and scalable.
  • nferX orchestrator (ETL pipeline):
  • Built the nferX orchestrator (ETL pipeline) to process, streamline and ingest raw patient data to nferX standards. It processes TBs of data daily for consumption by downstream apps: KnowledgeGraph, Elasticsearch, Spark, etc. Key to the nferX nsights Platform.
  • Patient Explorer (Data Infra):
  • Built a patient Electronic Health Record (EHR) visualizer platform backed by Elasticsearch. The platform comprised 20B+ (structured & unstructured).
  • + Reduced query time over all indexes by 10-15X.
  • Proteomics
  • Led and built the molecular proteomics pipeline to process raw data using the MaxQuant software, which allowed clinical scientists to process studies with manual feedback and enrich the nferX platform with 1000s of studies in no time.
  • Built the nferX proteomics platform to query and present various statistical data and enrichments from processed studies.
Software Design, Python, MongoDB, Redis, Apache Airflow, etcd, +5

Senior Software Engineer

Promoted

Apr 2021 – Oct 2021 · 6 mos

Software Engineer

May 2019 – Apr 2021 · 1 yr 11 mos

Nutanix

2 roles

Systems Reliability Engineer

Jul 2018 – Feb 2019 · 7 mos

Systems Reliability Engineer - Intern

Jan 2018 – Jun 2018 · 5 mos

MakeMyTrip

Software Engineer Intern - Gofro.com

Dec 2016 – Jan 2017 · 1 mo · Greater Delhi Area · On-site

Thinksys inc

Summer Intern

Jun 2016 – Jul 2016 · 1 mo

Education

Vellore Institute of Technology

Bachelor of Technology (B.Tech.), Computer Science & Engineering (with specialisation in Bioinformatics)

Jan 2014 – Jan 2018
