Kushal Thakkar

CEO

San Mateo, California, United States · 14 yrs experience
Highly Stable · AI Enabled

Key Highlights

  • Expert in optimizing compute efficiency and system resilience.
  • Led initiatives improving reliability and developer experience.
  • Strong advocate for mentorship and engineering culture.
Stackforce AI infers this person is a Backend-heavy Infrastructure Engineer specializing in AI/ML and Distributed Systems.

Skills

Core Skills

Distributed Systems · AI/ML Platforms · Compute Infrastructure · Data Pipeline

Other Skills

AI/ML · Algorithm Design · Algorithms · C · C++ · Cloud Infrastructure · Computer Science · Data Mining · Data Processing · Data Structures · Incident Management · MapReduce · Monitoring Tools · Natural Language Processing · OpenTelemetry

About

Engineering leader specializing in distributed systems, compute infrastructure, and AI/ML platforms. I focus on scalability, efficiency, and reliability, driving strategic initiatives that power AI workloads and large-scale compute systems. I have led org-wide projects to optimize compute efficiency, system resilience, and developer experience, ensuring infrastructure scales seamlessly with evolving AI and ML demands. Beyond technical leadership, I am deeply invested in mentorship, hiring, and fostering a strong engineering culture. I advocate for best practices in reliability, cost optimization, and system design, shaping teams and technology to drive lasting impact.

Experience

Anthropic

Member of Technical Staff

Jul 2024 - Present · 1 yr 8 mos · San Francisco, California, United States

  • Claude Code:
  • Built the OpenTelemetry connector for Claude Code, which several organizations and individuals use to measure the ROI of Claude Code. This connector has influenced competitor products such as Codex CLI and Gemini CLI to build similar integrations.
  • Enterprise features: Teams and Enterprise subscriptions for Claude, self-serve product features for measuring ROI of Claude Code
  • Infrastructure:
  • Pre-training data: Helped unlock $XM worth of capacity through efficiency projects for the Spark deployment, which is used to process data in preparation for training Claude models.
  • Event schematization & Ingestion:
  • dbt orchestration platform: Built an orchestration platform for running dbt jobs with better error handling, retry mechanisms, and alerting.
OpenTelemetry · Spark · dbt · AI/ML · Distributed Systems · AI/ML Platforms

Meta

Software Engineer

Oct 2015 - Jul 2024 · 8 yrs 9 mos · Menlo Park, California

  • Compute Org
  • The org offers a serverless compute engine, similar to Databricks, that is used by almost every engineer at Meta.
  • Built a cross-region setup for Spark jobs, allowing us to spread the load across multiple regions and reducing storage and compute costs.
  • Led reliability initiatives introducing Incident & Production Readiness Reviews and SLOs to foster a sustainable reliability culture.
  • Improved Spark reliability by reducing storage-related failures, stuck jobs, and OOMs. These critical fixes enabled the Hive-to-Spark migration at the company.
  • Provided extensive operational support, earning 250+ thanks from stakeholders (customers, partners, team).
  • Data Pipeline Platform Org
  • Led the development and widespread adoption of a scheduled job monitoring tool, significantly improving load times and fixing OOM issues. I drove roadmap planning and helped with hiring, onboarding, mentorship, and operational support.
  • Data Infra Efficiency Team
  • Reduced a key company storage metric through tools and strategies to eliminate data inefficiency.
  • Conducted 350+ interviews over my time at the company.
Spark · Incident Management · SLOs · Data Pipeline · Distributed Systems · Compute Infrastructure

InMobi

Software Engineer

Apr 2013 - Nov 2014 · 1 yr 7 mos · Bengaluru, Karnataka, India

  • Part of the team (InMobi Analytics) responsible for scaling the product from ~10M to ~1.5B events/day.

Microsoft

Software Engineer

Jul 2011 - Mar 2013 · 1 yr 8 mos · Bengaluru, Karnataka, India

  • BingAds
  • Part of the 'Opportunities' product team focused on providing optimization suggestions to BingAds advertisers.
  • Built automation for DQ, functional, and E2E testing to measure the quality of suggestions.

Tata Institute of Fundamental Research, Mumbai

Research Intern

Jan 2009 - May 2009 · 4 mos

  • Research & Development in Formal Verification for real-time systems.

Education

Indian Institute of Technology, Bombay

M Tech — Computer Science

Institute of Technology, Nirma University

B. Tech. — Computer Engineering
