
Kushaagra Goyal

CTO

Palo Alto, California, United States · 9 yrs 2 mos experience
Most Likely To Switch · AI ML Practitioner

Key Highlights

  • Led a 25-person team to develop AI models.
  • Grew a startup from $0 to over $1M ARR.
  • Architected scalable AI infrastructure on AWS.
Stackforce AI infers this person is a SaaS architect specializing in AI systems and infrastructure.

Skills

Core Skills

Artificial Intelligence · Software Infrastructure · Management · Text-to-speech · Machine Learning · Systems Engineering

Other Skills

Technology Leadership · Kubernetes · Business Strategy · Product Development · Large Language Models (LLM) · Deep Learning · AWS · Scala · C++ · Software Design · Embedded Systems · Python (Programming Language) · Computer Hardware · Computer Science · VLSI

About

I build AI software systems end-to-end: from developing core foundation models to engineering large-scale infrastructure and deploying applied AI solutions such as enterprise-ready AI agents. At Gan.ai, I served as CTO, leading a 25-person team that created Indic-language text-to-speech foundation models and the backend systems powering generative video personalization at scale. This gave me firsthand experience in bridging cutting-edge model research with robust production infrastructure. At Rubrik and Databricks, I focused on large-scale compute and storage platforms, contributing to patented innovations and scaling infrastructure for data-intensive AI and big-data workloads. I am always open to conversations on AI systems at scale, whether in foundation model training, RAG/agent architectures, or the future of AI hardware and infrastructure.

Experience

Total Experience: 9 yrs 2 mos
Average Tenure: 2 yrs 9 mos
Current Experience: 5 yrs 4 mos

Rubrik

Member of Technical Staff

Sep 2024 – Present · 1 yr 8 mos

  • Zeus Datastore:
  • As a tech lead, built the Zeus Datastore (on top of Azure PostgreSQL and Azure Blob Storage), a high-performance storage system managing over 1 PB of relational data to serve as the critical backbone for grounding and fine-tuning Gen-AI models.
  • Compute Infrastructure:
  • Architecting and developing a compute platform of over 1,000 Kubernetes clusters and GPU nodes, directly powering a $XXXM+ product portfolio.
Artificial Intelligence · Technology Leadership · Software Infrastructure · Kubernetes

Gan.ai

Chief Technology Officer

Jan 2022 – Aug 2024 · 2 yrs 7 mos

  • Generative AI R&D and Infrastructure:
  • Led the development of proprietary Generative AI models, including a Text-to-Speech (TTS) engine for Indic and global languages, and high-fidelity lip-sync technology.
  • Architected a scalable AI infrastructure on AWS (Lambda, ECS, SageMaker, Batch) to power our custom in-house models, enabling the platform to generate over 2 million personalized videos daily.
  • Team Leadership and Business Growth:
  • Recruited, scaled, and led a stellar team of 25 engineers, attracting top talent from leading global universities (IITs, BITS, CMU, Stanford) and tech companies (Microsoft, Meta) to build and execute the product.
  • Grew the company from $0 to over $1M ARR by translating the vision into a compelling product and user experience.
Text-to-Speech · Management · Artificial Intelligence · Business Strategy · Product Development · Technology Leadership

Databricks

Member of Technical Staff

Jul 2021 – Jan 2023 · 1 yr 6 mos · Mountain View, California, United States

  • Serverless Compute Platform:
  • Contributed to the development of the Databricks Serverless SQL platform deployed across AWS, Azure, and GCP.
  • https://www.databricks.com/blog/2021/08/30/announcing-databricks-serverless-sql.html
  • Enhanced container orchestration and allocation mechanisms, significantly reducing warm-up times and improving the efficiency of Spark cluster allocation.
  • Designed and implemented Fair Scheduling and Rate Throttling policies to streamline warm-pool cluster allocation, ensuring resource optimization and fair usage.
  • Worked extensively with Kubernetes (AKS, EKS, GKE), Docker, Bazel, Prometheus, Scala, and more to deliver high-performance, scalable solutions.
Machine Learning · Artificial Intelligence · Scala · Kubernetes

Freelance

Startup Advisor

Jan 2021 – Present · 5 yrs 4 mos

  • Helping startups build core AI infrastructure and create AI-native products that win over customers.
  • Advising on building high-performing engineering teams.

Stanford University

Remote Teaching Assistant - CS 229 and CS 230

Sep 2018 – Dec 2018 · 3 mos · Palo Alto, California, United States · Hybrid

  • Provided remote academic support for Stanford’s core Machine Learning (CS229) and Deep Learning (CS230) courses.
  • Held online office hours and advised students on course concepts, assignments, and project work.
  • Assisted with grading and provided detailed feedback to reinforce student learning.

Rubrik, Inc.

2 roles

Senior Software Engineer

Aug 2018 – Jul 2021 · 2 yrs 11 mos · Palo Alto, California, United States

  • NAS/Fileset Team:
  • Accelerated Large-File Ingestion: Solved the long-tail problem for multi-TB-scale files by sharding data across multiple nodes, enabling high-speed ingestion.
  • Ensured Backup Integrity: Led the design and implementation of a snapshot-verification service to validate backup integrity and provide auto-remediation for specific inconsistency scenarios.
  • Optimized Storage Performance: Collaborated with a cross-functional team to enhance the data storage layer format, achieving a 3–4x performance improvement for large workloads.
  • Boosted Metadata Scanning: Improved metadata scan performance for snapdiff-based NAS backups by 2–3x, delivering ultra-fast backups for large-scale workloads.
  • Continuous Data Protection (CDP) Team:
  • Achieved Near-Zero RPO: Enabled near-zero Recovery Point Objective (RPO) protection for VMware workloads by building high-performance data ingestion pipelines.
  • Developed High-Speed Data Receiver: Designed and implemented the data-receiver service to ingest high-speed streaming I/O data efficiently.
  • Built Multi-Tiered Buffering: Designed a multi-tier buffer (Memory → SSD → HDD) to absorb brief bursts in I/O traffic, ensuring seamless data capture.
  • Real-Time Data Replication: Implemented a real-time replicator service to stream data to remote clusters with minimal latency.
  • Enabled Resync Functionality: Designed and implemented a resync feature to recover quickly from brief network outages, slowdowns, or I/O bursts, ensuring the CDP system remains in sync.
C++ · Systems Engineering · Software Infrastructure · Software Design

Software Engineer Internship

Jul 2017 – Sep 2017 · 2 mos · Palo Alto, California, United States

  • Distributed Systems | Scala | Cloud Computing
  • Worked in a distributed-systems environment using Java and Scala to design a framework that allows Rubrik jobs to request compute instances in the cloud (Amazon EC2).

Stanford University

Course Assistant - CS 110, CS 229, CS 246

Jan 2017 – Jun 2018 · 1 yr 5 mos · Palo Alto, California, United States · On-site

  • Served as a Teaching Assistant for CS110 (Principles of Computer Systems), CS229 (Machine Learning), and CS246 (Mining Massive Data Sets) during my Master’s at Stanford.
  • Conducted office hours, discussion sessions, and occasional lectures for undergraduate and graduate students.
  • Assisted in grading assignments and projects, providing detailed feedback to support learning.
  • Mentored students on course concepts and project implementation, helping them tackle complex technical challenges.

Samsung Electronics

Machine Learning Intern

May 2015 – Jul 2015 · 2 mos · South Korea

  • ROS & SLAM Expertise: Developed a strong understanding of Robot Operating System (ROS) and explored Simultaneous Localization and Mapping (SLAM) techniques for map building.
  • Autonomous Exploration: Enabled autonomous exploration capabilities in automated guided vehicles (AGVs) for efficient and accurate map building.
  • Multi-Sensor Integration: Designed an interface to integrate open-source laser-based SLAM with Kinect camera images, leveraging 3D depth information for enhanced mapping accuracy.

Massachusetts Institute of Technology (MIT)

Research Intern EECS Lab

May 2014 – Jul 2014 · 2 mos · Boston, United States of America

  • Machine Learning & Hardware Design: Designed and implemented a custom hardware architecture for efficient feature extraction from images.
  • Human Body Part Classification: Implemented decision forests to classify human body parts from depth images captured by a Kinect camera.

Education

Stanford University

Master's degree — Electrical Engineering

Indian Institute of Technology, Delhi

Bachelor of Technology (B.Tech.) — Electrical Engineering

Stanford University Graduate School of Business

Stanford Ignite 2018 — Certificate Program in Innovation and Entrepreneurship

Jun 2018 – Jul 2018
