Varun Bhadauria

Software Engineer

Lafayette, California, United States · 20 yrs 2 mos experience
Most Likely To Switch: Highly Stable

Key Highlights

  • Expert in scaling AI infrastructure for recommendation systems.
  • Led development for PS5 platform architecture and I/O stack.
  • Experience with messaging infrastructure for over 2.7 billion users.
Stackforce AI infers this person is a highly skilled engineer in AI infrastructure and embedded systems.

Skills

Core Skills

Machine Learning, Distributed Systems

Other Skills

PyTorch, Storage Architecture, Deep Learning, Software Infrastructure, Software Design, File Systems, C, Cluster, Storage, Debugging, Data Structures, Scalability, Algorithms, High Availability, Multithreading

About

I am currently part of Meta’s AI Infra team, focusing on scaling training and inference for recommendation system (RecSys) workloads using LLMs and long user sequences. My work spans optimizing state-of-the-art training and inference stacks built on PyTorch 2, exploring modern GPU architectures, and developing custom kernels in Triton to push the boundaries of performance and efficiency. Previously, I designed and implemented large-scale schedulers for LLM and RecSys training jobs, orchestrating workloads across 100,000+ GPU clusters. Earlier in my career, I worked on messaging infrastructure for WhatsApp, where I scaled tier-1 services to support over a billion monthly active users. My foundational experience includes developing file systems, block I/O kernel drivers, and SSD caching software for enterprise systems. I later transitioned to consumer gaming, contributing to the PlayStation 5 compression engine and building modern game authoring systems leveraging advanced I/O compression and deduplication technologies.

Experience

Total Experience: 20 yrs 2 mos
Average Tenure: 3 yrs 4 mos
Current Experience: 4 yrs 11 mos

Meta

Staff Software Engineer

May 2021 - Present · 4 yrs 11 mos · Menlo Park, CA

  • Current: AI Infra / PyTorch team, modernizing recommendation systems at Meta using LLMs. Improving the reliability and efficiency of distributed training and inference for RecSys models using PyTorch 2 and GPU/Triton.
  • Previous:
      • AI Training Infra: Resource management and job scheduling for GenAI and RecSys jobs across 100K GPUs.
      • WhatsApp Infra: Scaling WhatsApp messaging infrastructure services for 2.7B+ MAU.
PyTorch, Machine Learning, Storage Architecture, Deep Learning, Distributed Systems, Software Infrastructure, +1

PlayStation

Senior Staff Software Engineer

May 2016 - May 2021 · 5 yrs · San Francisco Bay Area

  • PS5 platform architecture and I/O stack, application delta patching, and machine learning for I/O compression and deduplication. Patents: 11449325, 11307841

Samsung Semiconductor USA

Staff Software Engineer

Dec 2013 - May 2016 · 2 yrs 5 mos · San Francisco Bay Area

  • Led development of Linux kernel block-layer caching software (called Samsung AutoCache for Enterprise Linux).
  • Led development of a Linux kernel log-structured block device for improving write endurance of client SSD/eMMC.
  • Led development of a flash-aware, compression-based DRAM caching solution for the Android kernel.

Symantec

Senior Software Engineer

Jul 2007 - Nov 2013 · 6 yrs 4 mos · Mountain View, CA

  • Lead for Veritas Cluster Filesystem (VxFS) Development & Escalations. Developed OpenStack Cinder driver for VxFS.

Kodeassets

Co-Founder

Dec 2005 - Dec 2006 · 1 yr · New Delhi Area, India

  • Startup for measuring and rating the "code quality" of software.

The Australian Software Company Pty Ltd, Sydney, AU

Research Programmer

Jul 2005 - Dec 2005 · 5 mos · Sydney, Australia

  • DestroyX and ViewX for Linux.

SISSA, Trieste, Italy

Programmer

May 2005 - Jul 2005 · 2 mos · Trieste Area, Italy

  • Developed image processing system in MATLAB for cognitive science research.

Education

Indian Institute of Technology, Delhi

Bachelor of Technology (B.Tech.) — Computer Science and Engineering

Jan 2002 - Jan 2007

University of California, Berkeley, Haas School of Business

Executive Product Management

Jan 2017 - Jan 2017
