Mukul Budania

Senior Software Engineer

Seattle, Washington, United States · 15 yrs 8 mos experience
Most Likely To Switch · Highly Stable

Key Highlights

  • Led the largest foundational model training cluster at Amazon.
  • Reduced onboarding time for financial systems from months to weeks.
  • Expert in building resilient and scalable distributed systems.
Stackforce AI infers this person is a SaaS and Fintech expert with a focus on scalable distributed systems and ML infrastructure.

Skills

Core Skills

MLOps · ML Infrastructure · System Design · API Development · System Modernization · Testing Frameworks · Data Management · Platform Design

Other Skills

Kubernetes · Amazon Web Services (AWS) · Amazon SageMaker · Observability · Cost Reduction · RESTful WebServices · Communication · Computer Science · Onboarding Processes · Regression Testing · TypeScript · Cross-functional Team Leadership · Algorithms · Programming · Java

About

As a seasoned Software Development Engineer (SDE III) with over 12 years of experience, I specialize in designing and building large-scale distributed systems and driving technical innovation. From startups to Big Tech, my career has been defined by tackling complex problems, leading high-impact engineering teams, and delivering scalable, production-grade solutions. Currently at Amazon, I lead efforts in MLOps and large-scale ML infrastructure, including Amazon's foundational model training cluster. Previously, I led initiatives that cut onboarding time from months to weeks for financial systems handling billion-dollar revenues. My technical toolkit spans Java, Python, AWS, Kubernetes, and ML/LLM applications. I’m passionate about distributed systems, ML-driven development, and engineering leadership. Whether improving MTTR on ML workloads or redesigning operational processes, I aim to build systems that are not only functional but resilient and future-proof. Let’s connect if you’re interested in system design, ML infra, or simply like solving hard engineering problems!

Experience

Total Experience: 15 yrs 8 mos
Average Tenure: 2 yrs 7 mos
Current Experience: 8 yrs 9 mos

Amazon

3 roles

Senior Software Engineer

Promoted

Apr 2024 - Present · 2 yrs 2 mos

  • Lead a team of 15 engineers, actively mentoring them and introducing processes that bring accountability and simplify operations.
  • SageMaker HyperPod
  • Technical lead for Amazon Nova Infra, the largest foundational model training cluster in company history. Delivered a resilient, secure ML infrastructure powering AGI efforts, with enhanced observability.
  • Selling Partner Financial Tech
  • Proposed and implemented a blueprint design that cut onboarding of special fees from 12 weeks to 1 week.
  • Built and shipped global-scale APIs reducing the cost of serving from 95k to 14k per month.
Kubernetes · Amazon Web Services (AWS) · Amazon SageMaker · MLOps · ML Infrastructure

SDE2

Jan 2020 - Apr 2024 · 4 yrs 3 mos

  • Selling Partner Financial Tech
  • Led a team to modernize SPFT onboarding, reducing time from 44 to 12 weeks.
  • Built a regression testing framework covering 99% of traffic and enhanced support for 14+ shipment use cases.
  • Reduced rendering latency from 300ms to 20ms and simplified configurations from 9 to 1.
  • Created a library for implicit feedback collection adopted by 14 services across 7 teams to enhance customer sentiment tracking.
  • Restructured dashboards across 14 services and rolled out dashboard best practices across 3 teams.
  • Designed and implemented a peak readiness template adopted by 13 teams across 3 orgs, identifying multiple critical service gaps.
  • Improved alarm handling and ticket routing workflows, raising operational ticket score from 83 to 98 in 3 months.
  • EBS Stats
  • Designed and implemented the Query and Orchestration layer for the EBS Stats Platform, serving 5PB of daily data.
  • Reduced client onboarding time from 18 weeks to 3 by redesigning the platform for self-service and simplifying storage/control layers.
  • Refactored the storage layer into a service and partnered to bring idempotency and IaC into the control layer.
  • Introduced key operational processes: Bar Raiser reviews, production readiness checklists, and on-call handoff procedures, reducing tickets by 80% in 3 months.
RESTful WebServices · Communication · Computer Science · System Modernization · Testing Frameworks

SDE2

Sep 2017 - Jan 2020 · 2 yrs 4 mos

RESTful WebServices · Communication · Computer Science

Expedia, Inc.

SDE 2

Feb 2017 - Aug 2017 · 6 mos · Gurgaon, Haryana, India

RESTful WebServices · Communication · Computer Science

Paytm

Senior Software Engineer

Jun 2016 - Jan 2017 · 7 mos · Noida Area, India

RESTful WebServices · Communication · Computer Science

OYO Rooms

Senior Software Engineer

Dec 2015 - May 2016 · 5 mos · Gurgaon, India

  • Worked as a full-stack developer.
  • Ruby on Rails
  • Ember JS
RESTful WebServices · Communication · Computer Science

Talentica Software

Senior Software Developer

Oct 2014 - Dec 2015 · 1 yr 2 mos · Pune Area, India

  • Worked with 3 clients:
  • 1. Invoke Solutions - survey-taking platform
  • 2. HomeUnion - property-selling platform
  • 3. Viewics - healthcare analytics solutions
RESTful WebServices · Communication · Computer Science

Oracle

Member Technical Staff

Jul 2010 - Oct 2014 · 4 yrs 3 mos · Bangalore

RESTful WebServices · Communication · Computer Science

Education

National Institute of Technology, Tiruchirappalli

Bachelor of Technology (BTech) — Computer Science And Engineering

Jan 2006 - Jan 2010

Navrachana Higher Secondary School

Jan 2000 - Jan 2006
