Arpit Singh

Senior Software Engineer

Sunnyvale, California, United States · 13 yrs 2 mos experience
Most Likely To Switch · AI Enabled

Key Highlights

  • Over a decade of experience in AI and distributed systems.
  • Open-source contributor to Kubernetes.
  • Holds two granted patents and seven pending.
Stackforce AI infers this person is a Cloud Computing and AI Infrastructure expert with extensive experience in distributed systems.

Skills

Core Skills

AI Platform · Kubernetes · ML Training · API Infrastructure · API Management · Networking

Other Skills

API design · LLM evaluations · RAG evaluations · Job controller · Scheduler · etcd management · Docker · TCP/IP · UDP · High Availability · Artificial Intelligence (AI) · Machine Learning · Operating Systems · Algorithms · Distributed Systems

About

I work as an Applied Machine Learning Engineer in the Artificial Intelligence Research organization at NVIDIA, specializing in AI platforms and scalable services for large-scale deep learning applications. My recent work includes agentic evaluations, retrieval-augmented generation (RAG) evaluations, and advanced LLM evaluation frameworks for assessing accuracy and performance. Over the last five years at NVIDIA, I have also specialized in AI infrastructure on distributed systems, building services for large-scale deep learning model training.

I have over a decade of experience spanning Cisco, Nutanix, and NVIDIA, with expertise in AI infrastructure, networking, and distributed systems. I hold two granted patents and have seven more pending, and I have contributed to Kubernetes as an open-source contributor, including work that shipped in production releases.

Academically, I hold two master's degrees, one in Computer Science from Stony Brook University and another in Electronics and Communication from IIT Roorkee, and I am currently pursuing graduate professional courses in AI at Stanford University.

I believe the breadth of my experience across AI infrastructure, networking, and compute, combined with depth in each domain, lets me wear many hats and deliver impact across diverse organizational and business needs.

Key Strengths: LLM evaluations, AI infrastructure, distributed systems, operating systems (generalist), API design, Kubernetes, Docker containers, Envoy
Languages: Golang, Python, C/C++, bash
Patents:
  1) https://patents.google.com/patent/US20190235900A1
  2) https://patents.google.com/patent/US20200150950A1/en
  3) https://patents.google.com/patent/US20200045116A1

Experience

Total Experience: 13 yrs 2 mos
Average Tenure: 3 yrs 3 mos
Current Experience: 4 yrs 11 mos

NVIDIA

2 roles

Senior Software Engineer

Mar 2025 - Present · 1 yr 3 mos · On-site

  • AI Research Engineering organization, NeMo Platform.
  • Owns the Evaluator microservice on the NeMo LLM Platform.
  • Built and contributed to the APIs and workflows that execute evaluations for LLM endpoints and RAG pipelines.
  • Built evaluation workflows for both academic and custom evaluations.
  • Implemented the agentic evaluation workflow based on RAGAS metrics.
  • Implemented evaluation prechecks and orchestration mechanisms on the Kubernetes platform.
AI platform · API design · Kubernetes · LLM evaluations · RAG evaluations

Senior Software Engineer

Jul 2021 - Mar 2025 · 3 yrs 8 mos · On-site

  • Deep Learning Infrastructure team, building compute services on top of Kubernetes.
  • Owned components of the end-to-end ML training control plane, including the job controller and scheduler. Abstracted the podgroup logic into a webhook extender.
  • Built support for the logical cluster partition feature in the scheduler.
  • Improved job controller performance and scaled it to handle bursts of 1000+ jobs.
  • Implemented the align-by-socket feature in the kubelet for better CPU/GPU alignment.
  • Conducted benchmark studies of NVIDIA A100 GPUs across various NPS (NUMA nodes per socket) configurations.
Kubernetes · ML training · Job controller · Scheduler

Nutanix

2 roles

Senior Member Of Technical Staff

Feb 2020 - Jul 2021 · 1 yr 5 mos · San Francisco Bay Area

  • Part of the Cloud Platform team, building a Kubernetes-based PaaS. Owned etcd management on Kubernetes, API infrastructure, and platform upgrade.
  • Worked on the Kubernetes deployment engine for the Nutanix platform.
  • Mentored 2+ software engineers.
  • Served as torchbearer and point of contact.
  • Demoed Load Balancer as a Service using Envoy.
Kubernetes · API infrastructure · etcd management

Member of Technical Staff

Aug 2016 - Feb 2020 · 3 yrs 6 mos · San Francisco Bay Area

  • Part of the Cloud Platform and Core-Infra teams, building a Kubernetes-based application deployment platform. Owned etcd on Kubernetes, the Swagger-based API, registry management, and application deployment workflows.
  • Worked on the Kubernetes deployment engine for the Nutanix platform.
  • Designed and developed 1-Click memory upgrade for clusters.
  • Built and maintained the Docker ecosystem on the Nutanix management platform.
  • Served as torchbearer and point of contact.
  • Part of hackathon-winning teams in 2016 (NFV prototype) and 2017 (Go-based read-only pluggable shell).
Kubernetes · Docker · API management

Cisco Systems

Software Engineer II

Jun 2012 - Jan 2015 · 2 yrs 7 mos · Bengaluru Area, India

  • Network Operating System Technology Group.
  • Owned the Transports technology area of Cisco IOS, covering TCP, UDP, High Availability, Sockets, and IPC. Implemented the Adjust-MSS and POSIX compliance features.
  • Led maintenance efforts and maintained a zero bug backlog.
  • Managed the quarterly release for three cycles and resolved 40+ customer issues.
  • Served as domain expert and point of contact, and mentored two new hires.
  • Received two AMAZE Awards and two INSPIRE Awards for customer satisfaction, on-time delivery, and assisting support engineers.
TCP/IP · UDP · High Availability · Networking

Indian Institute of Technology, Roorkee

Teaching Assistant

Aug 2011 - May 2012 · 9 mos · Roorkee, India

  • Course: Basic Electronics, under Prof. N. P. Pathak.
  • Helped the professor design and evaluate assignments, and helped students solve problems and understand concepts.

Education

Stanford University

Graduate Certificate in Artificial Intelligence (Professional Degree)

Jun 2023 - Present

Stony Brook University

Master’s Degree — Computer Science

Jan 2015 - Jan 2016

IIT Roorkee

M.Tech — Radio Frequency And Microwave Engineering

Jan 2010 - Jan 2012

Uttar Pradesh Technical University

Bachelor of Technology (B.Tech)

Jan 2006 - Jan 2010
