Kaivalya Dabhadkar

Senior Software Engineer

Bengaluru, Karnataka, India · 4 yrs 1 mo experience
Key Highlights

  • Expert in GPU fault tolerance for AI training.
  • Experience with Azure Kubernetes and hybrid cloud technologies.
  • Contributed to open-source GPU monitoring tools.
Stackforce AI infers this person is a skilled engineer in AI Infrastructure and Cloud Computing.

Skills

Core Skills

Artificial Intelligence (AI), Kubernetes, Cloud Computing

Other Skills

Distributed Computing, Distributed Systems, Docker, Golang, React, Spring framework

About

I work on making GPU clusters fault tolerant for AI training. Previously, I worked on the Azure Kubernetes Edge computing service. I am interested in AI infrastructure, AI training and inference, cloud computing, and distributed systems. I maintain a blog where I write about math and AI: https://kaivalya1997.github.io/blog. I have professional experience with technologies such as Golang, Kubernetes, Docker, React, and the Spring framework.

Experience

Nvidia

Senior Software Engineer

Mar 2024 - Present · 2 yrs · Bengaluru, Karnataka, India · Hybrid

  • Working in the DGX GPU cloud team on GPU monitoring and fault tolerance for AI training and inference on GPU clusters. Contributor to NVSentinel, an open-source GPU fault detection and remediation system (https://github.com/NVIDIA/NVSentinel).
Artificial Intelligence (AI), Kubernetes, Cloud Computing, Golang, Distributed Systems, Distributed Computing

Microsoft

Software Engineer 2

Jan 2022 - Feb 2024 · 2 yrs 1 mo · Bengaluru, Karnataka, India

  • Part of the Azure Kubernetes Edge team, working with Azure Stack HCI (hyper-converged infrastructure) and hybrid cloud technologies.
Kubernetes, Golang, Distributed Systems, Distributed Computing, Cloud Computing
