Hari Hud

Software Engineer

Pune City, Maharashtra, India · 9 yrs 7 mos experience

Key Highlights

  • 10× performance improvements in data processing.
  • Expertise in building scalable MLOps platforms.
  • Proven track record in CI/CD and DevSecOps.
Stackforce AI infers this person is a Backend-heavy Fullstack Engineer specializing in MLOps and Cloud Infrastructure.

Skills

Core Skills

Cloud Computing · Infrastructure Automation · MLOps · Machine Learning · DevOps · Cloud Orchestration · Full Stack Development

Other Skills

AWS · Automation · Data Validation · Security Scanning · Python · Data Acquisition · Web Scraping · FastAPI · PostgreSQL · Kubernetes · Golang · Temporal · Ansible · CI/CD · DevSecOps

About

Senior Software Engineer at NVIDIA with 10+ years of experience building scalable distributed systems, MLOps platforms, and cloud-native infrastructure. Expertise in backend development, CI/CD, DevOps, infrastructure automation, and large-scale data pipelines. Proven track record of delivering 10× performance improvements, reducing processing time from weeks to days, and building secure, production-grade platforms used across organizations.

For the past 7+ years, I have been with NVIDIA, contributing to various projects within the platform development team. Currently, I am working on building a unified MLOps platform for model development, data preparation, model training, and evaluation. My main focus is on creating an evaluation tool and an evaluator microservice for assessing models such as NeMo, Megatron, and LLaMA.

One of my key contributions was ClusterForge, a modern platform built on NVIDIA's Kaizen Framework, Temporal, Go, Java, and Kubernetes. I was responsible for developing REST APIs, a workflow engine, and Ansible playbooks to manage BCP clusters on both NVIDIA's infrastructure and external data centers.

I also worked on a DevSecOps project, where I built a security service to help developers identify and resolve security vulnerabilities before release. This included container scanning, open-source dependency scanning, static code analysis, secret detection, and infrastructure-as-code security checks. Another significant project was AKUC, a non-disruptive Kubernetes upgrade controller designed to upgrade Kubernetes clusters without interrupting workloads. Additionally, I developed a managed CI/CD platform to ensure seamless integration, deployment, security, and testing of applications throughout their lifecycle. I have experience developing command-line interfaces (CLIs) in various programming languages and implementing infrastructure as code.

Before joining NVIDIA, I spent 3 years at GS Lab, Pune, as a Cloud Orchestration Engineer, where I developed workflows for VM lifecycle management using Python, Django, OpenStack, and VMware vRA/vRO. Overall, I am passionate about software development and have a strong background in platform development, cloud orchestration, CI/CD, infrastructure as code, MLOps, and DevOps.

Experience

Total Experience: 9 yrs 7 mos
Average Tenure: 4 yrs 9 mos
Current Experience: 6 yrs 6 mos

NVIDIA

3 roles

Senior System Software Engineer, AI

Mar 2026 – Present · 3 mos

  • Building an event-driven automation system to streamline vendor data intake by automating AWS infrastructure provisioning, security scanning, data validation workflows, data transfer, and dataset catalog registration.
AWS · Automation · Data Validation · Security Scanning · Cloud Computing · Infrastructure Automation

Senior System Software Engineer, LLM MLOps & Speech AI

Promoted

Jun 2023 – Feb 2026 · 2 yrs 8 mos

  • Built a unified LLM & RAG evaluation framework at NVIDIA, streamlining fragmented tools into a single extensible platform. Supports academic/custom benchmarks, LLM-as-a-judge, safety/security evals, and diverse model backends (NeMo, HuggingFace, Llama, OpenAI, DeepSeek, etc.).
  • Worked on data acquisition from external vendors for Physical AI and robotics training, and on web scraping pipelines for LLM training datasets.
  • Led the design and development of NVIDIA Eval Factory, a unified platform to build, certify, and operationalize standardized LLM evaluation benchmarks across the company.
  • Built an automated ASR evaluation pipeline on Kratos (Kubeflow), reducing evaluation time from 2–3 weeks to just 1–2 days.
  • Drove a 90%+ reduction in dataset processing time by parallelizing the TTS data pipeline (leveraging vertical and horizontal scaling on large-scale GPU infrastructure) and enabling archive-based uploads. This optimization allowed 28 English datasets to be processed in just 2–3 days instead of months, a 10× improvement. This pipeline was later used to process 100+ datasets across multiple languages.
  • Designed and developed an Evaluator Microservice using Python FastAPI and PostgreSQL to externalize API-based LLM evaluation capabilities for enterprise users.
  • Created an OpenAI API-compatible Inference API Server for NVIDIA's TensorRT-LLM Engine.
  • Improved evaluation observability and reliability by designing and implementing a progress tracking solution within the Evaluator Microservice.
  • Designed and developed CI/CD pipelines and testing frameworks for MLOps projects.
  • Integrated the LLM eval-tool into customers' CI/CD pipelines to automate model evaluations.
  • Authored the NTech paper "NeMo Eval Tool: A Unified Evaluation Framework for LLMs and Retrievers/RAGs," which was selected as a Software Paper for NVIDIA's NTech India presentation.
  • Filed and received acceptance of a U.S. patent for the Evaluator Microservice.
Python · MLOps · Data Acquisition · Web Scraping · FastAPI · PostgreSQL

System Software Engineer

Oct 2019 – May 2023 · 3 yrs 7 mos

  • 🐳 Kubernetes & Infrastructure Management
  • Engineered a zero-impact Kubernetes upgrade controller that successfully upgraded 100+ clusters (e.g., v1.18 to v1.22) without disrupting long-running deep learning pods
  • Built a scalable workflow engine using Temporal and Golang to automate infrastructure provisioning, Kubernetes setup, and workload deployments, supporting 100+ concurrent task executions
  • Developed a web service to streamline cluster/application deployments, cutting NVIDIA BCP cluster provisioning time from days to hours
  • Automated Kubernetes deployment on NVIDIA GPU platforms (H100, A100, DGX) using Infrastructure-as-Code
  • Developed an autopilot for automatic Kubernetes SSL certificate renewal, reducing the number of clusters broken by expired certificates
  • Developed Helm charts for Kubernetes application deployments
  • 🚀 CI/CD & DevSecOps Platform Development
  • Designed and built a security service that helps developers detect and remediate security vulnerabilities before release, including container scans, dependency scans, static code analysis, secret detection, and IaC security checks
  • Built a managed CI/CD platform to ensure seamless integration, deployment, security, and testing of applications throughout their lifecycle
  • Created a test framework for the DevSecOps service using pytest and GitLab CI, which reduced manual testing time by 93%
  • Developed Jenkins and GitLab CI pipelines to automate DevOps and CI/CD operations
  • ⚙️ Automation & Tooling
  • Developed a Host State Management solution for software deployment in air-gapped and non-air-gapped environments
  • Created a self-service framework for tenant onboarding to AWX (Ansible Tower), reducing manual onboarding time and boosting customer satisfaction
  • Developed automation for infrastructure and Kubernetes management to speed up cluster operations and minimize downtime caused by manual errors
  • Built a CLI for HashiCorp Vault secret migration to enable secret management in external Data Centers
Kubernetes · Golang · Temporal · Ansible · CI/CD · DevSecOps

GS Lab

2 roles

Sr. Software Engineer, Cloud Orchestration

Promoted

Jan 2019 – Sep 2019 · 8 mos · Pune, India

  • Developed VMware vRA/vRO workflows and patterns.
  • Designed and built a Python/Django web application to orchestrate various IT applications, which reduced VM provisioning effort from 1–2 days to 2–3 hours.
  • Deployed scalable containerized web applications with Docker and Kubernetes.
  • Designed and built RESTful services with Django REST framework.
  • Developed a microservice-based solution using Python, Django, and Kubernetes.
  • Hands-on experience with AWS: EC2, S3, VPC, ELB, IAM, CloudFront, Route 53, and APIs.
  • Developed vRO workflows, XaaS blueprints, and Day 2 actions with vRealize Automation to orchestrate the installation and configuration of middleware and databases such as IBM WebSphere, MQ, SQL, F5, and Cassandra.
Python · Django · VMware vRA/vRO · Docker · Kubernetes · Cloud Orchestration

Software Engineer, Full Stack

Jul 2016 – Dec 2018 · 2 yrs 5 mos · Pune, India

  • Worked on the Cloud Orchestration team, primarily rolling out new features using Python, Django, vRA/vRO, Chef, Docker, OpenStack, and web technologies.
  • Developed an OpenStack volume management tool with Python and Django.
  • Wrote Python scripts to integrate system and service management tools such as SNOW and TSM.
  • Built a Python module for HashiCorp Vault integration with IBM Cloud Orchestrator.
  • Integrated vRA/vRO with other systems using plug-ins and APIs.
  • Created and maintained vRA blueprints, catalogs, and entitlements.
  • Implemented a request queuing mechanism in vRO to handle concurrent requests.
  • Developed Chef cookbooks and OpenStack Heat templates.
  • Extended Django authentication middleware to use OpenStack authentication.
  • Developed an Outlook plug-in to process ICO inbox assignments.
  • Provided deployment support for customers and fixed defects.
Python · Django · OpenStack · Docker · Full Stack Development

Education

Shri Guru Gobind Singhji Institute of Engineering and Technology, Vishnupuri, Nanded

B.Tech — Information Technology

Aug 2013 – May 2016

PES POLYTECHNIC, Chhatrapati Sambhajinagar (Aurangabad, MH)

Diploma — Computer Science

Aug 2010 – Jun 2013

ZPPS Gadiwat, Chhatrapati Sambhajinagar (Aurangabad, MH)

SSC

Jun 2000 – Jun 2010