Kapil Ahuja

Lead ML Engineer

Hyderabad, Telangana, India · 5 yrs 1 mo experience
AI Enabled · AI ML Practitioner

Key Highlights

  • Expert in building scalable AI/ML platforms across multiple clouds.
  • Led development of in-house MLOps tools saving hundreds of hours.
  • Pioneered innovative LLM integration for enterprise solutions.
Stackforce AI infers this person is a Senior MLOps Engineer with expertise in SaaS and Healthcare industries.

Skills

Core Skills

MLOps, Kubernetes, CI/CD, Azure, NLP, DevOps, Data Science, Machine Learning

Other Skills

API Development, ASP.NET MVC, AWS, Algorithms, Apache Airflow, Artificial Intelligence (AI), Athena, Automation, BERT (Language Model), CloudFormation, Computer Vision, Convolutional Neural Networks (CNN), Data Analysis, Data Cleaning, Data Structures

About

Senior LLMOps/MLOps Engineer with 5+ years of experience building scalable AI/ML platforms across AWS, Azure, and GCP. I specialize in Kubernetes-based infrastructure, GitOps automation, CI/CD, and deploying enterprise LLM ecosystems. My work includes multi-cloud LLM integration (Gemini, Claude, GPT), LiteLLM routing, MCP tool development, and end-to-end MLOps pipelines that enable fast, reliable model deployment and automation for Data Science teams.

Experience

Total Experience: 5 yrs 1 mo
Average Tenure: 1 yr 10 mos
Current Experience: 1 yr 5 mos

Verisk

Senior MLOps Engineer

Jan 2025 - Present · 1 yr 5 mos · Hyderabad, Telangana, India

  • Developed 100+ MCP tools and custom LLM agents to automate AWS operations (ECS, ECR, IAM, DynamoDB, CloudFormation), saving hundreds of manual hours.
  • Set up scalable Jenkins-on-EKS with custom Docker agents, Bitbucket webhooks, Grafana/Prometheus monitoring, and security scanning using Trivy.
  • Integrated MCP server with internal MLOps Chatbot, enabling LLM-powered self-serve infra and troubleshooting support.
  • Designed CloudFormation-based CronJob compute patterns using ASG + Capacity Providers to run high-scale workloads on demand.
  • Integrated Kubecost/OpenCost across EKS environments to give teams deep cost visibility and spot instance insights.
  • Automated deployment of platform components like Airflow, LiteLLM, Langfuse, Prometheus, Trivy Operator using Helm + ArgoCD, improving reliability and speeding up delivery.
  • Enabled automated weekly execution of Azure AI Agents with Bing Search using Airflow.
AWS, Kubernetes, Docker, GitOps, CI/CD, MLOps, +3 more

Chubb

2 roles

Senior MLOps Engineer

Promoted

Apr 2024 - Jan 2025 · 9 mos · Hyderabad, Telangana, India

  • Headed the global MLOps team, mentoring team members and onboarding new joiners.
  • Pioneered the development and organizational adoption of an in-house Chubb GPT API, leveraging Microsoft Azure’s OpenAI models to drive innovation across the company.
  • Led the strategic migration from Azure 1.9 to Azure 2.0, collaborating with DevOps teams to design and implement new cloud infrastructure, cognitive resources, and updated network and security frameworks.
  • Led the end-to-end deployment of over 30 projects into production environments, utilizing Docker, Azure Kubernetes Service (AKS), and Jenkins to ensure smooth and successful rollouts.
Azure, Docker, Kubernetes, Jenkins, MLOps, API Development

MLOps Engineer

Mar 2022 - Mar 2024 · 2 yrs · Hyderabad, Telangana, India

  • Built an in-house MLOps platform for ML/DL projects from the ground up, saving ~800 manual hours and ~$50K in annual costs and cutting the time to deploy, monitor, and scale models to under 15 minutes, using AKS, Azure Application Insights, Jenkins (CI/CD), Kubernetes, and Docker.
  • Pre-processed unstructured data comprising 11 million records and implemented Hugging Face transformer models for question answering (RoBERTa), summarization (longformer-base-4096), and document classification (distilbert-base-uncased), fine-tuning the pre-trained models for multiple ML/DL use cases.
  • Created asynchronous API wrappers with FastAPI, refactored code, built Docker images, and deployed ML/DS use cases on Azure Kubernetes Service (AKS) to ease onboarding onto the platform.
  • Managed stakeholders across functions, partnering closely with business leaders and teams to conceptualize and deliver solutions with the big picture in view.
Azure, Docker, Kubernetes, NLP, FastAPI, MLOps

Pitney Bowes

MLOps Engineer

Mar 2021 - Feb 2022 · 11 mos · Pune, Maharashtra, India

  • Twice received Star and Thumbs Up Awards for outstanding performance within the first 8 months of tenure.
  • Developed and managed data pipelines using Apache Airflow on AWS to orchestrate machine learning workflows.
  • Integrated Airflow with AWS services, including Lambda, S3, and DynamoDB, to streamline MLOps workflows and ensure reliable job execution.
  • Implemented monitoring and alerting in Airflow to ensure data and workflow accuracy and to handle failure recovery.
  • Created and managed AWS CloudFormation templates to automate infrastructure provisioning and ensure consistency across environments.
  • Integrated test cases into TeamCity (CI/CD tool) to automate validation of new builds and to generate artifacts and HTML reports of passed and failed cases.
  • Created a customizable Grafana dashboard to visualize and monitor API and server performance metrics such as CPU, disk, and network usage, and designed the underlying framework with Grafana, InfluxDB, JMeter, and Telegraf.
AWS, Apache Airflow, CloudFormation, Grafana, MLOps, DevOps

Artivatic.ai [insurtech & healthcare]

Machine Learning Engineer

Dec 2020 - Jan 2021 · 1 mo · Bangalore Urban, Karnataka, India

  • Collected data from the National Family Health Index Survey on sanitation and water facilities across states, cleaned it, performed exploratory data analysis (EDA), and integrated the results into a Django RESTful API.
  • Calculated the impact of poor sanitation, latrine facilities, and water sources on infant and maternal mortality in India.
  • Implemented machine learning algorithms to estimate the sanitation risk for each PIN code in a given state and to analyze the likelihood of other diseases based on sanitation, latrine, and water facilities.
  • Assessed risk and calculated the public health impact of the Air Quality Index using a machine learning model.
Data Analysis, Machine Learning, Django, Data Science

Education

National Institute of Technology Raipur

Computer Science

Jan 2018 - Jan 2021
