Thatode Sai Surya Abhishek

Lead ML Engineer

Bengaluru, Karnataka, India · 6 yrs 10 mos experience
AI Enabled · AI ML Practitioner

Key Highlights

  • 6+ years in machine learning systems and infrastructure.
  • Expert in building scalable ML pipelines and GenAI services.
  • Strong collaborator with cross-functional teams.
Stackforce AI infers this person is a Machine Learning Engineer with expertise in MLOps and cloud infrastructure across Healthcare and Fintech.

Skills

Core Skills

MLOps, Machine Learning, DevOps, Cloud Computing

Other Skills

API Development, AWS, Algorithms, Annotation, Automated Machine Learning (AutoML), Azure, Azure Automation, Azure Data Factory, Azure Databricks, Azure DevOps, Azure DevOps Server, Azure Functions, Azure Infrastructure as a Service (IaaS), Bayesian methods, Bootstrap

About

"Design, Develop and Deploy" - Lead ML Platform Engineer with 6+ years of experience designing and deploying machine learning systems and infrastructure across healthcare, fintech, and analytics domains. Proven expertise in building scalable, production-grade ML pipelines and LLM-powered GenAI services. Strong collaborator with cross-functional teams, including business stakeholders, data scientists, and platform engineers. Comfortable working in client-facing and fast-paced consulting environments as well as product organizations.

Experience

Total Experience: 6 yrs 10 mos
Average Tenure: 1 yr
Current Experience: 1 yr 7 mos

Athenahealth

Lead Member of Technical Staff - Machine Learning Platform

Nov 2024 - Present · 1 yr 7 mos · Bengaluru, Karnataka, India

  • Responsible for the development and design of the ML platform infrastructure using AWS (EKS, S3, Kubeflow, KServe), improving the deployment speed of classical ML models by 15% QoQ.
  • Collaborate with cross-functional stakeholders, including product and data science teams, to build scalable MLOps solutions addressing business needs.
  • Acted as SME for MLOps and GenAI best practices, mentoring 10+ teams and facilitating internal tech sessions on ML deployment workflows and toolchains.
  • Established rigorous model validation workflows, including AUC, precision-recall tracking, as well as offline and online performance monitoring with Prometheus and Grafana.
  • Led the development of the LLM gateway for GenAI applications, supporting the use of regulated healthcare data in bespoke use cases. Optimized the gateway by leveraging LiteLLM and custom infrastructure, enabling 50+ teams across the organization to access 10+ models in a secure and scalable environment.
AWS, MLOps, GenAI, Model Validation, Cross-functional Collaboration, Machine Learning

Razorpay

Software Engineer - Machine Learning Ops Engineer

Jun 2023 - Nov 2024 · 1 yr 5 mos · Bengaluru, Karnataka, India · Hybrid

  • Revamped the MLOps infrastructure for Razorpay, streamlining model development processes and reducing turnaround times for mission-critical models by 45%, resulting in increased efficiency and faster time to market.
  • Developed and implemented automated data preprocessing scripts to integrate the DataRobot environment with the transaction risk ML model, resulting in a 10-fold reduction in fraudulent transactions within the Razorpay environment.
  • Fine-tuned H2O models for Return-to-Origin (RTO) prediction and transaction risk, addressing high-cardinality, imbalanced classification problems similar to CTR and conversion modeling. Improved deployment efficiency by 90% and built custom API microservices using the H2O Wave SDK for scalable, production-grade inference.
  • Spearheaded the development of the Fraud Detection Model microservice component for Razorpay, developed in Golang, and fine-tuned it for deployment, which improved the fraud detection metric by 5% to 97% at the product level.
  • Designed the LLMOps pipelines repurposing open-source ML tools for GenAI solutions at Razorpay, which effectively reduced the dependencies for merchants by providing tailor-made solutions in a self-serve environment and increased operational capabilities by 35%.
MLOps, DataRobot, H2O, API Development, Generative AI, Machine Learning

Lyric

Software Engineer - Machine Learning Platform

Dec 2022 - May 2023 · 5 mos · Bengaluru, Karnataka, India · Remote

  • Defined the end-to-end ML model life cycle by architecting infrastructure on Elastic Kubernetes Service integrated with Prometheus, Grafana, and Amazon SageMaker, reducing ML deployment turnaround time by 60%.
  • Conducted in-depth technical R&D for MLOps network capabilities, performing a feasibility study of Feast (feature store) versus Tecton, and of Optuna (hyperparameter optimization), across the entire platform, optimizing feature retrieval by 50%.
  • Built unit/integration tests and incorporated load testing for model-serving APIs using Python and Postman for production readiness.
  • Programmed the MLflow custom plugin by engineering the network API calls for the plugin in Python, which improved the efficiency of the API calls by 20%.
  • Designed ML infrastructure on AWS, leveraging EKS and ECR for continuous deployment on scalable infrastructure. Enabled private networking on AWS by designing and deploying VPCs, load balancers, and private connections. Optimized S3 bucket components to store model artifacts and ML features for retrieval, reducing data extraction turnaround time by 30%.
  • Developed and deployed monitoring automation for model performance, collecting metrics such as AUC-ROC, precision, and recall using the PyTorch library in Python.
AWS, MLOps, MLflow, Kubernetes, Data Preparation, Machine Learning

Fractal

Software Engineer - Machine Learning Ops Engineer

Sep 2021 - Dec 2022 · 1 yr 3 mos · Bengaluru, Karnataka, India

  • Overhauled the network service mesh to enable encrypted pod-to-pod communication in the Kubernetes cluster with encrypted ingress and egress traffic, and restructured the SSL certificates and DNS hostnames, which improved traffic throughput and security by 80%.
  • Streamlined the build process of Docker images for applications by implementing CI/CD automation pipelines in Azure DevOps, which decreased image deployment time by 5 minutes.
  • Redesigned the ML infrastructure on AWS, leveraging Amazon SageMaker, and implemented the production model deployment strategy via Kubeflow on EKS clusters, with MLflow for model version control.
  • Prepared in-depth technical documentation and took ownership of releasing documentation for 10 projects as part of the overall deliverables.
  • Collaborated with non-technical stakeholders and consulting teams to align ML solutions with business strategy.
AWS, Kubernetes, MLflow, DevOps, Data Preparation, MLOps, +1 more

Neudesic Technologies Private Limited

3 roles

Consultant 2

Mar 2021 - Sep 2021 · 6 mos

Kubernetes, DevOps

Consultant 1

Jul 2020 - Mar 2021 · 8 mos

Kubernetes, DevOps

Associate Consultant

Jul 2019 - Jul 2020 · 1 yr

Kubernetes, DevOps

Microsoft

2 roles

Cloud DevOps Consultant 2 (Vendor Partner)

Mar 2021 - Sep 2021 · 6 mos

  • Designed and demonstrated the latest industry standards for Azure Kubernetes Service (AKS) security and directly contributed to more than 5 design principles that were standardized across the whole engagement.
  • Mentored and onboarded multiple new team members, then became one of the point people helping other teams across 14+ migration units achieve the migration goals set at the program level.
  • Maintained and led the daily scrum calls and collaborated with the team following Agile methodologies.
  • Equipped every AKS-based application with extensive monitoring and logging capabilities, using tools ranging from Kibana and Azure Log Analytics workspace to Prometheus, Grafana, and Azure Application Insights.
Azure, Kubernetes, DevOps, Cloud Computing

Cloud DevOps Consultant 1 (Vendor Partner)

Dec 2019 - Mar 2021 · 1 yr 3 mos

  • Accelerated the existing CI/CD pipelines for a major Canadian insurance company partnered with Microsoft, which improved pipeline runtime by ~5% and decreased wait time by ~5%.
  • Migrated more than 5 applications to the Azure platform for a major telecommunications company. Spearheaded the development and delivery of the CI/CD (Continuous Integration and Continuous Deployment) pipelines for these applications, which were based on various technologies such as Java, Python, SQL, LAMP stack, MEAN stack, and more.
  • Consequently, the time required for the application teams to deploy the latest code drops fell by ~80%, increasing their productivity and decreasing the administrative overhead of maintaining infrastructure and applications.
  • Implemented and expedited the delivery of complex cloud infrastructure solutions using industry-leading networking and security principles for the cloud platform, which resulted in a promotion to the current designation with honors within 12 months.
Azure, DevOps, Cloud Computing

Education

International Institute of Information Technology Bangalore

Postgraduate Degree — Data Science

Mar 2022 - Mar 2023

BMS Institute of Technology and Management

Bachelor of Engineering — Computer Science and Engineering

Jan 2015 - Jan 2019
