Shubham Jain

SRE (Site Reliability Engineer)

Delhi, India · 7 yrs 10 mos experience
Highly Stable

Key Highlights

  • Expert in cloud infrastructure and DevOps practices.
  • Proven track record in automation and optimization.
  • Strong leadership in large-scale systems management.
Stackforce AI infers this person is a SaaS Infrastructure Engineer with expertise in cloud and DevOps automation.

Skills

Core Skills

Cloud Infrastructure, DevOps, Monitoring, Web Development

Other Skills

Technical Leadership, Azure DevOps, CI/CD, Prometheus, Logstash, Grafana, JupyterHub, Automation, Large Scale Systems, Server Architecture, Python, Web Infrastructure, Kubernetes, Linux, Jenkins

About

SRE with a strong aptitude and passion for automation, dedicated to optimization, and with an understanding of how operations and development combine to deliver code to customers quickly. I have experience with cloud platforms and monitoring processes as well as DevOps development. My experience includes tools and platforms such as Jenkins, Python, Docker, GCP, SaltStack, ELK, and Terraform, along with deep knowledge of AWS cloud infrastructure and Kubernetes.

Experience

Total Experience: 7 yrs 10 mos
Average Tenure: 1 yr 10 mos
Current Experience: 5 mos

Nvidia

Senior Site Reliability Engineer

Dec 2025 - Present · 5 mos · India

Microsoft

2 roles

Senior Software Engineer

Promoted

Mar 2025 - Dec 2025 · 9 mos · Noida, Uttar Pradesh, India

Software Engineer 2

Jan 2022 - Mar 2025 · 3 yrs 2 mos · Noida, Uttar Pradesh, India

  • Automated CI/CD pipelines, version control, and artifact management using Azure DevOps.
  • Set and monitored SLOs and SLIs to measure and maintain the reliability of services.
  • Responsible for the management of Azure cloud infrastructure, ensuring high availability and reliability for critical production environments.
  • Responsible for the design and execution of robust BCDR (Business Continuity and Disaster Recovery) plans to ensure uninterrupted system availability.
  • As a member of the Microsoft Defender team, conducted audits, ensured compliance, and enhanced system security through best practices.
  • Established the three pillars of observability by developing and implementing a tailored solution through the integration of open-source tools like Prometheus, Logstash, Grafana, and Microsoft internal tools, including Geneva and Jarvis, to address specific operational requirements.
  • Led the deployment and maintenance of the JupyterHub platform, ensuring a scalable and reliable environment for security analysts/hunters. Implemented robust automation, user provisioning, and monitoring solutions to enhance platform performance and availability.
Technical Leadership, Cloud Infrastructure, DevOps

OLX Group

Site Reliability Engineer

May 2021 - Jan 2022 · 8 mos · Gurugram, Haryana, India · On-site

  • Set up the Prometheus OpenMetrics integration in New Relic.
  • Implemented the Kubernetes integration with New Relic.
  • Set up an ETL data pipeline using AWS Athena and Glue.
  • Set up GitLab pipelines.
  • Migrated applications from one AWS region to another with zero downtime.
Large Scale Systems, Server Architecture, Python, Web Infrastructure, Technical Leadership, Kubernetes, +4 more

Tokopedia

DevOps Engineer

Feb 2020 - May 2021 · 1 yr 3 mos · Noida, Uttar Pradesh, India

  • Deployment of Machine learning models using Seldon-core in GKE clusters.
  • Configuring HPA for Pods as well as autoscaling for node pools.
  • Centralized ELK stack for logging of all machine learning models.
  • Prometheus server along with Grafana for monitoring of models and the Kubernetes cluster.
  • Jenkins Pipeline Groovy shared libraries: global shared Groovy libraries defined in a GitHub repository and loaded into existing pipelines using a common Jenkinsfile.
  • Jenkins Master Slave setup on Kubernetes.
  • Canary Setup using Flagger.
Technical Leadership, Cloud Infrastructure, DevOps

Paytm Payments Bank

DevOps Engineer

Jul 2018 - Feb 2020 · 1 yr 7 mos · Noida

  • Implementing fully functional CI/CD pipeline using Jenkins.
  • Managing AWS cloud infrastructure including EC2, S3, ELB, Route53. Highly available infrastructure for production environment.
  • Automating day to day and ad hoc tasks.
  • Setting up build and deployment automation for Terraform scripts using Jenkins.
  • Server configuration management by Saltstack.
  • Deployment and operation of the ELK stack (Elasticsearch, Logstash, Kibana).
  • Working on software containerization platforms like Docker and container orchestration tools like Kubernetes.
  • Implementing a load-balanced, highly available, fault-tolerant Kubernetes infrastructure.
  • Designed and deployed a patch management system using Katello.
  • Installing, setting up, and configuring Apache Kafka and Apache ZooKeeper.
Technical Leadership, Cloud Infrastructure, DevOps

Steller inc.

Intern

May 2017 - Jun 2017 · 1 mo · Greater Delhi Area

  • Developed a Tours and Travels Management System (a web app) using Python, Django, and MySQL.
  • The app lets users view all available tour packages and the itinerary of each tour, along with trek essentials and any precautionary measures. Users can also subscribe to the newsletter by registering with their email addresses.

Education

KRISHNA INSTITUTE OF ENGINEERING AND TECHNOLOGY, GHAZIABAD

Bachelor of Technology - BTech — Computer Science

Jan 2014 - Jan 2018
