Shubham Pandey

DevOps Engineer

Bengaluru, Karnataka, India · 3 yrs 2 mos experience

Highly Stable

Key Highlights

Delivered zero-downtime releases for AWS EKS workloads.
Reduced MTTR by 30% through advanced observability.
Automated workflows, cutting manual effort by 35%.

Stackforce AI infers this person is a Cloud Infrastructure and Site Reliability Engineering expert in the SaaS industry.

Contact

shubhampandey8756@gmail.com LinkedIn

Skills

Core Skills

Site Reliability Engineering, Cloud Infrastructure, Reliability Engineering, Automation, Performance Engineering, Full-stack Development, Software Development

Other Skills

AWS, GCP, Terraform, CloudFormation, Incident management, RCA, SLA compliance, capacity planning, Jenkins, GitHub Actions, Ansible, Python, Shell scripting, Splunk, Prometheus

About

Currently working as a Software Engineer (SRE) at Xoriant, I bring 3 years of experience in cloud infrastructure, reliability engineering, and DevOps automation. My focus is on building scalable, resilient, and high-performing systems across both AWS and GCP environments.

Cloud & Infrastructure: AWS, GCP, Terraform, CloudFormation
Reliability & Operations: Incident management, RCA, SLA compliance, capacity planning
Automation & CI/CD: Jenkins, GitHub Actions, Ansible, Python, Shell scripting
Monitoring & Observability: Splunk, Prometheus, Grafana, CloudWatch

Key highlights:
Delivered zero-downtime releases for containerized workloads on AWS EKS.
Reduced MTTR by 30% through advanced monitoring and observability.
Automated workflows to cut manual effort by 35%.
Ensured platform stability during high-traffic product launches.

Certified as an AWS Cloud Practitioner, AWS AI Practitioner, and Google Associate Cloud Engineer, I thrive at the intersection of DevOps and SRE, driving operational excellence and scaling mission-critical systems across multi-cloud platforms.

Experience

Total Experience: 3 yrs 2 mos

Average Tenure: 3 yrs 1 mo

Current Experience: 1 mo

Xoriant

Software Engineer

May 2026 – Present · 1 mo · Bengaluru · Hybrid

AWS, GCP, Terraform, CloudFormation, Incident management, RCA, +13

Wipro

Project Engineer - Apple COE

Mar 2023 – Apr 2026 · 3 yrs 1 mo · Bengaluru · Hybrid

Site Reliability Engineering & Cloud
Managed reliability of high-traffic Apple Online Store systems handling millions of users.
Defined SLIs/SLOs and used error budgets to balance reliability and release velocity.
Built scalable AWS infra (EC2, EKS, VPC, S3, IAM) using Terraform & CloudFormation.
Ran containerized workloads on Kubernetes (EKS) with auto-scaling and zero-downtime deploys.
Implemented HPA for peak traffic handling and system stability.
Designed event-driven systems using SQS/SNS.
Observability & Reliability
Improved availability to 99.95% using Prometheus, Grafana, CloudWatch, Splunk.
Reduced MTTR by 30% via better alerting and faster detection.
Cut alert noise by 40%, improving on-call efficiency.
Built dashboards for latency, errors, throughput, and infra health.
Incident & Production Support
Led outage debugging using logs, metrics, and microservices tracing.
Performed RCA, reducing recurring incidents by 20%.
Handled on-call rotations and ensured quick recovery.
Managed production readiness for high-traffic releases.
Performance Engineering
Led performance testing for 500+ backend services.
Designed load, stress, spike, and soak tests using JMeter.
Simulated high traffic to find bottlenecks across app, DB, infra.
Performed capacity planning for peak readiness.
CI/CD & Automation
Optimized CI/CD (Jenkins, GitHub Actions), reducing failures by 25%.
Reduced deploy time from 15 minutes to 6 minutes.
Implemented validation, rollback, blue-green deploys.
Automated tasks using Python, Shell, Ansible.
Kubernetes & Debugging
Debugged CrashLoopBackOff and OOMKilled issues.
Managed services and ingress configs.
Troubleshot TCP/IP, DNS, and load-balancing issues.
Analyzed request flow to find latency bottlenecks.