Tarun Sharma

SRE (Site Reliability Engineer)

Bengaluru, Karnataka, India · 5 yrs 2 mos experience

Highly Stable

Key Highlights

5+ years of experience in Site Reliability Engineering.
8 industry certifications across AWS, Azure, and GCP.
Expertise in managing multi-tenant cloud environments.

Stackforce AI infers this person is a Cloud Infrastructure Engineer with a focus on Site Reliability Engineering.

Skills

Core Skills

Site Reliability Engineering · Distributed Systems · Infrastructure as Code · DevOps

Other Skills

Incident Management · Monitoring & Alerting · Automation · AWS · Cost Optimization · Azure · Cloud Networking · Terraform · Git · Networking (TCP/IP, DNS, HTTP) · Amazon Web Services (AWS) · Kubernetes · Continuous Integration and Continuous Delivery (CI/CD) · Systems Design · Service Level Objectives (SLO)

About

I am a Site Reliability and Cloud Engineer with 5+ years of experience operating and improving reliability for production-grade cloud systems at scale. Currently working at Amazon (AWS Managed Services), I focus on maintaining highly available, scalable infrastructure while reducing operational toil through automation and standardization.

My work revolves around:
• Incident management, root cause analysis, and postmortems
• Improving system reliability through monitoring, alerting, and observability
• Designing and operating distributed systems on AWS
• Infrastructure as Code (Terraform, CloudFormation)
• Performance optimization and cost efficiency

I have hands-on experience managing multi-tenant cloud environments, handling critical production incidents, and collaborating across teams to ensure high availability and operational excellence.

I hold 8 industry certifications across AWS, Azure, and GCP, and I am actively focused on advancing deeper into Site Reliability Engineering, particularly for large-scale distributed systems. I am open to global opportunities in SRE and production engineering roles.

Experience

Total Experience: 5 yrs 2 mos

Average Tenure: 1 yr 11 mos

Current Experience: 1 yr 3 mos

Amazon Web Services (AWS)

AWS Cloud Operations Engineer (AWS Managed Services)

Mar 2025 – Present · 1 yr 3 mos · Bengaluru

Managed production cloud environments within AWS Managed Services (AMS), ensuring high availability and reliability for customer workloads
Handled critical production incidents, performed root cause analysis (RCA), and contributed to postmortems to prevent recurrence
Reduced mean time to resolution (MTTR) by ~30% by improving monitoring, alerting, and runbook standardization
Implemented proactive monitoring using CloudWatch, identifying potential failures before customer impact
Automated repetitive operational tasks, reducing manual effort and improving operational efficiency
Supported multi-tenant distributed systems, ensuring scalability, performance, and security compliance
Collaborated with cross-functional teams to improve deployment reliability and system resilience

Site Reliability Engineering · Distributed Systems · Incident Management · Monitoring & Alerting · Automation

Rackspace Technology

AWS Cloud Administrator II

Aug 2024 – Mar 2025 · 7 mos · India

Managed multi-cloud environments (AWS & Azure), ensuring stable and scalable infrastructure operations
Implemented Infrastructure as Code (Terraform, CloudFormation) to standardize deployments and reduce configuration drift
Created SOPs and operational runbooks, improving consistency and reducing incident resolution time
Handled live customer incidents and troubleshot issues on customer calls, ensuring minimal downtime and rapid recovery
Performed system patching, resource provisioning, and infrastructure modifications across production environments
Contributed to cost optimization initiatives by identifying underutilized resources and improving resource allocation
Worked with internal platforms (Encore, Core, Raincheck) for ticketing, monitoring, and issue resolution

Site Reliability Engineering · Distributed Systems · Infrastructure as Code · Cost Optimization

DXC Technology

2 roles

AWS Cloud Engineer

Apr 2021 – Aug 2024 · 3 yrs 4 mos

Designed and managed scalable cloud infrastructure for multiple clients, ensuring high availability and performance
Implemented DevOps practices using Terraform and Git, enabling automated and repeatable deployments
Built and optimized cloud networking architectures to improve system reliability and connectivity
Reduced infrastructure costs through resource optimization and efficient architecture design
Automated operational workflows, improving deployment speed and reducing manual intervention
Collaborated with cross-functional teams to deliver secure, scalable, and production-ready systems

Site Reliability Engineering · Distributed Systems · DevOps · Cloud Networking