Tarun Sharma

SRE (Site Reliability Engineer)

Bengaluru, Karnataka, India · 5 yrs 2 mos experience
Highly Stable

Key Highlights

  • 5+ years of experience in Site Reliability Engineering.
  • 8 industry certifications across AWS, Azure, and GCP.
  • Expertise in managing multi-tenant cloud environments.
Stackforce AI infers this person is a Cloud Infrastructure Engineer with a focus on Site Reliability Engineering.

Skills

Core Skills

Site Reliability Engineering · Distributed Systems · Infrastructure as Code · DevOps

Other Skills

Incident Management · Monitoring & Alerting · Automation · AWS · Cost Optimization · Azure · Cloud Networking · Terraform · Git · Networking (TCP/IP, DNS, HTTP) · Amazon Web Services (AWS) · Kubernetes · Continuous Integration and Continuous Delivery (CI/CD) · Systems Design · Service Level Objectives (SLO)

About

I am a Site Reliability and Cloud Engineer with 5+ years of experience operating and improving reliability for production-grade cloud systems at scale. Currently working at Amazon (AWS Managed Services), I focus on maintaining highly available, scalable infrastructure while reducing operational toil through automation and standardization.

My work revolves around:

  • Incident management, root cause analysis, and postmortems
  • Improving system reliability through monitoring, alerting, and observability
  • Designing and operating distributed systems on AWS
  • Infrastructure as Code (Terraform, CloudFormation)
  • Performance optimization and cost efficiency

I have hands-on experience managing multi-tenant cloud environments, handling critical production incidents, and collaborating across teams to ensure high availability and operational excellence.

I hold 8 industry certifications across AWS, Azure, and GCP, and I am actively focused on advancing deeper into Site Reliability Engineering, particularly for large-scale distributed systems. I am open to global opportunities in SRE and production engineering roles.

Experience

Total Experience: 5 yrs 2 mos
Average Tenure: 1 yr 11 mos
Current Experience: 1 yr 3 mos

Amazon Web Services (AWS)

AWS Cloud Operations Engineer (AWS Managed Services)

Mar 2025 - Present · 1 yr 3 mos · Bengaluru

  • Managed production cloud environments within AWS Managed Services (AMS), ensuring high availability and reliability for customer workloads
  • Handled critical production incidents, performed root cause analysis (RCA), and contributed to postmortems to prevent recurrence
  • Reduced mean time to resolution (MTTR) by ~30% by improving monitoring, alerting, and runbook standardization
  • Implemented proactive monitoring using CloudWatch, identifying potential failures before customer impact
  • Automated repetitive operational tasks, reducing manual effort and improving operational efficiency
  • Supported multi-tenant distributed systems, ensuring scalability, performance, and security compliance
  • Collaborated with cross-functional teams to improve deployment reliability and system resilience
Site Reliability Engineering · Distributed Systems · Incident Management · Monitoring & Alerting · Automation

Rackspace Technology

AWS Cloud Administrator II

Aug 2024 - Mar 2025 · 7 mos · India

  • Managed multi-cloud environments (AWS & Azure), ensuring stable and scalable infrastructure operations
  • Implemented Infrastructure as Code (Terraform, CloudFormation) to standardize deployments and reduce configuration drift
  • Created SOPs and operational runbooks, improving consistency and reducing incident resolution time
  • Handled live customer incidents and troubleshooting during calls, ensuring minimal downtime and rapid recovery
  • Performed system patching, resource provisioning, and infrastructure modifications across production environments
  • Contributed to cost optimization initiatives by identifying underutilized resources and improving resource allocation
  • Worked with internal platforms (Encore, Core, Raincheck) for ticketing, monitoring, and issue resolution
Site Reliability Engineering · Distributed Systems · Infrastructure as Code · Cost Optimization

DXC Technology

2 roles

AWS Cloud Engineer

Apr 2021 - Aug 2024 · 3 yrs 4 mos

  • Designed and managed scalable cloud infrastructure for multiple clients, ensuring high availability and performance
  • Implemented DevOps practices using Terraform and Git, enabling automated and repeatable deployments
  • Built and optimized cloud networking architectures to improve system reliability and connectivity
  • Reduced infrastructure costs through resource optimization and efficient architecture design
  • Automated operational workflows, improving deployment speed and reducing manual intervention
  • Collaborated with cross-functional teams to deliver secure, scalable, and production-ready systems
Site Reliability Engineering · Distributed Systems · DevOps · Cloud Networking

Trainee

Feb 2021 - Apr 2021 · 2 mos

C-DAC Mohali

Network Administrator

Jun 2019 - Jul 2019 · 1 mo · Mohali

Networking (TCP/IP, DNS, HTTP)

Education

Chandigarh Engineering College

Bachelor of Technology (BTech), Electronics and Communications Engineering

Jul 2017 - Jan 2021
