Munish Kumar

SRE (Site Reliability Engineer)

Bengaluru, Karnataka, India · 11 yrs 5 mos experience
Most Likely To Switch · AI ML Practitioner

Key Highlights

  • Over 8 years of experience in SRE and DevOps.
  • Led major infrastructure overhauls at top companies.
  • Passionate about mentoring engineers in SRE practices.
Stackforce AI infers this person is a Site Reliability Engineer specializing in SaaS infrastructure and DevOps automation.

Skills

Core Skills

Site Reliability Engineering, DevOps, Quality Assurance

Other Skills

API Testing, AWS Command Line Interface (CLI), Ansible, Apache Kafka, Application Lifecycle Management, Application Monitoring, Azure DevOps, Azure Kubernetes Service (AKS), Configuration Management, Consul, Continuous Integration, Continuous Integration and Continuous Delivery (CI/CD), Datadog, Elasticsearch, Envoy

About

I’m a Site Reliability Engineer with 8+ years of experience across QA, DevOps, and SRE, passionate about building scalable, reliable, and observable systems. I began my career in QA, which gave me a strong foundation in system behavior and performance under real-world conditions. Over time, I transitioned into DevOps and eventually SRE, where I now focus on building fault-tolerant infrastructure and automating reliability at scale.

🔧 Key Skills & Tools:

  • Cloud: Azure, AWS, Kubernetes (AKS), Docker, Vault
  • IaC & Automation: Terraform, Ansible, GitHub Actions, Jenkins
  • Observability: Prometheus, Grafana, Loki, Tempo, Datadog, ELK Stack
  • SRE Practices: SLIs/SLOs, incident response, blameless postmortems
  • CI/CD: End-to-end pipeline automation for high-frequency deployments

I’ve led major infrastructure and observability overhauls at Maersk and Shuttl, reducing deployment time, increasing system resilience, and improving developer experience through better tooling and automation.

I thrive in collaborative, high-trust environments and enjoy mentoring engineers transitioning into DevOps/SRE roles, a journey I’ve taken myself.

Let’s connect if you’re building scalable systems, adopting SRE practices, or leading DevOps transformation through automation and cultural change.

Experience

Total Experience: 11 yrs 5 mos
Average Tenure: 2 yrs 10 mos
Current Experience: 4 yrs 9 mos

A.P. Moller - Maersk

Senior Site Reliability Engineer

Sep 2021 - Present · 4 yrs 9 mos · Bengaluru, Karnataka, India

  • Designed CI/CD pipelines using GitHub Actions, reducing deployment time by 60% and increasing delivery frequency across multiple environments.
  • Automated infrastructure provisioning using Terraform and Ansible, reducing manual work by 70% and enabling repeatable global rollouts.
  • Managed and scaled Azure Kubernetes Service (AKS) clusters supporting critical applications with zero-downtime deployments.
  • Implemented observability stack using Prometheus, Grafana, Loki, and Tempo, reducing MTTR by 30% and improving alert precision.
  • Led blameless postmortems for high-priority incidents, implementing long-term fixes and reducing recurrence rates.
  • Collaborated across platform/dev teams to define and enforce SLIs/SLOs, improving reliability KPIs and on-call experience.
GitHub Actions, Terraform, Ansible, Azure Kubernetes Service (AKS), Prometheus, Grafana, +4 more

Shuttl

2 roles

Site Reliability Engineer

Feb 2020 - Aug 2021 · 1 yr 6 mos

  • Responded to 24/7 critical incidents with urgency and precision, improving incident resolution time by 20%.
  • Managed Nomad and Consul-based infrastructure using Terraform and Vault, ensuring high availability across microservices.
  • Improved observability by building custom dashboards, alerts, and notebooks in Datadog and Sentry for core systems.
  • Developed internal tools to automate routine SRE tasks and reduce operational overhead.
Terraform, Vault, Datadog, Sentry, Site Reliability Engineering

Senior QA Engineer

Aug 2017 - Feb 2020 · 2 yrs 6 mos

  • Owned full QA for backend microservices, driving SDLC test strategy during system-wide rewrite.
  • Built automation test suites for REST APIs and UI using Postman, Selenium, and REST-Assured.
  • Led performance testing efforts with JMeter and Locust, identifying and resolving key system bottlenecks.
Postman, Selenium, REST-Assured, JMeter, Locust, Quality Assurance

Tata 1mg

QA Engineer

Jul 2015 - Aug 2017 · 2 yrs 1 mo · Gurgaon, India

  • Led QA for diagnostics product on Web and Mobile, improving release confidence.
  • Executed functional, DB, API, and performance testing across pharmacy and order fulfillment platforms.

Fabfurnish.com

Jr Quality Analyst

Dec 2014 - Jul 2015 · 7 mos · Gurgaon, India

  • Owned end-to-end QA for the FabSeller platform, ensuring high-quality releases through rigorous testing; also contributed to testing across other FabFurnish e-commerce products.
  • Performed functional, API, and performance testing, validating core business workflows and ensuring system reliability under load.

Education

Maharshi Dayanand University (MDU), Rohtak

B.Tech — Information Technology
