Kartikeya Mittal

SRE (Site Reliability Engineer)

Bengaluru, Karnataka, India · 4 yrs 9 mos experience

Key Highlights

  • Expert in scaling cloud-native infrastructure across multiple platforms.
  • Proven track record in disaster recovery and system reliability.
  • Strong background in DevOps and automation solutions.
Stackforce AI infers this person is a SaaS Infrastructure Engineer with expertise in Site Reliability Engineering and DevOps.

Skills

Core Skills

Site Reliability Engineering, Kubernetes, Software Observability, DevOps

Other Skills

Python (Programming Language), Terraform, Amazon Web Services (AWS), GCP, Azure, Disaster Recovery, Prometheus, Grafana, Loki, Docker, OpenShift, Go (Programming Language)

About

Site Reliability Engineer (SRE) / DevOps Engineer with 4+ years of experience designing and operating large-scale, cloud-native infrastructure across AWS, GCP, and Azure.

Experience

Total Experience: 4 yrs 9 mos
Average Tenure: 2 yrs 2 mos
Current Experience: 3 mos

Airbnb

System Engineer

Feb 2026 - Present · 3 mos · Bengaluru · Remote

Singlestore

3 roles

Senior Site Reliability Engineer

Promoted

Apr 2025 - Feb 2026 · 10 mos · Remote

  • Scaled infrastructure globally in 14+ regions by deploying 100+ Kubernetes clusters across AWS, GCP & Azure with Terraform, while automating deployments through CI/CD (GitLab) pipelines, reducing provisioning time by 90% and enabling rapid environment rollouts supporting 2,000+ customer databases.
  • Designed & implemented a full Disaster Recovery (DR) solution for backend infrastructure, achieving failover during regional outages with RTO < 15 min and RPO ~0.
  • Developed a custom Kubernetes secrets-sync controller (Golang) to replicate secrets across multiple clusters in real time.
  • Managed and optimized Kubernetes clusters across multi-cloud environments, with expertise in troubleshooting networking and workload issues, reducing incident resolution time and improving cluster reliability.
  • Set up and managed an end-to-end observability stack (Prometheus, Alertmanager, Grafana, and Loki) to enable proactive monitoring, custom alerting, and log-based visualization, significantly improving system reliability and debugging efficiency.
  • Contributed to the on-call rotation, improving SLIs/SLOs and reducing MTTR through deep debugging of distributed systems.
Python (Programming Language), Kubernetes, Site Reliability Engineering

Site Reliability Engineer 2

Apr 2024 - Apr 2025 · 1 yr · Remote

Kubernetes, Amazon Web Services (AWS), Site Reliability Engineering

Site Reliability Engineer

Mar 2023 - Mar 2024 · 1 yr · Remote

Python (Programming Language), Kubernetes, Site Reliability Engineering

Amadeus labs

DevOps Engineer

Aug 2021 - Mar 2023 · 1 yr 7 mos · Bengaluru, Karnataka, India

  • Built, delivered, and maintained highly available platforms for airline customers.
  • Performed root cause analysis, provided support during major incidents, fixed and documented problems, and implemented preventive measures.
  • Developed automation solutions in Python to reduce manual effort and increase team efficiency.
  • Worked on container-based deployments using Docker, including Docker images, Docker registries, and OpenShift.
DevOps, Python (Programming Language), Site Reliability Engineering

Education

Vellore Institute of Technology

Bachelor of Technology - BTech — Information Technology

Jan 2017 - Jan 2021
