Pankaj Kumar Dubey

SRE (Site Reliability Engineer)

Bengaluru, Karnataka, India · 9 yrs experience
AI Enabled · AI ML Practitioner

Key Highlights

  • 9+ years in cloud-native platform engineering.
  • Expertise in Kubernetes and AWS for high availability.
  • Proven track record in cost reduction and reliability.
Stackforce AI infers this person is a Cloud Infrastructure Engineer with expertise in DevOps and Site Reliability Engineering.

Skills

Core Skills

Kubernetes · AWS · DevOps

Other Skills

Golang · ECS · EKS · Python · Artificial Intelligence (AI) · Agents · Pulumi · Terraform · GitOps · GPU · AIOps · Large Language Models (LLM) · AI Agents · Jupyter · PyTorch

About

Senior Site Reliability Engineer (SRE) with 9+ years of experience in building and scaling cloud-native platforms at high-growth companies like Zomato and Paytm. Skilled in designing highly available systems, automating infrastructure, and driving zero-downtime deployments. Expertise in Kubernetes, AWS, Terraform, Docker, Prometheus, and CI/CD pipelines. Strong background in cloud security (Akamai, Cloudflare), DevSecOps, and incident management. Proven track record in reducing costs, improving reliability, and enabling developer productivity through platform engineering. Passionate about building scalable, resilient, and secure infrastructure that powers millions of users daily. Open to senior DevOps, SRE, Platform, and Cloud roles.

Experience

Total Experience: 9 yrs
Average Tenure: 1 yr 9 mos
Current Experience: --

Zomato

2 roles

SRE SD3

Jul 2025 - Nov 2025 · 4 mos · Bengaluru, Karnataka, India · On-site

  • Spearheaded Zomato District App infra design, scaling to millions of concurrent users with HA + failover.
  • Migrated entire District system from the main Zomato account to a new District account without any downtime.
  • Implemented logging stack with Promtail and Loki and a metrics store with Prometheus and VictoriaMetrics for robust observability.
  • Migrated workloads from EKS → ECS, refactoring Node.js/Java/Python apps & ReactJS builds for container compatibility.
  • Reduced infra cost by 30% via Graviton migration (RDS, Redis, Elasticsearch, EKS nodes).
  • Implemented KEDA (SQS-driven consumers) + Karpenter for autoscaling, improving efficiency by 40%.
  • Delivered 3+ years zero downtime using AWS AppConfig and automated failover strategies.
  • Upgraded EKS 1.16 → 1.32, resolving DNS, AMI, CoreDNS, and IAM issues during live traffic.
  • Set up Akamai WAF for public URLs to protect against common web threats.
Golang · ECS · EKS · Python · Artificial Intelligence (AI) · Agents · +11

Senior SRE

Aug 2024 - Jul 2025 · 11 mos · Bengaluru, Karnataka, India · On-site

  • Building District by Zomato.
Cloud Infrastructure · Akamai · Amazon Web Services (AWS) · Architecture · Terraform · DevSecOps · +19

Paytm

DevOps Lead

Jul 2023 - Aug 2024 · 1 yr 1 mo · Bengaluru, Karnataka, India · On-site

  • Led infra for CAT, DAP, Movies & Events verticals handling millions of daily active users.
  • Designed multi-subnet VPC architecture with strict WAF whitelisting (Akamai, Cloudflare, Imperva).
  • Migrated microservices from self-managed Docker on EC2 to Kubernetes (EKS) with Helm and automated deployments via ArgoCD GitOps pipelines.
  • Set up centralized Helm-based tech stack for infra + app deployments, enabling reusable patterns across teams.
  • Integrated Keycloak for SSO & RBAC across developer and infra platforms.
  • Migrated databases & Redis from self-managed EC2 to AWS managed RDS & ElastiCache, improving reliability and reducing ops overhead.
  • Built HA RDS clusters (Aurora) with reader/writer split and implemented automated query killer to prevent long-running query failures.
  • Automated infrastructure provisioning with Terraform/Ansible, eliminating manual infra setup.
  • Optimized CI/CD build pipelines (Jenkins + GitHub Actions), reducing build time & failures by >40%.
  • Automated security patching, IAM role optimization, and RBAC controls, including fixing security issues by migrating services behind WAFs.
  • Reduced production downtime incidents by 70% through proactive scaling, infra automation, and continuous chaos testing.
Akamai · Terraform · DevSecOps · Cloudflare · Infrastructure as code (IaC) · Cloud Computing · +9

Insider.in

3 roles

DevOps Lead

Apr 2023 - Jul 2023 · 3 mos

  • Managing Verticals.
  • Collaborating with developers and different infra stakeholders in order to ensure smooth operations.
  • Improving and optimizing infrastructure for cost and efficiency.
  • Leading/doing POCs to integrate modern tools and techniques.
  • Managing team.
  • Skills: Grafana · Ansible · Terraform · People Management · Cost Reduction · Amazon Web Services (AWS) · Linux System Administration · DevOps · Airflow · DevSecOps

Senior DevOps Engineer

Promoted

Apr 2022 - Apr 2023 · 1 yr

  • Managing verticals.
  • Contributing to large-scale projects.
  • Skills: Grafana · Ansible · Terraform · Amazon Web Services (AWS) · Kubernetes

DevOps Engineer

Dec 2020 - Apr 2022 · 1 yr 4 mos

  • Managed infra for multiple Paytm verticals such as Events, Live, and Movies.
  • Designed and implemented an EKS cluster with VPC and other base components through Terraform, using a combination of spot and on-demand EC2 instances.
  • Implemented Kubernetes monitoring and logging stacks with ELK, PLG, and Prometheus.
  • Designed and implemented a central monitoring architecture covering any number of Kubernetes clusters as well as standalone servers, using Prometheus and Thanos.
  • Designed and implemented a central logging architecture handling logs from any number of Kubernetes clusters and standalone servers, using ELK and Kafka.
  • Implemented a GitOps pipeline for infra and application deployments through Terraform, CircleCI, and Jenkins.
  • Dockerized multiple Node, Java, and PHP applications and deployed them to production.
  • Configured WAF for multiple web servers.
  • Implemented a WAF/VPC log analysis pipeline through AWS Elasticsearch and Kibana.
  • Handled more than 2,000 TPS of production requests across multiple products.
  • Applied PCI DSS compliance security patching to existing infrastructure through AWS native tools.
  • Maintained and configured dev, QA, and production servers.
  • Developed multiple boto3 scripts for cloud automation.

Sigmoid

Software Engineer

Jan 2017 - Dec 2020 · 3 yrs 11 mos · Bengaluru, Karnataka, India

Jspiders - training & development center

Software Engineer Intern

Aug 2016 - Jan 2017 · 5 mos · Bengaluru, Karnataka, India

  • Completed certifications in various core programming languages.

Webtek labs pvt. ltd.

Software Trainee

Jan 2016 - Jun 2016 · 5 mos · New Delhi Area, India

Education

RIMT- Institute of Engg. and Technology, Mandi Gobindgarh

Bachelor’s Degree — Electronics and Communications Engineering

Jan 2012 - Jan 2016

gncs garhwa

10th — Schooling

Jan 2010 - Jan 2011