Pranav Deshpande

SRE (Site Reliability Engineer)

Bengaluru, Karnataka, India · 8 yrs 3 mos experience

Highly Stable · AI Enabled

Key Highlights

Expert in managing large-scale Kubernetes clusters.
Implemented AIOps to streamline operations.
Proven mentor fostering engineering culture.

Stackforce AI infers this person is a Site Reliability Engineer specializing in cloud infrastructure and automation in SaaS environments.

Skills

Core Skills

Site Reliability Engineering · Kubernetes · Observability · Cloud Infrastructure · Automation · Reliability · Infrastructure Automation

Other Skills

Engineering · AWS · GCP · Prometheus · OpenTelemetry · AIOps · LMA · IaC · SaltStack · Ansible · Terraform · Google Cloud Platform (GCP) · Google Kubernetes Engine (GKE) · Artificial Intelligence (AI) · Infrastructure as code (IaC)

About

Site Reliability Engineer with extensive experience operating large-scale production Kubernetes clusters across AWS and GCP. Specialised in building reliable, scalable, and observable platforms using Prometheus, OpenTelemetry, Karpenter, and External Secrets. Strong expertise in incident response, on-call operations, and troubleshooting complex distributed systems. Led release management for multi-cloud, multi-cluster production setups. Implemented internal GPT-driven AIOps workflows to reduce operational toil. A proven mentor and technical interviewer who fosters a strong engineering culture and reliability best practices.

Experience

Total Experience: 8 yrs 3 mos

Average Tenure: 2 yrs

Current Experience: 1 mo

Adobe

Software Engineer - SRE 4

Apr 2026 – Present · 1 mo · Bengaluru, Karnataka, India · Hybrid

Engineering · Site Reliability Engineering

Juniper Networks

Software Engineer - SRE 4

Mar 2024 – Apr 2026 · 2 yrs 1 mo · Bengaluru, Karnataka, India · Hybrid

Managed 50+ production Kubernetes clusters across AWS and GCP, ensuring high availability and optimal performance.
Worked with Kubernetes operators and controllers such as Prometheus, Karpenter, and External Secrets.
Developed end-to-end monitoring solutions using Prometheus and OpenTelemetry, improving system reliability. Ensured the designs addressed best practices, retention, and cost optimization.
Owned the application release cycle for 50+ production clusters. Participated in 4 new cloud infrastructure and platform provisioning efforts.
Implemented an internal GPT for AIOps, streamlining operational processes and improving efficiency.
Handled on-call responsibilities for business-critical applications and the incident management process.
Mentored junior engineers, fostering a culture of continuous improvement.
Participated in the interview process, technically evaluating candidates across levels.

Observability · Kubernetes

VMware

2 roles

Software Engineer - SRE 3

Feb 2023 – Mar 2024 · 1 yr 1 mo

Managed and deployed observability components for logging, metrics, and alerting across 20+ Kubernetes clusters for EDR microservices on AWS.
Designed repository structures from scratch for wavefront-proxy, fluentd, and the Kubernetes metric collector to improve CI/CD efficiency.
Migrated from traditional Kubernetes manifests to Helm charts for better release management as part of CI/CD process enhancements.
Managed releases of new versions of observability components using Helm.
Troubleshot Kubernetes and application failures, and contributed to multiple COGS initiatives by implementing a filtering framework to reduce ingestion of unused metrics and logs.
Handled on-call and incident management responsibilities.
Mentored junior engineers.

LMA · Observability

Software Engineer - SRE 2

Oct 2020 – Feb 2023 · 2 yrs 4 mos

Automated and maintained production infrastructure on AWS, enhancing system reliability and performance.
Led the end-to-end migration of logging and monitoring tools, including Splunk to Logz.io and Grafana to Wavefront.
Participated in on-call rotations to address critical customer-facing infrastructure issues, ensuring minimal downtime.
Concurrently owned a fleet of 2,500+ EC2 instances across multiple AWS regions for Hosted EDR, managed using IaC and SaltStack.

Observability · Reliability

Calsoft

Member Of Technical Staff

Dec 2018 – Oct 2020 · 1 yr 10 mos · Pune, Maharashtra, India

Developed an automation framework for deploying clusters across AWS, GCP, and OpenStack, enhancing operational efficiency.
Leveraged Ansible, Python, and Terraform to streamline Kubernetes, OpenShift, Docker UCP, and Rancher deployments.
Resolved complex customer infrastructure issues, improving client satisfaction and reducing downtime.

Engineering · Automation

Mojo Networks, Inc.

2 roles

Member Of Technical Staff

Aug 2018 – Dec 2018 · 4 mos · Pune, Maharashtra, India

Automated testing of access point features using the Python-based Robot Framework, and automated UI testing using Selenium. Features covered interactions between the server, APs, and clients.
Automated features covering security (WIPS) and traditional AP/server functionality.

Engineering · Automation

Intern

Feb 2018 – Aug 2018 · 6 mos · Pune, Maharashtra, India

Set up and managed complex network infrastructure, including access points, clients (Linux, Windows, and Raspberry Pi), and sniffers (wired and wireless). Managed regression for multiple server builds.

Engineering