Pranav Deshpande

SRE (Site Reliability Engineer)

Bengaluru, Karnataka, India · 8 yrs 3 mos experience
Highly Stable · AI Enabled

Key Highlights

  • Expert in managing large-scale Kubernetes clusters.
  • Implemented AIOps to streamline operations.
  • Proven mentor fostering engineering culture.
Stackforce AI infers this person is a Site Reliability Engineer specializing in cloud infrastructure and automation in SaaS environments.

Skills

Core Skills

Site Reliability Engineering · Kubernetes · Observability · Cloud Infrastructure · Automation · Reliability · Infrastructure Automation

Other Skills

Engineering · AWS · GCP · Prometheus · OpenTelemetry · AIOps · LMA · IaC · SaltStack · Ansible · Terraform · Google Cloud Platform (GCP) · Google Kubernetes Engine (GKE) · Artificial Intelligence (AI) · Infrastructure as code (IaC)

About

Site Reliability Engineer with extensive experience operating large-scale production Kubernetes clusters across AWS and GCP. Specialised in building reliable, scalable, and observable platforms using Prometheus, OpenTelemetry, Karpenter, and External Secrets. Strong expertise in incident response, on-call operations, and troubleshooting complex distributed systems. Led release management for multi-cloud, multi-cluster production setups. Implemented internal GPT-driven AIOps workflows to reduce operational toil. A proven mentor and technical interviewer who fosters a strong engineering culture and reliability best practices.

Experience

Total Experience: 8 yrs 3 mos
Average Tenure: 2 yrs
Current Experience: 1 mo

Adobe

Software Engineer - SRE 4

Apr 2026 - Present · 1 mo · Bengaluru, Karnataka, India · Hybrid

Engineering · Site Reliability Engineering

Juniper Networks

Software Engineer - SRE 4

Mar 2024 - Apr 2026 · 2 yrs 1 mo · Bengaluru, Karnataka, India · Hybrid

  • Managed 50+ production Kubernetes clusters across AWS and GCP, ensuring high availability and optimal performance.
  • Worked with Kubernetes Operators and Controllers such as Prometheus, Karpenter, and External Secrets.
  • Developed end-to-end monitoring solutions using Prometheus and OpenTelemetry, improving system reliability while keeping design best practices, retention, and cost optimization in view.
  • Owned the application release cycle for 50+ production clusters.
  • Participated in 4 new cloud infrastructure and platform provisioning efforts.
  • Implemented an internal GPT for AIOps, streamlining operational processes and improving efficiency.
  • Handled on-call responsibilities for business-critical applications and the incident management process.
  • Mentored junior engineers, fostering a culture of continuous improvement.
  • Participated in the interview process, technically evaluating candidates across levels.
Observability · Kubernetes

VMware

2 roles

Software Engineer - SRE 3

Feb 2023 - Mar 2024 · 1 yr 1 mo

  • Managed and deployed observability components for logging, metrics, and alerting across 20+ Kubernetes clusters for EDR microservices on AWS.
  • Designed repository structures from scratch for wavefront-proxy, fluentd, and the Kubernetes metric collector to improve CI/CD efficiency.
  • Migrated from traditional Kubernetes manifests to Helm charts for better release management as part of CI/CD process enhancements.
  • Managed releases of new versions of observability components using Helm.
  • Troubleshot Kubernetes and application failures, and contributed to multiple COGS initiatives by implementing a filtering framework to reduce ingestion of unused metrics and logs.
  • Handled on-call and incident management responsibilities.
  • Mentored junior engineers.
LMA · Observability

Software Engineer - SRE 2

Oct 2020 - Feb 2023 · 2 yrs 4 mos

  • Automated and maintained production infrastructure on AWS, enhancing system reliability and performance.
  • Led the end-to-end migration of logging and monitoring tools, including Splunk to Logz.io and Grafana to Wavefront.
  • Participated in on-call rotations to address critical customer-facing infrastructure issues, ensuring minimal downtime.
  • Held parallel ownership of a fleet of 2500+ EC2 instances across multiple AWS regions for Hosted EDR, using IaC and SaltStack.
Observability · Reliability

Calsoft

Member Of Technical Staff

Dec 2018 - Oct 2020 · 1 yr 10 mos · Pune, Maharashtra, India

  • Developed an automation framework for deploying clusters across AWS, GCP, and OpenStack, enhancing operational efficiency.
  • Leveraged Ansible, Python, and Terraform to streamline Kubernetes, OpenShift, Docker UCP, and Rancher deployments.
  • Resolved complex customer infrastructure issues, improving client satisfaction and reducing downtime.
Engineering · Automation

Mojo Networks, Inc.

2 roles

Member Of Technical Staff

Aug 2018 - Dec 2018 · 4 mos · Pune, Maharashtra, India

  • Automated access point feature testing using the Python-based Robot Framework, and UI testing using Selenium. Features covered interaction with the server, APs, and clients.
  • Automated features covering security (WIPS) and traditional AP/server functionality.
Engineering · Automation

Intern

Feb 2018 - Aug 2018 · 6 mos · Pune, Maharashtra, India

  • Set up and managed complex network infrastructure, including access points, clients (Linux, Windows, and Raspberry Pi), and sniffers (wired and wireless). Managed regression for multiple server builds.
Engineering

Education

Pune Institute of Computer Technology

Bachelor of Engineering (B.E.) — Computer Engineering

Jan 2015 - Jan 2018
