Sreekanth Warrier

DevOps Engineer

Bengaluru, Karnataka, India · 11 yrs 9 mos experience

AI ML Practitioner · AI Enabled

Key Highlights

Over a decade of experience in platform and reliability engineering.
Led organization-wide observability initiatives at Zepto.
Expert in cloud automation and multi-cloud reliability.



Skills

Core Skills

Observability · Platform Engineering · Site Reliability Engineering · Cloud Automation · Incident Management · Developer Experience · DevOps · Automation · System Administration

Other Skills

AI Solutions · AWS · AWS Elastic Beanstalk · Agentic AI Solutions · Amazon ECS · Amazon Web Services (AWS) · Ansible · Application Deployment · Argo CD · ArgoCD · Azure DevOps · Azure Kubernetes Service (AKS) · Bash · Bash scripting · Chef

About

I am a platform and reliability engineering leader with over a decade of experience building high-performance, scalable, and observable systems across cloud-native environments. My work has consistently focused on improving service reliability, streamlining developer workflows, and driving automation at scale.

At Zepto, I lead the Observability vertical within the Infra Platform team, where I drive organization-wide visibility, operational maturity, and platform resilience. I mentor a talented team of engineers, own the end-to-end observability strategy, and collaborate across product and engineering groups to deliver a unified, efficient, and cost-optimized telemetry ecosystem.

Previously at InMobi, I played a key role in platform engineering and cloud automation: migrating services to GCP, enhancing Kubernetes deployment workflows, and strengthening multi-cloud reliability. My contributions in incident management, infrastructure automation, and cloud observability significantly improved service continuity and reduced operational overhead.

Across roles, my mission has remained consistent: build reliable, observable, and scalable platforms that empower engineering teams, reduce toil, and drive business velocity. I am always open to conversations about SRE, observability, platform engineering, cloud architecture, and AI-driven operations.

Experience

Total Experience: 11 yrs 9 mos

Average Tenure: 2 yrs 7 mos

Current Experience: 1 yr 3 mos

Zepto

Lead Engineer

Feb 2025 – Present · 1 yr 3 mos · Bengaluru, Karnataka, India · On-site

Currently leading the Observability vertical within the Zepto Infra Platform team, driving platform reliability and visibility across the entire organization. Managing a growing team of engineers focused on building scalable and efficient observability solutions.
Key Responsibilities:
1) Team leadership with a focus on people development, mentoring, and performance management.
2) Driving sprint planning, retrospectives, and quarterly road-mapping for the Observability track.
3) Cross-functional collaboration with application and product teams to define and deliver platform capabilities.
4) Ownership of end-to-end observability strategy, ensuring high availability and performance across environments.
5) Budget and resource allocation, optimizing TCO for observability and telemetry stacks.
Strategic Initiatives & Projects:
1) Centralized Observability Stack: Consolidated fragmented telemetry systems into a unified LGTM-based observability platform.
2) Application Profiling at Scale: Adopted Grafana Pyroscope to identify and reduce performance bottlenecks and drive CPU/memory optimization.
3) In-House APM: Rolled out OpenTelemetry auto-instrumentation across microservices, reducing reliance on third-party agents.
4) SLI/SLO Framework: Implemented org-wide service-level indicators/objectives to track and improve reliability.
5) TCO Analysis: Established cost observability for every observability stack to ensure sustainable growth and transparency.
6) Alert as Code: Standardized alerting configurations using GitOps principles, enabling consistent, audit-friendly alert routing across AWS and Kubernetes workloads.
7) SRE Agent (MCP): Built an in-house agentic AI workflow combining Grafana MCP, New Relic MCP, LLMs, and Slack for incident debugging automation.

Observability · Team Leadership · Cross-functional Collaboration · Performance Management · Cloud Security · Platform Engineering

InMobi Advertising

3 roles

Lead Engineer

Mar 2024 – Feb 2025 · 11 mos · On-site

Key Responsibilities:
Platform Engineering: Developed and implemented innovative tools to streamline application team workflows and improve efficiency.
Team leadership with a focus on people development, mentoring, and performance management.
Driving sprint planning, retrospectives, and quarterly road-mapping.
Strategic Initiatives & Projects:
Error Tracking: Established a self-hosted Sentry instance to monitor and track errors, enabling proactive issue resolution.
AI Self-Service: Created an AI bot to provide self-service support for common use cases, reducing support ticket volume and improving user experience.
Developer Experience Enhancement: Integrated Jira products to track DORA metrics and implement SDLC checks, empowering developers to improve their productivity and deliver high-quality code.
Helmify: Centralized commonly used templates as Helm charts hosted in ChartMuseum, streamlining chart management and improving deployment efficiency.
Ingress Cost Reduction: Achieved significant cost savings by transitioning from L4 (Contour) to L7 (GCLB) load balancers, optimizing ingress performance and reducing expenses.

Platform Engineering · Team Leadership · AI Solutions · Error Tracking · Developer Experience · Cloud Automation

Site Reliability Engineer (SDE 3)

Promoted

Mar 2022 – Mar 2024 · 2 yrs · On-site

Key Responsibilities:
Platform Engineering: Developed and implemented innovative tools to streamline application team workflows and improve efficiency.
Automation: Assessed and automated recurring tasks to reduce manual effort and minimize errors.
Cloud Troubleshooting: Troubleshot and resolved complex issues across Azure, GCP, and AWS cloud environments.
Incident Management: Managed incidents effectively across multiple cloud platforms to ensure service continuity.
Programming Proficiency: Strong in Python and Bash for scripting and automation.
Strategic Initiatives & Projects:
GCP Infrastructure Migration: Led a comprehensive project to migrate the Glance Business Unit's services and data to GCP, optimizing infrastructure and improving performance.
Kubernetes Deployment Automation: Implemented Argo CD to automate Kubernetes deployments, ensuring consistency and reliability.
Canary Deployment Strategy: Designed and implemented a deployment strategy for a critical application using canary deployments, combined with continuous verification, to ensure gradual rollouts and minimize risks during updates.
IaC Tool Development: Created a platform tool to facilitate Infrastructure as Code (IaC) deployments using the GitOps methodology, promoting version control and collaboration.
Secure Access Management: Set up Teleport to establish secure system login and access management practices.
Observability Stack Management: Deployed an observability stack (Grafana, Prometheus/Thanos, Loki, Tempo) to gain deep insights into application performance and identify potential issues.
Centralized Observability: Implemented OpenTelemetry (OTel) as a centralized agent for the observability stack, improving data collection and analysis.

Cloud Troubleshooting · Automation · Incident Management · Python · Bash · Site Reliability Engineering

Site Reliability Engineer (SDE 2)

Jul 2020 – Mar 2022 · 1 yr 8 mos · On-site

Key Responsibilities:
Platform Engineering: Developed and implemented innovative tools to streamline application team workflows and improve efficiency.
Automation: Assessed and automated recurring tasks to reduce manual effort and minimize errors.
Cloud Troubleshooting: Troubleshot and resolved complex issues across Azure, GCP, and AWS cloud environments.
Programming Proficiency: Strong in Python and Bash for scripting and automation.

Cloud Troubleshooting · Automation · Python · Bash · Site Reliability Engineering · Cloud Automation

Bounce

Senior DevOps Engineer

Dec 2019 – Jun 2020 · 6 mos · Bangalore

Automated daily recurring tasks to enhance efficiency.
Dockerized applications and migrated them to the ECS environment.
Implemented security measures at both the application and infrastructure levels.
Set up Jenkins pipelines for production application deployments.
Created ECS environments from scratch, Dockerized applications, and completed the migration.
Deployed and configured an ELK stack for centralized logging and monitoring.

Automation · Docker · Jenkins · DevOps

Ola (ANI Technologies Pvt. Ltd.)

DevOps Engineer

Mar 2018 – Dec 2019 · 1 yr 9 mos · Bangalore

Key Responsibilities:
Collaborated with the development team on deployments and troubleshooting.
Automated daily tasks to improve operational efficiency.
Integrated new technologies into the existing tech stack.
Developed automation scripts in Python and wrote Terraform code for infrastructure setup.
Managed CI/CD workflows using Jenkins.
Projects Completed:
Built infrastructure from the ground up in the Azure environment.
Developed cost-effective infrastructure setups for another Business Unit on both AWS and Azure.
Designed and implemented an architecture-level performance testing environment to validate infrastructure scalability.

Automation · Python · Terraform · DevOps · Cloud Automation

HostDime India

3 roles

Senior System Engineer | DevOps Operations

Promoted

May 2016 – Mar 2018 · 1 yr 10 mos · Trivandrum

Troubleshooting Level 3 Linux system and kernel issues
Troubleshooting Level 3 web server issues (Apache, Nginx)
Handling P0/P1 issues such as server crashes, hacks, attacks, spamming, and phishing
Implementing and administering internal systems
Application deployments and system integration tasks (mainly open source)
Performing server upgrades, patches, and security fixes
Securely migrating data between data centers using automated scripts
Automating server tasks with Bash and Python scripts
AWS cloud operations
VM deployments and application status check validations