Sreekanth Warrier

DevOps Engineer

Bengaluru, Karnataka, India · 11 yrs 9 mos experience
AI/ML Practitioner · AI Enabled

Key Highlights

  • Over a decade of experience in platform and reliability engineering.
  • Led organization-wide observability initiatives at Zepto.
  • Expert in cloud automation and multi-cloud reliability.

Skills

Core Skills

Observability · Platform Engineering · Site Reliability Engineering · Cloud Automation · Incident Management · Developer Experience · DevOps · Automation · System Administration

Other Skills

AI Solutions · AWS · AWS Elastic Beanstalk · Agentic AI Solutions · Amazon ECS · Amazon Web Services (AWS) · Ansible · Application Deployment · Argo CD · ArgoCD · Azure DevOps · Azure Kubernetes Service (AKS) · Bash · Bash scripting · Chef

About

I am a platform and reliability engineering leader with over a decade of experience building high-performance, scalable, and observable systems across cloud-native environments. My work has consistently focused on improving service reliability, streamlining developer workflows, and driving automation at scale.

At Zepto, I lead the Observability vertical within the Infra Platform team, where I drive organization-wide visibility, operational maturity, and platform resilience. I mentor a talented team of engineers, own the end-to-end observability strategy, and collaborate across product and engineering groups to deliver a unified, efficient, and cost-optimized telemetry ecosystem.

Previously at InMobi, I played a key role in platform engineering and cloud automation: migrating services to GCP, enhancing Kubernetes deployment workflows, and strengthening multi-cloud reliability. My contributions in incident management, infrastructure automation, and cloud observability significantly improved service continuity and reduced operational overhead.

Across roles, my mission has remained consistent: build reliable, observable, and scalable platforms that empower engineering teams, reduce toil, and drive business velocity. I am always open to conversations around SRE, observability, platform engineering, cloud architecture, and AI-driven operations.

Experience

Total Experience: 11 yrs 9 mos
Average Tenure: 2 yrs 7 mos
Current Experience: 1 yr 3 mos

Zepto

Lead Engineer

Feb 2025 - Present · 1 yr 3 mos · Bengaluru, Karnataka, India · On-site

  • Currently leading the Observability vertical within the Zepto Infra Platform team, driving platform reliability and visibility across the entire organization. Managing a growing team of engineers focused on building scalable and efficient observability solutions.
  • Key Responsibilities:
  • 1) Team leadership with a focus on people development, mentoring, and performance management.
  • 2) Driving sprint planning, retrospectives, and quarterly road-mapping for the Observability track.
  • 3) Cross-functional collaboration with application and product teams to define and deliver platform capabilities.
  • 4) Ownership of end-to-end observability strategy, ensuring high availability and performance across environments.
  • 5) Budget and resource allocation, optimizing TCO for observability and telemetry stacks.
  • Strategic Initiatives & Projects:
  • 1) Centralized Observability Stack: Consolidated fragmented telemetry systems into a unified LGTM-based observability platform.
  • 2) Application Profiling at Scale: Used Grafana Pyroscope to identify performance bottlenecks and drive CPU/memory optimization.
  • 3) In-House APM: Rolled out OpenTelemetry auto-instrumentation across microservices, reducing reliance on third-party agents.
  • 4) SLI/SLO Framework: Implemented org-wide service-level indicators/objectives to track and improve reliability.
  • 5) TCO Analysis: Established cost observability for every observability stack to ensure sustainable growth and transparency.
  • 6) Alert as Code: Standardized alerting configurations using GitOps principles, enabling consistent, audit-friendly alert routing across AWS and Kubernetes workloads.
  • 7) SRE Agent (MCP): Built an in-house agentic AI workflow combining Grafana MCP, New Relic MCP, LLMs, and Slack for incident debugging automation.
Observability · Team Leadership · Cross-functional Collaboration · Performance Management · Cloud Security · Platform Engineering

InMobi Advertising

3 roles

Lead Engineer

Mar 2024 - Feb 2025 · 11 mos · On-site

  • Key Responsibilities:
  • Platform Engineering: Developed and implemented innovative tools to streamline application team workflows and improve efficiency.
  • Team leadership with a focus on people development, mentoring, and performance management.
  • Driving sprint planning, retrospectives, and quarterly road-mapping.
  • Strategic Initiatives & Projects:
  • Error Tracking: Established a self-hosted Sentry instance to monitor and track errors, enabling proactive issue resolution.
  • AI Self-Service: Created an AI bot to provide self-service support for common use cases, reducing support ticket volume and improving user experience.
  • Developer Experience Enhancement: Integrated Jira products to track DORA metrics and implement SDLC checks, empowering developers to improve their productivity and deliver high-quality code.
  • Helmify: Centralized commonly used templates as Helm charts hosted in ChartMuseum, streamlining chart management and enhancing deployment efficiency.
  • Ingress Cost Reduction: Achieved significant cost savings by transitioning from L4 (Contour) to L7 (GCLB) load balancers, optimizing ingress performance and reducing expenses.
Platform Engineering · Team Leadership · AI Solutions · Error Tracking · Developer Experience · Cloud Automation

Site Reliability Engineer (SDE 3)

Promoted

Mar 2022 - Mar 2024 · 2 yrs · On-site

  • Key Responsibilities:
  • Platform Engineering: Developed and implemented innovative tools to streamline application team workflows and improve efficiency.
  • Automation: Assessed and automated recurring tasks to reduce manual effort and minimize errors.
  • Cloud Troubleshooting: Troubleshot and resolved complex issues across Azure, GCP, and AWS cloud environments.
  • Incident Management: Managed incidents effectively across multiple cloud platforms to ensure service continuity.
  • Programming Proficiency: Strong in Python and Bash for scripting and automation.
  • Strategic Initiatives & Projects:
  • GCP Infrastructure Migration: Led a comprehensive project to migrate the Glance Business Unit's services and data to GCP, optimizing infrastructure and improving performance.
  • Kubernetes Deployment Automation: Implemented Argo CD to automate Kubernetes deployments, ensuring consistency and reliability.
  • Canary Deployment Strategy: Designed and implemented a deployment strategy for a critical application using canary deployments, combined with continuous verification, to ensure gradual rollouts and minimize risks during updates.
  • IaC Tool Development: Created a platform tool to facilitate Infrastructure as Code (IaC) deployments using the GitOps methodology, promoting version control and collaboration.
  • Secure Access Management: Set up Teleport to establish secure system login and access management practices.
  • Observability Stack Management: Deployed an observability stack (Grafana, Prometheus/Thanos, Loki, Tempo) to gain deep insights into application performance and identify potential issues.
  • Centralized Observability: Implemented OpenTelemetry (OTel) as a centralized agent for the observability stack, improving data collection and analysis.
Cloud Troubleshooting · Automation · Incident Management · Python · Bash · Site Reliability Engineering

Site Reliability Engineer (SDE 2)

Jul 2020 - Mar 2022 · 1 yr 8 mos · On-site

  • Key Responsibilities:
  • Platform Engineering: Developed and implemented innovative tools to streamline application team workflows and improve efficiency.
  • Automation: Assessed and automated recurring tasks to reduce manual effort and minimize errors.
  • Cloud Troubleshooting: Troubleshot and resolved complex issues across Azure, GCP, and AWS cloud environments.
  • Programming Proficiency: Strong in Python and Bash for scripting and automation.
Cloud Troubleshooting · Automation · Python · Bash · Site Reliability Engineering · Cloud Automation

Bounce

Senior DevOps Engineer

Dec 2019 - Jun 2020 · 6 mos · Bangalore

  • Automated daily recurring tasks to enhance efficiency.
  • Dockerized applications and migrated them to the ECS environment.
  • Implemented security measures at both the application and infrastructure levels.
  • Set up Jenkins pipelines for production application deployments.
  • Created ECS environments from scratch, Dockerized applications, and completed the migration.
  • Deployed and configured an ELK stack for centralized logging and monitoring.
Automation · Docker · Jenkins · DevOps

Ola (ANI Technologies Pvt. Ltd.)

DevOps Engineer

Mar 2018 - Dec 2019 · 1 yr 9 mos · Bangalore

  • Key Responsibilities:
  • Collaborated with the development team on deployments and troubleshooting.
  • Automated daily tasks to improve operational efficiency.
  • Integrated new technologies into the existing tech stack.
  • Developed automation scripts in Python and wrote Terraform code for infrastructure setup.
  • Managed CI/CD workflows using Jenkins.
  • Projects Completed:
  • Built infrastructure from the ground up in the Azure environment.
  • Developed cost-effective infrastructure setups for another Business Unit on both AWS and Azure.
  • Designed and implemented a performance-testing environment to validate infrastructure scalability.
Automation · Python · Terraform · DevOps · Cloud Automation

HostDime India

3 roles

Senior System Engineer | DevOps Operations

Promoted

May 2016 - Mar 2018 · 1 yr 10 mos · Trivandrum

  • Troubleshooting Level 3 Linux system and kernel issues
  • Troubleshooting Level 3 web server issues (Apache, Nginx)
  • Handling P0/P1 issues such as server crashes, hacks, attacks, spamming, and phishing
  • Implementation and administration of internal systems
  • Application deployments and system integration tasks (mainly open source)
  • Performing server upgrades, patches, and security fixes
  • Secure data migration between data centers using automated scripts
  • Automating server tasks with Bash and Python scripts
  • AWS cloud operations
  • VM deployments and application status-check validations
Linux · Bash · AWS · DevOps · System Administration

Datacenter Operations Engineer

Promoted

Jun 2015 - Apr 2016 · 10 mos · Trivandrum

  • Server Build
  • Hardware Troubleshooting
  • OS Installations and Configurations
  • Data Migration from Servers
  • Disk Management
  • Network management with physical switches and routers
Server Build · Hardware Troubleshooting

Jr. System Engineer

Jun 2014 - Jun 2015 · 1 yr · Trivandrum

  • Responsible for the day-to-day monitoring of Linux and Windows servers.
  • Fixed web server issues (Apache, IIS) reported by clients.
  • Administration of internal systems.
  • Application deployments for internal use (mainly open source).
  • Tracked server abuse and took the necessary action to stop it.
  • Delivered assigned system integration tasks on time.
  • Performing server upgrades, patches, and security fixes.
  • Secure data migration between data centers.
  • Feasibility studies of new systems and their deployment.
  • Hands-on experience in building servers to customer requirements.
  • Hands-on experience in setting up rack servers and bringing them online.
Monitoring · Server Administration

Education

Cochin University of Science and Technology

Bachelor of Technology - BTech

Jan 2010 - Jan 2013
