Sumit Chachadi — SRE (Site Reliability Engineer)
Senior Site Reliability Engineer with 8+ years of experience building and operating large-scale distributed systems at Airbnb and Cisco. Deep expertise in Python automation, observability infrastructure (Prometheus, Grafana, OpenTelemetry), and cloud-native platforms (Kubernetes, AWS). Currently building AIOps and LLM-powered operations platforms using Model Context Protocol (MCP), agentic AI workflows, and RAG-based automation to reduce operational toil and accelerate incident resolution. Proven incident commander with 50+ Sev-0/Sev-1 resolutions; experienced in defining SLOs/SLIs, managing error budgets, and driving high availability and fault tolerance across distributed services. Previously at Cisco, led networking automation and NetDevOps initiatives, drawing on a Master’s degree.
Stackforce AI infers this person is a Site Reliability Engineer specializing in AI-driven operations and cloud infrastructure.
Location: Bengaluru, Karnataka, India
Experience: 12 yrs 5 mos
Skills
- Site Reliability Engineering
- Cloud Infrastructure
- AI Operations
- Incident Management
- Observability
- Alert Management
- Frontend Development
- Data Engineering
- NetDevOps
- Network Automation
- Test Automation
- Software Development
Career Highlights
- Proven incident commander with 50+ Sev-0/Sev-1 resolutions.
- Architected AI-powered operations platforms reducing operational toil.
- Achieved 300% QoQ growth in user engagement.
Work Experience
Airbnb
Site Reliability Engineer (3 yrs 6 mos)
Cisco
Lead Engineer - NetDevOps (1 yr 3 mos)
Lead Automation Engineer (2 yrs 9 mos)
Software Engineer (2 yrs 9 mos)
University at Buffalo
Student Assistant (1 yr 7 mos)
IEEE-GIT
Chairperson (1 yr)
Webmaster (10 mos)
Education
Master’s Degree at University at Buffalo
Bachelor’s Degree at Gogte Institute of Technology
Associate's Degree at Shri Satya Sai Loka Seva P.U. College