Sudeep Gupta — SRE (Site Reliability Engineer)
I’m a Staff-level Site Reliability and Platform Engineer with over a decade of experience designing and scaling reliable cloud-native infrastructure and internal platforms that support large-scale distributed systems, analytics, and AI workloads in enterprise environments.
My work sits at the intersection of reliability engineering, platform architecture, and developer productivity - helping engineering teams ship faster, operate safer, and scale systems with confidence while maintaining strong operational and cost discipline.
Over the years, I’ve led initiatives to:
- Design and scale internal developer platforms built on Kubernetes, Terraform, and GitOps, enabling standardized and high-velocity deployments across large engineering organizations
- Automate observability, incident response, and reliability workflows, improving system resilience and reducing operational toil in distributed environments
- Build and operate ML and data infrastructure platforms (Airflow, Databricks, Spark, GCP) supporting training, inference, and large-scale data processing workloads
- Modernize infrastructure architecture to improve performance, optimize resource utilization, and reduce cloud cost through automation and spot-instance orchestration
I enjoy driving end-to-end platform outcomes - from architecture and automation to cross-team enablement and reliability strategy. My approach to SRE is systems-oriented and I gravitate toward building foundational platforms and frameworks that simplify complexity and create leverage for large engineering teams through design, automation, and scalable abstractions rather than reactive operations. I’ve worked across B2C, SaaS, AI, and Financial Analytics domains in high-scale, high-stakes environments where platform reliability, developer velocity, and cost efficiency are critical. My work has driven faster deployment cycles, multi-region reliability improvements, and significant infrastructure cost optimizations at enterprise scale.
🔹 Core Skills: Site Reliability Engineering (SRE), Platform Engineering, Cloud Infrastructure (AWS/GCP), Kubernetes, Go, Terraform, GitOps, Observability (Prometheus, Grafana, OpenTelemetry), CI/CD, Distributed Systems, Databricks, Airflow, Spark, Python
🔹 Focus Areas: Platform Reliability • Internal Developer Platforms • ML/AI Infrastructure • Developer Experience • Cost Optimization • Automation Strategy
Stackforce AI infers this person is a SaaS and Fintech expert with a strong focus on cloud infrastructure and data engineering.
Location: New Delhi, Delhi, India
Experience: 13 yrs 6 mos
Skills
- Site Reliability Engineering
- Platform Engineering
- Cloud Infrastructure
- Data Engineering
Career Highlights
- Over a decade of experience in SRE and Platform Engineering.
- Led initiatives to optimize cloud infrastructure costs significantly.
- Expert in building scalable, reliable cloud-native platforms.
Work Experience
Avalara
Lead Site Reliability Engineer (2 yrs)
FARFETCH
Senior Infrastructure Engineer (3 yrs 11 mos)
Infrastructure Engineer (2 yrs)
BlackRock
Associate (2 yrs 9 mos)
Fractal Analytics
Senior Data Engineer (9 mos)
Data Engineer (9 mos)
stealth mode start-up
Data Scientist and Technical Lead (1 yr)
InnovAccer
SDE (5 mos)
IIIT Delhi
Research Associate (3 mos)
Graduate Teaching Assistant (1 yr 8 mos)
Education
Master's Degree at Indraprastha Institute of Information Technology, Delhi
B.Tech at Guru Gobind Singh Indraprastha University
at Montfort Senior Secondary School, Ashok Vihar, Delhi