Sanat Rohatgi

SRE (Site Reliability Engineer)

India · 9 yrs 11 mos experience
Most Likely To Switch

Key Highlights

  • Expert in driving reliability strategies across teams.
  • Proven track record in building scalable cloud infrastructure.
  • Strong background in automation and observability practices.
Stackforce AI infers this person is a SaaS-focused Site Reliability Engineer with expertise in cloud infrastructure and automation.

Skills

Core Skills

Site Reliability Engineering · DevOps · Cloud Computing

Other Skills

AWS · Ansible · Automation · Bash · C · C++ · CI/CD · Cloud-Native Applications · Continuous Delivery (CD) · Continuous Integration (CI) · Database Redesign · Docker · Engineering Change Management · English · GitOps

About

Focused on helping organizations evolve from constant firefighting to predictable, scalable reliability. The approach centers on reducing downtime, eliminating hidden operational costs, and building systems and practices that enable teams to innovate rather than react. Over time, this work has driven transformations that improved platform stability, accelerated delivery cycles, and fostered enduring reliability cultures across teams and business units. By integrating people, process, and technology, it delivers measurable impact and sustainable outcomes.

Key Deliverables

  • Stronger Reliability Standards – Building stable systems that maintain delivery velocity.
  • Smoother Change Adoption – Embedding reliability practices that persist across teams.
  • Clear Visibility – Enabling early detection of issues before customer impact.
  • Connected Teams – Breaking silos to improve information flow and decision speed.
  • Smarter Automation – Reducing manual effort to focus on higher-value engineering challenges.

Experience

Graviton Research Capital LLP

Lead Site Reliability Engineer

Nov 2023 – Present · 2 yrs 4 mos

  • Driving reliability strategy across multiple pillars, closing systemic ops gaps, and aligning practices under one umbrella.
  • Directed cross-team observability to improve SLA/SLO adherence and cut alert fatigue.
  • Built self-service automation and infra pipelines for an observability-first approach.
  • Unified engineering–product–ops workflows for sustainable, proactive reliability.
Site Reliability Engineering · Engineering Change Management · DevOps · Software Observability

AccelByte

SRE

Jan 2023 – Oct 2023 · 9 mos · India · Remote

  • SRE, Global Platform Team (AccelByte Engagement)
  • Supported a 4-person global SRE team building the internal platform and scaling AccelByte’s Multiplayer PaaS offering
  • Led centralized observability: implemented and optimized full stack (metrics, alerting, cardinality & cost review)
  • Built and maintained AWS infrastructure (EKS, EC2, Autoscaling, VPC, CloudWatch, S3, SES, SNS) using Terraform for scalability and repeatability
  • Supported pre-production readiness of Multiplayer Service ahead of launch
  • Handled keep-the-lights-on (KTLO) operations: ensured reliability, availability, and smooth day-to-day functioning of core systems
  • Collaborated with multi-regional teams to design, review, and deliver new platform features
AWS · Terraform · Observability · Kubernetes · Site Reliability Engineering · Cloud Computing

Amagi Corporation

2 roles

Staff Site Reliability Engineer

Promoted

Apr 2022 – Dec 2022 · 8 mos

  • Drove cohesion, reliability, and scalability of the Core Platform as the first SRE hire
  • Designed and documented the Internal Development Platform (PaaS) in collaboration with Architects and Product Owners
  • Reduced toil with automation: built API/UI framework to provision infra, observability, and security at scale; cut provisioning time significantly
  • Built central Observability + Automation Control Plane across multi-cluster environments
  • Improved release velocity by implementing GitOps workflows with ArgoCD
  • Designed and maintained CI/CD pipelines; introduced PR-based preview environments for faster testing
  • Reduced Sprint-0 setup to 1 day with a best-practice microservice template (config testing, static analysis, repo setup, Docker/K8s artifacts)
  • Enhanced DevEx with APIs for common use cases, better environment visibility, and documentation
  • Managed product backlog in Jira and coordinated with stakeholders across BUs to align platform adoption and practices
Automation · GitOps · CI/CD · Observability · Site Reliability Engineering · DevOps

Senior Site Reliability Engineer

Apr 2021 – Sep 2022 · 1 yr 5 mos

FarEye - Enabling Digital Logistics

2 roles

Senior DevOps Engineer

Promoted

Mar 2020 – Mar 2021 · 1 yr

  • First DevOps hire; built the charter and roadmap while embedding DevOps/SRE culture in partnership with VP & Principal Architect
  • Migrated microservices to Kubernetes, boosting development velocity and scalability
  • Established platform-wide observability with Prometheus, Grafana, and Loki
  • Architected high-availability, cost-optimized infrastructure on GKE & AWS with on-demand automation
Kubernetes · Prometheus · Grafana · DevOps · Site Reliability Engineering

SDE

Mar 2019 – Mar 2020 · 1 yr

BlackRock

2 roles

Analyst

Aug 2016 – Mar 2019 · 2 yrs 7 mos

  • Site Reliability Engineer (SRE) – Tooling & Infrastructure Automation
  • Mission-driven member of the SRE team focused on building scalable tools to improve and automate system operations.
  • Subject Matter Expert (SME) for an enterprise software platform providing end-to-end infrastructure inventory (hardware & software).
  • Led onboarding of new infrastructure inventory, including UI development, backend enhancements, and complete database redesign to support scale and extensibility.
  • Enhanced data collection and correlation pipelines by integrating multiple sources (file systems, third-party APIs), enabling unified data exposure through robust APIs.
Infrastructure Automation · Database Redesign · Site Reliability Engineering

Intern

Jan 2016 – Aug 2016 · 7 mos

  • Revamped software that provides administrative functions on applications.
  • Revamped an in-house tool that provides on-time job scheduling capabilities to users.

Education

Vellore Institute of Technology

Bachelor of Technology (BTech) — Computer Engineering

Jan 2012 – Jan 2016
