Sanat Rohatgi

SRE (Site Reliability Engineer)

India · 10 yrs 1 mo experience

Most Likely To Switch

Key Highlights

Expert in driving reliability strategies across teams.
Proven track record in building scalable cloud infrastructure.
Strong background in automation and observability practices.

Stackforce AI infers this person is a SaaS-focused Site Reliability Engineer with expertise in cloud infrastructure and automation.

Contact

rohatgisanat@gmail.com LinkedIn

Skills

Core Skills

Site Reliability Engineering · DevOps · Cloud Computing

Other Skills

AWS · Ansible · Automation · Bash · C · C++ · CI/CD · Cloud-Native Applications · Continuous Delivery (CD) · Continuous Integration (CI) · Database Redesign · Docker · Engineering Change Management · English · GitOps

About

Focused on helping organizations evolve from constant firefighting to predictable, scalable reliability. The approach centers on reducing downtime, eliminating hidden operational costs, and building systems and practices that enable teams to innovate rather than react. Over time, this work has driven transformations that improved platform stability, accelerated delivery cycles, and fostered enduring reliability cultures across teams and business units. By integrating people, process, and technology, it delivers measurable impact and sustainable outcomes.

Key Deliverables
• Stronger Reliability Standards – Building stable systems that maintain delivery velocity.
• Smoother Change Adoption – Embedding reliability practices that persist across teams.
• Clear Visibility – Enabling early detection of issues before customer impact.
• Connected Teams – Breaking silos to improve information flow and decision speed.
• Smarter Automation – Reducing manual effort to focus on higher-value engineering challenges.

Experience

10 yrs 1 mo

Total Experience

2 yrs

Average Tenure

2 yrs 6 mos

Current Experience

Graviton Research Capital LLP

Lead Site Reliability Engineer

Nov 2023 – Present · 2 yrs 6 mos

Driving reliability strategy across multiple pillars, closing systemic ops gaps, and aligning practices under one umbrella.
Directed cross-team observability to improve SLA/SLO adherence and cut alert fatigue.
Built self-service automation and infra pipelines to support an observability-first approach.
Unified engineering–product–ops workflows for sustainable, proactive reliability.

Site Reliability Engineering · Engineering Change Management · DevOps · Software Observability

AccelByte

SRE

Jan 2023 – Oct 2023 · 9 mos · India · Remote

SRE – Global Platform Team (AccelByte Engagement)
Supported a 4-person global SRE team building the internal platform and scaling AccelByte’s Multiplayer PaaS offering
Led centralized observability: implemented and optimized full stack (metrics, alerting, cardinality & cost review)
Built and maintained AWS infrastructure (EKS, EC2, Autoscaling, VPC, CloudWatch, S3, SES, SNS) using Terraform for scalability and repeatability
Supported pre-production readiness of Multiplayer Service ahead of launch
Handled KTLO operations: ensured reliability, availability, and smooth day-to-day functioning of core systems
Collaborated with multi-regional teams to design, review, and deliver new platform features

AWS · Terraform · Observability · Kubernetes · Site Reliability Engineering · Cloud Computing

Amagi Corporation

2 roles

Staff Site Reliability Engineer

Promoted

Apr 2022 – Dec 2022 · 8 mos

Drove cohesion, reliability, and scalability of the Core Platform as the first SRE hire
Designed and documented the Internal Development Platform (PaaS) in collaboration with Architects and Product Owners
Reduced toil with automation: built API/UI framework to provision infra, observability, and security at scale; cut provisioning time significantly
Built central Observability + Automation Control Plane across multi-cluster environments
Improved release velocity by implementing GitOps workflows with ArgoCD
Designed and maintained CI/CD pipelines; introduced PR-based preview environments for faster testing
Reduced Sprint-0 setup to 1 day with a best-practice microservice template (config testing, static analysis, repo setup, Docker/K8s artifacts)
Enhanced DevEx with APIs for common use cases, better environment visibility, and documentation
Managed product backlog in Jira and coordinated with stakeholders across BUs to align platform adoption and practices

Automation · GitOps · CI/CD · Observability · Site Reliability Engineering · DevOps

Senior Site Reliability Engineer

Apr 2021 – Sep 2022 · 1 yr 5 mos

FarEye - Enabling Digital Logistics

2 roles

Senior DevOps Engineer

Promoted

Mar 2020 – Mar 2021 · 1 yr

First DevOps hire; built the charter and roadmap while embedding DevOps/SRE culture in partnership with VP & Principal Architect
Migrated microservices to Kubernetes, boosting development velocity and scalability
Established platform-wide observability with Prometheus, Grafana, and Loki
Architected high-availability, cost-optimized infrastructure on GKE & AWS with on-demand automation

Kubernetes · Prometheus · Grafana · DevOps · Site Reliability Engineering

SDE

Mar 2019 – Mar 2020 · 1 yr

BlackRock

2 roles

Analyst

Aug 2016 – Mar 2019 · 2 yrs 7 mos

Site Reliability Engineer (SRE) – Tooling & Infrastructure Automation
Mission-driven member of the SRE team focused on building scalable tools to improve and automate system operations.
Subject Matter Expert (SME) for an enterprise software platform providing end-to-end infrastructure inventory (hardware & software).
Led onboarding of new infrastructure inventory, including UI development, backend enhancements, and complete database redesign to support scale and extensibility.
Enhanced data collection and correlation pipelines by integrating multiple sources (file systems, third-party APIs), enabling unified data exposure through robust APIs.

Infrastructure Automation · Database Redesign · Site Reliability Engineering