Rahul Sharma

SRE (Site Reliability Engineer)

Bengaluru, Karnataka, India · 7 yrs 1 mo experience

Most Likely To Switch · AI ML Practitioner

Key Highlights

Expert in building reliable Quantum-as-a-Service platforms.
Led global reliability strategy for IBM Quantum.
Specialized in high availability and low latency systems.

Stackforce AI infers this person is a leader in Quantum Computing and Site Reliability Engineering.

Skills

Core Skills

Site Reliability Engineering (SRE), Reliability Strategy, SaaS Service Reliability, DevOps, Cloud Engineering

Other Skills

AIOps, ALM, Agile Methodologies, Amazon Web Services (AWS), Architectural Design, Azure DevOps, Back-End Web Development, CI/CD Automation, CI/CD Pipelines, Chatbots, Cloud Migration, Computer Science, Containerization, Continuous Integration and Continuous Delivery (CI/CD), Critical Thinking

About

I’m a global Reliability Engineering (SRE) Team Lead currently driving the reliability strategy, architecture and operational excellence of the IBM Quantum Platform worldwide, building the foundational systems that enable reliable Quantum-as-a-Service platforms at planetary scale. With more than 7 years across IBM Research, VMware, SITA, NVIDIA and Infosys, I specialise in designing, operating and scaling distributed systems that must deliver high availability, low latency and zero-surprise operations.

My expertise spans cloud infrastructure, fault-tolerant architectures, SLO/SLI engineering, observability, incident automation, and AIOps-powered operational intelligence. My work bridges the gap between research innovation and production-grade reliability, ensuring that tomorrow’s quantum breakthroughs are usable, scalable and enterprise-ready today.

I am passionate about cloud-native reliability, distributed systems design, operational maturity, reliability culture, chaos engineering, and the fusion of classical and quantum compute infrastructure. I am always open to connecting with leaders, researchers, and teams building the future of scalable computing, platform reliability and next-generation quantum systems.

Book a one-on-one with me on Topmate: https://topmate.io/rahul_sharma_nit

Experience

IBM

Global Lead – Reliability Engineering (SRE) | IBM Quantum

Nov 2023 – Present · 2 yrs 4 mos · Bengaluru, Karnataka, India · On-site

■ Lead the global reliability strategy and team for the IBM Quantum platform.
■ Own reliability roadmap, SLO/SLI architecture and operational health for quantum software backends globally.
■ Designed incident automation through ChatOps, AIOps and system intelligence.
■ Embedded reliability governance into research & engineering workflows, enabling scalable quantum workloads.
■ Built reliability frameworks, tooling & visibility pipelines for multi-region quantum compute systems.
■ Mentor engineering teams, drive readiness reviews, and lead architectural decisions for distributed quantum infrastructure.

Site Reliability Engineering (SRE), Reliability Strategy, SLO/SLI Architecture, Incident Automation, AIOps, Quantum Computing

VMware

Member of Technical Staff 2 | SRE

Jan 2023 – Oct 2023 · 9 mos · Bengaluru, Karnataka, India

Contributed to High-Performance Computing (HPC) and large-scale distributed systems. Within the VMware Command Center SRE team, I led the Reliability & Automation vertical, architecting tooling and frameworks that improved SaaS service reliability, observability, incident response, and cross-stack fault tolerance across multiple business units and heterogeneous service architectures.
Designed and implemented monitoring systems, SLO/SLI models, automated workflows, reliability tooling, and developer-platform guardrails to ensure high availability, operational continuity, and performance consistency across VMware’s global SaaS platform footprint.

High-Performance Computing (HPC), SaaS Service Reliability, Observability, Incident Response, Fault Tolerance, Site Reliability Engineering (SRE)

SITA

Associate Specialist

Nov 2021 – Jan 2023 · 1 yr 2 mos

SRE & DevOps Engineer supporting the SITA–NVIDIA joint engineering program from the NVIDIA Bangalore Center of Excellence. Focused on reliability engineering, CI/CD automation, cloud migration, and distributed systems.
Automated OS security patching pipelines, improving security posture and reducing manual toil.
Led post-incident reviews, RCAs, and created a standardized “After-Incident Report” framework for org-wide learning.
Supported on-call operations: incident triage, communication, mitigation, and customer-impact reduction during major outages.
Contributed to reliability roadmap definition, cloud-native architecture, CI/CD improvements, and containerisation initiatives (Docker/Kubernetes).
Collaborated with SITA and NVIDIA teams on large-scale data center operations, automation, and reliability engineering.

CI/CD Automation, Cloud Migration, Distributed Systems, Incident Triage, Post-Incident Reviews, Site Reliability Engineering (SRE)

Infosys

Technology Analyst | SRE

Dec 2018 – Oct 2021 · 2 yrs 10 mos · Chandigarh Area, India

Worked at the NVIDIA–Infosys Center of Excellence, owning end-to-end delivery of critical product subcomponents—including requirements, architecture, design, implementation, testing, and ongoing maintenance. Designed and shipped new features with a strong focus on performance, security, scalability, and maintainability.
Drove large-scale DevOps, Cloud, and Containerization adoption across teams, leading the migration and optimization of production environments. Built and automated CI/CD pipelines and delivery workflows using Jenkins, Bitbucket, Maven, Gradle, SonarQube, Selenium, JUnit, and modern Git-based workflows.
Implemented and managed cloud-native infrastructure leveraging Kubernetes, Docker, OpenShift, AWS, and virtualization technologies. Streamlined development and deployment processes through strategic CI/CD scripting, environment automation, monitoring, and best-practice governance.
Highly passionate about DevOps, Cloud Engineering, AI-driven automation, distributed systems, and container orchestration, continuously bringing imagination, innovation, and reliability-thinking into engineering workflows.

DevOps, Cloud Engineering, Containerization, CI/CD Pipelines, Monitoring