Mansi Gupta

SRE (Site Reliability Engineer)

Bengaluru, Karnataka, India · 10 yrs 10 mos experience
Highly Stable · AI Enabled

Key Highlights

  • Expert in building resilient cloud infrastructures.
  • Proficient in SRE practices and DevOps methodologies.
  • Strong background in AI-driven automation solutions.
Stackforce AI infers this person is a Cloud Infrastructure and DevOps expert specializing in SaaS and large-scale systems.

Skills

Core Skills

DevOps, AWS, Azure, SRE, Automation, Linux Administration

Other Skills

Python, Kubernetes, Amazon Web Services (AWS), Prometheus.io, Large Scale Systems, Server Architecture, Artificial Intelligence (AI), SLI/SLO, error budgeting, observability, monitoring, scalability, containerization architectures, distributed systems infrastructure, Terraform

About

Senior Site Reliability & DevOps Engineer driving highly available, scalable, and optimized cloud and SaaS platforms. Experienced in microservices, Kubernetes, AI, automation (Ansible/Python), containerization, and observability across Linux, Unix, AWS, and Azure environments. Passionate about building resilient infrastructure, improving reliability through SLIs, SLOs, and error budgeting, and delivering operational excellence.

Experience

Total Experience: 10 yrs 10 mos
Average Tenure: 2 yrs
Current Experience: 6 mos

Apple

Senior Site Reliability Engineer

Nov 2025 – Present · 6 mos · Bengaluru · On-site

Python, Kubernetes, Amazon Web Services (AWS), Prometheus.io, Large Scale Systems, Server Architecture

Microsoft

Azure Site Reliability Engineer

Jun 2022 – Nov 2025 · 3 yrs 5 mos · Hyderabad · Hybrid

  • Driving SLI/SLO, error budgeting, observability, monitoring, and scalability efforts for the Compute platform.
  • Experienced in designing and deploying containerization architectures and distributed systems infrastructure.
  • Developing data analysis and performance profiling tools, and building infrastructure with Terraform and configuration management tools.
  • Designing resilient architectures aligned with Azure zone resilience goals.
  • Leading cost optimization initiatives across compute services.
  • Leveraging AI to build robust and intelligent DevOps solutions in Azure.
  • Working with both Windows and Linux operating systems, including kernel internals.
SLI/SLO, error budgeting, observability, monitoring, scalability, containerization architectures

Walmart Global Tech India

Site Reliability Engineer III

Mar 2021 – Jun 2022 · 1 yr 3 mos · Bengaluru, Karnataka, India · Remote

  • Working for the Walmart Catalog Platform, managing and optimizing an ecosystem of 600+ applications.
  • Driving cost optimization initiatives across large-scale cloud infrastructure.
  • Ensuring platform scalability, availability, and performance through proactive monitoring and automation.
  • Implementing CI/CD pipelines for faster, more reliable deployments.
  • Enhancing operational efficiency using automation tools like Ansible, Python, and Shell scripting.
  • Collaborating with cross-functional teams to identify and resolve production issues swiftly.
  • Leveraging observability tools for end-to-end visibility and continuous improvement in reliability metrics (SLI/SLO/Error Budgeting).
  • Contributing to architecture reviews, infrastructure upgrades, and capacity planning to support platform growth.
cost optimization, platform scalability, availability, performance monitoring, automation, CI/CD pipelines

VMware

Site Reliability Engineer II

Aug 2020 – Mar 2021 · 7 mos · Bengaluru, Karnataka, India · Remote

  • Developing and managing CI/CD pipelines to ensure seamless deployments of VMware Skyline, a SaaS product.
  • Defining and maintaining Error Budgets, SLIs, and SLOs to improve service reliability.
  • Reviewing infrastructure capacity and validating go-live prerequisites.
  • Automating workflows and configurations using Python and Ansible.
  • Troubleshooting and resolving critical production issues during on-call rotations.
  • Ensuring high availability of services through proactive monitoring using Wavefront, Grafana, Catchpoint, and Sentry.
CI/CD pipelines, Error Budgets, SLIs, SLOs, Python, Ansible

Adobe

Site Reliability Engineer

Nov 2017 – Aug 2020 · 2 yrs 9 mos · Noida · On-site

  • Architecting and deploying scalable AWS infrastructure.
  • Managing hundreds of Linux instances hosting Adobe Experience Manager (AEM), Adobe’s enterprise Java-based CMS platform.
  • Handling daily server administration tasks and incident management through ServiceNow (ITSM).
  • Administering and maintaining a multi-tenant CI/CD and Docker platform while providing best-practice guidance to customers.
  • Automating recurring operational tasks using Ansible and Shell scripting.
  • Implementing AWS infrastructure cost optimization and performance enhancement strategies.
  • Monitoring infrastructure and applications via AWS CloudWatch, Nagios, New Relic, Splunk, and custom monitoring scripts.
AWS infrastructure, Linux administration, CI/CD, Docker, Ansible, Shell scripting

DXC Technology

Linux System Engineer

Jul 2015 – Nov 2017 · 2 yrs 4 mos · Noida · On-site

  • Handling user and process management across multiple Linux and Unix environments.
  • Managing basic and advanced partitions, SELinux, ACLs, NFS, SAMBA, Apache, DNS, LVMs, and multipathing.
  • Experienced in kernel initialization, GRUB boot loader configuration, and system security hardening.
  • Automating configuration and deployments using Ansible.
  • Writing efficient Shell and Perl scripts to streamline system operations.
  • Skilled in compiling and maintaining utilities such as OpenSSH, Sudo, Perl, Rsync, Syslog-ng, and Lsof on AIX, RHEL, Ubuntu, SUSE, and Solaris platforms.
  • Providing 24/7 production support to clients with a focus on system stability, performance, and reliability.
User Management, Process Management, Linux, Unix, SELinux, NFS

Education

Krishna Institute of Engineering & Technology

Bachelor of Technology - BTech

Jul 2011 – Jun 2015
