Avi Nagpal

SRE (Site Reliability Engineer)

Hyderabad, Telangana, India · 13 yrs 4 mos experience
Most Likely To Switch · Highly Stable

Key Highlights

  • Expert in building scalable, resilient systems.
  • Extensive experience with AWS and Kubernetes.
  • Proficient in automation using Shell and Python.
Stackforce AI infers this person is a Cloud Infrastructure Engineer with strong DevOps capabilities.

Skills

Core Skills

Site Reliability Engineering, Kubernetes, DevOps, AWS, Monitoring, Network Management, Automation, Reporting, Linux Administration, Web Development

Other Skills

Aerospike DB, Ajax, Alicloud, Ansible, Apache Tomcat, Continuous Integration, Continuous Integration and Continuous Delivery (CI/CD), Curator, Docker, ElastAlert, Elasticsearch, Filebeat, HTML 5, IT Service Delivery, Jenkins

About

As a Site Reliability Engineer, I specialize in building scalable, resilient systems and automating infrastructure operations. I’ve worked extensively with both AWS and Apple’s private cloud, deploying and managing Kubernetes clusters that support high-throughput applications handling 10K+ TPS. I bring hands-on experience across the SRE toolchain: automating with Shell and Python, managing deployments with Jenkins and Spinnaker (including Canary and Red/Black strategies), and configuring infrastructure with Ansible. I’ve also built end-to-end observability stacks using Prometheus, Grafana, and OpenTelemetry for tracing, with a strong focus on real-time alerting and diagnostics. From reverse proxies like NGINX to fine-tuning dashboards and system health metrics, I’ve worked across all layers to ensure availability, performance, and continuous improvement in production environments. I’ve also implemented auto-scaling strategies that scale services up or down based on real-time traffic patterns, optimizing both performance and cost.

Experience

Total Experience: 13 yrs 4 mos
Average Tenure: 2 yrs 8 mos
Current Experience: 5 yrs 2 mos

Apple

Site Reliability Engineer

Apr 2021 - Present · 5 yrs 2 mos

Kubernetes, AWS, Python, Shell Scripting, Ansible, Site Reliability Engineering

Paytm

Senior DevOps Engineer

Apr 2019 - Apr 2021 · 2 yrs · Noida Area, India

  • Built out monitoring by deploying the Telegraf client across all vertical instances on AWS Cloud, using a dynamic inventory.
  • Managed the Hawkeye application, which monitors and logs API errors.
Technologies used: Nginx, Node.js, Filebeat, Elasticsearch, Kibana, ElastAlert, and Curator.
  • Maintained the APIPROXY application, a proxy server that manages the internal and internet-facing network and OAuth verification.
Technologies used: Nginx, OpenResty, Lua scripts, and Aerospike DB.
  • Managed infrastructure for MINIAPPS on AWS and Alicloud.
  • Migrated AWS infrastructure to the Alicloud platform.
  • Developed the CI/CD roadmap and implemented it for the project.
  • Played a significant role in establishing operational processes for a fast-growing distributed cloud platform.
  • Wrote a shell script to filter Akamai logs and push them to the Hawkeye Elastic Stack index.
  • Administered AWS and Alicloud services.
  • VCS: Bitbucket
  • Automation tools: Ansible, Terraform
  • Scripting: Shell, Lua, Python
  • Jira
  • Improved automated test and simulation frameworks.
Nginx, Node.js, Filebeat, Elasticsearch, Kibana, ElastAlert (+9 more)

Ericsson

Senior Automation Engineer

Mar 2016 - Apr 2019 · 3 yrs 1 mo · Noida Area, India

  • Worked with DevOps technologies such as AWS, Docker, Jenkins, and Ansible to deploy in-house applications and set up build jobs.
  • Automated audit dashboards for the administrator team and generated daily security checks using shell scripting and a MySQL database.
  • Built advanced reporting solutions such as the CIS Monitoring Report Dashboard using a Python graph module, with reports fetched by a Jenkins job.
  • Developed ADM, an in-house tool that generates audit reports on all SSH-based nodes and lets end users manage the checks themselves, using Expect, shell scripting, and MySQL.
  • Built containerised applications and deployed them using Docker.
  • Implemented an automatic node backup procedure using Expect and shell scripting.
  • Built an on-demand root password request module (for Linux/Solaris-based nodes) with a limited time window as required, including complete ticketing and process handling, using Expect, shell scripting, and MySQL.
  • Implemented security features on nodes using Ansible playbooks with Jenkins job setup.
  • Handled LVM and storage cleanup on nodes using Ansible playbooks.
  • Saved around 10 FTE to date through this automation work.
  • Provided a User Access Management system for all SSH-enabled nodes.
  • Delivered advanced reporting solutions such as the Audit Report Dashboard using shell scripting and Python.
  • Proactively analysed requirement documents and implemented them per the client's requirements.
  • Managed 1500+ nodes using this tool.
  • Upgraded in-house tools to key-based authentication in place of password-based authentication.
  • Automated manual support tasks using UNIX shell and database scripts to improve system availability and avoid recurring issues.
  • Handled new-generation AMS development and wrote shell scripts wherever required for the new version of the AMS tool.
  • Enhanced security features in the Access Management System and in applications deployed on AWS Cloud.
AWS, Docker, Jenkins, Ansible, Shell Scripting, MySQL (+3 more)

HCL Technologies

Senior Linux Administrator

Nov 2013 - Mar 2016 · 2 yrs 4 mos · Noida Area, India

  • Handled 1500+ Red Hat Linux, Solaris, OEL, and VMware virtual machines.
  • Monitored and managed servers.
  • Managed LVMs and storage LUNs.
  • Coordinated with vendors on hardware-related server issues.
  • Coordinated with application and database teams.
  • RHCE certification on RHEL 7.
  • Performed firmware upgrades, user management, and file system management, and provided RCA.
  • Set up RSCD and Patrol Agent and resolved related issues.
  • Led the automation team.
  • Created multiple shell scripts.
  • Automated the L1 tasks of the lean IT team.
  • Created scripts for report gathering and many other time-consuming tasks.
Red Hat Linux, Solaris, VMware, Shell Scripting, Linux Administration

NIIT Technologies Limited

Web Developer

Jan 2013 - Oct 2013 · 9 mos · Bikaner Area, India

  • Worked on a website project named www.shreesellers.com
  • Skills used: PHP, MySQL, Apache Tomcat, HTML 5, Ajax
PHP, MySQL, Apache Tomcat, HTML 5, Ajax, Web Development

Alfait Ltd.

J2EE Developer

May 2011 - Jul 2011 · 2 mos · Chandigarh Area, India

Netmax Technologies

CCNA Project Trainee

May 2010 - Jul 2010 · 2 mos · Chandigarh Area, India

Education

Chitkara University

Bachelor of Engineering (B.E.) — Information Technology

Jan 2009 - Jan 2013

RSV

Jan 2004 - Jan 2008
