Anmol S.

SRE (Site Reliability Engineer)

Bengaluru, Karnataka, India · 4 yrs 3 mos experience

Key Highlights

Expert in Site Reliability Engineering with AWS.
Proficient in Python and CI/CD automation.
Strong background in monitoring and observability tools.

Stackforce AI infers this person is a Site Reliability Engineer with expertise in cloud infrastructure and automation in SaaS.

Skills

Core Skills

Site Reliability Engineering, Amazon Web Services (AWS), Docker, Python, DevOps

Other Skills

Android SDK, Bootstrap, C++, CSS, Chef, Containerization, Continuous Integration and Continuous Delivery (CI/CD), Data Structures, Express.js, Flutter, Git, GitHub, HTML, JIRA, JMeter

About

Hi, I am working as a Site Reliability Engineer at Zscaler. If you're hiring and want to discuss relevant job opportunities, DM me or mail the JD to anmolsahu99@gmail.com.

Skill set:
Programming languages: Python and Shell
Cloud technologies: AWS (EC2, S3, Route53, VPC, ECR, CloudFormation, RDS, DynamoDB, SQS, Kinesis)
CI/CD: Jenkins
Core competencies: Computer Networks, Operating Systems (Linux), troubleshooting
Containers and orchestration: Docker, Kubernetes
Monitoring and observability: Datadog, New Relic, Checkmk, Nagios, Pingdom (metrics, logs and traces)
IaC: Terraform
SRE practice: SLI, SLO, SLA and 99.999% uptime

Good problem-solving skills in Python and knowledge of system design.

Experience

Total Experience: 4 yrs 3 mos

Average Tenure: 1 yr 6 mos

Current Experience: 1 yr 2 mos

Zscaler

Senior Site Reliability Engineer

Mar 2025 – Present · 1 yr 2 mos · India · On-site

Zynga

Site Reliability Engineer

Aug 2022 – Mar 2025 · 2 yrs 7 mos · India · On-site

Participated in the 24/7 on-call rotation, following a daylight model.
Troubleshot production and staging server errors during on-call, mainly in Linux environments such as CentOS.
Maintained AWS cloud infrastructure for partner game teams to keep instances and node pools reliable and robust.
Automated recurring alerts with StackStorm using Python scripts, so that most repetitive alerts are handled by StackStorm itself.
Supported partner game teams with AWS resource uptime during releases, including modifying and upgrading resources such as EC2, S3, EBS, Route53, ASG, ELB and ECR.
Used Jenkins pipelines to replace bad nodes with new ones, including Membase and Couchbase nodes, to maintain uptime.
Used Datadog, CloudWatch, Grafana and an internal monitoring application to monitor internal services across various metrics.
Maintained documentation, playbooks and change records, including troubleshooting steps and procedures for specific tasks.
Dedicated Projects:
Created a dashboard for the SRE team to coordinate with the USA, UK and BLR teams and other partner teams, and to help them achieve their respective goals.
Built an interactive UI with HTML, CSS, Bootstrap and JS to display information. Used the Node.js runtime with Express.js as the backend framework to host the application, and containerized the application with Docker.
Used a Python script to automate the CI/CD process, from pushing code to the GitHub repo, through Docker, to the AWS EC2 instance.
Integrated the PagerDuty API to fetch SRE members' details, on-call availability, alerts and contact details, so that partner teams can connect and coordinate with SRE.
The SRE dashboard helps the internal SRE team across global locations coordinate with external partner teams during events such as on-call shift changes, escalations and unexpected incidents.

Python, Containerization, PagerDuty, JIRA, Docker, StackStorm +4

Leucine

DevOps Engineer

Jan 2022 – Jul 2022 · 6 mos · Remote

> Deployed the application to different environments, including the QA and staging servers, using Python scripts.
> Created CI/CD pipelines in Jenkins for various tasks such as deployment, deployment removal, backups and redeployment.
> Created CI/CD for the QA team's automated test cases and maintained the Allure reports on AWS S3 with the help of GitHub Actions.
> Gained exposure to AWS services such as EC2, S3 and Route53.
> Dockerized the application and its components using Docker Compose and Dockerfiles.
> Automated daily tasks such as backups, API monitoring and cron jobs using shell scripts.
> Automated downloading and manipulating files and logs using Python scripts.
> Created monitoring and alerts for the QA, Stage and UAT deployments using New Relic.

Shell Scripting, Scripting, Python (Programming Language), PostgreSQL, Docker, DevOps +4

Cogno ai

Intern

Jul 2021 – Jan 2022 · 6 mos

(PPO was offered)

Codeholic

Intern

Oct 2020 – Jul 2021 · 9 mos

Worked as a Flutter developer on several projects. Integrated video-calling APIs and modified and customized app UIs per client requirements. Integrated services from various API vendors, such as the Google Maps API and the Agora video calling and live streaming APIs.