Anusha G. — SRE (Site Reliability Engineer)
Experienced in operations, site reliability engineering (SRE), and observability, specializing in system stability, availability, automation, and security.
- Project Management – Independently drives projects within SLAs, ensuring efficiency and reliability.
- Disaster Recovery (DR) Testing – Plans and executes DR strategies to ensure business continuity.
- Automation & Scripting – Proficient in Unix/Linux shell scripting and Python for automating operational tasks.
- Incident Management – Quick response to and resolution of critical incidents to minimize downtime.
- Implemented SRE practices such as error budgets, blameless postmortems, and observability-first development.
- Improved MTTD and MTTR through smart alerting, service maps, and RCA-focused dashboards.
- Built full-stack observability using Datadog, Prometheus, and Grafana in cloud-native environments.
- Experience with Jenkins automation workflows.
- Experience with Linux administration, IBM MQ, Sybase databases, and SQL.
- Set up and built AWS infrastructure resources (VPC, EC2, S3, IAM, EBS, Security Groups, Auto Scaling, and RDS) in Terraform.
- Set up and built AWS Elastic Load Balancer, Elastic Beanstalk, CloudWatch, AMI, SNS, RDS, Route 53, Auto Scaling, CloudTrail, IAM, DynamoDB, EC2 Container Service, Lambda, and Security Groups.
- Wrote AWS Lambda functions in Python that invoke Python scripts to perform transformations and analytics on large data sets in clusters.
- Handled installation, configuration, and image creation for Docker containers, with orchestration using Kubernetes.
- Managed Kubernetes applications using Helm charts; able to create reproducible builds of Kubernetes applications, with knowledge of managing Kubernetes manifest files and Helm package releases.
- Contributed to capacity management initiatives by tracking usage patterns and supporting planning teams with application performance data.
- Provided 24x7 on-call support, debugging and fixing issues related to Linux and Solaris installation and maintenance of software in production, development, and test environments as an integral part of the Unix/Linux (RHEL/Windows/Solaris) support team.
- Cloud & Certification – AWS Certified SysOps Administrator.
Stackforce AI infers this person is a Site Reliability Engineer with expertise in cloud infrastructure and automation in SaaS environments.
Location: Lutz, Florida, United States
Experience: 7 yrs 5 mos
Skills
- Site Reliability Engineering
- Amazon Web Services (AWS)
- Kubernetes
Career Highlights
- Expert in Site Reliability Engineering and cloud infrastructure.
- Proficient in automation and disaster recovery strategies.
- Strong experience in Kubernetes and full-stack observability.
Work Experience
Broadridge India
Technology Lead (1 yr 7 mos)
Broadridge
Site Reliability Engineer (1 yr)
Site Reliability Engineer (5 yrs 10 mos)
Education
Bachelor's degree at Jawaharlal Nehru Technological University