Nisha MG

SRE (Site Reliability Engineer)

Bengaluru, Karnataka, India · 3 yrs 9 mos experience

Key Highlights

4+ years of experience in AWS and DevOps
Expertise in automation and infrastructure management
Proficient in incident response and troubleshooting

Stackforce AI infers this person is a DevOps Engineer specializing in cloud infrastructure and automation.

Skills

Core Skills

AWS, Automation

Other Skills

Docker, Documentation, Git BASH, Groovy, Incident Management, Jenkins, Linux, Monitoring, PagerDuty, Python (Programming Language), Terraform, Troubleshooting

Experience

Total Experience: 3 yrs 9 mos

Average Tenure: 3 yrs 9 mos

Current Experience: 3 yrs 9 mos

Agilon health

SRE

Aug 2022 – Present · 3 yrs 9 mos · Bengaluru, Karnataka, India

As an SRE, my primary responsibility is to ensure the reliable and efficient operation of software systems and infrastructure. I focus on improving the overall reliability, performance, and scalability of applications and services in the following areas:
System and Service Reliability: I work to maintain the availability, latency, performance, and efficiency of critical systems and services, and I support their change management and monitoring.
Automation and Tooling: I develop and maintain automation tools and scripts to streamline operational processes and infrastructure management.
Infrastructure and Architecture: I collaborate with cross-functional teams, including software developers and network engineers, to maintain scalable and reliable infrastructure.
Incident Response and Troubleshooting: This is the major part of my role. I actively participate in incident response and troubleshooting, working closely with development teams to identify root causes, implement fixes, and prevent incidents from recurring. I adhere to incident management best practices, conduct post-incident reviews, and implement remediation actions.
Security and Compliance: I collaborate with security teams to ensure that systems and applications are secure and compliant with industry standards and regulations.
On-call Support and Incident Management: I participate in an on-call rotation to respond to critical incidents and perform troubleshooting outside regular working hours. I follow incident management processes, maintain incident documentation, and contribute to incident post-mortems.
Documentation and Knowledge Sharing: I maintain comprehensive documentation, including runbooks, operational procedures, and system architecture diagrams. I actively participate in knowledge-sharing activities so that the team maintains a shared understanding of systems and processes.