Ashwin Balakrishnan

SRE (Site Reliability Engineer)

Mumbai, Maharashtra, India · 14 yrs 3 mos experience
Most Likely To Switch: Highly Stable

Key Highlights

  • Expert in automation and incident management.
  • Proven track record in cloud-native environments.
  • Skilled in optimizing system reliability and performance.
Stackforce AI infers this person is a Site Reliability Engineer with expertise in cloud infrastructure and automation.

Skills

Core Skills

Incident Management · Automation · Cloud Migration · Monitoring

Other Skills

Terraform · Linux · Troubleshooting · Python (Programming Language) · Kubernetes · Puppet (Software) · Amazon Web Services (AWS) · Apache Mesos · VMware Infrastructure · Linux System Administration · Apache · Unix · Ruby · Service Delivery · Logstash

About

Experienced Engineer with a focus on optimizing system reliability, scalability, and performance through automation, monitoring, and incident management. Skilled in building and maintaining resilient infrastructure, improving deployment pipelines, and collaborating cross-functionally to ensure high availability and operational efficiency. Proven track record of reducing downtime, enhancing service performance, and implementing best practices for scalability and security in cloud-native environments.

Experience

Total Experience: 14 yrs 3 mos
Average Tenure: 2 yrs
Current Experience: 3 yrs 5 mos

Yugabyte

2 roles

Staff Site Reliability Engineer

Promoted

Mar 2025 – Present · 1 yr 1 mo · Remote

Senior Site Reliability Engineer

Oct 2022 – Feb 2025 · 2 yrs 4 mos · Remote

  • Manage and coordinate all aspects of incident response.
  • Work closely with the engineering team to ensure system reliability, performance, and scalability on a project basis.
  • Develop and implement automation solutions that enhance system reliability, streamline operations, and reduce manual intervention in incident response and deployment processes.
Terraform · Linux · Troubleshooting · Python (Programming Language) · Kubernetes · Incident Management · +1

Adobe

Senior Site Reliability Engineer

Jul 2021 – Oct 2022 · 1 yr 3 mos

  • Work with engineers to migrate services from the data center to AWS/K8s and ensure appropriate alerting is in place.
  • Deploy infrastructure and services on AWS and K8s using Terraform and deploybot.
  • Document upgrades and software maintenance projects to build an accessible record for future requirements.
  • Proven success writing automation in Python for Debian Linux servers and building various tools in Python.
Puppet (Software) · Terraform · Linux · Troubleshooting · Amazon Web Services (AWS) · Python (Programming Language) · +3

OpenTable

2 roles

Senior Site Reliability Engineer

Feb 2019 – Jul 2021 · 2 yrs 5 mos

Linux · Troubleshooting · Python (Programming Language)

Site Reliability Engineer

Feb 2017 – Jan 2019 · 1 yr 11 mos

  • Leverage automation tools, especially Puppet and Vester, to decrease end-to-end deployment times, reduce downtime, and increase reliability.
  • Act as top-tier on-call support for critical-uptime business applications to maintain availability and minimize downtime during outages.
  • Implement and maintain monitoring at the server and application level, using Sensu and Graphite, to increase visibility into day-to-day operations and issues.
  • Deploy and maintain an international server environment for a 24/7 critical-uptime business product in a mixed Windows/Linux environment, using Foreman and VMware.
  • Manage the infrastructure by building in-house tools, automating routine tasks, and researching technologies that may improve productivity.
  • Collect and maintain a complete inventory of all systems. Identify and retire unused systems to recycle resources and reduce maintenance costs.
  • Contribute to the chatbot development pipeline through Errbot.
  • Provide training for fellow engineers, including brown-bag sessions, documentation, and one-on-one mentorship.
Linux · Troubleshooting · Python (Programming Language) · Apache Mesos · VMware Infrastructure · Automation · +1

Informatica

Cloud Operations Engineer

Jul 2016 – Jan 2017 · 6 mos · Bangalore

  • Manage operations for R&D Informatica Cloud by automating most of the deployment code.
  • Coordinate with developers to spawn instances and work on various AWS products.
  • Create monitoring zones for service-related alerts and implement a centralized logging structure.
  • Explore new technologies to experiment with and improve the current infrastructure.
Linux · Troubleshooting

Directi

System Administrator

May 2014 – Jul 2016 · 2 yrs 2 mos · Andheri East

  • Monitor server stability using Icinga, Ganglia, and other internal tools.
  • Automate and implement permanent resolutions to prevent outages and downtime.
  • Script and code tools for automation and efficient management of sites and products.
  • Handle incident response, troubleshooting, and fixes for various products and services.
  • Manage configuration with Puppet.
  • Manage products using Linux and Linux application stacks.
Linux · Troubleshooting

Thomson Reuters

Operations Engineer

May 2013 – May 2014 · 1 yr · Goregaon East

  • Configure, monitor, and debug Linux applications.
  • Configure stock-market applications for clients.
  • Maintain servers at the data centers (NSE and BSE).
  • Played a major role in the FPGA project.
Linux · Troubleshooting

Convergys

Sr. Technical Officer

Sep 2011 – Apr 2013 · 1 yr 7 mos · Thane West

  • Worked for Optus Telecommunication (an ISP in Australia).
  • Coordinated with business clients to configure applications and troubleshoot internet connectivity problems.

Education

Ratnam College

Bachelor's Degree — Information Technology

Jan 2009 – Jan 2012

RADAV College

HSC — Computer Science

Jan 2007 – Jan 2009

B.P.E.S High School

SSC — Junior High/Intermediate/Middle School Education and Teaching

Jan 2000 – Jan 2007
