Sakthivel Gopi

SRE (Site Reliability Engineer)

Chennai, Tamil Nadu, India · 4 yrs 7 mos experience

Most Likely To Switch

Key Highlights

Expert in automating cloud infrastructure and CI/CD processes.
Proficient in observability tools, enhancing system reliability.
Strong problem-solving skills with a focus on performance optimization.

Stackforce AI infers this person is a Site Reliability Engineer with expertise in cloud infrastructure and observability in SaaS environments.

Contact

sakthivelgopi0063@gmail.com LinkedIn

Skills

Core Skills

Site Reliability Engineering · Observability · DevOps · Infrastructure as Code

Other Skills

API Development · APIs · Amazon CloudWatch · AngularJS · Ansible · Apache Airflow · Apache Kafka · Automation · Bash · Cascading Style Sheets (CSS) · Change Management · ClickHouse · Cloud Infrastructure · Cloudflare

About

Sakthivel Gopi has always been a strong problem solver and an independent extrovert who sees every problem as an opportunity to improve. He is a versatile engineer who maintains system reliability, automates cloud infrastructure and optimises mission control deployments across large infrastructure. He is proficient with cloud platforms and core DevOps tools. He is always adding new skills to his repertoire and is eager to meet other software engineers in the area, so feel free to connect!

Areas of Experience:
• DevOps - CI/CD, Jenkins, GitLab, GitHub Actions, Ansible, AWS, GCP, Terraform, Vault
• Scripting - Bash, Python
• Database Administration - MySQL, PostgreSQL, MongoDB, ClickHouse
• Observability - Grafana, Prometheus, Loki, Datadog, ELK Stack
• Reliability Engineering - i) Incident Management (Incident Response, Debugging, Troubleshooting, RCA, Documentation) ii) Server Maintenance iii) Release Management iv) Change Management

Experience

4 yrs 7 mos

Total Experience

1 yr 1 mo

Average Tenure

1 yr 11 mos

Current Experience

Angel One

SRE - Dev

Jul 2024 – Present · 1 yr 11 mos · Bengaluru · Remote

Designed, developed and configured modern observability solutions for complex systems to identify anomalies and proactively ensure system reliability and performance.
Built internal tools to streamline operations, eliminate manual tasks and automate workflows, improving reliability and team productivity through scalable tooling and process tuning.
Manage and monitor scalable Kafka-based data pipelines for real-time data streaming; maintain Kafka clusters, optimize performance, and ensure high availability and fault tolerance of event-driven architectures.
Clickstack - designed and developed a log management and analytics platform using an OpenTelemetry exporter that writes data to Kafka and delivers it to the ClickHouse DB through Kafka receivers.
Developed APIs and integrated them with various observability automations.
Set up and queried multiple database systems, especially OLAP databases.
Designed and developed Apache Airflow DAGs for various automation requirements with scheduled, serial execution.
Set up automations using n8n workflows (a low-code automation platform).
Set up and operated HashiCorp Vault using Terraform for secrets management and ACLs.

Kafka · Observability · ClickHouse · APIs · HashiCorp Vault · Terraform · +1

Falabella India

Site Reliability Engineer - DevOps Core

Jul 2023 – Jul 2024 · 1 yr · Bengaluru, Karnataka, India · Remote

Owner of Content Delivery Network (CDN), performance and security - Cloudflare
Infrastructure as Code for GCP services - Terraform
Ad hoc task automation using Python
Owner of observability - Datadog, Prometheus, Grafana, Loki
Automated server configuration using Ansible playbooks
CI/CD - GitLab, Jenkins; fixed pipeline and job issues

Terraform · Python · Datadog · Prometheus · Grafana · Ansible · +2

BankBazaar

Site Reliability Engineer

Feb 2022 – Apr 2023 · 1 yr 2 mos · Bengaluru, Karnataka, India · Hybrid

Automated ad hoc tasks using Bash, provisioned cloud infrastructure using Terraform, and handled configuration management using Ansible.
Led incident management for production, business, and application incidents as the primary point of contact, ensuring minimal downtime. Enhanced partner API stability, reducing ticket frequency by identifying root causes, analyzing older logs from S3, and implementing continuous monitoring.
Proficient in observability tools such as Prometheus, Grafana, Loki, the ELK stack and Zabbix. Reduced downtime by 50% through alert onboarding, threshold modifications and continuous monitoring.
Developed diverse Kibana dashboards for microservice and API monitoring, utilising ELK stack for log management to enhance observability. Proactively detected and resolved issues, minimising downtime through timely communication and resolution.
Managed comprehensive server maintenance, including repairs and periodic checks for runtime, backups, vacuuming, and restarts.
Experienced in production support activities such as reloads, debugging, and troubleshooting API, K8s, server and infrastructure issues in a 16x7 work model.

Bash · Terraform · Ansible · Prometheus · Grafana · ELK stack · +2

Cognizant

2 roles

Programmer Analyst Trainee

Jul 2021 – Jan 2022 · 6 mos · Chennai, Tamil Nadu, India · Remote

AWS DevOps Intern

Jan 2021 – Jun 2021 · 5 mos · Chennai, Tamil Nadu, India · Remote

Smart India Hackathon

Grand Finalist

Jun 2020 – Aug 2020 · 2 mos · Chennai, India

We developed a hybrid application that acts as a virtual tourist guide. My responsibility was the front end and UI/UX. It was a pleasant and valuable learning experience, not only for the technical exposure but also for the experience of working cooperatively.