Sakthivel Gopi

SRE (Site Reliability Engineer)

Chennai, Tamil Nadu, India · 4 yrs 7 mos experience
Most Likely To Switch

Key Highlights

  • Expert in automating cloud infrastructure and CI/CD processes.
  • Proficient in observability tools, enhancing system reliability.
  • Strong problem-solving skills with a focus on performance optimization.
Stackforce AI infers this person is a Site Reliability Engineer with expertise in cloud infrastructure and observability in SaaS environments.

Skills

Core Skills

Site Reliability Engineering · Observability · DevOps · Infrastructure as Code

Other Skills

API Development · APIs · Amazon CloudWatch · AngularJS · Ansible · Apache Airflow · Apache Kafka · Automation · Bash · Cascading Style Sheets (CSS) · Change Management · ClickHouse · Cloud Infrastructure · Cloudflare

About

Sakthivel Gopi has always been a strong problem solver and an independent extrovert who sees every problem as an opportunity to improve. He is a versatile engineer who maintains system reliability, automates cloud infrastructure, and optimises mission control deployments across large infrastructure. He is proficient with cloud platforms and core DevOps tools, is always adding new skills to his repertoire, and is eager to meet other software engineers in the area, so feel free to connect!

Areas of Experience:

  • DevOps - CI/CD, Jenkins, GitLab, GitHub Actions, Ansible, AWS, GCP, Terraform, Vault
  • Scripting - Bash, Python
  • Database Administration - MySQL, PostgreSQL, MongoDB, ClickHouse
  • Observability - Grafana, Prometheus, Loki, Datadog, ELK Stack
  • Reliability Engineering - Incident Management (Incident Response, Debugging, Troubleshooting, RCA, Documentation), Server Maintenance, Release Management, Change Management

Experience

4 yrs 7 mos
Total Experience
1 yr 1 mo
Average Tenure
1 yr 11 mos
Current Experience

Angel One

SRE - Dev

Jul 2024 - Present · 1 yr 11 mos · Bengaluru · Remote

  • Designed, developed, and configured modern observability solutions for complex systems to identify anomalies and proactively ensure system reliability and performance.
  • Built internal tools to streamline operations, eliminate manual tasks, and automate workflows, improving reliability and team productivity through scalable tooling.
  • Manage and monitor scalable Kafka-based data pipelines for real-time data streaming; maintain Kafka clusters, optimize performance, and ensure high availability and fault tolerance of event-driven architectures.
  • Clickstack - designed and developed a log management and analytics platform in which an OpenTelemetry exporter publishes data to Kafka and Kafka receivers deliver it to a ClickHouse database.
  • Developed APIs and integrated them with various observability automations.
  • Multilingual database setup and querying, especially for OLAP databases.
  • Designed and developed Apache Airflow DAGs for various automation requirements with scheduled, serial execution.
  • Set up automations using n8n workflows (a low-code automation platform).
  • Set up and operated HashiCorp Vault using Terraform for secrets management and ACLs.
Kafka · Observability · ClickHouse · APIs · HashiCorp Vault · Terraform · +1

Falabella India

Site Reliability Engineer - DevOps Core

Jul 2023 - Jul 2024 · 1 yr · Bengaluru, Karnataka, India · Remote

  • Owner of Content Delivery Network (CDN), performance, and security - Cloudflare
  • Infrastructure as Code for GCP services - Terraform
  • Ad hoc task automation using Python
  • Owner of observability - Datadog, Prometheus, Grafana, Loki
  • Automated server configuration using Ansible playbooks
  • CI/CD - GitLab, Jenkins; fixing pipeline and job issues
Terraform · Python · Datadog · Prometheus · Grafana · Ansible · +2

BankBazaar

Site Reliability Engineer

Feb 2022 - Apr 2023 · 1 yr 2 mos · Bengaluru, Karnataka, India · Hybrid

  • Automated ad hoc tasks using Bash, cloud infrastructure provisioning using Terraform, and configuration management using Ansible.
  • Led incident management for production, business, and application incidents as the primary point of contact, ensuring minimal downtime. Enhanced partner API stability and reduced ticket frequency by identifying root causes, analyzing older logs from S3, and implementing continuous monitoring.
  • Proficient in observability tools such as Prometheus, Grafana, Loki, the ELK stack, and Zabbix. Reduced downtime by 50% through alert onboarding, threshold modifications, and continuous monitoring.
  • Developed diverse Kibana dashboards for microservice and API monitoring, utilising the ELK stack for log management to enhance observability. Proactively detected and resolved issues, minimising downtime through timely communication and resolution.
  • Managed comprehensive server maintenance, including repairs and periodic checks for runtime, backups, vacuuming, and restarts.
  • Handled production support activities such as reloads, debugging, and troubleshooting API, K8s, server, and infrastructure issues in a 16x7 work model.
Bash · Terraform · Ansible · Prometheus · Grafana · ELK stack · +2

Cognizant

2 roles

Programmer Analyst Trainee

Jul 2021 - Jan 2022 · 6 mos · Chennai, Tamil Nadu, India · Remote

AWS DevOps Intern

Jan 2021 - Jun 2021 · 5 mos · Chennai, Tamil Nadu, India · Remote

Smart India Hackathon

Grand Finalist

Jun 2020 - Aug 2020 · 2 mos · Chennai, India

  • We developed a hybrid application that acts as a virtual tourist guide. My responsibility was the front end and UI/UX. It was a pleasant and valuable learning experience, not only for the technical exposure but also for the experience of working cooperatively.

Education

Anna University, Chennai

Bachelor of Engineering — Computer Science and Engineering
