Manivannan G

DevOps Engineer

Bengaluru, Karnataka, India

14 yrs 7 mos experience

Most Likely To Switch · Highly Stable

Key Highlights

Built and led high-performing DevOps teams.
Expert in cloud infrastructure and automation.
Achieved 99.99% uptime in production environments.

Stackforce AI infers this person is a DevOps leader in Fintech and SaaS industries, specializing in cloud infrastructure and automation.

Skills

Core Skills

DevOps, Cloud Operations, Site Reliability Engineering

Other Skills

AWS, AWS EKS, Alerting, Amazon EKS, Ansible, Automation, Bash, CI/CD, Chef, Cloud Computing, CollectD, Communication, Computer Networking, Continuous Integration (CI), Core Java

About

Hands-on DevOps leader with 14 yrs of SRE/DevOps experience, including 5 yrs as an Engineering Manager. At Qubole & Moveworks, built SRE/DevOps teams from the ground up, coaching and mentoring engineers ranging from Intern to Staff level. Planning and building cost-optimized infrastructure for large-scale distributed systems without affecting performance (both monolith and microservice environments). Automation with shell/Python scripts. Designing and implementing monitoring & alerting frameworks with Prometheus, Grafana, Chef/Ansible, Kubernetes. Proficient in AWS, Kubernetes, Linux, Python, Terraform. Intermediate in GCP. Striving for automation, scalability, and reliability of the production platform with 99.99% uptime.

Experience

14 yrs 7 mos

Total Experience

2 yrs 5 mos

Average Tenure

4 yrs 10 mos

Current Experience

Moveworks

2 roles

Senior Engineering Manager - DevOps

Promoted

Oct 2024 – Present · 1 yr 8 mos

Engineering Manager - DevOps

Aug 2021 – Nov 2024 · 3 yrs 3 mos

First DevOps hire in India; built a balanced team of 10.
Leading various verticals: multi-region, CI/CD optimisation, developer productivity/experience initiatives, cost optimisation, FedRAMP, reliability and fault tolerance.

Recko | a Stripe company

Engineering Manager - DevOps

Jul 2020 – Aug 2021 · 1 yr 1 mo · Bengaluru, Karnataka, India

Led the DevOps and SRE team at Recko, a fintech startup handling large volumes of financial transaction data. Our team built and managed a production platform that is performant, reliable, secure and compliant (PCI, SOC2). Migrated the monolith infrastructure to Kubernetes on AWS EKS. Contributed technically alongside managerial responsibilities.
Led the infra migration from monolith to microservices (Kubernetes on AWS EKS)
Planning, executing and optimising: platform reliability, scaling, monitoring, logging, tracing, deployment, infrastructure as code
Always strive to identify repeatable/error-prone patterns and automate them
Enabled seamless deployment and config management with CI/CD and GitOps (ArgoCD)
Helped the dev team containerise apps, reviewing reliability, scalability and security aspects
Drove initiatives on cost reduction, infra security and developer productivity
Standardised and streamlined the operational processes/tools for change management
Define SLO/SLI, measure, monitor and review with stakeholders
Review on-call duties and alerts/incidents; reduce toil and run RCAs for incidents
Hiring and coaching the team's DevOps engineers; providing technical guidance

Kubernetes, AWS EKS, CI/CD, GitOps, Monitoring, Logging, +4

Qubole

2 roles

Staff Site Reliability Engineer

Jun 2019 – Jul 2020 · 1 yr 1 mo

Joined as one of the founding SREs; built and led the team.
Contributions:
Tech Lead for the SRE team; joined as one of the first SREs, then built and grew a team of 5.
Building, automating and maintaining infrastructure for highly distributed environments in AWS across multiple regions.
Designing monitoring and alerting solutions (CollectD, SignalFx, Prometheus, ELK, New Relic)
Deployments with Kubernetes, Chef, Jenkins
Cloud cost optimization (AWS, GCP)
Tooling automation for various purposes (Python, shell)
Cross-team collaboration with security, QA and dev teams
Troubleshooting Linux system and application issues
Disaster recovery plans and tests
Incident management
Also part of the 16x7 on-call rotation

AWS, Kubernetes, Monitoring, Alerting, Automation, Incident Management, +2

Senior Site Reliability Engineer

Aug 2017 – Jun 2019 · 1 yr 10 mos

Flipkart

Operations Engineer-III (DevOps)

Jul 2015 – Aug 2017 · 2 yrs 1 mo

DevOps in Flipkart Ads team
Built a scalable production environment on KVMs
Supported tooling that lets developers build and deploy seamlessly with Ansible
Designed efficient monitoring and alerting solutions for critical JVM app and system metrics
Installed, configured and managed Storm, Aerospike (NoSQL), MySQL and OpenTSDB clusters with Ansible
Automated mundane tasks with shell and Python scripts to reduce manual intervention
Tuned the performance of HAProxy (load balancers) to handle high QPS
Monitored system uptime and system/app performance metrics (using Nagios, CollectD, Graphite cluster, OpenTSDB)
As part of the 16x7 on-call team, supported Linux multi-tier infrastructure in production to ensure high availability, along with DR plans to ensure business continuity
Managed a small-scale AWS cluster (EC2, S3) and Route 53