Gururaju B M — SRE (Site Reliability Engineer)

🌟 Passionate Site Reliability Engineer, Platform and Infrastructure Enthusiast 🌟

Hi there! I'm Guru, and I love turning complex infrastructure challenges into streamlined solutions. With over a decade of experience, I've become adept at architecting, automating, and optimizing deployments to ensure they run reliably and efficiently.

🔍 What I Do Best: K8s, Automation, Observability, Scalability, Reliability

I'm passionate about boosting developer productivity and cutting down on repetitive tasks with smart automation. Here are some of the ways I've made an impact:

- Streamlined Deployments: Created an automation framework using Cookiecutter, Python, and GitHub APIs, which reduced Kubernetes onboarding and deployment times by 75%.
- Terraform Automation: Implemented Atlantis for Terraform pull request automation and customized it to fit our organization's change request process.

Leading a complete revamp of Kubernetes environments, I've implemented solutions that enhance system reliability and stability, cutting downtime and boosting efficiency by 80%. I also built one-click automation for production-ready K8s clusters, which helped increase deployment efficiency by more than 90%.

I've contributed a lot to transforming infrastructure for efficiency and cost-effectiveness. I've led initiatives across organizations that cut expenses and improved performance by transitioning from costly observability tools to open-source alternatives, migrating from Elasticsearch to ScyllaDB to enhance performance and reduce costs, and optimizing infrastructure during economic downturns with real-time resource monitoring and strategic adjustments for significant savings.

💡 Skills:

Cloud Platforms: AWS, GCP, Azure
Containers & Orchestration: Kubernetes, Docker, Istio, containerd
Infrastructure as Code: Terraform, Terragrunt
Automation & Scripting: Ansible, Puppet, Python, Go
Monitoring & Observability: Prometheus, VictoriaMetrics, Datadog, New Relic, Grafana, Loki, Tempo, OpenTelemetry, Cortex

I thrive in dynamic environments where I can lead innovative projects that make a real impact. If you're looking for someone to help push the boundaries of what's possible with technology infrastructure, let's chat!

📧 Get in Touch: gururajubm95@gmail.com
📍 Based in: Bengaluru, India

Stackforce AI infers this person is a SaaS Infrastructure Engineer with expertise in automation and Kubernetes.

Location: Bengaluru, Karnataka, India

Experience: 10 yrs 8 mos

Skills

Kubernetes
Terraform
Automation
Monitoring

Career Highlights

Reduced Kubernetes deployment times by 75% through automation.
Achieved 80% efficiency boost in system reliability.
Migrated from costly observability tools to open-source alternatives.

Work Experience

Cisco

Technical Leader - Site Reliability Engineering @ Duo Security (1 yr 9 mos)

Duo Security

Technical Leader - Site Reliability Engineering (1 yr 9 mos)

nference

Staff Site Reliability Engineer (8 mos)

MoEngage Inc.

Lead - Site Reliability Engineering (2 yrs 3 mos)

WebEngage

Senior DevOps Engineer (1 yr 11 mos)

Zapr Media Labs

DevOps Engineer II (1 yr 6 mos)

VROOK - Transforming Learning

Full stack and DevOps (2 yrs 7 mos)

Education

Bachelor of Engineering (B.E.) at JSS Academy Of Technical Education Karnataka

11th and 12th at KLE Independent Pu College

High School at Sri Aurobindo Vidya Mandir

Skills

Core Skills

Kubernetes, Terraform, Automation, Monitoring

Other Skills

AJAX, AWS, Agile Methodologies, Amazon Web Services (AWS), Android, Android Development, Ansible, C, C++, CSS, Chef, Cloud Computing, Communication, Containerization, DevOps

Experience

Total Experience: 10 yrs 8 mos
Average Tenure: 1 yr 9 mos
Current Experience: 1 yr 9 mos

Cisco

Technical Leader - Site Reliability Engineering @ Duo Security

Sep 2024 – Present · 1 yr 9 mos · Bengaluru, Karnataka, India · Hybrid

Duo Security

Technical Leader - Site Reliability Engineering

Sep 2024 – Present · 1 yr 9 mos · Bengaluru, Karnataka, India

nference

Staff Site Reliability Engineer

Jan 2024 – Sep 2024 · 8 mos · Bengaluru, Karnataka, India

MoEngage Inc.

Lead - Site Reliability Engineering

Oct 2021 – Jan 2024 · 2 yrs 3 mos · Bengaluru, Karnataka, India

Led a comprehensive K8s revamp project spanning the Compute, Networking, Storage, Monitoring, and Logging layers, improving reliability and stability across the entire K8s stack and minimizing downtime.
Architected and deployed a one-click, production-ready K8s provisioning system that automates the setup of EKS clusters in 25 minutes.
Optimised deployment efficiency, reducing manual effort by 80% and accelerating project delivery.
Led the development of an automation framework built on Cookiecutter, Python, GitHub APIs, ArgoCD, and Kustomize to streamline containerization, K8s manifest generation, and ArgoCD onboarding. Onboarding required no developer effort, deployment time dropped, and software release velocity increased.
Led the deployment of a high-capacity monitoring infrastructure on VictoriaMetrics (Prometheus-compatible) that processes and analyses over 600k datapoints per minute on a single cluster.

Docker, Terraform, Amazon Web Services (AWS), Kubernetes, Containerization, Service Mesh

WebEngage

Senior DevOps Engineer

Oct 2019 – Sep 2021 · 1 yr 11 mos · Bengaluru Area, India

Set up, performance-tuned, monitored, and configured alerting for large-scale sharded MongoDB clusters.
Architected and moved the entire production workload from VMs to EKS (Kubernetes). Set up CI/CD pipelines on GitLab.
Optimised costs and increased performance of the application stack on K8s by:
- Building applications for the ARM64 architecture and running them on Graviton EC2 instances on AWS.
- Onboarding applications onto AWS Spot instances on K8s to reduce costs with zero impact or downtime. One of the very early adopters of Spot instances on K8s.
Administered and managed large data stores such as Kafka, Elasticsearch, MongoDB, MySQL, and Redis.
Cut infrastructure costs by 30% while the market was down during COVID-19.
Moved the entire infrastructure monitoring stack from Datadog to Prometheus.
Set up scalable reverse-proxy infrastructure using HAProxy and Nginx.
Moved our biggest datastore from Elasticsearch to ScyllaDB, which resulted in a massive cost reduction with much better query performance.
Performed routine security and system updates across the entire infrastructure.

Communication, Terraform, Kubernetes, Service Mesh

Zapr Media Labs

DevOps Engineer II

Mar 2018 – Sep 2019 · 1 yr 6 mos · Bengaluru Area, India

Hands-on production experience with container orchestration using Kubernetes on EKS, with deployments through Helm charts.
Built high-throughput Kafka and ZooKeeper clusters with 15+ nodes and wrote scripts to manage and monitor their uptime.
Set up highly scalable monitoring infrastructure (Graphite and Prometheus) to handle high IO throughput. Implemented disaster recovery to back up and restore metrics.
Set up a customised on-premises data centre, covering networking, application bootstrapping, orchestration, and blue-green deployment.
Worked with NoSQL database clusters such as DynamoDB, Aerospike, and MongoDB, including setup, performance optimisation, monitoring, and alerting.
Set up a log-aggregation system with Graylog and Fluentd.
Worked extensively on cutting AWS costs by setting up the right observability tools to track costs and optimise resource usage, for example unused resources, untagged resources, unused volumes, and reservation expiry alerts.
Handled infrastructure capacity planning, as most of our workloads ran on AWS. Headed capacity planning meetings with architects and stakeholders to decide what ratio of On-Demand, Spot, and Reserved instances to use for each project to cut costs.

Communication, Terraform

VROOK - Transforming Learning

Full stack and DevOps

Jul 2015 – Feb 2018 · 2 yrs 7 mos · Bengaluru Area, India

Designed and developed an e-commerce website for custom buying and selling needs on the platform.
Developed Bash and Python scripts to automate server builds, OS patches, and deployment to staging and production environments.
Used configuration management tools such as Ansible and Chef to automate the infrastructure.
Built several Jenkins CI/CD pipelines to automate the deployment process for several modules of the product.
Responsible for determining all necessary coding requirements and implementing new features.
Built fault-tolerant and highly available infrastructure on AWS Cloud. Scaled large media and website product content on AWS to manage traffic spikes.
Implemented custom monitoring methods to enable self-healing processes.

Communication, Terraform