Vaibhav Thakur

DevOps Engineer

Vancouver, British Columbia, Canada · 6 yrs 7 mos experience
Highly Stable

Key Highlights

  • Reduced hosting costs by 50%, saving $2.1M annually.
  • Managed over 600 Kubernetes clusters across multiple cloud platforms.
  • Architected GitOps CI/CD pipelines ensuring automated deployments.
Stackforce AI infers this person is a DevOps and Cloud Architect specializing in multi-cloud infrastructure and Kubernetes solutions.

Skills

Core Skills

Kubernetes · Cloud Computing · GitOps · Infrastructure Design · CI/CD · Cost Management

Other Skills

AWS · Amazon EC2 · Amazon Web Services (AWS) · Argo · Azure · Bacula · Bash · Computer Networking · Configuration Management · Continuous Integration and Continuous Delivery (CI/CD) · Crossplane · DevOps · Docker · EC2 · ECS

About

I'm a DevOps & Cloud Architect specializing in Kubernetes, GitOps, and multi-cloud infrastructure (AWS, GCP, Azure). I design and operate scalable, secure, and cost-optimized platforms that empower development teams to deliver software quickly, reliably, and with confidence.

At Tigera (Project Calico), I led initiatives that:
  • Reduced hosting costs by 50% ($2.1M in annual savings) without sacrificing reliability.
  • Managed 600+ Kubernetes clusters across AWS, GCP, and Azure, using a unified operational model.
  • Architected GitOps CI/CD pipelines with ArgoCD, ensuring audited, automated, and reproducible deployments.
  • Implemented SOC2 compliance controls, embedding security and policy-as-code into every stage of the delivery pipeline.

I thrive at the intersection of infrastructure, security, and developer productivity, bridging development and operations through automation, observability, and strategic planning. My approach is data-driven, customer-focused, and always aligned with long-term business goals.

TECHNICAL SKILLS:
  • Programming/Scripting Languages: Python, Bash
  • Public Cloud: AWS, GCP, Azure
  • Container Orchestration: Kubernetes (GKE, AKS, EKS), AWS ECS, Docker Swarm
  • Observability: Prometheus (Thanos), Alertmanager, Grafana, Datadog, PagerDuty, Opsgenie, Jaeger, Elastic Stack
  • Policy Management: Kyverno, Kubescape, Checkov, Trivy
  • Service Proxy & Networking: Nginx Ingress, Envoy Gateway, Calico
  • IaC: Crossplane, Terraform, Ansible
  • Continuous Integration: GitHub Actions, TravisCI, GitlabCI, Jenkins, Argo Workflows
  • Continuous Delivery: ArgoCD, FluxCD, Spinnaker

LEADERSHIP AND PROFESSIONAL SKILLS:
  • Incident Management: Leading high-priority incidents, RCA, and post-incident reviews
  • Project Management: Scrum, Agile, end-to-end lifecycle execution
  • Cross-Functional Collaboration: Aligning diverse teams to deliver shared goals
  • Communication: Effectively presenting technical concepts to all stakeholders
  • Strategic Planning: Driving cost optimization and process improvements
  • Problem-Solving: Proactive resolution of technical & organizational challenges
  • Release Management: Oversee all cloud software releases

Codementor: https://www.codementor.io/@vaibhavthakur263
GitHub: https://github.com/Thakurvaibhav

Fight On! ✌🏻

Experience

Total Experience: 6 yrs 7 mos
Average Tenure: 1 yr 7 mos
Current Experience: --

Tigera

4 roles

Principal DevOps Engineer

Jan 2025 – Present · 1 yr 5 mos

Kubernetes · Argo · Prometheus.io · Google Cloud Platform (GCP) · Amazon Web Services (AWS) · Microsoft Azure · +10

Software Engineering Manager, DevOps

Promoted

Oct 2022 – Jan 2025 · 2 yrs 3 mos

Team Leadership and Collaboration
  • Lead a global DevOps team for Tigera's flagship SaaS product, Calico Cloud.
  • Scrum lead for the platform team.
  • Collaborate with teams to build effective system designs & ensure cross-functional alignment.
  • Empower Dev teams to seamlessly build, test & deploy features.
Infrastructure Design and Engineering
  • Design and engineer scalable & fault-tolerant multi-cloud infrastructure for Calico Cloud (AWS, GCP, Azure).
  • Employ infra-as-code practices using tools like Terraform & Crossplane.
  • Implement strong security measures for data integrity & confidentiality, including network segmentation, encryption, and access controls.
  • Manage 600+ Kubernetes clusters (GKE, AKS, EKS) through a central operational view.
Development and Deployment
  • Architect GitOps CI/CD pipelines for multi-cloud Kubernetes workloads using Argo (CD, Workflow & Events) & SemaphoreCI.
  • Automate deployments to dev/test environments.
  • Ensure audited deployments for prod.
  • Perform CVE scanning.
  • Enforce Helm best practices.
  • Build an internal developer platform using Crossplane and ArgoCD.
  • Implement Kyverno for policy management, ensuring consistency & security.
Monitoring and Observability
  • Design a scalable multi-cloud monitoring platform using Prometheus, Alertmanager, Grafana, and Opsgenie.
  • Implement Elasticsearch-based log aggregation.
  • Enable Jaeger-based distributed tracing.
Cost Optimization and Vendor Management
  • Achieved over 50% cost reduction, saving $2.1M annually.
  • Identify cost drivers & implement optimization techniques, such as right-sizing instances, reserved instances, and spot instances.
  • Negotiate pricing with vendors.
Compliance and Auditing
  • Lead engineering efforts for SOC2 & compliance audits.
  • Ensure Calico Cloud meets compliance requirements.
Argo · Google Cloud Platform (GCP) · SOC 2 · GitOps · Kubernetes · Prometheus.io · +6

Senior DevOps Engineer

Mar 2021 – Oct 2022 · 1 yr 7 mos

Linux · Argo · Grafana · Google Cloud Platform (GCP) · Infrastructure as code (IaC) · Computer Networking · +14

DevOps Engineer

Oct 2019 – Mar 2021 · 1 yr 5 mos

  • Tigera, the inventor and maintainer of open source Calico, delivers Calico Cloud, the next-generation cloud service for Kubernetes security and observability. Calico Cloud is offered both as a managed cloud service and a self-managed service in a private VPC. Tigera's Kubernetes native service extends the declarative nature of Kubernetes to specify "security and observability as code," which ensures consistent enforcement of security policies and compliance and provides observability and troubleshooting across multi-cluster, multi-cloud and hybrid deployments. Tigera's solution is used by some of the world's leading companies, including AT&T, Bloomberg, JP Morgan Chase, Morgan Stanley, Robinhood, ServiceNow, and Visa.
  • Project Calico is the most widely adopted solution for Kubernetes networking and security, powering 1M+ nodes daily across 166 countries. Calico is the only solution with a pluggable data plane architecture enabling support for multiple data planes, including standard Linux, eBPF, and Windows.
Linux · Argo · Grafana · Google Cloud Platform (GCP) · Infrastructure as code (IaC) · Computer Networking · +14

Scalefactor

Site Reliability Engineer

Jul 2019 – Oct 2019 · 3 mos · Vancouver, Canada Area · On-site

  • Managed AWS infrastructure using Terraform to ensure efficient and scalable resource provisioning and management.
  • Led containerization initiatives for both application deployment and QA, ensuring seamless integration and improved deployment efficiency.
  • Led the migration of applications from AWS Elastic Beanstalk to Kubernetes (Kops) for enhanced scalability, flexibility, and control.
  • Managed Jenkins Blue Ocean CI/CD pipelines, enabling continuous integration and delivery of software releases.
  • Implemented APM (Application Performance Monitoring) and infrastructure monitoring using Datadog to gain deep insights into application performance and infrastructure health.
  • Leveraged Datadog for efficient collection and analysis of infrastructure and application logs, enabling effective troubleshooting and log management.
Linux · Spinnaker · Argo · Networking · Grafana · Google Cloud Platform (GCP) · +17

Quid

DevOps Engineer

Feb 2019 – Jul 2019 · 5 mos · San Francisco Bay Area · Hybrid

  • Managed Kubernetes clusters on AWS using Kops, ensuring their smooth operation and optimal performance.
  • Incorporated best practices to ensure the security of the Kubernetes clusters and the workloads, safeguarding against potential vulnerabilities.
  • Led the initiative to migrate applications from DC/OS to Kubernetes, leveraging the benefits of Kubernetes for scalability and orchestration.
  • Took charge of architecting a CI/CD pipeline for Kubernetes workloads, utilizing GitHub, Travis CI, and Spinnaker to streamline the software delivery process.
  • Integrated monitoring and alerting systems with Datadog and PagerDuty, enabling proactive monitoring and timely incident response.
  • Integrated logging with the existing VM-based ELK (Elasticsearch, Logstash, Kibana) setup, facilitating centralized log management and analysis.
  • Actively participated in a 24x7 on-call rotation, ensuring the availability and reliability of the systems.
Linux · Spinnaker · Grafana · Google Cloud Platform (GCP) · Infrastructure as code (IaC) · Computer Networking · +12

Hike Messenger

DevOps Engineer

Feb 2018 – Mar 2019 · 1 yr 1 mo · Aerocity, New Delhi · On-site

  • Managed Google Cloud infrastructure, overseeing the provisioning and maintenance of resources.
  • Successfully managed and deployed GKE clusters, ensuring the availability and reliability of the Kubernetes infrastructure.
  • Orchestrated Java application containers and database StatefulSets on Kubernetes clusters, optimizing application performance and data persistence.
  • Scaled TensorFlow recommendation models using TensorFlow Serving and GKE, enabling efficient handling of high volumes of data and user requests.
  • Deployed Kubernetes Ops applications like Ingress Controllers, Monitoring (Prometheus-Alertmanager), and Logging (EFK) stacks, enhancing observability and operational efficiency.
  • Automated Kubernetes deployments using Spinnaker (CD) and Jenkins (CI) pipelines, streamlining the software release process.
  • Managed and optimized hosted data stores like Mongo, MySQL, and Cassandra, ensuring their performance and reliability.
  • Automated infrastructure tasks using Jenkins, improving operational efficiency and reducing manual effort.
  • Configured a highly available Chef setup with multiple backend, frontend, and workstation machines, enabling effective configuration management.
  • Managed infrastructure configuration using Chef, ensuring consistency and compliance across the environment.
  • Implemented Google authentication-enabled OpenVPN setup across multiple VPCs, ensuring secure access to the infrastructure.
  • Performed infrastructure monitoring using Nagios, Ganglia, and VictorOps, proactively identifying and resolving issues.
  • Actively participated in a 24x7 on-call rotation, promptly addressing and resolving critical incidents.
  • Managed and planned OS upgrades, ensuring the security and stability of the infrastructure.
Linux · Spinnaker · Grafana · Google Cloud Platform (GCP) · Infrastructure as code (IaC) · Computer Networking · +11

Broctagon Fintech Group

Cloud Engineer

Mar 2017 – Feb 2018 · 11 mos · Noida Area, India · On-site

  • Configured Java, Node, and PHP-Apache application Docker containers with sidecars, ensuring efficient and reliable containerization of applications.
  • Orchestrated containers using Kubernetes and ECS, effectively managing containerized workloads and ensuring scalability.
  • Deployed highly available (multi-master) Kubernetes clusters on AWS using Kops, enabling high resilience and availability of the infrastructure.
  • Automated Kubernetes cluster configuration using AWS OpsWorks (Chef), simplifying the deployment and management of the clusters.
  • Deployed stateful and stateless Docker services to the Kubernetes cluster, leveraging the benefits of container orchestration for both persistent and ephemeral workloads.
  • Configured an end-to-end application container deployment process, implementing rolling deploys for seamless and continuous software releases, with Jenkins as the orchestrator.
  • Automated the build and deploy pipeline for Docker on ECS using GitLab-CI, streamlining the software delivery process and ensuring consistency across environments.
  • Implemented code quality standards using the Code Climate CLI, enforcing code quality and best practices in the development workflow.
  • Managed DNS using Cloudflare, effectively configuring and managing DNS records for the infrastructure, ensuring reliable and efficient DNS resolution.
Linux · Grafana · Computer Networking · Shell Scripting · Cloud Computing · Continuous Integration and Continuous Delivery (CI/CD) · +8

Retention Science

DevOps Engineer

Aug 2015 – Mar 2017 · 1 yr 7 mos · Santa Monica · On-site

  • Utilized Chef and AWS OpsWorks for configuration management, ensuring consistency and efficiency in managing infrastructure resources.
  • Successfully migrated and upgraded existing AWS-based infrastructure, ensuring minimal disruption and improved performance.
  • Configured site-to-site VPN between AWS and Azure resources, establishing secure and reliable communication between the two cloud environments.
  • Implemented a centralized backup system using Bacula, ensuring data protection and recovery capabilities across the infrastructure.
  • Performed application-level monitoring using Nagios, proactively monitoring and managing the health and performance of critical applications.
  • Automated system-level tasks, streamlining operational processes and reducing manual effort.
  • Handled database administration tasks, including performance tuning, backup and recovery, and ensuring the security and availability of databases.
Linux · Grafana · Computer Networking · Shell Scripting · Cloud Computing · Continuous Integration and Continuous Delivery (CI/CD) · +8

Trackit

DevOps Engineer

May 2015 – Aug 2015 · 3 mos · Greater Los Angeles Area · On-site

  • Developed a comprehensive system monitoring solution for Linux, Windows, and macOS, providing real-time monitoring and alerting capabilities.
  • Administered Linux servers, ensuring their optimal performance, security, and availability.
  • Extensively worked on Docker, leveraging containerization technology for efficient deployment and management of applications.
  • Utilized SaltStack for configuration management, simplifying the management and orchestration of infrastructure resources.
  • Developed a Python-based Datamover plugin, enhancing data movement and processing capabilities within the system.
  • Created a Python and d3.js application, running inside a Docker container, to display a network map of a Datacenter, providing visual insights into the network architecture and topology.
Linux · Grafana · Bash · Shell Scripting · Cloud Computing · Jenkins · +4

Bharat Heavy Electricals Limited

Engineering Trainee

Jun 2012 – Jul 2012 · 1 mo · Haridwar Area, India

  • Trained in the telecom department, primarily on the operation, testing, and maintenance of electronic exchanges.
  • Studied the Main Distribution Frame in detail.
  • Analyzed various testing procedures and faults occurring in an exchange.

Servomax India

Engineering Trainee

Dec 2011 – Jan 2012 · 1 mo · New Delhi Area, India

  • Completed a short-term practical study of GSM-based BTS installation and commissioning at telecom tower sites.
  • Studied the latest NSN BTS solutions, such as FLEXI Hybrid and NODE-B (3G BTS, with and without feeder cable).
  • Gained on-site experience in swapping BTS units and dismantling various modules.

Bharti Airtel Limited

Engineering Trainee

Jun 2011 – Jul 2011 · 1 mo · New Delhi Area, India

  • Gained onsite training in the Network Operation & Maintenance of GSM Networks;
  • Worked in an MSC and learnt its various operations practically;
  • Adept with GSM Architecture and processes such as Call Flow, SMS Flow, Location Update and Signaling.

Education

University of Southern California

Master's degree, Electrical Engineering

Jan 2014 – Jan 2016

Maharaja Agrasen Institute Of Technology, Delhi

B.Tech, Electronics and Communication

Jan 2009 – Jan 2013

Sachdeva Public School

SSC

Jan 2008 – Jan 2009

Sachdeva Public School, Pitampura

HSC

Jan 2006 – Jan 2007
