Harsha Patil

VP of Engineering

San Francisco, California, United States · 8 yrs 1 mo experience

Highly Stable

Key Highlights

Led global teams managing large-scale Kubernetes clusters.
Achieved significant cost savings through resource optimization.
Passionate mentor fostering high-performance engineering culture.

Stackforce AI infers this person is a SaaS Infrastructure Engineer with expertise in Kubernetes and cloud optimization.

Contact

hpatil@wayfair.com LinkedIn

Skills

Core Skills

Kubernetes, Engineering Leadership, Site Reliability Engineering

Other Skills

Artifactory, Automation, Batch Processing, Computer Network Operations, Cost Optimization, Cost Tracking, Critical Thinking, Datadog, DevOps, Docker, GKE, Git, Google Cloud Platform (GCP), Grafana, HTML

About

Engineering leader with expertise in Kubernetes, service mesh, cloud infrastructure, and compute platforms, driving scalability, cost efficiency, and innovation in high-traffic environments. Experienced in architecting resilient systems, optimizing platform performance, and leading capacity planning for major events. Proven ability to scale teams, hiring and mentoring engineers across levels while fostering a high-performance, psychologically safe, and collaborative culture. Skilled in engaging with cross-functional stakeholders, including business leaders, product managers, and engineering teams, to align infrastructure strategy with business goals and ensure seamless execution of key initiatives. Passionate about mentorship, operational excellence, and driving meaningful impact through technology leadership.

Experience

8 yrs 1 mo

Total Experience

1 yr 4 mos

Average Tenure

10 mos

Current Experience

Salesforce

Senior Software Engineering Manager

Jul 2025 – Present · 10 mos

Wayfair

3 roles

Senior Engineering Manager, Compute and Mesh Platforms

Promoted

Nov 2022 – Jul 2025 · 2 yrs 8 mos

> Led a global team of 11 engineers, driving strategy and execution for 10 Kubernetes clusters
(500+ nodes each) across GKE, Service Mesh (Istio), and Artifactory, supporting 2,000+
engineers.
> Scaled the team from 3 to 11 engineers, hiring staff, senior, and mid-level engineers while
collaborating with Talent/HR on hiring, compensation, and retention.
> Partnered with business leaders, product managers, and engineering teams to align infrastructure
strategy with business goals and technical requirements.
> Led transition to domain-based architecture, improving isolation, cost tracking, and system
reliability.
> Led GKE Autopilot, Service Mesh, and Google Artifact Registry (GAR) adoption, cutting
compute costs by 40% through pod-hour-based billing.
> Optimized compute reservations, vendor contracts, and software spend governance, reducing costs by another 40%.
> Increased Kubernetes efficiency by 30%, reducing resource waste via auto-scaling,
right-sizing, and workload bin-packing.
> Established SLOs/SLIs, reducing Kubernetes-related incidents by 40% and improving MTTR
by 25%.
> Conducted performance evaluations, led calibration sessions, mentored engineers, and drove
career development.
> Drove OKR-based planning, ensuring technical initiatives aligned with business priorities for
high-impact execution.

Kubernetes, Service Mesh, GKE, Team Management, Cost Optimization, Performance Management

Senior Site Reliability Engineer, Kubernetes

Promoted

Mar 2022 – Feb 2023 · 11 mos

> Developed an on-demand scaling application for Kubernetes, ensuring seamless user experience during site failovers and achieving $1M cost savings through optimized resource utilization. This is currently in preparation for open-source release.
> Led capacity planning for Google Kubernetes Engine (GKE) during major events, ensuring optimal performance and cost efficiency by accurately forecasting resource needs and provisioning additional resources as required.
> Built and maintained three production-level Kubernetes clusters for development environments, ensuring high availability and optimal performance for various projects.
> Successfully built and deployed a Kubernetes cluster that is currently serving production traffic, meeting the organization's stringent performance and reliability requirements.
> Implemented Service Level Objectives (SLOs) and Service Level Agreements (SLAs) for the on-demand scaling application, Prescalr, ensuring consistent service quality and uptime.
> Actively engaged in high-level incident troubleshooting and resolution, leading incident
response efforts and minimizing system downtime.
> Conducted postmortems for incidents, analyzing root causes and implementing preventive
measures to enhance system stability and resilience.

Kubernetes, GKE, SLOs, Cost Optimization, Incident Management, Site Reliability Engineering

Site Reliability Engineer, Kubernetes

Aug 2020 – Mar 2022 · 1 yr 7 mos

Wheedle

Software Engineer, Cloud

Jul 2018 – Oct 2018 · 3 mos · Cleveland, Ohio

Flashstarts, Inc.

Software Developer Intern

Jan 2018 – Jul 2018 · 6 mos · Greater Cleveland

> Built and maintained a website for our client using HTML and CSS.
> Built a Chrome extension using HTML and JavaScript to launch workspaces.
> Worked with a client to implement the website using PHP and Laravel.
> Designed an interface on a website to retrieve all subscribers from a Mailchimp list, using C# and Visual Studio.
> Performed database maintenance and analysis.

Cleveland State University

Graduate Student Assistant

Jan 2018 – May 2018 · 4 mos · Greater Cleveland

Installed and configured Linux servers in various environments.
Installed the latest packages on the servers for end users and upgraded existing packages.
Working knowledge of NFS, FTP, DNS, DHCP, LDAP, SAN, TCP/IP, and Active Directory.

Cleveland State University Student Government Association

Graduate Senator

Jul 2017 – May 2018 · 10 mos · United States

Junkepool.com

DevOps Engineer

Jun 2016 – Jan 2018 · 1 yr 7 mos · Bengaluru, Karnataka, India

Built storage and backups on AWS S3 and managed bucket policies.
Enabled versioning on S3 objects and applied lifecycle policies to archive files to Glacier.
Managed custom AMIs, created AMI tags, and modified AMI permissions.
Managed Ansible templates using Jinja.
Set up continuous golden AMI vulnerability assessment with Amazon Inspector.
Used Chef server and workstation to manage and configure nodes.
Created Amazon VPCs with strict security rules according to client needs.
Created users and groups using Identity and Access Management (IAM) and assigned policies to strengthen login authentication.
Worked across the Software Development Life Cycle (SDLC) on Windows and Linux platforms.
Planned, deployed, monitored, and maintained the AWS stack (including EC2, Route 53, S3, RDS, DynamoDB, Direct Connect, API Gateway, SNS, Service Catalog, SQS, EMR, IAM, and Lambda) and virtual machines as required in the environment.
Worked with version control tools such as SVN and Git.
Configured the Ansible playbooks with Ansible Tower and wrote playbooks using YAML.
Wrote Ansible playbooks to manage configurations of AWS nodes and tested playbooks on AWS
instances using Python. Ran Ansible scripts to provision dev servers.
Administered Jenkins continuous integration server installation and configuration to automate
application packaging and deployments.
Integrated Jira with Git to help the change management process run smoothly.
Created parent-child relationships between projects to manage Maven project dependencies.
Automated the continuous integration and deployments using Jenkins, Docker and AWS Cloud