Sanket Patel

DevOps Engineer

Seattle, Washington, United States · 10 yrs 5 mos experience
Most Likely To Switch · Highly Stable

Key Highlights

  • Expert in large-scale datacenter capacity management.
  • Proven track record in cloud infrastructure and automation.
  • Strong background in Site Reliability Engineering.
Stackforce AI infers this person is a Cloud Infrastructure and Site Reliability Engineering expert.

Skills

Core Skills

Cloud Computing · Site Reliability Engineering

Other Skills

Python · Linux · Hadoop · DevOps · Machine Learning · Capacity Engineering · Operational Metadata · Monitoring · Incident Management · Automation · HBase · Backup Systems · Web Development · Data Structures · Computer Networking

About

man page: https://man.sanket.plus · home page: https://sanket.plus

Full-stack generalist engineer specializing in large-scale datacenter capacity management. Experienced with cloud, metrics and monitoring, the Hadoop stack, and Linux-based operating systems.

Experience

Total Experience: 10 yrs 5 mos
Average Tenure: 3 yrs 5 mos
Current Experience: 4 yrs 7 mos

Meta

Production Engineer

Nov 2021 - Present · 4 yrs 7 mos · Seattle, Washington, United States · On-site

  • Building tools to operate gigawatt-scale capacity allocation and accounting for all teams within Meta.
  • Built a tool for teams to buy, sell, and trade infrastructure resources (compute, databases, storage, etc.) with one another; it has logged transactions covering more than $1B worth of resources.
  • Building and operating the multi-billion-dollar capacity accounting service and automating workflows on top of it (e.g., a compute "transfer" workflow between the Facebook and WhatsApp teams).
Python · Linux · Cloud Computing · Hadoop · Site Reliability Engineering · DevOps

LinkedIn

2 roles

Senior Site Reliability Engineer

Promoted

Jul 2020 - Nov 2021 · 1 yr 4 mos

  • Worked with the Capacity Engineering team, which owns and develops tools that:
    1. Stress-test services on live traffic to determine their capacity and help owners right-size them.
    2. Help identify capacity bottlenecks using machine learning models based on service latency.
    3. Store operational metadata that helps correlate issues with relevant events.
Machine Learning · Capacity Engineering · Operational Metadata · Site Reliability Engineering · Cloud Computing

Site Reliability Engineer

May 2018 - Jul 2020 · 2 yrs 2 mos

Directi

2 roles

Operations Engineer

Jun 2016 - May 2018 · 1 yr 11 mos

  • Member of the Platform and Production Engineering team, which owns the infrastructure. Worked on or was responsible for the following aspects of the infrastructure:
  • Commissioning, managing, and tuning the Hadoop/HBase cluster.
  • Setting up monitoring for infrastructure and services.
  • Setting up auto-scaling and DoS protection for client-facing services.
  • Collecting system, service, and JVM metrics.
  • Incident management system (a home-grown PagerDuty equivalent).
  • Automation using Jenkins and AWS Lambda, and configuration management using SaltStack.
Hadoop · Monitoring · Incident Management · Automation · Cloud Computing · Site Reliability Engineering

Intern

Jan 2016 - Jun 2016 · 5 mos

  • Worked on a business continuity project that keeps the production HBase cluster backed up and in sync with a standby cluster. The backup runs once a day, and the standby cluster stays shut down for the rest of the day, so data is backed up regularly while infrastructure cost stays minimal.
HBase · Backup Systems · Cloud Computing

Education

Nirma University

Bachelor of Technology (BTech) — Information Technology

Jan 2012 - Jan 2016

V D Desai High School

Jan 2010 - Jan 2012