Austin Dear

Software Engineer

San Francisco, California, United States
14 yrs 4 mos experience
Highly Stable

Key Highlights

  • Led a team of 180 engineers at Meta.
  • Developed Meta's cluster management system.
  • Pioneered hiring initiatives for diverse backgrounds.
Stackforce AI infers this person is a high-level infrastructure leader in the tech industry.

Skills

Core Skills

Production Engineering, Site Reliability Engineering

Other Skills

Infrastructure Management, Cluster Management, Containerization, Software Development, Team Leadership, Reliability Engineering, Incident Management, Service Migration, Capacity Planning, Disaster Recovery, Monitoring Framework Development, Technical Support, Network Management, Web Design

About

With over 10 years of experience in Production Engineering and Site Reliability Engineering, I am a visionary and strategic technology leader who drives innovation and efficiency through strategic planning, team building, and implementing cutting-edge technologies at hyperscale. As a Director of Production Engineering at Meta, I led an organization of 180 engineers and managers responsible for enabling Meta's infrastructure to support the growth and diversity of products such as Facebook, Instagram, WhatsApp, and Threads. I supported multiple teams and stakeholders in building Meta's cluster management system, Tupperware/Twine, which is responsible for container allocation and lifecycle management for Meta’s tens of thousands of backend services. I also developed and ran the software responsible for the infrastructure lifecycle operations of Meta's expansive fleet, which consisted of over ten million hosts across 20+ datacenters. I am passionate about improving system reliability, automating processes, and enhancing monitoring capabilities. I thrive in fast-paced and dynamic environments and enjoy collaborating with diverse and talented teams.

Experience

Total Experience: 14 yrs 4 mos
Average Tenure: 4 yrs 8 mos
Current Experience: 2 mos

Anthropic

Member of Technical Staff

Feb 2026 - Present · 2 mos · San Francisco, CA

Meta

4 roles

Engineering Director

Sep 2024 - Feb 2026 · 1 yr 5 mos · Menlo Park, California, United States · On-site

  • I supported the AI Training Systems team at Meta. We built the scheduling, orchestration, and data processing systems that power some of the largest-scale AI training in the world. We were responsible for building reliable infrastructure, tools, and services used every day by ML engineers and researchers across the company.

Director, Production Engineering

Promoted

Feb 2021 - Jun 2023 · 2 yrs 4 mos

  • Led an organization of 180 engineers and managers responsible for enabling Meta's infrastructure to support the growth and diversity of products such as Facebook, Instagram, WhatsApp, and Threads.
  • I supported multiple teams and stakeholders building Meta's cluster management system, Tupperware/Twine. Tupperware is Meta's counterpart to Kubernetes/Mesos and is responsible for container allocation and lifecycle management for Meta’s tens of thousands of backend services.
  • The team developed and ran the software responsible for the infrastructure lifecycle operations (provisioning, configuration, maintenance, etc.) of Meta's expansive fleet, which consisted of over ten million hosts across 20+ datacenters.
  • The team also owned the operating system, packaging, deployment, and host configuration systems for all Meta services.
  • Pioneered a program to hire candidates from nontraditional backgrounds (e.g., self-taught individuals, coding boot camp graduates, career re-entrants) into a year-long immersion program.
Infrastructure Management, Cluster Management, Containerization, Software Development, Team Leadership, Production Engineering, +1 more

Production Engineering Manager

Promoted

Jan 2015 - Feb 2021 · 6 yrs 1 mo

  • Established and led new PE/SRE teams within the Online Data organization, which housed Meta’s backend data stores and caches, each handling billions of requests per second.
  • Led the Production Engineering/SRE teams for TAO (Graph write-through cache), Memcache, ZippyDB (Distributed key/value store), and graph indexing systems.
  • I drove a Reliability initiative that significantly improved the reliability of Meta's online data systems. I led the weekly incident review, participated in an incident manager on-call rotation, and facilitated cross-functional collaboration to improve system reliability.
  • Orchestrated backend service migration from bare metal to containerized environments.
Team Leadership, Reliability Engineering, Incident Management, Service Migration, Production Engineering, Site Reliability Engineering

Production Engineer

Sep 2012 - Jan 2015 · 2 yrs 4 mos

  • Designed and implemented a datacenter maintenance orchestration system, automating the process of shifting backend service capacity between datacenters.
  • Led capacity and disaster recovery planning for Facebook's distributed data caching layer (TAO and Memcache), including the creation and implementation of operational tools.
  • Developed a reusable monitoring framework to standardize alerting and remediation procedures across multiple backend data and cache services.
  • Engineered an on-call paging system that efficiently looped through on-call personnel, automating call and escalation procedures. This system was deployed company-wide and enabled the phase-out of a 24x7 operations team.
Capacity Planning, Disaster Recovery, Monitoring Framework Development, Production Engineering

Rankin County School District

System Administrator

Aug 2011 - Sep 2012 · 1 yr 1 mo · Brandon, MS

  • Supported data center technical services comprising local and wide area networks, email delivery systems, data warehouses, directory services, web infrastructure, and over 200 servers. Developed enterprise-class systems for enhanced service delivery. Supported high-complexity network backbones in nearly 30 schools across the Jackson, MS Metro Area.
Technical Support, Network Management

Mississippi State University

Software Developer

Jan 2009 - Dec 2009 · 11 mos · Tuscaloosa, Alabama Area

  • Developed mission-critical internal applications for use by University health officials. Designed websites for University Health Services and the Longest Student Health Center.
Software Development, Web Design

Education

Mississippi State University

BBA — Business Information Systems

Jan 2007 - Jan 2011
