Omar Baldonado

CTO

Palo Alto, California, United States · 31 yrs 11 mos experience
Most Likely To Switch: Highly Stable

Key Highlights

  • Expert in managing large-scale AI networking projects
  • Pioneered open-source networking hardware initiatives
  • Led development of high-performance AI clusters
Stackforce AI infers this person is a leader in AI networking and data center infrastructure.

Skills

Core Skills

Data Center, Networking, Product Management

Other Skills

Cloud Computing, AI Clusters, Performance Optimization, Open Source, Disaggregation, Software-Defined Networking, OpenFlow, Business Development, Social Commerce, VoIP, Product Development, Traffic Optimization, Real-time Networking, Routing, Network Configuration

About

I manage the groups that develop and operate Meta's data center network, which supports Meta's family of apps (Meta AI, Facebook, Instagram, WhatsApp, Messenger) and our AI models. We have developed some of the largest AI clusters in the world (129K GPUs in 2024), with more gigawatt-scale clusters coming.

We are hiring! We are looking for ICs and managers for multiple groups (hardware/software/network engineers and TPMs).

Our groups include:
  • the overall network topologies & control stack
  • the networking switches/NOS (FBOSS)
  • host-based networking (NICs, eBPF, transport/congestion control, RoCE...)
  • AI-specific teams working on communication libraries and performance optimization

Our work spans the entire network lifecycle:
  • hardware/software/network engineering
  • topology design & capacity planning
  • distributed protocols & centralized control
  • provisioning & delivery workflows
  • monitoring, debugging, & analytics
  • performance benchmarking & tuning

Some highlights of our work:
  • Developing TorchComms as a new high-performance AI networking layer: https://pytorch.org/blog/torchcomms/
  • Non-Scheduled Fabric (NSF) for AI scale-out and Ethernet for Scale-Up Networking: https://engineering.fb.com/2025/10/13/data-infrastructure/ocp-summit-2025-the-open-future-of-networking-hardware-for-ai/
  • Building large-scale (100K+ GPUs) AI clusters based on RoCE: https://engineering.fb.com/2024/08/05/data-center-engineering/roce-network-distributed-ai-training-at-scale/
  • Distributed Scheduled Fabric (DSF) for AI clusters; and our first network ASIC for our FBNIC: https://engineering.fb.com/2024/10/15/data-infrastructure/open-future-networking-hardware-ai-ocp-2024-meta/
  • NetEdit: An Orchestration Platform for eBPF Network Functions at Scale: https://dl.acm.org/doi/10.1145/3651890.3672227
  • DCTCP at scale: https://www.usenix.org/conference/nsdi24/presentation/dhamija
  • FBOSS, our Network Operating System (NOS) for our DC network switches: https://engineering.fb.com/2019/03/14/data-center-engineering/f16-minipack/ and https://engineering.fb.com/2021/11/09/data-center-engineering/ocp-summit-2021/
  • BGP and Open/R routing protocols in our DCs and WAN: https://research.fb.com/publications/running-bgp-in-data-centers-at-scale/

Learn more about AI networking through the Networking @Scale conferences that we host:
  • https://engineering.fb.com/2025/09/26/networking-traffic/networking-at-the-heart-of-ai-scale-networking-2025-recap/
  • https://atscaleconference.com/events/networking-scale-2024/
  • https://atscaleconference.com/events/networking-scale-2023/

Experience

Meta

2 roles

Senior Director, Data Center & AI Networking

Promoted

Feb 2019 - Present · 7 yrs 1 mo

Data Center, Cloud Computing, Networking, AI Clusters, Performance Optimization

Director, DC Networking / Net Systems

Jun 2013 - Jan 2019 · 5 yrs 7 mos

Open Compute Project

OCP Networking Project Co-Lead

Aug 2014 - Aug 2022 · 8 yrs · Menlo Park

  • The Open Compute Project brings the concepts of open-source to the hardware industry. There are many projects in OCP, ranging from networking to data center design/mgmt, to servers and storage, to HPC and HW management: http://www.opencompute.org/
  • The OCP Networking Project focuses on the disaggregation and openness of network hardware and software. Meta helped form the networking project, and we have contributed our switch designs to OCP since 2015.
Open Source, Networking, Disaggregation

Big Switch Networks

Head of Product Management

Oct 2010 - May 2013 · 2 yrs 7 mos

  • Helped define, develop, and drive the product from inception through company and product launch in the then-new space of Software-Defined Networking and OpenFlow, working with early customers to define the initial applications of network virtualization and scale-out monitoring
  • Part of initial engineering team that developed the first versions of the company's OpenFlow/SDN controller and managed early deployments
Product Management, Software-Defined Networking, OpenFlow

Social startup

Founder

Jan 2009 - Jan 2010 · 1 yr

  • Started company to enable small businesses to promote deals and use social commerce (early Facebook and iPhone apps). Prepared business plans, secured angel funding, worked with partners.
Business Development, Social Commerce

Avaya

Director of System Management/Assured Networks

Jan 2004 - Jan 2008 · 4 yrs

  • Brought real-time optimization to enterprises for apps like VoIP and videoconferencing
  • Led RouteScience product development and integration of company into Avaya. Worked cross-functionally to target RouteScience product at Avaya's communication applications (VoIP, videoconferencing, contact centers)
  • Led global team (multiple US locations, Germany, Israel, India) of over 100 engineers supporting enterprise communication management systems.
VoIP, Product Development, Product Management

RouteScience Technologies

R&D Director

Jan 2000 - Jan 2004 · 4 yrs

  • Rerouted traffic in real time to optimize performance, bandwidth, and cost
  • Part of initial startup team that designed/built multiple generations of application performance optimization devices. Developed fundamental algorithms and software for highly scalable/accurate Internet measurements and IP routing optimization with sub-second response times
  • Validated findings with Stanford researchers and major enterprises/service providers with global data center infrastructure
  • Multiple real-time networking measurement and optimization patents
Traffic Optimization, Real-time Networking, Networking

Cisco Systems

Engineering Manager

Jan 1996 - Jan 2000 · 4 yrs

  • Led Windows-based Baseliner product (based on technology from NETSYS acquisition) to analyze router misconfigurations
  • Led cross-Cisco user interface group to develop common Java libraries for topology displays, reporting, wizards; developed UI for MPLS VPN provisioning system.
Routing, Network Configuration, Networking

NETSYS Technologies

Senior Software Engineer

Mar 1995 - Nov 1996 · 1 yr 8 mos

  • Built IP traffic collection system for simulation/analysis
  • Worked with Cisco on initial NetFlow spec & with RMON vendors to baseline enterprise traffic as input to routing simulation engines
IP Traffic Collection, Routing Simulation

Make Systems

Software Engineer

Jun 1992 - Feb 1995 · 2 yrs 8 mos

  • Developed routing simulation/design modules for TDM switches & traffic matrix collection for IP networks
Routing Simulation, Traffic Matrix Collection

Education

Stanford University

BS

Jan 1987 - Jan 1992
