Omar Baldonado — CTO
I manage the groups that develop and operate Meta's data center networks, which support Meta's family of apps (Meta AI, Facebook, Instagram, WhatsApp, Messenger) and our AI models. We have developed some of the largest AI clusters in the world (129K GPUs in 2024), with more gigawatt-scale clusters coming.

We are hiring! We are looking for ICs and managers across multiple groups (hardware/software/network engineers and TPMs). Our groups include:
* the overall network topologies & control stack;
* the networking switches/NOS (FBOSS);
* host-based networking (NICs, eBPF, transport/congestion control, RoCE, ...);
* AI-specific teams working on communication libraries and performance optimization.

Our work spans the entire network lifecycle:
* hardware/software/network engineering;
* topology design & capacity planning;
* distributed protocols & centralized control;
* provisioning & delivery workflows;
* monitoring, debugging, & analytics;
* performance benchmarking & tuning.

Some highlights of our work:
* Developing TorchComms as a new high-performance AI networking layer: https://pytorch.org/blog/torchcomms/
* Non-Scheduled Fabric (NSF) for AI scale-out and Ethernet for scale-up networking: https://engineering.fb.com/2025/10/13/data-infrastructure/ocp-summit-2025-the-open-future-of-networking-hardware-for-ai/
* Building large-scale (100K+ GPU) AI clusters based on RoCE: https://engineering.fb.com/2024/08/05/data-center-engineering/roce-network-distributed-ai-training-at-scale/
* Disaggregated Scheduled Fabric (DSF) for AI clusters, and FBNIC, our first network ASIC: https://engineering.fb.com/2024/10/15/data-infrastructure/open-future-networking-hardware-ai-ocp-2024-meta/
* NetEdit, an orchestration platform for eBPF network functions at scale: https://dl.acm.org/doi/10.1145/3651890.3672227
* DCTCP at scale: https://www.usenix.org/conference/nsdi24/presentation/dhamija
* FBOSS, our Network Operating System (NOS) for our DC network switches: https://engineering.fb.com/2019/03/14/data-center-engineering/f16-minipack/ and https://engineering.fb.com/2021/11/09/data-center-engineering/ocp-summit-2021/
* BGP and Open/R routing protocols in our DCs and WAN: https://research.fb.com/publications/running-bgp-in-data-centers-at-scale/

Learn more about AI networking through the Networking @Scale conferences that we host:
* https://engineering.fb.com/2025/09/26/networking-traffic/networking-at-the-heart-of-ai-scale-networking-2025-recap/
* https://atscaleconference.com/events/networking-scale-2024/
* https://atscaleconference.com/events/networking-scale-2023/
Location: Palo Alto, California, United States
Experience: 31 yrs 11 mos
Skills
- Data Center
- Networking
- Product Management
Career Highlights
- Expert in managing large-scale AI networking projects
- Pioneered open-source networking hardware initiatives
- Led development of high-performance AI clusters
Work Experience
Meta
Senior Director, Data Center & AI Networking (7 yrs 1 mo)
Director, DC Networking / Net Systems (5 yrs 7 mos)
Open Compute Project
OCP Networking Project Co-Lead (8 yrs)
Big Switch Networks
Head of Product Management (2 yrs 7 mos)
Social startup
Founder (1 yr)
Avaya
Director of System Management/Assured Networks (4 yrs)
Routescience Technologies
R&D Director (4 yrs)
Cisco Systems
Engineering Manager (4 yrs)
NETSYS Technologies
Senior Software Engineer (1 yr 8 mos)
Make Systems
Software Engineer (2 yrs 8 mos)
Education
BS at Stanford University