Oana Platon

Senior Software Engineer

Redmond, Washington, United States · 20 yrs 3 mos experience
AI Enabled · Highly Stable

Key Highlights

  • Expert in Machine Learning Infrastructure at Meta.
  • Proven track record in scaling distributed systems.
  • Strong leadership in technical strategy and team development.
Stackforce AI infers this person is a Cloud Computing and Machine Learning Infrastructure expert with a focus on scalable solutions.

Skills

Core Skills

Machine Learning · Infrastructure · Database Management · Cloud Computing · Service Management · Web Development · Quality Assurance

Other Skills

Machine Learning Infrastructure · AI technology · Capacity optimization · Governance excellence · System reliability · Model Training systems · Feature Platforms · E2E Reliability · ML Training Stack · Distributed systems · High performance Infrastructure · Azure Cosmos DB · Open source analytics tools · Big data · Database service

About

Senior Engineering Manager with a passion for distributed computing and for growing people and enabling them to do their best work. Currently working on Machine Learning Infrastructure at #Facebook.

Experience

20 yrs 3 mos
Total Experience
7 yrs 2 mos
Average Tenure
5 yrs 11 mos
Current Experience

Meta

3 roles

Senior Software Engineer Manager

Promoted

Sep 2024 - Present · 1 yr 8 mos

  • As a Senior Engineering Manager at Meta, I empower teams to push the boundaries of what's possible in AI and Ads technology.
  • My teams support initiatives that transform how we build and scale Ads ML Infrastructure—from cutting-edge Model Training systems to robust Feature Platforms that power experiences for billions. My teams architect solutions that redefine capacity optimization, governance excellence, and system reliability at unprecedented scale.
  • What drives me: The intersection of breakthrough technology and exceptional people. I'm deeply invested in shaping technical strategy that moves the industry forward, forging cross-functional partnerships that amplify our impact, and cultivating teams where engineers thrive and grow into their full potential.
  • My goal: Infrastructure that doesn't just support Meta's AI ambitions—it accelerates them.
Machine Learning Infrastructure · AI technology · Capacity optimization · Governance excellence · System reliability · Machine Learning

Software Engineering Manager

Jun 2020 - Present · 5 yrs 11 mos

  • My team is responsible for E2E Reliability for the ML Training Stack. The team is part of AI Infra, which builds the distributed, large scale Machine Learning Infrastructure @ Meta.
  • We work closely with our next gen Machine Learning Framework group to develop scalable, high performance Infrastructure solutions to scale Deep Learning, General ML training, and Inference at Meta Scale.
E2E Reliability · ML Training Stack · Distributed systems · High performance Infrastructure · Machine Learning · Infrastructure

Senior Software Engineer Manager

Jun 2020 - Present · 5 yrs 11 mos

Microsoft

3 roles

Software Design Engineer

Mar 2019 - Jun 2020 · 1 yr 3 mos

  • Working on Azure Cosmos DB, Microsoft's globally distributed, multi-model database service. With a click of a button, Cosmos DB enables you to elastically and independently scale throughput and storage across any number of Azure regions worldwide. #cosmosdb
  • Part of a new team that brings open source analytics tools to Cosmos DB in a managed and integrated fashion. #bigdata
Azure Cosmos DB · Open source analytics tools · Big data · Database Management · Cloud Computing

Software Design Engineer

Promoted

Nov 2008 - Mar 2019 · 10 yrs 4 mos

  • Technical leader for Azure Service Fabric, the platform for writing and managing reliable, scalable services both on premises and in any cloud. Worked with my team on the Service Fabric system services, which provide the core functionality of the cluster.
Azure Service Fabric · Reliable services · Scalable services · Cloud Computing · Service Management

Software Design Engineer

Oct 2006 - Nov 2008 · 2 yrs 1 mo

  • I worked on the ASP.NET stress team, making sure the web platform was reliable and scalable. The stress program ran a variety of tests over long periods, keeping CPU utilization above 70% throughout. WinDbg (Debugging Tools for Windows) was our best friend when investigating failures.
ASP.NET · WinDbg · Stress testing · Web Development · Quality Assurance

Politehnica University of Bucharest

2 roles

Teaching assistant

Mar 2005 - Jul 2005 · 4 mos

  • Teaching assistant for the Functional Programming course.

Teaching assistant

Mar 2004 - Jul 2004 · 4 mos

  • Teaching assistant for the Numerical Calculus course.

Education

University POLITEHNICA of Bucharest

BS — Computer Science

Jan 2000 - Jan 2005

“I.L. Caragiale” National College, Ploiesti

Informatics

Jan 1996 - Jan 2000
