Atul J.

Software Engineer

New York, New York, United States · 13 yrs 8 mos experience
Most Likely To Switch · Highly Stable

Key Highlights

  • Led AI infrastructure initiatives at Meta.
  • Developed scalable machine learning systems.
  • Improved GPU reliability and efficiency.
Stackforce AI infers this person is a Backend-focused Software Engineer with expertise in Machine Learning and Infrastructure Engineering.

Skills

Core Skills

Machine Learning · Infrastructure Engineering · Backend Development · Mobile Development

Other Skills

APIs · Acoustic Guitar · Algorithms · Algorithms and optimisations · Android · Android Development · Apache Storm · Big Data · C · C++ · Cloud Computing · Computer Architecture · Computer Science · Core Systems · Data Structures

About

Lead in AI Training, focused on reliability and efficiency, at Meta's AI Training Infrastructure organization.

Experience

Meta

Software Engineer

Oct 2016 - Present · 9 yrs 5 mos · New York, United States · On-site

  • 2025 - Present:
  • Technical Leader for Ads ML Infrastructure, powering Ads and Product Ranking at Meta's scale.
  • 2020 - 2024:
  • Technical lead for ML training in Meta's AI Infra org.
  • Dedicated to building highly reliable and efficient Machine Learning systems, with a focus on proactively monitored infrastructure. This enables machine learning engineers to concentrate on advancing the state of the art and creating AI-powered experiences that delight end users.
  • My work spans multiple layers of AI Infrastructure, from hardware deployment and PyTorch core development to front-end ML authoring and evaluation frameworks. I specialize in developing fault-tolerant and elastic distributed training systems, enabling production jobs to scale across thousands of machines.
  • Spearheaded an organization-wide cultural shift towards enhancing the reliability and efficiency of GPUs, fostering an operational mindset across teams.
  • 2016 - 2020:
  • Systems Engineer Lead focused on the performance, reliability, and efficiency of our mobile applications, tackling problems such as cold start time, scroll performance, battery drain, and on-device storage through proactive monitoring and data-driven architectural changes.
C++ · Python · Machine Learning · Distributed Systems · Fault-tolerant Systems · GPU Reliability · +2

Right relevance, inc

Software Engineer

Jun 2015 - Sep 2016 · 1 yr 3 mos

  • Handled the backend: search, API servers, APIs, backend services, and databases.
Backend Development · APIs · Databases

Facebook

Software Engineer Intern

May 2014 - Aug 2014 · 3 mos · New York, United States · On-site

  • Worked in Core Systems to develop on-device storage for all of Facebook's mobile applications.
Core Systems · Mobile Applications · Mobile Development

Mozilla

Contributor

Jun 2012 - Feb 2014 · 1 yr 8 mos · San Francisco Bay Area

  • Contributed to various projects, such as the Firefox download module and Filelink in Instantbird and Thunderbird.
Firefox · Thunderbird

Google

Google Summer of Code with Mozilla

May 2012 - Aug 2012 · 3 mos · Home

  • Worked with Mozilla to improve the storage mechanism of Thunderbird, resulting in a 97x reduction in storage space and support for email threading.
Storage Mechanism · Email Support

Thewittyshit

Business Development Associate

Feb 2011 - May 2012 · 1 yr 3 mos · Greater Delhi Area

Education

Indian Institute of Technology, Delhi

Master's degree — Computer Science and Engineering

Jan 2010 - Jan 2015

Indian Institute of Technology, Delhi

Bachelor's degree — Computer Science and Engineering

Jan 2010 - Jan 2015
