Siddharth Choudhary

AI Researcher

Dublin, California, United States · 16 yrs 6 mos experience

Key Highlights

  • Tech lead for Amazon Nova family of foundation models.
  • Published at top-tier venues with over 1,200 citations.
  • Innovated multimodal AI systems integrating vision, speech, and language.

Skills

Core Skills

Multimodal AI · Foundation Models · Computer Vision · Machine Learning

Other Skills

Artificial Intelligence · Augmented Reality (AR) · C · C++ · CUDA · Data Structures · Deep Learning · Generative Pre-Training · High Performance Computing · Image Processing · Large Language Models · MATLAB · OpenCV · OpenGL · Pattern Recognition

About

I'm a Principal Applied Scientist at Amazon AGI, where I'm a tech lead on the multimodal pre-training team for the Amazon Nova family of foundation models. My work focuses on developing next-generation AI systems that seamlessly integrate vision, speech, and language understanding with generative capabilities. Currently, I'm the tech lead for the next-generation Amazon Nova, architecting unified multimodal models that achieve state-of-the-art performance across understanding and generation tasks.

My research spans the full spectrum of AI and computer vision, from foundation models and vision-language systems to robotics and SLAM. I've published at top-tier venues including CVPR, IJRR, ICRA, and Nature Digital Medicine, with over 1,200 citations and an h-index of 14. My work on multimodal hallucination control was featured at CVPR 2024 and highlighted in AWS's keynote presentation.

Before Amazon, I was a Principal Computer Vision Engineer at Magic Leap, where I architected the 3D object recognition system deployed in Magic Leap One. I earned my Ph.D. in Computer Science from Georgia Tech, focusing on distributed algorithms for multi-robot SLAM systems. I'm passionate about pushing the boundaries of multimodal LLMs and building systems that bridge the gap between understanding and generation.

Check out https://itzsid.github.io/ for up-to-date information.

Experience

Amazon AGI

2 roles

Principal Applied Scientist

Promoted

Apr 2025 – Present · 11 mos · San Francisco Bay Area · On-site

  • Amazon Nova 2.0 Lite: Tech-lead responsible for designing the multimodal architecture and finalizing the pretraining data mix and recipe integrating vision, speech and language understanding capabilities at scale. Launched at AWS re:Invent 2025. Achieves state-of-the-art performance on 13 out of 15 benchmarks vs Claude Haiku 4.5.
  • Amazon Nova 2.0 Omni: Led development of a unified multimodal architecture integrating native image generation with text, vision, and speech understanding. Designed a pre-training recipe that enables generative capabilities while preserving performance across all input modalities. Launched at AWS re:Invent 2025.
Multimodal AI · Foundation Models · Computer Vision · Large Language Models · Pre-training · Generative Pre-Training +2

Senior Applied Scientist

Jan 2024 – Apr 2025 · 1 yr 3 mos · San Francisco Bay Area · On-site

  • Amazon Nova 1.0 Foundation Models: Tech lead on the multimodal pre-training team responsible for developing image/video pre-training recipes for the Amazon Nova foundation models launched at AWS re:Invent 2024. Achieved state-of-the-art performance on multimodal understanding benchmarks against Claude 3 Haiku and Gemini 1.5 Pro.

Amazon Web Services (AWS)

Senior Applied Scientist

May 2023 – Apr 2025 · 1 yr 11 mos · San Francisco, California, United States · Hybrid

  • Worked on multimodal foundation models research and development, focusing on vision-language model architectures and training methodologies.
  • Developed the M3ID sampling method, reducing VLM hallucinations by up to 28% without additional training. Published at CVPR 2024 and featured in the AWS keynote at CVPR.

Amazon Lab126

Senior Applied Scientist

Aug 2020 – May 2023 · 2 yrs 9 mos · Sunnyvale, California, United States · Hybrid

  • Halo Body: Led research and development to release waist-hip ratio (WHR) and body circumference measurements as part of Body Scan on Amazon Halo; WHR is a better indicator of health risks than BMI or body fat percentage.
  • Trained an ML model twice as accurate as state-of-the-art models for estimating body circumference. The model is trained using only synthetic data but generalizes well when tested on real smartphone-captured images, and its accuracy is considerably higher than self-reported manual measurements taken with a tape measure.
  • Published in Nature Digital Medicine (2023) and filed multiple patents.
  • Halo Trainer: Built transformer-based models for fitness activity understanding, including on-device real-time repetition counting and form error detection. Won the Best Paper Award at the Amazon Computer Vision Conference 2023.

Magic Leap

Principal Computer Vision Researcher/Engineer

Oct 2017 – Aug 2020 · 2 yrs 10 mos · Sunnyvale, California

  • 3D Object Detection: Lead architect and developer for the 3D Object Recognition feature running in the cloud and deployed in ML19. The pipeline was designed and built from scratch.
  • The object recognition algorithm scales with the number of concurrent users, the number of objects, and map size while maintaining a low memory footprint.
  • Led the team through the research and development phase to deliver Object Recognition for ML19 OTA 1.
  • Collaborated with other teams, including the Data, Cloud, and Deep Learning teams, to build evaluation and visualization tools.
  • Object Recognition is a key feature for ML19 and one of the most requested features for ML2 in developer feedback.
  • A research paper was accepted at the CVPR 2020 Workshop on AR/VR, and a patent was filed.
  • Learned Keyframe Selection: Designed and implemented a PointNet-based neural network that learns a frame embedding from sparse feature descriptors to improve retrieval for localization. Improved localization recall over the Bag of Words algorithm by 10-30% while reducing the memory required to store each keyframe by 75%. Optimized the network to match the computational requirements of Bag of Words.
  • Scalable Infrastructure for SLAM Research: Designed and implemented a scalable infrastructure to extract data from various stages in the SLAM pipeline along with ground-truth. This enabled scalable training and evaluation of various SLAM related machine learning algorithms.
Augmented Reality (AR)Computer VisionMachine Learning

Fyusion, Inc.

Research Intern

May 2016 – Aug 2016 · 3 mos · San Francisco Bay Area

  • Research and development of an algorithm to estimate the trajectory of, and stabilize, loopy fyuses using factor graphs.

Georgia Institute of Technology

Graduate Research Assistant

Aug 2012 – Aug 2017 · 5 yrs · Atlanta Metropolitan Area

  • Distributed Object-based SLAM: Developed distributed algorithms using Distributed Gauss-Seidel methods for multi-robot trajectory estimation with minimal information exchange. Extended framework to include object-level semantics for distributed object-based SLAM. Published extensively in IJRR, ICRA, IROS with more than 500 citations.
  • Memory-efficient SLAM: Proposed exactly sparse SLAM approach using multi-block Alternating Direction Method of Multipliers (ADMM) to enforce consistency among subgraphs.
Augmented Reality (AR)Computer VisionMachine Learning

Google

Google Summer of Code Scholar

May 2011 – Aug 2011 · 3 mos

  • Developed a generic interface for all the search functions in the Point Cloud Library.
  • Developed a fast octree implementation on GPU.

IIIT Hyderabad

Research Assistant

Apr 2010 – Jul 2012 · 2 yrs 3 mos · Greater Hyderabad Area

  • Bundle Adjustment on GPU: Developed hybrid CPU-GPU implementations of sparse bundle adjustment, achieving 30-40x speedup over standard CPU implementations on datasets with up to 500 images using an NVIDIA Tesla C2050 GPU. Published in the ECCV 2010 workshop on computer vision on GPUs.
  • Visibility Probability Structure from SfM Datasets: Developed visibility probability structures encoding visibility information between points and cameras as conditional probabilities for improved image localization. Published at ECCV 2012 with 86+ citations.

Drishticare

Research Intern

Jun 2009 – Jul 2010 · 1 yr 1 mo · Hyderabad, Telangana, India

  • Research and development of medical image analysis algorithms for early detection of diabetic retinopathy.

Education

Georgia Institute of Technology

Doctor of Philosophy (PhD) — Computer Science

Jan 2012 – Jan 2017

International Institute of Information Technology Hyderabad (IIITH)

Master of Science (M.S.) — Computer Science and Engineering

Jan 2010 – Jan 2012

International Institute of Information Technology Hyderabad (IIITH)

Bachelor of Technology — Computer Science and Engineering

Jan 2006 – Jan 2010
