Siddharth Choudhary

AI Researcher

Dublin, California, United States · 16 yrs 6 mos experience

Key Highlights

  • Tech lead for Amazon Nova family of foundation models.
  • Published at top-tier venues with over 1,200 citations.
  • Innovated multimodal AI systems integrating vision, speech, and language.

Skills

Core Skills

Multimodal AI · Foundation Models · Computer Vision · Machine Learning

Other Skills

Artificial Intelligence · Augmented Reality (AR) · C · C++ · CUDA · Data Structures · Deep Learning · Generative Pre-Training · High Performance Computing · Image Processing · Large Language Models · MATLAB · OpenCV · OpenGL · Pattern Recognition

About

I'm a Principal Applied Scientist at Amazon AGI, where I'm a tech lead on the multimodal pre-training team for the Amazon Nova family of foundation models. My work focuses on developing next-generation AI systems that seamlessly integrate vision, speech, and language understanding with generative capabilities. Currently, I'm the tech lead for the next-generation Amazon Nova, architecting unified multimodal models that achieve state-of-the-art performance across understanding and generation tasks.

My research spans the full spectrum of AI and computer vision, from foundation models and vision-language systems to robotics and SLAM. I've published at top-tier venues including CVPR, IJRR, ICRA, and Nature Digital Medicine, with over 1,200 citations and an h-index of 14. My work on multimodal hallucination control was featured at CVPR 2024 and highlighted in AWS's keynote presentation.

Before Amazon, I was a Principal Computer Vision Engineer at Magic Leap, where I architected the 3D object recognition system deployed in Magic Leap One. I earned my Ph.D. in Computer Science from Georgia Tech, focusing on distributed algorithms for multi-robot SLAM systems. I'm passionate about pushing the boundaries of multimodal LLMs and building systems that bridge the gap between understanding and generation.

Check out https://itzsid.github.io/ for up-to-date information.

Experience

Amazon AGI

2 roles

Principal Applied Scientist

Promoted

Apr 2025 – Present · 11 mos · San Francisco Bay Area · On-site

  • Amazon Nova 2.0 Lite: Tech-lead responsible for designing the multimodal architecture and finalizing the pretraining data mix and recipe integrating vision, speech and language understanding capabilities at scale. Launched at AWS re:Invent 2025. Achieves state-of-the-art performance on 13 out of 15 benchmarks vs Claude Haiku 4.5.
  • Amazon Nova 2.0 Omni: Led development of a unified multimodal architecture integrating native image generation with text, vision, and speech understanding. Designed a pre-training recipe that enables generative capabilities while preserving performance across all input modalities. Launched at AWS re:Invent 2025.
Multimodal AI · Foundation Models · Computer Vision · Large Language Models · Pre-training · Generative Pre-Training +2

Senior Applied Scientist

Jan 2024 – Apr 2025 · 1 yr 3 mos · San Francisco Bay Area · On-site

  • Amazon Nova 1.0 Foundation Models: Tech lead on the multimodal pre-training team responsible for developing image/video pre-training recipes for the Amazon Nova foundation models launched at AWS re:Invent 2024. Achieved state-of-the-art performance on multimodal understanding benchmarks against Claude 3 Haiku and Gemini 1.5 Pro.

Amazon Web Services (AWS)

Senior Applied Scientist

May 2023 – Apr 2025 · 1 yr 11 mos · San Francisco, California, United States · Hybrid

  • Worked on multimodal foundation models research and development, focusing on vision-language model architectures and training methodologies.
  • Developed the M3ID sampling method, reducing VLM hallucinations by up to 28% without additional training. Published at CVPR 2024 and featured in the AWS keynote at CVPR.

Amazon Lab126

Senior Applied Scientist

Aug 2020 – May 2023 · 2 yrs 9 mos · Sunnyvale, California, United States · Hybrid

  • Halo Body: Led research and development to release waist-hip ratio (WHR) and body circumference measurements as part of Body Scan on Amazon Halo; WHR is a better indicator of health risks than BMI or body fat percentage.
  • Trained an ML model twice as accurate as state-of-the-art models for estimating body circumference. The model is trained using only synthetic data but generalizes well when tested on real smartphone-captured images, and its accuracy is considerably higher than self-reported manual measurements taken with a tape measure.
  • Published in Nature Digital Medicine (2023) and filed multiple patents.
  • Halo Trainer: Built transformer-based models for fitness activity understanding, including on-device real-time repetition counting and form error detection. Won the Best Paper Award at the Amazon Computer Vision Conference 2023.

Magic Leap

Principal Computer Vision Researcher/Engineer

Oct 2017 – Aug 2020 · 2 yrs 10 mos · Sunnyvale, California

  • 3D Object Detection: Lead architect and developer for the 3D Object Recognition feature running in the cloud and deployed in ML19. The pipeline was designed and built from scratch.
  • The object recognition algorithm scales with the number of concurrent users, the number of objects, and map size while maintaining a low memory footprint.
  • Led the team through the research and development phase to deliver Object Recognition for ML19 OTA 1.
  • Collaborated with other teams, including the Data, Cloud, and Deep Learning teams, to build evaluation and visualization tools.
  • Object Recognition is a key feature for ML19 and one of the most requested features for ML2 in developer feedback.
  • A research paper was accepted at the CVPR 2020 Workshop on AR/VR, and a patent was filed.
  • Learned Keyframe Selection: Designed and implemented a PointNet-based neural network that learns a frame embedding from sparse feature descriptors to improve retrieval for localization. Improved localization recall over the Bag of Words algorithm by 10-30% while reducing the memory required to store each keyframe by 75%. Optimized the network to match the computational requirements of Bag of Words.
  • Scalable Infrastructure for SLAM Research: Designed and implemented a scalable infrastructure to extract data from various stages in the SLAM pipeline along with ground-truth. This enabled scalable training and evaluation of various SLAM related machine learning algorithms.
Augmented Reality (AR)Computer VisionMachine Learning

Fyusion, Inc.

Research Intern

May 2016 – Aug 2016 · 3 mos · San Francisco Bay Area

  • Research and development of an algorithm to estimate the trajectory of, and stabilize, loopy fyuses using factor graphs.

Georgia Institute of Technology

Graduate Research Assistant

Aug 2012 – Aug 2017 · 5 yrs · Atlanta Metropolitan Area

  • Distributed Object-based SLAM: Developed distributed algorithms using Distributed Gauss-Seidel methods for multi-robot trajectory estimation with minimal information exchange. Extended framework to include object-level semantics for distributed object-based SLAM. Published extensively in IJRR, ICRA, IROS with more than 500 citations.
  • Memory-efficient SLAM: Proposed exactly sparse SLAM approach using multi-block Alternating Direction Method of Multipliers (ADMM) to enforce consistency among subgraphs.
Augmented Reality (AR)Computer VisionMachine Learning

Google

Google Summer of Code Scholar

May 2011 – Aug 2011 · 3 mos

  • Developed a generic interface for all the search functions in the Point Cloud Library.
  • Developed a fast octree implementation on GPU.

IIIT Hyderabad

Research Assistant

Apr 2010 – Jul 2012 · 2 yrs 3 mos · Greater Hyderabad Area

  • Bundle Adjustment on GPU: Developed hybrid CPU-GPU implementations of sparse bundle adjustment, achieving 30-40x speedup over standard CPU implementations on datasets with up to 500 images using an NVIDIA Tesla C2050 GPU. Published in the ECCV 2010 workshop on computer vision on GPUs.
  • Visibility Probability Structure from SfM Datasets: Developed visibility probability structures encoding visibility information between points and cameras as conditional probabilities for improved image localization. Published at ECCV 2012 with 86+ citations.

Drishticare

Research Intern

Jun 2009 – Jul 2010 · 1 yr 1 mo · Hyderabad, Telangana, India

  • Research and development of medical image analysis algorithms for early detection of diabetic retinopathy.

Education

Georgia Institute of Technology

Doctor of Philosophy (PhD) — Computer Science

Jan 2012 – Jan 2017

International Institute of Information Technology Hyderabad (IIITH)

Master of Science (M.S.) — Computer Science and Engineering

Jan 2010 – Jan 2012

International Institute of Information Technology Hyderabad (IIITH)

Bachelor of Technology — Computer Science and Engineering

Jan 2006 – Jan 2010
