Vasu Sharma

AI Researcher

Sunnyvale, California, United States · 8 yrs 7 mos experience

Key Highlights

  • Published 80+ papers with over 9000 citations.
  • Expert in multimodal generative AI and self-supervised learning.
  • Advises startups on AI strategies and product development.

Skills

Core Skills

Generative AI · Multimodal Machine Learning · Deep Learning · Machine Learning · AI Strategies

Other Skills

Algorithm Design · Algorithms · Audio Processing · Automatic Speech Recognition Systems · C · C++ · Caffe · Computer Vision · Data Mining · Data Structures · HTML · Image Processing · Large Language Models (LLM) · Matlab · NLP

About

I am presently working as an Applied Research Scientist at Facebook AI Research (FAIR), building multimodal foundational generative AI models. I am also interested in the domain of self-supervised learning. I have published 80+ papers across top AI conferences including NeurIPS, CVPR, ACL, EMNLP, TMLR, ICLR, NAACL, COLM, WACV, and Interspeech, garnering over 9000 citations. I routinely work with billion-scale datasets to train these massive multimodal models.

Previously I worked as a Quantitative Researcher at Citadel, where I leveraged machine learning and statistical methods to model the enigmatic world of the financial markets. I have also worked at Amazon Alexa AI on large-scale multimodal models and Embodied AI applications to bring smart robot intelligence to Alexa devices. I actively advise several early-stage startups and often guest lecture at Stanford and CMU.

I graduated from the Indian Institute of Technology, Kanpur with a Bachelor's in Computer Science and Engineering, and then completed my Master's in Machine Learning and Artificial Intelligence at the Language Technologies Institute at Carnegie Mellon University.

I am deeply passionate about research in my field. My research interests include deep learning and its uses in computer vision, speech and music processing, and natural language processing. My goal in life is to use technology to make this world a better place for everyone to live in, and it is with this goal in mind that I work on several interesting projects which help me realize this dream, one step at a time.

I have had the good fortune of working with some amazing people at some fantastic places and have learnt a lot from them. I hope to continue learning, travelling to new places, and meeting new people. My mantra in life is: "Live life with passion - love what you do, do what you love."

Besides being a technology enthusiast, I am also very passionate about sports. I was part of the IIT Kanpur Aquatics team and love to swim and play water polo, soccer, and cricket. I am also an ardent traveller and love exploring the world, discovering new places and cultures, and making new friends along the way.

Experience

Algoverse

AI Research Director

Jan 2024 – Jan 2025 · 1 yr · Remote

  • Led the development of a cutting-edge AI program to empower students with industry-relevant skills, leveraging top AI research experience from leading labs.
  • Implemented strategies to enhance students' prospects for admission to top universities and successful careers in the tech industry.
  • Collaborated with teams to create unparalleled opportunities in AI education, nurturing future innovators in the field.
  • Published papers:
  • FrontierScience Bench: Evaluating AI Research Capabilities in LLMs (ICML: REALM 2025)
  • Rosetta-PL: Propositional Logic as a Benchmark for Large Language Model Reasoning: https://arxiv.org/pdf/2505.00001
  • FaceSafe: An Inpainting Pipeline for Privacy-Compliant Scalable Image Datasets (ICML 2025 : DIG-BUGS)
  • COREVQA: A Crowd Observation and Visual Entailment Visual Question Answering Benchmark (ICML 2025 : DIG-BUGS)
  • Pause-Tuning for Long-Context Comprehension: A Lightweight Approach to LLM Attention Recalibration (ICML 2025: LCFM 2025)
  • NovelHopQA: Diagnosing Multi-Hop Reasoning Failures in Long Narrative Contexts (ICML 2025: LCFM 2025)
  • Rewrite-to-Rank: Optimizing Ad Visibility via Retrieval-Aware Text Rewriting (ICML 2025: MoFA 2025)
  • TRUTH DECAY: Quantifying Multi-Turn Sycophancy in Language Models: https://arxiv.org/pdf/2503.11656
  • Deconstructing bias: A multifaceted framework for diagnosing cultural and compositional inequities in text-to-image generative models: https://arxiv.org/pdf/2505.01430
  • Advancing Uto-Aztecan Language Technologies: A Case Study on the Endangered Comanche Language: https://aclanthology.org/2025.americasnlp-1.4.pdf

Meta

Applied Research Scientist Lead

Aug 2022 – Present · 3 yrs 7 mos · Menlo Park, California, United States · On-site

  • Working with Facebook AI Research (FAIR) on large-scale multimodal foundational models trained on trillion-scale datasets. Particularly interested in generative AI research and production use cases, and in exploring the realm of self-supervised learning.
  • Published papers:
  • DINOv2: Learning Robust Visual Features without Supervision (Published at TMLR ) https://dinov2.metademolab.com/
  • Chameleon: Mixed-Modal Early-Fusion Foundation Models (https://about.fb.com/news/2024/06/releasing-new-ai-research-models-to-accelerate-innovation-at-scale/)
  • Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning (CM3Leon) (https://ai.meta.com/blog/generative-ai-text-images-cm3leon/)
  • Demystifying CLIP Data (MetaCLIP) (Published at ICLR 2024): (https://github.com/facebookresearch/MetaCLIP)
  • Mavil: Masked audio-video learners (Published at NeurIPS 2023): (https://ar5iv.labs.arxiv.org/html/2212.08071)
  • A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions (Published at CVPR 2024): (https://openaccess.thecvf.com/content/CVPR2024/papers/Urbanek_A_Picture_is_Worth_More_Than_77_Text_Tokens_Evaluating_CVPR_2024_paper.pdf)
  • Seamless Interaction (https://ai.meta.com/research/publications/seamless-interaction-dyadic-audiovisual-motion-modeling-and-large-scale-dataset/)
  • FLAP: Fast Language-Audio Pre-training (Published at ASRU 2023) (https://arxiv.org/abs/2311.01615)
  • An Introduction to Vision-Language Modeling (https://arxiv.org/abs/2405.17247)
  • Text Quality-Based Pruning for Efficient Training of Language Models (https://arxiv.org/pdf/2405.01582) and DMLR 2025
  • Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM (Published at COLM 2024) (https://arxiv.org/abs/2403.07816)
Generative AI · Large Language Models (LLM) · Natural Language Processing (NLP) · Multimodal Machine Learning · Deep Learning · Computer Vision

Freelance

Startup Advisor

Jan 2022 – Present · 4 yrs 2 mos

  • Advised multiple startups and VCs on technical and AI strategies for scaling from ideation to series A/B/C stages.
  • Designed AI infrastructure and built AI native products to drive customer acquisition and growth.
  • Facilitated connections with VCs and assisted in hiring top talent to support company expansion.
AI Strategies · Technical Advising · Startup Growth

Citadel LLC

3 roles

Quantitative Researcher (Non-compete)

Aug 2021 – Aug 2024 · 3 yrs

Quantitative Researcher

Jan 2019 – Aug 2021 · 2 yrs 7 mos

Research Intern

May 2018 – Aug 2018 · 3 mos · Greater Chicago Area

  • Used machine learning and deep learning techniques to better model financial time-series data, ensuring the algorithms scale to an arbitrary number of input features.

Amazon Lab126

Applied Scientist

Aug 2021 – Aug 2022 · 1 yr · Sunnyvale, California, United States · On-site

  • Working with Alexa AI on a variety of problems, including:
  • A dialog-enabled visual-language navigation bot that leverages multimodal data sources to faithfully navigate a virtual environment based on user instructions. Created a new benchmark for visual-language navigation as part of the Alexa Prize SimBot challenge and designed benchmark models for it.
  • Designing efficient multimodal transformers to speed up training and deployment by improving the computational complexity of the self-attention mechanism.
  • Video processing applications such as video action recognition, video question answering, video summarization, and moment retrieval, working directly with compressed video streams.
  • Created a benchmark for a cooperative, heterogeneous multi-agent reinforcement learning platform, including open-sourcing the collected dataset and its benchmark models.
  • Working on a massively multimodal transformer pipeline capable of handling a wide range of input modalities with modality-agnostic transformer blocks that work well across several tasks leveraging a multitude of modalities.
  • Published papers:
  • Alexa Arena: A User-Centric Interactive Platform for Embodied AI (Published at NeurIPS 2023) (https://www.amazon.science/publications/alexa-arena-a-user-centric-interactive-platform-for-embodied-ai)
  • Alexa, play with robot: Introducing the first Alexa Prize SimBot Challenge on embodied AI (https://www.amazon.science/alexa-prize/proceedings/alexa-play-with-robot-introducing-the-first-alexa-prize-simbot-challenge-on-embodied-ai)
  • CHMARL: A Multimodal Benchmark for Cooperative, Heterogeneous Multi-Agent Reinforcement Learning (Published at RSS 2022) (https://www.amazon.science/publications/chmarl-a-multimodal-benchmark-for-cooperative-heterogeneous-multi-agent-reinforcement-learning)
  • ε-ViLM: Efficient Video-Language Model via Masked Video Modeling with Semantic Vector-Quantized Tokenizer (Published at WACV 2024) (https://arxiv.org/abs/2311.17267)

Carnegie Mellon University

2 roles

Graduate Research Assistant

Aug 2018 – Aug 2019 · 1 yr · Multicomp Lab, Language Technologies Institute, School of Computer Science

  • I worked with Prof. Louis-Philippe Morency on a wide range of projects related to Multimodal Machine Learning and on building robust, explainable deep learning models.
  • We designed adversarial attack mechanisms on Visual Question Answering models to identify their vulnerabilities, then proposed a variety of robust training mechanisms to fix them.
  • We developed a neural network model which uses a Deep Convolutional Neural Network based pipeline alongside a geometrically conditioned point distribution model for Facial Landmark Detection.
  • We also developed the first fully ecologically validated models of visual perception, combining intracranial EEG (iEEG) recordings captured during long stretches of natural visual behavior with cutting-edge computer vision, machine learning, and statistical analyses to understand the neural basis of natural, real-world visual perception.
  • We also explored facial expression recognition in extreme scenarios such as profile views, occluded faces, and non-centric and rotated faces, alongside recognition across gender-, age-, and racially diverse faces.

Graduate Research Assistant

Aug 2017 – May 2018 · 9 mos · Articulab, Language Technologies Institute, School of Computer Science

  • I worked on the SARA and the Yahoo! InMind projects at the ArticuLab which focus on building a socially aware robotic assistant. My primary focus was on trying to combine the user’s visual, vocal and verbal cues to better gauge the ‘rapport’ between the user and the conversational agent and using it to enable the agent to become socially more aware to the user’s emotional needs.

Localite

Head of AI

May 2017 – May 2018 · 1 yr · Los Angeles Metropolitan Area · Remote

  • Founding team at Localite Inc - a tours and activities marketplace for connecting people with local tour agencies and local tour guides. Raised $200k in pre-seed funding.
  • Implemented the core recommendation systems, feed ranking and search retrieval systems.

EPFL (École Polytechnique Fédérale de Lausanne)

Research Intern

May 2017 – Jul 2017 · 2 mos · Lausanne, Vaud, Switzerland · On-site

  • Worked on learning unsupervised document embeddings using a Continuous Bag of Words model implemented in a deep convolutional neural network framework, and used transfer-learning techniques to apply these generalized embeddings across domains, demonstrating improved performance on a wide array of tasks such as similarity matching and sentiment analysis.

Abzooba

Research Consultant

Aug 2016 – Jul 2017 · 11 mos · Milpitas, California, United States · Remote

  • Worked on building "A Smart E-commerce Virtual Assistant".
  • Implemented features such as cloth parsing from images, similar-image retrieval from a huge fashion catalogue, a state-of-the-art deep recommender system, and a multi-turn conversational voice agent to facilitate user interaction.
  • "Query-based document retrieval": learned rich semantic document embeddings using a deep LSTM pipeline and used these to match queries to relevant documents.
  • "Abstractive summarization using attention-based encoder-decoder networks": built a deep residual LSTM pipeline which used temporal attention over both encoder and decoder networks to generate abstractive summaries of documents.

University of Toronto

Research Intern

May 2016 – Jul 2016 · 2 mos · Greater Toronto Area, Canada · On-site

  • Research intern with the Computer Vision and Machine Learning group, working with Raquel Urtasun and Sanja Fidler in Geoffrey Hinton's lab.
  • Worked on instance and semantic segmentation from videos, with direct applications in autonomous driving and video surveillance.
  • Implemented a two-stream network combining base segmentation masks generated by deep convolutional-deconvolutional neural networks with optical-flow information obtained by implementing FlowNet (based on deep CNNs), achieving improved performance on the video semantic segmentation task.

Xerox

2 roles

Research Intern

Jan 2015 – Jan 2015 · 0 mo

  • Worked as a research intern with the Computer Vision team at Xerox Research Europe, working on building deep learning frameworks for large scale object recognition.

Research Intern

Jan 2015 – Jan 2015 · 0 mo

  • Worked with the Speech and Signal processing team at XRCI to create Deep Neural Network based Speech Recognition systems.
  • Worked on 3 projects during this internship: "Application of Deep Learning for Automatic Speech Recognition", "A Comprehensive Analysis of Activation Functions in Deep Nets", and "A New Hashing Technique to Enhance Deep Net Performance". Received the Best Project award for this work.
  • The projects primarily focused on constructing deep learning frameworks for speech recognition. The internship provided me with extensive research and coding experience in efficiently training deep nets.

Carnegie Mellon University

2 roles

Research Intern

May 2014 – Jul 2014 · 2 mos · Pittsburgh, Pennsylvania, United States · On-site

  • Worked on exploring applications of deep learning to audio and speech signal processing, particularly the use of Gated Recurrent Neural Networks for denoising speech signals.

Winter Intern

Dec 2013 – Dec 2013 · 0 mo

  • Completed 2 projects:
  • 1. Analyzing Newspaper Crime Reports for Identification of Safe Transit Paths
  • 2. Automatic Image Summarization using Topic Modelling

Education

Carnegie Mellon University

Master's in Artificial Intelligence (MLT)

Indian Institute of Technology, Kanpur

Bachelor of Technology (B.Tech.) — Computer Science and Engineering

St. Columba's School, New Delhi

Class XII — Computer Science
