Vansh Kapoor — AI Researcher

Hello! I’m Vansh Kapoor, a graduate student at Carnegie Mellon University pursuing a Master’s in Machine Learning. I completed my undergraduate studies at the Indian Institute of Technology Bombay (IITB) in Electrical Engineering with Honors. Do check out my academic webpage (https://vansh28kapoor.github.io/).

During my undergraduate studies, I had the privilege of working with Prof. Jayakrishnan Nair and Prof. Nikhil Karamchandani. My research focused on a special case of discounted cost Partially Observable Markov Decision Processes (POMDPs), where the agent initially knows its state but remains uncertain about its exact state after taking an action unless a fixed cost is paid to reveal it. This work earned me the Undergraduate Research Award and was submitted as a first-author paper to AAAI-25.

For my Bachelor’s thesis, I collaborated with Google Research to develop online learning algorithms aimed at mitigating coordinated bot attacks that spread rumors on social networks like YouTube. My algorithm outperformed existing greedy methods and scaled effectively to large networks. Additionally, I served as a teaching assistant for a graduate-level course in Error Correcting Codes, assisting with course preparation, grading, and holding weekly office hours for over 40 students.

In the summer of 2023, I gained industry experience through an internship at Google as a Silicon Intern. There, I improved the design verification process by 15% using Python-based automation for toggle coverage analysis and developed automated checkers for data retention flops in low-power mode applications.

I have also worked on various deep learning projects, including Text-to-Image Diffusion Models with Enhanced Semantic Understanding, Deep Recurrent Q-Networks (DRQN) for Partially Observable MDPs, CGAN-based Generative AI models, LSTM-based stock trading systems, and U-Net-based nuclei semantic segmentation.

Stackforce AI infers this person is a Machine Learning Engineer with expertise in AI research and algorithm development.

Location: Pittsburgh, Pennsylvania, United States

Experience: 1 yr 6 mos

Skills

Deep Reinforcement Learning
Natural Language Processing (NLP)
Machine Learning
Reinforcement Learning
Python (Programming Language)
Teaching

Career Highlights

Developed innovative algorithms for social media rumor detection.
Improved design verification processes at Google by 15%.
Awarded Undergraduate Research Award for POMDP research.

Work Experience

Amazon

Applied Scientist Intern (3 mos)

Google

Research Collaborator (1 yr)

Hardware Engineering Intern (2 mos)

Indian Institute of Technology, Bombay

Graduate Teaching Assistant (5 mos)

Research Assistant (1 yr 7 mos)

Education

Master of Science - MS at Carnegie Mellon University

Bachelor of Technology - BTech at Indian Institute of Technology, Bombay

Other Skills

GNU/Linux
Academic Publishing
Large Language Models (LLM)
TensorFlow
Image Segmentation
Generative Neural Networks
Deep Learning
Machine Learning Algorithms
Long Short-term Memory (LSTM)
Deep Neural Networks (DNN)
Computer Science
Data Structures
Control Theory
Algorithms
Graph Theory

Experience

Amazon

Applied Scientist Intern

May 2025 – Aug 2025 · 3 mos · San Francisco Bay Area · On-site

Primary Guides: Prof. Aviral Kumar, Anurag Beniwal
Accepted at ICLR 2026
Introduced TRIM, a step-level LLM routing framework for reasoning tasks (math, code) that prevents cascading failures by routing only critical steps to stronger models, while delegating routine continuations to smaller ones.
Designed routing policies ranging from threshold-based heuristics to RL-trained and POMDP-based controllers that use stepwise rewards and uncertainty estimates to make budget-aware, step-level intervention decisions.
Demonstrated up to 5× higher cost efficiency than SOTA routing baselines on challenging benchmarks such as AIME and MATH-500, while matching the accuracy of the expensive model using 80% fewer expensive-model tokens.

Deep Reinforcement Learning · Natural Language Processing (NLP)

Google

2 roles

Research Collaborator

Aug 2023 – Aug 2024 · 1 yr

Guides: Dr. Manish Jain, Google AI Research
Prof. Nikhil Karamchandani, Electrical Engineering, IIT Bombay
Collaborated with Google AI Research to design models for analyzing contagion processes, focusing on neutralizing coordinated bot attacks that seek to spread rumors on social platforms like YouTube.
Designed the Look-Ahead algorithm, which uses crowd signals for rumor detection, with performance guarantees adapted for large-scale social networks.

Machine Learning · Reinforcement Learning

Hardware Engineering Intern

May 2023 – Jul 2023 · 2 mos

Carried out toggle coverage analysis using the Smart Exclusion feature with Python-based automation flows to optimize the design verification process.
Designed automated checkers for flop retention during low-power mode.
Developed a script for parameter extraction and verification to ensure smooth parameter flow.

Python (Programming Language) · GNU/Linux

Indian Institute of Technology, Bombay

2 roles

Graduate Teaching Assistant

Jul 2023 – Dec 2023 · 5 mos

Prepared solutions and evaluated quizzes/assignments on Error-Correcting Codes for a class of 40+ students
Held weekly doubt-solving sessions

Teaching

Research Assistant

Jan 2023 – Aug 2024 · 1 yr 7 mos

MDPs with State Sensing Costs (Accepted at AISTATS 2026)
Guide: Prof. Jayakrishnan Nair, Electrical Engineering, IIT Bombay
Developed algorithms and a framework for analyzing Markov Decision Processes with state sensing costs.
Formulated theorems for evaluating the optimal policy's value to within any specified error threshold ϵ by exploring a finite set of policies.
Designed a computationally efficient heuristic algorithm that performs close to the optimal policy in practice.

Academic Publishing · Reinforcement Learning