Raghav Gupta

AI Researcher

Mountain View, California, United States · 10 yrs 1 mo experience

Highly Stable

Key Highlights

Expert in NLP and LLM research.
Led multi-objective reinforcement learning projects.
Pioneered scalable dialog systems at Google.

Stackforce AI infers this person is a leading AI researcher specializing in NLP and reinforcement learning.

Skills

Core Skills

Natural Language Processing (NLP) · Reinforcement Learning · Supervised Learning

Other Skills

Algorithms · C · C++ · Computer Vision · Data Analysis · Deep Learning · Java · Large Language Models (LLM) · Linux · MATLAB · Programming · Python · Research · Synthetic Data Generation

About

ML researcher at Biohub (continuing on from EvolutionaryScale) working at the intersection of AI and biology. Previously at Google DeepMind and Google Research working on NLP and LLM research, focusing on LLM post-training and evaluation, reinforcement learning, conversational agents, and synthetic data. Before that, at Stanford and IIT Bombay studying computer science and artificial intelligence.

Experience

Total Experience: 10 yrs 1 mo

Average Tenure: 2 yrs 5 mos

Current Experience: 5 mos

Biohub

Research Scientist

Dec 2025 – Present · 5 mos · Redwood City, California, United States · Hybrid

EvolutionaryScale

Member of Technical Staff

Sep 2025 – Dec 2025 · 3 mos · San Francisco, California, United States

Google DeepMind

Research Engineer

May 2024 – Aug 2025 · 1 yr 3 mos · Mountain View, California, United States

Research and development in reinforcement learning and LLM-powered conversational agents (published work at AAAI'25, Findings EMNLP'24):
Multi-objective reinforcement learning for LLM alignment:
Co-led team of 10+ researchers on algorithms for multi-objective reinforcement learning (CLP) and preference optimization (MO-ODPO) for inference-time steerable LLMs with param/prompt-based conditioning. Results on Anthropic-HH and OpenAI summarization. (AAAI'25, Findings EMNLP'24)
Deployed CLP in AI Overview in Google Search; replaced manual reward tuning; accelerated development cycles by 30%.
Project Astra: Core contributor; TL for team of 4 improving tool use & reasoning capabilities for Astra across verticals.
Built cross-vertical framework for synthetic conversational tool use data with LLM self-play (task success 76% → 90%).
Developed multi-aspect, multi-turn LLM-as-a-judge framework; devised novel conversation-level and turn-level metrics.

Natural Language Processing (NLP) · Supervised Learning · Synthetic Data Generation · Large Language Models (LLM) · Reinforcement Learning

Google

Software Engineer

Aug 2017 – Apr 2024 · 6 yrs 8 mos · Mountain View, CA

Research and development in task-oriented dialog (TOD) systems and efficient BERT models deployed in multiple products. Published extensively at *ACL/EMNLP, AAAI:
Schema-guided dialog: Pioneered 'schema-guided' paradigm of TOD systems that scale with little/no training data.
Released dialog datasets: Schema-Guided Dialogue, MultiWoZ 2.2 (1.5k GitHub stars) (AAAI'22, NLP4ConvAI'22, AAAI'20)
Created SotA scalable dialog models for natural language understanding & state tracking. (EMNLP'23, NAACL'22, ACL'19)
On-device BERT modeling: Research lead on mixed-vocabulary BERT distillation.
Developed first sub-5 MB (unquantized) BERT models (latency reduced 97% vs. BERT-Base, negligible downstream accuracy loss). (EACL'21)
Launched distilled multilingual BERT to Voice Access (accuracy 63% → 96%, 200K DAU), Google Recorder (662K MAU).
Maintained BERT models at multiple sizes supporting 20+ partner product teams.
Conversational retrieval: Developed effective sparse document retrieval methods (train time reduced 98% vs. dense retrievers) for customer support chats; SotA results on conversational recommendation. (NLP4ConvAI'23 - Outstanding Paper)

Natural Language Processing (NLP) · Supervised Learning · Synthetic Data Generation · Large Language Models (LLM) · Reinforcement Learning

Recruiter.ai

Data Science Intern

Jun 2016 – Sep 2016 · 3 mos · Palo Alto, CA

Applied deep learning and social network analysis to discover high-quality and relevant candidates from GitHub for the candidate search portal, and effected numerous improvements to the relevance ranking engine.

Stanford University

2 roles

Teaching Assistant

Jan 2016 – Jun 2017 · 1 yr 5 mos · Palo Alto, CA

Served as teaching assistant for:
CS224S: Spoken Language Processing (Spring 2016-17)
CS124: From Languages to Information (Winter 2015-16 and Winter 2016-17)
CS154: Automata and Complexity Theory (Autumn 2016-17)

Research Assistant

Sep 2015 – Jun 2016 · 9 mos · Palo Alto, CA

Worked in the Stanford NLP Group on tree-structured neural network models with attention mechanisms for natural language inference (paper in ACL). Also worked on self-training strategies for slot filling in knowledge base construction for the TAC-KBP challenge.

Bar-Ilan University

Research Intern

Jun 2015 – Aug 2015 · 2 mos · Ramat Gan, Israel

Worked at the intersection of corpus creation, crowdsourcing, and linguistic theory. Explored approaches to enlarging an annotated treebank through MTurk, using cues from linguistic theory and minimally trained annotators, for later use in developing advanced parsing algorithms. The project focused on verb-particle constructions and light-verb constructions in English.

Samsung Electronics

Software Engineering Intern

May 2014 – Jul 2014 · 2 mos · Gyeonggi, South Korea

Worked on standardization efforts for MPEG-DASH (Dynamic Adaptive Streaming over HTTP). Experimented with various DASH architectures on top of HTTP/2.0 and proposed a new framework better suited to the future of the web.

Institute of Science and Technology Austria

Research Intern

May 2013 – Jul 2013 · 2 mos · Klosterneuburg, Austria

Worked on biological auction theory and combinatorial game theory:
Generalized classical results for evolutionarily stable strategies in all-pay auctions from the one reward per auction case to the multiple rewards per auction case. Published in Proceedings of the Royal Society: Biological Sciences.
Devised and implemented a novel approximate algorithm to minimize the total expected cost of the almost-sure reachability objective in POMDPs. Papers in AAAI 2015 and ICRA 2015.