Varun Prashant Gangal

AI Researcher

Jersey City, New Jersey, United States · 12 yrs 11 mos experience

Key Highlights

  • Expert in LLM evaluation and generative AI frameworks.
  • Contributed to multiple high-impact AI research publications.
  • Strong background in data augmentation and NLP techniques.

Skills

Advertising · Artificial Intelligence · BERT · C · C++ · Data Mining · English Literature · Generative Adversarial Networks (GANs) · Graph Theory · Hadoop · Java · LaTeX · Linear Algebra · Linux · Machine Learning

About

At Patronus AI, I do research on language generation & LLM evaluation, with a specific focus on devising novel environments to further the abilities of Large Language Models (LLMs) as well as, more generally, LLM-driven agentic AI frameworks and systems.

Before Patronus AI, I was an AI/NLP/LLM researcher at Amazon AGI, NYC, researching LLM evaluation & post-training [reasoning benchmarks, reward models, LLM judges, long-context ability]. I was a contributor to the Nova family of LLMs (https://tinyurl.com/5ea726cv).

Before Amazon AGI, I was a Research Scientist at ASAPP Inc, NYC from Dec '22 to Aug '24, performing both research & product development in the broad areas of generative AI / NLP [Natural Language Processing], exploring & solving problems in LLMs, efficient training & finetuning, prompt optimization and data augmentation, inter alia.

Formerly, I did my PhD from 2016-22 at the CMU Language Technologies Institute (LTI), advised by Prof. Ed Hovy, with research foci in Natural Language Generation and Data Augmentation.

Some of my recent and current work has been focused on:

1. How well LLMs can meta-critique LLM-driven agentic traces (https://huggingface.co/papers/2505.08638)
2. Aligning LLMs for Task Dialog [Multi-Turn + Tool-Call] settings, to appear at Findings of ACL 2025 (Preprint: https://arxiv.org/abs/2409.04617)
3. Efficient ML / accelerating NN training, e.g. DYAD, a block-sparse approximation for neural net linear layers, at the NEURIPS'23 WANT workshop (Paper: https://openreview.net/forum?id=obE6BSiUjt; X thread: https://x.com/VarunGangal/status/1727366831575347468)
4. Creative NLG / generative AI: generating creative language artifacts such as tongue twisters (accepted at EACL 2023; paper: https://arxiv.org/pdf/2209.06275.pdf) and personification (accepted at COLING'22; paper: https://aclanthology.org/2022.coling-1.547/)
5. LLM robustness, alignment & safety: a) detecting hallucinations through counterfactual data synthesis (our paper at ACL 2024 Findings: https://openreview.net/forum?id=T1kZ0tdOtZM [anon preprint]); b) designing jailbreak filters, unit-testing frameworks, inter alia, to ensure reliable, safe and compliant behaviour of LLM-powered AI systems; c) detecting euphemisms: our team's system EUREKA placed first in the FigLang '22 shared task, co-located with EMNLP'22 (Paper: https://aclanthology.org/2022.flp-1.15.pdf)

You can access a list of [and links to] my research on my Google Scholar page (https://scholar.google.com/citations?user=rWZq2nQAAAAJ&hl=en).

Experience

Patronus AI

Research Scientist

May 2025 – Present · 10 mos · New York, New York, United States · On-site

Amazon AGI

Applied Scientist II

Sep 2024 – May 2025 · 8 mos · New York, New York, United States · Hybrid

  • I was part of the team that contributed to [and a co-author on] the Nova family of LLMs [1]. Specifically, I worked closely on the Evaluation and Post-Training stages, particularly on a) long-context evaluation; b) backtracking / linguistic puzzles (Connections, Wordle, Einstein puzzles); c) early release and deployment of the Nova model family to LMSysArena, and analyzing the resulting feedback.
  • [1]: https://www.amazon.science/publications/the-amazon-nova-family-of-models-technical-report-and-model-card

ASAPP

Research Scientist

Dec 2022 – Aug 2024 · 1 yr 8 mos · New York City Metropolitan Area · Hybrid

  • I spent 1.75 exciting years at ASAPP NYC as an AI/NLP Research Scientist --- building, post-training, evaluating, guardrailing and customizing end-to-end LLM-driven customer support agents for B2B/enterprise use cases, capable of performing rich multi-turn dialog interwoven with tool-calling / API-calling to execute complex actions, while being knowledge-aware as well as policy-aware.
  • More broadly, I contributed code directly to ASAPP's GenerativeAgent product while also making core research contributions on post-training on multi-turn dialog trees (https://arxiv.org/abs/2409.04617) [Findings of ACL '25], co-training hallucination detectors (https://aclanthology.org/2024.findings-acl.789) [Findings of ACL '24], and scaling LLMs for low-GPU inference through block sparsity (https://neurips.cc/virtual/2023/80692) [NEURIPS'23 WS on Advancing NN Training].
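To illustrate the general idea behind block sparsity for linear layers, here is a minimal sketch of a block-diagonal approximation to a dense linear map: only the diagonal blocks are stored and multiplied, cutting parameters and FLOPs by the number of blocks. This is only the textbook version of the idea, not DYAD's exact construction; all names and dimensions below are illustrative.

```python
import numpy as np

def block_diagonal_linear(x, blocks):
    """Apply y = W @ x where W is block-diagonal with the given square blocks.

    Only the blocks are stored and multiplied, so for k blocks of size d/k
    the parameter count and FLOPs drop from O(d^2) to O(d^2 / k).
    """
    sizes = [b.shape[1] for b in blocks]
    splits = np.cumsum(sizes)[:-1]          # where to cut the input vector
    chunks = np.split(x, splits)            # one input chunk per block
    return np.concatenate([b @ c for b, c in zip(blocks, chunks)])

rng = np.random.default_rng(0)
d, k = 8, 4                                 # feature dim, number of blocks
blocks = [rng.standard_normal((d // k, d // k)) for _ in range(k)]
x = rng.standard_normal(d)

y = block_diagonal_linear(x, blocks)

# The equivalent dense matrix is mostly zeros: 16 stored params instead of 64.
W = np.zeros((d, d))
for i, b in enumerate(blocks):
    s = i * (d // k)
    W[s:s + d // k, s:s + d // k] = b
```

A plain `W @ x` gives the same output here; practical block-sparse schemes (DYAD included) add further structure so that information can still mix across blocks and the approximation stays expressive.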

AI2

Summer Research Intern

May 2021 – Aug 2021 · 3 mos · Pittsburgh, Pennsylvania, United States · Remote

  • I was a research intern in the Semantic Scholar team, advised by Iz Beltagy and Arman Cohan.

Facebook

Summer Intern

May 2019 – Aug 2019 · 3 mos · Menlo Park, California, United States

  • I worked with the Facebook Conversational AI (Assistant) team, with my specific project being "Unsupervised OOD Detection For Task Based Dialog". Further details below:
  • Mentors: Sonal Gupta, Arash Einolghozati, Abhinav Arora
  • Task-based dialog systems on deployment often get user inputs which aren't actually intents pertaining to any domain, such as rhetorical remarks, subjective questions and ill-specified search queries.
  • If not filtered, these inputs can wreak havoc on downstream components like slot detection. Furthermore, it is infeasible to curate training data for these "OOD" inputs. Hence, we need unsupervised approaches to detect them at test time jointly with intent classification.
  • We explored likelihood ratio with a background likelihood as an alternative to plain likelihood, and found this to consistently improve OOD detection for multiple types of likelihood functions.
  • We proposed learning a generative classifier and computing a marginal likelihood (ratio) for OOD detection. This outperforms approaches based on simple likelihood as well as discriminative classifiers.
  • The project culminated in the publication "Likelihood Ratios and Generative Classifiers For Unsupervised OOD Detection In Task-Based Dialog" (Varun Gangal, Abhinav Arora, Arash Einolghozati, Sonal Gupta), accepted at AAAI 2020.
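The likelihood-ratio idea described above can be sketched with toy unigram language models standing in for the real dialog models: score an input by log p_in(x) - log p_bg(x), where p_bg is trained on broader background text, so in-domain inputs score high and OOD inputs score low. The data, smoothing, and function names here are illustrative assumptions, not the paper's actual setup.

```python
from collections import Counter
import math

def train_unigram(sentences, smoothing=1.0):
    """Fit a Laplace-smoothed unigram LM; returns a log-prob lookup function."""
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts.values())
    vocab_size = len(counts)
    def logprob(word):
        return math.log((counts.get(word, 0) + smoothing) /
                        (total + smoothing * (vocab_size + 1)))
    return logprob

def log_likelihood(logprob, sentence):
    return sum(logprob(w) for w in sentence.split())

def ood_score(in_domain_lm, background_lm, sentence):
    """Likelihood ratio: lower values suggest the input is out-of-domain."""
    return (log_likelihood(in_domain_lm, sentence)
            - log_likelihood(background_lm, sentence))

# In-domain data: task-oriented commands; background: broader text.
in_lm = train_unigram(["set an alarm", "play some music", "set a timer"])
bg_lm = train_unigram(["set an alarm", "play some music", "set a timer",
                       "what a lovely day", "life is strange sometimes"])

print(ood_score(in_lm, bg_lm, "set an alarm"))     # in-domain: higher ratio
print(ood_score(in_lm, bg_lm, "life is strange"))  # OOD-like: lower ratio
```

In practice one thresholds this score jointly with the intent classifier's decision; the paper's generative-classifier variant replaces the single in-domain model with per-intent class-conditional likelihoods marginalized over intents.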

Snap Inc.

Summer Research Intern

May 2018 – Aug 2018 · 3 mos · Los Angeles, California, United States · On-site

  • I was a summer research intern at Snap Research's Venice Beach office [now moved to Santa Monica].
  • I worked with William Brendel, Luis Marujo and Leonardo Neves on text style transfer and creative text generation.

Carnegie Mellon University - School of Computer Science - Language Technologies Institute

3 roles

Teaching Assistant, Neural Networks for NLP

Jan 2018 – May 2018 · 4 mos

Teaching Assistant, Grammars & Lexicons

Aug 2017 – Dec 2017 · 4 mos

PhD [Defended 30th September]

Sep 2016 – Sep 2022 · 6 yrs

  • I did my PhD between 2016-22 with Language Technologies Institute @ CMU, advised by Prof. Ed Hovy.
  • My research was broadly on generative models for natural language, often with pre-InstructGPT LLMs such as GPT-2/T5/BART; specific task domains I explored include style transfer, data-to-text generation, and low-resource & creative generation. A key theme of my work has been to equip NLG with style, creativity and commonsense.
  • As a corollary of my interest in low-resource generation, I increasingly delved into data augmentation (DA), leading to many fruitful directions:
  • Crafting lightweight DA methods to finetune base LLMs e.g. GPT-2.
  • Augmenting references for dialog generation, improving evaluation by automatic metrics.
  • A well-received survey on DA in NLP
  • DA for improving commonsense plausibility of Concept-to-Text Generation (🏆 Best Long Paper @ INLG'21)
  • As a corollary of my interest in narrative, I investigated probing extra-sentential abilities of representations, such as finding event arguments and infilling sentences, a.k.a. "sentence cloze".
  • I've also co-organized many collaborative research efforts:
  • 1. The Controllable Generative Modelling in Language and Vision Workshop (CtrlGen) at NEURIPS'21, which aimed to explore controllability, disentanglement and manipulation for NLP and CV tasks.
  • 2. The GEM benchmark, its associated workshops @ ACL'21 and EMNLP'22, and a paper for better, standardized evaluation and comparison of NLG models - a parallel to GLUE for generation. The challenge sets module of GEM, where we built domain-shifted sets for NLG tasks using many perturbation, sub-selection and other domain-shift methods, and its companion work was accepted @ NEURIPS'21 Datasets.
  • 3. The NL-Augmenter participative repo, which provides a structure for NLPers to contribute and evaluate task-specific DA methods, a.k.a. transformations. We created a large, usable suite of 150+ augmentations leveraging wisdom-of-the-crowd, opening doors to systematic deployment of DA.

Indian Institute of Technology, Madras

3 roles

Teaching Assistant, Introduction to Machine Learning MOOC, NPTEL

Dec 2015 – May 2016 · 5 mos · Chennai

Teaching Assistant, Reinforcement Learning

Dec 2015 – May 2016 · 5 mos · Chennai

Teaching Assistant, Introduction To Machine Learning

Aug 2015 – Dec 2015 · 4 mos · Chennai

IBM Research India

Research Intern

May 2015 – Aug 2015 · 3 mos · Bengaluru Area, India

  • The internship was funded as part of a Joint Science Project between IIT Madras and IBM Research. We explored the problem of defining cooperative game theoretic centrality measures for signed networks. The work led to publications in the NIPS 2015 Networks Workshop, AAAI 2016 Student Abstract as well as one of the IJCAI workshops in 2016.

Microsoft

Summer Intern at Bing Ads

Jun 2014 – Aug 2014 · 2 mos · Microsoft India Development Center, Bangalore

  • Developed methods to build profiles for ad customers, using techniques from machine learning and NLP. This required working with non-trivially large datasets using Microsoft-internal equivalents of Apache Hive.

TCS Innovation Labs

Summer Intern

May 2013 – Jul 2013 · 2 mos · IITM Research Park, Chennai

  • Predicted outliers in data from a set of electricity meters recording temperature over a timespan of about four months.

The Fifth Estate, IIT Madras

Correspondent, Science Writer

Aug 2012 – Aug 2015 · 3 yrs · IIT Madras, Chennai

  • Wrote articles as a correspondent for the student-run news body at IIT Madras. Also contributed an article on the evolutionary game theory research of Dr. A. J. Shaiju (Dept. of Mathematics, IIT Madras) for the research magazine "Immerse".

Education

Carnegie Mellon University

Doctor of Philosophy (Ph.D.) — Computer Science

Jan 2016 – Jan 2022

Indian Institute of Technology, Madras

B.Tech + M.Tech (Dual Degree) — Computer Science

Jan 2011 – Jan 2016
