Vyom P.

Co-Founder

Palo Alto, California, United States · 3 yrs 11 mos experience
AI Enabled · AI ML Practitioner

Key Highlights

  • Expert in large language models and reasoning.
  • Proven track record in NLP and generative AI.
  • Published researcher with multiple peer-reviewed papers.
Stackforce AI infers this person is a leading expert in AI research and development, specializing in NLP and machine learning.

Skills

Core Skills

Large Language Models (LLM) · Reasoning · Generative AI · Machine Learning · Natural Language Processing (NLP) · Text-to-Speech

Other Skills

Algorithms · Amazon Web Services (AWS) · Artificial Intelligence (AI) · C (Programming Language) · C# · C++ · Cascading Style Sheets (CSS) · Computer Science · Core Java · Data Analysis · Data Science · Data Structures · Deep Learning · Digital Image Processing · Exploratory Data Analysis

About

I’m an Applied Scientist in Amazon’s Core Search org, where I design and train large-language-model systems. My day-to-day spans everything from devising scaling laws for 1B to 20B parameter models to leading automated evaluation pipelines that keep our search experience fast, fair, and trustworthy.

I’m equally passionate about research: my recent work probes how to measure and improve the faithfulness of chain-of-thought reasoning, and I’ve published on topics ranging from low-resource speech recognition to multimodal vision-language grounding.

Before Amazon, I trained 8B-10B scale foundation-model pipelines that processed millions of financial documents at Chronograph, and researched multimodal reasoning at the University of Florida while earning my M.S. in Computer Science. Earlier stints at Amazon Alexa, Apple Siri, and ISRO honed my love for turning cutting-edge ideas into real-world impact, earning a Kaggle silver medal and multiple peer-reviewed papers along the way.

Outside the arena, you’ll find me training for the Explorer’s Grand Slam, learning Japanese, or reading papers on AI for science. Always excited to connect with people who are pushing the boundaries in AI research.

Experience

Total Experience: 3 yrs 11 mos
Average Tenure: 1 yr 9 mos
Current Experience: 1 yr 7 mos

Self-employed

Researcher

Nov 2024 - Present · 1 yr 7 mos · San Francisco Bay Area · On-site

  • Working on improving faithfulness in chain of thought reasoning for large language models
  • Working on alignment research
Large Language Models (LLM) · LLM Evals · Reasoning

Amazon

Applied Scientist

Nov 2024 - Present · 1 yr 7 mos · Palo Alto, California, United States · On-site

  • Working on the Amazon Core Search team
  • Building search-aware reasoning models
  • Formulating scaling laws over model size and data composition for LLM semantic matching models (1B–20B scale)
  • Spearheaded automated evaluation and query benchmarking to iteratively refine LLM semantic matching models, using an LLM as a judge
Large Language Models (LLM) · LLM Evals · Reasoning

Chronograph

2 roles

Applied Scientist II

May 2024 - Oct 2024 · 5 mos · San Francisco Bay Area · Remote

  • Built a general representation for solving various NLP problems on long financial documents
  • (Built Financial Artificial General Intelligence)
  • Led a team of 3 in planning the training of a large foundation model for all NLP tasks across the company, outlining relevant work, large-scale compute costs, data size estimates, model training debugging, and model inference optimization [training 8B open-source variants such as Llama 3.1 8B, Mixtral-7B, and Gemma 1.1 7B]
  • Engineered a document information retrieval system for the financial domain using a supervised fine-tuned Llama 3.1 8B model trained on a multi-GPU setup, resulting in a 55% improvement in analyst annotation efficiency
  • Trained a Llama 3.1 8B model with LoRA, quantization, and mixed precision, using FSDP for multi-GPU training with data and model parallelism across 8 A100s (40 GB). The model reached roughly 80% exact match on an out-of-domain dataset for information extraction in JSON format.
  • Explored the application of chain-of-thought in-context learning for information retrieval from long financial text using Claude 3.5 Sonnet, showing a 3% improvement over previous production models
Generative AI · Research Skills · Python (Programming Language) · Natural Language Processing (NLP) · Deep Learning · Large Language Models (LLM)

Applied Scientist

Sep 2023 - May 2024 · 8 mos · San Francisco Bay Area · Remote

  • Worked on applied NLP research in the financial domain
  • Built an end-to-end DeepSpeed-powered training framework for distributed multi-GPU training across all stages, from tokenizer training and pre-training to downstream task fine-tuning, achieving a 40% improvement in model development efficiency
  • Studied the application of in-context learning for information retrieval from long financial documents using LLMs, showing a 3% improvement over the production models of the time
  • Deployed a multi-class text classification model based on sentence embeddings using AutoML, improving on the previous production model's performance by 8%
  • Deployed an efficient Longformer on Kubernetes for long-context multi-class classification, improving on the previous production model’s performance by 10%
  • Orchestrated an offline calibration framework and an online regression testing framework to monitor performance and rapidly ship iterative updates to the production model
Generative AI · Research Skills · Machine Learning · Python (Programming Language) · Natural Language Processing (NLP) · Large Language Models (LLM)

University of Florida

2 roles

Researcher

Promoted

Jun 2023 - Dec 2024 · 1 yr 6 mos · Gainesville, Florida, United States

  • Worked on a DARPA-funded research project called Environment-driven Conceptual Learning (ECOLE) at UF's Data Science Research (DSR) lab, under the guidance of Dr. Daisy Wang from UF, Dr. Eric Xing from CMU, and Dr. Zhiting Hu from UCSD
  • Worked on the alignment problem and multimodal reasoning
  • Researched the application of reinforcement learning from human feedback for improving knowledge graphs and scene graphs
  • Studied the improvement and faithfulness of multimodal reasoning using in-context learning and human feedback for large language and vision models
  • Investigated video reasoning with SOTA models like Qwen CogVLM2-Video (EVA-CLIP-E 12B + Llama 3 8B), and EMU2 (EVA-CLIP 4B + Llama 33B) to enhance multimodal comprehension
  • Examined methods to improve LLM reasoning faithfulness by refining chain-of-thought (CoT) processes through human and self-model feedback and subsequent training on corrected CoT
Generative AI · Large Language Models (LLM)

Research Assistant

Dec 2022 - May 2023 · 5 mos · Gainesville, Florida, United States

  • Worked on distance-sensitive range querying for high-dimensional data using deep learning at UF's Data Science Research (DSR) lab
  • Orchestrated several experimental designs for training SelNet, Mixture of Experts (MoE), XGBoost, LightGBM, and Support Vector Regressor models on 3 text-based and 3 image-based embedding datasets using a distributed training framework
  • Wrote the introduction, related work, and experimental details sections of the research paper for the baseline experiments in LaTeX
Generative AI · Machine Learning · Large Language Models (LLM)

Amazon

Applied Scientist

Aug 2022 - Dec 2022 · 4 mos · Seattle, Washington, United States · On-site

  • Worked on the Alexa Smart Home team
  • Removed redundant features by performing feature importance analysis using visualization techniques and a model ablation study
  • Adopted a novel self-attention-based architecture for a multivariate time-series classification task to model user behaviors
  • Performed mechanistic interpretability analysis on attention heads to uncover key insights from model layers
Generative AI · Machine Learning · Natural Language Processing (NLP) · Speech Processing · Large Language Models (LLM) · TensorFlow

Apple

Machine Learning Researcher

May 2022 - Aug 2022 · 3 mos · Seattle, Washington, United States · On-site

  • Worked on the Siri Text-to-Speech team
  • Designed and implemented an end-to-end acoustic model for text-to-speech synthesis
  • Built a data synthesis and training pipeline to train deep learning models on a large-scale speech corpus (500 hours)
Text-to-Speech · Generative AI · Machine Learning · PyTorch · Speech Recognition · Natural Language Processing (NLP) · +2

University of Florida

2 roles

Research Assistant

Mar 2022 - May 2022 · 2 mos · Gainesville, Florida, United States · On-site

  • MISQA: A Dataset for Query-based Misinformation Mining
  • Worked on developing a multi-answer QA benchmark dataset at UF's Data Science Research (DSR) lab
  • Generated 1,000 QA pairs for the multi-answer QA benchmark dataset, used to track misinformation and disinformation, by prompting a fine-tuned mT5 model
  • Performed tweet stance annotation for 1,000 sample QA pairs from the Twitter API Querier using the Dense Distinct Tweet Retriever (DDTR) model
Generative AI · Machine Learning · Large Language Models (LLM)

Research Assistant

Sep 2021 - Mar 2022 · 6 mos · Gainesville, Florida, United States · On-site

  • Worked on Active Interpretation of Disparate Alternatives (AIDA), a DARPA hypothesis generation project, at UF's Data Science Research (DSR) lab
  • The project aims to automatically ingest web documents and transform them into a semantic space representation (knowledge graph) that analysts can query about uncertain situations to obtain a variety of related hypotheses
  • Extracted sentence embeddings from 100,000 sentences using XLM-RoBERTa for similarity clustering
  • Improved data quality by performing text augmentation using prompt-based paraphrasing with the Parrot model
  • Built a pipeline for cross-lingual natural language inference using the mT5 model on 5,000 hand-annotated and prompt-paraphrased cross-claim pairs, improving the score by 38%
  • Developed and deployed a website to showcase the query-claims relationship for 200 claims using AngularJS and PrimeNG
Generative AI · Machine Learning · Large Language Models (LLM)

ISRO - Indian Space Research Organisation

Machine Learning Researcher

Dec 2020 - Apr 2021 · 4 mos · Ahmedabad, Gujarat, India · Remote

  • Retrieval of inherent optical properties (IOPs) of coastal waters using satellite data
  • Worked on applied machine learning for oceanography using satellite data
  • Researched and developed a modified neural network algorithm to solve the inversion problem of retrieving 6 IOPs from remote surface reflectance data
  • Achieved good agreement, with R-squared values of about 97% for each water quality parameter
Machine Learning

Atliq Technologies

Software Engineer

May 2020 - Jul 2020 · 2 mos · Vadodara, Gujarat, India · Remote

  • Worked on large-scale applications based on PHP, Laravel, and JavaScript
  • Refactored the codebase of 5 modules, reducing NPath and cyclomatic complexity by 85%
  • Improved search and sort speeds by 35% by fixing search and sort issues in 5 modules using custom AJAX calls
  • Prepared a report on code complexity analysis of different modules using tools like code forensics
  • Enforced commit message length using commit message hooks in Git

Education

University of Florida

Master of Science - MS — Computer Science

Aug 2021 - May 2023

Dharmsinh Desai University

Bachelor of Technology - BTech — Computer Engineering

Jul 2017 - May 2021

Shree Satya Sai Vidhyalaya

Jan 2003 - Jan 2017
