Yash Bhalgat

AI Researcher

Oxford, England, United Kingdom · 7 yrs 9 mos experience

Most Likely To Switch · AI ML Practitioner

Key Highlights

PhD researcher at the prestigious Visual Geometry Group.
Expertise in video generation and multimodal AI.
Proven track record in developing cutting-edge AI solutions.

Stackforce AI infers this person is a Researcher specializing in AI and Computer Vision with a focus on generative models.

Skills

Core Skills

Computer Vision, Machine Learning, Video Generation, Generative AI, Deep Learning, Artificial Intelligence (AI), Natural Language Processing (NLP), Software Development

Other Skills

Computer Graphics, 3D Reconstruction, Augmented Reality (AR), Virtual Reality (VR), World Models, Large Language Models (LLM), Retrieval-Augmented Generation (RAG), Multimodal, Edge Computing, AR/VR, Project Management, Algorithms, Python (Programming Language), C++, Object-Oriented Programming (OOP)

About

PhD Researcher with the Visual Geometry Group at Oxford, working on Video Generation for World Modeling, 3D Computer Vision (Understanding and Generation) and Vision-Language Foundation Models. Previously, I was a Research Scientist at Qualcomm AI Research, where I worked on algorithm and system design to develop efficient deep networks for computer vision use cases. I also worked at a startup, Voxel51 Inc., developing video processing pipelines for the cloud. I have an MS in CS from the University of Michigan, Ann Arbor and a Bachelor's degree from IIT Bombay. I have interned at Meta Reality Labs, IBM Research - Almaden, IBM India Research Lab, TCS Research and Infurnia (a startup based in Mumbai).

SKILLS: Python, C++, C, SQL, Julia, MATLAB, R, PyTorch, TensorFlow, Keras, OpenAI Gym, Theano, CUDA, git

For more information, please visit my webpage: https://yashbhalgat.github.io

Experience

Total Experience: 7 yrs 9 mos

Average Tenure: 1 yr 7 mos

Current Experience: 4 yrs 8 mos

Multiple startups

AI Consultant

Feb 2023 – Mar 2025 · 2 yrs 1 mo · Remote

1. AI chip company: Developing real-time low-power Computer Vision algorithms for augmented reality on smart glasses.
2. Content moderation company: Deploying Large Language Model (LLM) solutions to moderate multimodal data online.
3. Togal.AI: Building Computer Vision solutions for detecting, measuring and comparing project features on architectural plans and drawings.

Large Language Models (LLM), Augmented Reality (AR), Retrieval-Augmented Generation (RAG), Generative AI, Computer Vision, Multimodal

University of Oxford

DPhil (PhD) Researcher, Visual Geometry Group

Oct 2021 – Present · 4 yrs 8 mos · Oxford, England, United Kingdom · On-site

Research focus: 3D/4D Reconstruction and Generation, Vision-Language (Multimodal) Foundation models, 3D+LLMs
Advisors: Andrew Zisserman, Andrea Vedaldi, Joao Henriques, Iro Laina.
Publications at CVPR, NeurIPS, ECCV, ACCV, ICLR and 3DV.

Generative AI, Artificial Intelligence (AI), Computer Vision, Machine Learning, Computer Graphics, 3D Reconstruction, +2

Qualcomm

2 roles

Senior Machine Learning Researcher - Qualcomm AI Research

Nov 2020 – Jul 2021 · 8 mos

Edge Computing, Deep Learning, Computer Vision, AR/VR, Project Management

Machine Learning Researcher - Qualcomm AI Research

Jun 2019 – Oct 2020 · 1 yr 4 mos

Efficient Deep Learning for Computer Vision: algorithm development and system design.
Spearheaded the ultra-low-resource always-on vision project, from model design and quantization through final hardware mapping.
Filed 12 inventions in 2020-21, of which 6 ideas have been filed for patent protection. Notable work on 3D hand-pose estimation, low-bit quantization, and structured and unstructured pruning.
Led Qualcomm’s team in the MicroNet Challenge at NeurIPS 2019 and placed 3rd in the ImageNet track [https://github.com/yashbhalgat/QualcommAI-MicroNet-submission-MixNet].
Managed and mentored interns Jangho Kim and John Yang (PhD @ SNU), who contributed to the AR/VR project.

Edge Computing, Computer Vision, Artificial Intelligence (AI), AR/VR, Algorithms

Voxel51

Computer Vision and Machine Learning Engineer

Jan 2019 – May 2019 · 4 mos

Researched and developed production pipelines for real-time vehicle tracking, used for querying large-scale video databases.

Machine Learning, Computer Vision, Software Development

IBM

AI Research Intern

Jun 2018 – Aug 2018 · 2 mos · Almaden, San Jose

Worked at IBM Almaden Research Lab with the Watson Languages group on task-agnostic classification in the presence of label noise.
Built ensemble-based frameworks for combining weakly-labeled (or mislabeled) and high-quality samples for the training of a sentiment model.
Work accepted to KONVENS 2019.

Natural Language Processing (NLP), Machine Learning, Artificial Intelligence (AI)

IFP Energies nouvelles

Research Intern

May 2017 – Jul 2017 · 2 mos · Paris

Used Scattering Wavelet Networks for the classification and segmentation of seismic structural "monads". The work was accepted as a paper at ICASSP 2018. You can read about it here: http://www.laurent-duval.eu/opus-cats-eyes-seismic-data-classification-scattering-networks.html

IBM

Research Intern

May 2016 – Jul 2016 · 2 mos · Bangalore

Used CorrNets, an autoencoder-based architecture, to learn a joint representation for images and captions. We obtained state-of-the-art results for search over large fashion catalogues without manual tagging.

Tata Research Development and Design Centre (TRDDC)

Research Intern

Nov 2015 – Dec 2015 · 1 mo · Pune, Maharashtra, India

Focusing on stamp detection and segmentation, we proposed a shape-based ranking algorithm to learn the first layer of a CNN. Detection accuracy was 94% and segmentation IoU was 74.81%. The work was accepted as a short paper at the DAS 2016 conference.

Infurnia

Software Engineer Intern

May 2015 – Jul 2015 · 2 mos · Mumbai, Maharashtra, India · On-site

Software module development using a CAD modelling engine.
Developed a range of ‘constraint-modules’ for automated construction of furniture parts using the FreeCAD engine.

Focus Analytics

Indoor Navigation System - Intern

Nov 2014 – Dec 2014 · 1 mo

Designed an Indoor Navigation System using Pedometry and Particle filters.
My work (Pedometry) used IMU sensor data to:
1. Determine the heading of motion using algorithms such as PCA and the TRIAD algorithm.
2. Determine the step frequency and model the step length of the user using FFT, CWT, etc.

Mars Society of India, IIT Bombay

Navigation and Image Processing Engineer

Aug 2014 – May 2015 · 9 mos

Development of a Mars rover with Mars Society of India for the University Rover Challenge organized by Mars Society, Utah, United States.
● Worked on sensor calibration and testing for navigation.
● Also contributed to the Image Processing subsystem for vision-guided navigation of the rover.

Python (Programming Language), C++, Object-Oriented Programming (OOP), Software Development