Abhinav Goyal

Lead ML Engineer

Bengaluru, Karnataka, India · 6 yrs 9 mos experience

AI ML Practitioner · Highly Stable

Key Highlights

Expert in optimizing LLMs for efficiency and cost.
Led significant ASR initiatives improving user experience.
Strong foundation in AI research and real-world applications.

Stackforce AI infers this person is a Senior AI Engineer specializing in Deep Learning and Speech Technologies.

Skills

Core Skills

Artificial Intelligence · Deep Learning · Large Language Models (LLM) · Automatic Speech Recognition

Other Skills

Machine Learning · Python (Programming Language) · Knowledge Distillation · Speculative Decoding · Inference Pipelines · LLaMA-Factory · Model Training · PyTorch · Intent Recognition · Python · TensorFlow · Speech Recognition

About

I am a Senior ML Engineer at Google, specializing in Deep Learning and AI. Previously I was a Senior Data Scientist at Flipkart where my work focused on building scalable ML solutions, enhancing AI efficiency, and driving innovation in NLP and speech technologies. With a Computer Science background from IIT Bombay, my expertise lies in LLMs, ASR, and NLP. I’m passionate about AI research, real-world deep learning applications, and pushing the boundaries of machine learning innovation. Always open to discussions and collaborations!

Experience

6 yrs 9 mos

Total Experience

5 yrs 11 mos

Average Tenure

10 mos

Current Experience

Google

Senior ML Engineer

Jul 2025 – Present · 10 mos · Bengaluru, Karnataka, India · Hybrid

Artificial Intelligence · Deep Learning · Large Language Models (LLM) · Machine Learning

Flipkart

2 roles

Senior Data Scientist

Oct 2024 – Jul 2025 · 9 mos · Hybrid

Led a team of 4 in optimizing LLM inference to reduce latency and cost, using techniques such as knowledge distillation and speculative decoding. Delivered a distilled 3B-parameter model that is 1.6x faster than the 8B model.
Built a unified LLM toolkit consolidating training workflows, inference pipelines, latency optimization strategies and automatic prompt suggestions, streamlining LLM adoption across Flipkart.
Developed an enhanced fork of LLaMA-Factory that supports distillation and draft-model training for speculative decoding via Medusa and EAGLE; the fork is now widely adopted for Fine-Tuning as a Service across the organization.

Artificial Intelligence · Deep Learning · Python (Programming Language) · Large Language Models (LLM)

Data Scientist I, II and III

Jul 2019 – Sep 2024 · 5 yrs 2 mos · Hybrid

Led ASR initiatives, developing a streaming voice search model (WER: 16% → 3.5%) and integrating end-of-speech detection to cut latency by 50% (1.2s), enhancing search experience for 5M+ users.
Adapted the Hierarchical CTC ASR model for video captioning using 44 hours of transcribed audio, achieving a 16% WER with noisy student training and near-perfect time sync through CTC alignment.
Designed a joint ASR and intent recognition model for customer support using 900 hours of transcribed audio, reducing WER to 8% and speeding up intent recognition by 3x with a 24% higher F1 score.
Built an ASR framework in PyTorch, speeding up training by 4x through dynamic batching and multi-processing, and simplifying experimentation for self-supervised learning, PEFT, and language adaptations.
Optimized in-house LLMs using FP8 quantization and truncated-Medusa speculative decoding, achieving 2x higher tokens/sec. Enabled vLLM deployments that are 10x cheaper than external models.
Collaborated on building and deploying LLMs for customer support call summarization, Q&A systems, and product description generation. Call summarization reduced agent call handling times by 25-40 seconds.
Mentored 4 junior data scientists and SDEs on custom ASR models through Flipkart’s Data Science Niketan program.