Vivek Govindan — AI Researcher
A proven technical leader and Principal Engineer with over 20 years of experience, I currently lead GPU-based Large Language Model (LLM) inference optimization and large-scale deployment within AWS Bedrock, applying advanced serving techniques, including speculative decoding, KV-cache optimization, dynamic batching, tensor-parallel execution, and kernel-level performance tuning, to deliver low-latency, high-throughput production inference. I have led the delivery of state-of-the-art foundation models and production-grade speech and Natural Language Processing (NLP) systems powered by generative AI, with a strong focus on low-latency, cost-efficient, and highly reliable model serving. My role spans defining technical strategy, guiding architectural decisions across the model, systems, and infrastructure layers, and fostering collaboration across globally distributed engineering and science teams to advance machine learning systems and large-scale distributed inference.
- Driving the inference optimization strategy for Amazon Nova foundation models, leading efforts across model architecture, decoding algorithms, and systems-level optimizations (e.g., batching, KV-cache management, kernel fusion, and hardware-aware serving) to achieve low-latency, high-throughput, and cost-efficient deployment at Bedrock scale.
- Leading the technical vision and long-term roadmap for AWS Transcribe, defining architecture and model evolution while guiding large, cross-functional engineering and science teams to deliver state-of-the-art deep learning systems for large-scale speech recognition in production.
- Extensive experience overseeing the design, development, and deployment of complex real-time distributed applications at massive scale.
- Proven ability to drive consensus on technical decisions within the larger AWS AI organization, aligning diverse teams across geographies toward a unified strategic direction.
- Demonstrated success in managing and mentoring cross-functional teams of over 100 engineers, ensuring seamless collaboration and efficient execution of product roadmaps.
- Fosters a culture of innovation and continuous learning, encouraging team members to contribute to industry knowledge through publications, patents, and research.
- Hands-on experience in inference optimization of LLMs (100+ billion-parameter models) using NVIDIA technologies.
- Serves as an IEEE paper reviewer, contributing to the advancement of industry best practices; published papers at the NeurIPS and Interspeech conferences.
Location: Seattle, WA, USA
Experience: 21 yrs 5 mos
Skills
- LLM Inference Optimization
- Machine Learning Algorithms
- Natural Language Processing (NLP)
- Big Data Analytics
- Technical Architecture
- Intelligent Networks
- Algorithm Design
- Java
Career Highlights
- 20+ years of experience in AI and ML.
- Led GPU inference optimization at Amazon Bedrock.
- Spearheaded AWS Transcribe expansion to 100+ languages.
Work Experience
Amazon
Principal Machine Learning Engineer (3 yrs 11 mos)
Sr. SDE (8 yrs 7 mos)
SDE (3 yrs 2 mos)
Huawei
System Architect (6 yrs 10 mos)
TCS
Assistant Systems Engineer (2 yrs 9 mos)
Education
MCA at Bharathiar University
Bachelor of Science - BSc at Calicut University