Vivek Govindan

AI Researcher

Seattle, WA, USA · 21 yrs 5 mos experience

Key Highlights

  • 20+ years of experience in AI and ML.
  • Led GPU inference optimization at Amazon Bedrock.
  • Spearheaded AWS Transcribe expansion to 100+ languages.
Stackforce AI infers this person is a highly experienced AI/ML engineer specializing in large-scale distributed systems and real-time applications.

Skills

Core Skills

LLM Inference Optimization · Machine Learning Algorithms · Natural Language Processing (NLP) · Big Data Analytics · Technical Architecture · Intelligent Networks · Algorithm Design · Java

Other Skills

Large Language Models (LLM) · High Performance Computing (HPC) · Low Latency · KV-Cache Management · Dynamic Batching · Kernel Fusion · Scalability · PyTorch · Software Development · System Architecture · NoSQL · Software Architecture · High Throughput Computing · Data Science · TensorFlow

About

A proven technical leader and Principal Engineer with over 20 years of experience, currently leading GPU-based Large Language Model (LLM) inference optimization and large-scale deployment within AWS Bedrock, implementing advanced serving techniques including speculative decoding, KV-cache optimization, dynamic batching, tensor-parallel execution, and kernel-level performance tuning to deliver low-latency, high-throughput production inference. I led the delivery of state-of-the-art foundation models and production-grade speech and Natural Language Processing (NLP) systems powered by generative AI, with a strong focus on low-latency, cost-efficient, and highly reliable model serving. My role spans defining technical strategy, guiding architectural decisions across the model, systems, and infrastructure layers, and fostering collaboration across globally distributed engineering and science teams to advance the state of machine learning systems and large-scale distributed inference.

  • Driving the inference optimization strategy for Amazon Nova foundation models, leading efforts across model architecture, decoding algorithms, and systems-level optimizations (e.g., batching, KV-cache management, kernel fusion, and hardware-aware serving) to achieve low-latency, high-throughput, and cost-efficient deployment at Bedrock scale.
  • Leading the technical vision and long-term roadmap for AWS Transcribe, defining architecture and model evolution while guiding large, cross-functional engineering and science teams to deliver state-of-the-art deep learning systems for large-scale speech recognition in production.
  • Extensive experience overseeing the design, development, and deployment of complex real-time distributed applications handling massive scale.
  • Proven ability to drive consensus on technical decisions within the larger AWS AI organization, aligning diverse teams across geographies toward a unified strategic direction.
  • Demonstrated success in managing and mentoring cross-functional teams spanning over 100 engineers, ensuring seamless collaboration and efficient execution of product roadmaps.
  • Fostering a culture of innovation and continuous learning, encouraging team members to contribute to industry knowledge through publications, patents, and research.
  • Hands-on experience in inference optimization of LLMs (100+ billion parameter models) using NVIDIA technologies.
  • Serving as an IEEE paper reviewer, contributing to the advancement of industry best practices. Published papers at the NeurIPS and Interspeech conferences.
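
The KV-cache optimization named above is a core serving technique: a minimal NumPy sketch (illustrative only, not code from any system described in this profile) shows the idea — each autoregressive decode step reuses the key/value projections of earlier tokens instead of recomputing them:

```python
import numpy as np

def attention(q, K, V):
    # Scaled dot-product attention for a single query vector.
    scores = q @ K.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

class KVCache:
    """Toy KV cache: keeps keys/values of tokens already processed,
    so each new decode step projects only the newest token."""
    def __init__(self, d):
        self.K = np.empty((0, d))
        self.V = np.empty((0, d))

    def append(self, k, v):
        self.K = np.vstack([self.K, k])
        self.V = np.vstack([self.V, v])

def decode_step(x, Wq, Wk, Wv, cache):
    # Project the newest token only; reuse cached K/V for the rest.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    cache.append(k, v)
    return attention(q, cache.K, cache.V)
```

Each step does O(1) new projections rather than re-projecting the whole sequence; production serving stacks layer paging, quantization, and batched attention kernels on top of this idea.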

Experience

Amazon

3 roles

Principal Machine Learning Engineer

Promoted

Apr 2022 – Present · 3 yrs 11 mos · On-site

  • Led GPU inference optimization for Amazon Nova-2 and Nova Forge variants, implementing advanced decoding and serving techniques including speculative decoding, KV-cache optimization, dynamic batching, tensor parallel execution, and kernel fusion to deliver low-latency, high-throughput inference at Bedrock scale.
  • https://www.aboutamazon.com/news/aws/aws-agentic-ai-amazon-bedrock-nova-models
  • https://aws.amazon.com/nova/forge/
  • AWS Transcribe is a speech recognition service offering state-of-the-art speech transcription capabilities. As a leader at AWS, I spearheaded numerous groundbreaking initiatives for the service, including:
  • Increased the AWS Transcribe footprint to 100+ languages using a speech foundation model (https://aws.amazon.com/about-aws/whats-new/2023/11/amazon-transcribe-over-100-languages/, https://www.theverge.com/2023/11/27/23978822/aws-transcription-amazon-generative-ai).
  • Launched all-neural ASR for the Streaming/Sync API, building a non-autoregressive ASR engine from the ground up, customized for bi-directional streaming.
  • Launched all-neural ASR for the Batch/Async API, building a non-autoregressive ASR engine from the ground up.
  • Launched HealthScribe - one of AWS's first generative AI services for healthcare conversational intelligence. (https://aws.amazon.com/healthscribe/).
  • Customer Testimonial - https://aws.amazon.com/healthscribe/customers/
  • Launched Transcribe Call Analytics, a cutting-edge generative AI-powered API that delivers highly accurate call transcripts and insightful conversation analysis (https://aws.amazon.com/transcribe/call-analytics/).
LLM Inference Optimization · Large Language Models (LLM) · High Performance Computing (HPC) · Low Latency · Machine Learning Algorithms · Natural Language Processing (NLP)
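
The speculative decoding mentioned in the role above can be illustrated with a toy greedy sketch (the `target` and `draft` callables are hypothetical stand-ins for a large and a small model; this is not Bedrock code):

```python
def speculative_decode(target, draft, prompt, k=4, max_new=12):
    """Toy greedy speculative decoding: a cheap draft model proposes
    k tokens; the target model verifies them and keeps the longest
    prefix it agrees with, then emits one token of its own."""
    seq = list(prompt)
    while len(seq) - len(prompt) < max_new:
        # Draft proposes k tokens autoregressively (cheap).
        proposal, ctx = [], list(seq)
        for _ in range(k):
            t = draft(ctx)
            proposal.append(t)
            ctx.append(t)
        # Target verifies the proposed positions (in real systems,
        # one batched forward pass instead of k sequential ones).
        accepted, ctx = 0, list(seq)
        for t in proposal:
            if target(ctx) != t:
                break
            accepted += 1
            ctx.append(t)
        seq.extend(proposal[:accepted])
        # On mismatch (or full acceptance), emit one target token.
        seq.append(target(seq))
    return seq[:len(prompt) + max_new]
```

With greedy decoding the output is identical to running the target model alone; the speed-up comes from verifying several draft tokens per target pass when the draft guesses well.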

Sr. SDE

Promoted

Aug 2017 – Present · 8 yrs 7 mos · On-site

  • AWS Transcribe is a speech recognition service offering state-of-the-art speech transcription capabilities. As a Sr. SDE at AWS, I led the launch of many initiatives:
  • Customization of Automatic Speech Recognition (ASR) to help Transcribe understand domain-specific terminology in areas such as gaming, finance, and medicine. This resulted in the launch of Transcribe Medical and Custom Language Model.
  • Custom Language Model - https://aws.amazon.com/about-aws/whats-new/2020/08/amazon-transcribe-launches-custom-language-models/
  • Transcribe Medical - https://aws.amazon.com/about-aws/whats-new/2020/11/amazon-transcribe-medical-streaming-transcription-support-medical-specialties/
  • Built infrastructure for large-scale distributed model training and evaluation that scales to petabytes of data while enforcing safe and secure data-handling practices.
Machine Learning Algorithms · Natural Language Processing (NLP)

SDE

Jun 2014 – Aug 2017 · 3 yrs 2 mos · On-site

  • Associate Data Feeds is a system for delivering Amazon.com products and product offers to associates such as pricegrabber.com, nextag.com, and Google Shopping. As the leader of the Amazon Ads team vending catalog information, I led a transformative re-architecting initiative leveraging big data technologies. This effort enabled us to process a petabyte of data daily, ensuring real-time catalog updates for third-party systems. Under my leadership, the team implemented a robust architecture handling hundreds of thousands of product updates per second, keeping our partners' systems consistently synchronized with the latest catalog details and empowering them to provide accurate, up-to-date product information to their customers.
Big Data Analytics · Technical Architecture

Huawei

System Architect

Jul 2007 – May 2014 · 6 yrs 10 mos · Bengaluru, Karnataka, India · On-site

  • The carrier-grade application delivery platform provides infrastructure for developing highly available, low-latency, high-throughput distributed Intelligent Network (IN) services. I led a team dedicated to delivering next-generation Intelligent Network systems and platforms capable of low-latency call processing, ranging from microseconds down to nanoseconds. These cutting-edge solutions enable the execution of telecom protocol stacks on intelligent switches, ensuring seamless and efficient telecommunication services. The platforms have been widely adopted by tier-1 telecom providers across Europe, Asia, and South America, with industry giants such as Telefonica and Vodafone among their clients. Under my leadership, the team developed and deployed advanced network systems that empower telecom operators to provide reliable, high-performance networks handling massive data volumes with ultra-low latency.
  • Roles handled include:
  • Design and development of daemons used to synchronize process state between nodes, enabling transparent fail-over for 99.999% availability.
  • Design and development of an object-reference repository for ultra-low-latency look-up of distributed objects.
  • Design of component (telecom application) deployment and its life-cycle management.
  • Performance engineering of many critical flows, including application statistics management, component configuration, data synchronization, and data-channel optimization.
Intelligent Networks · Algorithm Design

TCS

Assistant Systems Engineer

Sep 2004 – Jun 2007 · 2 yrs 9 mos

  • eTREASURY® is an integrated investment banking system enabling foreign exchange and domestic treasury operations, securities trading, portfolio management, payments and funds transfer, and security management. I served as an engineer in the FinTech division (eTreasury), where I played a pivotal role in developing and delivering cutting-edge Treasury applications. These robust solutions power the financial operations of several prominent banks across Europe and Asia.
Algorithm Design · Java

Education

Bharathiar University

MCA — Computer Science

Jan 2001 – Jan 2004

Calicut University

Bachelor of Science - BSc — Computer Science

Sep 1997 – Dec 2000
