Vishwam Shukla

AI Researcher

Seattle, Washington, United States · 6 yrs 5 mos experience
AI Enabled · AI ML Practitioner

Key Highlights

  • Led optimization for AI Assistant impacting 2B+ users.
  • Architected advanced AI pipelines reducing hallucinations by 28%.
  • Deployed real-time multilingual AI systems with significant latency reduction.
Stackforce AI infers this person is a highly skilled AI/ML Engineer with expertise in large-scale AI systems and MLOps.

Skills

Core Skills

Large Language Models (LLMs) · MLOps · Generative AI · Artificial Intelligence (AI) · Machine Learning

Other Skills

AWS Glue · AWS Lambda · AWS SageMaker · Airflow · Algorithm Development · Amazon Athena · Amazon DynamoDB · Amazon Elastic MapReduce (EMR) · Amazon Redshift · Amazon Web Services (AWS) · Android · Angular · Apache Kafka · Apache Spark · BM25

Experience

Total Experience: 6 yrs 5 mos
Average Tenure: 3 yrs 4 mos
Current Experience: 3 yrs 1 mo

Meta

AI/ML Engineer

May 2023 - Present · 3 yrs 1 mo · Washington, United States

  • Led optimization of Meta AI Assistant for 2B+ users by advancing transformer inference with PyTorch 2.1, TorchScript, and FBGEMM, reducing latency by 37% and boosting daily engagement by 22%.
  • Architected retrieval-augmented generation pipelines with FAISS, BM25, Hugging Face, and LangChain, grounding answers from 50M+ documents and cutting hallucinations 28%.
  • Deployed LLaMA-Edge models for Ray-Ban smart glasses using 4-bit quantization, Glow compiler, and TorchScript, reducing memory usage 45% while maintaining cross-lingual ASR accuracy.
  • Directed development of real-time multilingual ASR + TTS systems with wav2vec-U and quantized HiFi-GAN, enabling seamless voice-first experiences across smart glasses and mobile platforms.
  • Orchestrated large-scale distributed training across 2,048 GPUs with DeepSpeed ZeRO-3 and NCCL, sustaining 5.7 TFLOP/GPU throughput to accelerate iteration cycles of 70B-parameter LLaMA models.
  • Automated ingestion and transformation of 12TB+ daily conversational logs using Spark, PyArrow, Hive, and Airflow, building privacy-compliant RLHF corpora and integrating feature engineering pipelines in SageMaker, MLflow, Docker, and Kubernetes (EKS).
  • Delivered production-grade MLOps and deployment pipelines with ArgoCD GitOps, canary rollouts, and Prometheus/Grafana monitoring, ensuring 99.98% uptime for global inference APIs.
  • Partnered with product and research teams to align LLM training with RLHF, applying A/B testing, drift detection, and human evaluation to validate improvements with 99%+ statistical confidence while implementing responsible AI with SHAP, LIME, and Fairlearn.
Large Language Models (LLMs) · Generative AI · Transformer Architectures · PyTorch · Airflow · LangChain · +6

Nvidia

AI/ML Engineer

Apr 2019 - Aug 2022 · 3 yrs 4 mos

  • Deployed real-time multilingual conversational AI systems with Python, PyTorch, Riva ASR/TTS, TensorRT, and CUDA, reducing response latency by 45% and enabling scalable cloud, edge, and on-premise deployments.
  • Fine-tuned speech recognition and TTS models on domain-specific datasets, leveraging supervised learning and sequence-to-sequence architectures to improve accuracy by 30% across enterprise-grade use cases.
  • Integrated NLP services, embeddings, and conversational workflows into chatbots and assistants, raising query resolution by 25%, enhancing engagement, and improving overall customer satisfaction metrics.
  • Built end-to-end multimodal pipelines using NeMo, Spark, and SQL, enabling seamless speech, text, and NLP processing for regulated industries such as healthcare (HIPAA) and finance (PCI-DSS).
  • Optimized inference pipelines with TensorRT, NVIDIA GPUs, and MLflow, cutting latency by 40% and boosting GPU throughput by 35%. Streamlined lifecycle management through a deployment matrix spanning Docker, Kubernetes, and AWS SageMaker.
  • Built and maintained MLOps workflows with Python connectors, ensuring 99.9% uptime, 50% faster release cycles, and secure, compliance-aligned enterprise AI/ML deployments.
  • Led benchmarking, drift monitoring, and evaluation, improving GPU utilization by 60%, reducing drift incidents by 25%, and strengthening enterprise guardrails for safety and regulatory needs.
  • Partnered with cross-functional engineers, scientists, and architects to design domain-specific AI/ML workflows, embedding evaluation, safety, and best practices into production-ready AI applications.
Python · PyTorch · Riva ASR/TTS · TensorRT · CUDA · NeMo · +8

Education

MIT ADT University

Bachelor of Science - BS — Computer Science

Northeastern University

Master of Science - MS — Computer Software Engineering
