Sunil Patel

CEO

Mumbai, Maharashtra, India · 10 yrs 6 mos experience

Key Highlights

  • 10 years of experience in Deep Learning and AI.
  • Contributed to NVIDIA's growth to a three trillion-dollar company.
  • Expert in optimizing and scaling Deep Learning models.


Skills

Core Skills

Deep Learning · Model Optimization · CUDA Computing · Artificial Intelligence

Other Skills

AI-Based Banking Solutions · ASR Development · Accelerated Computing · Agile Methodologies · Amazon Web Services (AWS) · Apache Spark · Artificial Intelligence (AI) · Artificial Neural Networks · C · C++ · CSS · CUDA · Clinical Trial Linking · Cloud Computing · Computer Vision

About

Part of a 24-person "TAC Team" that helps clock 1.5 billion dollars a year and counting. With 10 years of experience in Deep Learning, I have witnessed firsthand the remarkable transformation of NVIDIA from a company valued at a few hundred billion dollars into a three-trillion-dollar powerhouse driving the AI revolution. My journey began with Conversational AI, where I contributed to developing and deploying advanced language models for a variety of applications. At NVIDIA, I have been deeply involved in optimizing and scaling Deep Learning models, focusing on large-scale training and inference workloads. My work includes model optimization, profiling, writing optimized operations, and ensuring scalable deployments over GPUs. I have also gained hands-on experience with LLM fine-tuning and RAG workflows, enabling AI datacenters to handle massive LLM training runs. Now, I am dedicated to helping customers realize AI-driven datacenters capable of running these cutting-edge models at scale, pushing the boundaries of what's possible in AI.

Experience

NVIDIA

4 roles

Conglomerates & Industries | Manager Solutions Architecture and Engineering

Promoted

Mar 2025 – Present · 1 yr

Data Scientist - IV Deep Learning

Jun 2021 – Present · 4 yrs 9 mos

  • 1) LLM Training and model optimization
  • LLM finetuning: Worked on finetuning Mistral 70B and LLaMa-2/3 7B/8B/70B for triplet extraction to build graph RAG on financial data.
  • LLM model optimization: Optimizing Mistral, Mixtral, and LLaMa, and scaling deployment over a large pool of GPUs.
  • RAG pipelines: Building RAG pipelines on various modalities such as text, images, video, and docs for an HR bot, a marketing bot, code documentation, product, etc.
  • 2) Deep Learning inference over 1000s of live cameras
  • Architecting platform for large-scale Intelligent Video Analytics
  • Model training on multiple GPUs and multiple nodes with mixed precision. Trained and deployed end-to-end pipelines for face detection and recognition, ANPR, and intruder detection for small objects over long distances.
  • I have worked on model training, model profiling, model ensembles, pipeline profiling, component optimization, scaling, and deployment over Kubernetes.
  • I have worked with optical flow and tiling-based approaches to make the DNN model compute-friendly.
  • Model optimization, ensembles with different backends/models, model/application profiling, writing end-to-end C++/Python DeepStream applications, and deployment with Kubernetes orchestration.
  • 3) ASR - Indian language support
  • Developed a standard procedure for ASR and helped customers develop ASR for multiple Indic languages. QuartzNet 15x5 was used; the final WER over IITM data was 6.4.
  • 4) CUDA: My profile doesn't demand core CUDA skills, but out of interest I learned CUDA. This helps in many ways:
  • DeepStream post-processors require custom libraries for each model to do box filtration
  • Offloading CPU-intensive parts of the model, such as ROI filtering and NMS, to the GPU.
  • Reading existing code and modifying it for TensorRT custom layers.
  • I understand CUDA code at the grid, block, and thread level and can write parallel kernels.
LLM Training · Model Optimization · Deep Learning Inference · Kubernetes · Model Profiling · Face Detection +5
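The ROI-filtering/NMS offload described above can be illustrated with a minimal greedy non-maximum suppression in plain NumPy. This is a CPU sketch of the algorithm, not the CUDA kernel itself; the `[x1, y1, x2, y2]` box format and the IoU threshold are assumptions for illustration:

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, format [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, drop overlapping ones, repeat."""
    order = np.argsort(scores)[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # suppress boxes overlapping the kept box beyond the threshold
        order = rest[iou(boxes[i], boxes[rest]) <= iou_thresh]
    return keep
```

On a GPU, the pairwise IoU computation is the part that parallelizes naturally, which is why moving it out of the CPU post-processing path pays off at camera scale.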

Data Scientist - III Deep Learning

Nov 2019 – Jun 2021 · 1 yr 7 mos

  • Unreal Engine | Autonomous Driving | Computational linguistics | Computer vision | GPU Acceleration
  • CUDA Computing:
  • 1) Efficiency improvement by explicit memory assignment
  • 2) Partitioning schemes such as interleaved and block partitioning
  • 3) Compressing data using CSR, ELL, and COO formats, with CUDA parallelism on the compressed data
  • 4) Utilizing Unified memory APIs for faster prototyping
  • Nvidia DeepStream
  • 1) Writing an application for object detection and event initialization
  • 2) Application profiling and fine-tuning
  • 3) Large scale deployment of such applications with Kubernetes
  • NVIDIA TensorRT
  • Model conversion from ONNX, PyTorch, or TensorFlow at FP32, FP16, and INT8 optimization levels
  • Multi-Language ASR, and TTS
  • 1) Model training with the NeMo API for QuartzNet and Jasper using PyTorch Lightning
  • 2) Optimizing models using TensorRT and deploying with the Triton Inference Server
  • 3) Supported languages: Hindi, Gujarati, Punjabi
  • Deployment at scale
  • 1) Deploying dockerized application over Kubernetes
  • 2) Ground-up application development that takes advantage of a multi-GPU, multi-node environment when deployed
  • Talks:
  • 1. GPU Technology Conference session – Building Indic ASR Using NVIDIA NeMo and Deploying Models Using Jarvis: https://gtc21.event.nvidia.com/media/1_kz17xdau
  • 2. NVIDIA – Accelerated Database Query Using GPU: https://info.nvidia.com/india-accelerated-database-query-reg-page.html?ondemandrgt=yes
  • 3. IISER Pune – NVIDIA Data Science Ecosystem Tools: https://www.iiserpune.ac.in/events/Workshop+on+Data+Science+Ecosystem+Tools
  • 4. Intel and Analytics Vidhya – Representing Language Mathematically: https://www.innoplexus.com/news/the-convergence-of-big-data-and-machine-learning/
CUDA Computing · NVIDIA DeepStream · NVIDIA TensorRT · Multi-Language ASR · Deployment at Scale · Deep Learning
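The CSR compression listed under CUDA Computing can be sketched in plain Python. This is a CPU illustration of the format only; in the CUDA version, the row loop of the matrix-vector product is what gets parallelized, typically one thread per row:

```python
def dense_to_csr(m):
    """Compress a dense matrix (list of rows) into CSR:
    values, column indices, and row pointers."""
    values, col_idx, row_ptr = [], [], [0]
    for row in m:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))  # each entry marks where the next row starts
    return values, col_idx, row_ptr

def csr_matvec(values, col_idx, row_ptr, x):
    """y = A @ x computed directly on the CSR representation."""
    y = []
    for r in range(len(row_ptr) - 1):
        s = 0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            s += values[k] * x[col_idx[k]]
        y.append(s)
    return y
```

ELL and COO trade the same idea differently: COO stores explicit (row, col, value) triples, while ELL pads rows to a fixed width for coalesced GPU memory access.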

Solutions Architect - Deep Learning

Jul 2019 – Nov 2019 · 4 mos

  • Unreal Engine | Autonomous Driving | Computational linguistics | Computer vision | GPU Acceleration

Innoplexus

Data Scientist - Deep Learning

Nov 2017 – Jun 2019 · 1 yr 7 mos · Eschborn, Germany

  • 1) Document Comparison
  • Detecting syntactic and semantic similarity between two documents, as well as insertions, deletions, and modifications.
  • Tools/Technology: Skip-thought sentence vectors, Siamese networks, PyTorch, NVIDIA V100
  • 2) Primary/Secondary Clinical Trial Linking
  • Linking clinical trials as primary or secondary by comparing the content of clinical trials with millions of research papers.
  • Tools/Technology: Skip-thought sentence vectors, Siamese networks, PyTorch, NVIDIA V100
Document Comparison · Clinical Trial Linking · Deep Learning
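The document-comparison approach above can be sketched with toy sentence vectors. The real system used skip-thought embeddings scored by a Siamese network; the greedy alignment and the similarity thresholds here are illustrative assumptions:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two sentence vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def align_sentences(doc_a, doc_b, match_thresh=0.9, modify_thresh=0.5):
    """Greedily align sentences of doc_a against doc_b by vector similarity.
    Returns one label per sentence of doc_a:
    'match' (near-identical), 'modified' (partial overlap), or 'deleted'."""
    labels, used = [], set()
    for u in doc_a:
        # ignore doc_b sentences already claimed by an earlier match
        sims = [cosine(u, v) if j not in used else -1.0
                for j, v in enumerate(doc_b)]
        j = int(np.argmax(sims))
        if sims[j] >= match_thresh:
            labels.append('match'); used.add(j)
        elif sims[j] >= modify_thresh:
            labels.append('modified'); used.add(j)
        else:
            labels.append('deleted')
    return labels
```

Running the same alignment in the opposite direction (doc_b against doc_a) flags insertions the same way this direction flags deletions.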

GVK

Senior Research Associate - Deep Learning

May 2016 – Nov 2017 · 1 yr 6 mos · Hyderabad Area, India

  • 1) Developed highly scalable Named Entity Resolution utilizing GPU computing – character-level LSTM
  • Completed and delivered a general-purpose framework for any kind of Named Entity Resolution (NER) problem. The solution uses a state-of-the-art ensemble model of a convolutional network and Long Short-Term Memory (LSTM), deployed on TensorRT v2 with custom layers for unidirectional LSTM.
Named Entity Resolution · GPU Computing · Deep Learning
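A character-level LSTM like the one above consumes one encoded character per time step. A minimal sketch of that input encoding follows; the vocabulary and lowercasing choices are assumptions, not the delivered framework's actual preprocessing:

```python
import numpy as np

def char_one_hot(name, vocab="abcdefghijklmnopqrstuvwxyz "):
    """Encode an entity mention as a (sequence_length, vocab_size) one-hot
    matrix, the per-step input format a character-level LSTM consumes.
    Characters outside the vocabulary map to an all-zero row."""
    idx = {c: i for i, c in enumerate(vocab)}
    out = np.zeros((len(name), len(vocab)))
    for t, c in enumerate(name.lower()):
        if c in idx:
            out[t, idx[c]] = 1.0
    return out
```

Working at the character level is what makes the framework general purpose: it needs no entity-specific token vocabulary, so the same model architecture applies to any NER domain.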

Tata group

Software Engineer (Artificial Intelligence R&D)

Aug 2015 – May 2016 · 9 mos

  • 1) AI-Based Banking Transaction Reconciliation
  • Bank of Switzerland wanted to solve its complex ledger-to-statement reconciliation problem. Used a 1D-CNN. Completed the entire project from R&D to a production-ready solution using Keras and Python.
  • 2) Conversational Interface
  • For internal IT service enhancement and as part of Ignio (TCS's IT cognitive system for enterprise IT Ops), completed a project building a conversational system using Natural Language Processing with Word2Vec, H2O, and Python.
AI-Based Banking Solutions · Conversational Interface · Artificial Intelligence

Supercomputing facility

Internship

May 2014 – Jul 2014 · 2 mos · IIT-Delhi

  • A software development on "Protein-protein interaction prediction for dimer formation on the basis of chemical properties and geometrical factors using Deep Learning".

Education

Indian Institute Of Information Technology Allahabad

Master of Technology (MTech), Information Technology

Jan 2013 – Jan 2015
