Pradyumna Marathe

Product Manager

Pune, Maharashtra, India · 7 yrs 11 mos experience
Most Likely To Switch · AI Enabled

Key Highlights

  • Expert in optimizing deep learning workloads.
  • Strong background in safety software design.
  • Proficient in CUDA and deep learning compilers.
Stackforce AI infers this person is a specialist in AI/Deep Learning with a focus on performance optimization.

Skills

Core Skills

Deep Learning, CUDA

Other Skills

GPGPU, System on a Chip (SoC), Software Architecture, Assembly Language, Vectorization, Optimization, Deep Learning Compilers, MLIR, C++, DMA, Makefile, QNX, ARM, Infrastructure Design, CMake

About

  • Optimizing DL inference workloads at Nvidia
  • HPC with CUDA C/C++ (SIMT, Tensor Cores)
  • Deep learning compilers (TensorRT, Myelin, MLIR) and on-the-fly CUDA codegen
  • End-to-end performant DL safety solutions for L2++ driving
  • Safety software design and architecture
  • Also well versed in Coverity, VectorCAST, and testing for safety

Experience

Total Experience: 7 yrs 11 mos
Average Tenure: 2 yrs 7 mos
Current Experience: 4 yrs 4 mos

Nvidia

2 roles

Senior Deep Learning Compute Engineer

Promoted

Mar 2025 - Present · 1 yr 2 mos

  • Safety DL compiler for dynamic fusions
  • Safe, fast DL kernels (CUDA, Tensor Cores) and dynamic kernel codegen
  • Systems level inference architecture and optimizations for optimal throughput and efficiency
  • Assembly / instruction level expertise for performance vs accuracy tradeoffs
GPGPU, CUDA, System on a Chip (SoC), Software Architecture, Deep Learning, Assembly Language, +5 more

Deep Learning Compute Engineer

Jan 2022 - Mar 2025 · 3 yrs 2 mos

  • Solution Engineering and Architecture for DL compute workloads.
CUDA, System on a Chip (SoC), Software Architecture, DMA, Deep Learning, Vectorization, +10 more

Vishwakarma Institute of Technology

Member, Industry Advisory Board

Oct 2024 - Present · 1 yr 7 mos · Pune District, Maharashtra, India

  • Member of IAB (Industry Advisory Board) in personal capacity (not representing NVIDIA).

Cadence Design Systems

2 roles

Software Engineer 1 - Artificial Intelligence

Jun 2020 - Jan 2022 · 1 yr 7 mos · Pune, Maharashtra, India

  • Optimization of neural networks for inference (HPC)
  • Vectorization and parallelization on proprietary digital signal processors (assembly, C intrinsics)
  • Processors programmed: VisionP6, VisionQ7, VisionQ8, VisionP1
  • Implemented custom optimizations for RNNs, low-precision integer convolutions, and adaptive learning
  • Intermediate statistical data analysis, machine learning (model fitting), and performance modelling for DSPs
  • Knowledge of general DSP architecture

Intern-Software Engineer AI

Jul 2019 - Jan 2020 · 6 mos · Pune, Maharashtra, India

Intern in the AI team, working on:
  • Vision DSPs P6/Q7
  • Predictive analytics for the optimizer: automated selection of vectorized kernels for deep learning operations
  • Implementation of deep learning algorithms (CNNs and RNNs) in fixed-point vectorized DSP code (bit-exact and optimized implementations to reduce cycles per output pixel)
  • Study and analysis of various CNNs/RNNs
  • Bash shell scripting for process automation

Vishwakarma Institute of Technology

Student Researcher

Jun 2018 - Jun 2020 · 2 yrs · Pune/Pimpri-Chinchwad Area

The Robotics Forum, VIT Pune

Summer Intern

Jun 2017 - Aug 2017 · 2 mos · Pune/Pimpri-Chinchwad Area

Education

Vishwakarma Institute of Technology

Bachelor of Technology (BTech) — Electronics and Telecommunications Engineering

Jan 2016 - Jan 2020
