Pradyumna Marathe

Product Manager

Pune, Maharashtra, India · 7 yrs 11 mos experience

Most Likely To Switch · AI Enabled

Key Highlights

Expert in optimizing deep learning workloads.
Strong background in safety software design.
Proficient in CUDA and deep learning compilers.

Stackforce AI infers this person is a specialist in AI/Deep Learning with a focus on performance optimization.

Skills

Core Skills

Deep Learning, CUDA

Other Skills

GPGPU, System on a Chip (SoC), Software Architecture, Assembly Language, Vectorization, Optimization, Deep Learning Compilers, MLIR, C++, DMA, Makefile, QNX, ARM, Infrastructure Design, CMake

About

Optimizing DL inference workloads at NVIDIA:
HPC with CUDA C/C++ (SIMT, Tensor Cores)
Deep learning compilers (TensorRT, Myelin, MLIR) and on-the-fly CUDA codegen
End-to-end performant DL safety solutions for L2++ driving
Safety software design and architecture
Also well versed in Coverity, VectorCAST, and testing for safety

Experience

Total Experience: 7 yrs 11 mos

Average Tenure: 2 yrs 7 mos

Current Experience: 4 yrs 4 mos

Nvidia

2 roles

Senior Deep Learning Compute Engineer

Promoted

Mar 2025 – Present · 1 yr 2 mos

Safety DL compiler for dynamic fusions
Safe, fast DL kernels (CUDA, Tensor Cores) and dynamic kernel codegen
System-level inference architecture and optimizations for throughput and efficiency
Assembly- and instruction-level expertise for performance vs. accuracy tradeoffs

GPGPU, CUDA, System on a Chip (SoC), Software Architecture, Deep Learning, Assembly Language (+5 more)

Deep Learning Compute Engineer

Jan 2022 – Mar 2025 · 3 yrs 2 mos

Solution Engineering and Architecture for DL compute workloads.

CUDA, System on a Chip (SoC), Software Architecture, DMA, Deep Learning, Vectorization (+10 more)

Vishwakarma Institute of Technology

Member, Industry Advisory Board

Oct 2024 – Present · 1 yr 7 mos · Pune District, Maharashtra, India

Member of the Industry Advisory Board (IAB) in a personal capacity (not representing NVIDIA).

Cadence Design Systems

2 roles

Software Engineer 1 - Artificial Intelligence

Jun 2020 – Jan 2022 · 1 yr 7 mos · Pune, Maharashtra, India

1. Optimization of Neural Networks for Inference - HPC
2. Vectorization and Parallelization on proprietary Digital Signal Processors (Assembly, C Intrinsics)
3. Processors programmed: VisionP6, VisionQ7, VisionQ8, VisionP1
4. Implemented custom optimizations for RNNs, low-precision integer convolutions, and adaptive learning.
5. Intermediate Statistical Data Analysis, Machine Learning (Model Fitting), Performance Modelling for DSPs
6. Knowledge of general DSP architecture

Intern-Software Engineer AI

Jul 2019 – Jan 2020 · 6 mos · Pune, Maharashtra, India

Intern in the AI team, working on:
1. Vision DSPs P6/Q7
2. Predictive Analytics for Optimizer: Automated selection of vectorized kernels pertaining to Deep Learning operations
3. Implementation of Deep Learning algorithms (CNNs and RNNs) in fixed-point vectorized DSP code (bit-exact and optimized implementations to reduce cycles per output pixel)
4. Study and analysis of various CNNs/RNNs
5. Bash Shell scripting for process automation