Aadesh Deshmukh

Lead ML Engineer

San Francisco, California, United States · 5 yrs 2 mos experience

Key Highlights

  • Expert in optimizing deep learning models for high-performance computing.
  • Proven ability to enhance GPU programming for scientific applications.
  • Strong collaboration skills across multidisciplinary teams.
Stackforce AI infers this person is a highly skilled Machine Learning Engineer with expertise in scientific computing and GPU programming.

Skills

Core Skills

Machine Learning, GPU Programming, Scientific Computing, Software Engineering

Other Skills

Algorithms, C++, CUDA, CMake, Computer Science, Data Structures, Databases, DeepSpeed Chat, GEMM, Generative AI, Git, Griffin, HIP, HW/SW co-design, Julia

About

Skilled in software development and research, with extensive experience in ML systems, HPC, GPU programming, and scientific computing, including enhancing deep learning models at AMD/Xilinx. Proven ability to solve complex problems and optimize performance in large-scale computing environments.
  • Blog: https://nextjournal.com/aa25desh
  • I take great joy in biking, standup comedy, and hiking.
  • I know 5 programming languages, but the compiler can understand only two of them.

Experience

Total Experience: 5 yrs 2 mos
Average Tenure: 11 mos
Current Experience: 1 yr 7 mos

AMD

Sr. Deep Learning Compiler Engineer

Sep 2024 - Present · 1 yr 7 mos · San Jose, California, United States · Hybrid

  • HW/SW co-design for AMD’s XDNA architecture, building an innovative data-movement compiler that maximizes memory-hierarchy efficiency for high-throughput GEMM, convolution, and memory-bound multi-head attention workloads.
  • Developed a flexible attention compiler supporting arbitrary tensor shapes, masks, padding, and reordering; implemented layer fusion, software pipelining for VLIW-SIMD cores, and automated op-shape mapping to minimize memory roundtrips.
  • Collaborated across architecture, emulation, and quantization teams to validate silicon, implement aggressive inference flows, and deploy production-grade AI models.
HW/SW co-design, data-movement compiler, memory-hierarchy efficiency, GEMM, convolution, multi-head attention

Idaho National Laboratory

Computational Scientist

Sep 2023 - Oct 2024 · 1 yr 1 mo · Idaho Falls, Idaho, United States · Hybrid

  • Optimized a deep learning model for neutronics using Torch Compiler/TVM for reduced memory usage and faster inference. Benchmarked with torchprofiler across 200+ isotopes and integrated with Griffin for pebble-bed reactor simulations.
  • Implemented a software pipeline for fluidized bed reactors, launching 4000+ MFIX/Barracuda simulations. Implemented an MPI/NCCL-based parallel algorithm to process simulation data for Generative AI, enabling real-time validation.
  • Developed a full research proposal as PI for a Large Language Model for Computational Tools, optimizing combined training and interface using ZeRO-Infinity (FSDP) DeepSpeed Chat.
Torch Compiler, TVM, neutronics, memory optimization, Griffin, fluidized bed reactors

Berkeley Lab

GRA, Software Engineering

May 2022 - Aug 2022 · 3 mos · 1 Cyclotron Rd, Berkeley, CA 94720

  • Developed a wrapper for superlu_dist subroutines to solve sparse linear systems on three targets (Linux/Mac/Win), supporting different versions of MPI; used the Clang compiler and MPITrampoline to compile a shared library of superlu_dist.
  • Worked on integrating function calls and objects between two languages, C++ and Julia. Used CMake for makefile generation and compilation, and the NERSC DDT tool for debugging the distributed code.
superlu_dist, MPI, C++, Julia, CMake, NERSC DDT

University of Utah

Research Assistant

Aug 2021 - Sep 2023 · 2 yrs 1 mo · Salt Lake City, Utah, United States · Hybrid

  • Developed GPU (CUDA/HIP) kernel optimizations for finite volume simulations, focusing on matrix assembly, SpMV, and a divergent code scheduler. Implemented operator fusion, loop flattening, and tiling. Achieved a 10x performance increase on single-GPU (Nsight) nodes and 20x on multi-GPU setups, leveraging system heterogeneity.
  • Co-developed a domain-specific language (DSL) and code generator for FE/FV scientific computing, enabling efficient code distribution across CPU/GPU platforms. Developed a proof of concept based on MLIR to optimize data movement, enhance tensor operations, integrate mixed precision, and automate design space optimization.
CUDA, HIP, finite volume simulations, matrix assembly, SpMV, divergent code scheduler

Google Summer of Code

GSOC 2020, Software Engineering

May 2020 - Aug 2020 · 3 mos

  • Developed a Machine Learning library for time series classification with 2 ML algorithms, time series forest and KNN, with unified user APIs to fit, predict, and tune models using data randomization and classic tabular ML algorithms.
  • Achieved a 6x speedup for the time series forest algorithm. Built a pipeline for time series ML models and tested the models through benchmarking and validation on 104 dissimilar data sets.
Machine Learning, time series classification, KNN, data randomization, benchmarking

NumFOCUS

Software Engineer Intern

May 2019 - Aug 2019 · 3 mos

  • Implemented a range over-approximation algorithm that improves bounds over a range of Taylor models by 70%.
  • Developed RangeEnclosures.jl, a Julia package that bounds polynomials using four different algorithms.
  • Implemented a branch-and-bound algorithm using low-degree polynomials; this minimizes relative precision to 1e−5.
Range over-approximation algorithm, Julia, branch and bound algorithm, Software Engineering

National Institute of Technology, Tiruchirappalli

Intern

Nov 2018 - Jan 2019 · 2 mos · Tiruchirappalli

  • Trained a Naive Bayes model using NLTK and scikit-learn for sentiment analysis, achieving 76% accuracy.
  • Collected 2,000 data points using Twitter APIs; handled missing values and outliers using Pandas and NumPy.
Naive Bayes, NLTK, scikit-learn, sentiment analysis, Twitter APIs, Pandas

Education

University of Utah

Master — Computer Science

Indian Institute of Information Technology, Tiruchirappalli

B.TECH. — Computer Science and Engineering
