Navin Anwani

CTO

Hyderabad, Telangana, India · 12 yrs 6 mos experience

Most Likely To Switch · AI ML Practitioner

Key Highlights

Expert in AI compiler design and optimization.
Led innovative ML features in 5G modem technology.
Strong background in digital signal processing and wireless communications.

Stackforce AI infers this person is a Semiconductor and Telecommunications expert with a strong focus on AI and DSP technologies.

Skills

Core Skills

AI Compiler, Compiler Optimization, Optimization, Digital Signal Processing, Machine Learning, Signal Processing, Audio Processing, Computer Architecture

Other Skills

Algorithms, Algorithm Design, Artificial Intelligence (AI), C++, Collaborative R&D, Compilers, Convolutional Neural Networks (CNN), DMA, Large Language Models (LLM), Python (Programming Language), SIMD, Research, Software Optimization, Transformers, Hands-on Technical Leadership

About

Experienced ML/AI, signal processing and wireless communication systems engineer with a demonstrated history of working on AI inference acceleration, 4G/5G modem design and the application of ML to it, 3D audio, and DSP core design. Skilled in signal processing, algorithms, optimization and machine learning. Master of Technology (M.Tech.) focused on Communication Engineering from the Indian Institute of Technology, Bombay.

Experience

Total Experience: 12 yrs 6 mos

Average Tenure: 2 yrs 6 mos

Current Experience: 2 yrs 9 mos

AMD

2 roles

Principal Member of Technical Staff

Promoted

Jul 2025 – Present · 9 mos · Hyderabad, Telangana, India · Remote

Leading AI compiler design team for on-device AI inference acceleration targeting AMD XDNA NPUs.
Providing technical leadership on designing algorithms, compute architecture, problem partitioning and optimization of various AI OPs.
Contributions across the compiler stack:
1. Front-end:
a. Operator fusion
b. Operator translation
2. Middle-end:
a. Tiler
b. Cycle-accurate cost modeling
c. Dataflow scheduler
3. Back-end:
a. Kernel design and optimization
b. Codegen for DMA programming and control
Covered operators such as convolution, GeMM, multi-head attention (MHA), layer norm, softmax, non-linear activation, etc.

AI compiler, Algorithms, Compiler Optimization, Computer Architecture, Algorithm Design, Artificial Intelligence (AI), +17

Senior Member of Technical Staff

Jul 2023 – Jul 2025 · 2 yrs · Hyderabad, Telangana, India · Remote

1. Led a team for on-device AI inference compiler design targeting the AMD NPU, which comprises an array of interconnected vector processors (AI Engines).
2. Major contributions across the compiler stack: OP fusion, Tiler, dataflow scheduler and kernel design for layers such as convolution, GeMM, multi-head attention (MHA), layer norm, softmax, etc.
3. AI inference acceleration and DSP system design architect for the AMD NPU.
4. Provided technical leadership on designing algorithms, compute architecture, problem partitioning and optimization of various AI inference and DSP use cases targeted at the XDNA architecture.
5. Performed optimal multi-core mapping, dataflow architecture and implementation of various state-of-the-art open-source and proprietary AI models, such as Procyon models, and of various LLM/transformer blocks such as layer normalization, multi-head attention, GeMM, GeLU, Sigmoid, SiLU, softmax, convolution, etc., on the AMD NPU.
6. The DSP use cases shipped include 5G cell search, radar space-time adaptive processing, large generalized matrix multiply, high-performance FFT mapped to the ML inference engine, large QR decomposition, etc.

Optimization, Digital Signal Processing, C++, Machine Learning, Probability, Technical Project Leadership, +26

Qualcomm

4 roles

Staff Engineer

Jul 2022 – Jul 2023 · 1 yr

Led productization of an ML-based feature in the 5G modem.
ML-based adaptive channel estimation.

Learning Algorithms, Optimization, Digital Signal Processing, Probability Theory, C++, Machine Learning, +14

Staff Engineer

Nov 2021 – Jul 2022 · 8 mos

Took one of the first ML/AI-based features in the 5G modem from conception all the way through intensive lab and field testing, debugging and refinement to productization.

Optimization, Probability Theory, C++, Machine Learning, Probability, Adaptive Learning, +11

Senior Lead Engineer

Promoted

Dec 2019 – Nov 2021 · 1 yr 11 mos

Seeded the ML in Wireless team at Qualcomm India.
Contributed to:
Ideation of machine-learning-based enhancements to 4G/5G modem design.
Design and proof-of-concept of NN- and RL-assisted channel state information estimation and adaptation.
Validation of the solution and optimization of performance and compute.
System design and prototyping of select features on the 5G modem.
This involved working on the following stages of an ML-based application:
∗ Problem formulation
∗ Feature selection and preprocessing
∗ Simulation setup for training data generation and/or end-to-end validation
∗ Neural network architecture optimization
∗ Inference and post-processing
∗ Integration and end-to-end validation

Probability Theory, C++, Probability, Reinforcement Learning, Python (Programming Language), Algorithms, +7

Senior Engineer

Nov 2017 – Dec 2019 · 2 yrs 1 mo

Designing adaptive downlink channel state feedback algorithms to optimize performance of LTE modems.

Probability Theory, C++, Probability, Adaptive Learning, Algorithms, Research, +4

Nvidia

System Software Engineer

Dec 2016 – Nov 2017 · 11 mos · Pune Area, India

1. VRWorks Audio: The project aimed to perform "Real-time immersive 3D audio rendering for virtual reality and gaming applications".
It involved real-time geometric acoustic simulation of a given scene geometry, exploiting the massive computational power of NVIDIA GPUs. The NVIDIA OptiX Ray-Tracing Engine is employed to perform the geometric acoustic simulation and obtain the binaural impulse responses of the environment with respect to the listener.
These impulse responses are then used for binaural rendering of immersive 3D audio in real time.
2. 360 Audio: Aimed at recording a sound field muxed with a 360 video on a VR camera rig.
3. Ambisonics to Binaural Rendering: It involved decoding ambisonics-encoded audio for a given listener orientation, followed by reproduction of the sound field at the listener's ears by applying HRTFs to emulate virtual speakers around the listener.

Digital Signal Processing, Probability Theory, C++, Probability, Algorithms, Research, +4

Cadence Design Systems

Design Engineer II

Jul 2015 – Nov 2016 · 1 yr 4 mos · Pune Area, India

In general, the work involved providing software support for optimal implementation of DSP kernels on Tensilica IP DSP cores supporting SIMD, VLIW, FLIX and software pipelining. In particular, made the following contributions:
1. Optimal realization of DSP kernels. It involved:
∗ Vectorized implementation of DSP kernels
∗ Achieving best-packed software-pipelined loops
∗ Algorithmic changes for the given instruction set architecture (ISA)
2. Worked extensively on optimal implementation of FFTs.
∗ Worked on both fixed-point and floating-point FFTs
∗ Utilized the special ISA support for FFTs by programming with intrinsic operations and ensuring best-packed software-pipelined loops
∗ Work on fixed-point FFTs covered variants with no scaling, static scaling and dynamic scaling
3. Contributed to development of multiple new DSP cores by providing ISA suggestions to improve the performance of certain DSP benchmarks on them
4. Enabled migration of DSP cores from a 5- or 7-stage pipeline to a 10-stage pipeline while still improving cycle performance by:
∗ Providing analysis of the performance impact of increased latencies/stalls on various DSP kernels in the corresponding DSP libraries
∗ Optimizing DSP kernels to compensate for degradations where possible
∗ Providing suggestions for new improvements to the ISA (instruction set architecture)
∗ Setting up a daily regression testing framework for the corresponding DSP libraries to monitor functionality
5. Reviewed DSP libraries with both fixed-point and floating-point processing kernels for best possible performance on multiple DSP cores.
∗ In particular, covered the FFT routines exhaustively and provided crucial algorithm and ISA suggestions for improved performance
∗ Worked on implementation of DSP kernels for best performance with cache
6. Performed thorough analysis and performance estimation of FMCW radar processing using CA-CFAR and OS-CFAR for Tensilica IP DSP cores.

Optimization, Digital Signal Processing, Probability Theory, C++, Probability, Algorithms, +5

Electrical Engineering Department, IIT Bombay

Teaching Assistant

Jul 2013 – May 2015 · 1 yr 10 mos · Powai, Mumbai, India

Conducted the Communication Lab and the Digital Signal Processing Lab over the first three semesters.
In the last semester, served as a teaching assistant for the Wireless Mobile Communication course, evaluating course assignments and exam answer scripts.