Hemanth Rudra

Software Engineer

Hyderabad, Telangana, India · 4 yrs 11 mos experience
Most Likely To Switch · AI ML Practitioner

Key Highlights

  • Expert in LLM enablement on edge devices.
  • Proficient in performance optimization and benchmarking.
  • Strong background in AI systems on Qualcomm platforms.
Stackforce AI infers this person is a specialized AI Engineer focused on edge AI and performance optimization.


Skills

Core Skills

Large Language Models (LLM) · Inference Optimization · Profiling · Benchmarking

Other Skills

C++ · Python (Programming Language) · LLM · Quantization Techniques · Model Profiling · Machine Learning · Deep Learning · JavaScript · Automation · Multi-agent Systems · Accuracy · Swift (Programming Language) · Core ML · ONNX · QNN

About

Compute AI Engineer specializing in on-device LLM enablement, inference optimization, and agentic AI systems on Qualcomm Snapdragon platforms. My work focuses on bringing large language models and multimodal models to edge devices, leveraging the Hexagon NPU (QNN/HTP) for high-performance inference. I build end-to-end inference pipelines, optimize execution across runtimes, and design agentic workflows using local LLMs. Key areas I work on:

  • Enabling LLMs on the Qualcomm NPU.
  • Building efficient inference pipelines (QNN, ONNX Runtime, local runtimes).
  • Designing agent-based systems with on-device LLMs.
  • Deep performance profiling and benchmarking (latency, tokens per second (TPS), memory).

I'm particularly interested in edge AI, agentic workflows, and performance optimization, pushing LLM capabilities closer to real-world deployment on devices.

Experience

4 yrs 11 mos
Total Experience
2 yrs 5 mos
Average Tenure
4 yrs 2 mos
Current Experience

Qualcomm

2 roles

Senior Engineer

Promoted

Nov 2024 - Present · 1 yr 7 mos · Hyderabad, Telangana, India · On-site

  • Compute AI.
  • Enabling open-source LLMs to run on the Qualcomm NPU; LLM quantization and optimization; accuracy pipelines and KPIs.
  • LLM, multimodal LLM, and tool-calling use cases.
  • Model performance tuning on Snapdragon compute platforms (Snapdragon X Elite) for ONNX- and QNN-converted models: quantization, Geekbench AI (GBAI), model profiling, ONNX Runtime (ORT), Chrome trace analysis, LLM pipelines, and MMLU scripts.
  • Core ML analysis: Core ML compute plan analysis for ML Program and .mlmodelc models, Core ML profiling, and Xcode Instruments.
C++ · Python (Programming Language) · Large Language Models (LLM) · Inference Optimization

Engineer

Apr 2022 - Nov 2024 · 2 yrs 7 mos · Hyderabad, Telangana, India · On-site

  • Worked on the stability of Qualcomm external and internal tools such as Qualcomm Simulation Platform, Qualcomm Software Centre, Qualcomm Device Cloud, Qualcomm Profiler, and the Hexagon IDE built on Visual Studio Code, which includes the Hexagon SDK, QNN SDK, and Hexagon-nn-V3.
  • Worked across mobile, automotive, XR, and compute SoCs to ensure tool stability.
  • Profiled various automotive and mobile use cases using Qualcomm Profiler.
  • Created scripts for profiling and plotting across these use cases.
  • Benchmarked the Qualcomm Simulation Platform for Qualcomm automotive SoCs with deep learning models on different operating systems, such as QNX and LRH.
  • Created scripts to trigger benchmark runs and collect results for different deep learning models (open-source and customer models).
Machine Learning · Deep Learning · Profiling · Benchmarking

Nokia

2 roles

Graduate Engineering Trainee

Jun 2021 - Mar 2022 · 9 mos · Chennai, Tamil Nadu, India

JavaScript · Automation

Research And Development Intern

Aug 2020 - May 2021 · 9 mos · Chennai, Tamil Nadu, India

JavaScript · Automation

Education

Vellore Institute of Technology

Mtech Integrated — Computer Software Engineering

Jan 2016 - Jan 2021
