Prasanna Biswas

Lead ML Engineer

Bengaluru, Karnataka, India · 7 yrs 8 mos experience

AI Enabled · AI ML Practitioner

Key Highlights

Expert in optimizing AI systems for performance and accuracy.
Co-authored papers for prestigious conferences in AI.
Developed patented algorithms enhancing compiler efficiency.

Stackforce AI infers this person is a Machine Learning Engineer specializing in AI systems and performance optimization.

Skills

Core Skills

Generative AI · AI Systems · GPU Programming · Large Language Models (LLM) · Machine Learning · Natural Language Processing (NLP)

Other Skills

PyTorch · Inference-time optimizations · Triton · Continuous Integration and Continuous Delivery (CI/CD) · Kineto · Kernel Optimization · C++ · Heterogeneous Programming · SYCL & DPC++ · Unit Testing · Research and Development (R&D) · Team Leadership · ONNX · Data Structures · Algorithms

About

I am an Advisory Research Engineer at IBM Research – India Lab with close to five years of experience in machine learning, deep learning, and AI systems. My work lies at the intersection of AI algorithms, system-level optimizations, and high-performance computing. At Intel, I developed high-performance GPU kernels with SYCL for the Falcon Shores architecture, optimized performance-critical operators, and explored advanced ML research combining VAEs with diffusion models, co-authoring papers submitted to CVR 2025 and IEEE CONNECT 2025. Previously, at Qualcomm, I optimized ONNX models for NLP, CV, and LLMs on the AI100 accelerator and contributed to custom node fusion operations for inference acceleration. I hold a Master’s in Computer Science from IIT Bombay, where my research focused on multimodal meta-learning for sarcasm and emotion analysis. My expertise spans deep learning for NLP, CV, and LLMs, GPU kernel optimization, AI systems, and bridging research with real-world performance.

Experience

Total Experience: 7 yrs 8 mos

Average Tenure: 1 yr 7 mos

Current Experience: 9 mos

IBM

Advisory Research Engineer

Sep 2025 – Present · 9 mos · Bengaluru, Karnataka, India · Hybrid

Working on

PyTorch · Inference-time optimizations · Generative AI · Triton · Continuous Integration and Continuous Delivery (CI/CD) · Kineto · +1

Intel Corporation

AI Software Solutions Engineer

Jan 2024 – Aug 2025 · 1 yr 7 mos · Bengaluru, Karnataka, India · Hybrid

Developed high-performance kernels for deep learning operators on the upcoming Intel GPU using SYCL.
Enhanced the software stack by creating an optimized graph in C++, using MLIR, to handle complex operations.

Kernel Optimization · C++ · Heterogeneous Programming · SYCL & DPC++ · Unit Testing · GPU Programming · +1

Qualcomm

2 roles

Senior Machine Learning Engineer

Promoted

Nov 2022 – Jan 2024 · 1 yr 2 mos

Spearheaded ONNX optimizations for NLP and CV models on Qualcomm’s AI100 accelerator, achieving a notable performance boost for large language models (LLMs).
Doubled the efficiency of NLP transformer decoder models by implementing key ONNX optimizations and caching key-value (KV) matrices.
Developed a Graph Neural Network algorithm to enhance compiler efficiency, resulting in a filed patent.
Led a three-member team in optimizing and deploying models from Hugging Face on AIC 100.

Generative AI · Research and Development (R&D) · Team Leadership · ONNX · Inference-time optimizations · Large Language Models (LLM) · +1

Machine Learning Engineer

Nov 2020 – Nov 2022 · 2 yrs

Worked on ONNX optimizations for NLP (Natural Language Processing) and CV (Computer Vision) models for faster inference on Qualcomm’s AI100 accelerator.
Designed and implemented software modules for AI and deep neural network frameworks and tools in C++ and Python, automating general graph optimizations for ONNX and frozen TensorFlow graphs.
Implemented automatic detection of the post-processing stage in image classification and object detection models, and replaced it with optimized kernels to improve model accuracy during quantization.
Implemented graph algorithms for sorting nodes and removing unused nodes in a graph for faster inference.
Deployed models from different ML frameworks (PyTorch, TensorFlow, ONNX) for cloud and edge use cases.

Data Structures · Algorithms · Natural Language Processing (NLP) · Computer Vision · Python (Programming Language) · Deep Learning · +6

Indian Institute of Technology, Bombay

3 roles

Research Assistant

Promoted

Aug 2020 – Nov 2020 · 3 mos · Mumbai, Maharashtra, India

As a Research Assistant, I developed a multimodal approach to understanding emotions in sarcasm.
The work addresses incongruities in language and the hidden emotions behind a sentence. In text-to-text tasks, my focus has been on problems such as emotion classification, sentiment analysis, and understanding sarcasm.
Worked in a joint collaboration between IBM and IIT Bombay on understanding emotions in sarcasm.
Trained a transformer-based architecture to leverage the relationships among video, audio, and textual features. Showed that emotion information is necessary to identify sarcasm more precisely: experiments that included emotion information performed 15.6% better.

Natural Language Processing (NLP) · Python (Programming Language) · PyTorch · Deep Learning · Data Science · Speech Processing · +1