Aparnaa Ramani

AI Researcher

Santa Clara, California, United States · 6 yrs 5 mos experience

Highly Stable · AI Enabled

Key Highlights

Expert in optimizing deep learning models for GPUs.
Experience with major AI frameworks like TensorFlow and PyTorch.
Strong background in machine learning and image processing.

Stackforce AI infers this person is a deep learning engineer specializing in AI/ML solutions for enterprise applications.

Skills

Core Skills

Generative AI · Large Language Models (LLM) · Deep Learning · Machine Learning

Other Skills

OpenCL · GPU · Caffe · TensorFlow · PyTorch · Quantization · Model Optimization · Asynchronous Programming · Model Execution · Slurm Workload Manager · Docker · Natural Language Processing (NLP) · Python (Programming Language) · Public Speaking · Research

About

I am a deep learning engineer working in the GenAI space, helping customers leverage NVIDIA products for their AI pipelines. I focus on RAG, inference optimization, and training optimization for GPUs. I have also supported popular frameworks such as TensorFlow, PyTorch, Caffe, and ONNX on Android-based systems. I have experience with the ML front-end from my Master's at Georgia Tech, where I handled several projects related to intelligent systems with image data. My projects covered areas such as posture detection, object recognition, sequential data processing, optimization of Convolutional Neural Networks, and image processing concepts like saliency.

C++ | PyTorch | TensorFlow | LLM | Machine | Image Processing

Experience

6 yrs 5 mos

Total Experience

4 yrs 2 mos

Average Tenure

2 yrs 3 mos

Current Experience

Nvidia

Gen AI Solutions Architect

Feb 2024 – Present · 2 yrs 3 mos · Santa Clara, California, United States · On-site

Generative AI · Large Language Models (LLM)

Qualcomm

2 roles

Machine Learning Engineer

Mar 2020 – Feb 2024 · 3 yrs 11 mos · San Diego

ML Engineer in the Qualcomm Machine Learning Group's GPU team. I was involved in designing deep learning libraries such as the Qualcomm AI Engine, which enable execution and inference of models from frameworks like Caffe, TensorFlow, TensorFlow Lite, ONNX, and PyTorch. I have worked extensively with ARM and x86 architectures to create SDKs for leading Android-based systems on the market as well as for open-source developers. Some of my projects include:
Working on optimizations to run Stable Diffusion and LLMs like LLaMA (7B, 13B), GPT-3, GPT-Neo, Falcon LLM, and Vicuna with improved execution time, including enabling different implicit layer fusions
Enabling quantization support, which helps present the graph output in a compact fixed-precision format, and integrating it into the backend codebase
Creating a dedicated accuracy testing script to test quantization performance
Designing an interface for graph finalization and execution in the precision modes chosen by the developer
Designing and creating an efficient module for asynchronously executing models on a given sequence of images
Contributing to graph caching for repeated execution of a finalized model on input images; created modules to cache and retrieve kernel frameworks for model execution
Writing kernels for operations like Casting, Convolution, Reshape, and PReLU, and adding library support for using them as graph layers
I also worked with cross-functional teams such as GPU Hardware, ML Accelerator, and CPU Backend.

OpenCL · GPU · Deep Learning · Machine Learning

ML Intern

May 2019 – Aug 2019 · 3 mos · Greater San Diego Area

Part of the SNPE-GPU team.
Determined the extent of specialization possible in an OpenCL GPU kernel by optimizing its conditional statements
Developed a generalized interface to specialize a kernel based on the developer's choice

IIT Madras

Intern, ASR team

May 2015 – Jul 2015 · 2 mos · Chennai Area, India

I was involved in building real-time neural network models for speech recognition in around twenty native Indian languages, to be integrated into a farmer’s assistant system.