Prakhar Jagwani

Software Engineer

Atlanta, Georgia, United States

Key Highlights

Stackforce AI infers this person is a Backend-focused Software Engineer with expertise in Machine Learning and Mobile Development.

Machine Learning · Large Language Models (LLM) · Android Development

Bash · C++ · CUDA · Computer Networking · Git · Java · JavaScript · Linux · ONNX · OpenCV · Parallel Programming · PyTorch · Python · React.js

Current Experience

May 2025 – Present · 1 yr · Santa Clara, California, United States · Hybrid

Jan 2025 – Apr 2025 · 3 mos · Atlanta, Georgia, United States · On-site

Aug 2024 – Dec 2024 · 4 mos · Mountain View, California, United States · On-site

Contributed to the ML inference backend:
Boosted throughput by 30% with 3x larger batch sizes through dynamic capacity estimation and memory tracking
Reduced memory fragmentation by 50% with static and paged allocations; synchronized allocations across workers
Improved decode forward pass speed by 9% for a 70B model sharded 4 ways by adapting a custom AllReduce implementation (based on NVLink SHARP) to a distributed setting and integrating it into the inference server
Extended context length support to 128K tokens by optimally integrating an open source attention mechanism into the inference server
Implemented fine-tuning capabilities for a new LLM architecture by analyzing and adapting an existing implementation

C++ · CUDA · Machine Learning · PyTorch · Large Language Models (LLM)

May 2024 – Aug 2024 · 3 mos · San Diego, California, United States · Hybrid

Contributed to the deployment and optimization of ML models on the Exynos NPU:
Increased throughput by 7% on the Exynos NPU for an object detection model by modifying ML compiler code and optimizing execution schedules to improve prefetching
Assisted in the speech-to-text model selection process by assessing quality and performance across various ML configurations and comparing alternative approaches
Reviewed updates to the compiler for handling data transfer and frontend IRs, and presented detailed findings

C++ · Python · Machine Learning

Jan 2024 – May 2024 · 4 mos · Atlanta, Georgia, United States · On-site

Part-time role at the College of Computing Data Centers:
Maintained observability for 600+ research and instruction servers by deploying and managing Sensu for monitoring
Automated system statistics reporting by creating Bash scripts that feed data into InfluxDB and Grafana dashboards

Linux · Bash

Jun 2022 – Jul 2022 · 1 mo · Hyderabad, Telangana, India · Remote

Developed an Android accessibility service with 20+ customizable actions (inspired by the iPhone's AssistiveTouch)
Supported foldable screens, landscape mode, lock screen functionality, animations, and extensive customization

Java · Android Development