Saksham Gupta

Product Engineer

Pittsburgh, Pennsylvania, United States · 3 yrs 9 mos experience
AI Enabled · AI ML Practitioner

Key Highlights

  • Expert in optimizing LLMs for faster inference.
  • Proven track record in AI and machine learning projects.
  • Strong background in software engineering and data science.
Stackforce AI infers this person is a Machine Learning Engineer with a focus on AI optimization and software development.

Skills

Core Skills

Machine Learning, Data Science, Software Engineering

Other Skills

Python, CUDA, Sparse attention methods, AI search, QwenCoder, VLLM, SpecDecode, RoBERTa, Torch FSDP, Llama, SGLang, HF Accelerate, Data cleaning, Streamlit, PostgreSQL

About

I am interested in making LLMs faster.

Experience

Total Experience: 3 yrs 9 mos
Average Tenure: 9 mos

Carnegie Mellon University

Research Assistant

Aug 2025 - Present · 9 mos · On-site

  • Working with the Catalyst group and Professor Zhihao Jia on making LLMs faster using sparse attention methods and on enabling fast inference with CUDA kernels.
Python, CUDA, Sparse attention methods, Machine Learning, Data Science

Zomato

2 roles

Research Engineer 2

Jan 2025 - Jun 2025 · 5 mos

Research Engineer

Apr 2024 - Jan 2025 · 9 mos

  • Built AI search and an in-house Cursor for India’s top food delivery company.
  • Fine-tuned QwenCoder 2.5 8B for code generation, improving Go performance by 14%.
  • Reduced inference latency by 4× using VLLM and SpecDecode to generate code in under 200 ms.
  • Fine-tuned Llama 3.1 8B to understand and recommend relevant dishes, improving results for tail queries.
  • Increased catalog searchability by 33% and retrieval precision by 41% via fine-tuned RoBERTa.
  • Decreased training time for embedding models by 92% using Torch FSDP.
  • Built automated data-generation pipelines for synthetic training data using Llama 3.1 70B and asyncio.
AI search, QwenCoder, VLLM, SpecDecode, RoBERTa, Torch FSDP (+3 more)

Boson AI

Member of Technical Staff

Sep 2023 - Mar 2024 · 6 mos

  • Worked at Boson AI as a Member of Technical Staff on a team led by Dr. Mu Li and Dr. Alex Smola.
  • Worked towards building a SOTA multimodal LLM for image data extraction.
  • Built baselines for image content extraction, using SGLang and HF Accelerate to run parallel inference on models such as LLaVA and Nougat, and sped up inference execution by 30×.
  • Scraped more than 200,000 game scripts from various online sources, then cleaned and labeled the data using the GPT-4 API to generate high-quality synthetic data for training bigger LLMs to generate fictional content.
  • Deployed a Streamlit app for data labeling and for viewing and easily editing the gathered data. Hosted the application internally on an AWS EC2 instance and stored its data in PostgreSQL.
SGLang, HF Accelerate, Data cleaning, Streamlit, PostgreSQL, Machine Learning (+1 more)

Microsoft

Research Fellow

Sep 2022 - Sep 2023 · 1 yr

  • Research Fellow in the PROSE research group, working on AI-assisted program synthesis.
  • Created a novel metric for computing code similarity using pre-trained and fine-tuned CodeT5+ models, achieving a 28% improvement over the SOTA CodeBERTScore.
  • Created a benchmark dataset for fine-grained code comparison, built from CodeSearchNet using Tree-sitter for AST manipulation, and used it to fine-tune and test model performance.
  • Worked on personalized code suggestions in a low-code automation platform; the paper was accepted at ICSE 2023.
  • Integrated a personalization mechanism into decoders, increasing prediction accuracy by more than 22%.
  • Worked on Visual Studio Copilot chat and led the effort on code explanation testing. Created several unit tests to ensure that code explanations produced by our models are reliable.
CodeT5+, Tree-sitter, Visual Studio Copilot, Machine Learning, Data Science

Prodigal

Software Engineer

Jan 2022 - Aug 2022 · 7 mos

  • Part of the Infra and Data team at a fintech SaaS startup backed by Y Combinator, Accel, and Menlo Ventures.
  • Built Cupid, a REST API service in Go that matches metadata files to audio files. Reduced downstream processing delays by 7×, with the ability to match and move 1,000 audio files to S3 in under 1 second.
  • Built Hermes, a real-time service that moves files from SFTP to S3, reducing audio processing start-time delays by 99.6%. Also built a REST API in Go to fetch call metadata and modify filenames in AWS RDS.
  • Built an aggregator service that updates the call counters of all tenants in Redis by processing messages from SQS using goroutines. Achieved a processing speed of 1,600 messages/sec, a 10× speedup over sequential code.
  • Gave a company-wide talk on why we should build more services using Go.
  • Spearheaded a Databricks POC and used Spark Streaming to build auto-generating billing dashboards for tenants.
Go, REST API, AWS, Spark Streaming, Software Engineering

Mitacs

Undergraduate Research Fellow

May 2021 - Jul 2021 · 2 mos · Remote

  • Learned about optimization problems and worked on the exam scheduling problem
  • 1) Carried out an extensive literature survey to understand the problem
  • 2) Implemented various nature-inspired algorithms, including ant colony optimization
  • 3) Implemented a combinatorial PSO algorithm that had not previously been used to solve this problem
  • Guide: Prof. Malek Mouhoub

Stanford University

Volunteer CS106A Section Leader and Teacher Mentor

Apr 2021 - May 2021 · 1 mo

  • The Code In Place initiative was started by Prof. Chris and Prof. Mehran to provide free computer science education to the world during COVID-19.
  • 1) Provided SL training and mentorship to new section leaders who were leading their first section for Code In Place
  • 2) Led a section and taught Python to 11 students from around the world, and discussed the weekly section problems provided by the CIP staff.

Wave Learning Festival

Director of Technology

Jan 2021 - Jan 2022 · 1 yr · Remote

  • Managed 10+ student developers at Wave, a nonprofit EdTech providing equitable education to over 15,000 users
  • Built a Lambda function in Node.js to automatically update attendance in DynamoDB, saving 11,000 man-hours
  • Used React-JS and various AWS services to build new features and roll out modifications to our product
Node.js, React-JS, AWS, Software Engineering

Carnegie Mellon University

Research Intern

Nov 2020 - May 2021 · 6 mos

  • Under Dr. Mostow, used data science and ML/DL to generate better offline content for students
  • Regenerated and fixed the student-level interaction dataset from the JSON logs of 40 villages using Python
  • Achieved a 3% gain in learning on the student simulator by reordering activities using Simulated Annealing

Indraprastha Institute of Information Technology, Delhi

Undergraduate Student Researcher

Oct 2020 - Dec 2020 · 2 mos · Remote

  • Worked as a research intern at the Graphics research lab at IIIT-D.
  • Responsibilities included:
  • 1) Worked with volumetric data and visualized it using tools like VTK.
  • 2) Performed Saliency Computation for 2D images and compared different Saliency Algorithms.
  • 3) Worked with Medical Imaging data like MRI and CT Scans.
  • Guide: Prof. Ojaswa Sharma
Python, Data Science

Wave Learning Festival

Instructor

Jul 2020 - Aug 2020 · 1 mo

  • Instructor for the course Introduction to Algorithmic Thinking at the Wave Learning Festival, a volunteer project started by college students to provide free and accessible education to high school students during the pandemic.
  • My job involved teaching 60 students and developing content for the classes, which revolved around different algorithms.

Coding Together

Head Section Leader and Admin

Jun 2020 - Aug 2020 · 2 mos

  • Part of the teaching team and the administrative team for Coding Together, a Stanford Rebuild project offering free computer science education during the COVID-19 pandemic using Harvard’s CS50X.
  • Responsibilities involved:
  • 1) Recruiting section leaders
  • 2) Giving SL training to people who were new to leading a section
  • 3) Building assignments and section problems
  • 4) Teaching the first 4 weeks of Harvard CS50X in C

Stanford University

Volunteer CS106A Code In Place Section Leader

Apr 2020 - May 2020 · 1 mo

  • The Code In Place initiative was started by Prof. Chris and Prof. Mehran to provide free computer science education to the world during COVID-19.
  • Along with the other section leaders, my aim was to help teach computer science to everyone via online learning tools. My role was to teach Python to 11 students from around the world and to discuss the weekly section problems provided by the CIP staff.

Samsung India

Samsung Prism Project

Oct 2019 - Apr 2020 · 6 mos · Vellore, Tamil Nadu, India

  • Selected to work on the On-Device AI project, which tackled the problem of generating live captions for images on a phone by analyzing the image at hand.
  • Project status: incomplete, due to no contribution or guidance from the mentors and team involved in the project.

Education

Carnegie Mellon University

Masters in ML and NLP — Computer Science

Jan 2025 - Jan 2027

Vellore Institute of Technology

Bachelor of Technology - BTech — Computer Science

Jan 2018 - Jan 2022

Delhi Public School, Mathura Road - PCM with Computer Science

High School Diploma

Jan 2016 - Jan 2018

Delhi Public School, Mathura Road

Class 10th
