Amardeep Kumar

AI Researcher

New York, New York, United States · 4 yrs 3 mos experience

Key Highlights

Expert in building scalable AI solutions.
Proficient in Generative AI and LLMs.
Strong background in applied research and software development.

Skills

Core Skills

Large Language Models (LLM) · Retrieval-Augmented Generation (RAG) · Natural Language Processing (NLP) · Machine Learning · Software Development

Other Skills

Algorithms · Amazon Web Services (AWS) · BM25 · C · C++ · Cascading Style Sheets (CSS) · ColBERT · Computer Vision · Cross-functional Team Leadership · Data Structures · Deep Learning · Django · Docker · Elasticsearch · Express.js

About

I love doing applied research to build scalable AI solutions (currently focused on generative AI and LLMs) and building efficient infrastructure to productionize them.

Experience

Total Experience: 4 yrs 3 mos

Average Tenure: 1 yr

Current Experience: 1 yr

DoorDash

Machine Learning Engineer

Jun 2025 – Present · 1 yr · New York, United States · Hybrid

GenAI Platform - Foundations.

Pinegap.ai

Machine Learning Engineer

May 2024 – Aug 2024 · 3 mos · New York, United States · Hybrid

Designed the high-level and low-level components of an AI assistant for equity research and financial market analysis. As the first AI hire, owned and built everything from scratch.
Developed the end-to-end components of a keyword-guided retrieval and search engine, using BM25, Elasticsearch, dense vector search, and fine-tuned SPLADE and ColBERT models for query augmentation and candidate reranking.
Built an advanced agentic RAG framework, with CI/CD pipelines for query understanding, document ingestion, and knowledge-graph construction, to support Q&A over the past 10 years of financial documents from S&P 2000 companies.
Developed and containerized microservices to fine-tune and serve LLMs (Llama 3, Phi-3, and Mixtral) as APIs using Docker and Kubernetes. Employed quantization, FlashAttention, LoRA, and data and model parallelism for efficient fine-tuning on GPUs. Used vLLM (with PagedAttention), Kafka, and Redis to speed up the LLM inference service.

Large Language Models (LLM) · Python (Programming Language) · PyTorch · Docker · Kubernetes · Amazon Web Services (AWS)

Instabase

Software Engineer, Machine Learning

Jan 2022 – Aug 2023 · 1 yr 7 mos

Designed and implemented a model inference service and a regression and load-testing framework to integrate large language models and generative AI capabilities into the Instabase platform (https://aihub.instabase.com/).
Designed and implemented an asynchronous model-serving platform that combines async workers, RabbitMQ, two-level caching, and sticky routing to improve inference time by 40% in a compute-limited Kubernetes environment.
Devised a model-drift detection pipeline that uses layout-change detectors to catch format drift and applies Least-Squares Density Difference (LSDD) and Maximum Mean Discrepancy (MMD) to document embeddings to catch content drift.

Natural Language Processing (NLP) · Machine Learning · Software Development · Deep Learning · Flask · Software Design

Walmart Global Tech India

Software Engineer 2

Aug 2020 – Jan 2022 · 1 yr 5 mos · Bengaluru, Karnataka, India

Implemented entity embeddings for sparse categorical features, improving the accuracy of the existing forecasting systems by 17%, reducing inference time by 15%, and reducing the effort required for feature engineering.
Modified the gradient-boosting algorithm to incorporate the distribution center manager's feedback into the error calculation used to build decision trees, and implemented a weighted loss that penalizes misclassifications of perishable items, such as fruit and dairy products, more heavily.

Software Development

AOSSIE

Google Summer of Code (GSoC'19)

May 2019 – Aug 2019 · 3 mos

Developed a Google Chrome extension that uses natural language processing (NLP) to detect toxic comments, clickbait, and fake news on news websites and social media platforms such as Facebook and Twitter. Trained a stance detection model for fake-news classification and implemented a Flask backend service for model inference. Designed a three-level caching system to reduce latency and avoid duplicate inference requests.

Software Development

Genesys

Applied Research (NLP) Intern

May 2019 – Jul 2019 · 2 mos · Hyderabad Area, India

Implemented a passage ranking algorithm to boost the accuracy of the in-house question-answering service's model on client datasets. Created a framework to train AllenNLP's machine reading comprehension models on in-house medical data as part of a client's proof of concept (POC), and proposed its integration with the homegrown chatbot.