Niraj Bhandarwar

Co-Founder

Delhi, India · 1 yr 2 mos experience
Most Likely To Switch · AI ML Practitioner

Key Highlights

  • Expert in AI and Generative AI technologies.
  • Proven track record in developing LLM evaluation frameworks.
  • Strong background in quantitative finance and machine learning.
Stackforce AI infers this person is a skilled AI Engineer with expertise in developing scalable AI systems for the tech industry.

Skills

Core Skills

Artificial Intelligence (AI) · Machine Learning

Other Skills

Amazon Web Services (AWS) · Deep Learning · Docker · Generative AI · Git · Google Cloud Platform (GCP) · Knowledge Graphs · Large Language Models (LLM) · Natural Language Processing (NLP) · Optimization Algorithms · Quantitative Finance · Reinforcement Learning

About

I'm an AI Engineer trained at IIT Delhi, with a background in GenAI, agentic systems, and quantitative finance. At Scale AI, I contributed to rubric design and agent task annotation for real-world GitHub repositories, enhancing evaluation rigor for Claude-based LLMs. I've also worked on RL-driven trading strategies, RAG pipelines, and LLM reliability across developer workflows. I thrive in fast-paced environments where research meets execution. My toolkit includes Python, PyTorch, MLflow, Kubernetes, Airflow, and cloud platforms like AWS and GCP. I'm seeking full-time roles where I can help build reliable, scalable AI systems, particularly in GenAI, applied ML, or quant-driven startups.

Experience

Stealth AI startup

Founding AI Engineer

May 2025 - Present · 10 mos · New York, United States · Remote

  • Building AI products.

Scale AI

AI Engineer

Jan 2025 - May 2025 · 4 mos · San Francisco, California, United States · Remote

  • Ballerina Capuchina - LLM Rubric Development
      • Led the creation of 30+ rubric-based evaluation tasks for LLMs, targeting real GitHub repositories such as pandas, DVC, and mitmproxy.
      • Ensured each task met the benchmark of a 60%+ rubric failure rate against the Claude 3.7 baseline, increasing evaluation rigor and model differentiation.
      • Authored 10–20 atomic, objective, and self-contained rubric items per task, each clearly mapped to prompt goals and classified as critical or non-critical.
  • Hyperion Augmentation - SWE Agent Task Annotator & Reviewer
      • Generated and refined over 250 highly specific software engineering problem statements and requirements to guide LLMs in solving real GitHub PR issues.
      • Validated more than 500 unit test logs (F2P/P2P) and ensured JSON accuracy, improving test coverage and agent reliability.
      • Documented public interfaces (functions and classes) from golden patches across multiple languages, enhancing modularity, clarity, and LLM performance in code tasks.
Large Language Models (LLM) · Artificial Intelligence (AI) · Generative AI · Machine Learning · Reinforcement Learning · Natural Language Processing (NLP)

Education

Indian Institute of Technology, Delhi

Master of Technology (MTech), Industrial Engineering

Jul 2023 - Jun 2025

Government College of Engineering Karad

Bachelor of Technology (BTech), Mechanical Engineering

Aug 2016 - Oct 2020
