Shaheen Nabi

Co-Founder

Bengaluru, Karnataka, India · 0 mo experience
AI Enabled · AI ML Practitioner

Key Highlights

  • Expert in reinforcement learning and post-training systems.
  • Developed open-source AI solutions for crop detection.
  • Founded an edtech platform for AI education.
Stackforce AI infers this person is a specialist in AI and EdTech with a focus on reinforcement learning and computer vision.

Skills

Core Skills

Reinforcement Learning · Post-training · Computer Vision · Entrepreneurship

Other Skills

post-training systems · alignment optimization · reasoning optimization · open-weight pipelines · PPO · policy gradients · actor–critic · SFT · RLHF · DPO · reward modeling · Python (Programming Language) · YOLOv5 · NVIDIA A100 GPUs · Jenkins

About

I study how large language models perform multi-step reasoning and how training and post-training methods can improve their reliability, efficiency, and scalability. My work focuses on the post-training stack for LLMs — supervised fine-tuning (SFT), preference optimization, reinforcement learning methods such as RLVR, and inference-time compute strategies that improve reasoning without requiring larger models. I’m also interested in the interpretability of reasoning models: understanding the internal mechanisms that support multi-step reasoning and diagnosing failures such as shortcut reasoning, reward hacking, and unfaithful chain-of-thought. Currently building and open-sourcing implementations of reasoning-focused training pipelines and contributing to LLM infrastructure and post-training frameworks.

Experience

Total Experience: 0 mo
Average Tenure: --
Current Experience: --

Self-employed

GitHub (Open Source)

Dec 2025 - Present · 5 mos · India

  • Designing and implementing reinforcement learning and post-training systems for large language models, with a focus on alignment, reasoning optimization, and reproducible open-weight pipelines. Completed core reinforcement learning algorithm implementations; now actively developing full post-training stacks.
  • Key work includes:
      • Implementation of reinforcement learning algorithms (PPO, policy gradients, actor–critic) for sequence models
      • Post-training and alignment pipelines (SFT, RLHF, DPO, reward modeling)
      • Reward model training, evaluation, and alignment optimization
      • End-to-end training, evaluation, and open-source release of language model systems
      • Efficient inference and serving using modern LLM infrastructure (vLLM, optimized decoding)
  • All systems, experiments, and training pipelines are developed from first principles and released publicly through GitHub.
reinforcement learning · post-training systems · alignment optimization · reasoning optimization · open-weight pipelines · Reinforcement Learning · +1

Career break

Career transition

Apr 2025 - Dec 2025 · 8 mos · Bengaluru, Karnataka

  • Took a planned career break to prepare for research roles focused on reasoning, thinking models, and reinforcement learning in advanced AI systems.
  • This period is dedicated to building deep foundations in:
      • Sequential decision-making and reinforcement learning
      • Policy optimization, credit assignment, and exploration
  • Upcoming:
      • Reasoning and planning as learned behaviors
      • RL-based post-training and alignment for language models
      • Research-grade implementations and open experimentation
  • The objective is to transition into full-time research and open-source development on reasoning-centric and alignment-focused AI systems.

Ineuron.ai

Data Science Intern

Jan 2025 - Mar 2025 · 2 mos · Bengaluru, Karnataka, India · Remote

  • Developed an object detection model using YOLOv5 to accurately identify and classify various crops/plants.
  • Annotated 25,000 images, later open-sourced on Hugging Face, contributing to the broader research community.
  • Designed and deployed a fully automated AI-agent pipeline that streamlines post-detection research and insights for detected crops/plants.
  • Trained the model on NVIDIA A100 GPUs, achieving high performance and optimizing for real-world deployment.
  • Conducted extensive model testing and evaluation, ensuring robustness and accuracy in diverse agricultural environments.
  • Deployed the solution using Jenkins, AWS ECR, and EC2, leveraging a scalable infrastructure for real-time inference.
  • Integrated an SMTP service for automated email delivery of summarized reports: after entering their email address in the UI, users receive a personalized, one-minute automated research report directly in their inbox.
Python (Programming Language) · Computer Vision

Lasso pacific pvt ltd

Founder

Jan 2022 - Dec 2022 · 11 mos · Anantnag, Jammu & Kashmir, India · On-site

  • The mission was to democratize AI and computer vision education for autonomous vehicles, making it accessible and affordable, especially for students in rural areas worldwide.
  • Designed and launched an AI-driven edtech platform, providing hands-on training in AI and computer vision with a focus on real-world applications.
  • Developed structured courses on AI for autonomous systems, enabling students to gain practical experience in self-driving technology.
  • Acquired early student clients and built an initial user base, validating demand for affordable AI education.
  • Managed curriculum development, partnerships, and community outreach to expand educational impact globally.
  • Faced and navigated challenges in funding, team scaling, and balancing startup growth with personal education.
  • Ultimately closed the venture but gained deep expertise in entrepreneurship, product development, and the business of AI education.
  • Attracted over 2 million annual visitors organically, driven by providing high-value career roadmaps and resources.
Python (Programming Language) · Entrepreneurship

Education

Indira Gandhi National Open University

Bachelor of Arts - BA

Jan 2025 - Aug 2025

Ineuron.ai

1 year course — Full Stack Data Science

Oct 2021 - Oct 2022

Jammu and Kashmir Board of School Education (JKBOSE)

High School Diploma — Mathematics and Computer Science

Apr 2021 - Oct 2023
