Shaheen Nabi

Co-Founder

Bengaluru, Karnataka, India

0 mo experience

AI Enabled · AI ML Practitioner

Key Highlights

Expert in reinforcement learning and post-training systems.
Developed open-source AI solutions for crop detection.
Founded an edtech platform for AI education.

Stackforce AI infers this person is a specialist in AI and EdTech with a focus on reinforcement learning and computer vision.

Skills

Core Skills

Reinforcement Learning · Post-training · Computer Vision · Entrepreneurship

Other Skills

post-training systems · alignment optimization · reasoning optimization · open-weight pipelines · PPO · policy gradients · actor–critic · SFT · RLHF · DPO · reward modeling · Python (Programming Language) · YOLOv5 · NVIDIA A100 GPUs · Jenkins

About

I study how large language models perform multi-step reasoning and how training and post-training methods can improve their reliability, efficiency, and scalability. My work focuses on the post-training stack for LLMs — supervised fine-tuning (SFT), preference optimization, reinforcement learning methods such as RLVR, and inference-time compute strategies that improve reasoning without requiring larger models. I’m also interested in the interpretability of reasoning models: understanding the internal mechanisms that support multi-step reasoning and diagnosing failures such as shortcut reasoning, reward hacking, and unfaithful chain-of-thought. Currently building and open-sourcing implementations of reasoning-focused training pipelines and contributing to LLM infrastructure and post-training frameworks.

Experience

Total Experience: 0 mo

Current Experience

Self-employed

GitHub (Open Source)

Dec 2025 – Present · 5 mos · India

Designing and implementing reinforcement learning and post-training systems for large language models, with a focus on alignment, reasoning optimization, and reproducible open-weight pipelines. Core reinforcement learning algorithm implementations are complete; full post-training stacks are in active development.
Key work includes:
Implementation of reinforcement learning algorithms (PPO, policy gradients, actor–critic) for sequence models
Post-training and alignment pipelines (SFT, RLHF, DPO, reward modeling)
Reward model training, evaluation, and alignment optimization
End-to-end training, evaluation, and open-source release of language model systems
Efficient inference and serving using modern LLM infrastructure (vLLM, optimized decoding)
All systems, experiments, and training pipelines are developed from first principles and released publicly through GitHub.

reinforcement learning · post-training systems · alignment optimization · reasoning optimization · open-weight pipelines · Reinforcement Learning · +1

Career break

Career transition

Apr 2025 – Dec 2025 · 8 mos · Bengaluru, Karnataka

Took a planned career break to prepare for research roles focused on reasoning, thinking models, and reinforcement learning in advanced AI systems.
This period was dedicated to building deep foundations in:
Sequential decision-making and reinforcement learning
Policy optimization, credit assignment, and exploration
Upcoming focus areas:
Reasoning and planning as learned behaviors
RL-based post-training and alignment for language models
Research-grade implementations and open experimentation
The objective is to transition into full-time research and open-source development on reasoning-centric and alignment-focused AI systems.

Ineuron.ai

Data Science Intern

Jan 2025 – Mar 2025 · 2 mos · Bengaluru, Karnataka, India · Remote

Developed an object detection model using YOLOv5 to accurately identify and classify various crops/plants.
Annotated 25,000 images, later open-sourced on Hugging Face, contributing to the broader research community.
Designed and deployed a fully automated AI agent pipeline, streamlining post-detection research and insights for detected crops/plants.
Trained the model on NVIDIA A100 GPUs, achieving high performance and optimizing for real-world deployment.
Conducted extensive model testing and evaluation, ensuring robustness and accuracy in diverse agricultural environments.
Deployed the solution using Jenkins, AWS ECR, and EC2, leveraging a scalable infrastructure for real-time inference.
Integrated an SMTP service for automated email delivery of summarized reports: after entering an email address in the UI, users receive a personalized, one-minute automated research report directly in their inbox.

Python (Programming Language) · Computer Vision

Lasso pacific pvt ltd

Founder

Jan 2022 – Dec 2022 · 11 mos · Anantnag, Jammu & Kashmir, India · On-site

The mission was to democratize AI and computer vision education for autonomous vehicles, making it accessible and affordable, especially for students in rural areas worldwide.
Designed and launched an AI-driven edtech platform, providing hands-on training in AI and computer vision with a focus on real-world applications.
Developed structured courses on AI for autonomous systems, enabling students to gain practical experience in self-driving technology.
Acquired early student clients and built an initial user base, validating demand for affordable AI education.
Managed curriculum development, partnerships, and community outreach to expand educational impact globally.
Faced and navigated challenges in funding, team scaling, and balancing startup growth with personal education.
Ultimately closed the venture but gained deep expertise in entrepreneurship, product development, and the business of AI education.
Attracted over 2 million annual visitors organically, driven by high-value career roadmaps and resources.

Python (Programming Language) · Entrepreneurship