Sankhyesh Singh Thakur

CTO

Bengaluru, Karnataka, India · 3 yrs 2 mos experience

AI Enabled · AI ML Practitioner

Key Highlights

Led development of reward models achieving 3x faster training convergence.
Implemented multi-agent orchestration reducing development cycle time by 60%.
Engineered cloud-native generative AI solutions for financial crime analytics.

Stackforce AI infers this person is a skilled AI Engineer with expertise in Machine Learning and Generative AI across Healthcare and Fintech sectors.

Skills

Core Skills

Reward Modeling, RLHF, Agentic AI Systems, Foundation Models, Model Optimization, Distributed Training, Machine Learning, Data Pipeline Architecture, Generative AI, Cloud Computing, Data Science, Android Development

Other Skills

Python (Programming Language), Reward Model Training, LangGraph, LangChain, Open-source models, LoRA, 4-bit NF4 quantization, PyTorch DDP, Gradient checkpointing, Python, PySpark, Apache Spark, TensorFlow, PyTorch, scikit-learn

About

Currently at Samsung Research leading development of reward models and autonomous agentic AI systems for the Gauss Foundation Model, achieving 3x faster training convergence and 75% memory reduction through advanced techniques.

TECHNICAL EXPERTISE
• Reward Modeling & RLHF for Foundation Models
• Agentic AI Systems (LangGraph, LangChain)
• Distributed Training (2x NVIDIA A100 GPUs, PyTorch DDP)
• Model Optimization (Flash Attention 2, LoRA, 4-bit Quantization)
• Production ML Infrastructure (AWS, Linux On-Premise)

FOCUS AREAS
• Foundation Model Alignment | Multi-Agent Orchestration | Production ML Infrastructure | SDLC Automation

EDUCATION
• MTech in Computer Science (AI Specialization) | IIIT Lucknow

Open to connecting with AI researchers, ML engineers
📧 sankhyesh3@gmail.com

Experience

Total Experience: 3 yrs 2 mos

Average Tenure: 1 yr

Current Experience: 1 yr

Samsung R&D Institute India - Bangalore

Lead AI Engineer

Jun 2025 – Present · 1 yr · Bengaluru, Karnataka, India · Hybrid

Leading research on Reward Modeling and RLHF systems for the Samsung Gauss Foundation Model, designing custom RewardTrainer architectures with Flash Attention 2 for Qwen3-4B-based reward models.
KEY CONTRIBUTIONS:
Autonomous Agentic AI System
• Developed POC using LangGraph and LangChain to automate complete SDLC workflows including requirements analysis, code generation, testing, and deployment
• Implemented multi-agent orchestration reducing development cycle time by 60%
LLM-as-Judge Evaluation Framework
• Built evaluation framework using open-source models achieving 94% correlation with expert human evaluators across 10K+ preference pairs
• Enabled automated quality assessment at scale for Foundation Model training
Advanced Training Systems
• Implemented ensemble-based adaptive margin training system following "Secrets of RLHF Part II" methodology
• Incorporated preference strength measurement with ranking-based label flipping for bottom 10% samples
Distributed Training Infrastructure
• Built distributed training infrastructure on 2x NVIDIA A100 GPUs (80GB) with PyTorch DDP
• Implemented LoRA (rank-64), 4-bit NF4 quantization, and gradient checkpointing
• Achieved 75% model memory footprint reduction while maintaining alignment performance

Python (Programming Language), Reward Model Training, Reward Modeling, RLHF

ZS

Advanced Data Science Associate

Sep 2024 – Jun 2025 · 9 mos · Pune, Maharashtra, India · Hybrid

Delivered enterprise-scale predictive analytics solutions that drove strategic healthcare decisions through advanced machine learning and AI technologies.
Key Achievements:
Advanced Analytics Development: Built and deployed comprehensive predictive models using Apache Spark, PySpark, and Python, integrating deep learning frameworks (TensorFlow, PyTorch) with traditional machine learning (scikit-learn) to enable accurate forecasting and real-time inference for healthcare clients.
Data Pipeline Architecture: Designed and implemented end-to-end scalable data processing systems that aggregate and transform large-scale Veeva Pulse and claims data. These pipelines automated model training, tuning, and deployment workflows while ensuring reliable pre- and post-quarter data extrapolation for critical business insights.
Machine Learning Infrastructure: Developed robust data preprocessing frameworks featuring automated train-test splits, intelligent missing value imputation, and advanced feature encoding techniques. These innovations significantly improved predictive model accuracy for patient classification and segmentation initiatives.
Generative AI Innovation: Contributed to cutting-edge generative AI solutions using FastAPI, OpenAI APIs, and Hugging Face Transformers, successfully transforming complex clinical narratives into clear, actionable business intelligence for healthcare stakeholders.
Technical Leadership: Collaborated effectively with cross-functional teams in a fast-paced environment, continuously expanding expertise in emerging AI technologies while delivering high-impact solutions that met evolving business requirements.
Technologies: Python, PySpark, Apache Spark, TensorFlow, PyTorch, scikit-learn, FastAPI, OpenAI API, Hugging Face Transformers, Veeva Pulse, SQL

Python, PySpark, Apache Spark, TensorFlow, PyTorch, scikit-learn (+7 more)

NICE

Full-stack Developer

Apr 2023 – Sep 2024 · 1 yr 5 mos · Pune, Maharashtra, India · Hybrid

Engineered a cloud-native generative AI solution in Python on AWS for financial crime analytics, automating text summarization to enhance alert evaluation in fraud detection and AML investigations.
Implemented scalable AWS Lambda handlers, custom Lambda layers, and AWS Bedrock integrations—fully automated via Terraform—to ensure reliable real-time transaction monitoring and risk management.
Fine-tuned AI models using TensorFlow, PyTorch, and Hugging Face Transformers to generate actionable narratives, significantly improving decision-making efficiency and compliance reporting.
Developed secure RESTful APIs using AWS API Gateway to integrate financial data streams, coupled with robust GitLab and Jenkins CI/CD pipelines for streamlined deployment and regulatory compliance.

Python, AWS, Terraform, TensorFlow, PyTorch, Hugging Face Transformers (+3 more)

AMD

Full-stack Developer

Jul 2022 – Mar 2023 · 8 mos · Hyderabad, Telangana, India · On-site

Improved driver testing processes by building machine learning models that prioritize test cases and automatically classify failures, reducing overall testing time by about 30%.
Used scikit-learn and PySpark to create a K-means clustering model for grouping test cases, and applied Random Forest classification to pinpoint issues quickly.
Shifted descriptive analysis from an LSTM-based model to a BERT-based approach for generating clearer, more accurate issue descriptions.
Applied Low-Rank Adaptation (LoRA) techniques with PyTorch to fine-tune model performance and developed a CNN-based solution for real-time audio anomaly detection.
Combined full-stack methods with data science by setting up end-to-end pipelines for data collection, preprocessing, and model integration.

scikit-learn, PySpark, TensorFlow, BERT, LoRA, Machine Learning (+1 more)

The Entrepreneurship Network

Android Developer Associate

Aug 2021 – Oct 2021 · 2 mos · India

Participated in designing and developing app screens and their workflows using Activities and Fragments.
Wrote application logic using the Android SDK and Android Studio.
Developed rich UI for the application's modules using ListView, ScrollView, ViewPager, and Navigation Drawer, and developed custom Views.

Android SDK, Android Studio, Algorithms, Android Development

Education

Indian Institute of Information Technology Lucknow

Master of Technology - MTech — Computer Science

Sep 2021 – May 2023

Atal Bihari Vajpayee Govt Institute of Engineering & Technology Pragatinagar, Shimla (H.P) - 171202

Bachelor of Technology - BTech — Computer Science

Aug 2015 – Jun 2019
