Sankhyesh Singh Thakur

CTO

Bengaluru, Karnataka, India · 3 yrs 2 mos experience
AI Enabled · AI ML Practitioner

Key Highlights

  • Led development of reward models achieving 3x faster training convergence.
  • Implemented multi-agent orchestration reducing development cycle time by 60%.
  • Engineered cloud-native generative AI solutions for financial crime analytics.
Stackforce AI infers this person is a skilled AI Engineer with expertise in Machine Learning and Generative AI across Healthcare and Fintech sectors.

Skills

Core Skills

Reward Modeling, RLHF, Agentic AI Systems, Foundation Models, Model Optimization, Distributed Training, Machine Learning, Data Pipeline Architecture, Generative AI, Cloud Computing, Data Science, Android Development

Other Skills

Python (Programming Language), Reward Model Training, LangGraph, LangChain, Open-source models, LoRA, 4-bit NF4 quantization, PyTorch DDP, Gradient checkpointing, PySpark, Apache Spark, TensorFlow, PyTorch, scikit-learn

About

Currently at Samsung Research leading development of reward models and autonomous agentic AI systems for the Gauss Foundation Model, achieving 3x faster training convergence and 75% memory reduction through advanced techniques.

TECHNICAL EXPERTISE
  • Reward Modeling & RLHF for Foundation Models
  • Agentic AI Systems (LangGraph, LangChain)
  • Distributed Training (2x NVIDIA A100 GPUs, PyTorch DDP)
  • Model Optimization (Flash Attention 2, LoRA, 4-bit Quantization)
  • Production ML Infrastructure (AWS, Linux On-Premise)

FOCUS AREAS
  • Foundation Model Alignment | Multi-Agent Orchestration | Production ML Infrastructure | SDLC Automation

EDUCATION
  • MTech in Computer Science (AI Specialization) | IIIT Lucknow

Open to connecting with AI researchers and ML engineers.
📧 sankhyesh3@gmail.com

Experience

Total Experience: 3 yrs 2 mos
Average Tenure: 1 yr
Current Experience: 1 yr

Samsung R&D Institute India - Bangalore

Lead AI Engineer

Jun 2025 - Present · 1 yr · Bengaluru, Karnataka, India · Hybrid

  • Leading research on Reward Modeling and RLHF systems for Samsung Gauss Foundation Model, designing custom RewardTrainer architectures with Flash Attention 2 for Qwen3-4B based reward models.
KEY CONTRIBUTIONS:
  • Autonomous Agentic AI System
      • Developed POC using LangGraph and LangChain to automate complete SDLC workflows including requirements analysis, code generation, testing, and deployment
      • Implemented multi-agent orchestration reducing development cycle time by 60%
  • LLM-as-Judge Evaluation Framework
      • Built evaluation framework using open-source models achieving 94% correlation with expert human evaluators across 10K+ preference pairs
      • Enabled automated quality assessment at scale for Foundation Model training
  • Advanced Training Systems
      • Implemented ensemble-based adaptive margin training system following "Secrets of RLHF Part II" methodology
      • Incorporated preference strength measurement with ranking-based label flipping for bottom 10% samples
  • Distributed Training Infrastructure
      • Built distributed training infrastructure on 2x NVIDIA A100 GPUs (80GB) with PyTorch DDP
      • Implemented LoRA (rank-64), 4-bit NF4 quantization, and gradient checkpointing
      • Achieved 75% model memory footprint reduction while maintaining alignment performance
Python (Programming Language), Reward Model Training, Reward Modeling, RLHF

ZS

Advanced Data Science Associate

Sep 2024 - Jun 2025 · 9 mos · Pune, Maharashtra, India · Hybrid

  • Delivered enterprise-scale predictive analytics solutions that drove strategic healthcare decisions through advanced machine learning and AI technologies.
  • Key Achievements:
  • Advanced Analytics Development: Built and deployed comprehensive predictive models using Apache Spark, PySpark, and Python, integrating deep learning frameworks (TensorFlow, PyTorch) with traditional machine learning (scikit-learn) to enable accurate forecasting and real-time inference for healthcare clients.
  • Data Pipeline Architecture: Designed and implemented end-to-end scalable data processing systems that aggregate and transform large-scale Veeva Pulse and claims data. These pipelines automated model training, tuning, and deployment workflows while ensuring reliable pre- and post-quarter data extrapolation for critical business insights.
  • Machine Learning Infrastructure: Developed robust data preprocessing frameworks featuring automated train-test splits, intelligent missing value imputation, and advanced feature encoding techniques. These innovations significantly improved predictive model accuracy for patient classification and segmentation initiatives.
  • Generative AI Innovation: Contributed to cutting-edge generative AI solutions using FastAPI, OpenAI APIs, and Hugging Face Transformers, successfully transforming complex clinical narratives into clear, actionable business intelligence for healthcare stakeholders.
  • Technical Leadership: Collaborated effectively with cross-functional teams in a fast-paced environment, continuously expanding expertise in emerging AI technologies while delivering high-impact solutions that met evolving business requirements.
  • Technologies: Python, PySpark, Apache Spark, TensorFlow, PyTorch, scikit-learn, FastAPI, OpenAI API, Hugging Face Transformers, Veeva Pulse, SQL
Python, PySpark, Apache Spark, TensorFlow, PyTorch, scikit-learn (+7 more)

NICE

Full-stack Developer

Apr 2023 - Sep 2024 · 1 yr 5 mos · Pune, Maharashtra, India · Hybrid

  • Engineered a cloud-native generative AI solution in Python on AWS for financial crime analytics, automating text summarization to enhance alert evaluation in fraud detection and AML investigations.
  • Implemented scalable AWS Lambda handlers, custom Lambda layers, and AWS Bedrock integrations—fully automated via Terraform—to ensure reliable real-time transaction monitoring and risk management.
  • Fine-tuned AI models using TensorFlow, PyTorch, and Hugging Face Transformers to generate actionable narratives, significantly improving decision-making efficiency and compliance reporting.
  • Developed secure RESTful APIs using AWS API Gateway to integrate financial data streams, coupled with robust GitLab and Jenkins CI/CD pipelines for streamlined deployment and regulatory compliance.
Python, AWS, Terraform, TensorFlow, PyTorch, Hugging Face Transformers (+3 more)

AMD

Full-stack Developer

Jul 2022 - Mar 2023 · 8 mos · Hyderabad, Telangana, India · On-site

  • Improved driver testing processes by building machine learning models that prioritize test cases and automatically classify failures, reducing overall testing time by about 30%.
  • Used scikit-learn and PySpark to create a K-means clustering model for grouping test cases, and applied Random Forest classification to pinpoint issues quickly.
  • Shifted descriptive analysis from an LSTM-based model to a BERT-based approach for generating clearer, more accurate issue descriptions.
  • Applied Low-Rank Adaptation (LoRA) with PyTorch for parameter-efficient model fine-tuning, and developed a CNN-based solution for real-time audio anomaly detection.
  • Combined full-stack methods with data science by setting up end-to-end pipelines for data collection, preprocessing, and model integration.
scikit-learn, PySpark, TensorFlow, BERT, LoRA, Machine Learning (+1 more)

The Entrepreneurship Network

Android Developer Associate

Aug 2021 - Oct 2021 · 2 mos · India

  • Participated in designing and developing app screens and their workflows using Activities and Fragments.
  • Wrote application logic using the Android SDK and Android Studio.
  • Developed rich UI for the application modules using ListView, ScrollView, ViewPager, and Navigation Drawer, and developed custom views.
Android SDK, Android Studio, Algorithms, Android Development

Education

Indian Institute of Information Technology Lucknow

Master of Technology - MTech — Computer Science

Sep 2021 - May 2023

Atal Bihari Vajpayee Govt Institute of Engineering & Technology Pragatinagar, Shimla (H.P) - 171202

Bachelor of Technology - BTech — Computer Science

Aug 2015 - Jun 2019
