Abhay Kumar

Co-Founder

Abu Dhabi, Abu Dhabi Emirate, United Arab Emirates · 12 yrs 5 mos experience

AI Enabled · AI ML Practitioner

Key Highlights

Co-led pretraining of a 7B-parameter LLM.
Developed ZClip for improved LLM training stability.
Passionate about efficient LLM development.

Stackforce AI infers this person is a Senior Research Engineer specializing in AI/ML with a focus on Large Language Models.

Skills

Core Skills

Large Language Models (LLM) · Distributed Training · Deep Learning · Transformers

Other Skills

Algorithms · Apache Spark · Artificial Neural Networks · BERT (Language Model) · Big Data · Classification · Data Science · Data Structures · Eclipse · GPT · GenAI · Generative Language Modeling · Generative Pre-Training · HBase · Information Extraction

About

I'm a Senior Research Engineer with 8+ years in data science and over 5 years focused on language modeling. I specialize in building and training Large Language Models (LLMs), with a deep interest in training stability, optimization, and scaling. Most recently, I co-led the pretraining of a 7B-parameter LLM on 15 trillion tokens. At BluOrion, I also developed ZClip, an adaptive gradient clipping algorithm that improves convergence stability and eliminates the need for manual batch skipping. Additionally, I've worked on initialization strategies and activation variance control techniques to stabilize and accelerate LLM training. My experience spans training models at various scales, distributed training frameworks (FSDP, DeepSpeed, PyTorch Lightning), and open-source contributions including miniLLaMA, GPT2-TensorFlow, and miniGPTF. I'm passionate about solving real-world challenges in large-scale model training and pushing the boundaries of efficient LLM development.

Experience

Total Experience: 12 yrs 5 mos

Average Tenure: 1 yr 11 mos

Current Experience: 9 mos

Technology Innovation Institute

Senior LLM Research Engineer

Sep 2025 – Present · 9 mos · Abu Dhabi, Abu Dhabi Emirate, United Arab Emirates · On-site

BluOrion Limited

Senior LLM Research Engineer

Sep 2024 – Jul 2025 · 10 mos · Dubai, United Arab Emirates · On-site

▪ Designed and led the development of ZClip, an adaptive gradient clipping algorithm that improves stability and eliminates manual batch skipping during LLM training.
▪ Worked on initialization and variance control techniques to enhance LLM performance.
▪ Explored parameter-efficient Transformer designs using factorized embeddings and low-rank attention.
▪ Contributed to distributed LLM pre-training recipe design and built scalable training pipelines using FSDP, PyTorch Lightning, and DeepSpeed, applied in training 7B-parameter models.

Distributed Training · Generative Pre-Training · Large Language Models (LLM) · PyTorch

Yellow.ai

2 roles

Research Scientist (NLP) 3

Jan 2022 – Sep 2024 · 2 yrs 8 mos

▪ Co-author of Komodo LLM, a foundational large language model tailored for a specific language.
▪ Trained and deployed a variety of task-specific Language Models (LLMs) in production, significantly improving the system's ability to comprehend user messages. 🚀
▪ Engineered a series of embedding and generative zero-shot models with distributed pre-training, enabling rapid bot creation and reducing unidentified utterances by 30%. 🤖✨

Deep Learning · Large Language Models (LLM) · Classification · Language Modeling · Distributed Training · Generative Pre-Training · +1

Research Scientist (NLP) 2

Dec 2020 – Dec 2021 · 1 yr

▪ Trained custom GPT and BERT models from scratch and fine-tuned them on in-house chat data for content generation and embeddings.
▪ Developed a custom probabilistic spelling correction model leveraging our in-house corpus, enhancing accuracy in fixing domain-specific words.
▪ Developed a Conversational Sentiment Model utilizing deep learning techniques.

Transformers · PyTorch · Deep Learning · Language Modeling · Distributed Training · Generative Pre-Training · +3

Edge Networks Pvt. Ltd.

2 roles

Senior Data Scientist (NLP)

Promoted

Jul 2018 – Dec 2020 · 2 yrs 5 mos · Bengaluru, Karnataka, India

▪ Worked on a neural ranking model using deep learning with semi-supervised techniques.
▪ Built a pairwise job recommendation engine that recommends jobs based on the candidate's profile.
▪ Implemented Transformer AE, GPT-2, and a custom autoregressive transformer model from scratch in TensorFlow 2.0, with distributed pre-training on a large corpus.

Data Scientist (NLP)

Sep 2017 – Jun 2018 · 9 mos · Bengaluru, Karnataka, India

▪ Built a proof of concept (POC) for context-aware information retrieval on longer query text using semi-supervised learning.
▪ Built multiple autoencoders for short- and long-sequence text documents using CNNs and LSTMs.

Scry Analytics

Data Scientist

May 2016 – Sep 2017 · 1 yr 4 mos · Gurugram, Haryana, India

▪ Built an opinion mining module using deep learning.
  Used an LSTM-based sequential classification model to extract opinions from documents.
  Used Word2Vec and Character-CNN for feature generation.
  Used Text CNN for relation extraction.
▪ Developed an NER module to extract entities from text documents.
  Built a sequential classification model to identify entities in domain-specific documents.
  Used Word2Vec and K-Means clustering for feature generation.
  Implemented a big-data pipeline using PySpark for distributed computation on a compute cluster.
  Used natural language processing together with acquired domain knowledge to extract features from documents.
▪ Developed an extraction module to extract phrases from text documents.
  Built a sequential classification model to extract phrase-based entities from domain-specific documents.
  Used Word2Vec and K-Means clustering for feature generation.
Algorithms used: Neural Network, LSTM, CNN, Word2Vec, Conditional Random Fields
Technologies and tools used: TensorFlow, Core Java, Python, HBase, Apache Spark

Gauge Data Solutions

Data Analyst

May 2015 – May 2016 · 1 yr · Noida, Uttar Pradesh, India

▪ Built a sequential classification model to filter junk content out of the data.
▪ Used the producer-consumer design pattern in the crawler to handle multithreading.
▪ Developed a web crawler to collect legal documents from multiple websites, using the IP Switcher API.