Onkar Pandit, Ph.D.

AI Researcher

Abu Dhabi, Abu Dhabi Emirate, United Arab Emirates · 1 yr 10 mos experience
Most Likely To Switch · AI Enabled

Key Highlights

  • Expert in developing large language models for multiple languages.
  • Proven track record in enhancing AI model safety and evaluation.
  • Published research in top-tier AI conferences.
Stackforce AI infers this person is a Senior Applied Scientist specializing in NLP and AI for Energy and Weather domains.

Skills

Core Skills

Large Language Models (LLMs), Domain-specific AI, Multilingual NLP, Natural Language Processing (NLP), Machine Learning

Other Skills

FastAPI, super-resolution models, data curation, model deployment, seismic analysis, fault detection, horizon picking, model merging, evaluation benchmarks, nemo-curator, RLHF, DPO, KTO, evaluation framework, fine-tuning datasets

About

Senior Applied Scientist (NLP) with a Ph.D. and 13+ years across research and industry. I develop and align large language models (LLMs) for Arabic, Hindi, and Kazakh; for specialized domains such as mathematical reasoning; and for the energy sector (Oil & Gas). My work spans the full AI lifecycle: data curation, model post-training (RLHF, DPO, KTO), rigorous evaluation, and deployment.

At Inception Institute of AI, I've led initiatives to enhance LLM safety, built evaluation frameworks for Arabic models, developed computer vision systems for weather forecasting super-resolution, and created domain-specific LLMs for Oil & Gas. I'm passionate about building scalable, impactful AI and am skilled in PyTorch, NeMo Curator, and the Hugging Face ecosystem. My research has been published in top-tier conferences including NAACL, ACL, and ICML.

Let's connect if you're scaling applied LLMs in energy, climate, or multilingual settings, where rigor meets real-world impact.

Core Expertise: Large Language Models (LLMs) | RLHF & Model Alignment | Domain-Specific AI (Energy, Weather) | Multilingual NLP (Arabic, Hindi, Kazakh) | Transformers & Diffusion Models | PyTorch, Hugging Face, and NVIDIA NeMo.

Experience

Total Experience: 1 yr 10 mos
Average Tenure: 11 mos
Current Experience: 1 yr 7 mos

Inception

2 roles

Senior Applied Scientist

Nov 2024 - Present · 1 yr 7 mos · On-site

  • Engineered a custom dataloader for NetCDF weather data, trained super-resolution models (diffusion/CNN), and deployed a FastAPI service with a UI for real-time forecasting.
  • Curated a domain-specific dataset for an Energy LLM using advanced models (GPT-4, Claude, etc.), developing expertise in seismic analysis, fault detection, and horizon picking.
  • Improved model capabilities by implementing model merging techniques and establishing a tailored evaluation benchmark for the energy domain.
  • Built a continuous pre-training data pipeline for a domain-specific Energy LLM using the nemo-curator library.
FastAPI, super-resolution models, data curation, model deployment, seismic analysis, fault detection, +3

Applied Scientist

Apr 2023 - Oct 2024 · 1 yr 6 mos · On-site

  • Aligned LLMs for safety and helpfulness by creating preference data and applying RLHF, DPO, and KTO techniques, followed by rigorous evaluation.
  • Developed an evaluation framework using LM Harness and LLM-as-a-judge to assess the performance of Arabic large language models.
  • Curated fine-tuning datasets and improved LLM mathematical reasoning performance on benchmarks such as GSM8K and IQ tests.
RLHF, DPO, KTO, evaluation framework, fine-tuning datasets, Large Language Models (LLMs), +1

Inria

Doctoral Researcher

Dec 2017 - Sep 2021 · 3 yrs 9 mos · Greater Lille Metropolitan Area

  • Designed novel event representations by incorporating contextual n-words and character-based models to capture tense and aspect, resulting in substantial performance gains for temporal relation classification.
  • Conducted a comprehensive probe of transformer language models (BERT) to evaluate their inherent capacity for bridging inference, informing the design of improved mention representations.
  • Developed a principled approach to inject commonsense knowledge by integrating graph node embeddings from knowledge graphs with contextual text embeddings, creating knowledge-aware representations.
temporal relation classification, transformer language models, commonsense knowledge, graph node embeddings, Natural Language Processing (NLP), Machine Learning

Indian Statistical Institute, Kolkata

Project Scientist

Jul 2016 - Nov 2017 · 1 yr 4 mos · Greater Kolkata Area

  • Designed and developed a CNN-based neural architecture for closed-domain multiple-choice question answering (QA).
  • The proposed approach produced competitive results on a standard QA dataset (published in ACL’18).
  • Developed a neural-network-based architecture for lemmatization that achieved new state-of-the-art results across several languages: Catalan, Dutch, Hindi, Hungarian, Italian, Latin, Romanian, and Spanish (published in ACL’17).
  • Proposed an SVM-based model to predict difficult words in a document from readers’ eye-gaze data (published in ICDAR’17).
CNN, multiple choice question-answering, lemmatization, SVM, Natural Language Processing (NLP), Machine Learning

Oracle

2 roles

Senior Member Technical Staff

Mar 2016 - Jun 2016 · 3 mos · Bengaluru, Karnataka, India

  • Contributed to the product development of Oracle Big Data Preparation Cloud Service (BDP).
  • Worked as a part of the infrastructure provisioning team.

Member Technical Staff

Jul 2012 - Feb 2016 · 3 yrs 7 mos · Bengaluru, Karnataka, India

  • Contributed to the development of the Oracle WebCenter Portal product, specifically Portal Security and services such as Notifications, Notes, and Relations.
  • Improved the test automation framework to catch bugs more effectively.

Education

University of Lille 1 Sciences and Technology

Doctor of Philosophy - PhD — Computer Science

Dec 2017 - Sep 2021

Indian Institute of Technology, Kanpur

Master's degree

Jan 2010 - Jan 2012

SGGS Institute of Technology, Nanded

Bachelor of Technology (BTech)

Jan 2006 - Jan 2010
