Sarthak Mishra

AI Researcher

Bengaluru, Karnataka, India · 7 yrs experience

Highly Stable · AI Enabled

Key Highlights

Led a team of data scientists on Generative AI projects.
Achieved 30-40% productivity improvement in AI solutions.
Developed innovative applications using Large Language Models.

Stackforce AI infers this person is a SaaS-focused Data Scientist with expertise in Generative AI and Machine Learning.

Contact

sarthak405@gmail.com LinkedIn

Skills

Core Skills

Generative AI, Large Language Models (LLM), Predictive Modeling, Natural Language Processing (NLP), Machine Learning, Data Engineering

Other Skills

Azure AI Foundry, Prompt Engineering, Fine Tuning, AWS, MLOps, AI Governance, GPT-4o, Multimodal Document Understanding, RAG Pipelines, LangChain, LangGraph, AutoGen, Strands, SQL, Hyperparameter Tuning

About

As a Senior Data Scientist at IBM, I work on cutting-edge generative AI projects, using large language models, prompt engineering, and fine-tuning techniques to create novel applications across diverse domains. I am passionate about harnessing the power of AI to solve complex problems and generate value for users and businesses. I have a strong background in data science, machine learning, natural language processing, and computer vision, with a Bachelor of Technology degree in Computer Science from IIT Delhi.

Experience

Total Experience: 7 yrs

Average Tenure: 7 yrs

Current Experience: 7 yrs

IBM

3 roles

Advisory Data Scientist

Promoted

Apr 2024 – Present · 2 yrs 2 mos

Led and mentored a team of 6–8 data scientists and ML engineers, owning sprint planning, technical direction, and delivery for multiple Generative AI initiatives.
Defined GenAI solution architecture and delivery roadmap, aligning agentic system design with business KPIs, security constraints, and client expectations.
Spearheaded a commercialized SAP Generative AI product, automating Technical Specification generation using GPT-4o, multimodal document understanding, and custom RAG pipelines, resulting in 30–40% productivity improvement.
Designed and delivered multi-agent GenAI systems using LangChain, LangGraph, AutoGen, and Strands, deployed on AWS and Azure, enabling deterministic orchestration via tool-based agents.
Acted as technical owner and escalation point for LLM architecture, RAG quality, agent evaluation, and prompt governance, reducing rework and iteration cycles.
Guided team members on LLM fine-tuning (PEFT/LoRA), quantization (4-bit), and RoPE scaling, achieving lower inference costs and improved latency in production.
Led development and deployment of LLM inference endpoints on AWS EC2, integrating Amazon Bedrock, Azure OpenAI (GPT-4o), and IBM watsonx.ai.
Collaborated with product owners, SMEs, and architects to translate ambiguous requirements into production-ready AI system designs.
Conducted architecture and code reviews, mentoring team members on agentic design patterns, MLOps readiness, and enterprise AI governance.
Defined evaluation metrics and acceptance criteria for GenAI outputs, balancing accuracy, determinism, and usability for enterprise adoption.

Large Language Models (LLM), Azure AI Foundry, Generative AI, Prompt Engineering, Fine Tuning, AWS, +2

Senior Data Scientist

Promoted

Jul 2022 – Apr 2024 · 1 yr 9 mos

Generative AI Initiative:
Fine-tuned Large Language Models such as StarCoder, Llama2, and Code Llama across diverse use cases using PEFT. Notably, the instruction-tuned StarCoder model outperformed the base model on the Hugging Face leaderboard.
Spearheaded development of a LangChain-based application that uses a vector database and a fine-tuned StarCoder model to translate natural language into SQL queries and visualize the results in a Streamlit user interface.
Applied prompt engineering and refined few-shot prompting techniques to improve the performance of Large Language Models (LLMs).
Tuned LLM hyperparameters to the specific requirements of each use case.
Led development of an inference endpoint for various LLMs and deployed it on an AWS EC2 instance.
Integrated Azure GPT-3 and IBM watsonx inference endpoints into LangChain applications.
Implemented RoPE scaling to extend the context length of Llama2 and Code Llama at inference time.
Deployed 4-bit quantized versions of the models for inference to reduce operating costs.
CollabAI, IBM Research:
Developed and trained ARX, ARIMAX & Prophet models for forecasting time series sales inventory data.
Trained and tuned hyperparameters for AR, Naive, LSTM, RF, and XGBoost models for the inventory forecasts.
Developed an ensemble machine learning model for sales inventory forecasts with a MAPE of 0.19.
Used CPLEX, qpsolver, and PuLP to programmatically solve linear and quadratic optimization problems.

Fine Tuning, Prompt Engineering, Large Language Models (LLM), AWS, LangChain, SQL, +2

Cognitive Data Scientist

Jun 2019 – Jul 2022 · 3 yrs 1 mo

Acoustic Analysis:
Developed GMM, DCASE, and YOLOv5 machine learning models for welding audio files and trained a classifier.
Developed and trained a GoogLeNet classifier in IBM Maximo Visual Inspection suite.
The classifier achieved ~90% accuracy and processed an audio file in 10% of the file's duration.
Designed and developed a Python Flask app to classify an audio file with an ML classifier of the user's choice.
Sandvik Optimine Analytics:
Developed a script to validate IoT streaming data and generate flags for mining equipment.
Used IBM Watson Studio to deploy, monitor and improve upon various data science jobs.
Published an asset to IBM Lighthouse Repository for automating replication of database objects.
Pearson UK:
Developed an application to extract and transform data from various SQL & NoSQL databases.
Developed a Python application to securely transfer PGP-encrypted data over an SFTP server.
Created Docker images of various applications and deployed them to a Kubernetes cluster.
Leveraged various parallel computing libraries for Python such as Dask, Vaex and Modin with Ray.
Sensor Detection AAHK:
Analyzed the intensity of sensor vibrations to assess pillar quality and estimate degradation.
Developed a K-means clustering model on sensor data to analyze pillar quality.

Natural Language Processing (NLP), SQL, Machine Learning, Python, Docker, Kubernetes

New York University

Summer Research Intern at Department of Computer Science

May 2018 – Jul 2018 · 2 mos · New York

Internet of Things Behavioral Scanner:
Monitored traffic sent by various IoT devices of different domains on a network in a controlled home environment
Captured all traffic on the simulated home network using Wireshark, tshark, and tcpdump, and modeled device behavior
Developed an interface to visualize the data for end users using Django and Elasticsearch

Genesis Media LLC

Data Science Intern

May 2017 – Jul 2017 · 2 mos · New York, New York

Advertisement Performance vs Uniqueness:
Extracted data from the company's MySQL and Elasticsearch databases for the last 6 months (~5 billion entries)
Determined the uniqueness coefficient of a given webpage from the database using Google's Custom Search API
Analyzed correlation between a page's uniqueness and effectiveness of an advertisement run across various domains