Soham N.

AI Researcher

Washington, DC, United States · 1 yr 2 mos experience
Most Likely To Switch · AI ML Practitioner

Key Highlights

  • Built an AI pipeline reducing data-entry costs by 50%.
  • Developed an automated reporting platform doubling user engagement.
  • Published NLP research on Transformer architectures.
Stackforce AI infers this person is a Data Scientist with expertise in AI and Machine Learning for Healthcare and Retail sectors.

Skills

Core Skills

Machine Learning · Python (Programming Language) · Large Language Models (LLM) · Data Science · Natural Language Processing (NLP) · Data Analysis

Other Skills

Health Data · Agile Methodologies · LangChain · JSON · Optical Character Recognition (OCR) · Healthcare Analytics · Exploratory Data Analysis · Image Processing · RAG · Microsoft Azure · SQL · NoSQL · Anthropic Claude · CI/CD · Hypothesis Testing

About

Hi, I am Soham! I am an MS student in data science at the University of Washington, Seattle, and over the last ~1 year I have had the opportunity to intern at amazing companies like Johnson & Johnson and Castor. I will be graduating in June '25 and am actively looking for full-time opportunities as a Data Scientist, Data Engineer, Data Analyst, AI Engineer, ML Engineer, or Business Analyst.

I build end-to-end data products (efficient pipelines, RAG systems, and ML models) that turn messy data into business value. Recent impact:

  • At Castor, I built an AI pipeline that extracts structured data from scanned and handwritten EHRs, cutting data-entry costs by 50%.
  • At Johnson & Johnson, I led the development of an automated reporting platform (Python, Snowflake, Tableau) that is set to double user engagement.
  • At Metro AG, I delivered a SARIMA forecasting model that improved sales-prediction accuracy by 20%.

I'm also a published NLP researcher (2 peer-reviewed papers on Transformer architectures) and fluent in Python, SQL, Azure, TensorFlow, and modern MLOps.

I am motivated by the potential of data and technology to improve lives and create positive impact. A quote that I live by: "Bridging the gap between THINKING that I can do it and KNOWING that I can do it."

Experience

Total Experience: 1 yr 2 mos
Average Tenure: 7 mos
Current Experience: 11 mos

Guidehouse

AI Engineer

Jul 2025 – Present · 11 mos

Castor

AI Engineer Intern

Jun 2024 – Jun 2025 · 1 yr · Remote

  • Developed a HIPAA-compliant data extraction and transformation pipeline in Python to extract and auto-fill medical data, cutting costs by 50% by reducing manual data-entry time
  • Built a RAG pipeline in Haystack from scratch, resulting in 30% faster document retrieval for source document verification
  • Set up a CI/CD pipeline using Azure DevOps for automated testing, building, and deployment of Docker containers to Azure, reducing deployment time by 70%
Health Data · Agile Methodologies · LangChain · Large Language Models (LLM) · JSON · Optical Character Recognition (OCR) · +10

University of Washington

LLM Research Assistant

Feb 2024 – Jul 2024 · 5 mos · Seattle, Washington, United States · On-site

  • Implemented a zero-shot dense retrieval framework to simulate candidate selection, analyzing 500+ resumes and job descriptions across 9 occupations
  • Conducted hypothesis testing on three fine-tuned Mistral-7B LLMs for resume screening, defining metrics that revealed a bias in 85% of candidate selections
  • Collaborated with data scientists to perform K-Means clustering and KDE analysis to improve the prediction accuracy of Bayesian classifiers
Large Language Models (LLM) · Python (Programming Language) · Exploratory Data Analysis · Hypothesis Testing · Anthropic Claude · EDA · +1

Metro Global Solution Center in

Data Science Intern

Mar 2023 – Jun 2023 · 3 mos · Remote

  • Used scikit-learn, TensorFlow and image segmentation to develop an ID authentication tool, improving system throughput by 40% and reducing compute resource load for real-time validation
  • Streamlined a pipeline for 10M+ rows of transactional retail data by using SQL to extract data without premium connectors, enabling direct PowerApps integration and saving $120K annually
  • Developed a sales prediction system using a SARIMA forecasting model and deployed it via FastAPI, improving retail sales forecasting accuracy by 20%
Data Analysis · Data Visualization · Analytical Skills · Deep Learning · Extract, Transform, Load (ETL) · Optical Character Recognition (OCR) · +8

Pune Institute of Computer Technology

Research Assistant

Jun 2022 – Dec 2022 · 6 mos · Pune, Maharashtra, India · On-site

  • Achieved state-of-the-art accuracy in image captioning for static images in Marathi, Gujarati, Tamil, and Kannada by devising attention-based merge-architecture models. The work was published at the International Conference on Computer Graphics and Image Processing. Link: https://www.joig.net/show-84-353-1.html
  • Led a research project on text summarization in low-resource languages using Transformer architectures such as BERT and BART. Invited to publish the resulting findings at the Indian Statistical Institute, Kolkata. Link: https://www.semanticscholar.org/paper/Abstractive-Text-Summarization-for-Hindi-Language-Agarwal-Naik/6a21a8ec4252cf79d62d30c77ac1cfc51cb96783
  • Built custom neural networks for multi-label, multi-class classification of fundus images. Used Generative Adversarial Networks to solve image data integrity issues.
Data Analysis · Data Visualization · Natural Language Processing (NLP) · Large Language Models (LLM) · Deep Learning · Long Short-term Memory (LSTM) · +11

Atomic Loops

Blockchain Intern

Jan 2022 – Feb 2022 · 1 mo · Pune, Maharashtra, India

  • Wrote smart contracts in Solidity using Remix and deployed on the Ethereum blockchain.

Omdena

Machine Learning Engineer Intern

Aug 2021 – Nov 2021 · 3 mos

  • Created a product that automates quantifying the economic impact of a particular indigenous endangered species, supporting conservation efforts for that species.
  • Led the data collection and parsing team, reducing manual effort by about 30% using Beautiful Soup and Selenium.
Data Analysis · Data Visualization · Data Scraping · System Deployment · Machine Learning · Web Scraping · +4

Education

University of Washington

Master of Science - MS

Sep 2023 – May 2025

Pune Institute of Computer Technology

Bachelor's degree — Computer Science

Aug 2019 – Jul 2023
