Divyanshu Mittal

Data Scientist

Mathura, Uttar Pradesh, India · 4 yrs experience

AI Enabled · AI ML Practitioner

Key Highlights

Designed scalable AI pipelines for patent datasets.
Optimized OCR pipelines, improving extraction accuracy by 60%.
Developed an accounting chatbot using Generative AI.

Stackforce AI infers this person is a Data Science and AI professional with a focus on cloud technologies.

Skills

Core Skills

Data Science · Cloud Computing · Data Engineering · Generative AI · Education

Other Skills

Amazon Web Services (AWS) · AWS SageMaker · Retrieval-Augmented Generation (RAG) · OpenSearch · Glue Crawlers · Athena · Terraform · AWS Bedrock · AWS Athena · USPTO API · Google BigQuery · JSON · Optical Character Recognition (OCR) · n8n · Make

About

I am a Junior Data Scientist at Adastra, building scalable AI and data solutions with cloud and Generative AI technologies. My work focuses on developing data pipelines, AI-driven workflows and GenAI applications that process and analyze large-scale datasets.

At Adastra, I work extensively with AWS services such as Athena, Glue, S3 and Bedrock, where I help build and deploy AI pipelines for large patent datasets. My work includes extracting and transforming data from sources like the USPTO API and Google BigQuery, converting complex JSON structures into analytics-ready formats for AWS Athena, and developing scalable pipelines for data processing and dashboard analytics. I have also worked on OCR pipeline optimization to improve extraction of text, tables and images from documents, as well as prompt engineering and testing of Bedrock models for Generative AI applications.

Previously, I worked as a Data Engineer Intern at Heart It Out, where I built automated ETL pipelines using n8n and Make, managed data workflows and handled data ingestion into PostgreSQL and MySQL databases. My work involved pipeline debugging, workflow automation and data monitoring to ensure reliable and efficient data processing.

I am particularly interested in AWS, Large Language Models (LLMs), Retrieval-Augmented Generation (RAG), chatbots and AI pipelines, and I enjoy exploring how these technologies can be applied to build practical and scalable systems. Outside of work, I enjoy playing badminton, reading books and keeping up with new developments in AI, data engineering and cloud technologies.

Experience

Total Experience: 4 yrs

Average Tenure: 2 yrs 6 mos

Current Experience: 1 yr 6 mos

Adastra

Junior Data Scientist

Dec 2024 – Present · 1 yr 6 mos · Munich · Remote

> Designed and implemented a scalable Retrieval-Augmented Generation (RAG) pipeline by indexing hierarchical patent documents into OpenSearch with hybrid search (BM25 + vector KNN using FAISS/HNSW), enabling context-aware LLM responses through AWS Bedrock and improving semantic search relevance.
> Implemented infrastructure on AWS including Glue Crawlers, Athena, Terraform and storage pipelines to support scalable data querying and dashboard KPIs.
> Built automated data pipelines by extracting patent datasets from the USPTO API and Google BigQuery, transforming JSON data into AWS Athena-compatible formats and enabling large-scale analytics.
> Optimized OCR pipelines, cutting latency by 60% and improving extraction accuracy for text, tables, and images from structured and unstructured documents.
> Developed GenAI features using prompt engineering and AWS Bedrock models, improving chatbot response quality and logging feedback for iterative model tuning.
> Led and presented in client meetings and discussions.

Amazon Web Services (AWS) · AWS SageMaker · Retrieval-Augmented Generation (RAG) · OpenSearch · Glue Crawlers · Athena · +3
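
For illustration only, the hybrid search mentioned in the first bullet (BM25 combined with vector KNN) can be sketched as score fusion: each result list is min-max normalized, then merged with a weighted mean. This is not the production pipeline; the function names, the 0.5 weight, the document IDs and the scores are all hypothetical, and a real OpenSearch setup would normally do this inside the search engine rather than in client code.

```python
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; if all scores are equal, every doc gets 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_rank(
    bm25: Dict[str, float],
    knn: Dict[str, float],
    bm25_weight: float = 0.5,  # hypothetical weight, would be tuned in practice
) -> List[Tuple[str, float]]:
    """Fuse keyword (BM25) and vector (KNN) scores into one ranked list."""
    bm25_n, knn_n = min_max(bm25), min_max(knn)
    combined = {
        # A doc missing from one result list contributes 0 for that side.
        doc: bm25_weight * bm25_n.get(doc, 0.0)
        + (1.0 - bm25_weight) * knn_n.get(doc, 0.0)
        for doc in set(bm25_n) | set(knn_n)
    }
    # Highest combined score first; ties broken by doc ID for stable output.
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Made-up scores for four hypothetical patent documents.
    bm25_hits = {"doc-a": 12.0, "doc-b": 7.0, "doc-c": 2.0}
    knn_hits = {"doc-b": 0.92, "doc-c": 0.80, "doc-d": 0.62}
    for doc_id, score in hybrid_rank(bm25_hits, knn_hits):
        print(doc_id, round(score, 3))
```

In this made-up example, doc-b ranks first because it scores well in both result lists, which is the behavior hybrid search is meant to reward.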

Heart It Out

Data Engineer Intern

Aug 2024 – Oct 2024 · 2 mos · Bengaluru · Remote

> Developed automated ETL pipelines using n8n and Make to collect, transform, and load data from multiple sources into PostgreSQL.
> Designed and managed data workflows for reliable data ingestion, improving data availability for analytics and internal tools.
> Implemented workflow automation and monitoring to streamline data processing and reduce manual intervention.

n8n · Make · PostgreSQL · MySQL · Data Monitoring · Data Engineering

Logiciel Analytics

Generative AI Intern

Apr 2024 – Jul 2024 · 3 mos · Lucknow · Remote

> Worked with AI Agents, RAG, MongoDB, LLMs and JSON data to develop an AI-powered accounting chatbot.

AI Agents · RAG · MongoDB · LLMs · JSON · Generative AI

Freelance

Private Tutor

Mar 2020 – Sep 2022 · 2 yrs 6 mos · Mathura, Uttar Pradesh, India · On-site

I was a private tutor, catering to students from grades 6 to 12, where I focused on teaching them Statistics, Mathematics and Physics. My approach involved fostering a deep understanding of the subjects through interactive lessons and personalized attention. By employing various teaching strategies and adapting to individual learning styles, I aimed to instill confidence and proficiency in my students. Additionally, I provided support outside of regular sessions, offering guidance on homework and exam preparation to ensure their academic success.

Statistics · Mathematics · Physics · Education