Yugal Jain — AI Researcher

- Experienced in building Data Engineering (ETL) pipelines, end-to-end NLP pipelines, and deep learning architectures.
- Proficient in Python, PyTorch, Sklearn, Pandas, Transformers, Elasticsearch, Flask, Docker, AWS, spaCy, and NLTK.
- Have worked on supervised NLP problems such as sequence classification (aspect classification), multi-label text classification, document classification, and sentiment analysis, as well as unsupervised tasks such as hierarchical topic modeling and Top2Vec.

* Projects

- Umeed: Advanced Language Analytics Tool (Delhi Police)
• Experimented with NLP models to combat fake news, hate speech, and abusive content. Cleaned pre-existing data and trained a deep learning model to detect abusive, traumatic, and disturbing content in images and videos shared on social media.
• The project aims to shed light on the practical prospects of stopping the cycle of online crime, harassment, and abuse.
• Worked with multiple stakeholders to reduce the models' legal liabilities and keep model bias below human-level bias.

- Auto Content Moderation System
• Developed an auto content moderation system for family-friendly web shows that integrates speech, text, and vision components to detect and mute abusive content in videos. Used a video classification model to classify NSFW videos and CMU Sphinx speech recognition (speech-to-text) to detect abusive words in the audio and replace them with a beep.
• Proposed a SAFE MODE/FAMILY MODE that uses our ACS (Auto Censoring System) to minimize explicit visuals and abusive language in the content the user wants to watch. Designed it to integrate easily with OTT platforms.

- Social Media Bias Free Bot
• The bot replies without bias to comments posted by users on social media platforms such as Reddit, Twitter, and Discord, and tries to stop the spread of racism and religious bias on these platforms through positive-sentiment comments.
• Trained a GPT-2-based model on the Jigsaw Toxic Comment Classification dataset to detect toxicity in comments and on the AG News dataset to reply according to the topic of comments posted on subreddits such as politics, sports, and technology; deployed it on Discord and Reddit.

Stackforce AI infers this person is a Machine Learning Engineer specializing in Natural Language Processing and AI solutions.

Location: Rohtak, Haryana, India

Experience: 1 yr 4 mos

Skills

Data Science
Natural Language Processing (NLP)
Large Language Models (LLM)
Machine Learning
Optical Character Recognition (OCR)
Computer Vision
Deep Learning
Machine Translation

Career Highlights

Developed AI-based translation system for 100+ languages.
Engineered real-time speech-to-text capabilities for sales meetings.
Implemented OCR solutions for automated text extraction.

Work Experience

H10 AI

Machine Learning Engineer (4 mos)

Valona Intelligence

Data Science Consultant (3 yrs 3 mos)

Expedite Commerce

Machine Learning Engineer (7 mos)

Trantor

Machine Learning Engineer (10 mos)

Optima Ideas, s.r.o.

Data Science Consultant (3 mos)

Ritsumeikan University

Research Collaborator (2 mos)

BlackNet

Machine Learning Engineer (1 yr 1 mo)

Education

Bachelor of Technology at Guru Gobind Singh Indraprastha University

Other Skills

AWS Lambda
Algorithms
Amazon Web Services (AWS)
Artificial Intelligence (AI)
Audio Synthesis
C (Programming Language)
C++
Calculus
Chatbot Development
Computational Linguistics
Data Analysis
Data Analytics
Data Structures
Engineering
FastAPI


Experience

Total Experience: 1 yr 4 mos
Average Tenure: 8 mos

Current Experience

H10 AI

Machine Learning Engineer

Aug 2024 – Dec 2024 · 4 mos · Toronto, Ontario, Canada · Remote

● Architected and implemented an automated industry classification code harmonization system using a unified compliance API gateway, enhancing organizational regulatory alignment.
● Implemented sophisticated prompt engineering solutions using LangChain, applying few-shot learning, prompt chaining, and advanced tuning techniques to optimize LLM performance for industry-specific classification tasks.
● Conducted research on and deployed multiple advanced NLP solutions, including pre-trained text embedding models and Large Language Models (Claude 3.5 Sonnet), using Retrieval Augmented Generation (RAG) to recommend industry classification codes (SIC, NAICS, GICS) within established compliance parameters, achieving accuracy of up to 95%.
● Designed and orchestrated an end-to-end industry classification codes recommendation pipeline on AWS infrastructure, optimizing system performance and scalability.

Python (Programming Language) · Deep Learning · Natural Language Processing (NLP) · Amazon Web Services (AWS) · Production Deployment · Unit Testing

Valona Intelligence

Data Science Consultant

Mar 2023 – Present · 3 yrs 3 mos · Finland · Remote

● Developed an AI-based translation system covering 100+ languages for an actionable competitive market intelligence platform, achieving a 2x performance improvement and a 50% cost reduction by deploying a multilingual model on a single machine that handles thousands of requests daily.
● Designed and deployed an end-to-end industry insights classification pipeline on an Azure Kubernetes cluster as a real-time online endpoint, achieving a 50% improvement in accuracy compared to the previous classifier.
● Integrated a benchmarking pipeline to continuously enhance model performance in a production environment.

Production Deployment · Data Science · Artificial Intelligence (AI) · Natural Language Processing (NLP) · Microsoft Azure Machine Learning · Deep Learning

Expedite Commerce

Machine Learning Engineer

Aug 2022 – Mar 2023 · 7 mos · Texas, United States · Remote

● Contributed to developing an AI-based sales intelligence platform that enhanced Sales Performance Management (SPM) for startups and companies, leading to more efficient sales calls and doubling client conversions in less time.
● Designed and implemented end-to-end generic Kinesis routing pipelines, including unit testing and detailed documentation.
● Built, trained, and deployed ML pipelines for masked language modeling and classification on sales data using various transformer architectures, capturing detailed sales context and reaching accuracy of up to 95%.
● Engineered real-time speech-to-text and speaker diarization capabilities for sales meetings.
● Developed cost-efficient, low-latency Lambda containers for multiple microservices and optimized high-level API code.

Learning · Engineering · PyTorch · TensorFlow · Problem Solving · Artificial Intelligence (AI)

Trantor

Machine Learning Engineer

Oct 2021 – Aug 2022 · 10 mos · Gurugram, Haryana, India · Remote

● Worked on a legal text analytics project, extracting 10+ key clauses from legal agreements such as MSAs, PSAs, nodal agreements, and licenses.
● Implemented OCR solutions using Tesseract for digitizing and processing scanned legal documents, enabling automated text extraction from image-based contracts and agreements to support downstream NLP pipelines.
● Built and fine-tuned BERT-based pre-trained NER models to detect custom entities such as person names, organizations, and addresses.
● Deployed an end-to-end model on AWS Lambda using Docker while maintaining an Elasticsearch database.

Text Analytics · Artificial Intelligence (AI) · Natural Language Processing (NLP) · Unstructured Data · AWS Lambda · Amazon Web Services (AWS)

Optima Ideas, s.r.o.

Data Science Consultant

Mar 2021 – Jun 2021 · 3 mos · Bratislava, Slovakia · Remote

● Consulted with users, vendors, and technicians to determine computing needs and system requirements.
● Analyzed problems to develop solutions involving computer hardware and software.
● Developed a real-time violence detection system using TensorFlow/Keras and NVIDIA TensorRT on the RWF-2000 dataset.
● RWF-2000 is a database of 2,000 videos captured by surveillance cameras in real-world scenes.

Learning · TensorFlow · Keras · Computer Vision · Deep Learning

Ritsumeikan University

Research Collaborator

Jan 2021 – Mar 2021 · 2 mos · Kyoto, Japan · Remote

● Conducted comprehensive research, synthesized information from multiple sources, and presented actionable results.
● Collaborated closely with an Assistant Researcher on her thesis, maintaining close communication to complete the project effectively and efficiently.
● Researched and implemented best practices for a character-based Neural Machine Translation system that translates from Minangkabau to Indonesian using TensorFlow, achieving over 95% accuracy. Leveraged the close phonetic similarity between the two languages, despite their distinct spellings, to improve translation performance.
● Experimented with Char-Transformers, Char-RNN, BiLSTM, Sentencepiece, and Byte Pair Encoding, ultimately achieving the best results with Char-Transformers on a 14,000-entry word-to-word dictionary dataset.

Learning · Machine Translation · Transformer · TensorFlow · Natural Language Processing (NLP) · Keras

BlackNet

Machine Learning Engineer

Sep 2019 – Oct 2020 · 1 yr 1 mo · Delhi, India

● Analyzed Twitter data using statistical and machine learning techniques to provide insights into trending topics in India; applied robust topic modeling with Top2Vec to COVID-19 tweets and visualized the results.
● Experimented with various machine learning and deep learning models for sentiment analysis, including pQRNN (Projection-based Quasi-Recurrent Neural Network), achieving over 92% accuracy on the Jigsaw Toxic Comment Classification dataset with a model size of just 320KB.
● Developed an aspect-based sentiment analysis model using LCF-BERT, incorporating local context embeddings to determine aspect sentiment and deployed it end-to-end using serverless AWS Lambda and API Gateway.

Flask · Learning · Machine Learning · Natural Language Processing (NLP) · Computer Vision · Python (Programming Language)