Subhalingam D

Data Scientist

Chennai, Tamil Nadu, India · 5 yrs 3 mos experience
Most Likely To Switch · AI Enabled

Key Highlights

  • Designed advanced NLP systems for e-commerce.
  • Achieved significant accuracy improvements in product classification.
  • Developed innovative translation models for Indian languages.
Stackforce AI infers this person is a Data Scientist specializing in NLP and Machine Learning for E-commerce applications.

Skills

Core Skills

Natural Language Processing (NLP) · Deep Learning · Data Science · Machine Learning · Digital Signal Processing (DSP) · Back-end Development

Other Skills

Amazon Web Services (AWS) · Artificial Intelligence (AI) · Back-End Web Development · Bash · C (Programming Language) · C++ · Cascading Style Sheets (CSS) · Cloud Computing · Computational Linguistics · Data Mining · Data Structures · Django · Git · Java · Keras

About

Subhalingam is interested in the broad areas of Natural Language Processing (NLP), Information Retrieval (IR) and Deep Learning. He currently works as a Data Scientist at KnowDis Data Science and holds a B.Tech. degree in Mathematics and Computing from the Indian Institute of Technology, Delhi (IIT Delhi). He builds neural Q&A, translation and recommender systems for the e-commerce domain and deploys them in production. He is specifically interested in applying NLP techniques to Indian languages. As an undergraduate, he worked on entity tracking and next-step recommendation in technical procedural texts, work that is relevant to automating technical troubleshooting (similar to questions on Stack Overflow). His Bachelor's thesis was on identifying hate speech spreaders on Twitter. Previously, he worked at Samsung Research on sound source localization and separation using Digital Signal Processing (DSP) techniques.

Experience

KnowDis AI

2 roles

Data Scientist

May 2022 - Present · 3 yrs 10 mos · Remote

  • Product Category Prediction for Automated Product Mapping
      • Designed a retriever-reranker pipeline to classify products into one of 100K+ categories using metadata such as titles and specifications
      • Fine-tuned a decoder-based retriever; aggregated results using Mean Reciprocal Rank to shortlist candidates across multiple retrievers
      • Developed few-shot prompts for a local LLM reranker with single-step decoding; optimized latency using prefix caching and vLLM
      • Achieved a 13% accuracy gain, reduced grossly wrong predictions by 66%, and maintained 87%+ high-confidence coverage
      • Deployed to production on IndiaMART
  • Attribute-Value Extraction from Product Titles and User Queries (Research accepted at KDD 2025 ADS Track)
      • Generated weakly supervised training data with incomplete labeling from product specifications
      • Designed a novel two-stage system: a marker-augmented generative model identifies potential attributes, and a cross-encoder-based token classification model then determines the associated values for each attribute
      • Regenerated training data to expand attribute-value annotations and trained a standard NER classifier on the enriched dataset
      • Improved recall by 20% while maintaining 90% precision; used for dynamic feature highlighting to enhance the search experience
  • Style-Controllable English-to-Hindi Translator
      • Extended an encoder-decoder model with style tokens to control translation style; fine-tuned on in-house parallel corpora
      • Obtained English translations of scraped style-specific monolingual data using the Google Translate API to augment the training data
      • Developed a style classifier to sample representative training examples, leading to improved in-style word usage in translations
  • Additional Projects
      • Dockerized a product search system with large disk-based ANN indexes, deployed using NVIDIA Triton Inference Server
      • Developed a shopping assistant to enhance user experience via conversational product recommendations & discovery
Natural Language Processing (NLP) · Deep Learning · Data Science · Python · Machine Learning · Large Language Models (LLM) · +1
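
The rank aggregation mentioned under Product Category Prediction (shortlisting candidates across multiple retrievers using Mean Reciprocal Rank) might look roughly like the sketch below. This is only an illustration under assumptions: the function name, the category IDs and the tie-breaking rule are hypothetical, and the production system may weight or combine retrievers differently.

```python
from collections import defaultdict


def aggregate_by_mean_reciprocal_rank(ranked_lists, top_k=3):
    """Fuse several ranked candidate lists into one shortlist.

    A candidate earns 1/rank from each list that returns it (0 if absent);
    the earned scores are averaged over all lists. Assumes no duplicates
    within a single list.
    """
    totals = defaultdict(float)
    for ranking in ranked_lists:
        for rank, candidate in enumerate(ranking, start=1):
            totals[candidate] += 1.0 / rank
    n_lists = len(ranked_lists)
    # Sort by descending mean score; ties are broken by candidate name.
    fused = sorted(totals.items(), key=lambda kv: (-kv[1] / n_lists, kv[0]))
    return [(cand, total / n_lists) for cand, total in fused[:top_k]]


# Hypothetical category IDs returned by two retrievers.
retriever_a = ["cat_12", "cat_7", "cat_3"]
retriever_b = ["cat_7", "cat_12", "cat_9"]
shortlist = aggregate_by_mean_reciprocal_rank([retriever_a, retriever_b], top_k=2)
```

The shortlist would then be handed to the reranker, which only has to choose among a few candidates rather than the full label space.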

Data Science Intern

Jan 2022 - May 2022 · 4 mos · Remote

  • Product Category Prediction for Product Search
      • Built a transformer-based classifier to predict the most relevant product category from 100K+ labels using bootstrapped search logs
      • Designed heuristics to enhance atomic label representations and sampled data to improve category distribution and coverage
      • Achieved accuracy similar to the previous seq2seq model (~88%) while making average response time 3x faster and completely eliminating timeouts; the model was integrated with IndiaMART's search system and deployed in production
Natural Language Processing (NLP) · Machine Learning · Python

Indian Institute of Technology, Delhi

2 roles

Teaching Assistant

Aug 2021 - Dec 2021 · 4 mos

  • COL764: Information Retrieval & Web Search (Graduate-level course taught by Prof. Srikanta Bedathur at IIT Delhi)

Undergraduate Researcher

Feb 2021 - Apr 2022 · 1 yr 2 mos

  • Supervised by Prof. Srikanta Bedathur & Prof. Maya Ramanath, in collaboration with IBM AI Horizons Network
  • Prepared a dataset of how-to troubleshooting FAQs by scraping WikiHow pages from the Computers and Electronics category
  • Constructed BERT-based baselines to predict changes in the properties of the entities involved at each step of the process
  • Surveyed the literature on building a next-step recommender from a given sequence of performed actions and developed LSTM baselines
Natural Language Processing (NLP) · Machine Learning · Python

Samsung India

Software Engineer Intern

Jun 2021 - Jul 2021 · 1 mo

  • Acoustic Sound Source Localization, Tracking and Separation
      • Developed a sound source direction estimation module using the time delay of arrival of signals between pairs of microphones in an array
      • Added modules for tracking active sound sources and extracting individual signals for a downstream object identification pipeline
      • Integrated a stationary noise estimation module for ambient noise removal and reduced the maximum direction-of-arrival error to 7°
      • Received a Pre-Placement Offer (PPO) for performance during the internship
Digital Signal Processing (DSP) · Python
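
The direction estimation from time delay of arrival described above can be illustrated, for a single microphone pair, with the standard far-field relation sin(θ) = c·τ / d. The sketch below is an assumption-laden illustration and not the internship module: the function name, the speed-of-sound constant and the 10 cm spacing are hypothetical, and a real array would combine several pairs and estimate the delay from the signals first.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at roughly 20 °C


def doa_from_tdoa(delay_s, mic_spacing_m, speed=SPEED_OF_SOUND):
    """Far-field direction of arrival for one microphone pair.

    Returns the angle in degrees measured from broadside (the direction
    perpendicular to the line joining the two microphones).
    """
    ratio = speed * delay_s / mic_spacing_m
    # Clamp so that noisy delay estimates never push asin out of its domain.
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))


# Hypothetical pair spaced 10 cm apart; zero delay means a broadside source.
broadside_angle = doa_from_tdoa(delay_s=0.0, mic_spacing_m=0.1)
```

Clamping the ratio is a small design choice that keeps the estimate defined when the measured delay slightly exceeds the physical maximum d / c.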

Materate Education

2 roles

Machine Learning Developer

May 2020 - Aug 2020 · 3 mos

  • Modelled latent knowledge space of students using response patterns
      • Developed Item Response Theory-based probabilistic models to estimate and analyze the ability of 5000+ students & the difficulty of 200+ questions
Machine Learning · Python · Django
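
As a rough illustration of the Item Response Theory models mentioned above, the sketch below uses the two-parameter logistic (2PL) form, in which the probability of a correct answer depends on student ability and question difficulty. The choice of 2PL, the function names and the example values are assumptions; the profile does not say which IRT variant was used.

```python
import math


def p_correct(ability, difficulty, discrimination=1.0):
    """2PL item response function: sigmoid(discrimination * (ability - difficulty))."""
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))


def response_log_likelihood(responses, ability, items):
    """Log-likelihood of one student's right/wrong pattern.

    `responses` holds booleans; `items` holds (difficulty, discrimination)
    pairs in the same order.
    """
    total = 0.0
    for answered_correctly, (difficulty, discrimination) in zip(responses, items):
        p = p_correct(ability, difficulty, discrimination)
        total += math.log(p if answered_correctly else 1.0 - p)
    return total


# Hypothetical student of average ability facing an item of average difficulty.
chance = p_correct(ability=0.0, difficulty=0.0)
```

Fitting would then search for the abilities and difficulties that maximize this likelihood over all students and questions.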

Back End Developer

May 2020 - Aug 2020 · 3 mos

  • Results Portal Development
      • Designed the database schema and built Web APIs using Django REST framework to display students' performance reports to parents
      • Deployed the Django backend using Elastic Beanstalk with MySQL on RDS, and the React frontend to S3 with CloudFront CDN integration
      • Set up an Auto Scaling group and attached a Load Balancer for horizontal scaling
      • The portal went live with the results of 5000+ students
Django · MySQL · Cloud Computing · Back-End Development

Education

Indian Institute of Technology, Delhi

B.Tech. — Mathematics and Computing

Jan 2018 - Jan 2022

Chennai Public School - India

CBSE

Jan 2010 - Jan 2018
