Dileep Kumar Sahu

Data Scientist

West Delhi, Delhi, India · 5 yrs 1 mo experience
Most Likely To Switch · AI Enabled

Key Highlights

  • Expert in building classical and deep learning models.
  • Architected a large-scale NLP pipeline processing millions of messages.
  • Achieved 96.5% accuracy through advanced ML algorithms.
Stackforce AI infers this person is a Fintech Data Scientist with expertise in NLP and machine learning.

Skills

Core Skills

Natural Language Processing (NLP), Machine Learning, Data Science

Other Skills

Sentence-Transformers, LangChain, ChromaDB, OpenAI API, FastText, XGBoost, DistilBERT, Keras, Airflow, Jenkins, Apache Livy API, Fernet-based encryption, PySpark, regex-based NLP, Neural Networks

About

Data Scientist with around 3.5 years of experience building classical ML models, including regression, classification, clustering, bagging, and boosting. Proficient in developing deep learning models for NLP using RNN, LSTM, GRU, BERT, spaCy, and Transformer-based architectures. Adept at deploying solutions at scale using Python, PySpark, and a range of databases and cache systems, with engineering tools such as Kafka, EMR, EC2, Airflow, Jenkins, S3, and Amazon SageMaker. Strong communicator with a passion for problem-solving and delivering actionable recommendations.

Experience

Total Experience: 5 yrs 1 mo
Average Tenure: 3 yrs 2 mos
Current Experience: 4 yrs 7 mos

Paytm

Data Scientist

Nov 2021 - Present · 4 yrs 7 mos · Noida, Uttar Pradesh, India

Data Science

One97 Communications Limited

Senior Data Scientist

Nov 2021 - Present · 4 yrs 7 mos · Noida, Uttar Pradesh, India

  • Architected and deployed a Retrieval-Augmented Generation (RAG) system using Sentence-Transformers and LangChain to convert structured and unstructured data (Confluence/relational data) into vector embeddings stored in ChromaDB.
  • Delivered real-time infrastructure insights by integrating the RAG system with the OpenAI API, providing contextually relevant answers by combining retrieved knowledge with large language model capabilities.
  • Developed an intelligent SMS parsing platform leveraging a hybrid ensemble system (FastText, XGBoost, DistilBERT) for real-time and batch classification across 15+ financial domains.
  • Was instrumental in creating customer financial profiles, leading to a 28% uplift in targeted outreach for the Personal Loan and Bill Reminder teams.
  • Optimized data model performance, achieving 96.5% accuracy through the implementation of advanced ML algorithms and statistical modeling.
  • Architected and deployed a large-scale PySpark pipeline processing 10+ million transactional SMS messages daily across 27+ financial use cases using regex-based Natural Language Processing (NLP).
  • Improved data refresh efficiency by 40% by engineering incremental processing, deduplication, and aggregation jobs (30d/90d/lifetime rolling windows).
  • Automated deployment and cluster management by building a dynamic Airflow DAG factory that orchestrates sequential/parallel Spark jobs and manages EMR via Jenkins and the Apache Livy API.
  • Ensured PII compliance and platform reliability by implementing Fernet-based encryption for sensitive data and multi-level Slack-based alerting.
  • Developed and deployed an XGBoost gradient-boosting model to predict device failure probability based on CPU, memory, and network metrics.
  • Enabled proactive maintenance through automated anomaly scoring, reducing downtime/issue detection time by 45%.
  • Developed an enterprise-scale deep learning PII detection pipeline using Keras neural networks for parallel processing of structured data.
Sentence-Transformers, LangChain, ChromaDB, OpenAI API, FastText, XGBoost (+8 more)
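
For readers unfamiliar with the regex-based SMS parsing and 30d/90d/lifetime rolling windows mentioned in this role, the idea can be sketched roughly as follows. This is an illustration in plain Python, not the production pipeline (which the profile says runs on PySpark); the patterns, category names, sample messages, and the function names `parse_sms` and `rolling_totals` are all hypothetical.

```python
import re
from datetime import date

# Hypothetical patterns: a real rule set would be far larger and bank-specific.
AMOUNT_RE = re.compile(r"\b(?:rs\.?|inr)\s*(\d[\d,]*(?:\.\d{1,2})?)", re.IGNORECASE)
CATEGORY_RES = {
    "debit": re.compile(r"\b(?:debited|spent|withdrawn)\b", re.IGNORECASE),
    "credit": re.compile(r"\b(?:credited|received)\b", re.IGNORECASE),
    "bill_due": re.compile(r"\bbill\b.*\bdue\b", re.IGNORECASE),
}


def parse_sms(text):
    """Return (category, amount) for a transactional SMS, or None if no rule matches."""
    amount_match = AMOUNT_RE.search(text)
    if amount_match is None:
        return None
    for category, pattern in CATEGORY_RES.items():
        if pattern.search(text):
            amount = float(amount_match.group(1).replace(",", ""))
            return category, amount
    return None


def rolling_totals(records, as_of, windows=(30, 90)):
    """Sum amounts per category over trailing N-day windows plus lifetime.

    records is an iterable of (day, category, amount) tuples.
    """
    totals = {}
    for day, category, amount in records:
        age_days = (as_of - day).days
        if age_days < 0:
            continue  # ignore records dated after the reference date
        if category not in totals:
            totals[category] = {f"{w}d": 0.0 for w in windows}
            totals[category]["lifetime"] = 0.0
        totals[category]["lifetime"] += amount
        for w in windows:
            if age_days < w:
                totals[category][f"{w}d"] += amount
    return totals


if __name__ == "__main__":
    sample = [
        (date(2024, 6, 20), "Rs. 1,250.00 debited from A/c XX1234"),
        (date(2024, 5, 1), "INR 100 spent on card XX9876"),
        (date(2023, 1, 1), "Rs 50 withdrawn at ATM"),
    ]
    parsed = []
    for day, text in sample:
        result = parse_sms(text)
        if result is not None:
            parsed.append((day, result[0], result[1]))
    print(rolling_totals(parsed, as_of=date(2024, 6, 30)))
```

In a distributed setting the same two steps would typically become a column-wise extraction followed by a grouped aggregation, with incremental processing limiting each run to newly arrived messages.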

Emilence Pvt. Ltd.

Machine Learning Engineer

Aug 2020 - Feb 2021 · 6 mos

  • Increased user engagement by 60% by deploying a Hybrid User-based Collaborative Filtering Recommendation Engine on Amazon Web Services (AWS) that outperformed existing models in A/B testing.
  • Designed, implemented, and evaluated new models and rapid software prototypes to solve problems in machine learning and systems engineering.
Collaborative Filtering, AWS, Machine Learning

Education

Indraprastha Institute of Information Technology, Delhi

PG Diploma — Data Science & Artificial Intelligence

Jan 2020 - Jan 2021

Panjab University, Chandigarh

MCA — Computer Application

Jan 2017 - Jan 2020

Delhi University

Bachelor's degree — Mathematics

Jan 2013 - Jan 2016

Aryabhatta College, University Of Delhi

Bachelor of Science — Mathematics
