Vatsal Parsaniya

AI Researcher

Bangalore, Karnataka, India · 7 yrs 6 mos experience
AI Enabled · AI ML Practitioner

Key Highlights

  • Expert in optimizing multilingual search systems.
  • Developed scalable entity extraction algorithms.
  • Proficient in NLP and machine learning technologies.
Stackforce AI infers this person is a Data Scientist specializing in NLP and search optimization within the SaaS industry.

Skills

Core Skills

Natural Language Processing (NLP), Search Engine Optimization, Microservice Development, Machine Learning, Conversational AI

Other Skills

Algorithms, Apache Airflow, Arduino, Artificial Intelligence (AI), Benchmark Evaluations, C (Programming Language), C++, Chess, Competitive Programming, Computer Networking, Computer Vision, Data Engineering, Data Science, Data Visualization, Deep Learning

About

As a Data Scientist at Embibe, I collaborate with the product team within the Discovery Search Science group to transform intricate business challenges into data science problem statements. My role revolves around enhancing the search experience for users by optimizing multilingual search outcomes across a wide range of customer products and internal tools. I have over 3 years of expertise in Information Retrieval, NLP, and microservice development, with hands-on experience in building new products using Data Science and Machine Learning. Notable achievements include developing a scalable text entity extraction algorithm that identifies academic entities from multiple ontology datasets and synonym dictionaries, and designing a Retrieval Augmented Generation system for search and chatbot applications that retrieves academic content with ontology information and dynamically generates responses. I am proficient in various data stores, development tools, backend tools, observability tools, and frameworks and models for NLP and ML. I am passionate about multilingual search systems and uncovering valuable insights from complex data.
  • Data Stores: Elasticsearch, Milvus, Solr, MongoDB, Redis, PostgreSQL
  • Development Tools: Git, Curl, Jupyter Notebook, PyCharm, Postman
  • Backend Tools: FastAPI, Airflow, Docker, Azure, Jenkins (CI/CD)
  • Observability: New Relic, Loggly, Pyinstrument
  • Statistics / Machine Learning / Deep Learning / NLP:
      – Frameworks for NLP: NLTK, spaCy, PyTorch, Pandas, Scikit-Learn, TextBlob
      – NLP Models Used: BERT, RoBERTa, Elastic-ELSER, ALBERT, T5, LLM
      – Model Deployment & Lifecycle: NVIDIA Triton, MLflow
      – ML Algorithms Implemented: Linear Regression, Logistic Regression, XGBoost, KNN, KMeans, PCA, TSNE, TF-IDF, Word2Vec, Ensemble Algorithms, Topic Modeling
      – Visualization: Elastic-Kibana, Metabase, Matplotlib, Seaborn, Plotly
      – Application Demo: Gradio, Streamlit

Experience

PW (Physics Wallah)

Senior Data Scientist

May 2024 – Present · 1 yr 10 mos · Bengaluru, Karnataka, India · Hybrid

Embibe

3 roles

Data Scientist

Promoted

Apr 2022 – May 2024 · 2 yrs 1 mo

  • I collaborate closely with the product team within the Discovery Search Science group to transform intricate business challenges into data science problem statements. My role revolves around enhancing the search experience for users by optimizing multilingual search outcomes across a wide range of customer products and internal tools. I also run offline and online assessments with the engineering team to ensure the feasibility of our solutions, using NLP and rule-based methods.
  • Retrieval Augmented Generation (RAG):
  • Contributed to the development of a Retrieval Augmented Generation system for search and chatbot applications. The system retrieves academic content along with ontology information and dynamically selects prompts to generate contextually relevant responses from a generative model.
  • Multilingual Hybrid Search:
  • Developed search capabilities in 11 Indic languages, including query understanding (QU) and query expansion (QE) modules, enabling users to seamlessly search in any language.
  • Conducted benchmark evaluations on various vector databases, including Milvus, Qdrant, Elasticsearch, and Solr, with a primary focus on retrieval latency and vector index types.
  • Integrated a fine-tuned embedding model into the inference server and incorporated a vector database into the hybrid search pipeline, enabling semantic search capabilities.
  • Established a search feedback pipeline using the Gradio Interface for SME validation and the evaluation of various search algorithms.
  • Set up a search utilization dashboard that consumes user event logs to monitor and measure search performance metrics.
  • Achieved an 8% increase in click-through rate (CTR) by optimizing the intent- and entity-based ranking algorithm.
NLP, Multilingual Search, Retrieval Augmented Generation, Search Optimization, Vector Databases, Gradio, +5

Jr. Data Scientist

Dec 2021 – Apr 2022 · 4 mos

  • Entity Extraction (NER):
  • Designed and implemented a scalable text entity extraction algorithm, deployed as a microservice. The system identifies academic entities using multiple ontology datasets, synonym dictionaries, and a spellcheck mechanism, all integrated with the Solr analyzer to improve accuracy.
  • The algorithm serves as a core service in various client products, tasked with highlighting academic entities and retrieving associated academic content.
  • Spell Check:
  • The spell corrector is a fundamental component of our NLU pipeline and search engine. I built a spellchecking corpus from user search query data and implemented a hybrid spellcheck algorithm that combines popularity-based and context-based approaches. The algorithm corrects approximately 37% of total user search terms with an accuracy of around 79%.
Entity Extraction, Spell Check, Microservices, NLP, Natural Language Processing (NLP), Microservice Development

Data Science Intern

Sep 2021 – Dec 2021 · 3 mos

Intellica.ai

2 roles

Machine Learning Engineer

Jun 2021 – Sep 2021 · 3 mos · Ahmedabad, Gujarat, India

  • Worked on building a real-time telephonic conversational AI system for an effective interview pre-screening round.
  • Improved the speech-to-text pipeline for the Indian English accent with transfer learning on the DeepSpeech STT model.
  • Developed microservices to evaluate the Montreal Cognitive Assessment (MoCA). The microservices contain a computer vision-based cube and clock drawing evaluation system, as well as a context-based answer evaluation system leveraging NLP concepts.
Conversational AI, Speech-to-Text, Microservices, NLP, Machine Learning

Machine Learning Intern

Nov 2020 – Jun 2021 · 7 mos · Ahmedabad, Gujarat, India

Cretus - The Robotics and Automation Club of PDPU

3 roles

Advisor

Jun 2020 – Jul 2021 · 1 yr 1 mo

Event Management Head

Aug 2019 – Jun 2020 · 10 mos

Committee Member

Jul 2018 – Aug 2019 · 1 yr 1 mo

Education

Pandit Deendayal Energy University

Bachelor of Engineering - BE, Information and Communication Technology

Jul 2017 – Jul 2021
