Sahiti Labhishetty

AI Researcher

Urbana, Illinois, United States · 7 yrs 6 mos experience

Most Likely To Switch: Highly Stable

Key Highlights

Ph.D. in Computer Science with a 4.0 GPA
6+ years of research experience in ML and NLP
Expertise in user simulation and interactive AI systems

Stackforce AI infers this person is a Data Scientist specializing in Machine Learning and Natural Language Processing for B2C applications.

Contact

sahiti.labhishetty@target.com LinkedIn

Skills

Core Skills

Machine Learning, Information Retrieval, Bioinformatics, User Simulation, Natural Language Processing, User Modeling, Data Engineering

Other Skills

Algorithms, Amazon Web Services (AWS), Apache Spark, Artificial Intelligence (AI), C, Computer Science, Data Analytics, Data Mining, Data Science, Deep Learning, Google Cloud Platform (GCP), Graph Representation, Imitation learning, Java, Keras

About

I am a Sr. Data Scientist at Target. I received my Ph.D. from the University of Illinois Urbana-Champaign, where my advisor was Prof. Chengxiang Zhai, who leads the Text Information Management and Analysis Group (TIMAN).
Research interests: machine learning (ML), search/recommendation, natural language processing (NLP), big data analytics, personalization, and interactive AI systems.
Research and development work: solving problems in query understanding and ranking using ML models; optimizing search system-user interaction by proposing formal user models in search domains and data analysis methods that leverage search logs; and developing ML and deep learning models for applications such as prioritization of scientific experiments in biosynthesis, content organization/intelligence, search ranking, and image ranking and classification.
Experience: 6+ years of research experience (1+ years of industry experience) in machine learning (ML), information retrieval/search, natural language processing (NLP), and programming.

Experience

Total Experience: 7 yrs 6 mos
Average Tenure: 2 yrs 2 mos
Current Experience: 3 yrs 2 mos

Target

Sr. Data Scientist

Apr 2023 – Present · 3 yrs 2 mos · Sunnyvale, California, United States · Remote

Solving problems in query understanding and ranking.

Machine Learning, Information Retrieval, Natural Language Processing (NLP)

Amazon

Intern

May 2022 – Aug 2022 · 3 mos · Seattle, Washington, United States

Research Internship
Leveraging user search logs to learn a user simulation model
Method: We proposed a deep learning framework for training a user simulation model based on behavior cloning. User search logs are used to learn to predict user search actions. We evaluated the model and showed that it learns to predict coherent search actions relevant to the user task. The model can be used to simulate sessions given a user task as input.
User simulation models have many applications in both the evaluation and training of interactive information retrieval (IIR) models.

Computer Science, Python (Programming Language), Information Retrieval, Machine Learning, Artificial Intelligence (AI), Imitation learning, +9 more

Walmart Labs

Intern

May 2020 – Aug 2020 · 3 mos · Sunnyvale, California, United States

Differential Query Semantic Analysis: Discovery of explicit, interpretable knowledge from e-commerce search logs.
Presented a novel analysis method, called Differential Query Semantic Analysis (DQSA), for analyzing e-commerce (E-comm) search logs.
DQSA utilizes pairs of comparable query strings and their associated product attribute engagements to derive semantics of the complex query words/phrases.
Improved search ranking utilizing the semantics derived from DQSA.
DQSA also has applications in search ranking, user intent analysis over time and across different users, catalogue product descriptions, enabling interpretable transfer learning from "head queries" to "tail queries", and more.

Computer Science, Python (Programming Language), Natural Language Processing (NLP), Machine Learning, Pattern Recognition, Artificial Intelligence (AI), +10 more

University of Illinois Urbana-Champaign

4 roles

Research Assistant

Promoted

Jan 2020 – Dec 2022 · 2 yrs 11 mos

TriGORank: A Gene Ontology Enriched Learning-to-Rank Framework for Trigenic Fitness Prediction
We introduce and study the task of using ML to recommend high-fitness triplet mutants as candidates for wet-lab experiments.
We use individual and digenic fitness scores as features, and further draw on prior metabolic knowledge from an existing gene ontology by designing a novel graph representation and deducing features that capture gene similarity and gene interactions. These features are used to train machine learning models that produce a ranked list, from high to low fitness scores, for triplet gene mutants of S. cerevisiae. TriGORank is shown to be effective in terms of both performance and explainability.

Computer Science, Python (Programming Language), TensorFlow, Pandas (Software), Learning to rank, Natural Language Processing (NLP), +12 more

Teaching Assistant

Aug 2019 – Nov 2019 · 3 mos

TA for CS510 Advanced Information Retrieval, UIUC.

Computer Science, Python (Programming Language), Machine Learning, Machine Learning Algorithms

Research Assistant

Dec 2018 – May 2019 · 5 mos

Learning explainable ranking functions from search logs.
We conduct an in-depth study of explainability of ranking functions and propose five requirements a relevance explainable ranking function should satisfy. We present a novel evidence-based learning (EBL) approach that can learn an explainable ranking function from search logs by combining multiple probabilistic retrieval models with variable weights reflecting their respective amount of evidence in the search log.

Computer Science, Python (Programming Language), Machine Learning, Pattern Recognition, Artificial Intelligence (AI), Machine Learning Algorithms, +1 more

Ph.D. Graduate Student

Aug 2018 – Dec 2022 · 4 yrs 4 mos

1. User simulation for Interactive search
We aim to propose formal models of user decision processes in an interactive search environment. User simulation is useful for the quantitative evaluation of interactive search systems, for optimizing a search engine's interaction with users, and for augmenting training data for supervised ranking models.
Proposed a novel interpretable optimization framework, referred to as PRE, for query formulation in IR. It maximizes recall and precision of the anticipated retrieval results while minimizing the effort of the query conditioned on user knowledge.
Modelled different types of queries arising from different types of user behaviours, which can be used to personalize query suggestions and optimize search results.
Derived both new and existing query simulation methods from the proposed query model. Experiments show that the new instantiations outperform existing baseline query simulation methods.
2. An Exploration of Tester-based Evaluation of User Simulators to compare IIR systems
Addressed the problem of how to quantify the quality of a user simulator.
Proposed a novel Tester-based evaluation approach to evaluate the reliability of user simulators for evaluating interactive information retrieval (IIR) systems.
Proposed an evaluation module referred to as "tester", which is a set of IR systems with an expected performance pattern, to compute the reliability score of a user simulator.
Extended the approach to propose "reliability-aware" testers and proposed a RATE algorithm to learn the reliability of the testers and simulators jointly, which is shown to be effective and robust.
The aim is to develop an open evaluation platform to create a benchmark for offline evaluation of both IIR systems and user simulators.

Computer Science, Python (Programming Language), TensorFlow, Information Retrieval, Natural Language Processing (NLP), Machine Learning, +12 more

Walmart Labs

Intern

May 2019 – Aug 2019 · 3 mos · Sunnyvale, CA, United States

A cognitive user model for E-Comm search
We present a novel cognitive user model for E-Commerce (E-Comm) search that models a user's cognitive state, including the information need and knowledge state of the user. The model includes components for all the major search behaviors of a user, including query formulation, clicking on results, and query reformulation, providing a complete model for users of E-Comm search.
The model has interpretable parameters that can be estimated from E-Comm search log data to analyze and understand users' behavior, and that can also be manually adjusted to simulate different kinds of users.
Preliminary evaluation on an E-Comm search log shows that the model performs well at predicting user behavior in a search session. The analysis of search sessions reveals multiple interesting findings about the search behavior of E-Comm search users.

Computer Science, Python (Programming Language), Pandas (Software), Machine Learning, Artificial Intelligence (AI), Data Engineering, +6 more

Microsoft

Intern

Jun 2016 – Aug 2016 · 2 mos · Bengaluru, Karnataka, India

We worked on the fraud detection problem for Bing Ads customers. We used pattern mining on Bing Ads customer data and WHOIS data to develop new features, which were shown to increase precision and recall when added to the production model.

Computer Science, Python (Programming Language), Natural Language Processing (NLP), Machine Learning, Pattern Recognition, Artificial Intelligence (AI), +2 more