Niyati B.

Product Engineer

Pune, Maharashtra, India · 5 yrs 7 mos experience

Key Highlights

Expert in low-resource NLP and multilinguality.
Experience with machine translation and speech technologies.
Strong background in computational linguistics and research.

Stackforce AI infers this person is a specialist in NLP and computational linguistics with a focus on multilingual applications.

Skills

Core Skills

Machine Translation · Machine Learning · Computational Linguistics · Natural Language Processing · Software Development

Other Skills

Dialogue Models · Linguistically Inspired Language Models · Morphological Datasets · Multilingual Research · Embeddings · Multilingual Transfer · Constrained Decoding · Statistical Machine Translation · Language-Agnostic Models · NER · Clause Final Verb Prediction · Syntax Bot Development · Document Clustering · Topic Modelling · Politico-Sociolinguistic Analysis

About

I'm a PhD student at CLSP, JHU, advised by David Yarowsky. I work on low-resource NLP, multilinguality, machine translation, and speech technologies for low-resource languages. Previously, I was a Research Engineer at Inria, Paris, mentored by Rachel Bawden and Benoît Sagot. Before that, I was an MSc student in Computer Science, specializing in Computational Linguistics/NLP, in the Erasmus Mundus Language and Communication Technologies programme, where I held a full scholarship and spent my first year at Charles University, Prague, and my second year at Saarland University, Germany.

Experience

Total Experience: 5 yrs 7 mos

Average Tenure: 2 yrs 9 mos

Current Experience: 2 yrs 10 mos

The Johns Hopkins University

Doctoral Student

Aug 2023 – Present · 2 yrs 10 mos

Inria

Research Engineer

Oct 2022 – Jun 2023 · 8 mos · Paris, Île-de-France, France

I worked on linguistically inspired language models for dialect families.

Linguistically Inspired Language Models · Computational Linguistics

Charles University

3 roles

Research Collaborator

Nov 2021 – Feb 2022 · 3 mos · Prague, Czechia

We are working on harmonising morphological datasets across 18 resources and over 100 languages. The motivation is to boost multilingual research and insights in the field of morphology, e.g. for neural architectures. The resulting work is headed for an LREC '22 submission.

Morphological Datasets · Multilingual Research · Computational Linguistics

Research Intern

May 2021 – Aug 2021 · 3 mos · Prague, Czechia

I studied the transfer of embeddings from a high-resource Indian language (Hindi) to an artificially low-resource, genealogically related language (Marathi), starting with two monolingual embedding spaces. I chose this problem because I believe that leveraging linguistic properties and commonalities meaningfully, and so encouraging non-English-centric transfer, can make multilingual transfer and representation less data-hungry and more principled. This approach is rich with potential for the network of Indian languages, which constantly borrow from and influence each other and often share typological features. The work was accepted at SIGMORPHON, co-located with NAACL '22.

Embeddings · Multilingual Transfer · Machine Learning

Course project: Constrained Decoding, Statistical Machine Translation

Mar 2021 – Jun 2021 · 3 mos · Prague, Czechia

I worked with another student to implement constrained decoding on an English-Hindi MT model, with the objective of retaining English technical terms exactly as they appear in the source while producing fluent, code-mixed target output. The intended use is translating technical lectures and academic material. The resulting paper was accepted at ICON-21.

Constrained Decoding · Statistical Machine Translation · Machine Translation

Amazon

Applied Scientist Intern

May 2020 – Aug 2020 · 3 mos · Bengaluru, Karnataka, India

I worked as an NLP research intern on developing language-agnostic models for NER in English, French and German.

Language-Agnostic Models · NER · Natural Language Processing

Indian Institute of Technology, Delhi

Research Intern

Mar 2020 – Aug 2020 · 5 mos

I worked with Professor Samar Husain and another student on modelling clause-final verb prediction in Hindi. We simulated verb prediction as it takes place during online comprehension of SOV languages, using various models that incorporate different hypotheses about this process, and compared the models to test these theories, notably the adaptability hypothesis and the noisy channel hypothesis. We used sentence completion data collected from native Hindi speakers for evaluation. I presented our work at CMCL, co-located with NAACL '21.

Clause Final Verb Prediction · Natural Language Processing

Ashoka University

2 roles

Course Project, Advanced Programming

Aug 2019 – Dec 2019 · 4 mos

I built Gina, a syntax bot that uses simple syntactic rules of Hindi and English to expand a lexical English-Hindi database when given parallel corpora of highly simplified sentences. The project was intended to develop students' software development skills. Code has been open-sourced.

Syntax Bot Development · Software Development

Course Project, Machine Learning

Aug 2019 – Dec 2019 · 4 mos

I attempted document clustering on the Enron email dataset. We tried different approaches to ‘solving’ the infamous Enron scandal using the corpus, including different methods of representing the text, measuring clusterability, and feature marking. We finally implemented topic modelling using Latent Semantic Indexing and Latent Dirichlet Allocation models, comparing the two, as an intermediate step: identifying the content of emails in order to track suspicious scandal-related behaviour. We got reasonable results: topic coherence of about 0.6, and visible correlations between the words within each resulting topic.

Document Clustering · Topic Modelling · Machine Learning

Indian Institute of Information Technology

Research Intern

Jun 2019 – Aug 2019 · 2 mos · Greater Hyderabad Area

I worked on identifying and handling verb phrase ellipsis as a stumbling block for English-Hindi machine translation. I took this project up under the broader ambit of sentence simplification for speech-to-speech English-Hindi machine translation. Applying both ML and linguistics/rule-based approaches, I found the best-performing solution, and wrote a research paper under Professor Dipti Misra Sharma. The paper has been selected for poster presentation and publication in the proceedings of ICON-2019 (International Conference on Natural Language Processing). I will be presenting the same in December in Hyderabad. This experience was my first in open-ended research; I was helped and guided by a number of Ph.D. students at IIIT-H, and learnt many things inside and outside my project subject, both from them and the other interns. Code has been open-sourced.

Verb Phrase Ellipsis Handling · Machine Translation

Speaking Tiger

Non-fiction Translator

Apr 2019 – Aug 2019 · 4 mos

Translated a series of memoirs by Prabhat Ranjan; publication by Speaking Tiger is ongoing.

Ashoka University

2 roles

Course Project, Computational Linguistics

Jan 2019 – May 2019 · 4 mos

My course partner and I conducted a politico-sociolinguistic analysis of code-mixing. We created two corpora on Dalit politics and feminism from Twitter using keyword combinations. Drawing on the theories of political language of prominent Dalit and feminist activists in India, we attempted to explain the code-mixing we observed in these corpora (as described by a certain framework) in terms of the sociolinguistic and political history of each movement in India. We contribute our results and the corpora in a paper, which is currently under review at a conference. Code has been open-sourced.

Politico-Sociolinguistic Analysis · Computational Linguistics