Anamika Sinha — AI Researcher

Data scientist with hands-on knowledge of machine learning, natural language processing using neural networks, big data storage, and experiment design. With a Master's in Data Science from Berkeley combined with experience in data analysis, business systems analysis, and programming in transactional as well as data warehousing analytics, I bring a rich skill set to gain deep data insights rooted in statistical concepts.

Languages: R, Python, SQL
Competencies: Machine learning, cloud computing (AWS, Google Cloud), advanced statistics, Spark, Git/GitHub, Linux command line, natural language processing, Tableau, scikit-learn
Databases: Oracle, Hive, Postgres, HDFS, DB2, Neo4j (graph database)
Deep Learning: Feed Forward Neural Network, Convolutional Neural Network (CNN), Recurrent Neural Network (RNN)
Tools: Tableau, Git, Hadoop, Hadoop Streaming, Spark distributed computing, TensorFlow, Keras

Select Data Science Projects (for more projects and code, please refer to GitHub):

Data engineering project on quality of care for Medicare patients: Built a pipeline to access publicly available data about hospital performance, loaded it into an HDFS data lake, created an ER diagram, and transformed the data so that SQL queries could answer key questions.
Image recognition (deep learning) Kaggle project: Used TensorFlow to build a convolutional neural network for detecting facial keypoints, as a way to understand key principles of deep learning.
Transfer learning with sentiment analysis (NLP): Applied natural language processing techniques to understand model transferability from a source domain, in order to optimize performance when only a small amount of labeled data is available in the target domain (used the Amazon review dataset).
Adverse drug reactions chatbot (adriabot.com): Facilitates easy retrieval of information about adverse drug reactions from the biomedical literature and presents it to a patient in a format that is easy to use and interpret.

Stackforce AI infers this person is a Data Science expert in Healthcare and SaaS industries.

Location: San Mateo, California, United States

Experience: 17 yrs 3 mos

Skills

Data Engineering
Machine Learning
Data Analysis
Natural Language Processing
Data Science
Business Analysis
Software Development

Career Highlights

Reduced troubleshooting effort by 50% in data quality.
Generated $4.2 million savings through patient ranking system.
Improved model prediction quality by 12% in production.

Work Experience

Tendo

Principal Data Scientist Consultant (11 mos)

agilon health

Lead Data Scientist (1 yr 2 mos)

Senior Data Scientist & Scrum Team Lead (2 yrs 2 mos)

SugarCRM (through Node.io acquisition)

Senior Data Scientist (1 yr 2 mos)

Node.io

Data Scientist (1 yr 1 mo)

UC Berkeley School of Information

Graduate Student of Data Science (2 yrs)

Silicon Tech Lab/ UC Berkeley

Analytics Lead (3 yrs 9 mos)

Infosys

Software Programmer (5 yrs)

Education

Master's degree at UC Berkeley School of Information

Bachelor’s Degree at BIT Sindri

AI ML Practitioner, AI Enabled

Contact

post2anamika@yahoo.com LinkedIn

Skills

Core Skills

Data Engineering
Machine Learning
Data Analysis
Natural Language Processing
Data Science
Business Analysis
Software Development

Other Skills

A/B test experiment
AWS Comprehend Medical
Agile Methodologies
Algorithms
Business Intelligence
Business Process
Business Process Analysis
Client Co-ordination
DBT data pipeline
Databases
ICD10 code diagnosis
Integration
ML/AI models
Project Retrospections
Python (Programming Language)

Experience

Total Experience: 17 yrs 3 mos
Average Tenure: 2 yrs 8 mos
Current Experience: 11 mos

Tendo

Principal Data Scientist Consultant

Jul 2025 – Present · 11 mos

Designed and implemented a scalable data quality framework to assess and quantify the daily quality of customer data ingested via a Databricks pipeline. This reduced the Customer Support team's troubleshooting effort by 50%.
Working on an agent that aims to halve clinical algorithm development time, with human-in-the-loop evaluations.

data quality framework, databricks, customer data, data engineering, machine learning

agilon health

2 roles

Lead Data Scientist

Promoted

Jan 2024 – Mar 2025 · 1 yr 2 mos

Designed and led the patient ranking system solution, which suggests patients with undocumented conditions to clinical reviewers. Partnered with the clinical team for rapid feature engineering and deployed two-step random forest models. The pilot generated savings of $4.2 million and paved the way for a wider implementation.
Evaluated prompt engineering techniques on open-source healthcare large language models. This helped assess out-of-the-box LLM performance on the task of predicting missing patient diagnoses.
Built a DBT data pipeline, designed evaluation metrics, and created a dashboard to integrate and evaluate an off-the-shelf AI product for ICD10 code diagnosis predictions. The main complexity lay in designing the pipeline to handle multiple comparisons over common data rows, which simplified the evaluation and enabled fair comparisons with human-generated diagnoses.
Built a high-cost risk stratification logistic regression model using claims and demographics data, resulting in a 15% improvement in recall over the rule-based methodology.
Planned the roadmap, pruned the backlog aggressively, and simplified the project success metric definition with clinical stakeholders to deliver a quick alpha version in a record time of two months.

patient ranking system, random forest models, DBT data pipeline, ICD10 code diagnosis, logistic regression model, machine learning

Senior Data Scientist & Scrum Team Lead

Nov 2021 – Jan 2024 · 2 yrs 2 mos

Built an entity recognition PoC pipeline with the AWS Comprehend Medical API to tag diseases, symptoms, and medications in physician notes, with the end goal of using the entities for downstream modeling tasks.
Reduced production pipeline failures by 20% by writing DBT data tests to work around data quality challenges.
Designed and implemented an A/B test experiment to measure the effectiveness of algorithm-prioritized clinical review over the appointment/rule-based methodology. The 20% lift in new diagnoses in the highest category paved the way for release in all markets.

entity recognition, AWS Comprehend Medical, A/B test experiment, natural language processing, data engineering

SugarCRM (through Node.io acquisition)

Senior Data Scientist

Aug 2020 – Oct 2021 · 1 yr 2 mos · Cupertino, California, United States

Improved model prediction quality by 12%, measured via event conversion, by monitoring ML/AI models in production. Measured drift between training and production data distributions using metrics such as Jensen-Shannon divergence for categorical features and the Kolmogorov-Smirnov test for numerical features.
Improved end-user adoption of predictions by 15%, measured via increased user clicks, by implementing model-level and individual-prediction interpretability. Used Shapley values and clustering techniques to categorize features to facilitate interpretability. Leveraged this further to build weighted distance metrics that provided dynamic and interpretable similar entities to end users.
Increased the percentage of SaaS clients eligible for data science use cases by 12% by identifying common data patterns and issues across different clients using Python and SQL. Generalized data cleaning and labeling across heterogeneous client datasets. This helped in smarter feature engineering for our models.
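
The two drift metrics named above can be sketched in plain Python. This is an illustrative sketch only, not the production code from this role: the function names `js_divergence` and `ks_statistic` and the sample values are hypothetical, and both functions assume non-empty inputs.

```python
import math
from collections import Counter


def js_divergence(train_values, prod_values):
    """Jensen-Shannon divergence (base 2, range 0..1) between two categorical samples."""
    categories = set(train_values) | set(prod_values)
    p_counts, q_counts = Counter(train_values), Counter(prod_values)
    n_p, n_q = len(train_values), len(prod_values)
    divergence = 0.0
    for category in categories:
        p = p_counts[category] / n_p   # training frequency
        q = q_counts[category] / n_q   # production frequency
        m = 0.5 * (p + q)              # mixture distribution
        if p > 0:
            divergence += 0.5 * p * math.log2(p / m)
        if q > 0:
            divergence += 0.5 * q * math.log2(q / m)
    return divergence


def ks_statistic(train_values, prod_values):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(train_values), sorted(prod_values)
    i = j = 0
    gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both pointers past every observation <= x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        gap = max(gap, abs(i / len(a) - j / len(b)))
    return gap


if __name__ == "__main__":
    # Hypothetical feature samples, for illustration only.
    train_plan = ["basic", "basic", "pro", "enterprise"]
    prod_plan = ["basic", "pro", "pro", "pro"]
    print("JS divergence:", js_divergence(train_plan, prod_plan))

    train_seats = [3, 5, 8, 13, 21]
    prod_seats = [8, 13, 21, 34, 55]
    print("KS statistic:", ks_statistic(train_seats, prod_seats))
```

Both values run from 0 (identical distributions) to 1 (no overlap), so a single alert threshold per feature type is one plausible way to use them. Note that SciPy's `scipy.spatial.distance.jensenshannon` returns the square root of this divergence, and `scipy.stats.ks_2samp` additionally returns a p-value.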

model prediction quality, ML/AI models, data cleaning, machine learning, data analysis

Node.io

Data Scientist

Jul 2019 – Aug 2020 · 1 yr 1 mo · San Francisco Bay Area

● Achieved stable model performance on small datasets by leveraging statistical techniques. The stability test helped qualify models that withstood small changes in data, which fostered confidence among pilot users of the software.
● Assisted the sales team by rapidly executing customer PoCs for use cases such as churn reduction, lead ranking, and free trial conversion prediction using multiple ML/AI models. Presented and interpreted model results to potential clients. Used learnings from the PoCs to build functionality into a generic ML platform for domain users.
● Improved model performance by 5% through extensive error analysis to identify records where the model was making mistakes. Iteratively improved performance through smarter feature engineering and better data quality metrics.

statistical techniques, customer PoCs, error analysis, data science, machine learning

UC Berkeley School of Information

Graduate Student of Data Science

May 2017 – May 2019 · 2 yrs

Machine Learning at Scale: algorithm design in parallel computing (using Spark)
Natural Language Processing with Deep Learning (using TensorFlow)
Experiments and Causality: statistical analysis with experimentation and design-based inference (using R)
Fundamentals of Data Engineering: data storage and retrieval (SQL, Tableau)
Applied Machine Learning: feature engineering and algorithms (scikit-learn)
Statistics for Data Science: descriptive and inferential statistics (using R)
Behind the Data: Humans and Values (legal, policy, and ethical implications of data)
Research Design and Application
Python for Data Science

machine learning, statistical analysis, data storage, data engineering

Silicon Tech Lab/ UC Berkeley

Analytics Lead

Apr 2013 – Jan 2017 · 3 yrs 9 mos

Conducted requirements analysis for reports and dashboards. Extensive client interaction in understanding user requirements; developed reports for user demo. Participated in defining dimensions and facts for star schema. Helped implement dashboards and migration to production.

requirements analysis, dashboard implementation, business analysis, data engineering

Infosys

Software Programmer

Jan 2007 – Jan 2012 · 5 yrs

Key contributor on a client-server development project, responsible for writing detailed specifications, developing test plans, and coding individual modules.
Designed for data model compatibility to support migration from the legacy system to the new one by analyzing the data schemas of both systems. Led a six-member onsite team through the design, development, testing, and implementation of project SYNCH, which synchronized data from DB2 on IBM mainframe and MS Access to Oracle on UNIX.

client server development, data model compatibility, software development, data engineering