Anamika Sinha

AI Researcher

San Mateo, California, United States · 17 yrs 3 mos experience
AI ML Practitioner · AI Enabled

Key Highlights

  • Reduced troubleshooting effort by 50% in data quality.
  • Generated $4.2 million savings through patient ranking system.
  • Improved model prediction quality by 12% in production.
Stackforce AI infers this person is a Data Science expert in Healthcare and SaaS industries.

Skills

Core Skills

Data Engineering, Machine Learning, Data Analysis, Natural Language Processing, Data Science, Business Analysis, Software Development

Other Skills

A/B test experiment, AWS Comprehend Medical, Agile Methodologies, Algorithms, Business Intelligence, Business Process, Business Process Analysis, Client Co-ordination, DBT data pipeline, Databases, ICD10 code diagnosis, Integration, ML/AI models, Project Retrospections, Python (Programming Language)

About

Data scientist with hands-on knowledge of machine learning, natural language processing using neural networks, big data storage, and experiment design. With a Master's in Data Science from Berkeley combined with experience in data analysis, business systems analysis, and programming in transactional as well as data warehousing analytics, I bring a rich skill set to gain deep data insights rooted in statistical concepts.

Languages: R, Python, SQL
Competencies: Machine learning, cloud computing (AWS, Google Cloud), advanced statistics, Spark, Git/GitHub, Linux command line, natural language processing, Tableau, scikit-learn libraries
Databases: Oracle, Hive, Postgres, HDFS, DB2, graph database Neo4j
Deep Learning: Feed Forward Neural Network, Convolutional Neural Network (CNN), Recurrent Neural Network (RNN)
Tools: Tableau, Git, Hadoop, Hadoop Streaming, Spark distributed computing, TensorFlow, Keras

Select Data Science Projects (for more projects and code, please refer to GitHub):

  • Data engineering project on quality of care for Medicare patients - Built a pipeline to access publicly available data about hospital performance, loaded it into an HDFS data lake, created an ER diagram, and transformed the data so that SQL queries could answer key questions.
  • Image recognition (deep learning) Kaggle project - Used TensorFlow to build a convolutional neural network for detecting facial keypoints to understand key principles of deep learning.
  • Transfer learning with sentiment analysis (NLP) - Applied natural language processing techniques to understand model transferability from a source domain in order to optimize performance when only a small amount of labeled data is available in the target domain (used the Amazon review dataset).
  • Adverse drug reactions chatbot (adriabot.com) - Facilitates easy information retrieval about adverse drug reactions from the biomedical literature and presents it to a patient in a format that is easy to use and easy to interpret.

Experience

Total Experience: 17 yrs 3 mos
Average Tenure: 2 yrs 8 mos
Current Experience: 11 mos

Tendo

Principal Data Scientist Consultant

Jul 2025 - Present · 11 mos

  • Designed and implemented a scalable data quality framework to assess and quantify the daily quality of customer data ingested via a Databricks pipeline. This reduced troubleshooting effort by the Customer Support team by 50%.
  • Working on an agent to cut clinical algorithm development time in half, with human-in-the-loop evaluations.
data quality framework, databricks, customer data, data engineering, machine learning

Agilon health

2 roles

Lead Data Scientist

Promoted

Jan 2024 - Mar 2025 · 1 yr 2 mos

  • Designed and led the patient ranking system solution to suggest patients with undocumented conditions to clinical reviewers. Partnered with the clinical team for rapid feature engineering and deployed two-step random forest models. The pilot generated savings of $4.2 million and paved the way for a wider implementation.
  • Evaluated prompt engineering techniques on open-source healthcare large language models. This helped assess out-of-the-box LLM performance on the task of predicting missing patient diagnoses.
  • Built a DBT data pipeline, designed evaluation metrics, and created a dashboard to integrate and evaluate an off-the-shelf AI product for ICD10 code diagnosis predictions. The main complexity was designing the pipeline to handle the different comparisons over common data rows, which simplified the analysis and enabled fair comparisons with human-generated diagnoses.
  • Built a logistic regression model for high-cost risk stratification using claims and demographics data, resulting in a 15% improvement in recall over the rule-based methodology.
  • Planned the roadmap, pruned the backlog aggressively, and simplified the project success metric definition with clinical stakeholders to deliver a quick alpha version in a record time of two months.
patient ranking system, random forest models, DBT data pipeline, ICD10 code diagnosis, logistic regression model, machine learning

Senior Data Scientist & Scrum Team Lead

Nov 2021 - Jan 2024 · 2 yrs 2 mos

  • Built an entity recognition PoC pipeline with the AWS Comprehend Medical API to tag diseases, symptoms, and medications in physician notes, with the end goal of using the entities for downstream modeling tasks.
  • Reduced production pipeline failures by 20% by writing DBT data tests as workarounds for data quality challenges.
  • Designed and implemented an A/B test experiment to measure the effectiveness of algorithm-prioritized clinical review over the appointment/rule-based methodology. The 20% lift in new diagnoses in the highest category paved the way for release in all markets.
entity recognition, AWS Comprehend Medical, A/B test experiment, natural language processing, data engineering

SugarCRM (through Node.io acquisition)

Senior Data Scientist

Aug 2020 - Oct 2021 · 1 yr 2 mos · Cupertino, California, United States

  • Improved model prediction quality by 12%, measured via event conversion, by monitoring ML/AI models in production. Measured drift between training and production data distributions using metrics such as Jensen-Shannon divergence for categorical features and the Kolmogorov-Smirnov test for numerical features.
  • Improved end-user adoption of predictions by 15%, measured via increased user clicks, by implementing model-level and individual-prediction interpretability. Used Shapley values and clustering techniques to categorize features to facilitate interpretability. Leveraged this further to build weighted distance metrics that provided dynamic and interpretable similar entities to end users.
  • Increased the percentage of SaaS clients eligible for data science use cases by 12% by identifying common data patterns and issues across different clients using Python and SQL. Generalized data cleaning and labeling across heterogeneous client datasets, which enabled smarter feature engineering for our models.
model prediction quality, ML/AI models, data cleaning, machine learning, data analysis
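
For readers unfamiliar with the drift metrics named in the first bullet above, here is a minimal illustrative sketch. It is not the production implementation. It assumes drift is checked one feature at a time by comparing a training sample with a production sample. The function names `js_divergence` and `ks_statistic` and the sample data are hypothetical, and only the Python standard library is used.

```python
import bisect
import math
from collections import Counter


def js_divergence(train_values, prod_values):
    """Jensen-Shannon divergence (log base 2, so the range is 0 to 1)
    between the category frequencies of two samples."""
    categories = set(train_values) | set(prod_values)
    p_counts, q_counts = Counter(train_values), Counter(prod_values)
    p = {c: p_counts[c] / len(train_values) for c in categories}
    q = {c: q_counts[c] / len(prod_values) for c in categories}
    # Mixture distribution used as the reference for both KL terms.
    m = {c: 0.5 * (p[c] + q[c]) for c in categories}

    def kl(a, b):
        # Terms with a[c] == 0 contribute nothing to the sum.
        return sum(a[c] * math.log2(a[c] / b[c]) for c in categories if a[c] > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def ks_statistic(train_values, prod_values):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the two empirical cumulative distribution functions."""
    a, b = sorted(train_values), sorted(prod_values)
    points = sorted(set(a) | set(b))
    return max(
        abs(bisect.bisect_right(a, x) / len(a) - bisect.bisect_right(b, x) / len(b))
        for x in points
    )


# Hypothetical samples for one categorical and one numerical feature.
train_plan = ["free", "free", "pro", "pro"]
prod_plan = ["free", "pro", "pro", "pro"]
print(js_divergence(train_plan, prod_plan))

train_age = [21.0, 34.0, 45.0, 52.0]
prod_age = [48.0, 55.0, 61.0, 70.0]
print(ks_statistic(train_age, prod_age))
```

Both functions return 0 when the two samples have identical distributions and 1 when they do not overlap at all. Any alerting threshold in between would be a project-specific choice.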

Node.io

Data Scientist

Jul 2019 - Aug 2020 · 1 yr 1 mo · San Francisco Bay Area

  • Achieved stable model performance on small datasets by leveraging statistical techniques. The stability test helped qualify models that withstood small changes in data, which fostered confidence among pilot users of the software.
  • Assisted the sales team by rapidly executing customer PoCs for various use cases such as churn reduction, lead ranking, and free trial conversion prediction using multiple ML/AI models. Presented and interpreted model results to potential clients. Used learnings from the PoCs to build functionality into a generic ML platform for domain users.
  • Improved model performance by 5% through extensive error analysis to identify records where the model was making mistakes. Iteratively improved performance through smarter feature engineering and better metrics for data quality.
statistical techniques, customer PoCs, error analysis, data science, machine learning

UC Berkeley School of Information

Graduate Student of Data Science

May 2017 - May 2019 · 2 yrs

  • Machine Learning at Scale - Algorithm design in parallel computing (using Spark)
  • Natural Language Processing with Deep Learning (using TensorFlow)
  • Experiments and Causality - Statistical analysis with experimentation and design-based inference (using R)
  • Fundamentals of Data Engineering - Data storage and retrieval (SQL, Tableau)
  • Applied Machine Learning - Feature engineering and algorithms (scikit-learn libraries)
  • Statistics for Data Science - Descriptive and inferential statistics (using R)
  • Behind the Data: Humans and Values (legal, policy, and ethical implications of data)
  • Research Design and Application
  • Python for Data Science
machine learning, statistical analysis, data storage, data engineering

Silicon Tech Lab / UC Berkeley

Analytics Lead

Apr 2013 - Jan 2017 · 3 yrs 9 mos

  • Conducted requirements analysis for reports and dashboards. Worked extensively with clients to understand user requirements and developed reports for user demos. Participated in defining dimensions and facts for the star schema. Helped implement dashboards and migrate them to production.
requirements analysis, dashboard implementation, business analysis, data engineering

Infosys

Software Programmer

Jan 2007 - Jan 2012 · 5 yrs

  • Key contributor on a client-server development project, responsible for writing detailed specifications, developing test plans, and coding individual modules.
  • Designed for data model compatibility to support migration from the legacy system to the new system by analyzing the data schemas of both. Led a six-member onsite team for the design, development, testing, and implementation of project SYNCH, a project designed to synchronize data from DB2 on IBM mainframe and MS Access to Oracle on UNIX.
client server development, data model compatibility, software development, data engineering

Education

UC Berkeley School of Information

Master's degree — Data Science

Jan 2017 - Jan 2019

BIT Sindri

Bachelor’s Degree — Electrical Engineering
