Chirag S. — AI Researcher

I am actively looking for full-time opportunities in Staff/Lead Data Scientist roles in India. I am a seasoned Data Science & Analytics professional with 9 years of full-time work experience in data science & analytics, statistics, and machine learning. With valuable expertise across 5 industries (catastrophe modeling, insurance, travel, healthcare, and semiconductors), my potent mix of technical skills and analytical acumen enables me to transform data into actionable insights. I've built data pipelines processing 40 billion rows of data and deployed ML models scoring 132 million records at Walgreens Boots Alliance. At Micron, I deployed priority sorting algorithms, delivered production-ready code under tight deadlines, and built and deployed powerful Streamlit apps. I hold a Master of Science degree in Operations Research from Northeastern University, USA, and I am currently pursuing a second Master of Science degree in Computational Data Analytics from Georgia Tech, USA, a top-10 globally ranked school in Data Science (QS World Rankings 2023), graduating in August 2025.

SKILLS
☑ MLOps - Model Deployment with Azure and Streamlit
☑ Deep Learning - (CNNs, LSTMs, RNNs, BERT, VAE, GAN, Transformers)
☑ Reinforcement Learning - (DQN, MDP, SARSA, Actor-Critic)
☑ Big Data - (SQLite, Spark, Scala, AWS Athena, Docker, GCP, Microsoft Azure Databricks)
☑ Python, PyTorch, PySpark, SQL, R, Tableau, PowerBI, HTML, CSS, D3.js, JavaScript
☑ Data Visualization
☑ Statistics
☑ NLP
☑ A/B Testing, Hypothesis Testing
☑ Complex Problem-Solving
☑ Cross-Functional collaboration

SELECTED CAREER HIGHLIGHTS
✔ Built and deployed a Credit Card Propensity Tool prediction framework at scale for Walgreens Front of Store Customers in Python in Microsoft Azure Databricks, using an XGBoost multi-class classification model trained on a dataset of 200,000 customers, with an accuracy of 76% across 4 classes, a precision of 85.7%, and a recall of 91.2%. Wrote an efficient PySpark data processing pipeline, scored 132 million customers using the trained XGBoost model, and then deployed the model in Azure Databricks. The propensity scores from the deployed model were used to generate new credit card offers for Walgreens customers, who are using the credit card to make purchases at a regular cadence. It has streamlined the credit card acceptance process, making it more efficient, and resulted in a 35% increase in Walgreens Credit Card transactions.

Stackforce AI infers this person is a Data Science expert with extensive experience in healthcare and analytics.

Location: Mumbai, Maharashtra, India

Experience: 7 yrs 10 mos

Skills

Machine Learning
Data Science
Data Visualization
Predictive Modeling
Predictive Analytics

Career Highlights

9 years of experience in data science and analytics.
Built data pipelines processing 40 billion rows of data.
Deployed ML models scoring 132 million records at Walgreens.

Work Experience

Micron Technology

Staff Engineer, Data Scientist (9 mos)

Walgreens Boots Alliance

Senior Data Scientist (1 yr 8 mos)

Holland America Line

Senior Marketing Analytics Analyst (6 mos)

Amwins Group

Data Scientist (5 mos)

Aon

US Reinsurance Analytics Senior Analyst (6 mos)

Analyst - Analytics (4 yrs)

Education

Master of Science - MS at Georgia Institute of Technology

Master of Science - MS at Northeastern University

Summer Courses at Stanford University

Bachelor of Engineering - BE at Manipal Institute of Technology

Contact

chirag.subramanian@takeda.com LinkedIn

Skills

Core Skills

Machine Learning
Data Science
Data Visualization
Predictive Modeling
Predictive Analytics

Other Skills

Algorithms
Analytics
Arena Simulation Software
Artificial Intelligence (AI)
AutoCAD
Azure Databricks
Big Data Analytics
C
C++
Data Analysis
Data Analytics
Data Mining
Databases
Deep Reinforcement Learning
Engineering

Experience

Total Experience: 7 yrs 10 mos

Average Tenure: 1 yr 6 mos

Current Experience

Micron Technology

Staff Engineer, Data Scientist

May 2024 – Feb 2025 · 9 mos · Hyderabad, Telangana, India · Hybrid

I was recruited to the organization as a Staff Engineer, Data Scientist in the Technology & Product Development Group (TPG) of Micron in Hyderabad, India.
✔ Developed an efficient priority ordering algorithm using machine learning and vectorization in Python which made the existing process 100% faster in terms of execution time. Significantly improved space and time complexity of existing Python code.
✔ Implemented logistic regression and deep neural network classifier models to come up with optimized weights for the priority ordering algorithm.
✔ Coordinated with SMEs and stakeholders to come up with a deployable product which was put into production.
✔ Created Gantt Charts and developed project plan for key departmental projects.

Machine Learning · Data Science · Data Mining · Statistics · Python (Programming Language)

Walgreens Boots Alliance

Senior Data Scientist

May 2022 – Jan 2024 · 1 yr 8 mos · Chicago, Illinois, United States · Hybrid

I was recruited to the Fortune 16 Healthcare organization to contribute towards statistical machine learning, propensity modeling, statistical modeling, data wrangling and reporting to drive marketing strategies for Walgreens Boots Alliance. I worked on cross-portfolio and retail insights projects and I was part of the Customer Science team at Walgreens Boots Alliance.
TECH STACK - PySpark, Python, R, SQL, PowerBI, Microsoft Azure Databricks
SELECTED ACHIEVEMENTS:
✔ Built and deployed a Credit Card Propensity Tool prediction framework (propensity model) using XGBoost (multi-class classification) for Walgreens Front of Store Customers in Python. The model reached 76% accuracy across 4 classes, with 85.7% precision and 91.2% recall, and it generated an increase of 35% in Walgreens Credit Card transactions.
✔ Built a Customer Behavioral Progression propensity model for Walgreens Front of Store in Python using XGBoost for multi-class classification. Used insights from the last decision tree in the XGBoost ensemble to make recommendations to the business. This was a highly ambiguous project with changing requirements, so I explored classification, regression, and constrained-optimization approaches to keep pace with them.
✔ Wrote complex and efficient SQL queries processing > 40B records to successfully complete ad hoc requests and fire drill requests from the business, within short timeframes.
✔ Performed K-means clustering in PySpark to efficiently segment a dataset of customers into groups based on customer behavior metrics.
✔ Reviewed SQL, Python, and PySpark code written by junior associates and gave them best-practice recommendations.
✔ Mentored and guided junior associates. Managed external consultants.
✔ Acted as the lead interviewer for Associate Data Scientist roles in the company.

Statistical Data Analysis · Data Mining · Predictive Modeling · Databases · Operations Research · Machine Learning (+11 more)

Holland America Line

Senior Marketing Analytics Analyst

Oct 2021 – Apr 2022 · 6 mos · Chicago, Illinois, United States

I was recruited to the organization to contribute towards machine learning, A/B testing, customer segmentation analytics, statistical modeling, data wrangling and reporting to drive marketing strategies for Holland America Line.
SELECTED ACHIEVEMENTS:
✔ Created interactive dashboards and reports using Tableau.
✔ Automated Excel reports in R.
✔ Applied the k-means algorithm in R for optimal customer segmentation.
✔ Wrangled data in R using the dplyr, data.table, tidyr, tidyverse, and lubridate packages.
✔ Wrote complex SQL queries fetching data with >8M rows using TOAD for Oracle.
✔ Manipulated data using Pivot Tables in Microsoft Excel.

Predictive Modeling · Predictive Analytics · Data Science · Data Visualization

Amwins Group

Data Scientist

Mar 2021 – Aug 2021 · 5 mos · Chicago, Illinois, United States

I was recruited to the organization as a Data Scientist at Amwins Specialty Casualty Solutions (ASCS).
SELECTED ACHIEVEMENTS:
✔ Fit statistical distributions to insurance claims data and ran goodness-of-fit tests on the data. R packages used – dplyr, data.table, fitdistrplus
✔ Performed principal component analysis (PCA) on a matrix of numeric TF-IDF vectors calculated from raw text data, reducing 3214 columns to 100 principal components. R packages used - tidytext, dplyr, data.table, prcomp (function in stats package)
✔ Built an XGBoost multi-class classification model on insurance claims data after PCA, using text mining techniques in R, and achieved 91% accuracy on test data after 10-fold cross-validation. R packages used - tidytext, dplyr, data.table, xgboost
✔ Queried claims data set of 300,000+ rows using MySQL Workbench and fed query results directly into R using the package RMySQL for further analysis.

Predictive Modeling · Predictive Analytics · Data Science · Data Visualization

Aon

2 roles

US Reinsurance Analytics Senior Analyst

Promoted

Sep 2020 – Mar 2021 · 6 mos · Chicago, Illinois, United States

I was brought into the company to leverage Multiple Linear Regression, Boosted Decision Tree, K-Nearest Neighbors, K-Means, Feed Forward Neural Networks, and Multi-Layer Neural Networks in R to provide predictive modeling and machine learning. In this role, I trained and mentored higher management on using R programming.
SELECTED ACHIEVEMENTS:
✔ Wrote complex code in R to automate data preprocessing, data cleaning, predictive modeling and data visualization. R packages used – dplyr, data.table.
✔ Developed predictive models in R on datasets with more than 17 million rows.
✔ Machine Learning algorithms developed in R – generalized additive models, generalized linear models, neural networks and regression spline. R packages used – gam, glm, nnet, neuralnet.
✔ Performed statistical goodness-of-fit t-tests on claims data after aggregating the data to ZIP-code resolution using summarize() and group_by(), comparing the means of the modeled data with those of the observed claims data at a 5% significance level.
✔ Developed more than 1000 graphs and bar charts for different construction types and compared different sets of data after data cleaning and data wrangling. The visualizations I developed in R were important in assessing the accuracy and quality of Impact Forecasting’s catastrophe model for the Florida Loss Commission. R packages used – ggplot2

Predictive Modeling · Predictive Analytics · Data Science · Data Visualization

Analyst - Analytics

Sep 2016 – Sep 2020 · 4 yrs · Chicago, Illinois, United States

SELECTED ACHIEVEMENTS:
✔ Spatially simulated the largest dataset in the history of Impact Forecasting’s SCS model - 609 million severe convective storm events, and improved model accuracy by developing a revolutionary 5 km by 5 km grid in R. Executed loss calculation for 10 US states in the tornado model portion and wrote code in R for data wrangling, spatial simulation, data visualization, probabilistic modeling, looping and error handling.
✔ Cleaned, sorted, and preprocessed hurricane data, and performed linear interpolation. Wrote code in R using the k-nearest neighbors algorithm to generate simulations of hurricane track and central pressure, which compared well with historical measurements. Wrote code in R for linear interpolation, data cleaning, predictive modeling, simulation and data visualization.
✔ Developed robust predictive models for thunderstorm magnitude by state as a function of path length, path width, latitude, longitude and other parameters using Boosted Decision Tree, Feed Forward Neural Networks, and Multi-Layer Neural Networks in R.
✔ Developed robust predictive models for radius to maximum wind speed (Rmax), maximum wind speed (Vmax), and central pressure (Pc) for hurricane events in the North Atlantic Ocean Basin using Multiple Linear Regression. Developed simulations of Rmax against Delta P which compared well with historical measurements of Rmax.
✔ Collected and analyzed publicly available data from the US Census Bureau and other sources to develop the US insurance industry exposure database; queried data with >1M rows using Microsoft SQL Server.

Statistical Data Analysis · Predictive Modeling · Predictive Analytics · Data Science · Data Analytics