Sunil Kumar

AI Researcher

Hyderabad, Telangana, India · 4 yrs 9 mos experience
AI Enabled · AI ML Practitioner

Key Highlights

  • Expertise in healthcare data science and engineering.
  • Developed high-accuracy NLP models generating significant revenue.
  • Led successful cloud migration projects enhancing data accessibility.
Stackforce AI infers this person is a Healthcare Data Scientist with strong expertise in NLP and cloud technologies.

Skills

Core Skills

Natural Language Processing (NLP) · Named Entity Recognition (NER) · Question Answering · Large Language Models (LLM) · Amazon Web Services (AWS) · Machine Learning · Data Quality Assurance · Data Engineering · Software Development

Other Skills

Text Classification · spaCy · Transformer Models · Feature Engineering · Python (Programming Language) · HL7 · MySQL · Apache Spark · DSA · BERT (Language Model) · Microsoft SQL Server · Scala · Snowflake · AWS Command Line Interface (CLI) · XGBoost

About

Data Scientist with two years of experience in the US healthcare sector, focused on drawing data-driven insights from complex datasets. My expertise combines data science and data engineering, with specialties in semantic mapping, cloud data migration, HL7 data parsing, and quality assurance. I engineered a semantic mapping solution that aligns medical field names with standardized clinical model columns using Sentence-BERT embeddings and cosine similarity, which improved data consistency and contextual mapping. I also led a legacy data migration project that moved systems to a hybrid cloud environment built on Snowflake and AWS S3. The shift improved data processing efficiency and accessibility, with Snowflake outperforming the traditional Spark SQL jobs. While my background is centered on healthcare, I am eager to take on new challenges in industries where my expertise can drive innovation and growth. If you'd like to discuss data science, data engineering, or new opportunities, let's connect.

Experience

Total Experience: 4 yrs 9 mos
Average Tenure: 1 yr 11 mos
Current Experience: 11 mos

Apple

Machine Learning Engineer - ICT3

Jul 2025 - Present · 11 mos · Hyderabad, Telangana, India · Hybrid

Hilabs

4 roles

Senior Data Scientist

Jan 2025 - Jun 2025 · 5 mos · Pune District, Maharashtra, India

Data Scientist II

Jan 2024 - Feb 2025 · 1 yr 1 mo · Pune District, Maharashtra, India

  • MediText Insight Engine: Developed a Bio Clinical BERT model integrated with a Conditional Random Field (CRF) layer for Named Entity Recognition, achieving F1 scores between 0.864 and 0.938. Engineered a multi-head classification model, yielding 99.9% accuracy in predicting contextual relevance and boosting operational productivity by 15%, generating over $2 million in annual revenue.
  • QA-Driven Clinical Code Extraction Suite: Formulated a question-answering transformer model that improved data accuracy by 18%, to 98%, in capturing complex medical regimens. This led to a 25% reduction in claim processing time and generated over $1.5 million in annual revenue for health insurance companies.
  • Clinical Note Intelligence (CNI): Spearheaded a summarization platform using state-of-the-art LLMs, achieving notable improvements in ROUGE and BLEU scores.
Natural Language Processing (NLP) · Named Entity Recognition (NER) · Text Classification · Question Answering · spaCy · Transformer Models · +1 more

Data Scientist

Promoted

Jun 2022 - Dec 2023 · 1 yr 6 mos · Pune District, Maharashtra, India

  • AI-Powered Clinical Data Mapping Solution
  • Developed a machine learning model to map non-standard clinical data, reducing manual mapping efforts. Achieved 81.2% accuracy with an ensemble of Random Forest and LightGBM models.
  • Used BERT, cosine similarity, and TF-IDF for feature extraction, raising model accuracy by 6.7%, to 81.2%. Evaluated with micro- and macro-averaged precision, recall, and F1-score.
  • Engineered a multi-label medical entity classifier using LSTM and GRU, achieving 99.5% precision. Validated with Jaccard similarity and a confusion matrix, ensuring robust clinical data mapping.
  • Delivered one-time savings of $1 million and recurring annual savings of $200,000 by automating the mapping of non-standard clinical files.
  • Legacy Data Migration & Cloud Transformation
  • Led legacy project migration to Snowflake for data reading and AWS S3 for writing, optimizing data accessibility.
  • Transformed SQL jobs to leverage Snowflake's efficiency, surpassing Spark SQL performance.
  • Ensured uninterrupted data processing in a hybrid cloud setup, merging Snowflake, AWS S3, and Hadoop capabilities.
  • HL7 Data Parsing and Quality Assurance
  • Orchestrated seamless transformation of diverse healthcare data from CSV to HL7 format, creating standardized schemas for clinical data entities.
  • Achieved a 10x improvement in parsing speed (400 HL7 messages/min) by optimizing data structures.
  • Executed stringent data quality checks to ensure high-quality data transmission to downstream processes.
Amazon Web Services (AWS) · Feature Engineering · Python (Programming Language) · HL7 · MySQL · Apache Spark · +10 more
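
The field-name mapping idea described in this role (scoring a non-standard field name against standardized columns by cosine similarity) can be sketched as follows. This is a minimal illustration only, not the production approach: it swaps BERT embeddings for TF-IDF over character trigrams so that it runs on the standard library alone, and the function names and example field names are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Split a field name into lowercase character n-grams."""
    padded = f" {text.lower().replace('_', ' ')} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) from a list of token lists."""
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    n_docs = len(docs)
    vectors = []
    for doc in docs:
        term_freq = Counter(doc)
        vectors.append({
            tok: count * (math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1)
            for tok, count in term_freq.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(tok, 0.0) for tok, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def map_fields(source_fields, standard_columns):
    """Map each source field to its most similar standard column."""
    docs = [char_ngrams(name) for name in source_fields + standard_columns]
    vectors = tfidf_vectors(docs)
    src_vecs = vectors[:len(source_fields)]
    std_vecs = vectors[len(source_fields):]
    mapping = {}
    for name, vec in zip(source_fields, src_vecs):
        scores = [cosine(vec, std) for std in std_vecs]
        best = max(range(len(scores)), key=scores.__getitem__)
        mapping[name] = (standard_columns[best], scores[best])
    return mapping


# Hypothetical field names, invented for illustration.
mapping = map_fields(
    ["pt_birth_dt", "diag_cd"],
    ["patient_birth_date", "diagnosis_code", "procedure_code"],
)
for source, (target, score) in mapping.items():
    print(f"{source} -> {target} ({score:.2f})")
```

In practice a similarity threshold would likely be needed so that low-scoring fields are routed to manual review rather than mapped automatically.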

Data Scientist

Jan 2022 - Feb 2022 · 1 mo · Pune District, Maharashtra, India

  • Spearheaded data engineering and processing within a complex healthcare dataset, implementing robust feature engineering techniques to transform and cleanse intricate patient data.
  • Conducted in-depth analysis of diverse coding systems (diagnosis code, procedure code), devising and applying tailored regular expressions. Successfully resolved data inconsistencies and improved data quality by deciphering and standardizing disparate coding patterns.
  • Executed comprehensive data profiling to identify and rectify anomalies, enhancing the dataset's reliability and usability for subsequent analysis and modeling purposes.
Python (Programming Language) · Regular Expressions · Machine Learning · SQL · Scala · Data Engineering
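
The regular-expression standardization of coding patterns mentioned above might look something like the sketch below. It assumes ICD-10-CM-style diagnosis codes (a letter, a digit, an alphanumeric character, then up to four more characters after an optional dot); the profile does not name the coding systems involved, and the function name is hypothetical.

```python
import re

# Assumed ICD-10-CM-style shape: 3-character category plus up to 4 detail characters.
DIAGNOSIS_CODE = re.compile(r"^([A-Z][0-9][0-9A-Z])\.?([0-9A-Z]{0,4})$")


def normalize_diagnosis_code(raw):
    """Return the code in dotted canonical form, or None if it does not fit the pattern."""
    # Strip whitespace and hyphens and upper-case before matching.
    cleaned = re.sub(r"[\s\-]", "", raw.strip().upper())
    match = DIAGNOSIS_CODE.match(cleaned)
    if not match:
        return None
    category, detail = match.groups()
    return f"{category}.{detail}" if detail else category


for raw in ["e11 9", "E11.9", " j45", "12345"]:
    print(repr(raw), "->", normalize_diagnosis_code(raw))
```

Returning None for values that do not match keeps invalid codes visible for data profiling instead of silently coercing them.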

Detect Technologies

Application Engineer

May 2021 - Jul 2021 · 2 mos · Chennai, Tamil Nadu, India

  • File Monitoring Scheduler
  • Orchestrated the development of a file monitoring scheduler, conducting regular checks on specified file extensions at 30-minute intervals. Implemented fail notifications to a configurable user list, ensuring timely awareness and action.
  • Video Player Enhancement
  • Elevated the functionality of the video player by integrating features for a drone-captured video clip. Leveraged Python's PyQt5 library to create a dynamic view showcasing real-time drone height and direction data. Enhanced video snapshots to include essential details: drone's flying height, company branding, and a directional compass reflecting the drone's movement.
  • Applied data from a provided CSV file containing drone flight specifics (height, direction, timestamps) to enrich the video player's snapshot capabilities, augmenting user experience and adding valuable contextual information.
  • Collaborated with a multidisciplinary team to seamlessly integrate enhancements into the existing video player framework, demonstrating proficiency in both software development and UI/UX design.
Python (Programming Language) · PyQt5 · Object-Oriented Programming (OOP) · Simple Mail Transfer Protocol (SMTP) · Software Development

Shoocal

Data Scientist

Mar 2021 - May 2021 · 2 mos · Jaipur, Rajasthan, India

  • Leveraged data analysis techniques to optimize sales strategies within a specialized restaurant and hotel service provider.
  • Developed and implemented recommendation algorithms, including the Apriori algorithm and market basket analysis models.
  • Utilized order and menu data to drive insights into user preferences, enhancing the app's food recommendation system.
  • Collaborated cross-functionally to integrate algorithmic solutions, resulting in a measurable increase in sales and customer engagement.
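
As a rough illustration of the Apriori-based market basket analysis named above, the sketch below mines frequent itemsets level by level and derives association rules from them. The order data, thresholds, and function names are invented for the example and are not taken from the actual project.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(itemset <= basket for basket in baskets) / n

    current = {frozenset([item]) for basket in baskets for item in basket}
    frequent = {}
    k = 1
    while current:
        level = {c: support(c) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) rules from frequent itemsets."""
    rules = []
    for itemset, sup in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, size)):
                confidence = sup / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((set(antecedent), set(itemset - antecedent), confidence))
    return rules


# Hypothetical restaurant orders, invented for illustration.
orders = [
    ["burger", "fries", "cola"],
    ["burger", "fries"],
    ["burger", "cola"],
    ["pizza", "cola"],
    ["burger", "fries", "cola"],
]
frequent = apriori(orders, min_support=0.4)
for antecedent, consequent, confidence in association_rules(frequent, min_confidence=0.9):
    print(sorted(antecedent), "->", sorted(consequent), f"confidence={confidence:.2f}")
```

A rule such as "fries implies burger" with high confidence is the kind of signal a food recommendation feature could surface when a user adds an item to an order.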

Shaastra, IIT Madras

Coordinator

May 2019 - Apr 2020 · 11 mos · Greater Chennai Area

Education

Indian Institute of Technology, Madras

Bachelor of Technology - BTech — Civil Engineering

Jan 2018 - Jan 2022
