Shadab Arif A.

Lead ML Engineer

United States · 7 yrs 10 mos experience
AI Enabled · AI ML Practitioner

Key Highlights

  • Led development of AI-powered solutions at Turing.
  • Achieved significant improvements in loan approval rates at Paytm.
  • Expert in building scalable AI systems for diverse industries.
Stackforce AI infers this person is a Fintech and AI specialist with expertise in machine learning and data-driven solutions.

Skills

Core Skills

AI Development · Conversational AI · Data Management · Content Creation · Machine Learning · Data Analysis · Recommendation Systems · Credit Risk Modeling · Data Pipeline Engineering · Computer Vision · Digital Transformation · Deep Learning

Other Skills

LangChain · Retrieval-Augmented Generation (RAG) · LangGraph · Conversational Memory · Langfuse · Document Ingestion · GPT-4 · Video Generation · API Development · FastAPI · Azure · PostgreSQL · User Profiling · Amazon Elastic MapReduce (EMR) · Python (Programming Language)

About

As a Senior Data Scientist with over three years of experience in AI and machine learning, I specialize in building and deploying advanced systems that enhance productivity and decision-making. Currently at Turing, I lead the development of AI-powered solutions, including a cutting-edge RAG-based email assistant for Outlook and a multi-agent travel recommendation system. These initiatives have significantly reduced manual effort, improved response accuracy, and elevated enterprise efficiency through technologies such as LangChain, Azure Kubernetes, and FastAPI.

My career has been defined by impactful contributions to AI solutions in leading organizations like Paytm, where I developed credit risk models and merchant transaction strategies that improved loan approvals and reduced bad rates. Adept at leveraging tools like XGBoost and designing scalable data pipelines, I am driven to create transparent, high-performing models that solve complex business challenges. My mission is to empower teams and organizations by transforming innovative ideas into actionable, data-driven solutions.

Experience

Total Experience: 7 yrs 10 mos
Average Tenure: 1 yr 3 mos
Current Experience: --

Rvin

Senior AI Engineer

Sep 2025 – Mar 2026 · 6 mos · Remote

  • Led development of a multi-turn Arabic support agent integrated with WhatsApp and Instagram. The agent ingests PDFs, documents, and Excel files and uses conversational memory to answer dialect-aware queries about product information and policies (pricing, availability, delivery, refunds), with Langfuse-based prompt observability and production monitoring.
  • Developed an internal enterprise RAG platform ingesting PDFs, Word documents, and URLs to answer employee queries across HR policies, operational manuals, and compliance knowledge.
  • Built an automated LLM-driven platform converting policy documents into narrated training videos using GPT-4 for scene, quiz, and multilingual narration generation, with PPT slides and Edge TTS voiceover.
LangChain · Retrieval-Augmented Generation (RAG) · LangGraph · AI Development · Conversational AI

Digital exchange

Machine Learning Freelancer

Jul 2025 – Aug 2025 · 1 mo · Remote

  • Built an explainable ML-based offer personalization engine that ranks promotions using user transaction behavior, location signals, merchant affinity, category preference, cashback sensitivity, and temporal patterns.
  • Designed production-ready pipelines and APIs, including cold-start handling, confidence scoring, and match rationale, with seamless backend integration.
  • Delivered a maintainable, model-agnostic codebase (training, inference, evaluation) handed off via GitHub for full internal ownership.

Turing

Machine Learning Engineer

Jun 2024 – Jun 2025 · 1 yr · United States · Remote

  • Selected as one of the top 15 LLM engineers to research and build POCs; my project was ranked among the top 3, as listed on the company portal.
  • APOLLO GLOBAL MANAGEMENT
  • AI-Powered Email Assistant Chatbot for Outlook (Agentic RAG & Conversational AI)
  • Built an advanced RAG-based Outlook assistant leveraging LangChain, LangGraph, FastAPI, and Azure services
  • Used GPT and document retrieval with cosine similarity to generate accurate, context-aware responses
  • Deployed using FastAPI on Azure Kubernetes with Cosmos DB for metadata storage
  • Impact: Achieved ~3 s latency, reduced manual drafting effort by 90%, and boosted enterprise productivity
  • Travel Gear Recommendation Agent (Multi-Agent System + RAG)
  • Designed a multi-agent RAG system using Gemini for personalized travel product recommendations
  • Implemented PostgreSQL-backed conversation memory and user profiling
  • Impact: Improved recommendation relevance by 40% and reduced development time by
  • Domain-Specific Model Optimization
  • Fine-tuned LLMs using structured instruction templates in medical, legal, and financial domains
  • Impact: Boosted accuracy and compliance in sensitive, high-stakes environments
  • GOOGLE (AI Research - LLM)
  • LLM Training and Fine-Tuning (SFT + RLHF)
  • Curated high-quality SFT datasets for Google Gemini RLHF pipeline
  • Designed stress test prompts in domains like math, physics, and logical reasoning
  • Impact: Achieved 78% response accuracy via structured fine-tuning and prompt engineering
  • RLHF – Reinforcement Learning from Human Feedback
  • Designed reward models, curated preference datasets, and trained models for safer, stylistically aligned LLM behavior
  • Impact: Improved model alignment with user expectations and real-world applicability
  • LLM Output Evaluation and Quality Assurance
  • Assessed outputs for correctness, coherence, tone, and factuality across varied domains
  • Ensured alignment with user intent and safety protocols

Paytm

Senior Data Scientist

Nov 2022 – Jun 2024 · 1 yr 7 mos

  • GenAI RAG-based Customer Support Chatbot
  • Built and deployed a LangChain + GPT chatbot for policy, EMI, and payment-related queries.
  • Integrated with vector DB for fast document retrieval and multi-turn conversation support.
  • Deployed via FastAPI on AKS with Cosmos DB for history storage and monitoring via Prometheus + Grafana.
  • Achieved 30% improvement in response accuracy through retrieval and prompt optimization.
  • Merchant Transaction Drop Model
  • Developed a gross merchant transaction drop model using XGBoost, achieving 74% AUC in the out-of-time sample.
  • Credit Risk Scorecards & Loan Underwriting
  • Built new credit risk scorecards for top-up and renewal loans, leveraging stacked models to achieve 75% AUC with improved stability.
  • Boosted model performance by 5% by integrating third-party data such as location-based and SMS-derived features.
  • Designed a rejection strategy that increased loan approvals by 20% while reducing bad rates by 15%.
  • Large-Scale Feature Engineering & Data Pipelines
  • Optimized the merchant transaction pipeline, processing over 2000 features from more than 20 billion records to enhance efficiency and scalability.
  • Engineered over 3000 features from repayment behavior, location data, and app usage, using CSI, IV, and correlation metrics for effective feature selection.
  • Model Deployment & Monitoring
  • Deployed model pipelines using PySpark on Airflow, reducing execution time by 30%.
  • Built a comprehensive SQL dashboard for real-time model monitoring and validation, reducing manual tracking effort by 25%.
  • Designed model monitoring pipelines for three models, tracking PSI, AUC, and bad rates, with results stored in AWS S3 using PySpark.
  • Key Skills
  • Credit Risk Modeling and Scorecards
  • XGBoost, Stacked Models, Feature Engineering
  • PySpark, Airflow, SQL, AWS S3
  • Model Monitoring (PSI, AUC, Bad Rates)
  • Loan Default Prediction and Risk Strategy
  • Large-Scale Data Processing (20B+ Records)
Amazon Elastic MapReduce (EMR) · Python (Programming Language) · Artificial Intelligence (AI) · Generative AI · Back-End Web Development · AWS SageMaker · +11

Protium

2 roles

Senior Data Scientist

Promoted

Mar 2022 – Oct 2022 · 7 mos

  • AI & Machine Learning for Credit Risk & Lending
  • Fine-tuned a BERT model to detect EMI bounce occurrences, improving loan processing efficiency by 30%.
  • Developed a loan amount estimation model with 15% MAPE, boosting prediction accuracy.
  • Built credit default prediction models, achieving 76% AUC and increasing productivity by 25%.
  • Designed risk models for loan approvals and collections, improving approvals by 10% and reducing defaults by 3%.
  • Deployed models using FastAPI and optimized collection strategies for better performance.
  • Model Explainability & Deployment
  • Developed explainability frameworks using Shapley values and ALE plots to ensure model transparency.
  • Deployed models via FastAPI, efficiently handling over 10,000 daily requests.
  • Built loan approval models using bureau and banking data, enabling real-time scoring and decile-based segmentation.
  • ETL & Feature Engineering
  • Built ETL pipelines using Spark, Python, and AWS, generating over 10,000 features and reducing production turnaround time by 30%.
  • Engineered more than 3,000 features from bureau, GST, and banking data, automating daily data pipelines.
  • Automation & Data-Driven Insights
  • Automated credit underwriting processes, reducing review time by 200 minutes across three business verticals.
  • Conducted bureau scrub analysis for over 185,000 customers, optimizing sales office locations and loan amounts.
  • Designed a SQL-based data model in dbt, reducing data retrieval time from days to hours.
  • Key Skills
  • Machine Learning (XGBoost, LightGBM, BERT)
  • Credit Risk Modeling and Loan Default Prediction
  • Model Deployment (FastAPI, Docker, CI/CD)
  • Explainability (Shapley, ALE, LIME)
  • ETL and Feature Engineering (Spark, AWS, SQL, dbt)
  • Real-Time Scoring and Decile Segmentation
PyTorch · Python (Programming Language) · Kibana · PostgreSQL · Back-End Web Development · Machine Learning · +12

Data Scientist

Jan 2020 – Feb 2022 · 2 yrs 1 mo

  • Credit Risk & Consumer Loan Modeling
  • Developed a probability of default model for consumer loans using XGBoost, improving risk prediction accuracy.
  • Created a business rule engine for consumer loans to automate decision-making and enhance loan processing efficiency.
  • Data Pipeline Engineering
  • Built a data pipeline for parsing GST JSON into 10 structured tables, enabling comprehensive financial analysis.
  • Engineered a pipeline to parse bureau XML into 4 tables, improving the processing of credit history data.
  • Developed a system to extract bank statement data into 3 structured tables, allowing detailed transaction analysis for consumer loan underwriting.
  • Data Enrichment & Feature Creation
  • Designed an address matching algorithm for KYC validation, improving data verification accuracy.
  • Built a photo matching algorithm using a three-layer Convolutional Neural Network (CNN) to compare Aadhaar card photos with live photos, enhancing identity verification.
  • Created a wide range of features from bureau data, GST data, ITR data, and bank statements to enrich datasets and improve model training outcomes.
  • Expert Scorecards & Model Deployment
  • Developed expert scorecards using GST, banking, and financial data to enhance creditworthiness assessment.
  • Deployed machine learning models using REST APIs via FastAPI and Flask for seamless integration into production systems.
  • Version Control & CI/CD
  • Managed version control and continuous integration through GitLab, streamlining the model development and deployment process.
  • Key Skills
  • Credit Risk Modeling (Probability of Default, Risk Prediction, XGBoost)
  • Business Rule Engine Development (Policy Implementation)
  • Data Pipeline Engineering (ETL for GST, Bureau, Bank Statements)
  • Feature Engineering (Bureau Data, Financial Data)
  • Model Deployment (REST API, FastAPI)
  • CI/CD and Version Control (GitLab, Deployment Automation)
PyTorch · Supervised Learning · Kibana · Machine Learning · Data Visualization · Natural Language Processing (NLP) · +14

Larsen & Toubro

Data Analyst

Jul 2019 – Dec 2019 · 5 mos · Kolkata, West Bengal, India

  • Digital Solutions for Construction Industry 🏗️📊
  • Designed a digital solution for the construction industry 🏢, enabling daily progress reporting 📅, real-time data visualization 📈, and on-site work cost analysis 💰, reducing manual effort by 20% ⏳.
  • Defect Detection with Deep Learning 🤖🏢
  • Introduced a deep learning model 💻 using CNN 🧠 for the construction industry to automate defect detection 🛠️ in building inspections 🏚️. Achieved an AUC of 0.78 🎯, accurately identifying and classifying defects 🔍🔧.
  • Key Skills 🛠️🚀
  • ✅ Digital Transformation & Automation (Process Optimization, Cost Analysis, Reporting) ⚙️💰📊
  • ✅ Real-time Data Visualization (Dashboards, Monitoring, Analytics) 📈📊👀
  • ✅ Deep Learning for Defect Detection (CNN, Image Classification, Inspection Automation) 🤖🛠️🏚️
  • ✅ Computer Vision (Feature Extraction, Object Detection, Quality Control) 🏗️🔍📸
  • ✅ AI-driven Construction Analytics (Predictive Maintenance, Anomaly Detection) 🏢📉🔧
  • ✅ Data Engineering (ETL Pipelines, Structured Data Processing for Construction) 🔄💾🏗️
Dashboards · Python (Programming Language) · Machine Learning · Data Visualization · Computer Vision · Tableau · +3

Self-employed

Freelance Tutor – Python & Data Analysis

Sep 2015 – May 2017 · 1 yr 8 mos · Remote

  • 🎯 Provided one-on-one and group training in Python, data analysis, and machine learning for students and professionals.
  • 📚 Designed and delivered courses on pandas, NumPy, Matplotlib, Seaborn, and Scikit-learn, focusing on real-world applications.
  • 🧠 Mentored learners in data preprocessing, visualization, and statistical analysis, ensuring practical understanding.
  • 🛠️ Conducted hands-on workshops on EDA, feature engineering, and ML models, building a strong foundation for aspiring data scientists.

Education

Indian Institute of Technology, Kharagpur

Master's degree — RCGSIDM

Jan 2017 – Jan 2019

Haldia Institute of Technology

Bachelor of Technology - BTech — Mechanical Engineering

Aug 2011 – Aug 2015
