Neeraj Mishra

Co-Founder

Noida, Uttar Pradesh, India · 11 yrs 6 mos experience

Highly Stable

Key Highlights

Led development of real-time ML solutions impacting millions.
Expert in MLOps and machine learning algorithms.
Designed scalable data applications for diverse industries.

Stackforce AI infers this person is a Machine Learning Engineer with expertise in SaaS and E-commerce solutions.

Skills

Core Skills

Machine Learning · MLOps · Data Cataloging · Data Science · Software Development · Data Analysis

Other Skills

AWS Sagemaker · Locust · ZeroMQ · Lit-serve · Docker · LMDB · PySpark · Airflow · Databricks · ReDash · FAISS · DNN · RAG architecture · Spark · Telemetry

About

Building intelligent data applications that are robust, reliable and maintainable. Led a team to create ML-based solutions, deployed in a distributed environment, that generate scores for millions of customers by crunching billions of transactions. Built a real-time inference engine capable of serving 50 requests per second, involving components like a vector DB and a memory-based cache, which powers solutions like Image Search. Have ~12 years of experience creating bespoke solutions by collaborating with multiple stakeholders to understand business requirements, using skills developed in statistics, neural networks, optimisation and probabilistic graphical models. Deep understanding of machine learning algorithms and standard software practices for efficient MLOps.

Below are my technical skills.

• Coding: Python, SOLID, Data Structures, Algorithms.
• Science: Recommendation Systems, Supervised and Unsupervised Learning, Representation Learning, Deep Learning, Reinforcement Learning, Natural Language Processing, Bayesian Learning, Probabilistic Graphical Models, LLMs, RAG.
• Optimisation: First Order Methods, Second Order Methods.
• Statistics: Monte Carlo methods, A/B testing.
• Libraries / Frameworks: PySpark, Pandas, Numpy.

Experience

Total Experience: 11 yrs 6 mos

Average Tenure: 1 yr 11 mos

Current Experience

Nykaa

Principal MLE

Jun 2023 – Jul 2025 · 2 yrs 1 mo · Gurugram, Haryana, India · On-site

Real Time Inference Engine: Created a platform on top of AWS Sagemaker to easily deploy models for real-time inference. Custom modules were built for performance testing (Locust), logging (ZeroMQ), real-time batching (Lit-serve), containerisation (Docker), caching (LMDB), etc. Capabilities like Image Search and Vector Search (FAISS) are powered by this platform, with a yearly impact of more than ₹100 million.
MLOps: Evangelised best practices among data scientists. Developed capabilities to better monitor Airflow pipelines. Led activities like the Airflow migration (engaging with DevOps) and PySpark optimisation (engaging within the team using Databricks Notebooks), which brought cost down from ~15000 DBUs to less than 2000 DBUs. Ran sessions to share developments within the team.
ML Platform: Collaborated with the Data Science and Analytics teams to build a robust, reliable and maintainable feature store platform. An in-house Telemetry system was developed to monitor pipeline errors and feature drift. It was built on top of Databricks, using PySpark for feature calculations, Airflow DAGs (MWAA) for scheduling and ReDash for dashboards. Users were given the option to add alerts.
Catalog Enrichment: Models to predict fashion apparel product attributes and generate product descriptions. A DNN-based model (FashionCLIP) was trained to identify product attributes like pattern, occasion, color, etc. A RAG architecture was used to generate product descriptions from an LLM. This led to a 30% improvement in turnaround time (TAT) for a product to go live on the website.

AWS Sagemaker · Locust · ZeroMQ · Lit-serve · Docker · LMDB · +6

Rakuten

Staff Data Scientist

Oct 2022 – Jun 2023 · 8 mos · Bengaluru, Karnataka, India · Hybrid

Designed a platform that helps users easily identify the tables and columns they need among those storing information for Rakuten’s more than 80 businesses. It addresses data cataloging problems.

Data Cataloging

Dunnhumby

Lead Data Scientist

Feb 2019 – Oct 2022 · 3 yrs 8 mos · Gurgaon, Haryana, India

Led a team across multiple projects requiring diverse skills, ranging from identifying custom science algorithms to optimising a Spark solution to designing the architecture for a science platform. Our work was to create a personalised shopping experience for millions of retail customers across the globe. Below are the major projects.
Nucleus: Designed the architecture of a central repository for data scientists. It allowed teams to implement different stages of the Machine Learning pipeline with standard APIs in their projects. Led the team to ensure CI was integrated and used Sphinx to create documentation. This ensured better collaboration, faster time to deployment and maintainable code.
Telemetry: Platform for model monitoring in production. Used a combination of parametric and non-parametric statistics like PSI, Chebyshev's inequality, KS scores, etc. to identify thresholds for data, important features and model performance. Triggers were activated when incoming samples breached a threshold. A workflow was created for trigger life cycle management and logging of root cause analysis.
Brand Detection: Using a product description of 6-7 words on average, identify the product's hierarchy and brand. Tweaked Naive Bayes and Jaccard Similarity to create scores for n-grams. Weights were learnt using cross entropy loss and Newton’s method.
New Product Experimenter: Recommendation engine to identify customers who have a higher propensity to purchase newly launched products. Billions of records were processed with PySpark to estimate purchase likelihood for millions of customers. The science was powered by a combination of NLP and Random Forest. A/B testing showed that this engine was approximately two times more efficient than earlier targeting methods.

Machine Learning · Spark · Telemetry · A/B testing · Data Science

Delhivery

Data Scientist

Jun 2017 – Dec 2018 · 1 yr 6 mos · Gurugram, Haryana, India

Return to Origin (RTO): A model to predict whether a customer will return a product. Used a Spark cluster (AWS EMR) to process data and the Spark-ML library to build and analyse models like Random Forest, GBT, etc. The model was then deployed into production after converting it to PMML to make it platform independent.
Dimension Prediction: Using a product's description and price, predict its volume. To extract features from the text data, different techniques were used, like n-grams, Tf-Idf, multilayer LSTMs & GRUs and an auto-encoder architecture. These features were then used along with others to make the final prediction. Different regression models were analysed, like Random Forest, SVR, a feed forward network for regression, etc. Tensorflow and GPUs (AWS, P2 series) were used to build the RNN models.

Spark · PMML · Tensorflow · AWS EMR · Machine Learning

Mista bazaar

CTO and Co-founder

Jun 2015 – Apr 2017 · 1 yr 10 mos

Designed & developed an e-commerce website and native Android apps for customers & the logistics team. The backend was built from scratch using the Turbogears web framework (REST APIs). Postgres was used as the database, with SQLAlchemy as a bridge between the backend and the DB. The frontend was built using libraries like Bootstrap, Jquery and AngularJs 2, following material design concepts. Used the ELK stack for making dashboards. Simple analytics were embedded into the website and apps to observe customer behaviour, drive sales and optimise the network.

Turbogears · Postgres · SQLAlchemy · Bootstrap · Jquery · AngularJs 2 · +1

Telecom Regulatory Authority of India (TRAI)

Research Associate

Aug 2013 – May 2015 · 1 yr 9 mos · New Delhi Area, India

Analysis of UCC complaints: TRAI gets a huge number of complaints from telecom users across India regarding unwanted SMS sent to them. The task was to design an algorithm to identify the epicentres from which spamming is done. A k-shingling clustering model was built from scratch in Python to group similar SMS together. Developed a web portal (Python back-end, Postgres DB) to analyse and update regulatory reports. This portal was used by TRAI, TSPs, banks and insurance companies.

Python · Postgres · Data Analysis

Chubu University

Intern

May 2012 – Jul 2012 · 2 mos · Nagoya, Aichi, Japan

Segmented the foreground in video taken from a moving camera. Proposed an algorithm that modifies the codebook method for background segmentation. A research paper on this work was published at the PReMI'13 international conference.