
Shaun Mendes

Machine Learning Engineer

New York, New York, United States · 7 yrs 7 mos experience
AI Enabled · AI ML Practitioner

Key Highlights

  • 7+ years of experience in Machine Learning and AI.
  • Expertise in Large Language Models and MLOps practices.
  • Proven track record in optimizing cloud-based ML solutions.
Stackforce AI infers this person is a SaaS-focused Machine Learning Engineer with expertise in AI and MLOps.

Skills

Core Skills

Natural Language Processing (NLP), Data Science, Data Mining, Deep Learning, Object-Oriented Programming (OOP), Large Language Models (LLM), MLOps, Cloud Computing, Machine Learning, Data Engineering

Other Skills

AWS, Acceptance Testing, Airflow, Amazon Web Services (AWS), Analytical Skills, Apache Oozie, Apache Spark, Artificial Intelligence (AI), Automation, BERT (Language Model), Big Data, C (Programming Language), C++, Cascading Style Sheets (CSS), Classification

About

  • Machine Learning Engineer with 7+ years of professional experience in Machine Learning (ML), Deep Learning (DL), Artificial Intelligence (AI), MLOps/LLMOps, Big Data, Web Development, and Consulting. Continually keeping up with the latest developments, trends, and state-of-the-art practices in AI and technology.
  • Specializing in Supervised Learning, Unsupervised Learning, Natural Language Processing (NLP), Natural Language Understanding (NLU), Speech/Audio Processing, Computer Vision (CV), Reinforcement Learning, and Conversational AI/Generative AI (GenAI). My expertise extends to techniques such as fine-tuning and prompt engineering, and to working with advanced models such as GPT, Llama, Phi, and other Large Language Models (LLMs), as well as chains and agents.
  • Hands-on experience with deep learning technologies (PyTorch, TensorFlow, Keras, RAPIDS, Nvidia CUDA, and cuDNN), training on huge datasets with distributed training and strategies that optimize resource usage and training time.
  • Full Stack Machine Learning Engineer with a focus on minimizing model size, memory footprint, resource consumption, and training and inference times. Productionized ML models with cloud deployment (AWS, Azure) at scale while implementing MLOps best practices, and managed an in-house Nvidia HPC cluster with 8 Nvidia A100 GPUs.
  • Designed modular and reproducible pipelines for both training and inference, capable of seamless expansion to accommodate various use cases and facilitate multi-task learning within a unified framework.
  • Methodically analyze complex problem statements, develop business-critical metrics for optimal solutions, oversee cloud-based end-to-end deployment with rigorous validation, and deliver results with adaptable code to accommodate feedback.
  • Highly ambitious and driven to excel, with a strong emphasis on efficiency. I hold myself to high standards, ensuring that I complete my responsibilities and tasks promptly to maximize productivity.

Roles: Applied Scientist, Data Scientist, Machine Learning Engineer, Generative AI, Natural Language Processing Scientist / Natural Language Processing Engineer, Deep Learning Engineer, Deep Learning Researcher, Machine Learning Researcher, Machine Learning Scientist, Research Scientist, Research Engineer, Applied Research Engineer, AI Engineer, AI Scientist, Big Data Scientist, ML Architect, Data Science Manager

Experience

Total Experience: 7 yrs 7 mos
Average Tenure: 2 yrs 2 mos
Current Experience: 1 yr

Etsy

Machine Learning Engineer II

Jun 2025 - Present · 1 yr · Brooklyn, New York, United States · Hybrid

Justlabs

Machine Learning Engineer

Feb 2025 - Jun 2025 · 4 mos · New York, New York, United States · Remote

  • LLM | Audio | Recommendation Systems
Natural Language Generation, Natural Language Processing (NLP), Data Science, Amazon Web Services (AWS)

Stevens Institute of Technology

2 roles

Graduate Student Assistant

Sep 2024 - Dec 2024 · 3 mos · Hoboken, New Jersey, United States · On-site

  • Designed assignments, midterm, and final exam papers for over 200 students with detailed rubrics for Prof. Xueqing Liu's course to guide students on common pitfalls and improvement areas.
  • Conducted weekly office hours to provide personalized assistance, resulting in a 20% improvement in overall class average post-midterm.
Mathematics, NumPy, Data Mining, Unsupervised Learning, Deep Neural Networks (DNN), Pandas (Software), +3 more

Course Assistant

Sep 2024 - Dec 2024 · 3 mos · Hoboken, New Jersey, United States · On-site

  • Supported Prof. Jingyi Sun in grading student assignments and conducting research, ensuring accurate evaluation and contributing to academic studies.
NumPy, Data Mining, Unsupervised Learning, Object-Oriented Programming (OOP), Pandas (Software)

HERE Technologies

2 roles

Data Scientist

May 2024 - Aug 2024 · 3 mos · Chicago, Illinois, United States · Remote

  • Led a successful Proof of Concept (POC) to assess the effectiveness and interpretability of prompt-engineered versus fine-tuned Large Language Models (LLMs) in extracting multilingual geospatial data, with the goal of improving the extraction of place attributes from text. Accelerated timelines for feature engineering, training, and testing of ML, DL, and Small Language Models by 60%.
  • Accelerated onboarding of data scientists and engineers by 40% and improved information extraction by 30% by developing an end-to-end AI chatbot with LLMs and Retrieval-Augmented Generation (RAG) on AWS, using FastAPI and ReactJS.
  • Containerized and deployed LLMs (Llama-3, OpenChat) on AWS using Docker to detect, extract and format Hours of Operation from website text, optimizing inference efficiency and achieving 85% extraction accuracy.
Large Language Models (LLM), Linux, Git, Automation, Transformers, MLflow, +19 more

Senior Data Scientist

Apr 2021 - Aug 2023 · 2 yrs 4 mos · Mumbai, Maharashtra, India · On-site

  • Expanded HERE Maps global coverage by 17% and saved over $2.5M by leveraging web-crawled data to generate 10M+ high-quality place records. Utilized ML, DL and LLMs to extract key place attributes, including name, category, address, and hours of operation.
  • Identified place websites with an accuracy of 92.5% by creating labeled data using heuristics and unsupervised models (K-Means, DBSCAN) for clustering. This data was employed to train Mixture of Experts models (Random Forest, SVM) for classification.
  • Extracted street addresses, place names, and hours of operation across 9 countries by supervised fine-tuning of foundation models such as T5, GPT-J, and DeBERTa on Named Entity Recognition (NER) and Semantic Re-Ranking, achieving an overall accuracy of 94.3%.
  • Enhanced classification of places across 400+ categories in 6 languages by adapting Transformer models (BERT, DeBERTa, XLNet) to unique regional nuances, improving classification metrics by 7% to 0.88 over previous benchmarks.
  • Achieved 25x cost reduction in generating GPS data by building scalable MLOps pipeline on AWS, using optimized CPU / GPU cloud instances and model compression techniques such as ONNX, Quantization and Knowledge Distillation.
  • Utilized Prompt Engineering on LLMs to extract place hours of operation and validate model outputs, enhancing data reliability.
  • Fine-tuned LLMs like Llama-2 to extract multilingual locale data from website text, achieving 92% extraction accuracy.
  • Improved model development by implementing distributed training on a multi-node Nvidia DGX A100 HPC GPU cluster and automated MLOps pipeline deployment with GitLab CI/CD, Docker, and AWS CloudFormation/SAM.
  • Created interactive dashboards using Streamlit and ReactJS for real-time project monitoring, reducing issue detection time by 40% and enhancing stakeholder visibility and decision-making.
Amazon Web Services (AWS), Large Language Models (LLM), Git, Proof of Concept, Automation, Transformers, +35 more

Fractal

Machine Learning Engineer

Aug 2017 - Apr 2021 · 3 yrs 8 mos · Mumbai Area, India · On-site

  • Worked as a Deep Learning and Big Data (AI) Engineer, developing and scaling Fractal Analytics' AI solutions and capabilities for structured and unstructured data using Machine Learning and Cloud infrastructure.
  • Reported operational risks in client’s critical business functions by developing a scalable ETL solution for processing terabytes of clickstream data with PySpark deployed on AWS EMR, utilizing Jenkins and Oozie for periodic data refresh.
  • Worked on an audio-based recommendation system using Nvidia RAPIDS and TensorFlow. Used audio features derived from the Mel spectrogram to recommend songs via Locality-Sensitive Hashing.
  • Pretrained and fine-tuned a Speech-to-Text model (Wav2Vec2) on a dataset comprising 16,000 hours of Indian English audio and improved transcription accuracy by 27%. Integrated speaker identification and text-to-speech capabilities.
  • Optimized end-to-end data audit, dashboarding, and product mapping processes for client sales data by engineering heuristic and machine learning models, improving delivery timelines by 60% and saving over $500k in operational costs.
  • Worked on AI Demand Forecasting [CPG] client project using Time Series Forecasting and Analysis along with other ML models. Ensembled and stacked multiple linear, additive, gradient boosting, and time series models. Parallelized multiple existing XGBoost models and reduced training turnaround time by 75%.
  • Single-handedly spearheaded the build of a scalable web utility for KPI dashboarding and automation that provides real-time oversight of every aspect of the process for a [CPG] client.
  • Automated client reports using VBA.
  • Engineered a Selenium solution on a company tool, reducing workload by 400 hours.
  • Cleared AI Engineer Career Track to be designated as Engineer at Fractal Analytics.
  • Completed Data Science Boot Camp conducted by Manipal Prolearn.
HTML, Amazon Web Services (AWS), Jenkins, Linux, Git, Proof of Concept, +42 more

Education

Stevens Institute of Technology

Master's degree — Machine Learning

University of Mumbai

Bachelor of Engineering — Electronics and Telecommunication
