Monalisa Singh

Machine Learning Engineer

Bengaluru, Karnataka, India · 4 yrs 10 mos experience
AI Enabled · AI ML Practitioner

Key Highlights

  • Expert in Automatic Speech Recognition and Speaker Diarization.
  • Proven track record in deploying scalable speech solutions.
  • Strong collaboration skills in cross-functional teams.
Stackforce AI infers this person is a Speech Machine Learning Engineer specializing in scalable speech solutions for consumer electronics and AI applications.

Skills

Core Skills

Automatic Speech Recognition (ASR), Speaker Diarization, Text To Speech, Voice Personalization, Sound Recognition, Speech Emotion Recognition, Machine Learning, Database Management

Other Skills

ASR, Age and Gender Classification, Audio Augmentation, Cookie, DBVisualizer, Deep Learning, Docker, FastAPI, Feature Averaging, Feature Engineering, Feature Extraction, Feature Selection, Few Shot Learning, IndicWav2Vec, MATLAB

About

Innovative and results-driven Speech Machine Learning Engineer with over 4 years of hands-on experience designing, developing, and deploying scalable, real-world speech solutions. Specializing in Automatic Speech Recognition (ASR), speaker diarization, emotion and gender recognition, and audio event detection, I have contributed to end-to-end ML pipelines for consumer electronics, enterprise call centers, and multilingual applications.

At companies like Samsung Research and Contiinex, I’ve successfully delivered production-grade models for critical applications such as voice personalization (age/gender classification) and sound recognition for smart home devices, showcasing strong ownership and cross-functional collaboration.

My technical expertise lies in fine-tuning and optimizing transformer-based models including Whisper, Wav2Vec2, and IndicWav2Vec. I have deep knowledge of speech signal processing, audio feature engineering, and self-supervised learning techniques. I frequently work with toolchains like PyTorch, Hugging Face Transformers, Docker, FastAPI, and GCP, and I’m experienced in deploying ML models as APIs for real-time inference.

I’m passionate about building noise-robust, low-latency, and multilingual speech systems, especially for Indic languages. I enjoy solving hard problems in speech using a blend of signal processing intuition and modern deep learning techniques. My work consistently focuses on optimizing model accuracy, inference speed, and real-world usability.

Experience

Total Experience: 4 yrs 10 mos
Average Tenure: 1 yr 7 mos
Current Experience: 1 yr 7 mos

Contiinex.com

Machine Learning Engineer - Speech

Nov 2024 - Present · 1 yr 7 mos · Bengaluru, Karnataka, India

  • As an ML Engineer with domain knowledge in speech, I work on Speaker Diarization, Automatic Speech Recognition (ASR) / Speech to Text (STT), and Text to Speech (TTS) technologies.
  • Fine-tuned IndicWav2Vec ASR models on customer-specific audio data, significantly improving transcription and keyword accuracy for Indian languages.
  • Increased Language Identification (LID) accuracy by experimenting with various Meta and Whisper models combined with feature averaging.
  • Built a US English speaker diarization API using Whisper Turbo, PyAnnote, FastAPI, and Docker for scalable speech transcription.
  • Generated high-quality TTS audio for keyword-specific sentences using OpenAI’s Grok (text processing) and Cookie (neural vocoder) models.
Speaker Diarization, Automatic Speech Recognition (ASR), Speech to Text (STT), Text to Speech (TTS), IndicWav2Vec, Whisper

Samsung Electronics

Engineer

Nov 2022 - Sep 2024 · 1 yr 10 mos · Delhi, India

  • As a member of the Intelligence Software Team (ISWT), I have contributed to several innovative projects in the domain of speech and sound recognition.
  • Voice Personalization: Age and Gender Classification
  • Conducted data collection from diverse sources, followed by data cleaning and augmentation processes.
  • Performed feature engineering to identify key features that differentiate gender (male vs. female) and age (child vs. adult) for classification tasks.
  • Experimented with several machine learning models (SVM, Logistic Regression, Decision Tree) and deep learning models (CNN, ANN).
  • Developed and fine-tuned models for accurate classification, ensuring robust performance in real-world applications.
  • Led the end-to-end speech signal processing and model deployment, successfully integrating the solution into Samsung Family Hub devices.
  • Pet Care and Sound Event Detection Project
  • Developed models to classify specific sounds, including male/female screaming, glass breaking, cat meowing, and dog barking, using a MobileNetV2 model with MFCC features.
  • Applied audio augmentation techniques and deep learning to enhance model accuracy and reliability.
Voice Personalization, Age and Gender Classification, Feature Engineering, Machine Learning, Deep Learning, MobileNetV2

Businessnext

2 roles

Junior Engineer

Jul 2022 - Oct 2022 · 3 mos

  • Implemented unit tests for Natural Language Understanding and chatbot systems with Python's unittest, and integrated SonarQube for continuous code quality checks, to ensure that AI flows are accurate, robust, and maintainable.
Natural Language Understanding, Python, SonarQube

Graduate Engineer Trainee

Apr 2021 - Jun 2022 · 1 yr 2 mos

  • As a member of the Datanext-AI Team, I worked on AI-driven projects, particularly in Speech Signal Processing and low-data learning methods. Key highlights include:
  • Speech Emotion Recognition (SER): Designed and developed a robust system that leverages signal processing techniques to identify emotions from speech data, enabling more intuitive human-computer interaction. This project was pivotal in improving customer experience across voice-based platforms.
  • Zero Shot Learning (ZSL) and Few Shot Learning (FSL): Implemented cutting-edge learning models to enable AI systems to recognize and classify data with minimal labeled examples. These models are crucial for tasks where annotated data is scarce, pushing the boundaries of machine learning in real-world scenarios.
  • ODM for MongoDB and ORM for MySQL: Developed Object Data Mapping (ODM) for MongoDB and Object Relational Mapping (ORM) for MySQL, focused on data validation within the Industry Data Model (IDM) framework. These tools streamline the interaction with databases, ensuring data integrity and consistency across industrial applications.
Speech Emotion Recognition, Zero Shot Learning, Few Shot Learning, MongoDB, MySQL

Education

Dr. A.P.J. Abdul Kalam Technical University (AKTU), Lucknow

Bachelor of Technology — Electronics and Communications Engineering

Jan 2016 - Jan 2020

Lucknow Public College, Sahara States

10th and 12th

Jan 2004 - Jan 2016
