Siddharth Sharma

Lead ML Engineer

Sunnyvale, California, United States · 8 yrs 11 mos experience

Most Likely To Switch · AI ML Practitioner

Key Highlights

Expert in Natural Language Processing and Machine Learning.
Proven track record in developing advanced ML models.
Strong academic background with a focus on Machine Learning.

Stackforce AI infers this person is a Machine Learning Engineer with expertise in NLP and Computer Vision.

Skills

Core Skills

Natural Language Processing (NLP), Deep Learning, Natural Language Understanding, Machine Learning, Data Analysis, Computer Vision, Data Integration

Other Skills

Algorithms, Apache Flume, Apache Spark, Artificial Intelligence (AI), C, C++, Data Mining, Data Structures, GPT-4, Google Bard, HTML, Hadoop, Information Retrieval, Java, LaTeX

About

I work as a Machine Learning Engineer on Natural Language Understanding. I have 18 months of internship experience in Machine Learning and Data Science. My MSE-CS at JHU concentrated on Machine Learning.

Experience

Total Experience: 8 yrs 11 mos

Average Tenure: 1 yr 9 mos

Current Experience: 4 yrs 5 mos

Google

Software Engineer, Machine Learning

Dec 2021 – Present · 4 yrs 5 mos · Mountain View, California, United States

YouTube Ads ML

Mathematics, Deep Learning, GPT-4, Natural Language Processing (NLP), Recommender Systems, Google Bard, +5

Ushur, Inc.

Machine Learning Engineer

Sep 2018 – Nov 2021 · 3 yrs 2 mos · San Francisco Bay Area

Worked on Natural Language Understanding.

Mathematics, Deep Learning, GPT-4, Natural Language Processing (NLP), Scikit-Learn, Information Retrieval, +4

The Johns Hopkins University

Graduate Research Assistant

Feb 2018 – Aug 2018 · 6 mos · Baltimore, Maryland Area

Simulated High-Performance Computing (HPC) systems with natural and artificially injected faults using the Structural Simulation Toolkit (SST).
Created a Python framework for node-based and task-based reliability analysis of logs generated by simulated HPC systems. The analysis is independent of network structure.
Built a Support Vector Machine-based classifier to identify artificial fault injection. Weibull and log-normal lifetime models were used to parameterize the reliability curves.

Mathematics, Deep Learning, Scikit-Learn, Information Retrieval, PyTorch, Machine Learning, +1

Amazon Lab126

Applied Scientist Intern

Sep 2017 – Jan 2018 · 4 mos · Sunnyvale

Simulated human annotators using Bayesian modeling to create synthetic annotated data for a Speaker Identification (SID) system.
Used Unsupervised Label Refinement (ULR) methods (such as Dawid-Skene) and showed that they outperform Majority Voting for SID annotation.
Evaluated human annotators' False Acceptance Rate (FAR) and False Rejection Rate (FRR) for speaker identification by creating ground truth data of varying difficulty.
Showed that the current annotation process remained unacceptable even when ULR was applied to labels from multiple annotators to reduce errors.
Showed that the metadata of the test and enrolled utterances did not carry enough signal to judge annotation difficulty.
Created training and testing data to evaluate a domain classifier.

Mathematics, Large Language Models (LLM), Machine Learning

Center for Bioengineering Innovation and Design at Johns Hopkins University

Computer Vision Intern

Jun 2017 – Jul 2017 · 1 mo · Baltimore, Maryland

Created training data for localization and detection of Zika virus-carrying mosquito species for a custom-built trap.
Used Faster Region-based Convolutional Neural Networks (Faster R-CNN) to localize the mosquitoes and a Residual Neural Network (ResNet) to detect the species with 81% accuracy.
The results of this work helped secure funding for the project.

Mathematics, Scikit-Learn, PyTorch, Computer Vision

Inria

Research Intern

Nov 2015 – May 2016 · 6 mos · Lille Area, France

Integrated clinical and -omics data on Acute Myeloid Leukemia (AML) from The Cancer Genome Atlas (TCGA) using Multimodal Representation Learning.
Used Regularized Generalized Canonical Correlation Analysis (RGCCA) and Sparse Generalized Canonical Correlation Analysis (SGCCA) to create multimodal representations and select important genes.
Suggested design matrices for RGCCA using graphical methods from the mixOmics package and the biological results already published by the TCGA consortium.
Reproduced the results of Lasso-based data integration models and of a paper on survival analysis.

Mathematics, Scikit-Learn, Data Integration

ArcelorMittal

Summer Internship

Jun 2014 – Jul 2014 · 1 mo · Kazakhstan

Windows Server 2012
Project Mentor: Alexandr Chsherbov
Installed Windows Server 2012 and maintained 50 clients.
Enabled services such as authentication, mail, Active Directory, internet access, remote client management, and Group Policy management.