Shangwu Yao

Senior Software Engineer

New York, New York, United States · 7 yrs 7 mos experience

Most Likely To Switch: Highly Stable

Key Highlights

Expert in performance optimization for Deep Learning.
Led neural network performance optimization on an ASIC accelerator at Apple.
Contributed a multi-label confusion matrix metric to Scikit-learn.

Stackforce AI infers this person is a Deep Learning and Performance Optimization expert in the AR/VR and Machine Learning industries.

Skills

Core Skills

Performance Optimization, Deep Learning, Machine Learning

Other Skills

Computer Vision, Neural Networks, ASIC Accelerator, Multi-label confusion matrix, Benchmarking, Line profiling, Speech Recognition, Data Augmentation, PyTorch, Distributed Systems, Big Data, Hadoop, Amazon Web Services (AWS), Python, Java

About

I am a software engineer specializing in performance optimization, GPU compute, and deep learning. Skills: performance optimization for deep learning and GPU compute kernels, and hardware-software co-design. Toolchain support: GPU compilers and drivers, and ML accelerator compilers.

Experience

Total Experience: 7 yrs 7 mos

Average Tenure: 2 yrs 6 mos

Current Experience: 5 yrs 8 mos

Waymo

Senior Software Engineer

Sep 2020 – Present · 5 yrs 8 mos · Mountain View, California, United States

Apple

Software Performance Engineer

Feb 2019 – Sep 2020 · 1 yr 7 mos · Sunnyvale, California

Performed AR/VR-related performance analysis and optimization for computer vision pipelines and deep learning models
Led performance optimization of neural networks on an ASIC accelerator

Performance optimization, Deep Learning, Computer Vision, Neural Networks, ASIC Accelerator

Scikit-learn (machine learning in Python)

Contributor

Jun 2018 – Aug 2018 · 2 mos · Remote

Contributed a new metric to Scikit-learn, the multi-label confusion matrix, and tested it thoroughly (99.18% coverage reported by Codecov)
Benchmarked and line-profiled the multi-label confusion matrix, reducing its runtime by 81.8%
Maintained the warning-suppression code, suppressing 2,239 expected warnings during testing
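
For readers unfamiliar with the metric, a multi-label confusion matrix reports one 2x2 matrix per label, laid out as [[TN, FP], [FN, TP]]. The sketch below is a minimal pure-Python illustration of that idea, not the code contributed to Scikit-learn; the function name `multilabel_confusion` and the sample data are assumptions chosen for the example.

```python
def multilabel_confusion(y_true, y_pred):
    """Return one [[tn, fp], [fn, tp]] matrix per label column.

    Hypothetical helper for illustration only; inputs are lists of
    equal-length rows containing 0/1 label indicators.
    """
    n_labels = len(y_true[0])
    matrices = []
    for j in range(n_labels):
        tn = fp = fn = tp = 0
        for true_row, pred_row in zip(y_true, y_pred):
            t, p = true_row[j], pred_row[j]
            if t and p:
                tp += 1      # label present and predicted
            elif t and not p:
                fn += 1      # label present but missed
            elif p:
                fp += 1      # label absent but predicted
            else:
                tn += 1      # label absent and not predicted
        matrices.append([[tn, fp], [fn, tp]])
    return matrices


# Two samples, three labels (illustrative data).
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 1]]
result = multilabel_confusion(y_true, y_pred)
```

Each label is treated as its own binary problem, which is why the result has one matrix per label rather than a single square matrix over all classes.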

Multi-label confusion matrix, Benchmarking, Line profiling, Machine Learning

Carnegie Mellon University

Independent Study on Speech Recognition

Apr 2018 – Aug 2018 · 4 mos

Under the guidance of Prof. Bhiksha Raj, applied deep learning to speech recognition
Developed an attention-based encoder-decoder model and a recurrent network trained with CTCLoss; used curriculum learning and beam search to improve results
Used MFCC features and implemented Vocal Tract Length Perturbation (VTLP) as a data augmentation method for speech recognition
Implemented a weight-dropped LSTM, which applies DropConnect to the hidden-to-hidden weights and variational dropout to the input
Achieved a 3x speedup by reducing data transfer between CPU and GPU and replacing explicit iteration with high-level indexing
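
As background on the augmentation mentioned above, Vocal Tract Length Perturbation is usually described as a piecewise-linear warp of the frequency axis by a random factor close to 1. The sketch below follows that commonly cited formulation and is not the implementation from this project; the function name `vtlp_warp`, the 16 kHz sample rate, the 4800 Hz boundary `f_hi`, and the [0.9, 1.1] sampling range are all assumptions for illustration.

```python
import random


def vtlp_warp(freq_hz, alpha, sample_rate=16000, f_hi=4800.0):
    """Warp one frequency (Hz) by factor alpha, keeping Nyquist fixed.

    Hypothetical helper: below a boundary frequency the axis is scaled
    by alpha; above it, a second linear segment maps the remaining
    range so that the Nyquist frequency stays in place.
    """
    nyquist = sample_rate / 2.0
    knee = f_hi * min(alpha, 1.0)   # warped value at the boundary
    boundary = knee / alpha         # input frequency where segments meet
    if freq_hz <= boundary:
        return freq_hz * alpha
    slope = (nyquist - knee) / (nyquist - boundary)
    return nyquist - slope * (nyquist - freq_hz)


# Draw a warp factor per utterance (assumed range) and warp a few
# example filter-bank center frequencies.
rng = random.Random(0)
alpha = rng.uniform(0.9, 1.1)
centers = [250.0, 1000.0, 4000.0, 7000.0]
warped = [vtlp_warp(f, alpha) for f in centers]
```

In practice the warp is typically applied to the filter-bank center frequencies before feature extraction, so each training utterance is seen with a slightly different simulated vocal tract length.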

Deep Learning, Speech Recognition, Data Augmentation