Nimit Nigania

Lead ML Engineer

San Francisco, California, United States · 14 yrs 8 mos experience
Highly Stable

Key Highlights

  • Expert in GPU optimization and performance engineering.
  • Proven track record in deep learning model deployment.
  • Significant contributions to MLPerf submissions.
Stackforce AI infers this person is a Machine Learning Engineer with a strong focus on GPU optimization and high-performance computing.

Skills

Core Skills

Machine Learning · GPU Optimization · Deep Learning · Data Engineering · GPU Software Engineering · Research · Internship · Software Engineering

Other Skills

Analog Circuit Design · Architectural simulators · C · C++ · CUDA · Cadence Virtuoso · Computer Architecture · Computer Science · Debugging · Digital Circuit Design · Electrical Engineering · Embedded Systems · FPGA · Feature engineering

About

I am a Machine Learning Engineer specializing in building and accelerating deep learning models for high-performance production environments. My passion lies in bridging the gap between cutting-edge model development and the underlying hardware, transforming computationally expensive models into efficient, low-latency applications.

My experience spans the full ML lifecycle, from designing and training novel architectures in PyTorch and JAX to deploying them at scale. What sets my work apart is deep expertise in performance engineering and GPU optimization. I have hands-on experience profiling with tools like Nsight, writing custom CUDA kernels for critical performance bottlenecks, and leveraging frameworks like TensorRT and Triton Inference Server to slash inference latency. I'm adept at techniques such as model quantization (INT8/FP8), kernel fusion, and optimizing GPU memory bandwidth to maximize throughput.

Experience

Snap Inc.

ML Engineering Lead

Jul 2025 - Present · 8 mos · Palo Alto, CA · On-site

  • Leading the ML GPU optimization effort.
GPU optimization · Machine Learning

Google

Machine Learning Engineer

Feb 2019 - Jul 2025 · 6 yrs 5 mos · Mountain View, California

  • 2019 - 2022: ML at Google Brain. Worked on TensorFlow and PyTorch performance on GPUs, improving ML models such as NCF, BERT, and ResNet-50. Worked on the official BERT GPU submission to MLPerf / MLCommons. Also wrote some CUDA kernels for optimizations.
  • 2022 - 2025: Ads Machine Learning (YouTube). Improved model quality to drive conversions and revenue for YouTube Ads: used new data sources, engineered features, leveraged LLMs, and re-architected existing recommendation models to represent users more accurately.
TensorFlow · PyTorch · CUDA · MLPerf · Feature engineering · LLMs

Apple

GPU Software Engineer

Feb 2014 - Feb 2019 · 5 yrs · Cupertino

  • GPU modeling.
GPU modeling · GPU Software Engineering

Intel Corporation

2 roles

Graduate Intern

May 2013 - Aug 2013 · 3 mos · Portland, Oregon Area

  • Worked with the Xeon Phi group. Added a key feature to the performance model to better utilize the memory system.
Performance modeling · Memory system utilization · Internship

Software Engineer

Jul 2011 - Aug 2012 · 1 yr 1 mo · Bengaluru Area, India

  • Worked with the Many Integrated Core (MIC) group on performance characterization and debugging of future many-core processors. Gained in-depth knowledge of computer architecture, simulators, and performance analysis of high-performance computers.
Performance characterization · Debugging · Software Engineering

Georgia Institute of Technology

Graduate Research Assistant

Aug 2012 - Jan 2014 · 1 yr 5 mos · Greater Atlanta Area

  • Worked on novel techniques to estimate the performance and power of GPUs and CPUs using analytical models and architectural simulators.
Performance estimation · Architectural simulators · Research

University of Heidelberg

Research Intern

May 2010 - Jun 2010 · 1 mo · Heidelberg Area, Germany

  • Worked on embedded serializers to be used with an FPGA for the ATLAS experiment at CERN.
Embedded systems · FPGA · Research

Georgia Institute of Technology

Research Intern

May 2009 - Jul 2009 · 2 mos · Atlanta, GA

  • Worked with the high-performance computing (HPC) group and the computer architecture group on a joint project studying the combined benefits of using software and hardware prefetching together.
High Performance Computing · Computer Architecture · Research

Xilinx

Summer Intern

May 2008 - Jul 2008 · 2 mos · Hyderabad Area, India

  • Worked with the Advanced Product Division (APD) Memory team to build a testbench for the DDR2 memory controller by integrating it with a MicroBlaze processor.
Memory controller · Testbench integration · Internship

Education

Georgia Institute of Technology

Master's degree — Computer Science

Jan 2012 - Jan 2014

Indian Institute of Technology, Madras

Master's degree — Electrical Engineering

Jan 2010 - Jan 2011

Indian Institute of Technology, Madras

BTech (Bachelor Of Technology) — Electrical Engineering

Jan 2006 - Jan 2011
