Abhay Gupta

Product Manager

San Jose, California, United States · 10 yrs 9 mos experience
Key Highlights

  • 8+ years in GPU architecture and performance analysis
  • Expert in machine learning acceleration and architectural innovation
  • Proven success in cross-functional collaboration and silicon bring-up
Stackforce AI infers this person is a GPU architecture expert specializing in machine learning and performance optimization.

Skills

Core Skills

Machine Learning · Performance Tuning · Hardware Architecture · Performance Analysis · Ray Tracing

Other Skills

Algorithms · Architectural Modeling · Artificial Intelligence (AI) · C · CUDA · Computer Architecture · Data Structures · Deep Learning · Facebook Business SDK · Graphics · High Performance Computing (HPC) · Linux · Matlab · Natural Language Processing (NLP) · OpenGL ES

About

Performance- and architecture-focused engineer with 8+ years of experience in GPU architecture, machine learning acceleration, and simulator-based performance analysis. Proven expertise in designing and evaluating hardware features for ML and graphics workloads using C++/Python simulators and cycle-accurate models. Demonstrated success in cross-functional collaboration, silicon bring-up, compiler-hardware co-design, and patent-worthy architectural innovation. Adept at bridging research and product through deep insight into power, performance, and area (PPA) tradeoffs.

Experience

Qualcomm

GPU Architecture Modeling Engineer

Apr 2023 - Present · 2 yrs 11 mos · Santa Clara, California, United States · On-site

  • Taped out efficient work-scheduling mechanisms to improve cache behavior in Snapdragon GPU IP
  • Evaluate shader and memory-path efficiency across different memory-path configurations using the architecture model
  • Analyze ML and graphics kernels at the assembly level to find inefficiencies and optimize performance
  • Lead the team’s efforts to model and correlate the Matrix Multiply Accelerator and the Ray Tracing and Traversal Accelerator; proposed numerous HW features; perform performance analysis and projection for architecture ideas
Machine Learning · Performance Tuning

Meta

Graphics Architect at Reality Labs Research

Mar 2022 - Apr 2023 · 1 yr 1 mo · Sunnyvale, California, United States

  • Worked in Reality Labs Research on Codec Avatars, training deep ML models for realistic face and full-body avatars. Used a conditional variational encoder and differentiable rendering to train a model to render avatars
  • Implemented custom low-power hardware for NN acceleration
  • Worked on neural networks for graphics applications and researched new architectures for AR/VR applications
Architectural Modeling · Scripting · Hardware Architecture · RTL Design · Graphics · Computer Architecture · +3

Samsung SARC | ACL

3 roles

Senior GPU Architecture and Modeling Engineer

Promoted

Jun 2020 - Mar 2022 · 1 yr 9 mos

  • Taped out opportunistic write-back discard of single-use vector register values in AMD RDNA GPU IP
  • Worked primarily on shader and texture; evaluated and implemented new architectural features, with performance modeling and debug on a C++ clock-driven simulator; identified issues with the compiler
  • Developed tools to identify hotspots in GPU programs, identify canned patterns in programs, identify opportunities for hardware acceleration, manually schedule instructions with automated register allocation, extract dynamic instruction counts, and project performance with new instructions
  • Proposed HW changes to reduce traffic and power and to improve performance. Provided performance estimates by prototyping in the architecture model, along with power and area estimates. Performed ISA updates
Architectural Modeling · Scripting · Parallel Programming · Hardware Architecture · RTL Design · Verilog · +7

Senior GPU Performance Engineer

Mar 2020 - Jun 2020 · 3 mos

  • Projected performance of future architectures using speed-of-light models; developed a multithreaded Python framework that projects GPU performance by exhaustively evaluating combinations of unit throughputs inside the GPU
  • Debugged benchmarks in the C++ clock-driven model to improve performance
  • Proposed driver, compiler, and hardware changes that improved benchmark performance by 5.5%
Scripting · RTL Design · Graphics · Shading · Computer Architecture · Parallel Processing · +3

GPU Performance Engineer

Sep 2017 - Mar 2020 · 2 yrs 6 mos

  • Implemented HSR in an event-based functional model in C++; performed performance modeling and debug using a C++ clock-driven model, and performance projection using a throughput model
  • Developed performance models from architecture and microarchitecture specifications; designed and implemented directed tests; validated performance against RTL
  • Debugged performance of tests and benchmarks on GPU top; performed functional debugging for hardware bring-up; wrote Python scripts for automation
RTL Design · Graphics · Computer Architecture · Parallel Processing · High Performance Computing (HPC) · Performance Tuning · +1

University of Florida

Ray Tracing on FPGA - Graduate Research Assistant

May 2016 - Aug 2017 · 1 yr 3 mos · Gainesville, Florida Area

  • Developed an FPGA-based ray tracing accelerator
  • Designed hardware logic to accelerate ray-triangle intersection (RTI) tests using a mixed implementation of RTL and hard floating-point IP on Arria 10. Used HLS for prototyping
  • The code was designed to be scalable and uses temporal multithreading to increase pipeline utilization and overcome memory bandwidth limitations. Integrated the accelerator with the software ray tracer PBRT
  • Implemented RTI using AVX2, achieving a 10x speedup over a single core
  • The hardware performed 1.3 billion RTI tests/sec, a 50x speedup over a single Xeon core
Scripting · Hardware Architecture · RTL Design · Graphics · Computer Architecture · Ray Tracing · +2

Capillary Technologies

Software Development Engineer

Jul 2014 - Jul 2015 · 1 yr · Bangalore

  • Developed a framework for automated server deployment on AWS using Python Fabric. Worked on Python- and Java-based graphical web and server applications.

Pranavi Devices

Freelance Product Developer

May 2013 - Jul 2013 · 2 mos · New Delhi Area, India

  • Designed and developed an ATmega128-controlled fire alarm and interfaced it with a GSM module, a landline, audio message recording and playback using PWM, an SD card over SPI, and an RTC over I2C.

Hindustan Aeronautics Limited

Intern

May 2012 - Jul 2012 · 2 mos · Bangalore

  • Developed a system to remotely monitor and control engine test beds
  • Built LabVIEW-based PC software to communicate with the PLC, and wrote the program for the AC-31 programmable logic controller to remotely control the AS-800 AC drive used in the engine test beds

Education

University of Florida

Master of Science (M.S.) — Electrical and Computer Engineering

Jan 2015 - Jun 2017

Indian Institute of Technology, Roorkee

B.Tech in Electrical Engineering & M.Tech in Power Electronics — Electrical and Electronics Engineering

Jan 2009 - Jan 2014

Smt. Sulochanadevi Singhania School

12th

Jan 2007 - Jan 2009

Barnes School Deolali

10th

Jan 2004 - Jan 2007
