Prajyot Gupta

CTO

Santa Clara, California, United States · 8 yrs 11 mos experience
Most Likely To Switch: Highly Stable

Key Highlights

  • Expert in CPU and GPU architecture design.
  • Researcher in Fully Homomorphic Encryption and ML accelerators.
  • Proven track record in high-performance computing.
Stackforce AI infers this person is a high-performance computing architect specializing in CPU and GPU design.

Skills

Core Skills

System Architecture · Microarchitecture · Microprocessor Systems · Fully Homomorphic Encryption · CPU Microarchitecture · Reliability · GPU Architecture · Approximate Computing

Other Skills

ARM Cortex-M · Assembly Language · Automated Stimulus Generation · Availability · C · CPU Design · CUDA · Cache Coherency · Chip-Level Verification · Computer Architecture · Constrained Random Test Generation · Data Acquisition System · Design Verification · Digital Electronics · Embedded Systems

About

I’m a Senior Systems Architect at NVIDIA, driving the design and development of next-generation CPU and GPU architectures. My work focuses on system-level architecture and microarchitecture for NVIDIA’s bleeding-edge compute platforms, spanning Grace-Hopper Superchips, Blackwell GPUs, and the upcoming Vera-Rubin AI GPU subsystems. Previously, I contributed to Snapdragon SoC designs at Qualcomm and was part of AMD’s RTG SoC Architecture group. I hold a Master’s in Computer Engineering from the University of Wisconsin–Madison, where I was a graduate researcher under Prof. Joshua San Miguel, focusing on Fully Homomorphic Encryption (FHE), CPU/GPU microarchitecture, and approximate computing. I’m passionate about pushing the boundaries of high-performance compute, AI acceleration, and scalable system design, and I’m always exploring how architecture and software co-design can shape the future of computing. More: https://www.prajyotgupta.com/

Experience

NVIDIA

2 roles

Senior Systems Architect

Jan 2023 - Present · 3 yrs 2 mos · Santa Clara, California, United States

System Architecture · Microarchitecture · CPU Design · GPU Design

Systems Architect Intern

May 2022 - Sep 2022 · 4 mos · Santa Clara, California, United States

  • Worked on reliability, availability and serviceability (RAS) architectures in NVIDIA’s CPU for data center & driver assistance workloads.
  • Developed infrastructure to automate programming-sequence generation from IP Spec using Portable Stimulus Standard (PSS) in Python.
Reliability · Availability · Serviceability · Python · Portable Stimulus Standard

AMD

GPU Architect Intern

Jan 2022 - May 2022 · 4 mos · Austin, Texas, United States

  • Designed a framework to identify independent frames (snapshots) in graphics & Ray Tracing (RT) workloads for the Full-SoC simulator.
  • Developed a flow to run the snapshots in parallel & accumulate results; achieved speed-ups of up to 6x & reduced runtimes for RT workloads from weeks to days.
Graphics · Ray Tracing · Full-SoC Simulator · GPU Architecture

University of Wisconsin-Madison

2 roles

Graduate Teaching Assistant

Sep 2021 - Dec 2022 · 1 yr 3 mos · Madison, Wisconsin, United States

  • Served as a teaching assistant for ECE 353: Intro to Microprocessor Systems, taught by Prof. Joe Krachey.
  • The lab-based course teaches both the low-level and high-level workings of an ARM Cortex-M microcontroller, starting from assembly language and going up to communication protocols such as UART, I2C and SPI.
Microprocessor Systems · ARM Cortex-M · Assembly Language · UART · I2C · SPI

Graduate Research Assistant

Jan 2021 - Dec 2022 · 1 yr 11 mos · Madison, Wisconsin, United States

  • Graduate Research Assistant at STACS lab under Prof. Joshua San Miguel, working on Fully Homomorphic Encryption & CPU microarchitecture research.
  • Researched hardware-based ML accelerators to speed up bootstrapping in Fully Homomorphic Encryption (FHE).
  • Developed a tightly coupled accelerator to dynamically predicate hard-to-predict (H2P) branches in hardware without changes to the ISA.
  • Modeled the accelerator using the gem5 simulator to analyze performance vs. quality trade-offs.
Fully Homomorphic Encryption · CPU Microarchitecture · ML Accelerators · gem5 Simulator

Qualcomm

2 roles

Engineer, Snapdragon SoC DV

Promoted

Oct 2018 - Dec 2020 · 2 yrs 2 mos

  • Worked in the Snapdragon Systems Team:
  • Designed test scenarios to verify system use-cases targeted at parallel execution of Qualcomm’s Kryo CPU multicores with all other processors & IPs in the Snapdragon SoC.
  • Responsible for development & deployment of automated stimulus generation flow for design verification using Accellera’s Portable Test and Stimulus Standard (PSS) – a token-based methodology to implement parallel execution of multiprocessors through serial programming and mailbox synchronizations. The flow simplified generation of concurrency scenarios with multiple processors & IPs running in parallel.
  • The resulting test plan helped identify bugs related to processor-memory accesses, concurrent sub-system restart, Memory Management Unit (MMU) halt-unhalt, frequency switching and many other scenarios.
  • Worked with Architecture team to identify deadlocks/drops/hangs during concurrent traffic from masters to NoCs, DDR & Last Level Cache (LLC).
  • Led verification of Qualcomm’s QoS Management IP to ensure that priority clients’ bandwidth requirements and memory controller requirements are met. Revamped the verification flow using PSS.
  • Led verification of the inter-processor communication controller.
  • Involved in the tape-out of 14 Snapdragon chipsets.
Test Scenarios · Design Verification · Automated Stimulus Generation · PSS · System Architecture

Associate Engineer, Snapdragon SoC DV

Jul 2017 - Sep 2018 · 1 yr 2 mos

  • Snapdragon Mobile Platforms 655 & Gaming-Intensive 730G
  • Studied the design of Qualcomm’s Kryo-based ARM CPUs and developed a test suite to verify cache coherency.
  • Worked on a constrained random test-generation flow with the Cadence Perspec tool using the SLN language.
  • Finalist in “QBuzz: Makers Challenge”, an inter-campus Qualcomm event. Designed ‘QRaksha’, a voice-activated wearable safety device for women using Qualcomm’s IoT chipset.
Cache Coherency · Test Suite Development · Constrained Random Test Generation · System Architecture

cfaed - Center for Advancing Electronics Dresden

Guest Researcher

Jan 2017 - Jul 2017 · 6 mos · Dresden, Germany

  • Research Area: Approximate Computing
  • Bachelor’s Thesis: “Development of QRS Detection Algorithm and Analysis for Approximate Reconfigurable Computing”
  • Thesis Advisor: Prof. Akash Kumar
  • Developed a system architecture to detect QRS complexes in ECG signals.
  • Introduced approximate reconfigurable adders, whose accuracy-energy trade-off can be tuned at run-time, into the system and studied their effect on output quality along with power and area numbers.
  • Post-layout simulations indicate power and area savings with an insignificant loss in output quality.
Approximate Computing · QRS Detection Algorithm

Qualcomm

Interim Engineering Intern

May 2016 - Jul 2016 · 2 mos · Bengaluru Area, India

  • Worked as an Interim Engineering Intern in the SoC Verification and Validation Team (Integration Bubble), focusing on chip-level verification of the Global Clock Controllers (GCCs) and GPUs in Qualcomm's MSM chipsets.
  • Developed Perl scripts to extract clocks for the clock controllers and GPU architecture and create an executable file that generates the clock waveform in Verdi nWave.
  • Simulated and analyzed the Global Clock Controller architecture and GPU Clock Controller architecture using Verdi nWave.
  • Winner of Idea Quest: Driver Distraction Detection System.
  • Learnt to use the DragonBoard™ 410c and the Open-Q™ Snapdragon 820 Development Kit as prototyping platforms, booting Android and Debian OS on them.
Chip-Level Verification · Perl Scripts · Simulation

Defence Research and Development Organisation (DRDO)

Undergraduate Research Fellow

May 2015 - Jul 2015 · 2 mos · Delhi Area, India

  • Design and testing of a reconfigurable Data Acquisition System using a Xilinx Spartan®-3E FPGA.
  • Designed, implemented and tested a Data Acquisition System (DAQ) using an 8-channel multiplexed ADC in conjunction with a Xilinx Spartan®-3E FPGA. The DAQ acquired data from analog sensors connected to an ADC0808n soldered on a PCB. The output was plotted as graphs using GUI Processing-2.2. The design was written in Verilog using the Xilinx ISE 14.5 IDE.
Data Acquisition System · FPGA

Education

University of Wisconsin-Madison

Master of Science - MS (Research) — Computer Engineering

Jan 2021 - Dec 2022

Birla Institute of Technology and Science, Pilani

Bachelor’s Degree — Electrical and Electronics Engineering

Jan 2013 - Jan 2017
