Nitin Singh — AI Researcher
AI Systems Performance Engineer specializing in deep learning compilers and accelerator efficiency. I build infrastructure that enables large-scale models to train and execute efficiently on modern hardware. At the PyTorch framework level, I focus on graph optimization, dynamic shape handling, and operator clustering to improve the performance of LLM workloads. I also work across distributed training and inference, debugging and extending collective communication backends to strengthen FSDP correctness, scalability, and multi-node performance. At the kernel layer, I design CUTLASS-SYCL compute primitives for Intel GPUs, implementing low-precision and quantized kernels optimized for memory bandwidth, compute utilization, and architectural characteristics. Earlier in my career, I built production C++ systems and real-time ML pipelines for Brain-Computer Interface (BCI) applications, developing strong foundations in signal processing, linear algebra, and performance-critical systems engineering.
Stackforce AI infers this person is a specialized AI Infrastructure Engineer with a focus on deep learning and BCI applications.
Location: Bengaluru, Karnataka, India
Experience: 6 yrs 3 mos
Skills
- GPU Programming
- Deep Learning
- C++
- Machine Learning
- Data Science
Career Highlights
- Expert in GPU programming and deep learning optimization.
- Led development of real-time BCI data processing SDK.
- Proven track record in enhancing AI model performance.
Work Experience
Intel Corporation
AI Software Solutions Engineer (2 yrs 6 mos)
Nexstem
C++ Software Engineer, R&D Lead (2 yrs)
Machine Learning Specialist (1 yr)
Larsen & Toubro Construction
Graduate Engineer Trainee (9 mos)
Education
M. Tech at National Institute of Technology Goa
B. Tech at National Institute of Technology Goa