Nageshwar Singh

Software Engineer

Bengaluru, Karnataka, India · 10 yrs 4 mos experience

Key Highlights

  • Expert in Deep Learning and AI frameworks.
  • Proven track record in optimizing performance for complex algorithms.
  • Strong background in Digital Signal Processing and Embedded Systems.
Stackforce AI infers this person is a highly skilled AI and Deep Learning engineer with a focus on performance optimization.

Skills

Core Skills

Artificial Intelligence (AI) · Deep Learning · Machine Learning · High Performance Computing (HPC) · Software Development

Other Skills

DirectML Framework · Qualcomm Neural Networks (QNN) · MLOPS · ONNX · Graph Performance Optimization · Unit Test Framework · Large Language Models (LLMs) · PyTorch · Automation · BLIS · BLAS · AVX2 · Memory Optimization · Deep Learning Inference · Tensor Processing Unit

About

Ambitious, results-oriented software engineering professional. A continuous learner with a strong interest in signal processing, processor architecture, large language modelling and AI in general.

Experience:
  • Development of deep learning and image processing kernels for performance-intensive processors (DSP, vector processor).
  • Assembly and intrinsics programming (SIMD, VLIW).
  • Development of C# and C++ based desktop software (WPF, Crystal Reports and SQL).
  • Board bring-up of TI's TDA2xx with 4 x AR12xx radar modules for a multi-mode radar system.
  • Development of an Active Noise Cancellation (ANC) system, specifically modelling the secondary path on a DSP board (TI C6713) and in MATLAB.

Enthusiastic about:
  • Machine learning and deep learning
  • Algorithms
  • Digital signal processing: audio, image and video processing
  • Large language models
  • Web technologies

Experience

Qualcomm

Senior Lead Engineer

Nov 2023 - Present · 2 yrs 4 mos · Bangalore Urban, Karnataka, India · On-site

  • DirectML framework/interface development for the Qualcomm Hexagon NPU, built on the Qualcomm Neural Networks (QNN) library.
  • MLOPS interface support (Conv, Matmul, Normalization, Activations, Pooling, etc.).
  • ONNX graph converter optimizations (axis tracking, layout-agnostic op support).
  • Graph performance optimization (layout tracking, graph splitting, tiling, variable frequency support).
  • Graph-to-graph optimization.
  • FP16 ONNX model accuracy improvements.
  • Model tools and ML sample app development/support.
DirectML Framework · Qualcomm Neural Networks (QNN) · MLOPS · ONNX · Graph Performance Optimization · Artificial Intelligence (AI)

Cerebras Systems

Member Of Technical Staff

Jun 2022 - Nov 2023 · 1 yr 5 mos · Bengaluru, Karnataka, India

  • Development of a unit test framework covering all kernels for large language models (LLMs) (GPT-3 tiny and UL2), tested in a simulator.
  • Development of subgraphs for different LLMs in PyTorch and KMIR, along with the testing infrastructure for these subgraphs.
  • Automated LLM generation for different compile variants, reducing the time taken per job from 1 day to 10 minutes.
Unit Test Framework · Large Language Models (LLMs) · PyTorch · Automation · Machine Learning · Deep Learning

AMD

Software Engineer II

May 2020 - Jun 2022 · 2 yrs 1 mo

  • Development of the open-source BLIS / BLAS (Basic Linear Algebra Subprograms) library for x86 multicore, multithreaded processors (AMD EPYC-based Rome and Milan).
  • General Matrix Multiplication (GEMM) and linear algebra API development using AVX2 for real and complex datatypes, including vector programming, memory optimization, and benchmarking against OpenBLAS and MKL.
  • Development of math libraries using AVX2 and AVX-512 for server CPUs, benchmarked against glibc and MKL.
BLIS · BLAS · AVX2 · Memory Optimization · High Performance Computing (HPC) · Software Development

KPIT

Senior Software Engineer

Jun 2016 - May 2020 · 3 yrs 11 mos · Greater Bengaluru Area · On-site

  • Development of deep learning inference and training kernels (convolution, activation functions, normalization, bitonic sort, etc.) for a Tensor Processing Unit (a GPU-like, highly vectorized processor), including 4-way VLIW optimization and multicore tensor processing.
  • Vector programming with intrinsics, backed by C++ reference implementations and a native C++ framework for linking the kernels.
  • Development of fixed-point (16-bit) and gemmlowp (8-bit) quantization kernels for improved algorithm performance, with unit tests for each algorithm.
  • Optimization of radar algorithms on the 8-way VLIW C66xx DSP, including linear assembly optimizations.
  • Member of the Deep Learning and AI group working on a CNN-based pose-detection algorithm.
Deep Learning Inference · Tensor Processing Unit · C++ · Quantization · Machine Learning · Deep Learning

Manipal Institute of Technology

Teaching Assistant

Sep 2014 - Apr 2015 · 7 mos · Manipal, Udupi, Karnataka

  • Helped undergraduate students with Digital Signal Processing and Analog Circuits.

Education

Manipal Institute of Technology

Master of Technology (M.Tech.) — Digital Electronics and Advanced Communication

Jan 2014 - Jan 2016

Gujarat Technological University

Bachelor of Engineering (B.E.) — Electronics and Communications Engineering

Jan 2009 - Jan 2013

Delhi Public School - Ahmedabad

HSC and SSC — Science

Jan 2006 - Jan 2009
