Kali Uday B.

CTO

Bengaluru, Karnataka, India · 10 yrs 11 mos experience
AI ML Practitioner · AI Enabled

Key Highlights

  • Over a decade of experience in HPC and AI.
  • Expert in performance tuning and kernel optimizations.
  • Led teams in developing cutting-edge AI applications.
Stackforce AI infers this person is a specialist in HPC and AI, focusing on performance optimization and FPGA acceleration.

Skills

Core Skills

AI/ML Application Development · Performance Benchmarking · FPGA Acceleration · Compression Algorithms · Data Compression · AI/ML Acceleration · CNN Inference · Video Enhancement · GPU Acceleration

Other Skills

AIPC · Algorithm Design · Apple Metal · Artificial Intelligence (AI) · Attention Mechanisms · BERT (Language Model) · Bioinformatics · C · C++ · C/C++ · CUDA · Code Optimization · DPC++ · Data Structures · Digital Image Processing

About

An experienced High-Performance Computing (HPC) and Artificial Intelligence R&D engineer with a successful career spanning over a decade at various reputed semiconductor and HPC/AI organizations. Specialized in application acceleration, performance tuning and kernel optimization targeting different HPC architectures: multi-core, many-core (GPUs) and spatial (FPGAs).

Key Technical Skills:

  • C/C++, Python, Julia, Rust and scripting
  • Parallel programming (MPI, OpenMP, CUDA, SYCL, OpenCL and Vitis High-Level Synthesis)
  • Expert in building RAG-based genetic AI systems on-premise
  • ML frameworks: PyTorch, TensorFlow, JAX, OpenVINO, DirectML, DeepSpeed
  • AI/ML compiler development: Triton
  • Artificial intelligence, data compression, genomics and networking
  • Application tuning, performance and power analysis on CPU, GPU and FPGA architectures
  • HPC architecture design and implementation
  • Large-scale cluster and grid computing
  • Algorithm optimization
  • Exposure to various analyzers: Intel (VTune, APS, GTPin and Advisor), Nvidia (Nsight Compute, NSys, NvProf) and AMD (Vitis, CodeXL and Vivado)
  • Experienced in development on Linux and Windows

Key Non-Technical Skills:

  • Excellent presentation skills
  • Mentored engineers at various levels
  • Led teams ranging in size from 3 to 7 members
  • Played a key role in establishing teams on multiple occasions
  • Demonstrated a customer-first approach

Experience

Applied Materials

Deputy Director Software (HPC/AI)

Aug 2025 - Present · 7 mos · Bengaluru, Karnataka, India · On-site

Intel Corporation

Staff Software Engineer (HPC/AI)

Sep 2022 - Aug 2025 · 2 yrs 11 mos · Hyderabad, Telangana, India · On-site

  • AI/ML Application Development on Linux/Windows
  • Enabled the OpenAI Triton Backend for Intel GPUs; Triton (AI/ML) Compiler Development
  • Developed AIPC-based GPU/NPU AI/ML applications on Linux/Windows
  • Led a team of 6+ engineers on OpenVINO and AI workload development targeting GPUs/NPUs
  • Enabled oneAPI for the Julia Programming Language
  • Enabled oneAPI for the Rust Programming Language
  • Enabled ChipStar benchmarks (HeCBench) and supported customer enablement
  • Key member on an Intel Analyzers team customer project targeting large-scale HPC clusters
  • Performance Benchmarking and Tuning of ML applications targeting Intel GPUs (SYCL/DPC++)
  • Improved the quality of the Intel GPU software stack (Level Zero / oneAPI)
  • Received multiple recognitions (3 Division Level Recognition Awards; Executive level: CustomerFirst, oneIntel and ResultsDriven) for contributions to product improvement and pathfinding work
  • Mentored junior engineers on HPC & AI
  • Hands On: C/C++, SYCL, CUDA, OpenCL, Level Zero, JuliaGPU, Apple Metal and Rust
  • AI/ML Frameworks: Triton, PyTorch, OpenVINO, TensorFlow, JAX, DirectML
C/C++ · SYCL · CUDA · OpenCL · Level Zero · JuliaGPU

Xilinx

2 roles

Senior Software Engineer - II

Promoted

May 2019 - Sep 2022 · 3 yrs 4 mos

  • Architected and implemented a highly optimized FPGA-accelerated compression library (OpenCL/HLS) covering well-known algorithms such as GZip, LZ4, ZLIB and Snappy.
  • Main architect of the Libz compression library for the Cloudera-Hadoop acceleration (Map/Reduce) framework and its entire software stack.
  • Migrated various genomic pipeline algorithms from Xilinx SDAccel to Vitis targeting Alveo U50.
  • Owner of a network traffic generator simulating the Alveo network-enabled FPGA (X3) architecture, which helps in software/hardware emulation.
  • Enabled various key customers in data compression acceleration targeting discrete FPGA platforms (Alveo SmartSSD, U50 and U250).
  • Mentored engineers and interns within and outside the team on discrete FPGA acceleration using the OpenCL/HLS software stack on Vitis.
  • Led a project to publish an FPGA-accelerated GZip app for on-premise, Docker and AWS deployment to speed up customer adoption.
  • Received a customer appreciation and was nominated for a worldwide award for SmartSSD acceleration of an LZ4 application delivered in quick time.
  • Successfully conducted the GO-PYNQ contest at IIT Kharagpur (Kshitiz).
  • Hands On: C/C++, Python, Scripting, OpenCL and HLS. Experience with the AMD Vitis tool.
C/C++ · Python · OpenCL · HLS · FPGA Acceleration · Compression Algorithms

Senior Software Engineer - I

Jan 2017 - May 2019 · 2 yrs 4 mos

  • Worked on Xilinx heterogeneous and embedded FPGAs (HLS & OpenCL).
  • Developed high-quality SDx (SDAccel & SDSoC) on-boarding applications that showcase new features and best practices for end users of the SDx tool.
  • Accelerated data compression algorithms using SDAccel OpenCL targeting Xilinx PCIe FPGA cards.
  • Provided solutions to new users on the Xilinx SDAccel & SDSoC forums.
  • Built data-center-adoptable FPGA-accelerated data compression applications such as GZip and LZ4, targeting FPGA clouds: AWS F1, Alibaba and Nimbix Cloud.
  • Hands On: C/C++, Shell scripting and Xilinx SDx [SDAccel & SDSoC] (C/C++, HLS and OpenCL)
C/C++ · OpenCL · FPGA Acceleration · Data Compression

MulticoreWare

2 roles

Senior Software Engineer

Promoted

May 2015 - Jan 2017 · 1 yr 8 mos

  • Developed an OpenCL-based FPGA acceleration of a CNN inference AI workload.
  • Worked on-site in China on Google VP9 decoder acceleration using RenderScript.
  • Implemented a demo presenting vehicle detection using FPGAs with the Xilinx SDAccel/OpenCL environment.
  • Worked on a drone-based face classifier using CNNs on Jetson TK1 & K40 (CUDA).
  • Published a research article and presented a poster at the High Performance Computing Conference (HiPC 2015).
  • Worked on acceleration of genome sequence alignment using SDAccel (HLS & OpenCL flows).
  • Worked on optimization of the AlexNet model using ImageNet data for a Virtex-7 FPGA using SDAccel.
  • Hands On: C/C++, OpenCL
C/C++ · OpenCL · AI/ML Acceleration · CNN Inference

Software Engineer

May 2013 - May 2015 · 2 yrs

  • Implemented a video enhancement application targeting FPGA.
  • Worked on acceleration of the VP9 decoder using RenderScript & OpenCL for mobile GPUs.
  • Worked on memory optimizations of a few blocks in the VP9 decoder (inter, intra and loop filter).
  • Worked on power optimizations of the VP9 mobile decoder.
  • Led a team of 6 members working on ray tracing application acceleration (Sep 2014 - Mar 2015).
  • Designed and implemented multiple GPU kernels and achieved best performance for Cycles render.
  • Implemented concurrent OpenCL kernels that utilize multiple hardware devices (CPU & GPU).
  • Published a research article on medical imaging at an IEEE-sponsored conference, co-authored with an intern.
  • Hands On: C/C++, OpenCL, CUDA, Google RenderScript and Java
C/C++ · OpenCL · CUDA · Video Enhancement · GPU Acceleration

Education

Sri Sathya Sai Institute of Higher Learning

Mtech — Computer Science specialization in High Performance Computing

Jan 2011 - Jan 2013

Sri Sathya Sai Institute of Higher Learning

Msc — Mathematics and Computer Science

Jun 2009 - Mar 2011
