Chinmay Kulkarni

Software Engineer

Bengaluru, Karnataka, India · 2 yrs 10 mos experience

Highly Stable

Key Highlights

Awarded Quarterly Spotlight Award thrice for high performance.
Presented at AMD internal conferences, winning the Best Poster award.
Specializes in optimizing deep learning models for hardware efficiency.

Stackforce AI infers this person is a Deep Learning Engineer specializing in performance optimization within the SaaS industry.

Skills

Core Skills

Deep Learning · Performance Optimization · Machine Learning · Model Optimization · Image Processing · Data Analysis

Other Skills

C++ · Python · PyTorch · TensorFlow · Pretraining · Keras · Data Science · Research Projects · Natural Language Processing (NLP) · Computer Vision · Mathematics · Quantum Computing · R (Programming Language) · Data Structures · Tabla

About

I am a Software Development Engineer specializing in bridging the gap between deep learning architectures and hardware efficiency. At AMD, I focus on optimizing the performance of Large Language Models (LLMs) and recommendation systems on EPYC™ CPUs. My work involves dissecting PyTorch/TensorFlow internals, performing ML graph surgery, and developing custom C++ operators to maximize throughput. I have presented my team's work at two internal conferences, winning the runner-up award in 2024 and the Best Poster award in 2025, along with recognition from senior management. I have also received the Quarterly Spotlight Award for High Performance three times.

Beyond systems engineering, I spend my free time exploring multilingual pre-training techniques, specifically how tokenizer-free designs can better represent morphologically rich languages. I am currently experimenting with hierarchical transformer models (inspired by CANINE and ByT5) for efficient character-level generation in Sanskrit and other Indic languages.

I hold a Bachelor's degree in Computer Science from PES University, where I gained skills in Machine Learning, Python, Deep Learning, and some Quantum Computing. During my undergraduate studies, I co-authored publications on Visual Transformers for Video QA, autoencoder-based anomaly detection, and weighted loss functions (FLAIRS-23, IPMU-22, iiWAS-21).

Experience

Total Experience: 2 yrs 10 mos

Average Tenure: 2 yrs 10 mos

Current Experience: 2 yrs 10 mos

AMD

3 roles

Software Development Engineer 2

Promoted

Dec 2024 – Present · 1 yr 5 mos

Quarterly Spotlight Award for High Performance — Q4, 2025
Spearheaded the integration of performance-critical kernels (matmul, embedding bag, quantized matmul) from a new library backend for deep learning inference on the AMD Zen architecture, establishing a modular and maintainable integration blueprint.
Presented the team's work at an internal conference, winning the Best Poster award and recognition from senior management.
Working on performance optimization of Hugging Face large language models on AMD EPYC CPUs, using PyTorch's nn.Module representation of deep learning models and the torch.compile flow.
Responsible for developing optimized Python classes for LLMs and for replacing native PyTorch classes with their optimized counterparts at runtime.
Developing custom operators in C++ and Python using PyTorch's TORCH_LIBRARY operator-registration API to improve model performance.
Quarterly Spotlight Award for High Performance — Q1, 2025

C++ · Python · PyTorch · TensorFlow · Deep Learning · Performance Optimization

Software Development Engineer 1

Jun 2023 – Nov 2024 · 1 yr 5 mos

Presented the team's work at AMD Asia Technical Conference 2024, an internal conference, winning the runner-up award and recognition from senior management.
Quarterly Spotlight Award for High Performance — Q2, 2024
Worked on performance optimization of deep learning recommender models and large language models using torch.fx, PyTorch's graph representation of deep learning models (https://pytorch.org/docs/stable/fx.html).
Implemented graph optimizations and operator fusions for more efficient execution of deep learning models.
Improved model performance in both the Python frontend and the C++ backend.

Python · C++ · PyTorch · Deep Learning · Performance Optimization

Machine Learning Intern

Jan 2023 – May 2023 · 4 mos

Quarterly Spotlight Award for High Performance — Q2, 2023
Part of the team that presented a paper at the AMD Asia Technical Conference 2023.
Worked on ML graph surgery and optimization using the TensorFlow graph representation of machine learning models (https://www.tensorflow.org/api_docs/python/tf/Graph).
Worked on developing an internal tool for graph-level model optimization, achieving up to 20% higher throughput on various CNN-based models.
Regularly performed accuracy and performance benchmarking for over 25 machine learning models and helped automate this benchmarking.

Python · TensorFlow · Machine Learning · Model Optimization

PESU Venture Labs

Research Intern

Nov 2021 – Jan 2022 · 2 mos · Bengaluru, Karnataka, India

Worked on a product that tracks a person's posture.
Experimented with several popular image-processing architectures using transfer learning.
Explored a novel approach, similar to one-shot learning, to address the scarcity of training data.

Image Processing · TensorFlow · Machine Learning

Center for Data Sciences and Applied Machine Learning (CDSAML)

Research Intern

May 2020 – Jul 2021 · 1 yr 2 mos · Bengaluru, Karnataka, India

Using a bank's dataset of normal, fraudulent, and anomalous transactions, we tried to identify the transaction parameters that make a transaction fraudulent.
We then tried to generalize these findings so that future transactions whose parameters match the identified patterns could be flagged and prevented.

Image Processing · TensorFlow · Data Analysis · Machine Learning