Surbhi Kakarya

Software Engineer

Canada · 6 yrs 8 mos experience
AI Enabled · AI ML Practitioner

Key Highlights

  • Expert in GPU virtualization and Linux kernel debugging.
  • Led development of AI security solutions at Edera.
  • Significantly improved AI training performance at AMD.
Stackforce AI infers this person is a highly skilled software engineer specializing in GPU virtualization and AI security solutions.

Skills

Core Skills

GPU Virtualization · AI Workloads · Linux Kernel · Embedded Systems · Software Development · Security Software

Other Skills

Nvidia GPU · Kubernetes · Xen PV · Rust · GPU drivers · hypervisor · PCIe SR-IOV · VMware vSphere · Linux kernel debugging · QEMU-KVM · SR-IOV · VM live migration · AI training performance · Embedded Linux · QT/QML

About

With a focus on enhancing AMD's hardware virtualization solutions, our team has been pivotal in integrating SR-IOV technology into cloud services like Azure and AWS. This has enabled efficient sharing of PCIe hardware resources among virtual machines, significantly enhancing system performance and resource utilization. My expertise in Linux kernel debugging has played a crucial role in developing robust virtualization support across platforms. Leveraging skills in C and Qt, I previously contributed to the communication layer and user interface for electric vehicle charging stations, showcasing a versatile skill set in software development for diverse applications.

Experience

Edera

Staff Software Engineer

Mar 2025 - Present · 1 yr · Ontario, Canada

  • Lead efforts to enable AMD and Nvidia GPUs in Edera Protect AI in Xen PV and PVH configurations.
  • Design and architect the Edera product to support Nvidia GPUs in Kubernetes (K8s).
  • Hands-on experience building and running AI workloads in non-K8s and K8s environments using the Nvidia GPU Operator, Nvidia Device Plugin, CUDA driver/runtime, and Nvidia open-source driver.
  • Edera Protect AI is built to secure AI workloads, with a particular emphasis on GPU-accelerated tasks. Its architecture isolates GPU drivers and workloads into separate zones, preventing vulnerabilities in device drivers from compromising the host kernel or other workloads.
  • Unlike traditional container runtimes that rely on OS namespaces, Edera Protect AI treats each container as a virtual machine guest, delivering isolation at the hypervisor level using a type-1 microkernel hypervisor based on Xen, paired with a memory-safe Rust control plane.
Nvidia GPU · Kubernetes · AI workloads · Xen PV · Rust · GPU Virtualization

AMD

Senior Software Development Engineer

Mar 2020 - Mar 2025 · 5 yrs · Canada

  • Developed and maintained AMD MI Series Hardware Accelerator supporting GPU virtualization solutions, enabling PCIe Single Root I/O Virtualization (SR-IOV) to allow multiple virtual machines (VMs) to share a single physical GPU resource.
  • Developed and optimized GPU virtualization features like Device Groups and Device Virtualization Extension for AMD MI200 Series Accelerators on VMware vSphere 8.0. This effort enhanced scalability, helping increase the ESXi host count from 400 to 1000, and integrated VM live migration, improving operational efficiency and workload flexibility.
  • Published the software stack and driver installation guide for both internal and external stakeholders, facilitating seamless implementation after extensive verification.
  • Contributed to enhanced virtualization features like XNACK and Compute Partitioning for AMD MI300 and MI325 Accelerators on the Amazon EC2(AMS) platform and Alibaba Cloud, which is based on the QEMU-KVM hypervisor, and improved AI training performance by 6.8x.
  • Triaged and debugged critical bugs and led them to resolution, leveraging strong Linux kernel debugging skills; submitted Git patches for code review and merged them into the remote repository to achieve milestone releases.
  • Mentored junior engineers, providing guidance and knowledge sharing to meet project goals.
  • Submitted Linux open-source patches to the kernel module (drm/amdgpu), approved by the open-source community: https://patchwork.freedesktop.org/project/amd-xorg-ddx/list/?submitter=19763&state=*&q=&archive=both&delegate=
GPU virtualization · PCIe SR-IOV · VMware vSphere · Linux kernel debugging · QEMU-KVM

ChargePoint

Software Embedded Developer

Feb 2018 - Jan 2020 · 1 yr 11 mos · Gurgaon, Haryana, India

  • Leveraging familiarity with Embedded Linux, contributed to the development of an EV charging software platform designed to enable organizations, ranging from automakers to workplaces and fleets, to maximize the efficiency and benefits of their EV charging operations while providing an optimal driver experience.
  • Collaborated with a team to design and implement a scalable abstraction layer, enhancing software reusability across multiple products. This solution integrated the backend with a unified frontend, reducing user interface latency and improving the overall customer experience, while contributing to the timely delivery of major features and releases.
  • Migrated from a Flash-based UI framework to a more robust Qt/QML-based user interface, enhancing performance and maintainability, and deployed it worldwide on more than 25,000 charging stations.
  • Led the development of a Qt-based application for automotive infotainment systems, improving UI performance and responsiveness.
Embedded Linux · QT/QML · EV charging software · UI performance · Embedded Systems · Software Development

McAfee

Software Development Engineer

Sep 2015 - Feb 2017 · 1 yr 5 mos · Gurgaon, Haryana, India

  • Contributed to McAfee's Endpoint Detection and Response (EDR) security product, focusing on automation, adaptability, and continuous monitoring for effective threat prevention.
  • Developed a Linux-based EDR tracing system for behavioral monitoring of endpoint processes.
  • Captured key system events (file operations, process lifecycle, and genealogy) and enforced security policies, generating reports for cloud/ePO integration and threat remediation.
  • McAfee Embedded Control (MEC) on Android Devices
  • Collaborated with the Linux team to deliver McAfee Embedded Control (MEC) software for Intel’s SOFIA smartphone Android chip.
  • Added support for ARM and x86 Android SDK toolchains, enabling cross-compilation of MEC software.
Linux-based EDR · Behavioral monitoring · Automation · Security Software

Intel Corporation

Graphic Software Engineer

Jul 2013 - Sep 2015 · 2 yrs 2 mos · Bengaluru Area, India

  • Served as Android Display Driver Debug Engineer (Intel i915 driver), acting as a bridge between developers and validation engineers.
  • Utilized various debug tools to gather detailed information, accelerating the resolution of critical issues and ensuring timely closure of JIRA tickets.
  • Android Board Initialization and Setup Activities.
Android Display Driver · Debug tools · JIRA · Software Development

Consilium Unified Communications

Intern

Jan 2013 - Jun 2013 · 5 mos · Gurgaon, India

  • 1. Wrote a utility in C# with Microsoft Visual Studio 2010 and Microsoft SQL Server to enhance the productivity of Unified Communication as a Service (UCaaS), which provides enterprise-level communication solutions and multi-vendor communication support to customers.
  • 2. Developed test cases for modules to ensure that they fulfil the requirements for which they were designed and meet quality expectations.
C# · SQL Server · Unified Communication

Cadence Design Systems

Summer Intern

May 2012 - Jul 2012 · 2 mos · Noida Area, India

  • 1. Worked with the VSR-Noida team on fixing defects using Coverity Static Analysis.
  • 2. Wrote a utility in C++ to dump the constraints (associated with a cell) in XML format.
Coverity Static Analysis · C++

Birla Institute of Technology and Science, Pilani

Teaching Assistant

Aug 2011 - May 2012 · 9 mos · Jhunjhunun Area, India

  • 1. Assisted in labs for students enrolled in the C Programming course.
  • 2. Assisted in labs for students enrolled in the Advanced Computer Architecture course.
  • Tools: Verilog, with experience on ModelSim SE 6.1d.
Verilog · Model Sim

Cadence Design Systems

Summer Intern

May 2011 - Jul 2011 · 2 mos · Noida Area, India

  • 1. Worked with the VSR-Noida team on enhancing the productivity of the Virtuoso Space Router (VSR) build by writing scripts and the reporting format of the regression utility in the Tcl-Tk scripting language.
  • 2. Wrote a utility in C++ to generate an XML file from a given set of parameters. This utility was integrated into their product.
Tcl-Tk · C++

Education

Birla Institute of Technology and Science, Pilani

Master's Degree — Computer Science

Jan 2011 - Jan 2013

BIT Mesra Student-Industry Relations Cell

Bachelor's Degree — Computer Science

Jan 2007 - Jan 2011

Holy Child Senior Secondary School

High School — PCM + B (CBSE)