Prabu Sekar

Software Engineer

Abu Dhabi, United Arab Emirates · 6 yrs 7 mos experience

Highly Stable

Key Highlights

Expert in deploying and configuring HPC clusters.
Proficient in managing computational workflows with Slurm.
Strong background in troubleshooting complex system issues.

Stackforce AI infers this person is a High Performance Computing Infrastructure Specialist.

Skills

Core Skills

High Performance Computing (HPC) · Slurm Workload Manager · Linux System Administration

Other Skills

AMD MI210 GPGPUs · Analytical Skills · Ansible · Bash · Big Data · CentOS · Cloud Computing · Cluster · Cluster Management · Creative Problem Solving · Data Center · Distributed Systems · GPFS · HPL · Hardware Diagnostics

About

With over two years of experience at Core42, I specialize in deploying and configuring high-performance computing (HPC) clusters. My work focuses on leveraging Slurm to manage computational workflows across thousands of NVIDIA H100/H200 and AMD GPGPUs, ensuring optimal performance through benchmarking tools like NCCL, RCCL, and HPL. I also facilitate seamless integration of Azure-based HPC clusters with distributed workload storage solutions. At Core42, I contribute to diagnosing and resolving complex system issues across Slurm clusters, InfiniBand networks, and NVIDIA DGX/HGX systems. My role includes providing L3 operational support for HPC platforms, enhancing system reliability, and advancing computational capabilities through collaborative problem-solving and innovative system optimization.

Experience

Total Experience: 6 yrs 7 mos

Average Tenure: 2 yrs 2 mos

Current Experience: 2 yrs 5 mos

Core42

Engineer - HPC Systems

Dec 2023 – Present · 2 yrs 5 mos · Abu Dhabi Emirate, United Arab Emirates · Hybrid

As an Engineer at AI Factory, Core42 in Abu Dhabi, I have led the deployment and configuration of high-performance computing (HPC) clusters using Slurm across thousands of NVIDIA H100/H200 and AMD MI210 GPGPUs. My responsibilities included post-deployment benchmarking using rail-optimized NCCL/RCCL and HPL/rocHPL to validate system performance, and diagnosing complex issues across NVIDIA DGX and HGX systems, Slurm clusters, and InfiniBand networks.
I also deployed Slurm-based HPC clusters in Microsoft Azure, integrating thousands of NVIDIA H200 GPUs with Azure Managed Lustre for distributed workload storage.
In addition to deployment, I actively provide L3 operational support across all HPC platforms and participate in hands-on cluster migration activities and troubleshooting. I maintain comprehensive documentation for system configurations, operational procedures, and best practices to support ongoing platform stability and scalability.

Analytical Skills · Ansible · Problem Solving · Creative Problem Solving · Distributed Systems · Optimization Techniques · +3

Lenovo PCCW Solutions

Senior System Engineer

Sep 2022 – Nov 2023 · 1 yr 2 mos · Singapore · On-site

Analytical Skills · Problem Solving · Bash

Nanyang Technological University

Research Engineer, HPC

Sep 2019 – Oct 2022 · 3 yrs 1 mo · Singapore

Lead for HPCC (High Performance Computing Cluster) and Linux infrastructure. Designed, planned, budgeted, and implemented HPC, storage, and data center facilities. Managed Linux-based solutions and the data center.

Analytical Skills · Isilon · PBS · Slurm Workload Manager · GPFS · Problem Solving · +6

Harrington HPC Microsystems Ltd

HPC Systems Engineer

Mar 2017 – Aug 2019 · 2 yrs 5 mos · Abu Dhabi, United Arab Emirates

> Deputed to Khalifa University as part of the Research Computing Department, where I was the main sysadmin for one of our HPC clusters.
> Responsible for managing all computing and storage hardware, liaising with vendors, and installing and maintaining software. I was also the first point of contact for all the cluster’s users.
> Worked closely with researchers and other computational specialists at all levels to understand their computational requirements and to advise on how research computing could help address them.

Analytical Skills · Problem Solving · Optimization Techniques · Bash

Micropoint Computers Ltd.

HPC Technical Support Engineer

Aug 2014 – Feb 2017 · 2 yrs 6 mos · Bengaluru Area, India

> Architected, installed, and configured the complete hardware and software of High-Performance Computing solutions for various institutes across India.
> Deployed several Linux-based HPC clusters with GPU accelerators and IB interconnects, tailored to each customer's needs.
> Troubleshot and optimised customer applications for use on IB-interconnected clusters.
> Provided training and hands-on sessions for HPC users, including institute faculty and students.

Analytical Skills · Problem Solving · Bash

Indian Institute of Technology, Madras

Project Technician in HPC

Oct 2011 – Aug 2014 · 2 yrs 10 mos · Chennai Area, India

Part of the technical team that administered and supported the institute's HPC cluster, identified and implemented solutions to problems affecting it, and provided user-level support.

Analytical Skills · Problem Solving · Bash