Raghul Vasudevan

Software Engineer

Bengaluru, Karnataka, India · 7 yrs 9 mos experience
Most Likely To Switch: Highly Stable

Key Highlights

  • Expert in High Performance Computing management.
  • Developed monitoring solutions for top-tier HPC systems.
  • Strong background in software development and automation.
Stackforce AI infers this person is a High Performance Computing specialist with strong software engineering skills.

Skills

Core Skills

High Performance Computing (HPC) · Cluster Management · Test Automation · Software Development

Other Skills

Apache Kafka · Basics of Machine Learning · Customer Service · Data Analysis · Databases · GitHub · Grafana · Leadership · Linux · Logstash · Multithreading · OpenSearch DB · Python (Programming Language) · Shell Scripting · Slurm Workload Manager

About

Experienced Senior Software Developer specializing in the management and monitoring of High Performance Computing (HPC) clusters. Key contributor to a dynamic team focused on developing and enhancing HPCM (HPE Performance Cluster Manager), the management software for some of the world's largest HPC systems, including Frontier, Aurora, and SHAHEEN III. Proficient in creating monitoring and alerting frameworks and dashboards. Skilled in telemetry and alert collection methods for the various HPC components, with a track record of developing scalable monitoring solutions for at-scale cluster management. Passionate about driving innovation in the HPC space.

Experience

Hewlett Packard Enterprise

4 roles

HPC System Software Engineer III

Promoted

Jan 2024 - Present · 2 yrs 2 mos

  • Working on HPCM (HPE Performance Cluster Manager) and delivering the monitoring solution platform for the world's largest and fastest HPC clusters deployed by HPE.
  • Delivering cluster data monitoring solutions covering cluster health, cooling and power devices, system monitoring (compute and non-compute nodes), workload managers, and the Slingshot interconnect.
  • WLM (SLURM/PBS): node monitoring along with job-level energy and power consumption monitoring, with power capping support.
  • Grafana dashboards for visualising different data variants across the cluster/system.
Customer Service · Linux · Apache Kafka · Cluster Management · Python (Programming Language) · Solution Architecture · +3 more

HPC Software Engineer II

Promoted

Apr 2021 - Jan 2024 · 2 yrs 9 mos

  • Worked on HPCM (HPE Performance Cluster Manager) and delivered the monitoring solution platform for the world's largest and fastest HPC clusters deployed by HPE.
  • Delivered cluster data monitoring solutions covering cluster health, cooling and power devices, system monitoring (compute and non-compute nodes), workload managers, and the Slingshot interconnect.
  • WLM (SLURM/PBS): node monitoring along with job-level energy and power consumption monitoring, with power capping support.
  • Grafana dashboards for visualising different data variants across the cluster.
  • Unified alerting framework (with OpenSearch Alerting and Grafana Alerting) in HPCM to monitor and alert on events from the various hardware and software components in the cluster.
  • Deployed the monitoring features on many active supercomputers, including Frontier, Aramco, CINES, and Aurora.
Leadership · Databases · Linux · OpenSearch DB · Apache Kafka · Python (Programming Language) · +10 more

System Software Engineer

Jul 2018 - Apr 2021 · 2 yrs 9 mos

  • Successfully managed various JDK software releases (Java 7, Java 8, and Java 11) and a supporting JVM performance analysis tool for HP-UX Integrity/PA-RISC servers.
  • Successfully automated the critical test suites used by manual testers to test HPE Edgeline and Moonshot server functionality, using the Python Robot Framework.
  • Worked on critical customer defects across various stages.
Linux · Python (Programming Language) · Test Automation · Shell Scripting · GitHub · Software Development

Intern

Jan 2018 - Jul 2018 · 6 mos

  • Briefly worked with Intel on benchmarking Cassandra DB cluster performance with NVMe (Non-Volatile Memory Express) and persistent memory.
  • Conducted various benchmarks on the Cassandra cluster, compared performance between NVMe and persistent memory to identify performance gains, and shared the analysis with Intel for performance improvement.
  • Contributed to the open-source tool PAT (Performance Analysis Tool), which helps analyze performance on Linux.
Linux · Shell Scripting · GitHub

Education

Birla Institute of Technology and Science, Pilani

Master of Technology - MTech — Software Systems

Jan 2019 - Jan 2021

PSG College of Technology

Bachelor of Technology - BTech — Information Technology

Jan 2014 - Jan 2018

Noble Matriculation Higher Secondary School

Computer Science

Jan 2012 - Jan 2014
