Usman M.

CEO

Washington, DC, United States

14 yrs 8 mos experience

Most Likely To Switch

Key Highlights

Increased HPC cluster capacity from 98% to 400%
Reduced deployment time from hours to minutes
Delivered 99.9%+ uptime across multi-thousand node environments

Stackforce AI infers this person is a Senior Infrastructure Engineer specializing in AI and HPC workloads for enterprise and public sector.

Skills

Core Skills

Infrastructure Engineering, Cloud Architecture, HPC Engineering, Infrastructure Management, DevOps Engineering, Cloud Infrastructure, Systems Engineering, Infrastructure Modernization

Other Skills

Linux, AWS, Kubernetes, Docker, Ansible, Terraform, CI/CD, PostgreSQL, Elasticsearch, HPC schedulers, observability platforms, Automation, HPC, Storage Solutions, IBM LSF

About

Senior Infrastructure and Platform Engineer with 12+ years designing, building, and operating large-scale Linux and cloud environments for enterprise, research, telecom, and federal systems. Specialize in compute-intensive platforms, AI and HPC workloads, distributed systems, and automation-driven infrastructure across hybrid and cloud environments. Strong background delivering reliable, high-throughput platforms supporting mission-critical services and large data processing.

Key strengths include Kubernetes platforms, GPU compute environments, batch scheduling, parallel storage, cloud architecture, and infrastructure as code.

Selected impact:
• Increased HPC cluster capacity from 98% to 400% through automated resource optimization
• Reduced deployment time from hours to minutes via infrastructure automation
• Delivered 99.9%+ uptime across multi-thousand node environments
• Migrated large data systems with zero downtime and high data integrity
• Built secure, hardened platforms for enterprise and federal workloads

Extensive experience across AWS, Linux at scale, container orchestration, observability platforms, and distributed services. Open to remote senior roles and contract engagements focused on AI infrastructure, platform engineering, SRE, HPC, or large-scale cloud systems.

Experience

Total Experience: 14 yrs 8 mos

Average Tenure: 5 yrs 3 mos

Current Experience: 14 yrs 8 mos

Cadence Design Systems

Senior HPC / Linux Systems Engineer

Jan 2020 – Jul 2022 · 2 yrs 6 mos · California, United States

Operated large-scale HPC infrastructure supporting semiconductor design and engineering simulations.
Managed IBM LSF clusters across ~1700 servers with 99.8% uptime
Increased effective compute capacity to 400% via automated resource optimization
Implemented containerized workloads to reduce resource conflicts
Automated provisioning using kickstart and configuration management tools
Deployed and optimized GPU compute environments
Maintained core enterprise Linux services across heterogeneous systems

IBM LSF, Linux, GPU compute environments, Containerization, Automation, HPC Engineering

Etisalat

Senior DevOps Engineer

Mar 2018 – Dec 2019 · 1 yr 9 mos · Downtown, Dubai

Built cloud platforms and automation systems for large telecom services supporting hundreds of thousands of users.
Designed CI/CD pipelines for enterprise application platforms
Architected multi-AZ AWS environments with automated failover
Managed containerized services and autoscaling infrastructure
Implemented monitoring, alerting, and operational automation
Automated configuration across large Linux server fleets

CI/CD, AWS, Containerization, Monitoring, Automation, DevOps Engineering

Al Khaleej International Pvt. School

Systems Engineer / DevOps

Jan 2016 – Feb 2018 · 2 yrs 1 mo · Sharjah, United Arab Emirates

Led infrastructure modernization including virtualization, cloud adoption, network redesign, and centralized services for large campus environments.

Virtualization, Cloud adoption, Network redesign, Systems Engineering, Infrastructure Modernization

Kryptohive

Independent Infrastructure & Platform Consultant

Sep 2011 – Present · 14 yrs 8 mos · United States · Remote

Architect and operate large-scale Linux, cloud, and HPC environments for enterprise, telecom, research, and public sector clients.
Selected engagements:

Georgia Institute of Technology:
Automated infrastructure provisioning, reducing deployment effort by ~70%
Led large Elasticsearch migration with zero downtime
Built PostgreSQL high-availability cluster achieving 99.95% uptime
Deployed enterprise monitoring platform across distributed environments

General Electric:
Designed AWS ParallelCluster HPC platform supporting engineering simulations
Implemented high-throughput storage using FSx for Lustre
Built automated infrastructure using CloudFormation and CI/CD pipelines
Delivered GPU-enabled remote visualization environment

Public Sector Systems:
Delivered hardened infrastructure automation for sensitive environments
Implemented monitoring, patching, and configuration management at scale

Technologies: Linux, AWS, Kubernetes, Docker, Ansible, Terraform, CI/CD, PostgreSQL, Elasticsearch, HPC schedulers, observability platforms