Alfateh Mustafa

SRE (Site Reliability Engineer)

Bengaluru, Karnataka, India · 12 yrs 7 mos experience

Highly Stable

Key Highlights

Expert in cloud cost visibility and optimization.
Led SRE initiatives for high-scale infrastructure.
Innovative in building tools for infra visibility.

Stackforce AI infers this person is a Site Reliability Engineer with expertise in cloud computing and infrastructure management.

Skills

Core Skills

Kubernetes, Cloud Computing, Site Reliability Engineering

Other Skills

AWS CodePipeline, Amazon Web Services (AWS), Ansible, Apache, Bash, C (Programming Language), Cloud Applications, Communication, Computer Network Operations, Customer Service, Databases, Datadog, DevOps, English, Git

About

Over the years, I’ve led SRE and platform engineering efforts across multi-cloud, hybrid, and on-prem stacks; and kept running into the same recurring challenge:

Cloud cost visibility and optimisation is still broken.

Most teams are left chasing down unused infra, patching inconsistent tagging, firefighting surprise cost spikes, or working with siloed spreadsheets; all while spend keeps rising.

These gaps aren’t just inefficiencies; they directly impact margins, accountability, and scale.

That’s led me to deeply explore ways to combine:
• FinOps principles
• SRE best practices
• Automation & Observability
• And intelligent insights from infrastructure data

All with the goal of making cloud cost efficiency effortless and continuous; not an afterthought.

💻 What I love working on:
• Cost Attribution | Commitment & Rightsizing Strategy
• Kubernetes & Multi-Cloud Platform Architecture
• Terraform / Terragrunt | IaC Cost Intelligence
• Real-Time Infra Observability + Cost Anomaly Detection
• Building high-impact Infra Products from 0→1

I’m currently collaborating with teams across FinTech, SaaS, and high-scale infra domains to understand how engineering and cost teams can work better, together.

Let’s connect if you’re thinking about the same problems or have solved them differently.

Experience

12 yrs 7 mos

Total Experience

2 yrs 2 mos

Average Tenure

1 yr 7 mos

Current Experience

Stealth AI startup

Staff SRE

Oct 2024 – Present · 1 yr 7 mos · Bengaluru · Hybrid

• Part of the Embedded SRE team that integrated deeply into engineering teams to support core product initiatives with hands-on infrastructure, observability, and reliability expertise.
• Spearheaded the complete onboarding and lifecycle support for Okta Personal into a newly re-architected, compliance-driven infrastructure.
• Led end-to-end onboarding, from service design to production readiness, ensuring system-wide auditability, compliance alignment, and reliability.
• Collaborated across orgs to drive strategic infra planning, balancing long-term architectural shifts with product delivery timelines.

AWS CodePipeline, Databases, Cloud Computing, Cloud Applications, Amazon Web Services (AWS), Terraform, +7

Palo Alto Networks

Sr. Staff Engineer, Site Reliability

Apr 2021 – Sep 2024 · 3 yrs 5 mos · Bengaluru · Hybrid

• Led SRE for Cortex Data Lake, FAWKES, and App Services Infra, ensuring uptime, scalability, and resilience across critical services.
• Built frameworks for CI/CD automation, observability at scale, and infra standardization, including 1500+ DB instance migrations and 100+ service deployments across FedRAMP & commercial regions.
• Designed and delivered AccessPANor for GitLab-integrated infra/IAM provisioning, a CMDP observability revamp (>3M metrics/sec), and a Cert Lifecycle Management system to eliminate expiration outages.
• Owned synthetic monitoring, policy compliance remediations, and cost-optimized observability pipelines, saving ~80% in recurring infra spend.
• Evangelized blameless culture with WS3 syncs, improved on-call quality, and mentored SREs org-wide.

Unix Shell Scripting, HashiCorp, DevOps, Observability, Troubleshooting, Grafana, +15

Quotient Technology Inc.

3 roles

Lead SRE

Feb 2020 – Mar 2021 · 1 yr 1 mo

• Formed and led the new Tools-SRE team, focused on reliability craftsmanship, MTTR reduction, and tooling innovation.
• Built SPADE, HIT, and Hawkeye: tools enabling infra visibility, AI-driven alerting, and real-time infra skeletons.

HashiCorp, DevOps, AWS CodePipeline, Microservices, Databases, Cloud Computing, +5

Senior Site Reliability Engineer

Promoted

Dec 2018 – Feb 2020 · 1 yr 2 mos

• Delivered advanced Splunk dashboards (inbound traffic, impact volume, errors) for proactive incident insights.
• Contributed to GCP migrations and drove initiatives in availability engineering, event correlation, and traffic anomaly detection.

DevOps, Databases, Cloud Computing, Cloud Applications, Communication, Ansible, +1

Sr. Operations Engineer

Sep 2018 – Dec 2018 · 3 mos

DevOps, Databases, Communication

Site Operations Engineer

May 2014 – Jan 2018 · 3 yrs 8 mos · Bengaluru · On-site

• Worked closely with SRE and development teams, and coordinated, communicated, and managed notifications and updates on issues affecting site availability and performance for customers and executive management.
• Managed LinkedIn’s 24/7 production infrastructure and site reliability operations across global data centers.
• Automated alert triage, built custom internal platforms, and authored tools like SIT (Service Info Tool) and Search Genie for infra visibility and SRE-developer collaboration.
• Reduced MTTD/MTTR with smart monitoring pipelines and improved cross-functional incident workflows.
• Acted as the backbone for infrastructure performance, driving platform availability at scale.

DevOps, Databases, C (Programming Language), Communication

Harman International India Pvt. Ltd.

Intern as Youth Ambassador

Jan 2014 – Apr 2014 · 3 mos · Bangalore

Handled social media marketing.
Promoted JBL (Harman India) products.
Organised JBL Studio, where people showcased their music-related talents.
Conducted product trials with the university crowd.

Star TV

Intern as Campus Manager

Oct 2013 – Mar 2014 · 5 mos · Bangalore

Active member of the organising team for Channel [V] presents The IndiaFest '14, held at Dayanand Sagar Institutes, Bangalore.
Handled social media marketing, promotions, participant management, and outreach inviting people across the city to take part in the fest.