Alfateh Mustafa

SRE (Site Reliability Engineer)

Bengaluru, Karnataka, India · 12 yrs 6 mos experience
Highly Stable

Key Highlights

  • Expert in cloud cost visibility and optimization.
  • Led SRE initiatives for high-scale infrastructure.
  • Innovative in building tools for infra visibility.
Stackforce AI infers this person is a Site Reliability Engineer with expertise in cloud computing and infrastructure management.

Skills

Core Skills

Kubernetes · Cloud Computing · Site Reliability Engineering

Other Skills

AWS CodePipeline · Amazon Web Services (AWS) · Ansible · Apache · Bash · C (Programming Language) · Cloud Applications · Communication · Computer Network Operations · Customer Service · Databases · Datadog · DevOps · English · Git

About

Over the years, I’ve led SRE and platform engineering efforts across multi-cloud, hybrid, and on-prem stacks; and kept running into the same recurring challenge:

Cloud cost visibility and optimisation is still broken.

Most teams are left chasing down unused infra, patching inconsistent tagging, firefighting surprise cost spikes, or working with siloed spreadsheets; all while spend keeps rising.

These gaps aren’t just inefficiencies; they directly impact margins, accountability, and scale.

That’s led me to deeply explore ways to combine:

  • FinOps principles
  • SRE best practices
  • Automation & Observability
  • And intelligent insights from infrastructure data

All with the goal of making cloud cost efficiency effortless and continuous; not an afterthought.

💻 What I love working on:

  • Cost Attribution | Commitment & Rightsizing Strategy
  • Kubernetes & Multi-Cloud Platform Architecture
  • Terraform / Terragrunt | IaC Cost Intelligence
  • Real-Time Infra Observability + Cost Anomaly Detection
  • Building high-impact Infra Products from 0→1

I’m currently collaborating with teams across FinTech, SaaS, and high-scale infra domains to understand how engineering and cost teams can work better, together.

Let’s connect if you’re thinking about the same problems or have solved them differently.

Experience

Stealth AI Startup

Staff SRE

Oct 2024 - Present · 1 yr 5 mos · Bengaluru · Hybrid

  • Part of the Embedded SRE team that integrated deeply into engineering teams to support core product initiatives with hands-on infrastructure, observability, and reliability expertise.
  • Spearheaded the complete onboarding and lifecycle support for Okta Personal into a newly re-architected, compliance-driven infrastructure.
  • Led end-to-end onboarding, from service design to production readiness, ensuring system-wide auditability, compliance alignment, and reliability.
  • Collaborated across orgs to drive strategic infra planning, balancing long-term architectural shifts with product delivery timelines.
AWS CodePipeline · Databases · Cloud Computing · Cloud Applications · Amazon Web Services (AWS) · Terraform · +7 more

Palo Alto Networks

Sr. Staff Engineer, Site Reliability

Apr 2021 - Sep 2024 · 3 yrs 5 mos · Bengaluru · Hybrid

  • Led SRE for Cortex Data Lake, FAWKES, and App Services Infra, ensuring uptime, scalability, and resilience across critical services.
  • Built frameworks for CI/CD automation, observability at scale, and infra standardization, including 1500+ DB instance migrations and 100+ service deployments across FedRAMP & commercial regions.
  • Designed and delivered AccessPANor for GitLab-integrated infra/IAM provisioning, CMDP observability revamp (>3M metrics/sec), and a Cert Lifecycle Management system to eliminate expiration outages.
  • Owned synthetic monitoring, policy compliance remediations, and cost-optimized observability pipelines, saving ~80% in recurring infra spend.
  • Evangelized blameless culture with WS3 syncs, improved on-call quality, and mentored SRE org-wide.
Unix Shell Scripting · HashiCorp · DevOps · Observability · Troubleshooting · Grafana · +15 more

Quotient Technology Inc.

3 roles

Lead SRE

Feb 2020 - Mar 2021 · 1 yr 1 mo

  • Formed and led the new Tools-SRE team, focused on reliability craftsmanship, MTTR reduction, and tooling innovation.
  • Built SPADE, HIT, and Hawkeye: tools enabling infra visibility, AI-driven alerting, and real-time infra skeletons.
HashiCorp · DevOps · AWS CodePipeline · Microservices · Databases · Cloud Computing · +5 more

Senior Site Reliability Engineer

Promoted

Dec 2018 - Feb 2020 · 1 yr 2 mos

  • Delivered advanced Splunk dashboards (inbound traffic, impact volume, errors) for proactive incident insights.
  • Contributed to GCP migrations, and drove initiatives in availability engineering, event correlation, and traffic anomaly detection.
DevOps · Databases · Cloud Computing · Cloud Applications · Communication · Ansible · +1 more

Sr. Operations Engineer

Sep 2018 - Dec 2018 · 3 mos

DevOps · Databases · Communication

LinkedIn

Site Operations Engineer

May 2014 - Jan 2018 · 3 yrs 8 mos · Bengaluru · On-site

  • Worked closely with SRE and development teams, and coordinated, communicated, and managed notifications and updates to customers and executive management on issues affecting site availability and performance.
  • Managed LinkedIn’s 24/7 production infrastructure and site reliability operations across global data centers.
  • Automated alert triage, built custom internal platforms, and authored tools like SIT (Service Info Tool) and Search Genie for infra visibility and SRE-developer collaboration.
  • Reduced MTTD/MTTR with smart monitoring pipelines and improved cross-functional incident workflows.
  • Acted as the backbone for infrastructure performance, driving platform availability at scale.
DevOps · Databases · C (Programming Language) · Communication

Harman International India Pvt. Ltd.

Intern as Youth Ambassador

Jan 2014 - Apr 2014 · 3 mos · Bangalore

  • Social Media Marketing
  • Promoting JBL (Harman India) products
  • Organised JBL Studio, where people showcased their music-related talents
  • Conducted product trials with the university crowd

Star TV

Intern as Campus Manager

Oct 2013 - Mar 2014 · 5 mos · Bangalore

  • Active member of the organising team for Channel [V] presents The IndiaFest '14, held at Dayanand Sagar Institutes, Bangalore.
  • Social media marketing, promotions, inviting people from across the city to take part in the fest, and participant management.

Nokia

Intern as Campus Ambassador

Jul 2012 - Nov 2013 · 1 yr 4 mos

  • Social Media Marketing
  • Pre-launch device testing
  • Attended national conferences
  • Met national tech gurus
  • Shared my ideas for app development for Nokia Lumia

Education

Christ University, Bangalore

Bachelor of Computer Application (BCA) — Computer

Jan 2011 - Jan 2014
