Alex Schneider

SRE (Site Reliability Engineer)

Vancouver, British Columbia, Canada · 11 yrs 10 mos experience

Key Highlights

  • Site Reliability Engineer with over 11 years of experience.
  • Proven track record in incident management and tooling development.
  • Strong background in security and vulnerability management.

Skills

Core Skills

Site Reliability Engineering · Incident Management · System Automation

Other Skills

Software Development · Cross-team Collaboration · Live Video Streaming · Sociotechnical Engineering · Streaming Media · Large Scale System Integration · Big Data · Remediation Engineering · ChatGPT · Social Technologies · Vulnerability Scanning · Data Pipelines · Microsoft Azure · Critical Incident Response · Programming Languages

Experience

Total Experience: 11 yrs 10 mos
Average Tenure: 1 yr 5 mos
Current Experience: none listed

Freelance

Sabbatical

Jan 2026 - May 2026 · 4 mos

  • Achieved major personal growth objectives.
  • Walked for over a month straight, covering 1,000 km across France and Spain.
  • Rekindled connections with friends I hadn't seen in years.
  • Visited new and awe-inspiring locations, especially in Canada, Italy, and the United Kingdom.
  • Kept up to date with industry trends and focused my personal learning and development on them, especially AI and its impact on reliability culture.

Netflix

Site Reliability Engineer

Jan 2023 - Jan 2025 · 2 yrs · Remote

  • Perform incident management, incident response, retrospectives, and remediation/follow-up for company-wide, customer-impacting incidents.
  • Develop incident management processes and procedures for the new Live Streaming product.
  • Design and implement an engagement model for embedded SRE support of the Live Streaming product and the teams developing services for it.
  • Design a process for production readiness reviews for new and existing applications across Netflix, and build tooling to identify opportunities for teams to improve their services.
  • Mentor new engineers and train them for on-call, Live Streaming operations, and incident follow-up work.
  • Develop and deploy new incident management tooling and unified processes across the entire company (built on top of Incident.io).
  • Consult with teams across the company to identify pain points and implement engineering and organizational solutions to them.
Software Development · Site Reliability Engineering · Incident Management · Cross-team Collaboration · Live Video Streaming · Sociotechnical Engineering

Twitter

Senior Site Reliability Engineer

Jan 2021 - Jan 2023 · 2 yrs · Toronto, ON · Hybrid

  • Initiate SRE engagement within the Health Tools team to reduce on-call toil by fine-tuning alerts and optimizing response playbooks.
  • Lead engineers within Health to improve security for both service-to-service and employee authentication/authorization. Develop tooling to provide executive visibility into authentication/authorization across Twitter’s internal services.
  • Adapt company-wide incident management and production readiness processes for unique challenges imposed by scale and government requirements within Health.
  • Onboard and mentor junior and mid-level engineers as they join the Health SRE team.
Site Reliability Engineering · System Automation · Large Scale System Integration · Big Data

Heroku

Senior Software Engineer

Jun 2019 - Jan 2021 · 1 yr 7 mos

  • Build tooling for better visibility into the vulnerability management program, reducing the share of hosts with detected vulnerabilities from 5% of the fleet to 0.01%. Improve automation and compliance reporting around onboarding/offboarding employees.
  • Start an SRE engagement with the Verification team, providing support for scaling and production readiness and assisting with the launch and scaling of new verification features.
  • Act as incident manager on call during incidents with customer impact, and steward incidents through the response, remediation, and blameless postmortem processes.

Facebook

Production Engineer

May 2018 - Jun 2019 · 1 yr 1 mo · Menlo Park, CA

  • Design and develop code and operating system scanning solutions that provide comprehensive coverage for the vulnerability management program, expanding on existing commercial network scanning tools for PCI compliance. Drive the detected critical CVE count across Facebook from thousands to zero.
  • Design and develop Extract/Transform/Load pipelines and systems that normalize and join production data with emerging vulnerability feeds. Use this data to justify the engineering effort for teams to act on their own services’ vulnerabilities, and consult with them on remediation options that avoid impacting service reliability.

Microsoft

Site Reliability Engineer

Jul 2016 - May 2018 · 1 yr 10 mos · Redmond, WA

  • Respond to critical security incidents within Azure. Collaborate with teams across Microsoft to build a solution that gives executives visibility into Azure’s security capabilities. Develop software and procedures to minimize the security and reliability impact to Azure during incidents.
  • Develop code-scanning infrastructure to detect date-time handling bugs in the Azure codebase, preventing leap year issues in Azure in 2016.
  • Initiate SRE engagement with the Access to Production team on Azure. Reduce incident responder toil by increasing the stability of the service and the fidelity of alerting. Develop a new process for customer engagements by the Access to Production team that increases security, reliability, and auditability.

Security Innovation

Security Engineering Intern

May 2015 - Aug 2015 · 3 mos · Seattle, WA

  • Perform penetration tests and code reviews on a variety of platforms and servers.

TechEmpower

Programming Intern

Jan 2014 - Dec 2014 · 11 mos · Greater Los Angeles Area

  • Develop and maintain Java web applications.

Loyola Marymount University

Student Worker

Sep 2013 - Dec 2013 · 3 mos · Archives and Special Collections

  • Perform data preservation.
  • Develop software to facilitate the preservation of aging and damaged optical and digital media.

UCLA David Geffen School of Medicine

Research Assistant

May 2013 - May 2016 · 3 yrs · Greater Los Angeles Area

  • Evaluate medical student activities in the Emergency Department using data from computerized logs; sanitize and process the data.

Education

Loyola Marymount University

Bachelor of Science (B.S.) — Computer Science

Jan 2013 - Jan 2016

Loyola Marymount University

Bachelor of Science — Computational Science
