Alex Schneider

SRE (Site Reliability Engineer)

Vancouver, British Columbia, Canada · 11 yrs 10 mos experience

Key Highlights

Expert in Site Reliability Engineering with extensive experience.
Proven track record in incident management and tooling development.
Strong background in security and vulnerability management.

Stackforce AI infers this person is a Site Reliability Engineer with a strong focus on security and incident management in SaaS.

Skills

Core Skills

Site Reliability Engineering, Incident Management, System Automation

Other Skills

Software Development, Cross-team Collaboration, Live Video Streaming, Sociotechnical Engineering, Streaming Media, Large Scale System Integration, Big Data, Remediation Engineering, ChatGPT, Social Technologies, Vulnerability Scanning, Data Pipelines, Microsoft Azure, Critical Incident Response, Programming Languages

Experience

Total Experience: 11 yrs 10 mos

Average Tenure: 1 yr 5 mos

Current Experience

Freelance

Sabbatical

Jan 2026 – May 2026 · 4 mos

Achieved major personal growth objectives.
Walked nonstop for over a month, 1000km across France and Spain.
Rekindled connections with friends I haven't seen in years.
Visited new and awe-inspiring locations, especially in Canada, Italy, and the United Kingdom.
Kept up to date with industry trends and focused on personal learning and development around them, especially AI and its impact on reliability culture.

Netflix

Site Reliability Engineer

Jan 2023 – Jan 2025 · 2 yrs · Remote

Perform incident management, incident response, retrospectives, and remediation/follow-up for company-wide, customer-impacting incidents.
Develop incident management processes and procedures for the new Live Streaming product.
Design and implement an engagement model for an embedded SRE engagement with the Live Streaming product and the teams developing services for it.
Design a process for production readiness reviews for new and existing applications across Netflix, and build tooling to identify opportunities for teams to improve their services.
Mentor new engineers and train them for on-call, Live Streaming operations, and incident follow-up work.
Develop and deploy new incident management tooling and unified processes across the entire company (built on top of Incident.io).
Consult with teams across the company to identify pain points and implement engineering and organizational solutions to them.

Software Development, Site Reliability Engineering, Incident Management, Cross-team Collaboration, Live Video Streaming, Sociotechnical Engineering

Twitter

Senior Site Reliability Engineer

Jan 2021 – Jan 2023 · 2 yrs · Toronto, ON · Hybrid

Initiate SRE engagement within the Health Tools team to reduce on-call toil by fine-tuning alerts and optimizing response playbooks.
Lead engineers within Health to improve security for both service-to-service and employee authentication/authorization. Develop tooling to provide executive visibility into authentication/authorization across Twitter’s internal services.
Adapt company-wide incident management and production readiness processes to the unique challenges imposed by scale and government requirements within Health.
Onboard and mentor junior and mid-level engineers as they join the Health SRE team.

Site Reliability Engineering, System Automation, Large Scale System Integration, Big Data

Heroku

Senior Software Engineer

Jun 2019 – Jan 2021 · 1 yr 7 mos

Build tooling for better visibility into the vulnerability management program, driving hosts with detected vulnerabilities from 5% of the fleet to 0.01%. Improve automation and compliance reporting around onboarding/offboarding employees.
Start an SRE engagement with the Verification team, providing support for scaling and production readiness and assisting with launching and scaling new verification features.
Act as incident manager on call during incidents with customer impact, and steward incidents through the response, remediation, and blameless postmortem processes.

Facebook

Production Engineer

May 2018 – Jun 2019 · 1 yr 1 mo · Menlo Park, CA

Design and develop code and operating system scanning solutions to give the vulnerability management program comprehensive coverage, expanding on existing commercial network scanning tools for PCI compliance. Drive detected critical CVE count from thousands across Facebook to zero.
Design and develop Extract/Transform/Load pipelines and systems that normalize and join production data and emerging vulnerability feeds. Leverage data to justify engineering effort for teams to act on their own services’ vulnerabilities, and consult with them on remediation options to avoid impacting service reliability.

Microsoft

Site Reliability Engineer

Jul 2016 – May 2018 · 1 yr 10 mos · Redmond, WA

Respond to critical security incidents within Azure. Collaborate with teams across Microsoft to build a solution that gives executives visibility into Azure’s security capabilities. Develop software and procedures to minimize the security and reliability impact to Azure during incidents.
Develop code-scanning infrastructure to detect date-time handling bugs in the Azure codebase, preventing leap year issues in Azure in 2016.
Initiate SRE engagement with the Access to Production team on Azure. Reduce incident responder toil by increasing stability of the service and fidelity of alerting. Develop a new process for customer engagements by the Access to Production team that increases security, reliability, and auditability.