Quinn Murphy

SRE (Site Reliability Engineer)

Lowell, Massachusetts, United States · 15 yrs 4 mos experience

Highly Stable

Key Highlights

Built multi-tenant Kubernetes cluster for internal developers.
Transitioned GitHub infrastructure to Kubernetes, reducing outages.
Managed a complete data-center migration with zero downtime.

Stackforce AI infers this person is a SaaS Infrastructure Engineer with strong expertise in Site Reliability Engineering and DevOps.

Skills

Core Skills

Site Reliability Engineering · DevOps · Kubernetes · Infrastructure Management · Automation · Database Management · Data Engineering · Operations Management · Incident Management · Security · Team Management · Disaster Recovery · System Administration · Compliance

Other Skills

Bash · Terraform · Jenkins · Ansible · Linux System Administration · Amazon Web Services (AWS) · Continuous Integration and Continuous Delivery (CI/CD) · Server Architecture · Reliability Engineering · Go (Programming Language) · Microsoft Azure · Kusto Query Language (KQL) · Azure Kusto · MySQL · Azure Data Explorer

About

Using insights and experience to improve infrastructure and decrease friction in its use for internal and external customers. Looking after the most important feature of any service, which is "uptime and reliability".

Experience

15 yrs 4 mos

Total Experience

7 yrs 3 mos

Average Tenure

9 mos

Current Experience

Authzed

Senior Site Reliability Engineer

Sep 2025 – Present · 9 mos · Remote

GitHub

2 roles

Senior Software Engineer

May 2021 – Mar 2025 · 3 yrs 10 mos · Remote

As a Senior Engineer, I transferred to the GitHub Enterprise Server (GHES) infrastructure team, supporting customers with appliances running their own private instance of GitHub. I supported our customers by participating in on-call rotations to handle escalations and by developing improvements to how our appliances run.
Volunteered for the engineering documentation review team, which reviewed technical documents to improve the overall quality of documentation across GitHub.
Took ownership of backup-utils, a product customers relied on for backing up and restoring their GHES instances. Became the SME and point of contact for the product, adding refinements such as improved logging and incremental MySQL backups. Improved our documentation and trained other engineers on the project.
Worked on multiple projects to improve internal operations: used Actions to automate documentation creation; built a proof-of-concept AI agent to create escalation summaries of support tickets; participated in an initiative to improve CI; refined and improved internal documentation.
Managed a transition from MySQL 5 to MySQL 8, working with the SaaS database team to ensure a smooth transition for our appliance customers when upgrading their appliances.
Developed a data pipeline to move support information about customer database migration timing to our data warehouse. This data was used to identify and reduce downtime from long-running migrations and transitions. Built dashboards in Azure Data Explorer for others to use.
Designed and developed the process and pipeline to transfer support information about customer appliances to the data warehouse so that product and engineering leadership could make data-driven decisions about areas of focus in the GHES product. Worked cross-team to extract the data from JSON via an API and turn it into millions of events sent to Kafka for consumption by Azure Data Explorer.

Linux System Administration · Microsoft Azure · Kusto Query Language (KQL) · Bash · Azure Kusto · Jenkins · +9

Site Reliability Engineer

Feb 2017 – Sep 2025 · 8 yrs 7 mos · Remote

As a Site Reliability Engineer at GitHub, I maintained mission-critical legacy systems while also working to improve observability, reduce outages, and decrease developer friction. Working on a number of teams as GitHub grew and evolved, I built tooling and infrastructure to support our internal and external users.
To decrease developer friction, I built our first multi-tenant Kubernetes cluster for internal developers to deploy new services to, transitioning from our legacy system of deploying service nodes via Puppet. Wrote documentation and helped developers onboard their services.
To reduce outages caused by rapid growth, worked as part of the infra team that transitioned GitHub infra from bare-metal, Puppet-provisioned nodes to Kubernetes (detailed in https://github.blog/engineering/infrastructure/kubernetes-at-github/). Provisioned clusters and worked with the team to vet services and support the transition by developing tooling and working on-call to troubleshoot nodes.
As a member of the Production Delivery team, built our internal platform Moda (https://www.youtube.com/watch?v=YmdqxzX6KAc) to onboard developers to Kubernetes deployment. Also provided guidance and support to these users through on-call rotations and documentation updates.
Built several chatops systems to make onboarding easier, including a scaffolding template system that allowed developers to follow best practices when starting Moda projects.

Bash · Terraform · Jenkins · Kubernetes · Ansible · DevOps · +8

NetSuite

4 roles

Sr Systems Engineer, Centralized Services

Feb 2016 – Feb 2017 · 1 yr · Greater Boston

In this role, I work to mature operations of acquired products by identifying and implementing best practices in processes and technologies. Act as project lead in implementing solutions, coordinating efforts while also serving as a technical lead.
Worked to build an incident management methodology that standardized operations approaches and technologies. Identified the need for an Incident Command System, shared contact trees, and post-mortems, as well as improved, proactive monitoring and alerting using Icinga, collectd, and InfluxDB.
Designed a standard VPN solution for acquired brands using Pritunl. Currently implementing this solution, which will span multiple data centers and use containers to deploy standardized installs of the software to each service team.
Designed and implemented a method of deploying a security server for the Illumio product to acquired products via AWS. Standardized and automated deployment of the server structure using Terraform and Ansible.
Currently implementing a "service catalog" from which NS acquired products can easily get standard cloud services. Building automation with Terraform to create a standard VPN-protected Rancher with auto-registering host nodes. When finished, the first service supplied will be status pages (using Cachet.io). NS service teams will be able to get easy access to HA, centrally monitored and maintained services without adding extra knowledge or maintenance burden to the team.
Designed and am implementing proactive monitoring using Icinga to perform checks directly against an InfluxDB time series database which receives data from collectd on host nodes. This approach makes dynamic correlation of statistics easier to perform, and allows monitoring to work proactively, expressing dynamics rather than simply looking for thresholds. I have the basic infrastructure in place in two acquired products and am beginning to implement new checks in Q4.

Bash · Terraform · Kubernetes · Ansible · DevOps · Linux System Administration · +4

Technical Team Lead, Linux System Administration (Acquired Products)

Promoted

Dec 2014 – Feb 2016 · 1 yr 2 mos · Greater Boston

Managed a team of six Linux Sysadmins to maintain site reliability and mature/upgrade architecture. Mentored staff in operational procedures and also in scripting, troubleshooting and incident management. Installed and managed team workflow with Jira and Mattermost. Wrote several scripts (Bash & Python) to ease use of these tools.
Managed a complete data-center migration within a tight timeframe (around three months). Managed to keep strong uptime while coordinating changes across several groups. Created solutions and processes to facilitate operational speed without sacrificing stability.
Built a disaster recovery/release preview infrastructure, recreating the bare-metal production site with containerized infrastructure in a Kubernetes + CoreOS + OpenStack environment. Was a lead technical contributor (designed the process for bringing up the site, built initial containers, wrote site stand-up scripts) and also project manager (coordinated work with the OpenStack engineer and sysadmin team, organized training, delegated and followed up on work).

Bash · Kubernetes · Ansible · DevOps · Linux System Administration · Amazon Web Services (AWS) · +3

Senior Unix System Administrator (OpenAir)

Promoted

Mar 2014 – Dec 2014 · 9 mos · Greater Boston

After covering operational basics of the service, focused on trying to improve architecture, process and practice to increase uptime and scalability.
Took ownership of technical parts of SOC-1 audit, building tools to automate the delivery of data to auditors.
Deployed and administered first status page for the service, to inform customers about maintenance, uptime and issues with the service.
Built modern infrastructure to support a long-standing LAMP (P is Perl) application. Introduced configuration management to the environment (Puppet, planning transition to SaltStack), more robust monitoring and metrics (Icinga, Graphite, Grafana), and containerization for supporting services using Docker. Started to decouple services from application servers to move away from monolithic server architecture. Worked with development to get input as well as buy-in for future implementations.

Bash · DevOps · Linux System Administration · Amazon Web Services (AWS) · System Administration · Operations Management

Operations Engineer (OpenAir)

Feb 2011 – Mar 2014 · 3 yrs 1 mo · Greater Boston

At NetSuite, grew rapidly in responsibility and technical ability, in the process improving our operational capabilities. In October 2015 won a company-wide operational award for uptime, for a previous year with no site downtime. Responsible for designing and implementing new technologies to ensure our site conformed to best practices in security and SaaS architecture. Have done extensive scripting (Bash, Perl, Python) to automate processes wherever possible.
In these first few years with NS, worked to close operational debt acquired from not having dedicated operations staff. Standardized backup procedures, shored up security, performed system patches, release assistance, user management, and more.
Created several Perl scripts for automating critical procedures, including MySQL migrations, backup rotation, user creation, etc.
Worked on-call to correct issues with the site. Communicated with support and the business during system outages to keep stakeholders in the loop.

Bash · DevOps · Linux System Administration · Amazon Web Services (AWS) · System Administration · Operations Management