GitLab's database incident

By: Dom Bush on 02 Feb 2017

No one who works in the tech industry should feel any schadenfreude in response to GitLab’s outage yesterday, as reported by Business Insider and TechCrunch.

According to the incredibly open notes that GitLab published while the incident was still being worked on, the initial trigger for the problem was:

Spike in database load due to spam users

In response, they took a series of actions to try to resolve the spam problem, but at 11pm an admin referred to as team-member-1 confused which machine they were on and ran an rm -rf command on the wrong one. This deleted a live production PostgreSQL data directory. By the time the mistake was noticed, only 1.5% of approximately 300GB of data remained.

The situation was compounded by a series of problems with their backups. According to an update that they posted, some of the backups did not appear to have worked, producing:

files only a few bytes in size
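A check for this failure mode can be automated. The following is a minimal sketch, not GitLab's tooling: the function name, the directory layout and the 1 MiB threshold are all assumptions made for illustration. It flags any backup file that is too small to be a plausible dump.

```python
from pathlib import Path

# Hypothetical threshold: anything smaller is treated as suspect.
# 1 MiB is an assumption; pick a value that suits your own database.
MIN_BACKUP_BYTES = 1024 * 1024


def find_suspect_backups(backup_dir, min_bytes=MIN_BACKUP_BYTES):
    """Return the files in backup_dir that are smaller than min_bytes."""
    suspects = []
    for path in sorted(Path(backup_dir).iterdir()):
        # Only regular files are checked; subdirectories are skipped.
        if path.is_file() and path.stat().st_size < min_bytes:
            suspects.append(path)
    return suspects
```

Run from a scheduled job, a non-empty result would be a reason to raise an alert. A size check only catches obviously broken backups; it does not prove that a backup will restore.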

They have since managed to restore their service, but with 6 hours of data lost. They have promised to publish a 5 whys analysis of the cause of the incident and the steps they will implement to prevent it from happening again.

In another interesting blog post, 2ndQuadrant, the original authors of PostgreSQL’s core backup technologies, responded to the incident with their observations and suggestions for tools to consider. Well worth a read.

As we said at the beginning, there is no room for any schadenfreude. Today this is GitLab, tomorrow it could be anybody. Admins are people and people make mistakes. The only solution is to make any mistake that risks production data as difficult as possible through scripting and automation, and to check regularly both that backups are completing successfully and that they will actually restore in practice.

One positive thing to come out of this will be that lots of people in the tech industry will be checking their backups today (I know we are). Another was the #HugOps hashtag, where people sent their best wishes to GitLab on Twitter. We certainly echo that sentiment.
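As one illustration of making such a mistake harder, a destructive step can be wrapped in a script that refuses to run on the wrong machine. This is a minimal sketch under stated assumptions: the function name and its parameters are hypothetical, and it is not a description of how GitLab works.

```python
import shutil
import socket


def remove_data_dir(path, expected_host, current_host=None):
    """Delete path only when running on expected_host.

    current_host can be passed in for testing; by default it is
    taken from socket.gethostname().
    """
    host = current_host if current_host is not None else socket.gethostname()
    if host != expected_host:
        # Refuse before touching anything on disk.
        raise RuntimeError(
            f"refusing to delete {path}: running on {host!r}, "
            f"expected {expected_host!r}"
        )
    shutil.rmtree(path)
```

The operator has to name the machine they believe they are on, so running the script on a different host fails loudly instead of deleting data.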
