Expert consulting and managed services to help complex organisations to work flatter, faster and more dynamically.

      With the help of our trusted partners:


      Solutions Home




        BDQ Originals



          Other products








              Whether it's our own Atlassian Marketplace apps or the apps that we provide a value-added-reseller service for, you can trust BDQ for the best support, consultancy, training and implementation available.


              Products Home



                • We provide high quality technology training to customers in the UK, EU and US.

                • Our customers range from small companies to multi-nationals. They all want to maximise employee productivity.

                • We listen to what our customers want to achieve, and take this into account when delivering the courses.

                home-icon-300x300Training Home →


                  From webinar recordings to whitepapers, case studies to blog posts. Help yourself to our free content that will hopefully inform and inspire.

                  Resources Home →
                    2 min read

                    GitLabs' database incident

                    Featured Image

                    No one who works in the tech industry should have any schadenfreude in response to GitLab’s outage yesterday as reported by Business Insider and TechCrunch.

                    According to the incredibly open notes that GitLabs published while the incident was still being worked on, the initial trigger to the problem was:

                    Spike in database load due to spam users

                    In response, they took a series of actions to attempt to resolve the spam problem but at 11pm an admin referred to as team-member-1 made a mistake and confused which machine they were running an rm -rf command on. This deleted a live production PostgreSQL data directory. By the time the mistake was noticed only 1.5% of approximately 300GB of data remained.

                    The problem was further compounded by a series of problems they had with their backups. According to an update that they posted some of the backup did not appear to have worked, producing:

                    files only a few bytes in size

                    They have since managed to restore their service but with 6 hours of data lost. They have promised to publish their 5 whys of the cause of the incident and steps they will implement to prevent this from happening again.

                    In another interesting blog post, 2ndQuadrant, the initial author of the core PostgreSQL’s backup technologies, responded to the incident with their observations and suggestions for tools to consider. Well worth a read.

                    As we said at the beginning, there is no room for any schadenfreude. Today this is GitLab, tomorrow it could be anybody. Admins are people and people make mistakes. The only solution is to try and make making a mistake that risks production data as difficult as possible through scripting and automation, regularly ensuring that backups are happening successfully and that backups will actually restore in practice.

                    One positive thing to come out of this will be that lots of people in the tech industry will checking their backups today (I know we are). Another was the #HugOps hashtag where people sent their best wishes to GitLab on Twitter. We certainly echo that sentiment.