Buffer’s March Engineering Report: 5 Whys, Reliability, Open Source, and More

Apr 16, 2014

March was an incredible month for the engineering team at Buffer. Looking back, it's exciting to see how much we accomplished to move the ball forward and how we executed on some of our engineering goals. Here are some stats and a tl;dr for March:

  • 1 person converted to full-time (yay Dan Farrelly!)
  • 6 “5 whys” sessions conducted
  • 1 new open source project started
  • 15 minutes of system-wide downtime
  • Switched to New Relic for platform monitoring
  • Bufferbot (our HipChat Hubot) now deploys our app

5 Whys

One of the big and exciting changes we made in March was becoming more disciplined about reflecting on what happened whenever something unexpected or unintended occurs. This is something every startup faces, and we're no exception. It seems like almost daily something happens that's not ideal and that could have been prevented, or could be prevented in the future. Examples include unintended bugs, downtime, a mistake made by a developer, or a negative customer experience. We've been heavily influenced by Eric Ries' Lean Startup methodology, and one thing he suggests is conducting the 5 whys to learn from mistakes.

March was a great time to test this out, as we had our fair share of issues. Our first few 5 whys sessions were a bit rough while we learned what works, but we've now settled into a good flow and run them more often.

The idea behind the 5 whys is to dig at least 5 levels deep into a particular problem. The method comes from Taiichi Ohno, who used it while developing Toyota's production system. It holds that an unintended consequence could usually have been prevented at 5 or more different levels. Each session has a “5 whys master” who leads the discussion, and we try to include everyone who was involved with the issue in the call (we conduct these over Google Hangouts). The “5 whys master” asks the group why 5 times, and the group answers each question. At the end of the exercise, we go through each why question/answer pair and agree on a corresponding corrective action, for 5 corrective actions in total. We assign each corrective action to one person who owns it, so that the issue is hopefully prevented in the future.

What I really like about this approach is that it lets us deal with issues when they actually happen and work toward ensuring they won't happen again, without worrying in advance about issues that haven't happened. I now trust that if something comes up that we didn't foresee, we'll conduct a 5 whys and learn from it. We let the 5 whys dictate what documentation we need and what adjustments to make to our onboarding process.

Here are some examples of the 5 whys we conducted this month:

  • System-wide outage for 15 minutes due to weekly digest processing
  • Some paying business users were sent an email asking why they had chosen not to continue, even though they were still on the trial

Blogging

One of the big things we're trying to do more of is engineering blogging. Transparency has long been one of Buffer's key values, and in that spirit I'm hoping the engineering team builds a good habit of sharing our lessons and experiences with the rest of the world.

We had a great first month with this new focus. I wrote an in-depth post on Medium about the evolution of Buffer's scheduling core, which was well received. Andy wrote a post giving a sneak peek at the new Buffer iOS 7 app (now available!). And Colin wrote a post in March: Yaks, Alligators and Bikes.

Open Source

Continuing with our value of transparency and our hope to really contribute back to the community, we finally found time to open source our date time picker. Our date time picker is unique in that it lets you pick the date first and then the time, in a distinct flow. You'll notice it whenever you set a custom scheduled post. There have been some calls in the bootstrap-datetimepicker community for us to open source it, and Niel was able to find time in his busy schedule to set this up. We've already had some great pull requests that make it much better!

Reliability

Coming off a great February, we unfortunately had a bit more downtime in March. We had about 15 minutes of downtime while sending out weekly digest emails on Monday, March 10, at around 4:40am PDT. We learned a great deal from this through our new 5 whys post-mortem habit. Since then, we haven't had any further trouble sending the weekly digests.

One thing I was also very excited about was setting up status.bufferapp.com. We plan to get into the habit of updating status.bufferapp.com whenever we're investigating, working through, or have fixed any issue that comes up. We know that when our service isn't working as it should, the one thing that can make that experience better is being honest and keeping everyone in the loop as early as possible. I'm very excited to have status.bufferapp.com play a key role in doing that.

We also switched our monitoring tool to New Relic. After trialing New Relic for a couple of weeks, we gained great new insight into how our platform was behaving. We especially enjoyed digging into profiling traces of important transactions. Integrating New Relic into our architecture was a breeze, and we plan to lean on it heavily to keep Buffer as reliable and performant as it can be.

Security

As in the past couple of months, we worked through several reports submitted through our security page. We closed a few more security holes, added more CSRF checks, and added protection against brute-force login attacks. We've been seeing an uptick in brute-force login attempts, so this was a key fix for us.
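The post doesn't say how the brute-force protection works. One common approach is to count failed login attempts per account (or per IP address) within a sliding time window and refuse further attempts once a threshold is reached. Here is a minimal Python sketch of that idea, assuming an in-memory store; the LoginThrottle class and its thresholds are hypothetical, not Buffer's actual implementation.

```python
import time
from collections import defaultdict, deque


class LoginThrottle:
    """Hypothetical sliding-window throttle for failed login attempts.

    The thresholds are illustrative assumptions, not Buffer's real values.
    """

    def __init__(self, max_failures=5, window_seconds=15 * 60, clock=time.monotonic):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self.clock = clock
        # Maps a key (e.g. username or IP address) to timestamps of recent failures.
        self._failures = defaultdict(deque)

    def _prune(self, key):
        # Drop failure timestamps that have fallen outside the window.
        cutoff = self.clock() - self.window_seconds
        attempts = self._failures[key]
        while attempts and attempts[0] <= cutoff:
            attempts.popleft()
        return attempts

    def is_blocked(self, key):
        # Blocked once the number of recent failures reaches the threshold.
        return len(self._prune(key)) >= self.max_failures

    def record_failure(self, key):
        self._prune(key).append(self.clock())

    def record_success(self, key):
        # A successful login clears the failure history for this key.
        self._failures.pop(key, None)


# Usage with a fake clock so the window can be advanced by hand.
now = [0.0]
throttle = LoginThrottle(max_failures=3, window_seconds=60, clock=lambda: now[0])
for _ in range(3):
    throttle.record_failure("alice")
print(throttle.is_blocked("alice"))  # True
now[0] = 61.0
print(throttle.is_blocked("alice"))  # False: the old failures aged out
```

A production version would keep these counters in storage shared by every app server, so an attacker can't spread attempts across machines to dodge the limit.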

Bufferbot Changes

Slower internet connections are a challenge for a few of the engineers on the team. Niel in particular has often had trouble deploying our main application from his local repository, most likely because of internet speeds in South Africa. Motivated in part by our third Buffer retreat being in South Africa, we moved our deployment flow away from a local script and push.

Now, whenever a developer pushes to a branch, the unit tests run on our Jenkins server. We already had our own Hubot, called Bufferbot, that is always present in our HipChat. We extended Bufferbot so it can easily deploy from Jenkins to our Elastic Beanstalk environments. A nice side effect is that deploys are now coordinated in HipChat. We simply type @Bufferbot deploy Production-Web, and Bufferbot takes care of the rest!
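Bufferbot is a Hubot, so its real command handlers would be Hubot scripts, and the post doesn't show them. As a rough illustration of the chat side of this flow, here is a Python sketch that parses a command like "@Bufferbot deploy Production-Web" and refuses a second deploy to an environment that already has one in flight. The regex, the DeployCoordinator class, and the one-deploy-at-a-time rule are assumptions for illustration; the actual Jenkins and Elastic Beanstalk calls are left out.

```python
import re
from dataclasses import dataclass, field

# Hypothetical pattern for chat messages like "@Bufferbot deploy Production-Web".
DEPLOY_PATTERN = re.compile(r"^@Bufferbot\s+deploy\s+(?P<env>[A-Za-z0-9-]+)\s*$")

# Hypothetical allow-list; only "Production-Web" is named in the post.
KNOWN_ENVIRONMENTS = {"Production-Web"}


@dataclass
class DeployCoordinator:
    # Environments with a deploy currently running (an assumed coordination rule).
    in_progress: set = field(default_factory=set)

    def handle_message(self, user, text):
        """Return the chat reply for a message, or None if it isn't a deploy command."""
        match = DEPLOY_PATTERN.match(text.strip())
        if match is None:
            return None
        env = match.group("env")
        if env not in KNOWN_ENVIRONMENTS:
            return f"Unknown environment: {env}"
        if env in self.in_progress:
            return f"A deploy to {env} is already running; try again when it finishes."
        self.in_progress.add(env)
        # A real bot would trigger the Jenkins deploy job here (not shown).
        return f"{user} started a deploy to {env}."

    def finish(self, env):
        # Called when the deploy job reports completion.
        self.in_progress.discard(env)


bot = DeployCoordinator()
print(bot.handle_message("niel", "@Bufferbot deploy Production-Web"))
# niel started a deploy to Production-Web.
```

Keeping the lock in the bot means everyone in the room sees who is deploying and a second request gets an explicit answer instead of racing the first.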

Looking forward to April

As I'm writing this post on April 14, I'm already excited to write about everything that's happened in April so far. We got quite a lot done at our third Buffer retreat, and I'm excited to share that and more with you in a couple of weeks. If you have any questions at all about what we did in March, I'd love to answer them! Just comment below or tweet me!
