Clean Code: Analyzing Uncle Bob’s principles in practice

July 17, 2019

Everyone wants their code to be clean and free of bugs. But codebases will deteriorate if you don't constantly pay attention to their quality.

So Robert C. Martin (aka Uncle Bob) coined the Boy Scout Rule. It encourages people to treat their codebase the way scouts treat a campsite and clean up not only after themselves but also after others:

Leave your code better than you found it.

It's a great motto, but do people actually observe it?

Using LGTM's wealth of analysed open source code data, I set out to find how often during the last year people actually left their code in a better state than they found it. Turns out that there was just one day in 2018 when they did.

That day was exactly one year ago today.

Finding the best day

LGTM analyses a large portfolio of projects for code defects. On an average day in 2018, it analysed 6,633 commits that actually changed source code (rather than, for example, a resource file). Some of those commits increased the number of errors and warnings [1] that LGTM finds. These alerts serve as a good proxy for actual runtime bugs in the code. Other commits reduced the number of such alerts, but these were rather rare, making up only 2.7% of all commits in 2018.

The number of alerts fixed on any one day is noisy. I used a binomial test to pick out which deviations from the base rate are statistically significant. After accounting for the total number of days I tested [2], exactly one day comes out as statistically significant at the 0.1% level [3]:

Tuesday, 17th of July 2018.

This is also the only day where the number of commits that fixed errors and warnings outstrips the number of commits that introduced new ones: 300 versus 282. The day that came closest was July the 20th, with 94 'good' commits versus 96 'bad'.

What was so special about that day?

Google tells me that July 17 is World Emoji Day. Happy World Emoji Day 👋!

But while I adore emojis, I doubt they're responsible for the sudden adoption of the Boy Scout Rule.

Compared to other days, the 17th appears to have been both a special day in the summer of 2018 and a special Tuesday in general. The plot below shows it as a clear outlier; only the four Sundays that follow come anywhere close [4].

July 17th was the only day where the OSS community appeared to adhere to the Boy Scout Rule

July 17 saw 260 out of 7,047 commits fixing more errors and warnings than they introduced. That's 71 more than expected. Fully 61 of them are accounted for by a single project, OpenLayers, which fixed a mix of different alerts in its files, mostly regarding class inheritance. Once OpenLayers' contribution is removed, the rest looks rather like random fluctuation.
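To make the arithmetic concrete, here is a minimal sketch of a one-sided binomial test on the numbers quoted above. It assumes the rounded 2.7% base rate from earlier in the post, so the expected count comes out near 190 and the excess near 70 rather than exactly 71. The function name `binom_upper_tail` and the choice of a one-sided test are illustrative assumptions, not necessarily what the original analysis used, and the result is the raw p-value before any multiple-testing correction.

```python
import math


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p).

    Each term is computed in log space (via lgamma) so that the large
    binomial coefficients do not overflow.
    """
    log_p = math.log(p)
    log_q = math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_term = (
            math.lgamma(n + 1)
            - math.lgamma(i + 1)
            - math.lgamma(n - i + 1)
            + i * log_p
            + (n - i) * log_q
        )
        total += math.exp(log_term)
    return total


# Figures quoted in the article for 17 July 2018.
n_commits = 7047   # commits that changed source code that day
n_fixing = 260     # commits that fixed more alerts than they introduced
base_rate = 0.027  # ASSUMPTION: the rounded yearly rate of such commits

expected = n_commits * base_rate
excess = n_fixing - expected
raw_p_value = binom_upper_tail(n_fixing, n_commits, base_rate)

print(f"expected fixing commits: {expected:.1f}")
print(f"excess over expectation: {excess:.1f}")
print(f"one-sided p-value before correction: {raw_p_value:.2e}")
```

A tiny raw p-value here only says that this one day is unusual on its own; it still has to survive the correction for having tested every day of the year.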

Normally, I prefer finding general patterns over individual explanations. But in this case, I think there's a larger lesson that lends some extra poignancy to this investigation:

If all it took for an outlier to appear and the sign of net alerts to change from 'more created' to 'more eliminated' was for one single project to decide to have a spring clean, then having a substantial impact cannot be that hard. It must be that there just aren’t enough Girl Scouts and Boy Scouts.

So go to your favorite project on LGTM.com now, and start fixing. And maybe I'll pick up your improvements when I look at the best day of 2019.

Footnotes


  1. For this investigation, I did not count recommendations. They are only recommendations, and some people might consciously decide not to fix them.

  2. Even if every single test has a low chance of a false positive, the chance of at least one false positive when you run a battery of 365 tests (one for each day) can be quite high. Because a false positive is exactly the kind of date I would end up reporting, I need to correct for this by making each test appropriately tougher. I do that with a method called the Holm-Bonferroni adjustment.

  3. p = 0.00022 after using Holm-Bonferroni.

  4. As a general rule, Sundays exhibit more commits fixing mistakes and more commits introducing them. This may be because Sunday commits more often go to individual projects and rarely to commercial ones.
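
The Holm-Bonferroni adjustment mentioned in footnote 2 can be sketched in a few lines. This is a generic illustration of the standard step-down procedure, not the code used for this analysis; the function name `holm_adjust` and the four example p-values are invented for demonstration.

```python
def holm_adjust(p_values):
    """Return Holm-Bonferroni adjusted p-values, in the original order.

    The smallest p-value is multiplied by m, the next by m - 1, and so on.
    A running maximum keeps the adjusted values monotone, and each value
    is capped at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values for four tests (NOT data from the article).
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = holm_adjust(raw)

alpha = 0.05
for p, p_adj in zip(raw, adjusted):
    verdict = "significant" if p_adj < alpha else "not significant"
    print(f"raw={p:.3f}  adjusted={p_adj:.3f}  {verdict}")
```

With 365 daily tests, the smallest raw p-value is multiplied by 365, so a day has to be very unusual indeed to remain significant after adjustment.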