The best predictor of code quality is simply how much code review occurs within a project (see previous post). It's vital to foster a positive review culture that encourages more pairs of eyes, rather than more time staring.
So more independent reviewers looking at code is good. But their efforts can be organized in different ways. In particular, I want to investigate whether the ‘hierarchy-ness’ of the review process impacts code review. Hierarchy means there is a ranking or ordering of members in a group and it’s not necessarily a bad thing. In fact, it naturally arises in many systems, including among Wikipedia editors. How does hierarchy between code reviewers impact pull requests?
Who reviews who?
Developers have many separate—and sometimes conflicting—ideas for a code base. So developers propose their changes in pull requests, which are then reviewed by other developers. Pull requests are great for spotting bugs, discussing improvements, removing vulnerabilities, or simply saying thanks. Once a pull request gets the green light, the changes are merged into the code base.
I defined the ‘opener’ as the developer who opened the pull request, and the ‘reviewer’ as someone who wrote a review comment on a pull request. For each project, I constructed a graph of all interactions between reviewers and openers. I added an edge between developer A and developer B if either has left review comments on the other’s pull requests. To capture whether developer A reviews more of developer B’s pull requests, or vice versa, I made the edges directed. The image below illustrates a review graph with arrows between developers referring to reviews.
For example, if Jim has left a total of 3 review comments on pull requests opened by Sarah, but Sarah has left 7 review comments on Jim’s pull requests, then I add a directed edge from Sarah to Jim. The final result is a graphical representation of who-reviews-who.
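This who-reviews-who construction can be sketched with NetworkX. The pairwise review counts below are invented for illustration; only the dominant direction of each reviewing relationship becomes an edge, mirroring the Sarah-and-Jim example:

```python
import networkx as nx

# Hypothetical data: review_counts[(a, b)] is the number of review comments
# developer `a` left on pull requests opened by developer `b`.
review_counts = {
    ("Sarah", "Jim"): 7,    # Sarah reviewed Jim's pull requests 7 times
    ("Jim", "Sarah"): 3,    # Jim reviewed Sarah's pull requests 3 times
    ("Jim", "Alice"): 2,
}

G = nx.DiGraph()
for (reviewer, opener), count in review_counts.items():
    reverse = review_counts.get((opener, reviewer), 0)
    # Keep only the dominant direction: the edge points from the developer
    # who does more of the reviewing to the developer being reviewed.
    if count > reverse:
        G.add_edge(reviewer, opener, weight=count)

print(sorted(G.edges()))  # [('Jim', 'Alice'), ('Sarah', 'Jim')]
```

Since Sarah out-reviews Jim (7 comments to 3), only the Sarah-to-Jim edge survives.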
A measure of hierarchy
I now have a graph representing the interactions between reviewers, generated from a project’s pull request data. How do I quantify the ‘hierarchy-ness’ of that project?
Generally, when people think of hierarchy, images of pyramidal structures like food chains come to mind. At the top of the pyramid, you have the apex predator, who feeds on organisms below, but isn't devoured by any organism above. Organisms at the bottom of the food chain don’t feed on anyone. This who-eats-who example is a highly hierarchical system.
Now consider the who-reviews-who graph for a project where only one developer does the majority of reviewing. The other developers in the project merely open pull requests that get reviewed by the ‘apex’ reviewer. I want a hierarchy metric that quantifies this kind of project as having high hierarchy compared to, say, a project where everyone reviews everyone else.
No single metric perfectly captures hierarchy, but the flow hierarchy4 is my metric of choice. Formally, a graph’s flow hierarchy is the percentage of edges not involved in cycles. A graph in which every node is connected to every other node has a flow hierarchy of 0%, since all its edges are involved in cycles. On the other hand, a directed acyclic graph (such as a tree) lacks cycles, so its flow hierarchy is 100%. Higher flow hierarchy, in general, implies a more hierarchical review process.
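The two extremes above can be checked directly with NetworkX’s implementation of the metric (the toy graphs here are my own examples):

```python
import networkx as nx

# A directed 3-cycle: every edge lies on a cycle, so flow hierarchy is 0.0.
cycle = nx.DiGraph([(1, 2), (2, 3), (3, 1)])
print(nx.flow_hierarchy(cycle))  # 0.0

# A directed tree rooted at an 'apex' reviewer: no cycles at all,
# so flow hierarchy is 1.0.
tree = nx.DiGraph([("apex", "a"), ("apex", "b"), ("a", "c")])
print(nx.flow_hierarchy(tree))   # 1.0
```

`nx.flow_hierarchy` returns the fraction of edges not participating in any cycle, matching the definition used here (just expressed as a fraction rather than a percentage).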
Flow hierarchy in action
How does the flow hierarchy metric relate to real projects? Let’s look at two GitHub projects to see if their review structures have visible differences.
The first project is the jupyter/jupyter repository. Its review graph is shown below6, with node sizes proportional to the number of review comments made by that developer, and edge widths proportional to the sum of review comments between two developers.
Despite a few developers standing out with more reviews, the developers of jupyter tend to review code by many other developers, as implied by the many overlapping edges in the centre of the graph. No matter how you arrange the developers of jupyter, you can’t ‘untangle’ the maze of overlapping edges in a sensible way. This ‘tangled-ness’ is characteristic of low hierarchy review structures, and indeed the flow hierarchy for this graph is 33%.
Let’s now consider a project with a high flow hierarchy value. Below is the review graph for miracle2k/webassets, with a flow hierarchy value of 92.9%. Straight away we see few edges overlap, and correspondingly few cycles.
Interestingly, two main developers act as central hubs, and the other developers are generally isolated from everyone else (most nodes have only one edge). This review graph is a great example of a high-hierarchy project.
Less hierarchy, more reviews!
We know that one of the strongest indicators of code quality is simply how many pairs of eyes review each pull request (see the previous blog post). But what kind of review culture encourages more pairs of eyes to review in the first place? Spoiler alert: the less hierarchical ones!
To quantify how many pairs of eyes are reviewing each project, I calculate the number of reviewers per opener. This ratio is less sensitive to project size than a raw count of reviewers. Higher reviewer-opener ratios are linked to higher code quality scores.
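A minimal sketch of the reviewer-opener ratio, using invented pull request records (the field names are my own, not from the original analysis):

```python
# Hypothetical per-project pull request records.
pull_requests = [
    {"opener": "jim", "reviewers": {"sarah", "ana"}},
    {"opener": "sarah", "reviewers": {"jim"}},
    {"opener": "ana", "reviewers": {"sarah"}},
]

# Distinct reviewers and distinct openers across the whole project.
reviewers = set().union(*(pr["reviewers"] for pr in pull_requests))
openers = {pr["opener"] for pr in pull_requests}

ratio = len(reviewers) / len(openers)
print(ratio)  # 3 reviewers / 3 openers = 1.0
```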
Using pull request data from 1000+ GitHub projects, I find a negative correlation between the flow hierarchy and the reviewer-opener ratio (r=-0.45, p<1e-6). In other words, less hierarchical projects have more reviewers. This makes sense: if there are no barriers to whose work you can review, then developers are more likely to engage in the review process7.
To plant the seeds of a healthy review process, start fostering a less hierarchical review culture within your project. Easier said than done? Probably, but if you succeed, there are plenty of benefits. As we shall see, improved code quality8 isn’t the only bonus you get.
Note that once the effect of simply “more reviewers” is removed, I don’t observe any direct code quality benefit from a less hierarchical review process. However, lower hierarchy has two other big bonuses...
Less hierarchy, faster pull requests!
Quick turnaround times for pull requests are desirable. Maybe you have a project deadline or other developers need to build on your pull request; either way, we don’t want pull requests to stall.
What kind of project environment encourages a quick turnover?
I define the pull request duration as the time between the opening of the pull request and when it’s merged or closed. For each project, I calculate the median duration across all pull requests. Pull requests that are never closed are excluded from the analysis.
The effect is substantial: projects with flow hierarchy values above 50% take on average 38% longer to process pull requests relative to projects with less than 50% flow hierarchy.
But perhaps hierarchy is unrelated to faster pull requests, and the effect is simply due to more reviewers? No:
Even when I remove the dependence (linear or quadratic) of the flow hierarchy on the reviewer-opener ratio, the positive correlation between the flow hierarchy and duration remains statistically significant. This implies that given two projects with the same number of reviewers, less hierarchy is indeed linked to faster pull requests.
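One way to carry out this kind of control (a sketch with synthetic data, not the original analysis) is to regress flow hierarchy on the reviewer-opener ratio, then correlate the residuals with pull request duration:

```python
import numpy as np
from scipy import stats

# Synthetic data mimicking the observed relationships:
# hierarchy falls with the reviewer-opener ratio, duration rises with hierarchy.
rng = np.random.default_rng(0)
ratio = rng.uniform(0.5, 3.0, 200)
hierarchy = 0.9 - 0.2 * ratio + rng.normal(0, 0.05, 200)
duration = 20 + 30 * hierarchy + rng.normal(0, 2, 200)

# Remove the linear dependence of hierarchy on the ratio (degree-1 fit);
# a quadratic control would use deg=2 instead.
coeffs = np.polyfit(ratio, hierarchy, deg=1)
residuals = hierarchy - np.polyval(coeffs, ratio)

# Correlate what's left of hierarchy with pull request duration.
r, p = stats.pearsonr(residuals, duration)
print(f"r={r:.2f}, p={p:.1e}")  # positive and significant on this toy data
```

Because the residuals are orthogonal to the (fitted) ratio, any remaining correlation with duration cannot be explained by the reviewer count alone.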
Of course, we don’t want to sacrifice the thoroughness of code review for faster pull requests. Reassuringly, I find no relationship between LGTM’s code quality score and pull request duration. So faster turnaround doesn’t imply a decline in review quality.
Less hierarchy, more merging!
Pull requests can contain a sizable chunk of work, and if that work never gets merged, the effort is wasted. Of course, you don’t want to merge bad work, but if you can easily salvage it and turn it into good work, that’s certainly preferable. I find that, on average, a higher percentage of merged pull requests does not correlate with reduced code quality (again measured by LGTM).
This suggests that a higher merge rate usually means better utilization of developer effort, as opposed to lower review standards.
So you should aim for higher merge rates, but how do you get there? Let’s see how hierarchy within the review process impacts how many pull requests are merged. For each project, I calculated the percentage of pull requests merged, and then correlated that with the flow hierarchy.
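The per-project merge percentage is straightforward to compute; the pull request states below are invented for illustration:

```python
# Hypothetical pull request outcomes for one project.
prs = [
    {"state": "merged"},
    {"state": "merged"},
    {"state": "closed"},   # closed without merging
    {"state": "merged"},
]

# Percentage of pull requests that made it into the code base.
merged_pct = 100 * sum(pr["state"] == "merged" for pr in prs) / len(prs)
print(merged_pct)  # 75.0
```

This percentage, computed per project, is what gets correlated against the flow hierarchy.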
Again, less hierarchy is a good thing, since less hierarchical projects get more of their pull requests merged (r=-0.35, p<1e-6). The relationship holds within all three languages and remains statistically significant even after removing the dependence of flow hierarchy on the reviewer-opener ratio.
So not only does less hierarchy lead to faster pull requests, but those pull requests are more likely to make it into the code base.
What does this mean for you?
Do we want higher code quality? Do we want to merge pull requests faster? Do we want to avoid wasted effort? Of course we do! Then the empirical data suggests we should aim for less hierarchy in the code review process, so everyone gets a chance to review everyone else. The everyone-review-everyone sentiment is shared by Kelly Sutton, who emphasizes this point in his blog post9 on code review:
I find that distributing reviews around to different members of the team yields healthier team dynamics and better code. One of the most powerful phrases a junior engineer has in a code review is, ‘I find this confusing.’ These are opportunities to make the code clearer and simpler. Spread those reviews around.
Good consequences will follow from an inclusive review culture, where reviewing becomes mutual between all developers. Less hierarchy is an effective method to improve your code review process.
Maybe you’re ahead of the game, and everyone within your project already reviews everyone else's work; is there anything else you can do? LGTM’s automated code review is like getting another pair of eyes for free. LGTM will review pull requests opened by anyone, no matter what part of the code base they’re working on. So enabling LGTM’s automated code review instantly gives your review process a boost.
Image Credits: Photo by Robert Bye
Treewidth decompositions are performed on undirected graphs, but there are attempts to extend the definition to directed graphs.↩
An example of a random walk hierarchy measure: https://www.nature.com/articles/srep17994.↩
Google’s PageRank algorithm is based on a random surfer traversing links to different websites, so it is closely related to other random-walk-based measures.↩
The implementation of flow hierarchy in NetworkX: https://networkx.github.io/documentation/networkx-1.9/reference/generated/networkx.algorithms.hierarchy.flow_hierarchy.html, which provides a reference to the original paper.↩
Some hierarchy measures, like the treewidth from a treewidth decomposition, produce discrete values where most projects have a treewidth of 1, 2 or 3; this is not helpful for quantifying variations in hierarchy between projects.↩
The low-hierarchy graph was plotted using NetworkX with a Kamada-Kawai layout.↩
Causation can also run in the opposite direction: projects with a high proportion of reviewers could tend towards less hierarchy.↩
It should be noted that once the effect of simply “more reviewers” has been removed, I could not observe any additional code quality benefit to having a less hierarchical review process.↩