Open source projects: Who contributes the best code?

October 25, 2018


Many large software companies recognize the benefits of having their coders contribute to open source projects. Sometimes, these contributions are like charitable donations to a community effort. Sometimes, it's a company's own project that's worked on in a transparent way that encourages outside contributions. Often, it's something in between.

But there's one thing that open source contributions always are: open. If you can identify someone as an employee of a certain company (for example, by virtue of a company email address), they automatically represent that company. The quality of their contributions can benefit or hurt the company's reputation, since they are judged by prospective customers, collaborators, and employees. And also by LGTM.

In this blog post, I compare companies (Google or Microsoft?), organizations (Apache or Mozilla?), and email providers (Gmail or Hotmail?). My goal is to find out which email domains offer the best contributions to the open source communities on GitHub and Bitbucket. Who writes the best code?

Judging and attributing contributions

I analyze commits made to open source projects listed on LGTM. This includes all medium and large projects on platforms such as GitHub and Bitbucket. I examine contributions in three[1] languages: Java (567,202 commits), Python (1,037,907 commits), and JavaScript (2,343,927 commits).

LGTM checks each commit to see whether it introduces or fixes alerts in the code base. These alerts are related to run-time bugs (see my previous blog posts about the significance of LGTM alerts here and here). If two commits change the size of the code base in a similar way (for example, they both increase it by 10 lines of code) you can compare their quality: If one commit fixes many more LGTM alerts than it causes, and the second one fixes only a few, the first one is better. Based on this, any commit can be assigned an "alert rank" between 0% (worse than all other similar commits) and 100% (better than all other similar commits) as described in another blog post.
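
The ranking scheme above can be sketched in a few lines. This is a toy illustration, not LGTM's actual implementation: the commit fields and the peer set are made up, and the real pipeline ranks each commit only against commits with a similar change in code-base size.

```python
def alert_rank(commit, similar_commits):
    """Percentile of a commit's net alert fixes among commits with a
    similar size change: 0% = worse than all peers, 100% = better."""
    net = commit["fixed"] - commit["introduced"]
    peer_nets = [c["fixed"] - c["introduced"] for c in similar_commits]
    worse = sum(1 for p in peer_nets if p < net)
    return 100.0 * worse / len(peer_nets)

# Hypothetical commits that all grow the code base by roughly 10 lines:
peers = [{"fixed": 0, "introduced": 2},
         {"fixed": 1, "introduced": 1},
         {"fixed": 3, "introduced": 0}]
print(alert_rank({"fixed": 2, "introduced": 0}, peers))
```

In the real analysis, "similar" means a comparable change in code-base size, so a commit that adds 10 lines is ranked only against other commits of about that size.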

Each commit has an author, and each author has an email address. I use the email address's domain name to find out which organization an author belongs to. I compute the mean alert rank of each organization to determine which one is best. Of course, if an organization is only responsible for a single commit, which happens to be excellent, that organization might easily lead a naively compiled table. Most likely, that's just chance and does not actually represent a phenomenally good company. I take the following two actions to avoid such false positives:

  1. I only consider the 20 most common email domains. Using more would incur a larger risk of false positives.
  2. I compute confidence intervals[2], taking care that all medalists are statistically significantly better than the average commit. This doesn't guarantee that the order is completely correct, but at least the prizes don't go to anyone who doesn't deserve them.
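
Step 2 can be sketched with a percentile bootstrap. This is a simplified stand-in for the actual test described in footnote 2 (which uses 100,000 resamples), and the alert ranks below are made up.

```python
import random

def bootstrap_lower_bound(ranks, n_boot=10_000, seed=0):
    """One-sided 95% lower confidence bound on the mean alert rank,
    estimated with the percentile bootstrap."""
    rng = random.Random(seed)
    means = sorted(
        sum(rng.choices(ranks, k=len(ranks))) / len(ranks)
        for _ in range(n_boot)
    )
    return means[int(0.05 * n_boot)]  # 5th percentile of resampled means

# Hypothetical alert ranks (in percent) for one email domain:
ranks = [48.0, 55.0, 62.0, 51.0, 70.0, 58.0, 49.0, 66.0]
lower = bootstrap_lower_bound(ranks)
# A domain only "medals" if even this lower bound beats the 50% baseline.
print(lower > 50.0)
```

The one-sided bound fits the goal here: we only care whether a domain is genuinely better than the baseline, not whether it could also be worse.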

What sets the organizations that produce the best code apart from the rest? An exhaustive treatment of this multifaceted question would exceed the scope of this blog post. But I can provide at least a flavor of background by also comparing whether a typical commit is more focused on expanding or pruning the code base, and whether it is concentrated in a few files or spread out over many[3].

I only report results if the commits from an email domain are statistically significantly different from other commits[4].
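
As a rough stand-in for such a significance check, here is a two-sided permutation test on a made-up lines-changed-per-commit measure. The footnotes describe the tests actually used (a Mann-Whitney test for commit size and bootstrapped p-tests for the other dimensions); this sketch only illustrates the general idea.

```python
import random

def permutation_p_value(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in means
    between two groups of commit measurements."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = a + b
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        pa, pb = pooled[:len(a)], pooled[len(a):]
        if abs(sum(pa) / len(a) - sum(pb) / len(b)) >= observed:
            hits += 1
    return hits / n_perm

# Hypothetical lines-changed-per-commit samples: one domain vs. the rest.
domain = [120, 95, 140, 110, 130, 125, 118, 135]
others = [60, 75, 55, 80, 70, 65, 72, 58]
p = permutation_p_value(domain, others)
print(p < 0.05)  # report the difference only if significant at 5%
```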

Who writes the best Java code?

In the following table, the calculated confidence interval[5] is blue, with the actual measurement marked in green.

|                 | Gold: Microsoft | Silver: Pivotal | Bronze: Red Hat |
|-----------------|-----------------|-----------------|-----------------|
| Commits         | 1,902           | 11,316          | 6,369           |
| Contributors    | 107             | 106             | 270             |
| Average quality | +0.4%           | +0.2%           | +0.2%           |
| Typical commit  | focused on one file, mainly new code | large, focused on one file, refactoring existing code | large, spread over many files, mainly new code |

Java coders from Microsoft have the best average quality. (There aren't very many of them working in open source projects, which is why the confidence interval is rather broad.) Runner-up Pivotal is mainly a Java company, best known for the Spring framework. LGTM's security team recently uncovered some serious vulnerabilities in Spring (as described in previous blog posts: 1, 2, and 3). Pivotal has since corrected them. Their efforts to increase quality allow them to narrowly push Red Hat to third place.

Who writes the best JavaScript code?

|                 | Gold: SAP | Silver: Google | Bronze: Red Hat |
|-----------------|-----------|----------------|-----------------|
| Commits         | 8,408     | 15,635         | 3,424           |
| Contributors    | 253       | 620            | 177             |
| Average quality | +1.1%     | +0.8%          | +0.3%           |
| Typical commit  | large, focused on one file, mainly new code | large, mainly new code | large, focused on one file, mainly new code |

The software company SAP's open software contributions are chiefly JavaScript, and in that category, they prevail by putting out large quantities of clean, new code (while neither introducing nor fixing many errors). Less than half a percent of all their commits change the number of errors, but since they usually add quite a bit of code, that's rather good (on average, more code means more errors). In contrast, runners-up Google and Red Hat have a much higher chance of fixing issues in their commits, but also of introducing new ones.

Who writes the best Python code?

|                 | Gold: Google | Silver: Facebook | Bronze: Mozilla |
|-----------------|--------------|------------------|-----------------|
| Commits         | 10,412       | 1,689            | 2,485           |
| Contributors    | 599          | 192              | 52              |
| Average quality | +0.9%        | +0.9%            | +0.3%           |
| Typical commit  | large, focused on one file, mainly new code | large, focused on one file, mainly new code | small, spread over many files |

Google had to concede first place in the JavaScript category, but they take the trophy for Python. It's a photo finish win over Facebook. Both companies nurture a Python portfolio with a definite slant towards machine learning projects (tensorflow, caffe2, pytorch), and they usually produce commits that add a liberal amount of code to a small number of files. This is the opposite strategy to that of bronze medalist Mozilla, whose commits are smaller than average but touch a relatively large number of files.

Where are the big differences?

The rankings for Java are closer than for the other languages. There, the best domains aren't much better than the worst domains, although the difference is still statistically significant. This matches the effect observed in my last blog post, where the quality of Java code appears less affected by the developer's emotions than JavaScript or Python code.

A possible explanation is that Java is a statically typed and compiled language. As such, many mistakes are already caught by the compiler. Any differences in quality might be more subtle than in JavaScript or Python, where it's much easier to go wrong.

What about the free email providers?

Eight of the common email domains are not limited to a particular company or organization, but are open to everyone. I expect their users to be a more diverse group than the employees of any particular company. However, there may still be some general trends. And indeed there are.

There seem to be two kinds of providers:

  • Users with email addresses from gmail.com, yahoo.com, outlook.com, and one other free provider are pretty average: 5 of their 11 scores are very slightly better than average, 6 are very slightly worse, but the differences are not statistically significant.
  • Email addresses from hotmail.com, qq.com, and 163.com are bad news. They all perform substantially below average, mostly[6] statistically significantly so.

The following plot shows that the two categories are nicely separated for all three languages:

[Plot: average commit quality relative to the baseline for each free email provider, per language.]

The shaded areas in the above plot represent confidence regions[7], green for good and blue for bad.

This demonstrates that using a free email address isn't necessarily a bad sign; it depends on the exact domain.

Why do hotmail, 163, and qq perform so badly? I know very little about the Chinese internet scene, and about what using an email address from the big portal sites 163.com or qq.com signifies. But I do know that hotmail has a reputation for including many less technically skilled users, to the point that many high-profile recruiters are willing to go on record about the negative impact of hotmail addresses on job applications.

Final results

Overall, coders with email addresses from hotmail, qq, and 163 contribute poorer-quality code. Coders with addresses from google, yahoo, and outlook supply code of a quality similar to, or slightly below, that of coders with company email addresses. Coders who work for Microsoft, SAP, or Google contribute the highest-quality code.

A handy trick to boost your score

Everyone makes mistakes, but not everyone has to suffer the consequences. If you enable automated code review for pull requests, LGTM warns you whenever someone is about to introduce new problems into your code base. That way, you never have to worry about subpar scores again. Oh, and you'll also have fewer long-term headaches and run-time bugs, if you care about that sort of thing.

Conflict of interest statement

Some of the companies that I investigated (in particular Microsoft and Google) are clients of the company Semmle where I am employed. I have not treated these companies differently, nor have I received any suggestion that I do so.

Image credits

Title image: Hitesh Choudhary

Emojis: Noto project

  1. LGTM also analyzes projects of the C family, but I'd prefer to wait until I have more data before drawing any conclusions.

  2. I computed the confidence intervals using a bootstrapped test based on 100,000 simulations. The intervals have a confidence level of 95% (one-sided: since I'm only interested in winners, this is one of the rare instances where a one-sided test is actually appropriate).

  3. I also take the commit size into account here—a large commit touching 3 files counts as more focused than a tiny commit touching 2 files.

  4. For commit size, this is tested with a Mann-Whitney-Wilcoxon test. The expanding/pruning distinction and the focused/spread distinction are tested with bootstrapped p-tests. In each case, I use a significance level of 5% (two-sided).

  5. The confidence interval is a 90% interval. This makes the certainty that the true value is at least as large as the lower end of the interval 95%, which is the usual criterion for statistical significance.

  6. Only the Python contributions from two of these domains fail a significance test at 5%. However, there are very few contributions from these domains to Python projects. In a sense, there just isn't enough data to condemn them.

  7. To be precise: I compute 95% confidence intervals for each email provider in each dimension. For example, one provider's average for JavaScript commits is between 0.2% and 0.6% worse than the baseline of all commits. The shaded areas are the convex hull of the intervals marked on the plot.