The open source community is made up of passionate coders who freely give their time and energy to collaborative software projects. In the enterprise world, companies tend to take a more proprietary approach to software, avoiding what they might imagine to be the unmanageable, unreliable world of open source. Which approach is better is a source of much debate. Now, by using LGTM, we have finally been able to get to the bottom of the issue.
We used LGTM to analyze hundreds of Java codebases from Semmle's enterprise customers, and benchmarked the results against thousands of open source Java projects.
In the charts below, each circle represents an open source project, and each cross is a proprietary codebase. The horizontal axis shows the size in lines of code, and the vertical axis shows the total number of alerts found. The "pit of shame" is the top left (small, messy codebases); the "hall of fame" is the bottom right (big, clean codebases).
Each project has been given a color, based on the vertical distance from the average trend line for all projects (whether proprietary or open source): red is bad and green is good.
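The coloring rule can be illustrated with a minimal sketch. It assumes the trend line is an ordinary least-squares fit of alerts against lines of code, which the analysis described here may or may not have used, and it reduces the color scale to two values. The function names and the sample figures are hypothetical and are not taken from the LGTM data.

```python
from statistics import mean


def fit_trend_line(sizes, alerts):
    """Least-squares fit of alerts = slope * size + intercept (an assumed model)."""
    mean_size, mean_alerts = mean(sizes), mean(alerts)
    sxx = sum((x - mean_size) ** 2 for x in sizes)
    sxy = sum((x - mean_size) * (y - mean_alerts) for x, y in zip(sizes, alerts))
    slope = sxy / sxx
    return slope, mean_alerts - slope * mean_size


def color_projects(projects):
    """Attach the vertical distance from the trend line and a color to each project."""
    slope, intercept = fit_trend_line(
        [p["loc"] for p in projects], [p["alerts"] for p in projects]
    )
    colored = []
    for p in projects:
        # Positive distance: more alerts than the trend predicts for this size.
        distance = p["alerts"] - (slope * p["loc"] + intercept)
        color = "red" if distance > 0 else "green"
        colored.append({**p, "distance": distance, "color": color})
    return colored


# Hypothetical sample data, for illustration only.
projects = [
    {"name": "small-messy", "loc": 10_000, "alerts": 600},
    {"name": "small-clean", "loc": 50_000, "alerts": 200},
    {"name": "big-messy", "loc": 100_000, "alerts": 1500},
    {"name": "big-clean", "loc": 200_000, "alerts": 1000},
]
colored = color_projects(projects)
for p in colored:
    print(f'{p["name"]}: {p["color"]} ({p["distance"]:+.1f} alerts from trend)')
```

A project above the line has more alerts than is typical for its size and comes out red; one below the line comes out green. A real chart would presumably use a graded color scale rather than two values.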
In this first graph, the black trend line gives the average for all proprietary projects, while the grey trend line is the average for all projects (including open source). It is clear that proprietary projects have a higher alert density.
By contrast, if we take the same view of open source projects, they are better than average:
So there you have it: open source yields cleaner code. More eyes make better software. You don’t need to be a heart-on-sleeve open source evangelist to take advantage of this method though. LGTM is here to help you ask the tough questions about your codebase, and get concrete answers to benefit your projects, whether they are open source or not.
Note: This post was originally published on LGTM.com on February 14, 2017