The Software Security Crisis: Evidence from the Trenches

February 13, 2019

At this exact moment, there must be over a thousand computer system processes running, which, if compromised, could easily harm me personally. They include systems on my mobile phone or laptop, as well as at my bank and all the numerous companies and organizations that handle my data or where I have accounts. Many more processes have the potential to harm not me specifically, but society as a whole, e.g. by messing up production or disrupting supply lines.

Increasingly, we rely on secure computer systems in more or less all areas of our lives. Yet at the same time, by all accounts, the frequency and impact of cyber attacks are on the rise (1, 2, 3, 4, 5). There is a widening rift between our need for safe software and the security that is being delivered, even though developers are increasingly aware of the challenges that they face. In fact, developer awareness is increasing exponentially, as we'll now see.

Developers are increasingly aware of security

The platform LGTM.com analyses open-source software development on the level of commits, which are the individual units of change a developer checks into a code base. This yields a corpus of 30,127,131 commits on LGTM. Commits all come with a short description called a "commit message", and I had a look at what those descriptions said.

For example, about 1 in 35 commit messages mention the word "bug", and 1 in 90 mention "feature". These numbers remained stable over the last few years. Bugs and features have always been important, and that doesn't seem about to change.
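The counting behind a figure like "1 in 35" can be sketched in a few lines of Python. This is a minimal illustration, not the analysis actually run on LGTM: the function name `mention_rate` and the sample messages are made up, and treating a "mention" as a case-insensitive whole-word match is an assumption.

```python
import re


def mention_rate(messages, word):
    """Return the fraction of commit messages that mention `word`.

    Assumption: a "mention" is a case-insensitive whole-word match,
    so "bug" does not match "debug" or "bugs".
    """
    if not messages:
        return 0.0
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    hits = sum(1 for message in messages if pattern.search(message))
    return hits / len(messages)


# Made-up commit messages, for illustration only.
sample_messages = [
    "Fix bug in date parser",
    "Add feature flag for new dashboard",
    "Bump version",
    "Update README",
]

print(mention_rate(sample_messages, "bug"))      # 1 of 4 messages
print(mention_rate(sample_messages, "feature"))  # 1 of 4 messages
```

Whether plurals or substrings should count is a design choice; a looser match would raise the rates, so the definition matters when comparing numbers across years.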

But more and more commits talk about "security". Not all of them use the word in the cybersecurity sense though [3], so the picture gets even clearer when we check for a word like "vulnerability". This word is almost exclusively used to refer to software vulnerabilities [4].

In a spot check of 20 randomly selected commits mentioning the word "security", 9 were directly about security considerations in the code, 4 touched on security in passing (e.g. as a minor consideration in a long commit message), 5 mentioned names (web addresses, variables, filenames) that contain the word "security" but were not about cybersecurity, and 2 I couldn't easily classify.

In a matching spot check of 20 randomly selected commits mentioning the word "vulnerability", 19 were directly about cybersecurity. The last one I couldn't easily classify.
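A spot check of this kind (20 randomly drawn commits, labelled by hand) could be sketched as follows. The helper `draw_sample`, the fixed seed, and the label names are hypothetical. Only the counts 9, 4, 5 and 2 come from the "security" spot check reported here.

```python
import random
from collections import Counter


def draw_sample(commit_ids, k=20, seed=0):
    """Draw k distinct commits at random (hypothetical helper).

    A fixed seed makes the spot check repeatable.
    """
    rng = random.Random(seed)
    return rng.sample(commit_ids, k)


# Stand-in commit identifiers; a real check would use commits that matched the word.
candidate_ids = list(range(1000))
sample = draw_sample(candidate_ids)

# Manual labels for the "security" spot check, using the counts reported in the text.
labels = Counter({
    "directly about security": 9,
    "security in passing": 4,
    "name only": 5,
    "unclassified": 2,
})

total = sum(labels.values())
for label, count in labels.most_common():
    print(f"{label}: {count}/{total} ({count / total:.0%})")
```

With only 20 commits per word, the resulting percentages are rough indications rather than precise estimates.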

[Figure: more developers are thinking about security]

Finding 1: There's an exponential growth in the number of commits addressing security vulnerabilities in open-source software.

This explosion in the use of 'vulnerability' in software development is all the more striking since it is not reflected in the general population. Although 'vulnerability' is tied quite specifically to software security, Google Trends shows at most a modest general increase in its use. The exponential [1] growth in relevance is specific to software development.
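One way to tell genuinely exponential growth from "increasing a lot" is to look at the logarithm of the series: exponential growth turns into a straight line, so a least-squares fit of log(frequency) against time should fit closely. The sketch below shows the idea on synthetic numbers that double every year. They are not the LGTM data, and `fit_log_linear` is a hypothetical helper.

```python
import math


def fit_log_linear(years, values):
    """Least-squares fit of log(value) = intercept + slope * year.

    Returns (slope, intercept, r_squared). An r_squared close to 1 means
    the series is well described by exponential growth. Values must be
    positive and not all equal.
    """
    logs = [math.log(v) for v in values]
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(years, logs))
    ss_tot = sum((y - mean_y) ** 2 for y in logs)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


# Synthetic series that doubles every year (NOT the LGTM data).
years = [2014, 2015, 2016, 2017, 2018]
per_million = [10.0, 20.0, 40.0, 80.0, 160.0]

slope, intercept, r_squared = fit_log_linear(years, per_million)
doubling_time = math.log(2) / slope
print(f"growth factor per year: {math.exp(slope):.2f}")
print(f"doubling time: {doubling_time:.2f} years, R^2 = {r_squared:.3f}")
```

A series that merely grows linearly would still produce a positive slope here, but its points would bend away from the fitted line and the R^2 on the log scale would drop.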

But code doesn't improve

So, developers have awakened to the importance of secure code. A glorious future must be upon us, in which all remaining vulnerabilities will soon be detected and eliminated. Right?

Unfortunately not. In fact, the number of vulnerabilities is on the rise.

LGTM automatically checks a large number of open-source software projects for potential security problems. At the time of writing, this included 51,959 JavaScript projects, 22,387 Python projects, 9,549 Java projects, 4,181 C++ projects, and 1,442 C# projects.

Using this data we'll check for any recent trends by comparing the number of commits that introduce new problems against the number of commits that repair them. It turns out that for every 2 commits that repair a vulnerability, there are 3 that introduce a new one. Surprisingly, that number hasn't shown an overall improvement during the last few years:

[Figure: more problems introduced than fixed]

Finding 2: The security debt of open-source software is increasing.

In a sense, it's no great wonder that more issues are being created than eliminated. More software is being written than deleted, after all. But it's still an unsatisfactory state of affairs, and in fact more security problems appear to arise than could be explained by the increased amount of software: the red line showing the increase in security issues is consistently above the blue line showing the growth of total code.

So maybe growing developer awareness is just a drop in the bucket [2]? Or maybe the growth in concern about security is offset by an increase in complexity, as systems become more interconnected and harder to secure? Whatever the reason, the open-source community's security debt is increasing.
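To make the "3 introduced for every 2 fixed" ratio and the notion of growing security debt concrete, here is a small sketch of how both could be derived from per-commit alert counts. The `CommitDelta` record and the sample history are hypothetical. The post does not describe LGTM's internal data model, and the numbers are chosen only to reproduce a 3:2 ratio.

```python
from dataclasses import dataclass


@dataclass
class CommitDelta:
    """Hypothetical record: security alerts added and removed by one commit."""
    introduced: int
    fixed: int


def summarize(deltas):
    """Count introducing commits, fixing commits, and the net alert change."""
    introducing = sum(1 for d in deltas if d.introduced > 0)
    fixing = sum(1 for d in deltas if d.fixed > 0)
    net_change = sum(d.introduced - d.fixed for d in deltas)
    return introducing, fixing, net_change


# Made-up history chosen to reproduce a 3:2 ratio.
history = [
    CommitDelta(introduced=1, fixed=0),
    CommitDelta(introduced=0, fixed=1),
    CommitDelta(introduced=2, fixed=0),
    CommitDelta(introduced=0, fixed=0),
    CommitDelta(introduced=1, fixed=0),
    CommitDelta(introduced=0, fixed=1),
]

introducing, fixing, net_change = summarize(history)
print(f"{introducing} introducing vs {fixing} fixing commits")
print(f"net change in open alerts: {net_change:+d}")
```

A positive net change over a period is what "increasing security debt" means in this sketch: more alerts were opened than closed.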

How can we get better?

A recent analysis by my colleague Tom Bolton suggests that where code quality is concerned, there is really only one silver bullet: complete and in-depth code review. Make sure your code gets checked thoroughly. The more reviewers there are going over your code, the cleaner it will be in the end.

Finding security concerns can be especially challenging though, requiring special skill and often the ability to keep in mind disparate parts of the program, which may work together to create the vulnerability. This is an area where QL, the analysis engine LGTM was built upon, shines. This is evidenced by the number of high profile vulnerabilities found and fixed because they were flagged by LGTM (e.g. here and here, or more generally here).

LGTM is aware of the increasing importance of application security and is continuously improving the range of vulnerabilities it can find. In 2018, LGTM started open-sourcing its queries, accepting contributions from security researchers from companies such as Google and Microsoft. The results are already very visible: there has been a sharp rise in the security issues LGTM helps developers find.

[Figure: LGTM security contributions]

Finding 3: LGTM is identifying more security problems in open-source software than ever before.

If you've got a stake in an open-source project, it's easy (and free!) to let LGTM help you keep your code safe by enabling automated code review. That way, you automatically add LGTM as a reviewer every time a contributor suggests a change to your code base, testing whether it would leave your software open to attack. It's an extra pair of eyes that's particularly adept at flushing out security issues before they even get merged.

  1. One of my personal pet peeves is people using the phrase "increasing exponentially" when they merely mean "increasing a lot". However, in the curve shown above the growth does genuinely appear to warrant the label "exponential".

  2. Even the most recent frequencies of commits talking about vulnerabilities are still lagging behind the actual number of vulnerabilities. And of course, not every commit that talks about vulnerabilities fixes them.

  3. I've spot-checked 20 randomized commits mentioning the word "security".

  4. I've spot-checked 20 randomized commits mentioning the word "vulnerability".