How NASA Saved the Curiosity Mission using Variant Analysis

June 20, 2019

Category

Technical Difficulty

Reading time

A few years ago, somewhere between Earth and Mars…

The Curiosity rover was on its way to Mars in February 2012 when a flight software developer at NASA Jet Propulsion Laboratory (JPL) discovered a mission-critical bug through manual code review. The problem was located in Curiosity’s Entry, Descent, and Landing software— the software responsible for controlling the Rover’s descent through the Martian atmosphere and landing it safely on the red planet.

The bug, which had gone undetected by NASA's rigorous testing and review, was likely to prevent the capsule’s parachutes from opening, resulting in the Rover crashing onto the red planet’s rocky surface.

The problem stemmed from a peculiarity in the C language. In C, it is legal to declare a function that takes a sized array parameter, such as void fire_thrusters(double vectors[12]). However, this is semantically identical to declaring a function with a simple pointer type: void fire_thrusters(double *vectors), which contains no size information. This syntactic oddity can lead programmers to think that the function's array parameter is guaranteed to be a certain length, when in fact C offers no such guarantee. For example, a programmer might write the following:

void fire_thrusters(double vectors[12]) {
 for (int i = 0; i < 12; i++) {
   … vectors[i]}
}
double thruster[3] =;
fire_thrusters(thruster);

Note: Pseudocode, not actual NASA code.

The pseudocode above would have no protection from passing a mismatched array. The function is declared to take an array of length 12 (presumably three data points for each thruster), however, there is no sanity checking, and a developer might call it with an array that’s too short, holding direction information for only one of the thrusters. The function will then read past the end of the array, and unpredictable results occur.

When risk is not an option

Space missions have no room for error. With millions of dollars invested, and groundbreaking scientific endeavors at stake, it’s easy to see why NASA JPL’s testing process for software is one of the most demanding in the world. So, when this mission critical bug was discovered, the JPL team found themselves in the unenviable position of needing to fix Curiosity’s code, literally on the fly.

NASA JPL makes use of many commercial code analyzers, all of which were in production use on the Curiosity mission. Despite this, the mission critical bug had slipped through every one of these static analysis checks. While patching was still possible mid-flight, the JPL team quickly realized that similar, more serious problems might have also been overlooked by their static analysis checks. The team needed to know if any variants of the problem existed in other places in the codebase.

The power of variant analysis

JPL contacted Semmle for help discovering where other defects might exist in the Curiosity control software.

In just 20 minutes, our research engineers produced a QL query and shared it with the JPL team. The query finds all functions that are passed an array as an argument whose size is smaller than expected.

import cpp

from Function f, FunctionCall c, int i, int a, int b
where f = c.getTarget()
 and a = c.getArgument(i).getType().(ArrayType).getArraySize()
 and b = f.getParameter(i).getType().(ArrayType).getArraySize()
 and a < b
select c.getArgument(i), "Array of size " + a
 + " passed to $@, which expects an array of size " + b + ".",
 f, f.getName()

Note: Actual query to find variants of initial bug.

When JPL ran the query across the full Curiosity control software, it quickly identified the original defect, as well as 30 other variants elsewhere in the code. Three of which were in the critical Entry, Descent, and Landing module and would have resulted in a failed mission.

With these problems pinpointed, the JPL engineers were able to fix the code, and patch the firmware remotely. Not long after, Curiosity survived the seven minutes of terror as it descended through the atmosphere before safely landing on Mars.

Code standards

At NASA JPL, all mission-critical software must comply with their internal risk-reduction coding standards.

Klaus Havelund and Gerard J. Holzmann, leaders of NASA JPL’s Laboratory for Reliable Software, published a paper explaining the motivations for the stringent software certification implemented at NASA. In their paper they write:

... we can often point at a mission anomaly or mission loss that was caused by the violation of [an] underlying principle [...] An example of a risk-related rule in this coding standard is the abolition of all dynamic memory use and of recursive code.

This is just one of many rules that NASA JPL’s flight software developers have to abide by. Due to Semmle’s help on the Curiosity mission, NASA now uses the Semmle QL to implement the rules that define their full Institutional Coding Standard for Java and C.

Note: Post revised from its original publication on LGTM.com on 02/14/2017

Join us in securing the software that runs the world!

Enter your email address below to stay up-to-date with Semmle news, security announcements and product updates.

Loading...