Data flow analysis and path exploration in LGTM

February 28, 2019



Data-flow analysis and taint analysis are core techniques for any researcher who looks at source code to find or exploit vulnerabilities, such as when performing variant analysis. The process usually involves defining a set of untrusted “sources” and a set of unsafe “sinks”, and then looking for places where data flows from a source to a sink. Such a flow can indicate a vulnerability, unless the data is sanitized somewhere along the way.
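To make the source/sink/sanitizer idea concrete, here is a minimal sketch in Python that treats data flow as reachability in a directed graph. It is only an illustration of the concept, not how QL or LGTM implement the analysis; the function name `find_tainted_pairs` and the node labels in the toy graph are hypothetical.

```python
from collections import deque


def find_tainted_pairs(edges, sources, sinks, sanitizers):
    """Return sorted (source, sink) pairs where data can flow from a
    source to a sink without passing through a sanitizer node."""
    pairs = set()
    for source in sources:
        seen = {source}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            if node in sinks:
                pairs.add((source, node))
            for succ in edges.get(node, ()):
                # Flow stops at a sanitizer: never step onto one.
                if succ not in seen and succ not in sanitizers:
                    seen.add(succ)
                    queue.append(succ)
    return sorted(pairs)


# Hypothetical flow graph: each key flows into the nodes in its list.
edges = {
    "request.getParameter": ["name"],
    "name": ["greeting", "escapeHtml"],
    "escapeHtml": ["safeName"],
    "safeName": ["response.write(safeName)"],
    "greeting": ["response.write(greeting)"],
}
sources = {"request.getParameter"}
sinks = {"response.write(safeName)", "response.write(greeting)"}
sanitizers = {"escapeHtml"}

pairs = find_tainted_pairs(edges, sources, sinks, sanitizers)
for source, sink in pairs:
    print(f"{source} -> {sink}")
```

In this toy graph only the unescaped `greeting` branch is reported; the branch that goes through `escapeHtml` is cut off by the sanitizer.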

For example, Cross-site scripting, SQL Injection, Command Injection and Unsafe Deserialization vulnerabilities all involve untrusted user data flowing through code to a location that assumes the data is safe.

QL (the query language that powers LGTM’s analysis) has extensive support for writing data-flow and taint-tracking queries: simply define the sources, sinks and sanitization steps, and the data-flow libraries will do the rest of the work, showing you the (source, sink) pairs where data flows from one to the other.

However, just having the list of sources and sinks is often not quite enough information… Paths between the sources and sinks can often be highly complex, and involve many different files. As a result, it can be hard to see how a particular result pair was produced, and to verify and fix the problem.
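The difference between a (source, sink) pair and a path can be sketched in a few lines of Python. This is a conceptual illustration under simplifying assumptions (a small hand-written graph and a breadth-first search), not LGTM's actual algorithm; the function name `find_path` and the file and variable names in the graph are hypothetical.

```python
from collections import deque


def find_path(edges, source, sink):
    """Return one shortest list of steps from source to sink,
    or None if no flow exists."""
    parents = {source: None}  # node -> the node we reached it from
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == sink:
            # Walk the parent links back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for succ in edges.get(node, ()):
            if succ not in parents:
                parents[succ] = node
                queue.append(succ)
    return None


# Hypothetical flow steps spread over several files.
edges = {
    "Controller.java: request.getParameter": ["Controller.java: name"],
    "Controller.java: name": ["Service.java: user"],
    "Service.java: user": ["Template.java: html", "Logger.java: msg"],
    "Template.java: html": ["View.java: response.write"],
}

path = find_path(
    edges,
    "Controller.java: request.getParameter",
    "View.java: response.write",
)
for step_number, step in enumerate(path, start=1):
    print(step_number, step)
```

The pair alone only names the first and last entries; the intermediate steps through `Service.java` and `Template.java` are what make the result practical to verify and fix.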

To alleviate this, we’ve worked hard to improve both QL and LGTM so that these paths can be explored, and today we’re happy to announce that you can now explore them directly in LGTM:

Exploring paths in LGTM

screenshot 1

When you view certain alerts on LGTM, you will now see a “Show paths” button. Clicking it opens a popup that shows every step the data takes on its way from the source to the sink.

screenshot 2

Which queries are compatible?

Any QL query that has @kind path-problem in its metadata will produce alerts that allow you to explore the paths. Such queries are referred to as path queries. We’ve already converted all of our standard open source data-flow and taint-tracking queries, which you can see on GitHub, so all the standard queries that catch things like XSS, Command Injection, SQL Injection, Unsafe Deserialization, etc. will produce alerts that allow you to explore the paths.

Additionally, you can also write your own path queries and add them to your repository. The alerts for these queries will be displayed on LGTM alongside the results for our standard queries, and you’ll be able to explore the paths in exactly the same way. We’ll be going into details of how to write your own data-flow queries in our Introduction to variant analysis blog post series.

Beyond this, you can also explore paths in QL for Eclipse if you’re using that plugin to write and run QL queries locally.

Check it out now

Take a look at the alerts for the Java XSS query to see this new feature in action!

If you have further questions or comments, feel free to open a discussion on