Threat Hunting At Microsoft: Using Semmle QL To Eliminate Bugs

May 14, 2019


Threat hunting: hard and repetitive

After a security vulnerability has been discovered and patched, the next two critical steps are root cause analysis and then variant analysis: finding and eradicating all variants of the vulnerability. Starting from the specific bug, you broaden your search to a more general pattern and run that search across your whole codebase. Without the right tools, variant analysis can be a tedious, time-consuming process involving a significant amount of manual work. Unfortunately, you have to do it quickly, because you want all exploitable variants fixed by the time you disclose the original bug.

Semmle QL to the rescue

In a series of blog posts, Vulnerability hunting with Semmle QL part 1 and part 2, Steven Hunter and Christopher Ertl of the Microsoft Security Response Center (MSRC) explain how they automate variant analysis, turning repetitive manual tasks into automated checks that run continuously across multiple codebases.

In their posts you’ll discover, with concrete examples, some key features of the Semmle QL technology:

  • Code as data. Where textual search is insufficient to capture the properties you care about, Semmle QL extracts every aspect of your codebase into a relational database that you can query to find complex issues.
  • Extensibility. Semmle QL is a declarative, object-oriented query language. You can extend the existing classes to write your own and easily customize the logic to fit the exact pattern you are looking for, or the specifics of your codebase.
  • Expressiveness. Your queries are concise and meaningful.
  • Comprehensive set of libraries. You don’t need to start from scratch when creating custom analyses. In particular, the posts show how the authors use the data flow library to follow the interprocedural flow of untrusted data across the codebase, and the taint tracking library to mark all data potentially “tainted” by that untrusted data. Hunter and Ertl say:

    QL provides a powerful global data flow library which abstracts away most of the tricky language-specific detail involved in this.

  • Continuous prevention. Once your query is written, you can deploy it to Semmle LGTM and run it continuously during CI/CD and code review, making sure that a vulnerability and all of its variants are not only eradicated but never reintroduced.
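
To make the "code as data" and data flow ideas from the list above more concrete, here is a minimal sketch in Python. It is not Semmle QL and does not reflect how Semmle's extractor or libraries are implemented. It assumes a toy model in which a program has already been reduced to "data flows from A to B" facts, stores those facts in an in-memory SQLite database, and uses a recursive query to find which sensitive sinks untrusted data can reach. All table names, node names, and the `tainted_sinks` helper are hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical facts "extracted" from a tiny program. Each pair means that
# data flows from the first node to the second (assignment, argument, etc.).
FLOW_EDGES = [
    ("recv_packet", "buf"),
    ("buf", "parse_header.arg0"),
    ("parse_header.arg0", "len"),
    ("len", "memcpy.size"),
    ("config_value", "timeout"),
]


def tainted_sinks(edges, sources, sinks):
    """Return the sinks reachable from any source by following flow edges."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE flow (src TEXT, dst TEXT)")
    db.execute("CREATE TABLE source (node TEXT)")
    db.execute("CREATE TABLE sink (node TEXT)")
    db.executemany("INSERT INTO flow VALUES (?, ?)", edges)
    db.executemany("INSERT INTO source VALUES (?)", [(s,) for s in sources])
    db.executemany("INSERT INTO sink VALUES (?)", [(s,) for s in sinks])

    # The recursive query starts from the untrusted sources and repeatedly
    # follows flow edges. UNION (not UNION ALL) drops duplicates, so the
    # query also terminates when the flow graph contains cycles.
    rows = db.execute(
        """
        WITH RECURSIVE tainted(node) AS (
            SELECT node FROM source
            UNION
            SELECT flow.dst FROM flow JOIN tainted ON flow.src = tainted.node
        )
        SELECT sink.node FROM sink JOIN tainted ON sink.node = tainted.node
        ORDER BY sink.node
        """
    ).fetchall()
    db.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(tainted_sinks(FLOW_EDGES, ["recv_packet"], ["memcpy.size", "timeout"]))
```

With `recv_packet` as the only untrusted source, the query reports `memcpy.size` but not `timeout`, because `timeout` is fed only by `config_value`. Once the code is represented as relations, "does untrusted data reach this call?" becomes an ordinary query that can be rerun whenever the facts change.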

Want to know more?