Advancing Rust Support in Semgrep

At Kudelski Security, we perform quite a few security and cryptography reviews involving Rust code, and Rust support in tooling has been lacking. We’ve developed some tools internally to assist in our reviews, but we were looking for a more general and mature framework that supports multiple languages. Semgrep is a great tool for performing static analysis, and its support for multiple programming languages makes it valuable for developers and security professionals alike. Experimental support for Rust was recently added to Semgrep. We will see how to use it by going through a quick example, which also exposes some limitations. Then we’ll see how we improved this Rust support, increasing usability for the scenarios we and many others find ourselves in every day.

First, we install Semgrep from PyPI:

$ pip install semgrep

Then, we create a file named rules.yml with the following contents. This defines a very simple rule that matches those exact two lines defining a variable x and a variable y. We could of course define more complex rules, but this is not the point here:

rules:
  - id: my-rule-id
    patterns:
      - pattern: |
          let x = 0;
          let y = x * 2;
    message: "Found a match for my custom rule."
    languages: [ rust ]
    severity: WARNING

After that, we clone our codebase and run Semgrep on that directory using our custom rules. Semgrep then tells us which parts of the codebase match those rules.

$ git clone git@github.com:ing-bank/threshold-signatures.git
$ cd threshold-signatures
$ semgrep --config rules.yml .
running 1 rules...
semgrep error: invalid pattern
  --> rules.yml:4
4 |       - pattern: |
5 |           let x = 0;
6 |           let y = x * 2;
7 |     message: "Found a match for my custom rule."

Pattern could not be parsed as a Rust semgrep pattern

But this does not work. It turns out that the two lines of Rust code in the Semgrep pattern are not valid Rust code by themselves. If you try to compile a Rust source file containing only those two lines, it fails:

$ cat test.rs
let x = 0;
let y = x * 2;
$ rustc test.rs
error: expected item, found keyword `let`
 --> test.rs:1:1
  |
1 | let x = 0;
  | ^^^ expected item

error: aborting due to previous error

Indeed, the Rust grammar only allows items (such as function definitions) at the top level of a file, so these statements must be wrapped in a function definition to be valid:

$ cat test.rs
fn main() {
  let x = 0;
  let y = x * 2;
}
$ rustc test.rs
warning: unused variable: `y`
 --> test.rs:3:7
  |
3 |   let y = x * 2;
  |       ^ help: if this is intentional, prefix it with an underscore: `_y`
  |
  = note: `#[warn(unused_variables)]` on by default

warning: 1 warning emitted

This produces a warning about the unused variable, but it now compiles.

To parse the pattern defined in our rule, Semgrep uses tree-sitter, a parser generator, and, for Rust source code specifically, a modified version of the tree-sitter-rust grammar. Like the Rust compiler, this grammar won’t accept these two lines by themselves, so we need to update the pattern in our rule:

rules:
  - id: my-rule-id
    patterns:
      - pattern: |
          fn $F(...) {
            let x = 0;
            let y = x * 2;
          }
    message: "Found a match for my custom rule."
    languages: [ rust ]
    severity: WARNING

Note that we used some special Semgrep pattern syntax here: a metavariable ($F) and the ellipsis operator (...). Semgrep extends the tree-sitter-rust grammar to support these additional constructs, which are only valid in Semgrep patterns and not in regular Rust code. This lets us match any function name with any parameter list without having to hardcode every possible case.
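To make the role of $F and ... more concrete, here is a small sketch of target code. The file and function names are our own invention for illustration and do not come from the codebase above. Assuming the pattern behaves as described, we would expect both functions to be reported, since they differ only in their name (bound to $F) and their parameter list (covered by the ellipsis), while their bodies are exactly the two statements from the pattern.

```rust
// Hypothetical target file (say, examples.rs) for the rule above.
// Both bodies are exactly the two statements from the pattern; only the
// function name and the parameter list differ. As in the article, rustc
// warns that `y` is unused.

// $F would bind to `no_params`; `...` covers an empty parameter list.
fn no_params() {
    let x = 0;
    let y = x * 2;
}

// $F would bind to `with_params`; `...` covers two parameters.
fn with_params(_count: u32, _label: &str) {
    let x = 0;
    let y = x * 2;
}
```

Without the metavariable and the ellipsis, we would need one pattern per function signature.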

Now, if we run Semgrep again with our updated rule, we get a match in our test.rs file:

$ semgrep --config rules.yml .
running 1 rules...
test.rs
severity:warning rule:my-rule-id: Found a match for my custom rule.
1:fn main() {
2:  let x = 0;
3:  let y = x * 2;
4:}
ran 1 rules on 20 files: 1 findings
2 files could not be analyzed; run with --verbose for details or run with --strict to exit non-zero if any file cannot be analyzed

This would be less painful if we could write only those two lines as the pattern for our rule. Additionally, the whole function is being printed out, but we only wanted to search for those two lines. For long functions, this makes it difficult to see which line matched.

Well, this is now possible thanks to a contribution we made to Semgrep:

We extended the tree-sitter-rust grammar to support a list of statements directly as a pattern, and we updated the CST-to-AST mapping to make it work. It is now possible to write rules that directly contain a list of statements:

rules:
  - id: my-rule-id
    patterns:
      - pattern: |
          let x = 0;
          let y = x * 2;
    message: "Found a match for my custom rule."
    languages: [ rust ]
    severity: WARNING

Semgrep now accepts that pattern and only prints the lines we wanted to match:

$ semgrep --config rules.yml .
running 1 rules...
test.rs
severity:warning rule:my-rule-id: Found a match for my custom rule.
2:  let x = 0;
3:  let y = x * 2;
ran 1 rules on 20 files: 1 findings
2 files could not be analyzed; run with --verbose for details or run with --strict to exit non-zero if any file cannot be analyzed
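As a further illustration, here is a sketch of how we would expect the statement-list pattern to behave on slightly larger functions. The function names are hypothetical, and we have not run this exact file through Semgrep. The assumption is the usual Semgrep behaviour for multi-statement patterns: the statements must appear consecutively in the same block unless an ellipsis is placed between them.

```rust
// Hypothetical target file for the statement-list rule above.

// The two statements appear back to back in the middle of a longer body.
// We would expect only those two lines to be reported, not the whole function.
fn scaled_offset(offset: i32) -> i32 {
    let base = offset + 1;
    let x = 0;
    let y = x * 2;
    base + y
}

// Here another statement sits between the two, so they are not consecutive.
// Under the assumption stated above, we would not expect a match here.
fn not_adjacent() -> i32 {
    let x = 0;
    let z = 5;
    let y = x * 2;
    y + z
}
```

This is where the statement-level pattern helps most: the finding points at the two lines of interest even when the enclosing function is long.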

Note: Since these changes were only recently merged, they will first appear in the next Semgrep release. Until then, you will have to build Semgrep from source to use them.

We love being able to write rules more easily now and hope this will be helpful to others too. Our hope is that this expanded support leads to quicker identification of bugs and security issues in Rust code. Happy bug hunting.
