Should Curve25519 keys be validated?

While analyzing Signal with Markus, I noticed that Signal’s Curve25519-based ECDH doesn’t validate public keys, and in particular will accept the all-zero point as a public key, leading to a shared secret equal to 0 regardless of the value of the private scalar. In contrast, libsodium will return an error if the shared secret happens to be 0, and Wire now performs this check as well, after we pointed out in our report that it was missing.
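
As a quick illustration of the libsodium behavior (a minimal sketch of mine, assuming libsodium is installed; this is not code from any of the projects discussed), crypto_scalarmult() refuses an all-zero public key because the resulting shared secret would be all zeros:

    #include <sodium.h>
    #include <stdio.h>

    int main(void)
    {
        if (sodium_init() < 0)
            return 1;

        unsigned char zero_pk[crypto_scalarmult_BYTES] = {0};   /* all-zero "public key" */
        unsigned char sk[crypto_scalarmult_SCALARBYTES];
        unsigned char shared[crypto_scalarmult_BYTES];

        randombytes_buf(sk, sizeof sk);                          /* random private scalar */

        /* With libsodium this call returns -1, because the shared secret
           would be all zeros; an implementation without the check would
           happily return a zero secret here. */
        if (crypto_scalarmult(shared, sk, zero_pk) != 0)
            puts("public key rejected: shared secret would be zero");
        else
            puts("zero shared secret accepted");

        return 0;
    }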

We reported this behavior to Moxie, who argues that public keys should not be validated, essentially because if a peer is malicious, then they could do much worse than sending invalid keys. Trevor Perrin raises similar objections in the context of Noise, arguing that a zero check “adds complexity (const-time code, error-handling, and implementation variance), and is not needed in good protocols.” DJB, the designer of Curve25519, also claims that Curve25519 keys don’t require validation in ECDH, but may require it for “some unusual non-Diffie-Hellman elliptic-curve protocols that need to ensure ‘contributory’ behavior.”

If Moxie, Trevor, and DJB argue that public keys shouldn’t be validated, then the debate is over.

Really?

Thai Duong has a different take, Matt Green is skeptical, and I am too.

So why would it make sense to validate ECDH public keys?

  • The first thing you learn in any infosec class is to reject invalid inputs, and check return values for errors, even if there’s no obvious exploit in sight. Doing this is sometimes called “defense in depth” or “best practice”.
  • The point of Diffie-Hellman is that both key shares should contribute equally to the shared secret, so that the protocol doesn’t allow key control, a desirable attribute of any authenticated key agreement protocol, as discussed in this MQV paper. If the protocol allows a peer to force the shared secret to be zero, or more generally to lie in a small subgroup, then said peer can surreptitiously weaken the protocol’s security (objection: “but why would a peer be malicious?”).
  • It’s costless: adding a zero check is ten lines of code tops (see the sketch after this list), and is unlikely to introduce new vulnerabilities or to hurt performance.
  • It reduces the risk of non-obvious attacks. Take Signal’s protocol, for example. If Alice generates all-zero prekeys and an all-zero identity key, and pushes them to Signal’s servers, then all the peers who initiate a new session with Alice will encrypt their first message with the same key, derived from all-zero shared secrets; essentially, the first message will be in the clear for an eavesdropper. Alice can deny being malicious, arguing that her PRNG failed. That’s just an example scenario (granted, a far-fetched one), but there might be others, and checking for invalid keys is probably easier than proving that they will never be exploited.
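
To make the “ten lines of code” claim concrete, here is roughly what such a check can look like: a constant-time test that the 32-byte X25519 output isn’t all zeros (my own sketch, not taken from any particular library; the function name is hypothetical):

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical helper: returns 1 if the 32-byte shared secret is nonzero,
       0 if it is all zeros, without branching on secret data. */
    static int shared_secret_is_nonzero(const uint8_t shared[32])
    {
        uint8_t acc = 0;
        for (size_t i = 0; i < 32; i++)
            acc |= shared[i];              /* OR all bytes together */
        /* acc is 0 iff every byte was zero; map 0 -> 0, 1..255 -> 1. */
        return (acc | (uint8_t)-acc) >> 7;
    }

A caller would simply abort the key agreement whenever this returns 0, which is essentially what libsodium does when crypto_scalarmult() produces an all-zero output.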

The bottom line is that omitting key validation may be fine in many cases, but with today’s complex protocols and scenarios it’s just playing with fire.

Thanks to Aaron Zauner, David Wong, and Matthew Green for comments on a preliminary version of this post.

UPDATE1: Thai Duong points out this real-world protocol that is broken if Curve25519 keys aren’t validated.

UPDATE2: Renamed the post from “Should ECDH keys …” to “Should Curve25519 keys…”.

UPDATE3: Trevor Perrin wrote a detailed response arguing that validity checks (for zero and other invalid keys) are superfluous in good DH protocols, and even risky. He also cites a 2014 post by George Danezis, which I encourage you to read too.

2 comments

  1. Now this discussion has become slightly more interesting.

    “May the Fourth Be With You: A Microarchitectural Side Channel Attack on Real-World Applications of Curve25519” (806.pdf)
    by Daniel Genkin, Luke Valenta, and Yuval Yarom

    Here’s a real-world timing attack against GnuPG’s implementation of Curve25519. Since libgcrypt doesn’t use constant-time field arithmetic (!), it’s possible to inject malicious inputs containing invalid curve points, observe the timings, and thereby recover the private key “in as few as 11 attempts”. The malicious input fails the output check and triggers an error, but by then the timings have already leaked.

    GnuPG fixed this by adding input validation:
    https://git.gnupg.org/cgi-bin/gitweb.cgi?p=libgcrypt.git;a=commit;h=bf76acbf0da6b0f245e491bec12c0f0a1b5be7c9

    I was shocked to find that libgcrypt does not use constant-time arithmetic, which would be the proper way of doing it. But at least we now know that input validation can protect you from timing attacks if you don’t implement constant-time field arithmetic correctly (you should).
