The KyberSlash vulnerability and the crystals-go library: A retrospective story

Introduction

In this blog post we are going to talk about a security incident which involved an open-source library developed by a student working on their Master’s thesis at Kudelski Security. The library in question is crystals-go, a Golang implementation of the NIST-selected quantum-resistant cryptographic algorithm Kyber and Dilithium (now finally approved and soon-to-be-standards, as ML-KEM and ML-DSA, respectively). This library was found to be vulnerable to a recently discovered timing attack called KyberSlash, which affects certain implementations of Kyber. It is important to stress that the library crystals-go is unmaintained: it was released by us as an open-source research project in 2021, and was not meant for production; Kudelski Security does not use, and has never used, crystals-go in any commercial product or offering. Despite this, we decided to apply an excess of caution, also because previously we forgot to properly mark the library as unmaintained, so we are not sure whether someone else is using it in the wild. Hence, we patched crystals-go against KyberSlash, and we are going to see in this post how we did it.

KyberSlash1 and KyberSlash2

Kyber (now called ML-KEM) is a key encapsulation mechanism (fundamentally, a public-key encryption scheme) based on modular lattices, which was submitted as part of the NIST PQC standardization competition. There exists a reference implementation in C, plus many other libraries in different programming languages. The vulnerability in question (albeit not a problem of the theoretical Kyber scheme itself) affects many of these implementations, and was discovered by researchers Goutam Tamvada, Karthikeyan Bhargavan, and Franziskus Kiefer, and later independently by Prof. Dan J. Bernstein at the end of 2023. It is a classical example of secret leakage through timing side-channel. Initially the issue was found in the poly_tomsg decoding procedure of Kyber, and was initially called “KyberSlash”; but later, researchers Prasanna Ravi and Matthias Kannwischer found the same issue also in the procedures poly_compress and polyvec_compress. After that, after proposal of Dan Bernstein, the adopted nomenclature became “KyberSlash1” for the specific vulnerability in poly_tomsg and “KyberSlash2” for the problem in poly_compress and polyvec_compress (and “KyberSlash” for the general issue).

The issue with KyberSlash is a variable-time division: there is a step in the code where a division is executed, whose result might take more or less time (CPU cycles) depending on the (secret) input. By crafting malicious ciphertexts and tricking a target user to decrypt them, it is possible to extract the full secret key by measuring accurately this time difference. Such kind of attacks can be devastating, and have appeared many times in the past on different security advisories, even following real-world exploitation in most cases. Therefore one must take KyberSlash very seriously.

More in detail, the issue lies at steps in the above mentioned routines, where some secret value is divided by a constant. For example:

      t  = (((t << 1) + KYBER_Q/2)/KYBER_Q) & 1;

In the code line above (which is taken from Kyber’s reference C implementation pre-fix), t is a temporary variable which stores some polynomials’ secret coefficients and executes certain algebraic operations on them, while KYBER_Q is a scheme-dependent constant which assumes the value 3329.

The problem lies in the division by KYBER_Q. This /KYBER_Q operation is the one which appears every time the KyberSlash vulnerability is found. Generally speaking, modern CPU architectures implement integer division through a single div assembly instruction. However, the div instruction is very heavy computationally (in term of CPU cycles); it is, in fact, usually the slowest operation possible on integers, taking many cycles to complete. Even worse, this number of cycles almost always depends on the size of the input, which can lead to timing attacks. So, “naive” integer divisions in code are a pain both for performance and security.

In order to avoid the computational overhead and optimize speed, modern compilers are often “smart enough” to understand when the divisor is a constant, and therefore use by default certain tricks to speed up things. More precisely, they replace a single div instruction with a combination of mul (multiplication) and shr (bitwise shift right), in such a way that the resulting action on the dividend is the same (more on this later). A single div instruction on most modern architectures is so costly (in terms of CPU cycles) that the equivalent combination of mul, shr, plus a couple of necessary mov (move) instructions is almost always faster. As an added bonus, on most architectures, these simpler instructions have a constant cycle cost, therefore the resulting division algorithm is actually constant-time. This trick, therefore, is a win-win both for performance and security: modern compilers use it all the time by default. And this is why the KyberSlash vulnerability was never spotted before.

But here is the caveat: There are actually cases where what is described above does not happen or does not apply. For example:

  • (rare) architectures which do not implement shr, or compilers who are not able to manage well this operation;
  • (not so rare) architectures where mul is not constant-time;
  • (more common) compilers trying to optimize for code size instead of speed.

Notice, in fact, that when looking at opcode size, a single div instruction will always take less space than a combination of multiple, simpler instructions, so compiler flags that optimize for code size will disable the speed optimization described above.

This intuition was tested and verified with proof-of-concept exploits, that confirmed a possible, devastating key-recovery attack on many implementations. Therefore, KyberSlash was addressed as a “high” severity vulnerability, and reference code and other affected libraries quickly patched by enforcing *explicitly* the constant-time division trick.

Crystals-go

In 2021, Kudelski Security released crystals-go, an open-source library written in GoLang which implemented both Kyber and Dilithium with a particular eye to side-channel attacks. The library was created as a M.Sc. student project by Mathilde Raynal, under supervision of Yolan Romailler, it was really a great job and was even integrated in some PoC and presented at conferences such as GopherCon. But this library was devised as a research and testing tool, and not meant for production. After the maintainers of the library left our team, the original intent of the project was largely reduced in scope, until it became pretty much abandoned. Unfortunately, we forgot to mark it as such, and remnants of the project’s initial enthusiasm remained misleadingly prominent. For example, until January 2024, the project’s README still reported:

Our library stands out because of its security properties. Among the vulnerabilities reported on the original implementation, we integrate countermeasures for most of them, providing a library that is both theoretically and practically secure. We predict that new attacks will be published as the candidates are refined, and expect changes in the code to occur as the security of our library is treated as a continuous process.

This led to a communication incident a few weeks ago, after crystals-go was found to be one of the many libraries vulnerable to KyberSlash.

The Incident

On January 8th 2024 we got notified that crystals-go appeared on the list maintained by Dan Bernstein of the libraries affected by KyberSlash. This was picked up by some tech news outlets [1] [2], and someone even claimed that “Kudelski Security is impacted by KyberSlash”. We got an early morning call from our PR/marketing department…

We immediately realized that no one was “impacted” the way it was implied, and that the whole thing was just a miscommunication. But, regardless, we should have archived the project and documented its status, so now we had to take some action.

As a first thing, we updated the crystals-go README.md with a big disclaimer properly stating that the project was unmaintained and unrecommended outside of testing or research purposes, and that Kudelski Security had never put the library in production. We also realized that Goutam Tamvada, one of the KyberSlash original discoverers, had reported the issue to us via GitHub earlier, but (guess what?) nobody read it – because the project was not maintained.

We could have archived the repository and stopped here, but we felt it was our responsibility to patch the issue in the current code, because we did not know if at that point anyone was using this library in production (we only knew that we weren’t), or anyway what the broader implications could be.

Patching KyberSlash

We started with patching KyberSlash1. This was easy, it was basically a copy-and-paste from the patch of the C reference implementation.

For KyberSlash2 the situation was not straightforward. The reason is that crystals-go was built over an old version of the reference code, and a few things are a bit different. Most importantly, crystals-go supports certain early parameters of the Kyber scheme that have later been removed from the standard, in particular the value d (number of bits per coefficient when compressing polynomial arrays), which in crystals-go can assume values 3, 4, 5, 6, 10, or 11 (while the set supported by the current reference implementation is smaller).

So, first of all, we had to decide how to proceed. One option could have been to actually remove the unused parameters, and then copy-and-paste the patch for KyberSlash2 from the reference implementation. It would have been more elegant, but we decided against this option, for two reasons:

  1. It would have taken us too much time for an abandoned project. The structure of crystals-go is different from the current C reference, certain functions are in different files, other functions are merged into one… We would have had to restructure the code quite heavily, and hence we would have had first to get very familiar with it. Overkill.
  2. Most importantly, we cannot say for sure whether anyone in the world is using the crystals-go library for research purpose or whatever exactly because it still supports those legacy parameters.

So we decided to patch against KyberSlash2 for all parameters, even those not currently supported by the reference implementation. But this required understanding well the math and engineering behind the fix, not merely copy-pasting.

Basically, in order to patch the issue, we need to enforce in code what compilers should do under “normal” conditions: namely, replacing the /KYBER_Q division with an equivalent, constant-time sequence of operations. Let’s start by looking at KyberSlash1. Recalling that KYBER_Q is a constant (3329), let’s see again here the original, vulnerable snippet of code seen before from the reference C implementation:

            t  = (((t << 1) + KYBER_Q/2)/KYBER_Q) & 1;

This was officially patched in the following way:

      t <<= 1;
      t += 1665;
      t *= 80635;
      t >>= 28;
      t &= 1;

What is going on here? Where do those other constants come from? 1665 seems to be a rounding up (ceil) of KYBER_Q/2, but how about the others?

Similar patches also appear on the code vulnerable to KyberSlash2 in the reference implementation. For example, in polyvec_compress the vulnerable line:

        t[k] = ((((uint32_t)t[k] << 11) + KYBER_Q/2)/KYBER_Q) & 0x7ff;

is replaced with:

        d0 = t[k];
        d0 <<= 11;
        d0 += 1664;
        d0 *= 645084;
        d0 >>= 31;
        t[k] = d0 & 0x7ff;

But now the constants are different, and even the rounding of KYBER_Q/2 is floor (round down) instead of ceil. How do we interpret this?

Well, here is how it works: one can always see a division x/y as x * (1/y). When working with integers we do not have, of course, the luxury of representing 1/y, but what we can do is approximate the result of 1/y as z / 2^s for certain values z and s, because a division by 2^s is simply a shr (shift right) by s positions, also written >> s. In other words, one can approximate x/y by computing (x*z) >> s where z is an integer as close as possible to (2^s)/y. This can be precomputed and hardcoded when the divisor is a constant, which is exactly what is happening here: in our case, all those constants like 80635 and 645084 are of the form round(2^s/KYBER_Q) for some s (and s is exactly the value of the subsequent shift right, i.e. 28 and 31 in the two examples above).

But how to choose a good s? And why are they different case by case? Well, remember that you need a good approximation if you want the (integer) result to be correct, so ideally you want to use an s which gives you the best possible approximation, and it is easy to see that, in general, the larger s, the better the approximation. But using a too large s will result in the variable overflowing after the multiplication, and this means that you will lose some bits of the result, starting from the most significant ones. In our case, during polynomial coefficient compression, we are only interested in the d least significant bits of the result of the division (notice in fact that the last instruction, the &, is a logical AND Boolean operator with a full bitmask of the desired length, which returns the last d significant bits). So you want to use the largest s that allows you to keep d bits after the >> s operation. And this depends on the variable size: if the variable fits in a 32-bit register, then s = 32 - d, which is the case of poly_tomsg in KyberSlash1 and the cases of d=3,4,5,6 of polyvec_compress in KyberSlash2. In the cases of d=10,11 we are using 64-bit variables, so in theory we can afford a better approximation, but it would be overkill, and keeping operands to 32-bit values speeds up computation, so we stick to a maximum of s=32.

It remains to understand why KYBER_Q/2 is at times approximated as 1664 and at other times as 1665. This is done to “compensate” the other rounding coming from the division: if we use a constant which is a round-down (as in our first example, 80635) then we compensate this by rounding up KYBER_Q/2 as 1665, otherwise (as in our second example, 645084) we round down to 1664.

Understanding this allowed us to patch crystals-go against KyberSlash even for these legacy parameters.

Status and Conclusions

After patching crystals-go, we issued a security advisory and archived the crystals-go repository, alerting everyone that the library should not be used in production. After that, we notified the tech news outlets mentioned above, and they very professionally updated their articles. We also reached out to the NIST PQC community, and Dan Bernstein promptly updated the vulnerable library tracking page. We confirm again that crystals-go has never been used as part of any commercial services or products at Kudelski Security. The project is now unmaintained from our side because we want to focus on other projects, but feel free to fork it, this is the power of open source!

Finally, we are putting in place internal processes to reduce the possibility of similar incidents in the future.

Leave a Reply