NORX for CAESAR

NORX is a new cipher designed by Philipp Jovanovic, Samuel Neves, and yours truly (in this order). NORX’ features make it superior to legacy algorithms like the AES standard. For example it offers

  • Authentication: NORX protects not only the confidentiality (secrecy) of a message, but also its integrity. For comparison, AES requires complex constructions such as Galois Counter Mode (GCM) to protect integrity.

  • Parallelism: NORX can be executed in parallel on multiple cores, as found in modern CPU architectures. For example, if 4 cores are available then NORX is about 4 times as fast as with a single core, whereas legacy ciphers are generally designed for serial processing.

  • Simplicity: To understand NORX and implement it, you don’t need to understand math notions such as finite fields or polynomials. The only operations in NORX are bitwise AND, XOR and bit shifting. These operations are readily available as instructions on all CPUs, and straightforward to implement in dedicated hardware.

  • Interoperability: Developers often have to determine a structure to represent inputs and outputs of a cipher, which reduces interoperability across implementations and may even affect security. NORX thus defines a dedicated datagram to represent multiple data items as a single byte string.

  • Side-channel robustness: Whereas AES suffers from timing leaks and requires complex countermeasures, NORX was designed to run in constant time on any reasonable platform, and only uses operations that can be protected from side-channel leaks.

NORX’ site norx.io contains specifications and reference source code. The NORX code is also available on GitHub, and support the NEON and SSE family of instruction set extensions.

Before further details on NORX, I’ll briefly present our initial motivation: the CAESAR competition.

The CAESAR competition

The AES encryption standard was the outcome of a competition organized by NIST in 1997-2000, where algorithms submitted by cryptographers from academia and industry were made public from the start and evaluated in the largest part by the research community. NIST eventually picked that algorithm that would best meet their needs (or so they believed then).

The AES competition was a success for NIST, with high-quality work mostly for free, and an opportunity for the research community to make rapid progress in the fields of cryptographic design and cryptanalysis (as well as the opportunity to get research grants and publish papers).

Other competitions followed on a similar model: eSTREAM, SHA-3, and more recently PHC.

CAESAR is a competition initiated by Daniel J. Bernstein and funded by NIST. Quoting the CAESAR call for submissions:

CAESAR (Competition for Authenticated Encryption: Security, Applicability, and Robustness) will identify a portfolio of authenticated ciphers that (1) offer advantages over AES-GCM and (2) are suitable for widespread adoption.

"Authenticated ciphers" is synonymous to "authenticated encryption algorithm with associated data", a kind of cipher that takes as input

  • a message to be authenticated and encrypted
  • a message to be authenticated but not encrypted (that is, left in clear in the communication)

and that returns

  • an encrypted message, protecting the confidentiality of the first message
  • an authentication tag, protecting the integrity of the two messages

For example, if the data processed is a network protocol’s datagram, then the first message may be the payload and the second message the header and/or trailer.

The deadline for CAESAR submissions was March 15, 2014 and the final selection is expected in Q4 2017. It seems that the winner will not be named the CAESAR Cipher. More information on CAESAR is available on http://competitions.cr.yp.to/.

NORX

NORX is one of the 57 ciphers submitted to CAESAR. Whereas a number of submissions consist of a mode of operation using AES as a core, NORX specifies both

  • a mode of operation, similar to the sponge construction but with a few tweaks

  • a core algorithm, inspired by previous algorithms like BLAKE2 and Salsa20

NORX mode

Like the original sponge construction and its variants, NORX transforms an internal state through consecutive applications of a permutation algorithm, with interleaved injections of message blocks. Blocks of data extracted from the state first serve to encrypt the message (as in a stream cipher), and finally to produce an authentication tag.

The figure below shows NORX’ mode with a parallelism degree of 2. All technical details are in the specs.

NORX core

The NORX core transforms invertibly an array of 16 words. It relies on a function called G that transforms invertibly an array of 4 words. Our reference code defines G as follows for words of either 32 or 64 bits:

#define BITS(x) (sizeof(x) * CHAR_BIT)
#define ROTR(x, c) ( ((x) >> (c)) | ((x) << (BITS(x) - (c))) )

#define U(A, B) ( ( (A) ^ (B) ) ^ ( ( (A) & (B) ) << 1) )

#define G(A, B, C, D) \
do \
{ \
    (A) = U(A, B); (D) ^= (A); (D) = ROTR((D), R0); \
    (C) = U(C, D); (B) ^= (C); (B) = ROTR((B), R1); \
    (A) = U(A, B); (D) ^= (A); (D) = ROTR((D), R2); \
    (C) = U(C, D); (B) ^= (C); (B) = ROTR((B), R3); \
} while (0)

For the 64-bit version of NORX we have

#define R0  8
#define R1 19
#define R2 40
#define R3 63

For the 32-bit version of NORX we have

#define R0  8
#define R1 11
#define R2 16
#define R3 31

As shown below, G is used to transforms columns and diagonals of the 16-word state when viewed as a 4×4 array.

NORX speed

The operations of the NORX core (non-linear function, rotation counts) were chosen to take advantage of modern CPUs vectorized instructions (such as NEON, AVX, AVX2, XOP). These enable the computation of 4 G instances in parallel on a single core.

When run on multiple cores, NORX benefits of a speed-up close to linear with respect to the number of cores.

On a recent Intel CPU with the Haswell microarchitecture, our serial instance NORX64-4-1 runs at approximately 1.39 gibibytes per second (2.51 cycles per byte at 3.5 GHz). The plot below compares the speed of the reference and optimized implementations of NORX64-4-1 (4 rounds, fastest version) and of NORX64-6-1. Although the optimized implementations do outperform the reference code, the gap is not huge and thus portable reference would be acceptable in a majority of applications.

NORX security

NORX builds on the confidence gained by the predecessors of its mode and core. We believe that a reliable proof of security can be established for its mode, to provide a rigorous bound on the resilience of the NORX mode under theoretical attack models. We also believe that the core algorithm, as a variant of previous unbroken algorithms, is unlikely to suffer from exploitable flaws. That said, we are looking forward to third-party cryptanalysis results to establish real confidence in NORX.