Verilaptor: Software Fault Simulation in hardware designs

HACK@CHES 2021 competition

The HACK@CHES 2021 Phase I competition ran from June 17 to August 16, 2021. Participants received a bundle containing the Verilog design files of a System on Chip (SoC). The goal was to discover vulnerabilities and report them to the judges of the CTF. Each reported vulnerability earned points according to a scoring system, with additional points if an exploit was provided or if the weakness was located in the ROM. The best teams were selected for Phase II of the contest, which took place during the CHES conference.

The SoC is based on OpenPiton, an open-source processor which uses CVA6 64-bit RISC-V cores. The design files were available in the bundle and could be simulated in software with the Verilator simulator or run on an FPGA board. The SoC implements many peripherals, including three AES cores (AES0, AES1 and AES2), a TRNG core and an RSA core.

We were granted 60 points for a vulnerability discovered in the AES0 module, which is the maximum number of points possible for a single bug not located in ROM. The following details our findings and, more generally, how to simulate fault attacks on hardware designs to reveal hardware weaknesses.

Verilator

Simulation is an important step of hardware design. Since hardware is much harder to patch than software, hardware designs are heavily tested before tape-out. Verilator is an open-source simulator which translates a Verilog design into a C++ program that keeps track of the execution cycles of the original design. This makes it possible to simulate the design in software and to observe the timing of each part of the execution as well as the outputs of a module.

It was possible to simulate the full SoC in software, but the simulation was very slow. Since we were only interested in AES0, we simulated that design alone. The design files were located in the folder /piton/design/chip/tile/ariane/src/aes0. AES0 is a peripheral implementing AES-192. The top-level module is called aes_192_sed; its inputs are a 16-byte plaintext, a 24-byte key and a start signal. When the out_valid signal is high, the output contains the AES encryption result.

We have set up a Git repository with all the files needed for the simulation. We created a simulation program, simulation.cpp, which feeds the inputs to the module and clocks it until an output is available. Using Verilator is then similar to compiling with GCC:

$ verilator -cc aes_192_sed.v -f input.vc --Mdir build -o simu --exe simulation.cpp
$ make -C build/ -f Vaes_192_sed.mk simu

The input.vc file contains all the modules needed for the simulation. We can then verify that the AES0 simulation works properly:

$ ./build/simu 
[+] Simulation with Verilator
Using key:
8e73b0f7da0e6452c810f32b809079e562f8ead2522c6b7b
Using plaintext:
6bc1bee22e409f96e93d7e117393172a
Resulting ciphertext:
bd334f1d6e45f25ff712a214571fa5cc

So far, we are able to simulate the AES0 module properly: the key, plaintext and ciphertext above match the ECB-AES192 example vector of NIST SP 800-38A.
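A quick way to gain confidence in the simulated output is to compare it with an independent software model. The block below is a minimal pure-Python sketch of AES-192 single-block encryption. It is not taken from the repository, and the function names (gf_mul, build_sbox, expand_key_192, encrypt_block_192) are illustrative. It is meant as a reference model only, with no attention paid to speed or side channels.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def build_sbox():
    """Derive the AES S-box: field inversion followed by the affine map."""
    sbox = []
    for x in range(256):
        inv = 0
        if x:
            inv = 1
            for _ in range(254):  # x^254 is the inverse of x in GF(2^8)
                inv = gf_mul(inv, x)
        y = inv
        for s in range(1, 5):  # XOR with the four left rotations of inv
            y ^= ((inv << s) | (inv >> (8 - s))) & 0xFF
        sbox.append(y ^ 0x63)
    return sbox


SBOX = build_sbox()


def expand_key_192(key):
    """Return the 13 round keys (16 bytes each) of a 24-byte AES-192 key."""
    words = [list(key[4 * i:4 * i + 4]) for i in range(6)]
    rcon = 1
    for i in range(6, 52):
        t = list(words[i - 1])
        if i % 6 == 0:
            t = [SBOX[b] for b in t[1:] + t[:1]]  # RotWord, then SubWord
            t[0] ^= rcon
            rcon = gf_mul(rcon, 2)
        words.append([a ^ b for a, b in zip(words[i - 6], t)])
    flat = [b for w in words for b in w]
    return [flat[16 * r:16 * r + 16] for r in range(13)]


def encrypt_block_192(key, block):
    """Encrypt one 16-byte block; the state is stored column by column."""
    rk = expand_key_192(key)
    s = [b ^ k for b, k in zip(block, rk[0])]
    for rnd in range(1, 13):
        s = [SBOX[b] for b in s]  # SubBytes
        # ShiftRows: row r is rotated left by r columns
        s = [s[4 * ((c + r) % 4) + r] for c in range(4) for r in range(4)]
        if rnd != 12:  # the last round has no MixColumns
            mixed = []
            for c in range(4):
                col = s[4 * c:4 * c + 4]
                for r in range(4):
                    mixed.append(gf_mul(col[r], 2) ^ gf_mul(col[(r + 1) % 4], 3)
                                 ^ col[(r + 2) % 4] ^ col[(r + 3) % 4])
            s = mixed
        s = [b ^ k for b, k in zip(s, rk[rnd])]  # AddRoundKey
    return bytes(s)


key = bytes.fromhex("8e73b0f7da0e6452c810f32b809079e562f8ead2522c6b7b")
plaintext = bytes.fromhex("6bc1bee22e409f96e93d7e117393172a")
print(encrypt_block_192(key, plaintext).hex())
```

If the model and the simulated hardware agree, the printed value should be the ciphertext shown in the simulation transcript above.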

Verilaptor

The idea of fault simulation is not new: we already developed a tool called Glitchoz0r 3000, presented at R2Con 2020. This tool emulates a firmware using radare2 ESIL and tries to inject a fault in registers or instructions at each step, to see whether a security mechanism can be bypassed or corrupted. Another tool, FiSim, is developed by Riscure and performs similar tests using the Unicorn framework and the Capstone disassembler. Finally, a tool called VerFI simulates faults in the netlist of a design, but the design first has to be synthesized, which was not what we wanted to do. We were interested in a pure software solution simulating faults directly on Verilog hardware designs.

Since Verilator was already available for the SoC, we based our solution on it and we created an executable we named Verilaptor.

One standard hardware attack on AES is differential fault analysis (DFA). The idea is to introduce faults in the last rounds of AES and collect the faulty ciphertexts to recover the secret key. For the attack to succeed, we must inject faults at the output of the AES Sbox. In the design, this is implemented in the module table_lookup in the file table.v. However, to optimize simulation speed, Verilator does not let us access internal signals of the design unless we explicitly ask for it. To do so, we added the comment /*verilator public*/ after the signal definitions, so that our simulation program can access those signals:

module table_lookup (clk, state, p0, p1, p2, p3);
input clk;
input [31:0] state;
output [31:0] p0, p1, p2, p3/*verilator public*/;
wire [7:0] b0, b1, b2, b3;

In Verilaptor, we create a function which simulates a random fault after the Sbox operation during round 10:

void tick_fault_r10(std::unique_ptr<Vaes_192_sed>& top, int sbox_num, int value) {
    top->clk = 0;
    top->eval();
    // Inject fault at output of sbox
    switch (sbox_num) {
    case 0: {  // case label assumed here: the excerpt did not show it
        auto sbox = top->aes_192_sed->uut->r10->t0;
        sbox->p3 = sbox->p3 ^ value;
...

A random value is XORed into the Sbox outputs. Running Verilaptor then produces faulty ciphertexts:

$ ./simu/build/veriraptor -v
[+] Fault simulation with Verilator
Using key:
d3b80fd1a0b09cefc4d343c0a7dac0b1942ca63151a89b91
Using plaintext:
f8a53552683866603d9a7dfe5982bc6f
Getting ciphertext:
5a802cf68638a0ee341e3ee25201ae1a

5a802cf68638a0ee341e3ee25201ae1a
5a8064f6860fa0ee7d1e3ee25201ae43
5a80c0f68649a0ee5d1e3ee25201ae5a
5a802cd986384eee34193ee21a01ae1a
5a802c42863896ee349d3ee2a301ae1a
a2802cf68638a0d9341ea1e25269ae1a
87802cf68638a069341e83e25263ae1a
...

The faulty outputs were then collected in files in order to recover the AES round keys by differential fault analysis. Currently, the signals to fault are hardcoded in our simulation. An interesting extension of Verilaptor would be to iterate automatically over a list of internal signals and fault each of them. This would make it possible to simulate fault attacks against other designs, such as RSA.
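Before running a DFA tool, it can help to sanity-check the shape of the collected faults. In the classic last-round DFA model, a fault confined to one state column before the final SubBytes and ShiftRows shows up as exactly four differing ciphertext bytes, at positions fixed by ShiftRows. The following Python sketch (the helper name faulted_column is hypothetical, not part of Verilaptor) classifies each faulty ciphertext by the column it points to, using the values listed above.

```python
def faulted_column(reference_hex, faulty_hex):
    """Return the state column (0-3) hit before the final ShiftRows, or None.

    None is returned when the difference does not match the four-byte
    pattern of any single column, for example when the fault had no effect.
    """
    ref = bytes.fromhex(reference_hex)
    bad = bytes.fromhex(faulty_hex)
    diff = {i for i in range(16) if ref[i] != bad[i]}
    for col in range(4):
        # The state is stored column by column (index = 4 * column + row).
        # ShiftRows moves the byte in row `row` of column `col`
        # to column (col - row) mod 4.
        expected = {4 * ((col - row) % 4) + row for row in range(4)}
        if diff == expected:
            return col
    return None


reference = "5a802cf68638a0ee341e3ee25201ae1a"
outputs = [
    "5a802cf68638a0ee341e3ee25201ae1a",
    "5a8064f6860fa0ee7d1e3ee25201ae43",
    "5a80c0f68649a0ee5d1e3ee25201ae5a",
    "5a802cd986384eee34193ee21a01ae1a",
    "5a802c42863896ee349d3ee2a301ae1a",
    "a2802cf68638a0d9341ea1e25269ae1a",
    "87802cf68638a069341e83e25263ae1a",
]
for ct in outputs:
    print(ct, faulted_column(reference, ct))
```

On these values, the first line is identical to the reference and is therefore rejected, while the six others each differ in exactly four bytes and map to columns 2, 2, 3, 3, 0 and 0. Outputs that do not fit the pattern are typically of no use for this attack and can be filtered out early.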

Differential fault analysis

The standard DFA against AES is well documented; for example, Quarkslab gave a complete description of DFA applied to white-box cryptography for all variants of AES. In our case, we were attacking the 192-bit version of AES. The attack works basically the same as for AES-128, but the key schedule of AES-192 is a bit different: to revert it, we need 24 consecutive bytes of the expanded key, for instance the last round key and half of the previous one. Thus the idea is to first perform the standard DFA against the last round key, the 13th. To do so, the corruption needs to happen during round 11. The faulty ciphertexts are collected in a file and then fed to the phoenixAES tool from the Side-Channel Marvels project, which implements the DFA:

import phoenixAES
print("[+] DFA on 13th round\n")
subkey13 = phoenixAES.crack_file("tracefile_r11", verbose=0)

Once the 13th round key is recovered, the last AES round can be reverted. We can then perform a second DFA against the previous round to recover the 12th round key. There is a subtlety here: we are now attacking a round with a MixColumns step, which was not the case for the last round. Fortunately, phoenixAES v0.0.4 handles this for us when the already recovered round key is passed as an argument (thanks @doegox for the remark and the bugfixes):

from binascii import unhexlify
print("[+] DFA on 12th round\n")
subkey12 = phoenixAES.crack_file("tracefile_r10", lastroundkeys=[unhexlify(subkey13)], verbose=0)

Thus, the full attack recovers the last two round keys:

$ python3 attack_hack_ches21.py 
[+] DFA on 13th round

Last round key #N found:
83FD2BED375F3431BFE5939B188C895B
[+] DFA on 12th round

Round key #N-1 found:
88BAA7AAA7691AC01614D51D28A7F490

Concatenated round keys:
88BAA7AAA7691AC01614D51D28A7F49083FD2BED375F3431

Finally, Stark is a really convenient tool: from round keys, it reverses the key schedule algorithm and recovers the secret key, which for AES-192 is the first 24 bytes of the expanded key (K00 followed by the first half of K01):

$ ./Stark/aes_keyschedule 88BAA7AAA7691AC01614D51D28A7F49083FD2BED375F3431 11
K00: D3B80FD1A0B09CEFC4D343C0A7DAC0B1
K01: 942CA63151A89B9110AC8E00B01C12EF
K02: 74CF512FD315919E473937AF1691AC3E
K03: 933D3C4723212EA857EE7F8784FBEE19
K04: C3C2D9B6D55375887AA0F8445981D6EC
K05: 0E6FA96B8A94477249569EC49C05EB4C
K06: 1949D19A40C807764EA7AE1DC433E96F
K07: 8D6577AB11609CE7D9974518995F426E
K08: D7F8EC7313CB051C9EAE72B78FCEEE50
K09: 72BF166BEBE054053C18B8762FD3BD6A
K10: B17DCFDD3EB3218D5F424BD9B4A21FDC
K11: 88BAA7AAA7691AC01614D51D28A7F490
K12: 83FD2BED375F3431BFE5939B188C895B
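To illustrate why 24 consecutive bytes of expanded key are enough, here is a minimal Python sketch of the AES-192 key schedule inversion. It is not Stark's implementation, and the names (invert_key_schedule_192, first_word) are our own for this example. Each expanded-key word satisfies w[i] = w[i-6] XOR t(w[i-1]), so knowing six consecutive words lets us compute w[i-6] and walk back to w[0].

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def build_sbox():
    """Derive the AES S-box: field inversion followed by the affine map."""
    sbox = []
    for x in range(256):
        inv = 0
        if x:
            inv = 1
            for _ in range(254):  # x^254 is the inverse of x in GF(2^8)
                inv = gf_mul(inv, x)
        y = inv
        for s in range(1, 5):
            y ^= ((inv << s) | (inv >> (8 - s))) & 0xFF
        sbox.append(y ^ 0x63)
    return sbox


SBOX = build_sbox()
NK = 6  # number of 32-bit words in an AES-192 key


def g(word, n):
    """RotWord, SubWord, then XOR the n-th round constant into byte 0."""
    rcon = 1
    for _ in range(n - 1):
        rcon = gf_mul(rcon, 2)
    out = [SBOX[b] for b in word[1:] + word[:1]]
    out[0] ^= rcon
    return out


def invert_key_schedule_192(chunk, first_word):
    """Recover the 24-byte key from 24 consecutive expanded-key bytes.

    `chunk` starts at expanded-key word index `first_word`
    (there are four words per round key).
    """
    if len(chunk) != 24:
        raise ValueError("expected 24 bytes of expanded key")
    w = {first_word + j: list(chunk[4 * j:4 * j + 4]) for j in range(NK)}
    for i in range(first_word + NK - 1, NK - 1, -1):
        t = w[i - 1]
        if i % NK == 0:
            t = g(t, i // NK)
        # Forward rule: w[i] = w[i - NK] ^ t, hence w[i - NK] = w[i] ^ t
        w[i - NK] = [a ^ b for a, b in zip(w[i], t)]
    return bytes(b for j in range(NK) for b in w[j])


# Round key K11 followed by the first half of K12, i.e. words 44 to 49
chunk = bytes.fromhex("88BAA7AAA7691AC01614D51D28A7F49083FD2BED375F3431")
print(invert_key_schedule_192(chunk, 11 * 4).hex())
```

With the concatenated round keys from the attack output, the result should be K00 followed by the first half of K01 in the listing above, which is the key used by the simulation.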

The full attack is implemented in the attack.sh script of our repository. It runs the fault injection and DFA on the faulted results.

We found the approach really interesting, since Verilator makes it easy to test a hardware design purely in software and to mount fault attacks against it. The approach is quite generic, and we think it could be adapted to various hardware designs, for instance to check in simulation whether countermeasures against fault attacks hold up before the design is deployed in the field. The same approach could be used to simulate timing attacks, since the simulation is cycle accurate.
