Dissecting and Detecting Babuk ransomware Cryptography

Written by Sylvain Pelissier and Antonio De La Piedra of the Kudelski Security Research Team

The Babuk or Babyk ransomware was detected two years ago. It’s an interesting case because after infecting the Metropolitan Police Department of Washington DC, one member of the operators decided to publish the source code of the ransomware on a forum in July 2021. However, the source code also helped other groups to create clones of the ransomware, like Rook or PrideLocker. Babuk was later copied to a GitHub repository. Studying the source code is interesting from a research point of view, and in this article, we explored the choice of Cryptography algorithms used for building this ransomware.

Even though the Babuk ransomware was released two years ago, it was still used in the wild recently on some VMware ESXi systems to encrypt VMs after having exploited OpenSLP vulnerabilities that we recently described. Babuk also spread via email phishing, unprotected RDP deployments, and unpatched vulnerabilities, particularly exploiting 3 bugs in Microsoft Exchange identified as CVE-2021-34473, CVE-2021-34523 and CVE-2021-31207.

Operation overview

The Babuk ransomware has 3 different versions, one for ESXi, one for NAS devices, and one for Windows. The Windows and ESXi version is written in pure C++ language and does not import many external libraries. It makes it very portable and is used on many different operating systems. The NAS version is written in Go language and allows binaries for ARM devices in addition to x86 devices.

The encryption algorithm consists of two steps:

  1. The first step use asymmetric Cryptography. It means the program embeds an attacker’s public key in the encryption program but not the corresponding private key, which is left on the attacker’s machine. The usage of asymmetric key algorithms prevents the victim from recovering the attacker’s private key since it is never present on the victim’s device. First, it derives a 32-byte shared secret between the attacker public key and the file ephemeral keys using Diffie-Hellman key agreement algorithm.
  2. The second step uses symmetric Cryptography. The shared secret is hashed to obtain a 32-byte key to be used with a stream cipher for high-speed file encryption. After file encryption, each file’s public key is written at the end of the file.

We noticed that for the ESXi version, after each operation, the sensitive information like key or algorithm states are cleared with the memset function. This prevents the typical memory forensics operations like the one we have presented in the past against GPG.

Asymmetric Cryptography

During the first step of the encryption algorithm, an ephemeral key is generated. For the ESXi version it is done by calling a function with a misleading name called csprng for each encrypted files:

void csprng(uint8_t* buffer, int count) {
    if (FILE *fp = fopen("/dev/urandom", "r"))
    {
        fread(buffer, 1, count, fp);
        fclose(fp);
    }
}

This function consists of reading the /dev/urandom device. This can be seen as the file private key used during the Elliptic-Curve Diffie-Hellman (ECDH) algorithm. This value is stored in the variable u_priv. It is later used to generate the public key of the file stored in the variable u_publ. The attacker public key is also stored in the code in the global variable m_publ.

ECDH is typically used as a key agreement mechanism between two parties. That is, it allows generating a shared key between two parties. Then, this shared key can be used to encrypt (generally using a symmetric algorithm) messages between the two involved parties. Originally in DH, when a party a (typically Alice) and another one b, (generally Bob) want to communicate over an insecure channel do the following: they agree on using a multiplicative group of integers of order p (see for instance RFC3526), being p prime with generator or base point g. Each user has a key pair, being the secret key a \in_R Z_p and the public key computed as A = g^a \mod p for Alice and b \in_R Z_p for the secret key of Bob and B = g^b \mod p the public part of Bob. Over the channel, each party exchanges g^a and g^b and compute the shared key g^{a \cdot b} via (g^a)^b and (g^b)^a respectively, using A and B and their secret key. Similarly, DH can be performed over an elliptic curve (that is, over a multiplicative group of points on an EC) with generator G using point multiplication instead of modular exponentiation for A = a \cdot G and B = b \cdot G with shared key AB = a \cdot b \cdot G = b \cdot a \cdot G. In this particular case of Babuk, the ECDH algorithm is implemented on the Curve25519

The ransomware reuses the code from Adam Langley for the curve implementation of Windows and ESXi versions, and it uses the implementation of the Go Cryptography package for the NAS version. This choice is smart since this curve is one of the fastest and was designed to be less prone to implementation error.

The ECDH operation over the curve25519 is performed using the curve25519-donna implementation using the following calls:

  1. First we need to generate a private key of 32 bytes.
  2. Given the generator of the curve and our private key, we can generate our public key as:
static const uint8_t basepoint[32] = {9};
curve25519_donna(mypublic, mysecret, basepoint);

Being mypublic the resulting public key for the secret key of 32 bytes mysecret.

  1. When generating a shared key, for instance for a parameter B of Bob, we invoke the curve25519_donna function as:
uint8_t shared_key[32];
curve25519_donna(shared_key, mysecret, theirpublic);

Being for instance theirpublic the parameter B of Bob and mysecret the secret key if we were, for instance, Alice.

Why Babuk uses Asymmetric Cryptography

In the ESXi version of the ransomware is implemented as:

csprng(u_priv, 32);
u_priv[0] &= 248;
u_priv[31] &= 127;
u_priv[31] |= 64;
curve25519_donna(u_publ, u_priv, basepoint);
curve25519_donna(u_secr, u_priv, m_publ);
memset(u_priv, 0, 32);

The first call to curve25519_donna function computes the public key of the file by doing a scalar multiplication of the file private key generated previously and the base point (or generator) of the curve, which is defined to be the standard generator on the Curve25519. The second call generates the shared secret by doing a scalar multiplication of the file private key to the attacker’s public key. The generated shared secret is typically processed using a hash function such as SHA-256.

It can be difficult to see why ECDH is used in this context. We can see the malware operator as the party Alice, which generates a shared key with the part of the code that encrypts each file. At the end of the encryption operation, the shared key of the party that performs the encryption of the files is erased and only kept by the malware operator.

The file public key is written at the end of the file to be able to decrypt it if the victim pays the ransom. Indeed the decryptor program contains the attacker’s private key. To decrypt an encrypted file, it reads the file’s public key and derives again the shared secret allowing the file decryption.

We stress that the victim which pays the ransom would receive a decryptor program and thus would be able to extract the private key of the attacker. This means that if the attacker reuses the same key for another victim, the public key in the encryptor program would match the private key already observed in a decryptor. For example, the Babuk Windows version contains a file s.txt consisting of a pair of private and public keys. We can verify with simple Python command that, indeed, the private key generates the corresponding public key.

>>> import donna25519
>>> mypublic = [0x40, 0xC3, 0x48, 0x54, 0x71, 0x2C, 0xE8, 0x9F, 0x4D, 0xCF, 0x05, 0x5B, 0x99, 0xFE, 0xC2, 0xD3, 0x49, 0x2D, 0x6F, 0x62, 0x30, 0xCE, 0xD2, 0x67, 0x44, 0xFF, 0x76, 0x4C, 0xAE, 0x62, 0xF5, 0x74]
>>> mysecret = [0x50, 0xAF, 0x44, 0xDF, 0x99, 0x55, 0xD9, 0xC6, 0x7B, 0xF9, 0xCC, 0xFE, 0x41, 0xE5, 0xF3, 0xD5, 0xEB, 0x23, 0xE8, 0xB1, 0x00, 0x84, 0x87, 0x97, 0x54, 0xB4, 0x96, 0xF5, 0x7F, 0xFD, 0x3B, 0x60]
>>> sk = donna25519.PrivateKey(secret=bytes(mysecret))
>>> pk = sk.get_public()
>>> pk.public.hex()
'40c34854712ce89f4dcf055b99fec2d3492d6f6230ced26744ff764cae62f574'

To allow further malware research, we have released a Yara rule to identify the Curve25519 algorithm only based on the constants used. This is not obvious since Curve25519 does not use a lot of constants in its implementation, and we based our detection on the base point 9 followed by 31 zero and the constant a24 = 121665. It is a different approach from the previous Yara rule made for PrideLocker which was based on the x86 instructions.

Symmetric Cryptography

For symmetric encryption, Babuk initially uses the Chacha8 stream cipher but in the leaked source code the Sosemanuk is used in ESXi version, HC-128 is used in Windows version and Chacha20 is used in NAS version. The three of them are stream ciphers which were selected by the eSTREAM project as an efficient and secure stream ciphers. This is an interesting choice since Sosemanuk and HC-128 ciphers are not very widespread. They are pure software implementations and they are fast. Sosemanuk encrypts at 388MiB/s on my old laptop on a single core. Thus it is fast for encrypting data. It does not rely on special instructions like Intel AES-NI. Using AES in CTR mode with such instruction gives 5427MiB/s speed with OpenSSL benchmark and Salsa20 has a speed of 138MiB/s on the same benchmark. The choice of Sosemanuk or HC-128 algorithms may be because they are not well-known and lack Yara rules for detection whereas the is already existing rule for AES, Chacha and other ciphers. We have also implemented Yara rules for Sosemanuk stream cipher detection and improved the detection rule of Chacha20. Here is an example of usage of such rules combined with existing rules on a Babuk sample:

$ yara *.yar main
SHA2_BLAKE2_IVs main
Curve25519 main
Sosemanuk_constants main
Sosemanuk_encrypt_tables main

Notice that all the algorithms used in the malware are properly detected. We hope this helps researchers more quickly identify the Cryptography algorithms used in ransomware.

Conclusion

In this post, we shed some light on the cryptography used in the Babuk ransomware and explained our hypothesis on the choice of algorithms. We also released some Yara rules for detection. We hope this helps in understanding the inner working of the encryption mechanisms, and assist with further analysis of malware in the future.

Leave a Reply