ECC collision HOWTO
published on Tue 29 October 2024 by Xilokar
This is a small blog post I would have been happy to read while searching how to find ecc collision when dealing with how to exploit a limited write to flash.
No rocket science, just a little write-up.
Short Context
While working on a target, I found a primitive that allowed me to overwrite a flash page. By overwriting I mean, writing again. If you are not familiar with how some flash works, they must first be erased (aka, every bit is set to 1 (or 0) level, and the written (a bit can be flipped to the other state). Overwriting is doing the same flip again, but you can only flip on bit in the erased state to the written state.
ERASE flash state: 0xffffffff WRITE 0xffffff01 flash state: 0xffffff01 WRITE 0xffffff00 flash state: 0xffffff00 WRITE 0xffffff01 flash state: 0xffffff00
See ? On the last write, the last bit was not set back to 1, but kept to 0.
So, this set up with a limit on what we can overwrite to the flash page (eg, if this is the first page to be executed, a jump to another place), but this is something that can dealt with easily.
Let's imagine that we have the jmp 0x00abcd encoded as 0x8000abcd and that 0x00000000 is a nop
Inital flash state: 0x8000abcd 0xffffffff 0xffffffff ... 0xffffffff [jmp 0xabcd] WRITE 0x00000000 0x80e00000 0xffffffff flash state: 0x00000000 0x80e00000 0xffffffff ... 0xffffffff [nop, jmp 0xe00000]
and now we can jump to our (controlled) code
Here enter ECC
Unfortunatelly, the flash I was dealing with had a protection mechanism to deal with errors. Internally, every 256 bit pages of data was stored on 256+22 bits, and those 22 bits kept an ecc code to detect (and correct) errors.
(ECC stands for Error Code Correction)
A quick overview of what the problem is, with samller bits numbers:
Wen doing a normal write of data
ERASE flash state: DATA: 0xffffffff ECC: 0xff WRITE 0xFFFFFF01 (calculated ecc=0x20) flash state: DATA: 0xffffff01 ECC: 0x20
An ecc code is calulated and written
For another data, the ecc is different:
ERASE DATA: 0xffffffff ECC: 0xff WRITE 0xffffff00 (calculated ecc=0x30) flash state: DATA: 0xffffff00 ECC: 0x30
This mess with with our overwriting scheme (since ecc bits are stored in the flash, they also cannot be switched back to erase state)
eg:
ERASE flash state: DATA: 0xffffffff ECC: 0xff WRITE 0xFFFFFF01 (calculated ecc=0x20) flash state: DATA: 0xffffff01 ECC: 0x20 WRITE 0xFFFFFF00 (calculated ecc=0x30) flash state: DATA: 0xffffff00 ECC: 0x20 = (0x30 & 0x20)
And here, the flash controller will detect that 0xffffff00 and 0x20 don't match.
The necessary maths (or not)
Disclaimer, I am not (I mean, no longer) a math guy, so I'll just put here the basics things needed to understand.
The Ecc code generated is an hamming code. It has surely a lot of property, but what will interest us in the cas of ecc colisions, is that:
- Is is generated by a simple multiplication by a generation matrice
- at the bit level, the sum is a xor
(for more maths, go to wikipedia)
so:
Ecc(DATAS_BITS) = DATA_BITS x GEN_MATRICE
and:
Ecc(DATA1 ^ DATA2) = Ecc(DATA1) ^ Ecc(DATA2)
Calculating ECC
Back to our problem. We have:
Inital flash state: 0x8000abcd 0xffffffff 0xffffffff ... 0xffffffff ECC(inital_data) [jmp 0xabcd]
We want:
flash state: 0x00000000 0x80e00000 0xffffffff ... 0xffffffff ECC(new_data)
With
ECC(new_data) & ECC(initial_data) == ECC(new_data)
(ie, no bit writtent to zero in initial_ecc must be switched back to 1 in new_ecc)
But we may also be happy with
flash state: 0x00000000 0x80e00000 0xefffffff ... 0xffffffff ECC(new_data_2)
With the same constraint:
ECC(new_data_2) & ECC(initial_data) == ECC(new_data_2).
The first 64 bits of our payload need to be fixed, but we don't care of the state of the (256 - 64) bits remaining. So we have some latitude to forge a payload that will satisfy the constraint.
Remember that Ecc are additive ?
Let's call bit_array_n the 256 bits set to 0 except for bit n set to 1.
So here, as we have
new_data_2 = new_data ^ bit_array_65
we have
ECC(new_data_2) == ECC(new_data) ^ ECC(bit_array_65)
Let's call ECC_n = ECC(bit_array_n ( the ECC value of 256 bits set to 0 and bit n set to 1). (We can use an unprotected dev board of the same chip to extract thoses).
And for any interesting test_data, we have N bits (i1, i2, ..., iN) such as
test_data = new_data ^ (bit_array_i1 ^ ... ^ bit_array_iN)
and
ECC(test_data) = ECC(new_data) ^ ECC_i1 ^ ... ^ ECC_iN
Finding the collision
Now we can easily compute the ECC of any payload we will try to overwrite. But we still needs to find a collision, ie, we need to find the set of bits from 65 to 256 to set to zero in order for the resulting ECC to be initial_ecc.
So we must find (i1, i2, ..., iN) within {65..256} such that
ECC_i1 ^ ... ^ ECC_iN = ECC(new_data) ^ ECC(initial_data)
In order to do so, we first build for each couple and triplet of bits:
ECC(i,j) = ECC(i) ^ ECC(j) ECC(i,j,k) = ECC(i) ^ ECC(j) ^ ECC(k)
And sort them by the number of bits set, and keep the ones with only 3,4 bits set.
We have something like:
ECC(069,146) 002100a0 (04 bits changed) ECC(065,066) 00012880 (04 bits changed) ECC(130,134) 00012804 (04 bits changed) ECC(236,132) 00004281 (04 bits changed) ECC(227,108) 00004805 (04 bits changed) ... ECC(128,145,153) 00000284 (03 bits changed) ECC(231,124,144) 00000282 (03 bits changed) ECC(224,078,155) 00000281 (03 bits changed) ...
At that point, I stopped doing automated thing and started to search by hand which bit set would produce the desired ECC(new_data) ^ ECC(initial_data), noticing that some combo leads to a one bit setter (eg ECC(224,078,155) ^ ECC(236,132) = 0x4000)
I was lucky in that ECC(new_data) ^ ECC(initial_data) had only a few bit set. But hasn'it be the case, I would first search the ECC_i space that would have minised the number of bits set.
I also didn't handle the case where a bit is part of two set you want to use.
For the record, here is the sequence I identified:
target ecc 22e773 set_1: 00220202 -> e571 set 2 00000502 -> e073 set 3 00009011 -> 7062 set 4 00007000 -> 62 set 5 & 6 00020240 00020220 -> 2 set 7 & 8 00340002 00340000 -> 0
And voilĂ !
I can now overwrite the flash page with a controlled jump without the flash complaining about Ecc errors.
Conclusion
I tried to make this as much simple as I could, but reading it back, not sure I managed to do so.
Also note that, in my case, the ECC was also calculated on the address of the flash page, but in the end, it does not change the approach used.
Feel free to share if you found this interesting.