AES Encryption 1 - Concepts - Chaotic Chronicles

I was scrolling through YouTube when I came across a video by Spanning Tree where they explained the AES algorithm. And after watching it, my brain was like this seems easy enough, I can do it. But before I can code it, I need to understand it. It's time to dive into the concepts of AES and how it works.

Beginning

In the beginning, there was only darkness... ahem, no wait, yes. It all started with me watching this YouTube video after finishing work in the evening. "You know what, implementing AES sounds like a fun small project? Let's do this." Is what my brain said.

And then it did not stop. I spent the entire night coming up with the python prototype.

AES

Basically the actual algo is called Rijndael or something but because the US govt deemed it so, we all call it AES now.

Like most cryptographic algos, you feed in some data and a key and it outputs encrypted data. In AES, data is fed in 16 byte chunks and key can either be 128bit (AES-128), 192bit (AES-192) or 256bit (AES-256) and it outputs 16 byte chunks of encrypted data.

Inner peace workings

AES encrypts the data in multiple rounds with a different key each round (called the round key) which is derived from the main key. In each round it performs operations like XOR, shifting rows, mixing columns and substituting bytes. And finally after all the rounds are over, it outputs the encrypted data. The number of rounds (R) depends on the size of the key, R=11 for 128bit keys, 13 for 192 and 15 for 256.

def encrypt_chunk(input_block, key):
    state = input_block
    round_keys = KeyExpansion(key)

    # initial round key addition
    state = AddRoundKey(state, round_keys[0])

    for round_key in round_keys[1:-1]:
        state = SubBytes(state)
        state = ShiftRows(state)
        state = MixColumns(state)
        state = AddRoundKey(state, round_key)
    
    # final round
    state = SubBytes(state)
    state = ShiftRows(state)
    state = AddRoundKey(state, round_keys[-1])

    return state

KeyExpansion

This is the most complicated thing here, everything else is smooth sailing. In essence, it is simply a function that takes in your key, and gives a bunch of round keys to use for each round. Some complicated math is used to determine the round key but we don't need to understand why it is the way it is. We can just implement the math functions.

Now for a bit of math:

Define a word as 32bits.
Define $%N%$ as length of key in 32bit words, so N=4, 6 and 8 for AES-128, AES-192 and AES-256.
$%K_0, K_1, ..., K_{N-1}%$ as 32bit words of the original key.
$%R%$ as the number of round keys needed, so R=11, 13, 15 for AES-128, AES-192 and AES-256.
We know that AES encrypts data in 16 byte chunks, so therefore each round key must also be 16 bytes long. And a word is 32 bits or 4 bytes. So we need 4 words for each round key. So: $%W_0, W_1, ..., W_{4R-1}%$ as 32bit words of the expanded key. Each group of 4 in this is a the key for a single round.
$%\operatorname{SubBytes}%$ as the SubBytes step. Details for this are mentioned below.
$%\operatorname{RotWord}%$ as left circular shift. So $%[A_1, A_2, A_3, A_4]%$ becomes $%[A_4, A_1, A_2, A_3]%$.
$% rcon_i = [rc_i, 00_{16}, 00_{16}, 00_{16}] %$ where $%rc_i%$ is given in a table on wikipedia.

$$W_i = \begin{cases} K_i & \text{if } i < N \ W_{i-N} \oplus \operatorname{SubWord}(\operatorname{RotWord}(W_{i-1})) \oplus rcon_{i/N} & \text {if } i \ge N \text{ and } i \equiv 0 \pmod{N} \ W_{i-N} \oplus \operatorname{SubWord}(W_{i-1}) & \text{if } i \ge N \text{, } N > 6 \text{, and } i \equiv 4 \pmod{N} \ W_{i-N} \oplus W_{i-1} & \text{otherwise.} \ \end{cases} $$

Once you've calculated $%W_i%$ for $%i = 0, ... 4R-1%$, you have the round keys for all $%R%$ rounds.

AddRoundKey

Very simple, XOR the input and the round key given, output the result.

ShiftRows

Remember we said that input is 16 byte chunks. Reshape it into a 4x4 grid where each cell is a single byte. Now shift the first row to the right by 0 cells. The second row by 1 cell. Keep in mind that the rightmost element just comes back from the left side. Apparently this is called a left circular shift. Can't figure why though. Now shift the third and fourth rows by 2 and 3 cells respectively.

SubBytes

You remember the most basic form of encryption we did as kids? Shift the letters of the alphabet by X. Like if X=1, HELLO becomes IFMMP. If you can't shift letters in your mind, well then GVDL PGG. But apparently this is called a substitution cipher where we substitute each letter of the alphabet with another letter. SubBytes is just that but with bytes. So each byte has an substitute byte defined. And no, the Rijndael folks didn't just shift the bytes by one. They came with some complicated math function to substitute the bytes.

Lucky for us, we don't even have to use that, we can substitute the substitution function with a lookup table from wikipedia. The reason this is feasible is because we're just creating a dictionary with the values of all possible bytes, and there's just 256 of them. So yeah, just take each byte from the 16 bytes and replace it with the value from the table.

MixColumns

I LIED. KEYEXPANSION IS CHILDS PLAY COMPARED TO THIS. So if you look this up on Wikipedia, it basically says:

split 16 bytes into a 4x1 matrix
and here's a 4x4 predefined matrix that we came up with using math
matrix multiply these together and get 1x4 matrix of your output.

Sounds easy enough right? (I'm assuming you know matrix multiplication. If not, then it already sounds complicated but just keep on reading.) I thought so too but my implementation wasn't working at all. So I did a little digging. Turns out this is not your normal mathematics. This is finite field mathematics (I still have no clue what that is). So multiplication and addition are not what you expect.

But on paper it seems fine, just replace addition with XOR, and multiplication is just XORs itself mutliple times. Again math folks knew us mortals could not comprehend such wonders and they gave us lookup tables for the output of commonly used multipliers with all the bytes. Assuming the 4x4 matrix is called $%A_{4*4}%$ and our input is called $%B_{4*1}%$ and the output would be $%C_{4*1}%$. Now instead of $$C_1 = A_{1,1} * B_1 + A_{1,2} * B_2 + A_{1,3} * B_3 + A_{1,4} * B_4$$ we get $$C_1 = LOOKUP[A_{1,1} \rightarrow B_1] \oplus LOOKUP[A_{1,2} \rightarrow B_2] \oplus LOOKUP[A_{1,3} \rightarrow B_3] \oplus LOOKUP[A_{1,3} \rightarrow B_4]$$ which is understandable (barely) to my smoothbrain.

(Note that $%LOOKUP[X \rightarrow Y]%$ means to lookup the value of Y in the substitution table of X, and $%\oplus%$ means the bitwise XOR operation)

Next Steps

Write all these functions in python!

Read that in [part 2]({% link Programming/_posts/2023-09-04-AES Encryption 2 - Python.md %}).

AES Encryption 1 - Concepts