CS152-lecture-20210613

Block cipher encryption mode examples #

For these examples, consider

  • \( E: \{0,1\}^6 \to \{0,1\}^6 \)
  • \( E(x) = \text{ROTL } (x,2) \)
  • If needed
    • \( \text{nonce} = 101 \)
    • \( \text{IV} = 110111 \)
    • Counters start at <1>
    • 10* padding

Encrypt the following

0000 1111 0000 1111

ECB #

Since we are using block sizes of 6 bits, we are encrypting

plaintext
000011 110000 11

So the first thing we have to do is pad the rightmost block.

padded plaintext
000011 110000 111000

Then we can send each block through \( E \)

ciphertext
001100 000011 100011

image_2021-06-13-09-53-43

To decrypt, we have to use the inverse of \( E \) , ie rotate right by 2, then we depad

image_2021-06-13-09-54-52

CBC #

We also need to pad our plaintext for CBC mode, so

plaintext
000011 110000 11

padded plaintext
000011 110000 111000

We will use \( IV = 110111 \)

So the first operation is to xor the first padded plaintext with the \( IV \) , then put through \( E \) ,

image_2021-06-13-10-10-53

That value is now the new xor for the next block, and the process continues

image_2021-06-13-10-11-33 image_2021-06-13-10-12-38

Note when doing these types of problems on a quiz: Any error when calculating these xors will propagate through the rest of the ciphertext.

This gives us the ciphertext

ciphertext
010011 001110 011011

For decryption:

  • We would receive the ciphertext, and
  • The previous given variables/functions

We can reverse the arrows in our diagram to visualize it,

image_2021-06-13-10-14-05

Note: The arrows from block \( n - 1 \) that is xor’d with block \( n \)’s arrow is not inverted. The reason is because xor is its own inverse.

Also, note that

  • \( E^{-1} = \text{ROTR } (x, 2) \) , and
  • We still need to know \( IV \)

We can start with the first block

image_2021-06-13-10-16-47

and continue the process

image_2021-06-13-10-17-12

Finally, we unpad by

  • Strippping trailing zeros, and
  • strip rightmost one

image_2021-06-13-10-17-47

Note on CBC: While encryption is done serially, decryption can be done in parallel. Some hardware allows for multiple invocations of the permutation at the same time.

CTR #

Recall that counter mode is a stream cipher. It generates a stream of bits that is xor’d with the plaintext to obtain cipher (and opposite for decrypt).

It does not need padding. The goal is to produce at least as many bits as the message to be encrypted. So we will generate 3 CTR blocks to be enough for our message length of 18 bits

Our \( \text{nonce} = 101 \) , and our counter starts at 1.

So our 3 blocks look like this, where each block is the nonce concatenated with the counter:

101001 101010 101011

Every single block will be a different value, but we must never repeat a nonce for 2 different plaintexts.

We can then put our blocks through \( E \)

image_2021-06-13-10-42-14

to produce our key stream. Then we xor our key stream with our plaintext

\[\begin{aligned} \text{ciphertext} = \text{plaintext } \oplus \text{ key stream} \end{aligned}\]

image_2021-06-13-10-44-01

but we only use the amount of bits we need out of the key stream.

Note: When we send a message using CTR mode, we send both the nonce and the ciphertext.

Decryption:

\[\begin{aligned} \text{plaintext} = \text{ciphertext } \oplus \text{ key stream} \end{aligned}\]

A benefit to CTR is that decryption uses the same process as encryption. Generate the keystream again using the same process, then xor the ciphertext to obtain the plaintext.

OFB #

Output feedback mode is also a stream cipher.

Recall

  • Our initialization vector \( \text{IV} = 110111 \)

This goes through permutation \( E \) and generates the first block of the key stream.

The output block is then fed back into the next block

image_2021-06-13-10-48-48

This process continues until you have enough bits of key stream to xor with your plaintext

image_2021-06-13-10-49-57

Because its a stream cipher,

  • To encrypt we can xor plaintext with the key stream to obtain the ciphertext. \[\begin{aligned} \text{ciphertext} = \text{plaintext } \oplus \text{ key stream} \end{aligned}\]
  • To decrypt we can xor the ciphertext with the key stream to obtain the plaintext. \[\begin{aligned} \text{plaintext} = \text{ciphertext } \oplus \text{ key stream} \end{aligned}\]

image_2021-06-13-10-51-58

AES #

AES is the standard block cipher used for symmetric encryption.

Recall, a block cipher is intended to resemble a random permutation. If we have a distinguisher with 2 worlds,

  • a random world
  • a block cipher with a random key

it should be hard to distinguish between the two worlds (this is how cryptographer’s can prove the strength of a block cipher).

File: Ch 4 slides

image_2021-06-13-11-02-04

Since one iteration doesn’t give enough security, there are at least 10 rounds (depending on key size) to acquire as much confusion and diffusion as possible. Each round could be thought of as a weak block cipher, as if 10 weak block ciphers are composed. As we compose more and more ciphers, they become more and more random looking.

Structure #

image_2021-06-13-11-04-55

ByteSubstitution #

ByteSubstitution can be thought of as a 1 byte random permutation. Also called a S-Box (substitution box). This is the non-linear step, which is important because of the strength acquired from alternating between different mathematical structures. This adds confusion.

image_2021-06-13-11-12-31

The ByteSubstitution is a Galois Field \( GF(2^8) \) ,

\[\begin{aligned} \text{ByteSubstitution } (x) = x^{-1} \cdot C_1 + C_2 \end{aligned}\]

This is implemented as a look-up table.

image_2021-06-13-11-18-05

ShiftRows #

image_2021-06-13-11-18-31

ShiftRows rearranges the order of the data elements. Each group of 4 bytes is scattered across the range of the next layer. They each maintain their respective position, but they are scattered. This adds diffusion.

image_2021-06-13-11-18-18

MixColumns #

MixColumns is another source of diffusion. It uses a cipher to scramble the 4 input bytes and which adds a smaller amount of local diffusion.

image_2021-06-13-11-18-41

This is basically a system of equations.

If you had an input \( B = \text{0x03} \)

\[\begin{aligned} \text{0x02} \cdot B &= 00000010 \cdot 00000011 \\ &= x (x + 1) \\ &= x^2 + x \\ &= 00000110 \end{aligned}\]

This leads to compact notation.

If we have 4 bytes coming into our MixColumns

image_2021-06-13-11-24-24

We can find our output like by using the matrix multiplication,

image_2021-06-13-11-25-17

We can use a \( GF(256) \) calculation to obtain the value

\[\begin{aligned} w &= 2A + 3B + C + D \\ &= (x)(x^2 + 1) + (x + 1)(x^4) + (x^4 + 1) + (x^2 + x) \\ &= (x^3 + x) + (x^5 + x^4) + (x^4 + 1) + (x^2 + x) \\ &= x^5 + 2x^4 + x^3 + x^2 + 2x + x^0 \\ &\text{mod coefficient by 2} \\ &= x^5 + x^3 + x^2 + x^0 \\ &= \text{0b}00101101 \\ &= \text{0x2D} \end{aligned}\]

Key addition layer #

image_2021-06-13-11-32-51 image_2021-06-13-11-32-56 image_2021-06-13-11-33-01

Note that the first key is xor’d directly with the plaintext. After that, the round key is generated using the last key using a simple transformation.

\( g \) is defined in this slide,

image_2021-06-13-11-34-31

Usually, the 11 round keys are generated and kept in memory so its available each time the block cipher is used.

Decryption #

image_2021-06-13-11-38-07

Introduction to SSE programming #

File: Intro to SSE pdf

We will stick to the original 16 byte wide registers for these examples.

mem     000102030405060708090a0b0c0d0e0f
reg     0f0e0d0c0b0a09080706050403020100    // intel does little-endian

SSE instructions can be told how to split up the register. For example if we split by 4

reg     0f0e0d0c 0b0a0908 07060504 03020100

There are a bunch of instructions that you can use on these elements.

Tips when programming with SSE #

See intrinsics in above pdf

When programming with SSE, we will be using compiler intrinsics. These intrinsic functions compile into 1 single assembly instruction.

Link: Intrinsics documentation, use SSE2

Since we don’t have an intrinsic rotate instruction in SSE2, we need to do the split rotate and or the result to achieve the rotation.

Some SSE example code #

Consider,

#include <stdio.h>
#include <stdlib.h>
#include <immintrin.h>
#include <time.h>

// accumulator function
uint32_t sum(uint32_t * buf, int n)
{
    uint32_t res = 0;
    for (int i = 0; i < n; i++)
    {
        res = res + buf[i];
    }
    return res;
}

int main(int argc, char * argv[])
{
    // allocate 4 million bytes
    uint32_t * buf = malloc(1000000 * 4);

    // read timer
    clock_t t = clock();

    // sum all elements
    uint32_t total = sum(buf, 1000000);

    // get elapsed time
    t = clock() - t;
    printf("%u\n", (unsigned) t);
}

When this runs it outputs

4790

which is different on each run, because it is measuring the amount of ticks it takes to complete the calculation. (They are averaging around 3000).

Lets see if we can do better using SSE instructions. Instead of reading each element individually, lets read them in groups of 4.

We can modify our sum function to use SSE instructions, we’ll call it sum_sse:

uint32_t sum_sse(uint32_t * buf, int n)
{
    uint32_t res_arr[4];
    __m128i res = _mm_setzero_si128();  // 0 0 0 0
    for (int i = 0; i + 4 < n; i = i + 4)
    {
        // load current element
        __m128i temp = _mm_loadu_si128((__m128i *)(buf + i));

        // add to sum
        res = _mm_add_epi32(res, temp);
    }

    // store res into memory
    _mm_storeu_si128((__m128i *) res_arr, res);

    // return sum of 4 elements
    return res_arr[0] + res_arr[1] + res_arr[2] + res_arr[3];
}

When we run this we get an average output time of around 1500, so while this was harder to write, it is a lot faster.

Note: To sum the 4 values in our res value, we could do something like this:

1 2 3 4
0 1 2 3     shift right
1 3 5 7     sum last 2
0 0 1 3     shift right twice
1 3 6 10    sum last 2
      10    our answer

However, we just write res back to memory and then sum the result of that.

Introduction to SSL programming #

While SSL was introduced for securing network connections, we aren’t going to use it in that way. We will be using the open source toolkit because it contains implementations of almost everything we’re learning in the class.

To tell what version of openssl we’re on:

openssl version

We will be using the EVP authenticated encryption and decryption.

We will be using the evp.h header file.

So lets write something using some openssl code to make sure our environment is working, in a file called test.c:

#include <openssl/evp.h>

int main()
{
    // make a call to some kind of openssl function
    EVP_aes_128_ctr();
    return 0;
}

We can compile with, linking the crypto lib

gcc -lcrypto test.c

An OpenSSL encryption example #

Lets encrypt something, note

  • OpenSSL uses contexts, whatever the encryption mode needs is in the context. We allocate them in the stack, ie EVP_CIPHER_CTX *ctx;
    • init this with EVP_CIPHER_CTX_new()
  • We aren’t handling errors robustly here, for brevity
  • In order to stay in our cryptographic model, our key should be random bytes. However for brevity we are using a simple key.
  • To specify what block cipher and mode of operation, we call EVP_EncryptInit_ex and pass it
    • a pointer to our context
    • specifier for what block cipher / mode
    • an ENGINE *, we will always put NULL. This value is if you have a special implementation of your mode of operation.
    • a pointer to the key, 16 bytes long
    • a pointer to the initialization vector, we’ll use our nonce when we create iv
  • EVP_EncryptUpdate’s name including the word “update” indicates that it does not encrypt all the data at once.
  • EVP_EncryptFinal is called to finalize the encryption
    • if there is anything in the buffer, it would pad/encrypt/write, needs a pointer to where to continue writing bytes should there be any in the buffer
  • EVP_CIPHER_CTX_free is used to clean up allocated memory
    • securely releases memory
#include <stdio.h>
#include <openssl/evp.h>

int main()
{
    /* ENCRYPTION */
    // plaintext
    char pt[44] = "The quick brown fox jumps over the lazy dog";
    char pt2[44];
    // ciphertext
    char ct[44];
    // key (should be random)
    unsigned char key[16] = {1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6};
    // nonce || counter
    unsigned char iv[16]  = {1,2,3,4,5,6,7,8,0,0,0,0,0,0,0,1};

    // our cipher context
    EVP_CIPHER_CTX * ctx;
    ctx = EVP_CPHER_CTX_new();
    EVP_EncryptInit_ex(ctx, EVP_aes_128_ctr(), NULL, key, iv);

    // do the encryption
    int len;                // how many bytes written
    int ciphertext_len;
    EVP_EncryptUpdate(ctx, ct, &len, pt, 44);
    ciphertext_len = ciphertext_len + len;

    // finalize, then free context
    EVP_EncryptFinal(ctx, ct + ciphertext_len, &len);
    ciphertext_len = ciphertext_len + len;
    EVP_CIPHER_CTX_free(ctx);

    /* DECRYPTION */
    // reset counter block
    iv[15] = 1;

    // reinit context
    EVP_DecryptInit_ex(ctx, EVP_aes_128_ctr(), NULL, key, iv);

    // do the decryption
    len = 0;
    int plaintext_len;
    EVP_DecryptUpdate(ctx, pt2, &len, ct, 44);
    plaintext_len = plaintext_len + len;

    // finalize, then free context
    EVP_DecryptFinal_ex(ctx, pt2 + plaintext_len, &len);
    plaintext_len = plaintext_len + len;
    EVP_CIPHER_CTX_free(ctx);

    // lets check if it worked
    printf("%s\n", pt2);
}
gcc -lcrypto test.c
./a.out
The quick brown fox jumps over the lazy dog