Authentication #
We can authenticate our messages using hashing.
Recall that there are two types of hashing
Cryptographic hashing
- more secure
- preimage/2nd preimage/collision resistance
- slower
Universal hashing
- no guarantees
- collision is probability based
- faster
There are authentication schemes that are built using each of these 2 types of hashing.
The idea behind authentication is
- If Bob receives a message from Alice, Bob would like to trust that the message is
- truly from Alice
- hasn’t been tampered with
- Alice will generate a tag \( t \) (sometimes called a MAC) using a tag generator \[ t = \text{TagGen(k, m)} \]
- Alice sends the message and the tag to Bob
- Bob can verify that the tag was generated with the specific key \( k \) and message \( m \)
This verifies that Alice knows the key \( k \) , which gives Bob trust (because we’re assuming that only Alice and Bob have that secret key. It also verifies that the message has not been changed. This requires that the tag \( t \) cannot be generated without the key \( k \) .
Security model #
We can think of the security model as a game, with an adversary that is trying to win. A security model is often setup as a game where the adversary wins if they reach a goal.
We will say that the adversary knows legitimate sequence of message/tag pairs, ie \( (m_1, t_1), (m_2, t_2), \ldots, (m_q, t_q) \) where each pair obeys \[\begin{aligned} t_i = \text{TagGen}(k, m_i) \end{aligned}\]
The adversary must guess a new message/tag pair.
Note: The message can be a reused message if a new tag is found, or a new message that evaluates to an old tag.
The adversary wins if they can create a “forgery”.
\[\begin{aligned} t = \text{TagGen}(k, m) \end{aligned}\]The MAC is \( \epsilon \) -secure if \( P(t = \text{TagGen} (k,m)) \leq \epsilon \) .
Building a tag generator #
Consider a function defined as
\[\begin{aligned} \text{HashMac}(k, m) = H(k || m) \end{aligned}\]Where our domains are defined as
- the key \( k \) is \( \{0,1\}^a \)
- the message \( m \) is \( \{0,1\}^* \)
- the hash value function is a public cryptographic hash \( H : \{0,1\}^* \to \{0,1\}^b \)
So we can theorize that \( \text{HashMac} \) is \( \epsilon \) -secure where \( \epsilon = \frac{1}{2^b} + \frac{n}{2^a} \) (plus brute force term).
Note: An adversary always has the ability to try to brute force.
Let \( k \) be random, and the adversary knows legitimate pairs \( (m_1, t_1), (m_2, t_2), \ldots, (m_q, t_q) \) . The adversary then attempts to guess a new pair \( (m,t) \) .
We want to calculate the probability \( P(H(k||m) = t) \) .
We can break this into 2 cases:
Case 1 #
The adversary is reusing a message from their known pairs.
\[m \in \{m_1, \ldots, m_q\} \]
This is bad strategy through because if \(m = m_i\) then \(t \neq t_i\) because \((m, t)\). So,
\[ P(H(k||m) = t) = P(t_i \neq t) = 0\]
No probability that the forgery will pass the test.
Case 2 #
The adversary picks a new message \(m\) where
\[m \not \in \{m_1, \ldots, m_q\} \]
So \(H(k||m)\) is uniformly distributed (never been given to the black box). So our probability
\[ P(H(k||m) = t) = \frac{1}{2^b} \]
The “bruce force term” is something like
// n = number of attempts
for i from 1 to n
check if t[1] = HashMac(i, m[1])
which gives a probability of
\[\begin{aligned} P(\text{finding a key}) = \frac{n}{2^a} \end{aligned}\]Wegman-Carter MAC – Another tag generator construction #
Wegman-Carter MACing uses universal hashes.
A message authentication algorithm that the tag generator takes
- a key
\( (h,f) \)
(a pair of functions)
- \( h \) is an almost-universal hash \( h: \{0,1\}^* \to Z_p \)
- \( f \) is a random function \( f: \{0,1\}^a \to Z_p \)
- a nonce \( n \)
- a message \( m \)
defined as
\[\begin{aligned} \text{WCMac}((h,f),n,m) = h(m) + f(n) \end{aligned}\]So
- \( h(m) \) is a small representation of \( m \) (which may be really large). We will hash our data into a tag and then use that represent the large piece of data.
- \( f(n) \) encrypts \( h(m) \) (basically a shift cipher).
When Alice sends a message to Bob she includes the message, nonce, and tag. It is very important that the nonce never repeats.
We can theorize that \( \text{WCMac} \) is \( \epsilon \) -secure where \( \epsilon = \)
The adversary sees \( (m_1, n_1, t_1), \ldots, (m_q, n_q, t_q) \) and guesses a new \( (m, n, t) \) tuple (not matching any prior tuples).
We want to compute the probability
\[\begin{aligned} P(h(m) + f(n) = t) \end{aligned}\]We have 2 cases
Case 1 #
The adversary uses a new nonce:
\[n \not \in \{n_1, \ldots, n_q\}\]
Then \(f(n)\) is uniform, so the probability that the adversary wins is
\[P(h(m) + f(n) = t) = \frac{1}{p} \]
Case 2 #
The adversary uses an old nonce:
\[ n \in \{n_1, \ldots, n_q\} \]
Then the messages will not match, \(m \neq m_i \). Notice that \( t_i = h(m_i) + f(n_i) \), so
\[ f(n) = h(m_i) + t_i \]
So our probability that the adversary wins is
\[ P(h(m) + h(m_i) = t + t_i) \leq \epsilon \]
This is considered an “almost \(\Delta\) universal”.
So the probability that the adversary wins is \[P(\text{adversary wins}) = \max \left( \frac{1}{p}, \epsilon \right) + \frac{n}{\text{number of keys} } \]
The most popular Wegman-Carter tag generator construction is defined as:
\[\begin{aligned} \text{PolyHash}_{k_i} (m) + \text{AES}_{k_i}(n) \end{aligned}\]Authenticated encryption #
In the past we would encrypt the plaintext into the ciphertext
\[\begin{aligned} c = \text{encrypt}(p) \end{aligned}\]and generate a MAC tag
\[\begin{aligned} t = \text{MAC}(c) \end{aligned}\]and send the pair \( (c, t) \) to the recipient.
However, we want the cryptographic to define the security, not leave anything to user error. So, cryptographers have been cutting down on the amount of errors that can happen during authentication/encryption.
For example, what if the user
- uses the same key in the encryption and the MAC
- if the user uses the same key when using CBC encryption and CBC-MAC, it opens up the user for an attack
- mismatches the order (ie MAC then encrypt)
- if the user authenticates the plaintext instead of the cipher, then it will have to be decrypted until it can tell if the plaintext was tampered with
- reads memory into the CPU twice (encrypt, then authenticate)
- not really a user choice, but can lead to an attack
Around the early 2000s, encryption/authentication methods were bundled to eliminate these user made issues. A set of goals were drawn up,
- safely use a single key
- make efficient (interleave encryption and authentication)
- encourage use of both encryption and authentication
GCM – Galois counter mode #
The most popular authentication encryption scheme is called GCM (Galois counter mode).
Some high level pseudo:
// k key
// n nonce (12 bytes)
// p plaintext
// c ciphertext
// t tag
GCMEncrypt(k, n, p):
hashkey = AES(k, <0>_128)
mask = AES(k, n || <1>_32)
// start block at n || <2>_32
c = AES_CTR(k, p)
t = PolyHash(hashkey, c) xor mask
return (c, t)
So long as the nonce \( n \) changes with each message, there is guaranteed no overlap because
- Hash key generation uses
<0>_128
as input to AES - WC mask generation uses
n || <1>_32
- CTR encryption uses
n || <2+>_32
So it is safe to use AES(k ..
for all parts of the algorithm.
This algorithm is efficient because of Horner’s rule.
while looping block by block:
// doing some encryption
c[i] = AES xor p[i]
// do some authentication also
acc += c[i]
acc *= hashkey