$$ \renewcommand{\textsc}[1]{\style{font-variant-caps: small-caps}{\text{#1}}} \renewcommand{\textsf}[1]{\style{font-family: "Linux Biolinum"}{\text{#1}}} \renewcommand{\textit}[1]{\style{font-family: "Linux Libertine"; font-style: italic}{\text{#1}}} \newcommand{\Set}[1]{\left\{#1\right\}} \newcommand{\s}[1]{\texttt{#1}} \newcommand{\Len}[1]{\left|#1\right|} \newcommand{\abs}[1]{\left|#1\right|} \newcommand{\ENC}{\textsf{Enc}} \newcommand{\DEC}{\textsf{Dec}} \newcommand{\GEN}{\textsf{Gen}} \newcommand{\SIGN}{\textsf{Sign}} \newcommand{\VER}{\textsf{Ver}} \let\MAC\relax \newcommand{\MAC}{\textsf{Mac}} \newcommand{\Gen}[1]{\GEN\!\left(#1\right)} \newcommand{\Enc}[2]{\ENC_{#1}\!\left(#2\right)} \newcommand{\Dec}[2]{\DEC_{#1}\!\left(#2\right)} \newcommand{\Mac}[2]{\MAC_{#1}\!\left(#2\right)} \newcommand{\Sign}[2]{\SIGN_{#1}\!\left(#2\right)} \newcommand{\Ver}[2]{\VER_{#1}\!\left(#2\right)} \newcommand{\NN}{\mathbb{N}} \newcommand{\ZZ}{\mathbb{Z}} \newcommand{\QQ}{\mathbb{Q}} \newcommand{\RR}{\mathbb{R}} \newcommand{\CC}{\mathbb{C}} \newcommand{\FF}{\mathbb{F}} \newcommand{\OO}{\mathcal{O}} \newcommand{\gen}[1]{\left\langle #1 \right\rangle} \newcommand{\pk}{\Id{pk}} \newcommand{\sk}{\Id{sk}} \newcommand{\Id}[1]{\textit{#1}} \DeclareMathOperator{\ord}{ord} \let\E\relax \DeclareMathOperator*{\E}{E} \DeclareMathOperator*{\Var}{Var} \newcommand{\concat}{\mathrel{\hspace{1pt}||\hspace{1pt}}} \renewcommand{\P}{\textsf{P}} \renewcommand{\NP}{\textsf{NP}} \renewcommand{\RP}{\textsf{RP}} \renewcommand{\BPP}{\textsf{BPP}} \renewcommand{\ZPP}{\textsf{ZPP}} \renewcommand{\gets}{:=} \newcommand{\getsrandom}{\mathrel{\hspace{1pt}\leftarrow\hspace{1pt}}} \newcommand{\isequal}{\mathrel{=\hspace*{0.1pt}=}} \newcommand{\BitAnd}{\mathbin{\&}} \newcommand{\BitOr}{\mathbin{|}} \newcommand{\shl}{\ll} \newcommand{\shr}{\gg} \newcommand{\sra}{\ggg} \newcommand{\BitNeg}{\mathord{\sim}} \definecolor{tomred}{RGB}{200, 40, 41} $$

Message authentication codes

Relevant readings: Katz and Lindell 4.1–4.4, 5.1–5.3, 6.3 (Katz and Lindell 2021), Wong ch. 3, 4.4–4.6 (Wong 2021)

A naïve approach to message integrity

As we have seen from previous lectures, some encryption schemes are malleable, such as a block cipher in CBC mode. As a consequence, an attacker can intercept a message and send a modified message, and the recipient won’t even know the message has been tampered.

A straightforward solution is to include the hash along with the message, thus we now have $m \mathrel{\hspace{1pt}||\hspace{1pt}}H(m)$. This way, the recipient can use the hash to verify the message. Sounds good, right?

Not so fast. Since it is trivial to recover the hash and the message, an attacker can modify the message and update the hash accordingly, so the attacker then sends $m^\prime \mathrel{\hspace{1pt}||\hspace{1pt}}H(m^\prime)$. Because both the message and the hash are tampered, the recipient still won’t even know the message has been tampered as the hash agrees with the message!

As anyone including the attacker knows the hash function used to hash the message, a step towards the right direction is to introduce some sort of a key so that nobody but the communicating parties can authenticate and verify their messages.

Message authentication codes

A message authentication code (or MAC) protects a message’s integrity by computing a value $t \mathrel{\hspace{1pt}\leftarrow\hspace{1pt}}\Mac{k}{m}$ called the tag of the message $m$. Intuitively, MACs are just hash functions with secret keys.

When one party wants to send a message $m$ to the other, she computes a tag $t$ based on the message and the shared key, and sends the message $m$ along with $t$ to the other party. The tag is computed using a tag-generation algorithm Mac. Upon receiving $(m,t)$, the second party verifies whether $t$ is a valid tag on the message $m$ (with respect to the shared key) or not.

Definition

A message authentication code consists of three probabilistic polynomial-time algorithms $(\textsf{Gen}{}, \textsf{Mac}{}, \textsf{Ver}{})$ such that:

The key-generation algorithm Gen takes as input the security parameter $\texttt{1}^n$ and outputs a key $k$ with $\Len{k} \ge n$.
The tag-generation algorithm Mac takes as a key $k$ and a message $m \in \Set{\texttt{0}, \texttt{1}}^*$ and outputs a tag $t$. Since this algorithm may be randomized, we write this as $t \mathrel{\hspace{1pt}\leftarrow\hspace{1pt}}\Mac{k}{m}$.
The deterministic verification algorithm Ver takes as input a key $k$, a message $m$, and a tag $t$. It outputs a bit $b$, with $b = \texttt{1}$ meaning valid and $b = \texttt{0}$ meaning invalid. We write this as $b :=\Ver{k}{m, t}$.

It is required that for every $n$, every key $k$ output by $\Gen{\texttt{1}^n}$, and every $m \in \Set{\texttt{0}, \texttt{1}}^*$, it holds that $\Ver{k}{m, \Mac{k}{m}} = \texttt{1}$.

Verifying authentication tags

For deterministic MACs (i.e., where Mac is a deterministic algorithm), the canonical way to perform verification is simply to re-compute the tag and check for equality. In other words, $\Ver{k}{m, t}$ works as follows:

Compute $t^\prime :=\Mac{k}{m}$.
Output 1 if $t^\prime = t$ (the tag is valid), otherwise output 0.

When verifying an authentication tag, the comparison between the received authentication tag and the one you compute must be done in constant time. This means the comparison should always take the same time, assuming the received one is of the correct size. If the time it takes to compare the two authentication tags is not constant time, it is probably because it returns the moment the two tags differ. This usually gives enough information to enable attacks that can recreate byte by byte a valid authentication tag by measuring how long it takes for the verification to finish.

Security of MACs

The general security goal of a MAC is to prevent authentication tag forgery on a new message.

A message authentication code is considered secure if it is existentially unforgeable, meaning no efficient adversary should be able to generate a valid tag on any “new” message that was not previously sent (and authenticated) by one of the communicating parties.

This notion has an associated game, called the EUF-CMA game, defined for a MAC $\Pi = (\textsf{Gen}{}, \textsf{Mac}{}, \textsf{Ver}{})$ as follows:

A key $k$ is generated by running $\Gen{\texttt{1}^n}$.
The adversary $\mathcal{A}$ is given input $\texttt{1}^n$ and oracle access to $\Mac{k}{\cdot}$. The adversary eventually outputs $(m, t)$. Let $\mathcal{Q}$ denote the set of all queries that $\mathcal{A}$ submitted to its oracle.
$\mathcal{A}$ wins if and only if $\Ver{k}{m, t} = \texttt{1}$ and $m \notin \mathcal{Q}$ (i.e., $\mathcal A$ never queried $m$).

Thus a MAC is secure if no efficient adversary can win the game with non-negligible probability.

Definition

A message authentication code $\Pi = (\textsf{Gen}{}, \textsf{Mac}{}, \textsf{Ver}{})$ is existentially unforgeable under an adaptive chosen-message attack, or simply secure, if for all PPT adversaries $\mathcal{A}$ there is a negligible function $\mu$ such that: $$\Pr[\text{$\mathcal{A}$ wins the EUF-CMA game}] \le \mu(n).$$

“Strongly secure” MACs

As defined, a secure MAC ensures that an adversary cannot generate a valid tag on a message that was never previously authenticated. But it does not rule out the possibility that an attacker might be able to generate a new, valid tag on a previously authenticated message.

In standard applications of MACs, this type of adversarial behavior is not a concern. Nevertheless, in some settings it is useful to consider a stronger definition of security for MACs where such behavior is ruled out.

Fix a MAC $\Pi = (\textsf{Gen}, \textsf{Mac}, \textsf{Ver})$, a security parameter $n$, and a PPT adversary $\mathcal{A}$. We define the SUF-CMA game as follows:

A key $k$ is generated by running $\Gen{\texttt{1}^n}$.
The adversary $\mathcal{A}$ is given input $\texttt{1}^n$ and oracle access to $\Mac{k}{\cdot}$. The adversary eventually outputs $(m, t)$.
$\mathcal{A}$ wins if and only if $\Ver{k}{m, t} = \texttt{1}$ and $(m,t)$ was not among the signed pairs.

We then say that $\Pi$ is strongly secure if the probability of $\mathcal A$ winning the SUF-CMA game is negligible in $n$.

In this scenario, the adversary can query the target message but not return one of the responses. This stronger definition ensures that the adversary cannot “maul” the authentication tag.

MACs in practice

CBC-MAC

CBC-MAC is based on a block cipher in CBC mode. Unlike CBC-mode encryption, its starting value is not random (hence it cannot be considered as an IV), and the output is only a single block (representing the tag).

Generate key $k \mathrel{\hspace{1pt}\leftarrow\hspace{1pt}}\Set{\texttt{0}, \texttt{1}}^n$.
For input $m = m_1\cdots m_\ell$, compute $\Mac{k}{m}$ as follows:
1. Set $t_0 :=\texttt{0}^n$ (fixed, non-random value).
2. For $i :=1, \dots, \ell$, set $t_i :=F_k (t_{i-1} \oplus m_i)$.
3. Output last block $t_\ell$.
$\Ver{k}{m, t}$ returns 1 if $\Mac{k}{m} = t$, 0 otherwise.

CBC-MAC is secure as long as $F$ is a secure pseudorandom function (in practice, $F$ should be some Enc from a well-known block cipher).

Length extension attacks

It’s also possible to construct a MAC from hash functions; however, care must be taken to ensure that the underlying hash function is not vulnerable to a length extension attack. Here’s an example of how not to construct a MAC from a hash function.

Consider a candidate MAC construction, where Mac is defined as: $$\Mac{k}{m} :=H(k \mathrel{\hspace{1pt}||\hspace{1pt}}m).$$ If $H$ was derived from a Merkle–Damgård construction (like MD5, SHA-1, and SHA-2), we can then carry out a length extension attack on $H$ in order to forge a valid MAC for $m$ appended with additional data.¹

First, let’s recall how a Merkle–Damgård hash works. Starting from $t_0$, we apply the compression function $h$ to each plaintext block and the previous $t$-value, thus the final iteration of $h$ will output the digest of the whole message. Notice that the final output of the hash function is equivalent to an intermediate state of a hash function for a longer message, specifically, for $m \mathrel{\hspace{1pt}||\hspace{1pt}}\text{padding}(m) \mathrel{\hspace{1pt}||\hspace{1pt}}\cdots$.

For this attack, suppose we have the MAC computed as $H(k \mathrel{\hspace{1pt}||\hspace{1pt}}m)$ where $m$ is the message and $k$ is the secret key. We can then forge a valid MAC by computing the MAC for the extended message $k \mathrel{\hspace{1pt}||\hspace{1pt}}m \mathrel{\hspace{1pt}||\hspace{1pt}}\text{padding}(k \mathrel{\hspace{1pt}||\hspace{1pt}}m) \mathrel{\hspace{1pt}||\hspace{1pt}}m^\prime$. Since we already know what $H(k \mathrel{\hspace{1pt}||\hspace{1pt}}m)$ is, we can essentially use this to “restart” the hash function from where it ended when it originally hashed $k \mathrel{\hspace{1pt}||\hspace{1pt}}m$ by overriding its internal state.

Afterwards, hashing the additional message $m^\prime$ from this new state is essentially equivalent to hashing $(k \mathrel{\hspace{1pt}||\hspace{1pt}}m \mathrel{\hspace{1pt}||\hspace{1pt}}\text{padding}(k \mathrel{\hspace{1pt}||\hspace{1pt}}m) \mathrel{\hspace{1pt}||\hspace{1pt}}m^\prime \mathrel{\hspace{1pt}||\hspace{1pt}}\text{padding}(m^\prime))$ from the “original” state. The only stipulation though is we need to know how long $k$ is to compute the padding correctly, but that can be easily determined using brute force.

HMAC

HMAC, like its name indicates, is a way to use hash functions with a key. Using a hash function to build MACs is a popular concept as hash functions have widely available implementations, are fast in software, and also benefit from hardware support on most systems.

For this construction, we define Mac as: $$\Mac{k}{m} :=H \left((k \oplus \textit{opad}) \mathrel{\hspace{1pt}||\hspace{1pt}}H\left( (k \oplus \textit{ipad}) \mathrel{\hspace{1pt}||\hspace{1pt}}m\right) \right),$$ where ipad and opad are padding constants defined as $\textit{ipad} = \texttt{0x36363636}\dots$ and $\textit{opad} = \texttt{0x5c5c5c5c}\dots$.

HMAC was constructed this way in order to facilitate proofs. In several papers, HMAC is proven to be secure against forgeries as long as the hash function underneath holds some good properties, which all cryptographically secure hash functions should. Due to this, we can use HMAC in combination with a large number of hash functions. Today, HMAC is mostly used with SHA-2.

Chosen-ciphertext attacks

The strongest security notion in the context of encryption is security against chosen-ciphertext attacks. To recall, a chosen-ciphertext attack is where the adversary can decrypt a ciphertext of its choice.

More formally, the adversary is assumed to have access to the decryption oracle $\Dec{k}{\cdot}$, in addition to the encryption oracle $\Enc{k}{\cdot}$.

There are two variants of chosen-ciphertext attacks:

Non-adaptive chosen-ciphertext attack (CCA1)

The adversary cannot make further calls to the decryption oracle.
Adaptive chosen-ciphertext attack (CCA2)

The adversary continues to have access to encryption and decryption oracles.

A non-adaptive chosen-ciphertext attack is also lunchtime attack² because it works similarly when an attacker has access to a computer (that can decrypt messages) while its owner has gone out for lunch.

CCA-security

Now we consider the IND-CCA game defined for a private-key encryption scheme $\Pi = (\textsf{Gen}, \textsf{Enc}, \textsf{Dec})$, a security parameter $n$, and a PPT adversary $\mathcal{A}$:

A key $k$ is generated by running $\Gen{\texttt{1}^n}$.
The adversary $\mathcal{A}$ is given input $\texttt{1}^n$ and oracle access to $\Enc{k}{\cdot}$ and $\Dec{k}{\cdot}$, and outputs a pair of message $m_{\texttt{0}}$ and $m_{\texttt{1}}$ of the same length.
A uniform bit $b \in \Set{\texttt{0}, \texttt{1}}$ is chosen, and then a challenge ciphertext $c \mathrel{\hspace{1pt}\leftarrow\hspace{1pt}}\Enc{k}{m_b}$ is computed and given to $\mathcal{A}$.
$\mathcal{A}$ continues to have oracle access to $\Enc{k}{\cdot}$ and $\Dec{k}{\cdot}$, but is not allowed to query the latter on the challenge ciphertext itself. $\mathcal{A}$ outputs a bit $b^\prime$.
$\mathcal{A}$ wins if $b^\prime = b$.

Definition

A private-key encryption scheme $\Pi = (\textsf{Gen}, \textsf{Enc}, \textsf{Dec})$ has indistinguishable encryptions under a chosen-ciphertext attack, or CCA-secure for short, if for all PPT adversaries $\mathcal{A}$ there is a negligible function $\mu$ such that for all $n$, $$\Pr\left[\text{$\mathcal{A}$ wins the IND-CCA game}\right] \le \tfrac{1}{2} + \mu(n).$$ The probability above is taken over the randomness used by $\mathcal{A}$ and the randomness used in the game.

Authenticated encryption

Until now, we have considered how to obtain secrecy (using encryption) and integrity (using message authentication codes) separately. We have also witnessed that, on their own, proper encryption is “hard” and proper authentication is “harder” to get right without screwing up. For this reason, a lot of research has emerged seeking to standardize all-in-one constructions that simplify the use of encryption for developers.

The aim of authenticated encryption is to achieve both goals simultaneously. It is best practice to always ensure secrecy and integrity by default in the private-key setting. Indeed, in many applications where secrecy is required it turns out that integrity is essential also. Moreover, a lack of integrity can sometimes lead to a breach of secrecy.

Definition

A private-key encryption scheme is an authenticated encryption (AE) scheme if it is CCA-secure and unforgeable.

Let $\Pi_E = (\textsf{Enc}{}, \textsf{Dec}{})$ be a CPA-secure encryption scheme and let $\Pi_M = (\textsf{Mac}{}, \textsf{Ver}{})$ denote a strongly secure MAC, where key generation in both schemes simply involves choosing a uniform $n$-bit key.

Encrypt-and-MAC

Encryption and authentication are done independently. Given a message $m$, the sender transmits the ciphertext $(c,t)$ where: $$c \mathrel{\hspace{1pt}\leftarrow\hspace{1pt}}\Enc{k_E}{m} \text{ and } t \mathrel{\hspace{1pt}\leftarrow\hspace{1pt}}\Mac{k_M}{m}.$$
MAC-then-encrypt

A tag $t$ is first computed, then the message and tag are encrypted together. Given a message $m$, the sender transmits the ciphertext $c$ computed as: $$t \mathrel{\hspace{1pt}\leftarrow\hspace{1pt}}\Mac{k_M}{m} \text{ and } c \mathrel{\hspace{1pt}\leftarrow\hspace{1pt}}\Enc{k_E}{m \mathrel{\hspace{1pt}||\hspace{1pt}}t}.$$
Encrypt-then-MAC

The message $m$ is first encrypted and then a tag is computed over the result. Given a message $m$, the sender transmits the ciphertext $(c,t)$ where: $$c \mathrel{\hspace{1pt}\leftarrow\hspace{1pt}}\Enc{k_E}{m} \text{ and } t \mathrel{\hspace{1pt}\leftarrow\hspace{1pt}}\Mac{k_M}{c}.$$

Encrypt-and-MAC

In theory, this is the least secure AE construction. Because the goal of using MACs is simply to make tags unforgeable, and because tags aren’t necessarily random looking, the tag $t$ of a plaintext $m$ could still leak information even though the MAC is considered secure!

In particular, if a deterministic MAC like CBC-MAC is used, then the tag computed on a message (for some fixed key $k_M$) is the same every time. This allows an eavesdropper to identify when the same message is sent twice, something that is not possible for a CPA-secure scheme. Many MACs used in practice are deterministic, so this represents a real concern.

In practice, it has proven good enough for use with SSH, provided that strong MAC algorithms like HMAC-SHA-256 are used.

MAC-then-encrypt

The recipient must decrypt $c$ before they can determine whether they are receiving corrupted packets, which could expose potentially corrupted plaintexts to the receiver.

For example, consider CBC-mode encryption with padding, where it works by first padding the plaintext (which in our case will be $m \mathrel{\hspace{1pt}||\hspace{1pt}}t$) in a specific way so the result is a multiple of the block length, and then encrypting the result using CBC mode. During decryption, if an error in the padding is detected after performing the CBC-mode decryption, then a “bad padding” error is returned. With regard to the MAC-then-encrypt approach, this means there are now two sources of potential decryption failure: the padding may be incorrect, or the tag may not verify.

Nevertheless, it is more secure than encrypt-and-MAC because it hides the plaintext’s authentication tag, thus preventing the tag from leaking information on the plaintext. MAC-then-encrypt has been used in the TLS protocol for years, but TLS 1.3 (the lastest version) replaced MAC-then-encrypt with authenticated ciphers.

Encrypt-then-MAC

This is the most sound approach of the three. One advantage with this method is that the receiver only needs to compute a MAC in order to detect corrupt messages, meaning that there is no need to decrypt a corrupt ciphertext. Another advantage is that attackers can’t send pairs $(c, t)$ to the receiver to decrypt unless they have broken the MAC, which makes it harder for attackers to transmit malicious data to the recipient. This combination of features makes encrypt-then-MAC stronger than the encrypt-and-MAC and MAC-then-encrypt approaches. This is one reason why the widely used IPSec secure communications protocol suite uses it to protect packets (for example, within VPN tunnels).

The cryptographic doom principle

Moxie Marlinspike, creator of the Signal messaging app, proposed a rule of thumb called the “cryptographic doom principle” to determine which among the AE constructions are sound:

When it comes to designing secure protocols, I have a principle that goes like this: if you have to perform any cryptographic operation before verifying the MAC on a message you’ve received, it will somehow inevitably lead to doom.³

Authenticated encryption with associated data

In practice, the more correct answer would be to not bother about how to combine an encryption scheme and a MAC on your own, and just use a pre-defined authentication mode.

The most current way of encrypting data is to use an all-in-one construction called authenticated encryption with associated data (AEAD). The construction is extremely close to what AES-CBC-HMAC provides as it also offers confidentiality of your plaintexts while detecting any modifications that could have occurred on the ciphertexts. What’s more, it provides a way to authenticate associated data.

The associated data argument is optional and can be empty or it can also contain metadata that is relevant to the encryption and decryption of the plaintext. This data will not be encrypted and is either implied or transmitted along with the ciphertext. In addition, the ciphertext’s size is larger than the plaintext because it now contains an additional authentication tag (usually appended to the end of the ciphertext).

To decrypt the ciphertext, we are required to use the same implied or transmitted associated data. The result is either an error, indicating that the ciphertext was modified in transit, or the original plaintext.

Nowadays, the most popular AEAD ciphers are AES-GCM (Galois/Counter mode) and ChaCha20-Poly1305. AES-GCM combines AES in CTR mode with a MAC that works over finite fields (hence the name). ChaCha20-Poly1305 combines the ChaCha20 stream cipher and the Poly1305 MAC algorithm, designed for fast performance in software. For this reason, it is often preferred over AES-GCM for mobile and embedded systems that run ARM-based processors, which cannot take advantage of native x86 instructions dedicated for AES and finite fields.

References

Katz, Jonathan, and Yehuda Lindell. 2021. Introduction to Modern Cryptography. 3rd ed. CRC Press.
Wong, David. 2021. Real-World Cryptography. Manning Publications.

It’s also worth mentioning that SHA-3 (and other hash functions derived from a sponge construction) is secure against length extension attacks, and so this particular MAC construction is considered secure in those cases.

Hence, https://lunchtimeattack.wtf.

https://moxie.org/2011/12/13/the-cryptographic-doom-principle.html

CSCI 184.03 Notes