$$ \renewcommand{\textsc}[1]{\style{font-variant-caps: small-caps}{\text{#1}}} \renewcommand{\textsf}[1]{\style{font-family: "Linux Biolinum"}{\text{#1}}} \renewcommand{\textit}[1]{\style{font-family: "Linux Libertine"; font-style: italic}{\text{#1}}} \newcommand{\Set}[1]{\left\{#1\right\}} \newcommand{\s}[1]{\texttt{#1}} \newcommand{\Len}[1]{\left|#1\right|} \newcommand{\abs}[1]{\left|#1\right|} \newcommand{\ENC}{\textsf{Enc}} \newcommand{\DEC}{\textsf{Dec}} \newcommand{\GEN}{\textsf{Gen}} \newcommand{\SIGN}{\textsf{Sign}} \newcommand{\VER}{\textsf{Ver}} \let\MAC\relax \newcommand{\MAC}{\textsf{Mac}} \newcommand{\Gen}[1]{\GEN\!\left(#1\right)} \newcommand{\Enc}[2]{\ENC_{#1}\!\left(#2\right)} \newcommand{\Dec}[2]{\DEC_{#1}\!\left(#2\right)} \newcommand{\Mac}[2]{\MAC_{#1}\!\left(#2\right)} \newcommand{\Sign}[2]{\SIGN_{#1}\!\left(#2\right)} \newcommand{\Ver}[2]{\VER_{#1}\!\left(#2\right)} \newcommand{\NN}{\mathbb{N}} \newcommand{\ZZ}{\mathbb{Z}} \newcommand{\QQ}{\mathbb{Q}} \newcommand{\RR}{\mathbb{R}} \newcommand{\CC}{\mathbb{C}} \newcommand{\FF}{\mathbb{F}} \newcommand{\OO}{\mathcal{O}} \newcommand{\gen}[1]{\left\langle #1 \right\rangle} \newcommand{\pk}{\Id{pk}} \newcommand{\sk}{\Id{sk}} \newcommand{\Id}[1]{\textit{#1}} \DeclareMathOperator{\ord}{ord} \let\E\relax \DeclareMathOperator*{\E}{E} \DeclareMathOperator*{\Var}{Var} \newcommand{\concat}{\mathrel{\hspace{1pt}||\hspace{1pt}}} \renewcommand{\P}{\textsf{P}} \renewcommand{\NP}{\textsf{NP}} \renewcommand{\RP}{\textsf{RP}} \renewcommand{\BPP}{\textsf{BPP}} \renewcommand{\ZPP}{\textsf{ZPP}} \renewcommand{\gets}{:=} \newcommand{\getsrandom}{\mathrel{\hspace{1pt}\leftarrow\hspace{1pt}}} \newcommand{\isequal}{\mathrel{=\hspace*{0.1pt}=}} \newcommand{\BitAnd}{\mathbin{\&}} \newcommand{\BitOr}{\mathbin{|}} \newcommand{\shl}{\ll} \newcommand{\shr}{\gg} \newcommand{\sra}{\ggg} \newcommand{\BitNeg}{\mathord{\sim}} \definecolor{tomred}{RGB}{200, 40, 41} $$

(Pseudo)randomness

Relevant readings: Wong 8.1–8.4

What is randomness?

It is a fact that both computers and humans are very good at finding patterns. For most centuries, people tried very hard at finding patterns in seemingly-random natural phenomena like the movements of the planets and the weather, and for the most part they have succeeded; the early astronomers have found patterns and managed to make some predictions, and meteorologists have made great advances in the field of weather prediction.

What these examples have in common is that the underlying physics behind them did not change, but only our powers to predict them. So to a large extent, randomness is a function of the observer, or in other words:

If a quantity is hard to compute, it might as well be random.

Unfortunately, this means both computers and humans are very bad at producing randomness.

What is a source of randomness?

As said in Lecture 6, the main reason why we assume all algorithms are randomized by default is because randomness is essential to cryptography; in particular, we mainly use randomness to generate keys.

Though randomness is also often used for things not related to cryptography (i.e., for numerical simulations), the “non-negotiable” feature for security and cryptography purposes is that the randomness must be unpredictable. As such, cryptographic applications extract randomness from observing hard-to-predict physical phenomena.

For example, CloudFlare relies on a wall of lava lamps inside their headquarters to generate random numbers, where the “randomness” comes from the assumption that the shapes of the wax blobs are unpredictable. A camera is set in front of the wall to extract and convert the images to random bytes.

Applications usually rely on the operating system to provide usable randomness, which in turn, gather randomness using different tricks, depending on the type of device it is run on. Common sources of randomness (also called entropy sources) can be the timing of hardware interrupts (for example, your mouse movements), software interrupts, hard disk seek time, and so on.

Pseudorandom number generators

A downside of using true random number generators is that the process of obtaining random numbers from noise can be rather slow (one of the reasons being, the entropy sources had ran out), which could be a problem for particular applications that need “real-time” random numbers. One solution that trades off speed for being “truly random” is by resorting to pseudorandom number generators (PRNGs).

A PRNG needs an initial secret, usually called a seed, that we can obtain from mixing different entropy sources together and can then produce lots of random numbers quickly.

A pseudorandom number generator (PRNG) generates a sequence of random numbers based on a seed. Using the same seed makes the PRNG produce the same sequence of random numbers. (Wong 2021)

Cryptographically secure PRNGs usually tend to exhibit the following properties:

Deterministic

In this context, this means that once you know the initial seed, and how the next number is generated (i.e., you know how the PRNG algorithm works), then you can generate the entire sequence. This implies that using the same initial seed will always give you the same sequence of numbers.
Indistinguishable from random

You should not be able to distinguish a number that is randomly generated by a PRNG, and a number that is chosen uniformly at random from the same set. Consequently, observing the random numbers generated alone shouldn’t allow anyone to recover the internal state of the PRNG.

Additional security properties

A PRNG has forward secrecy if an attacker learning the state (by getting in your computer at some point in time, for example) doesn’t allow the PRNG to retrieve previously generated random numbers.

Obtaining the state of a PRNG means that you can determine all future pseudorandom numbers that it will generate. To prevent this, some PRNGs have mechanisms to “heal” themselves periodically (in case there was a compromise). This healing can be achieved by reinjecting (or re-seeding) new entropy after a PRNG was already seeded. This property is called backward secrecy.

PRNGs can be extremely fast and are considered safe methods to generate large numbers of random values for cryptographic purposes if properly seeded. Using a predictable number or a number that is too small is obviously not a secure way to seed a PRNG. This effectively means that we have secure cryptographic ways for quickly stretching a secret of appropriate size to billions of other secret keys.

Using randomness in practice

As a summary, an operating system needs the following things in order to provide “cryptographically secure” random numbers:

Sources of entropy

These are ways for the OS to obtain raw randomness from unpredictable physical phenomena like the temperature of the device or your mouse movements.
Cleaning and mixing

A single source of entropy by itself may be a rather poor source of randomness, so the OS cleans up and mixes a number of sources together to produce a better, more consistent quality of randomness.
PRNGs

In order to optimize the speed of random number generation, the OS uses a PRNG seeded from a single, uniformly random value to efficiently generate random numbers.

More often than not, the operating system comes with an interface that abstracts away the fine details into simplified functions for developers to generate a random number just by issuing a system call.

The way operating systems implement and provide access to this functionality obviously differ from one OS to another. Linux uses a PRNG that’s based on the ChaCha20 stream cipher, and following the “everything is a file” metaphor, such an interface is available as a special file called /dev/urandom. For instance, one can read $128$ bytes from the terminal using the dd command-line utility:

brian@thonkpad ~/classes/csci184.03
% dd if=/dev/urandom bs=128 count=1 2> /dev/null | xxd -p
36041307dcb45514899cd0a5765921f07043ffeeeb23d6d9dbd79f18337e
9d1b2bad022bd4b4eba667faf925c7182b556ef2cd4cfc29d75191dfc453
d46dc59570137b8adae09ffcf1af521b9195b68145e5969b76cfe58beea2
7a7ff4ab612e92d365fc55465a7277531b2e90cc8146d486d37f037e547c
e0bc1515cebf520e

One problem with /dev/urandom in particular is that it might not provide enough entropy (its numbers won’t be random enough) if used too early after booting the device.

To avoid “low-level” issues like this, most programming languages provide methods and functions for random number generation. In Python, there are two modules dedicated for this purpose, one called random,¹ and the other called secrets.² The functions provided in random use a PRNG called the Mersenne Twister, which is not by any means considered cryptographically secure (the Python docs even state that random is designed for modeling and simulation). On the other hand, secrets actually rely on the operating system’s random number generator, which is guaranteed to produce cryptographically strong random numbers suitable for managing sensitive data like passwords and tokens.

References

Wong, David. 2021. Real-World Cryptography. Manning Publications.

https://docs.python.org/3/library/random.html

https://docs.python.org/3/library/secrets.html

CSCI 184.03 Notes