$$ \renewcommand{\textsc}[1]{\style{font-variant-caps: small-caps}{\text{#1}}} \renewcommand{\textsf}[1]{\style{font-family: "Linux Biolinum"}{\text{#1}}} \renewcommand{\textit}[1]{\style{font-family: "Linux Libertine"; font-style: italic}{\text{#1}}} \newcommand{\Set}[1]{\left\{#1\right\}} \newcommand{\s}[1]{\texttt{#1}} \newcommand{\Len}[1]{\left|#1\right|} \newcommand{\abs}[1]{\left|#1\right|} \newcommand{\ENC}{\textsf{Enc}} \newcommand{\DEC}{\textsf{Dec}} \newcommand{\GEN}{\textsf{Gen}} \newcommand{\SIGN}{\textsf{Sign}} \newcommand{\VER}{\textsf{Ver}} \let\MAC\relax \newcommand{\MAC}{\textsf{Mac}} \newcommand{\Gen}[1]{\GEN\!\left(#1\right)} \newcommand{\Enc}[2]{\ENC_{#1}\!\left(#2\right)} \newcommand{\Dec}[2]{\DEC_{#1}\!\left(#2\right)} \newcommand{\Mac}[2]{\MAC_{#1}\!\left(#2\right)} \newcommand{\Sign}[2]{\SIGN_{#1}\!\left(#2\right)} \newcommand{\Ver}[2]{\VER_{#1}\!\left(#2\right)} \newcommand{\NN}{\mathbb{N}} \newcommand{\ZZ}{\mathbb{Z}} \newcommand{\QQ}{\mathbb{Q}} \newcommand{\RR}{\mathbb{R}} \newcommand{\CC}{\mathbb{C}} \newcommand{\FF}{\mathbb{F}} \newcommand{\OO}{\mathcal{O}} \newcommand{\gen}[1]{\left\langle #1 \right\rangle} \newcommand{\pk}{\Id{pk}} \newcommand{\sk}{\Id{sk}} \newcommand{\Id}[1]{\textit{#1}} \DeclareMathOperator{\ord}{ord} \let\E\relax \DeclareMathOperator*{\E}{E} \DeclareMathOperator*{\Var}{Var} \newcommand{\concat}{\mathrel{\hspace{1pt}||\hspace{1pt}}} \renewcommand{\P}{\textsf{P}} \renewcommand{\NP}{\textsf{NP}} \renewcommand{\RP}{\textsf{RP}} \renewcommand{\BPP}{\textsf{BPP}} \renewcommand{\ZPP}{\textsf{ZPP}} \renewcommand{\gets}{:=} \newcommand{\getsrandom}{\mathrel{\hspace{1pt}\leftarrow\hspace{1pt}}} \newcommand{\isequal}{\mathrel{=\hspace*{0.1pt}=}} \newcommand{\BitAnd}{\mathbin{\&}} \newcommand{\BitOr}{\mathbin{|}} \newcommand{\shl}{\ll} \newcommand{\shr}{\gg} \newcommand{\sra}{\ggg} \newcommand{\BitNeg}{\mathord{\sim}} \definecolor{tomred}{RGB}{200, 40, 41} $$

Computational complexity

Relevant readings: Sipser 7.1–7.4, 10.2, 10.6; Menezes et al. 3.1

Time complexity

If you recall your CSci 30 or MSys 30 class, the time complexity or running time of an algorithm refers to the number of steps the algorithm takes as a function of the size of its input. More often than not, we don’t really need the exact number of steps, since minor changes in the implementation can change it; we only care about how fast it grows as the input size increases.

Asymptotic analysis

When we are dealing with input sizes that are large enough so that only the rate of growth of the running time matters, we are looking at the asymptotic efficiency of an algorithm.

Instead of finding the exact expression that describes the running time, we can simply use asymptotic notation to express the running time of algorithms. In a nutshell, asymptotic notation suppresses constant factors and lower-order terms. We don’t care about constant factors because they are too system- and implementation-dependent. We also don’t care about lower-order terms because these terms will become irrelevant as the size becomes large, and we only care about efficiency for large input sizes.

Example

Suppose we have an algorithm whose running time is defined by \(T(n) = 6n^3 + 2n^2 + 20n + 45\). For large values of \(n\), the highest order term \(6n^3\) will overshadow the other terms. Disregarding the coefficient \(6\), we say that \(T\) is asymptotically at most \(n^3\).

In terms of big-\(O\) notation, we can also say that \(T(n) = O(n^3)\).
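
One quick way to see this numerically is to evaluate the ratio \(T(n)/n^3\) for growing \(n\): it settles toward the leading coefficient \(6\), showing that the lower-order terms stop mattering. The snippet below is just an illustration of this point.

```python
def T(n):
    return 6*n**3 + 2*n**2 + 20*n + 45

# As n grows, the lower-order terms contribute less and less,
# so T(n) / n^3 approaches the leading coefficient 6.
for n in (10, 100, 1_000, 10_000):
    print(n, T(n) / n**3)
```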

Definition

We say that \(f(n)\) is \(O(g(n))\) (pronounced “big-oh of \(g\) of \(n\)” or sometimes just “oh of \(g\) of \(n\)”) if and only if there exist positive constants \(c\) and \(n_0\) such that \(f(n) \le c g(n)\) for all \(n \ge n_0\).
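
To see the definition in action with explicit constants (one possible choice among many), note that $$3n^2 + 5n \le 3n^2 + 5n^2 = 8n^2 \quad \text{for all } n \ge 1,$$ so taking \(c = 8\) and \(n_0 = 1\) witnesses \(3n^2 + 5n = O(n^2)\), which is the first example below.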

Example

  • Suppose we have \(3n^2+5n\).

    Among the terms, \(3n^2\) is the fastest-growing. Dropping the constant factor \(3\), we have \(3n^2+5n = O(n^2)\).

  • Suppose we have \(n + 2\sqrt{n} + 10\).

    Note that \(\sqrt{n}\) can be rewritten as \(n^{1/2}\), so we now have \(n + 2n^{1/2} + 10\). Since \(n\) grows faster than both \(2n^{1/2}\) and \(10\), we conclude that \(n + 2\sqrt{n} + 10 = O(n)\).

  • Suppose we have \(n \log_3 n + 4n \log_7 n + n + 1\).

    When dealing with terms containing logarithms, it is easier to manipulate them if the logs have the same base. We then convert all logs to base \(2\) using the change-of-base formula, so we get \(\frac{1}{\log 3} \cdot n \log n + \frac{4}{\log 7} \cdot n \log n + n + 1\). Ignoring constant factors, we can see that the \(n \log n\) terms grow the fastest.

    Therefore \(n \log_3 n + 4n \log_7 n + n + 1 = O(n \log n)\).

    (Pro-tip: Logarithms with different bases differ only by a constant factor, so as long as we are concerned with asymptotic notation, the base of the logarithm is irrelevant.)

  • Suppose we have \(2 \cdot 3^{n-2} + 6 \cdot 2^{n+1} + 12n^4\).

    Ignoring constant factors, we can see that the \(3^{n-2}\) term grows the fastest among the terms. Notice that \(3^{n-2}\) and \(2^{n+1}\) have hidden constant factors, i.e., \(3^{n-2} = 3^n \cdot 3^{-2}\) and \(2^{n+1} = 2^n \cdot 2\).

    Therefore, \(2 \cdot 3^{n-2} + 6 \cdot 2^{n+1} + 12n^4 = O(3^n)\).

    (Pro-tip: Unlike logarithms, the base of exponentials does matter!)
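
Both pro-tips can be justified in one line each. For logarithms, the change-of-base formula gives $$\log_b n = \frac{1}{\log_2 b} \cdot \log_2 n,$$ so switching bases only introduces a constant factor. For exponentials, $$\frac{3^n}{2^n} = \left(\tfrac{3}{2}\right)^n$$ grows without bound, so no constant \(c\) can make \(3^n \le c \cdot 2^n\) for all large \(n\).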

Classifying algorithms

Algorithms can be classified according to their asymptotic running time. We can establish a sort of hierarchy where we arrange these functions in terms of growth rate. The table below shows the common time complexities, from slowest-growing to fastest-growing. Algorithms that rely on brute force (e.g., those involving enumerating subsets and permutations) often run in exponential or factorial time.

| notation | name | examples |
| --- | --- | --- |
| \(O(1)\) | constant | adding two integers |
| \(O(\log n)\) | logarithmic | binary search |
| \(O(n^c)\) where \(0<c<1\) | fractional power | primality testing by trial division |
| \(O(n)\) | linear | iterating through a list |
| \(O(n \log n)\) | linearithmic | merge sort, quicksort |
| \(O(n^2)\) | quadratic | checking all possible pairs |
| \(O(n^3)\) | cubic | naïve matrix multiplication |
| \(O(n^c)\) where \(c > 1\) | polynomial | nested for-loops (in most cases) |
| \(O(c^n)\) where \(c > 1\) | exponential | enumerating subsets |
| \(O(n!)\) | factorial | enumerating permutations |
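
To make the last two rows concrete, the sketch below (the item list is ours) enumerates every subset and every ordering of a \(4\)-element collection; the counts grow as \(2^n\) and \(n!\), which is why brute-force enumeration blows up so quickly.

```python
from itertools import combinations, permutations

items = ["a", "b", "c", "d"]

# All 2^n subsets: one combination of each size r = 0, ..., n.
subsets = [set(c) for r in range(len(items) + 1) for c in combinations(items, r)]

# All n! orderings.
orders = list(permutations(items))

print(len(subsets), len(orders))   # 16 and 24 for n = 4
```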

In summary, the rule of thumb is: logarithms grow more slowly than polynomials and polynomials grow more slowly than exponentials.

Polynomial vs. pseudo-polynomial time

If an algorithm’s running time grows polynomially with the input size (i.e., its running time is a polynomial function with respect to the input size), then we say that the algorithm runs in polynomial time. Most of the time (pun not intended), we analyze the running time of algorithms with respect to the size of the input, such as the length of the array or string, and the number of bits or digits needed to represent an integer.

However, there are algorithms (specifically, numeric algorithms) whose running times grow polynomially with the numeric value of the input, but not necessarily with its size. We call such algorithms pseudo-polynomial. The reason for the distinction is that the numeric value of the input can be exponential in the input size. For example, the largest value of an \(n\)-bit (unsigned) integer is \(2^n - 1\).

The following section shows a more concrete example.

Segue: fast modular exponentiation

A naïve way to do modular exponentiation, i.e., to compute \(a^p \bmod N\) where \(p \ge 0\) is an integer, is to multiply \(a\) by itself \(p\) times while reducing the result modulo \(N\). In pseudocode:

NaïveModExp(\(a\), \(p\), \(N\)):

1: \(x \gets 1\)

2: for \(i \gets 1 \ \textbf{to}\ p\)

3:   \(x \gets (a\cdot x) \bmod N\)

4: return \(x\)

We can see that NaïveModExp runs in \(O(p)\) time; more concretely, it performs one multiplication per loop iteration, or \(p\) multiplications in total. To put it another way, since \(p\) is \(m = \left\lfloor \log_2 p \right\rfloor + 1\) bits long, we can say that NaïveModExp runs in \(O(2^m)\) time. Since the running time of NaïveModExp is exponential in the number of bits (the input size) even though it is polynomial in the value of the input \(p\), it is a pseudo-polynomial time algorithm.
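
A direct Python rendering of the pseudocode above might look like the following sketch (the function name is ours).

```python
def naive_mod_exp(a, p, N):
    """Compute a**p % N by repeated multiplication: p multiplications,
    which is exponential in the bit length of p."""
    x = 1
    for _ in range(p):
        x = (a * x) % N
    return x
```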

However, this approach becomes problematic when \(p\) gets very large, since RSA in practice involves raising integers to very large exponents. A better approach is to treat exponentiation as a series of repeated squarings. If we use the fact that, for even \(p\), $$a^p = a^{p/2} \cdot a^{p/2} = \left(a^{p/2}\right)^2 = \left(a^2\right)^{p/2},$$ then we can compute, for example, \(a^8\) in just \(3\) multiplications. If the exponent \(p\) is odd, we can use the property $$a^p = a \cdot a^{p-1} = a \cdot \left(a^2\right)^{(p-1)/2}.$$ Thus we have:

FastModExp(\(a\), \(p\), \(N\)):

1: if \(p \mathrel{=}0\)

2:   return \(1\)

3: else if \(p\) is even

4:   return FastModExp(\(a^2 \bmod N\), \(p/2\), \(N\))

5: else

6:   \(x \gets\) FastModExp(\(a^2 \bmod N\), \((p-1)/2\), \(N\))

7:   return \((a \cdot x) \bmod N\)

Now, FastModExp runs in \(O(\log p)\) time since the exponent gets halved at every step until it reaches \(0\). Alternatively, FastModExp runs in \(O(m)\) time, where \(m\) is the number of bits of \(p\). Unlike our naïve pseudo-polynomial time algorithm, the running time of FastModExp is polynomial in the number of bits, hence FastModExp can be considered a polynomial-time algorithm.
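
A Python translation of FastModExp might look like the sketch below (names are ours). Python’s built-in three-argument pow also performs fast modular exponentiation, so we can sanity-check against it.

```python
def fast_mod_exp(a, p, N):
    """Compute a**p % N by repeated squaring: O(log p) multiplications."""
    if p == 0:
        return 1
    if p % 2 == 0:
        return fast_mod_exp((a * a) % N, p // 2, N)
    x = fast_mod_exp((a * a) % N, (p - 1) // 2, N)
    return (a * x) % N

assert fast_mod_exp(7, 560, 561) == pow(7, 560, 561)
```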

The classes P and NP

\(\P\) (“polynomial time”) is the class of problems that are solvable in polynomial time, while \(\NP\) (“nondeterministic polynomial time”) is the class of problems whose solutions can be verified in polynomial time.

A nice property of \(\P\) is that it is closed under composition, which means that if an algorithm makes a polynomial number of calls to another polynomial-time algorithm, then the entire algorithm still runs in polynomial time.

The main question: P vs. NP

Time to address the elephant in the room. Suppose a solution to some problem can be checked in polynomial time. Does that imply we can also solve that same problem in polynomial time? We don’t know yet.

We know that every problem that can be solved efficiently can also have its solutions verified efficiently, i.e., \(\P \subseteq \NP\). However, it may be entirely possible that \(\P = \NP\), because we’re unable to prove that there exists some problem in \(\NP\) that’s not in \(\P\).

As of now, the consensus leans toward \(\P \ne \NP\), mainly because efforts have been underway for the past 60 or so years to find efficient algorithms for problems in \(\NP\), but without much success. If \(\P \ne \NP\), then the hardest problems in \(\NP\) cannot be solved in polynomial time, so essentially nothing much better than brute-force search would work for them.

Reducibility

If we can “reduce” some problem \(A\) to another problem \(B\), it means that we use an algorithm that solves problem \(B\) as part of an algorithm that solves problem \(A\). Basically we treat the algorithm for problem \(B\) as a “black box.”

We’ll modify this concept a bit to take efficiency into account. If we can efficiently reduce problem \(A\) to problem \(B\), an efficient solution to problem \(B\) can be used to solve \(A\) efficiently.

Definition

Problem \(A\) is polynomial-time reducible to problem \(B\), written as \(A \le_p B\), if there exists a polynomial-time algorithm \(f\) that transforms every instance of \(A\) into an instance of \(B\) with the same answer. We call \(f\) the polynomial-time reduction of \(A\) to \(B\).
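
As a hedged illustration of the definition (a classic textbook example, not taken from the readings above): the Independent Set problem reduces to the Clique problem by complementing the graph, and the transformation clearly runs in polynomial time.

```python
from itertools import combinations

def independent_set_to_clique(vertices, edges, k):
    """Map an Independent Set instance (G, k) to a Clique instance (G', k):
    a set S is independent in G exactly when S is a clique in the complement G'."""
    edge_set = {frozenset(e) for e in edges}
    complement = {frozenset(pair) for pair in combinations(vertices, 2)} - edge_set
    return vertices, complement, k

# Toy instance: a path on 4 vertices, asking for an independent set of size 2.
print(independent_set_to_clique([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4)], 2))
```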

The notion of (in)tractability

We want our algorithms to scale well when the input size gets larger, so we rule out those that run in at least exponential time. Thus:

An algorithm is efficient if it runs in polynomial time.

This definition reinforces the notion that a particular problem may not have an efficient algorithm that solves it. If an efficient algorithm exists that solves a particular problem, then we call that problem tractable. On the other hand, if no efficient algorithm exists that solves a particular problem, then that problem is intractable.

For most intractable problems, the only known algorithms that solve them are brute-force ones, which are infeasible for anything other than really small inputs.

Remark

Most of the time, efficient algorithms are feasible enough for practical applications. For example, Python multiplies very large integers using Karatsuba’s algorithm once the number of digits exceeds a certain threshold (i.e., the crossover point \(n_0\)).
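
As a hedged sketch of the idea behind Karatsuba’s algorithm (this is not CPython’s actual implementation, and the base case and split point are simplified): three half-size multiplications replace the four that the schoolbook method would need.

```python
def karatsuba(x, y):
    """Multiply non-negative integers using three recursive half-size
    multiplications instead of four (simplified sketch)."""
    if x < 10 or y < 10:
        return x * y
    m = max(x.bit_length(), y.bit_length()) // 2
    xh, xl = x >> m, x & ((1 << m) - 1)   # split x into high and low halves
    yh, yl = y >> m, y & ((1 << m) - 1)
    a = karatsuba(xh, yh)
    b = karatsuba(xl, yl)
    c = karatsuba(xh + xl, yh + yl) - a - b
    return (a << (2 * m)) + (c << m) + b

assert karatsuba(123456789, 987654321) == 123456789 * 987654321
```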

However, there are algorithms whose crossover point \(n_0\) is “astronomically” large, meaning their stated efficiency kicks in only for inputs so large that the algorithms are never used in practice. As such they are called galactic algorithms. One example is the fastest-known algorithm for multiplying two numbers, whose efficiency is only attained when the numbers are at least \(2^{1729^{12}}\) bits (or at least \(10^{10^{38}}\) digits) long, which is much greater than the number of atoms in the known universe!

In complexity theory, the intractable problems we are interested in here belong to the class \(\NP\) (but are believed to lie outside \(\P\)).

Intractable problems in number theory

Modern cryptography exploits the fact that some problems in number theory have no known efficient solutions. Here is a selection of number-theoretic problems of cryptographic relevance, taken from (Menezes, van Oorschot, and Vanstone 1996); a small brute-force sketch of the discrete logarithm problem appears after the list:

  • The integer factorization problem

    Given a positive integer \(n\), find its prime factorization.

  • The discrete logarithm problem

    Given a prime \(p\), a generator \(g\) of \(\ZZ_p^\ast\), and an element \(y \in \ZZ_p^\ast\), find the integer \(x\) such that \(g^x \equiv y \pmod{p}\).

  • The RSA problem

    Given a positive integer \(n = pq\), where \(p\) and \(q\) are distinct odd primes, a positive integer \(e\) such that \(\gcd\left(e, (p-1)(q-1)\right) = 1\), and an integer \(c\), find an integer \(m\) such that \(m^e \equiv c \pmod{n}\).

  • The Diffie–Hellman problem

    Given a prime \(p\), a generator \(g\) of \(\ZZ_p^\ast\), and elements \(g^a \bmod p\) and \(g^b \bmod p\), find \(g^{ab} \bmod p\).
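
To get a feel for why these problems are easy in one direction but believed hard in the other, here is a small sketch of the discrete logarithm problem with toy parameters of our own choosing (real deployments use primes that are thousands of bits long).

```python
# Hypothetical toy parameters, far too small to be secure.
p = 101        # prime modulus
g = 2          # a generator of Z_p^* (2 is a primitive root mod 101)
x = 57         # the secret exponent

# Forward direction: easy, thanks to fast modular exponentiation.
y = pow(g, x, p)

# Inverse direction: the naive attack tries every exponent in turn. This
# takes on the order of p steps, which is exponential in the bit length
# of p, so it is hopeless once p is thousands of bits long.
def brute_force_dlog(g, y, p):
    t = 1
    for candidate in range(p - 1):
        if t == y:
            return candidate
        t = (t * g) % p
    return None

assert brute_force_dlog(g, y, p) == x
```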

It is not yet known whether these problems are NP-complete, i.e., whether they are problems in \(\NP\) to which every other problem in \(\NP\) can be reduced. For instance, if we discovered an efficient algorithm to factor integers, it wouldn’t necessarily imply that \(\P = \NP\), since there’s no known way to reduce an arbitrary problem in \(\NP\) to the integer factorization problem. However, if anyone comes up with an efficient algorithm for an NP-complete problem, then we can use that algorithm to construct an efficient algorithm to factor integers.

Randomized algorithms

From the name itself, a randomized algorithm (also called a probabilistic algorithm) incorporates randomness during its execution. Think of it as an algorithm that rolls a die or flips a coin to decide how it proceeds. As such, for each fixed input, a randomized algorithm may perform differently from run to run. The variation comes from the deliberate random choices made during the execution of the algorithm, not from any assumption about the distribution of inputs. On the other hand, we have deterministic algorithms, which don’t use randomness at all. There are hundreds of computational problems for which a randomized algorithm leads to a more efficient solution, and sometimes to the only known efficient solution.

Two flavors of randomized algorithms

Most randomized algorithms fall into two classes:

  1. Las Vegas algorithms

    These are guaranteed to return the correct output, but only run in polynomial time in expectation. In other words, a particular run could take a very long time (the worst-case running time may even be unbounded).

    An example of a Las Vegas algorithm is randomized quicksort.

  2. Monte Carlo algorithms

    These are guaranteed to run in polynomial time, but are only likely to return the correct output (i.e., there’s a chance that the output is wrong).

    An example of a Monte Carlo algorithm is the Miller–Rabin primality test.

There’s a way to convert a Las Vegas algorithm into a Monte Carlo one: run it for a fixed number of steps, and if the algorithm finds an answer within that budget, output the answer; otherwise output \(\bot\) to denote failure. The converse is not necessarily true, as there’s no known general way to convert Monte Carlo algorithms into Las Vegas ones.
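
As a hedged toy illustration of the two flavors and of the conversion just described (the problem and names are ours): suppose half the entries of a bit array are \(1\) and we want the index of any one of them.

```python
import random

def las_vegas_find_one(bits):
    """Las Vegas: the answer is always correct, and the expected number of
    samples is small when half the entries are 1, but a single unlucky run
    has no fixed upper bound on its length."""
    n = len(bits)
    while True:
        i = random.randrange(n)
        if bits[i] == 1:
            return i

def monte_carlo_find_one(bits, budget=32):
    """Monte Carlo: at most `budget` samples, but it may give up and return
    None (standing in for the failure symbol ⊥)."""
    n = len(bits)
    for _ in range(budget):
        i = random.randrange(n)
        if bits[i] == 1:
            return i
    return None

bits = [0, 1] * 50
print(las_vegas_find_one(bits), monte_carlo_find_one(bits))
```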

The class of problems solvable by a Las Vegas algorithm is called \(\ZPP\) (“zero-error probabilistic polynomial time”), while problems solvable by a Monte Carlo algorithm belong to the classes called \(\RP\) (“randomized polynomial time”) and \(\BPP\) (“bounded-error probabilistic polynomial time”).

Running time of randomized algorithms

How do we measure the running time of a randomized algorithm? Consider the following two scenarios that illustrate the two possible approaches to measuring running time:

  1. You publish your algorithm and a bad guy picks the input, then you run your randomized algorithm.

  2. You publish your algorithm and a bad guy picks the input, then the bad guy chooses the randomness (or “rigs the dice” so to speak) in your randomized algorithm.

In the first scenario, the running time is a random variable as it depends on the randomness that your algorithm employs, so here it makes sense to reason about the expected running time. In the second scenario, the running time is not random since we know how the bad guy will choose the randomness to make our algorithm suffer the most, so we can reason about the worst-case running time.

It’s important to keep in mind that, even with randomized algorithms, we are still considering the worst-case input, regardless of whether we’re computing expected or worst-case running time, since in both scenarios we assume that a bad guy picks the input.

The expected running time is not the running time when given an expected input! We are taking the expectation over the random choices that our algorithm would make, not an expectation over the distribution of possible inputs.

One-way and trapdoor functions

One-way functions are functions that are “easy” to compute but “hard” to invert, which makes them an extremely important cryptographic primitive. While there is no definite proof yet that one-way functions exist (proving that they do would, among other things, resolve the \(\P\) vs. \(\NP\) problem), we assume for the time being that they exist.

We should make a cryptosystem out of this!

One-way functions are like powdered three-in-1 instant coffee mix. It’s easy enough to mix their individual constituents (coffee, creamer, sugar) into a powdered mixture, but if you start off with the powdered mixture, it would be very difficult (but not impossible) to “reverse” the mixing process by separating each grain or particle of coffee/creamer/sugar.

In particular, one-way functions are used to construct hash functions, which take in some input and return an output of a fixed size. For example, a hash function \(H\) may map an input string as $$H(\s{hash browns}) = \s{c97b3f584d47619185635bb3a6ac9ce2}.$$ To be considered “one-way,” nobody should be able to recover the string \(\s{hash browns}\) just from the output \(\s{c97b3f584d47619185635bb3a6ac9ce2}\), so its “inverse function” should be infeasible to implement. For this reason, one-way functions are used for symmetric-key primitives and for cryptographic mechanisms dealing with integrity and authentication.
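
As a minimal sketch of the forward direction (the text does not say which hash function \(H\) is, so we simply use SHA-256 from Python’s standard library here):

```python
import hashlib

# Forward direction: easy and fast, even for long inputs.
digest = hashlib.sha256(b"hash browns").hexdigest()
print(digest)

# Inverse direction: given only `digest`, recovering the original string is
# believed to be infeasible short of guessing and re-hashing candidate inputs.
```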

Trapdoor functions

If we allow one-way functions to be efficiently inverted given some special knowledge, while still keeping the “easy to compute but hard to invert” property for everyone without that knowledge, then we get another family of functions called trapdoor functions. They are so called because it’s easy to fall through a trapdoor, but it’s very hard to climb back out and get to where you started unless you have a ladder.

For example, the RSA cryptosystem uses the trapdoor function $$f(x) = x^e \bmod n.$$ If we can find the factorization of \(n\), we can compute \(\phi(n)\), and so the inverse of \(e\) can be computed as \(d = e^{-1} \bmod {\phi(n)}\).
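
A toy sketch of this trapdoor with made-up textbook-sized parameters (real moduli are thousands of bits long, and this skips padding and proper key generation):

```python
# Hypothetical tiny parameters, for illustration only.
p, q = 61, 53
n = p * q                    # public modulus
phi = (p - 1) * (q - 1)      # requires knowing the factorization of n
e = 17                       # public exponent, gcd(e, phi) == 1
d = pow(e, -1, phi)          # the trapdoor: e^{-1} mod phi(n) (Python 3.8+)

m = 1234                     # a message, encoded as an integer smaller than n
c = pow(m, e, n)             # forward direction: easy for everyone
assert pow(c, d, n) == m     # inverting is easy only with the trapdoor d
```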

References

  • Menezes, Alfred J., Paul C. van Oorschot, and Scott A. Vanstone. 1996. Handbook of Applied Cryptography. CRC Press.