## Lecture 39 - 04/26

### Impossible and Intractable Problems, Problem Complexity, NP Completeness

• Continuing from last time, we learnt how files are compressed into a smaller file and then decompressed back to the original content
• A compression model that builds on this is the idea of self-extracting bits:
• From an original input $B$, let us use the compression algorithm from last time to create a smaller, compressed bitstream
• Let us create a program SelfExtractExample.java that contains this compressed bitstream, along with the decompression trie needed to uncompress it
• Let the bits that comprise this entire program (including the compressed bitstream and decompression trie) be $C_B$
• Running the program in an interpreter will output the original file $B$. Thus, running the bits of $C_B$ creates $B$. In other words, the $C_B$ bits self-extract themselves to generate $B$
• Question: given a target input bitstream $B$, what is the shortest possible compressed bitstream $C_B$ that outputs $B$?
• We want some compression algorithm that generates the shortest possible self-extracting set of bits $C_B$ from an input $B$, such that when we run $C_B$, we get back $B$
• Answer: see the rest of the lecture; we will find that such an algorithm is an $NP$ class problem, which has unknown difficulty
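The self-extracting idea can be sketched concretely. This toy version swaps the Huffman-style scheme from last lecture for a much simpler run-length encoding (our choice, purely for illustration): the program embeds a compressed form of $B$, and running it reconstructs the original $B$.

```java
// Toy self-extracting program: the "compressed bitstream" here is a
// run-length encoding of B = "aaaabbbcc" (our illustrative choice,
// not the Huffman scheme from last lecture).
public class SelfExtractExample {
    // Compressed form of B as (count, char) runs.
    static final int[] COUNTS = {4, 3, 2};
    static final char[] CHARS = {'a', 'b', 'c'};

    static String extract() {
        StringBuilder b = new StringBuilder();
        for (int i = 0; i < COUNTS.length; i++)
            for (int j = 0; j < COUNTS[i]; j++)
                b.append(CHARS[i]);   // expand each run
        return b.toString();
    }

    public static void main(String[] args) {
        System.out.println(extract()); // prints the original B: aaaabbbcc
    }
}
```

The bytes of this source file play the role of $C_B$: running them regenerates $B$.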

#### Kolmogorov Complexity

• We define Java-Kolmogorov complexity, $K_J(B)$, as the length in bytes of the shortest possible Java program that outputs $B$
• Such a shortest program definitively exists; it may just be very difficult to find
• Note: Kolmogorov complexity is essentially independent of language
• For example, the Python-Kolmogorov complexity of $B$, $K_P(B)$, is no more than an additive constant larger than $K_J(B)$
• This is because it takes a fixed, constant number of bytes to write a Java interpreter in Python, so $K_P(B)$ would be no more than $K_J(B)$ plus the additional constant number of bytes to write the interpreter (this constant term for the interpreter allows us to generalize from $K_J(B)$ to $K_P(B)$)
• We can generalize this to any language; thus Kolmogorov complexity for any input bitstream is language independent, up to an additive constant
• Also note: it is impossible to create a program that calculates the Kolmogorov complexity of an arbitrary bitstream
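As a concrete aside, $K_J(B)$ is always at most $|B|$ plus a constant, since we can write a Java program that simply prints $B$ literally. A minimal sketch of this bound (class and method names are ours, and it ignores string escaping for simplicity):

```java
// Sketch of a trivial upper bound on Java-Kolmogorov complexity:
// for any b, the literal-print program below has length |b| + c,
// where c is a fixed boilerplate constant independent of b.
public class KolmogorovBound {
    // Boilerplate of a Java program that just prints its embedded payload.
    static final String PREFIX =
        "public class P { public static void main(String[] a) { System.out.print(\"";
    static final String SUFFIX = "\"); } }";

    // Length (in characters) of the literal-print program for b;
    // assumes b needs no escaping, for simplicity.
    static int trivialProgramFor(String b) {
        return PREFIX.length() + b.length() + SUFFIX.length();
    }

    public static void main(String[] args) {
        String b = "10110100";
        int constant = trivialProgramFor("");  // the fixed boilerplate cost
        // The bound is |b| plus a constant that does not depend on b.
        assert trivialProgramFor(b) == b.length() + constant;
        System.out.println("upper bound: " + trivialProgramFor(b) + " chars");
    }
}
```

A good compression algorithm is interesting precisely when it beats this trivial bound.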

#### Difficulty of Problems

• We know the following runtimes for the following problems:
• Finding the median of an array: $\Theta(N)$
• Comparison-based sorting: $\Theta(N \log N)$
• Checking if an array contains a pair of elements that satisfy some criterion: $\Theta(N^2)$
• Calculating the Kolmogorov complexity $K(B)$ of a bitstream $B$: impossible
• The goal of this lecture is to figure out the complexity of a good compression algorithm, but first some detours

#### Sidenote: The Independent Set Problem

• For a graph consisting of nodes and edges, the independent set problem determines if there exists a set of vertices of size $k$ that are independent - no two vertices in the set of $k$ elements are adjacent in the graph
• We can think of the elements in our set of $k$ vertices as colored, and the other vertices as uncolored
• For a graph of $N$ vertices, there are $2^N$ total coloring possibilities, and for each possibility:
• It takes $O(k)$ runtime to check if the number of colored vertices is equal to $k$
• For each colored vertex, we need to check that all its neighbors are uncolored. This takes $O(kN)$
• If for any of the $2^N$ possibilities both these conditions are met, we have solved the independent set problem for this graph
• Thus, combining all this gives a worst-case runtime of $O(kN \cdot 2^N)$ for the independent set problem
• Since $k \leq N$, this simplifies to a worst-case runtime of $O(N^2 \cdot 2^N)$
• Intuition tells us that the best possible runtime for the independent set problem is at least $\Omega(N)$, since we must do something with each vertex at least once
• Thus we have bounds of $O(N^2 \cdot 2^N)$ and $\Omega(N)$. The following sidenote sections work toward tightening these bounds
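The brute-force procedure above might look like the following sketch, where an $N$-bit bitmask stands in for a coloring and the graph is given as an adjacency matrix (names are ours):

```java
// Brute-force independent set check: try all 2^N colorings; for each,
// verify the colored count equals k and no two colored vertices are
// adjacent. Worst case O(N^2 * 2^N), matching the bound above.
public class IndependentSet {
    // adj[u][v] == true iff there is an edge between u and v.
    static boolean hasIndependentSet(boolean[][] adj, int k) {
        int n = adj.length;
        for (int mask = 0; mask < (1 << n); mask++) {    // 2^N colorings
            if (Integer.bitCount(mask) != k) continue;   // size-k check
            boolean ok = true;
            for (int u = 0; u < n && ok; u++) {
                if ((mask & (1 << u)) == 0) continue;    // u uncolored
                for (int v = u + 1; v < n && ok; v++)    // neighbors of u
                    if ((mask & (1 << v)) != 0 && adj[u][v]) ok = false;
            }
            if (ok) return true;                         // found one
        }
        return false;
    }

    public static void main(String[] args) {
        // Path graph 0-1-2: {0, 2} is an independent set of size 2,
        // but no independent set of size 3 exists.
        boolean[][] adj = new boolean[3][3];
        adj[0][1] = adj[1][0] = true;
        adj[1][2] = adj[2][1] = true;
        System.out.println(hasIndependentSet(adj, 2)); // true
        System.out.println(hasIndependentSet(adj, 3)); // false
    }
}
```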

#### Sidenote: Reductions

• We say that problem $X$ reduces to problem $Y$ if $Y$ can be used to solve $X$
• For example, if $Y$ is sorting and $X$ is finding the median element ignoring all even elements of an array, $X$ can be solved using $Y$. Thus, $X$ reduces to $Y$.
• Another example: the 3SAT problem
• Let us have a boolean formula that is a set of three-variable disjunctive constraints (something like (x1 or x2 or x3) and (!x1 or !x2 or x4) and (!x1 or x3 or !x4) and (x1 or x3 or x4), where each parenthesized group represents a single three-variable disjunctive constraint)
• 3SAT problem: given such a formula, assign values to the variables that satisfy the formula
• We can use the independent set problem to solve the 3SAT problem, by constructing a graph as follows:
• For each disjunctive constraint clause, add three vertices to the graph in a triangle (each vertex contains a literal of the clause, such as x3 or !x1)
• Between each occurrence of a variable and its negation (for example, all x1, !x1 pairs) in the graph, add an additional edge (thus all x1, !x1 pairs are connected)
• We solve 3SAT by finding an independent set in this graph of size $k$, where $k$ is the number of clauses in the boolean formula (in the above example, $4$)
• Summary: 3SAT reduces to the independent set problem. Thus, a worst-case lower bound on 3SAT also applies to the optimal algorithm for solving the independent set problem
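A hedged sketch of the graph construction above, with literals encoded as signed integers (+i for xi, -i for !xi); class and method names are ours:

```java
// 3SAT -> independent set construction: one triangle of vertices per
// clause, plus an edge between every pair of complementary literal
// occurrences. The formula is satisfiable iff the resulting graph has
// an independent set of size equal to the number of clauses.
import java.util.ArrayList;
import java.util.List;

public class SatToIndependentSet {
    // clauses[c] holds three literals; vertex id for literal i of
    // clause c is 3*c + i. Returns the edge list as int[]{u, v} pairs.
    static List<int[]> buildGraph(int[][] clauses) {
        List<int[]> edges = new ArrayList<>();
        int n = clauses.length;
        for (int c = 0; c < n; c++) {                  // triangle per clause
            int base = 3 * c;
            edges.add(new int[]{base, base + 1});
            edges.add(new int[]{base, base + 2});
            edges.add(new int[]{base + 1, base + 2});
        }
        for (int c1 = 0; c1 < n; c1++)                 // complementary pairs
            for (int i = 0; i < 3; i++)
                for (int c2 = c1 + 1; c2 < n; c2++)
                    for (int j = 0; j < 3; j++)
                        if (clauses[c1][i] == -clauses[c2][j])
                            edges.add(new int[]{3 * c1 + i, 3 * c2 + j});
        return edges;
    }

    public static void main(String[] args) {
        // (x1 or x2 or x3) and (!x1 or !x2 or x4), the first two clauses
        // of the example formula above.
        int[][] clauses = {{1, 2, 3}, {-1, -2, 4}};
        // 2 triangles (6 edges) + 2 complementary pairs (x1/!x1, x2/!x2)
        System.out.println(buildGraph(clauses).size()); // 8
    }
}
```

Picking one vertex per triangle with no complementary pair selected corresponds exactly to a satisfying assignment.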

#### Complexity Classes

• A decision problem is one that has a yes or no answer
• Example: determine if an array has a duplicate pair of elements
• Not an example: count the number of duplicate pairs in an array
• A problem is said to be in complexity class $P$ if:
• It is a decision problem
• It can be solved in $O(N^k)$ time, for some constant $k$
• Note: $P$ stands for polynomial, referring to polynomial time
• Minor note: $N$ is the number of bits needed to specify the problem's input
• For example, the problem of determining if an array of numbers has a pair of elements that sum to zero can be solved in $O(N \log N)$ time (sort, then use two pointers), and is in complexity class $P$
• A problem is said to be in complexity class $NP$ if:
• It is a decision problem
• A 'yes' answer to the problem can be verified in $O(N^k)$ time for some constant $k$
• More precisely, a specific 'yes' example (certificate) for the problem can be verified in $O(N^k)$ time
• For example, the independent set problem is in complexity class $NP$, as it takes $O(kN) \leq O(N^2)$ time to verify that a claimed set of $k \leq N$ vertices is indeed independent
• Problems in $P$ are considered "easy"
• Polynomial runtimes are closed under addition and multiplication (run a $P$ algorithm $N$ times, and the total runtime is still polynomial - still in the $P$ complexity class)
• For practical problems, the $k$ exponent is usually small
• Most practical problems are in $NP$
• Example 1: given an integer $Z$, do there exist two primes $X$ and $Y$ whose product is $Z$?
• Example 2: is there a way to route an airline's airplanes such that the total cost is lower than my operating expense's annual budget?
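As a concrete example of a problem in $P$, the zero-sum pair check mentioned above (sort, then use two pointers) can be sketched as follows (names are ours):

```java
// Decision problem in P: does the array contain a pair summing to
// zero? Sort in O(N log N), then sweep two pointers inward in O(N).
import java.util.Arrays;

public class ZeroPairSum {
    static boolean hasZeroPair(int[] a) {
        int[] s = a.clone();
        Arrays.sort(s);
        int lo = 0, hi = s.length - 1;
        while (lo < hi) {
            long sum = (long) s[lo] + s[hi];  // long avoids int overflow
            if (sum == 0) return true;        // found a zero-sum pair
            if (sum < 0) lo++;                // need a larger sum
            else hi--;                        // need a smaller sum
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasZeroPair(new int[]{3, -7, 9, 7, 2})); // true (-7, 7)
        System.out.println(hasZeroPair(new int[]{1, 2, 4}));        // false
    }
}
```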

#### Completeness

• Let a problem $\pi$ be defined as $NP$-complete if:
• $\pi$ is in $NP$
• $\pi$ solves every other problem in $NP$ (all problems in $NP$ reduce to $\pi$)
• A 1971 proof by Stephen Cook showed that the 3SAT problem is $NP$-complete
• 3SAT (described above) is in $NP$, and if we can efficiently solve 3SAT, we can efficiently solve any decision problem whose answer can be efficiently verified (any decision problem in $NP$)
• High-level understanding of Cook's proof:
• Create an enormous boolean logic expression that represents the entire state of a computer at every timestep
• For example, we could have a boolean variable in this expression that is true if the $k^{th}$ bit of memory is true while we are on the $j^{th}$ line of code and on the $l^{th}$ execution cycle
• Because the verification runs in polynomial time, this boolean logic expression is only polynomial in size
• In 1972, Dick Karp showed that 21 famous problems can each be used to solve 3SAT, and are thus also $NP$-complete
• Since then, thousands more problems have been proven to be $NP$-complete
• Some $NP$-complete problems: 3SAT, independent set, Hamiltonian path finding, Hamiltonian cycle finding, finding longest paths in a graph, and even determining if a Super Mario Bros. level has a solution
• We prove a problem $X$ is $NP$-complete by showing $X$ is in $NP$ and solving some known $NP$-complete problem with $X$
• Any worst-case or best-case runtime bound for one $NP$-complete problem applies, up to polynomial factors, to all other $NP$-complete problems (including the independent set problem)
• The exact difficulty/complexity of $NP$-complete problems is unknown
• Connection to last lecture:
• Finding a good compression algorithm is an $NP$ class problem, and for $NP$-complete problems there is no definite known complexity. They are conjectured to be quite hard (complexity somewhere between a polynomial runtime like $\Theta(N^2)$ and impossible (like Kolmogorov complexity))

#### P = NP?

• Review:
• $P$ is the set of decision problems that can be solved in $O(N^k)$ time
• $NP$ is the set of decision problems for which a 'yes' answer can be verified in $O(N^k)$ time
• $NP$-complete problems are $NP$ problems that solve all other $NP$ problems
• $P$ problems and $NP$-complete problems are subsets of the $NP$ problems
• This is intuitively because if you can solve a problem, you can verify it
• Question posed by Cook in 1971: are all $NP$ problems also in $P$?
• Are efficiently verifiable problems also efficiently solvable ($P = NP = NP$-complete)?
• It is unknown whether $P = NP$ - to date, nobody has provided a valid proof either way
• If in fact $P = NP$, then we would be able to efficiently answer questions like "does there exist a program of less than $X$ bytes that compresses $B$ in less than $C$ cycles?"