# NP-Complete isn't (always) Hard

A common assumption I see on the ‘net is that NP-complete problems are impossible to solve. I recently read that dependency management in Python is hard because package resolution is NP-complete. This is true in principle, but the reality is more complicated. When we say “NP-complete is hard” we’re talking about *worst-case complexity*, and the average-case complexity is often much more tractable. Many industry problems are “well-behaved” and modern SAT solvers can solve them quickly.

Okay I’ve probably lost 40% of my readers by this point so let’s back up and explain “NP-complete” and “SAT solver” and stuff.

### Terms terms terms

(If you’re not familiar with computational complexity and O(n) and what “solvable in polynomial time” means, this is a pretty good overview.)

NP is the set of all problems that are solvable in polynomial time by a “nondeterministic Turing Machine” (NTM). Very roughly, that’s a TM that, when faced with a choice, can create parallel timelines and try a different choice in each timeline. This is equivalent to the set of all problems that can be *verified* in polynomial time, which is the definition most non-CS people learn.^{1}

All problems in P are also in NP. It’s an open question whether NP problems *can’t* be solved in deterministic polynomial time, aka the P=NP problem. If P!=NP, then some NP problems are harder than others. There’s also an upper bound to “how hard” an NP problem can be. This set of “most difficult” NP programs (which again, *might* just be P) form the NP-complete complexity class. There are also problems that can’t even be *verified* in polynomial time. NP-hard is the set of all problems that are NP-complete *or harder*.^{2} People often conflate NP-complete and NP-hard.

The quintessential NP-complete problem is Boolean SATisfiability, or SAT. A SAT problem consists of a set of boolean variables and a set of clauses, like this:

```
A | C
!A | D | !E
B | !D | E | F
…
```

The goal is to find an assignment of variables that makes *every* clause true. Checking an answer is obviously in P: just plug the proposed assignments into the clauses and see if they all pass. We don’t have a polynomial time algorithm for finding assignments, though, and mathematicians are confident there isn’t one. It’s NP-complete. By the Cook-Levin Theorem, any other NP-complete problem can be converted into a SAT problem, and in fact “convert NP-complete problem to SAT” is in P.

That’s a lot of words about why NP-complete *is* hard, so let’s talk about why it *isn’t*.

### Why it’s not hard

When we talk about computational complexity, we’re almost always talking about worst-case complexity. The *hardest-possible* SAT problem with 19,203 variables will take twice-ish as long to solve as the *hardest-possible* SAT problem with 19,202 variables.^{3}

However, this says nothing about the *average* problem. Imagine if 99.99% of SAT problems could be solved instantly, while the remaining 0.01% took exponential time. Then SAT would still be considered NP-complete, even though the “typical” case was trivially solvable.

Of course life isn’t that easy and randomly generated SAT problems tend to be intractable.^{4} But a lot of industrial problems map to the subset of SAT problems that *are* tractable (*why* this is is an open question), meaning you can some pretty big gains from making fast SAT solvers. *In principle*, that’s all you need to realistically solve NP-complete problems. Any NP-complete problem can be converted to SAT, and since most industrial problems are well-behaved, this means that solving NP-complete problems is often quite viable.

### Walking the walk

I grabbed one SAT solver, Glucose, and tried it on a Sudoku, from my how to solve Sudoku satire. That has 729 variables with 12,000 clauses, and Glucose could solve it in a tenth of a second.^{5}

Next, I downloaded the SAT Race 2019^{6} problems and picked out a typical example. `CarSequencing-90-02_c18`

is an example of the “Car Sequencing” problem, which is described and shown to be NP-complete in this paper. This particular problem has 48,491 variables and 550,000 clauses, so we’d expect it to take about `1.4^47762`

times longer, well above the lifetime of the universe. But Glucose finds a solution in just 25 seconds.

On the other hand, the problem `58bits_dimacs`

(I *think* it’s an integer factorization problem?) has just 1,019 variables and 23,771 clauses. I let it run overnight and it didn’t finish. Checking the scoreboard, only 4 out of 55 SAT solvers were able to solve it in under two hours. That’s closer to a worst-case scenario where the blowup is exponential. RSA is still safe!

Problem | Variable Ratio | Clause Rat. | Time Rat. | Time/Var |
---|---|---|---|---|

Sudoku | 1 | 1 | 1 | 1 |

Car Sequencing | 66 | 46 | 250 | 4 |

58bits | 1.4 | 2 | >500,000 | >500,000 |

### So Why Aren’t We Using SAT Solvers?

When I started this post, I thought that SAT solvers were widely used in package dependency management. Turns out they’re not! Most dependency resolution uses PubGrub or an ad-hoc solver. Why is that?

I have four guesses:

- Converting problems to SAT is intellectually difficult. I felt real clever when I figured out how to convert Sudoku, but I wouldn’t be able to convert much else without a lot more experience.
- Converting problems introduces overhead. SAT clauses are a restrictive format, so you have to add lots of auxiliary booleans to track various aspects of your original problem. And there’s often several ways to convert a problem to SAT with different performance characteristics. The Car Sequencing folk came up with three different conversions, and none were strictly “the best”.
- Dedicated solvers can exploit properties in the problem domain that are lost by conversion to SAT. Integer Programming is an NP-complete problem but solvers just work with problems directly.
- Often you want to pull metadata from the problem, like return which specific dependency requirements conflict. That’s a lot harder to do (if at all possible) after you’ve converted to SAT.

## Further Reading

If you’re interested in learning more about SAT solvers:

- A Primer on Boolean Satisfiability
- SAT/SMT by Example (pdf)
- Using SAT Solvers to Prevent Causal Failures in the Cloud
- SAT competitions
- Modern SAT solvers: fast, neat and underused

If you’re interested in learning more about NP-completeness:

*Thanks to Alex Koppel and Jimmy Koppel (no relation) for feedback.*

- Very-rough-sketch-of-proof: 1) If you can check a solution of length N in polynomial time, just branch on
*every single possible input*of length N. 2) If you can solve a problem with branching, have the NTM also track the history of every step. Then a valid solution will also have a linear set of steps.^{[return]} - Finding problems that are NP-hard but
*not*NP-complete is really annoying, because a lot of mathematicians are smartasses and say “the halting problem!” I*think*most bounded model checking falls under this category, though.^{[return]} - In practice it’s more like 1.4ish times harder, so adding 10 variables “only” makes it 30x harder (as opposed to 1000x).
^{[return]} - Early research showed that there’s a phase transition in the ratio of clauses to variables: there’s a band of ratios where the problem is
*really hard*, and it’s otherwise pretty easy. Later research is challenging this idea, though.^{[return]} - All of these are done on an i9-12900K with 64gb ram.
^{[return]} - Oh yeah, there’s an annual SAT competition. The 2019 problem set is 3.5 gigabytes (zipped).
^{[return]}