Optimizing MiniZinc

This is part two of a two-part article. Part one is here.

Last time we left off with this model:

include "globals.mzn";

enum DAYS;
enum ROOMS;
enum TIMES;

enum talks;
constraint assert(card(DAYS) * card(ROOMS) * card(TIMES) == card(talks), "Number of slots must match number of talks");

enum people;

array[people, talks] of 0..4: pref;
array[DAYS, TIMES, ROOMS] of var talks: sched;

constraint alldifferent (d in DAYS, t in TIMES, r in ROOMS) (sched[d, t, r]);

array[people] of var int: score;
constraint forall(p in people) (
  score[p] = sum([if exists (r in ROOMS) (pref[p, sched[d, t, r]] = 4) 
       then 1 else 0 endif | d in DAYS, t in TIMES] 
   )
);

solve maximize sum(score);

output 
 ["SCORE: \(sum(score)) \n\n"] ++
 [show(sched[d, t, r]) ++ " " ++ 
   if r == card(ROOMS) then if t == card(TIMES) then "\n\n" else "\n" endif else "" endif | d in DAYS, t in TIMES, r in ROOMS];

We wanted to run it for two days, four time slots, and three rooms. I generated the data by running 20 (?@:$) 5 in J and copying it over:

pref = [| 
 2, 4, 0, 1, 1, 4, 0, 3, 0, 3, 3, 3, 4, 1, 0, 3, 3, 2, 0, 3, 2, 3, 1, 2,
|2, 1, 1, 3, 2, 1, 4, 0, 1, 2, 3, 4, 2, 0, 3, 1, 0, 2, 1, 3, 4, 3, 1, 1,
|4, 0, 0, 3, 4, 1, 4, 4, 2, 2, 3, 4, 1, 2, 2, 3, 0, 3, 3, 3, 1, 2, 1, 2,
|3, 2, 2, 2, 4, 3, 0, 2, 4, 4, 0, 1, 4, 0, 3, 0, 4, 1, 0, 2, 3, 1, 1, 0,
|3, 3, 2, 4, 4, 0, 3, 1, 4, 4, 1, 2, 3, 1, 3, 2, 0, 2, 2, 4, 2, 4, 2, 1,
|4, 1, 1, 4, 1, 1, 3, 0, 3, 0, 0, 3, 1, 3, 3, 0, 2, 3, 3, 4, 4, 2, 3, 2,
|4, 2, 0, 2, 3, 3, 4, 3, 3, 4, 1, 3, 2, 3, 2, 0, 1, 4, 4, 1, 1, 1, 0, 0,
|0, 4, 4, 1, 3, 3, 2, 0, 4, 4, 0, 0, 4, 0, 1, 1, 1, 1, 4, 1, 3, 1, 0, 3,
|0, 0, 0, 4, 1, 3, 4, 0, 3, 4, 3, 3, 2, 4, 2, 4, 3, 4, 1, 1, 0, 0, 1, 1,
/* you get the picture

The model does not finish running after 8 hours. Instead of using 20 people, I switched to solver for just the first 4. Under this, the median of 5 runs was 17 seconds. That’s what we’ll be optimizing.

Switch Solvers

The first and biggest change is in switching solvers. MiniZinc is only the higher-level language you write in. You can choose the backend solver most appropriate to your model. This could be the difference between “finished in 300ms” and “still going a year later”.

The problem here is that it’s not obvious what the right solver is. Gecode is faster than Chuffed for very small schedules but tails off much faster. I’ve seen some problems where one solver is faster at finding intermediate solutions, while the other is faster at reaching the final solution. There’s also cases where one solver is much slower, but if you add a certain optimization to it then it becomes much faster than the rest. The entire thing is nonlinear and just because I found the best solver for a given set of improvements doesn’t mean I’ve found the best solver for the problem. I settled on Chuffed, but for all I know there’s some configuration which makes Gecode a million times faster.

One super frustrating thing is that every time you open the IDE it defaults to Gecode, no matter what you were using before. There doesn’t seem to be a way to configure a default solver for a project. This isn’t a problem if you use the command line, though.

More Hardware

The other obvious change. Faster computer = faster solving. I’m running this on a 2016 Surface Pro 4 with 8 GB ram and a 2.4 Ghz i5. That’s a hilarious underspecced computer for any kind of computational work.

The first problem with “moar hardware” is that it’s sublinear: if I double the cores I’d expect at most a 2x speedup. The second is it doesn’t teach me anything about writing optimized models. I’m better off making my model fast and only then, once I’ve done all the optimizations I could, start paying my way out of this.

Change Optimization Level

The solver isn’t directly solving the MiniZinc constraints. Rather MiniZinc compiles the model and data into an intermediate language called FlatZinc that the solvers can target. We can change the compilation optimization level, ideally making simpler FlatZinc constraints for the solvers to process.

We start on O1. Jumping to O2 increases the runtime to 65 s, while O3 is currently blocked off due to an unbound variable. We’ll come back to this.

Flatten the Axis

The first change we can make is to reduce the cardinality of the data. We’re currently processing over three axes: days, times, and rooms. But at a high level, there’s really no difference between days and times! We’re never using just days or just times, and we always iterate over both at once. We can get rid of days entire and double the number of valid times, thereby preserving the same problem with one fewer axis.

- enum DAYS;
enum ROOMS;
enum TIMES;

- array[DAYS, TIMES, ROOMS] of var talks: sched;
+ array[TIMES, ROOMS] of var talks: sched;

- constraint assert(card(DAYS) * card(ROOMS) * card(TIMES) == card(talks), "Number of slots must match number of talks");
+ constraint assert(card(ROOMS) * card(TIMES) == card(talks), "Number of slots must match number of talks");

- constraint  alldifferent (d in DAYS, t in TIMES, r in ROOMS) (sched[d, t, r]);
+ constraint  alldifferent (t in TIMES, r in ROOMS) (sched[t, r]);

constraint forall(p in people) (
- score[p] = sum([if exists (r in ROOMS) (pref[p, sched[d, t, r]] = 4) 
-      then 1 else 0 endif | d in DAYS, t in TIMES] 
+ score[p] = sum([if exists (r in ROOMS) (pref[p, sched[t, r]] = 4) 
+      then 1 else 0 endif | t in TIMES] 
   )
)

output 
 ["SCORE: \(sum(score)) \n\n"] ++
- [show(sched[d, t, r]) ++ " " ++ 
-   if r == card(ROOMS) then if t == card(TIMES) then "\n\n" else "\n" endif else "" endif | d in DAYS, t in TIMES, r in ROOMS];;
+ [show(sched[t, r]) ++ " " ++ 
+   if r == card(ROOMS) then "\n" else "" endif | t in TIMES, r in ROOMS];

Then, in our data file, we replace all of the days with additional times:

- DAYS = anon_enum(2);
- TIMES = {m1, m2, n1, n2};
+ ROOMS = {r1, r2, r3};
+ TIMES = {m11, m12, n11, n12, m21, m22, n21, n22};

With this change, we’ve gone from 17 s to 16 s, or just under a 6% time reduction. Not a big improvement, but this change sets us up for much bigger optimizations down the road.

Optimize `score` constraint

I spent way too much time on this one.

constraint forall(p in people) (
    score[p] = sum([if exists (r in ROOMS) (pref[p, sched[t, r]] = 4) 
        then 1 else 0 endif | t in TIMES] 
     )
);

Look at how ugly that is! Why am I adding one for every matching room? I should just count the number of (ROOMS, TIMES) tuples that match a prefered talk. I can do that by using a set comprehension and taking the cardinality of the resulting set:

constraint forall(p in people) (
    score[p] = card({pref[p, sched[t, r]] = 4 | r in ROOMS, t in TIMES} 
   )
);

With this change, we decrease the time from 16 to

> MiniZinc: internal error: var set comprehensions not supported yet

…what?

Okay, here’s something important about MiniZinc: it’s REALLY hard to convert an array into a set. There’s a set2array but not an array2set. There is no unique function. Set comprehensions don’t work except on constant arrays. As far as I can tell, the reason is MiniZinc supports par arrays of vars and var sets of pars but not any kind of var array or set of vars. Those kinds of values are way too complicated for solvers to handle.

What if, instead of an if statement, we just coerce the exists to an integer?

constraint forall(p in people) (
    score[p] = sum([bool2int(exists (r in ROOMS) (pref[p, sched[t, r]] = 4)) | t in TIMES] 
     )
);

This increases the time from 16 s to 18 s. Guess if is faster than casting. Finally I have an epiphany and remember that MiniZinc supports implicit type coercion. So I can just write

constraint forall(p in people) (
    score[p] = sum([exists (r in ROOMS) (pref[p, sched[t, r]] = 4) | t in TIMES] 
     )
);

True values are 1, false values are 0, sum works as it should. Great! This still leaves us with 18 s. I’m guessing it’s because Chuffed just uses bool2int behind the scenes but haven’t confirmed that.

We’re going to come back to this one later. Much later.

Bound on `score`

According to the MiniZinc manual, you should avoid having unbound vars, as the solver has to deal with a larger search space. So let’s put a bound on score. We know that the score is the sum of the times where a person has a favorite talk, which means the score for each person can’t be more than the number of existing times.

- array[people] of var int: score;
+ array[people] of var 0..card(TIMES): score;

This change takes the runtime from 16 s to 43 s, and almost 3x slowdown.

It was a while before I noticed the slowdown! I was originally working in Minizinc 2.1, where the fastest solver was lazyfd. For lazyfd, bounding the score is a pretty significant improvement, so I kept it. In MiniZinc 2.2 lazyfd was depreciated and replaced by Chuffed, which is on the whole much faster, but for which bounding the score is a regression. One solver’s improvement is another solver’s regression.

Switching to bounds lets us compile with O3, which reduces the runtime to 21 s. Still worse overall, so let’s go back to using var int.

Symmetry Breaking

In our current model, rooms and times are each symmetric. If I take all the talks at a single time and shuffle the rooms they are in, I’ll end with the exact same score. Similarly, if I swap all of the talks between two time slots, I get the same score. If T is the number of times and R is the number of rooms, every schedule has (R!^T)T! symmetrically-identical counterparts. We can reduce the search space by adding constraints that rule out these symmetries as valid solutions. This is called symmetry breaking.

+ constraint forall(t in TIMES) (
+    increasing (r in ROOMS) (sched[t, r])
+ );
+ 
+ constraint increasing (r in ROOMS) (sched[min(TIMES), r]);

By breaking symmetry, we go from 16 s to 5 s, more than a threefold increase. We’re not done yet. We’re still doing generator expressions in both of our symmetry constraints. We can eliminate the overhead by using the native col and row functions instead of handrolling them ourselves.

constraint forall(t in TIMES) (
-    increasing (r in ROOMS) (sched[t, r])
+    increasing (row(sched, t))
);

+ constraint increasing (r in ROOMS) (sched[min(TIMES), r]);
- constraint increasing(col(sched, 1));

This reduces the time by a whole order of magnitude, from 5 s to 0.5 s. Our model is now too fast to meaningfully benchmark on a 4 person model, so I’m going to increase the model to 8 people. This raises the runtime from 0.5 s to 7 s. This one optimization meant we could double our state space.

Unfortunately, 8 people is a wall for us. If we add a 9th person, the runtime balloons to over five minutes! It turns out in my generated data set, the 9th person is a bit of a problem child: switching them with the 8th person makes our 8 person model take 20 seconds instead of 7. We need more consistent optimizations.

Symmetry breaking optimizations are some of the most “fragile” optimizations. That’s because adding constriants might break them. For example, if one room was bigger than the others we’d no longer have symmetric rooms, and if we wanted to put the lighter talks just after lench then we’d no longer have symmetric times. We might have weaker symmetry optimizations (“all talks but big_room are symmetric”), but we’d still have to rewrite our constraints.

Bound on `score` again

What happens if we place bounds on score now?

- array[people] of var int: score;
+ array[people] of var 0..card(TIMES): score;

For the model with people 1-7 and 9 (1-7+9), this changes the time from 19 s to 24 s. Still a slowdown, but not as much as before. For the model with people 1-8, this changes the time from 7 s to 2 s. Once we’ve added symmetry breaking, Chuffed is better able to leverage the finite bounds we added to score. We see two things:

Optimizations are nonlinear. Whether something’s an improvement or regression depends on your other optimizations.
Whether something is an improvement or a regression depends on the data set.

The second lesson is a huge bummer to me. It means that we can’t optimize a model on a smaller set of data and then apply it to a larger set.

Let’s hold off on the bounds and try to find some more consistent improvements.

Channels

Let’s try something completely different. Our score constraint is doing lookups on sched, a 2 dimensional var array. I have a feeling that retrieving values from a multidimensional array is more expensive than retreiving them from a flat array. We might save some time by replacing a lookup on sched with a lookup on just an array of the talk times.

That means we need a way of decomposing the schedule into an array of talks. The simplest way to do that is to add two more variables, talk_times and talk_rooms, then add a constraint relating them to sched. Such a constraint, which relates two representations of the same model, is called a channel. They’re useful if different constraints are more elegant in different representations. In our case, talk_times is better for maximizing the score, and sched is better for ensuring each talk is scheduled (via alldifferent).

array[TIMES, ROOMS] of var talks: sched;

constraint alldifferent (t in TIMES, r in ROOMS) (sched[t, r]);

+ array[talks] of var TIMES: talk_time;
+ array[talks] of var ROOMS: talk_room;
+ constraint forall(t in talks) (t = sched[talk_time[t], talk_room[t]]);

# ...

constraint forall(p in people) (
-   score[p] = sum([if exists (r in ROOMS) (pref[p, sched[t, r]] = 4) 
+   score[p] = sum([if exists (talk in talks) (pref[p, talk] = 4 /\ talk_time[talk] = t) 
        then 1 else 0 endif | t in TIMES] 
     )
);

This takes both 1-7+9 (19 s) and 1-8 (7 s) to 0.1 s. The full data set takes just 10 seconds. Recall that before we started optimizing, it wasn’t done solving the 20-person case after 8 hours, so we’ve gotten at least a 3,000x speedup so far.

Precompute favorites

Around here’s where I finally deciding to try finding hotspots with trace. trace(s, x) evaluates to x and prints s to the console. I could check how often a single character was printed to see how often something is evaluated. For example, in score we’re iterating over all of talks and selecting only the talks which the person favors. Let’s add trace to see how often this is evaluated.

score[p] = sum([if exists (talk in talks) (trace("+", pref[p, talk] = 4 /\ talk_time[talk] = t))

The inner loop is computed 7600 times. Assuming we don’t expect to change our scoring algorithm, we can get a small boost by precomputing each person’s favorite talks once and then iterating over that smaller set.

+ array[people] of set of talks: fav = [{t | t in talks where pref[p, t] = 4}| p in people];
constraint forall(p in people) (
-   score[p] = sum([if exists (talk in talks) (pref[p, talk] = 4 /\ talk_time[talk] = t) 
+   score[p] = sum([if exists (talk in fav[p]) (talk_time[talk] = t) 
        then 1 else 0 endif | t in TIMES] 
     )
);

If we trace again, the new loop is only computed 1600 times. This is enough to reduce our runtime from 10 s to 20 s.

I feel vaguely betrayed.

Native functions

We’re not quite done, though! Under the new, slower version of score, we’re adding to score whenever there’s a favorite talk for a given time. We’d get the same result if we took every favorite talk, took the corresponding times, and then grabbed the cardinality of that set. Maybe that would be faster?

Unfortunately, this would mean converting an array to a set, and as mentioned before, MiniZinc really doesn’t like you doing that. Fortunately, we have an alternative, which is using the native function nvalues on the array.

constraint forall(p in people) (
-     score[p] = sum([if exists (talk in fav[p]) (talk_time[talk] = t) 
-         then 1 else 0 endif | t in TIMES] 
+     score[p] = nvalue([talk_time[talk] | talk in fav[p]]) 
);

This changes our time from 20 s to just over 2 s. Even though our fav stepping-stone was a regression, the end result is five times faster than without the regression! And we finally, finally, get rid of that if. Native beats bespoke.

To make things a little nicer I decided to replace the forall with a list comprehension.

- constraint forall(p in people) (
-     score[p] = sum([if exists (talk in fav[p]) (talk_time[talk] = t) 
- );

+ constraint score = [nvalue([talk_time[talk] | talk in fav[p]]) | p in people];

This doesn’t significantly improve the runtime but it is a little terser.

Removing `talk_room`

Switching score to use the talk_time channel was a big improvement. But we were only able to enforce the channel by adding a redundant talk_room var which isn’t used in our scoring mechanism. Could we remove the overhead?

array[talks] of var TIMES: talk_time;
- array[talks] of var ROOMS: talk_room;
- constraint forall(t in talks) (t = sched[talk_time[t], talk_room[t]]);
+ constraint forall(t in talks) (exists (r in ROOMS) (t = sched[talk_time[t], r]));

This adds a regression of 1 second. What if we simplify the check?

+ constraint forall(t in talks) (member(row(sched, talk_time[t]), t));

This is actually an error, as row only accepts slicing on pars, not vars. I think this is a good place to stop. Final model:

include "globals.mzn";

enum ROOMS;
enum TIMES;

enum talks;
constraint assert(card(ROOMS) * card(TIMES) == card(talks), "Number of slots must match number of talks");

enum people;

array[people, talks] of 0..4: pref;
array[TIMES, ROOMS] of var talks: sched;


constraint alldifferent (t in TIMES, r in ROOMS) (sched[t, r]);

array[talks] of var TIMES: talk_time;
array[talks] of var ROOMS: talk_room;

constraint forall(t in talks) (t = sched[talk_time[t], talk_room[t]]);

array[people] of set of talks: fav = [{t | t in talks where pref[p, t] = 4}| p in people];

array[people] of var int: score;

constraint score = [nvalue([talk_time[talk] | talk in fav[p]]) | p in people];

constraint forall(t in TIMES) (
  increasing (row(sched, t))
);

constraint increasing(col(sched, 1)); 

solve maximize sum(score);

output
  ["SCORE: \(sum(score)) \n\n"] ++
  [show(sched[t, r]) ++ " " ++ 
    if r == card(ROOMS) then "\n" else "" endif | t in TIMES, r in ROOMS];

Next steps

If we needed to make the model faster, there’s a few things we could try next.

The first is we could formalize the channel. MiniZinc provides native channel predicts we could use instead of doing it manually. Their format is a little funky, though, and I wasn’t able to easily get them working.

One thing I haven’t even touched is annotations. Annotations modify the solver’s strategy, for example by making sure every variable has been constrained before narrowing things down further. This seems way out of my pay grade right now.

Finally, I’m wondering if we could get a good speedup by switching from enums to ints. In theory this shouldn’t matter, but in practice optimizers are weird.

Takeaways

Optimizing a model can be pretty hard. Here’s the heuristics I got out of this exercise:

Optimizations are nonlinear.
Whether something’s an improvement or regression depends on your choice of solver.
Whether something’s an improvement or regression depends on your other optimizations.
Whether something’s an improvement or a regression depends on your data set.
Profile everything and then don’t trust it.
Native constraints beat bespoke constraints.
Global constraints beat localized constraints.
Flat arrays beat deep arrays.
Symmetry is really nice.
Casting arrays to sets is the road to madness.

All of these rules have plenty of exceptions. Do not follow them blindly.

Overall we were able to improve the runtime by three orders of magnitude, possibly even four. But getting there takes some practice. I hope this made the gotchas a little clearer.