Searching for Planet X with the Z3 solver

2023-02-26

A few weeks ago, I played the board game "The Search for Planet X". The premise is that you have a circular board divided into 12 sectors, each containing one object. That object could be an asteroid, a gas cloud, and so on, but most importantly, it could be Planet X. The object in each sector is hidden at the start of the game, and you're racing your opponents to discover Planet X by scanning sectors and deducing information using rules like "an asteroid is always adjacent to another asteroid". The winner is the first player to correctly guess the location of Planet X and the two adjacent objects.

The full rules can be found here: https://foxtrotgames.com/planetx/.

I don't find these kinds of games very fun, but it did get me thinking: what's the best strategy? How many possible boards are there, and how hard is this game?

This meant I had to write a program to:

  1. Generate all possible boards
  2. Come up with various strategies to pick the best action
  3. See how good those strategies are

The source code is here: https://github.com/kaashif/search-for-planet-x-solver.

Step 1: Generating all possible boards

My first attempt at doing this was to randomly assign locations for comets, then the asteroids, then the gas clouds, and so on, and just run that process for a long while. Not very intelligent.

A better way is to use the Z3 theorem prover, encode the logic rules describing the possible boards, and generate all models satisfying those rules.

This is actually pretty easy: Z3 has very ergonomic Python bindings.

To encode the board state for the standard board with 12 sectors, we can create 12 variables $X_i$, all integers ranging from 1 to 6 (representing the 6 types of object), and apply the constraints from the rulebook. For example, the rule saying that an asteroid must be adjacent to another asteroid can be encoded as:

z3.Implies(
    X[i] == ObjectType.ASTEROID.value,
    z3.Or(
        X[prev_sector(i)] == ObjectType.ASTEROID.value,
        X[next_sector(i)] == ObjectType.ASTEROID.value
    )
)
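
The snippet above relies on some setup that isn't shown: the X variables, an ObjectType enum, and the prev_sector/next_sector helpers. A minimal sketch of what that setup might look like (the names and enum numbering here are my guesses, not necessarily what the repo uses):

import enum
import z3

NUM_SECTORS = 12

class ObjectType(enum.Enum):
    PLANET_X = 1
    TRULY_EMPTY = 2
    GAS_CLOUD = 3
    DWARF_PLANET = 4
    ASTEROID = 5
    COMET = 6

# One integer variable per sector, each constrained to one of the 6 object types.
X = [z3.Int(f"sector_{i}") for i in range(NUM_SECTORS)]
constraints = [z3.And(x >= 1, x <= 6) for x in X]

# The board is circular, so adjacency wraps around.
def prev_sector(i):
    return (i - 1) % NUM_SECTORS

def next_sector(i):
    return (i + 1) % NUM_SECTORS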

Once we have all of the constraints, we can generate a model, i.e. a set of values for the variables that satisfies the constraints.

solver = z3.Solver()
solver.add(*constraints)

# model() is only available after a successful check().
if solver.check() == z3.sat:
    model = solver.model()

But how do we get the other models? The trick is to save that model and add a new constraint saying that we only want new models, i.e. at least one variable is different from the model we've just seen. Then we can just generate models until the constraint set is unsatisfiable.

models = []

while solver.check() == z3.sat:
    model = solver.model()
    models.append(model)

    # At least one of the variables must differ from the model we just saw.
    solver.add(
        z3.Or([
            x != model[x]
            for x in X
        ])
    )

Using this method, I get 4446 possible boards. There may also be an extra constraint, present in any real game, that it's always possible to deduce the location of Planet X. The reason that's sometimes not possible is that Planet X shows up in scans as "empty", and there are two "truly empty" sectors on top of that. Most of the time it's possible to work out whether a sector is truly empty, since gas clouds must be adjacent to a truly empty sector - if there are three empty-looking sectors and only one of them is away from a gas cloud, that one is Planet X.

It's unclear to me whether that constraint really exists, so I didn't use it.
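
Incidentally, the gas cloud rule mentioned above is an ordinary board constraint, and it can be encoded in the same style as the asteroid rule (reusing the names from the sketch above):

# Each gas cloud must be adjacent to at least one truly empty sector.
constraints.extend(
    z3.Implies(
        X[i] == ObjectType.GAS_CLOUD.value,
        z3.Or(
            X[prev_sector(i)] == ObjectType.TRULY_EMPTY.value,
            X[next_sector(i)] == ObjectType.TRULY_EMPTY.value,
        ),
    )
    for i in range(NUM_SECTORS)
)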

Step 2: Picking the best actions

There are a few types of action possible:

  • You can survey a range of sectors for a particular kind of object and get back the number of objects of that type in the range. The wider the survey, the less time it takes - a survey of 4, 5, or 6 sectors costs 3 days.

  • You can target a sector and know what's there - that takes 4 days.

Those two are easy to implement, since the set of possibilities is limited and well-defined. Targeting is never worth its time cost, as we'll see later, so I didn't implement it.

  • You can research a topic, getting a clue in the form of a new logic rule, e.g. "Planet X is next to a comet". I didn't implement this because I don't have the full set of research clues. The best I have is some clues I extracted from a set of PDFs I found on BoardGameGeek, but clues are very particular to the board - the creators don't give clues that would narrow things down too much on that board. For example, you won't get "Planet X is next to a comet" when the comets are next to each other.

    I did have a fun time writing a script that uses pdfminer and some regexes to extract the clues from those PDFs, but there were only 50 boards in that set - not a big enough sample to be able to derive how research clues depend on the board.
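
The extraction was along these lines - a rough sketch, with a made-up regex and clue phrasing rather than the script's actual patterns:

import re
from pdfminer.high_level import extract_text

def extract_clues(pdf_path):
    # Pull the raw text out of one of the BoardGameGeek PDFs...
    text = extract_text(pdf_path)
    # ...and grab sentences that look like clues. This pattern is illustrative,
    # not the real one.
    return re.findall(r"Planet X is (?:not )?adjacent to [^.]+\.", text)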

What's the best action?

Intuitively, you might say you want the most information for the time taken. "Information" turns out to have a rigorous definition: the self-information of an event $x$ with probability $p(x)$ is $-\log_b(p(x))$, where $b$ is any base we choose. If we choose $b=2$, we measure information in shannons, where 1 shannon is the information content of 1 bit. I'll just say "bit" from now on.

We're trying to guess the board, and we can reasonably say that each possibility for a board has equal probability.

I claim (without proof, and I'm not going to prove it because I'm not completely sure it's true) that the best action to take is the one with the largest expected self-information of its outcome. The expected self-information of a random variable $X$ is called its entropy, denoted $\mathcal{H}(X) = -\sum_x p(x) \log_2(p(x))$.

This makes some sense because you'd obviously never take an action with a guaranteed outcome, since that tells you nothing - the entropy in that case is zero. On the other end of the spectrum, knowing which way a 50/50 choice went tells you much less than knowing which way a uniform choice among a billion options went (1 bit versus about 30 bits). Intuitively, playing the lottery isn't expected to have a surprising outcome: if there's one universe where you win and a million where you lose, playing isn't a great way to narrow down which universe you're in.

Now we're in good shape; our algorithm is:

  1. Generate all boards.
  2. Pick the action $A$ which maximises $\mathcal{H}(A)$ and execute it.
  3. Filter down the set of possible boards given the result.
  4. Continue until we know where Planet X is and what its two adjacent objects are.
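
To make step 2 concrete for surveys: group the possible boards by the answer the survey would give, and compute the entropy of that distribution; step 3 then keeps only the boards matching the answer we actually got. A minimal sketch, with my own board representation (a tuple of 12 object values, as in the Z3 encoding) rather than whatever the repo uses:

import math
from collections import Counter

def survey_result(board, object_type, start, width):
    # Number of objects of the given type in the `width` sectors starting at `start`.
    n = len(board)
    return sum(1 for i in range(start, start + width) if board[i % n] == object_type)

def survey_entropy(boards, object_type, start, width):
    # Entropy (in bits) of the survey's outcome, with all boards equally likely.
    counts = Counter(survey_result(b, object_type, start, width) for b in boards)
    total = len(boards)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def filter_boards(boards, object_type, start, width, observed):
    # Keep only the boards consistent with the observed survey result.
    return [b for b in boards if survey_result(b, object_type, start, width) == observed]

To compare different actions fairly, you'd divide the entropy by the action's cost in days - that's the "entropy per day" figure used later in the post.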

Step 3: How does it perform?

Without doing any research and just doing surveys, we find Planet X in ~30 days on average. That's after running a few thousand simulations.

It's nice to get some confirmation that some players' intuitions are good: the best action in terms of entropy per day is to make the widest possible survey for asteroids. This makes sense: since there are four asteroids, there's a wide range of possible answers, so you can learn a lot.

Another confirmed intuition is that research is really good. Although I didn't implement research actions in the algorithm, we can calculate the expected information gain from some research clues by hand. Some clues give on the order of 2 or 3 bits of information per day, compared to the best survey, which comes in at less than 1 bit per day.
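
As a rough illustration with made-up numbers: a clue that's consistent with only an eighth of the remaining boards is worth $-\log_2(1/8) = 3$ bits, so if researching it costs a single day, that's 3 bits per day.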

It may be possible to roughly guess which research topics are usually better than others using our small sample of 50 sets of rules. If we knew the full set of rules for all boards, we could:

  1. Know when to pick certain research projects given our current information

  2. Deduce which board we're on from the clues - the clues are part of the board.

(1) is in keeping with the spirit of the game. (2) seems like cheating. In any case, opening with a research, then a survey, then another research is sure to be a good line of play. Intuitively, it seems that research projects for more numerous objects are better - the asteroid research projects in particular may be a strong starter.

Conclusions

We don't really have enough information (haha) to reliably test the performance of our algorithm against the Planet X web app "AI". That's because we don't have enough information on the strongest move: research.

Given that, there's only one possible next step: write a program to scrape the web app for all research projects, boards, and bot moves.

The bot moves are pre-determined, so given those we know whether we can beat the bot or not on that board. I think it's probably not fair to use advance knowledge of the bot's moves to deduce information about the board - the bot's moves take no time for us and would tell us something, so those moves would be infinitely good. That also wouldn't work against a real player, unless we knew exactly what their algorithm is (an easy countermeasure: randomly pick a slightly suboptimal move).

I'm thinking of using something like https://playwright.dev/, which I've already successfully used to auto-play some online puzzle games.