SMT-for-IN3070/commities.org

14 KiB

Introduction

At the Department of Informatics (University of Oslo), all exams are corrected by a committee consisting of two examiners. For large courses, there are often many examiners where some wants to correct more than others. The administration is responsible for forming these committees. Sometimes there are additional constraints on what examiners can form a committee (the typical example being that two examiners are professors and two are master students).

Before digitizing exams at the department, the administration would have physical copies of the exam to distribute. This would actually make it easier to form the committees, because the constraints could be handled on the fly. When digitized, the problem would essentially turn into a math problem which is not particularly easy to solve.

This is an actual email (in Norwegian) forwarded to me from someone in the administration:

Mine mattekunnskaper er tydeligvis fraværende. Jeg klarer ikke finne en fornuftig løsning på dette:

A: 160
B: 150
C: 110
D: 60
E: 60
F: 30

Det er snakk om sensur i inf1300 med 566 besvarelser. D og E kan ikke rette mot hverandre. De bør helst rette mot B eller C.

Har du et bra forslag til meg? Jeg blir GAL. Det var bedre før, da jeg hadde besvarelsene fysisk å kunne telle ark.

Har du mulighet til å hjelpe en stakkar?

Takk

Being programmers who have recently heard of this thing called SMT-solving, we happily research the subject in trying to find a general solution to this cry for help.

Satisfiability modulo theories (SMT)

Satisfiability refers to solving satisfiability problems, i.e. given a first order logical formula $\phi$, decide whether or not there exists a model $\mathcal{M}$ such that $\mathcal{M} \models \phi$. In general, this is an undecidable problem. However, there are theories within first order logic that are decidable. SMT solvers can produce models that satisfy a set of formulas for many useful theories, some of which are satisfiable.

The solver we will be using is Z3.

Theories

Example of theories can be the theory of booleans (or propositional logic), integers or real numbers with equations and inequations, or other common programming concepts like arrays or bitvectors. Z3 supports solving constraint problems in such theories. More formally, we define theories as follows:

A theory is a set of first order logic formulas, closed under implication.

We can imagine how this might work. The natural numbers can, for instance, be expressed with the Peano axioms.

  1. $\forall x \in \mathbb{N} \ (0 \neq S ( x ))$
  2. $\forall x, y \in \mathbb{N} \ (S( x ) = S( y ) \Rightarrow x = y)$
  3. $\forall x \in \mathbb{N} \ (x + 0 = x )$
  4. $\forall x, y \in \mathbb{N} \ (x + S( y ) = S( x + y ))$
  5. $\forall x \in \mathbb{N} \ (x \cdot 0 = 0)$
  6. $\forall x, y \in \mathbb{N} \ (x \cdot S ( y ) = x \cdot y + x )$

In addition, one axiom is added to formalize induction. Because a theory is closed under implication, the theory consists of all true first-order propositions about natural numbers.

However, in Z3, we don't see such axioms; they just provide the formal justification for implementing special solvers for problem domains like natural numbers other commonly used theories. In z3, we could write something like this:

(declare-const a Int)
(declare-const b Int)
(declare-const c Int)

(assert (< 0 a b c))

(assert (= (+ (* a a) (* b b)) (* c c)))

(check-sat)
(get-model)

This encodes two constraints

  • $\phi_1 = 0 < a < b < c$
  • $\phi_2 = a^2 + b^2 = c^2$

where $a,b,c$ are whole numbers. Then we ask Z3 to produce a model $\mathcal{M}$ such that $\mathcal{M} \models \phi_1 \land \phi_2$, which outputs:

sat
(model
  (define-fun c () Int
    5)
  (define-fun b () Int
    4)
  (define-fun a () Int
    3)
)

The first line sat indicates that the formula is satisfiable, and produce a model where $a^\mathcal{M}=3$, $b^\mathcal{M}=4$ and $c^\mathcal{M}=5$.

Many-sorted first order logic

Z3 implements SMT-LIB, a standardized syntax and semantics for SMT solvers. It's underlying logic is a Many-sorted first order logic, where values must have an associated sort (which is a basic form of type). Think of it as partitioning the domain, where each sort corresponds to a part. A signature in a many-sorted first logic is defined as follows.

A signature $\Sigma = (S, F, P)$ consists of a countable set of

  • Sorts $S$.
  • Function symbols $F$, where each member is a function symbol $f$ with an associated type $s_1 \times \dots \times s_n \to s$, where $s \in S$ and $s_1, \dots, s_n \in S$. Constants are simply zero-arity function symbols.
  • Predicate symbols $P$, where each predicate has an associated type $s_1 \times \dots \times s_n$. We assume an equality $=_s$ predicate with type $s \times s$ for all sorts in $S$.

The equality relation will be denoted $=$, letting the sort remain implicit.

For example, the signature for the integers can be formalized as $\Sigma_{int} = (S_{Int}, F_{Int}, P_{Int})$ where

  • $S_{Int} = \{Int\}$
  • $F_{Int} = \{0, 1, +, -, *\}$ where the constant symbols $0, 1$ has a type signature $\to Int$ and the function symbols $+,-,*$ has a type signature $Int \times Int \to Int$.
  • $P_{Int} = \{<, =\}$ where the predicate symbols $<, =$ has type signature $Int \times Int$.

Back to the problem

We have 283 exams. Every exam must be corrected by a committee consisting of two examiners. Each examiner has an associated capacity of exams they want to correct. Examiners D and E can't be in the same committee, and should rather be in committee with B or C. We prefer a smaller number of committees.

We use the Python API for Z3. Create a Python file and populate it with:

from z3 import *

This allows us to generate instances for Z3 to solve with Python.

Instances

Let's formulate an instance as a four-tuple $(N, C, S, A)$ where

  • $N$ is the number of exams to correct
  • $C$ is a list of capacities, where each examiner is identified by her position of the list
  • $S$ is a relation, relating an examiner to one they should form a committee with
  • $A$ is a relation, relating examiners that we should avoid placing in the same committee

The instance, described in the introduction, can be represented with the following Python code:

def example_instance():
    N = 283
    #    A    B    C    D   E   F
    C = [160, 150, 110, 60, 60, 30]
    S = {3 : {1, 2}, 4 : {1, 2}}
    A = {frozenset([3, 4])}
    return (N, C, S, A)

Constraint modeling

We need to capture our intention with first-order logic formulas, and preferably quantifier-free. In SMT-solving, quantifier-free means that we only try to solve a set of constraints where no variable is bound by a quantifier; these are usually much easier to solve. Rather, we use a finite set of constant symbols, with some associated sort, and try to find an interpretation for them.

The end result needs to be a set of committees, where each committee consists of two examiners with a number of exams to correct. An important part of finding a reasonable encoding is to balance what part of the problem should be solved in pure Python and what should be solved by the SMT-solver.

Modeling committees

A natural encoding could be modeling all possible committees as an integer constant, where the value assigned to a committee corresponds to the number of exams they correct.

We let a committee be a set of two exactly two examiners, identified by their index in the capacity list. Let's write a function that takes a list of capacities, and return a dictionary, associating (Python) committees to their corresponding integer constant.

def committees(C):
    cs = {frozenset([i,j])
          for i in range(len(C))
          for j in range(i+1, len(C))}
    return {c : Int(str(c)) for c in cs}

Now we must ensure that no examiner receives more exams than their capacity. Given an examiner $i$, where $0 <= i < N$, we let $c_i$ denote the set of all committees $i$ participates in. Then $\sum{c_i} <= C[i]$, i.e. the sum of the committees $c_i$ does not exceed the capacity of the examiner $i$. We write a function that encodes these constraints.

def capacity_constraint(comms, C):
    return [sum(comms[c] for c in comms if i in c) <= C[i]
            for i in range(len(C))]

Because we are modeling committees as integers, we have to be careful not to allow committees correcting a negative number of exams.

def non_negative_constraint(comms):
    return [0 <= comms[c] for c in comms]

The $S$ relation is sort of odd. That one examiner should form a committee with someone they relate to by $S$. Let's interpret it as $e_1$ should not form a committee to someone they are not related to by $S$.

def should_correct_with_constraint(comms, S, C):
    examiners = set(range(len(C)))
    return [comms[frozenset([i, j])] == 0
            for i in S
            for j in examiners - S[i]
            if j != i]

The $A$ relation is similar, and easier.

def avoid_correct_with_constraint(comms, A):
    return [comms[frozenset([i, j])] == 0 for i, j in A]

Each committee will correct their exams to times, so if the sum of all the committees is $N$, then all exams have been corrected twice. Let's encode that as a constraint.

def all_corrected_constraint(comms, N):
    return [sum(comms.values()) == N]

Let's collect all the constraints in a single list.

def constraints(instance):
    N, C, S, A = instance
    comms = committees(C)
    return (capacity_constraint(comms, C) +
            non_negative_constraint(comms) +
            all_corrected_constraint(comms, N) +
            should_correct_with_constraint(comms, S, C) +
            avoid_correct_with_constraint(comms, A))

Invoking Z3

Now that we have functions that model our problem, we can invoke Z3.

def check_instance(instance):
    s = Solver()

    s.add(constraints(instance))

    s.check()
    return s.model()

Calling check_instance(example_instance()) returns a model:

[frozenset({0, 1}) = 83,
 frozenset({0, 2}) = 47,
 frozenset({0, 5}) = 30,
 frozenset({2, 3}) = 0,
 frozenset({1, 2}) = 3,
 frozenset({1, 3}) = 60,
 frozenset({2, 5}) = 0,
 frozenset({1, 5}) = 0,
 frozenset({2, 4}) = 60,
 frozenset({4, 5}) = 0,
 frozenset({0, 4}) = 0,
 frozenset({3, 5}) = 0,
 frozenset({3, 4}) = 0,
 frozenset({0, 3}) = 0,
 frozenset({1, 4}) = 0]

This is not especially readable, so let's write a quick prettyprinter.

def prettyprint(m):
    for k in m:
        cname = k.name()[10:-1]
        cval = m[k].as_long()
        if cval > 0:
            print(cname + ':', cval)

This outputs the something like:

{0, 1}: 23
{0, 2}: 110
{0, 5}: 23
{1, 3}: 60
{1, 5}: 7
{1, 4}: 60

Note the something like. There are multiple ways to satisfy this set of constraints, and Z3 only guarantees to provide some solution (if one exists).

Optimization

So far, we have found a way to model the problem and satisfy all the constraint. However, it is preferable to have fewer committees, because all committees have to discuss the exams, causing administrative overhead. Z3 also provides optimization, meaning that we can find a smallest or largest solution for numeric theories.

Minimize committees

In our case, we want to minimize the number of committees.

def minimize_committees(comms):
    return sum(If(0 < comms[c], 1, 0) for c in comms)

Now we can invoke Z3, using an Optimize instance and adding our minimization constraint.

def optimize_instance(instance):
    o = Optimize()

    o.add(constraints(instance))
    o.minimize(minimize_committees(committees(instance)))

    o.check()
    return o.model()

There is still more than one way to satisfy this model, but we are guaranteed to get a minimal number of committees (which is 6 in our example).

{2, 3}: 56
{0, 1}: 137
{2, 5}: 7
{0, 5}: 23
{2, 4}: 47
{1, 4}: 13

COMMENT Implementation draft

N, C, S, A = instance = example_instance()

comms = committees(C)

s = Solver()

s.add(capacity_constraint(comms, C))
s.add(non_negative_constraint(comms))
s.add(all_corrected_constraint(comms, N))
s.add(should_correct_with_constraint(comms, S, C))
s.add(avoid_correct_with_constraint(comms, A))

s.check()
print(s.model())

COMMENT Local variables