AI: A Modern Approach

This semester, I’ve had the pleasure of taking “Intro to Artificial Intelligence” with Stuart Russell and Pat Virtue. Russell co-authored (with Peter Norvig) the definitive Artificial Intelligence: A Modern Approach, used as a foundation for AI curricula across the world.

In the first section of the class we covered “search”, including “constraint satisfaction problems” (CSPs). Constraint satisfaction is an efficient way to represent a problem as a set of variables and constraints on those variables. For example, you could have two variables whose domains are the integers in [0, 10] that share a “diff” constraint: in other words, they may not be assigned the same number. A CSP is solved when every variable has been assigned a value and no constraints are violated.
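
As a toy illustration (my own sketch, not from the book or the class), that two-variable problem can be written down and checked by brute force in a few lines:

python3
from itertools import product

# Toy CSP: two variables with domains 0..10 that must differ.
domains = {"X": range(11), "Y": range(11)}

def satisfies(assignment):
    """ The single "diff" constraint: X and Y may not share a value. """
    return assignment["X"] != assignment["Y"]

# Brute force is fine at this size; backtracking search scales far better.
solutions = [dict(zip(domains, values)) for values in product(*domains.values())
             if satisfies(dict(zip(domains, values)))]
print(len(solutions))  # 110 of the 121 complete assignments are solutions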

The textbook AIAMA defines a search algorithm for solving CSPs that uses “backtracking” (really just a fancy name for what common search strategies like DFS do inherently). Essentially, if the search finds itself at a dead end, it backs up and traverses a different branch. In the case of CSPs, a dead end is an assignment from which no more values can be given to variables without violating a constraint. If the backtracking algorithm finds itself there, it un-assigns some number of variables and continues the search. The algorithm is laid out below:

pseudocode
def backtrack(assignment, csp):
  if assignment is complete: return assignment
  var = select-unassigned-variable(csp)
  for each value in order-domain-values(var, assignment, csp):
    if value is consistent with assignment:
      set var = value in assignment
      inferences = inference(csp, var, value)
      if inferences do not fail:
        add inferences to assignment
        result = backtrack(assignment, csp)
        if result is not failure:
          return result
    remove var = value and inferences from assignment
  return failure

Some signatures to talk about:

select-unassigned-variable(csp)

At this point, we choose a variable to assign next. We’re not actually assigning a value to anything yet. You may wonder: does the order in which we choose variables for assignment matter? The answer is yes. For many problems it is useful to always pick the variable that has the smallest remaining domain. This technique is called Minimum Remaining Values (MRV).

order-domain-values(var, assignment, csp)

The order in which we check the values in our chosen variable’s domain can make a difference as well. A common approach is to order values by how much they would constrict neighboring domains: we’d like to try the values that cause the LEAST amount of domain-constriction first. This is called Least Constraining Value (LCV).

“is consistent with assignment”

This checks whether assigning the given value to the given variable would violate any constraints. It only involves checking the domains of the variables constrained with the given one.

inference(csp, var, value)

When we give a variable a value, we have to prune the domains of all variables constrained with it. Sometimes pruning leaves a variable with only one value left in its domain. We can then treat that as an assignment too, after which we recursively begin another inference.

Sudoku as a CSP

I represent the various tiles of the sudoku board as A1 … I9, similarly to chess. The only constraints we need in sudoku are binary differences, e.g. “G4” != “G5”. Which differences do we need to enforce? Each variable is constrained with 20 others: 8 in its row, 8 in its column, and 8 in its 3x3 region, where 4 of those 24 are counted twice (the region peers that also share the variable’s row or column). I chose to store each variable’s constraints in a dictionary called csp, mapping {variable => set(variable)}. Lookup is easy: csp["G4"] returns the set of variables constrained with “G4”.
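
The construction of this dictionary isn’t shown above, but under the A1 … I9 naming it could be built something like this (a sketch, with helper names of my own choosing):

python3
from itertools import product

ROWS = "ABCDEFGHI"
COLS = "123456789"

def build_csp():
    """ Map every tile to the set of 20 tiles it must differ from:
        the rest of its row, its column, and its 3x3 region. """
    csp = {}
    for row, col in product(ROWS, COLS):
        var = row + col
        row_peers = {row + c for c in COLS}
        col_peers = {r + col for r in ROWS}
        region_rows = ROWS[(ROWS.index(row) // 3) * 3:][:3]
        region_cols = COLS[(COLS.index(col) // 3) * 3:][:3]
        region_peers = {r + c for r, c in product(region_rows, region_cols)}
        csp[var] = (row_peers | col_peers | region_peers) - {var}
    return csp

csp = build_csp()
assert all(len(peers) == 20 for peers in csp.values())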

How should we represent the variables and assignments? I tried two methods. In the first, I kept the assignments of variables to values in a dictionary mapping VARIABLE -> VALUE. In the second, I represented the assignments as a dictionary mapping VARIABLE -> DOMAIN. I tried this because I realized that with the first representation, the domain of each variable was being recalculated many times. The switch caused a huge decrease in the time taken to solve: from ~850 seconds for ~2,000 puzzles down to ~35 seconds.

To represent the domains of the variables in the second approach, I decided to use strings of values (because the values are strictly 1-9). As we do inference after making an assignment, some domains have numbers removed from them, e.g. "124589".replace("4", ""), which means that "4" is no longer part of that domain.
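
Loading a puzzle into this representation is then a one-liner per tile. Assuming the input arrives as an 81-character string with 0 (or .) marking blank tiles, which is an assumption about the file format rather than anything stated above, it might look like:

python3
def load(puzzle):
    """ Build the VARIABLE -> DOMAIN dictionary from an 81-character puzzle
        string read in row-major order (A1..A9, B1..B9, ...). Blank tiles
        start with the full domain "123456789". """
    variables = [r + c for r in "ABCDEFGHI" for c in "123456789"]
    return {var: char if char in "123456789" else "123456789"
            for var, char in zip(variables, puzzle)}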

Implementation

Right away, we can write two methods: solved and consistent. We can say that an assignment is solved if the length of every variable’s domain is 1. A value for a variable is consistent with an assignment if no other variable is already assigned that value, i.e. if none of the domains of the constrained variables are equal to the new value. Here are the implementations:

python3
def solved(assignment):
    """ Return TRUE if 1 assignment per tile. """
    return all(len(item) == 1 for item in assignment.values())

python3
def consistent(variable, value, assignment, csp):
    """ Return TRUE if no domain of variables constrained to VARIABLE equals VALUE. """
    return all(assignment[constraint] != value for constraint in csp[variable])

Moving on to select-unassigned-variable(csp): we simply find a variable with the smallest domain length, since each domain is just a string of values. We have to be careful not to select variables whose domain length is 1, because those variables are already assigned. Here are two implementations:

python3
def unassigned(assignment, csp):
    """ Return an unassigned variable from CSP. Minimum Remaining Values. """
    # Compare by domain length (assignment.get would compare the domain strings themselves).
    return min((var for var in assignment if len(assignment[var]) > 1),
               key=lambda var: len(assignment[var]))

Interestingly, the above solution is much slower than the following one:

python3
def unassigned(assignment, csp):
    """ Return an unassigned variable from ASSIGNMENT. Minimum Remaining
        Values, written as an explicit loop. """
    bestValue = 10          # domains never hold more than 9 values
    bestVar = None
    for var, value in assignment.items():
        if len(value) > 1 and len(value) < bestValue:
            bestValue = len(value)
            bestVar = var
    return bestVar

That’s okay with me; the second version is more readable anyway, I suppose.

For order-domain-values(var, assignment, csp), I did some experimentation. I found that LCV actually made the search slower than using no ordering at all, at least for my implementation. And that’s okay: in AIAMA, it is mentioned that the “least-constraining-value heuristic can be effective in some cases” (p. 217). Apparently not this case, with this implementation. Therefore, I simply returned the domain of the input variable with no particular ordering.
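
For reference, an LCV ordering under this representation could look something like the following sketch (not necessarily the exact version I tested): for each candidate value, count how many constrained, still-unassigned tiles would lose that value from their domains, and try the least disruptive values first.

python3
def order_domain_values(variable, assignment, csp):
    """ Least Constraining Value: prefer values that would prune the fewest
        options from the domains of constrained, unassigned variables. """
    def eliminations(value):
        return sum(1 for peer in csp[variable]
                   if len(assignment[peer]) > 1 and value in assignment[peer])
    return sorted(assignment[variable], key=eliminations)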

This leaves inference(csp, var, value), where the fun happens. If a variable/value pair makes it to this step, inference’s job is to make the assignment and then prune the domains of all constrained variables. If any variable’s domain is pruned to length 0, the assignment has failed. If any variable’s domain is pruned to length 1, we treat that as an assignment and call inference for that variable and value as well. Here’s my implementation:

python3
def inference(assignment, var_to_update, value, csp):
    """ Assign VALUE to VAR_TO_UPDATE in ASSIGNMENT. Update domains of
       constrained variables from CSP. If any domains are reduced to 1, also
       inference from them. If any domains are reduced to 0, return False.
       Recursive forward checking.
    """
    assignment[var_to_update] = value
    for constraint in csp[var_to_update]:
        if value not in assignment[constraint]:
            continue
        assignment[constraint] = assignment[constraint].replace(value, "")
        remaining = assignment[constraint]
        if len(remaining) == 1:
            if not inference(assignment, constraint, remaining, csp):
                return False
        elif len(remaining) == 0:
            return False
    return True

The only thing left to do when setting up your main() is to run an initial inference pass over the pre-filled tiles when the puzzle is first loaded.
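
I don’t show the full driver above, but with the helpers in place it wires together roughly like this (a sketch: branching on a copy of the assignment is one way to undo a failed inference, since inference mutates the dictionary in place, and the solve wrapper is just for illustration):

python3
def backtrack(assignment, csp):
    """ Depth-first search with forward checking over the VARIABLE -> DOMAIN
        representation. Returns a solved assignment, or None on failure. """
    if solved(assignment):
        return assignment
    var = unassigned(assignment, csp)
    for value in assignment[var]:
        if consistent(var, value, assignment, csp):
            # Branch on a copy: domains are immutable strings, so a shallow
            # copy is enough to throw away a failed branch cleanly.
            attempt = dict(assignment)
            if inference(attempt, var, value, csp):
                result = backtrack(attempt, csp)
                if result is not None:
                    return result
    return None

def solve(assignment, csp):
    """ Run the initial inference pass over the pre-filled tiles, then search. """
    for var, domain in list(assignment.items()):
        if len(domain) == 1 and not inference(assignment, var, domain, csp):
            return None  # the givens themselves are inconsistent
    return backtrack(assignment, csp)

Combined with the earlier sketches, solving one puzzle string then reads as solve(load(puzzle), build_csp()).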

Results

I wrote my version of the solver so that it could take puzzles from this large set I found here: 2365 hard sudoku puzzles. On average, my solver takes ~35 seconds to work through the whole set, with a median time per puzzle of 0.005 seconds. The total is dominated by a select few puzzles that each take several seconds to solve. ◼