Solution methods in Sudoku

The strategy for solving a puzzle may be regarded as comprising a combination of three processes: scanning, marking up, and analyzing.
[Figure: cross-hatching example. The 3×3 region in the top-right corner must contain a 5. By hatching across and up from 5s located elsewhere in the grid, the solver can eliminate all the empty cells in the top-right corner which cannot contain a 5. This leaves only one possible cell for a 5 (highlighted in green).]
Scanning
Scanning is performed at the outset and throughout the solution. Scans need be performed only once in between analyses. Scanning consists of two techniques:
- Cross-hatching: the scanning of rows to identify which line in a region may contain a certain numeral by a process of elimination. The process is repeated with the columns. For fastest results, the numerals are scanned in order of their frequency. It is important to perform this process systematically, checking all of the digits 1–9.
- Counting 1–9 in regions, rows, and columns to identify missing numerals. Counting based upon the last numeral discovered may speed up the search. It also can be the case, particularly in tougher puzzles, that the best way to ascertain the value of a cell is to count in reverse—that is, by scanning the cell's region, row, and column for values it cannot be, in order to see what remains.
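The two scanning techniques can be sketched in Python. This is only an illustration under stated assumptions: the grid is taken to be a 9×9 list of lists with 0 for an empty cell, and the names `candidates` and `cross_hatch` are hypothetical helpers, not part of any established library. The example grid mirrors the kind of situation described for the top-right region, with 5s placed so that only one cell remains.

```python
def candidates(grid, row, col):
    """Count in reverse: numerals not yet used in the cell's row, column, or region."""
    used = set(grid[row]) | {grid[r][col] for r in range(9)}
    br, bc = 3 * (row // 3), 3 * (col // 3)
    used |= {grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)}
    return set(range(1, 10)) - used


def cross_hatch(grid, digit, box_row, box_col):
    """Return the empty cells of one 3x3 region that can still hold `digit`."""
    cells = []
    for r in range(3 * box_row, 3 * box_row + 3):
        for c in range(3 * box_col, 3 * box_col + 3):
            if grid[r][c] == 0 and digit in candidates(grid, r, c):
                cells.append((r, c))
    return cells


# Hypothetical example: four 5s elsewhere in the grid rule out every cell
# of the top-right region except one.
example = [[0] * 9 for _ in range(9)]
for r, c in [(0, 0), (1, 4), (5, 6), (7, 8)]:
    example[r][c] = 5

print(cross_hatch(example, 5, 0, 2))  # [(2, 7)]
```

When `cross_hatch` returns exactly one cell, the numeral can be written in; when it returns two or three cells in a single line, the result is the kind of "contingency" an advanced solver would note.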
Advanced solvers look for "contingencies" while scanning, narrowing a numeral's location within a row, column, or region to two or three cells. When those cells lie within the same row and region, they can be used for elimination during cross-hatching and counting. Puzzles solved by scanning alone without requiring the detection of contingencies are classified as "easy"; more difficult puzzles cannot be solved by basic scanning alone.
[Figure: dot notation. Likely numerals in a single cell are marked by placing pencil dots. To reduce the number of dots used in each cell, the marking is done only after as many numbers as possible have been added to the puzzle by scanning. Dots are erased as their corresponding numerals are eliminated as candidates.]
Marking up
Scanning stops when no further numerals can be discovered, making it necessary to engage in logical analysis. One method to guide the analysis is to mark candidate numerals in the blank cells. There are two popular notations: subscripts and dots.
- In the subscript notation the candidate numerals are written in subscript in the cells. However, original puzzles printed in a newspaper usually are too small to accommodate more than a few digits of normal handwriting. Thus, solvers often create a larger copy of the puzzle.
- The second notation uses a pattern of dots in each square, where the dot position indicates a number from 1 to 9. The dot notation can be used on the original puzzle. Dexterity is required in placing the dots, since misplaced dots or inadvertent marks inevitably lead to confusion and may not be easily erased.
An alternative technique is to "mark up" the numerals that a cell cannot be. A cell will start empty and as more constraints become known, it will slowly fill until only one mark is missing. Assuming no mistakes are made and the marks can be overwritten with the value of a cell, there is no longer a need for any erasures.
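In a program, the marked-up grid is often held as a mapping from each blank cell to its set of candidate numerals. The sketch below assumes a 9×9 list of lists with 0 for blanks; `mark_up` and `place` are hypothetical names chosen for this illustration. Discarding a numeral from a set plays the role of erasing a dot or a subscript.

```python
def mark_up(grid):
    """Map each blank cell (row, col) to the set of numerals it could still hold."""
    marks = {}
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                seen = set(grid[r]) | {grid[i][c] for i in range(9)}
                br, bc = 3 * (r // 3), 3 * (c // 3)
                seen |= {grid[i][j]
                         for i in range(br, br + 3)
                         for j in range(bc, bc + 3)}
                marks[(r, c)] = set(range(1, 10)) - seen
    return marks


def place(grid, marks, cell, digit):
    """Write a numeral into a cell and erase its mark from every related cell."""
    r, c = cell
    grid[r][c] = digit
    marks.pop(cell, None)  # a filled cell no longer carries marks
    for (pr, pc), cands in marks.items():
        same_region = (pr // 3, pc // 3) == (r // 3, c // 3)
        if pr == r or pc == c or same_region:
            cands.discard(digit)
```

The alternative notation, marking what a cell cannot be, corresponds to storing the complement `set(range(1, 10)) - marks[cell]`; that set only ever grows, which is why no erasures are needed.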
Analysis
The two main approaches to analysis are "candidate elimination" and "what-if".
Candidate elimination
In "candidate elimination", progress is made by successively eliminating candidate numerals from cells until a single choice remains. After each answer has been found, another scan may be performed, usually to check the effect of the contingencies. One method works by identifying "matched cells". If precisely two cells within a scope (a particular row, column, or region) contain the same two candidate numerals (p,q), or if precisely three cells within a scope contain the same three candidate numerals (p,q,r), these cells are said to be matched. The placement of these numerals anywhere else within that same scope would make a solution impossible; thus, the candidate numerals (p,q,r) appearing in unmatched cells of that scope can be deleted. When all else fails, ask the question, 'Would entering the eliminated numeral prevent completion of the other necessary placements?' If the answer to the question is 'Yes,' then the candidate numeral in question can be eliminated.
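The matched-cells rule can be sketched as follows. The function name `eliminate_matched` and the data layout (a dictionary from cell to candidate set, covering the blank cells of one row, column, or region) are assumptions made for this example. It follows the definition given above, in which matched cells carry identical candidate sets.

```python
from itertools import combinations


def eliminate_matched(scope):
    """Delete matched numerals from the unmatched cells of one scope.

    `scope` maps each blank cell of a row, column, or region to its set of
    candidate numerals. Returns True if any candidate was removed.
    """
    changed = False
    for size in (2, 3):  # matched pairs (p,q), then matched triples (p,q,r)
        for group in combinations(scope, size):
            matched = scope[group[0]]
            if len(matched) != size:
                continue
            if any(scope[cell] != matched for cell in group):
                continue
            for cell in scope:
                if cell not in group and scope[cell] & matched:
                    scope[cell] = scope[cell] - matched
                    changed = True
    return changed


# Hypothetical row: two cells share the candidates {1, 2}.
row = {(0, 0): {1, 2}, (0, 1): {1, 2}, (0, 2): {1, 2, 3}, (0, 3): {4, 5}}
eliminate_matched(row)
print(row[(0, 2)])  # {3}
```

After the call, the third cell is reduced to a single candidate and can be filled in, which is the point at which another scan becomes worthwhile.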
What-if approach
In the "what-if" approach (also called "guess-and-check", "bifurcation", "backtracking" and "Ariadne's thread"), a cell with two candidate numerals is selected, and a guess is made. The steps are repeated unless a duplication is found or a cell is left without a possible candidate, in which case the alternative candidate must be the solution. For each cell's candidate, the question is posed: 'will entering a particular numeral prevent completion of the other placements of that numeral?' If the answer is 'yes', then that candidate can be eliminated. The what-if approach requires a pencil and eraser or a good layout memory.
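A recursive backtracking search is one common way to mechanise the what-if approach. The sketch below is a generalisation rather than the exact procedure described above: it guesses at the blank cell with the fewest candidates (not necessarily two), works on a copy of the grid so that no erasing is needed, and abandons a guess as soon as some cell is left without a candidate. The grid layout (0 for blanks) and the function names are assumptions, and the givens are assumed to be free of duplicates.

```python
def candidates(grid, row, col):
    """Numerals not yet used in the cell's row, column, or region."""
    used = set(grid[row]) | {grid[r][col] for r in range(9)}
    br, bc = 3 * (row // 3), 3 * (col // 3)
    used |= {grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)}
    return set(range(1, 10)) - used


def solve_what_if(grid):
    """Return a completed copy of `grid`, or None if the guesses so far fail."""
    best = None
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                cands = candidates(grid, r, c)
                if not cands:
                    return None  # a cell without candidates: the guess was wrong
                if best is None or len(cands) < len(best[2]):
                    best = (r, c, cands)
    if best is None:
        return [row[:] for row in grid]  # no blanks left: solved
    r, c, cands = best
    for digit in sorted(cands):
        trial = [row[:] for row in grid]  # the copy replaces pencil and eraser
        trial[r][c] = digit
        result = solve_what_if(trial)
        if result is not None:
            return result
    return None  # every candidate led to a contradiction
```

Choosing the most constrained cell first keeps the number of guesses small, which loosely mirrors the advice to start from a cell with only two candidates.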
Computer solutions
A computer program is capable of exhaustively searching a Sudoku puzzle for solutions, thereby determining whether it is valid or not, with great ease relative to a human attempting the same. There are two general approaches taken in the creation of serious Sudoku-solving programs: human-style methods and rapid-style methods.
Human-style solvers will typically operate by maintaining a mark-up matrix, and search for contingencies, matched cells, and other elements that a human solver can utilize in order to determine and exclude cell values.
Many rapid-style solvers still employ backtracking searches, but with various shortcuts and optimizations to reduce the width of the search tree. Another alternative uses finite domain constraint programming. A constraint program specifies the constraints of the puzzle (the fact that every number in each row, each column, and each 3×3 region must be unique, and the provided "givens"); a finite-domain solver applies the constraints successively to narrow down the solution space until a solution is found. Backtracking may be applied when alternate values cannot be excluded.
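The finite-domain idea can be illustrated without a dedicated constraint library. In the sketch below, every cell has a domain of possible values, the uniqueness constraints are applied repeatedly to shrink those domains, and backtracking is used only when propagation stalls. All names (`PEERS`, `propagate`, `search`, `solve_fd`) are hypothetical and the code is a minimal illustration, not a description of any particular solver.

```python
def peers_of(r, c):
    """Cells sharing a row, column, or 3x3 region with (r, c)."""
    br, bc = 3 * (r // 3), 3 * (c // 3)
    cells = {(r, j) for j in range(9)} | {(i, c) for i in range(9)}
    cells |= {(i, j) for i in range(br, br + 3) for j in range(bc, bc + 3)}
    cells.discard((r, c))
    return cells


PEERS = {(r, c): peers_of(r, c) for r in range(9) for c in range(9)}


def propagate(domains):
    """Apply the uniqueness constraints until stable; False on contradiction."""
    queue = [cell for cell, d in domains.items() if len(d) == 1]
    while queue:
        cell = queue.pop()
        (value,) = domains[cell]
        for peer in PEERS[cell]:
            if value in domains[peer]:
                domains[peer] = domains[peer] - {value}  # new set, no mutation
                if not domains[peer]:
                    return False
                if len(domains[peer]) == 1:
                    queue.append(peer)
    return True


def search(domains):
    if not propagate(domains):
        return None
    open_cells = [cell for cell, d in domains.items() if len(d) > 1]
    if not open_cells:
        return [[next(iter(domains[(r, c)])) for c in range(9)] for r in range(9)]
    cell = min(open_cells, key=lambda k: len(domains[k]))
    for value in sorted(domains[cell]):
        trial = dict(domains)  # shallow copy is enough: sets are never mutated
        trial[cell] = {value}
        result = search(trial)
        if result is not None:
            return result
    return None


def solve_fd(grid):
    """Solve a 9x9 grid (0 = blank); returns a completed grid or None."""
    domains = {(r, c): ({grid[r][c]} if grid[r][c] else set(range(1, 10)))
               for r in range(9) for c in range(9)}
    return search(domains)
```

Because `propagate` also fails when two givens clash, `solve_fd` returns `None` for an invalid puzzle rather than searching fruitlessly.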
Rapid solvers are preferred for trial-and-error puzzle-creation algorithms, which allow for testing large numbers of partial problems for validity in a short time; human-style solvers can be employed by hand-crafting puzzlesmiths for their ability to rate the challenge of a created puzzle and show the actual solving process their target audience can be expected to follow.
Mathematics of Sudoku
A completed Sudoku grid is a special type of Latin square with the additional property of no repeated values in any 3×3 block. The number of classic 9×9 Sudoku solution grids was shown in 2005 by Bertram Felgenhauer and Frazer Jarvis to be 6,670,903,752,021,072,936,960 (sequence A107739 in OEIS): this is roughly 0.00012% of the number of 9×9 Latin squares. Various other grid sizes have also been enumerated. The number of essentially different solutions, when symmetries such as rotation, reflection and relabelling are taken into account, was shown by Ed Russell and Frazer Jarvis to be just 5,472,730,538 (sequence A109741 in OEIS). Both results have been confirmed by independent authors.
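The relationship between Latin squares and Sudoku grids can be made concrete with a short check. The two example grids below are constructed by formula purely for illustration (they are assumptions of this sketch, not grids from the literature): a cyclic square, which is Latin but repeats values inside its blocks, and a shifted variant that also satisfies the block condition.

```python
DIGITS = set(range(1, 10))


def is_latin_square(grid):
    """Every row and every column contains each of 1-9 exactly once."""
    rows_ok = all(set(row) == DIGITS for row in grid)
    cols_ok = all({grid[r][c] for r in range(9)} == DIGITS for c in range(9))
    return rows_ok and cols_ok


def is_sudoku_solution(grid):
    """A Latin square whose nine 3x3 blocks also contain each of 1-9 once."""
    blocks_ok = all(
        {grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)} == DIGITS
        for br in (0, 3, 6) for bc in (0, 3, 6)
    )
    return is_latin_square(grid) and blocks_ok


# Each row is the previous one shifted by one place: Latin, but not Sudoku.
cyclic = [[(r + c) % 9 + 1 for c in range(9)] for r in range(9)]
# Shifting by 3 within a band and by 1 between bands satisfies the blocks too.
shifted = [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]

print(is_latin_square(cyclic), is_sudoku_solution(cyclic))    # True False
print(is_latin_square(shifted), is_sudoku_solution(shifted))  # True True
```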
The maximum number of givens provided while still not rendering the solution unique is four short of a full grid; if two instances of two numbers each are missing and the cells they are to occupy form the corners of an orthogonal rectangle, and exactly two of these cells are within one region, there are two ways the numbers can be assigned. Since this applies to Latin squares in general, most variants of Sudoku have the same maximum. The inverse problem—the fewest givens that render a solution unique—is unsolved, although the lowest number yet found for the standard variation without a symmetry constraint is 17, a number of which have been found by Japanese puzzle enthusiasts, and 18 with the givens in rotationally symmetric cells.
Difficulty ratings
The difficulty of a puzzle depends on the relevance and positioning of the given numbers rather than their quantity, so the number of givens does not always reflect a puzzle's difficulty. Computer solvers can estimate the difficulty for a human to find the solution, based on the complexity of the solving techniques required. Some online versions offer several difficulty levels.
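A very coarse technique-based rating can be sketched by testing whether the simplest technique alone finishes the puzzle. The function below is a hypothetical illustration, not a published rating scheme: it repeatedly fills any cell that has exactly one remaining candidate and reports "easy" if that suffices, and "harder" otherwise. A realistic rater would try a ladder of progressively more advanced techniques and report the highest one needed.

```python
def candidates(grid, row, col):
    """Numerals not yet used in the cell's row, column, or region."""
    used = set(grid[row]) | {grid[r][col] for r in range(9)}
    br, bc = 3 * (row // 3), 3 * (col // 3)
    used |= {grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)}
    return set(range(1, 10)) - used


def rate_by_scanning(grid):
    """Return "easy" if single-candidate cells alone solve the grid (0 = blank)."""
    work = [row[:] for row in grid]  # leave the caller's grid untouched
    progress = True
    while progress:
        progress = False
        for r in range(9):
            for c in range(9):
                if work[r][c] == 0:
                    cands = candidates(work, r, c)
                    if len(cands) == 1:
                        work[r][c] = cands.pop()
                        progress = True
    solved = all(0 not in row for row in work)
    return "easy" if solved else "harder"
```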
Most publications sort their Sudoku puzzles into four or five rating levels, although the actual cut-off points and the names of the levels themselves can vary widely. Typically, however, the titles are synonyms of "easy", "intermediate", "hard", and "challenging". Another approach is to rely on the experience of a group of human test solvers. Puzzles can be published with a median solving time rather than an algorithmically defined difficulty level.