The Jotto word game - analysis and a Perl implementation

What is Jotto ?

Jotto is a two-player game where each player attempts to guess the other's secret five letter word. A player scores a guess based on the number of letters it has in common with the secret word. This is best shown with an example: Lets say Bob thought of a 5 letter word - stamp, and Sam should guess it. Sam's first guess is woman. There are 2 letters in common between the words (m and a), so Bob gives this guess the score 2. Sam gained some information from this, and he can now make more educated guesses. The game ends when the guess is correct, and the goal is to do it with minimal amount of guesses. Here is a transcript from my game against the computer (the script I'm demonstrating in this article). The word I thought of is woman and the computer is guessing it (the scores are entered by me, everything else is printed by the script):

If I guess correctly, please enter 999 as the score
Specify dictionary file: easy.txt
My guess is: feats
Score: 1
Legal words left: 1380
My guess is: rumps
Score: 1
Legal words left: 610
My guess is: bribe
Score: 0
Legal words left: 142
My guess is: coons
Score: 2
Legal words left: 31
My guess is: donut
Score: 2
Legal words left: 8
My guess is: mango
Score: 4
Legal words left: 1
My guess is: woman
Score: 999

Yay, I won !!

This game is very easy to grasp conceptually, but to play it well one needs to have a very good knowledge of English words and the ability to count letters (gaining valuable information from the given scores).

How did you get to Jotto, anyway ?

I was inspired by an article of Kevin Jackson-Mead in The Perl Review, Vol 0 Issue 7. Kevin explained the rules of the game and presented his Perl implementation. I decided to write an implementation of my own, as it would allow me to understand the game and do some interesting Perl hacking. I'm also interested in the search algorithms involved in such a game (simply put - how to make the computer play the best possible game).

The 5 letter dictionary

Jotto doesn't necessarily have to be played with 5 letter words, and my implementation inposes no restrictions on this. However, I needed a good dictionary, so I used Kevin's easy.txt wordlist, which consists of about 4000 5-letter words.

Lets us code a Jotto player !

If you still didn't understand what the Jotto game is about, read Kevin's article in The Perl Review, or just hang on - you'll probably get it when I explain the implementation. First of all, we need a routine to score a guess. Recall that the score is the number of letters common to the two words. I tried a few implementations of this score routine - my goal was efficiency. This routine is called a helluva lot of times during the search, hence it must be as swift as possible. After running a lot of timing tests (the Benchmark module has been very helpful), I finally adopted the following implementation:


# Compute the "score" of two words - how many 
# characters
# they have in common.
#
# Returns $words_equal if the given words are 
# equal, otherwise returns the number of common 
# characters
#
sub score
{
    my ($word1, $word2) = @_;
    my %bag;
    my $score = 0;

    return $words_equal if ($word1 eq $word2);

    foreach (split '', $word1) 
    {
        $bag{$_}++;
    }

    foreach (split '', $word2) 
    {
        if ($bag{$_}) 
        {
            $bag{$_}--;
            $score++;
        }
    }
    return $score;
}

For simplicity, $words_equal is just a constant defined globally as:


my $words_equal = 999;

This score cleverly uses a hash, and runs in linear time in the words length. First, all letters in $word1 are recorded in the hash. Then, the second word is traversed, and $score is increased only if this word's counter in the hash is positive (the counter is reduced each time a match is found). The whole procedure may seem overly complex, but it has a good reasoning behind it. Consider the (artificial) words abcde and aaxyz. They have only 1 letter in common, not 2, as a is only counted once (it only appears once in the first word). aaxyz and aaghj, on the other hand, have 2 letters in common - twice a, as it appears twice in both words. This is why the hash counter is needed. Note that the definition of count is commutative - it should give the same score for two words no matter in what order it sees them. Think about it for a minute - it makes sense (because "in common" is commutative). I will first implement a "human game" - a game in which the computer picks a word and lets the user guess it, printing the score of each attempt.


# Returns a random element from a given array
#
sub random_arr_elem
{
    my @arr = @{$_[0]};
    
    return $arr[rand() * ($#arr + 1)];
}


# Given the name of a dictionary file, picks a 
# random 
# word from it
#
sub pick_random_word_from_file
{
    my $filename = $_[0];

    open(FH, $filename) 
        or die "Can't open $filename: $!\n";
    my @words = ;
    my $the_word = random_arr_elem(\@words);
    chomp $the_word;

    return $the_word;
}

These routines should be pretty straightforward. Now, everything is set to code the human_guess_game routine:


# Play a human guess game - the human tries to 
# guess a word
#
# Asks for a dictionary file. Picks a random word 
# from this
# file, and lets the human guess
#
sub human_guess_game
{
    print "Specify dictionary file: ";
    my $dict_file = <>;
    chomp $dict_file;

    my $word = 
        pick_random_word_from_file($dict_file);

    while (1)
    {
        print "\nEnter a guess: ";
        my $guess = <>;
        chomp $guess;
    
        if (score($word, $guess) == $words_equal)
        {
            print "\nCongrats, you guessed it !\n";
            last;
        }
        else
        {
            print score($word, $guess);
        }
    }
}

First, it asks for a dictionary file. Given a file name, a word is picked randomly and the game begins. On each iteration, the user is prompted for a guess. If the guess is correct, the game ends. Otherwise, the score for this guess is printed and the user is prompted for the next guess. Simple, eh... Now, the more interesting part - implementing a computer "agent" that plays Jotto, guessing and improving its guesses based on scores given by the user. Think for a moment how you'd incorporate the information supplied by a score to make better guesses later. If the score is 0, for example, you know that neither of the letters in your guess appear in the target word. If the score is 5 you know that the letters are the same - and your guess is just a permutation of the target word. But these are the simple cases. How do you take into account information gained from different guesses / scores ? The best tactic would be, given a guess and its score, go over the dictionary and remove all words that don't fit - wouldn't yield the same score with the guess. For a human it's a long, tedious job and one has to be very experienced to do it well in his head. For the computer it's piece of cake ! Going over a long list of words, removing the ones that don't fit a given criteria - this is just what computers love doing ! This brings us to the refine_words_array routine:


# Given an array of words, a guess, and the score 
# of that guess, removes all array elements that 
# don't get the same score with the guess
#
sub refine_words_array
{
    my @arr = @{$_[0]};
    my $guess = $_[1];
    my $score = $_[2];

    my @res_arr;

    foreach (@arr)
    {
        push(@res_arr, $_) 
            if ($score == score($guess, $_));
    }

    return \@res_arr;
}

Say you have a list of words, a guess - apple and a score - 2. refine_words_array goes over the list of words, and removes all words that don't score 2 with apple. So, for example, stamp will be removed (it scores 1), but plant will stay (it scores 2). It is important to understand that after going over the list, this routine returns a list with ALL legal words and ONLY the legal words (if we define "legal" as a word scoring the given score with the guess). By ALL, I mean that there are no legal words that will be removed. By ONLY, I mean that there are no illegal words that won't be removed. In fact, from the resulting list, one and only one word is the guess (granted that the list is unique). Now we have everything we need to code the computer_guess_game routine:


# Play a computer guess game - the computer 
# tries to guess a work
#
# Asks for a dictionary file and starts guessing 
# words. The user must supply the score for 
# each guess
#
sub computer_guess_game
{
    print "If I guess correctly, please enter 
            $words_equal as the score\n";
    print "Specify dictionary file: ";
    my $dict_file = <>;
    chomp $dict_file;

    # Get a list of words from the dictionary file
    #
    open(FH, $dict_file) 
        or die "Can't open $dict_file: $!\n";
    my @words = ;
    chomp(@words);

    my $guess = random_arr_elem(\@words);

    while (1)
    {
        print "My guess is: $guess\n";
        print "Score: ";
        my $score = <>;
        chomp $score;

        if ($score == $words_equal)
        {
            print "\nYay, I won !!\n";
            last;
        }

        my $ref = refine_words_array(
                         \@words, $guess, $score);
        @words = @$ref;

        if (scalar(@words) == 0)
        {
            print "\nNo suitable word in the given 
                    dictionary !!\n";
            last;
        }

        print "Legal words left: " . 
               scalar(@words) . "\n";

        $guess = random_arr_elem(\@words);
    }
}

First, the routine asks for a dictionary file. At this point, it expects the user to think of a word (from the given file) and be ready to score guesses. The first guess is randomly picked from the full list. On each iteration, the guess is printed and the user is prompted for a score. Given the score, refine_words_array is used to remove all illegal words from the list, and a new guess is randomly picked from the refined list. That's it... Putting all of these routines together and calling either computer_guess_game or human_guess_game you can have an interactive game against the computer, either guessing the computer's word or letting it guess your word. For your convenience, you can find the full script here.

Acknowledgements

Big thanks to Kevin Jackson-Mead, for the article that made me interested in Jotto and his great dictionary file. Thanks also to the Perl gurus at perlmonks.org, for helping me write a robust and efficient score routine.