Fortune cookie generator

The Fortune cookie generator is in essence a Markov-chain-based text generator, but with the novel improvement of “smoothing” the model with a prior based on the 1- through (n − 1)-grams.

The standard algorithm involves creating a Markov transition matrix for all observed n-grams in a corpus. The choice of n has a large effect on the quality of the generated text: for small n the result is word salad, while for large n, because each n-gram has so few observations, the generator tends to quote large portions of text from the corpus.
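
For concreteness, here is a minimal sketch of the standard construction in Python, storing the “transition matrix” sparsely as a map from each observed n-word context to counts of the words that follow it (the function name is just for illustration, not the generator’s actual code):

    from collections import Counter, defaultdict

    def build_ngram_model(words, n):
        """Sparse 'transition matrix': map each observed n-word context to a
        Counter of the words that immediately follow it in the corpus."""
        model = defaultdict(Counter)
        for i in range(len(words) - n):
            context = tuple(words[i:i + n])
            model[context][words[i + n]] += 1
        return model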

The question I wondered about was whether there is a way to merge the models of various chain lengths so that the generator, while still able to quote long passages, could also pivot on one or two words, creating more interesting output.

1. Creating the model

The intuition behind the smoothed model comes from thinking about text generation as a prediction problem: what is the most likely next word given the words seen thus far? Given a sequence of n words never before seen in the training data, an n-gram model would have nothing to say. But surely some prediction is possible, since, at the very least, the last word in the sequence predicts a next word via the 1-gram model.

The first thing to notice is that an (n − 1)-gram model can be turned into an equivalent n-gram model by prefixing each (n − 1)-gram with every word from the dictionary. The second is that two n-gram models can be combined by taking a weighted average of their transition matrices, and this is equivalent to building the n-gram model of a corpus formed by appending the corresponding corpora (with multiplicity, when the weight is a rational number).
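
In code, the two observations come down to something like the following sketch, where a model is represented abstractly as a function from a context and a word to a probability (illustrative names, not the actual implementation):

    def lifted_prob(prob, context, word):
        """Lift an (n-1)-gram model to n words of context by ignoring the first word."""
        return prob(context[1:], word)

    def averaged_prob(prob_a, prob_b, beta, context, word):
        """Weighted average of two n-gram models, with weight beta on the first."""
        return beta * prob_a(context, word) + (1 - beta) * prob_b(context, word)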

The smoothed n-gram model with parameter β, then, is the smoothed (n − 1)-gram model lifted to an n-gram model and averaged with weight β on the standard n-gram model. This has the stated effect of creating an n-gram model with a prior coming from the shorter-chain models. This particular construction is simple enough to implement, and doubtless there are other ways of choosing the prior.
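
Written as a recurrence, under the same abstract representation as above (again just a sketch, with the plain 1-gram model as the base case):

    def smoothed_prob(models, beta, context, word):
        """P_smooth_n = beta * P_n + (1 - beta) * P_smooth_{n-1}, where the
        shorter model sees the context with its first word dropped.
        models[k] is the standard k-gram model's conditional probability function."""
        n = len(context)
        if n == 1:
            return models[1](context, word)
        return (beta * models[n](context, word)
                + (1 - beta) * smoothed_prob(models, beta, context[1:], word))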

2. Implementation

Luckily we do not need to create the full transition matrix for the smoothed model, since it would contain an entry for every sequence of n dictionary words. Instead, the generator chooses at random which model to follow (with geometrically decreasing probability), from among all models that have a prediction for the next word in the sequence. The parameter controlling the expected length of the chain is called “sense.”
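
Roughly, that sampling step looks something like this sketch (the exact scheme and names are illustrative): walk down from the longest chain, accepting each model that has seen the current context with probability given by the sense parameter, so that chain lengths fall off geometrically.

    import random

    def pick_model_order(models, context, sense):
        """Choose which chain length to follow, longest first, each accepted with
        probability `sense`; only models that have seen the context qualify."""
        candidates = [k for k in sorted(models, reverse=True)
                      if tuple(context[-k:]) in models[k]]
        for k in candidates[:-1]:
            if random.random() < sense:
                return k
        return candidates[-1]  # the 1-gram model is assumed to always qualify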

3. Other details

One problem many Markov text generators face is deciding how long to keep generating text. One choice is to fix in advance the number of words to generate, but this does not give good results because sentences may be cut off. Instead, this generator has special “words” for the beginning and end of a document, so that it can 1) start at the start word and 2) stop when it generates the end word.
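
Roughly, generation then looks like the following sketch (the sentinel names are illustrative), reusing the sampling step from above:

    import random

    START, END = "<s>", "</s>"  # hypothetical sentinel "words"

    def generate(models, n, sense):
        """Start from the start word and stop as soon as the end word appears.
        Assumes each training fortune was wrapped as [START] * n + fortune + [END]
        before the models were built, so the all-START context is always known."""
        words = [START] * n
        while True:
            order = pick_model_order(models, words[-n:], sense)  # sketch from section 2
            counter = models[order][tuple(words[-order:])]
            word = random.choices(list(counter), weights=list(counter.values()))[0]
            if word == END:
                break
            words.append(word)
        return " ".join(words[n:])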

Also: why fortunes? I thought people might have a tolerance for grammatical issues in this genre, as well as a willingness to look for some deeper meaning.

4. See also