# Fortune cookie generator

The Fortune cookie generator is in
essence a Markov-chain-based text generator, but with the novel
improvement of “smoothing” the model with a prior based on the
1- through (`n` − 1)-grams.

The standard algorithm involves creating a Markov transition matrix
for all observed `n`-grams in a corpus. The choice of `n` has a
large effect on the quality of the generated text. For small `n`,
the output is word salad; for large `n`, each `n`-gram has been
observed so few times that the generator tends to quote large
portions of text from the corpus verbatim.
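As a sketch of that standard construction (the function name `build_model` and the toy corpus are my own, not from the generator):

```python
from collections import defaultdict

def build_model(words, n):
    """Count, for each (n-1)-word context, which words follow it."""
    model = defaultdict(lambda: defaultdict(int))
    for i in range(len(words) - n + 1):
        context = tuple(words[i:i + n - 1])
        model[context][words[i + n - 1]] += 1
    return model

corpus = "the quick brown fox jumps over the lazy dog".split()
model = build_model(corpus, 2)
# With n = 2 the context is a single word: "the" is followed
# once each by "quick" and "lazy".
```

Normalizing each context's counts gives the rows of the transition matrix.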

The question I wondered about was whether there was a way to merge the models of various chain lengths so that, while the generator could still quote long passages, it could also pivot on one or two words, producing more interesting output.

## 1. Creating the model

The intuition of the smoothed model comes from thinking about text
generation as a *prediction* problem. What is the most likely
next word given the words seen thus far? Given a sequence of `n`
words never before seen in the training data, an `n`-gram model
would have nothing to say. But surely there is a way to predict a
next word, since, at the least, the last word in the sequence could
predict a next word using the 1-gram model.

The first thing to notice is that it is possible to create an
equivalent `n`-gram model from an (`n` − 1)-gram model by prefixing
each (`n` − 1)-gram with every word from the dictionary. The second
thing to notice is that it is possible to combine two `n`-gram
models by computing the weighted average of the two transition
matrices, and this is equivalent to creating the `n`-gram model of a
corpus created by appending the corresponding corpuses (with
multiplicity, when the weight is a rational number).
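The second observation reduces to pointwise arithmetic on the rows of the two matrices. A minimal sketch (the function name and toy distributions are mine): the weighted average is taken over the union of next-word keys, treating a missing key as probability zero.

```python
def average_dists(p, q, beta):
    """Pointwise weighted average beta*p + (1 - beta)*q of two
    next-word probability dicts, over the union of their keys."""
    keys = set(p) | set(q)
    return {w: beta * p.get(w, 0.0) + (1 - beta) * q.get(w, 0.0) for w in keys}

p = {"fortune": 0.5, "cookie": 0.5}  # toy transition rows, my own data
q = {"cookie": 1.0}
avg = average_dists(p, q, 0.25)
# avg["cookie"] == 0.25 * 0.5 + 0.75 * 1.0 == 0.875
```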

The smoothed `n`-gram model with parameter `β` is then the
smoothed (`n` − 1)-gram model lifted to an `n`-gram model, averaged
with weight `β` toward the standard `n`-gram model. This has the
stated effect of creating an `n`-gram model with a prior from the
shorter-chained models. This particular way of doing it is simple
enough to implement, and doubtless there are other ways of choosing
the prior.
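Because the lifted (`n` − 1)-gram model's prediction for a context depends only on that context's last `n` − 2 words, the definition unrolls into a recursion on ever-shorter contexts. A sketch under my own naming, where `models[k]` maps a `k`-word context tuple to a next-word probability dict (renormalization when the longer context is unseen is omitted):

```python
def smoothed_dist(models, context, beta):
    """beta * (prediction from the full context) plus
    (1 - beta) * (smoothed prediction from the context with its
    first word dropped, i.e. the lifted shorter model).
    Sketch only: if the full context is unseen, its term is zero
    and the result is not renormalized."""
    if not context:
        return models[0].get((), {})
    longer = models[len(context)].get(context, {})
    shorter = smoothed_dist(models, context[1:], beta)
    keys = set(longer) | set(shorter)
    return {w: beta * longer.get(w, 0.0) + (1 - beta) * shorter.get(w, 0.0)
            for w in keys}
```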

## 2. Implementation

Luckily we do not need to create the full transition matrix for the
smoothed model, since it would contain every sequence of `n` words
in the dictionary. Instead, the generator chooses which model to
follow at random (with geometrically decreasing probability), from
all models which have a prediction for the next word in the
sequence. The parameter which controls the expected length of the
chain is called “sense.”
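One way to read that scheme: try the longest context that has a prediction, follow it with probability given by the sense parameter, and otherwise back off to the next shorter context, which produces the geometrically decreasing weights. A sketch with my own names (the post does not show its actual sampler):

```python
import random

def next_word(models, history, sense):
    """Back off geometrically: each model with a prediction for the
    current context is followed with probability `sense`, longest
    context first; otherwise fall through to the 1-gram model."""
    for k in range(min(len(history), max(models)), 0, -1):
        dist = models[k].get(tuple(history[-k:]))
        if dist and random.random() < sense:
            words = list(dist)
            return random.choices(words, weights=[dist[w] for w in words])[0]
    dist = models[0][()]  # unconditional 1-gram fallback
    return random.choices(list(dist), weights=list(dist.values()))[0]
```

A higher sense therefore means longer quoted runs; a lower one means more frequent pivots to shorter contexts.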

## 3. Other details

One problem many Markov text generators face is deciding how long to keep generating text. One choice is to fix in advance the number of words to generate, but this gives poor results because sentences may be cut off mid-thought. Instead, this generator has special “words” for the beginning and end of a document, so that the generator can 1) start at the start word and 2) stop when it generates the end word.
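With a toy bigram model, the sentinel scheme looks like this (the token names `<start>` and `<end>` are my own; the post does not specify them):

```python
import random

START, END = "<start>", "<end>"  # sentinel token names are assumptions

def generate(bigrams):
    """Walk a bigram model from the start sentinel until the end
    sentinel is drawn, so output ends at a natural boundary."""
    out, word = [], START
    while True:
        dist = bigrams[word]
        word = random.choices(list(dist), weights=list(dist.values()))[0]
        if word == END:
            return " ".join(out)
        out.append(word)

bigrams = {  # toy model trained on a single one-sentence "document"
    START: {"hello": 1},
    "hello": {"world": 1},
    "world": {END: 1},
}
# generate(bigrams) → "hello world"
```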

Also: why fortunes? I thought people might have a tolerance for grammatical issues in this genre, as well as a willingness to look for some deeper meaning.

## 4. See also

- https://twitter.com/exquisit_ebooks is a Twitter account which has text generated using the same method but with different corpuses.
- Wisdom is a different version in which I tried to do topic modeling to create a Markov model for each topic. What you say determines a weighted average of models for the text generation. The results are questionable.