
Four Letter Word Generator

2012.2.8

One of the ideas we originally had when designing the clock was to add a four letter word generator mode. The IV-17 shield, with its four alphanumeric tubes, is ideal for this.

A four letter word (FLW) generator is, as the name suggests, a device that generates words that are four letters long. There are many ways of creating an FLW generator. The simplest is probably just to create a word list and switch randomly between the words in the list. This is sure to create occasional fun sequences of words, but most of the time it will just be, well, a random collection of unrelated words.
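As a rough illustration of that simplest approach, here is a minimal sketch in C++. The word list and the name random_word are made up for this example and are not taken from the clock firmware.

```cpp
#include <stddef.h>  // size_t
#include <stdlib.h>  // rand

// Hypothetical word list; a real one would be much longer.
static const char *const WORDS[] = {"lamp", "moon", "frog", "salt", "wind"};
static const size_t WORD_COUNT = sizeof(WORDS) / sizeof(WORDS[0]);

// Pick any word from the list. Each call ignores the previous result,
// which is why consecutive words have nothing to do with each other.
const char *random_word() {
    return WORDS[rand() % WORD_COUNT];
}
```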

We decided to go with a slightly more interesting approach: an associative word dictionary. We used data files from this project: The Edinburgh Associative Thesaurus. The picture shows one of the first words it spat out during testing (a word that brings back memories of monster slaying in Dungeons and Dragons :)

The data files are stored in a relatively straightforward plain-text format. There is a list of words, and for each word, a list of associated words. This format is not very suitable for processing on a microcontroller, so first we want to parse all the data and generate a data file that is easy to read from a microcontroller.

The first thing you'll notice if you have a look at the data files is that the database is a general word association dictionary, and most words are not four letters long. We simply throw these words away.
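The filtering step could look something like the sketch below. The function name is hypothetical, and the extra check that every character is a letter is an assumption added for this example; the only rule stated above is the length.

```cpp
#include <ctype.h>   // isalpha
#include <string.h>  // strlen

// Keep a word only if it is exactly four characters long.
// The letters-only check is an assumption, not something the post specifies.
bool is_four_letter_word(const char *s) {
    if (strlen(s) != 4) return false;
    for (int i = 0; i < 4; ++i) {
        if (!isalpha((unsigned char)s[i])) return false;
    }
    return true;
}
```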

We wanted a data format that would satisfy two simple constraints:

It must be possible to read a word, get the number of associated words that word has, and then pick one of the associated words.
It must be possible to start at a random point in the data file and figure out where a new word starts easily.

We came up with the following scheme:

Word, 4 bytes
Number of associations, 1 byte
List of 16-bit offsets to the associated words
End of word marker, 2 bytes, 0xFFFF

This is of course not the only way to store the data, and not the most space efficient, but it has the advantage that the code that reads the database becomes very simple.
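To make the layout concrete, here is a minimal sketch that writes one record in this format and finds the start of the next record from an arbitrary position. All function names are hypothetical, and storing each offset low byte first is an assumption, since the byte order is not stated above.

```cpp
#include <stddef.h>
#include <stdint.h>

// Bytes used by one record with n associations:
// 4 (word) + 1 (count) + 2*n (offsets) + 2 (end marker).
size_t record_size(uint8_t n) {
    return 4 + 1 + 2 * (size_t)n + 2;
}

// Write one record into out and return the number of bytes written.
// Offsets are stored low byte first (assumed, see above).
size_t encode_record(uint8_t *out, const char *word,
                     const uint16_t *offsets, uint8_t n) {
    size_t pos = 0;
    for (int i = 0; i < 4; ++i) out[pos++] = (uint8_t)word[i];
    out[pos++] = n;
    for (uint8_t i = 0; i < n; ++i) {
        out[pos++] = (uint8_t)(offsets[i] & 0xFF);
        out[pos++] = (uint8_t)(offsets[i] >> 8);
    }
    out[pos++] = 0xFF;  // end of word marker, 0xFFFF
    out[pos++] = 0xFF;
    return pos;
}

// Starting anywhere in the image, scan forward to the first 0xFF 0xFF pair
// and return the position just past it, which is where the next record
// begins. Returns len when the marker found is the last one in the image
// (or none is found); a caller could then wrap around to offset 0.
size_t next_record_start(const uint8_t *db, size_t len, size_t from) {
    for (size_t i = from; i + 1 < len; ++i) {
        if (db[i] == 0xFF && db[i + 1] == 0xFF) return i + 2;
    }
    return len;
}
```

A scan like this relies on the byte pair 0xFF 0xFF never appearing inside a record. Whether the real database guarantees that is not something stated here, so treat it as an assumption of the sketch.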

The Processing sketch for generating the database is here.

The generated database is 57 kbytes, which is much too big to fit into the 32 kbytes of flash on the ATmega328P (of which we are already using 10 kbytes for the current firmware). To store the data we will use a 64 kbyte (512 kbit) EEPROM. Most I2C EEPROMs of this size can be used, and they are readily available in DIP-8 packages. Using I2C/TWI allows us to hook the EEPROM directly to the "expansion port" on the IV-17 shield.
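The size arithmetic can be written down as a few compile-time checks. This is only a sketch: the constant names are made up, and a kbyte is taken to mean 1024 bytes, which is an assumption about how the sizes above were counted.

```cpp
#include <stdint.h>

// Sizes taken from the text; "kbyte" is treated as 1024 bytes (assumed).
constexpr uint32_t KB = 1024;
constexpr uint32_t DATABASE_BYTES = 57 * KB;
constexpr uint32_t FLASH_BYTES = 32 * KB;        // ATmega328P program flash
constexpr uint32_t FIRMWARE_BYTES = 10 * KB;     // current firmware
constexpr uint32_t EEPROM_BYTES = 512 * KB / 8;  // 512 kbit part

static_assert(EEPROM_BYTES == 64 * KB, "512 kbit is 64 kbytes");
static_assert(DATABASE_BYTES > FLASH_BYTES - FIRMWARE_BYTES,
              "database does not fit in the free flash");
static_assert(DATABASE_BYTES <= EEPROM_BYTES, "database fits in the EEPROM");
static_assert(EEPROM_BYTES - 1 <= UINT16_MAX,
              "a 16-bit offset can reach every byte of the EEPROM");
```

The last check is the reason 16-bit offsets are a comfortable fit for the data format: a 64 kbyte part has exactly as many addresses as a 16-bit value can express.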

(PS: TWI connectors are also available on the left header on the base board. They are marked SDA and SCL)

The code: Reading from an I2C EEPROM is quite easy. Here's how to read a single byte from the EEPROM in Arduino:

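A minimal sketch of such a single-byte read is shown here. It is not the original listing: the function name is hypothetical, and it assumes an EEPROM that takes a 16-bit memory address sent high byte first, as 24xx512-style parts do. It is written as a template so that it compiles both against the Arduino Wire object and against a stand-in bus on a PC.

```cpp
#include <stdint.h>

// Read one byte from a 16-bit-addressed I2C EEPROM.
// Bus is any object offering the Arduino Wire calls used below
// (on an Arduino, pass Wire itself).
// Returns the byte (0..255), or -1 if the device sent nothing back.
template <typename Bus>
int eeprom_read_byte(Bus &bus, uint8_t device_addr, uint16_t mem_addr) {
    bus.beginTransmission(device_addr);
    bus.write((uint8_t)(mem_addr >> 8));    // memory address, high byte
    bus.write((uint8_t)(mem_addr & 0xFF));  // memory address, low byte
    bus.endTransmission();

    bus.requestFrom(device_addr, (uint8_t)1);  // ask for a single byte
    if (bus.available()) {
        return bus.read();
    }
    return -1;
}
```

On an Arduino this could be called as eeprom_read_byte(Wire, 0x50, offset) after Wire.begin(). The address 0x50 is the usual one for 24xx-series parts with their address pins grounded, but that depends on the chip and wiring, so check it against your own hardware.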

The firmware for the VFD Modular Clock uses a variant of the Wire library from Arduino, so this code can be transferred with very few changes.

The current code is at the branch fourletterword in the VFD-Modular-Clock repository.

The function get_word in flw.c is the core of the new functionality: it reads a word from the EEPROM, picks one of the associations randomly and returns the offset to this word. The next time, get_word is called with the new offset, and the process repeats.
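The idea can be sketched as follows. This is not the code from flw.c: the name next_word, the read_byte_fn callback (which stands in for an EEPROM read) and the low-byte-first offset order are all assumptions made for the example.

```cpp
#include <stdint.h>
#include <stdlib.h>  // rand

// Hypothetical byte reader; on the clock this would wrap an EEPROM read.
typedef uint8_t (*read_byte_fn)(uint16_t addr);

// Read the record at offset, copy its word into word (NUL-terminated),
// and return the offset of one randomly chosen association.
// A record without associations returns its own offset.
uint16_t next_word(read_byte_fn read_byte, uint16_t offset, char word[5]) {
    for (uint16_t i = 0; i < 4; ++i) {
        word[i] = (char)read_byte((uint16_t)(offset + i));
    }
    word[4] = '\0';

    uint8_t count = read_byte((uint16_t)(offset + 4));
    if (count == 0) return offset;

    // Each association is a 16-bit offset stored after the count byte.
    uint16_t pick = (uint16_t)(rand() % count);
    uint16_t at = (uint16_t)(offset + 5 + 2 * pick);
    uint8_t lo = read_byte(at);
    uint8_t hi = read_byte((uint16_t)(at + 1));
    return (uint16_t)(lo | (hi << 8));
}
```

Calling next_word again with the offset it returned walks from one word to an associated one, which is what produces the chain of related words on the display.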

This is just a proof of concept implementation; we will improve the code a little bit and turn it into a proper display mode.

Here's a demo video:

Pretty cool, eh? :)

You may have noticed that we've cheated a little. The data file generated by the Processing application needs to be put onto the EEPROM. There are several ways to do this, but we'll get back to that in a follow-up blog post.

Another problem is that the current code uses the same random seed every time, so each time you apply power to the clock, it will show the same sequence of words. We'll need to figure out a way of randomizing the seed value.
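The effect is easy to reproduce on a PC with the standard C random functions. The sketch below shows that a fixed seed always gives the same sequence, and then derives a seed from the time of day as one possible source of variation. That second part is purely a hypothetical illustration, not what the firmware does or will do, and it would not help if the clock always starts from the same time at power-up.

```cpp
#include <stdint.h>
#include <stdlib.h>  // srand, rand

// Fill out with the first n values rand() produces after seeding.
// Calling this twice with the same seed yields the same values.
void first_values(unsigned seed, int *out, int n) {
    srand(seed);
    for (int i = 0; i < n; ++i) out[i] = rand();
}

// Hypothetical seed: seconds since midnight, from the clock's own time.
// On a target with 16-bit int the value is truncated when given to srand().
uint32_t seed_from_time(uint8_t hours, uint8_t minutes, uint8_t seconds) {
    return (uint32_t)hours * 3600u + (uint32_t)minutes * 60u + seconds;
}
```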

WARNING: The database we use is based on linguistic research and is meant to measure how real people associate words. This means that it contains profanity. Also, the database is in British English, so US English users may see some unfamiliar words or spellings.