The game stores its questions using simplistic dictionary coding. The master data structures are stored at:
Language | Offset |
---|---|
English | $6af3 |
French | $6b08 |
German | $6b1d |
Spanish | $6b32 |
Each structure holds pointers to the per-language dictionary and questions:
The dictionary consists of up to 4096 words, stored as lowercase ASCII (within the game's font character set - it covers ASCII $20 (' ') to $7A ('z')) with the last character's high bit set. These seem to consist of mostly actual words - covering every word used at least twice in the respective question set. It's possible better compression could be achieved by including substrings without as much clamping to word boundaries.
The most frequently used words are at the start, because they can be indexed more efficiently and because the game does an O(n) lookup for every word.
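As an illustration of the high-bit termination described above, here is a minimal Python sketch that splits a blob of dictionary bytes into words. The function name `parse_dictionary` and the sample bytes are made up for this example; how the game finds the start and end of the blob is not covered here.

```python
def parse_dictionary(data: bytes, max_words: int = 4096) -> list[str]:
    """Split a blob of high-bit-terminated words into a list of strings."""
    words = []
    current = bytearray()
    for byte in data:
        current.append(byte & 0x7F)   # drop the terminator flag
        if byte & 0x80:               # high bit set: last character of a word
            words.append(current.decode("ascii"))
            current.clear()
            if len(words) == max_words:
                break
    return words


# Hypothetical sample: "the" and "of", each with the final byte's high bit set.
sample = bytes([ord("t"), ord("h"), ord("e") | 0x80, ord("o"), ord("f") | 0x80])
print(parse_dictionary(sample))  # ['the', 'of']
```

An O(n) lookup like the game's would walk this same stream, counting terminators until it reaches the wanted index.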
The question data consists of repeated structures in the form
While reading this data, the game may cross page boundaries; it does a check after every read to ensure it swaps the data page and moves the pointer back to $8000 when it reaches $c000.
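The page-crossing check can be modelled roughly as follows. This is a sketch under two assumptions that are mine rather than confirmed above: that "swapping the data page" means moving to the next 16 KB page, and that page p maps to ROM offset p * $4000. The class name `PagedReader` is hypothetical.

```python
PAGE_SIZE = 0x4000       # the window from $8000 up to $c000 is 16 KB
WINDOW_START = 0x8000
WINDOW_END = 0xC000


class PagedReader:
    """Reads bytes through the $8000-$bfff window, advancing pages as needed."""

    def __init__(self, rom: bytes, page: int, pointer: int):
        self.rom = rom
        self.page = page          # assumed: index of the 16 KB ROM page in the window
        self.pointer = pointer    # CPU address within the window

    def read_byte(self) -> int:
        offset = self.page * PAGE_SIZE + (self.pointer - WINDOW_START)
        value = self.rom[offset]
        self.pointer += 1
        if self.pointer == WINDOW_END:   # the check made after every read
            self.page += 1               # assumed: the next page follows on
            self.pointer = WINDOW_START
        return value
```

With a reader positioned at $bfff, one `read_byte()` call returns the last byte of the current page and leaves the pointer at $8000 on the following page.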
If the type is music or picture, then that is handled specially - see below. The "key" holds an index or key to the data, in the form of a decimal number, stored as ASCII minus 11, so the index "123" is stored as $26 $27 $28. This is converted to a 16-bit number and passed to the relevant function to look up its data.
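The "ASCII minus 11" encoding of the key can be reversed with a few lines of Python. The helper name `decode_key` is made up, and it expects to be handed only the key bytes; how the game delimits the key in the stream is not shown here.

```python
def decode_key(raw: bytes) -> int:
    """Turn key bytes (ASCII digits minus 11) back into a number."""
    digits = "".join(chr(b + 11) for b in raw)
    return int(digits)   # raises ValueError if the bytes are not digits


print(decode_key(bytes([0x26, 0x27, 0x28])))  # 123
```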
In all cases, the question text is decoded to RAM:
Then the question is post-processed by capitalising the first letter, splitting it on the position of the '?' character (so we have a separate question and answer) and appending a '.'.
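A rough Python equivalent of this post-processing is sketched below. Several details are my assumptions: that the '?' stays with the question, that whitespace around the answer is trimmed, and that the '.' goes on the end of the answer. The sample text is invented, not taken from the game.

```python
def post_process(decoded: str) -> tuple[str, str]:
    """Capitalise, split on '?', and terminate the answer with a full stop."""
    text = decoded[:1].upper() + decoded[1:]
    split_at = text.index("?") + 1          # assumed: '?' belongs to the question
    question = text[:split_at]
    answer = text[split_at:].strip() + "."  # assumed: '.' is appended to the answer
    return question, answer


print(post_process("which planet is known as the red planet? mars"))
# ('Which planet is known as the red planet?', 'mars.')
```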
Finally the question is rendered to the screen with word wrapping and centre alignment within the text window used in the game.
Because the zero index is used to represent the end of the text, no word index can be a multiple of 256, because this would put a zero byte in the stream. These dictionary indexes are therefore unused.
English has a dictionary of 2717 entries, French has 3167, German 2062, Spanish 2243.
The game tries to select questions at random, but it also tries to avoid repeating them. To do this, it first tries to generate some random seeds using the R register:
This is probably not a very good source of entropy: no attempt is made to map the predictable sequence of R as you might expect in a PRNG. Four bytes of RAM are filled with the result of this, acting as a seed for this function:
Any time a new question is needed, this second function is run, the result masked down to the range 0..511, then discarded if it is greater than 299 (there are 300 questions per subject per language). Once a satisfactory "random" index is found, the game checks against an area of RAM which stores a bitmask of used questions (300 questions for each of six subjects requires 228 bytes, which is consistent with 38 bytes per subject, from offset $c8d4). If the question was used, it tries again up to 8 times, before switching to a linear search of the "used question" area to find the next unused index. If all the questions are used, it resets the state and starts selecting them at random again.
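The selection logic for a single subject can be sketched in Python as follows. This is a model of the behaviour described, not a translation of the game's code: the PRNG is replaced by a `next_random` callable, the bit order within each mask byte is assumed to be least significant bit first, the linear search is assumed to start from index 0, and "up to 8 times" is taken to mean eight random attempts in total.

```python
import random

QUESTIONS_PER_SUBJECT = 300
MASK_BYTES = 38              # 300 bits rounded up to whole bytes
MAX_RANDOM_TRIES = 8


def is_used(mask: bytearray, index: int) -> bool:
    return bool(mask[index >> 3] & (1 << (index & 7)))


def mark_used(mask: bytearray, index: int) -> None:
    mask[index >> 3] |= 1 << (index & 7)


def random_index(next_random) -> int:
    """Mask to nine bits and reject anything above 299."""
    while True:
        candidate = next_random() & 0x1FF
        if candidate < QUESTIONS_PER_SUBJECT:
            return candidate


def pick_question(mask: bytearray, next_random) -> int:
    for _ in range(MAX_RANDOM_TRIES):          # random attempts first
        index = random_index(next_random)
        if not is_used(mask, index):
            mark_used(mask, index)
            return index
    for index in range(QUESTIONS_PER_SUBJECT):  # then a linear search
        if not is_used(mask, index):
            mark_used(mask, index)
            return index
    mask[:] = bytes(len(mask))                  # all used: reset and start over
    return pick_question(mask, next_random)


rng = random.Random(0)       # stand-in for the game's seeded generator
mask = bytearray(MASK_BYTES)
picks = [pick_question(mask, lambda: rng.getrandbits(16))
         for _ in range(QUESTIONS_PER_SUBJECT)]
print(len(set(picks)))       # 300: every question appears once before any repeat
```

The rejection step means roughly 41% of generated values (212 of 512) are thrown away, which is simple but keeps the distribution over 0..299 even.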
If you want to manipulate the question index, you can break at offset $41ec and alter the value in register bc to a number between 0 and 299 (decimal).
Many questions seem to produce invalid results - but none in English. This may suggest the testing was less thorough for the other languages. For example, French Sport & Leisure question 137 uses a word index beyond the valid dictionary, which results in this:
Some Spanish questions are validly encoded but nonsensical, for example:
And, of course, there are some questions with answers which are just plain wrong - but that's a feature of the board game too.
Picture data is stored starting at offset $74000. Each picture has a header in the form
This is followed by the picture data.
Subsequent pictures are stored immediately afterwards. When the game wants to find a picture for a given key, it linearly walks through the data until it finds a header with the same key; mismatching pictures can be skipped because the data length is known.
The game draws pictures at a resolution of 128x64, drawing into a RAM buffer at $ca00. Because pictures are limited to four colours, it can store four pixels per byte, so the RAM buffer is 2KB in size.
The picture data consists of:
%--bbpppp
The 16 four-colour palettes are:
The palette selector is followed by a stream of picture commands, each of which starts with a "command header":
%tttssfcc
The dither flag is used to make drawing alternate between the primary and secondary colour, giving a 50% dither. The game seems to only ever use this for flood filling, but the other commands seem to support it; some experimentation may confirm this.
A type of zero simply ends the drawing process.
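The bit-pattern notation used here (for example %--bbpppp and %tttssfcc) can be unpacked mechanically. The following Python helper is a small sketch of that: it groups bits by letter, most significant bit first, and skips `-`, `0` and `1` placeholders. The name `unpack_fields` is hypothetical, and the helper deliberately attaches no meaning to the letters.

```python
def unpack_fields(pattern: str, value: int) -> dict[str, int]:
    """Extract the lettered bit fields of an 8-bit pattern such as '%tttssfcc'."""
    bits = pattern.lstrip("%")
    if len(bits) != 8:
        raise ValueError("pattern must describe exactly 8 bits")
    fields: dict[str, int] = {}
    for position, letter in enumerate(bits):
        if letter in "-01":                      # unused or fixed bits
            continue
        bit = (value >> (7 - position)) & 1      # position 0 is the top bit
        fields[letter] = (fields.get(letter, 0) << 1) | bit
    return fields


print(unpack_fields("%tttssfcc", 0b10110110))  # {'t': 5, 's': 2, 'f': 1, 'c': 2}
print(unpack_fields("%--bbpppp", 0x2A))        # {'b': 2, 'p': 10}
```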
This is the most complex, and powerful, command. It is followed by a stream of bytes defining a path, with straight and curved segments.
%1xxxxxx- %yyyyyyyy
This draws a straight line from the previous location to x*2, y. (The low bit of the first byte is masked out for some reason.) If there is no previous point, then a single pixel is plotted at x*2, y.
The last pixel plotted is then the "previous location" for the next command.
%0xxxxxx- %ayyyyyyy %dddddddd
This draws a circle arc from the previous location, around a centre point at x*2, y*2. It draws anticlockwise if a=1, clockwise if a=0. The arc length is d*2 gradians, i.e. a value of 200 indicates a full circle.
Circle arcs leave the "previous position" set to the last point plotted; it is not explicit from the data.
%11111111
A value of $ff indicates the end of the path.
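Putting the three byte patterns together, a path stream can be parsed (without drawing anything) along these lines. This is a sketch: the function name and tuple layout are my own, and I am assuming the $ff check takes priority over the "high bit set means line" test, since $ff would otherwise look like a line segment.

```python
def parse_path(data: bytes, start: int = 0):
    """Return (segments, next_position) for a path beginning at data[start]."""
    segments = []
    pos = start
    while True:
        first = data[pos]
        if first == 0xFF:                 # end of path
            return segments, pos + 1
        x = first & 0x7E                  # six-bit x, doubled; low bit ignored
        if first & 0x80:                  # %1xxxxxx- %yyyyyyyy : straight line
            y = data[pos + 1]
            segments.append(("line", x, y))
            pos += 2
        else:                             # %0xxxxxx- %ayyyyyyy %dddddddd : arc
            second, d = data[pos + 1], data[pos + 2]
            centre_y = (second & 0x7F) * 2
            anticlockwise = bool(second & 0x80)
            segments.append(("arc", x, centre_y, anticlockwise, d * 2))
            pos += 3


stream = bytes([0x94, 0x20, 0x14, 0x90, 100, 0xFF])
print(parse_path(stream))
# ([('line', 20, 32), ('arc', 20, 32, True, 200)], 6)
```

The last element of each arc tuple is the sweep in gradians (d*2); multiplying by 0.9 gives degrees, so the 200 gradians above is a half circle.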
%fxxxxxx- %yyyyyyyy %rrrrrrrr
This draws a circle centred at x*2, y with radius r. The circle is filled if f=1.
%xxxxxxxx %yyyyyyyy
This performs a flood fill starting at location x, y. The flood fill algorithm seems to be imperfect as it seems not to handle "spilling" around corners very well - the game tends to apply multiple fills to fill in any gaps it finds.
As noted above, the game sometimes uses dithering with flood fills.
%yyyyxxxx %wwwwhhhh
This draws a filled rectangle, aligned to the 8x8 tile grid, with the top left at x*8, y*8 and a size of w*8+8 by h*8.
This is presumably an optimised version of type 5, because it can operate on larger chunks of data (applying to only whole bytes of RAM) and can store the command in two fewer bytes.
%fxxxxxx- %yyyyyyyy %aaaaaaa- %bbbbbbbb
This draws a rectangle with the top left at x*2, y and the bottom right at a*2, b. The drawn lines are inclusive of the bottom right corner. The rectangle is filled if f=1.
%xxxxxxxx %yyyyyyyy
This command is never used by the game. Some experimentation may determine what, if anything, it does.
Due to the key lookup system, the game data can contain more than one picture for the same key, where only the first will be used. There can also be pictures with keys that are not referenced by any question. Many of these are duplicates of other, referenced pictures but there are nine unused ones.
The game has two music stores.
The one at offset $7c000 is used for English, French and German questions. It uses the same type of headers as the pictures:
This is followed by the music data. The game will walk through the data looking for a given key, using the length byte to skip to the next header.
The music store at $5e492 is used for Spanish questions. It does not use headers; instead, the code makes use of the fact that each track's music data contains exactly two $ff terminators. When it needs to find the music data for key (index) n, it walks through the data until it has passed n*2 $ff bytes.
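The header-less lookup amounts to counting terminators. A minimal Python sketch, assuming keys are zero-based and that `store` begins at the first track (the function name is hypothetical, and the sample bytes are invented):

```python
def find_spanish_track(store: bytes, n: int) -> int:
    """Return the offset within `store` at which track n begins."""
    remaining = n * 2        # two $ff bytes per preceding track
    pos = 0
    while remaining:
        if store[pos] == 0xFF:
            remaining -= 1
        pos += 1
    return pos


# Three made-up tracks, the first two complete with both terminators.
store = bytes([0x10, 1, 0xFF, 2, 0xFF,
               0x20, 3, 0xFF, 4, 0xFF,
               0x30])
print([find_spanish_track(store, n) for n in range(3)])  # [0, 5, 10]
```

This only works if no other byte in a track can be $ff, which is the property the game's code relies on.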
A music stream starts with a header byte:
%ooOOdddd
The game uses an array of PSG frequency commands at offset $4558. These cover six octaves of 12TET notes, from value $3ff to $011. The octave selectors index into these by a multiple of 12.
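The quoted end points are consistent with a table in which each entry is the previous one divided by the twelfth root of two. The Python sketch below rebuilds such a table from the top value; it is an independent reconstruction rather than a dump of the ROM, so individual entries may differ from the real data by rounding. The function names are made up.

```python
def tone_table(top: int = 0x3FF, notes: int = 72) -> list[int]:
    """Six octaves of 12TET period values, starting from the lowest note."""
    return [round(top / 2 ** (k / 12)) for k in range(notes)]


def table_index(octave: int, note: int) -> int:
    """Octave selectors step through the table in multiples of 12."""
    return octave * 12 + note


table = tone_table()
print(hex(table[0]), hex(table[-1]))  # 0x3ff 0x11
```

Larger PSG tone values mean lower pitches, which is why the table counts downwards as the notes go up.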
This is followed by a stream of commands for channel 1, and then a stream of commands for channel 2.
Each command is in the form:
%dddiiiii
Index | Note length |
---|---|
0 | 1 |
1 | 2 |
2 | 3 |
3 | 4 |
4 | 6 |
5 | 8 |
6 | 12 |
7 | 16 |
The game will start a new note at the given tone index, and play it for the given duration. Every note uses the same attenuation envelope, stored at $461a:
This attenuation envelope is applied to every note, regardless of the tick length or note duration.
If the index is greater than 24, it is a special command.
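A command byte can be split up as in the following Python sketch. I am assuming that the three `d` bits are the "Index" column of the note length table above, and that the five `i` bits are the tone index; the function name and the returned dictionary keys are my own.

```python
NOTE_LENGTHS = [1, 2, 3, 4, 6, 8, 12, 16]   # from the table above


def decode_command(byte: int) -> dict:
    """Split a %dddiiiii command byte into its parts."""
    duration_bits = byte >> 5        # ddd
    index = byte & 0x1F              # iiiii
    return {
        "index": index,
        "duration_bits": duration_bits,
        "length": NOTE_LENGTHS[duration_bits],   # assumed mapping
        "special": index > 24,                   # indexes 25..31 are commands
    }


print(decode_command(0b10100011))
# {'index': 3, 'duration_bits': 5, 'length': 8, 'special': False}
```

For special commands the `length` value should be ignored, since the high three bits may carry something else entirely.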
This sets a flag which makes the playback not reset the volume envelope when the next note is played. The game uses this to play extra-long notes, by "joining" notes of the same pitch together. In theory it could be used with note changes too.
The duration bits are not used. The game will immediately move onto the next note/command when this is encountered.
The base octave value is replaced with the high three bits of the command. This allows the game to address more octaves, as it now uses a three-bit index.
The game will immediately move onto the next note/command when this is encountered.
This makes the game wait for the given number of frames without updating the note. It seems like this ought to be used for "rests", i.e. silence, but the implementation seems to not silence the output - and the data only ever uses it at the start of a track. Some experimentation may confirm this.
The index bits are not used. This makes the channel's output stop (silenced).
There are 40 instances of duplicated music tracks in the data, as well as 92 unique, unreferenced tracks.
In order to verify and experiment with much of this, I wrote a JavaScript "page" to parse the data from the ROM. You can load the ROM and it will render the questions, word lists, pictures (drawn to a canvas, but fills are impossible to replicate) and music (rendered to in-memory VGM data for download).