Sega Master System / Mark III / Game Gear
This page is a reference for those wishing to put sample playback in their demos, and for emulator authors whose sound core does not reproduce sampled voices. Emulator authors may wish to add implementation suggestions.
Sample playback makes use of a feature of the SN76489's tone generators: when the half-wavelength (tone value) is set to 1, they output a constant DC offset corresponding to the volume level (i.e. the output does not alternate between high and low). By rapidly changing the volume, a crude form of PCM is obtained.
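As a rough illustration, the command bytes involved can be generated as below. This is a minimal Python sketch of the standard SN76489 latch/data byte format; the helper names are hypothetical, and on the Master System and Game Gear the bytes would be written by Z80 code to the PSG port (commonly 0x7F) rather than collected in a list.

```python
# Sketch of SN76489 command bytes for DC-offset sample playback.
# Latch byte: 1 cc t dddd  (cc = channel 0-3, t = 0 tone / 1 volume,
#                           dddd = low 4 bits of the value)
# Data byte:  0 ? dddddd   (upper 6 bits of a 10-bit tone value)

def latch(channel, is_volume, data):
    """Build a latch byte for a channel's tone or volume register."""
    if not (0 <= channel <= 3 and 0 <= data <= 15):
        raise ValueError("channel must be 0-3 and data 0-15")
    return 0x80 | (channel << 5) | (int(is_volume) << 4) | data

def set_tone_value(channel, value):
    """Two-byte write of a 10-bit tone value (tone channels 0-2 only)."""
    if not (0 <= channel <= 2 and 0 <= value <= 0x3FF):
        raise ValueError("channel must be 0-2 and value 0-0x3FF")
    return [latch(channel, False, value & 0x0F), (value >> 4) & 0x3F]

def set_attenuation(channel, attenuation):
    """Single-byte volume write: 0 is loudest, 15 is silent."""
    return latch(channel, True, attenuation)

# Put all three tone channels into the constant-output state (tone value 1),
# after which only volume writes are needed to play samples.
setup = [b for ch in range(3) for b in set_tone_value(ch, 1)]
print([hex(b) for b in setup])
```

Once the tone values are set, each sample costs a single volume byte per channel, which is what makes CPU-driven playback feasible at all.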
(Note that this may be a feature of Sega's implementation of the SN76489. It does not appear in any Sega 8-bit code designed for older systems with the standard chip, so it may need to be confirmed by experiment.)
Because the volume levels are non-linear (each attenuation step is nominally 2 dB) and have only four bits of resolution, the quality of reproduction is rather poor. Furthermore, the SN76489 has no facility for streaming data, so every write must be made by the CPU; in almost all games this means the game is frozen while samples are played. In the few games that do some other work during sample playback, the playback quality is usually worse still.
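To see how uneven the sixteen levels are, the sketch below tabulates the nominal linear output for each attenuation value. It assumes an ideal 2 dB per step with 15 treated as silence; real chips deviate from this, so the figures are illustrative only.

```python
# Nominal SN76489 output level per attenuation value, assuming an ideal
# 2 dB per step and treating attenuation 15 as silence.

def level(attenuation):
    """Linear amplitude relative to full volume (1.0) for attenuation 0-15."""
    if not 0 <= attenuation <= 15:
        raise ValueError("attenuation must be 0-15")
    if attenuation == 15:
        return 0.0
    return 10 ** (-2 * attenuation / 20)  # -2 dB per step, as an amplitude ratio

for att in range(16):
    print(f"{att:2d}: {level(att) * 100:6.2f}%")
```

Under this model only attenuations 0 to 3 give at least half of full amplitude; the remaining twelve values (including silence) all fall below 50%.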
Sega 8-bit games play their sampled audio in a few different ways:
A stream of 4-bit linear PCM data is read from ROM (packed two samples to a byte), and each sample is emitted as an SN76489 attenuation command to one or more of the tone channels. Because attenuation is logarithmic, this results in a non-linear output, which can make samples sound quieter than expected.
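The 4-bit scheme can be sketched as follows. The details are assumptions for illustration rather than any particular game's routine: the high nibble is played first, larger sample values mean louder output, every sample goes to a single channel, and the attenuation is simply `15 - sample`.

```python
# Hypothetical 4-bit player: packed linear samples become volume writes.
# Assumptions: high nibble first, larger values are louder, one channel,
# and attenuation = 15 - sample (no correction for the logarithmic scale).

def unpack_nibbles(data):
    samples = []
    for byte in data:
        samples.append(byte >> 4)    # first sample in the byte
        samples.append(byte & 0x0F)  # second sample in the byte
    return samples

def volume_commands_4bit(data, channel=0):
    base = 0x90 | (channel << 5)  # volume latch byte for this channel
    return [base | (15 - s) for s in unpack_nibbles(data)]

print([hex(c) for c in volume_commands_4bit(bytes([0xF0, 0x84]))])
```

With this direct mapping a mid-scale sample of 8 becomes attenuation 7, which is nominally about 20% of full amplitude rather than roughly half, so the sample as a whole plays back quietly.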
This can be ameliorated by pre-processing the data to account for the non-linear response; however, very few games do this.
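One way to do such pre-processing (a sketch, not a documented technique from any specific game) is to convert each linear sample to whichever attenuation has the closest nominal output level, again assuming 2 dB per step. The result is stored as `15 - attenuation`, so that a player which uses `15 - nibble` as the attenuation can stay unchanged.

```python
# Sketch of pre-processing for a 4-bit player: pick, for each unsigned
# 8-bit sample, the attenuation whose nominal output is closest to the
# sample's linear amplitude (assumes ideal 2 dB steps, 15 = silence).

def level(attenuation):
    return 0.0 if attenuation == 15 else 10 ** (-attenuation / 10)

LEVELS = [level(a) for a in range(16)]

def nearest_attenuation(sample):
    """Map a sample in 0-255 to the attenuation with the closest output."""
    target = sample / 255
    return min(range(16), key=lambda a: abs(LEVELS[a] - target))

def preprocess(samples):
    """Pack corrected samples two per byte, high nibble first.

    Each nibble stores 15 - attenuation, so a player that writes
    attenuation = 15 - nibble needs no changes."""
    nibbles = [15 - nearest_attenuation(s) for s in samples]
    if len(nibbles) % 2:
        nibbles.append(0)  # pad an odd-length sample with silence
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))
```

The trade-off is that most of the sixteen levels sit near the bottom of the range, so loud parts of the sample are represented by only a handful of coarse steps.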
A stream of 1-bit PCM data is read from ROM (packed eight samples to a byte), and each sample is emitted as an SN76489 attenuation command of either zero (full volume) or maximum (silence) to one or more of the tone channels. This represents the data faithfully, but the dynamic range of 1-bit audio is extremely poor, so the result is not very good. Typically the samples seem to have been heavily amplified and clipped, resulting in loud playback.
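A 1-bit player reduces to a bit-unpacking loop. In this hypothetical sketch, bits are read most significant first, a set bit becomes attenuation 0 and a clear bit attenuation 15, and all writes go to one channel; real games may differ on any of these points.

```python
# Hypothetical 1-bit player: most significant bit first, set bit = full
# volume (attenuation 0), clear bit = silence (attenuation 15), one channel.

def volume_commands_1bit(data, channel=0):
    base = 0x90 | (channel << 5)  # volume latch byte for this channel
    commands = []
    for byte in data:
        for bit in range(7, -1, -1):
            commands.append(base | (0 if (byte >> bit) & 1 else 15))
    return commands
```

Since every output is either full volume or silence, any quiet passage in the source must be pushed to one extreme or the other, which is consistent with the heavily clipped sound of these samples.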
A stream of 8-bit PCM data is read from ROM (one sample per byte) and used to look up a triplet of SN76489 attenuation commands in a table in ROM. These are emitted in close succession, one to each tone channel. With a carefully constructed lookup table, combining the three non-linear volumes gives access to a much larger number of output levels.
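One way such a table could be built offline is a brute-force search over all 4096 attenuation triplets, picking for each 8-bit value the triplet whose combined output is closest. The sketch assumes nominal 2 dB steps and that the three channel outputs simply add; it is not a reconstruction of any game's actual table.

```python
# Offline sketch of a 256-entry triplet table, assuming ideal 2 dB steps
# and that the three channel outputs add linearly. Brute force over all
# 16**3 triplets; not a reconstruction of any game's actual table.
from itertools import product

def level(attenuation):
    return 0.0 if attenuation == 15 else 10 ** (-attenuation / 10)

def total(triplet):
    """Combined output, normalised so that (0, 0, 0) gives 1.0."""
    return sum(level(a) for a in triplet) / 3

def build_triplet_table(entries=256):
    combos = [(total(t), t) for t in product(range(16), repeat=3)]
    table = []
    for i in range(entries):
        target = i / (entries - 1)  # linear target level for this entry
        _, best = min(combos, key=lambda c: abs(c[0] - target))
        table.append(best)
    return table

table = build_triplet_table()
print(table[0], table[128], table[255])
```

Because many triplets are permutations of one another, there are at most 816 distinct totals (the number of unordered triplets), and the gaps between them are widest near full volume (about 7% between 0,0,0 and 0,0,1).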
During the transition from one sample to another, this can produce unwanted artefacts, because the intermediate total attenuation may not lie between the start and end points. For example, a transition from attenuations 4,0,0 (total output level 79.9%) to 2,1,0 (total output level 80.8%) may temporarily pass through the state 2,0,0 (total output level 87.7%). Minimising the transition time reduces this, but some noise still seems to remain.
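The example transition can be checked numerically. The sketch below assumes the three volume writes are issued one channel at a time in a given order, uses nominal 2 dB-per-step levels, and reports any intermediate state whose total falls outside the range between the start and end levels. Under these assumptions it reproduces the 79.9% and 87.7% figures and gives about 80.8% for 2,1,0.

```python
# Sketch: list the intermediate states of a triplet transition, assuming the
# volume writes happen one channel at a time in the given order and using
# nominal 2 dB-per-step levels (real hardware figures may differ).

def level(attenuation):
    return 0.0 if attenuation == 15 else 10 ** (-attenuation / 10)

def total(triplet):
    """Combined output, normalised so that (0, 0, 0) gives 1.0."""
    return sum(level(a) for a in triplet) / 3

def intermediate_states(old, new, order=(0, 1, 2)):
    """Channel state after each successive volume write."""
    state = list(old)
    states = []
    for channel in order:
        state[channel] = new[channel]
        states.append(tuple(state))
    return states

def glitches(old, new, order=(0, 1, 2)):
    """Intermediate states whose total lies outside the old..new range."""
    low, high = sorted((total(old), total(new)))
    eps = 1e-12  # tolerance for floating-point rounding
    return [s for s in intermediate_states(old, new, order)
            if not low - eps <= total(s) <= high + eps]

for s in intermediate_states((4, 0, 0), (2, 1, 0)):
    print(s, f"{total(s) * 100:.1f}%")
```

Writing channel 1 first instead passes through 4,1,0 (about 73.1%), which undershoots rather than overshoots, so reordering the writes alone does not remove the artefact in this case.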
The results of sample playback can be improved by better pre-processing of the data. At the very least it should be normalised to 100% to make the best use of the output range. The dynamic range can be compressed to make the sample sound louder and to make quiet sounds more reproducible. Quiet parts should be silenced or offset, to avoid a 1-bit noise floor in a 4-bit sample.
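A minimal pre-processing chain along these lines is sketched below for signed floating-point samples in the range -1.0 to 1.0, before any conversion to the player's format. The power-law compressor is just one simple choice of compression curve, and the gate silences whole blocks whose peak is below a threshold; the default exponent, threshold and block size are illustrative assumptions, not values taken from any game.

```python
# Sketch of a pre-processing chain: normalise, compress, then gate.
# Samples are signed floats in -1.0..1.0; all defaults are illustrative.
import math

def normalise(samples):
    """Scale so that the loudest sample reaches +/-1.0."""
    peak = max((abs(s) for s in samples), default=0.0)
    return list(samples) if peak == 0 else [s / peak for s in samples]

def compress(samples, exponent=0.7):
    """Power-law curve: exponent < 1 raises quiet samples more than loud ones."""
    return [math.copysign(abs(s) ** exponent, s) for s in samples]

def gate(samples, threshold=0.05, block=64):
    """Silence whole blocks whose peak stays below the threshold."""
    out = list(samples)
    for start in range(0, len(out), block):
        chunk = out[start:start + block]
        if max(abs(s) for s in chunk) < threshold:
            out[start:start + block] = [0.0] * len(chunk)
    return out

def prepare(samples, threshold=0.05, exponent=0.7, block=64):
    return gate(compress(normalise(samples), exponent), threshold, block)
```

Gating after compression means the threshold applies to the boosted signal, so it should be chosen with the compression curve in mind.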
In almost all cases, the underlying audio data will be based on a uniform sampling rate. If it is played back at a non-uniform rate because of branches in the code (for example, to fetch the next byte for samples of fewer than 8 bits), it may produce unwanted effects. In practice, however, this does not seem to matter very much, and many games have some small non-uniformity. It is usually possible to cancel out the branching effect by adding some time-wasting opcodes to the faster branches.
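The padding calculation can be sketched as follows. The T-state counts are hypothetical inputs, `CPU_HZ` is the nominal NTSC Master System CPU clock of 3579545 Hz, and the fillers are assumed to be 4 T-state instructions (such as `nop`) and 7 T-state instructions (such as `ld r,n` into a spare register).

```python
# Hypothetical timing helper for a two-path Z80 playback loop. The T-state
# counts are inputs you would count from your own code; CPU_HZ is the
# nominal NTSC Master System CPU clock.

CPU_HZ = 3_579_545

def sample_rate(t_states_per_sample):
    """Playback rate in Hz for a loop taking this many T-states per sample."""
    return CPU_HZ / t_states_per_sample

def pad_with(diff):
    """Fewest 4-T and 7-T filler instructions totalling `diff` T-states.

    Returns (number of 4-T fillers, number of 7-T fillers), or None if
    no combination of the two sizes adds up exactly."""
    if diff < 0:
        raise ValueError("diff must be non-negative")
    for sevens in range(diff // 7, -1, -1):  # more 7s means fewer fillers
        rest = diff - 7 * sevens
        if rest % 4 == 0:
            return rest // 4, sevens
    return None
```

For a loop whose fast path takes 120 T-states and slow path 147, `pad_with(147 - 120)` gives five 4-T-state fillers and one 7-T-state filler. Some small differences (1, 2, 3, 5, 6, 9, 10, 13 or 17 T-states) cannot be made from these two sizes alone.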
For all sample playback methods, quality generally improves as the sampling rate goes up, but at a high cost in ROM space. No game plays samples as fast as the CPU can go; they all include some sort of busy wait to limit the rate.
The lowest quality audio seen in games is around 4kHz at 1 bit, which can fit 32.8s of audio in 16KB. The highest is around 21kHz at 4 bits, which can fit 1.6s of audio in 16KB.
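These figures follow directly from the data rate: a 16KB bank (16384 bytes, the size of a Sega mapper page) holds 131072 bits, which is divided by the sampling rate times the bits per sample. A small helper, with the 16KB default as its only assumption, reproduces them.

```python
# ROM cost of sampled audio: how long a sample fits in a bank, and how many
# bytes a given sample needs. 16KB is taken as 16384 bytes.
import math

BANK_BYTES = 16 * 1024

def seconds_per_bank(rate_hz, bits_per_sample, bank_bytes=BANK_BYTES):
    """Seconds of audio that fit in bank_bytes at the given rate and depth."""
    return bank_bytes * 8 / (rate_hz * bits_per_sample)

def bytes_needed(seconds, rate_hz, bits_per_sample):
    """ROM bytes needed for a sample of the given length (rounded up)."""
    return math.ceil(seconds * rate_hz * bits_per_sample / 8)

print(f"{seconds_per_bank(4000, 1):.1f} s")   # lowest-quality case: 32.8 s
print(f"{seconds_per_bank(21000, 4):.1f} s")  # highest-quality case: 1.6 s
```

The 8-bit lookup method costs one byte per sample, so at 8 kHz a 16KB bank holds just over two seconds.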