Introduction
Most games store their graphics compressed, to save ROM space/allow more graphics in the same space. This page documents some of the schemes used.
Deinterleaving
Many compression formats conceptually deinterleave the data before compressing it. For tiles, this effectively means the four bitplanes are compressed individually; for tiles, it means the high tile number bit and flags are compressed as one data block and the low tile numbers are compressed as another.
"Phantasy Star" RLE
This format is used to compress both tiles and tilemap data.
The data is deinterleaved, and each then classified into runs of consecutive identical bytes, and runs of consecutive non-identical bytes. Any runs of more than 127 bytes are split into multiple runs. They are then stored in the format:
%0nnnnnnn dd ; run of n consecutive identical bytes, value dd
%1nnnnnnn dd dd dd... ; run of n consecutive non-identical bytes; values follow
%00000000 ; end of data block
Used in
Sample decoder
; Decompresses tile data from hl to VRAM address de
LoadTiles4BitRLENoDI:
ld b,$04
-:push bc
push de
call _f ; called 4 times for 4 bitplanes
pop de
inc de ; next bitplane
pop bc
djnz -
ret
__:
ld a,(hl) ; read count byte <----+
inc hl ; increment pointer |
or a ; return if zero |
ret z ; |
; |
ld c,a ; get low 7 bits in b |
and $7f ; |
ld b,a ; |
ld a,c ; set z flag if high |
and $80 ; bit = 0 |
; |
-:call SetVRAMAddressToDE ; <--+ |
ld a,(hl) ; Get data byte in a | |
out ($be),a ; Write it to VRAM | |
jp z,+ ; If z flag then -+ | |
; skip inc hl | | |
inc hl ; | | |
; | | |
+:inc de ; Add 4 to de <----+ | |
inc de ; | |
inc de ; | |
inc de ; | |
djnz - ; repeat block -----+ |
; b times |
jp nz,_b ; If not z flag -------+
inc hl ; inc hl here instead |
jp _b ; repeat forever ------+
; (zero count byte quits)
"Wonder Boy" RLE
The data is deinterleaved, and then classified into runs of consecutive identical bytes, and runs of consecutive non-identical bytes. Any runs of more than 256 identical bytes are split into multiple runs. They are then stored in the format:
$00 nn dd ; run of n consecutive identical bytes, value dd
$ff dd ; run of 2 consecutive identical bytes, value dd
$00 $00 ; end of data block
<any other value> ; raw data
Raw data with value $00 or $ff must be stored as a "run".
Used in
Sample decoder
; hl = source
; de = dest
decompress:
ex de,hl
ld c,4 ; 4 bitplanes
_OuterLoop:
push hl
_Loop:
ld a,(de) ; get byte
or a ; if zero, it is either RLE or the end of the bitplane
jr z,_RLE
inc a ; if $ff, write the following byte twice
jr z,_WriteNextByteTwice
dec a ; else write the byte itself
_WriteByteIncrementBothAndLoop:
call WriteAToVRAMAtHL
inc hl
inc hl
inc hl
inc hl
_IncrementSrcAndLoop:
inc de ; move to next byte and repeat
jr _Loop
_WriteNextByteTwice:
inc de ; get next byte
ld a,(de)
call WriteAToVRAMAtHL ; write it to VRAM
inc hl ; skip bitplane
inc hl
inc hl
inc hl
jr _WriteByteIncrementBothAndLoop ; then write it again and loop
_RLE:
inc de ; get next byte = RLE count
ld a,(de)
inc de
or a ; if it is 0, it is actually the end of the bitplane
jr z,_EndOfBitplane
ld b,a ; output the next byte repeatedly
ld a,(de)
-:call WriteAToVRAMAtHL
inc hl
inc hl
inc hl
inc hl
djnz -
jr _IncrementSrcAndLoop ; move to next byte and repeat
_EndOfBitplane:
pop hl ; move original dest pointer on by 1
inc hl
dec c ; loop 4 times for 4 bitplanes
jr nz,_OuterLoop
ret ; done
"Phantasy Star Gaiden" dictionary coding
The data for a single tile is de-interleaved into the four bitplanes, each 8 bytes long. Each bitplane is then classified into many categories:
- All bits are unset
- All bits are set
- The data is compressible because
- Three or more bytes are identical (some common value)
- Three or more bytes match the corresponding bytes in an earlier bitplane
- Three or more bytes are the inverse (complement) of the corresponding bytes in an earlier bitplane
- None of the above
A bitplane may fall into more than one category; the one that stores its data in the least space is chosen. The first-level categorisation for all four bitplanes is stored in a single byte:
%aabbccdd ; encoding method for bitplanes (high order bits for first bitplane encountered)
where:
%00 = all bits are unset
%01 = all bits are set
%10 = compression
%11 = raw
- For methods %00 and %01, no more data is needed.
- For method %11, the next eight bytes are the uncompressed bitplane data.
$dd $dd $dd $dd $dd $dd $dd $dd ; 8 bytes of raw data
- For method %10, the next byte defines the category:
%000000pp ; [A] entire bitplane is duplicate of bitplane p
%000100pp ; [B] entire bitplane is inverse of bitplane p
%001000pp ; [C] some of bitplane is duplicate of bitplane p
%010000pp ; [D] some of bitplane is inverse of bitplane p
<any other value> ; [E] some of bitplane is a common value
Note that %pp must be either 0, 1 or 2, and must be less than the index of the bitplane currently being decoded. If it is 3, that counts as <any other value>.
- If [A], no more data is needed - the earlier bitplane is copied.
- If [B], no more data is needed - the earlier bitplane is copied and inverted.
- If [C], the next byte is a bitmask. The MSB is 1 if the first byte of the bitplane should be copied and 0 if it should not, down to the LSB corresponding to the 8th byte. For each 0 bit encountered, there is an additional data byte giving the wanted value.
$mm $dd $dd ... ; m = mask, d... = extra raw data
- If [D], the next byte is a bitmask, similar to [C] but with the copied value being inverted. Again, the following bytes give values for the non-copied byte.
$mm $dd $dd ... ; m = mask, d... = extra raw data
- If [E], the category byte itself is a bitmask, similar to [C] and [D]. However, instead of copying the data from an earlier bitplane, the 1s indicate that the decompressed byte value is equal to the "common" byte following the bitmask. Again, there then follow more bytes giving the data corresponding to the 0s.
$cc $dd $dd ... ; m = mask, c = common value, d... = extra raw data
Due to the way the [A]-[D] are encoded, there will never be more than two bits set in the category byte. Thus, by only allowing type [E] when there are three or more "common" bytes, using the category byte itself as the bitmask for type [E] does not cause any ambiguity.
Multiple compressed tile data blocks are concatenated and preceded by a word indicating the number of tiles.
Used in
- Phantasy Star Gaiden
- Phantasy Star Retranslation (unofficial)
Sample decoder
.define vram_ptr $DFC0 ; word: VRAM address
.define buffer $DFD0 ; 32-byte decompression buffer
; hl = dest
; ix = src
decompress:
ld (vram_ptr),hl ; cache VRAM address
ld c,(ix) ; bc = number of tiles
inc ix
ld b,(ix)
inc ix
_DecompressTile:
push bc ; save number of tiles
ld b,$04 ; count 4 bitplanes
ld de,hl ; write to de
ld c,(ix) ; c = encoding information for 4 bitplanes
inc ix
_DecompressBitplane:
rlc c ; %0x = all bits either 0 or 1
jr nc,_AllTheSame
rlc c ; %11 = raw data
jr c,_RawData
_Compressed:
ld a,(ix) ; get method byte
inc ix
ex de,hl ; get bitplane, if it's referring to one
ld d,a
and $03
add a,a ; calculate address of that bitplane
add a,a ; = buffer + bitplane * 8
add a,a
ld e,a
ld a,d ; get method byte back
ld d,$00
ld iy,buffer
add iy,de ; now iy points to the referred to bitplane
ex de,hl
; now check the method byte
cp $03 ; %000000pp
jr c,_DuplicateBitplane
cp $10
jr c,_CommonValue
cp $13 ; %000100pp
jr c,_DuplicateBitplaneInvert
cp $20
jr c,_CommonValue
cp $23 ; %001000pp
jr c,_DuplicateBitplanePartial
cp $40
jr c,_CommonValue
cp $43 ; %010000pp
jr c,_DuplicateBitplanePartialInvert
; fall through
_CommonValue:
ld h,a ; h = bitmask
ld l,(ix) ; l = common value
inc ix
jr _OutputCommonValue
_RawData:
ld h,$00 ; empty bitmask; no common value
jr _OutputCommonValue
_AllTheSame:
rlc c ; get next bit into carry
sbc a,a ; will make $00 if carry = 0, $ff if it's 1
ld l,a ; that's the common value
ld h,$ff ; full bitmask
; fall through
_OutputCommonValue:
push bc
ld b,8 ; loop counter
-: ld a,l ; get common value
rlc h ; get bit out of bitmask
jr c,+ ; if 1, use the common value
ld a,(ix) ; else get it from (ix++)
inc ix
+: ld (de),a ; write to dest
inc de
djnz - ; loop over 8 bytes
pop bc
jr _BitplaneDone
_DuplicateBitplane:
ld hl,$ff00 ; full copy bitmask, empty inversion bitmask
jr _OutputDuplicate
_DuplicateBitplaneInvert:
ld hl,$ffff ; full copy bitmask, full inversion bitmask
jr _OutputDuplicate
_DuplicateBitplanePartial:
ld h,(ix) ; get copy bitmask
ld l,$00 ; empty inversion bitmask
inc ix
jr _OutputDuplicate
_DuplicateBitplanePartialInvert:
ld h,(ix) ; get copt bitmask
ld l,$ff ; full inversion bitmask
inc ix
; fall through
_OutputDuplicate:
push bc
ld b,8 ; loop counter
-: ld a,(iy) ; read byte to copy
inc iy
xor l ; apply inversion mask
rlc h ; get bit out of bitmask
jr c,+ ; if 1, use the copied value
ld a,(ix) ; else get it from (ix++)
inc ix
+: ld (de),a ; write to dest
inc de
djnz - ; loop over 8 bytes
pop bc
; fall through
_BitplaneDone:
dec b ; decrement bitplane counter
jp nz,_DecompressBitplane ; loop if not zero
_OutputTileToVRAM:
ld hl,(vram_ptr)
call SetVRAMAddressToHL
ld de,$0008 ; we are interleaving every 8th byte
ld c,e ; counter for the interleaving run
ld hl,buffer ; point at data to write
--: ld b,4 ; there are 4 bytes to interleave
push hl
-: ld a,(hl) ; read byte
out ($be),a; write to vram
add hl,de ; skip 8 bytes
djnz -
pop hl
inc hl ; next interleaving run
dec c
jr nz,--
; Add 32 bytes to vram_ptr
ld hl,(vram_ptr)
ld bc,32
add hl,bc
ld (vram_ptr),hl
pop bc
dec bc ; next tile
ld a,b
or c
jp nz,_DecompressTile
ret ; done
"Sylvan Tale" LZ
This is used to compress the tilemap.
The data is split into raw data and LZ lookups. Raw data is by definition one byte. LZ lookups have a maximum offset of -4096 bytes and a maximum length of 18 bytes. The split data is then arranged in groups of eight, as follows:
%ffffffff ; eight flags, packed right-to-left. 1 = raw, 0 = LZ
<block data> x 8 where <block data> is either:
$dd ; raw: data
or:
$looo ; LZ: length l+3, offset -4096 + ooo
or:
$0000 ; LZ: end
Used in
Sample decompressor
;============================================================
;tilemap data decompression routine from Sylvan Tale
;
; DATA is stored into RAM, not VRAM. can be modified to do so but have to read from vram.
;
; Hl = address of compressed data
; DE = RAM address to load data into
SylvanTaleTilemapDecompression:
--:
ld c,(hl) ; load control byte
inc hl ;
ld b,$08 ; repeat 8 times
-:
rr c ; evaluate a bit from control byte
jr nc,TD_decomp ; if bit 0 load new pointer
ldi ; (de) = (hl)
inc bc ; restore bc from the ldi
jr + ;
TD_decomp: ; loads new source pointer + count byte
push bc ;
ld c,(hl) ;
inc hl ;
ld b,(hl) ;
inc hl ; bc = (hl)
ld a,c ;
or b ;
jr z,TD_exit ; finish if bc = 0;
push hl ;
ld a,b ;
or $f0 ;
ld h,a ;
ld l,c ;
add hl,de ; hl = de + (bc OR $f000) = new pointer
ld a,b ;
and $f0 ;
rrca ;
rrca ;
rrca ;
rrca ; a = high nibble of b
add a,$03 ;
ld c,a ;
ld b,$00 ; counter = high nibble of b + 3
ldir ; load out previous data
pop hl ; restore pointer
pop bc ;
+
djnz - ;
jr -- ;
TD_exit:
pop bc ;
ret ;
"High School Kimengumi" RLE
This is an RLE variant. It is used to decode both tiles and tilemap data.
The data is deinterleaved, and then classified into runs of up to 127 identical bytes and runs of up to 127 non-identical bytes. These are then stored in the format:
$nnnn ; size of data in bytes / interleaving factor (4 for tiles, 2 for tilemap)
followed by multiples of:
%0nnnnnnn dd ; run of n consecutive identical bytes, value dd
%1nnnnnnn dd dd dd... ; run of n consecutive non-identical bytes; values follow
followed by:
%00000000 ; end of data block
This has a minor advantage over "Phantasy Star" compression because it can encode runs across the end of a bitplane, and saves a byte by having 2 bytes length + 1 terminator rather than four bitplane terminators. (The terminator would also be unnecessary if the code were to count the bitplanes and stop at the end.)
Used in
Sample decoder
.enum $c141 ; memory used: 4 bytes
rowCount dw
rowCountTotal dw
.ende
.section "High School! Kimengumi tile decompressor" free
; Usage:
; bc = source data
; de = destination (VRAM address ORed with $4000)
decode:
call _decode
; clean up
xor a
ld (rowCount),a
ld (rowCount+1),a
ld (rowCountTotal),a
ld (rowCountTotal+1),a
ret
_decode:
ld a,(bc) ; hl = (bc)
ld l,a
inc bc
ld a,(bc)
ld h,a
dec bc
ld (rowCount),hl ; load row count into RAM
ld (rowCountTotal),hl
ld h,b ; hl = bc
ld l,c
inc hl ; hl += 2
inc hl
_nextBlock:
ld a,(hl) ; read control byte
or a ; 0 = terminator
ret z
bit 7,a ; if bit 7 = 1 then it's raw
jr nz,+
; RLE
ld b,a ; run length
inc hl ; run value
-: call _outputByte
djnz -
inc hl ; next control byte
jp _nextBlock
+: ; Raw
and $7f ; length
ld b,a
inc hl ; value
-: call _outputByte
inc hl
djnz -
jp _nextBlock
_outputByte:
; writes a byte from hl to VRAM address de with interleaving 4
; set VRAM address
ld a,e
out ($bf),a
ld a,d
out ($bf),a
; write value
ld a,(hl)
out ($be),a
; decrement row counter
push hl
ld hl,(rowCount)
dec hl
ld a,h ; if zero, move to next bitplane
or l
jr z,_setNextBitplane
ld (rowCount),hl
inc de ; de += 4 (interleaving)
inc de
inc de
inc de
pop hl
ret
_setNextBitplane:
ld hl,(rowCountTotal) ; restore the counter
ld (rowCount),hl
; now:
; - hl = number of tile rows
; - de = last byte written in previous bitplane
; so we want to set de = de - hl * 4 + 5
; first calculate hl = hl * 4
xor a ; clear carry
rl l ; shift hl left by 1
rl h
xor a ; repeat
rl l
rl h
; subtract hl and add 5
ex de,hl
xor a ; clear carry
sbc hl,de
inc hl
inc hl
inc hl
inc hl
inc hl
ex de,hl
pop hl
ret
.ends
Benchmark
| Compressor | Tile compression | Tilemap compression |
| pscompr | 38% | 60% |
| psgcompr | 53% | n/a |
| stcompr | n/a | 50% |
Note that this benchmark is incomplete. It needs a larger corpus of images and is missing a Wonder Boy compressor. It is based on BMP2Tile for the compression and a Windows batch file for the analysis.
Attach:benchmark.zip