Introduction
Most games store their graphics compressed, to save ROM space/allow more graphics in the same space. This page documents some of the schemes used.
Deinterleaving
Many compression formats conceptually deinterleave the data before compressing it. For tiles, this effectively means the four bitplanes are compressed individually; for tilemaps, it means the high tile number bit and flags are compressed as one data block and the low tile numbers are compressed as another.
"Phantasy Star" RLE
This format is used to compress both tiles and tilemap data.
The data is deinterleaved, and each bitplane is then classified into runs of consecutive identical bytes, and runs of consecutive non-identical bytes. Any runs of more than 127 bytes are split into multiple runs. They are then stored in the format:
%0nnnnnnn dd ; run of n consecutive identical bytes, value dd
%1nnnnnnn dd dd dd... ; run of n consecutive non-identical bytes; values follow
%00000000 ; end of bitplane
Used in
Sample decoder
; Decompresses tile data from hl to VRAM address de
LoadTiles4BitRLENoDI:
ld b,$04
-:push bc
push de
call _f ; called 4 times for 4 bitplanes
pop de
inc de ; next bitplane
pop bc
djnz -
ret
__:
ld a,(hl) ; read count byte <----+
inc hl ; increment pointer |
or a ; return if zero |
ret z ; |
; |
ld c,a ; get low 7 bits in b |
and $7f ; |
ld b,a ; |
ld a,c ; set z flag if high |
and $80 ; bit = 0 |
; |
-:call SetVRAMAddressToDE ; <--+ |
ld a,(hl) ; Get data byte in a | |
out ($be),a ; Write it to VRAM | |
jp z,+ ; If z flag then -+ | |
; skip inc hl | | |
inc hl ; | | |
; | | |
+:inc de ; Add 4 to de <----+ | |
inc de ; | |
inc de ; | |
inc de ; | |
djnz - ; repeat block -----+ |
; b times |
jp nz,_b ; If not z flag -------+
inc hl ; inc hl here instead |
jp _b ; repeat forever ------+
; (zero count byte quits)
"Wonder Boy" RLE
The data is deinterleaved, and then classified into runs of consecutive identical bytes, and runs of consecutive non-identical bytes. Any runs of more than 255 identical bytes are split into multiple runs.
They are then stored in the format:
$00 nn dd ; run of n consecutive identical bytes, value dd
$ff dd ; run of 2 consecutive identical bytes, value dd
$00 $00 ; end of data block
<any other value> ; raw data
Raw data with value $00 or $ff must be stored as a "run".
Used in
- Wonder Boy
- Fantasy Zone: The Maze / Opa Opa
Sample decoder
; hl = source
; de = dest
decompress:
ex de,hl
ld c,4 ; 4 bitplanes
_OuterLoop:
push hl
_Loop:
ld a,(de) ; get byte
or a ; if zero, it is either RLE or the end of the bitplane
jr z,_RLE
inc a ; if $ff, write the following byte twice
jr z,_WriteNextByteTwice
dec a ; else write the byte itself
_WriteByteIncrementBothAndLoop:
call WriteAToVRAMAtHL
inc hl
inc hl
inc hl
inc hl
_IncrementSrcAndLoop:
inc de ; move to next byte and repeat
jr _Loop
_WriteNextByteTwice:
inc de ; get next byte
ld a,(de)
call WriteAToVRAMAtHL ; write it to VRAM
inc hl ; skip bitplane
inc hl
inc hl
inc hl
jr _WriteByteIncrementBothAndLoop ; then write it again and loop
_RLE:
inc de ; get next byte = RLE count
ld a,(de)
inc de
or a ; if it is 0, it is actually the end of the bitplane
jr z,_EndOfBitplane
ld b,a ; output the next byte repeatedly
ld a,(de)
-:call WriteAToVRAMAtHL
inc hl
inc hl
inc hl
inc hl
djnz -
jr _IncrementSrcAndLoop ; move to next byte and repeat
_EndOfBitplane:
pop hl ; move original dest pointer on by 1
inc hl
dec c ; loop 4 times for 4 bitplanes
jr nz,_OuterLoop
ret ; done
"Phantasy Star Gaiden" dictionary coding
The data for a single tile is de-interleaved into the four bitplanes, each 8 bytes long. Each bitplane is then classified into many categories:
- All bits are unset
- All bits are set
- The data is compressible because
- Three or more bytes are identical (some common value)
- Three or more bytes match the corresponding bytes in an earlier bitplane
- Three or more bytes are the inverse (complement) of the corresponding bytes in an earlier bitplane
- None of the above
A bitplane may fall into more than one category; the one that stores its data in the least space is chosen. The first-level categorisation for all four bitplanes is stored in a single byte:
%aabbccdd ; encoding method for bitplanes (high order bits for first bitplane encountered)
where:
%00 = all bits are unset
%01 = all bits are set
%10 = compression
%11 = raw
- For methods %00 and %01, no more data is needed.
- For method %11, the next eight bytes are the uncompressed bitplane data.
$dd $dd $dd $dd $dd $dd $dd $dd ; 8 bytes of raw data
- For method %10, the next byte defines the category:
%000000pp ; [A] entire bitplane is duplicate of bitplane p
%000100pp ; [B] entire bitplane is inverse of bitplane p
%001000pp ; [C] some of bitplane is duplicate of bitplane p
%010000pp ; [D] some of bitplane is inverse of bitplane p
<any other value> ; [E] some of bitplane is a common value
Note that %pp must be either 0, 1 or 2, and must be less than the index of the bitplane currently being decoded. If it is 3, that counts as <any other value>.
- If [A], no more data is needed - the earlier bitplane is copied.
- If [B], no more data is needed - the earlier bitplane is copied and inverted.
- If [C], the next byte is a bitmask. The MSB is 1 if the first byte of the bitplane should be copied and 0 if it should not, down to the LSB corresponding to the 8th byte. For each 0 bit encountered, there is an additional data byte giving the wanted value.
$mm $dd $dd ... ; m = mask, d... = extra raw data
- If [D], the next byte is a bitmask, similar to [C] but with the copied value being inverted. Again, the following bytes give values for the non-copied byte.
$mm $dd $dd ... ; m = mask, d... = extra raw data
- If [E], the category byte itself is a bitmask, similar to [C] and [D]. However, instead of copying the data from an earlier bitplane, the 1s indicate that the decompressed byte value is equal to the "common" byte following the bitmask. Again, there then follow more bytes giving the data corresponding to the 0s.
$cc $dd $dd ... ; m = mask, c = common value, d... = extra raw data
Due to the way the [A]-[D] are encoded, there will never be more than two bits set in the category byte. Thus, by only allowing type [E] when there are three or more "common" bytes, using the category byte itself as the bitmask for type [E] does not cause any ambiguity.
Multiple compressed tile data blocks are concatenated and preceded by a word indicating the number of tiles.
Used in
- Phantasy Star Gaiden
- Phantasy Star Retranslation (unofficial)
Sample decoder
.define vram_ptr $DFC0 ; word: VRAM address
.define buffer $DFD0 ; 32-byte decompression buffer
; hl = dest
; ix = src
decompress:
ld (vram_ptr),hl ; cache VRAM address
ld c,(ix) ; bc = number of tiles
inc ix
ld b,(ix)
inc ix
_DecompressTile:
push bc ; save number of tiles
ld b,$04 ; count 4 bitplanes
ld de,buffer ; write to de
ld c,(ix) ; c = encoding information for 4 bitplanes
inc ix
_DecompressBitplane:
rlc c ; %0x = all bits either 0 or 1
jr nc,_AllTheSame
rlc c ; %11 = raw data
jr c,_RawData
_Compressed:
ld a,(ix) ; get method byte
inc ix
ex de,hl ; get bitplane, if it's referring to one
ld d,a
and $03
add a,a ; calculate address of that bitplane
add a,a ; = buffer + bitplane * 8
add a,a
ld e,a
ld a,d ; get method byte back
ld d,$00
ld iy,buffer
add iy,de ; now iy points to the referred to bitplane
ex de,hl
; now check the method byte
cp $03 ; %000000pp
jr c,_DuplicateBitplane
cp $10
jr c,_CommonValue
cp $13 ; %000100pp
jr c,_DuplicateBitplaneInvert
cp $20
jr c,_CommonValue
cp $23 ; %001000pp
jr c,_DuplicateBitplanePartial
cp $40
jr c,_CommonValue
cp $43 ; %010000pp
jr c,_DuplicateBitplanePartialInvert
; fall through
_CommonValue:
ld h,a ; h = bitmask
ld l,(ix) ; l = common value
inc ix
jr _OutputCommonValue
_RawData:
ld h,$00 ; empty bitmask; no common value
jr _OutputCommonValue
_AllTheSame:
rlc c ; get next bit into carry
sbc a,a ; will make $00 if carry = 0, $ff if it's 1
ld l,a ; that's the common value
ld h,$ff ; full bitmask
; fall through
_OutputCommonValue:
push bc
ld b,8 ; loop counter
-: ld a,l ; get common value
rlc h ; get bit out of bitmask
jr c,+ ; if 1, use the common value
ld a,(ix) ; else get it from (ix++)
inc ix
+: ld (de),a ; write to dest
inc de
djnz - ; loop over 8 bytes
pop bc
jr _BitplaneDone
_DuplicateBitplane:
ld hl,$ff00 ; full copy bitmask, empty inversion bitmask
jr _OutputDuplicate
_DuplicateBitplaneInvert:
ld hl,$ffff ; full copy bitmask, full inversion bitmask
jr _OutputDuplicate
_DuplicateBitplanePartial:
ld h,(ix) ; get copy bitmask
ld l,$00 ; empty inversion bitmask
inc ix
jr _OutputDuplicate
_DuplicateBitplanePartialInvert:
ld h,(ix) ; get copy bitmask
ld l,$ff ; full inversion bitmask
inc ix
; fall through
_OutputDuplicate:
push bc
ld b,8 ; loop counter
-: ld a,(iy) ; read byte to copy
inc iy
xor l ; apply inversion mask
rlc h ; get bit out of bitmask
jr c,+ ; if 1, use the copied value
ld a,(ix) ; else get it from (ix++)
inc ix
+: ld (de),a ; write to dest
inc de
djnz - ; loop over 8 bytes
pop bc
; fall through
_BitplaneDone:
dec b ; decrement bitplane counter
jp nz,_DecompressBitplane ; loop if not zero
_OutputTileToVRAM:
ld hl,(vram_ptr)
call SetVRAMAddressToHL
ld de,$0008 ; we are interleaving every 8th byte
ld c,e ; counter for the interleaving run
ld hl,buffer ; point at data to write
--: ld b,4 ; there are 4 bytes to interleave
push hl
-: ld a,(hl) ; read byte
out ($be),a; write to vram
add hl,de ; skip 8 bytes
djnz -
pop hl
inc hl ; next interleaving run
dec c
jr nz,--
; Add 32 bytes to vram_ptr
ld hl,(vram_ptr)
ld bc,32
add hl,bc
ld (vram_ptr),hl
pop bc
dec bc ; next tile
ld a,b
or c
jp nz,_DecompressTile
ret ; done
"Sylvan Tale" LZ
This is used to compress the tilemap.
The data is split into raw data and LZ lookups. Raw data is by definition one byte. LZ lookups have a maximum offset of -4096 bytes and a maximum length of 18 bytes. The split data is then arranged in groups of eight, as follows:
%ffffffff ; eight flags, packed right-to-left. 1 = raw, 0 = LZ
<block data> x 8 where <block data> is either:
$dd ; raw: data
or:
$looo ; LZ: length l+3, offset -4096 + ooo
or:
$0000 ; LZ: end
Used in
Sample decompressor
;============================================================
;tilemap data decompression routine from Sylvan Tale
;
; DATA is stored into RAM, not VRAM. can be modified to do so but have to read from vram.
;
; Hl = address of compressed data
; DE = RAM address to load data into
SylvanTaleTilemapDecompression:
--:
ld c,(hl) ; load control byte
inc hl ;
ld b,$08 ; repeat 8 times
-:
rr c ; evaluate a bit from control byte
jr nc,TD_decomp ; if bit 0 load new pointer
ldi ; (de) = (hl)
inc bc ; restore bc from the ldi
jr + ;
TD_decomp: ; loads new source pointer + count byte
push bc ;
ld c,(hl) ;
inc hl ;
ld b,(hl) ;
inc hl ; bc = (hl)
ld a,c ;
or b ;
jr z,TD_exit ; finish if bc = 0;
push hl ;
ld a,b ;
or $f0 ;
ld h,a ;
ld l,c ;
add hl,de ; hl = de + (bc OR $f000) = new pointer
ld a,b ;
and $f0 ;
rrca ;
rrca ;
rrca ;
rrca ; a = high nibble of b
add a,$03 ;
ld c,a ;
ld b,$00 ; counter = high nibble of b + 3
ldir ; load out previous data
pop hl ; restore pointer
pop bc ;
+
djnz - ;
jr -- ;
TD_exit:
pop bc ;
ret ;
"High School Kimengumi" RLE
This is an RLE variant. It is used to decode both tiles and tilemap data.
The data is deinterleaved, and then classified into runs of up to 127 identical bytes and runs of up to 127 non-identical bytes. These are then stored in the format:
$nnnn ; size of data in bytes / interleaving factor (4 for tiles, 2 for tilemap)
followed by multiples of:
%0nnnnnnn dd ; run of n consecutive identical bytes, value dd
%1nnnnnnn dd dd dd... ; run of n consecutive non-identical bytes; values follow
followed by:
%00000000 ; end of data block
This has a minor advantage over "Phantasy Star" compression because it can encode runs across the end of a bitplane, and saves a byte by having 2 bytes length + 1 terminator rather than four bitplane terminators. (The terminator would also be unnecessary if the code were to count the bitplanes and stop at the end.)
Used in
Sample decoder
.enum $c141 ; memory used: 4 bytes
rowCount dw
rowCountTotal dw
.ende
.section "High School! Kimengumi tile decompressor" free
; Usage:
; bc = source data
; de = destination (VRAM address ORed with $4000)
decode:
call _decode
; clean up
xor a
ld (rowCount),a
ld (rowCount+1),a
ld (rowCountTotal),a
ld (rowCountTotal+1),a
ret
_decode:
ld a,(bc) ; hl = (bc)
ld l,a
inc bc
ld a,(bc)
ld h,a
dec bc
ld (rowCount),hl ; load row count into RAM
ld (rowCountTotal),hl
ld h,b ; hl = bc
ld l,c
inc hl ; hl += 2
inc hl
_nextBlock:
ld a,(hl) ; read control byte
or a ; 0 = terminator
ret z
bit 7,a ; if bit 7 = 1 then it's raw
jr nz,+
; RLE
ld b,a ; run length
inc hl ; run value
-: call _outputByte
djnz -
inc hl ; next control byte
jp _nextBlock
+: ; Raw
and $7f ; length
ld b,a
inc hl ; value
-: call _outputByte
inc hl
djnz -
jp _nextBlock
_outputByte:
; writes a byte from hl to VRAM address de with interleaving 4
; set VRAM address
ld a,e
out ($bf),a
ld a,d
out ($bf),a
; write value
ld a,(hl)
out ($be),a
; decrement row counter
push hl
ld hl,(rowCount)
dec hl
ld a,h ; if zero, move to next bitplane
or l
jr z,_setNextBitplane
ld (rowCount),hl
inc de ; de += 4 (interleaving)
inc de
inc de
inc de
pop hl
ret
_setNextBitplane:
ld hl,(rowCountTotal) ; restore the counter
ld (rowCount),hl
; now:
; - hl = number of tile rows
; - de = last byte written in previous bitplane
; so we want to set de = de - hl * 4 + 5
; first calculate hl = hl * 4
xor a ; clear carry
rl l ; shift hl left by 1
rl h
xor a ; repeat
rl l
rl h
; subtract hl and add 5
ex de,hl
xor a ; clear carry
sbc hl,de
inc hl
inc hl
inc hl
inc hl
inc hl
ex de,hl
pop hl
ret
.ends
"Berlin Wall" LZ
This is used to compress tiles. It consists of a packed bitstream, read left to right, which operates on a 256 byte history buffer. The history buffer is used to avoid reading data back from VRAM, but an alternative implementation could avoid this.
The buffer is initialised to all values set to $20, and the current position to offset $ef. Each time a byte is emitted, it is written to the current position, and the position is incremented, wrapping from $ff to $00.
Format:
1 dddddddd: literal byte
0 oooooooo llll: copy l+2 bytes from buffer offset o (absolute offset, not relative to current position)
Used in
A representative hex search suggests it's not used in any other Sega 8-bit games.
Sample decoder
decompress:
; A = page number
; HL = source pointer
; DE = destination VRAM address (write bit set)
; BC = byte count / 16
; example:
; ld hl, $b255 ; Sega logo, font
; ld de, $6000
; ld bc, $00a0
; ld a, $02
; call $0948
; Memory usage:
; bitstream_next_byte_pointer dw pointer to next byte for bitstream
; buffer_output_pointer dw pointer to next byte for history buffer
; LZ_counter db counter for LZ bytes to emit
; LZ_source dw pointer to next byte for LZ match
; bitstream_byte db unconsumed bits from last byte read from bitstream
; bitstream_byte_bitcount db counter for bits left in bitstream_byte, must follow it
; history_buffer dsb 256 buffer of previous 256B emitted, must be aligned to 256B boundary
; format (bitstream):
; 1 dddddddd: emit literal d
; 0 oooooooo llll: emit l+2 bytes from buffer offset o
; at each call, we retrieve the next byte, put it into the history buffer and return it
; the history buffer starts with all values = $20 and next write position = $ef
ld ($ffff), a ; Page A
ld a, e ; Set VRAM address de
out ($bf), a
ld a, d
out ($bf), a
ld a, c ; manipulate bc for nested loop where c == 0, also swap b and c
ld c, b ; this lets us loop over b, then c, below
and a
jr z, +
inc c
+:ld b, a
call _init ; Could be inlined...
-:exx
.repeat 16 ; 16 bytes emitted for each iteration
call _getByte ; get next byte
out ($be), a ; emit
.endr
exx
djnz - ; loop
dec c
jp nz, -
ret
_init: ; initialise work RAM
ld a, (hl) ; read first bitstream byte
ld (bitstream_byte), a
inc hl ; increment and save pointer
ld (bitstream_next_byte_pointer), hl
exx
ld hl, history_buffer ; set buffer initial data
ld de, history_buffer+1
ld bc, 256-1
ld (hl), $20
ldir
xor a
ld (LZ_counter), a ; LZ_counter = 0
ld hl, history_buffer + $ef ; buffer_output_pointer, LZ_source point at $d9ef (buffer start point). This is mostly to get the right MSB.
ld (buffer_output_pointer), hl
ld (LZ_source), hl
ld a, 8
ld (bitstream_byte_bitcount), a ; bitstream_byte_bitcount = 8
exx
ret
_getByte: ; get a byte to emit
ld a, (LZ_counter) ; check if we are in a match
and a
jr nz, _getNextLZByte
ld b, 1 ; get next flag bit
call _getBits
and a
jr z, _lz
_literal:
ld b, 8 ; get value
call _getBits
_saveInBufferAndReturn:
ld hl, (buffer_output_pointer) ; save in buffer
ld (hl), a
inc l
ld (buffer_output_pointer), hl
ret ; return value in a
_lz:
ld b, 8 ; read source offset in buffer
call _getBits
ld (LZ_source), a ; save to LZ_source (pointer low byte)
ld b, 4 ; get run length
call _getBits
add a, 2 ; add 2
ld (LZ_counter), a ; save to LZ_counter
; fall through
_getNextLZByte:
ld hl, (LZ_source) ; get next byte (wrapping within buffer)
ld a, (hl)
inc l
ld (LZ_source), hl
ld hl, LZ_counter ; decrement counter
dec (hl)
jp _saveInBufferAndReturn
_getBits: ; gets b bits fro teh bitstream into a
ld hl, bitstream_byte ; get current byte
ld a, (hl)
inc l ; point at bitstream_byte_bitcount
ld c, 0
ld de, (bitstream_next_byte_pointer)
-:add a, a ; shift byte into c
rl c
dec (hl) ; decrement counter and check for zero
jr z, _nextDataByte
djnz - ; else loop b times
ld (bitstream_next_byte_pointer), de ; save bitstream pointer
dec l
ld (hl), a ; save remaining bits in bitstream_byte
ld a, c ; return consumed bits in a
ret
_nextDataByte: ; Ran out of bits in bitstream_byte
ld a, (de) ; get next byte from *de++
inc de
ld (hl), 8 ; set counter for 8 more bits
djnz - ; continue looping
ld (bitstream_next_byte_pointer), de ; duplicate exit, same as above
dec l
ld (hl), a
ld a, c
ret
LZ/RLE Compression present in games developed by SIMS
Games developed by SIMS use the same decompression routine, which is a mix of RLE and LZ.
Compression Mode is defined by bits 7,6 and 5.
7 = LZ
6 = RAW COPY
5 = RLE
No bit set = RLE DEFAULT
Used in
- Disney Alladin
- Master of Darkness
- Masters of Combat
- Ninja Gaiden
Sample decoder
;-----------------------------------------------------------------
; Master of Darknesss - Decompression Routine
;-----------------------------------------------------------------
.define LZ_WINDOW $D000
.define LZ_WINDOW_CURSOR $D800
.define COMPRESSED_DATA_SIZE $D802
.define LZ_COPY_LENGTH $D804
.define LZ_COPY_TEMP_BUFFER $D80E
.define _RAM_DFFF_ $DFFF
.define _RAM_FFFF_ $FFFF
; Reset
RESET:
dec bc
ld (hl), a
ld d, h
ld e, l
inc de
ldir
ret
; Starts Decompression
; HL = Compressed Data Address
DECOMPRESSOR:
ld b, (hl)
inc hl
-:
push bc
ld e, (hl)
inc hl
ld d, (hl)
inc hl
push hl
ex de, hl
ld b, (hl)
inc hl
ld e, (hl)
inc hl
ld d, (hl)
inc hl
ld a, (hl)
inc hl
ld h, (hl)
ld l, a
xor a
call +
pop hl
pop bc
djnz -
ret
+:
ld a, (_RAM_DFFF_)
push af
ld a, b
ld (_RAM_FFFF_), a
call CONFIGURE_LZ_WINDOW
call CONFIGURE_VDP_ADDRESS
call DECOMPRESSOR_LOOP
call CONFIGURE_LZ_WINDOW
pop af
ld (_RAM_FFFF_), a
ret
CONFIGURE_LZ_WINDOW:
ex af, af'
exx
xor a
ld hl, LZ_WINDOW
ld bc, $0820
; RESET
rst $08
exx
ex af, af'
ret
CONFIGURE_VDP_ADDRESS:
ld c, Port_VDPAddress
out (c), e
out (c), d
; Read DATA SIZE, first 2 bytes from COMPRESSED DATA
ld c, (hl)
inc hl
ld b, (hl)
inc hl
add hl, bc
ld (COMPRESSED_DATA_SIZE), hl
or a
sbc hl, bc
ret
DECOMPRESSOR_LOOP:
ld bc, (COMPRESSED_DATA_SIZE)
or a
sbc hl, bc
ret nc
add hl, bc
call _CHECK_DECOMPRESSION_TYPE
jp DECOMPRESSOR_LOOP
; CHECK DECOMPRESSION TYPE
; BIT 7 = LZ
; BIT 6 = RAW
; BIT 5 = RLE COPY
; NO BIT = RLE COPY DEFAULT REPETIONS
_CHECK_DECOMPRESSION_TYPE:
ld a, (hl) ; Load byte from HL (CURSOR) in A
inc hl ; Increment HL (CURSOR)
bit 7, a ; Test 7th bit of A
jp z, _LZ_COPY ; If set do LZ COPY
bit 6, a ; Test 6th bit of A
jp z, _RAW_COPY ; If set do RAW COPY
push af
bit 5, a ; Test 5th bit of A
jr z, _RLE_COPY ; If set do RLE COPY
; GET RLE REPETITIONS
; -------------------------------
; If all bit test fails
and $07 ; Get first 3 bits of Byte
ld b, a
ld a, (hl) ; Load next byte from HL (CURSOR) in A
inc hl ; Increment HL (CURSOR)
ld c, a
jp _RLE_COPY_GET_LENGTH ; Skip default GET RLE REPETITIONS
; and GET RLE LENGTH
; Performs RLE Copy
; BC = REPETITIONS
; E = LENGTH
_RLE_COPY:
; GET RLE REPETITIONS
; -------------------------------
and $07 ; Get first 3 bits of Byte
ld c, a
ld b, $00
_RLE_COPY_GET_LENGTH:
inc bc ; Increment REPETITIONS
inc bc ; Increment REPETITIONS
pop af
; GET RLE LENGTH
; -------------------------------
rra ; Bitshift a >> 3
rra
rra
and $03 ; Get first 2 bits of byte
inc a ; Increment LENGTH
ld e, a
--: ; <<<<<<<<<<<<|
push bc ; |
push hl ; |
ld d, e ; |
; |
; |
-: ; <<<<<<<<| |
ld a, (hl) ; Load Byte from CURSOR (HL) in A | |
out (Port_VDPData), a ; Copy Byte to VRAM | |
call _STORE_IN_LZ_WINDOW ; Copy Byte do LZ_WINDOW | |
dec d ; Decrement D (LENGTH) | |
jr nz, - ; If NOT ZERO loop again >>>>>>>>| |
; |
pop hl ; |
pop bc ; |
dec bc ; Decrement BC (REPETITIONS) |
ld a, b ; |
or c ; |
jr nz, -- ; If NOT ZERO loop again >>>>>>>>>>>>|
ld d, $00
add hl, de ; Add DE (LENGTH) to HL (CURSOR POSITION)
ret
; PERFORMS LZ COPY
_LZ_COPY:
push hl
; GET DISTANCE FROM LZ PAIR
; -------------------------------
ld l, a
ld h, $00
add hl, hl ; Bitshift HL << 4
add hl, hl
add hl, hl
add hl, hl
ld b, h
ld c, l
pop hl
; GET DISTANCE IN LZ PAIR
; -------------------------------
ld a, (hl)
rra ; Bitshift A >> 4
rra
rra
rra
and $0F ; Get first 4 bits from Byte
or c ; or C
ld c, a
push hl
ld hl, (LZ_WINDOW_CURSOR) ; Store LZ_WINDOW CURSOR position in HL
or a ; or A
sbc hl, bc ; Subtract BC (LZ_WINDOW CURSOR) from HL (DISTANCE)
ld a, h
and $07 ; Get first 3 bits of A
ld b, a
ld c, l
pop hl
; GET LENGTH IN LZ PAIR
; -------------------------------
ld a, (hl) ; Load next byte from CURSOR (HL) in A
inc hl ; Increment (HL) CURSOR
and $0F ; Get first 4 bits of A (LENGTH)
add a, $02 ; Add 2 to A
ld (LZ_COPY_LENGTH), a ; Store A in LZ_COPY_LENGTH
push hl
push af
push de
ld de, LZ_COPY_TEMP_BUFFER
-: ; <<<<<<<<<<<<|
ld hl, LZ_WINDOW ; Set HL (CURSOR) as LZ_WINDOW |
add hl, bc ; Add DISTANCE in HL |
ld a, (hl) ; Load Byte from HL (CURSOR) |
ld (de), a ; Copy Byte to Temp Buffer |
out (Port_VDPData), a ; Copy Byte to VRAM |
inc de ; Increment Temp Buffer |
inc bc ; Increment DISTANCE |
ld a, b ; |
and $07 ; Get first 3 bits from Byte |
ld b, a ; |
ld hl, LZ_COPY_LENGTH ; |
dec (hl) ; Decrement LENGTH |
jr nz, - ; If NOT ZERO loop again >>>>>>>>>>>>|
pop hl
pop af
ld hl, LZ_COPY_TEMP_BUFFER ; Set HL (CURSOR) as Temp Buffer position
ld d, a
-:
; Copy Temp Buffer to LZ_WINDOW
; -------------------------------
; <<<<<<<<<<<<|
call _STORE_IN_LZ_WINDOW ; Copy Byte to LZ WINDOW |
dec d ; Decrement LENGHT |
jr nz, - ; If NOT ZERO loop again >>>>>>>>>>>>|
pop hl
ret
; PERFORMS RAW COPY
; D = LENGTH
_RAW_COPY:
; GET LENGTH
; -------------------------------
and $3F ; Get first 6 bits from Byte
inc a ; Increment LENGTH
ld d, a
-: ; <<<<<<<<|
ld a, (hl) ; Load next byte in A |
out (Port_VDPData), a ; Copy Byte to VRAM |
call _STORE_IN_LZ_WINDOW ; Copy ByteA to LZ_WINDOW |
dec d ; Decrement D |
jr nz, - ; If NOT ZERO loop again >>>>>>>>|
ret
; STORE BYTE IN LZ WINDOW
_STORE_IN_LZ_WINDOW:
ld a, (hl) ; Load byte from CURSOR (HL) in A
inc hl ; Increment CURSOR (HL)
push hl
ld hl, LZ_WINDOW ; Store LZ_WINDOW BASE address in HL
ld bc, (LZ_WINDOW_CURSOR) ; Store LZ_WINDOW CURSOR position in BC
add hl, bc ; Add BC in HL (BASE address + CURSOR)
ld (hl), a ; Store Byte in HL (LZ_WINDOW)
inc bc ; Increment BC (LZ_WINDOW CURSOR position)
res 3, b
ld (LZ_WINDOW_CURSOR), bc ; Store LZ_WINDOW CURSOR position
pop hl
ret
"Sonic 2" Decompression Routine
The data starts with an unused header (always $01 $00), then a tile count (two bytes, little endian), then an offset (again two bytes, little endian). Next is a block of raw tile data, and then a block of tile descriptors (pointed to by the offset), containing two bits per tile, right to left:
Bits | Meaning |
%00 | Tile is 32 zeroes |
%01 | Tile is 32 bytes of raw data (uncompressed) |
%10 | Tile is compressed, see below |
%11 | Tile is compressed after having been XORed |
If compressed, the next part of the raw data contains four bytes, defining the 32 bytes of the tile either as zero (%0) or a further byte from the raw data block (%1). If the XOR flag is set, then an XOR operation is performed over the tile data before the compression; this operation is intended to make it so consecutive identical bytes in a bitplane turn into zeroes, allowing this compression to succeed. Unfortunately it does that wrong and the compression rate is lower than what would have been if had it done correctly.
Used in
These games were all apparently developed by Aspect or Arc System Works.
Sample decoder
Sonic2TileLoader_Decompress:
; Set VDP address
ld c,$bf
out (c),e
out (c),d
push hl
; Skip header
inc hl
inc hl
; Read tile count
ld a, (hl)
ld (Sonic2TileLoader_TileCount), a
inc hl
ld a, (hl)
ld (Sonic2TileLoader_TileCount+1), a
inc hl
; Read bitstream pointer
ld e, (hl)
inc hl
ld d, (hl)
inc hl
; Data starts here
ld (Sonic2TileLoader_DataPointer), hl
pop hl
add hl, de
ld (Sonic2TileLoader_BitStreamPointer), hl
; Reset the consumed tile counter
xor a
ld (Sonic2TileLoader_BitstreamTileCounter), a
➿:
ld a, (Sonic2TileLoader_BitstreamTileCounter)
cp 4
jr nz, +
; Read next byte from bitstream
ld hl, (Sonic2TileLoader_BitStreamPointer)
inc hl
ld (Sonic2TileLoader_BitStreamPointer), hl
xor a
ld (Sonic2TileLoader_BitstreamTileCounter), a
+:; Get the current bitstream byte
ld b, a
ld hl, (Sonic2TileLoader_BitStreamPointer)
ld a, (hl)
; Get the right two bits in the low two according to the value of Sonic2TileLoader_BitstreamTileCounter
-:dec b
jp m, +
rrca
rrca
jp -
+:ld hl,Sonic2TileLoader_BitstreamTileCounter
inc (hl)
and %11 ; Mask to two bits
; Type 0 = all zero
jr z, _emitZeroBuffer
dec a
; Type 1 = raw
jr z,_raw
push af
; Type 2 or 3 both need this
; Output pointer
ld ix, Sonic2TileLoader_DecompressedData
; Get 32 bits of flags
ld hl, (Sonic2TileLoader_DataPointer)
ld e, (hl)
inc hl
ld d, (hl)
inc hl
ld c, (hl)
inc hl
ld b, (hl)
inc hl
ld a, 32 ; bit counter
-: push af
; Rotate a bit out
rr b
rr c
rr d
rr e
jr c, +
; 0 => emit a 0
ld (ix+0), 0
jr ++
+: ; 1 => emit a byte
ld a, (hl)
ld (ix+0), a
inc hl
++: inc ix
pop af
; Repeat 32 times
dec a
jr nz, -
ld (Sonic2TileLoader_DataPointer), hl
pop af
dec a
; Only XOR for type 3
call nz, _xor
; Fall through
ld hl, Sonic2TileLoader_DecompressedData
ld b, 32
-:ld a, (hl)
out ($be), a
inc hl
djnz -
; fall through
_tileDone:
; Count down the tile count until done
ld hl, (Sonic2TileLoader_TileCount)
dec hl
ld (Sonic2TileLoader_TileCount), hl
ld a, l
or h
ret z
jp ➿
_raw:
; Emit 32 bytes
ld hl, (Sonic2TileLoader_DataPointer)
ld b, 32
-:ld a, (hl)
out ($be), a
inc hl
djnz -
ld (Sonic2TileLoader_DataPointer),hl
jp _tileDone
_emitZeroBuffer:
; a must be 0 to get here
.repeat 32
out ($be),a
.endr
jp _tileDone
_xor:
ld ix, Sonic2TileLoader_DecompressedData
ld b, 7
-:; XOR bytes against each other across bitplanes on a few rows
; We end up using XORed bytes as values XORed later so this is not symmetric, it must be done in reverse order when compressing
ld a, (ix+0)
xor (ix+2)
ld (ix+2), a
ld a, (ix+1)
xor (ix+3)
ld (ix+3), a
ld a, (ix+16)
xor (ix+18)
ld (ix+18), a
ld a, (ix+17)
xor (ix+19)
ld (ix+19), a
inc ix
inc ix
djnz -
ret
Benchmark
Benchmarks of many compression algorithms, and compressor and decompressor implementations, can be found at https://github.com/maxim-zhao/bmp2tilecompressors.