Introduction

Most games store their graphics compressed, to save ROM space/allow more graphics in the same space. This page documents some of the schemes used.

Deinterleaving

Many compression formats conceptually deinterleave the data before compressing it. For tiles, this effectively means the four bitplanes are compressed individually; for tilemaps, it means the high tile number bit and flags are compressed as one data block and the low tile numbers are compressed as another.

"Phantasy Star" RLE

This format is used to compress both tiles and tilemap data.

The data is deinterleaved, and each bitplane is then classified into runs of consecutive identical bytes, and runs of consecutive non-identical bytes. Any runs of more than 127 bytes are split into multiple runs. They are then stored in the format:

 %0nnnnnnn dd          ; run of n consecutive identical bytes, value dd
 %1nnnnnnn dd dd dd... ; run of n consecutive non-identical bytes; values follow
 %00000000             ; end of bitplane

Used in

Sample decoder

; Decompresses tile data from hl to VRAM address de
LoadTiles4BitRLENoDI:
  ld b,$04
-:push bc
    push de
      call _f        ; called 4 times for 4 bitplanes
    pop de
    inc de           ; next bitplane
  pop bc
  djnz -
  ret

__:
  ld a,(hl)          ; read count byte <----+
  inc hl             ; increment pointer    |
  or a               ; return if zero       |
  ret z              ;                      |
                     ;                      |
  ld c,a             ; get low 7 bits in b  |
  and $7f            ;                      |
  ld b,a             ;                      |
  ld a,c             ; set z flag if high   |
  and $80            ; bit = 0              |
                     ;                      |
-:call SetVRAMAddressToDE ;            <--+ |
  ld a,(hl)          ; Get data byte in a | |
  out ($be),a        ; Write it to VRAM   | |
  jp z,+             ; If z flag then  -+ | |
                     ; skip inc hl      | | |
  inc hl             ;                  | | |
                     ;                  | | |
+:inc de             ; Add 4 to de <----+ | |
  inc de             ;                    | |
  inc de             ;                    | |
  inc de             ;                    | |
  djnz -             ; repeat block  -----+ |
                     ; b times              |
  jp nz,_b           ; If not z flag -------+
  inc hl             ; inc hl here instead  |
  jp _b              ; repeat forever ------+
                     ; (zero count byte quits)

"Wonder Boy" RLE

The data is deinterleaved, and then classified into runs of consecutive identical bytes, and runs of consecutive non-identical bytes. Any runs of more than 256 identical bytes are split into multiple runs. They are then stored in the format:

 $00 nn dd            ; run of n consecutive identical bytes, value dd
 $ff dd               ; run of 2 consecutive identical bytes, value dd
 $00 $00              ; end of data block
 <any other value>    ; raw data

Raw data with value $00 or $ff must be stored as a "run".

Used in

Sample decoder

; hl = source
; de = dest
decompress:
  ex de,hl
  ld c,4                     ; 4 bitplanes
_OuterLoop:
  push hl
_Loop:
    ld a,(de)                ; get byte
    or a                     ; if zero, it is either RLE or the end of the bitplane
    jr z,_RLE
    inc a                    ; if $ff, write the following byte twice
    jr z,_WriteNextByteTwice
    dec a                    ; else write the byte itself

_WriteByteIncrementBothAndLoop:
    call WriteAToVRAMAtHL
    inc hl
    inc hl
    inc hl
    inc hl
_IncrementSrcAndLoop:
    inc de                   ; move to next byte and repeat
    jr _Loop

_WriteNextByteTwice:
  inc de                     ; get next byte
  ld a,(de)
  call WriteAToVRAMAtHL      ; write it to VRAM
  inc hl                     ; skip bitplane
  inc hl
  inc hl
  inc hl
  jr _WriteByteIncrementBothAndLoop ; then write it again and loop

_RLE:
  inc de                     ; get next byte = RLE count
  ld a,(de)
  inc de
  or a                       ; if it is 0, it is actually the end of the bitplane
  jr z,_EndOfBitplane
  ld b,a                     ; output the next byte repeatedly
  ld a,(de)
-:call WriteAToVRAMAtHL
  inc hl
  inc hl
  inc hl
  inc hl
  djnz -
  jr _IncrementSrcAndLoop    ; move to next byte and repeat

_EndOfBitplane:
  pop hl                     ; move original dest pointer on by 1
  inc hl
  dec c                      ; loop 4 times for 4 bitplanes
  jr nz,_OuterLoop
  ret                        ; done

"Phantasy Star Gaiden" dictionary coding

The data for a single tile is de-interleaved into the four bitplanes, each 8 bytes long. Each bitplane is then classified into many categories:

  1. All bits are unset
  2. All bits are set
  3. The data is compressible because
    1. Three or more bytes are identical (some common value)
    2. Three or more bytes match the corresponding bytes in an earlier bitplane
    3. Three or more bytes are the inverse (complement) of the corresponding bytes in an earlier bitplane
  4. None of the above

A bitplane may fall into more than one category; the one that stores its data in the least space is chosen. The first-level categorisation for all four bitplanes is stored in a single byte:

 %aabbccdd ; encoding method for bitplanes (high order bits for first bitplane encountered)

where:

 %00 = all bits are unset
 %01 = all bits are set
 %10 = compression
 %11 = raw
 $dd $dd $dd $dd $dd $dd $dd $dd ; 8 bytes of raw data
 %000000pp         ; [A] entire bitplane is duplicate of bitplane p
 %000100pp         ; [B] entire bitplane is inverse of bitplane p
 %001000pp         ; [C] some of bitplane is duplicate of bitplane p
 %010000pp         ; [D] some of bitplane is inverse of bitplane p
 <any other value> ; [E] some of bitplane is a common value

Note that %pp must be either 0, 1 or 2, and must be less than the index of the bitplane currently being decoded. If it is 3, that counts as <any other value>.

  • If [A], no more data is needed - the earlier bitplane is copied.
  • If [B], no more data is needed - the earlier bitplane is copied and inverted.
  • If [C], the next byte is a bitmask. The MSB is 1 if the first byte of the bitplane should be copied and 0 if it should not, down to the LSB corresponding to the 8th byte. For each 0 bit encountered, there is an additional data byte giving the wanted value.
 $mm $dd $dd ... ; m = mask, d... = extra raw data
  • If [D], the next byte is a bitmask, similar to [C] but with the copied value being inverted. Again, the following bytes give values for the non-copied byte.
 $mm $dd $dd ... ; m = mask, d... = extra raw data
  • If [E], the category byte itself is a bitmask, similar to [C] and [D]. However, instead of copying the data from an earlier bitplane, the 1s indicate that the decompressed byte value is equal to the "common" byte following the bitmask. Again, there then follow more bytes giving the data corresponding to the 0s.
 $cc $dd $dd ... ; m = mask, c = common value, d... = extra raw data

Due to the way the [A]-[D] are encoded, there will never be more than two bits set in the category byte. Thus, by only allowing type [E] when there are three or more "common" bytes, using the category byte itself as the bitmask for type [E] does not cause any ambiguity.

Multiple compressed tile data blocks are concatenated and preceded by a word indicating the number of tiles.

Used in

Sample decoder

.define vram_ptr $DFC0          ; word: VRAM address
.define buffer $DFD0            ; 32-byte decompression buffer

; hl = dest
; ix = src
decompress:
  ld (vram_ptr),hl  ; cache VRAM address
  ld c,(ix)      ; bc = number of tiles
  inc ix
  ld b,(ix)
  inc ix

_DecompressTile:
  push bc        ; save number of tiles
    ld b,$04     ; count 4 bitplanes
    ld de,buffer ; write to de
    ld c,(ix)    ; c = encoding information for 4 bitplanes
    inc ix

_DecompressBitplane:
    rlc c        ; %0x = all bits either 0 or 1
    jr nc,_AllTheSame
    rlc c        ; %11 = raw data
    jr c,_RawData

_Compressed:
    ld a,(ix)    ; get method byte
    inc ix

    ex de,hl     ; get bitplane, if it's referring to one
    ld d,a
    and $03
    add a,a      ; calculate address of that bitplane
    add a,a      ; = buffer + bitplane * 8
    add a,a
    ld e,a
    ld a,d       ; get method byte back
    ld d,$00
    ld iy,buffer
    add iy,de    ; now iy points to the referred to bitplane
    ex de,hl

    ; now check the method byte
    cp $03       ; %000000pp
    jr c,_DuplicateBitplane
    cp $10
    jr c,_CommonValue
    cp $13       ; %000100pp
    jr c,_DuplicateBitplaneInvert
    cp $20
    jr c,_CommonValue
    cp $23       ; %001000pp
    jr c,_DuplicateBitplanePartial
    cp $40
    jr c,_CommonValue
    cp $43       ; %010000pp
    jr c,_DuplicateBitplanePartialInvert
    ; fall through

_CommonValue:
    ld h,a       ; h = bitmask
    ld l,(ix)    ; l = common value
    inc ix
    jr _OutputCommonValue

_RawData:
    ld h,$00     ; empty bitmask; no common value
    jr _OutputCommonValue

_AllTheSame:
    rlc c        ; get next bit into carry
    sbc a,a      ; will make $00 if carry = 0, $ff if it's 1
    ld l,a       ; that's the common value
    ld h,$ff     ; full bitmask
    ; fall through

_OutputCommonValue:
    push bc
      ld b,8     ; loop counter
-:    ld a,l     ; get common value
      rlc h      ; get bit out of bitmask
      jr c,+     ; if 1, use the common value
      ld a,(ix)  ; else get it from (ix++)
      inc ix
+:    ld (de),a  ; write to dest
      inc de
      djnz -     ; loop over 8 bytes
    pop bc
  jr _BitplaneDone

_DuplicateBitplane:
    ld hl,$ff00  ; full copy bitmask, empty inversion bitmask
    jr _OutputDuplicate

_DuplicateBitplaneInvert:
    ld hl,$ffff  ; full copy bitmask, full inversion bitmask
    jr _OutputDuplicate

_DuplicateBitplanePartial:
    ld h,(ix)    ; get copy bitmask
    ld l,$00     ; empty inversion bitmask
    inc ix
    jr _OutputDuplicate

_DuplicateBitplanePartialInvert:
    ld h,(ix)    ; get copy bitmask
    ld l,$ff     ; full inversion bitmask
    inc ix
    ; fall through

_OutputDuplicate:
    push bc
      ld b,8     ; loop counter
-:    ld a,(iy)  ; read byte to copy
      inc iy
      xor l      ; apply inversion mask
      rlc h      ; get bit out of bitmask
      jr c,+     ; if 1, use the copied value
      ld a,(ix)  ; else get it from (ix++)
      inc ix
+:    ld (de),a  ; write to dest
      inc de
      djnz -     ; loop over 8 bytes
    pop bc
    ; fall through

_BitplaneDone:
    dec b        ; decrement bitplane counter
    jp nz,_DecompressBitplane ; loop if not zero

_OutputTileToVRAM:
    ld hl,(vram_ptr)
    call SetVRAMAddressToHL

    ld de,$0008  ; we are interleaving every 8th byte
    ld c,e       ; counter for the interleaving run
    ld hl,buffer ; point at data to write

--: ld b,4       ; there are 4 bytes to interleave
    push hl
-:    ld a,(hl)  ; read byte
      out ($be),a; write to vram
      add hl,de  ; skip 8 bytes
      djnz -
    pop hl
    inc hl       ; next interleaving run
    dec c
    jr nz,--

    ; Add 32 bytes to vram_ptr
    ld hl,(vram_ptr)
    ld bc,32
    add hl,bc
    ld (vram_ptr),hl

  pop bc
  dec bc         ; next tile
  ld a,b
  or c
  jp nz,_DecompressTile
  ret            ; done

"Sylvan Tale" LZ

This is used to compress the tilemap.

The data is split into raw data and LZ lookups. Raw data is by definition one byte. LZ lookups have a maximum offset of -4096 bytes and a maximum length of 18 bytes. The split data is then arranged in groups of eight, as follows:

 %ffffffff ; eight flags, packed right-to-left. 1 = raw, 0 = LZ
 <block data> x 8 where <block data> is either:
   $dd     ; raw: data
 or:
   $looo   ; LZ: length l+3, offset -4096 + ooo
 or:
   $0000   ; LZ: end

Used in

Sample decompressor

;============================================================
;tilemap data decompression routine  from Sylvan Tale
;
; DATA is stored into RAM, not VRAM. can be modified to do so but have to read from vram.
;
;         Hl = address of compressed data
;         DE = RAM address to load data into

SylvanTaleTilemapDecompression:
--:
    ld     c,(hl)          ;      load control byte
    inc    hl              ;
    ld     b,$08           ;           repeat 8 times
-:
    rr     c               ;       evaluate a bit from control byte
    jr     nc,TD_decomp    ;     if bit 0 load new pointer

    ldi                    ;     (de) = (hl)
    inc    bc              ;    restore bc from the ldi
    jr     +               ;


TD_decomp:                 ;   loads  new source pointer + count byte
    push   bc              ;
    ld     c,(hl)          ;
    inc    hl              ;
    ld     b,(hl)          ;
    inc    hl              ;       bc = (hl)

    ld     a,c             ;
    or     b               ;
    jr     z,TD_exit       ;    finish if bc = 0;


    push   hl              ;
    ld     a,b             ;
    or     $f0             ;
    ld     h,a             ;
    ld     l,c             ;
    add    hl,de           ;       hl = de + (bc OR $f000) = new pointer

    ld     a,b             ;
    and    $f0             ;
    rrca                   ;
    rrca                   ;
    rrca                   ;
    rrca                   ;     a = high nibble of b
    add    a,$03           ;
    ld     c,a             ;
    ld     b,$00           ;    counter = high nibble of b + 3

    ldir                   ;       load out previous data
    pop    hl              ;    restore pointer
    pop    bc              ;

+
    djnz   -               ;
    jr     --              ;

TD_exit:
    pop    bc              ;
    ret                    ;

"High School Kimengumi" RLE

This is an RLE variant. It is used to decode both tiles and tilemap data.

The data is deinterleaved, and then classified into runs of up to 127 identical bytes and runs of up to 127 non-identical bytes. These are then stored in the format:

$nnnn                 ; size of data in bytes / interleaving factor (4 for tiles, 2 for tilemap)
followed by multiples of:
%0nnnnnnn dd          ; run of n consecutive identical bytes, value dd
%1nnnnnnn dd dd dd... ; run of n consecutive non-identical bytes; values follow
followed by:
%00000000             ; end of data block

This has a minor advantage over "Phantasy Star" compression because it can encode runs across the end of a bitplane, and saves a byte by having 2 bytes length + 1 terminator rather than four bitplane terminators. (The terminator would also be unnecessary if the code were to count the bitplanes and stop at the end.)

Used in

Sample decoder

.enum $c141 ; memory used: 4 bytes
    rowCount dw
    rowCountTotal dw
.ende

.section "High School! Kimengumi tile decompressor" free
; Usage:
; bc = source data
; de = destination (VRAM address ORed with $4000)
decode:
    call _decode

    ; clean up
    xor a
    ld (rowCount),a
    ld (rowCount+1),a
    ld (rowCountTotal),a
    ld (rowCountTotal+1),a
    ret

_decode:
    ld a,(bc)       ; hl = (bc)
    ld l,a
    inc bc
    ld a,(bc)
    ld h,a
    dec bc

    ld (rowCount),hl ; load row count into RAM
    ld (rowCountTotal),hl

    ld h,b          ; hl = bc
    ld l,c

    inc hl          ; hl += 2
    inc hl

_nextBlock:
    ld a,(hl)       ; read control byte
    or a            ; 0 = terminator
    ret z

    bit 7,a         ; if bit 7 = 1 then it's raw
    jr nz,+

    ; RLE
    ld b,a          ; run length
    inc hl          ; run value
-:  call _outputByte
    djnz -
    inc hl          ; next control byte
    jp _nextBlock

+:  ; Raw
    and $7f         ; length
    ld b,a
    inc hl          ; value
-:  call _outputByte
    inc hl
    djnz -
    jp _nextBlock

_outputByte:
    ; writes a byte from hl to VRAM address de with interleaving 4

    ; set VRAM address
    ld a,e
    out ($bf),a
    ld a,d
    out ($bf),a

    ; write value
    ld a,(hl)
    out ($be),a

    ; decrement row counter
    push hl
        ld hl,(rowCount)
        dec hl
        ld a,h      ; if zero, move to next bitplane
        or l
        jr z,_setNextBitplane
        ld (rowCount),hl

        inc de      ; de += 4 (interleaving)
        inc de
        inc de
        inc de
    pop hl
    ret

_setNextBitplane:
        ld hl,(rowCountTotal) ; restore the counter
        ld (rowCount),hl

        ; now:
        ; - hl = number of tile rows
        ; - de = last byte written in previous bitplane
        ; so we want to set de = de - hl * 4 + 5

        ; first calculate hl = hl * 4
        xor a           ; clear carry
        rl l            ; shift hl left by 1
        rl h
        xor a           ; repeat
        rl l
        rl h

        ; subtract hl and add 5
        ex de,hl
            xor a       ; clear carry
            sbc hl,de
            inc hl
            inc hl
            inc hl
            inc hl
            inc hl
        ex de,hl
    pop hl
    ret
.ends

Benchmark

CompressorTile compressionTilemap compression
pscompr38%60%
psgcompr53%n/a
stcomprn/a50%

Note that this benchmark is incomplete. It needs a larger corpus of images and is missing a Wonder Boy compressor. It is based on BMP2Tile for the compression and a Windows batch file for the analysis.

Attach:benchmark.zip




Return to top
0.193s