- Joined: 05 Dec 2019
- Posts: 56
- Location: USA
|
Sprite loading, flipping, and expansion routines
Posted: Thu Dec 05, 2019 8:09 pm Last edited by PinoBatch on Sat Jul 31, 2021 12:03 am; edited 1 time in total
|
I began programming for the NES in about 2000 (with dedication increasing in late 2008), the Game Boy Advance in 2002, and the Game Boy in April 2018. (I can give details once I'm no longer "new here.") Now I'm trying my hand at SMS. The Z80 CPU is mostly similar to the Game Boy CPU, but the VDP looks like a fish swimming in reverse.
- 4bpp tiles rather than 2bpp
- Background flipping, not sprite flipping
- Background priority, not sprite priority
- Background horizontal scrolling is in the opposite direction
- No mid-screen vertical scroll changes
To work around the ROM size hit of the first two of these, and knowing that I use horizontal flipping far more often than vertical, I devised subroutines that flip sprites horizontally as I load them, one of which also expands 2bpp data to 4bpp. Each of them takes about 5 scanlines per 8x8-pixel tile. I've tested them in a framework based on Maxim's "How to Program" tutorial.
Can you spot any poor practices?
;;
; Loads 4bpp tile data to VRAM with optional bitplane transformation.
; 4bpp version: 137 cycles/sliver
; A 16x32 sprite cel like Mario is 64 slivers or 8768 cycles
; A scanline is 228 cycles, so this is 39 lines
; @param HL source
; @param B sliver count (width*height/8, or data size/4)
; @param D high byte of the address of the 256-byte-aligned transformation
; table (identity, bit reverse, or scale)
load_4bpp_cel:
  ld e, [hl]            ; 7
  inc hl                ; 6
  ld a, [de]            ; 7
  out (VDPDATA), a      ; 11
  ld e, [hl]            ; 7
  inc hl                ; 6
  ld a, [de]            ; 7
  out (VDPDATA), a      ; 11
  ld e, [hl]            ; 7
  inc hl                ; 6
  ld a, [de]            ; 7
  out (VDPDATA), a      ; 11
  ld e, [hl]            ; 7
  inc hl                ; 6
  ld a, [de]            ; 7
  out (VDPDATA), a      ; 11
  djnz load_4bpp_cel    ; 13
  ret
;;
; 2bpp to 4bpp expansion and optional flipping: 129 cycles/sliver
; A 16x32 sprite cel is 8256 cycles, or 37 lines
; @param HL source
; @param B sliver count (width*height/8, or source data size/2)
; @param D high byte of the address of the 256-byte-aligned transformation
; table (identity, bit reverse, or scale)
; @param IX subpalette choice. $0000: use colors 0, 1, 2, 3;
; $00FF: 0, 5, 6, 7; $FF00: 0, 9, 10, 11; $FFFF: 0, 13, 14, 15
load_2bpp_cel:
  ld e, [hl]            ; 7
  ld a, [de]            ; 7
  out (VDPDATA), a      ; 11
  inc hl                ; 6
  ld c, a               ; 4
  ld e, [hl]            ; 7
  ; peak register pressure is here:
  ; HL: src ptr; DE: next flip byte; C: plane 0; B: count;
  ; A: must be open to retrieve plane 1
  ; Thus we need IX for 2bpp to 4bpp expansion
  ld a, [de]            ; 7
  out (VDPDATA), a      ; 11
  or c                  ; 4
  ld c, a               ; 4
  and ixl               ; 8
  out (VDPDATA), a      ; 11
  inc hl                ; 6 - increment HL here to space out VDPDATA writes
  ld a, c               ; 4
  and ixh               ; 8
  out (VDPDATA), a      ; 11
  djnz load_2bpp_cel    ; 13
  ret
(I'm in the habit of using square brackets for memory accesses through register pairs, as in [hl], because a widely used Game Boy assembler requires them.)
The code uses two lookup tables: one for identity (no flipping) and one for bit reversing (horizontal flipping). The same principle would allow Neo Geo-style horizontal shrinking with a scaling table, which incidentally I've done in a tech demo on the NES. I might investigate that once I also implement skipping source rows for vertical shrinking.
.section "idtable" align 256 free
identity_table:
.repeat 256 index I
.db I
.endr
hflip_table:
.repeat 256 index I
.db ((I&$80)>>7)|((I&$40)>>5)|((I&$20)>>3)|((I&$10)>>1)|((I&$08)<<1)|((I&$04)<<3)|((I&$02)<<5)|((I&$01)<<7)
.endr
.ends
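A call site for the loader might look like the following minimal sketch. `VDPCTRL`, `HERO_VRAM_ADDR`, `hero_cel_data`, and `load_hero_facing_left` are hypothetical names, not taken from the post. The sketch assumes that `VDPCTRL` is defined as the VDP control port alongside `VDPDATA`, that `hero_cel_data` labels at least 64 slivers of 4bpp data, and that the assembler is WLA-DX, whose `>` prefix yields the high byte of an address.

```asm
; Hypothetical call site: all names except load_4bpp_cel, hflip_table,
; and identity_table are illustrative assumptions.
.define HERO_VRAM_ADDR $2000    ; assumed destination address in VRAM

load_hero_facing_left:
  ; Set the VDP write address: low byte first, then the high byte
  ; with bit 6 set to request a VRAM write
  ld a, HERO_VRAM_ADDR & $FF
  out (VDPCTRL), a
  ld a, $40 | (HERO_VRAM_ADDR >> 8)
  out (VDPCTRL), a
  ld hl, hero_cel_data          ; source tile data (assumed label)
  ld b, 64                      ; 16x32 pixels / 8 = 64 slivers
  ld d, >hflip_table            ; use >identity_table to load unflipped
  jp load_4bpp_cel              ; tail call: its ret returns to our caller
```

Because both tables are aligned to 256 bytes, choosing between flipped and unflipped loading only changes the value placed in D.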
|
- Site Admin
- Joined: 19 Oct 1999
- Posts: 14745
- Location: London
|
Posted: Thu Dec 05, 2019 8:37 pm
|
The unrolled loop could be done with a .repeat.
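As a sketch of that suggestion, the 4bpp loader could be written with `.repeat`, the same WLA-DX directive the lookup tables already use. It should assemble to the same instructions as the hand-unrolled version, so the cycle counts are unchanged.

```asm
load_4bpp_cel:
  .repeat 4                 ; one copy per bitplane of the sliver
    ld e, [hl]              ; 7
    inc hl                  ; 6
    ld a, [de]              ; 7
    out (VDPDATA), a        ; 11
  .endr
  djnz load_4bpp_cel        ; 13
  ret
```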
I didn’t do the maths, but maybe your loader could use outi to combine the load, out, and increment. However, outi also decrements b, so b would need to be the byte count, and that could easily exceed 8 bits.
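A minimal sketch of what an `outi`-based loader might look like, assuming no flipping is needed: `outi` reads straight from [HL], so there is no room for the table lookup. The routine name and loop label are hypothetical. Because `outi` uses B as its counter, one call moves at most 256 bytes (B = 0 counts as 256), which happens to be exactly one 16x32 4bpp cel.

```asm
; Hypothetical unflipped loader built on outi (not from the original post).
; @param HL source
; @param B byte count (1-255, or 0 for 256)
load_4bpp_cel_outi:
  ld c, VDPDATA                     ; outi writes to the port number in C
load_4bpp_cel_outi_loop:
  outi                              ; 16: out (c), [hl]; inc hl; dec b
  jr nz, load_4bpp_cel_outi_loop    ; 12 when taken; Z is set once B hits 0
  ret
```

That comes to about 28 cycles per byte, or 112 per sliver against 137 for the table-driven loop, at the cost of losing flipping and capping the size of one call. Whether 28 cycles between writes is enough spacing during active display is worth verifying before relying on it.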
Some assemblers don’t support ixh as it’s technically undocumented; this is an issue for interfacing with SDCC.
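If the assembler in use rejects the `ixl` and `ixh` mnemonics, one possible workaround is to emit the opcode bytes directly. The sketch below shows the tail of the `load_2bpp_cel` loop rewritten that way. It relies on the usual encoding of these undocumented instructions, a $DD prefix in front of `and l` ($A5) or `and h` ($A4), and uses WLA-DX's `.db` directive; other assemblers spell that directive differently.

```asm
  or c                  ; 4
  ld c, a               ; 4
  .db $DD, $A5          ; 8 - and ixl, written as raw bytes
  out (VDPDATA), a      ; 11
  inc hl                ; 6
  ld a, c               ; 4
  .db $DD, $A4          ; 8 - and ixh, written as raw bytes
  out (VDPDATA), a      ; 11
```

The bytes are identical to what an assembler that knows the mnemonics would produce, so timing is unaffected.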
|
- Joined: 05 Sep 2013
- Posts: 3828
- Location: Stockholm, Sweden
|
Posted: Fri Dec 06, 2019 11:52 am
|
Hi! I remember you from the GBAdev.org forum times - welcome here!
As for your code, I'd say it's pretty good; it shows you're not a first-timer. (I would say most of the time you load tiles to VRAM when the screen is off or during vblank, so there are no speed constraints. But if your code can work during vdraw without needing to be slowed down on purpose, that's nice!)
|
- Joined: 05 Dec 2019
- Posts: 56
- Location: USA
|
Posted: Mon Dec 09, 2019 7:52 pm
|
I've made a slightly less trivial example. The player movement code is written in Game Boy ASM, and it translated to Z80 almost verbatim. It currently reloads the character's 16x24-pixel (6-tile) cel every vblank; application in a real game would reload cels for only those actors whose cels have changed.
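The "only when changed" idea could be sketched as below. All names here (`actor_cel`, `actor_loaded_cel`, `check_cel_changed`) are hypothetical and not from the ROM. The sketch assumes one byte of RAM holding the cel number the actor wants this frame and another holding the cel number whose tiles are currently in VRAM.

```asm
; Hypothetical RAM variables, one byte each:
;   actor_cel         cel number the actor wants to show this frame
;   actor_loaded_cel  cel number whose tiles are currently in VRAM
; Returns with Z set if no upload is needed. Otherwise returns with
; Z clear and A = the new cel number, which the caller should upload.
check_cel_changed:
  ld hl, actor_cel
  ld a, [hl]
  ld hl, actor_loaded_cel
  cp [hl]               ; same cel as the last upload?
  ret z                 ; yes: skip the upload
  ld [hl], a            ; no: record it (ld leaves the flags alone)
  ret
```

If facing direction is stored separately from the cel number, it would need to be part of the comparison too, since a flip also requires a reload.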
Does this ROM behave as expected on authentic hardware? It runs in BlastEm, but all I have hardware-wise are an EverDrive and a Genesis 3 VA1 that hasn't been modified to wire up the signals used by the PBC. (I bought a Genesis 2 on eBay, but its power button proved unstable, and I've sent it back.)
|
- Site Admin
- Joined: 19 Oct 1999
- Posts: 14745
- Location: London
|
Posted: Mon Dec 09, 2019 8:22 pm
|
Emulicious is probably the best emulator for testing whether it breaks any timing constraints.
|