Z80 Programming Techniques

Contents

Block instructions
Code speed and size optimizations

Block instructions

The repeating block instructions such as otir and ldir are used to copy or transfer large strings of data. Internally, the Z80 handles them like individual outi and ldi instructions: after executing one, it moves the program counter back to re-execute the instruction until the counter (register b for otir, register pair bc for ldir) reaches zero.

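It is actually faster to use multiple outi and ldi instructions in sequence to mimic the behavior of otir or ldir: each repeated iteration of otir or ldir takes 21 clock cycles, while a standalone outi or ldi takes 16. For example: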

           .rept 1024
            outi
           .endr
outiblk:    ret

This defines 1024 outi instructions followed by a ret, which makes the block a subroutine. Each outi is two bytes long, so calling the address outiblk-N*2 executes exactly N of them before returning:

 ld hl, data          ; Source data
 ld c, $be            ; Output port
 call outiblk-768*2   ; Transfer 768 bytes from (hl) to (c)

To keep your code readable, it might be a good idea to put a call to the block of outi instructions in a macro.
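A minimal sketch of such a macro, assuming WLA-DX-style macro syntax (where \1 is the first macro argument) and the outiblk label from the block above. The macro name FAST_OUTI is hypothetical.

```asm
; Hypothetical macro: \1 is the number of bytes to send (1-1024).
.macro FAST_OUTI
    call outiblk-\1*2   ; Enter the unrolled block \1 instructions before the ret
.endm

    ld hl, data         ; Source data
    ld c, $be           ; Output port
    FAST_OUTI 768       ; Expands to: call outiblk-768*2
```

Other assemblers use different macro and argument syntax, so adapt the definition to the one you use.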

Using this technique with ldi can be useful for filling memory too. Just replace the source address with a table that contains the fill value.

cleartab:  .rept 1024  ; Table of 1024 zeroes
           .db   $00
           .endr

           .rept 1024
           ldi
           .endr
ldiblk:    ret

 ld hl, cleartab       ; Quickly clear $c000-$c3ff
 ld de, $c000
 call ldiblk-1024*2

Code speed and size optimizations

Left-shifting

add a,a is faster and smaller than sla a for single-bit left shifts (1 byte and 4 cycles instead of 2 bytes and 8 cycles).
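As an illustration, a multiplication by 8 can be sketched as three single-bit shifts. This assumes the value is in a and that overflow past 8 bits does not matter.

```asm
    ; Multiply a by 8 (the result wraps at 8 bits)
    add a, a            ; a * 2
    add a, a            ; a * 4
    add a, a            ; a * 8
```

This takes 3 bytes and 12 cycles; the same sequence written with sla a would take 6 bytes and 24 cycles.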

Zeroing `a`

xor a is faster and smaller than ld a,0 (1 byte and 4 cycles instead of 2 bytes and 7 cycles). Unlike ld a,0, it also changes the flags.

`ret` == `reti`

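IRQ handlers can end in ei; ret instead of ei; reti to save a byte and 4 clock cycles. For the CPU, reti returns just like ret; the separate opcode exists so that Z80-family peripherals watching the bus can detect the end of an interrupt routine.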

Extra registers

If you are out of registers, try using ixl/ixh/iyl/iyh (the undocumented 8-bit halves of ix and iy) and even the i register (free as long as interrupt mode 2 is not used) for loop counters, instead of maintaining a counter in memory or pushing and popping an already used register inside a loop.

Conditional `rst`

For a smaller conditional rst $38, use jr cc, -1, which takes only 2 bytes. The jump target is the instruction's own displacement byte ($FF), which is the rst $38 opcode, so the rst executes only when the condition is true.
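If your assembler does not accept a literal displacement in this form, the same two bytes can be emitted directly. The sketch below uses the carry condition (jr c is opcode $38); the preceding cp is only a hypothetical test that sets the flag.

```asm
    cp $10              ; Hypothetical test: carry is set if a < $10
    .db $38, $FF        ; jr c, -1: when carry is set, jump onto the $FF byte (rst $38)
    ; Execution continues here when carry is clear,
    ; and also after the rst $38 handler returns.
```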

Use shadow registers for interrupts

To minimize interrupt handler latency (for example, for raster effects), keep the handler's working data in the alternate register set and switch to it with exx (plus ex af,af' if you also need a and the flags) at the start of your interrupt routine. Before returning, calculate the data for the next interrupt and leave it in the alternate register set.
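A minimal sketch of such a handler follows. It assumes the main program never touches the alternate registers and has already placed an output port in c' and the first value in d'. The label name and register layout are hypothetical, and the status read from port $bf assumes a Sega Master System VDP.

```asm
LineInterrupt:
    ex af, af'          ; Switch to the alternate af
    exx                 ; Switch to the alternate bc, de, hl
    out (c), d          ; Time-critical write, using values prepared earlier
    in a, ($bf)         ; Assumption: SMS VDP status read acknowledges the interrupt
    inc d               ; Prepare the value for the next interrupt
    exx                 ; Restore the main bc, de, hl
    ex af, af'          ; Restore the main af
    ei
    ret
```

No push or pop is needed, so the first useful instruction runs 8 cycles after the handler is entered.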

Rotate the other way, it's shorter

When moving a bitfield within a register to a desired position, it may be faster to rotate the register in the opposite direction rather than shift it in the intended direction. For example, rotate right twice instead of shifting left 6 times to move bits 1 and 0 to bits 7 and 6.
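The example above can be sketched as follows, assuming the bitfield is in a and the other six bits are zero on entry.

```asm
    ; a = %000000ab on entry
    rrca                ; a = %b000000a
    rrca                ; a = %ab000000
```

This takes 2 bytes and 8 cycles, against 6 bytes and 24 cycles for six add a,a instructions. If the other bits are not known to be zero, they wrap around into the low bits, so follow the rotates with and %11000000.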

Fallthrough looping

If you need to repeat a routine several times but can't spare registers for a loop counter or ROM space to duplicate the code, try structuring the routine so it can call itself several times and fall through at the end. For example:

foo:
  ld hl, data
  call bar      ; Run routine once
  call bar      ; .. twice
  call bar      ; .. three times
bar:
  ld a, (hl)    ; .. fourth and final time
  inc l
  and $0F
  out (c), a
  ret

Incrementing pages

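The write-only ROM paging registers located at $FFFC-$FFFF overlay work RAM, so a write reaches both the register and the RAM beneath it, and a read returns the last value written. This allows you to conveniently increment the page number in place when cycling through multiple ROM pages: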

  ; Outside of loop
  ld ix, $FFFE  ; Point to register
  :
  ; Within loop
  inc (ix+0)    ; Next page

Table alignment

If you align tables to a 256-byte boundary, you can access the contents by placing the index in a register such as l and the high byte of the table address in h. This is faster than loading the full unaligned 16-bit address and adding a 16-bit index to it, and makes accessing tables with a size of 256 bytes or less very convenient:

 ld h, (sineTable >> 8) & $FF    ; Get MSB of table
 ld a, (frame_count)             ; Get index
 ld l, a
 ld a, (hl)                      ; Look up value

Instead of:

 ld hl, sineTable                ; Get address of table
 xor a
 ld d, a                         ; Set index high byte to zero
 ld a, (frame_count)
 ld e, a                         ; Set index low byte
 add hl, de                      ; Add offset to base
 ld a, (hl)                      ; Look up value

`sub hl,de`

Cursing the lack of a 16-bit SUB instruction? If you're using a constant for one side of the operation, try this:

  ; 4 bytes, 21 cycles
  ld de,-1000
  add hl,de

Instead of:

  ; 6 bytes, 29 cycles
  ld de,1000
  or a ; reset carry flag
  sbc hl,de

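Two's complement takes care of the rest. Note that the flags differ from a true subtraction: add hl,de leaves the zero flag unchanged, and for a non-zero constant its carry is the inverse of the borrow that sbc hl,de would produce.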

Never `call` and then `ret`

Any function that looks like

SomeFunction:
  ; ...
  call SomeOtherFunction
  ret

can be optimised to

SomeFunction:
  ; ...
  jp SomeOtherFunction

16-bit `neg`

Changes hl to -hl in 6 bytes and 24 cycles, using a as scratch.

 xor a
 sub l
 ld l,a
 sbc a,a
 sub h
 ld h,a

Returning set/reset carry flag

Rather than

return_set:
    scf
    jr +
return_unset:
    or a ; clears carry flag
+:  pop ...
    ret

...you can save a bit of space and execution time with:

return_set:
    scf
    .db $3e ; opcode of ld a,n: takes the following or a byte as its operand
return_unset:
    or a ; clears carry flag
    pop ...
    ret

...which turns the "or a" into a (relatively) harmless ld a,$b7 on the return_set path, at the cost of overwriting a and being quite obscure to read. It saves you 1 byte and 5 cycles (or 2 bytes and 3 cycles if you used a jp instead of a jr in the first case).

8-bit Loop Counters

Prefer the b register for 8-bit loop counters. This allows the djnz instruction to be used, which decrements the counter and conditionally jumps back to the top of the loop in a single 2-byte instruction.

If b is not available and the counter is placed in a different register, the loop requires separate decrement and jump instructions. If the loop body is likely to execute 3 or more times, it is faster to use dec and jp nz rather than the slightly smaller dec and jr nz: jp nz always takes 10 cycles, while jr nz takes 12 cycles when the jump is taken and 7 when it is not.
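The two loop forms can be sketched side by side. Both examples clear 16 bytes; the buffer address and the label names are hypothetical.

```asm
; Counter in b: djnz decrements b and loops while it is non-zero.
    ld hl, buffer       ; Hypothetical destination address
    xor a               ; Fill value: zero
    ld b, 16            ; Iteration count (0 would mean 256 iterations)
fill_b:
    ld (hl), a
    inc hl
    djnz fill_b         ; 13 cycles when taken, 8 on exit

; Counter in e, for when b is already in use.
    ld hl, buffer
    xor a
    ld e, 16
fill_e:
    ld (hl), a
    inc hl
    dec e               ; 4 cycles, sets the zero flag when e reaches zero
    jp nz, fill_e       ; 10 cycles whether taken or not
```

The second form costs 14 cycles of loop overhead per iteration, against 13 for djnz, and is 2 bytes larger.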
