Block instructions

The repeating block instructions, such as otir and ldir, copy or transfer large blocks of data. Internally the Z80 handles them like individual outi and ldi instructions: after executing one, it moves the program counter back to re-execute the instruction until the counter (b for otir, bc for ldir) reaches zero.
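
As a rough sketch of that behaviour, ldir acts much like the loop below. The label name is made up for illustration, and the loop is slower than ldir itself; it only shows the re-execution.

 ldir_equivalent:           ; hypothetical label, illustration only
   ldi                      ; (de) <- (hl), then hl+1, de+1, bc-1
   jp pe, ldir_equivalent   ; P/V stays set until bc reaches zero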

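It is actually faster to use multiple outi and ldi instructions in sequence to mimic the behavior of otir or ldir. Each repetition of otir or ldir takes 21 cycles (16 for the last one), while a standalone outi or ldi always takes 16, so the unrolled version trades code size (2 bytes per instruction) for speed. For example: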

           .rept 1024
            outi
           .endr
outiblk:    ret

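This defines 1024 outi instructions with a ret at the end, making it into a subroutine. Each outi is 2 bytes long, so calling an address n*2 bytes before the ret executes only the last n instructions. You can then call it like so: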

 ld hl, data          ; Source data
 ld c, $be            ; Output port
 call outiblk-768*2   ; Transfer 768 bytes from (hl) to (c)

To keep your code readable, it might be a good idea to put a call to the block of outi instructions in a macro.
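
A minimal sketch of such a macro, assuming WLA-DX-style macros where \1 is the first argument. The macro name FastOuti is hypothetical; adapt the syntax to your assembler.

 .macro FastOuti         ; \1 = number of bytes to send (1 to 1024)
   call outiblk-\1*2     ; enter the block \1 instructions before the ret
 .endm

 ld hl, data             ; Source data
 ld c, $be               ; Output port
 FastOuti 768            ; Same as call outiblk-768*2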

Using this technique with ldi can be useful for filling memory too: point the source address at a table that contains the fill value.

cleartab:  .rept 1024  ; Table of 1024 zeroes
           .db   $00  
           .endr

           .rept 1024
           ldi
           .endr
ldiblk:    ret

 ld hl, cleartab       ; Quickly clear $c000-$c3ff
 ld de, $c000
 call ldiblk-1024*2

Code speed and size optimizations

Left-shifting

Zeroing a

ret == reti

Extra registers

Conditional rst

Use shadow registers for interrupts

Rotate the other way, it's shorter

Fallthrough looping

When a routine has to run a fixed number of times, call it for all but the last run and place it directly after the calls, so that execution falls through into it for the final run. This saves a call and a ret compared with calling it every time. Note that inc l in this example assumes the data does not cross a 256-byte boundary.

foo:
  ld hl, data
  call bar      ; Run routine once
  call bar      ; .. twice
  call bar      ; .. three times
bar:
  ld a, (hl)    ; .. fourth and final time
  inc l
  and $0F
  out (c), a
  ret

Incrementing pages

  ; Outside of loop
  ld ix, $FFFE  ; Point to the paging register
  :
  ; Within loop
  inc (ix+0)    ; Next page

Table alignment

If a table of up to 256 bytes starts on a 256-byte boundary, the high byte of its address is constant and the index can be loaded straight into l, with no 16-bit addition:

 ld h, (sineTable >> 8) & $FF    ; Get MSB of table
 ld a, (frame_count)             ; Get index
 ld l, a
 ld a, (hl)                      ; Look up value

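Instead of the following, which takes 11 bytes and 53 cycles against 7 bytes and 31 cycles for the aligned lookup: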

 ld hl, sineTable                ; Get address of table
 xor a
 ld d, a                         ; Set index high byte to zero
 ld a, (frame_count)
 ld e, a                         ; Set index low byte
 add hl, de                      ; Add offset to base
 ld a, (hl)                      ; Look up value
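
One way to meet the alignment requirement of the shorter version is to pin the table to an address whose low byte is $00. A minimal sketch, assuming an assembler with .org and the .rept/.db directives used earlier; the address and the placeholder contents are hypothetical:

 .org $3F00              ; hypothetical address with a low byte of $00
 sineTable:
   .rept 256
   .db $00               ; placeholder, real table values go here
   .endr

With the table placed like this, (sineTable >> 8) & $FF evaluates to $3F and l alone selects the entry.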

sub hl,de

  ; 4 bytes, 21 cycles
  ld de,-1000
  add hl,de

Instead of:

  ; 6 bytes, 29 cycles
  ld de,1000
  or a ; reset carry flag
  sbc hl,de

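Two's complement takes care of the rest: adding -1000 (that is, $FC18) wraps around to the same 16-bit result as subtracting 1000. The flags differ, though: add hl,de leaves the carry flag inverted relative to the sbc version and does not update the zero flag.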
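
To see why the addition gives the same result, here is the arithmetic traced for one input (1500 is just an arbitrary example value):

  ld hl, 1500     ; hl = $05DC
  ld de, -1000    ; de = $FC18 (65536 - 1000)
  add hl, de      ; $05DC + $FC18 = $101F4, the 17th bit goes to carry
                  ; hl = $01F4 = 500, carry set because hl was >= 1000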

Never call and then ret

SomeFunction:
  ; ...
  call SomeOtherFunction
  ret

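can be optimised to a plain jump, because the ret at the end of SomeOtherFunction then returns directly to the original caller. This saves 1 byte and 17 cycles: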

SomeFunction:
  ; ...
  jp SomeOtherFunction

16-bit neg

Changes hl to -hl in 6 bytes and 24 cycles.

 xor a
 sub l
 ld l,a
 sbc a,a
 sub h
 ld h,a
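
As a sanity check, here is the same sequence traced for hl = 1 (an arbitrary example value), with the register contents after each step in the comments:

 ld hl, $0001   ; hl = 1
 xor a          ; a = $00
 sub l          ; a = $00 - $01 = $FF, carry set (borrow)
 ld l, a        ; l = $FF
 sbc a, a       ; a = a - a - carry = $FF
 sub h          ; a = $FF - $00 = $FF
 ld h, a        ; hl = $FFFF = -1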

Returning set/reset carry flag

Rather than

return_set:
    scf
    jr +
return_unset:
    or a ; clears carry flag
+:pop ...
  ret

...you can save a bit of space and execution time with:

return_set:
    scf
    .db $3e ; opcode of ld a,n: the or a below becomes its operand
return_unset:
    or a ; clears carry flag
  pop ...
  ret

...which, when entered at return_set, turns the "or a" into the operand of a (relatively) harmless ld a,$b7, at the cost of destroying a and being quite obtuse to read. It saves you 1 byte and 5 cycles on the return_set path (or 2 bytes and 3 cycles if you used a jp instead of a jr in the first case).

8-bit Loop Counters



