Sega Master System / Mark III / Game Gear
SG-1000 / SC-3000 / SF-7000 / OMV
View topic - Looking For Optimized ASM Routines for SMS

  • Joined: 08 Sep 2018
  • Posts: 236
Looking For Optimized ASM Routines for SMS
Post Posted: Sun Jul 12, 2020 7:38 pm
Is anyone interested in sharing some optimized ASM routines they like to use for projects on the Sega Master System?

Things like faster maths, VDP tricks for fast and stable writes or just to find specific tiles on screen, perhaps even sprite tricks like optimal draws, basic movement, or even on-screen checks. Even beefy stuff like custom libraries. Stuff that you think would help the novice Z80 programmer push their skill further.

A lot of this information seems scattered even on here so perhaps we can bring it all together into one thread!

I'm going to try and hunt down some stuff I do and post it here too! (though I'm sure those of you in the master class would put my code to shame)
  • Joined: 23 Aug 2009
  • Posts: 119
  • Location: Seattle, WA
Post Posted: Mon Jul 13, 2020 4:30 am
I’ll humbly submit the modules I’ve been developing as I build a small game that aims to tie together the aspects of a full project:

https://github.com/SavagePencil/SMSFramework
  • Site Admin
  • Joined: 19 Oct 1999
  • Posts: 13290
  • Location: London
Post Posted: Mon Jul 13, 2020 4:39 am
I think the biggest efficiency hacks for me are aligned tables and the use of blocks of outi opcodes with computed jumps to maximise speed of VDP writes. Both of these could be explained a bit more but they are not really drop-in “routines”.
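A minimal sketch of the outi-block idea, assuming WLA-DX-style `.rept`/`.endr` directives and hypothetical labels (`OutiBlock`, `OutiBlockEnd`, `WriteBytes`). It assumes the VRAM write address is already set and that the writes happen in VBlank or with the display off, since back-to-back outi is too fast for the VDP during active display. Each outi is 2 bytes long, so jumping to `OutiBlockEnd - 2 * count` executes exactly `count` of them:

```asm
; Hypothetical sketch, not a drop-in routine.
OutiBlock:
.rept 128
  outi               ; (hl) -> port (c), hl++, b--
.endr
OutiBlockEnd:
  ret

; in:  a = byte count (0..128), hl = source data
; out: hl = source + count; b, c, de and flags trashed
WriteBytes:
  ld c,$be           ; VDP data port
  ld e,a
  ld d,0             ; de = count
  ex de,hl           ; hl = count, de = source
  add hl,hl          ; hl = count * 2 (bytes of outi code to run)
  ex de,hl           ; de = count * 2, hl = source
  push hl            ; save the source pointer
  ld hl,OutiBlockEnd
  or a               ; clear carry for sbc
  sbc hl,de          ; hl = entry point inside the block
  ex (sp),hl         ; hl = source, entry point now on the stack
  ret                ; "return" into the block; its own ret goes back to our caller
```

Compared with otir (21 cycles per byte), the unrolled block costs 16 cycles per byte plus a fixed setup, at the price of 257 bytes of ROM for a 128-byte block.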
  • Joined: 04 Jul 2010
  • Posts: 319
  • Location: Angers, France
Post Posted: Mon Jul 13, 2020 5:47 am
+1 with Maxim.
Everything that can be precalculated is generally good.

but "optimisation" and "novice" in the same sentence seems a little complicated.
  • Joined: 01 Feb 2014
  • Posts: 529
Post Posted: Mon Jul 13, 2020 7:11 am
What ichigo says.

Basically, everything you'd need something like faster maths for, you really want to precalculate instead. (For example look at my game Flight of Pigarus. There's no math involved beyond simple addition/subtraction. Everything, from character movement to bullet aiming uses LUTs.)
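As an illustration of a LUT combined with the aligned tables mentioned earlier in the thread, here is a minimal sketch. `SineTable` and `LookupSine` are hypothetical names, and the table is assumed to be 256 bytes placed at an address of the form $xx00 (how you align it depends on your assembler):

```asm
; in:  a = angle (0..255)
; out: a = SineTable[angle]; hl trashed
LookupSine:
  ld hl,SineTable    ; l = $00 because of the alignment
  ld l,a             ; the index goes straight into the low byte, no 16-bit add needed
  ld a,(hl)
  ret
```

With an unaligned table you would need something like ld e,a / ld d,0 / add hl,de before the read, which costs 22 extra cycles and a second register pair.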

I think the other thing ichigo points out is also true. Don't try to start optimizing right away if it isn't necessary. The Z80 is very beginner-friendly, and you can get very far without squeezing out the last few CPU cycles. (Fun fact: My first two games, Bruce Lee and Bara Burū, didn't even use the index registers ix and iy, simply because I didn't really know how to handle them back then.)


One tip I can share is this, though: If you decide to 'stream in' tile data to animate a character (most likely your player character, whose animation frames can quickly use up the available VDP memory), try making it 3bpp, with 8 colours or less.
That way you can replace
OUTI
OUTI
OUTI
OUTI
with
OUTI
OUTI
OUTI
OUT ($BE), a
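With a holding 0 (the fourth bitplane is always empty in 3bpp data), OUT ($BE), a takes 11 cycles against 16 for OUTI, so this saves 5 cycles per row, or 40 cycles per tile. If you construct your palettes accordingly, the same trick can be used with animated background tiles, which often are 3bpp, 2bpp or even 1bpp.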
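A minimal sketch of a whole-tile upload using this trick. `Upload3bppTile` is a hypothetical name, the source data is assumed to be stored as 3 bytes per row (24 bytes per tile), the VRAM write address is assumed to be set already, and the upload is assumed to run in VBlank or with the display off:

```asm
; in:  hl = 3bpp tile data (3 bitplane bytes for each of 8 rows)
; out: hl = end of tile data; a, b, c, d trashed
Upload3bppTile:
  ld c,$be           ; VDP data port
  ld d,8             ; row counter (outi modifies b, so b is not used for counting)
  xor a              ; a = 0 for the unused fourth bitplane
Upload3bppRow:
  outi               ; bitplane 0
  outi               ; bitplane 1
  outi               ; bitplane 2
  out ($be),a        ; bitplane 3 is always 0
  dec d
  jr nz,Upload3bppRow
  ret
```

The loop is only there to keep the sketch short; unrolling the eight rows removes the dec/jr overhead and also saves a third of the ROM space compared with storing full 4bpp tiles.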


Also, the sprite table handling gvx32 proposes in this thread is pretty clever.
  • Joined: 05 Sep 2013
  • Posts: 2709
Post Posted: Mon Jul 13, 2020 11:46 am
On the side of "faster math": when you can't use pre-calculated tables (for instance, you have to multiply two numbers A and B whose values are known only at run time), you might create specific routines if you realize you're not in the general case that the 'standard' routines are written for.

For example: if you know you're going to multiply a number A that is 0-255 by a number B that can only be 0-15, you can save some cycles by crafting your own 8x4 multiplier instead of using the standard 8x8, as yours will take roughly half the time.

(edit: using a LUT here would require 8 KB, i.e. 256 x 16 two-byte entries, technically still possible...)
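One way such an 8x4 multiplier could look, as a sketch only: a plain shift-and-add loop that runs 4 iterations instead of 8. `Mul8x4` and the register assignments are assumptions for illustration, not a routine from any particular library:

```asm
; in:  e = multiplicand (0..255), a = multiplier (0..15)
; out: hl = e * a (0..3825); a, b, d and flags trashed
Mul8x4:
  ld d,0             ; de = multiplicand as a 16-bit value
  ld hl,0            ; running result
  add a,a            ; move the 4 multiplier bits into the top nibble
  add a,a
  add a,a
  add a,a
  ld b,4             ; only 4 bits to process
Mul8x4Loop:
  add hl,hl          ; result <<= 1
  add a,a            ; next multiplier bit -> carry
  jr nc,Mul8x4Skip
  add hl,de          ; bit was set: add the multiplicand
Mul8x4Skip:
  djnz Mul8x4Loop
  ret
```

Unrolling the four iterations would shave off the djnz overhead as well.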
  • Joined: 14 Oct 2008
  • Posts: 344
Post Posted: Mon Jul 13, 2020 1:52 pm
Yes, optimized routines are probably not good learning material for novices.

I would suspect optimization usually comes by doing things that aren't obvious to the user.
Probably better for novices to learn how to write code that works first, and then worry about writing code that works BETTER.
  • Joined: 14 Aug 2000
  • Posts: 553
  • Location: Adelaide, Australia
Post Posted: Tue Jul 14, 2020 2:36 pm
XOR A is a faster way of clearing the A register than LD A,$00 and only takes 1 byte instead of 2 bytes.

Yep...
  • Joined: 08 Sep 2018
  • Posts: 236
Post Posted: Tue Jul 14, 2020 4:52 pm
Perhaps I should have used a better descriptor for someone who isn't exactly a beginner or an expert.

Either way, I think optimization should be brought forward early on this platform. The SMS is still a slow platform and some things about it can be a bit complex. Most programmers coming to it are most likely already skilled in programming in some way anyway. So why not provide information about not only how to do things, but how to do things better?

Thanks everyone for the contributions so far!
  • Site Admin
  • Joined: 19 Oct 1999
  • Posts: 13290
  • Location: London
Post Posted: Tue Jul 14, 2020 5:03 pm
As a counter to that, I’d say you have so many cycles per frame, optimisation can be premature.

So on to a tip: try setting the border colour during VBlank to visually “see” when it completes (and around individual phases). This can help you to see where you are with your frame time budget, and whether it spikes. You can do the same with profiling in Emulicious but this works anywhere that the border is visible, including on a real system. Use ifdefs to remove them from the final version.
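A minimal sketch of the border colour trick, assuming WLA-DX-style `.ifdef`/`.endif` and a hypothetical `PROFILE` define; `UpdateGame` stands in for whatever work is being measured. The border colour is VDP register 7, which selects an entry from the sprite palette, and the sketch assumes no interrupt handler touches the VDP control port between the two writes:

```asm
.ifdef PROFILE
  ld a,$01           ; sprite palette entry 1 as the border colour
  out ($bf),a
  ld a,$87           ; $80 | 7 = write to VDP register 7
  out ($bf),a
.endif
  call UpdateGame    ; hypothetical: the phase being timed
.ifdef PROFILE
  xor a              ; back to entry 0 once the work is done
  out ($bf),a
  ld a,$87
  out ($bf),a
.endif
```

The height of the coloured band in the border then shows how many scanlines the phase took, and building without `PROFILE` strips the writes entirely.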
  • Joined: 05 Sep 2013
  • Posts: 2709
Post Posted: Tue Jul 14, 2020 10:15 pm
asynchronous wrote
XOR A is a faster way of clearing the A register than LD A,$00 and only takes 1 byte instead of 2 bytes.


just keep in mind you're changing flags too using XOR A
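A small illustration of where the flag change bites (`WasBelowTen` is a hypothetical label):

```asm
  cp 10              ; carry set if a < 10
  ld a,0             ; a = 0, flags from cp are untouched
  jr c,WasBelowTen   ; still tests the result of the cp

  cp 10              ; carry set if a < 10
  xor a              ; a = 0, but carry is now cleared and zero is set
  jr c,WasBelowTen   ; never taken: the comparison result is gone

WasBelowTen:
  ret
```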

on a similar note, instead of setting two different registers to byte values using two LDs, it's faster (10 cycles instead of 14, and 3 bytes instead of 4) to set them together using a single LD on a register pair, for example turning

LD B,4
LD C,$BF


into

LD BC,$4BF
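A pairing like B=4 and C=$BF fits a block output to the VDP control port. As a hedged sketch of that kind of use (`SetScroll` and `VdpRegData` are hypothetical names and data, with WLA-DX-style `.db`):

```asm
; Send 4 bytes (two VDP register writes) to the control port.
SetScroll:
  ld hl,VdpRegData
  ld bc,$04bf        ; b = 4 bytes to send, c = $bf (VDP control port)
  otir               ; output b bytes from (hl) to port (c)
  ret

VdpRegData:
  .db $00,$88        ; register 8 (horizontal scroll) = 0
  .db $00,$89        ; register 9 (vertical scroll) = 0
```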


and same applies to 16 bit writes so, to initialize mappers, you might simply use

init_mappers:
  ld hl,$0000
  ld ($fffc),hl      ; [$FFFC]=$00, [$FFFD]=$00
  ld hl,$0201
  ld ($fffe),hl      ; [$FFFE]=$01, [$FFFF]=$02


instead of a loop (or 4 separate 8 bit LDs)
  • Site Admin
  • Joined: 19 Oct 1999
  • Posts: 13290
  • Location: London
Post Posted: Wed Jul 15, 2020 12:03 pm
However, for code that runs once at startup, maybe it’s better to make it more verbose.
  • Joined: 08 Sep 2018
  • Posts: 236
Post Posted: Sat Jul 18, 2020 7:21 pm
Maxim wrote
As a counter to that, I’d say you have so many cycles per frame, optimisation can be premature.

So on to a tip: try setting the border colour during VBlank to visually “see” when it completes (and around individual phases). This can help you to see where you are with your frame time budget, and whether it spikes. You can do the same with profiling in Emulicious but this works anywhere that the border is visible, including on a real system. Use ifdefs to remove them from the final version.


I think my wording has still been a bit off for the topic I wanted to bring up. So many have come forward to share, and even this tip you've provided would help anyone interested in finding ways to budget cycles and optimize their frame time.

The effort here, even if small, is still wonderful. I appreciate the community here; all of you are always willing to help rather than boast some superiority complex. I hope this thread helps others who might be stuck or just need a tip on the subject in the future.