Downloading Stuff

You will need the following stuff, and it'll make life easier if you try to get the ones I mention - unless something newer and better is available!

A text editor

I suggest ConTEXT. It's a text editor with limited project support, customisable buttons, and most importantly custom syntax highlighting. Download and install it somewhere on your hard disk.

I've made a syntax highlighter file. Put it in the \Highlighters\ folder where you installed ConTEXT. This tells ConTEXT how Z80 assembler code should be highlighted to make it easier to work with.

An assembler

I suggest WLA DX. It's very full-featured, the only downside being the fact that it's available only as C source. So here are some compiled binaries. Unzip them somewhere on your hard disk and remember where.

A debugging emulator

You don't need a debugging one but it will surely help. In my opinion, MEKA has the best debugger at the moment. You may find it easier to follow this tutorial if you also use it.


I suggest you get the following SMS docs:

For Z80 reference I recommend:

There are a lot of Z80 docs around, but I like this one for its clarity - the others tend to be better for people who already know how to code and just need reminders of technical details. You will also want the WLA DX manual, which I included in the above WLA DX download, or you can get it from the WLA DX site.

Extract/save them all somewhere handy. You'll need a PDF viewer for the Z80 User Manual.

Other stuff

You may find it useful to use additional programs, depending on your project. I like to use Paint Shop Pro 7 for image editing (newer versions are less capable in this respect) and frhed for hex editing, for example.

For converting graphics I suggest my BMP2Tile program.

I am lucky enough to have an SMSReader and devcart so I have the software for that too, of course. (Getting homebrew games working on real hardware can be a real trial...) But you won't need them for a while yet.

Setting It Up

F9 = Compile

You already installed ConTEXT and my highlighter. Now we're going to make F9 our "compile" key :)

Now, there is a slight difficulty because of the way WLA DX works. It works in two stages, "compiling" and "linking", designed for systems set up for compiling programs with a "make" utility (like Linux). For us normal types, we need to get it going manually :( so I've written a batch file to do it. I'm sure there's a more elegant way to do it, and it will only work with a single source file so for a really advanced project I recommend you figure that stuff out. The batch file is called compile.bat and is included with the WLA DX download above.

Then open ConTEXT and click on Options, Environment Options, Execute Keys. Click Add and enter "asm" in the box. Then click on F9 in the list and fill in the settings as follows:

  • "Execute": the path to compile.bat
  • "Start in": %p
  • "Parameters": %n
  • "Compiler output parser rule": *:%n:%l: * (asterisk, colon, percent, lowercase N, colon, percent, lowercase L, colon, space, asterisk)
  • Check the "Capture console output" and "Scroll console output to the last line" checkboxes

We'll set up some more stuff later. For now, let's see if we're compiling properly. But first, a long and boring rant.


I'm writing this guide to try and help you get going with programming on the SMS. My motivation for this is to encourage others to give it a go, so they can produce new and exciting projects. Now, as a newbie you have to realise that the first few things you produce are not going to be much good - I know mine weren't. So do yourself a favour, and the rest of the world, and don't release them.

Why? Because of rom collectors. If you make a tiny, tiny modification to one of the demo programs I'm (hopefully) going to teach you to make, then release it on your webpage, and someone downloads it, then runs GoodSMS (for example), then sends it off to Cowering as a "new, unknown ROM!!!!11!", then he includes it in the next version of GoodSMS, then EVERY anal rom collector in the world goes crazy because he is "missing" this "ROM", and they all search it out and find it, and the WORLD HAS GONE CRAZY! So please, keep it to yourself unless it's actually worth sharing with the world.

Compile test

OK, no more rant. Download this:

Extract it somewhere and open it in ConTEXT. It should be coloured automatically according to what is written, in dark blue, light blue, black and red. Press F9 and at the bottom of your screen there should appear an Output Console. Resize it to show 3 or 4 lines - they should read something like:

Free space at $7ff8-$7ff9.
Free space at $7ffc-$7ffe.
31743 unused bytes of total 32768.
> Execution finished.

Congratulations, you've just made your first homebrew demo! Go look in the folder where you put the source file and you should find a file called output.sms. Open it in your favourite emulator and marvel at its amazingness!

If it doesn't compile: oh dear. Did you edit compile.bat? Did you press the right key? What error messages did you get? Try contacting me if you really can't figure it out.

More action keys

You saw how we set up F9 to compile the program using WLA DX? Well, that was actually the hardest one to do. Now we're going to add one more, for running the game in the emulator. Fill in the details as follows for key F10:

Start in    <path to Meka>
Hint        Run in MEKA

Now pressing F10 should start up Meka with the file we just compiled.

Assembly Concepts

Assembly language is a general term for any code that is specific to a certain type of processor. On the lowest level, it's just a way of encoding the raw data the CPU executes. However, there are several extra facilities in modern assemblers that are useful to us.

  • Comments are essential.
  • Numbers is a quick overview of hex and binary.
  • Directives offer control over the way the assembler program understands our code.
  • Labels help us to refer to data and code in a readable way.
  • Opcodes let us enter the CPU instructions with total control.
  • Memory is where the code is stored, and where data comes from and goes to.
  • Registers are where we hold the data we are working on.
  • Ports are how we communicate with the outside world.


Comments

In all forms of programming, it is essential to add comments. These are bits of text that the assembler completely ignores - we add them in to help humans (including ourselves) to understand what the code is supposed to do (on a high level), and why it's doing what it's doing (on a low level).

There is one super-hugely-amazingly important reason to add comments to your code: you will forget what it's doing and why it's doing it in the way it does, and waste loads of time trying to figure it out. Save yourself the hassle by writing loads of helpful comments.

There are two types of comment: line and block.

Line comments

In WLA DX, everything that comes after a semicolon ";" is ignored until the end of the line.

Comment on what you're doing:

ld b,10 ; Initialise the counter

What something is:

; Font data:
.incbin "font.bin"

You can also comment out code instead of deleting it:

; Initialise the counter
;ld b,10
ld b,20

Block comments

In WLA DX, everything that comes after "/*" is ignored until "*/" is found, across multiple lines.

This is useful for commenting out large blocks:

/* Skip outputting
    ld a,$00
    out ($bf),a
    ld a,$38|$40
    out ($bf),a
    ld hl,Message
-:      ld a,(hl)
        cp $00
        jp z,+
        sub $20
        out ($be),a
        ld a,$00
        out ($be),a
        inc hl
        jp -
+:
*/

Or for making large blocks of explanatory text:

/*
 In loving memory of Acorn Computers Ltd.
      Mike G proudly presents the:

       BBC Micro Character Set!

 This source was created automatically
 from character data stored in the BBC
 Microcomputer operating system ROM.

 The BBC set is a nice, clear font which
 (unlike the classic Namco/Sega font)
 includes lower-case letters as well as

 Being a British micro, there's even a
 Pound Sterling currency symbol, which
 should come in handy when I begin to
 write financial apps on the SMS. (Just

 Hopefully by using binary representation
 it will be obvious how the character
 data is stored.
*/


Although you can make these large blocks out of line comments too.

Numbers


"Normal" numbers count from 0 to 9, then increment the "tens" digit, and so on. This is the decimal system; also called base 10 because there are ten digits. We use it because we (mostly) have ten fingers on our hands. Computers don't have fingers, they have bits, so they generally don't count this way.


Binary

On the lowest level, all computer memory is ones and zeroes. This is base 2. You can express a binary number in WLA DX like this:

%10000110


This is an 8-bit number. The Z80 works in terms of numbers that are eight binary digits long. "Binary digit" can be shortened to "bit". So the Z80 is an 8-bit CPU.

Let's figure out what this means using a calculator program, such as the one that comes with Windows. You need to be in "Scientific" mode.

There are some "radio buttons" labelled Hex Dec Oct Bin. You can use these to convert between number bases. Click on Bin to go into binary mode, then enter 10000110. Then click on Dec to convert to decimal; the answer is 134. So %10000110 is the same as 134.

Here are a bunch more things to know about binary numbers:

  • Binary numbers can only contain the digits 0 and 1.
  • In WLA DX, you tell it a number is binary by putting a % sign in front of it (with no space).
  • When we refer to the individual digits (bits), the high bit (or most significant bit, or MSB) is the first digit.
  • The low bit (least significant bit, LSB) is the last digit.
  • Bits normally come in groups of eight to make a byte. So when you enter a binary number, make sure it has all eight, to avoid confusion.

We rarely deal with binary numbers directly, unless we want to visualise the individual bits. WLA DX lets us enter numbers in binary, decimal or hexadecimal, so we pick the format that suits us best in each case.


Hexadecimal

Computers like binary but humans don't - we have to use a calculator to convert it, and it's hard to enter. Hexadecimal is base 16, and it lets us represent computer data more efficiently. There are sixteen digits, so we use 0 to 9 and then the letters a to f. The case doesn't matter; I'll try to use lowercase for consistency.

Two hexadecimal (or hex) digits are enough to represent one byte. You enter hex numbers in WLA DX by preceding them with a $ sign:

$a10c


You can convert this in Windows Calculator by switching to Hex mode, entering a10c and then switching to Dec. You should get 41228.

Here are some more things to know about hex:

  • Don't forget that $9 + 1 = $a
  • There are many ways to represent them; you may see them written as
    • $a10c
    • 0xa10c
    • &a10c
    • A10Ch
  • When we want to signify that a number is a byte, we will make sure it is two digits by adding a leading zero, e.g. $01
  • When we want to signify that a number is a word (16 bits), we will make sure it is four digits by adding leading zeroes, e.g. $0002
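
As a quick check of the three notations, here is a small sketch (not part of the Hello World program) that loads the same value written in each format. WLA DX should produce identical bytes for the first three lines, because 134, $86 and %10000110 are all the same number.

```asm
    ld a,134        ; decimal
    ld a,$86        ; hexadecimal: 8*16 + 6 = 134
    ld a,%10000110  ; binary: 128 + 4 + 2 = 134
    ld hl,$a10c     ; a 16-bit value (41228 in decimal) needs a register pair
```

Which format you pick is purely for the benefit of whoever reads the source; the assembled output does not remember it.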


Directives

Directives are commands to the assembler. That means WLA DX for us. They are used to tell it how to do things, and also what to do, for pretty much all the cases except actual code.

Directives are easy to spot because they all have a dot in front of their names. They are all well documented in the WLA DX documentation.

Here are some of the most useful ones:

Memory layout

  • .org and .bank tell WLA DX exactly where the output should be placed in the file.
  • .memorymap tells WLA DX where the ROM and RAM appear from the CPU's point of view.
  • .rombankmap tells WLA DX how the ROM file is structured, in terms of mapping.
  • .section and .slot let us pass the decisions on where to place code and data, and how to discard unused parts, to WLA DX.
  • .ramsection allows us to structure our memory usage in a flexible way that results in debugging symbols.

Data definition

  • .db, .dsb, .dw and .dsw let us specify raw data in our source file.
  • .dbsin, .dbcos and .dbrnd (and their .dwXXX equivalents) let us get WLA DX to generate us some data.
  • .incbin lets us import external data from a file.
  • .struct and .dstruct let you have structured data (with multiple instances) in a clean way.
  • .enum can be useful for producing names for things like an object state, similarly to how they are used in C/C++/C#/Java/etc. (They can be used for memory management too but that has drawbacks.)
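
To give a feel for the data definition directives, here is a minimal sketch. The label names and values are made up for illustration; only the directives themselves come from WLA DX.

```asm
ExampleData:
.db $00,$3f          ; two bytes, stored exactly as written
.dw $1234            ; one 16-bit word, stored low byte first: $34,$12
.dsb 14,$00          ; fourteen bytes, all $00
.dsw 4,$ffff         ; four words, all $ffff
ExampleDataEnd:
```

That adds up to 2 + 2 + 14 + 8 = 26 bytes between the two labels.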

System-specific helpers

WLA DX supports many CPUs and consoles, and it has helper directives to make sure the resulting file is valid for that system. For the Master System, the main one we use is .sdsctag, which does the following:

  • Inserts all the necessary data to make the resulting file pass the BIOS checks, so it will run on a system with a BIOS.
  • Inserts the special SDSC tag which lets you include the title of the program, your name, the version number and an arbitrary comment; emulators and ROM management tools can then display this.

Conditional inclusion

If there are parts of your source that you want to sometimes exclude from the result - as in, pretend the code or data was not there in the source at all - you can use the various .if directives. These are similar to #ifdefs in C/C++.
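
A minimal sketch of conditional inclusion might look like this. The DEBUG symbol and the RAM address are assumptions chosen for the example, not something the tutorial code uses.

```asm
.define DEBUG 1        ; hypothetical symbol; remove this line for a release build

.ifdef DEBUG
    ld a,$ff           ; only assembled when DEBUG is defined
    ld ($c000),a       ; leave a marker in RAM that a debugger can look for
.endif
```

With the .define line removed, WLA DX should skip everything between .ifdef and .endif as if it had never been written.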


Macros

Macros let you insert things into your source based on parameters or external data. They can be complicated but they are sometimes very useful. I will include some later in the tutorial.
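
As a taste of what a macro can look like, here is a small sketch. The macro name and its parameters are hypothetical; it assumes WLA DX's .macro/.endm syntax with \1 and \2 standing for the first and second arguments.

```asm
.macro WriteVDPRegister    ; \1 = value, \2 = register number
    ld a,\1
    out ($bf),a            ; first the value...
    ld a,$80|\2
    out ($bf),a            ; ...then $80 ORed with the register number
.endm

    WriteVDPRegister $ff,2 ; expands to the four instructions above
```

Every use of the macro inserts a fresh copy of the code, so it saves typing rather than ROM space.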


Labels

Labels are strings of text (without spaces or punctuation) that let you refer to lines of code, or data, by a name instead of a number.

When the code is assembled, it is all converted to numbers. That means that if you want to refer to some graphical data, for example, or a code routine, by the time it comes to the machine code, that location is merely a memory address in the ROM or RAM. However, when you're writing the code you don't want to pre-choose these addresses; you'd rather have the assembler figure out where to put everything, and then fill in the addresses accordingly.

That means that instead of this:

; load data
ld hl,$100
call $200

.org $100
.incbin "data.bin"

.org $200
; Data loading function...

we can have this:

; load data
ld hl,SomeData
call LoadData

SomeData:
.incbin "data.bin"

LoadData:
; Data loading function...
The assembler will include all the code and data, assemble it together into the output (with placeholders for the addresses), then go back and fill in the placeholders with the actual addresses. We don't need to remember which thing is at which address.


Opcodes

Opcodes are the actual instructions that will get executed by the Z80 CPU in the Master System or Game Gear. They are carefully structured sets of values for bytes, which makes them easier for the CPU to understand but horrible for humans to deal with.

Thus, we use assembler programs that understand mnemonics. These are small "words" that represent what an opcode does. For the Z80, they are generally (often abbreviated) English words representing what the instruction does, or the initials of a phrase describing it. This makes the source code more readable. Here's an example:

ld bc, $4000    ; Load register pair BC with number $4000
out ($be),a     ; Output to port number $be the number stored in register A
dec bc          ; Decrement register pair BC
ld a,b          ; Load register A with the number stored in register B
or c            ; Perform a logical OR operation between registers A and C
jp nz,SomeLabel ; Jump, if the result is non-zero, to the address of SomeLabel

Once you become familiar with the mnemonics, it is not necessary to add such comments all over the code.

There are more than 1000 possible unique opcodes, but they can be split into 158 "instruction types", where the operation is really the same but the registers are different. Of these, we will mainly be using a few dozen, so it is not so overwhelming. The instructions cover the groups:

  • 8-bit arithmetic and logic operations
    Adding, subtracting, AND, OR, XOR type operations
  • 16-bit arithmetic
    Similar to the 8-bit ones
  • 8-bit and 16-bit load
    Moving data around
  • Bit set, reset, and test
    Checking and manipulating the individual 1s and 0s
  • Call, return, and restart
    For functions
  • Exchange, block transfer, and search
    These simplify operations that work on lots of data
  • General purpose arithmetic and CPU control
    These are helpers for manipulating data and controlling how the CPU works
  • Input and output
    The CPU can also communicate with other devices in the system through these
  • Jump
    Let us execute different bits of code under different conditions
  • Rotate and shift
    Can be useful for calculations, but quite confusing at first

The Z80 User Manual groups the instructions in this way.

I will be describing each instruction as it appears in this tutorial. You don't need to memorise anything :) just let them sink in naturally, and look up the ones you forget.


Memory

There are (broadly) two types of memory - ROM and RAM.

ROM

ROM is memory that is full of data you can read but can't change. This is what's inside the game cartridges, and is why people call the dumped data "ROMs".

From our perspective, ROM is where all our code and data lives, at least at first.

ROM stands for Read Only Memory, to remind us that we can't change it.

We can have (almost) as much ROM as we want, although in fact it is hard to fill more than a few dozen KB.


RAM

RAM is memory that is initially empty, but you can write things in and read them out later. In general, we are considering the RAM inside the console that is used by every game you play (although game saves are a type of RAM, and some games have extra RAM in the cartridge to supplement the system RAM).

From our perspective, RAM is where we will keep any data that we have calculated or defined from code, especially data that might need to be remembered and/or might need to change.

RAM stands for Random Access Memory, which is nothing to do with what ROM stands for; it just looks similar. Random Access means you can get at any part of it without waiting (a bit like a CD compared to a tape), but that actually applies equally to ROM.

The Master System has 8KB of system RAM.
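
Here is a minimal sketch of giving names to RAM locations and using them. The variable names are invented for the example, and it uses .enum for brevity even though .ramsection is the more flexible choice.

```asm
.enum $c000 export     ; start handing out names at the first byte of RAM
Lives    db            ; one byte at $c000
Score    dw            ; two bytes at $c001-$c002
.ende

    ld a,3
    ld (Lives),a       ; write to RAM
    ld a,(Lives)       ; read it back later
```

Nothing is stored in the ROM for these names; they are only addresses, so the program has to write a starting value before reading one.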


Mirroring

Due to the way that chips are connected to the CPU, it is possible to connect a chip such that it fills a slot bigger than the amount of memory on the chip. Memory sizes are generally powers of 2 (4KB, 8KB, 16KB, etc), so the slot will be two or four times the size of the memory. The result, when seen from the CPU, is that the whole of the memory is repeated, to fill the slot. Here's an example:

Memory:            Slot:                Result:
1111222233334444   +----------------+   +----------------+
5555666677778888   |                |   |1111222233334444|
99990000aaaabbbb   |                |   |5555666677778888|
ccccddddeeeeffff   |                |   |99990000aaaabbbb|
                   |                |   |ccccddddeeeeffff|
                   |                |   |1111222233334444|
                   |                |   |5555666677778888|
                   |                |   |99990000aaaabbbb|
                   |                |   |ccccddddeeeeffff|
                   +----------------+   +----------------+

This is called mirroring because you see a duplicated "image" of the memory. If you modify one of the "images" (if it is RAM), both are altered. But note that the "reflection" is not flipped!

The system RAM in the Master System and Game Gear is mirrored. The 8KB fills a 16KB "slot". To avoid confusion, we avoid using the "reflection"; we act (mostly) as if it didn't exist.
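
If you want to see the mirror for yourself, a sketch like this should do it. It assumes nothing beyond the 8KB of RAM at $c000 appearing again $2000 bytes later.

```asm
    ld a,$42
    ld ($c000),a    ; write to the first byte of real RAM
    ld a,$00        ; overwrite a so we know the next value really came from memory
    ld a,($e000)    ; read the same byte through the mirror: a is $42 again
```

Stepping through this in a debugging emulator and watching the memory view is a good way to convince yourself.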

SMS Memory Map

The Z80 CPU has a 16-bit address space. That literally means that it has 16 pins on it that select a memory address; that means it can select addresses from $0000 to $ffff (64KB). This is divided (generally) into the following slots:

$0000-$03ff   ROM (unpaged)   1KB
$0400-$3fff   ROM (slot 1)    15KB
$4000-$7fff   ROM (slot 2)    16KB
$8000-$bfff   ROM (slot 3)    16KB
$c000-$dfff   RAM             8KB
$e000-$ffff   RAM (mirror)    8KB

For larger games, the ROM that appears in slots 1-3 can be selected ("mapped") by the game. For smaller ones, such as 32KB ones, the first 1KB plus slots 1 and 2 are enough to hold all 32KB, and slot 3 is empty.


Registers

Registers are a special sort of memory inside the CPU. They can be accessed much more quickly and operations can be performed using what they contain. Pretty much any time you want to do something with data in ROM or RAM, you have to transfer it to one or more registers, perform the operation, and then (perhaps) transfer the result back again.

There are roughly 22 registers in the Z80, all with short one- or two-letter names. The ones with a single letter name can hold 8 bits (one byte); the two-letter ones hold 16 bits (two bytes). Here's a quick guide:

a - Accumulator
This is the main 8-bit arithmetic register. A lot of instructions can only operate on a.

f - Flags
The various bits hold flags which signal what happened during recent operations, allowing conditional branching.

b, c, d, e, h, l - General purpose registers
We can do what we like with these, because (usually) they don't matter. They can be accessed individually as 8-bit registers or in pairs as bc, de and hl. hl is a "super-register" a bit like a because it can do more maths.

ix, iy - Index registers
These two 16-bit registers are useful as indices to something else, because you can work with what they point near to without changing the value in them. And they're useable as 16-bit registers too.

pc - Program Counter
This is where the Z80 keeps track of where in the program it's got to.

sp - Stack Pointer
Keeps track of the stack which we'll come to soon.

i - Interrupt Page Address
We don't bother with this one, because it has no real use on the SMS.

r - Memory Refresh
This is to do with keeping memory updated, and isn't normally useful.

a', f', b', c', d', e', h', l' - Alternate registers
It is possible to swap these with the regular versions, but not individually.

Some of the 8-bit registers can be paired up to make 16-bit ones: b and c become bc, d and e become de, and h and l become hl. You can also split ix into ixh and ixl, and similarly for iy, but you rarely do that.
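
A few lines of code make the pairing clearer. This is only an illustrative sketch; the address $c000 is just a convenient RAM location.

```asm
    ld h,$c0        ; set the high half...
    ld l,$00        ; ...and the low half, so hl = $c000
    ld hl,$c000     ; the same result in one instruction
    ld a,(hl)       ; use hl as a pointer: load a from address $c000
    inc hl          ; 16-bit increment of the pair, hl = $c001
    exx             ; swap bc, de and hl with bc', de' and hl' all at once
    ex af,af'       ; swap a and f with a' and f'
```

Note how the alternate registers are only reachable through the two swap instructions at the end.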


Ports

The Z80 can input/output a byte from/to any of 256 ports, which is useful for sending and receiving data from other parts of the system. On the SMS, these are the relevant ports - note that almost all of them are mirrored in other places but it gets confusing if you use the mirrors so I won't go into it.

Port       Input            Output
$3e                         Memory control
$3f                         I/O port control
$7e        V counter        PSG
$7f        H counter        PSG
$be        VDP (data)       VDP (data)
$bf        VDP (control)    VDP (control)
$dc, $dd   Controllers

To get a program running, you only really need the VDP ones.
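
As a small sketch of port input, this is roughly how a program could check the first controller's button 1. The label name is hypothetical; the only hardware details assumed are that button 1 is bit 4 of port $dc and that a pressed button reads as 0.

```asm
    in a,($dc)              ; read the first controller port into a
    bit 4,a                 ; test bit 4 (button 1); z is set if the bit is 0
    jp z,Button1Pressed     ; pressed buttons read as 0, so jump when z is set
    ; ...code for "not pressed" goes here...
Button1Pressed:
    ; ...code for "pressed" goes here...
```

Output works the same way in reverse, using out with the value in register a.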

Walkthrough Of Hello World

This section will go through every line of code in the "Hello World!" example program that we compiled in the Setting It Up page.

Assembler Directives

You should have "Hello World.asm" open in ConTEXT from the Setting it up stage. Let's see what it does.

WLA DX banking setup

In the Memory section we learned about the Z80 memory map. If we want to use more than 48KB of ROM then we need to tell WLA DX all about the slots; however, for a simple case (as in this one) all we need to do is tell it where and how big the ROM is.

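;==============================================================
; WLA-DX banking setup
;==============================================================
.memorymap
defaultslot 0
slotsize $8000
slot 0 $0000
.endme

.rombankmap
bankstotal 1
banksize $8000
banks 1
.endro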

I put a huge comment there to remind me what it is for and to "split up" the file with the horizontal lines. This is just a matter of taste.

The directives basically say we want one 32KB ($8000 bytes) "bank" of ROM and that it starts at address $0000 in the memory map (slot 0).
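
For comparison, here is a rough sketch of what the setup might look like for a bigger ROM that uses paging. The numbers are assumptions made for illustration (a 128KB ROM split into eight 16KB banks, with three 16KB slots); they are not part of the Hello World program. Note that WLA DX numbers its slots from 0.

```asm
.memorymap
defaultslot 0
slotsize $4000      ; each slot is 16KB
slot 0 $0000        ; first ROM slot
slot 1 $4000        ; second ROM slot
slot 2 $8000        ; third ROM slot
.endme

.rombankmap
bankstotal 8        ; 8 banks...
banksize $4000      ; ...of 16KB each = 128KB
banks 8
.endro
```

You don't need any of this until your program outgrows the simple single-bank layout.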

SDSC tag

; SDSC tag and SMS rom header
.sdsctag 1.10,"Hello World!","SMS programming tutorial program (bugfixed)","Maxim"

This is one of the niceties that you ought to add to your homebrew software. Kindly added to WLA DX after being invented on the SMS Power development forum, it allows you to embed useful information about your program - its name, version, date, author and notes - into the resulting ROM image. It's another WLA directive - you write the version, name, notes and author name and it fills in the date for you.

By adding this tag you will also prompt WLA to insert a valid SMS rom header. This is something found in (almost) all SMS roms which acts to verify that the cartridge is correct when played on an original system. It's not absolutely necessary, but you do want your program to work on a real SMS, don't you?

You can add this anywhere in your source. Putting it at the start reminds you that it's there and that you ought to update it, though.

Where to put the code

.bank 0 slot 0
.org $0000

This tells WLA DX that code (and data) that we are about to specify should go in ROM bank 0 (even though there is only one ROM bank) and that this bank will be in slot 0 (even though there is only one slot). Then we say that we want the code to start right at the beginning (location $0000).


So far we've only had WLA DX directives. Now it's time for some code.

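;==============================================================
; Boot section
;==============================================================
    di              ; disable interrupts
    im 1            ; Interrupt mode 1
    jp main         ; jump to main program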

Nearly all SMS programs start this way.

The first three lines are just a big comment to make it clear what's going on here.

The first actual Z80 instruction is di. This means Disable Interrupts. Interrupts are things which cause code execution to jump to somewhere else ("interrupting" the program flow) to handle something, and we don't want that to happen until we're ready.

Next is im 1. This sets the Interrupt Mode of the Z80. For various reasons, the only one it makes sense to use is mode 1. So that's what we set it to.

Now we've got the most important two things done (to stop our program flow being affected) we can get on with the coding. For technical reasons, the start of the ROM is important and reserved for certain things so our "regular" code should be somewhere else. So we want to "jump" to where that is. Later in the code I've defined a label called "main". The CPU will go there and carry on executing the instructions after the label. We'll get to there in a moment.

Note that I've added a comment on each line telling what the instruction means. These are very verbose and once you're more advanced you'll think they're a bit unnecessary, since they don't tell you anything more than the instructions themselves.

Pause Handler

Pause button handler

.org $0066
; Pause button handler
    ; Do nothing
    retn

Remember how I turned off interrupts just now? Well, there are some interrupts you can't turn off, officially called Non-Maskable Interrupts (NMIs). On the SMS they're not too bad, they're just used for the pause button.

Whenever the pause button is pressed, the code execution will stop whatever it's doing and jump to offset $0066. It will then execute whatever's there until it gets to a retn instruction (return from NMI), when it will go back to what it was doing before it was pressed. If we don't want to do any special handling of the pause button, we should therefore put this command straight away at $0066. The .org $0066 directive tells WLA that this bit has to be at $0066.
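
If you did want the pause button to do something, one possible approach is to flip a flag in RAM for the main program to check. This is only a sketch: PauseFlag and its address are hypothetical, and it relies on the stack pointer having been set up already, since push and pop use the stack.

```asm
.define PauseFlag $c000 ; hypothetical RAM location chosen for this sketch

.org $0066
    push af             ; save a and the flags, since we interrupted other code
    ld a,(PauseFlag)
    xor $01             ; flip bit 0 each time the button is pressed
    ld (PauseFlag),a
    pop af              ; restore a and the flags
    retn                ; go back to whatever was running
```

Saving and restoring af matters because the interrupted code has no idea its registers were borrowed.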


Now we have finished with the special things that have to be at the start, we can get back to our "main" program. The first thing it has to do is initialise various parts of the system.

Main program start - stack pointer

; Main program
main:
    ld sp, $dff0

Here's the "main" label we were coming to.

You may remember that sp is a special register called the Stack Pointer, and that I didn't explain the stack. I'm still not going to because we don't need to use it yet; but suffice to say that the stack takes up some of the available RAM, and we have to tell it which RAM to use. We tell it here to take some memory ending at offset $dff0. The amount it takes varies, but by telling it to be at the end of RAM, we can use memory at the start of RAM and be confident of not overlapping with it.

As a reminder, the SMS has 8KB of RAM, located from hex address $c000 to $dfff, and mirrored at $e000 to $ffff. We don't set sp to $dfff (the very end) for an important reason which I will come to later.

We load the register with a value using the ld <register>,<value> instruction. You can read it as "load stack pointer with $dff0".

Setting up the VDP registers - block transfer

Now we get a technical bit. The VDP (Video Display Processor) is the graphics chip in the SMS. It has a set of registers and some RAM inside it which we control through two ports, $be and $bf. Charles MacDonald's "Sega Master System VDP documentation" is a very good (advanced) document on how it works, but it's a lot to go into for now. Suffice to say that lower down in the program I've put a block of data we can use for setting these registers to suitable initial values:

; VDP initialisation data
VdpData:
.db $04,$80,$00,$81,$ff,$82,$ff,$85,$ff,$86,$ff,$87,$00,$88,$00,$89,$ff,$8a
VdpDataEnd:

.db is a WLA DX directive which instructs it to just put the (byte) data you write in the ROM with no modification. .db should be followed by a comma-separated list of values which are evaluated to bytes and stored.

In order to write to the VDP registers, we have to output this data to the VDP control port, which is port $bf. I'm going to do this using one of the Z80's block transfer instructions. These take values stored in certain registers and transfer (copy or output) a block of data according to those values. Here's the code:

    ; Set up VDP registers
    ld hl,VdpData
    ld b,VdpDataEnd-VdpData
    ld c,$bf
    otir

otir means "output the b bytes of data starting at the memory location stored in hl to the port specified in c". That's great - we can figure out all of those, as you can see. One trick is that we subtract two labels - "VdpDataEnd-VdpData". Because labels get turned into addresses, the difference between two labels will be the number of bytes between them.

There are other block transfer commands, most notably ldir which can be used for copying from one memory location to another.
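
For example, a memory-to-memory copy with ldir could be sketched like this. The labels and the four data bytes are made up for illustration; the pattern of subtracting two labels to get the byte count is the same one used with otir.

```asm
    ld hl,CopySource                ; where to copy from (ROM)
    ld de,$c000                     ; where to copy to (start of RAM)
    ld bc,CopySourceEnd-CopySource  ; how many bytes to copy
    ldir                            ; copy bc bytes from (hl) to (de)
    ; ...the rest of the program carries on here...

; The data lives somewhere the CPU will never try to execute it
CopySource:
.db $01,$02,$03,$04
CopySourceEnd:
```

Unlike otir, which counts with the 8-bit register b, ldir counts with the 16-bit pair bc, so it can move more than 256 bytes in one go.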

Clearing VRAM - VRAM write access, looping, conditional jumps

We don't know what's in the VDP RAM and if we don't clean it up, it will make our screen ugly. (In actual fact it will contain the SEGA logo from the BIOS on a real system.) So let's do that, by setting every byte of it to zero.

    ; Clear VRAM
    ; 1. Set VRAM write address to 0 by outputting $4000 ORed with $0000
    ld a,$00
    out ($bf),a
    ld a,$40
    out ($bf),a
    ; 2. Output 16KB of zeroes
    ld bc, $4000    ; Counter for 16KB of VRAM
ClearVRAMLoop:
        ld a,$00    ; Value to write
        out ($be),a ; Output to VRAM
        dec bc
        ld a,b
        or c
        jp nz,ClearVRAMLoop

Wow, look at that! That's quite a piece of code there, quite daunting really. But it's not that bad, honestly - you'll laugh at something like that in no time. Let's see what's there.

First, we have to communicate with the VDP and tell it that we want to write to VRAM. (VRAM is what we'll call the RAM inside the VDP.) To do this we have to tell it the address we want to write to, and tell it we want to write. Because there's 16KB of VRAM, we'll need 14 bits (2^14 = 16384 = 16KB) for the address. The remaining two bits, which make it up to a 16-bit (2-byte) number, are used to signal what our intentions are. To get the final number we can use an OR calculation - the number $4000 only contains the bits required to tell the VDP we want to write to VRAM, and if we OR it with the address we'll get the final number to send to the VDP. In our case, we want to start at address $0000 so the final number is $4000. (Try it on a calculator which supports hexadecimal.)

We have to output this to port $bf, the VDP control port as I told you a while back. However, we have to consider byte ordering.

Most modern computer systems are based around bytes as a unit of storage. However, they also often want to deal with larger numbers - typically 32-bit or 64-bit on modern computers - so when these numbers are being transferred as multiple bytes, there needs to be some standard to decide in what order to transmit them. On the Z80, and the SMS VDP, the order is that the least significant byte comes first, so if you have a number like $4000 you will communicate it in the order $00, $40. This is also known as "little-endianness". So that's why in part 1 above we output $00 and then $40.
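
To see the OR and the byte ordering working together on a less trivial address, here is a sketch that sets the VRAM write address to $3800 instead of $0000. The address is just an example value.

```asm
    ; $3800 ORed with $4000 is $7800, sent low byte first
    ld a,$00        ; low byte of $7800
    out ($bf),a
    ld a,$38|$40    ; high byte: $38 ORed with $40 = $78
    out ($bf),a
```

Writing the high byte as $38|$40 keeps the address and the "write" bits visibly separate in the source, and WLA DX works out the $78 for us.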

Why can't we write "out ($bf),$4000"? Because the Z80 doesn't know how to. There are restrictions on how you can handle data. In this case, we can only output one byte at a time, and the data has to come from register a. (There are other possible ways to do it but this is the easiest.)
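
As one example of those "other possible ways", the Z80 also has an out (c),r form, which outputs any 8-bit register to the port whose number is held in register c. Here is a minimal sketch of setting the same VRAM write address that way; it is only an illustration and is not used in the rest of this tutorial:

```asm
    ; Alternative sketch: set the VRAM write address using out (c),r
    ld c,$bf        ; c = port number (VDP control port)
    ld hl,$4000     ; hl = $4000 ORed with address $0000
    out (c),l       ; Low byte ($00) first
    out (c),h       ; Then high byte ($40)
```

It is still one byte per out, and still low byte first; the only difference is where the data and the port number come from.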

Now we've set the VDP ready to receive data. We send it data by outputting to port $be, the VDP data port. When it gets it, it will write it to VRAM and then (rather handily) move to the next byte of VRAM, so we can just send it a stream of data bytes and it will write them consecutively to VRAM. So we need to send 16384 zeroes. The way to do this is to start at $4000 (=16384), then output a zero and decrease our counter. Then we'll repeat this until our counter is zero. That's what part 2 is doing.

First, we store $4000 in register pair bc. Then we come across another Z80 instruction we wish we had. We want to go in a loop, decreasing bc by one each time and checking if it's zero - but while there is an instruction to decrease a register pair by one (decrement it), it doesn't have a built-in check if the answer is zero. So we will check it ourselves, using the fast or instruction. This combines the current byte in a with another byte, giving a result with a binary 1 where either of the two inputs had a 1. So it can only give a result of zero if both inputs were zero.

The or instruction also affects the z flag. This is one bit in the f (flag) register, which will be 1 if the last calculation result was zero - but only for certain calculations. (Check the Z80 CPU User Manual documentation to see which flags are affected by each instruction.) We can then use this flag in a conditional jump to keep outputting until the counter gets to zero.

A conditional jump is the same as a regular jump, except that it only jumps if a specified condition is met. These are all based on the f register; the contents of the f register are affected by different instructions in different ways, and some instructions do not affect it at all. So the flag may be looked at some distance from where it was assigned.

Here is the list of available conditions:

Condition | Stands for          | Description
nz        | Not Zero            | Last calculation result wasn't 0
z         | Zero                | Last calculation result was 0
nc        | No Carry (overflow) | Last calculation result didn't go over the boundary from $ff to $00, or vice versa
c         | Carry               | Last calculation result went over the boundary from $ff to $00, or vice versa
po        | Parity Odd          | Last calculation result had an odd number of bits set
pe        | Parity Even         | Last calculation result had an even number of bits set
p         | Positive            | Last calculation result was between $00 and $7f
m         | Minus (negative)    | Last calculation result was between $80 and $ff (signed)

I'll explain the other conditions later, as they are quite confusing.
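
To see the z flag and the nz condition in a simpler setting: decrementing a single 8-bit register does update the z flag, so a counter that fits in one register needs no extra check. A small sketch (the label CountLoop is made up for illustration):

```asm
    ld b,10             ; Counter: go round the loop 10 times
    CountLoop:
        ; ... do something here that leaves b alone ...
        dec b           ; 8-bit decrement: sets the z flag when b reaches 0
        jp nz,CountLoop ; Jump back while the result was not zero
```

The 16-bit dec bc used for the VRAM clear does not touch the flags at all, which is why that loop needs the extra ld a,b and or c before its jump.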

So, we can output something bc times using a loop like the one shown above. Here's a version with comments describing what's happening more verbosely:

    ; 2. Output 16KB of zeroes
    ld bc, $4000    ; Counter for 16KB of VRAM
    ClearVRAMLoop:
        ld a,$00    ; Value to write
        out ($be),a ; Output to VRAM
        dec bc      ; Decrement counter
        ld a,b      ; Get high byte
        or c        ; Combine with low byte
        jp nz,ClearVRAMLoop ; Loop while the result is non-zero

So, this section has set all of VRAM to zero. Now we've got a blank space in which to start putting our data.

Setting Up The Graphics

Graphics on the Master System are built up from four things:

  • Palette
  • Tiles
  • Tilemap
  • Sprite table

To make the background, a combination of the palette, tiles and the tilemap are used. To make the sprites - i.e. the player, enemies, bullets, etc - we use the palette, tiles and the sprite table.

For Hello World, we are using just the background. All of the graphical elements are stored in special RAM belonging to the VDP (the graphics chip). To get them there, the CPU communicates with the VDP using ports $bf and $be.


Palette

The palette defines which colours we can use. For Hello World, there are only two colours: black and white. Similarly to the VDP initialisation data, we want to store the necessary data in the ROM, and transfer it to the VDP.

PaletteData:
.db $00,$3f ; Black, white
PaletteDataEnd:

We must tell the VDP we want to write to the palette (sometimes called CRAM, Colour RAM). We do this similarly to how we set the VRAM address before we wrote 16KB of zeroes, but this time the address is the palette index (we want to start at 0), and the high two bits must be set, which we achieve by ORing with $c000 (it is $4000 to choose the VRAM write address).

    ; Load palette
    ; 1. Set VRAM write address to CRAM (palette) address 0 (for palette index 0)
    ; by outputting $c000 ORed with $0000
    ld a,$00
    out ($bf),a
    ld a,$c0
    out ($bf),a

Because we have a small amount of data (less than 256 bytes - actually, it's only two bytes) we can use the otir instruction again, to output it to the VDP data port $be:

    ; 2. Output colour data
    ld hl,PaletteData
    ld b,(PaletteDataEnd-PaletteData)
    ld c,$be
    otir

We are not setting up the other 14 colours in the background palette, or the sprite palette, because we are not using them.

Read more about the palette


Tiles

Tiles are the building blocks of the graphics on the Master System. We must define all of the graphics in the form of 8x8 tiles; because we are drawing text, we will define one tile per character, to give us our font. The font data is stored in ROM; it's rather large, so I will not reproduce it all here.

FontData:
.db $00,$00,$00,$00,$00,$00,$00,$00,$00,$00,$00,$00,$00,$00,$00,$00
.db $00,$00,$00,$00,$00,$00,$00,$00,$00,$00,$00,$00,$00,$00,$00,$00
.db $00,$00,$00,$00,$00,$00,$00,$00,$00,$00,$00,$00,$00,$00,$00,$00
; ... (the rest of the font data is not shown here)
FontDataEnd:

Tiles are loaded into VRAM at address $0000. We set the address as before:

    ; Load tiles (font)
    ; 1. Set VRAM write address to tile index 0
    ; by outputting $4000 ORed with $0000
    ld a,$00
    out ($bf),a
    ld a,$40
    out ($bf),a

Because the tile data is rather large, we can't just use otir to output it, because otir can only count through register b, which is 8 bits so the maximum is 256. Instead we must use a register pair, similarly to how we used one to count through 16KB when blanking VRAM. The difference is, this time we're reading the data from ROM instead of always outputting zeroes.

    ; 2. Output tile data
    ld hl,FontData              ; Location of tile data
    ld bc,FontDataEnd-FontData  ; Counter for number of bytes to write
    WriteTilesLoop:
        ld a,(hl)        ; Get data byte
        out ($be),a      ; Output it
        inc hl           ; Add one to hl so it points to the next data byte
        dec bc           ; Decrement the counter and repeat until it's zero
        ld a,b
        or c
        jp nz,WriteTilesLoop

This is the most complex bit so far, so let's go over it carefully. The first two lines are setting up two register pairs - bc and hl - with some numbers. hl stores the location of the data, and bc the length of the data (in bytes).

Then we read a byte of data into register a. The brackets around (hl) signify indirection - that means, we don't load a with the value of hl (that wouldn't make sense, since they are different numbers of bits anyway), instead we load a with the value at the memory address contained in hl. That means that we will get the tile data we want. Next we output it to the VDP data port, so effectively we have copied it from ROM to a register, and then to VRAM.

Next we increment hl. That means we add one to it. That way, next time we read in, it will get us the next byte. There is no need to increment the VRAM address because the VDP does that automatically.

Finally we decrement the counter and loop if the result is not zero, as before. This means we will output exactly the right number of bytes of tile data.

This is similar to how otir works underneath; the difference is, our version uses a 16-bit counter, and we specify the port number each time instead of using register c to hold it.

Read more about tiles


Tilemap

Finally we output the tilemap. This tells the VDP which tile to put at which location on the screen. While the result is text, the Master System doesn't understand text (e.g. ASCII); we have to tell it the tile numbers. Additionally, it wants some extra information about how to display each tile. So the message is stored in the format the VDP understands, and we simply copy it from ROM to VRAM again. The destination VRAM address for the tilemap is $3800. We set the address again:

    ; Write text to name table
    ; 1. Set VRAM write address to tilemap index 0
    ; by outputting $4000 ORed with $3800+0
    ld a,$00
    out ($bf),a
    ld a,$38|$40
    out ($bf),a

Then we copy the data from ROM to VRAM as before:

    ; 2. Output tilemap data
    ld hl,Message
    ld bc,MessageEnd-Message  ; Counter for number of bytes to write
    WriteTextLoop:
        ld a,(hl)    ; Get data byte
        out ($be),a
        inc hl       ; Point to next letter
        dec bc
        ld a,b
        or c
        jp nz,WriteTextLoop

Read more about the tilemap

Now the graphics are all set up - all the necessary elements are configured in VRAM and CRAM.


This is a sidebar page

It provides more detailed information but you don't need to read it to follow the tutorial. You might want to come back to it later.


The palette defines which colours we can use. There are actually two palettes - one for the background, and one for the sprites. (The sprite palette can be used by the background too.) Each palette contains 16 entries. Here's the palette window in Meka:

The first sixteen colours are the background palette, the second sixteen are the sprite palette.

Each pixel of each tile is represented by four bits, giving a number between 0 and 15. This number is used to select which colour to use. It's a lot like "Paint By Number":

Each palette entry is one of the 64 possible colours on the Master System:

To pick a colour, you must choose a number between 0 and 3 for each of the red, green and blue colour channels. Then combine them in a byte:

Bit | 7 6    | 5 4  | 3 2   | 1 0
%   | Unused | Blue | Green | Red

So, for example, if there was a little blue, no green and a lot of red, the colour would be %00010011. Here are all of the colours:

Hex Dec Binary Red Green Blue HTML
$00 0 %00000000 0 0 0 #000000
$01 1 %00000001 85 0 0 #550000
$02 2 %00000010 170 0 0 #aa0000
$03 3 %00000011 255 0 0 #ff0000
$04 4 %00000100 0 85 0 #005500
$05 5 %00000101 85 85 0 #555500
$06 6 %00000110 170 85 0 #aa5500
$07 7 %00000111 255 85 0 #ff5500
$08 8 %00001000 0 170 0 #00aa00
$09 9 %00001001 85 170 0 #55aa00
$0a 10 %00001010 170 170 0 #aaaa00
$0b 11 %00001011 255 170 0 #ffaa00
$0c 12 %00001100 0 255 0 #00ff00
$0d 13 %00001101 85 255 0 #55ff00
$0e 14 %00001110 170 255 0 #aaff00
$0f 15 %00001111 255 255 0 #ffff00
$10 16 %00010000 0 0 85 #000055
$11 17 %00010001 85 0 85 #550055
$12 18 %00010010 170 0 85 #aa0055
$13 19 %00010011 255 0 85 #ff0055
$14 20 %00010100 0 85 85 #005555
$15 21 %00010101 85 85 85 #555555
$16 22 %00010110 170 85 85 #aa5555
$17 23 %00010111 255 85 85 #ff5555
$18 24 %00011000 0 170 85 #00aa55
$19 25 %00011001 85 170 85 #55aa55
$1a 26 %00011010 170 170 85 #aaaa55
$1b 27 %00011011 255 170 85 #ffaa55
$1c 28 %00011100 0 255 85 #00ff55
$1d 29 %00011101 85 255 85 #55ff55
$1e 30 %00011110 170 255 85 #aaff55
$1f 31 %00011111 255 255 85 #ffff55
$20 32 %00100000 0 0 170 #0000aa
$21 33 %00100001 85 0 170 #5500aa
$22 34 %00100010 170 0 170 #aa00aa
$23 35 %00100011 255 0 170 #ff00aa
$24 36 %00100100 0 85 170 #0055aa
$25 37 %00100101 85 85 170 #5555aa
$26 38 %00100110 170 85 170 #aa55aa
$27 39 %00100111 255 85 170 #ff55aa
$28 40 %00101000 0 170 170 #00aaaa
$29 41 %00101001 85 170 170 #55aaaa
$2a 42 %00101010 170 170 170 #aaaaaa
$2b 43 %00101011 255 170 170 #ffaaaa
$2c 44 %00101100 0 255 170 #00ffaa
$2d 45 %00101101 85 255 170 #55ffaa
$2e 46 %00101110 170 255 170 #aaffaa
$2f 47 %00101111 255 255 170 #ffffaa
$30 48 %00110000 0 0 255 #0000ff
$31 49 %00110001 85 0 255 #5500ff
$32 50 %00110010 170 0 255 #aa00ff
$33 51 %00110011 255 0 255 #ff00ff
$34 52 %00110100 0 85 255 #0055ff
$35 53 %00110101 85 85 255 #5555ff
$36 54 %00110110 170 85 255 #aa55ff
$37 55 %00110111 255 85 255 #ff55ff
$38 56 %00111000 0 170 255 #00aaff
$39 57 %00111001 85 170 255 #55aaff
$3a 58 %00111010 170 170 255 #aaaaff
$3b 59 %00111011 255 170 255 #ffaaff
$3c 60 %00111100 0 255 255 #00ffff
$3d 61 %00111101 85 255 255 #55ffff
$3e 62 %00111110 170 255 255 #aaffff
$3f 63 %00111111 255 255 255 #ffffff

So there are 64 possible colours on the Master System, but you have to select 16 of them (for each palette) - you can't use more without using special tricks.
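
If you'd rather not look a colour up in the table, the arithmetic can be left to the assembler. A small sketch, assuming WLA DX evaluates the expression in a .db line: each channel level (0 to 3) is multiplied by its place value, which is 16 for blue, 4 for green and 1 for red.

```asm
; A little blue (1), no green (0), a lot of red (3)
.db 1*16+0*4+3      ; = %00010011 = $13
; No blue (0), full green (3), full red (3) = yellow
.db 0*16+3*4+3      ; = %00001111 = $0f
```

Both results match the rows for $13 and $0f in the table above.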

Writing to the palette

To write to the palette, set the address using the VDP control port, with the two high bits set (i.e. address ORed with $c000). The valid addresses are 0 to 31; if you set an address higher than that, it wraps around.
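
For example, here is a sketch of changing a single palette entry: index 16, the first entry of the sprite palette, is set to bright red ($03 in the table above). The choice of index and colour is only for illustration.

```asm
    ; Set CRAM write address to palette index 16
    ld a,16             ; Low byte: palette index
    out ($bf),a
    ld a,$c0            ; High byte: top two bits set = CRAM write
    out ($bf),a
    ; Write one colour byte
    ld a,$03            ; %00000011 = full red, no green, no blue
    out ($be),a
```

As with VRAM, the VDP moves on to the next palette index after each byte, so several consecutive entries can be written one after another.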


Palette index 0 is special. For sprites, it is always transparent - the colour you choose is never used for sprites (although it can be used for other things). For background tiles, it is drawn as normal, but when the tile is set to be drawn in front of sprites (see Tilemap), index 0 is drawn behind sprites while the other palette entries are drawn in front, so it again has a sort of transparency.

Game Gear differences

The Game Gear palette is exactly the same as the Master System, except:

  • Each colour channel has four bits, not two, giving 2^12 = 4096 possible colours.
  • Each palette entry then takes up two bytes
    • So there are 64 bytes of colour RAM instead of 32
    • Each entry takes up two consecutive bytes, in the format:
Byte | 1       | 1       | 0       | 0
Bit  | 7 6 5 4 | 3 2 1 0 | 7 6 5 4 | 3 2 1 0
%    | Unused  | Blue    | Green   | Red


Data format

All graphics on the Master System are built up from 8×8 pixel tiles.

Each pixel is a palette index from 0 to 15, i.e. 4 bits.

So a whole tile is a stack of bits:

The tile data is in a planar format, split by tile row. That means that the first byte contains the least significant bit, bit 0, of each pixel in the top row of the tile. The second byte contains bit 1 of each pixel, the third bit 2, and the fourth bit 3. Thus the top eight pixels are represented by the first four bytes of data, split by "bitplane". The process is repeated for consecutive rows of the tile, producing 32 bytes total.
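
As a worked example, take a top row whose eight pixels use palette indices 0, 1, 2, 3, 0, 1, 2, 3 from left to right. This sketch assumes the leftmost pixel is held in the most significant bit of each byte:

```asm
; Pixel indices:  0    1    2    3    0    1    2    3
; In binary:     0000 0001 0010 0011 0000 0001 0010 0011
.db %01010101   ; Bitplane 0: bit 0 of each pixel
.db %00110011   ; Bitplane 1: bit 1 of each pixel
.db %00000000   ; Bitplane 2: bit 2 of each pixel (all clear here)
.db %00000000   ; Bitplane 3: bit 3 of each pixel (all clear here)
```

Seven more groups of four bytes, one per remaining row, complete the 32-byte tile.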

Data conversion

It is sensible to convert images to this format in advance, before including them in the ROM. You can make your own tools to do this, or (if you use Windows) you can use BMP2Tile. This will take a range of common image formats, such as PNG, and convert them to data in a format the SMS understands.

How many tiles can we have?

In the most typical VRAM layout, 14KB of the total 16KB is available for tiles; that is enough space for 448 tiles. (With some tricks you can get space for a few more.)
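
Since each tile takes 32 bytes, tile number n starts n × 32 bytes into the tile area. Here is a sketch of pointing the VDP at tile number 3, assuming the tiles start at VRAM address $0000 as they do in this tutorial (the tile number is an arbitrary choice):

```asm
    ; Set VRAM write address to the start of tile number 3
    ; 3 * 32 = $0060, ORed with $4000 = $4060
    ld a,$60            ; Low byte
    out ($bf),a
    ld a,$40            ; High byte
    out ($bf),a
    ; The next 32 bytes written to port $be will replace tile 3
```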

Don't forget the palette

All of your tile data must be generated with attention to the palette that will be applied to it. The tile data does not contain the palette information. Try to use a program that offers close control over the palette, and won't re-order it.


Tile data takes up a relatively large amount of space - after all, to fill 14KB of VRAM will take 14KB of data. However, it is also typically quite compressible. There are a few compression schemes available through BMP2Tile, or you can make your own.

Game Gear differences

There aren't any - it works exactly the same way.


The tilemap represents the screen-filling background of the graphics on the Master System. It is slightly larger than the screen, allowing the viewpoint to smoothly scroll while any updates to the tilemap happen in the off-screen parts.

Consider the "virtual screen", which is 256×224 pixels. This is built up from a grid of 32×28 8×8 tiles. Each entry in this 32×28 grid is defined by an entry in the tilemap, sometimes called the name table. Each entry determines:

  • Which tile to display
  • Which palette to use
  • Flags
    • to draw the tile flipped horizontally (like a mirror image)
    • to draw the tile flipped vertically (like a reflection in a mirrored floor)
    • to draw the tile in front of sprites

Data format

The data takes up a total of 13 bits, stored in two bytes:

Bit  | 15-13  | 12       | 11      | 10            | 9               | 8-0
Data | Unused | Priority | Palette | Vertical flip | Horizontal flip | Tile number

The data is stored in VRAM in little-endian format, i.e. the least significant eight bits are stored first.

The tilemap is usually stored in VRAM at location $3800, and takes up 1792 bytes (32×28×2 bytes).
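
Putting the size and location together: assuming the entries are stored row by row, left to right, the entry for column x of row y lives at $3800 + (y × 32 + x) × 2. Here is a sketch that writes one entry at column 5, row 10; the position, tile number and flip flag are arbitrary choices for illustration.

```asm
    ; Set VRAM write address to $3800 + (10*32 + 5)*2 = $3a8a
    ; ORed with $4000 = $7a8a
    ld a,$8a            ; Low byte
    out ($bf),a
    ld a,$7a            ; High byte
    out ($bf),a
    ; Write the entry: tile 5, flipped horizontally (bit 9 set) = $0205
    ld a,$05            ; Low byte: low 8 bits of the tile number
    out ($be),a
    ld a,%00000010      ; High byte: bit 1 of this byte is bit 9 of the entry
    out ($be),a
```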



Flipping allows symmetric objects to be created with fewer tiles, thereby allowing greater variety in the graphics. In the above example, some tiles were flipped horizontally, shown by red arrows on the upper portion.


When a tile has its priority bit set, all pixels with index greater than 0 will be drawn on top of sprites. You must therefore choose a single colour in palette position 0 to be the background colour for such tiles, and they will have a "blank" background. Careful use of tile priority can make the graphics seem more multi-layered.

Notice how the monitor sprite is hidden behind the tree, and Sonic is hidden by the grass. The foreground tiles are highlighted by a drop shadow for explanatory purposes.

Not enough tiles to fill the tilemap

It would take nearly 900 tiles to fill the whole tilemap. It would take at least 744 to fill the smallest possible screen (on the Master System). There is only room in VRAM for roughly 450 tiles. Therefore, graphics have to be built up from tiles using some repetition of the same tile in more than one location in the tilemap.

This lack of a "bitmapped" display means you can't do some types of graphics, and others are very difficult to achieve. It is therefore best to design your graphics around this limitation - drawing lines and unaligned text are very hard, but block-based graphics are very easy.

Automatic conversion

BMP2Tile will generate the tilemap data to match the input image, although it is up to you to handle cases where the image is not the same size as the tilemap itself. By using tile flipping, it can also reduce the number of tiles needed.

Game Gear differences

There (almost) aren't any.

The "virtual screen" is the same size, but the displayed portion is of course the smaller 160×144 GG screen rather than the (typically) 248×192 SMS screen. That means that there are enough tiles to fill the entire screen and make a pseudo-bitmapped display (using 360 tiles, leaving plenty for sprites). It's still hard work to work with this mode, though, as the data is not laid out in a regular "bitmapped" fashion, but fullscreen graphics are achievable.


You might think we've finished, now that the graphics are set up, because that's all there is in Hello World. However, we still need to do two things:

  • Turn on the screen
    It's been turned off since we started - the VDP initialisation data turned it off.
  • Stop the program
    If we don't stop it, it will keep going and bad things will happen.

Turning on the screen - VDP register 1

The VDP registers control all aspects of the VDP's operation. Register 1 includes turning the screen on and off. In fact, it controls lots of things:

Bit | Function          | If set                                                     | If reset
7   | Unused            | Doesn't matter                                             | Doesn't matter
6   | Enable display    | Display on                                                 | Display off
5   | VBlank interrupts | Interrupts generated on VBlank                             | VBlank gives no interrupts
4   | 28 row display    | Screen is taller than normal (e.g. Codemasters games)      | Screen is normal size
3   | 30 row display    | Screen is even taller - but only on PAL systems            | Screen is normal size
2   | Mega Drive mode 5 | On a Mega Drive, a Mega Drive video mode is selected       | Normal Master System video mode
1   | Doubled sprites   | Each sprite defined will also show the next tile under it  | Normal sprites
0   | Zoomed sprites    | Sprites are stretched to 16x16 pixels                      | Normal sprites

Charles MacDonald's "Sega Master System VDP documentation" describes all of the registers in very good detail. Anyway, there is far too much information here for us to remember every time, so it makes sense for us to add comments to make it clear what we're doing. We don't want to enable any of these features except for "Enable display".

To write to a VDP register, we use the VDP control port. This time, the high two bits must be %10. The rest is somewhat different as we aren't setting an address:

Bit  | 15-14 | 13-12   | 11-8            | 7-0
Data | %10   | Ignored | Register number | Register value

The whole thing is done in 16 bits, so we don't need to use the VDP data port. We do need to write it in little-endian format again, though. So, to set register 1 to value %01000000, we do this:

    ; Turn screen on
    ld a,%01000000
;          ||||||`- Zoomed sprites -> 16x16 pixels
;          |||||`-- Doubled sprites -> 2 tiles per sprite, 8x16
;          ||||`--- Mega Drive mode 5 enable
;          |||`---- 30 row/240 line mode
;          ||`----- 28 row/224 line mode
;          |`------ VBlank interrupts
;          `------- Enable display
    out ($bf),a
    ld a,$81
    out ($bf),a

I suggest you use a comment like this each time you write to a VDP register because I think it's too hard to remember what all the bits do.
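
The second byte written is always $80 ORed with the register number: the %10 marker goes in the top two bits and the register number in the low four bits. As a sketch, here is the opposite of the code above, turning the display off again by writing %00000000 to register 1:

```asm
    ; Turn screen off
    ld a,%00000000      ; Register value: bit 6 clear = display off
    out ($bf),a
    ld a,$80|1          ; %10 marker plus register number 1 = $81
    out ($bf),a
```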

Time to stop - an infinite loop

OK, now our code is almost finished. We've done everything we wanted to do; but there's one more thing we have to do. The Z80 will execute all the code we've written, and then when it gets to the end it will keep on going and going forever, never stopping. We don't want that, we want it to stop; so what we'll do is put it in an infinite loop. Normally, infinite loops are a bad thing because they stop your program ever continuing; you'll probably create a few by accident and have to figure out why they're happening and fix the bug causing them. But here, we want one. We want the processor to keep doing the same thing (nothing) over and over again forever, which we will achieve by making a jump point to itself:

    ; Infinite loop to stop program
    Loop:
         jp Loop

When the Z80 gets to the instruction jp Loop it will jump to the label Loop:. When it gets there, it will find the instruction jp Loop and will jump to the label Loop:. When it gets there, it will find the instruction jp Loop and will jump to the label Loop:... and so on forever.

Now we've added that infinite loop, we know the program will never get past it; so here is a safe place to put data. Why does it matter where you put data? Because you have to make sure that the data is never accidentally interpreted as code. The Z80 can't tell if what it's looking at is sensible program code or data, it assumes everything is program code. So you have to make sure that the place you insert data is outside the program code and that execution will never accidentally get to your data. For a simple program like this one (with no "functions", just one code block) we put it after the program. We could equally have put it before the "main:" label, and at the start of the program execution would have jumped straight past it to that label.

I can put the data in any order I like because it doesn't matter - it's not necessary to put it in the order it's used. In larger projects you may choose to order the data logically to make it easier to navigate, and maybe split the data up according to what it's for.

Anyway, you may have noticed that the program is now finished. Press F10 to run it again.

Enhancing Our Program

There are a few things we can do with the program to make it a bit nicer.

Defines instead of magic numbers

In programming, there is a general rule that you shouldn't have "magic numbers" - numbers that do something special, which are just included in the middle of the code. Instead, you should use your programming language to give them a name so people can read your code and see the meaning, not the value:

; SMS defines
.define VDPControl $bf
.define VDPData $be
.define VRAMWrite $4000
.define CRAMWrite $c000

.define is a WLA DX directive that defines a name for a value. Now when we use these names (like "VDPControl") it will act as if we had used the value ("$bf") instead. So we can use these instead of the numbers, for example:

    ; Set up VDP registers
    ld hl,VDPInitData
    ld b,VDPInitDataEnd-VDPInitData
    ld c,VDPControl
    otir

Helper functions

There were a few tasks we did several times - for example, setting the VDP address, and copying data to the VDP. We can make functions to do this and use the functions instead of duplicating the code.

On the Z80, functions are mainly implemented using the call and ret instructions. These make use of the stack. Let's explain the stack first.

The stack - push, pop, call, ret

The usual description of the stack is that it's like a stack of playing cards, with the magical limitation that we can only take the top card from the stack, or put another one on there. The important thing is, the cards come off in the reverse order they're put on, so it's important not to get them mixed up.

For the Z80, the stack is a section of memory containing 16-bit words, not just 8-bit bytes. We can push a register pair onto the stack and the Z80 will store it in that section of memory. We can then pop it back into any register pair, although it usually only makes sense to pop it into the one you took it from. It allows you to do something like this:

ld hl,$1234
push hl
     ld hl,$5678
     ; Do something with hl (which contains $5678)
     ; ...
pop hl
; hl now contains $1234 again

In effect, this is "saving" the contents of that register so you can do something else with it, then restoring it to its previous state.

The other main use for the stack is for functions. There is a Z80 instruction "call" which is exactly like jp, in that it makes execution jump to a certain point instead of continuing on linearly; except that first, it pushes the pc (Program Counter) register, which by now contains the address of the next instruction after the call, onto the stack. Then, some time after jumping to the given address, if it encounters a ret instruction it will pop the stored pc address and start executing code from there, in effect returning to the point it was at before:

Somewhere in the program, usually not in the normal flow of the program:

MyFunction:
    inc a    ; Do something
    ret      ; return

In the normal flow:

    ld a,$00
    call MyFunction
    ; a now contains $01
    call MyFunction
    ; a now contains $02

Again, remember you have to be careful with the order you push/pop, especially when mixed in with calls and returns. This:

    call MyFunction
    ; ...
MyFunction:
    ld hl,$1234
    push hl
    ret        ; Error!

will not work, because the ret will take the last thing pushed, which is $1234, and execution will continue at $1234! Except in very few circumstances, that's not something you'll want to do, because $1234 might be some data, or some completely unrelated code, or even halfway through an instruction!

So, to conclude, the stack is an area of memory that we can push and pop registers to/from; it's also used to call functions and ret from them; and we have to be careful to balance our stack usage to avoid things going wrong.
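
Here is a small sketch of balanced stack use with two register pairs. Note that the pops come in the reverse order of the pushes:

```asm
    push bc             ; Stack, top first: bc
    push de             ; Stack, top first: de, bc
        ; ... code that changes bc and de ...
    pop de              ; Takes the top value, which was de
    pop bc              ; Then the one beneath it, which was bc
```

If the two pops were swapped, the code would still run, but bc and de would come back holding each other's values.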

Helper 1: set VDP address

To set the VDP address, to either VRAM or CRAM, we want to output it to the VDP control port, in little-endian order.

SetVDPAddress:
; Sets the VDP address
; Parameters: hl = address
; Affects: a
    ld a,l
    out (VDPControl),a
    ld a,h
    out (VDPControl),a
    ret

This is invoked using the code:

    ; 1. Set VRAM write address to $0000
    ld hl,$0000 | VRAMWrite
    call SetVDPAddress

Callers need to OR the address with $4000 or $c000 depending on whether they are setting a VRAM write address or CRAM write address. "VRAMWrite" and "CRAMWrite" were .defined earlier, to help make it clearer which one was being used, as shown above.

Notice the comments clearly state what the function does, what parameters it takes, and what registers it affects. That way, people using it can be careful not to leave anything important in those registers. An alternative would be to push/pop any registers used to avoid losing their values:

SetVDPAddress:
; Sets the VDP address
; Parameters: hl = address
    push af
        ld a,l
        out (VDPControl),a
        ld a,h
        out (VDPControl),a
    pop af
    ret

I've used indentation to help me be sure that my push and pop are balanced, and to show what's protected by them. However, if the calling code doesn't care about register a, this protection is unnecessary and will slow the program down.

Helper 2: copy data to VDP

CopyToVDP:
; Copies data to the VDP
; Parameters: hl = data address, bc = data length
; Affects: a, hl, bc
-:  ld a,(hl)    ; Get data byte
    out (VDPData),a
    inc hl       ; Point to next byte
    dec bc
    ld a,b
    or c
    jr nz,-
    ret

This is exactly what we had before, except we have bundled it into a function, used an anonymous label and jr, both of which I will explain in a moment.

This function can be invoked like this:

    ld hl,PaletteData
    ld bc,PaletteDataEnd-PaletteData
    call CopyToVDP
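
Combined with the first helper, the whole palette load shrinks to a few lines. A sketch using only the names already defined in this tutorial:

```asm
    ; Load palette
    ld hl,$0000 | CRAMWrite             ; CRAM write, starting at palette index 0
    call SetVDPAddress
    ld hl,PaletteData                   ; Location of the palette data
    ld bc,PaletteDataEnd-PaletteData    ; Number of bytes to write
    call CopyToVDP
```

The tile and tilemap loads follow the same pattern, with VRAMWrite and their own addresses and labels.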

Anonymous labels

In the original version, we had many labels which were only really used for looping. We had to give each one a different name, so WLA DX could tell them apart; and once you have a hundred loops in your program, thinking of new names gets difficult. Since they aren't particularly important points in the code, we don't need names that last throughout the entire program; we want to use temporary names. One way of doing this in WLA DX is to use anonymous labels. These fall into three categories:

Type of label | Looks like            | Used for
Forwards      | One or more "+" signs | Places you want to jump forwards to
Backwards     | One or more "-" signs | Places you want to jump backwards to
Both-ways     | Two underscores: "__" | A place you want to jump forwards or backwards to, using "_f" to jump forwards and "_b" to jump backwards

The special thing about anonymous labels is that we can re-use them. If some code wants to jump to label "-", WLA DX will find the nearest version of that label before the jump, and use that. So for our loops we can just use "-" instead of a full label.
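
Here is a sketch of that re-use: two separate loops, each with its own "-" label. The loop counts and what the loops do (writing zeroes to the VDP data port) are only for illustration.

```asm
    ld a,$00
    ld b,4              ; First loop: 4 times
-:  out (VDPData),a
    dec b
    jp nz,-             ; Jumps back to the "-" just above

    ld b,8              ; Second loop: 8 times
-:  out (VDPData),a     ; The same label name, used again
    dec b
    jp nz,-             ; Jumps back to the second "-", the nearest one before it
```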

Jump Relative - jr

Before, we only used the jp instruction to perform a jump. jr works (almost) exactly the same, except it is a relative jump. This means that in the final code, it is stored as a number of bytes to jump forwards or backwards, whereas jp is stored as the actual address to jump to. This has advantages and disadvantages:

  • It is one byte smaller
  • It can be faster to execute (because it is smaller)
  • But it can be slower to execute (because the address has to be calculated)

I consider its main advantage to be to tell you (when reading the code) which jumps are to something far away (i.e. to something that is distant from the previous code, like a different section of code) and which are local (within the section of code). So I always use jr for loops, for example, to help show that it is a jump as part of the current code block.

Writing the text in our file

Before, the tilemap data to draw the text was just a blob of data in the ROM. Wouldn't it be nicer to store it as the text? It'd make it a lot easier to know what it said when looking at the file, and much easier to change too. To do that we need several things to happen.

Convert ASCII text to tile numbers - .asciitable, .asc

Using the .asciitable directive, we can tell WLA DX how to convert ASCII text. Our font includes everything from space (at tile number 0) to '~' (at tile number $5e), in the normal ASCII order (except for a few special characters like '£'). We can tell WLA DX about this like so:

.asciitable
map " " to "~" = 0
.enda

Then we can use the .asc directive to store text, and WLA DX will convert it so the bytes match the tile numbers:

.asc "Hello world!"

Using a sentinel (terminating) value to signal the end

Previously, we used labels to count the size of the tilemap data. There is another way, which is to make sure there's a special byte at the end of the text, which does not correspond to any letter. When this is encountered, the code can know that it is time to stop. This is unsuitable for general data, where any byte is valid, but suitable for text, where not every byte corresponds to a character.

Since the font uses tile numbers 0 to $5e, I will use value $ff as my "terminator":

.asc "Hello world!"
.db $ff

Output full tilemap data - tilemap format, cp, flags, xor a optimisation

The tilemap data does not just consist of one-byte tile numbers. For a start, it is possible to have more than 256 tiles. Additionally, there are "flag" bits, making each entry 16 bits long:

Bits     Usage
15-13    Unused
12       High priority
11       Use sprite palette
10       Flip vertically
9        Flip horizontally
8-0      Tile number

Since all our tiles are below number 256, and we don't want to use any of the advanced functionality, we just want the upper eight bits to be 0 and the lower eight bits to be the tile number.

When writing tilemap data to the VDP, we write it in little-endian order as before. That means we will write the tile number first, and then a zero.
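
To make the byte order concrete, here is a sketch of how entries would look if written as 16-bit data with .dw, which stores the low byte first. Tile number $28 is just an example value. The second line is only there to show where a flag bit lands (flip horizontally is bit 9 of the entry); this lesson only needs the first form.

.dw $0028    ; tile $28, no flags:             stored as $28, $00
.dw $0228    ; tile $28, flipped horizontally: stored as $28, $02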

So the program flow becomes:

  1. Read byte
  2. Is it $ff? If so, exit
  3. Output it to the VDP
  4. Output zero to the VDP
  5. Loop back to 1

Here's the code:

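-:  ld a,(hl)          ; read the next byte of text
    cp $ff             ; is it the terminator?
    jr z,+             ; yes: exit the loop
    out (VDPData),a    ; no: output the tile number (low byte)
    xor a              ; a = 0
    out (VDPData),a    ; output zero (high byte)
    inc hl             ; move on to the next byte
    jr -               ; and repeat
+: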

Notice that to exit, we use the "+" anonymous label; to loop, we use the "-" anonymous label (and jr).

To check if the value is $ff, we use the cp compare instruction. This sets some flags based on the comparison between register a and the parameter to the instruction (which could be a register or a literal number). Here is a simplified version of the flag effect:

Flag    Set if       Relevant conditions
z       a = value    z, nz
c       a < value    c, nc

Internally, it works by performing a subtraction, recording the flag effect, but throwing away the result. We mentioned the flags before; we are again using the z (zero) flag. If we subtract $ff from a value and the answer is zero, then the value must be $ff. Therefore our conditional jump will be taken, and the program code will continue with whatever comes after the + label.

If the value is not $ff, the jump is not taken and the code continues on to output the value to the VDP data port. Because the cp instruction does not keep the result of the subtraction, register a still contains the value that was read from ROM.
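
The c flag can be used in the same way for "less than" tests. As a minimal sketch (the threshold $20 is an arbitrary example, not something taken from the program above):

    cp $20
    jr c,+      ; taken if a < $20
    ; ... code here runs only when a >= $20
+: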

Next we want to output a zero. We could do

    ld a,$00

but it is faster, and takes up less ROM space, to do

    xor a

This instruction performs an XOR between register a and the register or number given as a parameter, and stores the result in register a. XOR will give a binary 0 for each bit which is the same in both register a and the parameter, and a binary 1 where they are different from each other. Since the parameter is also register a, it is evaluating a XOR a, which will always give a result of 0. Or, in short, xor a is a fast and small way to set a to zero.
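
For reference, here is a sketch of the two options side by side, with the standard Z80 sizes and timings in the comments:

    ld a,$00    ; 2 bytes ($3e $00), 7 clock cycles, flags unchanged
    xor a       ; 1 byte  ($af),     4 clock cycles, flags changed (z set, c cleared)

The flag side effect matters if a later conditional jump relies on flags set earlier. In the loop above it does not, because the next cp sets them afresh.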

Finally, we increment hl to move on to the next tile number and repeat until $ff is found.

External data files - .incbin

You may have noticed that in the original Hello World, more than half of the file was taken up by the font data. We don't want to read and modify this data anyway (it's pre-generated automatically, not created by hand), so we ought to move it to an external file. There are two main ways to include external data:

.include "filename"

This acts as if the mentioned file's contents had been copied and pasted into the current file. It's a lot like #include in C/C++. The file must therefore be text that WLA DX understands.

.incbin "filename"

This includes the mentioned file as raw data. Each byte of the file will be transferred as one byte to the resulting ROM. We'll use this for our font data. I converted the data to raw binary data and saved it as "font.bin", then changed the code to look like this:

.incbin "font.bin" fsize FontDataSize

The "fsize" parameter tells WLA DX that I want it to set up a symbol called "FontDataSize" that corresponds to the size of the file. I can then use this instead of "FontDataEnd-FontData" any time I want the size of the data in bytes.
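
As a sketch of how this might be used: the label FontData, and the idea that the copying code takes its source address in hl and its byte count in bc, are assumptions about the surrounding code rather than something this section shows.

; in the data part of the file
FontData:
.incbin "font.bin" fsize FontDataSize

; in the code that loads the font
    ld hl,FontData          ; where the data starts
    ld bc,FontDataSize      ; how many bytes to copy, instead of FontDataEnd-FontData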

Final version

When run, this looks exactly the same as the first version, but the source has all the changes mentioned above.