Maxim's World of Stuff - SMS/Meka stuff

Getting started with SMS programming - Lesson 1

1.0 Introduction

This guide will (hopefully) get you going with SMS programming, from setting up your IDE and assembler, to debugging, to getting a working program.

1.1 Downloading stuff

You will need the following stuff, and it'll make life easier if you try to get the ones I mention - unless something newer and better is available!

1.1.1 A text editor

I suggest ConTEXT. It's a text editor with limited project support, customisable buttons, and most importantly custom syntax highlighting. Download and install it somewhere on your hard disk.

I've made a syntax highlighter file here. Put it in the \Highlighters\ folder where you installed ConTEXT. This tells ConTEXT how Z80 assembler code should be highlighted to make it easier to work with.

1.1.2 An assembler

I suggest WLA DX. It's very full-featured; the only downside is that it's available only as C source. So here are some compiled binaries. Unzip them somewhere on your hard disk and remember where.

1.1.3 A debugging emulator

You don't need a debugging one but it will surely help. At the time of writing this tutorial, Meka did not have a usable debugger in its Windows build, but it now does. Therefore, I used eSMS (which has now become Emukon) for this tutorial. You are welcome to use either and I am likely to use Meka in any future lessons.

1.1.4 A disassembler

You'll need one if you want to debug (unless you find a debugging emulator which supports some kind of assembler file). I suggest you get Z80Dasm, because I like how it formats the output, but with a disassembler there isn't really any difference. Extract it (just the .exe for Z80Dasm) somewhere sensible.

1.1.5 Documentation

OK, here's what you really need. I suggest you get Charles MacDonald's docs and Richard Talbot-Watkins's doc for SMS reference, and the official Z80 User Manual for Z80 reference. There are a lot of Z80 docs around, but I like this one for its clarity - the others tend to be better for people who already know how to code and just need reminders of technical details. You will also want the WLA DX manual, which I included in the above WLA DX download, or you can get it from the WLA DX site.

Extract/save them all somewhere handy. You'll need the Acrobat Reader for the Z80 User Manual.

1.1.6 Other stuff

You may find it useful to use additional programs, depending on your project. I like to use Paint Shop Pro for image editing and frhed for hex editing, for example. I am lucky enough to have an SMSReader and devcart so I have the software for that too, of course. (Getting homebrew games working on real hardware can be a real trial...) But you won't need them for a while yet.

1.2 Setting it up

1.2.1 F9 = Compile

You already installed ConTEXT and my highlighter. Now we're going to make F9 our "compile" key :)

Now, there is a slight difficulty because of the way WLA DX works. It works in two stages, "compiling" and "linking", designed for systems set up for compiling programs with a "make" utility (like Linux). For us normal types, we need to get it going manually :( so I've written a batch file to do it. I'm sure there's a more elegant way to do it, and it will only work with a single source file so for a really advanced project I recommend you figure that stuff out.

The batch file is included with the WLA DX download above. You'll have to edit it - instructions are inside it (right-click and choose "Edit" to see).

Then open ConTEXT and click on Options, Environment Options, Execute Keys. Click Add and enter "asm" in the box. Then click on F9 in the list and fill in the fields as follows:

Execute                        the path to the magical batch file
Start in                       %p
Parameters                     %n
Compiler output parser rule    *:%n:%l: * (asterisk, colon, percent, lowercase N, colon, percent, lowercase L, colon, space, asterisk)

Also check the "Capture console output" and "Scroll console output to the last line" checkboxes.

Note: in the first version of this guide the "Use short DOS names" box was checked because long names crashed WLA DX. This bug is now fixed so if you re-download it above you can uncheck it and get long names in there.

We'll set up some more stuff later. For now, let's see if we're compiling properly. But first, a long and boring rant.

1.2.2 RANT

I'm writing this guide to try and help you get going with programming on the SMS. My motivation for this is to encourage others to give it a go, so they can produce new and exciting projects. Now, as a newbie you have to realise that the first few things you produce are not going to be much good - I know mine weren't. So do yourself a favour, and the rest of the world, and don't release them.

Why? Because of rom collectors. If you make a tiny, tiny modification to one of the demo programs I'm (hopefully) going to teach you to make, then release it on your webpage, and someone downloads it, then runs GoodSMS (for example), then sends it off to Cowering as a "new, unknown ROM!!!!11!", then he includes it in the next version of GoodSMS, then EVERY anal rom collector in the world goes crazy because he is "missing" this "ROM", and they all search it out and find it, and the WORLD HAS GONE CRAZY! So please, keep it to yourself unless it's actually worth sharing with the world.

1.2.3 Compile test

OK, no more rant. Download this, extract it somewhere and open it in ConTEXT. It should be coloured automatically according to what is written, in dark blue, light blue, black and red. Press F9 and at the bottom of your screen there should appear an Output Console. Resize it to show 3 or 4 lines - they should read something like:
Free space at $7ff8-$7ff9.
Free space at $7ffc-$7ffe.
31743 unused bytes of total 32768.
> Execution finished.
Congratulations, you've just made your first homebrew demo! Go look in the folder where you put the source file and you should find a file called output.sms. Open it in your favourite emulator and marvel at its amazingness!

If it doesn't compile: oh dear. Did you edit compile.bat? Did you press the right key? What error messages did you get? Try contacting me if you really can't figure it out.

1.2.4 More action keys

You saw how we set up F9 to compile the program using WLA DX? Well, that was actually the hardest one to do. Now we're going to add a couple more, for disassembling (I'll explain why later) and running the game in the emulator.

Fill in the details as follows:

              F10 - disassemble        F11 - run
Execute       <path>\dasmz80.exe       <path>\esms.exe
Start in      %p
Parameters    output.sms > dis.asm     "%poutput.sms"
Window        Minimized                Normal
Hint          Disassemble              Run in eSMS
Save          Nothing                  Nothing
If any of these don't work on your system, please let me know! They work perfectly on my Windows XP system.

1.3 Time to start learning

1.3.1 Registers, memory and ports

The Z80 can address only 64KB of memory (ROM and RAM usually). In the SMS, this is divided with the first 48KB of address space (from $0000 to $bfff in hex) mapped to the cartridge and so usually ROM; and the last 16KB mapped to the 8KB of RAM, so the contents of $c000 to $dfff are mirrored at $e000 to $ffff. For reasons of clarity it is therefore best to never access the second mirror (except for paging, which we'll come to much later).

However, while the CPU can access all this ROM and RAM, it can't do very much useful with it. To do that, it needs to transfer what's there into a very small amount of RAM inside the processor itself. This RAM is split up into several registers, called a, b, c, d, e, f, h, i, l, r, ix, iy, pc, sp. Some of these have special purposes and some are for general use. Some can be joined together when we want to deal with 16-bit values. Here's a quick guide:

Register    Name/use

a           Accumulator
            This is the main 8-bit arithmetic register. A lot of instructions can only operate on a.
f           Flag
            The various bits hold flags which signal what happened during recent operations, allowing conditional branching.
b, c, d, e, h, l
            General purpose registers
            We can do what we like with these, because (usually) nothing special depends on them. They can be accessed individually as 8-bit registers or in pairs as bc, de and hl. hl is a "super-register", a bit like a, because it can do more maths than the other pairs.
ix, iy      Index registers
            These two 16-bit registers are useful as indices to something else, because you can work with memory near where they point without changing the value in them. They're usable as ordinary 16-bit registers too.
pc          Program Counter
            This is where the Z80 keeps track of where in the program it's got to.
sp          Stack Pointer
            Keeps track of the stack, which we'll come to soon.
i           Interrupt Page Address
            We don't bother with this one, because it has no real use on the SMS.
r           Refresh
            This is to do with keeping memory updated, and isn't normally useful.
a', f', b', c', d', e', h', l'
            Alternate registers
            It is possible to swap all of these at once with the regular versions.

Basically, to work with anything stored in ROM or RAM you have to transfer it to a register (or register pair), do your work on that copy, then (if relevant, and to RAM) write the changed value back.
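
As a rough sketch of that load, change, write-back pattern, here is how a byte in RAM could be increased by one. The address $c000 is simply the first byte of RAM, picked for illustration, and inc a is only a stand-in for "do your work":

```asm
    ld a,($c000)    ; copy the byte at RAM address $c000 into a
    inc a           ; work on the copy: add one to it
    ld ($c000),a    ; write the changed value back to RAM
```

The same three steps would not make sense for an address in ROM, because the final write would have nowhere to go.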

An important thing to note is byte ordering. When using two bytes together to signify a 16-bit value (called a word), the low-order byte is stored or written before the high-order byte. For example, $a1b2 is stored as b2 a1. To output $c3d4 (something I'll explain later) I have to output $d4 and then $c3.
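
Here's a small sketch of that byte ordering, assuming WLA DX's .db (bytes) and .dw (words) data directives. The label name is made up for illustration; both lines should put the same two bytes into the ROM:

```asm
ByteOrderExample:
.dw $a1b2       ; one 16-bit word, stored low byte first: b2 a1
.db $b2,$a1     ; the same two bytes written out by hand
```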

This leaves ports. The Z80 can input/output a byte from/to any of 256 ports, which is useful for sending and receiving data from other parts of the system. On the SMS, these are the relevant ports - note that almost all of them are mirrored in other places but it gets confusing if you use the mirrors so I won't go into it.

Port Input Output
$3e Memory control
$3f I/O port control
$dc, $dd Controllers Controllers
$7e V counter
$7f H counter PSG
$be VDP (data)
$bf VDP (control)

The ones I've coloured pink are ones you don't normally have to worry about. Others you don't have to worry about if you don't want to use them - controllers, the PSG and the V Counter especially aren't needed to get a program running.

1.3.2 WLA DX banking setup

Banking is what allows you to access more than 48KB of game ROM when that's all the Z80 is set up to access. I won't go into the details (yet) because you're unlikely to need even 32KB for a homebrew program. But when you do, you'll be glad of this.

You can consult WLA DX's documentation for more information on this but basically, until you need to do banking, just use this at the very start of your source file:

;==============================================================
; WLA-DX banking setup
;==============================================================
.memorymap
defaultslot 0
slotsize $8000
slot 0 $0000
.endme

.rombankmap
bankstotal 1
banksize $8000
banks 1
.endro
You'll notice this is what's at the start of the Hello World.asm file. It basically says we want one 32KB chunk of ROM and where it goes in the memory map.

1.3.3 SDSC tag

;==============================================================
; SDSC tag and SMS rom header
;==============================================================
.sdsctag 1.10,"Hello World!","SMS programming tutorial program (bugfixed)","Maxim"
This is one of the niceties that you ought to add to your homebrew software. Kindly added to WLA DX after being invented on the SMS Power development forum, it allows you to embed useful information about your program - its name, version, date, author and notes - into the resulting ROM image. It's another WLA directive - you write the version, name, notes and author name and it fills in the date for you.

By adding this tag you will also prompt WLA to insert a valid SMS rom header. This is something found in (almost) all SMS roms which acts to verify that the cartridge is correct when played on an original system. It's not absolutely necessary, but you do want your program to work on a real SMS, don't you?

You can add this anywhere in your source. Putting it at the start reminds you that it's there and that you ought to update it, though.

1.3.4 Boot section: comments, interrupts, labels, jumping

.bank 0 slot 0
.org $0000
;==============================================================
; Boot section
;==============================================================
    di              ; disable interrupts
    im 1            ; Interrupt mode 1
    jp main         ; jump to main program
The first bit:
.bank 0 slot 0
is actually to do with banking - we're telling WLA DX that the bit of code we're about to write belongs in bank 0. There is in fact only one bank (bank 0) anyway, but we have to tell WLA DX nevertheless.
.org $0000
Then we tell WLA DX whereabouts in the bank to put the code - at zero. The $ tells WLA DX that the number is hexadecimal - we don't need to write four zeroes, it's just a style thing, to show that it's a 16-bit number.
;==============================================================
; Boot section
;==============================================================
Next, I've put some words telling me what this code is doing. WLA DX doesn't understand English (or French, or Chinese...), only assembler so if it tried to process these comments it would get confused and give an error. So, I tell it that what's written can be ignored by putting a semicolon ;. This tells WLA DX that whatever else is on this line, up until the end of the line, should be ignored.

It is very important to add lots of comments to your code. You might understand it as you write it but next week you'll have forgotten and it might take you hours to figure out what you did. So write it down. Comments are also a useful way to temporarily disable some code. If it's marked as a comment then it won't be compiled.

There are several ways to mark comments in WLA DX:

;                   Comments everything until the end of the line
*                   When it's the very first thing on a line, acts like ;
/*...*/             Everything between the two is commented, even on multiple lines
.endasm ... .asm    These WLA directives stop and start assembly respectively, so they act like /*...*/; however, .endasm ... .asm can be nested and /*...*/ cannot.
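
Here's a short sketch showing each of those comment styles in use. The ld instructions are just filler to comment out, and the behaviour shown is as described in the list above rather than something I've checked against every WLA DX version:

```asm
; A whole-line comment
* Also a comment, because the asterisk is the very first thing on the line
    ld a,$00    ; a comment after an instruction
/*
    ld a,$01    ; this instruction is not assembled
*/
.endasm
    ld a,$02    ; nor is this one
.asm
```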

    di              ; disable interrupts
Now we get to the first actual Z80 instruction. This means Disable Interrupts. Interrupts are things which cause code execution to jump to somewhere else ("interrupting" the program flow) to handle something, and we don't want that to happen until we're ready.
    im 1            ; Interrupt mode 1
Next is im 1. This sets the Interrupt Mode of the Z80. For various reasons, the only one it makes sense to use is mode 1. So that's what we set it to.

Now we've got the most important two things done (to stop our program flow being affected) we can get on with the coding. For technical reasons, the start of the ROM is important and reserved for certain things so our "regular" code should be somewhere else. So we want to "jump" to where that is.

    jp main         ; jump to main program
Now we've come across a label. This is something found in virtually every assembler which allows us to do "jumping" (and more) easily. Basically, when running on the SMS, the program can't say "jump to the main section", it has to tell the Z80 exactly where to jump to - it has to give it a memory address. If we had to keep track of all the memory addresses by hand it'd be a nightmare, so instead we get the assembler to do it for us. We write a label in the form
Labelname:
before something (an instruction, or some data) and then whenever we write Labelname in the source, it will be replaced by the memory address that "something" is located at. main is the label I've given to the main program code block later in the file, so to jump to there, I just say jp main.

1.3.5 Pause button handler

.org $0066
;==============================================================
; Pause button handler
;==============================================================
    ; Do nothing
    retn
Remember how I turned off interrupts just now? Well, there are some interrupts you can't turn off, officially called Non-Maskable Interrupts (NMIs). On the SMS they're not too bad, they're just used for the pause button.

Whenever the pause button is pressed, the code execution will stop whatever it's doing and jump to offset $0066. It will then execute whatever's there until it gets to a retn instruction (return from NMI), when it will go back to what it was doing before it was pressed. If we don't want to do any special handling of the pause button, we should therefore put this command straight away at $0066. The .org $0066 directive tells WLA that this bit has to be at $0066.

1.3.6 Main program start - stack pointer

;==============================================================
; Main program
;==============================================================
main:
    ld sp, $dff0
You may remember that sp is a special register called the Stack Pointer, and that I didn't explain the stack. I'm still not going to because we don't need to use it yet; but suffice to say that the stack takes up some of the available RAM, and we have to tell it which RAM to use. We tell it here to take some memory ending at offset $dff0. The amount it takes varies, but by telling it to be at the end of RAM, we can use memory at the start of RAM and be confident of not overlapping with it.

As a reminder, the SMS has 8KB of RAM, located from hex address $c000 to $dfff, and mirrored at $e000 to $ffff. We don't set sp to $dfff (the very end) for an important reason which I will come to later.

We load the register with a value using the ld <register>,<value> instruction. You can read it as "load stack pointer with $dff0".

1.3.7 Setting up the VDP registers - block transfer

Now we get a technical bit. The VDP (Video Display Processor) is the graphics chip in the SMS. It has a set of registers and some RAM inside it which we control through two ports, $be and $bf. Charles MacDonald's "Sega Master System VDP documentation" is a very good (advanced) document on how it works, but it's a lot to go into for now. Suffice to say that lower down in the program I've put a block of data we can use for setting these registers to suitable initial values:
; VDP initialisation data
VdpData:
.db $04,$80,$84,$81,$ff,$82,$ff,$85,$ff,$86,$ff,$87,$00,$88,$00,$89,$ff,$8a
VdpDataEnd:
.db is a WLA DX directive which instructs it to just put the data you write in the ROM with no modification. There is a group of them actually, depending on what you want to define and how; .db should be followed by a comma-separated list of values which are evaluated to bytes and stored. .dw stores words (with the correct byte ordering). Check the WLA DX documentation for more information on the more advanced ones.

I have to output this data to the VDP. I'm going to do this using one of the Z80's block transfer instructions. These take values stored in certain registers and transfer (copy or output) a block of data according to those values. Here's the code:

    ;==============================================================
    ; Set up VDP registers
    ;==============================================================
    ld hl,VdpData
    ld b,VdpDataEnd-VdpData
    ld c,$bf
    otir
otir means "output the b bytes of data starting at the memory location stored in hl to the port specified in c". That's great - we can figure out all of those, as you can see.
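
To make that a bit more concrete, here is roughly what otir does, written out as an explicit loop using the single-step instruction outi. This is only an illustrative sketch (the label OutputLoop is made up); in real code you would just use otir:

```asm
    ld hl,VdpData               ; where the data starts
    ld b,VdpDataEnd-VdpData     ; how many bytes to output
    ld c,$bf                    ; which port to output to
OutputLoop:
    outi                        ; output the byte at (hl) to port (c), then inc hl and dec b
    jp nz,OutputLoop            ; outi sets the z flag when b reaches zero
```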

There are other block transfer commands, most notably ldir which can be used for copying from one memory location to another.
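
A minimal sketch of ldir, assuming there is some data in ROM between two hypothetical labels CopySource and CopySourceEnd that we want copied to the start of RAM at $c000:

```asm
    ld hl,CopySource                ; where to copy from
    ld de,$c000                     ; where to copy to
    ld bc,CopySourceEnd-CopySource  ; how many bytes to copy
    ldir                            ; copy bc bytes from (hl) to (de)
```

Note that ldir takes a 16-bit count in bc, whereas otir only has the 8-bit count in b.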

1.3.8 Clearing VRAM - VRAM write access, looping, conditional jumps

We don't know what's in the VDP RAM and if we don't clean it up, it will make our screen ugly. (In actual fact it will contain the SEGA logo from the BIOS on a real system.) So let's do that, by setting every byte of it to zero.
    ;==============================================================
    ; Clear VRAM
    ;==============================================================
    ; 1. Set VRAM write address to 0 by outputting $4000 ORed with $0000
    ld a,$00
    out ($bf),a
    ld a,$40
    out ($bf),a
    ; 2. Output 16KB of zeroes
    ld bc, $4000    ; Counter for 16KB of VRAM
    ClearVRAMLoop:
        ld a,$00    ; Value to write
        out ($be),a ; Output to VRAM address, which is auto-incremented after each write
        dec bc
        ld a,b
        or c
        jp nz,ClearVRAMLoop
Wow, look at that! That's quite a piece of code there, quite daunting really. But it's not that bad, honestly - you'll laugh at something like that in no time. Let's see what's there.

First, we have to communicate with the VDP and tell it that we want to write to VRAM. (VRAM is what we'll call the RAM inside the VDP.) To do this we have to tell it the address we want to write to, and tell it we want to write. Because there's 16KB of VRAM, we'll need 14 bits (2^14 = 16384 = 16KB) for the address. The remaining two bits, which make it up to a 16-bit (2-byte) number, are used to signal what our intentions are. To get the final number we can use an OR calculation - the number $4000 only contains the bits required to tell the VDP we want to write to VRAM, and if we OR it with the address we'll get the final number to send to the VDP. In our case, we want to start at address $0000 so the final number is $4000. (Try it on a calculator which supports hexadecimal.)

We have to output this to port $bf, the VDP control port as I told you a while back. I also told you about the byte ordering - we split $4000 and send it in the order $00 $40. And that's what part 1 above is doing.
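
As a worked example of the same calculation with a non-zero address: to write at VRAM address $3800 (picked only for illustration; the program itself stays with $0000 here), $3800 ORed with $4000 is $7800, which is sent low byte first:

```asm
    ; Set VRAM write address to $3800 by outputting $4000 ORed with $3800 = $7800
    ld a,$00        ; low byte of $7800
    out ($bf),a
    ld a,$78        ; high byte of $7800 ($38 ORed with $40)
    out ($bf),a
```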

Why can't we write "out ($bf),$4000"? Because the Z80 doesn't know how to. There are restrictions on how you can handle data. In this case, we can only output one byte at a time, and the data has to come from register a. (There are other possible ways to do it but this is the easiest.)

Now we've set the VDP ready to receive data. We send it data by outputting to port $be, the VDP data port. When it gets it, it will write it to VRAM and then (rather handily) move to the next byte of VRAM, so we can just send it a stream of data bytes and it will write them consecutively to VRAM. So we need to send 16384 zeroes. The way to do this is to start at $4000 (=16384), then output a zero and decrease our counter. Then we'll repeat this until our counter is zero. That's what part 2 is doing.

First, we store $4000 in register pair bc. Then we come across another Z80 instruction we wish we had. We want to go in a loop, decreasing bc by one each time and checking if it's zero - but while there is an instruction to decrease a register pair by one (decrement it), it doesn't have a built-in check if the answer is zero. So we will check it ourselves, using the fast or instruction. This combines the current byte in a with another byte, giving a result with a binary 1 where either of the two inputs had a 1. So it can only give a result of zero if both inputs were zero, and because it sets the z flag, it allows us to do a conditional jump (see below) based on the result.

So, we can output something bc times using a loop like the one shown above. Here's a version with comments describing what's happening more verbosely:

    ; 2. Output 16KB of zeroes
    ld bc, $4000    ; Counter for 16KB of VRAM
    ClearVRAMLoop:
        ld a,$00    ; Value to write
        out ($be),a ; Output to VRAM address, which is auto-incremented after each write
        dec bc      ; decrement counter
        ld a,b      ; get high byte
        or c        ; combine with low byte
        jp nz,ClearVRAMLoop ; loop until the result is zero

Here's where I explain conditional jumps. There is a short list of "conditions" you can attach to a jump (and a few other instructions) that depend on the result of an earlier calculation, which set some bits in the f (flag) register according to its result. You have to check the Z80 reference manual carefully to find which instructions affect which bits of the flag register. In our case, we have "Z is set if the result is zero; reset otherwise" listed under the 8-bit or s instruction. nz evaluates to true if the z bit is not set, and false otherwise - in other words, it's true if the result was not zero. Many instructions do not affect the flags at all so it is possible to do a conditional jump that's determined by an instruction some way before it. Available conditions are:

nz    Not Zero
z     Zero
nc    No Carry (overflow)
c     Carry
po    Parity Odd
pe    Parity Even
p     Positive
m     Minus (negative)
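
For comparison, here is a sketch of a loop that needs no extra check, because the 8-bit dec does set the z flag (unlike the 16-bit dec bc used above). The label and the count of 10 are made up for illustration:

```asm
    ld b,10             ; 8-bit counter
CountLoop:
    ; ... do something here ...
    dec b               ; 8-bit decrement: sets z when b reaches zero
    jp nz,CountLoop     ; jump back while the result was not zero
```

The catch is that an 8-bit counter can only count up to 256 times round the loop, which is why the VRAM clearing code needed a register pair.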

So, this section has set all of VRAM to zero. Now we've got a blank space to start putting our data.

1.3.9 Load palette - CRAM and SMS palette data

Now we want to define our palette. SMS and GG graphics are defined by a 4-bit (16 colour) palette. Each pixel is one of 16 colours in the palette*, and which colours correspond to which palette index is entirely up to us. We can define each of the 16 to be one of 64 possible colours (4096 for the Game Gear, which I'm not going to go into). So that's what we're going to do. We're only actually going to use two colours, black for the background and white for the foreground, so we only need to define those two.

The colours are defined by the low 6 bits of a byte stored in a little bit more RAM in the VDP, called the colour RAM (CRAM). One byte's 8 bits define the colour as follows:

Bit:  7 6      5 4    3 2     1 0
%     Unused   Blue   Green   Red

Notice how I number the bits from 7 at the left (Most Significant Bit) to 0 at the right (Least Significant Bit). If you're not familiar with binary numbers then I suggest you look it up on Google and learn a bit more. Note also that % is used to represent a binary number.

Each colour component therefore has two bits to define it, which means it can range from 0 (%00) to 3 (%11). So, for full intensity red I want R=3, G=0 and B=0, which is %00000011. (I put zeroes in the unused bits.) White is %00111111, yellow is %00001111, and so on. (If you're not familiar with how colours are made up of RGB components, look that up too.) If these binary numbers are written as hexadecimal, they'll range from $00 (black) to $3f (white). If you run one of the various colour test demos (available here) you'll find all 64 colours shown, often with their corresponding numbers, for easy reference!
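
Here are a few more colours worked out the same way, written as palette data. This is just a sketch to check your understanding against; the label is made up and this block is not part of the Hello World program:

```asm
ExamplePalette:
.db %00000000   ; $00 black  (B=0, G=0, R=0)
.db %00000011   ; $03 red    (B=0, G=0, R=3)
.db %00001100   ; $0c green  (B=0, G=3, R=0)
.db %00110000   ; $30 blue   (B=3, G=0, R=0)
.db %00001111   ; $0f yellow (B=0, G=3, R=3)
.db %00111111   ; $3f white  (B=3, G=3, R=3)
```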


Note that different emulators represent colours differently. Here, eSMS produces a white that's quite dark. Also, Bock chose to convert the numbers to decimal, just to confuse matters. You'll need to convert between decimal and hex a lot anyway, so why not practise now? Check that 63 = $3f.

Anyway, I took the values for black and white and stored them in my data section lower down in the source file:

PaletteData:
.db $00,$3f ; Black, White
PaletteDataEnd:
Then I want to write this (small) data block to CRAM, which is done very similarly to VRAM. The difference is, instead of the magic number $4000 which signals a write to VRAM, I use $c000 for a write to CRAM. The address is $0000 for me to write to the first palette index. Then I use otir to output my data, very similarly to what was done previously:
    ;==============================================================
    ; Load palette
    ;==============================================================
    ; 1. Set VRAM write address to CRAM (palette) address 0 (for palette index 0)
    ; by outputting $c000 ORed with $0000
    ld a,$00
    out ($bf),a
    ld a,$c0
    out ($bf),a
    ; 2. Output colour data
    ld hl,PaletteData
    ld b,(PaletteDataEnd-PaletteData)
    ld c,$be
    otir
I could have just written the two bytes manually, but when the project gets more advanced and I want 32 colours and several palettes I'll be glad of this automation.

* There are actually two 16-colour palettes, one for sprites or tiles and one for tiles only. Any one tile can only use one of the palettes and so is still limited to 16 colours. For simplicity I'll ignore the sprite palette.

1.3.10 Loading the font - SMS tile format, pointers

Now we've got our palette going - and, in fact, if we missed out the rest of the code, it would run and we could see the palette loaded in an emulator's palette window. Now we want to define some graphics to go with it.

The VDP works with tiles. Each tile is an 8×8 pixel square and we have about 450 of them available. The screen is then built up by referring each square of the screen to one of the defined tiles. So, we have to define tiles to be able to display anything.

As I said before, each pixel of a tile can be one of the 16 colours in the palette. So, that's 4 bits per pixel × 64 pixels per tile = 256 bits = 32 bytes per tile. The way the data is stored is a bit hard to explain, so let's take an example. Let's take the capital A from our font, and let the background be colour 0 (which we have set as black with our palette) and the foreground colour 1 (white). Then we will take one row of pixels from the top:

As you can see, I've converted each pixel to its binary representation - %0000 for 0 and %0001 for 1. Then I take the LSB (Least Significant Bit, bit 0, the rightmost bit) from each pixel - which is the top row of numbers in the diagram - and get %00111100 = $3c. I repeat this for each row, which represent successively more significant bits, until I have four bytes which describe the top row of the tile: $3c $00 $00 $00. If I repeat this for each line I will end up with my 32 bytes in the format used by the VDP.

For 1 bit graphics (either colour 0 or colour 1) you will notice that the last three bytes for each row are zero, and the first is just the row's bits stuck together. We can use this to our advantage - knowing this, we can just store the image as a 1bpp (bit-per-pixel) image and make sure that after each byte we write, we put three zeroes. This also has the advantage that we can write the image data in a way that makes it still possible to see the image:

 ; Character 0x41 A

.DB %00111100    ; Hex 3Ch
.DB %01100110    ; Hex 66h
.DB %01100110    ; Hex 66h
.DB %01111110    ; Hex 7Eh
.DB %01100110    ; Hex 66h
.DB %01100110    ; Hex 66h
.DB %01100110    ; Hex 66h
.DB %00000000    ; Hex 00h
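
For comparison, here is a sketch of how the first two rows of that A would look if they were stored in the full 4-bit format the VDP uses, following the scheme described above. The last line is a made-up variation showing the top row drawn in colour 3 (%0011) instead of colour 1, so the two lowest bitplanes both carry the pattern:

```asm
; Rows of 'A' in colour 1, in full 4 bits-per-pixel format (4 bytes per row)
.db %00111100,$00,$00,$00
.db %01100110,$00,$00,$00
; Hypothetical: the top row again, but with the foreground in colour 3
.db %00111100,%00111100,$00,$00
```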
The 16KB of VRAM is split into three areas - the tile definitions, where our ~450 unique tiles are stored; the name table, where we define which tile each part of the screen shows; and the sprite table, where we control the sprites. We can arrange these three things any way we want, but apart from a few cases, there is one way that makes the most efficient use of the available space:

$0000-$37ff    Tiles - 448 @ 32 bytes per tile
$3800-$3eff    Screen - 32 x 28 locations @ 2 bytes each
$3f00-$3fff    Sprites - 64 @ 4 bytes each

That's (partly) what my block of VDP initialisation data defined before.

So, as before we set the VDP ready to receive data, this time at address $0000 because that's where the first tile will be:

    ;==============================================================
    ; Load tiles (font)
    ;==============================================================
    ; 1. Set VRAM write address to tile index 0
    ; by outputting $4000 ORed with $0000
    ld a,$00
    out ($bf),a
    ld a,$40
    out ($bf),a
Then we want to loop through as many bytes as the font data takes up. We'll use the same method we used before when we wanted to loop through $4000 bytes to clear the VRAM. The difference is, now each time we want to also progress through the font data, and after each data byte write three zeroes. We'll use hl to "point" to the data - that means we put the memory address in hl, and use that as the address each time, adding one to it each time round the loop. You'll use pointers like this a lot in assembler.

So, here we go:

    ; 2. Output tile data
    ld hl,FontData              ; Location of tile data
    ld bc,FontDataEnd-FontData  ; Counter for number of bytes to write
    WriteTilesLoop:
        ; Output data byte then three zeroes, because our tile data is 1 bit
        ; and must be increased to 4 bit
        ld a,(hl)        ; Get data byte
        out ($be),a
        ld a,$00
        out ($be),a
        out ($be),a
        out ($be),a
        inc hl           ; Add one to hl so it points to the next data byte
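        ; Count down bc. Note: this dec c / dec b pair only counts correctly
        ; when c starts at zero, ie. when the number of bytes is a multiple
        ; of 256.
        dec c
        jp nz,WriteTilesLoop
        dec b
        jp nz,WriteTilesLoop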
You'll notice the instruction ld a,(hl). The brackets mean "what's pointed to by", so in full it's "load a with what's pointed to by hl". Also notice the inc hl instruction - this is INCrement, the opposite to DECrement, and it adds one to the register given. Here, I'm using the 16-bit version with register pair hl but the syntax is the same for the 8-bit version with a single register.
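The counting part of the loop above is only exact when the number of bytes is a multiple of 256, so that c starts at zero. Here is a minimal sketch of an alternative, not taken from the original program, which should work for any non-zero length. It decrements bc as a pair and then checks whether both halves are zero. The label WriteTilesAnyLength is made up for this example. Because dec bc does not change the flags, the extra ld a,b / or c test is needed.

```asm
    ld hl,FontData              ; Location of tile data
    ld bc,FontDataEnd-FontData  ; Number of data bytes (assumed not to be zero)
    WriteTilesAnyLength:        ; Hypothetical label for this sketch
        ld a,(hl)        ; Get data byte
        out ($be),a
        ld a,$00         ; Three zeroes to pad 1 bit data to 4 bit
        out ($be),a
        out ($be),a
        out ($be),a
        inc hl           ; Point to the next data byte
        dec bc           ; 16-bit decrement; leaves the flags alone
        ld a,b
        or c             ; Result is zero only when both b and c are zero
        jp nz,WriteTilesAnyLength
```

It is safe to reuse a for the test because a is reloaded from (hl) at the top of each pass.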

Now, when the emulator gets to this point, we will see the tiles appear in the tile displayer... but there's still nothing on the screen.

1.3.11 Writing to the name table - name table format, ASCII conversion, terminators

We have to write to the name table to tell it which tiles to display where, to show our message. The name table is a list of words describing the screen background over its entirety - which is 32 × 28 locations. Each entry describes which tile that screen location should be filled with (a number from 0 to 447, since we have room for 448 tiles) and, with the extra high bits left over, some extra attributes such as flipping, priority and which palette to use. We'll not use those attribute bits yet though. The first entry describes the top-left position, then entries describe the row to its right before moving down to the next row.

Looking at the tiles we've loaded in the emulator, we can see that the tile for space is stored at index $00, '0' is at $10, 'A' is $21, and so on. Well, just to be very handy, I chose a set of tiles which matches the ordering of letters according to the ASCII standard. However, it does not match ASCII exactly, because ASCII includes 32 control characters at the start which are useless to me. In ASCII, space is $20, '0' is $30, 'A' is $41, and so on - each one exactly $20 more than it is in my tiles. To convert from the ASCII code to the tile index I have to subtract $20.

Why is this important? Because if I want to store "Hello world!" I need to know which tile index each letter corresponds to. I could do it by hand:

Message:
.db $28,$45,$4c,$4c,$4f,$00,$57,$4f,$52,$4c,$44,$01
But that's really hard to read; what if I make a mistake; and what if I want to change it later? Well, WLA DX is clever and allows me to enter ASCII directly, like this:
Message:
.db "Hello world!"
It will then convert the text to ASCII and store that as data in the ROM file. Then the program has to convert from ASCII to tile indices (by subtracting $20) for each letter and now it's really easy to check it's right and change it if wanted.

There's one more thing. In a typical program you might want to write more than one thing - perhaps "Welcome to Hello World XP" at the start, then "Press any button to start", then of course the lengthy credits sequence. It's a pain to have to keep track of not only the location of each text string, but also its length, so we'll borrow something from the world of PCs (and no doubt a gazillion other computers) - we'll make it so there's a terminator byte. A terminator byte can't be bargained with. It can't be reasoned with. It doesn't feel pity, or remorse, or fear. And it absolutely will not stop, ever, until you are dead. Erm... no, actually it's a byte included at the end of a stream of data which cannot possibly be valid data, and signals that the data is finished. In our case, we'll use $00 because that's what is very often used on PCs, especially in the C language, giving a "null-terminated string":

Message:
.db "Hello world!",0
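So, here's our outline of code:

  1. Set the VRAM write address to the start of the name table
  2. Read a byte of the message
  3. If it is zero, we have reached the terminator, so stop
  4. Convert it from ASCII to a tile index
  5. Write to name table
  6. Move on to the next byte and go back to step 2

And here's the source to do it: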

    ;==============================================================
    ; Write text to name table
    ;==============================================================
    ; 1. Set VRAM write address to name table index 0
    ; by outputting $4000 ORed with $3800+0
    ld a,$00
    out ($bf),a
    ld a,$38|$40
    out ($bf),a
    ; 2. Loop through text data, converting from ASCII and writing tile indices
    ; to name table. Stop when a zero byte is found.
    ld hl,Message
    WriteTextLoop:
        ld a,(hl)    ; Get data byte
        cp $00       ; Is it zero?
        jp z,EndWriteTextLoop    ; If so, exit loop
        sub $20      ; In ASCII, space=$20 and the rest follow on.
                     ; We want space=$00 so we subtract $20.
        out ($be),a
        ld a,$00
        out ($be),a
        inc hl       ; Point to next letter
        jp WriteTextLoop

    EndWriteTextLoop:
First of all, we have to set the VRAM write address. As mentioned before, the name table is stored at $3800 in VRAM, and we want to write to the start of it, so we have to OR $3800 with $4000 (to tell it we want to write to that address) and output it as before. I've decided to let WLA DX do the ORing for me, just in case I mess it up, by writing "$38|$40" for the high byte of the (word) address. "|" means "OR" and WLA DX will calculate the answer while compiling the code, so the effect is the same as if I'd written $78, but I think it's clearer this way.
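The same two writes can point at any other screen position. As a sketch, assume we wanted to start the text at column 3 of row 5, counting from zero (these numbers are just an example). Each row holds 32 entries of 2 bytes, so the offset into the name table is (5 × 32 + 3) × 2 = 326 = $0146 bytes, and the address is $3800 + $0146 = $3946:

```asm
    ; Set VRAM write address to row 5, column 3 of the name table
    ; (5*32 + 3) * 2 = $0146, and $3800 + $0146 = $3946
    ld a,$46
    out ($bf),a
    ld a,$39|$40
    out ($bf),a
```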

Then I've set hl to point to the address of my message (which I labelled with "Message:"); then, similarly to before, I load what it points to.

cp means compare. It compares a to the value given and sets the flag register accordingly, without modifying a. (Internally, it is doing a subtraction.) The relevant flags will then be:

Flag  State  Meaning    Condition to use
z     set    a=value    z
z     reset  a<>value   nz
c     set    a<value    c
c     reset  a>=value   nc

This is a simplified interpretation of the flags (it treats both values as unsigned numbers), but suffices in most cases.

So, if a=$00 the code will jump to the EndWriteTextLoop: label (ie. the end); otherwise, it gets to the sub instruction. You might guess that means subtract - it subtracts the value given from a.

Notice that for the "Write to name table" step I have to convert it from a byte to a word, but that's easy because all I have to do is set the high byte of the word to zero. Again, I must swap them around when outputting, so the low byte goes first. hl is incremented and the process repeats.

What would happen if I forgot to put the terminating zero byte by adding ",0" at the end of the .db line? The code would keep on processing whatever followed the text. By chance, it's the palette data and that happens to start with a zero so there is no difference - but in another case you might well end up with junk data there.

1.3.12 Turning on the screen - VDP register ($8)1

Now you'd think we'd finished - but we haven't. The whole time, we have had the screen turned off. Why? Because when it's turned on, we have to be careful about how we access VRAM, to avoid graphical corruption when run on a real system (emulators generally don't get this corruption, though). By turning it off, we can access VRAM any way we like and it will cause no problems.

The turning on and off of the screen is done through one of the VDP registers which you noticed me gloss over before. The VDP has several registers which control certain aspects of its operation; some are related to legacy (SG-1000 type) video modes, some to the normal (SMS type) video mode and some to both. The turning on and off of the screen is done with register 1.

We access VDP registers by writing words to port $bf again; but this time, the data format is a bit different. The magical word signifying that it's a register write is $8000. The register we want to write to is given by the high byte of the word; so, for register 1 we have $0100. Finally, the data to be written to that register is given by the low byte of the word - we don't output data to port $be at all!

So, in practice, we will actually have to do something like this:

ld a,$xy
out ($bf),a
ld a,$8z
out ($bf),a
where $xy is the data and z is the register number. So, in some ways, it is easier (if inaccurate) to think that you're putting $xy in register $8z, hence me calling it "register ($8)1" at the top of this section.

Anyway, let's see what register $81 controls.

Bit  Function           If set                                                     If reset
7    VRAM size          VRAM is 16KB                                               VRAM is 8KB
6    Enable display     Display on                                                 Display off
5    VBlank interrupts  Interrupt generated on VBlank                              VBlank gives no interrupts
4    28 row display     Screen shows 28 rows (eg. Codemasters games)               Screen is normal size
3    30 row display     Screen shows 30 rows                                       Screen is normal size
2    Unused             No effect                                                  No effect
1    Doubled sprites    Each sprite defined will also show the next tile under it  Normal sprites
0    Zoomed sprites     Sprites are stretched to 16x16 pixels                      Normal sprites

Charles MacDonald's "Sega Master System VDP documentation" describes all of the registers in very good detail. Anyway, there is far too much information here for us to remember every time, so it makes sense for us to add comments to make it clear what we're doing. I think you'll agree that we want to set bits 7 and 6 (bit 7 should always be set; and we want the screen on now) and the rest aren't much use to us (the bigger screen displays look appealing but introduce more difficulties). So:

    ; Turn screen on
    ld a,%11000100
;          |||| |`- Zoomed sprites -> 16x16 pixels
;          |||| `-- Doubled sprites -> 2 tiles per sprite, 8x16
;          |||`---- 30 row/240 line mode
;          ||`----- 28 row/224 line mode
;          |`------ VBlank interrupts
;          `------- Enable display
    out ($bf),a
    ld a,$81
    out ($bf),a
You see how I've written the data byte in binary form, and labelled it? (Bit 2 is also set in this value, but as the table shows it is unused, so it makes no difference.) I suggest you do that every time you access such a register (others aren't split into many parts so you don't need to).
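Turning the screen off again works the same way. A sketch, assuming the other bits should stay as they were in the value above, simply clears bit 6:

```asm
    ; Turn screen off
    ld a,%10000100
;          `------- Enable display (now 0)
    out ($bf),a
    ld a,$81
    out ($bf),a
```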

1.3.13 Time to stop - infinite loops

OK, now our code is almost finished. We've done everything we wanted to do; but there's one more thing we have to do. The Z80 will execute all the code we've written, and then when it gets to the end it will keep on going and going forever, never stopping. We don't want that, we want it to stop; so what we'll do is put it in an infinite loop. Normally, infinite loops are a bad thing because they stop your program ever continuing; you'll probably create a few by accident and have to figure out why they're happening and fix the bug causing them. But here, we want one. We want the processor to keep doing the same thing (nothing) over and over again forever, which we will achieve by making a jump point to itself:
    ; Infinite loop to stop program
    Loop:
         jp Loop
When the Z80 gets to the instruction jp Loop it will jump to the label Loop: immediately. When it gets there, it will find the instruction jp Loop and will jump to the label Loop: immediately. When it gets there, it will find the instruction jp Loop and will jump to the label Loop: immediately... and so on forever.

Now we've added that infinite loop, we know the program will never get past it; so here is a safe place to put data. Why does it matter where you put data? Because you have to make sure that the data is never accidentally interpreted as code. The Z80 can't tell if what it's looking at is sensible program code or data, it assumes everything is program code. So you have to make sure that the place you insert data is outside the program code and that execution will never accidentally get to your data. For a simple program like this one (with no "functions", just one code block) we put it after the program. We could equally have put it before the "main:" label, and at the start of the program execution would have jumped straight past it to that label.

I can put the data in any order I like because it doesn't matter - it's not necessary to put it in the order it's used. In larger projects you may choose to order the data logically to make it easier to navigate, and maybe split the data up according to what it's for.

Anyway, you may have noticed that the program is now finished. Press F11 to run it again.

1.4 Enhancing our program

1.4.1 Linking to external files

You might have noticed that nearly 90% of the source to "Hello World!" is taken up by the font; and that font is already defined for us by Mike G, so we don't need to edit it. So why not put it in its own file, and just tell WLA DX that it should insert that file at that point?
FontData:
.include "BBC Micro font.inc"
FontDataEnd:
Copy everything between the FontData: and FontDataEnd: labels to a new file and save it as "BBC Micro font.inc" in the same folder as the "Hello World.asm" source file; then delete everything you copied and replace it with the directive shown above. The file will then be "virtually" inserted at that point while compiling, exactly like C's #include preprocessor command. So whatever's in the linked file must be correct code.

There are a few more useful commands for including data from external files:

.include "name.inc"   Include source file "name.inc"
.incbin "data.bin"    Include binary file "data.bin", as if .includeing a version converted to source .db directives
.incdir "c:\path\"    Change the directory where .included and .incbined files are assumed to be
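For example, if the font were saved as raw bytes rather than as .db lines, a sketch of pulling it in might look like this. The file name "font.bin" is made up for the illustration, and the file would need to hold the same 1bpp data, one byte per row of each character.

```asm
FontData:
.incbin "font.bin"    ; Hypothetical binary file of 1bpp font data
FontDataEnd:
```

The labels stay in the source file, so FontDataEnd-FontData still gives the number of bytes.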

1.4.2 Functions - call, ret, push, pop: the stack finally explained

You might have noticed that even in such a simple program as "Hello World!", we found ourselves doing the same thing over and over again:
    ; 1. Set VRAM write address to 0 by outputting $4000 ORed with $0000
    ld a,$00
    out ($bf),a
    ld a,$40
    out ($bf),a
Wouldn't it be easier if we could write some code to do that more easily? Something like
    ld VRAM write address,$3800   ; Not a real instruction :(
Well, we can't do it quite like that, but we can do something almost the same. But first we need to learn about the stack.

The usual description is that it's like a stack of playing cards, with the magical limitation that we can only take the top card from the stack, or put another one on there. The important thing is, the cards come off in the reverse order they're put on, so it's important not to get them mixed up.

For the Z80, the stack is a section of memory containing words, not bytes. We can push a register pair onto the stack and the Z80 will store it in that section of memory. We can then pop it back into any register pair, although it usually only makes sense to pop it into the one you took it from. It allows you to do something like this:

ld hl,$1234
push hl
     ld hl,$5678
     ; Do something with hl (which contains $5678)
     ; ...
pop hl
; hl now contains $1234 again
...in effect, "saving" the contents of that register so you can do something else with it, then restoring it to its previous state.

The other main use for the stack is for functions. There is a Z80 instruction "call" which is exactly like jp, in that it makes execution jump to a certain point instead of continuing on linearly; except that first, it pushes the pc register pair, which by now contains the address of the next instruction after the call, onto the stack. Then, some time after jumping to the given address, if it encounters a ret instruction it will pop the stored pc address and start executing code from there, in effect returning to the point it was at before.

Somewhere in the program, usually not in the normal flow of the program:
MyFunction:
    inc a    ; Do something
    ret      ; return
In the normal flow:
    ld a,$00
    call MyFunction
    ; a now contains $01
    call MyFunction
    ; a now contains $02
Again, remember you have to be careful with the order you push/pop, especially when mixed in with calls and returns. This:
call MyFunction
MyFunction:
    ld hl,$1234
    push hl
    ret        ; Error!
will not work, because the ret will take the last thing pushed, which is $1234, and execution will continue at $1234! Except in very few circumstances, that's not something you'll want to do, because $1234 might be some data, or some completely unrelated code, or even halfway through an instruction!
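A sketch of the safe version pops whatever it pushed before returning, so the return address is back on top of the stack when ret runs:

```asm
MyFunction:
    ld hl,$1234
    push hl
        ; ... do something that needs hl for another purpose ...
    pop hl     ; Balance the push; the return address is on top again
    ret        ; Returns to the instruction after the call
```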

Anyway, let's get on with our useful function, now we know what to be careful about. Let's make it possible to specify an address in register pair hl which will then be ORed with $4000 and output to port $bf, thereby setting the VRAM address and making it ready to write:

VRAMToHL:
    ld a,l
    out ($bf),a
    ld a,h
    or $40
    out ($bf),a
    ret
This takes the value in hl, outputs the low byte (stored in l), ORs the high byte (h) with $40, then outputs that. We have to transfer the data from h or l into a to be able to output it, and incidentally also to be able to OR it. We don't bother ORing the low byte with $00 - if you don't know why, try it out on a calculator with different values for low byte.

We can now do something like this to set the VRAM write address to $0000:

    ld hl,$0000
    call VRAMToHL
Isn't that easier? But there's a problem. What will be the value in a after this code runs?
    ld a,$27
    ld hl,$0000
    call VRAMToHL
You'd want it to still be $27; but actually it will be $40, because in the VRAMToHL function we used a, overwriting what was in there before. This is a case where we want to "save" the register using push and pop. Register a gets paired with the flag register f to form pair af. Here's our amended function:
VRAMToHL:
    push af
        ld a,l
        out ($bf),a
        ld a,h
        or $40
        out ($bf),a
    pop af
    ret
Notice how I use indentation to make it clear which code block I'm saving af for the duration of. In general, you should push every register you will change the contents of at the start of the function, and pop them all in reverse order at the end; unless you intend to "return a value" in one of the registers, in which case you should not save that one. Pushing and popping has other uses, particularly in more advanced code when you run out of registers you can use.
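The same pattern extends to more than one register. As a sketch, here is the text-writing loop from earlier wrapped up as a function that saves both af and hl. The names WriteText, WriteTextNext and WriteTextDone are made up for this example, and it assumes hl points to zero-terminated ASCII text and that the VRAM write address has already been set.

```asm
WriteText:                  ; Hypothetical function, not part of the original program
    push af
    push hl
        WriteTextNext:
            ld a,(hl)       ; Get data byte
            cp $00          ; Is it the terminator?
            jp z,WriteTextDone
            sub $20         ; Convert from ASCII to tile index
            out ($be),a     ; Low byte of name table entry
            ld a,$00
            out ($be),a     ; High byte of name table entry
            inc hl          ; Point to next letter
            jp WriteTextNext
        WriteTextDone:
    pop hl                  ; Pop in the reverse order to the pushes
    pop af
    ret
```

It could then be used together with VRAMToHL like this:

```asm
    ld hl,$3800
    call VRAMToHL
    ld hl,Message
    call WriteText
```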

1.5 Exercises

  1. Modify "Hello World!" so it displays more text - try to fill the whole screen.
  2. Change one of the times where the VRAM write address is set to use the new function, and check if it still works by compiling (F9) and running (F11). When it does, change all of them in the same way.
  3. Make it so the text displayed is stored in an external file. Does this external file need to contain any .db directives?
  4. (Compulsory exercise) Send me an email here with any questions or problems you've come across, and any suggestions for the next instalment.