View topic - devkitSMS - develop your homebrew in C

  • Site Admin
  • Joined: 19 Oct 1999
  • Posts: 11824
  • Location: London
Post Posted: Wed Oct 14, 2015 10:04 am
I found it super helpful to define macros for colours from HTML style hex in asm, the C equivalent would be RGB(0xff0000) for red. Of course you don't have 24 bits available, but otherwise it looks more like red than 0x03. You could also switch the behaviour for a GG target.
  • Joined: 05 Sep 2013
  • Posts: 1914
Post Posted: Wed Oct 14, 2015 10:59 am
I could add a few macros to build colors, as in
RGB(r,g,b)  /* 0 to 3 for each color */
RGB8(r,g,b)
RGBHTML(html_style_color)


Quote
You could also switch the behaviour for a GG target.

Ehm, I don't get that :|
  • Site Admin
  • Joined: 19 Oct 1999
  • Posts: 11824
  • Location: London
Post Posted: Wed Oct 14, 2015 11:31 am
A colour is one byte with two bits per channel for SMS, and two bytes with four bits per channel for GG. Assuming you have a macro to select the target system, you can have the same macro do both ways accordingly.
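As a rough sketch of that idea (not the macros devkitSMS actually ships): the names DEMO_RGBHTML_SMS, DEMO_RGBHTML_GG, DEMO_RGBHTML and the DEMO_TARGET_GG switch below are made up for illustration. It assumes the usual hardware layouts, 00BBGGRR for SMS and 0000BBBBGGGGRRRR for GG, and simply keeps the top bits of each 8-bit HTML channel.

```c
#include <stdint.h>

/* Hypothetical names for illustration only; these are not the macros
   that devkitSMS ships. Input is a 24-bit 0xRRGGBB value. */

/* SMS: one byte, 2 bits per channel, laid out as 00BBGGRR. */
#define DEMO_RGBHTML_SMS(c) ((uint8_t)( \
      ((((uint32_t)(c)) >> 22) & 0x03) \
    | ((((uint32_t)(c)) >> 12) & 0x0C) \
    | ((((uint32_t)(c)) >>  2) & 0x30)))

/* GG: two bytes, 4 bits per channel, laid out as 0000BBBBGGGGRRRR. */
#define DEMO_RGBHTML_GG(c) ((uint16_t)( \
      ((((uint32_t)(c)) >> 20) & 0x00F) \
    | ((((uint32_t)(c)) >>  8) & 0x0F0) \
    | ((((uint32_t)(c)) <<  4) & 0xF00)))

/* DEMO_TARGET_GG is an assumed build-time switch, e.g. -DDEMO_TARGET_GG. */
#ifdef DEMO_TARGET_GG
#define DEMO_RGBHTML(c) DEMO_RGBHTML_GG(c)
#else
#define DEMO_RGBHTML(c) DEMO_RGBHTML_SMS(c)
#endif
```

With this, DEMO_RGBHTML(0xff0000) gives 0x03 on an SMS build, matching the "red" value mentioned earlier, and the same source line would give 0x00F on a GG build.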
  • Joined: 05 Sep 2013
  • Posts: 1914
Post Posted: Wed Oct 14, 2015 11:38 am
Update rolled. Macros added:
RGB(r,g,b)          /* values 0-3 for SMS and 0-15 for GG */
RGB8(r,g,b)         /* values 0-255 for both SMS and GG */ 
RGBHTML(RGB24bit)   /* 24 bit HTML-like RGB values */


Maxim: yes, sorry I didn't get it at first :)
  • Joined: 07 Oct 2015
  • Posts: 105
Post Posted: Thu Oct 15, 2015 6:15 am
Great additions, thanks.

I have yet another suggestion, this time with an idea about the implementation.

I have coded several games for the NES using Shiru's excellent Neslib. This library adds support for metasprites, so you can group a bunch of sprites that together make up an entity.

The solution is based on a very simple approach: you define an array with positioning and tile data for each sprite, then call a function with a master x, y and that array. The function creates a sprite for each entry in the array (until it finds a value of 127, which is the end marker), simply adding the offsets found in the array to the x, y passed to the function.

A very simple, carefree implementation which doesn't support clipping would be this:

/*
   metasprite format:

   signed char metasprite [] = {
      xoffset1, yoffset1, tile1,
      xoffset2, yoffset2, tile2,
      ...
      127
   };
*/

signed char meta_x;
_Bool SMS_addMetaSprite (unsigned char x, unsigned char y, signed char *metasprite) {
   while ((meta_x = *metasprite ++) != 0x7f) {
      if (SpriteNextFree < MAXSPRITES) {
         SpriteTableY [SpriteNextFree] = y + *metasprite ++;
         SpriteTableXN [SpriteNextFree << 1] = x + meta_x;
         SpriteTableXN [(SpriteNextFree << 1) + 1] = *metasprite ++;
         SpriteNextFree ++;
      } else return (false);
   }
   return (true);
}
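
For anyone who wants to try the idea off-target, here is a minimal host-side sketch of the same loop. It is not SMSlib code: the demo_ tables and names are stand-ins for the library's internal sprite tables, invented here so the snippet builds on its own with any C compiler.

```c
#include <stdbool.h>

/* Stand-ins for SMSlib's internal sprite tables (hypothetical names),
   so the sketch builds and runs without the library. */
#define DEMO_MAXSPRITES 64
#define DEMO_META_END   127   /* end marker, so an x offset of 127 cannot be used */

static unsigned char demo_sprite_y[DEMO_MAXSPRITES];
static unsigned char demo_sprite_xn[DEMO_MAXSPRITES * 2];  /* x, tile pairs */
static unsigned char demo_sprite_next_free = 0;

/* Adds one hardware sprite per (xoffset, yoffset, tile) triplet.
   Returns false as soon as the sprite table is full. No clipping. */
bool demo_add_metasprite(unsigned char x, unsigned char y, const signed char *meta) {
    signed char dx;
    while ((dx = *meta++) != DEMO_META_END) {
        signed char dy = *meta++;
        unsigned char tile = (unsigned char)*meta++;
        if (demo_sprite_next_free >= DEMO_MAXSPRITES)
            return false;
        demo_sprite_y[demo_sprite_next_free] = (unsigned char)(y + dy);
        demo_sprite_xn[demo_sprite_next_free * 2] = (unsigned char)(x + dx);
        demo_sprite_xn[demo_sprite_next_free * 2 + 1] = tile;
        demo_sprite_next_free++;
    }
    return true;
}
```

Reading all three values of a triplet before touching the tables keeps the pointer in step even when the table turns out to be full.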


(Sorry for my coding style, I come from cc65 and z88dk, two compilers which behave much better with globals - as in speed and code size)

This simple approach has built-in support for placing the x, y origin of a big sprite at an arbitrary point. For example, imagine you need a 16x16 metasprite made of 4 sprites which you want to handle as if it were an 8x8 virtual sprite placed at the bottom center of the 16x16 square (see attached file). You could create this metasprite:

signed char metasprite [] = {
   -4, -8, 0,
   4, -8, 1,
   -4, 0, 2,
   4, 0, 3,
   127
};


This way the metasprite's origin is at (4, 8) inside the metasprite.

What do you think? (by the way, I haven't tested the code, so it may not work - but you get the idea!).
8x8.png (795 B)
  • Joined: 05 Sep 2013
  • Posts: 1914
Post Posted: Thu Oct 15, 2015 8:25 am
Thanks for the suggestion.
However, there are a few reasons why I wouldn't add such code to the library, the first being that there are possibly infinite approaches to meta objects and I don't want to force anyone to use a specific one.
Of course you/we are free to create additional libraries that build additional 'smart' functionality on top of SMSlib and share them with the SMS homebrew community, and we could even make that part of the devkitSMS/SMSlib package, as I'm doing with the C incarnation of PSGlib, for example.
So feel free to contribute your meta objects handling library, I'll gladly add it to the repository :)
  • Joined: 07 Oct 2015
  • Posts: 105
Post Posted: Thu Oct 15, 2015 10:54 am
That's cool. I'll make this into a lib as soon as I find the time :)
  • Joined: 19 Oct 2012
  • Posts: 20
Post Posted: Thu Oct 15, 2015 11:38 am
I like the metasprite implementation, but what would be the best way to add animations to that data model?
  • Joined: 05 Sep 2013
  • Posts: 1914
Post Posted: Thu Oct 15, 2015 12:24 pm
An array of metasprite indexes? It would also be nice to handle animation timing though...
  • Joined: 07 Oct 2015
  • Posts: 105
Post Posted: Fri Oct 16, 2015 8:26 am
I usually have an array of metasprite definition pointers arranged so they fit my needs. I tend to keep things simple, with little overhead.

For example, let's imagine you have 3 cells for a walking animation in each direction: standing, left foot forward, right foot forward, so the animation is (lame) as in "standing, left foot forward, standing, right foot forward...".

I define a metasprite for each cell (as in meta_walk_D_n, where D is the direction, L or R, and n is the cell number, 1 to 3), then have a master array with those metasprites arranged to fit my needs:

signed char *metasprites [] = {
    meta_walk_L_1, meta_walk_L_2, meta_walk_L_1, meta_walk_L_3,
    meta_walk_R_1, meta_walk_R_2, meta_walk_R_1, meta_walk_R_3
};


Then I have a facing variable which equals 0 or 4 depending on the direction (0 for Left, 4 for Right) and a frame variable which holds the current animation frame (0 to 3). So each step:

frame = (frame + 1) & 3;  // cycle 0 1 2 3 0 1 2 3 ...
SMS_addMetaSprite (x, y, metasprites [facing + frame]);


But as I said, there are multiple approaches.

If you need more control, such as timing, you can create a more complex structure, for example storing the # of game frames the current animation frame has to stay before being switched to the next, and stuff like that. But I have never ever needed such a complex solution.
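
A minimal sketch of what such a timed structure could look like, assuming one tick per game frame. All the demo_ names and the struct layout are hypothetical, just one of many possible ways to do it:

```c
/* Hypothetical layout: one animation step is a cell index plus the number
   of game frames (vblanks) that cell stays on screen. */
typedef struct {
    unsigned char cell;      /* index into the metasprite pointer array */
    unsigned char duration;  /* game frames to hold this cell, must be >= 1 */
} demo_anim_step;

typedef struct {
    const demo_anim_step *steps;
    unsigned char step_count;
    unsigned char current;   /* index of the step being shown */
    unsigned char elapsed;   /* game frames already spent on that step */
} demo_anim;

/* Call once per game frame; returns the cell to draw this frame and
   advances to the next step once the current one has run its duration. */
unsigned char demo_anim_tick(demo_anim *a) {
    unsigned char cell = a->steps[a->current].cell;
    if (++a->elapsed >= a->steps[a->current].duration) {
        a->elapsed = 0;
        if (++a->current >= a->step_count)
            a->current = 0;
    }
    return cell;
}
```

The returned cell would then be used as the index into the metasprite pointer array, in place of the plain frame counter.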

For example, I time my walking frames using the sprite coordinates, as in "advance frame each tile boundary". That works great in a walking animation as the sprite will animate slower at first then faster as its velocity increases. Measuring how many pixels each frame should last depending on how the frames are drawn and how many frames exist gives you great results with zero hassle - and that shows how I'm more a graphic artist than a programmer ^_^u
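
The position-driven timing described above can be sketched as follows, assuming 8-pixel tiles, a 4-step walk cycle and the 8-entry table layout from earlier in this post (4 left-facing entries, then 4 right-facing). The function names are made up for the example:

```c
/* Hypothetical helper: pick the walk cycle step from the sprite's x position,
   advancing one step every 8 pixels (one tile) and wrapping after 4 steps. */
unsigned char demo_walk_frame(unsigned char x) {
    return (unsigned char)((x >> 3) & 3);
}

/* Index into an 8-entry table laid out as 4 left-facing entries
   followed by 4 right-facing entries. */
unsigned char demo_walk_index(unsigned char x, unsigned char facing_right) {
    return (unsigned char)((facing_right ? 4 : 0) + demo_walk_frame(x));
}
```

Because the step depends only on position, the animation speeds up and slows down with the sprite and needs no timer at all.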
  • Joined: 05 Sep 2013
  • Posts: 1914
Post Posted: Mon Oct 19, 2015 10:57 am
Back on topic now ;)

I just rolled a small update. I wasn't very fond of forcing programmers to specify release (compilation) date in SDSC header thru the macro
SMS_EMBED_SDSC_HEADER(verMaj,verMin,dateYear,dateMonth,dateDay,author,name,descr);

so I added a new macro
SMS_EMBED_SDSC_HEADER_AUTO_DATE(verMaj,verMin,author,name,descr);

where you specify version and information but no release (compilation) date. Doing so, the compiler fills the date with zeroes so that the (updated) ihx2sms tool knows that it has to fill the blank with the current date.
Also, region code in SEGA header has been fixed for GameGear too ($7C).
  • Joined: 05 Sep 2013
  • Posts: 1914
Post Posted: Thu Oct 29, 2015 4:19 pm
Well, na_th_an, why didn't you tell us? =D
  • Joined: 07 Oct 2015
  • Posts: 105
Post Posted: Thu Oct 29, 2015 4:55 pm
In fact I was going to, but you beat me to it :D

It's not a real game, just testing your lib and writing some command line converters (which I will share soon)
  • Joined: 01 Aug 2012
  • Posts: 246
  • Location: Porto, Portugal
Post Posted: Thu Oct 29, 2015 10:49 pm
nice to see you here na_th_an!!!

btw, sorry about my “ignorance” about C and devkitSMS, but about this issue of 8x16 sprites, i guess that we can have some inline assembly that could set the vdp registry bit for that directly, can’t we? it might look like a dirty patchwork in the code (like what i always do on Boriel’s ZX-Basic Compiler! :D ), but it’s better than nothing! :D
  • Joined: 01 Aug 2012
  • Posts: 246
  • Location: Porto, Portugal
Post Posted: Fri Oct 30, 2015 12:24 am
this is what i have on ZX-Basic Compiler for having 8x16 sprites (like what i used at http://www.boriel.com/wiki/en/index.php/ZX_BASIC:Released_Programs_-_SegaE - these System-E sources are not that different from what i used on SMS )

sub smsvdp(tad as ubyte, tvl as ubyte):
  asm
    ld a,(ix+7)
    out ($bf),a
    ld a,(ix+5)
    or $80
    out ($bf),a
    end asm
  end sub

'- mode 4,2
smsvdp(0,%00000100):smsvdp(1,%11100010) '- smsvdp(1,$84)
smsvdp(2,$ff):smsvdp(5,$ff):smsvdp(10,$ff)
smsvdp(6,$fb) '- sprite patterns - $fb for $0000 (256 sprites available), $ff for $2000 (192 sprites available)
smsvdp(7,$00) '- border colour (sprite palette)


it might be similar somehow on C and devkitSMS (there might be some vdp or i/o commands available somehow, i guess)
  • Joined: 05 Sep 2013
  • Posts: 1914
Post Posted: Fri Oct 30, 2015 9:09 am
na_th_an wrote
In fact I was going to, but you beat me to it :D


Ahem... sorry, I didn't realize you just released the game the very same day. Sorry for spoiling the news :|

nitrofurano: SMSlib supports 8x8 and 8x16 sprites, even zoomed.
void SMS_setSpriteMode (unsigned char mode)

You might have missed this post.
  • Joined: 01 Aug 2012
  • Posts: 246
  • Location: Porto, Portugal
Post Posted: Fri Oct 30, 2015 1:22 pm
thanks and sorry, sverx, i was falling asleep when i was reading it! :D
  • Joined: 05 Sep 2013
  • Posts: 1914
Post Posted: Fri Oct 30, 2015 1:40 pm
Last edited by sverx on Sat Oct 31, 2015 3:57 pm; edited 1 time in total
na_th_an wrote
[...] command line converters (which I will share soon)


I just had a look at the source package to see the tools you used, and I suspect you overlooked Maxim's Bmp2Tile (here's the github repository).
Using it you can convert png/bmp/pcx/gif images to tiles (h/v flip reduced, 8x16), tilemaps (with tile offset, priority and palette bits) and palettes, and it supports compression of tiles and tilemaps. It can be run from the command line too.
It can output binary files that you save into some folder, say 'assets', and then, using folder2c, you can generate assets.c and assets.h to use in your project.

I hope this is a useful suggestion :)

nitrofurano: don't worry pal! :)
  • Site Admin
  • Joined: 19 Oct 1999
  • Posts: 11824
  • Location: London
Post Posted: Fri Oct 30, 2015 3:38 pm
How silly not to update my own website... It's there now.

Bmp2tile is no good if you aren't on Windows.
  • Joined: 05 Sep 2013
  • Posts: 5
Post Posted: Mon Nov 02, 2015 3:17 pm
Hi

Using devkitSMS is it possible to somehow flip sprite tiles or would I need to store the flipped versions in ROM?

Thanks.
  • Site Admin
  • Joined: 19 Oct 1999
  • Posts: 11824
  • Location: London
Post Posted: Mon Nov 02, 2015 5:04 pm
It's not a hardware feature so you would need to implement it yourself.
  • Joined: 05 Sep 2013
  • Posts: 5
Post Posted: Mon Nov 02, 2015 5:08 pm
Maxim wrote
It's not a hardware feature so you would need to implement it yourself.


I am thinking of doing it by just uploading the ROM data in a different order (right to left, so to speak, in the case of a horizontal flip). Is this the usual way of flipping sprite tiles? Does devkitSMS allow uploading data to the VDP this way, or would I need to write some asm function?

Thanks.
  • Site Admin
  • Joined: 19 Oct 1999
  • Posts: 11824
  • Location: London
Post Posted: Mon Nov 02, 2015 5:38 pm
It depends, maybe you reduced your bitplane count to improve efficiency so a function that did four bitplanes would waste time. Reversing the bits in a byte in software (horizontal flip) is a bit tricky, and there isn't one best way to do it. Maybe you would choose to store the data flipped for speed. The tradeoffs are enough to make it hard to offer a function.
  • Joined: 17 Sep 2013
  • Posts: 113
  • Location: Gravataí, RS, Brazil
Post Posted: Mon Nov 02, 2015 6:25 pm
Maxim wrote
It depends, maybe you reduced your bitplane count to improve efficiency so a function that did four bitplanes would waste time. Reversing the bits in a byte in software (horizontal flip) is a bit tricky, and there isn't one best way to do it. Maybe you would choose to store the data flipped for speed. The tradeoffs are enough to make it hard to offer a function.


Using a reverse-bits lookup table is probably the fastest way to flip the bits horizontally, and it's only 256 bytes in ROM.
  • Joined: 05 Sep 2013
  • Posts: 5
Post Posted: Mon Nov 02, 2015 7:07 pm
Thanks for the answers guys. Reading them I noticed I need more background of how SMS works, as I don't understand why I would need a table. My idea was that, to flip a tile, instead of copying the pixel data from n to n + size it would be enough copying it from n + size to n (a decreasing index loop, in other words). It seems I lack some fundamental knowledge ^_^U
  • Joined: 05 Sep 2013
  • Posts: 1914
Post Posted: Mon Nov 02, 2015 7:39 pm
Tiles are 'planar', so you really need to reverse the order of the bits in every byte to make a horizontal flip, and that's not a single instruction, so usually you define an array of 256 bytes where each constant value equals the value of its index with the bits reversed, as in
lut[0x00]=0x00
lut[0x01]=0x80
lut[0x02]=0x40

and so on.
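
A small host-side sketch of that table and of flipping one tile with it, assuming the standard 32-byte tile layout (8 rows, 4 bitplane bytes per row). The demo_ names are invented for the example, and on real hardware the table would more likely be a const array in ROM than something built at run time:

```c
#include <stdint.h>

#define DEMO_TILE_BYTES 32   /* 8 rows x 4 bitplanes, one byte per plane per row */

static uint8_t demo_reverse_lut[256];

/* Reverse the bit order of one byte (bit 0 <-> bit 7, bit 1 <-> bit 6, ...). */
static uint8_t demo_reverse8(uint8_t v) {
    v = (uint8_t)(((v & 0xF0) >> 4) | ((v & 0x0F) << 4));
    v = (uint8_t)(((v & 0xCC) >> 2) | ((v & 0x33) << 2));
    v = (uint8_t)(((v & 0xAA) >> 1) | ((v & 0x55) << 1));
    return v;
}

/* Fill the 256-entry table: lut[i] is i with its bits reversed. */
void demo_build_reverse_lut(void) {
    for (int i = 0; i < 256; i++)
        demo_reverse_lut[i] = demo_reverse8((uint8_t)i);
}

/* Horizontal flip: each byte holds one bitplane of one 8-pixel row,
   so reversing every byte mirrors the row. Needs the table built first. */
void demo_flip_tile_h(const uint8_t *src, uint8_t *dst) {
    for (int i = 0; i < DEMO_TILE_BYTES; i++)
        dst[i] = demo_reverse_lut[src[i]];
}

/* Vertical flip: rows are 4 bytes each; copy them in reverse row order.
   This is the case where just changing the copy order is enough. */
void demo_flip_tile_v(const uint8_t *src, uint8_t *dst) {
    for (int row = 0; row < 8; row++)
        for (int plane = 0; plane < 4; plane++)
            dst[row * 4 + plane] = src[(7 - row) * 4 + plane];
}
```

Only the horizontal flip needs the table; a vertical flip really is just a matter of reading the rows backwards.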

As for your original question, I think ROM is quite cheap nowadays so you can store both the original tiles and the flipped ones in ROM. Also, you can compress them, for instance using PSGaiden compression, if you're concerned about wasting space.

edit: anyway, you also get the chance to have different tiles; think of all those heroes holding the sword in the 'wrong' hand when facing the 'wrong' direction... :)
  • Site Admin
  • Joined: 19 Oct 1999
  • Posts: 11824
  • Location: London
Post Posted: Mon Nov 02, 2015 8:11 pm
Surely they swap the sword to the other hand when they turn around? Normally with sprite tile loads you care about speed, so storing the data twice is the best way. An aligned lookup table is the fastest way to do it in code. Decompressing is way too slow for tile loads in the vblank.
  • Joined: 05 Sep 2013
  • Posts: 5
Post Posted: Mon Nov 02, 2015 9:12 pm
I see now where the catch is in flipping pixels. As you said, the ROM path seems the fastest and simplest way. Is bank switching fast enough to upload tiles from ROM to RAM and then to VRAM?
  • Site Admin
  • Joined: 19 Oct 1999
  • Posts: 11824
  • Location: London
Post Posted: Mon Nov 02, 2015 10:31 pm
Bank switching is effectively instantaneous. However, why stage in RAM? Copy from ROM to VRAM.
  • Joined: 05 Sep 2013
  • Posts: 5
Post Posted: Mon Nov 02, 2015 10:38 pm
Maxim wrote
Bank switching is effectively instantaneous. However, why stage in RAM? Copy from ROM to VRAM.


Ooops, I didn't know that. Anyway with devkitSMS this ROM->VRAM transfer is transparent to the programmer, if I am not mistaken.
  • Joined: 05 Sep 2013
  • Posts: 1914
Post Posted: Tue Nov 03, 2015 10:17 am
Cyttorak wrote
with devkitSMS this ROM->VRAM transfer is transparent to the programmer, if I am not mistaken.


The functions are built to do that as transparently as possible. Anyway, loading from ROM or from RAM is the same task, just with a different source address.
  • Joined: 07 Oct 2015
  • Posts: 105
Post Posted: Thu Nov 12, 2015 10:17 am
sverx wrote
na_th_an wrote
[...] command line converters (which I will share soon)


I just had a look at the source package to see the tools you used, and I suspect you overlooked Maxim's Bmp2Tile (here's the github repository).
Using it you can convert png/bmp/pcx/gif images to tiles (h/v flip reduced, 8x16), tilemaps (with tile offset, priority and palette bits) and palettes, and it supports compression of tiles and tilemaps. It can be run from the command line too.
It can output binary files that you save into some folder, say 'assets', and then, using folder2c, you can generate assets.c and assets.h to use in your project.

I hope this is a useful suggestion :)


Found it, but for some odd reason I always spend more time trying to master some other guy's converter than writing my own.

My current converter project is a multi-purpose, multi-platform (NES and SMS right now, but if you support SMS, Megadrive support is cheap to implement) command-line tool for getting single charsets, fixed tilesets (where each WxH tile is made up of W*H consecutive chars, no tilemaps), tilemaps, and metasprites from portions of png files. It also detects and exports palettes (with some caveats, especially in the NES version, 'cause as many of you may know, the NES is well known for being "NTSC" - "Never The Same Colour" XD).

I made it so it fitted my needs completely. Can be used on its own or with folder2c. I just need to clean it up.

I also have a question. It's completely just out of curiosity, and maybe it will show a lack of knowledge on my part, but why do you do this:

void SMS_setBGScrollX (int scrollX) {
  SMS_write_to_VDPRegister(0x08,LO(scrollX));
}

void SMS_setBGScrollY (int scrollY) {
  SMS_write_to_VDPRegister(0x09,LO(scrollY));
}


Instead of just using unsigned chars? To make the handling of negative numbers more intuitive to coders? Wouldn't it be a tad faster just using unsigned chars, not having to calculate the "LO"?
  • Joined: 05 Sep 2013
  • Posts: 1914
Post Posted: Thu Nov 12, 2015 10:52 am
na_th_an wrote
[...] by some odd reason I always spend more time trying to master some other guy's converter than writing my own.
My current converter project is a multi-purpose multi-platform [...]


I understand you very well, but given the many powerful features BMP2Tile gives you, I wouldn't suggest anything else to someone coding for the SMS/GG. Of course, if you're targeting multiple platforms that's a whole different story :)

na_th_an wrote
I also have a question. It's completely just out of curiosity, and maybe it will show a lack of knowledge on my part, but why do you do this:

void SMS_setBGScrollX (int scrollX) {
  SMS_write_to_VDPRegister(0x08,LO(scrollX));
}

void SMS_setBGScrollY (int scrollY) {
  SMS_write_to_VDPRegister(0x09,LO(scrollY));
}


Instead of just using unsigned chars? To make the handling of negative numbers more intuitive to coders? Wouldn't it be a tad faster just using unsigned chars, not having to calculate the "LO"?


Correct, the handling of negative numbers is the reason. But no, it's just as fast as using an unsigned char, because the parameter is passed by pushing two bytes on the stack anyway. The LO(x) doesn't involve any calculation, as SDCC understands you just want to use the least significant byte and retrieves only that from the stack :)
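
To illustrate the negative-number point in plain C: the scroll registers take a single byte, so keeping only the low byte of a signed int makes -1 wrap to 255, -8 to 248, and so on. The helper below is a hypothetical stand-in written for this example; SMSlib's own LO() macro may well be defined differently.

```c
/* Hypothetical helper for illustration; not SMSlib's LO() macro.
   Converting to unsigned char is defined as reduction modulo 256,
   so negative scroll values wrap around as expected. */
unsigned char demo_low_byte(int value) {
    return (unsigned char)value;
}
```

So a caller can write a scroll of -1 without doing the wrap-around by hand, and the register still receives 0xFF.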

;SMSlib.c:231: void SMS_setBGScrollX (int scrollX) {
;   ---------------------------------
; Function SMS_setBGScrollX
; ---------------------------------
_SMS_setBGScrollX::
;SMSlib.c:232: SMS_write_to_VDPRegister(0x08,LO(scrollX));
   di
   ld   iy,#2
   add   iy,sp
   ld   h,0 (iy)                 ; <--- here is getting the value
   ld   l,#0x00
   ld   a,h
   out   (_VDPControlPort),a
   ld   a,#0x88
   out   (_VDPControlPort),a
   ei
   ret

(anyway please don't ask me why the compiler places that useless
ld   l,#0x00
... SDCC has still many quirks, even if it's a very good software...)

p.s. if you want to call SMS_write_to_VDPRegister() directly so to pass it an unsigned char instead -if it's more handy for you- of course you can :)
  • Joined: 17 Nov 2015
  • Posts: 91
  • Location: Canada
Post Posted: Tue Nov 17, 2015 7:52 pm
sverx wrote

Correct, the handling of negative numbers is the reason. But no, it's just as fast as using an unsigned char, because the parameter is passed by pushing two bytes on the stack anyway.


With sdcc, chars are pushed as a single byte. This means the char is wrangled into the high byte of a register pair (it could be AF), pushed and the sp is incremented before the call. After the call, the stack has to be adjusted by a single byte. The call overhead is a little greater but sdcc in general (ie without some help) will generate better code for chars around the call. In vararg parameter lists, chars have to be promoted to int and in those cases a char will be pushed as 16-bits.

;SMSlib.c:231: void SMS_setBGScrollX (int scrollX) {
;   ---------------------------------
; Function SMS_setBGScrollX
; ---------------------------------
_SMS_setBGScrollX::
;SMSlib.c:232: SMS_write_to_VDPRegister(0x08,LO(scrollX));
   di
   ld   iy,#2
   add   iy,sp
   ld   h,0 (iy)                 ; <--- here is getting the value
   ld   l,#0x00
   ld   a,h
   out   (_VDPControlPort),a
   ld   a,#0x88
   out   (_VDPControlPort),a
   ei
   ret


Quote

(anyway please don't ask me why the compiler places that useless
ld   l,#0x00
... SDCC has still many quirks, even if it's a very good software...)


I just happened to be looking at the generated asm from your library when you posted this :)

The peephole optimizer is supposed to be eliminating dead loads but in this case the peepholer is not running on that code. The reason is the inlined asm. The peepholer will ignore all code up to the last "__endasm;" it sees in the function. In the case above, that means all C code up to "ei" is not peephole optimized. I'm going to raise a ticket about this at sf/sdcc but a fix is probably more than trivial since the peepholer will have to break that bunch of code into blocks linking inlined asm and C-asm in a general manner, something it's clearly not doing now.

If you eliminate the inlined ei and di that code assembles to:


   ld   hl, #2+0
   add   hl, sp
   ld   a, (hl)
   out   (_VDPControlPort),a
   ld   a,#0x88
   out   (_VDPControlPort),a
   ret


Inlined asm is actually very dangerous under sdcc. The reason is sdcc completely ignores the inlined asm and will assume live registers still hold their values with interrupting inline asm present. This is as it should be but humans tend to forget this. Here's an example:


int dummy;

void SMS_setBGScrollXY (int scrollX) {

   VDPControlPort=(LO(scrollX)); VDPControlPort=(0x08)|0x80;

   __asm
   ld hl,(_dummy)
   ld de,(_dummy)
   __endasm;

   VDPControlPort=(LO(scrollX)); VDPControlPort=(0x08)|0x80;
}


Generated code:

   ld   iy,#2
   add   iy,sp
   ld   d,0 (iy)
   ld   h,#0x00
   ld   a,d
   out   (_VDPControlPort),a
   ld   a,#0x88
   out   (_VDPControlPort),a

   ld hl,(_dummy)
   ld de,(_dummy)

   ld   a,d
   out   (_VDPControlPort),a
   ld   a,#0x88
   out   (_VDPControlPort),a

   ret


Notice how the first invocation of your macro is not peephole optimized because the optimizer is only applied after the last "__endasm". You should spot something else: the inlined code has changed "de" but the compiler has continued to generate code assuming "d" still holds its original value.

The situation can be more complicated than that as the peephole rules can change what registers are used for computations. This means inlined assembly, unless it's trivial or appears at the end of the function, must preserve all registers it uses to be guaranteed to work.

Inlined assembly was only meant to inject a little asm here and there within a C environment but it's one of those things that is abused in the z80 community. An asm function should be written in asm and stored in an asm file. This is made difficult by sdcc's tools, I know, and it does require a bit more knowledge but it's how it should be done :)


Edit: I was also going to mention that "--reserve-regs-iy", which prevents sdcc from using the iy register, generally leads to better code. There's something wrong with how sdcc costs the use of iy. But then I tried it on your code and it crashed the compiler. Looks like another bug to hunt down :D
  • Joined: 05 Sep 2013
  • Posts: 1914
Post Posted: Tue Nov 17, 2015 8:34 pm
Thanks a lot for the very interesting information you posted! There's still a lot of SDCC that I've got to learn and master...
As for the use of inline asm, I wasn't really fond of it to be honest... but I had to start using it a while ago, because I just couldn't make __critical sections work correctly, at least before release 3.5.1 (build #9261 at least). I have now to consider switching, of course, but I surely want to avoid breaking everything just to shave one cycle or two ;)

edit: compiling with -DNESTED_DI_EI_SUPPORT it will use __critical sections instead of inline asm, and it generates this code:

;SMSlib.c:231: void SMS_setBGScrollX (int scrollX) {
;   ---------------------------------
; Function SMS_setBGScrollX
; ---------------------------------
_SMS_setBGScrollX::
;SMSlib.c:232: SMS_write_to_VDPRegister(0x08,LO(scrollX));
   ld   hl, #2+0
   add   hl, sp
   ld   d, (hl)
;SMSlib.c:114: }   
   ld   a,i
   di
   push   af
;SMSlib.c:112: VDPControlPort=value;
   ld   a,d
   out   (_VDPControlPort),a
;SMSlib.c:113: VDPControlPort=VDPReg|0x80;
   ld   a,#0x88
   out   (_VDPControlPort),a
   pop   af
   ret   PO
   ei
;SMSlib.c:232: SMS_write_to_VDPRegister(0x08,LO(scrollX));
   ret

which has been peephole optimized, surely. I wonder, anyway, who really needs to nest interrupts on a SMS/GG...

also, about:
Quote
"--reserve-regs-iy" [...] prevents sdcc from using the iy register generally leads to better code

I wonder if it's correct that forcing a compiler between stricter bounds leads to better results... ;)
  • Joined: 17 Nov 2015
  • Posts: 91
  • Location: Canada
Post Posted: Wed Nov 18, 2015 9:35 pm
sverx wrote

edit: compiling with -DNESTED_DI_EI_SUPPORT it will use __critical sections instead of inline asm, and it generates this code:

;SMSlib.c:231: void SMS_setBGScrollX (int scrollX) {
;   ---------------------------------
; Function SMS_setBGScrollX
; ---------------------------------
_SMS_setBGScrollX::
;SMSlib.c:232: SMS_write_to_VDPRegister(0x08,LO(scrollX));
   ld   hl, #2+0
   add   hl, sp
   ld   d, (hl)
;SMSlib.c:114: }   
   ld   a,i
   di
   push   af
;SMSlib.c:112: VDPControlPort=value;
   ld   a,d
   out   (_VDPControlPort),a
;SMSlib.c:113: VDPControlPort=VDPReg|0x80;
   ld   a,#0x88
   out   (_VDPControlPort),a
   pop   af
   ret   PO
   ei
;SMSlib.c:232: SMS_write_to_VDPRegister(0x08,LO(scrollX));
   ret

which has been peephole optimized, surely. I wonder, anyway, who really needs to nest interrupts on a SMS/GG...


Yes "ld hl, #2+0" with no spaces around the "+" is a result of a peephole rule. The problem with __critical is the code only works reliably on CMOS z80s. It's using "ld a,i" to determine whether interrupts are currently enabled and this will not work reliably on nmos z80s. Most systems in the early to mid 80s at least were using nmos z80s but I don't know what was used in SMS/GG. In the case of the code above, it may not re-enable interrupts on rare occasions.

This is a known bug in sdcc. Zilog's proposed fix turns into a bunch of code that pushes 0 on the stack and checks to see if it was changed by an interrupt routine while "ld a,i" was executing. It's neither as elegant nor as lightweight as "ld a,i". Other compilers and libraries let you specify whether you're targeting an nmos or cmos cpu and either do it the hard way or the easy way as appropriate. But it's also done through a call to a subroutine, which adds overhead. To replace "di" and "ei" and get the peepholer to work in between, this is an annoying price to pay. I was thinking how it could be done automatically and reliably and the only things that came to mind were writing the entire function in asm or settling for calls to a subroutine to di/ei. Even inlining an ei() / di() function doesn't fool the peepholer :)

Quote

also, about:
Quote
"--reserve-regs-iy" [...] prevents sdcc from using the iy register generally leads to better code

I wonder if it's correct that forcing a compiler between stricter bounds leads to better results... ;)


In hand assembled asm code, ix and iy are rarely used. ix is used by sdcc as frame pointer, so that has to be allowed. However it uses iy frequently and this means the code is slower and larger than hand-assembled code. It's doing that for two reasons as far as I can tell. One is it's using it as an additional 16-bit register with offset without properly costing its use. The other is sdcc mainly considers the z80 an 8-bit processor so it usually misses using 16-bit instructions like "add hl,rp" and "sbc hl,rp". I see a few places where sdcc generates better code with iy but more places where the code turns out larger and slower. The better code with iy is only better because sdcc cannot generate better asm code without it.

Anyway you have an example above which the peepholer manages to fix. Note it's not the compiler that figures out that HL is better used to offset into the stack than IY.

Another odd thing is that with "--reserve-regs-iy" selected it seems sdcc considers different code generation options. I've seen better code generated only without "--reserve-regs-iy" for things that have nothing to do with iy. So sdcc on its own can be a wash -- you need to try with and without to see how code size and speed are impacted.

However, if more peephole rules can be added, it's always better with "--reserve-regs-iy" on. The reason is restricting sdcc's use of iy makes it more like hand-assembled code and gives more opportunity for the peepholer to improve it. In benchmarks we're getting 5-15% reduction in code size and 5-20% increase in speed just with these additional rules. Results are never slower or larger than sdcc's native compiles and having "--reserve-regs-iy" on is necessary for best results.
  • Joined: 17 Nov 2015
  • Posts: 91
  • Location: Canada
Post Posted: Wed Nov 18, 2015 10:36 pm
Alcoholics Anonymous wrote

The peephole optimizer is supposed to be eliminating dead loads but in this case the peepholer is not running on that code. The reason is the inlined asm. The peepholer will ignore all code up to the last "__endasm;" i


I've looked into this properly now and I was wrong here. What happens is that the peepholer is unable to cross an inline asm boundary to determine if registers are live. So in your example the peepholer is unable to determine whether iy is dead or live in order to make the substitution, and it's unable to determine whether "ld l,0" is a dead load because it can't see the end of the function.
  • Joined: 05 Sep 2013
  • Posts: 1914
Post Posted: Thu Nov 19, 2015 9:01 am
Alcoholics Anonymous wrote
The problem with __critical is the code only works reliably on CMOS z80s. It's using "ld a,i" to determine whether interrupts are currently enabled and this will not work reliably on nmos z80s. Most systems in the early to mid 80s at least were using nmos z80s but I don't know what was used in SMS/GG. In the case of the code above, it may not re-enable interrupts on rare occasions.


I had read about problems with nmos Z80s, but I could never understand the implications. Your explanation just convinced me that I'd better drop that thing completely.

Alcoholics Anonymous wrote
[...] and the only things that come to mind were writing the entire function in asm or [...]


I'll consider this option. As it is now, however, it's just a few cycles slower.
Again, thanks for sharing your knowledge :)
  • Joined: 05 Sep 2013
  • Posts: 1914
Post Posted: Thu Dec 17, 2015 10:03 am
I'm using my own devkit these days. It's funny when you realize your own tool isn't really up to the job.
I wonder how it is that nobody has yet felt the need for a few FAST VRAM loading functions, for instance for streaming in a tile (or a few of them) during VBlank... maybe you all gave up using it already? :|

BTW I'm working to make up the deficit. :)
  • Joined: 29 Mar 2012
  • Posts: 308
  • Location: Spain
Post Posted: Thu Dec 17, 2015 10:57 am
I found that when animating tiles on Gaudream... So, if you solve it, it'll be nice :-D
  • Joined: 05 Sep 2013
  • Posts: 1914
Post Posted: Thu Dec 17, 2015 2:07 pm
kusfo wrote
I found that when animating tiles on Gaudream


Why didn't you tell me? The library nowadays grows only when someone requests something missing and worth adding...

Anyway, update! UNSAFE_SMS_VRAMmemcpy*() functions added, and UNSAFE_SMS_load1Tile() [plus 2Tiles() and 4Tiles()] created for convenience. Note that these should be used ONLY during vertical blanking or when the screen is turned off. VRAM corruption WILL happen otherwise (on hardware; on emulators you may not see any difference...)

/* VRAM unsafe functions. Fast, but dangerous, can be safely used only during VBlank or when screen is off */
void UNSAFE_SMS_copySpritestoSAT (void);                         /* copy sprites to Sprite Attribute Table */
void UNSAFE_SMS_VRAMmemcpy32 (unsigned int dst, void *src);      /* copy 32 bytes to VRAM */
void UNSAFE_SMS_VRAMmemcpy64 (unsigned int dst, void *src);      /* copy 64 bytes to VRAM */
void UNSAFE_SMS_VRAMmemcpy128 (unsigned int dst, void *src);     /* copy 128 bytes to VRAM */

/* handy macros for UNSAFE_SMS_VRAMmemcpy* functions (can be safely used ONLY during VBlank or when screen is off) */
UNSAFE_SMS_load1Tile(src,theTile)                        /* copy ONE tile to VRAM */
UNSAFE_SMS_load2Tiles(src,tilefrom)                      /* copy TWO tiles to VRAM */
UNSAFE_SMS_load4Tiles(src,tilefrom)                      /* copy FOUR tiles to VRAM */
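
As a rough sketch of how these calls might sit in a game loop: do the game logic first, then wait for VBlank and issue the UNSAFE_ calls straight away. The name SMS_waitForVBlank() is assumed here for the library's VBlank wait, and everything in the #else branch is a hypothetical host-side stand-in (not part of SMSlib) that only records what was called, so the ordering can be checked on a PC.

```c
#ifdef __SDCC
#include "SMSlib.h"   /* the real library when building for the console */
#else
/* Host-side stand-ins, NOT part of SMSlib: they only record what was
   called so the ordering can be checked off-target. */
static int in_vblank = 0;              /* set once the fake VBlank wait returns */
static int sat_copies = 0;             /* SAT copies issued during VBlank */
static int tiles_loaded = 0;           /* tiles uploaded during VBlank */
static int unsafe_outside_vblank = 0;  /* UNSAFE_ calls made too early */

static void SMS_waitForVBlank(void) { in_vblank = 1; }

static void UNSAFE_SMS_copySpritestoSAT(void) {
    if (in_vblank) sat_copies++; else unsafe_outside_vblank++;
}

static void fake_load_tiles(void *src, unsigned int tilefrom, int count) {
    (void)src; (void)tilefrom;         /* the stand-in ignores the data itself */
    if (in_vblank) tiles_loaded += count; else unsafe_outside_vblank++;
}
#define UNSAFE_SMS_load4Tiles(src, tilefrom) fake_load_tiles((src), (tilefrom), 4)
#endif

/* One frame: game logic first, then the VBlank wait, then the UNSAFE_
   calls immediately, before anything slow can eat into blanking time. */
void run_frame(unsigned char *player_tiles, unsigned int first_tile) {
    /* ... game logic, building the sprite list in RAM ... */
    SMS_waitForVBlank();
    UNSAFE_SMS_copySpritestoSAT();
    UNSAFE_SMS_load4Tiles(player_tiles, first_tile);
    /* ... anything that does not touch VRAM can follow ... */
}
```

The point of the ordering is that nothing but the UNSAFE_ calls runs between the wait and the end of the uploads.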
  • Joined: 29 Mar 2012
  • Posts: 308
  • Location: Spain
Post Posted: Thu Dec 17, 2015 2:20 pm
We want to finish Gaudream in time for the first physical releases, so you can expect a lot of feature requests ahead! :-)
  • Joined: 07 Oct 2015
  • Posts: 105
Post Posted: Fri Dec 18, 2015 7:47 am
In the NES every update must be done during VBlank. The library we use, Neslib, provides a couple of structures you can fill during frame time which are interpreted and sent to VRAM during VBlank.

How should we use the "unsafe" operations in SMSlib? Do we have to hook them to the frame interrupt manually, or is there a special way to call them? Would placing them right after "waitForVBlank" and taking care they don't take up much time suffice? (I guess so, just wanted a confirmation).

By the way, I've re-discovered (I mean, I discovered it myself but then I found it was something everybody has been using for 30 years XD) a way to avoid your sprites disappearing when the 8-per-line limit is reached: just send them to the SAT in a different order in each frame. That way, the sprite which disappears is different every frame.

I've implemented this in my current NES development and it works quite nicely. I have a list I populate during frame time with every sprite which has to be sent to the SAT (called OAM in the NES). I know the maximum length MAX_SIZE of the list and have calculated an increment value INCREMENT which is coprime with this length; for example, I have 19 metasprites and use an increment of 8.

In each frame, I run across the list starting from a different element each time (in my game, I use "frame_counter & 7" to start on positions 0, 1, 2, ... 7, 0, 1, 2...). Then I iterate MAX_SIZE times, incrementing my index by INCREMENT places every iteration, modulo MAX_SIZE. That way every position in the list is visited once, and in a different order each frame. I just send the corresponding metasprite to the SAT every iteration.

When more than 8 sprites are placed in the same line, the 9th is a different one each time, so instead of sprites disappearing I get flicker, which is way more desirable.
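
A minimal sketch of the rotation described above, using the values from the post (19 list entries, increment 8, start position "frame_counter & 7"). The function name sprite_order() and the order[] output array are made up for illustration; a real game would send each metasprite to the SAT inside the loop instead of recording the index.

```c
#define MAX_SIZE  19   /* length of the sprite list (value from the post) */
#define INCREMENT  8   /* step through the list; coprime with MAX_SIZE */

/* Fills order[] with the list positions in the order they would be
   sent to the SAT on the given frame. */
void sprite_order(unsigned char frame_counter, unsigned char order[MAX_SIZE]) {
    unsigned char index = frame_counter & 7;   /* start at 0..7, changes every frame */
    unsigned char i;

    for (i = 0; i < MAX_SIZE; i++) {
        order[i] = index;                      /* here: send metasprite 'index' to the SAT */
        index += INCREMENT;
        if (index >= MAX_SIZE) {
            index -= MAX_SIZE;                 /* cheap modulo, valid because INCREMENT < MAX_SIZE */
        }
    }
}
```

Because 8 and 19 share no common factor, the index only returns to its starting value after 19 steps, so every position is visited exactly once per frame. On a Z80 the compare-and-subtract is also cheaper than a real modulo.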

You probably know this technique already, but I'm sharing it in case somebody doesn't.
  • Site Admin
  • Joined: 19 Oct 1999
  • Posts: 11824
  • Location: London
Post Posted: Fri Dec 18, 2015 8:15 am
You can also reverse the order but that can result in unbalanced flickering. You can also prioritise the player sprites over enemies to make the player never flicker. Commercial games would deliberately flicker whole metasprites (as in, remove them from the hardware sprite list) to make them flicker entirely instead of per scan line, once the game figured out that the limit had been breached. That's a lot harder.
  • Joined: 07 Oct 2015
  • Posts: 105
Post Posted: Fri Dec 18, 2015 8:50 am
Yeah, I'm saving the main player from flickering. I send the main player sprite to the SAT, then apply the technique.

In older games I've made some sprites flicker deliberately. For example, in another as yet unreleased project for NES, the main player can fire. Projectiles are sent to the SAT alternately: projectiles with an odd index in odd frames, projectiles with an even index in even frames.
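
The odd/even rule can be sketched as a one-line parity test. The function name projectile_visible() is hypothetical; it only decides whether a given projectile is sent to the SAT on a given frame.

```c
/* Returns nonzero when projectile 'index' should be sent to the SAT on this
   frame: odd indexes on odd frames, even indexes on even frames. */
int projectile_visible(unsigned char index, unsigned char frame_counter) {
    return ((index ^ frame_counter) & 1) == 0;   /* lowest bits match: same parity */
}
```

Each projectile is then drawn every other frame, so at most half of them compete for the per-line sprite limit at any time.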

The possibilities are endless :)

Anyway, back to the point: I will need fast tile pattern uploads to VRAM in my next SMS project, which is a port of my current NES game in the works.

In the NES, each sprite in a metasprite can be v-flipped and h-flipped in hardware, but such a thing is not possible in the SMS. In the NES version I use all 256 patterns to store every character in the game facing right, then I use the hardware when I need them facing left, which is not possible in the SMS. So in the SMS, due to the lack of space for replicating every pattern facing left and right, I will need to use 8 patterns per main player (there are two) and change the pattern data each frame. That means fetching and uploading 512 bytes per frame.

Will I have enough time during VBlank for the update?
  • Joined: 05 Sep 2013
  • Posts: 1914
Post Posted: Fri Dec 18, 2015 8:58 am
na_th_an wrote
How should we use the "unsafe" operations in SMSlib? Do we have to hook them to the frame interrupt manually, or is there a special way to call them? Would placing them right after "waitForVBlank" and taking care they don't take up much time suffice? (I guess so, just wanted a confirmation).


Yes, that's the simpler approach and the one I suggest (and use). As they take a fixed amount of time, you just have to leave a little margin. I would say a good rule of thumb is that loading each tile takes no more than 3 scanlines and a SAT update takes 15 scanlines (I mean with MAXSPRITES=64, which is the default).
On an NTSC SMS you've got 70 'blanking' scanlines (262 total lines - 192 screen lines)... and you've got even more of them on PAL. :)
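
As a worked example of that rule of thumb, here is a small budget check applied to the 512 bytes per frame asked about earlier. The helper names are made up for illustration, the figures of 3 and 15 scanlines are the upper bounds quoted above rather than measured timings, and a tile is taken to be 32 bytes.

```c
#define ACTIVE_LINES      192   /* visible screen lines */
#define NTSC_TOTAL_LINES  262   /* total lines per NTSC frame */
#define LINES_PER_TILE      3   /* rule-of-thumb upper bound per tile upload */
#define LINES_FOR_SAT      15   /* SAT update with MAXSPRITES=64 */
#define BYTES_PER_TILE     32   /* one 8x8 tile at 4 bits per pixel */

/* Scanlines available outside the active display. */
int blanking_lines(int total_lines) {
    return total_lines - ACTIVE_LINES;
}

/* Estimated scanlines used by the tile uploads plus an optional SAT update. */
int vblank_cost(int tiles, int update_sat) {
    return tiles * LINES_PER_TILE + (update_sat ? LINES_FOR_SAT : 0);
}

/* Nonzero if the work, plus a safety margin in scanlines, fits in blanking. */
int fits_in_vblank(int tiles, int update_sat, int total_lines, int margin) {
    return vblank_cost(tiles, update_sat) + margin <= blanking_lines(total_lines);
}
```

With these numbers, 512 bytes is 16 tiles, so the estimate is 16 * 3 + 15 = 63 scanlines against 70 available on NTSC. It fits, but with only about 7 lines to spare, so the uploads should start right after the VBlank wait.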

na_th_an wrote
By the way, I've re-discovered (I mean, I discovered it myself but then I found it was something everybody has been using for 30 years XD) a way to avoid your sprites disappearing when the 8-per-line limit is reached: just send them to the SAT in a different order in each frame. That way, the sprite which disappears is different every frame.

I've implemented this in my current NES development and it works quite nicely. I have a list I populate during frame time with every sprite which has to be sent to the SAT (called OAM in the NES). I know the maximum length MAX_SIZE of the list and have calculated an increment value INCREMENT which is coprime with this length; for example, I have 19 metasprites and use an increment of 8.


I'm doing that for the Wekas (and only them) in Waimanu, though I admit your idea of using a prime is cunning and it didn't occur to me. I am simply using a circular queue with some optimizations to skip unused slots. Oh, well, by now... ;)
  • Joined: 05 Sep 2013
  • Posts: 1914
Post Posted: Fri Dec 18, 2015 9:01 am
na_th_an wrote
I will need to use 8 patterns per main player (there are two) and change the pattern data each frame. That means fetching and uploading 512 bytes per frame.
Will I have enough time during VBlank for the update?


I would say so :)
  • Joined: 07 Oct 2015
  • Posts: 105
Post Posted: Fri Dec 18, 2015 9:07 am
That's awesome news. Now I have to do some tests first to learn a technique I've never used (that is, changing the pattern data in-game!).

I'm sure the same technique can be applied in the SG1000. I will try that too. Monochrome 16x16 sprites would make my game look ugly ;)

Cheers.
  • Joined: 01 Feb 2014
  • Posts: 347
Post Posted: Fri Dec 18, 2015 9:22 am
na_th_an wrote
I will need to use 8 patterns per main player (there are two) and change the pattern data each frame. That means fetching and uploading 512 bytes per frame.
Will I have enough time during VBlank for the update?

If you find you're running out of time, you can always try to reduce the colours of your player sprites. I did this with the Bruce Lee sprite when I needed to get rid of a few cycles. My sprite originally used only 9 colours, so turning it into 3bpp tiles with 8 wasn't complicated and it sped up the constant refreshing of the tiles considerably.

More information on the method here.
  • Joined: 05 Sep 2013
  • Posts: 1914
Post Posted: Fri Dec 18, 2015 9:37 am
Kagesan wrote
If you find you're running out of time, you can always try to reduce the colours of your player sprites. I did this with the Bruce Lee sprite when I needed to get rid of a few cycles.


According to the linked topic, you're going to save 5 cycles every 4 transfers, which means about 8% in the best case. So I think one would have to be quite desperate to resort to this.

If VBlank turns out not to be long enough, I would either:
- move something 'out' of VBlank
- make the blanking time longer by turning off the screen and keeping it off until done