View topic - When does an HBlank start?

  • Joined: 06 Aug 2021
  • Posts: 49
Posted: Wed Jun 15, 2022 12:39 pm
I'm playing around with interrupts and want to make the most of the HBlank period, so I started off with a simple change to index 0 of the sprite palette, which corresponds to the border color.

Using the Event viewer in Emulicious, I wanted to see where the HBlank was starting. I thought that the HBlank interrupt would fire as soon as the HBlank period starts, but it looks like it doesn't fire until the beam is right back at the left border. That means if I use an interrupt and want to write to the VDP, I don't have any time before the SMS is drawing to the screen again.

Should I be executing code until I hit the next HBlank period, or am I doing something wrong?

Here is $0038:

;==============================================================
; Interrupt Handler
;==============================================================
.orga $0038
;Swap shadow registers and registers
    ex af, af'
    exx
;Get the status of the VDP
    in a, (VDPCommand)
    ld (VDPStatus), a
;Count the number of interrupts since VBlank
    ld hl, INTNumber
    ld a, (hl)
    inc a
    ld (hl), a
;Do specific scanline-based tasks
    call InterruptHandler
;Swap shadow registers and registers back
    exx
    ex af, af'
    ei
;Leave
    reti


And the InterruptHandler:

;Get here after coming from $0038
InterruptHandler:
;Check if we are at VBlank, Bit 7 tells us that
    ld a, (VDPStatus)
    bit 7, a                ;Z is set if bit is 0
    jp nz, +     

    ld a, (INTNumber)
    cp $01
    jp nz, Black

NotBlack:
    ld hl, $c010 | CRAMWrite
    call SetVDPAddress
    ; Next we send the VDP the palette data
    ld (hl), $39
    ld bc, $01
    call CopyToVDP 

    ret

Black:
    ld hl, $c010 | CRAMWrite
    call SetVDPAddress
    ; Next we send the VDP the palette data
    ld hl, color
    ld bc, $01
    call CopyToVDP 

;Return to end Interrupt
    ret


;If we are on the last scanline (VBlank)
+:
;Set  IntNumber to zero
    ld hl, INTNumber
    ld (hl), $00

;Update frame count
    call UpdateFrameCount

;Check what scene we're on
    ld a, (sceneID)
    cp $02
    jp nz, ++ 
 

++:
    ret

Attachment: HBlank.png (18.31 KB). Notice where $0038 gets triggered

  • Site Admin
  • Joined: 19 Oct 1999
  • Posts: 14690
  • Location: London
Posted: Wed Jun 15, 2022 12:55 pm
In general by the time you take the interrupt it’s some way into the line; games often have delay loops before pushing palette changes so the CRAM dots are offscreen. Other changes like HScroll are latched so you can write them and they take effect on the next line.
  • Joined: 06 Aug 2021
  • Posts: 49
Posted: Wed Jun 15, 2022 1:09 pm
Maxim wrote
In general by the time you take the interrupt it’s some way into the line; games often have delay loops before pushing palette changes so the CRAM dots are offscreen. Other changes like HScroll are latched so you can write them and they take effect on the next line.


Ahhh okay, that makes sense why I wasn't noticing this issue when doing HScroll then. Thanks!
  • Joined: 06 Mar 2022
  • Posts: 598
  • Location: London, UK
Posted: Wed Jun 15, 2022 1:31 pm
Last edited by willbritton on Mon Jun 20, 2022 8:35 pm; edited 2 times in total
This is a very interesting subject (if that's the kind of thing that floats your boat!) and I noted from earlier exploration of the docs on this site that the exact point of interrupt was uncertain.

Very quick and possibly flawed back of envelope reasoning:

1. Charles MacDonald provides the following rough guide of horizontal "pixel" duration here, for NTSC:

 Pixels H.Cnt   Description
  256 : 00-7F : Active display
   15 : 80-87 : Right border
    8 : 87-8B : Right blanking
   26 : 8B-ED : Horizontal sync
    2 : ED-EE : Left blanking
   14 : EE-F5 : Color burst
    8 : F5-F9 : Left blanking
   13 : F9-FF : Left border


In NTSC we can estimate a CPU clock cycle as being roughly 262 * 342 * 60 / 3.5x10^6 = ~1.5 "pixels" long (262 lines of 342 pixels, 60 frames per second, divided by a CPU clock of roughly 3.5 MHz).

2. I'm going to make the assumption that the VDP doesn't issue the interrupt until at least the start of HSYNC (in practice it could well be some time later)

3. Interrupt timing for mode 1 isn't in the Z80 data sheet, but there is a very handy treatment by Achim Flammenkamp here.

Note in particular the minimum timing of 13 clock cycles to get to $0038, and also that the interrupt signal won't even be sampled until the current instruction has finished, so with the longest instructions it might take as many as 23 cycles for the CPU to "catch" the interrupt before servicing it.

Given all this, I reckon the absolute soonest you could possibly respond to a horizontal interrupt would be around 13 * 1.5 = ~20 pixels after the start of HSYNC, which is still within HSYNC itself; and in the worst case (13 + 23) * 1.5 = ~54 pixels after the start of HSYNC, which is around 4 pixels into the left border - possibly that's where we're seeing it in the screenshot here?

Of course this doesn't include any code after the jump to $0038 which will take more cycles on top.
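
The estimate above can be reproduced with a short Python sketch. The region widths are copied from the Charles MacDonald table quoted in this post, and the 3.5 MHz clock, the 13-cycle acknowledge and the 23-cycle longest instruction are the same round figures used above. That the VDP raises the interrupt exactly at the start of HSYNC is only an assumption, as already noted.

```python
# Rough NTSC line-timing estimate; every figure here is an approximation.
# Regions are listed in scan order, starting at the beginning of HSYNC.
REGIONS = [
    (26, "Horizontal sync"),
    (2, "Left blanking"),
    (14, "Color burst"),
    (8, "Left blanking"),
    (13, "Left border"),
    (256, "Active display"),
    (15, "Right border"),
    (8, "Right blanking"),
]

PIXELS_PER_LINE = sum(width for width, _ in REGIONS)  # 342
LINES_PER_FRAME = 262
FRAMES_PER_SECOND = 60
CPU_HZ = 3.5e6  # round figure used in the post


def pixels_per_cycle():
    """Pixels output by the VDP during one CPU clock cycle."""
    return PIXELS_PER_LINE * LINES_PER_FRAME * FRAMES_PER_SECOND / CPU_HZ


def region_at(pixels_after_hsync_start):
    """Return (region name, offset into that region) for a position on the line."""
    remaining = pixels_after_hsync_start % PIXELS_PER_LINE
    for width, name in REGIONS:
        if remaining < width:
            return name, remaining
        remaining -= width
    raise AssertionError("unreachable: regions cover the whole line")


ACK_CYCLES = 13           # minimum cycles to reach $0038 in mode 1
LONGEST_INSTRUCTION = 23  # worst-case wait for the current instruction

best_case = ACK_CYCLES * 1.5
worst_case = (ACK_CYCLES + LONGEST_INSTRUCTION) * 1.5

print(f"pixels per CPU cycle: {pixels_per_cycle():.3f}")
print("best case :", best_case, "pixels ->", region_at(best_case))
print("worst case:", worst_case, "pixels ->", region_at(worst_case))
```

Changing `ACK_CYCLES` or adding the cost of the handler's own instructions shows how quickly the earliest possible VDP write moves into the active display.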
  • Joined: 06 Aug 2021
  • Posts: 49
Posted: Thu Jun 16, 2022 12:26 pm
willbritton wrote


Given all this, I reckon the absolute soonest you could possibly respond to a horizontal interrupt would be around 13 * 1.5 = ~20 pixels after the start of HSYNC, which is still within HSYNC itself; and in the worst case (13 + 23) * 1.5 = ~54 pixels after the start of HSYNC, which is around 4 pixels into the left border - possibly that's where we're seeing it in the screenshot here?


Ooo that's tough if you're doing writes to VDP in that time. That kind of variance could definitely mess up anything that was just barely fitting in the allotted HBlank timing. I think I'll play around with finding out what can and can't be squeezed into that window
  • Joined: 05 Sep 2013
  • Posts: 3763
  • Location: Stockholm, Sweden
Posted: Thu Jun 16, 2022 12:35 pm
... yet I'm quite sure there's enough time to change the H_scroll value in time for the next line to get the value, if you do it as quickly as possible.

I'll probably have to dig up some code, now that I've said that...

edit: here!

    ld a,(_next_bg_x_value)      // needs to have the value at hand!
    out (#0xBF),a
    ld a,#0x88                   // write to hscroll VDP register
    out (#0xBF),a
  • Joined: 06 Aug 2021
  • Posts: 49
Posted: Thu Jun 16, 2022 2:46 pm
sverx wrote


    ld a,(_next_bg_x_value)      // needs to have the value at hand!
    out (#0xBF),a
    ld a,#0x88                   // write to hscroll VDP register
    out (#0xBF),a


So this would adjust the scroll speed for all 192 lines independently then? You could get some seriously smooth curves going on with that
  • Joined: 06 Mar 2022
  • Posts: 598
  • Location: London, UK
Posted: Thu Jun 16, 2022 3:14 pm
Quote
So this would adjust the scroll speed for all 192 lines independently then? You could get some seriously smooth curves going on with that

Sure, Hang On does just that to render the curving road!

For reference, here are the instructions Hang On runs from $0038:


; @ $0038:
push af
in a, (Port_VDPStatus)
or a
jp p, _RAM_C4D0_              ; condition met
; @ $C4D0:
in a, (Port_VCounter)
cp $5F                        ; decide whether we're far enough down the screen to render road
jr c, _LABEL_3C8_             ; condition not met
ld ($C4DA), a
ld a, ($C500)
out (Port_VDPAddress), a
ld a, $88
out (Port_VDPAddress), a      ; write the horizontal scroll value


Not sure how many cycles that is, but it obviously happens very soon after the interrupt.
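
As a rough answer to the cycle question, here is a small Python tally of the path shown above, using standard Z80 T-state counts. It assumes the `jp p` is taken and the `jr c` is not (as the comments in the listing say), that no wait states are inserted, and that 13 cycles pass between the interrupt being accepted and the first instruction at $0038. The 1.5 pixels per cycle figure is the approximation used earlier in this thread.

```python
# (instruction, T-states) for the Hang On line-interrupt path listed above.
HANG_ON_PATH = [
    ("push af", 11),
    ("in a, (Port_VDPStatus)", 11),
    ("or a", 4),
    ("jp p, _RAM_C4D0_", 10),            # jp cc costs 10 whether taken or not
    ("in a, (Port_VCounter)", 11),
    ("cp $5F", 7),
    ("jr c, _LABEL_3C8_", 7),            # 7 when the branch is not taken
    ("ld ($C4DA), a", 13),
    ("ld a, ($C500)", 13),
    ("out (Port_VDPAddress), a", 11),
    ("ld a, $88", 7),
    ("out (Port_VDPAddress), a", 11),
]

INTERRUPT_ACK_CYCLES = 13  # assumed mode 1 acknowledge time
PIXELS_PER_CYCLE = 1.5     # approximation from earlier in the thread

handler_cycles = sum(t for _, t in HANG_ON_PATH)
total_cycles = handler_cycles + INTERRUPT_ACK_CYCLES

print("handler cycles up to the second out:", handler_cycles)
print("including interrupt acknowledge    :", total_cycles)
print("approximate pixels elapsed         :", total_cycles * PIXELS_PER_CYCLE)
```

On these assumptions the scroll write completes well within one 342-pixel line, which fits the value being latched for the following line.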

Also, and I'm only guessing here based on how I think the VDP background processing would be designed, but I would imagine you have until perhaps 8 or 16 pixels before the left hand drawable edge of the screen to set the horizontal scroll value. That's a good way into the left border, but I doubt it would be "extended" by blanking the left hand column -- maybe though...
  • Joined: 05 Sep 2013
  • Posts: 3763
  • Location: Stockholm, Sweden
Posted: Fri Jun 17, 2022 8:21 am
bofner wrote
sverx wrote


    ld a,(_next_bg_x_value)      // needs to have the value at hand!
    out (#0xBF),a
    ld a,#0x88                   // write to hscroll VDP register
    out (#0xBF),a


So this would adjust the scroll speed for all 192 lines independently then? You could get some seriously smooth curves going on with that


provided you prepare the value for next_bg_x_value ahead of each line and you set the line counter (VDP register $0A) to 0, so that the line interrupt fires on every scanline... yes
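
One way to have a value "at hand" for every line is to precompute a table offline and include it in the ROM. The Python sketch below is only an illustration: the sine shape, the `amplitude` and `period` parameters and the function names are all made up for this example, and the `.db` output assumes a WLA DX style assembler.

```python
import math


def build_hscroll_table(lines=192, amplitude=24, period=96):
    """Return one 8-bit horizontal scroll value per visible scanline.

    The curve is a plain sine wave; negative offsets wrap into 0..255
    because the scroll register holds a single byte.
    """
    table = []
    for line in range(lines):
        offset = round(amplitude * math.sin(2 * math.pi * line / period))
        table.append(offset & 0xFF)
    return table


def as_db_rows(table, per_row=16):
    """Format the table as .db rows for pasting into an assembly source file."""
    rows = []
    for start in range(0, len(table), per_row):
        chunk = ", ".join(f"${value:02X}" for value in table[start:start + per_row])
        rows.append(f".db {chunk}")
    return "\n".join(rows)


scroll_table = build_hscroll_table()
print(as_db_rows(scroll_table))
```

The interrupt handler would then only need to fetch the next byte from the table and send it to the VDP, keeping the per-line cost close to the four-instruction snippet quoted above.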
  • Joined: 14 Aug 2000
  • Posts: 740
  • Location: Adelaide, Australia
Posted: Fri Jun 17, 2022 8:21 am
@sverx were there any instructions between that code snippet and the interrupt vector?
  • Joined: 05 Sep 2013
  • Posts: 3763
  • Location: Stockholm, Sweden
Posted: Fri Jun 17, 2022 8:26 am
@asynchronous sure, I'm using devkitSMS so there are quite a few instructions: https://github.com/sverx/devkitSMS/blob/master/SMSlib/src/SMSlib.c#L375
  • Joined: 14 Aug 2000
  • Posts: 740
  • Location: Adelaide, Australia
Posted: Sat Jun 18, 2022 11:19 am
Ah OK, I thought for a second you were able to update H scroll for the same line and not the next line. Superhuman stuff. My bad.
  • Joined: 05 Sep 2013
  • Posts: 3763
  • Location: Stockholm, Sweden
Posted: Sat Jun 18, 2022 8:20 pm
ahah no no, I meant you can do it in time to have the scroll changed on the next line.
  • Joined: 06 Mar 2022
  • Posts: 598
  • Location: London, UK
Posted: Tue Jun 21, 2022 3:04 pm
So I did some experimentation with Emulicious because I'm curious about the exact timings.

Not sure how representative of real hardware it is (for one thing having the debugger open does affect the results so does raise a question about CPU load maybe).

Experiment 1: Change the palette very soon after line interrupt

di
in a, (VDP_CMD)
xor a
out (VDP_CMD), a
ld a, $c0
out (VDP_CMD), a
ld a, (backgroundColor)
xor $ff
ld (backgroundColor), a
out (VDP_DATA), a
reti


Results in "palette.png".
The palette visually switches after the 150th pixel on the line, which is WAY further into the scanline than I'd expected, but sanity-checking: those instructions total 93 cycles, so at 1.5 pixels per cycle that's around 140 pixels, plus allowing the VDP some time to actually change the palette that would suggest the CPU receives the interrupt very close to the beginning of the line. Just goes to show how precious those cycles are.

UPDATE: I progressively added more NOPs into the interrupt service code until I got to a max of 185 cycles worth of instructions before things started going pear-shaped, not including the reti. Makes pretty good intuitive sense, as if a CPU cycle is worth ~1.5 pixels then that's ~278 pixels worth of scanline time to do some work.
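
For anyone wanting to redo the sums, here is a Python tally of Experiment 1 using standard Z80 T-state counts (no wait states assumed) and the ~1.5 pixels per cycle approximation. Counted this way the instructions up to and including the CRAM write come to 92 cycles, one short of the 93 quoted above, so treat either figure as approximate. The conclusion is the same either way.

```python
# (instruction, T-states) for Experiment 1, up to and including the CRAM data write.
EXPERIMENT_1 = [
    ("di", 4),
    ("in a, (VDP_CMD)", 11),
    ("xor a", 4),
    ("out (VDP_CMD), a", 11),
    ("ld a, $c0", 7),
    ("out (VDP_CMD), a", 11),
    ("ld a, (backgroundColor)", 13),
    ("xor $ff", 7),
    ("ld (backgroundColor), a", 13),
    ("out (VDP_DATA), a", 11),
]

PIXELS_PER_CYCLE = 1.5  # approximation used throughout the thread
PIXELS_PER_LINE = 342   # from the horizontal timing table quoted earlier
MAX_HANDLER_CYCLES = 185  # largest handler that still worked, per the update above

cycles_to_cram_write = sum(t for _, t in EXPERIMENT_1)
pixels_to_cram_write = cycles_to_cram_write * PIXELS_PER_CYCLE
budget_pixels = MAX_HANDLER_CYCLES * PIXELS_PER_CYCLE
leftover_pixels = PIXELS_PER_LINE - budget_pixels

print("cycles until CRAM write:", cycles_to_cram_write)
print("pixels until CRAM write:", pixels_to_cram_write)
print("pixels covered by a 185-cycle handler:", budget_pixels)
print("pixels left for acknowledge, reti and the interrupted code:", leftover_pixels)
```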

Experiment 2: Update the horizontal scroll value based on a variable in RAM very soon after line interrupt

di
in a, (VDP_CMD)
and $80
jr nz, +
ld a, (hScrollValue)
inc a
jr ++
+
xor a
++
out (VDP_CMD), a
ld (hScrollValue), a
ld a, $88
out (VDP_CMD), a
reti


Results in "scroll.png".

This doesn't really prove much, except that the horizontal scroll is indeed latched all the way through the scanline - there are no signs of any discontinuities at any point down the screen.

Also implicit here is the fact that you don't get a horizontal interrupt on scanline 0 until after it has rendered - the first scanline has no scroll because it was reset by the frame interrupt.

Experiment 3: Update the horizontal scroll value with the vcounter value very soon after horizontal interrupt

di
in a, (VDP_CMD)
in a, (VCOUNTER)
out (VDP_CMD), a
ld (hScrollValue), a
ld a, $88
out (VDP_CMD), a
reti


Results in vcounter.png

This illustrates the fact that vcounter tracks the actual scanline, and is correct at least at the point that horizontal scroll is set (we presume somewhere similarly far through the scanline as the palette switched in Experiment 1).

Also evident here (and if you zoom right in and measure the pixels you can confirm) is that vcounter is at 193 when the vertical interrupt takes place: that's the value of the scroll position apparent on scanline 0.
Attachments: palette.png (101.87 KB), scroll.png (120.02 KB), vcounter.png (120.39 KB)

  • Joined: 14 Apr 2013
  • Posts: 623
Posted: Tue Jun 21, 2022 7:49 pm
willbritton wrote
Not sure how representative of real hardware it is (for one thing having the debugger open does affect the results so does raise a question about CPU load maybe).

Which events are affected by having the debugger open? I noticed a bug with the reported dot for CPU events such as interrupts and HALT. But other events all look stable to me.

The 1.5 pixels per CPU cycle comes from the different clock rates. The master clock clocks both the CPU and the VDP but the CPU has a divider of 15 in between and the VDP has a divider of 5 and the VDP outputs a pixel every 2 clocks.
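
The divider arithmetic can be written out as a short Python check. It uses only the three figures given above (CPU divider 15, VDP divider 5, one pixel per 2 VDP clocks); the variable names are just for illustration.

```python
from fractions import Fraction

CPU_DIVIDER = 15          # master clocks per CPU cycle
VDP_DIVIDER = 5           # master clocks per VDP clock
VDP_CLOCKS_PER_PIXEL = 2  # VDP clocks per output pixel

# Master clocks that elapse while the VDP outputs one pixel.
master_clocks_per_pixel = VDP_DIVIDER * VDP_CLOCKS_PER_PIXEL

# Pixels output during one CPU cycle, kept as an exact fraction.
pixels_per_cpu_cycle = Fraction(CPU_DIVIDER, master_clocks_per_pixel)

print(pixels_per_cpu_cycle)         # 3/2
print(float(pixels_per_cpu_cycle))  # 1.5
```

Because both clocks derive from the same master clock, the ratio is exactly 3/2 regardless of the crystal frequency.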
  • Joined: 23 Jan 2010
  • Posts: 417
Posted: Tue Jun 21, 2022 8:49 pm
Quote
The 1.5 pixels per CPU cycle comes from the different clock rates. The master clock clocks both the CPU and the VDP but the CPU has a divider of 15 in between and the VDP has a divider of 5 and the VDP outputs a pixel every 2 clocks

I've been curious about this for many years: what is the clock of the SMS VDP? Is it the same as the CPU's, 3.58 MHz? Does it change between the SMS 1, SMS 2 and GG?
  • Joined: 05 Sep 2013
  • Posts: 3763
  • Location: Stockholm, Sweden
Posted: Wed Jun 22, 2022 8:20 am
I think it's fair to presume that the line IRQ gets fired by the VDP as soon as possible into the new line, as I suspect the check happens as soon as the counter gets incremented. Of course the CPU will service that as soon as possible, which means when the current instruction is complete (of course provided that interrupts are enabled otherwise it can happen much later...)
  • Joined: 05 Sep 2013
  • Posts: 3763
  • Location: Stockholm, Sweden
Posted: Wed Jun 22, 2022 8:22 am
segarule wrote
I've been curious about this for many years: what is the clock of the SMS VDP? Is it the same as the CPU's, 3.58 MHz? Does it change between the SMS 1, SMS 2 and GG?


It's the same clock, different dividers. The VDP gets a clock that's 3 times faster than the CPU, and it takes two clock cycles for each dot. PAL/NTSC devices have a slightly different crystal so the speed is a bit off.
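
To put approximate numbers on this, the Python sketch below derives the CPU, VDP and dot clocks from a master crystal frequency. The two crystal values are commonly quoted nominal figures and are assumptions here, not something stated in this thread. The dividers (15 for the CPU, 5 for the VDP, 2 VDP clocks per dot) are the ones given in the posts above.

```python
# Nominal master crystal frequencies in Hz. These are ASSUMED values,
# not figures taken from this thread; check them against your own hardware.
MASTER_HZ = {
    "NTSC": 53_693_175,
    "PAL": 53_203_425,
}


def derived_clocks(master_hz):
    """Return the CPU, VDP and dot clocks in Hz for a given master clock."""
    cpu = master_hz / 15  # CPU divider
    vdp = master_hz / 5   # VDP divider: three times the CPU clock
    dot = vdp / 2         # one dot every two VDP clocks
    return {"cpu": cpu, "vdp": vdp, "dot": dot}


for region, master in MASTER_HZ.items():
    clocks = derived_clocks(master)
    print(
        f"{region}: CPU {clocks['cpu'] / 1e6:.3f} MHz, "
        f"VDP {clocks['vdp'] / 1e6:.3f} MHz, "
        f"dots {clocks['dot'] / 1e6:.3f} MHz"
    )
```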
  • Joined: 23 Jan 2010
  • Posts: 417
Posted: Wed Jun 22, 2022 10:03 am
Last edited by segarule on Fri Jun 24, 2022 9:17 am; edited 1 time in total
Quote
It's the same clock, different dividers. The VDP gets a clock that's 3 times faster than the CPU, and it takes two clock cycles for each dot. PAL/NTSC devices have a slightly different crystal so the speed is a bit off.

Thanks.
Ah, 1.5 pixels makes total sense to me now (Calindro already explained it). I'm wondering whether pushing the master clock to its limits would affect the CPU and VDP. NESdev lists "PPU dots per CPU cycle = 3". This could explain why NES games "seem" smoother; alternatively, somebody explained on another forum that the 65c02 uses fewer cycles per instruction than the Z80.
  • Joined: 06 Mar 2022
  • Posts: 598
  • Location: London, UK
Posted: Wed Jun 22, 2022 11:28 am
segarule wrote
Ah, 1.5 pixels makes total sense to me now (Calindro already explained it). I'm wondering whether pushing the master clock to its limits would affect the CPU and VDP.


In a standard system (i.e. with the clock divider mentioned) if you increased the master clock I'm pretty sure you'd get out of sync with the display fairly quickly and you'd lose the picture, but in any case the VDP can only receive data relatively slowly so without modifying your code you'd start losing data on the bus between the CPU and the VDP.

(Also it depends on the CPU, the current range of DIP CMOS Z80s I believe can be clocked up to 10MHz, but the one you find in your original SMS may well not be rated more than 4MHz.)

segarule wrote
I've been curious about this for many years: what is the clock of the SMS VDP? Is it the same as the CPU's, 3.58 MHz? Does it change between the SMS 1, SMS 2 and GG?


See here for a bit more detail.

Calindro wrote
Which events are affected by having the debugger open? I noticed a bug with the reported dot for CPU events such as interrupts and HALT. But other events all look stable to me.


Not sure specifically, only that the pattern captured on screen changes with the debugger running, basically it looks like the events (palette change and hscroll) happen somewhat later with the debugger open, so that they occur on the next line instead of the same one.

Happy to help investigate if you want me to get you some more info, just let me know. Least I can do to pay you back for this incredible tool!

sverx wrote
I think it's fair to presume that the line IRQ gets fired by the VDP as soon as possible into the new line, as I suspect the check happens as soon as the counter gets incremented. Of course the CPU will service that as soon as possible, which means when the current instruction is complete (of course provided that interrupts are enabled otherwise it can happen much later...)


Yeah agreed, and I discuss the time taken for the CPU to respond a little further up; the thing I'm still wondering (and grappling with this for a project I will unveil very soon...) is whether the interrupt / counter is incremented on HSYNC or as soon as the right border starts on the previous line. Not that it particularly matters for game dev of course, only for hardware nerds like me!
  • Joined: 14 Apr 2013
  • Posts: 623
Posted: Wed Jun 22, 2022 4:18 pm
willbritton wrote
Not sure specifically, only that the pattern captured on screen changes with the debugger running, basically it looks like the events (palette change and hscroll) happen somewhat later with the debugger open, so that they occur on the next line instead of the same one.

Happy to help investigate if you want me to get you some more info, just let me know. Least I can do to pay you back for this incredible tool!

I don't see how that could happen and I don't seem to be able to reproduce it. I've tried different scenes in different roms and debugger open vs. debugger closed always matched.
I'd appreciate it if you could help me reproduce it.
  • Joined: 23 Jan 2010
  • Posts: 417
Posted: Wed Jun 22, 2022 4:56 pm
Quote
In a standard system (i.e. with the clock divider mentioned) if you increased the master clock I'm pretty sure you'd get out of sync with the display fairly quickly and you'd lose the picture, but in any case the VDP can only receive data relatively slowly so without modifying your code you'd start losing data on the bus between the CPU and the VDP.

(Also it depends on the CPU, the current range of DIP CMOS Z80s I believe can be clocked up to 10MHz, but the one you find in your original SMS may well not be rated more than 4MHz.)

I meant my question without any hardware overclocking. I had in mind code that makes full use of the master clock the whole time.

Quote
See here for a bit more detail.

Cool! Thanks. So I can presume that the TMS in our SMS is clocked at 10.7 MHz, correct?