- Joined: 05 Sep 2013
- Posts: 3828
- Location: Stockholm, Sweden
|
Z80 cycle count / pre-fetch thing
Posted: Fri May 17, 2019 3:56 pm
|
I stumbled upon a few documents/webpages that address the Z80 pre-fetch feature and its effect on instruction cycle counting.
The point here (it seems) is that after a jump the pre-fetch queue is empty, so the first instruction might incur a penalty, depending on the instruction itself.
For instance, CP (HL) is documented as 7 cycles but, since it needs one memory access to fetch the opcode byte and a second to read the byte pointed to by HL, and since each access supposedly requires 4 cycles, it would end up taking one additional cycle.
An instruction like ADD HL,BC, on the other hand, is a single-byte instruction that needs no other memory access, so it would never take more than the expected 11 cycles.
Does that apply to Master System too?
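To make the claim concrete, here is a minimal C sketch of the rule as I understand it from those documents. It only models the hypothesis and is not confirmed Z80 behaviour; the function name cycles_after_jump and the "4 cycles per memory access" rule are assumptions taken from the description above.

```c
/* Hypothetical model, not verified on hardware: the first instruction
   after a jump costs at least 4 cycles per memory access it performs,
   and never less than its documented cycle count. */
static int cycles_after_jump(int documented_cycles, int memory_accesses)
{
    int refill_cost = 4 * memory_accesses;
    return refill_cost > documented_cycles ? refill_cost : documented_cycles;
}
```

With this model CP (HL) (7 documented, 2 accesses) would cost 8 after a jump, while ADD HL,BC (11 documented, 1 access) would stay at 11.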
|
- Site Admin
- Joined: 19 Oct 1999
- Posts: 14745
- Location: London
|
Posted: Fri May 17, 2019 7:56 pm
|
I guess so since it's a standard Z80 with no wait states. It would be interesting to confirm this, though.
|
- Joined: 05 Sep 2013
- Posts: 3828
- Location: Stockholm, Sweden
|
Posted: Sun May 19, 2019 10:47 am
|
It seems to me I can't simply use the Z80 R register to measure this, right? :/
|
- Joined: 14 Aug 2000
- Posts: 742
- Location: Adelaide, Australia
|
Posted: Sun May 19, 2019 11:35 am
|
You could use a loop and count the iterations between line interrupts.
The loop below is advertised as 23 cycles (11 for INC (HL), 12 for JR), so if you set your line interrupts 10 lines apart, that's 2280 cycles, room for 99 iterations. You'll be able to see whether there are 99 non-prefetched memory accesses.
Loop:
INC (HL)
JR Loop
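The arithmetic behind that estimate can be sketched in C. This assumes 228 CPU cycles per scanline (which is what 2280 cycles over 10 lines implies); iterations_in_lines is just an illustrative name.

```c
#define CYCLES_PER_LINE 228   /* assumption: 2280 cycles / 10 lines */

/* Number of complete loop passes that fit between two line interrupts
   set `lines` scanlines apart. */
static int iterations_in_lines(int lines, int cycles_per_iteration)
{
    return (lines * CYCLES_PER_LINE) / cycles_per_iteration;
}
```

iterations_in_lines(10, 23) gives 99. If every pass cost one extra cycle (24 instead of 23) it would give 95, a difference that is easy to spot.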
|
- Joined: 05 Sep 2013
- Posts: 3828
- Location: Stockholm, Sweden
|
Posted: Fri Jun 14, 2019 12:20 pm
|
So I found a simple way of testing this: two slightly different loops.
#define HOWMANY 204
#define SOMERAM 0xd000
void test_A (void) {
__asm
ld b,#HOWMANY
ld hl,#SOMERAM
labelA: dec (hl)
ld a,(#SOMERAM)
djnz labelA
__endasm;
}
void test_B (void) {
__asm
ld b,#HOWMANY
ld hl,#SOMERAM
labelB: ld a,(#SOMERAM)
dec (hl)
djnz labelB
__endasm;
}
As you can see, I'm simply swapping dec (hl) and ld a,(#imm). In case A, dec (hl) is the first instruction after the djnz; in case B, ld a,(#imm) is the first instruction instead. In case B it shouldn't take the expected 13 cycles: according to the document I found, and given that it performs 4 memory accesses, it should take 16 cycles instead.
So I just test that this way:
void main (void) {
unsigned char i;
SMS_autoSetUpTextRenderer();
for (i=0;i<24*2;i++) {
SMS_waitForVBlank();
test_A();
printf(" A=0x%2x ",SMS_getVCount());
SMS_waitForVBlank();
test_B();
printf(" B=0x%2x ",SMS_getVCount());
}
}
And of course, as expected, both MEKA and Emulicious display the same value for both tests (I'm reading the VDP's vcounter here).
What I did not expect was that my SMS II also displays the same values.
So either something is wrong on my side or the Z80 I've got in my SMS doesn't behave as expected.
I'm at a loss. What do you think? :|
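For reference, here is roughly how big a difference I'd expect between the two tests if the penalty were real. It's a rough C sketch: the instruction timings are the documented ones (ld b,n 7, ld hl,nn 10, dec (hl) 11, ld a,(nn) 13, djnz 13 taken / 8 not taken), while the per-iteration penalties (1 cycle for dec (hl), 3 for ld a,(nn)) follow the "4 cycles per memory access" rule and are assumptions, as is the function name.

```c
/* Cycles spent in one of the test loops, from "ld b" up to the last djnz.
   body_cycles: documented cycles of the two instructions before djnz.
   penalty: hypothetical extra cycles for the first instruction after a
   taken djnz (an assumption, not confirmed behaviour). */
static long loop_cycles(int iterations, int body_cycles, int penalty)
{
    long taken_jumps = iterations - 1;   /* the last djnz falls through */
    long setup = 7 + 10;                 /* ld b,n + ld hl,nn */
    return setup
         + (long)iterations * body_cycles
         + taken_jumps * (13 + penalty)  /* taken djnz + hypothetical refill */
         + 8;                            /* final, not-taken djnz */
}
```

With HOWMANY = 204 and a 24-cycle body, loop_cycles(204, 24, 3) is 406 cycles more than loop_cycles(204, 24, 1). At roughly 228 cycles per scanline that is more than one line, so test_B should show a visibly different vcounter value than test_A if the penalty existed.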
|
- Site Admin
- Joined: 19 Oct 1999
- Posts: 14745
- Location: London
|
Posted: Fri Jun 14, 2019 2:57 pm
|
I think the Z80 only "prefetches" *within* instructions - which is why a two-byte instruction can take less than 8 cycles, but a 1-byte instruction can't take less than 4.
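One way to see this is to add up the machine cycles of each instruction. In the C sketch below the per-instruction breakdowns are the usual documented Z80 timings as I remember them (4 T-states for an opcode fetch, 3 for a plain memory read), so double-check them against the Zilog manual; the array and function names are made up for illustration.

```c
/* T-states of each machine cycle, first entry is the opcode fetch (M1). */
static const int CP_HL[]     = {4, 3};       /* fetch, read (HL) */
static const int LD_A_NN[]   = {4, 3, 3, 3}; /* fetch, addr low, addr high, read */
static const int DEC_HL[]    = {4, 4, 3};    /* fetch, read (HL), write (HL) */
static const int ADD_HL_BC[] = {4, 4, 3};    /* fetch, two internal cycles */

/* Total T-states of an instruction given its machine-cycle breakdown. */
static int sum_t_states(const int *m_cycles, int count)
{
    int total = 0;
    for (int i = 0; i < count; i++)
        total += m_cycles[i];
    return total;
}
```

The totals come out at the documented 7, 13, 11 and 11 cycles with no extra penalty, because a plain operand read takes only 3 T-states rather than 4.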
|
- Joined: 05 Sep 2013
- Posts: 3828
- Location: Stockholm, Sweden
|
Posted: Fri Jun 14, 2019 3:48 pm
|
From what I read, each fetch of an opcode's first byte is followed by a DRAM refresh, so that's why 4 is the minimum.
BTW I'm starting to believe I've just read a bunch of BS, as I couldn't get any different timing no matter which tests I ran (and I've run some 10...)
|