Author	Message
sverx Joined: 05 Sep 2013 Posts: 3828 Location: Stockholm, Sweden	Z80 cycle count / pre-fetch thing Posted: Fri May 17, 2019 3:56 pm
	I stumbled upon a few documents/webpages that address the Z80 pre-fetch feature and its effect on instruction cycle counting. The point (it seems) is that after a jump the pre-fetch queue is empty, so the first instruction executed might incur a penalty, depending on the instruction itself. For instance, CP (HL) is expected to take 7 cycles but, since it needs one memory access to fetch the opcode byte and a second one to read the byte HL points to, and since each access requires 4 cycles, it would end up taking 8 cycles, one more than expected. An instruction like ADD HL,BC, on the other hand, is a single-byte instruction that needs no other memory access, so it would never take more than the expected 11 cycles. Does that apply to the Master System too?
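
To make the claim above concrete, here is a minimal C sketch of the model those documents seem to describe. It is only the hypothesis under discussion, not confirmed Z80 behaviour: the function name `cycles_after_jump` and the rule "the larger of the documented count and 4 cycles per memory access" are assumptions made for illustration.

```c
#include <assert.h>

/* Cost the hypothesis assigns to every memory access, in cycles. */
#define CYCLES_PER_ACCESS 4u

/*
 * Hypothetical model, NOT confirmed Z80 behaviour: an instruction executed
 * right after a jump takes whichever is larger, its documented cycle count
 * or 4 cycles for each memory access it performs.
 */
unsigned cycles_after_jump(unsigned documented, unsigned accesses)
{
    unsigned by_access;

    assert(accesses >= 1);  /* the opcode fetch is always one access */
    by_access = accesses * CYCLES_PER_ACCESS;
    return by_access > documented ? by_access : documented;
}
```

With this model, `cycles_after_jump(7, 2)` for CP (HL) gives 8, `cycles_after_jump(11, 1)` for ADD HL,BC stays at 11, and `cycles_after_jump(13, 4)` for LD A,(nn) gives 16.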

Maxim Site Admin Joined: 19 Oct 1999 Posts: 14745 Location: London	Posted: Fri May 17, 2019 7:56 pm
	I guess so since it's a standard Z80 with no wait states. It would be interesting to confirm this, though.

sverx Joined: 05 Sep 2013 Posts: 3828 Location: Stockholm, Sweden	Posted: Sun May 19, 2019 10:47 am
	it seems to me I can't simply use the Z80 R register to measure this, right? :/
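
A short sketch of why R is of little help here: its low 7 bits advance once per opcode fetch (M1 cycle), not once per clock cycle, so it counts fetches rather than time. The helper name `r_after_loop` is made up for this illustration, and the model only tracks the 7 counting bits.

```c
#include <assert.h>

/* R counts in its low 7 bits only, so it wraps from 127 back to 0. */
#define R_MASK 0x7Fu

/*
 * Value of R after a loop, given how many M1 (opcode fetch) cycles one
 * iteration performs. Unprefixed instructions have one M1 cycle each;
 * CB/DD/ED/FD-prefixed ones have two.
 */
unsigned r_after_loop(unsigned r_start, unsigned m1_per_iteration,
                      unsigned iterations)
{
    assert(r_start <= R_MASK);
    return (r_start + m1_per_iteration * iterations) & R_MASK;
}
```

A loop of three unprefixed instructions run 204 times from R = 0 ends with `r_after_loop(0, 3, 204)` equal to 100, however many cycles each instruction actually took, so two loops built from the same instructions cannot be told apart this way.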

asynchronous Joined: 14 Aug 2000 Posts: 742 Location: Adelaide, Australia	Z80 cycle count / pre-fetch thing Posted: Sun May 19, 2019 11:35 am
	You could use a loop and count the iterations between line interrupts. The loop below is 23 cycles as advertised (11 for INC (HL) plus 12 for JR), so if you set your line interrupts 10 lines apart, that's 2280 cycles (228 per line), room for 99 iterations. You'll be able to see if there are 99 non-prefetched memory accesses.

	Loop:
	INC (HL)
	JR Loop
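
The arithmetic in this suggestion can be checked with a few lines of C. This is only a sketch: the function names are invented here, and the 24-cycle figure used in the note below assumes the hypothetical one-cycle penalty on INC (HL) (three memory accesses at 4 cycles each), which nobody in the thread has confirmed.

```c
#include <assert.h>

#define CYCLES_PER_LINE 228u   /* Z80 cycles per scanline on the SMS */

/* Z80 cycles available between two line interrupts `lines` scanlines apart. */
unsigned cycles_between_interrupts(unsigned lines)
{
    return lines * CYCLES_PER_LINE;
}

/* Complete loop iterations that fit in a budget of cycles. */
unsigned iterations_in(unsigned budget, unsigned cycles_per_iteration)
{
    assert(cycles_per_iteration > 0);
    return budget / cycles_per_iteration;
}
```

`iterations_in(cycles_between_interrupts(10), 11 + 12)` gives 99, matching the figure in the post. If the loop really cost 24 cycles per pass, the same call with 24 would give 95, a difference large enough to count.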

sverx Joined: 05 Sep 2013 Posts: 3828 Location: Stockholm, Sweden	Posted: Fri Jun 14, 2019 12:20 pm
	So I found a simple way of testing this: two slightly different loops.

	#define HOWMANY 204
	#define SOMERAM 0xd000

	void test_A (void) {
	  __asm
	    ld b,#HOWMANY
	    ld hl,#SOMERAM
	  labelA: dec (hl)
	    ld a,(#SOMERAM)
	    djnz labelA
	  __endasm;
	}

	void test_B (void) {
	  __asm
	    ld b,#HOWMANY
	    ld hl,#SOMERAM
	  labelB: ld a,(#SOMERAM)
	    dec (hl)
	    djnz labelB
	  __endasm;
	}

	As you can see, I'm simply swapping dec (hl) and ld a,(#imm): in case A, dec (hl) is the first instruction after the djnz, while in case B ld a,(#imm) is the first one. In case B it shouldn't take the expected 13 cycles; according to the document I found, since it performs 4 memory accesses, it should take 16 cycles instead.

	So I just test it this way:

	void main (void) {
	  unsigned char i;

	  SMS_autoSetUpTextRenderer();

	  for (i=0;i<24*2;i++) {
	    SMS_waitForVBlank();
	    test_A();
	    printf(" A=0x%2x ",SMS_getVCount());

	    SMS_waitForVBlank();
	    test_B();
	    printf(" B=0x%2x ",SMS_getVCount());
	  }
	}

	And of course, as expected, both MEKA and Emulicious display the same value (I'm reading the VDP's vcounter here) for both tests.

	What I did not expect was that my SMS II also displays the same values.

	So either there's something wrong on my side or the Z80 in my SMS doesn't behave as expected.

	I'm at a loss. What do you think? :|
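
For a rough idea of how big the difference should have been, the loop totals can be estimated in plain C. This is a sketch under stated assumptions: it uses the documented counts (11 for dec (hl), 13 for ld a,(nn), 13/8 for djnz taken/not taken), ignores the two setup instructions and the call overhead, and charges the hypothetical penalty only to the instruction that follows a taken djnz. The penalty of 1 cycle for dec (hl) (three accesses at 4 cycles) is my reading of the hypothesis, not something stated in the post.

```c
#include <assert.h>

#define CYCLES_PER_LINE 228u   /* Z80 cycles per scanline on the SMS */

/* Documented cycle counts for the instructions in the two loops. */
enum { DEC_HL = 11, LD_A_NN = 13, DJNZ_TAKEN = 13, DJNZ_NOT_TAKEN = 8 };

/*
 * Cycles for the whole loop. `penalty` is the hypothetical extra cost of
 * the first instruction after each taken djnz (0 = documented timing).
 */
unsigned loop_cycles(unsigned iterations, unsigned penalty)
{
    unsigned body = DEC_HL + LD_A_NN;   /* same total in either order */

    assert(iterations >= 1);
    return iterations * body
         + (iterations - 1) * (DJNZ_TAKEN + penalty)
         + DJNZ_NOT_TAKEN;
}

/* Whole scanlines elapsed after a number of cycles. */
unsigned whole_lines(unsigned cycles)
{
    return cycles / CYCLES_PER_LINE;
}
```

With documented timing both loops come to `loop_cycles(204, 0)`, i.e. 7543 cycles. Under the hypothesis, test_A would be `loop_cycles(204, 1)` = 7746 and test_B `loop_cycles(204, 3)` = 8152, a gap of 406 cycles or nearly two scanlines, which the vcounter readings should have revealed.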

Maxim Site Admin Joined: 19 Oct 1999 Posts: 14745 Location: London	Posted: Fri Jun 14, 2019 2:57 pm
	I think the Z80 only "prefetches" within instructions - which is why a two-byte instruction can take less than 8 cycles, but a 1-byte instruction can't take less than 4.

sverx Joined: 05 Sep 2013 Posts: 3828 Location: Stockholm, Sweden	Posted: Fri Jun 14, 2019 3:48 pm
	From what I read, each fetch of an opcode's first byte is followed by a DRAM refresh, so that's why 4 is the minimum. BTW I'm starting to believe I've just read a bunch of BS, as I couldn't get any different timing no matter what tests I ran (and I've run some 10...)
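
The last two posts fit the way the Zilog manual breaks instructions into machine cycles: the opcode fetch (M1, which includes the refresh) takes at least 4 T-states, while plain operand and data accesses take 3. The sketch below just adds up those per-cycle figures, as I read them from the Z80 CPU User Manual, to show that the documented totals need no extra penalty; the array and function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Sum the T-states of an instruction's machine cycles. */
unsigned t_states(const unsigned *m_cycles, size_t count)
{
    unsigned total = 0;
    size_t i;

    /* Every instruction starts with an M1 fetch of at least 4 T-states. */
    assert(count >= 1 && m_cycles[0] >= 4);
    for (i = 0; i < count; i++)
        total += m_cycles[i];
    return total;
}

/* Machine-cycle lengths in T-states. */
const unsigned CP_HL[]      = { 4, 3 };        /* fetch, read (HL)             */
const unsigned LD_A_NN[]    = { 4, 3, 3, 3 };  /* fetch, nn low, nn high, read */
const unsigned DEC_HL[]     = { 4, 4, 3 };     /* fetch, read, write           */
const unsigned ADD_HL_BC[]  = { 4, 4, 3 };     /* fetch, two internal cycles   */
const unsigned DJNZ_TAKEN[] = { 5, 3, 5 };     /* fetch+dec, offset, internal  */
```

For example, `t_states(CP_HL, 2)` is 7 and `t_states(LD_A_NN, 4)` is 13: a second access adds 3 T-states rather than 4, which is why the totals are not multiples of 4.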
