View topic - Multiple Instruction Paths for Pentium CPUs?

Chris
  • Guest
Multiple Instruction Paths for Pentium CPUs?
Posted: Tue Nov 16, 1999 2:10 am
I was reading some bizarre article on the Pentium family of processors and it said that the Pentium,
even though it is a single processor, contains dual (u and v) pipelines for instructions. Then
it went into further detail, explaining the newer Pentium IIs and how they contain an updated
quad (u, v, y, z) set of pipelines for handling instructions. It was saying that these pipelines could
simultaneously handle instructions from different sources of memory, including files
and blocks of allocated memory. So you could tell u to follow all of the instructions from the
0x0000 portion of memory, tell v to handle instructions found in the 0x6000 block of memory,
and the two pipelines would each execute the instructions found in their own region of
memory, independently.

So, I got to brainstorming. I figured if these newer Pentiums harness this type of processing
power, why aren't some emulators built around these hidden pipelines? This could prove to
be most useful for arcade machines or even the Sega Genesis because these systems use more
than one processor. So, if you could get these pipelines to independently emulate the different
regions of the board (main CPU, sound CPU, etc.) it should greatly improve the speed of today's
emulators, right? Instead of having the main CPU handling and emulating all aspects of
the more advanced systems (which makes interpreted emulation slow), you could have these
pipelines each emulate the abilities of a single chip and chain them to communicate together
but perform individual tasks. Imagine the System 16 architecture. It uses one 68000
processor and one Z80 processor for its sound. Now if you had the y pipeline emulating and
performing the tasks of the 68000 and you had the z pipeline emulating the Z80 and sending
the data to your sound card or emulated Yamaha sound chip, and they both depended
on each other (like the real system), wouldn't this greatly improve speed?

Chris :o)
 
  • Joined: 28 Sep 1999
  • Posts: 1197
Posted: Tue Nov 16, 1999 2:50 am

Quote
> and blocks of allocated memory. So you could tell u to follow all of the instructions from
> 0x0000 portion of memory, tell v to handle instructions found in the 0x6000 block of memory,
> and the two pipelines will directly deal and execute with each of the instructions found in
> memory; independently.

This is really hard to explain, but the basic concept is that
instead of getting one instruction and doing the appropriate
actions, the CPU grabs two and executes them in parallel.
These instructions cannot come from different areas, only
from one region. (i.e., in a group of three instructions, the
u pipe would do the first, the v pipe would do the second,
and back to the u pipe for the third). Needless to say there
are many rules about what instructions can be executed at
the same time, especially if an instruction relies on the
results of a previous one.
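
To make the pairing idea concrete, here is a toy sketch. It is not the real Pentium rule set, which is much longer: the only rule modelled is that the second instruction of a pair must not read or overwrite the register the first one writes. The tuple format and the names `can_pair` and `issue_cycles` are made up for illustration.

```python
# Toy model of in-order dual issue from a single instruction stream.
# Each instruction is (text, destination register, source registers).
# Only one pairing rule is modelled; the real chip has many more.

def can_pair(first, second):
    _, dst1, _ = first
    _, dst2, srcs2 = second
    reads_result = dst1 in srcs2      # second needs first's result
    same_target = dst1 == dst2        # both write the same register
    return not (reads_result or same_target)

def issue_cycles(program):
    """Return a list of issue groups; each group occupies one cycle."""
    groups = []
    i = 0
    while i < len(program):
        if i + 1 < len(program) and can_pair(program[i], program[i + 1]):
            groups.append([program[i][0], program[i + 1][0]])  # U pipe + V pipe
            i += 2
        else:
            groups.append([program[i][0]])                     # U pipe only
            i += 1
    return groups

program = [
    ("mov eax, 1",   "eax", ()),
    ("mov ebx, 2",   "ebx", ()),              # independent of the mov above
    ("add eax, ebx", "eax", ("eax", "ebx")),
    ("mov ecx, eax", "ecx", ("eax",)),        # needs the add's result
]

schedule = issue_cycles(program)
for cycle, group in enumerate(schedule, start=1):
    print(cycle, group)
```

Note that both pipes are fed from the same stream, in order. Nothing here lets one pipe run code from a different address.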

Your idea about emulating different stuff at the same time
is perfectly valid on multi-CPU systems, i.e. a PC that
actually has more than one processor.

If you want some nitty-gritty info on pipelines, check
some of these links out:

http://www.x86.org
http://developer.intel.com
http://www.sandpile.org
Nyef
  • Guest
Posted: Tue Nov 16, 1999 3:07 am
Quote
> So, I got to brainstorming. I figured if these newer pentiums harnessed this type of processing
> power, why aren't some emulators built around these hidden pipelines? This could prove to

Some are, just not the way you envision. And some of the fast ones just rely on the compiler to
take advantage of them.

There's one NES emulator that managed to get (in an obviously unfair test) 992 FPS on a Celeron
433. That works out to something on the order of 16 x86 cycles to a 6502 cycle. And that's not
counting graphics. Actual games on that setup run at more like 600 FPS or so, but still. You
simply can't do that without that kind of pipelining.
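
A rough check of that figure, as a sketch. The NES numbers (an NTSC 6502 clock of about 1.79 MHz and a nominal 60 frames per second) are assumptions that are not stated in the post, and the Celeron is taken as exactly 433 MHz.

```python
HOST_HZ = 433_000_000      # Celeron 433, taken at face value
NES_CPU_HZ = 1_789_773     # NTSC 6502 clock (assumed)
NES_FPS = 60               # nominal NTSC frame rate (assumed)
EMULATED_FPS = 992         # figure quoted in the post

cycles_6502_per_frame = NES_CPU_HZ / NES_FPS
host_cycles_per_frame = HOST_HZ / EMULATED_FPS
ratio = host_cycles_per_frame / cycles_6502_per_frame
print(f"{ratio:.1f} x86 cycles per 6502 cycle")
```

With these inputs the ratio comes out a little under 15, the same ballpark as the "order of 16" quoted above.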

Of course, this simply pisses me off, since my emulator can't even get 60 FPS on my p133 (except
for GG games). Hopefully this will change soon.

Quote
> be most useful for arcade machines or even the Sega Genesis because these systems use more
> than one processor. So, if you could get these pipelines to independently emulate the different
> regions of the board (main cpu, sound cpu, etc) it should greatly improve the speed of today's
> emulators, right? Instead of having the main CPU handling and emulating all aspects of

Too bad the Pentium doesn't work that way, right? And the synchronization problems would be a bitch
and a half even if it did.

How about just switching to dynarec? Even for the 8-bit systems that can give a nice speed boost.

Quote
> Chris :o)

--Nyef
 
Eric
  • Guest
Posted: Tue Nov 16, 1999 4:30 pm
Your thoughts have already been responded to (by Nyef and Charles Mac Donald) but I thought I'd add my two cents, and fill in some other information:

Quote
> I was reading some bizzare article on the pentium family of processors and it said that the pentium,
> even though it was a single processor, contained dual or u and v pipelines for instructions.

This is correct.

Quote
> Then,
> it went into further detail of explaining the newer Pentium IIs and how they contain an updated
> quad or u,v,y,z pipelines for handleing instructions.

This is almost correct. The Pentium Pro, Pentium II, Celeron, and Pentium III are all out-of-order machines. This basically means that the processor fetches a group of instructions and executes them in any order (subject to data dependency limitations) as soon as the resources each one needs (such as the floating-point unit) are free. There are five execution paths on the Pentium Pro and Pentium II (I don't know about the Pentium III). These paths are not pipelines in the traditional sense. It's better to think of these processors as having a single pipeline that splits five ways during the execution stage. It should also be noted that these five paths cannot each execute any instruction: four of the five are special-purpose paths, and can only be used if there happens to be the right mix of instructions in the buffer.

Quote
>It was saying that these pipelines could
> simultaneously handle instructions from within different sources of memory; including files
> and blocks of allocated memory. So you could tell u to follow all of the instructions from
> 0x0000 portion of memory, tell v to handle instructions found in the 0x6000 block of memory,
> and the two pipelines will directly deal and execute with each of the instructions found in
> memory; independently.

Unfortunately, this is not true. All of these processors fetch instructions from a single location in memory, examine them in groups, and execute as many as possible (either with dual-pipes, like the Pentium, or with advanced data-flow analysis, and out-of-order execution, like the Pentium Pros, etc.)

Quote
> So, I got to brainstorming. I figured if these newer pentiums harnessed this type of processing
> power, why aren't some emulators built around these hidden pipelines? This could prove to
> be most useful for arcade machines or even the Sega Genesis because these systems use more
> than one processor. So, if you could get these pipelines to independently emulate the different
> regions of the board (main cpu, sound cpu, etc) it should greatly improve the speed of today's
> emulators, right? Instead of having the main CPU handling and emulating all aspects of
> the more advanced systems (which makes interpreted emulation slow), you could have these
> pipelines emulate the abilities of a single chip and chain them to communicate together
> but perform individual tasks. Imagine the System 16 archetecture. It uses one 68000
> processor and one Z80 processor for it's sound. Now if you had the y pipeline emulating and
> performing the tasks of the 68000 and you had the z pipeline emulating the Z80 and sending
> the data to your sound card or emulated Yamaha sound chip, and they were both co-dependent
> of each other (like the real system), wouldn't this greatly improve speed?

As Charles Mac Donald said, this idea is only feasible on multiprocessor systems.

So, then how do you write an emulator to get the most speed out of these processors? Well, unfortunately, you have to decide which processor you want to target.

For the Pentium, the trick is finding pairs of instructions which can be executed together (one in the U pipe, and one in the V pipe.) There are all kinds of resources available on this subject.

For the P6 family (Pentium Pro, Pentium II, Celeron, Pentium III) the trick, again, is to group instructions so that all five execution paths stay busy as much as possible. (This is extremely difficult without an intimate knowledge of the micro-architecture).

There are many other issues involved with optimizing programs for these processors. If you're curious, see the following web-site: http://developer.intel.com/vtune/cbts/refman.htm.

Remember, though, that your choice of algorithm will ALWAYS have a greater effect on your program's performance than finely tuning how you group instructions.

Eric
 
Nyef
  • Guest
Posted: Tue Nov 16, 1999 6:58 pm
Quote
> Remember, though, that your choice of algorithm will ALWAYS have a greater affect your
> program's performance than finely tuning how you group instructions.

This, combined with using a p133 (non-MMX, thank you) system as my main dev platform and
a portability requirement, is why I have been reducing the amount of ASM actually used in
DarcNES. Which reminds me, I should probably remove the ASM CPU cores from the main
distribution, they aren't used anymore. :-)

--Nyef
 
Chris
  • Guest
Dynarec
Posted: Wed Nov 17, 1999 3:40 am
I'm seriously considering getting involved with this whole dynamic recompilation method of
emulating systems, because the interpretation method stinks: it takes way too much
out of any PC. It makes sense too.

Envision a guy named Mr. Speedy who is 10 times faster at doing anything than a human. He
can run faster than your eyes can follow, he can speak faster than your ears can respond, and
he is extremely intelligent. Mr. Speedy, one day, is watching an old movie that's showing a
band playing a song live. The band is composed of one singer, one dancer, one drummer,
and one pianist; a total of 4 band members. They're all independently performing their own
tasks, such as the pianist reading his notes from his sheet music and playing them and the
dancer shaking his groove thang to the beautiful tones of the singer, yet they are all controlled
and maintained by the rhythmic beats of the drummer. Mr. Speedy, who is (keep in mind)
faster than the average human, thinks that he's better than the band just because he's faster.
So he decides to compete at a local talent show as a "one man band" and he's going to
perform the live song done by the band on the video tape.

So, the night of the talent show, Mr. Speedy prepares himself for the show by watching the
video again, just before they call him up on stage. They announce his name and he steps
up onto the stage. All of the instruments needed to play the song are spread out around
him, leaving Mr. Speedy in the center. So, he dashes over to the piano, quickly reads some
notes, figures them out and plays 1 chord on the piano, then he dashes over to the drum and
plays one bass kick, then he hauls ass over to the position of the dancer and poses (just enough
to let the audience see him), and finally he runs to the position of the singer, sings a note
or two, and poses again (just enough to let the audience see him). Remember that Mr. Speedy
is 10 times faster than the average human, so if he were to mimic and physically perform all
of the actions of the band at his own pace it would be one big blur and the music would sound
like one big mess. So little by little, he has to perform the tasks of the band, while at the same
time pausing for a few milliseconds at certain points to allow the audience to observe and listen
to what is going on. So, at this pace, the audience sees the illusion (animation) of 4 Mr.
Speedys playing the song as a live band. Most of the people watching this show are shocked
and entertained as this one person performs the actions of the live band. But some members
of the audience (the gurus and fans of the band) are not happy with Mr. Speedy's routine. They
complain that his dance just isn't right, and the notes that he plays on the piano are just a little
off, and the drumming isn't up to par, and so on.

Interpretation just stinks when it comes to emulation sometimes. And I know what you're thinking,
"Well, re-compilation isn't going to be as accurate either." True, you are right. But at least
this method will solve the speed issue. It's a shame that a system such as the Playstation,
which runs at 60 MHz (I think), cannot run at full frameskip on my Celeron (Pentium II/MMX)
366 MHz.

Chris :o|
 
  • Joined: 12 Jul 1999
  • Posts: 891
This is something I've pondered
Posted: Wed Nov 17, 1999 8:45 am
Quote
> I'm seriously considering getting involved with this whole dynamic recompilation method of
> emulating systems because the interpretation method stinks because it takes way too much
> out of any PC. It makes sense too.

I can see your point.
I've often thought that an 'emulator' should be called an 'interpreter' as basically that's what they do, isn't it? They take the instructions from Z80 or 68000 or whatever and 'interpret' them for your x86-based machine.
I've often wanted to know if it was possible for somebody to make a program that takes the ROM, translates all of the instructions into their x86 equivalents and then compiles an x86 executable file from the resulting data.
This may be the long way about it, but I think the result would be faster in the long run. Maybe even a Genesis game could then be run at full speed on a 486.
Is this even feasible?
If so, then it could even be done in Visual Basic, methinks.
Just pondering,
~unfnknblvbl
Nyef
  • Guest
Re: Dynarec
Posted: Wed Nov 17, 1999 1:30 pm
Quote
> I'm seriously considering getting involved with this whole dynamic recompilation method of
> emulating systems because the interpretation method stinks because it takes way too much
> out of any PC. It makes sense too.

If you haven't written an emulator using an interpretive core, you have no business messing with
dynarec. Especially you, Chris. I've yet to see evidence that you even know what's involved in
writing an interpretive core, let alone a dynarec one.

I want emulation speed, too. But only to a point. And right now writing a dynarec core is the last
thing I would consider for speed (I have several things I can do to my graphics renderers to speed
things up, there are several optimizations I can do to the interpretive cores I'm using, there's a couple
spots where the compiler I'm using is generating code that is less than optimal (movzx on a p133 when
the top 3 bytes are going to be 0 anyway _sucks_)).

Quote
> Interpretation just stinks when it comes to emulation sometimes. And I know what you're thinking,
> "Well, re-compilation isn't going to be as accurate either.". True, you are right.

No, I'm thinking "Chris is talking out his ass again". Dynarec is an impressive technique, but it's not
exactly worthwhile when dealing with single processor 8-bit systems like we are here. And it's not
simple in the least.

If one can get 60 FPS on a p133 or p120 using an interpretive core, then what point dynarec?

Now, if you _can't_ get 60 FPS on a p133 (this takes a "16-bit" system, 8-bit systems can hit 60
using mainly C code and only a smattering of ASM (for when the compiler is being stupid)), then
using dynarec is more reasonable.

All this assumes that you have a p133 or p120 to test with. Optimizing for more than 60 FPS on
your target platform (which you had damned well better have on your desk) is a waste of time.

Quote
> But at least
> this method will slove the speed issue. It's a shame that a system such as the Playstation
> which runs at 60 Mhz (I think) cannot run at full frameskip on my Celeron (PentiumII/MMX)
> 366Mhz.

With a dynarec core, no less (all PSX emulators that I know of use dynarec). The CPU core isn't
everything (and to think that the r3000 is actually a pretty simple CPU that takes to dynarec quite
nicely, too).

So dynarec doesn't automatically solve the "speed issue". And I believe the PSX uses a 25 MHz
clock.

And wouldn't "full frameskip" mean that none of the frames are displayed? :-)

Quote
> Chris :o|

--Nyef
 
  • Site Admin
  • Joined: 25 Oct 1999
  • Posts: 2029
  • Location: Monterey, California
Re: Dynarec
Posted: Wed Nov 17, 1999 3:53 pm
Quote
> > I'm seriously considering getting involved with this whole dynamic recompilation method of
> > emulating systems because the interpretation method stinks because it takes way too much
> > out of any PC. It makes sense too.

> If you haven't written an emulator using an interpretive core, you have no buisness messing with
> dynarec. Especially you, Chris. I've yet to see evidence that you even know what's involved in
> writing an interpretive core, let alone a dynarec one.

I am a little concerned that Chris is biting off more than he can chew. Although writing a z80 interpreter (emulator) isn't unfathomably difficult (the ideas behind it aren't too hard to grasp, getting all the flag behavior and such down takes some time however), a prospective author should at least be fluent in their language of choice (C..) as well as the target processor (has he written much z80 code?)
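
For anyone wondering what the basic idea looks like, here is a minimal sketch of a fetch-decode-execute loop. The opcode values are standard Z80 encodings, but the class name `TinyZ80` is made up, only seven opcodes are handled, and only the S and Z flags are modelled. Carry, half-carry, parity/overflow, N and the undocumented bits are all left out, and those are exactly the parts that take the time.

```python
# Minimal interpretive core for a handful of Z80 opcodes.
# Simplification: only the sign (S) and zero (Z) flags are updated.

FLAG_Z = 0x40
FLAG_S = 0x80

class TinyZ80:
    def __init__(self, program):
        self.mem = bytearray(0x10000)        # flat 64 KB address space
        self.mem[:len(program)] = program
        self.pc = 0
        self.a = 0
        self.b = 0
        self.f = 0
        self.halted = False

    def fetch(self):
        value = self.mem[self.pc]
        self.pc = (self.pc + 1) & 0xFFFF
        return value

    def set_sz(self, value):
        self.f &= ~(FLAG_S | FLAG_Z) & 0xFF  # clear S and Z, keep the rest
        if value == 0:
            self.f |= FLAG_Z
        if value & 0x80:
            self.f |= FLAG_S

    def step(self):
        op = self.fetch()
        if op == 0x00:                       # NOP
            pass
        elif op == 0x3E:                     # LD A,n
            self.a = self.fetch()
        elif op == 0x06:                     # LD B,n
            self.b = self.fetch()
        elif op == 0x80:                     # ADD A,B
            self.a = (self.a + self.b) & 0xFF
            self.set_sz(self.a)
        elif op == 0x3C:                     # INC A
            self.a = (self.a + 1) & 0xFF
            self.set_sz(self.a)
        elif op == 0xC3:                     # JP nn (low byte first)
            lo = self.fetch()
            hi = self.fetch()
            self.pc = (hi << 8) | lo
        elif op == 0x76:                     # HALT
            self.halted = True
        else:
            raise NotImplementedError(f"opcode {op:#04x}")

    def run(self, max_steps=1000):
        steps = 0
        while not self.halted and steps < max_steps:
            self.step()
            steps += 1

# LD A,0xFE ; LD B,0x01 ; ADD A,B ; INC A ; HALT
cpu = TinyZ80(bytes([0x3E, 0xFE, 0x06, 0x01, 0x80, 0x3C, 0x76]))
cpu.run()
print(hex(cpu.a), hex(cpu.f))
```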

Quote
> I want emulation speed, too. But only to a point. And right now writing a dynarec core is the last
> thing I would consider for speed (I have several things I can do to my graphics renderers to speed
> things up, there are several optimizations I can do to the interpretive cores I'm using, there's a couple
> spots where the compiler I'm using is generating code that is less than optimal (movzx on a p133 when
> the top 3 bytes are going to be 0 anyway _sucks_)).

> > Interpretation just stinks when it comes to emulation sometimes. And I know what you're thinking,
> > "Well, re-compilation isn't going to be as accurate either.". True, you are right.

Um.. why wouldn't it be accurate?
Assuming there were no bugs in the recompiler (of course)

Quote
> Dynarec is an impressive technique, but it's not
> exactly worthwhile when dealing with single processor 8-bit systems like we are here. And it's not
> simple in the least.

I could see dynamic recompilation being handy for writing emulators for more limited platforms (sms emulators for, say... Digita Cameras, Playstations (does playstation mastergear get 60 fps?)).

Quote
> If one can get 60 FPS on a p133 or p120 using an interpretive core, then what point dynarec?

> Now, if you _can't_ get 60 FPS on a p133 (this takes a "16-bit" system, 8-bit systems can hit 60
> using mainly C code and only a smattering of ASM (for when the compiler is being stupid)), then
> using dynarec is more reasonable.

> All this assumes that you have a p133 or p120 to test with. Optimizing for more than 60 FPS on
> your target platform (which you had damned well better have on your desk) is a waste of time.

I'd have to agree. If an SMS emulator doesn't run full speed on any computer sold as 'new' in the last four years, the bottleneck is probably somewhere other than the z80 core, unless it's terribly poorly written.

Quote
> > But at least
> > this method will slove the speed issue. It's a shame that a system such as the Playstation
> > which runs at 60 Mhz (I think) cannot run at full frameskip on my Celeron (PentiumII/MMX)
> > 366Mhz.

Why should it? The Playstation has quite a few tricks up its old, tattered sleeve, after all: a separate coprocessor for doing geometry transforms and lighting effects (we PC users won't have anything like that until the GeForce256 cards are out), other chips which can move data, draw sprites, rasterize texture-mapped triangles, a fairly complicated sound system (24 channels of compressed audio with DSP effects).. you can see that there's a lot for your poor Celeron to deal with besides the little ol' r3000.

Quote
> With a dynarec core, no less (all PSX emulators that I know of use dynarec). The CPU core isn't
> everything (and to think that the r3000 is actually a pretty simple CPU that takes to dynarec quite
> nicely, too).

Very nicely. No status flags to deal with, just an array of 32 longs for the CPU regs (and one is always zero), a very small instruction set, and that branch-delay weirdness ( I guess one need only stick the branch after the delayed instruction in the recompiled core.)
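
A small interpreter-style sketch of those two quirks: register 0 always reading as zero, and the delay-slot instruction running before a taken branch takes effect. The tuple instruction format and the names `Regs` and `run` are invented for this example; it is not real MIPS decoding.

```python
# Sketch of an R3000-style register file and branch delay slot.
# Instructions are tuples in a made-up format:
#   ("addiu", rt, rs, imm)   rt = rs + imm
#   ("beq", rs, rt, target)  branch to instruction index `target` if equal
#   ("halt",)

class Regs:
    def __init__(self):
        self.r = [0] * 32

    def read(self, n):
        return self.r[n]

    def write(self, n, value):
        if n != 0:                       # writes to register 0 are discarded
            self.r[n] = value & 0xFFFFFFFF

def run(program, regs, max_steps=100):
    pc = 0
    pending = None                       # taken-branch target awaiting its delay slot
    for _ in range(max_steps):
        ins = program[pc]
        next_pc = pc + 1
        if pending is not None:          # this instruction sits in the delay slot
            next_pc = pending
            pending = None
        if ins[0] == "addiu":
            _, rt, rs, imm = ins
            regs.write(rt, regs.read(rs) + imm)
        elif ins[0] == "beq":
            _, rs, rt, target = ins
            if regs.read(rs) == regs.read(rt):
                pending = target         # redirect only after the next instruction
        elif ins[0] == "halt":
            return
        pc = next_pc

program = [
    ("addiu", 1, 0, 5),    # 0: r1 = 5
    ("beq", 0, 0, 4),      # 1: always taken
    ("addiu", 2, 0, 7),    # 2: delay slot, still executes
    ("addiu", 3, 0, 9),    # 3: skipped
    ("addiu", 0, 0, 1),    # 4: write to r0 is thrown away
    ("halt",),             # 5
]
regs = Regs()
run(program, regs)
print(regs.read(0), regs.read(1), regs.read(2), regs.read(3))
```

One caveat on moving the branch after the delay-slot instruction in recompiled code: the real CPU evaluates the branch condition before the delay slot runs, so a plain swap is only safe when the delay-slot instruction does not write a register the branch compares.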

Quote
> So dynarec doesn't automatically solve the "speed issue". And I believe the PSX uses a 25 MHz
> clock.

It's a 33 MHz R3000A. I've heard of a few strange fellows overclocking theirs to 40 MHz, so Gran Turismo wouldn't drop frames anymore.
In case anyone asks, the N64 uses a roughly 94 MHz R4300-series CPU (R4000 family).. hey, it really -is- 64-bit!

Quote
> And wouldn't "full frameskip" mean that none of the frames are displayed? :-)

As opposed to partial frameskip, where only the top half of the screen is drawn.
  • Site Admin
  • Joined: 25 Oct 1999
  • Posts: 2029
  • Location: Monterey, California
Re: This is something I've pondered
Posted: Wed Nov 17, 1999 4:22 pm
:o|

Quote
> I can see your point.
> I've often thought that an 'emulator' should be called an 'interpreter' as basically that's what they do, isn't it? They interpret the instructions from Z80 or 68000 or whatever and 'interpret' it for your x86-based machine.

Well the processor emulator is an interpreter... I don't think that word would apply to the emulation of the other aspects of the hardware... in a sense, yes, but 'emulator' seems like a better match.

Quote
> I've often wanted to know if it was possible for somebody to make a program that takes the ROM, interprets all of the instructions into the x86 equivalent and the compiles an x86 executable file from the resulting data.

Like Chris, you're assigning too much importance to the CPU and not enough to the dedicated hardware that makes up every game system. A Neo-Geo and a Genesis have the same processors in common (a 68000 and a Z80), but there's not much else in them that is the same.

The CPU's task in a console system is a lot more complicated than just emitting a 320 x 240 array of pixel data to convert to the final image on the television.

Quote
> This may be the long way about it, but I think the result would be a faster in the long run. Maybe even a Genesis game could then be run at full speed on a 486.

There have been a few who've worked in this direction... I think there was even an NES emulator that let you 'compile' an .nes file into a playable, stand-alone executable for a PC (I think it just appended the .nes file onto a precompiled NES emulator, though).
This really isn't the best way to go. First of all, remember that all data and CPU code look the same to the compiler and the Genesis (bytes is bytes is bytes, after all...)... The emulator knows nothing of the behavior of the cart until it is run... it would have to assume any data addressable by the CPU could in fact be a CPU instruction, because the CPU could conceivably branch anywhere in cartridge memory... You'd end up with an executable several times larger than the original file... before you add the original data back to it.
Second.. the Genesis has two CPUs... most PCs have just one.

And third... you still need emulators for all the other special purpose chips in the Genesis... and there's a lot of work there: tile drawing, dual-plane scrolling, sprite hardware, collision detection, raster interrupts, controller input, FM synthesis, PSG sound, PCM sound... I'm sure I'm forgetting a lot, too.

These support chips do not run 'programs'... their behavior is implicit in their design. They only receive data from the processor and send data back (in some instances). And most of these hardware functions cannot be immediately mapped to your PC's hardware (which does not have sprites, tile mapped modes, collision detection, PSG sound, may not have an FM chip, etc...). So your PC must interpret them, convert graphics data on the fly to a format that your video card can understand... process and mix sound to send it to your sound card, translate input from your keyboard or game pad and convert it into a format the Genesis program will understand, etc, etc...

A better paradigm for console systems is to think of the CPU(s) as orchestrating the behavior of several dedicated sub-systems that operate at the same time as a CPU. As you can imagine, that's a lot of work for the emulator to do, and it also has the job of making sure the timing and communication between the emulated hardware is accurate.
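
One common way to get that "at the same time" effect on a single host CPU is to give each emulated chip a small time slice in turn, for example one scanline's worth of cycles. A minimal sketch under stated assumptions: the `Chip` class, the fixed per-instruction costs, and the ballpark figures of about 488 68000 cycles and 228 Z80 cycles per NTSC scanline are all illustrative, not taken from any particular emulator.

```python
# Sketch of a per-scanline scheduler that interleaves two emulated CPUs
# and a line renderer. All names and numbers here are illustrative.

LINES_PER_FRAME = 262            # NTSC (assumed)

class Chip:
    def __init__(self, name, cycles_per_line, instr_cycles):
        self.name = name
        self.cycles_per_line = cycles_per_line
        self.instr_cycles = instr_cycles  # pretend every instruction costs this
        self.total_cycles = 0
        self.debt = 0                     # cycles overshot in the previous slice

    def run(self, budget):
        """Execute whole 'instructions' until the budget is used up."""
        used = 0
        while used < budget:
            used += self.instr_cycles     # a real core would decode and execute here
        self.total_cycles += used
        return used

def run_frame(chips, render_line):
    for line in range(LINES_PER_FRAME):
        for chip in chips:
            budget = chip.cycles_per_line - chip.debt
            used = chip.run(budget)
            chip.debt = used - budget     # carry the overshoot into the next line
        render_line(line)                 # video hardware gets its turn too

main_cpu = Chip("68000", 488, 10)
sound_cpu = Chip("Z80", 228, 7)
lines_drawn = []
run_frame([main_cpu, sound_cpu], lines_drawn.append)
print(main_cpu.total_cycles, sound_cpu.total_cycles, len(lines_drawn))
```

Carrying the overshoot forward keeps each chip within one instruction of its ideal cycle count over the frame, so the chips do not drift apart.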

And, in fact, most emulators don't spend all that much time on CPU emulation... even genesis emulators.

Quote
> Is this even feasable?
> If so, then it could even be done in Visual Basic, methinks.
> Just pondering,
> ~unfnknblvbl
Eric
  • Guest
Re: This is something I've pondered
Posted: Wed Nov 17, 1999 4:47 pm
Quote
> I've often wanted to know if it was possible for somebody to make a program that takes the ROM, interprets all of the instructions into the x86 equivalent and the compiles an x86 executable file from the resulting data.
> This may be the long way about it, but I think the result would be a faster in the long run. Maybe even a Genesis game could then be run at full speed on a 486.
> Is this even feasable?

This idea has been floating around for a while. For lack of a better term, I believe it's called Static-Recompilation. Meaning, the entire program is recompiled before execution.
There are two difficulties: Self-modifying code, and Data.

Static-Recompilation will (almost) never work with a program that contains self-modifying code, unless all the modifications are known in advance, and a table of the proper code sequences can be created.

The other problem is data. It's necessary to know where in the ROM data is being stored so it isn't recompiled. One naive solution to this would be to recompile the whole ROM, and also keep a complete duplicate of the whole ROM as pure data. This, however, can result in a significant waste of space.


Finally, a note to Chris:

"DynaRec" is one of emulation's biggest "buzzwords": everyone has heard it, and very few know what it means (though many on this message board do). I'd like to dispel a myth about dynamic recompilation right now: It does not in ANY way guarantee better performance. The choice to use dynamic recompilation in your emulator should be based on a thorough study of the programs the emulator will run. Only then will you know whether dynamic recompilation is suitable for your emulator.

The principle of dynamic-recompilation is to avoid the overhead of repeatedly translating (interpreting) instructions. It accomplishes this nicely, but at what cost? Here are some of the issues you should examine if considering dynamic-recompilation:

Do you know what a "Basic Block" is?
Once you've translated a block of instructions, where do you store them?
How do you retrieve the translation at a later point, if the program jumps back to that code?
How big should the "cache" of translated instruction blocks be?
The act of translating a block and storing it (tagging its location so you can find the block later) takes longer than simply interpreting it, plus it's not available for execution until the whole block is done. If the emulator encounters a previously untranslated block in a particularly timing-sensitive area, it may not run smoothly. Is this acceptable? How will you handle this?
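
To make those questions concrete, here is a toy sketch of the most basic scheme: translate one block at a time, store it in a cache keyed by guest address, and look it up on later visits. The guest instruction set, `translate_block` and the unbounded dictionary cache are all invented for illustration. A real dynarec emits host machine code rather than Python source and has to bound and invalidate its cache.

```python
# Toy dynamic recompiler. Guest code is translated block by block into a
# Python function; translations are cached by guest PC.

GUEST = [
    ("load", "a", 0),          # 0: a = 0
    ("load", "n", 5),          # 1: n = 5
    ("add",  "a", "n"),        # 2: a += n     (loop target)
    ("dec",  "n"),             # 3: n -= 1
    ("jnz",  "n", 2),          # 4: if n != 0 goto 2
    ("halt",),                 # 5
]

def translate_block(pc):
    """Translate from pc up to and including the next jnz or halt."""
    lines = ["def block(regs):"]
    while True:
        ins = GUEST[pc]
        op = ins[0]
        if op == "load":
            lines.append(f"    regs[{ins[1]!r}] = {ins[2]}")
        elif op == "add":
            lines.append(f"    regs[{ins[1]!r}] += regs[{ins[2]!r}]")
        elif op == "dec":
            lines.append(f"    regs[{ins[1]!r}] -= 1")
        elif op == "jnz":
            lines.append(f"    return {ins[2]} if regs[{ins[1]!r}] != 0 else {pc + 1}")
            break
        elif op == "halt":
            lines.append("    return None")
            break
        pc += 1
    namespace = {}
    exec("\n".join(lines), namespace)   # "compile" the generated source
    return namespace["block"]

def run():
    cache = {}                          # guest PC -> translated block
    stats = {"translated": 0, "executed": 0}
    regs = {}
    pc = 0
    while pc is not None:
        block = cache.get(pc)
        if block is None:               # first visit: pay the translation cost
            block = cache[pc] = translate_block(pc)
            stats["translated"] += 1
        pc = block(regs)                # returns next guest PC, or None on halt
        stats["executed"] += 1
    return regs, stats

regs, stats = run()
print(regs, stats)
```

The loop body is translated once and then reused on every later iteration, which is where the saving over re-interpreting comes from. Code that runs only once pays the translation cost for nothing.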

Now, the above are issues involved with the most fundamental dyna-rec approach. The REAL benefit of dynamic-recompilation, however, comes from the ability to examine the translated instruction blocks and perform optimizations on them. For example, if your emulator ran on x86 machines, you could then perform x86 optimizations on the translated code. Understand, though, that this optimization takes time, and the time you shave off the code had better, over the course of running the ROM, be more than the time it took you to optimize the code. This kind of optimization brings forward issues all its own.

Basically, my point to all this is: Dynamic-Recompilation has its place, but it should only be attempted by the most savvy emulator authors. A naive dyna-rec scheme will almost always result in lower performance than an interpretive approach.

You make up your own mind, and don't believe everything you hear about dynamic-recompilation. There's a lot of misinformation out there. However, this message board is graced with a lot of very competent people, you may want to listen to them, (and no, I'm not talking about myself :-).

Good luck.

Eric
 
Nyef
  • Guest
Re: This is something I've pondered
Post Posted: Wed Nov 17, 1999 5:05 pm
Quote
> :o|

Well said. That's just about my reaction, too. :-/

Quote
> Well the processor emulator is an interpreter... I don't think that word would apply to the emulation of the other aspects of the hardware... in a sense, yes, but 'emulator' seems like a better match.

If the CPU core is an interpreter, then a dynarec core is a JIT compiler. And feel free to call each component a "simulator".

Quote
> > I've often wanted to know if it was possible for somebody to make a program that takes the ROM, interprets all of the instructions into the x86 equivalent and then compiles an x86 executable file from the resulting data.

> Like Chris, you're assigning too much importance to the CPU and not enough to the dedicated hardware that makes up every game system. A Neo-Geo and a Genesis have the same processors in common (a 68000 and a Z80), but there's not much else in them that is the same.

> The CPU's task in a console system is a lot more complicated than just emitting a 320 x 240 array of pixel data to convert to the final image on the television.

And in case you're wondering, somewhere out there is a large (very large) text file exploring the possibility of this sort of NES->EXE translation, and it doesn't look pretty.

Quote
> > This may be the long way about it, but I think the result would be faster in the long run. Maybe even a Genesis game could then be run at full speed on a 486.

> There have been a few who've worked in this direction... I think there was even an NES emulator that let you 'compile' an .nes file into a playable, stand alone executable for a PC (I think it just appended the .nes file onto a precompiled nes emulator, though).

This wouldn't really surprise me. Actually doing a proper job of it would take some really heavy analysis.

Quote
> This really isn't the best way to go. First of all, remember that all data and cpu code look the same to the compiler and the genesis (bytes is bytes is bytes, after all...)... The emulator knows nothing of the behavior of the cart until it is run... it would have to assume any data addressable by the cpu could in fact be a CPU instruction, because the CPU could conceivably branch anywhere in cartridge memory... You'd end up with an executable several times larger than the original file... before you add the original data back to it.

And even if there is a 1:1 correlation of Z80/6502->x86 instructions, the file would be larger. Proving this is left as an exercise for the reader (especially the Z80 case, since I'm not 100% sure about it).

Quote
> Second.. the genesis has two CPU's... most PC's have just one.

For which favor, much thanks.

Quote
> And third... you still need emulators for all the other special purpose chips in the genesis... and there's a lot of work there: tile drawing, dual-plane scrolling, sprite hardware, collision detection, raster interrupts, FM sound, controller input, fm synthesis, psg sound, pcm sound... I'm sure I'm forgetting a lot, too.

No, that's a pretty complete list. You have FM synth in there twice, though. If you want the 32x or SCD you have more, though.

Quote
> These support chips do not run 'programs'... their behavior is implicit in their design. They only receive data from the processor and send data back (in some instances).

Which is a very good argument for not calling them "interpreters". This is also where most of the emulation time will tend to be spent.

Quote
> And most of these hardware functions cannot be immediately mapped to your PC's hardware (which does not have sprites, tile mapped modes, collision detection, psg sound, may not have an FM chip, etc...). So your PC must interpret them, convert graphics data on the fly to a format that your video card can understand... process and mix sound to send it to your sound card, translate input from your keyboard or game pad and convert it into a format the genesis program will understand, etc, etc...

Most speedups here involve doing work up front rather than when the result is needed, and using as many space-for-time tradeoffs as possible.
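
As one illustration of the space-for-time idea, here is a sketch that expands every possible byte into its eight bits once at startup, then uses that table to decode tile rows. The planar layout (one byte per bitplane per row, plane 0 first) and all the names are assumptions for the example, not taken from any particular emulator.

```python
# Built once at startup: for every possible byte value, its 8 individual
# bits, leftmost pixel first. 256 small tuples trade memory for speed.
BIT_TABLE = [tuple((value >> (7 - x)) & 1 for x in range(8))
             for value in range(256)]

def decode_tile_row(plane_bytes):
    """Turn one row of a planar tile (one byte per bitplane, plane 0
    first) into 8 palette indices using the precomputed table."""
    pixels = [0] * 8
    for plane, byte in enumerate(plane_bytes):
        bits = BIT_TABLE[byte]          # table lookup, no shifting per pixel
        for x in range(8):
            pixels[x] |= bits[x] << plane
    return pixels

row = decode_tile_row([0b10000000, 0b11000000, 0b00000000, 0b00000001])
```

The same trick goes further in practice: decode whole tiles when the game writes to video memory, rather than every time a tile is drawn.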

Quote
> A better paradigm for console systems is to think of the CPU(s) as orchestrating the behavior of several dedicated sub-systems that operate at the same time as a CPU.

Perhaps that's a better paradigm for when you're programming them, but when you're emulating, sometimes other paradigms work just as well or better. For example, most timing constraints are posed by the video system, so why not let that run the show?
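
A rough sketch of what "letting the video system run the show" could look like: the frame loop walks the scanlines and gives the CPU a cycle budget for each one. The callback names are hypothetical, and the defaults (262 lines, 228 CPU cycles per line, 192 visible lines) are assumed, roughly NTSC SMS-like values for illustration.

```python
def run_frame(cpu_step, render_scanline, raise_vblank,
              scanlines_per_frame=262, cycles_per_scanline=228,
              visible_lines=192):
    """Run one frame with the video timing as the master clock.
    cpu_step() executes one instruction and returns the cycles it took."""
    debt = 0    # cycles the CPU overran on the previous line
    total = 0   # total CPU cycles executed this frame
    for line in range(scanlines_per_frame):
        remaining = cycles_per_scanline - debt
        while remaining > 0:
            used = cpu_step()
            remaining -= used
            total += used
        debt = -remaining               # carry any overshoot forward
        if line < visible_lines:
            render_scanline(line)       # draw after the CPU ran this line
        elif line == visible_lines:
            raise_vblank()              # first line after the visible area
    return total
```

Carrying the overshoot from one line into the next keeps the CPU from slowly running ahead of the video over the course of a frame.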

Quote
> As you can imagine, that's a lot of work for the emulator to do, and it also has the job of making sure the timing and communication between the emulated hardware is accurate.

That reminds me. I still haven't fixed my SMS PSG to use the cycle counts from the CPU to handle register write timing...

Quote
> And, in fact, most emulators don't spend all that much time on CPU emulation... even genesis emulators.

Which is why we still use C CPU cores... even genesis emulators.

Genesis/SNES is about the complexity level that dynarec and such actually start paying off at. Especially on slower computers.

Quote
> > Is this even feasible?

Nope, so sorry. Try doing the translation on the fly.

Quote
> > If so, then it could even be done in Visual Basic, methinks.

Nope, so sorry. Try using either ASM, C, or even Delphi.

--Nyef
 
Nyef
  • Guest
Re: Dynarec
Post Posted: Wed Nov 17, 1999 5:36 pm
Quote
> I am a little concerned that Chris is biting off more than he can chew. Although writing a z80 interpreter (emulator) isn't unfathomably difficult (the ideas behind it aren't too hard to grasp, getting all the flag behavior and such down takes some time however), a prospective author should at least be fluent in their language of choice (C..) as well as the target processor (has he written much z80 code?)

Fluency with the target processor is by no means required for an experienced coder. Chris manifestly is not such.

Quote
> > > Interpretation just stinks when it comes to emulation sometimes. And I know what you're thinking,
> > > "Well, re-compilation isn't going to be as accurate either.". True, you are right.

> Um.. why wouldn't it be accurate?
> Assuming there were no bugs in the recompiler (of course)

The only scenarios that come to mind involve execution from I/O space, and I/O access that have to be
perfectly emulated. Of course, most interpretive cores would fall down here too, but they at least have a
fighting chance.

Quote
> > Dynarec is an impressive technique, but it's not
> > exactly worthwhile when dealing with single processor 8-bit systems like we are here. And it's not
> > simple in the least.

> I could see dynamic recompilation being handy for writing emulators for more limited platforms (sms emulators for, say... Digita Cameras, Playstations (does playstation mastergear get 60 fps?)).

Okay, here it's probably worth it. For the N64, however, it probably isn't. I have no idea if it would be worth it for the Saturn, either.

Quote
> > If one can get 60 FPS on a p133 or p120 using an interpretive core, then what point dynarec?

> > Now, if you _can't_ get 60 FPS on a p133 (this takes a "16-bit" system, 8-bit systems can hit 60
> > using mainly C code and only a smattering of ASM (for when the compiler is being stupid)), then
> > using dynarec is more reasonable.

> > All this assumes that you have a p133 or p120 to test with. Optimizing for more than 60 FPS on
> > your target platform (which you had damned well better have on your desk) is a waste of time.

> I'd have to agree. If an SMS emulator doesn't run full speed on any computer sold as 'new' in the last four years, the bottleneck is probably somewhere other than the z80 core, unless it's terribly poorly written.

And if it's Marat's core, then even being poorly written isn't enough of an excuse. It takes a special
talent to write a CPU core that is so slow that it can't run at full speed on a p133 (not that I haven't
seen it done). Marat doesn't have this talent (for which favor, much thanks).

Quote
> > > But at least
> > > this method will solve the speed issue. It's a shame that a system such as the Playstation
> > > which runs at 60 Mhz (I think) cannot run at full frameskip on my Celeron (PentiumII/MMX)
> > > 366Mhz.

> Why should it? The playstation has quite a few tricks up its old, tattered sleeve, after all: A separate coprocessor for doing geometry transforms and lighting effects (we PC users won't have anything like that until the GeForce256 cards are out), other chips which can move data, draw sprites, rasterize texturemapped triangles, a fairly complicated sound system (24 channels of compressed audio with DSP effects).. you can see that there's a lot for your poor celeron to deal with besides the little ol' r3000.

Indeed. And don't forget the MDEC hardware. And there are other systems with these problems. The Saturn comes to mind...

Quote
> > With a dynarec core, no less (all PSX emulators that I know of use dynarec). The CPU core isn't
> > everything (and to think that the r3000 is actually a pretty simple CPU that takes to dynarec quite
> > nicely, too).

> Very nicely. No status flags to deal with, just an array of 32 longs for the CPU regs (and one is always zero), a very small instruction set, and that branch-delay weirdness ( I guess one need only stick the branch after the delayed instruction in the recompiled core.)

Ease of implementation alone is enough to make me want to write a dynarec r3000 core. And the PSX is enough justification to do so... When I finish with everything else on my TODO list, that is. :-)
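
For anyone following along, here is a small sketch of the two r3000 quirks mentioned in the quote: a 32-entry register array whose entry 0 always reads as zero, and a branch whose delay slot executes before the jump takes effect. The class and function names are hypothetical, and only two instructions are modelled. One assumption worth stating: the branch condition is evaluated before the delay slot runs, because the delay slot may overwrite a register the branch compares.

```python
class Regs:
    """32 general registers; writes to register 0 are discarded."""
    def __init__(self):
        self.r = [0] * 32

    def write(self, index, value):
        if index != 0:                      # $zero stays zero
            self.r[index] = value & 0xFFFFFFFF

def exec_addiu(regs, rt, rs, imm):
    """rt = rs + imm, wrapped to 32 bits."""
    regs.write(rt, regs.r[rs] + imm)

def run_beq_with_delay_slot(regs, rs, rt, target, fall_through, delay_slot):
    """Return the next PC for 'beq rs, rt, target' plus its delay slot."""
    taken = regs.r[rs] == regs.r[rt]   # condition uses pre-delay-slot values
    delay_slot(regs)                   # the delay slot always executes
    return target if taken else fall_through

regs = Regs()
regs.write(1, 1)
regs.write(2, 1)
next_pc = run_beq_with_delay_slot(
    regs, 1, 2, target=0x8000, fall_through=0x1008,
    delay_slot=lambda r: exec_addiu(r, 1, 1, 1))
```

A recompiler that simply emits the delay-slot instruction ahead of the branch has to take care in exactly this case, where the delay slot changes one of the branch's operands.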

Quote
> > So dynarec doesn't automatically solve the "speed issue". And I believe the PSX uses a 25 MHz
> > clock.

> It's a 33 MHz R3000A. I've heard of a few strange fellows overclocking theirs to 40 MHz, so Gran Turismo wouldn't drop frames anymore.

Fun. If it has an MMU, I should try porting Linux to it... :-)

Quote
> In case anyone asks, the N64 uses a 90mhz r4000.. hey, it really -is- 64-bit!

I thought it was an r4400? But yes, it really is a 64-bit system.

Quote
> > And wouldn't "full frameskip" mean that none of the frames are displayed? :-)

> As opposed to partial frameskip, where only the top half of the screen is drawn.

Nah, every other scanline. And if you change which set of scanlines change each
frame and draw them at 2/3 brightness, you can call it a feature. :-)

--Nyef
 
  • Joined: 24 Jun 1999
  • Posts: 1732
  • Location: Paris, France
About that NES emulator (forgot its name)
Post Posted: Wed Nov 17, 1999 7:18 pm
Quote
> There have been a few who've worked in this direction... I think there was even an NES emulator that let you 'compile' an .nes file into a playable, stand alone executable for a PC (I think it just appended the .nes file onto a precompiled nes emulator, though).

It is exactly what it does.
Chris
  • Guest
My Answers
Post Posted: Wed Nov 17, 1999 7:50 pm
Quote
> Do you know what a "Basic Block" is?
Not exactly. But, I'm assuming that it's a block of memory that contains instructions.

Quote
> Once you've translated a block of instructions, where do you store them?
I'm not sure. You could probably save them to a file for later use (execution).

Quote
> How do you retrieve the translation at a later point, if the program jumps back to that code?
I don't really understand what you're trying to say. Are you talking about conditional jumps, such
as not zero or equal to? If the program before the translation wants to perform a logical jump, then
simply jump to that area (allocate a code segment and use offset to simulate program's memory.)
I dunno about the more complicated systems. I'm imagining the order of the SG1000 memory
structure cause that system uses a very simple memory structure. When you get into bank switching
and all that then I get very confused.

Quote
> How big should the "cache" of translated instruction blocks be?
I will honestly say that I don't know what a "cache" is. I know it's a place to store memory but I
don't understand its purpose. Why do some computers use cache (without RAM) or vice versa?
Why do some computers use both? I really need some explanation on this one. I'm sorry
but I don't know.

Quote
> The act of translating a block and storing it (tagging its location so you can find the block later) takes longer than simply interpreting it, plus it's not available for execution until the whole block is done. If the emulator encounters a previously untranslated block in a particularly timing-sensitive area, your emulator may not run smoothly. Is this acceptable? How will you handle this?
My God. Maybe I should've changed my wording from considering to thinking about. It's not
like I'm breaking my neck and trying to write something like that right now. Hell, I'm still trying
to get the damn Flags register in my interpretive Z80 engine to work correctly. I may not even
get into re-compilation because I'm not ready for that type of challenge yet.

Quote
> Now, the above are issues involved with the most fundamental dyna-rec approach. The REAL benefit of dynamic-recompilation, however, comes from the ability to examine the translated instruction blocks and perform optimizations on them. For example, if your emulator ran on x86 machines, you could then perform x86 optimizations on the translated code. Understand, though, that this optimization takes time, and the time you shave off the code had better, over the course of running the ROM, be more than the time it took you to optimize the code. This kind of optimization brings forward issues all its own.
Well, if everything above is fundamental then I must be a fucking idiot, huh? :o(

Quote
> Basically, my point to all this is: Dynamic-Recompilation has its place, but it should only be attempted by the most savvy emulator authors. A naive dyna-rec scheme will almost always result in lower performance than an interpretive approach.
And it is a place that is not mine, at least not as of right now.

Quote
> You make up your own mind, and don't believe everything you hear about dynamic-recompilation. There's a lot of misinformation out there. However, this message board is graced with a lot of very competent people, you may want to listen to them, (and no, I'm not talking about myself :-).
At least you're one of the nicer gurus of the bunch. You take the time to enlighten me and get me
to understand stuff. Everyone else here is like, "You retarded ass newbie fuckup! You don't know
shit! I hate you! You suck! I'm an old fart genius and that's that!" Or maybe that's just the way
I've been envisioning the sarcastic slams and insults.
Quote
> Good luck.
Yeah, I'll try,

Chris :o(
 
Chris
  • Guest
Lemme Guess...
Post Posted: Wed Nov 17, 1999 7:57 pm
Nes-Lord?

Chris :o|
 
Chris
  • Guest
Oh, the pain...
Post Posted: Wed Nov 17, 1999 8:06 pm
Quote
> If you haven't written an emulator using an interpretive core, you have no business messing with
> dynarec. Especially you, Chris. I've yet to see evidence that you even know what's involved in
> writing an interpretive core, let alone a dynarec one.

Arrgghh! I'm hit! (left crawling in a pool of blood)
Damn, I thought emulation was supposed to be a hobby. Now I feel like I have something to
prove. Oh well, give me some time to finish my little hobby project.

Quote
> I want emulation speed, too. But only to a point. And right now writing a dynarec core is the last
> thing I would consider for speed (I have several things I can do to my graphics renderers to speed
> things up, there are several optimizations I can do to the interpretive cores I'm using, there's a couple
> spots where the compiler I'm using is generating code that is less than optimal (movzx on a p133 when
> the top 3 bytes are going to be 0 anyway _sucks_)).

Good thing I got this little 366Mhz upgrade...

Quote
> No, I'm thinking "Chris is talking out his ass again". Dynarec is an impressive technique, but it's not
> exactly worthwhile when dealing with single processor 8-bit systems like we are here. And it's not
> simple in the least.

Oh, the bastard shoved his boot into my neck! I can barely breathe! Only my arms and mind remain...

Thanks for the input Nyef

C..h..r..i..s |o(
 
Nyef
  • Guest
Re: My Answers
Post Posted: Wed Nov 17, 1999 8:29 pm
Quote
> > Now, the above are issues involved with the most fundamental dyna-rec approach.
> Well, if everything above is fundamental then I must be a fucking idiot, huh? :o(

Not necessarily. You're just missing most of the background in compiler technology, program optimization,
and computer architecture that several of us have. If you want references for most of these subjects, feel
free to ask.

Quote
> > Basically, my point to all this is: Dynamic-Recompilation has its place, but it should only be attempted by the most savvy emulator authors.
> And it is a place that is not mine, at least not as of right now.

> > You make up your own mind, and don't believe everything you hear about dynamic-recompilation. There's a lot of misinformation out there. However, this message board is graced with a lot of very competent people, you may want to listen to them, (and no, I'm not talking about myself :-).
> At least you're one of the nicer gurus of the bunch. You take the time to enlighten me and get me
> to understand stuff. Everyone else here is like, "You retarded ass newbie fuckup! You don't know
> shit! I hate you! You suck! I'm an old fart genius and that's that!" Or maybe that's just the way
> I've been envisioning the sarcastic slams and insults.

May I take this opportunity to apologise for some of my remarks? At least some of them were meant to be
more along the lines of "Here there be dragons" and "there are better ways to obtain the speed you need"
and "there is a lot you need to learn before trying this" than "you suck". I will admit that the "talking out
your ass" bit was way out of line.

On the other hand, your rather contrived example of "Mr. Speedy" was almost designed to piss me off.

Anyway, I apologise for the tone of my remarks earlier and for overreacting the way I did.

Quote
> > Good luck.
> Yeah, I'll try,

> Chris :o(

--Nyef
 
Eric
  • Guest
Re: My Answers
Post Posted: Wed Nov 17, 1999 9:12 pm
Honestly, I didn't expect you to answer those questions. They were just proposed for you to think about while you consider how you might implement a dynamic-recompiling emulator.
In any case, I'll respond:

Quote
> > Do you know what a "Basic Block" is?
> Not exactly. But, I'm assuming that it's a block of memory that contains instructions.

Simply put, a basic block is a group of instructions that will always be executed together. In other words, there are no "jumps" into it (except to the first instruction) or out of it (except possibly the last instruction). Technically, there's a little more to it.
The significance of basic blocks to a dyna-rec emulator is that these are the units of code that it will typically examine for translation at one time. Your emulator would need to "parse" the ROM, and break up the code sections into basic blocks.
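
Here is a minimal sketch of that splitting step, using a toy instruction list rather than real Z80 opcodes. The tuple format and the op names ('jump', 'branch', 'ret') are assumptions made up for the example. It uses the usual rule: a new block starts at the first instruction, at every jump target, and right after every instruction that can transfer control.

```python
def find_basic_blocks(program):
    """program: list of (address, op, target) tuples in address order.
    op is 'jump', 'branch', 'ret' or anything else; target is the
    destination address for jump/branch, otherwise None.
    Returns a list of blocks, each a list of addresses."""
    if not program:
        return []
    ends_block = {"jump", "branch", "ret"}
    leaders = {program[0][0]}                  # first instruction
    for i, (addr, op, target) in enumerate(program):
        if op in ends_block:
            if target is not None:
                leaders.add(target)            # something jumps here
            if i + 1 < len(program):
                leaders.add(program[i + 1][0]) # code after a transfer
    blocks, current = [], []
    for addr, op, target in program:
        if addr in leaders and current:
            blocks.append(current)
            current = []
        current.append(addr)
    blocks.append(current)
    return blocks

toy = [
    (0, "ld", None),
    (1, "dec", None),
    (2, "branch", 1),    # loop back to address 1
    (3, "ld", None),
    (4, "ret", None),
    (5, "nop", None),
]
blocks = find_basic_blocks(toy)
```

On a real ROM this is harder than it looks, since code and data are mixed and jump targets can be computed at run time, so a dynarec would more likely discover blocks as execution reaches them.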

Quote
> > Once you've translated a block of instructions, where do you store them?
> I'm not sure. You could probably save them to a file for later use (execution).

More than likely you will want to store the translated instructions in memory (a file will be too slow to access). This brings up another problem: remember that the translated instructions are data while they're being written, but at some later point you're actually going to execute out of that area. In some operating systems (not DOS, though) it can be difficult to execute out of areas that are considered "data."

Quote
> > How do you retrieve the translation at a later point, if the program jumps back to that code?
> I don't really understand what you're trying to say. Are you talking about conditional jumps, such
> as not zero or equal to? If the program before the translation wants to perform a logical jump, then
> simply jump to that area (allocate a code segment and use offset to simulate program's memory.)
> I dunno about the more complicated systems. I'm imagining the order of the SG1000 memory
> structure cause that system uses a very simple memory structure. When you get into bank switching
> and all that then I get very confused.

Let's say the ROM contains a section of code it will use repeatedly throughout execution (let's say a routine that clears video memory). The first time the routine is called, your emulator will translate the ROM instructions into the native instructions of the computer the emulator is running on. You save the translation somewhere in memory (the "translation cache.") At some later point, the ROM needs to execute that routine again. Your emulator knows (somehow, yet another issue to consider) that it has been translated already, and the translation is in the "translation cache." But the question is, WHERE exactly, in the cache is it? You need to come up with a scheme for quickly locating the translated block you're looking for.
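
One possible scheme, sketched here with a hash table keyed by the block's starting address. Everything in it is an assumption for illustration: the class name, the (bank, pc) key (the bank is included so bank-switched code at the same address doesn't collide), and the crude "flush everything when full" policy.

```python
class TranslationCache:
    """Maps the start of a guest block to its translated form."""
    def __init__(self, max_blocks=1024):
        self.max_blocks = max_blocks
        self.blocks = {}          # (bank, pc) -> translated block
        self.hits = 0
        self.misses = 0

    def lookup(self, bank, pc, translate):
        """Return the translation for (bank, pc), translating on a miss.
        translate(bank, pc) is supplied by the caller."""
        key = (bank, pc)
        block = self.blocks.get(key)
        if block is None:
            self.misses += 1
            if len(self.blocks) >= self.max_blocks:
                self.blocks.clear()   # crude policy: flush when full
            block = translate(bank, pc)
            self.blocks[key] = block
        else:
            self.hits += 1
        return block

calls = []
def fake_translate(bank, pc):
    calls.append((bank, pc))
    return "native code for %d:%04X" % (bank, pc)

cache = TranslationCache(max_blocks=2)
cache.lookup(0, 0x0100, fake_translate)   # miss, translated
cache.lookup(0, 0x0100, fake_translate)   # hit, reused
```

A real core would also have to throw away translations when the game writes over code in RAM, which this sketch ignores.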

Quote
> > How big should the "cache" of translated instruction blocks be?
> I will honestly say that I don't know what a "cache" is. I know it's a place to store memory but I
> don't understand its purpose. Why do some computers use cache (without RAM) or vice versa?
> Why do some computers use both? I really need some explanation on this one. I'm sorry
> but I don't know.

Don't apologize. Contrary to what many people on the internet believe, not knowing something is not a sin.

A cache (in the computer world) is a small, fast area of memory. Whenever the processor needs something from main memory, it also stores it in a cache (usually located within the processor itself). If at some later point, the processor needs that data again, it can read it from the fast cache, instead of going all the way to (slow) main memory. Many processors have at least 2 levels of cache ranging from about 16KB to 1MB.

The purpose of a cache is to improve performance, since in many cases, once data is used, it is needed again and again in the immediate future. (Computer scientists call this "Temporal Locality." There's also "Spatial Locality" which governs how big caches need to be and other things, but that's way beyond the topic right now.)
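
To make the locality idea concrete, here is a toy simulation that counts hits and misses for a sequence of memory addresses. It models a very simple "direct-mapped" cache, where each line of memory can live in exactly one slot. The function name, sizes and address sequences are illustrative assumptions, far smaller and simpler than any real processor cache.

```python
def count_hits(addresses, num_lines=4, line_size=16):
    """Simulate a direct-mapped cache and return (hits, misses)."""
    tags = [None] * num_lines       # which memory line each slot holds
    hits = misses = 0
    for addr in addresses:
        line_number = addr // line_size
        slot = line_number % num_lines
        if tags[slot] == line_number:
            hits += 1               # already in the cache: fast
        else:
            misses += 1             # fetch from slow main memory
            tags[slot] = line_number
    return hits, misses

same_byte = count_hits([0, 0, 0, 0])     # reuse over time
neighbours = count_hits([0, 1, 2, 3])    # reuse of nearby bytes
thrash = count_hits([0, 64, 0, 64])      # two lines fighting for one slot
```

The first two sequences miss once and then hit every time. The third misses on every access, because both addresses map to the same slot and keep evicting each other.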

Quote
> > The act of translating a block and storing it (tagging its location so you can find the block later) takes longer than simply interpreting it, plus it's not available for execution until the whole block is done. If the emulator encounters a previously untranslated block in a particularly timing-sensitive area, your emulator may not run smoothly. Is this acceptable? How will you handle this?
> My God. Maybe I should've changed my wording from considering to thinking about. It's not
> like I'm breaking my neck and trying to write something like that right now. Hell, I'm still trying
> to get the damn Flags register in my interpretive Z80 engine to work correctly. I may not even
> get into re-compilation because I'm not ready for that type of challenge yet.

Yes, again, my questions were just meant to get you thinking. I didn't expect you to answer them. They're just things you need to consider for dynamic recompilation.

(By the way, let me know if you need help with the flags. I've written my own Z80 emulator, plus I've spent a lot of time studying them (and teaching a class in basic computer arithmetic)).
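
As a starting point on the flags, here is a sketch of an 8-bit ADD that computes the documented Z80 flags (S, Z, H, P/V as overflow, N, C). The function name is made up for the example, and the undocumented bits 3 and 5 of the flags register are deliberately left at zero.

```python
FLAG_S, FLAG_Z, FLAG_H = 0x80, 0x40, 0x10
FLAG_PV, FLAG_N, FLAG_C = 0x04, 0x02, 0x01

def add8(a, b):
    """Return (result, flags) for an 8-bit ADD A,b, documented flags only."""
    total = a + b
    result = total & 0xFF
    flags = 0                                  # N is reset by ADD
    if result & 0x80:
        flags |= FLAG_S                        # sign: bit 7 of the result
    if result == 0:
        flags |= FLAG_Z                        # zero
    if (a & 0x0F) + (b & 0x0F) > 0x0F:
        flags |= FLAG_H                        # carry out of bit 3
    if (~(a ^ b) & (a ^ result)) & 0x80:
        flags |= FLAG_PV                       # signed overflow
    if total > 0xFF:
        flags |= FLAG_C                        # carry out of bit 7
    return result, flags

result, flags = add8(0x7F, 0x01)
```

The overflow test reads as "both operands had the same sign, and the result's sign differs from theirs". SUB, the 16-bit adds and the logical operations each treat H, P/V and N differently, so they need their own versions.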

Quote
> > Now, the above are issues involved with the most fundamental dyna-rec approach. The REAL benefit of dynamic-recompilation, however, comes from the ability to examine the translated instruction blocks and perform optimizations on them. For example, if your emulator ran on x86 machines, you could then perform x86 optimizations on the translated code. Understand, though, that this optimization takes time, and the time you shave off the code had better, over the course of running the ROM, be more than the time it took you to optimize the code. This kind of optimization brings forward issues all its own.
> Well, if everything above is fundamental then I must be a fucking idiot, huh? :o(

Not at all (and I hope I'm not implying that). Don't be afraid to challenge yourself with this stuff, but also know your limits.

Quote
> > Basically, my point to all this is: Dynamic-Recompilation has its place, but it should only be attempted by the most savvy emulator authors. A naive dyna-rec scheme will almost always result in lower performance than an interpretive approach.
> And it is a place that is not mine, at least not as of right now.

Perhaps not now, these things take time.

Quote
> > You make up your own mind, and don't believe everything you hear about dynamic-recompilation. There's a lot of misinformation out there. However, this message board is graced with a lot of very competent people, you may want to listen to them, (and no, I'm not talking about myself :-).
> At least you're one of the nicer gurus of the bunch. You take the time to enlighten me and get me
> to understand stuff. Everyone else here is like, "You retarded ass newbie fuckup! You don't know
> shit! I hate you! You suck! I'm an old fart genius and that's that!" Or maybe that's just the way
> I've been envisioning the sarcastic slams and insults.

Unfortunately, the internet can be very impersonal, and it's easy for others to be insulting. Hopefully, you won't get discouraged by this behavior.

Good luck.

Eric
 
  • Site Admin
  • Joined: 25 Oct 1999
  • Posts: 2029
  • Location: Monterey, California
Re: Dynarec
Post Posted: Wed Nov 17, 1999 9:15 pm

Quote
> Fluency with the target processor is by no means required for an experienced coder. Chris manifestly is not such.

Maybe not fluency.. but one should at least have used some kind of assembly language to be familiar with how a computer functions at a CPU level. Jumping into z80 didn't take very long for me, since I had the background in assembly language for other processors (680x0, 6502, and a little intel). I think it'd be easier to grasp the concepts of how a CPU works by programming with it a little first. But the concepts that one learns dealing with one processor mostly apply to others.

Quote
> > Um.. why wouldn't it be accurate?
> > Assuming there were no bugs in the recompiler (of course)

> The only scenarios that come to mind involve execution from I/O space, and I/O access that have to be
> perfectly emulated. Of course, most interpretive cores would fall down here too, but they at least have a
> fighting chance.

Ach. Yes, if there needs to be a cycle-for-cycle integration I guess there's no way to interleave the behavior of two CPUs in recompiled code.

Quote
> > I could see dynamic recompilation being handy for writing emulators for more limited platforms (sms emulators for, say... Digita Cameras, Playstations (does playstation mastergear get 60 fps?)).

> Okay, here it's probably worth it. For the N64, however, it probably isn't. I have no idea if it would be worth it for the Saturn, either.

It might... if there's enough ram to spare.
And maybe if you want to emulate something that used multiple z80's (I think arcade galaga used three or four...)


Quote
> > I'd have to agree. If an SMS emulator doesn't run full speed on any computer sold as 'new' in the last four years, the bottleneck is probably somewhere other than the z80 core, unless it's terribly poorly written.

> And if it's Marat's core, then even being poorly written isn't enough of an excuse. It takes a special
> talent to write a CPU core that is so slow that it can't run at full speed on a p133 (not that I haven't
> seen it done). Marat doesn't have this talent (for which favor, much thanks).



Quote
> > > > But at least
> > > > this method will solve the speed issue. It's a shame that a system such as the Playstation
> > > > which runs at 60 Mhz (I think) cannot run at full frameskip on my Celeron (PentiumII/MMX)
> > > > 366Mhz.

> > Why should it? The playstation has quite a few tricks up its old, tattered sleeve, after all: A separate coprocessor for doing geometry transforms and lighting effects (we PC users won't have anything like that until the GeForce256 cards are out), other chips which can move data, draw sprites, rasterize texturemapped triangles, a fairly complicated sound system (24 channels of compressed audio with DSP effects).. you can see that there's a lot for your poor celeron to deal with besides the little ol' r3000.

> Indeed. And don't forget the MDEC hardware. And there are other systems with these problems. The Saturn comes to mind...




> > > With a dynarec core, no less (all PSX emulators that I know of use dynarec). The CPU core isn't
> > > everything (and to think that the r3000 is actually a pretty simple CPU that takes to dynarec quite
> > > nicely, too).

> > Very nicely. No status flags to deal with, just an array of 32 longs for the CPU regs (and one is always zero), a very small instruction set, and that branch-delay weirdness ( I guess one need only stick the branch after the delayed instruction in the recompiled core.)

> Ease of implementation alone is enough to make me want to write a dynarec r3000 core. And the PSX is enough justification to do so... When I finish with everything else on my TODO list, that is. :-)

sure, sure.

Quote
> > > So dynarec doesn't automatically solve the "speed issue". And I believe the PSX uses a 25 MHz
> > > clock.

> > It's a 33 MHz R3000A. I've heard of a few strange fellows overclocking theirs to 40 MHz, so Gran Turismo wouldn't drop frames anymore.

> Fun. If it has an MMU, I should try porting Linux to it... :-)
It doesn't, or else it would have been done by now (some of the same strange fellows are trying to write a TCP/IP stack for it as well... and hell, I was thinking of porting the citadel bbs software to Playstation once. The hell was I thinking?)




Quote
> I thought it was an r4400?

I think you're right.


Quote
> Nah, every other scanline. And if you change which set of scanlines change each
> frame and draw them at 2/3 brightness, you can call it a feature. :-)

And hey, I used to play my SMS on a TV with a pale white band running down the left side of the tube... I should emulate that 'feature' too, for old time's sake.

Or Zoop should. ZOOP, GET ON IT!
  • Site Admin
  • Joined: 25 Oct 1999
  • Posts: 2029
  • Location: Monterey, California
Re: This is something I've pondered
Post Posted: Wed Nov 17, 1999 9:49 pm
Quote
> > :o|

> Well said. That's just about my reaction, too. :-/

The funny thing is, I didn't write that. It's Chris's, I forgot to snip it out when I was quoting.


Quote
> > Well the processor emulator is an interpreter... I don't think that word would apply to the emulation of the other aspects of the hardware... in a sense, yes, but 'emulator' seems like a better match.

> If the CPU core is an interpreter, then a dynarec core is a JIT compiler. And feel free to call each component a "simulator".

This will set me free.



Quote
> > The CPU's task in a console system is a lot more complicated than just emitting a 320 x 240 array of pixel data to convert to the final image on the television.

> And in case you're wondering, somewhere out there is a large (very large) text file exploring the possibility of this sort of NES->EXE translation, and it doesn't look pretty.

The sound is pretty gruesome. Although I recall (I really haven't poked around the NES much, mind you) that graphics data is stored in a separate bank of ROM, not directly accessible by the CPU anyway... so I suppose one could get away with translating all that into an easier-to-use graphics format.
But then an emulator could do that at load time too.

Quote
> > > This may be the long way about it, but I think the result would be faster in the long run. Maybe even a Genesis game could then be run at full speed on a 486.

I just realized I forgot to respond to that... can't a 486 run a Genesis game at full speed, at least with the sound at low quality? I suspect a fast enough 486 with a ***GOOD VIDEO CARD*** could do it, in DOS mode. When I first got this P133 I'm typing on (second hand, an old, modified point-of-sale machine), I couldn't even run a NES emulator at less than 1/5 frameskip... In fact, almost any game or emulator I ran on it performed horribly, as did every DOS VGA program I was writing... Eventually I realized the machine still had an ISA VGA card in it.
Dear god. It was so slow, you couldn't copy a full mode 13h bitmap to it at full speed.
I replaced it with a $30 PCI card from Creative Labs, and I never looked back.


Quote
> This wouldn't really surprise me. Actually doing a proper job of it would take some really heavy analysis.

And that's when one falls well into 'diminishing returns'.



Quote
> And even if there is a 1:1 correlation of Z80/6502->x86 instructions, the file would be larger. Proving this is left as an exercise for the reader (especially the Z80 case, since I'm not 100% sure about it).

Well, there would have to be more x86 instructions than Z80 instructions... you'd have to translate an instruction at every byte, even if some bytes are part of multibyte instructions.

For instance, consider:
$0e $1c
the compiler might see that and assume it's "ld c,$1c"
but perhaps, elsewhere in the program, the cpu branches directly to the $1c, demonstrating that the $1c is actually 'inc e', and $0e is part of another instruction, or data, or unused junk.

No, I can see all kinds of problems cropping up.
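
To make the $0e $1c case concrete, here is a minimal C sketch of a decoder that handles just those two opcodes. The function name z80_decode and its signature are made up for illustration; a real disassembler would cover the whole opcode map. The point is only that the same two bytes decode differently depending on where you start.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Decode one Z80 instruction at rom[pc] into text and return its length
   in bytes. Only the two opcodes from the example are recognised; every
   other byte is reported as raw data (hypothetical helper, not a full
   decoder). */
size_t z80_decode(const uint8_t *rom, size_t size, size_t pc,
                  char *out, size_t outlen)
{
    assert(rom != NULL && out != NULL && pc < size);

    switch (rom[pc]) {
    case 0x0E:                      /* LD C,n: opcode plus one immediate byte */
        if (pc + 1 < size) {
            snprintf(out, outlen, "ld c,$%02x", (unsigned)rom[pc + 1]);
            return 2;
        }
        break;                      /* truncated operand: fall through to data */
    case 0x1C:                      /* INC E: single-byte instruction */
        snprintf(out, outlen, "inc e");
        return 1;
    default:
        break;
    }
    snprintf(out, outlen, "db $%02x", (unsigned)rom[pc]);
    return 1;
}
```

Decoding the bytes { 0x0E, 0x1C } from offset 0 gives "ld c,$1c" with length 2; decoding from offset 1 gives "inc e" with length 1. A static translator has to decide which of those readings (or both) to emit code for.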


Quote
> > And third... you still need emulators for all the other special purpose chips in the genesis... and there's a lot of work there: tile drawing, dual-plane scrolling, sprite hardware, collision detection, raster interrupts, FM sound, controller input, fm synthesis, psg sound, pcm sound... I'm sure I'm forgetting a lot, too.

> No, that's a pretty complete list. You have FM synth in there twice, though.

I -really- like to emphasize good emulator sound.


Quote
> > These support chips do not run 'programs'... their behavior is implicit in their design. They only receive data from the processor and send data back (in some instances).

> Which is a very good argument for not calling them "interpreters".

precisely.

Quote
> This is also where most of the emulation time will tend to be spent.

Indubitably.

Quote
> > A better paradigm for console systems is to think of the CPU(s) as orchestrating the behavior of several dedicated sub-systems that operate at the same time as a CPU.

> Perhaps that's a better paradigm for when you're programming them, but when you're emulating sometimes other paradigms work just as well or better. For example, most timing constraints are posed by the video system, so why not let that run the show?

Well, I meant it's a better paradigm for understanding what emulators do.
Perhaps I am misusing the word 'paradigm'.


Quote
> > And, in fact, most emulators don't spend all that much time on CPU emulation... even genesis emulators.

> Which is why we still use C CPU cores... even genesis emulators.

I still wouldn't write off the value of a well-written CPU core, particularly in assembly... Again, the sole Z80 in an SMS doesn't take much of a bite out of speed, but I've noticed some major speedups in multi-CPU games when MAME has switched from a portable C core to an ASM core for a given processor, as well as major slowdowns when they've had to switch back after finding new bugs.

Quote
> Genesis/SNES is about the complexity level that dynarec and such actually start paying off at. Especially on slower computers.

> Nope, so sorry. Try doing the translation on the fly.

> > > If so, then it could even be done in Visual Basic, methinks.

Well I suppose a recompiler could be done in Visual Basic, if it could be done at all. I wouldn't recommend it.
How 'visual' is Visual Basic, anyway? I've never had the *ahem* privilege to use it.
Chris
  • Guest
Re-compilation explanation
Post Posted: Thu Nov 18, 1999 12:49 am
Quote
> Not necessarily. You're just missing most of the background in compiler technology, program optimization,
> and computer architecture that several of us have. If you want references for most of these subjects, feel
> free to ask.

Man, I'm confused. I don't really know what dynamic recompilation means, except that it converts the
instructions before it executes them and then executes them. What I was trying to do was more like binary
conversion. I was going to take one Z80 instruction and convert it to an x86 equivalent (or at least
simulate that). The video RAM was going to be sent to the PC's video memory, and the converted
ROM or program was going to allocate a segment of memory for itself. A lot of the Z80's and x86's
instructions are pretty much equivalent, except for when you get into the whole memory conflicts. For the
sound, say my conversion program discovered a sound write; I would modify it so that it writes
to the sound card in the PC and not to somewhere else and screw things up (the converted program
or ROM would first initialize the Sound Blaster port and set up its own digital wave output).
See what I'm saying? It's really not going to be an emulator anymore. It's going to be more like
a direct conversion program. Take the ROM, change its original instructions to PC equivalents,
and save them to an .EXE file. Then later on you could run it like any standard PC program and
it would play like a game. I'm imagining that the conversion would create some good and bad
bugs for the game.

I accept the apologies. But you gotta at least give me a break sometimes. Damn, man. This
whole emulation thing is new to me. I'm simply experimenting around with it to see where it
will take me. "Can I write an emulator?" That's the question that runs through my mind sometimes.
And I'm a firm believer that if you put your mind to something, you can do anything. Emulation
is a real low-level and technical type of programming. I think programming an emulator is fun
because it stimulates my brain, it improves my abilities as an overall programmer, and it's
fun and fascinating to actually be simulating the tasks of a real system. I like to problem solve,
believe it or not, and the exploration into these other fields of programming that I'm unfamiliar
with is fun to me. And with these new things that I'm learning every day, I take those ideas and
piece them together with my old methods, comparing them, mixing the old code together, etc. With all this
talk of pointers, structures, unions, constants, assembly and processor opcodes, memory addressing,
I've learned C a hell of a lot better and faster within these past couple months than I did about
a year ago. And with these techniques that I've learned in C, I'm finding that they're easy to apply
to other languages because they're all based on assembly or methods performed by the processor
every day. That's why before I was like, "I want to write it in VB," because I've had previous
experience with VB. So, I'm more comfortable and flexible in that language. I've written game
demos and applications such as Donkey Konger and a little text editor. VB uses all these
prearranged functions and libraries that allow you to do extremely complicated things in just
a few lines of code. When I tried to study C, I hated it at first because there's absolutely nothing
available in C in terms of multimedia stuff (not unless you were to get new libraries and headers).
"What were these projects? Why do I need these include files? Why doesn't this thing lemme
know if my code is right or wrong before I have to compile?" I was used to

Print "Hello"

not

#include <stdio.h>

int main(void)
{
    printf("Hello\n");
    return 0;
}

But, ever since I got into emulator programming I've learned so many new aspects to computers.
It's given me a whole new light on programming and programming games. Okay, I'll shut up now.

Chris :o)
 
  • Joined: 12 Jul 1999
  • Posts: 891
That's what I was on about! *nt*
Post Posted: Thu Nov 18, 1999 1:41 am
  • Joined: 12 Jul 1999
  • Posts: 891
Yeesh!
Post Posted: Thu Nov 18, 1999 1:56 am
Quote
> How 'visual' is Visual Basic, anyway? I've never had the *ahem* privilege to use it.

Hahaha! Very 'Visual' and not very 'Basic'.

I'm not actually going to write an emulator or translator or interpreter or whatever else, I only ever read this messageboard because it's interesting to read these posts.
I was only seeing if a Genesis ROM -> x86 converter was possible in any way. Apparently, it's not, so I'll give it a rest now, mmkay?
~unfnknblvbl
  • Joined: 12 Jul 1999
  • Posts: 891
Re: Dynarec
Post Posted: Thu Nov 18, 1999 7:24 am
Quote
> I'd have to agree. If an SMS emulator doesn't run full speed on any computer sold as 'new' in the last four years, the bottleneck is probably somewhere other than the z80 core, unless it's terribly poorly written.


Then why won't Meka or BRSMS run at 60FPS on my P133 w/32MB RAM?
KGen98 and Genecyst both run at 60FPS on my setup.
My setup:
BRAND/MODEL: Compaq Presario 7240
CPU: Intel Pentium 133
RAM: 32MB 72-pin EDO
VIDEO CARD: S3 Trio64v+PCI
SOUND CARD: ESS AudioDrive 1788

Sorry, but I cannot help but rant at how crap my computer is.
It runs Quake and Quake II (kind of) for Christ's sake!
How come it won't run the SMS emulator of my choice at 100% speed?
Sorry,
~unfnknblvbl
Nyef
  • Guest
Re: This is something I've pondered
Post Posted: Thu Nov 18, 1999 1:44 pm
Quote
> > > :o|

> > Well said. That's just about my reaction, too. :-/

> The funny thing is, I didn't write that. It's Chris's, I forgot to snip it out when I was quoting.

But you snipped one of the brockets, so it looked like it was yours. That is kinda amusing, at that. :-)

Quote
> > If the CPU core is an interpreter, then a dynarec core is a JIT compiler. And feel free to call each component a "simulator".

> This will set me free.

It's too bad that freedom is so restrictive...

Quote
> > > The CPU's task in a console system is a lot more complicated than just emitting a 320 x 240 array of pixel data to convert to the final image on the television.

> > And in case you're wondering, somewhere out there is a large (very large) text file exploring the possibility of this sort of NES->EXE translation, and it doesn't look pretty.

> The sound is pretty gruesome. Although I recall (I really haven't poked around the NES much, mind you) that graphics data is stored in a separate bank of ROM, not directly accessible by the CPU anyway... so I suppose one could get away with translating all that into an easier-to-use graphics format.

It varies. Some games have the graphics data in a separate address space, but that tends to be bankswitched, so you then have fun with pointers to keep the right data where you can get at it. Some games have RAM in that space, and use the CPU to copy data out of the CPU space into it.

Quote
> But then an emulator could do that at load time too.

True, and I will be fixing DarcNES to do this before the next release. :-)

Quote
> > > > This may be the long way about it, but I think the result would be faster in the long run. Maybe even a Genesis game could then be run at full speed on a 486.

> I just realized I forgot to respond to that... can't a 486 run a Genesis game at full speed, at least with the sound at low quality? I suspect a fast enough 486 with a ***GOOD VIDEO CARD*** could do it, in DOS mode.

One of the 100+ MHz 486 systems with PCI slots could possibly do it. Probably couldn't do the sound, though.

Quote
> When I first got this P133 I'm typing on (second hand, an old, modified point-of-sale machine), I couldn't even run a NES emulator at less than 1/5 frameskip... In fact, almost any game or emulator I ran on it performed horribly, as did every DOS VGA program I was writing... Eventually I realized the machine still had an ISA VGA card in it.
> Dear god. It was so slow, you couldn't copy a full mode 13h bitmap to it at full speed.
> I replaced it with a $30 PCI card from Creative Labs, and I never looked back.

That had to suck.

Quote
> > And even if there is a 1:1 correlation of Z80/6502->x86 instructions, the file would be larger. Proving this is left as an exercise for the reader (especially the Z80 case, since I'm not 100% sure about it).

> Well, there would have to be more x86 instructions than Z80 instructions... you'd have to translate an instruction at every byte, even if some bytes are part of multibyte instructions.

Not necessarily.

Quote
> For instance, consider:
> $0e $1c
> the compiler might see that and assume it's "ld c,$1c"
> but perhaps, elsewhere in the program, the cpu branches directly to the $1c, demonstrating that the $1c is actually 'inc e', and $0e is part of another instruction, or data, or unused junk.

This would only happen if it was an indirect jump. The compiler should catch all direct jumps like that (unless they themselves were the target of an indirect jump).

Quote
> no I can see all kinds of problems cropping up.

The trick is to have a fallback position of an interpretive core in the final executable. Or in a DLL.
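
One way to picture the fallback: the translator records every entry point it managed to translate, and any other jump target (typically one reached through an indirect jump) gets handed to the interpretive core. The sketch below is only an assumption about how such a lookup might be laid out. The names entry_point, block_id, resolve_jump and USE_INTERPRETER are hypothetical, and a real translator would store code pointers rather than integer IDs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One statically translated entry point (hypothetical layout). */
typedef struct {
    uint16_t z80_addr;   /* address of the entry point in the original ROM */
    int      block_id;   /* index of the translated native block */
} entry_point;

#define USE_INTERPRETER (-1)   /* no translated block starts at this address */

/* Binary search over a table sorted by z80_addr. Returns the block to run,
   or USE_INTERPRETER when the target was never translated and the
   interpretive core has to take over. */
int resolve_jump(const entry_point *tab, size_t n, uint16_t target)
{
    assert(tab != NULL || n == 0);

    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (tab[mid].z80_addr == target)
            return tab[mid].block_id;
        if (tab[mid].z80_addr < target)
            lo = mid + 1;
        else
            hi = mid;
    }
    return USE_INTERPRETER;
}
```

With a table holding entries for $0000, $0038 and $0066, a jump to $0038 resolves to its translated block, while a jump to $0001 (the middle of an instruction, say) comes back as USE_INTERPRETER.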

Quote
> > > And third... you still need emulators for all the other special purpose chips in the genesis... and there's a lot of work there: tile drawing, dual-plane scrolling, sprite hardware, collision detection, raster interrupts, FM sound, controller input, fm synthesis, psg sound, pcm sound... I'm sure I'm forgetting a lot, too.

> > No, that's a pretty complete list. You have FM synth in there twice, though.

> I -really- like to emphasize good emulator sound.

Then you probably wouldn't like DarcNES in NES mode. The sound support was written before all the
good sound info came out, and I haven't really touched it since.

SMS sound in DarcNES, on the other hand, is rather nice (except for the noise channel, I'm not happy
with it yet).

Quote
> > This is also where most of the emulation time will tend to be spent.

> Indubitably.

Note "tend to". There is at least one case on record where someone using a dynarec core had sped
up his video code to the point where the CPU core was the bottleneck. His emulator is currently doing
990+ FPS on a Celeron 433 (the test was rigged, however. The CPU is in an infinite loop, and the screen
is a static image).

Quote
> > > A better paradigm for console systems is to think of the CPU(s) as orchestrating the behavior of several dedicated sub-systems that operate at the same time as a CPU.

> > Perhaps that's a better paradigm for when you're programming them, but when you're emulating sometimes other paradigms work just as well or better. For example, most timing constraints are posed by the video system, so why not let that run the show?

> Well, I meant it's a better paradigm for understanding what emulators do.

For the layperson, sure. But once you are actually writing one, you may wish to search for better ones.
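
A rough sketch of letting the video timing run the show: the frame loop walks the scanlines and hands the CPU core a fixed cycle budget per line. The constants are approximate NTSC SMS figures (262 lines per frame, 192 visible, about 228 Z80 cycles per line), and the machine struct, cpu_run and video_draw_line are placeholder stubs that only count what they are asked to do. None of this is taken from a real emulator.

```c
#include <assert.h>
#include <stddef.h>

enum {
    LINES_PER_FRAME = 262,   /* assumed NTSC line count */
    VISIBLE_LINES   = 192,   /* assumed height of the active display */
    CYCLES_PER_LINE = 228    /* approximate Z80 cycles per scanline */
};

/* Bookkeeping only; a real machine would hold CPU and VDP state here. */
typedef struct {
    long cpu_cycles;    /* total cycles handed to the CPU core */
    int  lines_drawn;   /* visible scanlines rendered */
    int  vblank_irqs;   /* frame interrupts raised */
} machine;

/* Stub: a real core would execute instructions until the budget is spent. */
static void cpu_run(machine *m, int cycles) { m->cpu_cycles += cycles; }

/* Stub: a real renderer would draw tiles and sprites for this line. */
static void video_draw_line(machine *m, int line) { (void)line; m->lines_drawn++; }

/* The video system drives the frame; the CPU just gets a slice per line. */
void run_frame(machine *m)
{
    assert(m != NULL);
    for (int line = 0; line < LINES_PER_FRAME; line++) {
        if (line < VISIBLE_LINES)
            video_draw_line(m, line);
        if (line == VISIBLE_LINES)
            m->vblank_irqs++;          /* frame interrupt after the last visible line */
        cpu_run(m, CYCLES_PER_LINE);
    }
}
```

After one call to run_frame on a zeroed machine, the counters show 192 lines drawn, one frame interrupt, and 262 * 228 cycles given to the CPU.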

Quote
> Perhaps I am misusing the word 'paradigm'.

Perhaps we both are. Maybe 'metaphor' would be a more appropriate word. :-)

Quote
> I still wouldn't write off the value of a well-written CPU core, particularly in assembly... Again, the sole Z80 in an SMS doesn't take much of a bite out of speed, but I've noticed some major speedups in multi-CPU games when MAME has switched from a portable C core to an ASM core for a given processor, as well as major slowdowns when they've had to switch back after finding new bugs.

Certainly not. In many cases an ASM core is critical. But for what we're dealing with here, a C core is quite acceptable.

Quote
> Well I suppose a recompiler could be done in Visual Basic, if it could be done at all. I wouldn't recommend it.

I'm not so sure. A recompiler requires procedure pointers at the very least (does VB even have these?). And
the x86 code structure is so chaotic that you can't just use a simple array to hold the instructions.

And to make matters worse, I can't think of how to do a computed GOSUB in VB. All _real_ basic
interpreters support a computed GOSUB. And self-modifying code. And run on 6502 systems (okay,
that's probably pushing it, but you can see what I mean).
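
For comparison, this is roughly what procedure pointers give you in C: a table indexed by opcode, which is about as close to a computed GOSUB as C gets. It is an interpreter-style dispatch rather than a recompiler, flags are ignored entirely, and cpu_state, op_handler and cpu_step are names invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Tiny slice of Z80 state, just enough for the three opcodes below. */
typedef struct {
    uint8_t  c, e;
    uint16_t pc;
} cpu_state;

typedef void (*op_handler)(cpu_state *s, const uint8_t *mem);

/* Handlers run with pc already pointing past the opcode byte.
   No bounds check on operands and no flag updates: this is a sketch. */
static void op_nop(cpu_state *s, const uint8_t *mem)    { (void)s; (void)mem; }
static void op_ld_c_n(cpu_state *s, const uint8_t *mem) { s->c = mem[s->pc++]; }
static void op_inc_e(cpu_state *s, const uint8_t *mem)  { (void)mem; s->e++; }

/* Opcode -> handler; unlisted entries stay NULL (unimplemented). */
static const op_handler handlers[256] = {
    [0x00] = op_nop,
    [0x0E] = op_ld_c_n,
    [0x1C] = op_inc_e,
};

/* Execute one instruction. Returns 1 on success, 0 if the opcode has no
   handler (pc is left pointing at the offending byte in that case). */
int cpu_step(cpu_state *s, const uint8_t *mem, size_t size)
{
    assert(s != NULL && mem != NULL && s->pc < size);

    op_handler h = handlers[mem[s->pc]];
    if (h == NULL)
        return 0;
    s->pc++;
    h(s, mem);
    return 1;
}
```

A recompiler needs the same ability one level up: instead of calling a fixed handler per opcode, it calls through a pointer to code it generated itself.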

--Nyef
 
Nyef
  • Guest
Re: Dynarec
Post Posted: Thu Nov 18, 1999 2:04 pm
Quote
> > I'd have to agree. If an SMS emulator doesn't run full speed on any computer sold as 'new' in the last four years, the bottleneck is probably somewhere other than the z80 core, unless it's terribly poorly written.

> Then why won't Meka or BRSMS run at 60FPS on my P133 w/32MB RAM?

No idea. I think I know why DarcNES isn't running at 60FPS on my P133 w/64MB RAM, though. I've been
running it in 16bpp, and that requires an extra step to convert 8->16 bits for the entire buffer.
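
That extra step is typically a palette lookup per pixel. A minimal sketch, assuming an RGB565 target format (the actual 16bpp layout isn't stated here) and using the hypothetical names rgb565 and blit_8_to_16:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack 8-bit R, G, B into one RGB565 word (5 bits red, 6 green, 5 blue). */
uint16_t rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Expand an 8bpp indexed buffer to 16bpp through a precomputed palette.
   This touches every pixel once per frame, which is the cost that shows
   up when running in a 16bpp video mode. */
void blit_8_to_16(uint16_t *dst, const uint8_t *src, size_t pixels,
                  const uint16_t pal16[256])
{
    assert(dst != NULL && src != NULL && pal16 != NULL);
    for (size_t i = 0; i < pixels; i++)
        dst[i] = pal16[src[i]];
}
```

The palette only needs rebuilding when the emulated palette changes, so the per-frame cost is one table read and one 16-bit write per pixel. Rendering directly into a 16bpp buffer, or doing this loop in ASM, are the usual ways to shave it down.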

On the other hand, GG mode _is_ running at 60FPS under those conditions, so I'm pretty close to having
the speed I want. Just fix up a couple interfaces, and write some ASM code. :-)

Quote
> KGen98 and Genecyst both run at 60FPS on my setup.

Both of which use ASM extensively. 16-bit systems are the point where one has to break out the heavy
optimization weaponry. Extensive ASM use, dynarec, all the tricks apply.

Quote
> My setup:
> BRAND/MODEL: Compaq Presario 7240

This alone says something.

Quote
> CPU: Intel Pentium 133
> RAM: 32MB 72-pin EDO

Should be more than enough.

Quote
> VIDEO CARD: S3 Trio64v+PCI

Just about what I have...

Quote
> SOUND CARD: ESS AudioDrive 1788

> Sorry, but I cannot help but rant at how crap my computer is.

Gee... And to think I brag about using a computer that's on par with yours. :-)

Quote
> It runs Quake and Quake II (kind of) for Christ's sake!
> How come it won't run the SMS emulator of my choice at 100% speed?

Because they are written mainly in C, and compiled with GCC, and GCC can't optimize worth shit?

--Nyef
 