|
Author | Message |
---|---|
Chris
|
Multiple Instruction Paths for Pentium CPUs?
Posted: Tue Nov 16, 1999 2:10 am
|
I was reading a bizarre article on the Pentium family of processors, and it said that the Pentium, even though it is a single processor, contains dual (u and v) pipelines for instructions. It then went into further detail on the newer Pentium IIs and how they contain an updated quad (u, v, y, z) set of pipelines for handling instructions. It said that these pipelines could simultaneously handle instructions from different sources of memory, including files and blocks of allocated memory. So you could tell u to follow all of the instructions from the 0x0000 portion of memory and tell v to handle the instructions found in the 0x6000 block of memory, and the two pipelines would execute their instructions independently.
So, I got to brainstorming. I figured that if these newer Pentiums harness this type of processing power, why aren't some emulators built around these hidden pipelines? This could prove most useful for arcade machines or even the Sega Genesis, because these systems use more than one processor. So, if you could get these pipelines to independently emulate the different regions of the board (main CPU, sound CPU, etc.), it should greatly improve the speed of today's emulators, right? Instead of having the main CPU handle and emulate all aspects of the more advanced systems (which makes interpreted emulation slow), you could have each pipeline emulate a single chip and chain them to communicate together while performing individual tasks.
Imagine the System 16 architecture. It uses one 68000 processor and one Z80 processor for its sound. Now if you had the y pipeline emulating and performing the tasks of the 68000, and the z pipeline emulating the Z80 and sending the data to your sound card or emulated Yamaha sound chip, and they were both co-dependent on each other (like the real system), wouldn't this greatly improve speed? Chris :o) |
|
|
Posted: Tue Nov 16, 1999 2:50 am |
This is really hard to explain, but the basic concept is that instead of getting one instruction and doing the appropriate actions, the CPU grabs two and executes them in parallel. These instructions cannot come from different areas, only from one region. (i.e., in a group of three instructions, the u pipe would do the first, the v pipe would do the second, and back to the u pipe for the third). Needless to say there are many rules about what instructions can be executed at the same time, especially if an instruction relies on the results of a previous one. Your idea about emulating different stuff at the same time is perfectly valid for multi-CPU systems. If you want some nitty-gritty info on pipelines, check some of these links out: http://www.x86.org http://developer.intel.com http://www.sandpile.org |
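To make the pairing idea above concrete, here is a minimal Python sketch. It is a deliberately simplified model, not the Pentium's actual pairing rules: it assumes two adjacent instructions can issue together only when the second neither reads nor writes a register that the first writes, and it ignores flags, instruction classes, and memory. The names `Instr`, `can_pair`, and `issue_cycles` are made up for illustration.

```python
from typing import FrozenSet, List, NamedTuple, Tuple


class Instr(NamedTuple):
    text: str                # assembly text, for display only
    reads: FrozenSet[str]    # registers the instruction reads
    writes: FrozenSet[str]   # registers the instruction writes


def ins(text, reads=(), writes=()):
    """Convenience constructor for the toy instruction record."""
    return Instr(text, frozenset(reads), frozenset(writes))


def can_pair(first: Instr, second: Instr) -> bool:
    # Simplified rule (an assumption): the second instruction must not
    # depend on, or overwrite, anything the first instruction writes.
    return not (first.writes & (second.reads | second.writes))


def issue_cycles(program: List[Instr]) -> List[Tuple[str, ...]]:
    """Walk ONE instruction stream in order, issuing one or two per cycle."""
    cycles = []
    i = 0
    while i < len(program):
        if i + 1 < len(program) and can_pair(program[i], program[i + 1]):
            cycles.append((program[i].text, program[i + 1].text))  # u pipe, v pipe
            i += 2
        else:
            cycles.append((program[i].text,))                      # u pipe only
            i += 1
    return cycles


program = [
    ins("mov eax, [esi]", reads={"esi"}, writes={"eax"}),
    ins("mov ebx, [edi]", reads={"edi"}, writes={"ebx"}),
    ins("add eax, ebx", reads={"eax", "ebx"}, writes={"eax"}),
    ins("inc eax", reads={"eax"}, writes={"eax"}),
]

for cycle in issue_cycles(program):
    print(" | ".join(cycle))
```

Both pipes draw from the same in-order stream; there is no way in this model (or on the real chip) to point the second pipe at a different region of memory.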
|
Nyef
|
Posted: Tue Nov 16, 1999 3:07 am |
Some are, just not the way you envision. And some of the fast ones just rely on the compiler to take advantage of them. There's one NES emulator that managed to get (in an obviously unfair test) 992 FPS on a Celeron 433. That works out to something on the order of 16 x86 cycles to a 6502 cycle. And that's not counting graphics. Actual games on that setup run at more like 600 FPS or so, but still. You simply can't do that without that kind of pipelining. Of course, this simply pisses me off, since my emulator can't even get 60 FPS on my p133 (except for GG games). Hopefully this will change soon.
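The cycle budget quoted above can be checked with a little arithmetic. This sketch assumes an NTSC NES (CPU clock of about 1.79 MHz) and a nominal 60 frames per second; both figures are assumptions brought in for the estimate, not numbers from the post.

```python
NES_CPU_HZ = 1_789_773    # NTSC NES 6502 clock in Hz (assumed)
NES_FPS = 60              # nominal NTSC frame rate, rounded (assumed)
HOST_HZ = 433_000_000     # Celeron 433
EMULATED_FPS = 992        # benchmark figure quoted in the post

# 6502 cycles the emulator must get through for one emulated frame
cycles_per_frame = NES_CPU_HZ / NES_FPS

# 6502 cycles emulated per real second at the benchmark frame rate
emulated_6502_hz = cycles_per_frame * EMULATED_FPS

# Host clock cycles available for each emulated 6502 cycle
host_cycles_per_6502_cycle = HOST_HZ / emulated_6502_hz

print(round(host_cycles_per_6502_cycle, 1))
```

Under these assumptions the budget comes out at roughly 15 host cycles per 6502 cycle, the same ballpark as the figure in the post.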
Too bad the Pentium doesn't work that way, right? And the synchronization problems would be a bitch and a half even if it did. How about just switching to dynarec? Even for the 8-bit systems that can give a nice speed boost.
--Nyef |
|
Eric
|
Posted: Tue Nov 16, 1999 4:30 pm |
Your thoughts have already been responded to (by Nyef and Charles Mac Donald) but I thought I'd add my two cents, and fill in some other information:
This is correct.
This is almost correct. The Pentium Pro, Pentium IIs, Celerons, and Pentium IIIs are all out-of-order machines. This basically means that the processor fetches a group of instructions and executes them in any order (subject to data dependency limitations), as soon as the resources they need (such as the floating-point unit) are available. There are five execution paths on the Pentium Pro and Pentium IIs (I don't know about the Pentium IIIs). These paths are not pipelines in the traditional sense. It's better to think of these processors as having a single pipeline that splits five ways during the execution stage. It should also be noted that these five paths are not each able to execute any instruction: four of the five are special-purpose paths, and can only be used if there happens to be the right mix of instructions in the buffer.
Unfortunately, this is not true. All of these processors fetch instructions from a single location in memory, examine them in groups, and execute as many as possible (either with dual-pipes, like the Pentium, or with advanced data-flow analysis, and out-of-order execution, like the Pentium Pros, etc.)
As Charles Mac Donald said, this idea is only feasible on multiprocessor systems. So then, how do you write an emulator to get the most speed out of these processors? Well, unfortunately, you have to decide which processor you want to target. For the Pentium, the trick is finding pairs of instructions which can be executed together (one in the U pipe, and one in the V pipe). There are all kinds of resources available on this subject. For the P6 family (Pentium Pro, Pentium II, Celeron, Pentium III) the trick, again, is to group instructions so that all five execution paths stay busy as much as possible. (This is extremely difficult without an intimate knowledge of the micro-architecture.) There are many other issues involved with optimizing programs for these processors. If you're curious, see the following web-site: http://developer.intel.com/vtune/cbts/refman.htm. Remember, though, that your choice of algorithm will ALWAYS have a greater effect on your program's performance than finely tuning how you group instructions. Eric |
|
Nyef
|
Posted: Tue Nov 16, 1999 6:58 pm |
This, combined with using a p133 (non-MMX, thank you) system as my main dev platform and a portability requirement, is why I have been reducing the amount of ASM actually used in DarcNES. Which reminds me, I should probably remove the ASM CPU cores from the main distribution, they aren't used anymore. :-) --Nyef |
|
Chris
|
Dynarec
Posted: Wed Nov 17, 1999 3:40 am
|
I'm seriously considering getting involved with this whole dynamic recompilation method of emulating systems, because the interpretation method stinks: it takes way too much out of any PC. It makes sense, too.
Envision a guy named Mr. Speedy who is 10 times faster than a human at doing anything. He can run faster than your eyes can see, he can speak faster than your ears can respond, and he is extremely intelligent. One day, Mr. Speedy is watching an old movie that shows a band playing a song live. The band is composed of one singer, one dancer, one drummer, and one pianist; a total of 4 band members. They're all independently performing their own tasks, such as the pianist reading his notes from his sheet music and playing them and the dancer shaking his groove thang to the beautiful tones of the singer, yet they are all controlled and maintained by the rhythmic beats of the drummer. Mr. Speedy, who is (keep in mind) faster than the average human, thinks that he's better than the band just because he's faster. So he decides to compete at a local talent show as a "one man band", and he's going to perform the live song done by the band on the video tape.
So, the night of the talent show, Mr. Speedy prepares himself by watching the video again, just before they call him up on stage. They announce his name and he steps up onto the stage. All of the instruments needed to play the song are laid out around him, leaving Mr. Speedy in the center. So, he dashes over to the piano, quickly reads some notes, figures them out and plays 1 chord on the piano, then he dashes over to the drum and plays one bass kick, then he hauls ass over to the position of the dancer and poses (just enough to let the audience see him), and finally he runs to the position of the singer, sings a note or two, and poses again (just enough to let the audience see him).
Remember that Mr. Speedy is 10 times faster than the average human, so if he were to mimic and physically perform all of the actions of the band at his own pace, it would be one big blur and the music would sound like one big mess. So little by little, he has to perform the tasks of the band, while at the same time pausing for certain numbers of milliseconds to allow the audience to observe and listen to what is going on. At this pace, the audience sees the illusion (animation) of 4 Mr. Speedys playing the song as a live band. Most of the people watching this show are shocked and entertained as this one person performs the actions of the live band. But some members of the audience (the gurus and fans of the band) are not happy with Mr. Speedy's routine. They complain that his dance just isn't right, and the notes that he plays on the piano are just a little off, and the drumming isn't up to par, and so on.
Interpretation just stinks when it comes to emulation sometimes. And I know what you're thinking: "Well, re-compilation isn't going to be as accurate either." True, you are right. But at least this method will solve the speed issue. It's a shame that a system such as the Playstation, which runs at 60 MHz (I think), cannot run at full frameskip on my Celeron (Pentium II/MMX) 366 MHz. Chris :o| |
|
|
This is something I've pondered
Posted: Wed Nov 17, 1999 8:45 am
|
I can see your point. I've often thought that an 'emulator' should be called an 'interpreter', as basically that's what they do, isn't it? They take the instructions from the Z80 or 68000 or whatever and 'interpret' them for your x86-based machine. I've often wanted to know if it was possible for somebody to make a program that takes the ROM, translates all of the instructions into their x86 equivalents and then compiles an x86 executable file from the resulting data. This may be the long way about it, but I think the result would be faster in the long run. Maybe even a Genesis game could then be run at full speed on a 486. Is this even feasible? If so, then it could even be done in Visual Basic, methinks. Just pondering, ~unfnknblvbl |
|
Nyef
|
Re: Dynarec
Posted: Wed Nov 17, 1999 1:30 pm
|
If you haven't written an emulator using an interpretive core, you have no business messing with dynarec. Especially you, Chris. I've yet to see evidence that you even know what's involved in writing an interpretive core, let alone a dynarec one. I want emulation speed, too. But only to a point. And right now writing a dynarec core is the last thing I would consider for speed (I have several things I can do to my graphics renderers to speed things up, there are several optimizations I can do to the interpretive cores I'm using, there's a couple spots where the compiler I'm using is generating code that is less than optimal (movzx on a p133 when the top 3 bytes are going to be 0 anyway _sucks_)).
No, I'm thinking "Chris is talking out his ass again". Dynarec is an impressive technique, but it's not exactly worthwhile when dealing with single processor 8-bit systems like we are here. And it's not simple in the least. If one can get 60 FPS on a p133 or p120 using an interpretive core, then what point dynarec? Now, if you _can't_ get 60 FPS on a p133 (this takes a "16-bit" system, 8-bit systems can hit 60 using mainly C code and only a smattering of ASM (for when the compiler is being stupid)), then using dynarec is more reasonable. All this assumes that you have a p133 or p120 to test with. Optimizing for more than 60 FPS on your target platform (which you had damned well better have on your desk) is a waste of time.
With a dynarec core, no less (all PSX emulators that I know of use dynarec). The CPU core isn't everything (and to think that the r3000 is actually a pretty simple CPU that takes to dynarec quite nicely, too). So dynarec doesn't automatically solve the "speed issue". And I believe the PSX uses a 25 MHz clock. And wouldn't "full frameskip" mean that none of the frames are displayed? :-)
--Nyef |
|
|
Re: Dynarec
Posted: Wed Nov 17, 1999 3:53 pm
|
I am a little concerned that Chris is biting off more than he can chew. Although writing a z80 interpreter (emulator) isn't unfathomably difficult (the ideas behind it aren't too hard to grasp, getting all the flag behavior and such down takes some time however), a prospective author should at least be fluent in their language of choice (C..) as well as the target processor (has he written much z80 code?)
Um.. why wouldn't it be accurate? Assuming there were no bugs in the recompiler (of course)
I could see dynamic recompilation being handy for writing emulators for more limited platforms (sms emulators for, say... Digita Cameras, Playstations (does playstation mastergear get 60 fps?)).
I'd have to agree. If an SMS emulator doesn't run full speed on any computer sold as 'new' in the last four years, the bottleneck is probably somewhere other than the z80 core, unless it's terribly poorly written.
Why should it? The Playstation has quite a few tricks up its old, tattered sleeve, after all: a separate coprocessor for doing geometry transforms and lighting effects (we PC users won't have anything like that until the GeForce256 cards are out), other chips which can move data, draw sprites, rasterize texturemapped triangles, a fairly complicated sound system (24 channels of compressed audio with DSP effects).. you can see that there's a lot for your poor Celeron to deal with besides the little ol' r3000.
Very nicely. No status flags to deal with, just an array of 32 longs for the CPU regs (and one is always zero), a very small instruction set, and that branch-delay weirdness ( I guess one need only stick the branch after the delayed instruction in the recompiled core.)
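As a rough illustration of why such a register file is easy to model, here is a small Python sketch. It assumes only what the paragraph above says (32 general registers, one of them always zero, no status flags) plus the standard behaviour of a wrapping 32-bit add; the class and function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # registers are 32 bits wide


class R3000Regs:
    """Toy model of the general-purpose register file."""

    def __init__(self):
        self.r = [0] * 32   # r[0] is the always-zero register

    def read(self, n):
        return self.r[n]

    def write(self, n, value):
        # Writes to register 0 are silently discarded, so it stays zero.
        if n != 0:
            self.r[n] = value & MASK32


def addu(regs, rd, rs, rt):
    """rd = rs + rt, wrapping at 32 bits; no flags to update afterwards."""
    regs.write(rd, regs.read(rs) + regs.read(rt))


regs = R3000Regs()
regs.write(1, 0xFFFFFFFF)
regs.write(2, 1)
addu(regs, 3, 1, 2)   # wraps around to 0
addu(regs, 0, 1, 2)   # result is thrown away, register 0 stays 0
print(regs.read(3), regs.read(0))
```

Compare this with a Z80 or 68000 core, where nearly every arithmetic instruction also has to compute several flag bits.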
It's a 33 MHz R3000A. I've heard of a few strange fellows overclocking theirs to 40 MHz, so Gran Turismo wouldn't drop frames anymore. In case anyone asks, the N64 uses a 90 MHz r4000.. hey, it really -is- 64-bit!
As opposed to partial frameskip, where only the top half of the screen is drawn. |
|
|
Re: This is something I've pondered
Posted: Wed Nov 17, 1999 4:22 pm
|
:o|
Well the processor emulator is an interpreter... I don't think that word would apply to the emulation of the other aspects of the hardware... in a sense, yes, but 'emulator' seems like a better match.
Like Chris, you're assigning too much importance to the CPU and not enough to the dedicated hardware that makes up every game system. A Neo-Geo and a Genesis have the same processors in common (a 68000 and a z80), but there's not much else in them that is the same. The CPU's task in a console system is a lot more complicated than just emitting a 320 x 240 array of pixel data to convert to the final image on the television.
There have been a few who've worked in this direction... I think there was even an NES emulator that let you 'compile' an .nes file into a playable, stand-alone executable for a PC (I think it just appended the .nes file onto a precompiled nes emulator, though). This really isn't the best way to go.
First of all, remember that all data and cpu code look the same to the compiler and the genesis (bytes is bytes is bytes, after all...)... The emulator knows nothing of the behavior of the cart until it is run... it would have to assume any data addressable by the cpu could in fact be a CPU instruction, because the CPU could conceivably branch anywhere in cartridge memory... You'd end up with an executable several times larger than the original file... before you add the original data back to it.
Second.. the genesis has two CPU's... most PC's have just one.
And third... you still need emulators for all the other special purpose chips in the genesis... and there's a lot of work there: tile drawing, dual-plane scrolling, sprite hardware, collision detection, raster interrupts, FM sound, controller input, fm synthesis, psg sound, pcm sound... I'm sure I'm forgetting a lot, too. These support chips do not run 'programs'... their behavior is implicit in their design. They only receive data from the processor and send data back (in some instances). And most of these hardware functions cannot be immediately mapped to your PC's hardware (which does not have sprites, tile mapped modes, collision detection, psg sound, may not have an FM chip, etc...). So your PC must interpret them, convert graphics data on the fly to a format that your video card can understand... process and mix sound to send it to your sound card, translate input from your keyboard or game pad and convert it into a format the genesis program will understand, etc, etc...
A better paradigm for console systems is to think of the CPU(s) as orchestrating the behavior of several dedicated sub-systems that operate at the same time as a CPU. As you can imagine, that's a lot of work for the emulator to do, and it also has the job of making sure the timing and communication between the emulated hardware is accurate. And, in fact, most emulators don't spend all that much time on CPU emulation... even genesis emulators.
|
|
Eric
|
Re: This is something I've pondered
Posted: Wed Nov 17, 1999 4:47 pm
|
This idea has been floating around for a while. For lack of a better term, I believe it's called Static-Recompilation, meaning the entire program is recompiled before execution. There are two difficulties: self-modifying code, and data. Static-Recompilation will (almost) never work with a program that contains self-modifying code, unless all the modifications are known in advance and a table of the proper code sequences can be created. The other problem is data. It's necessary to know where in the ROM data is being stored so it isn't recompiled. One naive solution to this would be to recompile the whole ROM, and also keep a complete duplicate of the whole ROM as pure data. This, however, can result in a significant waste of space.
Finally, a note to Chris: "DynaRec" is one of emulation's biggest "buzzwords": everyone has heard it, and very few know what it means (though many on this message board do). I'd like to dispel a myth about dynamic recompilation right now: It does not in ANY way guarantee better performance. The choice to use dynamic-recompilation in your emulator should be based on a thorough study of the programs the emulator will run. Only then will you know whether dynamic recompilation is suitable for your emulator. The principle of dynamic-recompilation is to avoid the overhead of repeatedly translating (interpreting) instructions. It accomplishes this nicely, but at what cost? Here are some of the issues you should examine if considering dynamic-recompilation:
Do you know what a "Basic Block" is?
Once you've translated a block of instructions, where do you store them?
How do you retrieve the translation at a later point, if the program jumps back to that code?
How big should the "cache" of translated instruction blocks be?
The act of translating a block and storing it (tagging its location so you can find the block later) takes longer than simply interpreting it, plus it's not available for execution until the whole block is done. If the emulator encounters a previously untranslated block in a particularly timing-sensitive area, your emulator may not run smoothly. Is this acceptable? How will you handle this?
Now, the above are issues involved with the most fundamental dyna-rec approach. The REAL benefit of dynamic-recompilation, however, comes from the ability to examine the translated instruction blocks and perform optimizations on them. For example, if your emulator ran on x86 machines, you could then perform x86 optimizations on the translated code. Understand, though, that this optimization takes time, and the time you shave off the code had better, over the course of running the ROM, be more than the time it took you to optimize the code. This kind of optimization brings forward issues all its own.
Basically, my point to all this is: Dynamic-Recompilation has its place, but it should only be attempted by the most savvy emulator authors. A naive dyna-rec scheme will almost always result in lower performance than an interpretive approach. You make up your own mind, and don't believe everything you hear about dynamic-recompilation. There's a lot of misinformation out there. However, this message board is graced with a lot of very competent people; you may want to listen to them (and no, I'm not talking about myself :-). Good luck. Eric |
|
Nyef
|
Re: This is something I've pondered
Posted: Wed Nov 17, 1999 5:05 pm
|
Well said. That's just about my reaction, too. :-/
If the CPU core is an interpreter, then a dynarec core is a JIT compiler. And feel free to call each component a "simulator".
And in case you're wondering, somewhere out there is a large (very large) text file exploring the possibility of this sort of NES->EXE translation, and it doesn't look pretty.
This wouldn't really surprise me. Actually doing a proper job of it would take some really heavy analysis.
And even if there is a 1:1 correlation of Z80/6502->x86 instructions, the file would be larger. Proving this is left as an exercise for the reader (especially the Z80 case, since I'm not 100% sure about it).
For which favor, much thanks.
No, that's a pretty complete list. You have FM synth in there twice, though. If you want the 32x or SCD you have more, though.
Which is a very good argument for not calling them "interpreters". This is also where most of the emulation time will tend to be spent.
Most speedups here involve doing work up front rather than when the result is needed, and using as many space for time tradeoffs as possible.
Perhaps that's a better paradigm for when you're programming them, but when you're emulating, sometimes other paradigms work just as well or better. For example, most timing constraints are posed by the video system, so why not let that run the show?
That reminds me. I still haven't fixed my SMS PSG to use the cycle counts from the CPU to handle register write timing...
Which is why we still use C CPU cores... even genesis emulators. Genesis/SNES is about the complexity level that dynarec and such actually start paying off at. Especially on slower computers.
Nope, so sorry. Try doing the translation on the fly.
Nope, so sorry. Try using either ASM, C, or even Delphi. --Nyef |
|
Nyef
|
Re: Dynarec
Posted: Wed Nov 17, 1999 5:36 pm
|
Fluency with the target processor is by no means required for an experienced coder. Chris manifestly is not such.
The only scenarios that come to mind involve execution from I/O space, and I/O access that have to be perfectly emulated. Of course, most interpretive cores would fall down here too, but they at least have a fighting chance.
Okay, here it's probably worth it. For the N64, however, it probably isn't. I have no idea if it would be worth it for the Saturn, either.
And if it's Marat's core, then even being poorly written isn't enough of an excuse. It takes a special talent to write a CPU core that is so slow that it can't run at full speed on a p133 (not that I haven't seen it done). Marat doesn't have this talent (for which favor, much thanks).
Indeed. And don't forget the MDEC hardware. And there are other systems with these problems. The Saturn comes to mind...
Ease of implementation alone is enough to make me want to write a dynarec r3000 core. And the PSX is enough justification to do so... When I finish with everything else on my TODO list, that is. :-)
Fun. If it has an MMU, I should try porting Linux to it... :-)
I thought it was an r4400? But yes, it really is a 64-bit system.
Nah, every other scanline. And if you change which set of scanlines change each frame and draw them at 2/3 brightness, you can call it a feature. :-) --Nyef |
|
|
About that NES emulator (forgot its name)
Posted: Wed Nov 17, 1999 7:18 pm
|
It is exactly what it does. |
|
Chris
|
My Answers
Posted: Wed Nov 17, 1999 7:50 pm
|
Not exactly. But I'm assuming that it's a block of memory that contains instructions. I'm not sure.
You could probably save them to a file for later use (execution).
I don't really understand what you're trying to say. Are you talking about conditional jumps, such as not zero or equal to? If the program before the translation wants to perform a logical jump, then simply jump to that area (allocate a code segment and use an offset to simulate the program's memory). I dunno about the more complicated systems. I'm imagining the SG-1000 memory layout, because that system uses a very simple memory structure. When you get into bank switching and all that, then I get very confused.
I will honestly say that I don't know what a "cache" is. I know it's a place to store memory, but I don't understand its purpose. Why do some computers use cache (without RAM) or vice versa? Why do some computers use both? I really need some explanation on this one. I'm sorry but I don't know.
My God. Maybe I should've changed my wording from considering to thinking about. It's not like I'm breaking my neck and trying to write something like that right now. Hell, I'm still trying to get the damn Flags register in my interpretive Z80 engine to work correctly. I may not even get into re-compilation because I'm not ready for that type of challenge yet.
Well, if everything above is fundamental then I must be a fucking idiot, huh? :o(
And it is a place that is not mine, at least not as of right now.
At least you're one of the nicer gurus of the bunch. You take the time to enlighten me and get me to understand stuff. Everyone else here is like, "You retarded ass newbie fuckup! You don't know shit! I hate you! You suck! I'm an old fart genius and that's that!" Or maybe that's just the way I've been envisioning the sarcastic slams and insults.
Yeah, I'll try, Chris :o( |
|
Chris
|
Lemme Guess...
Posted: Wed Nov 17, 1999 7:57 pm
|
Nes-Lord?
Chris :o| |
|
Chris
|
Oh, the pain...
Posted: Wed Nov 17, 1999 8:06 pm
|
Arrgghh! I'm hit! (left crawling in a pool of blood) Damn, I thought emulation was supposed to be a hobby. Now I feel like I have something to prove. Oh well, give me some time to finish my little hobby project.
Good thing I got this little 366Mhz upgrade...
Oh, the bastard shoved his boot into my neck! I can barely breathe! Only my arms and mind remain... Thanks for the input Nyef C..h..r..i..s |o( |
|
Nyef
|
Re: My Answers
Posted: Wed Nov 17, 1999 8:29 pm
|
Not necessarily. You're just missing most of the background in compiler technology, program optimization, and computer architecture that several of us have. If you want references for most of these subjects, feel free to ask.
May I take this opportunity to apologise for some of my remarks? At least some of them were meant to be more along the lines of "Here there be dragons" and "there are better ways to obtain the speed you need" and "there is a lot you need to learn before trying this" than "you suck". I will admit that the "talking out your ass" bit was way out of line. On the other hand, your rather contrived example of "Mr. Speedy" was almost designed to piss me off. Anyway, I apologise for the tone of my remarks earlier and for overreacting the way I did.
--Nyef |
|
Eric
|
Re: My Answers
Posted: Wed Nov 17, 1999 9:12 pm
|
Honestly, I didn't expect you to answer those questions. They were just proposed for you to think about while you consider how you might implement a dynamic-recompiling emulator.
In any case, I'll respond:
Simply put, a basic block is a group of instructions that will always be executed together. In other words, there are no "jumps" into it (except to the first instruction) or out of it (except possibly from the last instruction). Technically, there's a little more to it. The significance of basic blocks to a dyna-rec emulator is that these are the units of code that it will typically examine for translation at one time. Your emulator would need to "parse" the ROM, and break up the code sections into basic blocks.
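A minimal sketch of splitting code into basic blocks, in Python. The instruction format here is a toy one invented for the example (a list of `(mnemonic, target)` tuples where jump targets are list indexes), not real Z80 encoding. It uses the usual "leader" rule: a block starts at the first instruction, at every jump target, and right after every jump or return.

```python
def find_basic_blocks(program):
    """Return a list of (start, end) index pairs, end exclusive."""
    if not program:
        return []
    leaders = {0}                       # the first instruction starts a block
    for i, (op, target) in enumerate(program):
        if op in ("jmp", "jz"):
            leaders.add(target)         # a jump target starts a block
            if i + 1 < len(program):
                leaders.add(i + 1)      # so does whatever follows the jump
        elif op == "ret" and i + 1 < len(program):
            leaders.add(i + 1)
    starts = sorted(leaders)
    ends = starts[1:] + [len(program)]
    return list(zip(starts, ends))


# Toy program: a countdown loop (hypothetical mnemonics, index targets).
PROGRAM = [
    ("ld_a_0", None),   # 0
    ("dec_b", None),    # 1  <- loop head, target of the jmp at index 4
    ("jz", 5),          # 2  exit the loop when b reaches zero
    ("inc_a", None),    # 3
    ("jmp", 1),         # 4
    ("ret", None),      # 5
]

print(find_basic_blocks(PROGRAM))
```

This sketch scans the whole program up front for simplicity. A dynamic recompiler can instead discover each block lazily, starting from the current program counter and stopping at the first jump, which sidesteps the problem of telling code from data.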
More than likely you will want to store the translated instructions in memory (a file will be too slow to access). This brings up another problem: remember that the translated instructions are data while they're being written, but at some later point you're actually going to execute out of that area. In some operating systems (not DOS, though) it can be difficult to execute out of areas that are considered "data."
Let's say the ROM contains a section of code it will use repeatedly throughout execution (let's say a routine that clears video memory). The first time the routine is called, your emulator will translate the ROM instructions into the native instructions of the computer the emulator is running on. You save the translation somewhere in memory (the "translation cache"). At some later point, the ROM needs to execute that routine again. Your emulator knows (somehow, yet another issue to consider) that it has been translated already, and the translation is in the "translation cache." But the question is, WHERE exactly in the cache is it? You need to come up with a scheme for quickly locating the translated block you're looking for.
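One possible lookup scheme, sketched in Python: key the cache on the emulated address of the block's first instruction. This is only an illustration under stated assumptions. A real dynarec would store host machine code rather than Python callables, and `TranslationCache` and `translate_block` are made-up names.

```python
class TranslationCache:
    """Maps an emulated start address to its already-translated block."""

    def __init__(self, translate):
        self._translate = translate   # function: start address -> callable block
        self._blocks = {}             # hash table keyed on the start address
        self.hits = 0
        self.misses = 0

    def lookup(self, pc):
        block = self._blocks.get(pc)
        if block is None:
            # First visit: pay the translation cost once and remember it.
            self.misses += 1
            block = self._translate(pc)
            self._blocks[pc] = block
        else:
            self.hits += 1
        return block

    def invalidate(self):
        # Crude answer to self-modifying code or bank switching: drop everything.
        self._blocks.clear()


def translate_block(pc):
    """Hypothetical stand-in for a translator; returns a callable 'block'."""
    def run(state):
        state["blocks_run"] = state.get("blocks_run", 0) + 1
        return pc   # pretend the block loops back to its own start
    return run


cache = TranslationCache(translate_block)
state = {}
pc = 0x0100
for _ in range(3):
    pc = cache.lookup(pc)(state)

print(cache.misses, cache.hits)
```

A dictionary keeps the lookup cheap, but it does nothing about the other questions raised above, such as how large the cache may grow or what to evict when it fills up.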
Don't apologize. Contrary to what many people on the internet believe, not knowing something is not a sin. A cache (in the computer world) is a small, fast area of memory. Whenever the processor needs something from main memory, it also stores it in a cache (usually located within the processor itself). If at some later point the processor needs that data again, it can read it from the fast cache, instead of going all the way to (slow) main memory. Many processors have at least 2 levels of cache ranging from about 16KB to 1MB. The purpose of a cache is to improve performance, since in many cases, once data is used, it is needed again and again in the immediate future. (Computer scientists call this "Temporal Locality." There's also "Spatial Locality", which governs how big caches need to be and other things, but that's way beyond the topic right now.)
Yes, again, my questions were just meant to get you thinking. I didn't expect you to answer them. They're just things you need to consider for dynamic recompilation. (By the way, let me know if you need help with the flags. I've written my own Z80 emulator, plus I've spent a lot of time studying them (and teaching a class in basic computer arithmetic)).
Not at all (and I hope I'm not implying that). Don't be afraid to challenge yourself with this stuff, but also know your limits.
Perhaps not now, these things take time.
Unfortunately, the internet can be very impersonal, and it's easy for others to be insulting. Hopefully, you won't get discouraged by this behavior. Good luck. Eric |
|
|
Re: Dynarec
Posted: Wed Nov 17, 1999 9:15 pm
|
Maybe not fluency.. one should at least have used some kind of assembly language to be familiar with how a computer functions at a CPU level. Jumping into z80 didn't take very long for me, since I had the background in assembly language for other processors (680x0, 6502, and a little intel). I think it'd be easier to grasp the concepts of how a CPU works by programming with it a little first. But the concepts that one learns dealing with one processor mostly apply to others.
Ach. Yes, if there needs to be a cycle-for-cycle integration I guess there's no way to interleave the behavior of two CPU's in recompiled code.
It might... if there's enough ram to spare. And maybe if you want to emulate something that used multiple z80's (I think arcade galaga used three or four...)
But at least
sure, sure. It doesn't, or else it would have been done by now (some of the same strange fellows are trying to write a TCP/IP stack for it as well... and hell, I was thinking of porting the citadel bbs software to Playstation once. The hell was I thinking?)
I think you're right.
And hey, I used to play my SMS on a TV with a pale white band running down the left side of the tube... I should emulate that 'feature' too, for old time's sake. Or Zoop should. ZOOP, GET ON IT! |
|
|
Re: This is something I've pondered
Posted: Wed Nov 17, 1999 9:49 pm
|
The funny thing is, I didn't write that. It's Chris's, I forgot to snip it out when I was quoting.
This will set me free.
The sound is pretty gruesome. Although I recall (I really haven't poked around the NES much, mind you) that graphics data is stored in a separate bank of ROM, not accessible directly by the CPU anyway.. so I suppose one could get away with translating all that into an easier to use graphics format. But then an emulator could do that at load time too.
I just realized I forgot to respond to that... can't a 486 run a genesis game at full speed, at least with the sound at low quality? I suspect a fast enough 486 with a ***GOOD VIDEO CARD*** could do it, in dos mode. When I first got this p133 I'm typing on (second hand, an old, modified Point of Sale machine), I couldn't even run a NES emulator at less than 1/5 frameskip... In fact, almost any game or emulator I ran through it performed horribly, as did every dos vga program I was writing... eventually I realized the machine still had an ISA VGA card in it. Dear god. It was so slow, you couldn't copy a full mode13h bitmap to it at full speed. I replaced it with a $30 PCI card from creative labs, and I never looked back.
And that's when one falls well into 'diminishing returns'.
Well, there would have to be more x86 instructions than z80 instructions... you'd have to translate an instruction at every byte, even if some bytes are part of multibyte instructions. For instance, consider: $0e $1c. The compiler might see that and assume it's "ld c,$1c", but perhaps, elsewhere in the program, the cpu branches directly to the $1c, demonstrating that the $1c is actually 'inc e', and $0e is part of another instruction, or data, or unused junk. No, I can see all kinds of problems cropping up.
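The $0e $1c ambiguity can be sketched in a few lines of C. This is only an illustration: `decode_at`, `z80_insn` and the enum names are hypothetical, and the decoder knows just the two opcodes from the example rather than the full Z80 instruction set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mini-decoder: it recognises only the two opcodes discussed. */
typedef enum { OP_LD_C_N, OP_INC_E, OP_UNKNOWN } z80_op;

typedef struct {
    z80_op  op;      /* which instruction was recognised */
    uint8_t imm;     /* immediate operand, if any */
    size_t  length;  /* number of bytes the instruction occupies */
} z80_insn;

/* Decode the instruction that starts at offset pc. The same bytes can
   decode differently depending on where decoding begins. */
z80_insn decode_at(const uint8_t *rom, size_t size, size_t pc)
{
    z80_insn insn = { OP_UNKNOWN, 0, 1 };

    assert(rom != NULL && pc < size);
    switch (rom[pc]) {
    case 0x0E:                   /* ld c,n: opcode plus one immediate byte */
        if (pc + 1 < size) {
            insn.op = OP_LD_C_N;
            insn.imm = rom[pc + 1];
            insn.length = 2;
        }
        break;
    case 0x1C:                   /* inc e: single-byte instruction */
        insn.op = OP_INC_E;
        break;
    default:
        break;
    }
    return insn;
}
```

Decoding the two bytes from offset 0 yields one two-byte "ld c,$1c", while decoding from offset 1 yields "inc e". A static translator cannot tell which entry point the program will actually use unless it can follow every jump.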
I -really- like to emphasize good emulator sound.
precisely.
Indubitably.
Well, I meant it's better paradigm for understanding what emulators do. Perhaps I am misusing the word 'paradigm'.
I still wouldn't write off the value of a well-written CPU core, particularly in assembly.. again, the sole z80 in an sms doesn't take much of a bite out of speed, but I've noticed some major speed ups in multi-cpu games when Mame has switched from a portable C core to an asm core for a given processor, as well as major slowdowns when they've had to switch back after finding new bugs.
Well, I suppose a recompiler could be done in Visual Basic, if it could be done at all. I wouldn't recommend it. How 'visual' is Visual Basic, anyway? I've never had the *ahem* privilege to use it. |
|
Chris
|
Re-compilation explanation
Posted: Thu Nov 18, 1999 12:49 am
|
Man, I'm confused. I don't really know what Dynamic re-compilation means, except that it converts the instructions before it executes them and then executes them. What I was trying to do was more of a binary conversion. I was going to take one Z80 instruction and convert it to an x86 equivalent (or at least simulate that). The video RAM was going to be sent to the PC's video memory, and the converted ROM or program was going to allocate a segment of memory for itself. A lot of the Z80's and x86's instructions are pretty much equivalent, except for when you get into the whole memory conflicts. For the sound, say my conversion program discovered a sound write: I would modify it so that it writes to the sound card in the PC and not to somewhere else and screw things up (the converted program or ROM would first initialize the Sound Blaster port and set up its own digital wave output). See what I'm saying? It's really not going to be an emulator anymore. It's going to be more like a direct conversion program. Take the ROM, change its instructions to PC equivalents, and save them to an .EXE file. Then later on you could run it like any standard PC program and it would play like a game. I'm imagining that the conversion would create some good and bad bugs for the game. I accept the apologies. But you gotta at least give me a break sometimes. Damn, man. This whole emulation thing is new to me. I'm simply experimenting around with it to see where it will take me. "Can I write an emulator?" That's the question that runs through my mind sometimes. And I'm a firm believer that if you put your mind to something, you can do anything. Emulation is a really low-level and technical type of programming. I think programming an emulator is fun because it stimulates my brain, it improves my abilities as an overall programmer, and it's fun and fascinating to actually be simulating the tasks of a real system.
I like to problem solve, believe it or not, and exploring these other fields of programming that I'm unfamiliar with is fun to me. With the new things that I'm learning every day, I take those ideas and piece them together with my old methods, comparing them, mixing the old code in, etc. With all this talk of pointers, structures, unions, constants, assembly and processor opcodes, and memory addressing, I've learned C a hell of a lot better and faster within these past couple of months than I did about a year ago. And with these techniques that I've learned in C, I'm finding that they're easy to apply to other languages because they're all based on assembly or methods performed by the processor every day. That's why before I was like, "I want to write it in VB," because I've had previous experience with VB. So, I'm more comfortable and flexible in that language. I've written game demos and applications such as Donkey Konger and a little text editor. VB uses all these pre-arranged functions and libraries that allow you to do extremely complicated things in just a few lines of code. When I tried to study C, I hated it at first because there's absolutely nothing available in C in terms of multimedia stuff (unless you were to get new libraries and headers). "What are these projects? Why do I need these include files? Why doesn't this thing lemme know if my code is right or wrong before I have to compile?" I was used to Print "Hello" not #include <stdio.h> int main() { printf("Hello "); return 0; } But ever since I got into emulator programming I've learned so many new aspects of computers. It's given me a whole new light on programming and programming games. Okay, I'll shut up now. Chris :o) |
|
|
That's what I was on about! *nt*
Posted: Thu Nov 18, 1999 1:41 am
|
|
|
|
Yeesh!
Posted: Thu Nov 18, 1999 1:56 am
|
Hahaha! Very 'Visual' and not very 'Basic'. I'm not actually going to write an emulator or translator or interpreter or whatever else, I only ever read this messageboard because it's interesting to read these posts. I was only seeing if a Genesis ROM -> x86 converter was possible in any way. Apparently, it's not, so I'll give it a rest now, mmkay? ~unfnknblvbl |
|
|
Re: Dynarec
Posted: Thu Nov 18, 1999 7:24 am
|
Then why won't Meka or BRSMS run at 60FPS on my P133 w/32MB RAM? KGen98 and Genecyst both run at 60FPS on my setup. My setup:
BRAND/MODEL: Compaq Presario 7240
CPU: Intel Pentium 133
RAM: 32MB 72-pin EDO
VIDEO CARD: S3 Trio64V+ PCI
SOUND CARD: ESS AudioDrive 1788
Sorry, but I cannot help but rant at how crap my computer is. It runs Quake and Quake II (kind of) for Christ's sake! How come it won't run the SMS emulator of my choice at 100% speed? Sorry, ~unfnknblvbl |
|
Nyef
|
Re: This is something I've pondered
Posted: Thu Nov 18, 1999 1:44 pm
|
But you snipped one of the brackets, so it looked like it was yours. That is kinda amusing, at that. :-)
It's too bad that freedom is so restrictive...
It varies. Some games have the graphics data in a separate address space, but that tends to be bankswitched, so you then have fun with pointers to keep the right data where you can get at it. Some games have RAM in that space, and use the CPU to copy data out of the CPU space into it.
True, and I will be fixing DarcNES to do this before the next release. :-)
One of the 100+ MHz 486 systems with PCI slots could possibly do it. Probably couldn't do the sound, though.
That had to suck.
Not necessarily.
This would only happen if it was an indirect jump. The compiler should catch all direct jumps like that (unless they themselves were the target of an indirect jump).
The trick is to have a fallback position of an interpretive core in the final executable. Or in a DLL.
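A rough sketch in C of what such a fallback might look like. Everything here is an assumption for illustration: the table of per-address block pointers, the `cpu_state` layout, and the names `dispatch`, `block_ld_c_1c` and `interpret_step` are hypothetical, and the "translated" block is ordinary C standing in for generated native code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GUEST_SPACE 0x10000u  /* 64 KB Z80 address space */

typedef struct { uint16_t pc; uint8_t c, e; } cpu_state;

/* A translated block performs the work of the guest code starting at one
   address and leaves cpu->pc pointing at the next guest instruction. */
typedef void (*block_fn)(cpu_state *cpu);

typedef struct {
    block_fn blocks[GUEST_SPACE];  /* NULL where nothing was translated */
    void (*interpret_one)(cpu_state *cpu, const uint8_t *mem);
} translated_program;

/* Stand-in for the translation of "ld c,$1c" found at address 0x0000. */
void block_ld_c_1c(cpu_state *cpu)
{
    cpu->c = 0x1C;
    cpu->pc += 2;
}

/* Minimal interpretive fallback: it knows only "inc e" ($1c) and treats
   every other byte as a one-byte no-op. */
void interpret_step(cpu_state *cpu, const uint8_t *mem)
{
    if (mem[cpu->pc] == 0x1C)
        cpu->e++;
    cpu->pc++;
}

/* Run from cpu->pc. Use translated code when a block starts at exactly this
   address, otherwise interpret one instruction. Returns 1 if translated
   code ran and 0 if the interpreter was used. */
int dispatch(const translated_program *prog, cpu_state *cpu, const uint8_t *mem)
{
    block_fn fn;

    assert(prog != NULL && prog->interpret_one != NULL);
    fn = prog->blocks[cpu->pc];
    if (fn != NULL) {
        fn(cpu);
        return 1;
    }
    prog->interpret_one(cpu, mem);
    return 0;
}
```

With only address 0x0000 translated, an indirect jump into the middle of that instruction (address 0x0001) finds no block and drops into the interpreter. That is slower, but it is still correct.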
Then you probably wouldn't like DarcNES in NES mode. The sound support was written before all the good sound info came out, and I haven't really touched it since. SMS sound in DarcNES, on the other hand, is rather nice (except for the noise channel, I'm not happy with it yet).
Note "tend to". There is at least one case on record where someone using a dynarec core had sped up his video code to the point where the CPU core was the bottleneck. His emulator is currently doing 990+ FPS on a Celeron 433 (the test was rigged, however. The CPU is in an infinite loop, and the screen is a static image).
For the layperson, sure. But once you are actually writing one, you may wish to search for better ones.
Perhaps we both are. Maybe 'metaphor' would be a more appropriate word. :-)
Certainly not. In many cases an ASM core is critical. But for what we're dealing with here, a C core is quite acceptable.
I'm not so sure. A recompiler requires procedure pointers at the very least (does VB even have these?). And the x86 code structure is so chaotic that you can't just use a simple array to hold the instructions. And to make matters worse, I can't think of how to do a computed GOSUB in VB. All _real_ basic interpreters support a computed GOSUB. And self-modifying code. And run on 6502 systems (okay, that's probably pushing it, but you can see what I mean). --Nyef |
|
Nyef
|
Re: Dynarec
Posted: Thu Nov 18, 1999 2:04 pm
|
No idea. I think I know why DarcNES isn't running at 60FPS on my P133 w/64MB RAM, though. I've been running it in 16bpp, and that requires an extra step to convert 8->16 bits for the entire buffer. On the other hand, GG mode _is_ running at 60FPS under those conditions, so I'm pretty close to having the speed I want. Just fix up a couple interfaces, and write some ASM code. :-)
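For anyone wondering what that extra 8->16 step involves, here is a minimal sketch in C. It is not DarcNES's actual code: the RGB565 pixel layout and the names `pack_rgb565` and `convert_8_to_16` are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack 8-bit red, green and blue into one 16-bit pixel, assuming the
   common RGB565 layout (5 bits red, 6 bits green, 5 bits blue). */
uint16_t pack_rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Expand an 8bpp indexed frame buffer into a 16bpp one, using a 256-entry
   palette that has already been converted to 16-bit pixels. */
void convert_8_to_16(const uint8_t *src, uint16_t *dst, size_t pixels,
                     const uint16_t *palette16)
{
    size_t i;

    assert(src != NULL && dst != NULL && palette16 != NULL);
    for (i = 0; i < pixels; i++)
        dst[i] = palette16[src[i]];  /* one table lookup per pixel */
}
```

The palette is converted once, so the per-frame cost is a single pass over the whole buffer, which is the overhead described above.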
Both of which use ASM extensively. 16-bit systems is the point where one has to break out the heavy optimization weaponry. Extensive ASM use, dynarec, all the tricks apply.
This alone says something.
Should be more than enough.
Just about what I have...
Gee... And to think I brag about using a computer that's on par with yours. :-)
Because they are written mainly in C, and compiled with GCC, and GCC can't optimize worth shit? --Nyef |
|