View topic - Talking PSG

  • Joined: 05 Mar 2006
  • Posts: 53
  • Location: France
Talking PSG
Post Posted: Fri May 01, 2015 9:38 pm
I've recently been playing with robot voice effect algorithms for an unrelated project. One of the outputs I got was a barely intelligible, very bleepy voice similar to PSG arpeggios.

This made me wonder if it was possible to imitate human speech through partial reconstitution of the formants on the SN76489.

I made a dirty program to, in order:
-Split a wave file into chunks of length 1/f (16.7 ms for 60 Hz).
-Do an FFT of each chunk to get its power spectrum.
-For each chunk, get the 3 most powerful components.
-Record the frequency and power of those components in an asm include file.
-Optionally generate a simulation wave file, by using 3 square oscillators
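
The steps above can be sketched roughly as follows. This is not the code from the original tool: the function names are made up for illustration, a naive DFT stands in for a real FFT so that nothing beyond the standard library is needed, and the input is assumed to be a list of samples centered on zero. Picking the 3 strongest bins is taken literally here, so neighbouring bins of the same peak can be selected together.

```python
import cmath
import math

SAMPLE_RATE = 44100  # assumed input rate
UPDATE_HZ = 60       # one parameter set per video frame


def split_chunks(samples, sample_rate=SAMPLE_RATE, update_hz=UPDATE_HZ):
    """Split samples into consecutive chunks lasting 1/update_hz seconds."""
    size = sample_rate // update_hz  # 735 samples at 44100 Hz and 60 Hz
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]


def power_spectrum(chunk):
    """Naive DFT power for bins 1..N/2-1 (slow, but dependency-free)."""
    n = len(chunk)
    spectrum = []
    for k in range(1, n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(chunk))
        spectrum.append((k, abs(acc) ** 2))
    return spectrum


def top_components(chunk, sample_rate=SAMPLE_RATE, count=3):
    """Return the `count` strongest (frequency_hz, power) pairs of a chunk."""
    n = len(chunk)
    strongest = sorted(power_spectrum(chunk), key=lambda kp: kp[1], reverse=True)[:count]
    return [(k * sample_rate / n, power) for k, power in strongest]


def square(phase):
    """Ideal square wave with period 1.0."""
    return 1.0 if phase % 1.0 < 0.5 else -1.0


def simulate(frames, sample_rate=SAMPLE_RATE, update_hz=UPDATE_HZ):
    """Resynthesize with three square oscillators, one parameter set per chunk."""
    size = sample_rate // update_hz
    phases = [0.0, 0.0, 0.0]
    out = []
    for components in frames:
        peak = max(power for _, power in components) or 1.0
        for _ in range(size):
            s = 0.0
            for ch, (freq, power) in enumerate(components):
                # Amplitude is the square root of power, relative to the strongest one.
                s += math.sqrt(power / peak) * square(phases[ch])
                phases[ch] += freq / sample_rate
            out.append(s / 3.0)
    return out
```

Usage would be along the lines of `frames = [top_components(c) for c in split_chunks(samples)]` followed by `simulate(frames)`; the phases are kept across chunks so the oscillators do not click at every update.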

What I got is this:
http://furrtek.free.fr/tmp/sega_psg.mp3

First sample is the original file at 44100Hz.
Second is synthesized, updating the PSG registers at 60Hz.
Third updates at 120Hz (twice per frame, best result I think).
Fourth updates at 240Hz (four times per frame).

Although the intelligibility is heavily dependent on the original voice's clarity, tone, and the use of common (identifiable) words, I think that this approach is still interesting from a data size and CPU time point of view.

It may be used, for example, to give more depth to game characters during dialogs or cutscenes where the narrated text is displayed on the screen (matching the text to the voice can work pretty well), or even for some sort of voice synthesis from syllables and a dictionary.

Regarding data size and CPU time, the PSG data can be packed/reduced to 5 bytes per chunk, giving a bitrate of 600 B/s with 2 updates per frame, compared to the 2000 B/s and 100% CPU use of the common volume-only approach.
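
To make the numbers concrete, here is a small sketch of how a frequency and an amplitude might be turned into SN76489 register values, plus the bitrate arithmetic. The helper names are hypothetical and the NTSC clock value is an assumption (PAL machines run slightly slower). Unpacked, three 10-bit tone values plus three 4-bit attenuations come to 42 bits, so fitting a chunk into 5 bytes means trimming something; the post does not say what, so no packing format is shown.

```python
import math

PSG_CLOCK_HZ = 3579545  # assumed NTSC Master System clock


def tone_period(freq_hz, clock_hz=PSG_CLOCK_HZ):
    """10-bit tone value N, where the output frequency is clock / (32 * N)."""
    n = round(clock_hz / (32.0 * freq_hz))
    return max(1, min(1023, n))  # about 109 Hz is the lowest reachable tone


def attenuation(amplitude):
    """4-bit attenuation for a linear amplitude in 0..1 (2 dB per step, 15 = silent)."""
    if amplitude <= 0.0:
        return 15
    steps = round(-20.0 * math.log10(amplitude) / 2.0)
    return max(0, min(15, steps))


def bytes_per_second(bytes_per_chunk, updates_per_frame, frame_hz=60):
    """Data rate for a given chunk size and update rate."""
    return bytes_per_chunk * updates_per_frame * frame_hz
```

With these, `bytes_per_second(5, 2)` gives the 600 B/s quoted above, and `tone_period(440)` gives 254.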

Possible evolutions include overlap analysis, the use of the noise channel, and a frequency matching algorithm that takes into account the harmonics of square waveforms.

Should I dig more into this ? :o
sega_psg.mp3 (212.32 KB)
  • Joined: 28 Nov 2014
  • Posts: 365
Post Posted: Fri May 01, 2015 10:00 pm
I would love it if you did. Even though the samples aren't as crisp as they "could" be, I think this is a great proof of concept. I've been thinking of diving into some study of this myself, but I currently lack understanding of a good number of the topics needed to do it. The SMS, and especially the GG, show on some occasions how clear voice samples can sound when done by someone who knows what they are doing (the Madou Monogatari series I recently worked on comes to mind).

Until recently I assumed that samples took up so much space that games were limited to a few, but seeing games with 40+ voice samples with rather clear diction threw that misconception out the window. It is more probable that, as with FM, most composers/sound engineers were not competent enough to bring out its best features. Compile comes to mind as a developer who worked with both very well.

I'm not sure what the intended outcome of this would be. If voices were given a melodic overlay (like auto-tune?) or some other gimmick, I'm sure it would be an appreciated resource.

It sounds as if this was something you just decided to work on one afternoon as a "what if". It would be nice to see where you could take this. It doesn't need to be anything as sophisticated as a PSGlib for voices, but I'm sure those of us who are less technically capable would appreciate your efforts. With some polishing, maybe a bit more exploration, I can imagine people might want to incorporate something like this into their projects. I know I would.

So, if it is not too much trouble, please continue.
  • Joined: 26 Dec 2004
  • Posts: 374
  • Location: Japan
Post Posted: Fri May 01, 2015 10:45 pm
Yes, it's a cool technique that can have some interesting effects applied to it.

Another good example: #t=257
  • Joined: 05 Mar 2006
  • Posts: 53
  • Location: France
Post Posted: Sat May 02, 2015 7:33 am
Great demo! I was sure someone had already experimented with this, but I couldn't find the right keywords to hear examples.

I'll be sure to release a tool to export the right data from a wave file, and also the player's source to include.
I'm far from being an audio expert or voice actor, so I can't really know in advance how to improve the sound's quality except by simply updating the PSG registers more often.

I didn't write about playback: The raster line interrupt makes a good timer.

Are there any SMS or GG games which manage to animate stuff while reading samples ? I've always seen the CPU just sit in a playback loop with multiple timing NOPs after disabling interrupts.
  • Joined: 20 Feb 2008
  • Posts: 118
  • Location: Saintes, France
Post Posted: Sat May 02, 2015 8:10 am
furrtek wrote
Are there any SMS or GG games which manage to animate stuff while reading samples ? I've always seen the CPU just sit in a playback loop with multiple timing NOPs after disabling interrupts.


The only example that comes to my mind is Space Harrier : when you "die", the whole game freezes and the only animated thing is the player's character falling :
  • Joined: 01 Feb 2014
  • Posts: 877
Post Posted: Sat May 02, 2015 9:28 am
This is very interesting. I always steered clear of even experimenting with voice samples because I thought it was necessary to stop the whole game while playing a sample. Sample playback at 60Hz, while admittedly barely intelligible, could still be used to add some character to a game while keeping it running at full speed. The quality would certainly be good enough for non-word voice sounds ("Huh", "Ah", "Mh") or death screams.
  • Joined: 25 Feb 2006
  • Posts: 874
  • Location: Belo Horizonte, MG, Brazil
Post Posted: Sat May 02, 2015 11:38 am
It's a nice idea. I wonder if the noise channel could be used for the sibilant consonants; I doubt it would improve by much, though.
  • Joined: 05 Mar 2006
  • Posts: 53
  • Location: France
Post Posted: Sat May 02, 2015 2:23 pm
Here: https://github.com/furrtek/PSGTalk

For now it only takes 44100Hz 8-bit mono raw files as input and only outputs a simulation in the same format, with no PSG parameter export. I'll do that tonight along with the player asm code.
  • Joined: 31 Oct 2007
  • Posts: 853
  • Location: Estonia, Rapla city
Post Posted: Sat May 02, 2015 4:27 pm
That's neat! Really chirpy sound though, much like the FM-based attempts, but a bit worse.
  • Joined: 26 Dec 2004
  • Posts: 374
  • Location: Japan
Post Posted: Sat May 02, 2015 11:04 pm
haroldoop wrote
It's a nice idea. I wonder if the noise channel could be used for the sibilant consonants; I doubt it would improve by much, though.

Fricatives (like /f/) are only noise, so they would benefit from being simulated. "Ancient" discrete-logic voice simulators used a sine/square wave generator coupled with a noise source.
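
One simple way to decide when a chunk should go to the noise channel instead of the tone channels is the zero-crossing rate, which tends to be much higher for noise-like sounds than for voiced ones. This is only a rough sketch: the function names and the 0.3 threshold are made up for illustration and would need tuning on real recordings, and the samples are assumed to be centered on zero (unsigned 8-bit data would need 128 subtracted first).

```python
def zero_crossing_rate(chunk):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(chunk) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(chunk, chunk[1:]) if (a < 0) != (b < 0))
    return crossings / (len(chunk) - 1)


def looks_like_fricative(chunk, threshold=0.3):
    """Guess whether a chunk is noise-like; the threshold is an arbitrary starting point."""
    return zero_crossing_rate(chunk) > threshold
```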
  • Joined: 01 Jan 2014
  • Posts: 331
Post Posted: Sun May 03, 2015 3:56 am
Cool though slightly nightmare inducing.

The first parse has a base resonance. Be nice if that could be isolated.

EDIT: first parse referring to Second is synthesized, updating the PSG registers at 60Hz. Third sample also has it but reduced.

Is this caused by multiple waves producing artificial square waves below native levels?
  • Joined: 25 Feb 2006
  • Posts: 874
  • Location: Belo Horizonte, MG, Brazil
Post Posted: Sun May 03, 2015 12:16 pm
psidum wrote

The first parse has a base resonance. Be nice if that could be isolated.

EDIT: first parse referring to Second is synthesized, updating the PSG registers at 60Hz. Third sample also has it but reduced.

Is this caused by multiple waves producing artificial square waves below native levels?


It's the opposite: the square waves are a sum of sine waves.
In other words, the distortion you hear is a consequence of using a square wave (itself a sum of sine waves) to represent a single sine wave.
I guess there could be a way of producing a better approximation but, right now, I can't think of any that doesn't involve brute force. :P

Basically, it's an optimization problem where one has the sum of sine waves that compose the original voice, and three square wave generators that themselves produce sums of waves; the objective is to set up the frequency and volume of each generator so that the sum of their signals produces the minimum amount of error when compared to the original sum of sinusoids. I guess one could also ignore the sinusoids that are beyond the range of human hearing in order to reduce the number of variables.
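
As a rough illustration of that cost function: an ideal square wave of frequency f and amplitude a has only odd harmonics, with amplitude (4/pi) * a / k at k * f. The sketch below bins both the target sinusoids and the generators' partials by frequency and sums the squared differences. The names, the 60 Hz bin width, and the 16 kHz cutoff are assumptions for the example, and phase is ignored entirely, so amplitudes landing in the same bin are simply added.

```python
import math


def square_partials(freq_hz, amplitude, cutoff_hz=16000.0):
    """Odd harmonics of an ideal square wave, as (frequency, amplitude) pairs."""
    partials = []
    k = 1
    while k * freq_hz <= cutoff_hz:
        partials.append((k * freq_hz, 4.0 / math.pi * amplitude / k))
        k += 2
    return partials


def binned(partials, bin_hz):
    """Accumulate amplitudes into frequency bins of width bin_hz."""
    spectrum = {}
    for freq, amp in partials:
        b = round(freq / bin_hz)
        spectrum[b] = spectrum.get(b, 0.0) + amp
    return spectrum


def spectral_error(target, generators, bin_hz=60.0, cutoff_hz=16000.0):
    """Sum of squared per-bin amplitude differences between the target
    sinusoids and the summed partials of the square generators."""
    want = binned([(f, a) for f, a in target if f <= cutoff_hz], bin_hz)
    have = binned(
        [p for f, a in generators for p in square_partials(f, a, cutoff_hz)],
        bin_hz,
    )
    return sum((want.get(b, 0.0) - have.get(b, 0.0)) ** 2 for b in set(want) | set(have))
```

A brute-force search would then evaluate `spectral_error` over candidate frequency and volume settings for the three generators and keep the lowest, for example `min(candidates, key=lambda f: spectral_error(target, [(f, 0.5)]))` for a single generator. The search space grows quickly with three generators, which is why something smarter than brute force would be welcome.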
  • Joined: 01 Jan 2014
  • Posts: 331
Post Posted: Sun May 03, 2015 1:02 pm
Thanks for the run down.

It sounds quite cool when you put it through a low pass filter.
  • Joined: 21 Jul 2005
  • Posts: 412
  • Location: GBG
Post Posted: Sun May 03, 2015 10:28 pm
Last edited by FluBBa on Thu May 07, 2015 3:59 pm; edited 1 time in total
Here is a C implementation of the old C64 SAM, text to speech.
https://github.com/s-macke/SAM

This used the 3 sound channels of the C64 to do text to speech, though the SID can select between square, triangle & sawtooth for each channel plus filters. I guess there is some interesting info there.
  • Joined: 05 Sep 2013
  • Posts: 3828
  • Location: Stockholm, Sweden
Post Posted: Tue May 05, 2015 8:32 am
I guess you have to account for the first few harmonics anyway, those that are audible (say up to 16kHz at least), to approximate the FFT using 3 square waves.
Can't say if 60Hz update is enough, though.
The generated VGM anyway can be converted to a PSG file and you could use PSGlib to replay it :)
  • Joined: 09 Dec 2013
  • Posts: 228
  • Location: detroit
Post Posted: Thu May 07, 2015 1:53 pm
FluBBa wrote
This used the 3 sound channels of the C64 to do text to speech, though the SID can select between square, triangle & sawtooth for each channel plus filters. I guess there is some interesting info there.


SAM actually used the digi method of tweaking the volume register (4bit) to produce the voice, it didn't use any of the oscillators at all.

best regards,
- dink
  • Joined: 21 Jul 2005
  • Posts: 412
  • Location: GBG
Post Posted: Thu May 07, 2015 4:01 pm
Yes, you're completely right. I guess I read very late one night =)
I wonder how often it updates its values? And could it be better if using square waves instead of sines?
  • Joined: 05 Mar 2006
  • Posts: 53
  • Location: France
Post Posted: Thu Oct 29, 2015 9:59 pm
Old topic dig-up, I updated the code so that it's now useful (maybe).

https://github.com/furrtek/PSGTalk

The program can take multiple parameters and can output data for various clock rates. Also added example asm playback code using the raster interrupt.

It sounds nowhere near the simulation file so there's still lots of improvement to do... Todo-list in source file.
  • Joined: 17 Sep 2013
  • Posts: 128
  • Location: Gravataí, RS, Brazil
Post Posted: Thu Oct 29, 2015 10:21 pm
I saw something similar while browsing YouTube these days:

https://www.youtube.com/watch?v=YBxq7k45pBo

the code for the audio part is in the video description.

I was also thinking: how about using a square wave transform instead of the FFT? Wouldn't that be better suited to the PSG chip?
  • Joined: 05 Mar 2006
  • Posts: 53
  • Location: France
Post Posted: Thu Oct 29, 2015 11:21 pm
I just spent some time reading about the square wave transform.
I found a paper explaining the approximation method, but the amplitude equations are puzzling, how are they found for a given precision value ?
I can't figure out how they're related to the value or between each other.