by andete. Original documents available at: https://github.com/andete/ym2413/tree/master/results

In this post I'll look at the YM2413 envelope generator (EG). The EG has 4 phases: attack, decay, sustain and release (often referred to as an ADSR envelope). More specifically, in this post I'll look at the decay and release phases. In fact release behaves the same as decay, so I'll only discuss decay. I'll leave the attack phase for later because it's more difficult. The sustain phase is trivial (the envelope level remains constant), so I won't discuss that either.

### Envelope-Generator Decay-Rate levels

To start this experiment I manually tweaked the instrument/channel parameters till I got a 'nice' looking waveform (I'll explain in a bit what I mean with 'nice'). I came up with these settings:

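| Operator | AM | PM | EG | KR | ML | KL | TL | WF | FB | AR | DR | SL | RR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| modulator | 0 | 0 | 1 | 0 | 00 | 0 | 63 | 0 | 0 | 15 | 15 | 00 | 15 |
| carrier | 0 | 0 | 1 | 0 | 00 | 0 | | 0 | | 15 | 03 | 15 | 05 |

    reg#0x20 = 0x00    key-off
    reg#0x10 = 0x00    fnum-low=0
    reg#0x30 = 0x00    max volume / custom instrument
    reg#0x20 = 0x19    key-on / block=4 / fnum=256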

Note that the 'key-on' bit is first set to 0 then to 1, this makes sure we re-trigger the ADSR phases.
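
To make the bit layout of that last register write explicit, here is a minimal sketch that unpacks registers 0x10 and 0x20 of one channel. The struct and function names are hypothetical (they are not taken from any emulator); the bit positions are the ones implied by the comments in the register list above.

```cpp
#include <cassert>

// Fields packed into registers 0x10 (fnum low byte) and 0x20 of one channel.
struct ChannelFreq {
    bool keyOn;
    int block;  // 0..7
    int fnum;   // 0..511
};

// Hypothetical helper: unpack the two register bytes into their fields.
ChannelFreq decodeFreq(int reg10, int reg20) {
    assert(reg10 >= 0 && reg10 <= 0xFF);
    assert(reg20 >= 0 && reg20 <= 0xFF);
    ChannelFreq f;
    f.keyOn = (reg20 & 0x10) != 0;         // bit 4: key-on
    f.block = (reg20 >> 1) & 7;            // bits 3..1: block
    f.fnum  = ((reg20 & 1) << 8) | reg10;  // bit 0 of reg 0x20 is fnum bit 8
    return f;
}
```

Calling `decodeFreq(0x00, 0x19)` gives key-on set, block=4 and fnum=256, which matches the settings listed above.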

These settings result in the following waveform:

As expected the envelope of the wave goes down exponentially. When plotted on a logarithmic scale (not shown) it goes down linearly, and as we'll see that's also how the YM2413 implements the envelopes.

If we zoom-in at the top region (the rectangle marked in green), we see this:

We want to measure the EG levels (y-axis). However, the YM2413 output level depends not only on the EG but also on the volume (no problem, we picked the maximum volume) and on the phase of the sine wave. So for each EG level we want to make sure the sine-wave-table hits a value at (or close to) its maximum.

So a 'nice' signal means:

• Pick the frequency 'just right':
    • too low -> sine doesn't reach max amplitude in each EG segment
    • too high -> step-size may be too high so that we skip over the max amplitude region in the sine-table
• Pick decay rate (DR) 'just right':
    • too high -> we skip over the maximum amplitude
    • too low -> not really a problem, but measurements take very long

In the zoomed image you see that for each EG segment we have at least 4 peaks where the sine wave reaches max amplitude. 1 peak might also work, but 4 allows a double check (in case of noise).

Note that we picked SL=15 (the maximum value). This means the EG will only go down to -45dB. This is not the full YM2413 range which is -48dB.

The next two images zoom-in near the tail of the waveform (the rectangle marked in red in the full image):

The first of these two shows the location where the signal just reaches its lowest amplitude; after this point it only takes on the values +1, +0, -0, -1. The second image is located much closer to the end, where the envelope has stabilized (so where we reached -45dB). It uses the same 4 output levels, but if you look closely the shape is not the same (in an earlier post we had a similar situation for volume=14 vs volume=15). Unfortunately this means that figuring out the number of EG steps is not as simple as counting the number of plateaus in the waveform.

To investigate this further I wrote a program that takes the waveform as input and:

• Detects changes in the EG level (to do this it looks for local maxima, then it looks for (abrupt) changes in successive maxima).
• It prints both the amplitude and the position of these changes. (Note that amplitudes are given as DAC values rather than converted to YM2413 values; I found this to be more robust against measurement noise.)
• It also prints the difference in position compared to the previous change.
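
The author's actual program is not shown in the post. The three steps above can be sketched roughly as follows; the function name, the `Change` struct and the fixed `threshold` parameter are assumptions made for illustration, and a real measurement would probably need extra noise filtering.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// One detected change: position, peak amplitude, distance to the previous change.
struct Change {
    std::size_t x;
    int y;
    std::size_t dx;  // 0 for the very first entry
};

// Report positions where the height of successive local maxima differs
// from the current segment's peak height by more than `threshold`.
std::vector<Change> detectLevelChanges(const std::vector<int>& w, int threshold) {
    assert(threshold >= 0);
    std::vector<Change> out;
    bool havePeak = false;
    int level = 0;          // peak height of the current segment
    std::size_t lastX = 0;  // position of the last reported change
    for (std::size_t i = 1; i + 1 < w.size(); ++i) {
        bool isMax = w[i] > w[i - 1] && w[i] >= w[i + 1];
        if (!isMax) continue;
        if (!havePeak) {
            out.push_back({i, w[i], 0});
            havePeak = true;
            level = w[i];
            lastX = i;
        } else if (std::abs(w[i] - level) > threshold) {
            out.push_back({i, w[i], i - lastX});
            level = w[i];
            lastX = i;
        }
    }
    return out;
}
```

The `x`, `y` and `dx` fields correspond to the three columns of the table that follows.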

The result is shown in the following table:

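| x | y | delta-x |
|---|---|---|
| 0 | 60494 | - |
| 1020 | 59313 | 1020 |
| 3068 | 58240 | 2048 |
| 4092 | 57162 | 1024 |
| 5116 | 56084 | 1024 |
| 7164 | 55111 | 2048 |
| 8188 | 54146 | 1024 |
| 9212 | 53265 | 1024 |
| 11260 | 52358 | 2048 |
| 12284 | 51469 | 1024 |
| 13308 | 50706 | 1024 |
| 15356 | 49962 | 2048 |
| 16380 | 49204 | 1024 |
| 17404 | 48531 | 1024 |
| 19452 | 47865 | 2048 |
| 20476 | 47221 | 1024 |
| 21500 | 46565 | 1024 |
| 23548 | 46032 | 2048 |
| 24572 | 45489 | 1024 |
| 25596 | 44941 | 1024 |
| 27644 | 44394 | 2048 |
| 28668 | 43855 | 1024 |
| 29692 | 43421 | 1024 |
| 31740 | 42983 | 2048 |
| 32764 | 42547 | 1024 |
| 33788 | 42124 | 1024 |
| 35836 | 41677 | 2048 |
| 36860 | 41356 | 1024 |
| 37884 | 40920 | 1024 |
| 39932 | 40592 | 2048 |
| 40956 | 40271 | 1024 |
| 41980 | 39949 | 1024 |
| 44028 | 39623 | 2048 |
| 45052 | 39407 | 1024 |
| 46076 | 39081 | 1024 |
| 48124 | 38856 | 2048 |
| 49148 | 38536 | 1024 |
| 50172 | 38318 | 1024 |
| 52220 | 38100 | 2048 |
| 53244 | 37883 | 1024 |
| 54268 | 37667 | 1024 |
| 56316 | 37444 | 2048 |
| 57340 | 37232 | 1024 |
| 58364 | 37012 | 1024 |
| 60412 | 36797 | 2048 |
| 61436 | 36688 | 1024 |
| 62460 | 36471 | 1024 |
| 64508 | 36359 | 2048 |
| 65532 | 36144 | 1024 |
| 66556 | 36036 | 1024 |
| 68604 | 35929 | 2048 |
| 69628 | 35812 | 1024 |
| 70652 | 35604 | 1024 |
| 72700 | 35491 | 2048 |
| 73724 | 35386 | 1024 |
| 74748 | 35279 | 1024 |
| 76796 | 35173 | 2048 |
| 77820 | 35066 | 1024 |
| 78844 | 34957 | 1024 |
| 80892 | 34849 | 2048 |
| 81916 | 34732 | 1024 |
| 84988 | 34624 | 3072 <-- no longer follows pattern |
| 86012 | 34516 | 1024 |
| 87036 | 34407 | 1024 |
| 90108 | 34300 | 3072 |
| 93180 | 34189 | 3072 |
| 94204 | 34081 | 1024 |
| 97276 | 33971 | 3072 |
| 99324 | 33861 | 2048 |
| 102396 | 33751 | 3072 |
| 106492 | 33646 | 4096 |
| 109564 | 33537 | 3072 |
| 114684 | 33422 | 5120 |
| 118780 | 33317 | 4096 |
| 123900 | 33206 | 5120 |
| 131068 | 33099 | 7168 |
| 140284 | 32990 | 9216 |
| 152572 | 32877 | 12288 |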

The upper part of this table is very regular, delta-x always repeats the same pattern: 1024 1024 2048 (this pattern will be explained in the second part of this post).

Starting from x=84988 the pattern breaks. Though at that point, when translated to YM2413 output levels, the difference in amplitude is only 1 level. So there simply cannot be a level in between. Or at least my program cannot detect such 'in between' levels. When visually inspecting the waveform around x=83964 I indeed saw a change in waveform 'shape' (I've not created a picture of it, but it's similar to the two pictures of the tail above). So it's reasonable to assume that the pattern '1024 1024 2048' simply keeps on repeating till the end.

The changes in waveform-shape are very subtle, but as far as I can tell the waveform really stops changing at (about) x=162474. That corresponds to 120 different EG levels (and confirmed by following the '1024 1024 2048' pattern). Remember that we set car.SL=15 (means decay from 0dB to -45dB). If we extrapolate to the full range (0dB to -48dB) there would be 128 EG steps, and that's a nice 'round' number.

The YM2413 datasheet mentions EG goes in steps of 0.325dB. Instead we measured 48dB/128 = 0.375dB. So I assume the value in the datasheet is a typo (as we'll see below there are more such typos in the datasheet).
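
Spelled out, under the post's premise that SL=15 ends the decay at -45dB and that 120 EG levels were counted over that range:

$$
\frac{45\,\text{dB}}{120\ \text{steps}} = 0.375\,\text{dB/step},
\qquad
\frac{48\,\text{dB}}{0.375\,\text{dB/step}} = 128\ \text{steps} = 2^{7}.
$$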

### Envelope-Generator Decay-Rate timing

To reverse engineer the decay-rate timing I took a different approach: I started from the decay-rate timing table (0%-100%) in the YM2413 datasheet. (I actually did a large part of this analysis before I had access to the YM2413 measurement board).

I've copy/pasted the table from the datasheet below, but reformatted it in 4 columns (so e.g. RATE=25 can be found in row '24' column '+1').

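| RATE | +0 | +1 | +2 | +3 |
|---|---|---|---|---|
| 0 | inf | inf | inf | inf |
| 4 | 20926.6 | 16807.2 | 14006 | 12028.7 |
| 8 | 10463.3 | 8403.58 | 7002.98 | 6014.32 |
| 12 | 5231.64 | 4201.79 | 3501.49 | 3007.16 |
| 16 | 2615.82 | 2100.89 | 1750.75 | 1503.58 |
| 20 | 1307.91 | 1050.45 | 875.37 | 751.79 |
| 24 | 653.95 | 525.22 | 437.69 | 375.9 |
| 28 | 326.98 | 262.61 | 218.84 | 187.95 |
| 32 | 163.49 | 131.31 | 109.42 | 93.97 |
| 36 | 81.74 | 65.65 | 54.71 | 46.99 |
| 40 | 40.87 | 32.83 | 27.36 | 23.49 |
| 44 | 20.44 | 16.41 | 13.68 | 11.75 |
| 48 | 10.22 | 8.21 | 6.84 | 5.87 |
| 52 | 5.11 | 4.1 | 3.42 | 2.94 |
| 56 | 2.55 | 2.05 | 1.71 | 1.47 |
| 60 | 1.27 | 1.27 | 1.27 | 1.27 |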

The first and last rows are special (rate 0..3 show no decay, rates 60..63 have the same value). The other rows have the following structure:

• The values in row N+1 are equal to the values in row N divided by 2.
• The values in column +1 are approx equal to column +0 times 4/5 [1].
• The values in column +2 are approx equal to column +0 times 4/6.
• The values in column +3 are approx equal to column +0 times 4/7.

The values in the table are listed in milliseconds. When expressed in multiples of the YM2413 sample duration (sample rate = 3579545 Hz / 72, so about 49716 Hz) it shows even more structure. I won't show such a transformed table. Instead I'll give a program that can reconstruct the original table.

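    #include <iostream>

    int main() {
        const double dur = 72.0 / 3579545.0 * 1000; // duration of 1 sample in ms
        const int s[4] = {127, 102, 85, 73};        // multipliers for columns +0..+3
        for (int i = 4; i < 64; ++i) {
            // rates 4..59: duration halves every 4 rates; rates 60..63: fixed 63 samples
            int cycles = (i < 60)
                       ? (1 << (14 - (i / 4))) * s[i & 3]
                       : 63;
            std::cout << cycles * dur << std::endl;
        }
    }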

And the corresponding generated table:

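| RATE | +0 | +1 | +2 | +3 |
|---|---|---|---|---|
| 0 | inf | inf | inf | inf |
| 4 | 20926.6 | 16807.2 | 14006.0 | 12028.7 |
| 8 | 10463.3 | 8403.59 | 7002.99 | 6014.33 |
| 12 | 5231.65 | 4201.79 | 3501.49 | 3007.17 |
| 16 | 2615.82 | 2100.90 | 1750.75 | 1503.58 |
| 20 | 1307.91 | 1050.45 | 875.374 | 751.792 |
| 24 | 653.956 | 525.224 | 437.687 | 375.896 |
| 28 | 326.978 | 262.612 | 218.843 | 187.948 |
| 32 | 163.489 | 131.306 | 109.422 | 93.9739 |
| 36 | 81.7445 | 65.653 | 54.7109 | 46.9870 |
| 40 | 40.8722 | 32.8265 | 27.3554 | 23.4935 |
| 44 | 20.4361 | 16.4133 | 13.6777 | 11.7467 |
| 48 | 10.2181 | 8.20663 | 6.83886 | 5.87337 |
| 52 | 5.10903 | 4.10331 | 3.41943 | 2.93669 |
| 56 | 2.55451 | 2.05166 | 1.70971 | 1.46834 |
| 60 | 1.26720 | 1.26720 | 1.26720 | 1.26720 |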

By properly rounding these values we can almost perfectly reproduce the values from the datasheet. (Initially there were some discrepancies, but these all turned out to be transcription errors from a not always very readable scanned document, so single-digit errors like 0<->8 or 2<->7).
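
Going the other way (from datasheet milliseconds back to sample counts) makes the structure visible directly. The following is a small sketch of that inverse transformation; the function names are made up for illustration, and the power-of-two scale factor is simply the one used in the reconstruction program above.

```cpp
#include <cassert>
#include <cmath>

// Convert a datasheet time (in ms) to a whole number of YM2413 samples.
long msToSamples(double ms) {
    assert(ms >= 0.0);
    const double dur = 72.0 / 3579545.0 * 1000.0;  // ms per sample
    return std::lround(ms / dur);
}

// Sample count divided by 2^(14 - rate/4): recovers the per-column
// multiplier (127, 102, 85 or 73) for rates 4..59.
long columnMultiplier(int rate, double ms) {
    assert(rate >= 4 && rate < 60);
    const long unit = 1L << (14 - rate / 4);
    return std::lround(static_cast<double>(msToSamples(ms)) / unit);
}
```

For example, the datasheet values for rates 4 to 7 (20926.6, 16807.2, 14006 and 12028.7 ms) map to multipliers 127, 102, 85 and 73, and the 1.27 ms of rates 60..63 maps to 63 samples.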

So now we have a formula for the decay rate timing. But what do these numbers mean exactly? How can we translate these numbers to changes in the envelope level?

From the previous section we know there are 128 envelope levels. So in a way the only remaining question we need to answer is: When does the envelope generator switch from one level to the next?

I didn't figure this out completely myself, instead I read existing OPLx emulation code (mostly YM2413Burczynski.cc and YMF262.cc in the openMSX source tree). Based on that I got a good idea of what to look for and then I could confirm (or reject, see below) whether the measurements matched the emulation algorithm.

So at each sample for each operator we need to decide whether to go to the next envelope level or not. The algorithm goes like this:

• Depending on the {mod,car},{DR,RR,KSR} and channel.{fnum,block} settings we calculate an effective-rate (the formula is given in the datasheet). This results in a number between 0 and 63 (this is also the index in the decay-rate tables above).
• Rates 0..3 and 60..63 are special:
    0.. 3: never advance to the next EG level
    60..63: always advance 2 EG levels
• For the other rates we calculate 2 parameters (parameter names taken from the YM2413Burczynski source code):
    eg_shift = 13 - (rate / 4)
    eg_select = rate & 3
• 'eg_select' selects between 4 small tables (with 8 entries each):
    {0,1,0,1,0,1,0,1} // 4 out of 8
    {0,1,0,1,1,1,0,1} // 5 out of 8
    {0,1,1,1,0,1,1,1} // 6 out of 8
    {0,1,1,1,1,1,1,1} // 7 out of 8
• There is one global counter (shared among the 18 operators), each sample that counter is increased by one. (Side-note: the same counter is also used for other stuff, hopefully I can confirm that in future posts).
• For each operator we shift that global counter right over 'eg_shift' bits. Only if all the bits that were shifted out are zero we execute the next step. (Side-note: an alternative mechanism that may or may not be cheaper in hardware is to look at the carry-in bits from incrementing the global counter, or check whether the n-th bit changed after the increment). So for example if 'eg_shift=4', we only execute the next step once in every 1<<4 = 16 iterations.
• We take the lower 3 bits of the shifted global counter and use that as an index in the 'eg_select' table. Then add the value from the table to the current EG level. Note that the table may contain 0 (so we still don't advance in that case).
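
The steps above can be sketched in a few lines of C++. This is only an illustration of the algorithm as described in this post, not code taken from YM2413Burczynski; the function names are made up, and rates 52..59 are deliberately excluded because the simple formula is not sufficient there.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// EG level increment for one operator on one sample.
// Covers rates 0..51 and 60..63 only.
int egIncrement(int rate, uint32_t counter) {
    static const int egTable[4][8] = {
        {0,1,0,1,0,1,0,1},  // 4 out of 8
        {0,1,0,1,1,1,0,1},  // 5 out of 8
        {0,1,1,1,0,1,1,1},  // 6 out of 8
        {0,1,1,1,1,1,1,1},  // 7 out of 8
    };
    assert(rate >= 0 && rate < 64);
    assert(rate < 52 || rate >= 60);  // 52..59 are not modelled by this sketch
    if (rate < 4) return 0;           // never advance
    if (rate >= 60) return 2;         // always advance 2 levels
    const int egShift = 13 - rate / 4;
    const int egSelect = rate & 3;
    const uint32_t mask = (1u << egShift) - 1;
    if (counter & mask) return 0;     // shifted-out bits must all be zero
    return egTable[egSelect][(counter >> egShift) & 7];
}

// Number of samples between successive level changes (the first n intervals),
// with the global counter starting at 0. Only meaningful for rates 4..51.
std::vector<uint32_t> stepIntervals(int rate, int n) {
    assert(rate >= 4 && rate < 52);
    assert(n >= 0);
    std::vector<uint32_t> out;
    uint32_t counter = 0, prev = 0;
    bool seen = false;
    while (out.size() < static_cast<std::size_t>(n)) {
        ++counter;  // the global counter advances once per sample
        if (egIncrement(rate, counter) == 0) continue;
        if (seen) out.push_back(counter - prev);
        prev = counter;
        seen = true;
    }
    return out;
}
```

With this sketch, `stepIntervals(14, 6)` yields 1024, 1024, 2048, 1024, 1024, 2048.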

An example: for RATE=8, in this step, we only go 4 out of 8 times to the next level. For rates 9, 10 and 11 that's respectively 5/8, 6/8 and 7/8. If we follow this sequence we'd expect 8 out of 8 for RATE=12, instead we again use the 4/8-table. But because eg_shift is one less for RATE=12 compared to RATE=8 this step triggers twice as often and we effectively get 8/8.

Another example: the waveform shown in the pictures above has decay RATE=14. For that rate we have eg_shift=10 and use the table {0,1,1,1,0,1,1,1}. eg_shift=10 means we only consider advancing once every 1024 samples. The table means we only advance 3 out of 4 times (=6/8). So combined this means that the number of samples before moving to the next EG level is in sequence 2048, 1024 and 1024 samples. And this is exactly what we measured.
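
How the carrier settings (DR=3, KR=0, block=4, fnum=256) lead to RATE=14 is not spelled out in this post, which only refers to the datasheet formula. The sketch below uses the formula as I understand it from existing emulators (rate = 4*R + Rks, with Rks derived from block and the top fnum bit); treat the exact formula and the function name as assumptions rather than something confirmed here.

```cpp
#include <cassert>

// Effective rate (0..63) from the 4-bit rate setting R (e.g. DR or RR),
// the KR bit, and the channel's block and fnum.
// Assumed formula: rate = 4*R + Rks, clamped to 63, and R=0 gives rate 0.
int effectiveRate(int r, bool kr, int block, int fnum) {
    assert(r >= 0 && r < 16);
    assert(block >= 0 && block < 8);
    assert(fnum >= 0 && fnum < 512);
    if (r == 0) return 0;                         // rate setting 0: no change
    const int keyCode = block * 2 + (fnum >> 8);  // 0..15
    const int rks = kr ? keyCode : (keyCode >> 2);
    const int rate = 4 * r + rks;
    return rate > 63 ? 63 : rate;
}
```

With these assumptions `effectiveRate(3, false, 4, 256)` gives 14, consistent with the rate quoted above.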

Note that because of the global counter, the transition from the very first EG level (level 0) to the next level typically triggers faster than the transitions to the other levels (because the global counter is likely not zero when the decay phase starts). I didn't mention it before, but in the zoomed image above you can indeed see that the first EG-segment only has 3 peaks while all the others have 4 or 8 peaks.

Side-note: one difference between the YM2413Burczynski code and the measurements is that the former uses 256 EG levels while we only measured 128 levels. This mistake might be because the YM2413Burczynski code is derived from OPL2/3 emulation code and those chips have double the EG resolution (0.1875dB per step, in addition the OPL2/3 EG range is 0-96dB, so in total OPL2/3 have 512 EG-steps).

Next to rate=14 (see the images above), I also confirmed the following rates using the same approach, I only had to re-tweak the parameters to again get a 'nice' waveform:

0.. 3 -> as expected: no transitions
4 -> as expected: 8192
9 -> as expected: 4096,4096,4096,2048,2048
14 -> as expected: 2048 1024 1024
19 -> as expected: 1024,512,512,512,512,512,512

For the following rates I had to use a different approach. I'll first show the results:

48 -> as expected: 4
49 -> as expected: 4,4,4,2,2
50 -> as expected: 4,2,2
51 -> as expected: 4,2,2,2,2,2,2
60..63 -> as expected: increase 2 levels per step

The envelope is changing very rapidly (only stays constant for 4 or 2 samples), so the sine-wave-peaks approach doesn't work anymore. Instead I generated a waveform like this (image shows rate=49):

Instead of having an infinitely fast attack rate (AR=15) I used a slower one (AR=4..7). The left part of the image shows the attack phase, the right part shows the decay phase. I also used a very low frequency (though I could only use 'fnum' for this, 'block' has an influence on the effective-rate so it cannot be chosen freely). The combination of AR and fnum is chosen so that when the attack phase finishes we're (approximately) at 1/4th of the sine wave. So at that point we've reached the 'top' of the sine and that means locally the sine wave is not changing too much. At least the sine-shape is changing much less compared to the rapid decay-rate changes. This can be seen in the red-encircled part in the image: at the end of the attack-phase, the waveform is reasonably flat, at least compared to the decay phase immediately right of it.

The next image shows the same waveform, but zoomed-in on the decay phase (the green rectangle in the picture above).

You clearly see (short) flat sections in this picture (this means that indeed the sine-shape isn't changing too much yet). You can see that there's a repeating pattern of 3 segments of length 4 followed by 2 segments of length 2. And that's exactly what the above algorithm predicts.

I measured rates 52..59 in a very similar way but now it did NOT fully match the predictions. The overall duration was predicted correctly, but not the details of the EG-level-transitions. I'll again first show the results:

52 -> as expected: 2
53 -> got 2,2,2,2,2,2,1,1,1,1 (expected 2,2,2,1,1)
54 -> got 2,2,1,1,1,1 (expected 2,1,1)
55 -> got 2,2,{12x1} (expected 2,1,1,1,1,1,1)
56 -> as expected: 1
57 -> got { 4x0.5} {12x1} (expected 0.5,1,1,1)
58 -> got { 4x0.5} { 4x1} (expected 0.5,1)
59 -> got {12x0.5} { 4x1} (expected 0.5,0.5,0.5,1)

Note: segment duration of 0.5 means EG advanced 2 steps

Because this is such an unexpected result I'm including a picture of rate=54, so you can double check my findings:

This is again zoomed-in at the decay phase. It shows 2 segments of length 2 followed by 4 segments of length 1. To me this is very unexpected because the original algorithm predicts only 1 segment of length 2 followed by 2 segments of length 1, and that would result in a much smoother curve.

For rates 52..55 the above algorithm would work and would give a smoother result. For rates 56..59 the algorithm breaks down, because eg_shift is negative. The YM2413Burczynski code has a solution for that, but I won't bother explaining it because its predictions are wrong anyway.

The above algorithm can be fixed by changing to 16-entry eg_select tables for rates 52..59:

52: {0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1}
53: {0,1,0,1,0,1,0,1,0,1,0,1,1,1,1,1}
54: {0,1,0,1,1,1,1,1,0,1,0,1,1,1,1,1}
55: {0,1,0,1,1,1,1,1,1,1,1,1,1,1,1,1}
56: {1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1}
57: {2,2,2,2,1,1,1,1,1,1,1,1,1,1,1,1}
58: {2,2,2,2,1,1,1,1,2,2,2,2,1,1,1,1}
59: {2,2,2,2,2,2,2,2,2,2,2,2,1,1,1,1}

These tables show what is happening, but almost certainly not how it is happening, that is, how it is implemented in hardware (though for a software implementation tables might be the simplest approach). If the hardware really were based on 16-entry tables, then the curves could surely have been made smoother.
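
For a software implementation, the 16-entry tables could be used roughly as follows. The sketch assumes the table is indexed by the low 4 bits of the global counter on every sample; that indexing is my reading of the measurements, not something stated explicitly above, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Per-sample EG increment for rates 52..59, using the 16-entry tables.
int egIncrementFast(int rate, uint32_t counter) {
    static const int fastTable[8][16] = {
        {0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1},  // 52
        {0,1,0,1,0,1,0,1,0,1,0,1,1,1,1,1},  // 53
        {0,1,0,1,1,1,1,1,0,1,0,1,1,1,1,1},  // 54
        {0,1,0,1,1,1,1,1,1,1,1,1,1,1,1,1},  // 55
        {1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1},  // 56
        {2,2,2,2,1,1,1,1,1,1,1,1,1,1,1,1},  // 57
        {2,2,2,2,1,1,1,1,2,2,2,2,1,1,1,1},  // 58
        {2,2,2,2,2,2,2,2,2,2,2,2,1,1,1,1},  // 59
    };
    assert(rate >= 52 && rate < 60);
    return fastTable[rate - 52][counter & 15];  // assumed indexing: counter mod 16
}

// Total number of EG levels advanced over one 16-sample period.
int levelsPer16Samples(int rate) {
    int sum = 0;
    for (uint32_t c = 0; c < 16; ++c) sum += egIncrementFast(rate, c);
    return sum;
}
```

Summing each table gives 8, 10, 12 and 14 levels per 16 samples for rates 52..55 and 16, 20, 24 and 28 for rates 56..59. Those are the same 4/8, 5/8, 6/8 and 7/8 proportions as in the 8-entry tables, which fits the observation that the overall duration was predicted correctly even though the individual transitions were not.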

### Die-shot

I again looked at the YM2413[2] die-shot: http://siliconpr0n.org/map/yamaha/fhb013/mz_ns50xu/ And I'm glad that also this time I found a feature that supports the findings in this post:

Use the die-shot link to zoom-in on the region marked in red in this (small) picture. You'll find an array of 7 chains of 18 flip-flops. That is 18 times a 7-bit value. Or for each of the 18 operators a 7-bit EG-level value. The output of those flip-flop-chains also seems to be routed to some small logic block (an adder?) and then further routed to the input of the exp-table-ROM. That's exactly what you'd expect for the EG-level values.

The die-shot also, more or less, confirms the global shared counter approach. At least we do not see any array of 18x(13+3) bits; such an array would be required to give each operator its own counter.

There is a yet-unknown array of 18x12 bits (located in the top-middle rectangle). My current best *guess* is that this is related to the phase-modulation calculations. At least the YM2413 emulators seem to require extra storage for this (carried from one iteration to the next). Hopefully I can tell more about this in the future.

1. ^ More accurate values for the column-ratios are: 102/127, 85/127, 73/127. Side-note + looking ahead: these less nice ratios are because the table lists the time needed for 127 EG transitions. If they were based on 128 transitions (so if the duration of the first or last level was included), then the simpler ratios would be correct.
2. ^ actually the fhb013