Fullscreen video playback on Sega Master System

  • Joined: 14 Sep 2016
  • Posts: 71
  • Location: Lyon, France
Post Posted: Sun Sep 25, 2016 8:35 am
v5 is up! : http://sfx.gligli.free.fr/smsdev/sms_video_player_v5.7z (also attached).
What's new:
  • Reduced video noise by swapping the inter and intra tiling passes.
  • Added an option to further reduce video noise by taking the tiles' spatial / temporal coordinates into account while merging tiles.
  • To reduce data size, per-frame tile indices and tilemaps are now packed: a fixed 3:4 ratio for tile indices, and temporal compression for tilemaps.
  • Changed the compiling process to prevent the ROM header from corrupting tiles.

Quality to ROM size ratio is now much better and I was able to fit the rickroll video in a 4 mega ROM. Sonic video is also much better quality.
Tile indices are now stored relative to the keyframe, which keeps them below 4096 and allowed me to pack 2 of them in 3 bytes.
Tilemap compression works by skipping parts of the picture that don't change between frames. Very effective combined with noise reduction!
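
To make the "2 indices in 3 bytes" packing concrete, here is a minimal sketch. The byte layout (low byte of the first index, then a shared byte holding both high nibbles, then the low byte of the second index) is an assumption chosen for illustration; the thread does not spell out the real layout, and `pack_pair` / `unpack_pair` are hypothetical names.

```python
def pack_pair(a, b):
    """Pack two 12-bit tile indices into 3 bytes (hypothetical layout)."""
    if not (0 <= a < 4096 and 0 <= b < 4096):
        raise ValueError("indices must be below 4096")
    # byte 0: low 8 bits of a
    # byte 1: high nibble of b in bits 7-4, high nibble of a in bits 3-0
    # byte 2: low 8 bits of b
    return bytes([a & 0xFF, ((b >> 8) << 4) | (a >> 8), b & 0xFF])


def unpack_pair(data):
    """Inverse of pack_pair: 3 bytes back to two 12-bit indices."""
    b0, b1, b2 = data
    return ((b1 & 0x0F) << 8) | b0, ((b1 >> 4) << 8) | b2


packed = pack_pair(0x123, 0xABC)
print(packed.hex(), unpack_pair(packed))
```

Two unpacked 16-bit indices would take 4 bytes, so 3 bytes per pair is the fixed 3:4 ratio mentioned in the changelog.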

BKK> Eg the bad apple color video should now encode in better quality while using less ROM size.

  • Site Admin
  • Joined: 19 Oct 1999
  • Posts: 11902
  • Location: London
Post Posted: Sun Sep 25, 2016 9:19 am
Awesome :) I've been having a go at optimising the player, of course that's all changed now but the code really wasn't bad anyway. I ought to have a go with Emulicious' profiling. I'd like to squeeze some audio in there somehow.
  • Joined: 14 Sep 2016
  • Posts: 71
  • Location: Lyon, France
Post Posted: Sun Sep 25, 2016 9:57 am
Yeah I rewrote a good bit of the code, besides the format changes.
I think the asm should be pretty tight now as I've learned a lot on the Z80 and how to optimise for it since earlier versions.

Cool about the audio :) Using push / pop instead of ex af, af' in VBlank int should leave the whole alternate register set for you to use.
  • Joined: 20 Feb 2008
  • Posts: 111
  • Location: Les Herbiers, France
Post Posted: Sun Sep 25, 2016 1:04 pm
I think this could speed up your code a bit:

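TilesUploadSlow:
    ld b, TileSize          ; bytes to send for this tile
-:  outi                    ; 16 cycles: out (c),(hl) / inc hl / dec b
    jp nz,-                 ; 10 cycles: loop until b reaches 0 -> 26 per byte
    dec e                   ; presumably the count of tiles left to upload
    jp nz, TilesUploadLoop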
  • Joined: 14 Sep 2016
  • Posts: 71
  • Location: Lyon, France
Post Posted: Sun Sep 25, 2016 1:39 pm
Thank you! It's exactly 26 cycles per iteration, so it shouldn't violate VDP timing constraints, and it's probably the most used part of the code!

Actually, something else may be done to speed up the tiles upload.
I think that during VBlank the scanline counter restarts from a value below 192, so my code ends up doing part of the VBlank with TilesUploadSlow.
I don't know how to properly detect VBlank, it's odd there's no hardware flag for that...
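
A small model of why an "above or below 192" test misclassifies part of VBlank. It assumes the PAL 192-line V counter sequence 00-F2 followed by BA-FF (313 lines in total), which is my recollection of the values in Charles MacDonald's VDP documentation; treat the exact numbers as an assumption to check against that document.

```python
def pal_vcounter_values():
    """V counter reading for each of the 313 lines of a PAL frame (192-line mode)."""
    # The counter runs 0x00..0xF2, then jumps back to 0xBA and runs to 0xFF.
    return list(range(0x00, 0xF3)) + list(range(0xBA, 0x100))


values = pal_vcounter_values()
outside_active = values[192:]                  # lines 192..312
ambiguous = [v for v in outside_active if v < 192]

print(len(values), len(outside_active), [hex(v) for v in ambiguous])
```

Under these assumptions, six lines outside the active display read back as values below 192, so code that only compares the counter against 192 treats them as active display.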
  • Joined: 20 Feb 2008
  • Posts: 111
  • Location: Les Herbiers, France
Post Posted: Sun Sep 25, 2016 2:28 pm
From Charles MacDonald's VDP documentation:

Quote
----------------------------------------------------------------------------
4.) Status flags
----------------------------------------------------------------------------

Reading the control port returns a byte containing status flags:

MSB .................................LSB
INT OVR COL --- --- --- --- ---

INT - Frame interrupt pending

This flag is set on the first line after the end of the active display
period. It is cleared when the control port is read. For more details,
see the interrupts section.


I usually read the VDP's INT status flag (bit 7 of port $BF) in an actual INT interrupt handler, but you might be able to read it without enabling the interrupt (I've never tried to do it).

If you wish to detect the end of the VBlank, I guess you'd have to set up a HBlank interrupt on line #0.
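
As an illustration of the quoted status byte, here is a minimal sketch that decodes the three flags and models the clear-on-read behaviour. `FakeVDP` is a hypothetical toy, not an emulator API; OVR and COL are taken to be the sprite overflow and sprite collision flags.

```python
def decode_vdp_status(status):
    """Split a VDP status byte into the three documented flags."""
    return {
        "INT": bool(status & 0x80),  # bit 7: frame interrupt pending
        "OVR": bool(status & 0x40),  # bit 6: sprite overflow
        "COL": bool(status & 0x20),  # bit 5: sprite collision
    }


class FakeVDP:
    """Toy stand-in for the control port: a read returns the flags, then clears them."""

    def __init__(self):
        self.status = 0

    def end_of_active_display(self):
        self.status |= 0x80          # INT is set on the first line after active display

    def read_control(self):
        value, self.status = self.status, 0
        return value


vdp = FakeVDP()
vdp.end_of_active_display()
first = decode_vdp_status(vdp.read_control())
second = decode_vdp_status(vdp.read_control())
print(first["INT"], second["INT"])
```

The second read shows why polling the flag and also using the interrupt handler can interfere with each other: whoever reads the port first consumes the flag.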
  • Joined: 14 Sep 2016
  • Posts: 71
  • Location: Lyon, France
Post Posted: Sun Sep 25, 2016 3:02 pm
Ok, I was able to cleanly detect VBlank start by setting a flag in the VBlank int handler, but the additional code I have to write to use and clear it makes it slower than just testing above or below 192...

I tried to use a HBlank int on line 0 a while ago but the major problem is that I have to access VDP registers from the int handler to stop the HBlank int from triggering, and that conflicts with main program VDP accesses...

I did find another small speedup by combining the 2 tests on VDPScanline in 1. I attached the updated program.
player.asm (8.29 KB)

  • Joined: 01 May 2011
  • Posts: 371
Post Posted: Sun Sep 25, 2016 3:58 pm
I'm having trouble compiling it with v5, I get these errors;

Quote
> Executing: C:\Program Files (x86)\ConTEXT\ConExec.exe "C:\Users\BKK\Desktop\sms_video_player_v5\Compile.bat" "C:\Users\BKK\Desktop\sms_video_player_v5\player.asm"

MEM_INSERT_ABSOLUTE: The current address ($7ff0) exceeds the size of the ROM ($4000).
MEM_INSERT_ABSOLUTE: The current address ($7ff1) exceeds the size of the ROM ($4000).
MEM_INSERT_ABSOLUTE: The current address ($7ff2) exceeds the size of the ROM ($4000).
MEM_INSERT_ABSOLUTE: The current address ($7ff3) exceeds the size of the ROM ($4000).
MEM_INSERT_ABSOLUTE: The current address ($7ff4) exceeds the size of the ROM ($4000).
MEM_INSERT_ABSOLUTE: The current address ($7ff5) exceeds the size of the ROM ($4000).
MEM_INSERT_ABSOLUTE: The current address ($7ff6) exceeds the size of the ROM ($4000).
MEM_INSERT_ABSOLUTE: The current address ($7ff7) exceeds the size of the ROM ($4000).
MEM_INSERT_ABSOLUTE: The current address ($7fe0) exceeds the size of the ROM ($4000).
MEM_INSERT_ABSOLUTE: The current address ($7fe1) exceeds the size of the ROM ($4000).
MEM_INSERT_ABSOLUTE: The current address ($7fe2) exceeds the size of the ROM ($4000).
MEM_INSERT_ABSOLUTE: The current address ($7fe3) exceeds the size of the ROM ($4000).
MEM_INSERT_ABSOLUTE: The current address ($7fe4) exceeds the size of the ROM ($4000).
MEM_INSERT_ABSOLUTE: The current address ($7fe5) exceeds the size of the ROM ($4000).
MEM_INSERT_ABSOLUTE: The current address ($7fe6) exceeds the size of the ROM ($4000).
MEM_INSERT_ABSOLUTE: The current address ($7fe7) exceeds the size of the ROM ($4000).
MEM_INSERT_ABSOLUTE: The current address ($7fe8) exceeds the size of the ROM ($4000).
MEM_INSERT_ABSOLUTE: The current address ($7fe9) exceeds the size of the ROM ($4000).


I can compile Maxim's "Hello World", so I think that I have the environment set up correctly.
  • Joined: 14 Sep 2016
  • Posts: 71
  • Location: Lyon, France
Post Posted: Sun Sep 25, 2016 4:12 pm
Ah, I think it's because the WLA-DX version from the tutorial doesn't support 16KB ROMs.
I attached my version (WLA-DX 9.7b plus patched compile.bat).
wlaz80win32_97b.7z (129.36 KB)

  • Joined: 01 May 2011
  • Posts: 371
Post Posted: Sun Sep 25, 2016 5:06 pm
Yes, that was the problem, thanks!
  • Site Admin
  • Joined: 19 Oct 1999
  • Posts: 11902
  • Location: London
Post Posted: Wed Sep 28, 2016 5:59 pm
My faster TilesUploadSlow, unrolled:
TilesUploadSlow:
      .repeat TileSize-1
      outi    ; 16 cycles work
      inc iy  ; 10 cycles time wasting -> 26 total
      .endr
      outi    ; last one
      dec e
      jp z, TilesUploadEnd

I also relocated it to before TilesUploadLoop so it can fall through into it, avoiding taking any jump. This requires adding a jump into TilesUploadLoop to maintain program flow.
  • Joined: 14 Sep 2016
  • Posts: 71
  • Location: Lyon, France
Post Posted: Wed Sep 28, 2016 7:17 pm
Nice! A quick profile tells me that makes 11.8% free time, worst case.

I think tile bank # / tile address extraction could be improved but I'm not sure how...

Edit: found one more small improvement by using RAM instead of IX:
.equ FrameTilesOffset $c004
    ; Load frame data address into hl and tiles offset into ram variable
    ld e, (hl)
    inc hl
    ld d, (hl)
    inc hl
    set 6, d; Add $4000
    ld a, (hl)
    ld (FrameTilesOffset), a
    inc hl
    ld a, (hl)
    ld (FrameTilesOffset + 1), a
    ex de, hl
        ; Add frame tile offset to tile index
    ld hl, (FrameTilesOffset)
    add hl, bc
  • Site Admin
  • Joined: 19 Oct 1999
  • Posts: 11902
  • Location: London
Post Posted: Wed Sep 28, 2016 9:02 pm
Yes, using index registers is just slow because the prefixes cost another instruction byte, and any offsetting is a maths operation.

If the memory value is being accessed suitably symmetrically, you can push it to the stack instead and shave a few more cycles.

I had a few other optimisations around the frame counter use, avoiding bit shifts, but I need to see how they fit in v5.

Another extreme optimisation tactic might be to unroll the main tile upload loop. Assuming it starts at a deterministic part of the frame, you could just pack in the fast and slow uploads as appropriate without any line checks or looping, with just a break out based on the tile count needed.

Some code seems to be spending time setting bits/adding offsets which could be baked into the original data.
  • Joined: 14 Sep 2016
  • Posts: 71
  • Location: Lyon, France
Post Posted: Thu Sep 29, 2016 11:55 am
Besides maybe palette copy to RAM (and that is fast enough to cycle pad if not done), I think the tiles upload start is only a few cycles of jitter from being always at the same time offset from a VBlank start. It should start somewhere in late VBlank period.

I think it's the way to go to leave enough time for sound at an acceptable sample rate. Moreover, it should be possible to write the code in a relatively clean way using macros and repeats.
  • Site Admin
  • Joined: 19 Oct 1999
  • Posts: 11902
  • Location: London
Post Posted: Thu Sep 29, 2016 4:33 pm
Not a huge optimisation (saves 5 cycles) but it's an opcode I could never use before and therefore feels clever:
SecondPart:
    rld ; get high nibble at (hl) into low nibble of a
/*    ld a, (hl)
    rra
    rra
    rra
    rra*/

    inc hl
    ld c, (hl)
    inc hl

SPEnd:

It'll screw things up if the ROM is writeable, though.
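
A quick model of the two variants may help show both the equivalence and the side effect. This is a sketch that only tracks A, the byte at (hl) and the carry flag; the cycle counts are the standard Z80 T-state figures (rld 18, ld a,(hl) 7, rra 4).

```python
def rld(a, mem):
    """Z80 RLD: (HL) high nibble -> A low, (HL) low -> (HL) high, A low -> (HL) low."""
    new_a = (a & 0xF0) | (mem >> 4)
    new_mem = ((mem << 4) & 0xF0) | (a & 0x0F)
    return new_a, new_mem


def ld_rra4(mem, carry=0):
    """ld a,(hl) followed by four rra; returns the final value of A."""
    a = mem
    for _ in range(4):
        a, carry = ((carry << 7) | (a >> 1)) & 0xFF, a & 1
    return a


RLD_CYCLES = 18
OLD_CYCLES = 7 + 4 * 4      # ld a,(hl) + 4 x rra

a_after, mem_after = rld(0x00, 0x12)
print(hex(a_after), hex(mem_after), OLD_CYCLES - RLD_CYCLES)
```

Both variants leave the high nibble of the byte in the low nibble of A, but rld also writes a modified byte back to (hl), which is harmless only when that address is read-only ROM.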
  • Site Admin
  • Joined: 19 Oct 1999
  • Posts: 11902
  • Location: London
Post Posted: Thu Sep 29, 2016 7:17 pm
Enhancement: the palette update causes flickering pixels on-screen - at least on Emulicious with VDP timing constraints emulation turned on. I split the tilemap upload into two halves, and put the palette update in the middle, to make it happen off-screen.

p3:
    ld c, VDPData

    ; Upload local tilemap to VDP (part 1)

    ld de, $0000 | VRAMWrite
    SetVDPAddress
    ld hl, LocalTileMap
    ld a, (CurFrameIdx)
    and $01
    .repeat TileMapSize / 4
        outi
        out (VDPData), a
    .endr

    ; Upload local palette to VDP

    ld de, $0000 | CRAMWrite
    SetVDPAddress
    ld hl, LocalPalette
    .repeat TilePaletteSize
        outi
    .endr

    ; Upload local tilemap to VDP (part 2)

    ld de, (TileMapSize / 2) | VRAMWrite
    SetVDPAddress
    ld hl, LocalTileMap + TileMapSize / 4
    ld a, (CurFrameIdx)
    and $01
    .repeat TileMapSize / 4
        outi
        out (VDPData), a
    .endr
  • Site Admin
  • Joined: 19 Oct 1999
  • Posts: 11902
  • Location: London
Post Posted: Thu Sep 29, 2016 7:32 pm
Small optimisation: all usages of the SetVDPAddress macro now use constants, so bake it in (and leave de alone):
.macro SetVDPAddress args address
; Sets the VDP address
    ld a,<address
    out (VDPControl),a
    ld a,>address
    out (VDPControl),a
.endm
  • Joined: 14 Sep 2016
  • Posts: 71
  • Location: Lyon, France
Post Posted: Thu Sep 29, 2016 7:56 pm
It's odd the palette update causes problems in emulicious, at p3 we are already in VBlank...

How do you handle setting the tiles start address using the fixed SetVDPAddress? using a conditional jump?
  • Site Admin
  • Joined: 19 Oct 1999
  • Posts: 11902
  • Location: London
Post Posted: Thu Sep 29, 2016 8:10 pm
Last edited by Maxim on Thu Sep 29, 2016 8:15 pm; edited 1 time in total
Optimised a bunch of cycles off the "7 bits bank plus 9 bits tile index unpack to mapper and address" code:
          ; Upper bits of tile index select a rom bank
      ld a, h
      rra ; incoming carry will always be 0; pushes low bit into carry for use below

      ld (MapperSlot1), a

          ; Lower bits select an offset in that bank
          ; we want the low 9 bits of hl, x32, +$4000, in hl
          ; %-------a bcdefghi
          ;   to
          ; %01abcdef ghi00000
      ld a, l
      ld l, 1 ; to get the 01 high bits we need
      .repeat 3
      rra     ; then rotate carry - a - l right three times
      rr l
      .endr
      ld h, a

Any time you are shifting more than 4 bits you can probably save a bit of time by shifting the other way. Single bit-shift opcodes are a bit restrictive. Also, watch out for the fast a-shifters vs. the slower any-register-shifters.
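
To check the rotate trick above, here is a bit-level model of the routine compared against the plain formula (bank = index >> 9, address = $4000 + low 9 bits x 32). It is a sketch that models only A, H, L and the carry flag, and it takes the incoming carry as 0, as the code comment states.

```python
def rotate_right_through_carry(value, carry):
    """Model of rra / rr r: returns (new value, new carry)."""
    return ((carry << 7) | (value >> 1)) & 0xFF, value & 1


def unpack(index):
    """Follow the Z80 routine step by step for a 16-bit packed tile index."""
    h, l = index >> 8, index & 0xFF
    carry = 0                                          # assumed clear on entry
    bank, carry = rotate_right_through_carry(h, carry)     # ld a,h / rra
    a, l = l, 1                                        # ld a,l / ld l,1
    for _ in range(3):
        a, carry = rotate_right_through_carry(a, carry)    # rra
        l, carry = rotate_right_through_carry(l, carry)    # rr l
    return bank, (a << 8) | l                          # ld h,a


def reference(index):
    return index >> 9, 0x4000 + (index & 0x1FF) * 32


print(unpack(0x01FF), reference(0x01FF))
```

The `1` loaded into l is what ends up as the `01` in the top two bits of the address, which saves a separate "add $4000" step.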
  • Site Admin
  • Joined: 19 Oct 1999
  • Posts: 11902
  • Location: London
Post Posted: Thu Sep 29, 2016 8:14 pm
gligli wrote
It's odd the palette update causes problems in emulicious, at p3 we are already in VBlank...

CRAM writes cause pixels to light up in the written colour in the overscan area. Many games suffer from this, it's a minor glitch but it's easily avoided here. Enable both borders and timing constraints to see them.

gligli wrote
How do you handle setting the tiles start address using the fixed SetVDPAddress? using a conditional jump?

I forgot I'd already optimised that one :) I have a bunch of little things I didn't post, so I've attached the whole thing now. In this particular context:
- xor a is faster than ld r, 0
- rrca lets you rotate around quicker than rra
- avoiding the copy to de saves some time
player.sms.asm (8.61 KB)

  • Site Admin
  • Joined: 19 Oct 1999
  • Posts: 11902
  • Location: London
Post Posted: Thu Sep 29, 2016 8:50 pm
Passing the tiles to yakmo.exe via .txt files causes Windows to try to index the contents, which is not helping the speed :( (Workaround: don't run inside the Users directory, or put the .txt files in the temp folder.)

Windows 10 decided to force a reboot about 6 hours into my encode, I was not happy...
  • Joined: 14 Sep 2016
  • Posts: 71
  • Location: Lyon, France
Post Posted: Thu Sep 29, 2016 9:27 pm
Nice optimisations!
Ah, I didn't think about that for the .txt files, I must have disabled indexing on my Win7.

I was working on massive unrolling of tile upload and got some nice results combined with all your optimisations: there is now 17.8% free time!
That should allow PCM at more than 8 kHz, I think.
Full unrolling used way more than 16KB of code, so I had to use subs and some stack tricks for it to work fine and still get a nice speed boost.
Also, it was a bit tricky to calibrate it properly but it's now working fine on my SMS2 :)
Attached the updated version.
player.asm (10.02 KB)

  • Site Admin
  • Joined: 19 Oct 1999
  • Posts: 11902
  • Location: London
Post Posted: Thu Sep 29, 2016 9:57 pm
That stack abuse is amazing :)

I think I may have abandoned pcmenc somewhere in the middle of making a packed volume nibble data encoding (for its round robin SNR optimisation). I got distracted by timing errors in Meka... So there's no player for this format, but it's likely to need a lot less CPU than I quoted above; that was a real-time RLE unpacker which really only worked on silence anyway. A simple sample player could go faster if it only uses one channel.
  • Joined: 14 Sep 2016
  • Posts: 71
  • Location: Lyon, France
Post Posted: Fri Sep 30, 2016 6:01 pm
Hehe, it does make jumps and use of previous stack totally transparent.

I still think the best way to replay sound is to add many calls to something like:
if scanline != prevScanline then
{
    playSample()
    prevScanline = scanline
}

Because trying to count cycles between playSample() calls seems a little nightmarish. Also, the scanline counter is a steady 15.7 kHz clock, which I think should be a good enough sample rate.

I'm going to try to play a volume ramp with this method, to see if it's feasible.
  • Joined: 05 Sep 2013
  • Posts: 1958
Post Posted: Fri Sep 30, 2016 6:37 pm
I totally agree with the line thing. Also, if you're not using the alternate register set, you can use it in your ISR, as swapping sets is faster than push/pop, thus reducing interrupt overhead.
  • Site Admin
  • Joined: 19 Oct 1999
  • Posts: 11902
  • Location: London
Post Posted: Sat Oct 01, 2016 5:04 am
Using line interrupts only works during the active display. Polling the line counter would work, but plenty of the units of work take much longer than a line (228 cycles) - a slow tile upload takes about four lines. Adding in line counter polling at a high enough frequency to not end up losing time would end up costing more CPU, but would be easier than cycle counting the whole thing.

One sample per line is about 15kHz, or one sample every 228 cycles. 8kHz is one sample every 447 cycles. If the sample player can play a sample in 30 cycles - including all overhead, that's highly optimistic, a call and return costs 27 - then that corresponds to 13% and 7% of the CPU respectively. So you can see how the sampling rate can rapidly start eating into the available time.
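
The percentages above follow directly from dividing the per-sample cost by the sample period. A minimal sketch of that arithmetic, using only the figures quoted in the post (30 cycles per sample, 228 and 447 cycles per period, and call 17 + ret 10 for the overhead):

```python
CYCLES_PER_LINE = 228
CALL_AND_RET = 17 + 10      # Z80 call nn + ret


def cpu_share(cycles_per_sample, sample_period_cycles):
    """Fraction of CPU time spent playing samples."""
    return cycles_per_sample / sample_period_cycles


per_line = cpu_share(30, CYCLES_PER_LINE)   # one sample per scanline
at_8khz = cpu_share(30, 447)                # one sample every 447 cycles

print(f"{per_line:.1%} {at_8khz:.1%} call+ret={CALL_AND_RET}")
```

With a call and return already costing 27 of the 30 cycles, an inlined player is the only realistic way to approach that budget.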
  • Joined: 01 Jan 2014
  • Posts: 300
Post Posted: Sat Oct 01, 2016 5:21 am
These are the reasons I do all timing specific stuff via cycle counting though I have never done anything as adventurous as what is going on here.

I thought I would mention efry's Eclipse IDE tool, which includes cycle counting, as gligli might not have seen it. It's useful for optimizing code at the very least. There are a small number of syntax issues with the rarer instructions, but overall the positives outweigh the negatives, for me at least.

http://www.smspower.org/forums/15498-Z80EclipseIDEInitialRelease

Examples of cycle counting: http://www.smspower.org/forums/15606-Z80EditorAddedCycleAndByteCounting
  • Joined: 14 Sep 2016
  • Posts: 71
  • Location: Lyon, France
Post Posted: Sat Oct 01, 2016 10:18 am
I played with scanline polling a bit, and also tried the hblank int and, yeah, cycle counting seems like the only viable option.

For the player code, I think the best is to have per-frame PCM data and unpack it to RAM as simple PSG register writes. It will have to be double buffered so that the previous frame data can be played while PCM for next frame is unpacked.
This way, playing a PCM sample can be reduced to a 24 cycles macro:
exx
outi
exx
All that's needed is to init alternate c to $7f, set alternate hl to the start of a buffer when a new frame starts (around p3: in player.asm) and it should work I think.
  • Site Admin
  • Joined: 19 Oct 1999
  • Posts: 11902
  • Location: London
Post Posted: Sat Oct 01, 2016 11:00 am
Sounds good. The cycle counting will be hard work, but jitter in the timing is not as bad as you might think, so provided the samples come out somewhat uniformly then it'll be good.

If you just do an outi then either you limit yourself to a single channel (quiet) or you can try to use pcmenc's round robin optimised data, either way the buffer unpacker has to fill in the high nibble of each byte. Have a go with ratio 1 - the quality is not great at 8kHz, and the available CPU limits how much we can do about that.
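
A rough sketch of what "filling in the high nibble" could look like for a buffer unpacker. The SN76489 volume latch byte is %1cc1vvvv (cc = channel, vvvv = attenuation); the nibble order and the 0, 1, 2 round-robin channel order are assumptions for illustration, not pcmenc's documented format, and both function names are hypothetical.

```python
def psg_volume_byte(channel, volume):
    """SN76489 latch byte for a volume write: %1cc1vvvv."""
    return 0x90 | (channel << 5) | (volume & 0x0F)


def unpack_nibbles(packed, channels=(0, 1, 2)):
    """Expand packed 4-bit volumes into ready-to-outi PSG bytes, cycling channels."""
    out = []
    n = 0
    for byte in packed:
        for nibble in (byte >> 4, byte & 0x0F):     # assumed order: high nibble first
            out.append(psg_volume_byte(channels[n % len(channels)], nibble))
            n += 1
    return bytes(out)


buffer = unpack_nibbles(bytes([0x0F, 0x3A]))
print(buffer.hex())
```

Once the buffer holds complete PSG bytes, the playback side can stay a bare `outi`, which is what keeps the per-sample cost at 24 cycles.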
  • Joined: 14 Sep 2016
  • Posts: 71
  • Location: Lyon, France
Post Posted: Sat Oct 01, 2016 2:11 pm
Using pcmenc at 11 kHz and ratio 3 gives pretty good results in my opinion; it's not anywhere near hi-fi, but music is recognisable.
I used a brickwall limiter VST to up input audio volume so it's acceptably loud on my TV.
I attached an example with the Sonic CD intro soundtrack.
11400-rto3.7z (220.69 KB)

  • Joined: 14 Sep 2016
  • Posts: 71
  • Location: Lyon, France
Post Posted: Mon Oct 03, 2016 6:23 pm
Sound is coming along in a nicer way than expected, still far from easy but definitely doable.
I finally chose 10.5 kHz as the sample rate (337 cycles per sample) because it aligns nicely with a video frame length and the various outi kernels.
Quality is not much worse than with audio only; there's some noise above the Nyquist frequency plus an audible glitch at the end of the video frame, but that could be because I play a loop for now (200Hz sine).
I still have a bunch of algorithms to adapt for sound and the encoder to update but nothing that is not doable.
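
For reference, the sample rate and per-frame sample counts can be worked out from the cycle figure. This sketch assumes a PAL Z80 clock of 3,546,893 Hz, 313 lines of 228 cycles per PAL field, and a video frame spanning four fields; the last assumption is not stated in this post and is only a guess at what the alignment refers to.

```python
PAL_Z80_HZ = 3_546_893        # assumed PAL CPU clock
CYCLES_PER_SAMPLE = 337
CYCLES_PER_LINE = 228
PAL_LINES = 313
FIELDS_PER_VIDEO_FRAME = 4    # assumption, i.e. 12.5 video frames per second

sample_rate = PAL_Z80_HZ / CYCLES_PER_SAMPLE
cycles_per_field = CYCLES_PER_LINE * PAL_LINES
samples_per_video_frame = (
    cycles_per_field * FIELDS_PER_VIDEO_FRAME / CYCLES_PER_SAMPLE
)

print(f"{sample_rate:.1f} Hz, {samples_per_video_frame:.2f} samples per video frame")
```

Under these assumptions the sample count per video frame comes out close to a whole number, which may be the alignment meant above.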

By the way, I fixed a few bugs with the packed volumes format of pcmenc (-p 4), here's the diff:
@@ -787,13 +787,13 @@ int chVolPackChunk(uint8_t*& pDest, uint8_t*& pSource, int maxTripletCount, int
       {
          if (i & 1)
          {
-            *pDest |= pSource[3 * i + 0];
-            *pDest++ = (uint8_t)(pSource[3 * i + 1] << 4 | pSource[3 * i + 2] << 4);
+            *pDest++ |= pSource[3 * i + 0];
+            *pDest++ = (uint8_t)(pSource[3 * i + 1] << 4 | pSource[3 * i + 2]);
          }
          else
          {
             *pDest++ = (uint8_t)(pSource[3 * i + 0] << 4 | pSource[3 * i + 1]);
-            *pDest++ = pSource[3 * i + 2] << 4;
+            *pDest = pSource[3 * i + 2] << 4;
          }
       }
       break;
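
To see what the fixed code is meant to produce, here is a small Python model of the corrected branch logic: two triplets of 4-bit volumes share 3 bytes, with the middle byte split between them. Only the loop body appears in the diff, so the surrounding loop and the `pack_triplets` name are assumptions.

```python
def pack_triplets(volumes):
    """Pack triplets of 4-bit volumes, two triplets per 3 bytes, as in the fixed code."""
    if len(volumes) % 3:
        raise ValueError("expected whole triplets")
    out = bytearray()
    for i in range(len(volumes) // 3):
        v0, v1, v2 = volumes[3 * i: 3 * i + 3]
        if i & 1:
            out[-1] |= v0                  # *pDest++ |= ...: fill the half-used byte
            out.append((v1 << 4) | v2)     # *pDest++ = v1 << 4 | v2
        else:
            out.append((v0 << 4) | v1)     # *pDest++ = v0 << 4 | v1
            out.append(v2 << 4)            # *pDest = v2 << 4 (pointer not advanced)
    return bytes(out)


print(pack_triplets([1, 2, 3, 4, 5, 6]).hex())
```

The original lines advanced the pointer at the wrong moments and shifted the last nibble of odd triplets by 4, so the shared byte was overwritten and one volume was lost.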
  • Site Admin
  • Joined: 19 Oct 1999
  • Posts: 11902
  • Location: London
Post Posted: Mon Oct 03, 2016 7:33 pm
Yeah, I fixed those already and need to push my commits. I'd not got around to making a player at the time, I have now.
  • Joined: 14 Sep 2016
  • Posts: 71
  • Location: Lyon, France
Post Posted: Wed Oct 05, 2016 10:38 pm
Last edited by gligli on Sat Oct 08, 2016 9:15 pm; edited 1 time in total
V6! ( http://sfx.gligli.free.fr/smsdev/sms_video_player_v6.7z )

What's new:
  • PCM sound, using pcmenc ( http://github.com/maxim-zhao/pcmenc ).
  • Temporary files for yakmo are now stored in Windows temp folder.
  • Improved tilemap compression (skip enough tiles to make it a net size gain).
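
The idea behind the last item (only skip a run of unchanged tiles when the run is long enough to pay for the skip command) can be sketched as follows. The command format here is entirely hypothetical; the player's real tilemap stream is not described in this thread, and `min_skip` stands in for whatever break-even threshold the encoder uses.

```python
def encode(prev, cur, min_skip=2):
    """Encode cur against prev as ('skip', n) and ('raw', [tiles]) commands."""
    commands = []
    i = 0
    while i < len(cur):
        run = 0
        while i + run < len(cur) and cur[i + run] == prev[i + run]:
            run += 1
        if run >= min_skip:
            commands.append(("skip", run))      # long unchanged run: worth skipping
            i += run
        else:
            step = max(run, 1)                  # short run: cheaper to resend it
            chunk = cur[i:i + step]
            if commands and commands[-1][0] == "raw":
                commands[-1][1].extend(chunk)
            else:
                commands.append(("raw", list(chunk)))
            i += step
    return commands


def decode(prev, commands):
    """Rebuild the current tilemap from the previous one plus the commands."""
    out = list(prev)
    i = 0
    for kind, arg in commands:
        if kind == "skip":
            i += arg
        else:
            out[i:i + len(arg)] = arg
            i += len(arg)
    return out


prev = [1, 2, 3, 4, 5, 6]
cur = [1, 2, 9, 4, 9, 6]
print(encode(prev, cur))
```

Isolated unchanged tiles are folded into the surrounding raw run, so a skip is only emitted when it is a net size gain.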

Sound turned out to be really hard: I had to make all the algorithms run in constant time to push samples to the PSG at the proper times. The tilemap unpacker couldn't be modified like that because its timing depends on the data being unpacked, so I added code to more or less dynamically count cycles and push a sample when needed. Code speed was a problem too, even with the optimisations we did with Maxim.

PCM is 10.5 kHz using pcmenc's optimised 4-bit round-robin. Sound quality is lo-fi but not too bad, I think. There are some clicks at times and a little pitch modulation, both because of jitter.

Emulator compatibility is low and only MEKA and Emulicious seem to be able to play it properly. Of course it runs on real hardware :)
  • Joined: 25 Feb 2006
  • Posts: 457
  • Location: Belo Horizonte, MG, Brazil
Post Posted: Wed Oct 05, 2016 11:27 pm
It looks and sounds pretty amazing.
  • Site Admin
  • Joined: 19 Oct 1999
  • Posts: 11902
  • Location: London
Post Posted: Thu Oct 06, 2016 11:12 am
It requires accurate enough PAL emulation, but I'm surprised that's a problem in most emulators. It glitches out a few times in Meka, it looks like single frame junk palette data.

The audio quality is way better than anything else on the system (homebrew aside), doing that while playing video is amazing.

It's a shame not to be able to target the final ROM size, if the quality could take advantage of it. The encoding is so CPU intensive, though, that I'd prefer not to iteratively solve for it :)
  • Joined: 01 May 2011
  • Posts: 371
Post Posted: Thu Oct 06, 2016 2:19 pm
This is brilliant! Emulicious seems to run it better than MEKA. I had some issues with audio cutting out on the latter.

Edit: I didn't quite get the sound in perfect sync for this, and had about 100KB spare, so video quality could have been improved slightly.
Challenge AD V6.rar (636.69 KB)

  • Joined: 14 Sep 2016
  • Posts: 71
  • Location: Lyon, France
Post Posted: Thu Oct 06, 2016 3:01 pm
Last edited by gligli on Thu Oct 06, 2016 3:30 pm; edited 1 time in total
Thanks!

I had (Windows) audio buffering issues in the most recent MEKA I could find, but emulation seemed to work fine.
There is very little CPU time left and when it runs out (because of inaccurate emu timing or too slow Z80 code), it starts to fall apart.
Even Emulicious in VDP timing emulation mode is not perfectly accurate and things that fail on real HW can still work on it.

Yeah, even my Sonic CD video encoded a little too big for the 1MB limit and I didn't bother redoing it.
I think I could compute a max ROM size, but that could be a pretty bad estimation of final size.
The actual final size will depend on the video and would need steps 1 / 2a / 2b run, I can't do much about that given the way K-Means works.
  • Joined: 01 May 2011
  • Posts: 371
Post Posted: Thu Oct 06, 2016 3:23 pm
Is there an estimate on how much space audio takes?
  • Joined: 14 Sep 2016
  • Posts: 71
  • Location: Lyon, France
Post Posted: Thu Oct 06, 2016 3:29 pm
Yep, exactly 421 bytes per frame.
  • Joined: 01 May 2011
  • Posts: 371
Post Posted: Thu Oct 06, 2016 3:45 pm
Thanks, that works out at ~316KB/minute. That might just be enough for my next video (95 seconds) if I reduce the quality to the bare minimum.
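
The per-minute figure can be reproduced from the 421 bytes per frame. The sketch below assumes 12.5 video frames per second, a rate that is not stated in these posts but is the one that matches ~316KB/minute (counting 1KB as 1000 bytes).

```python
AUDIO_BYTES_PER_FRAME = 421
FPS = 12.5                   # assumption inferred from the ~316KB/minute figure


def audio_bytes(seconds, fps=FPS):
    """Audio payload in bytes for a clip of the given length."""
    return AUDIO_BYTES_PER_FRAME * fps * seconds


per_minute = audio_bytes(60)
clip_95s = audio_bytes(95)

print(per_minute, clip_95s)
```

Under that assumption a 95-second clip spends roughly 500KB of a 1MB ROM on audio alone, which is why the remaining video budget gets tight.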
  • Site Admin
  • Joined: 19 Oct 1999
  • Posts: 11902
  • Location: London
Post Posted: Thu Oct 06, 2016 5:16 pm
You can target 4MB if you don't care about Everdrive.

In Meka, make sure you are set to PAL and 50Hz. Anything else will make the audio go funny.

What chance of an NTSC/60 version? At first thought, you would have to target 12fps, as at 15fps there's probably not enough time to upload to VRAM due to the shorter VBlanks. The final tilemap upload might not be able to complete within the VBlank which would be troublesome.

You could also go for a Game Gear version (they are all NTSC/60 timing) with a much reduced frame size presumably making certain things easier, and a richer palette maybe helping on quality sometimes.
  • Joined: 14 Sep 2016
  • Posts: 71
  • Location: Lyon, France
Post Posted: Thu Oct 06, 2016 5:59 pm
Yeah, the major problem with 60Hz is that tilemap upload can't fit in a single VBlank. Tilemap double buffering could be brought back to fix this, it would only cost going from max 208 tiles to max 207 tiles per frame to fit the dummy sprite table.
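
A back-of-the-envelope check of why the tilemap upload fits in a 50Hz VBlank but not a 60Hz one. It assumes a full-screen tilemap of 32 x 24 entries uploaded with the `outi` + `out (VDPData), a` pair shown earlier (16 + 11 cycles per entry), counts every line outside the 192 active lines as usable, and ignores address setup and the palette upload.

```python
CYCLES_PER_LINE = 228
ACTIVE_LINES = 192
TOTAL_LINES = {"PAL": 313, "NTSC": 262}

TILEMAP_ENTRIES = 32 * 24          # assumption: full-screen tilemap
CYCLES_PER_ENTRY = 16 + 11         # outi + out (n),a


def vblank_cycles(system):
    """CPU cycles available outside the active display for one frame."""
    return (TOTAL_LINES[system] - ACTIVE_LINES) * CYCLES_PER_LINE


upload_cycles = TILEMAP_ENTRIES * CYCLES_PER_ENTRY

for system in TOTAL_LINES:
    fits = upload_cycles <= vblank_cycles(system)
    print(system, vblank_cycles(system), upload_cycles, "fits" if fits else "does not fit")
```

Under these assumptions the upload needs more cycles than a 60Hz VBlank offers, which is consistent with bringing back tilemap double buffering for an NTSC version.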

Game Gear seems easier as it should be possible to use 2 line interrupts to enable / disable display and extend the VBlank period. 15fps should be doable due to reduced frame size too.

Anyway the whole cycle counting would have to be redone so I don't think I'll try it anytime soon.
  • Joined: 01 May 2011
  • Posts: 371
Post Posted: Fri Oct 07, 2016 11:06 am
What do I need to edit in order to compile >1MB ROMs? That came out larger than I was expecting (probably too many key frames).
  • Joined: 14 Sep 2016
  • Posts: 71
  • Location: Lyon, France
Post Posted: Fri Oct 07, 2016 12:17 pm
Nothing, they should just work.
  • Joined: 01 May 2011
  • Posts: 371
Post Posted: Fri Oct 07, 2016 1:35 pm
Ah, I was able to compile with player_nosound.asm, but not with player.asm

Quote
> Executing: C:\Program Files (x86)\ConTEXT\ConExec.exe "C:\Users\BKK\Desktop\sms_video_player_v6\Compile.bat" "C:\Users\Maff\Desktop\sms_video_player_v6\player.asm"

object.o:C:\Users\BKK\Desktop\sms_video_player_v6\player.asm: INSERT_SECTIONS: No room for section "Data" (7170 bytes) in ROM bank 0.
The system cannot find the file specified.
> Execution finished.


Maybe something wrong with my WAV file?
  • Joined: 14 Sep 2016
  • Posts: 71
  • Location: Lyon, France
Post Posted: Fri Oct 07, 2016 1:50 pm
This is because the index.bin file is too big to fit with the code in the first 16KB of ROM (its size depends on the number of video frames).
I'll have to try to make the code a little smaller to see if it can fit...
How many frames does your video have?
  • Joined: 01 May 2011
  • Posts: 371
Post Posted: Fri Oct 07, 2016 1:53 pm
1190 frames, 38 keyframes. Would reducing the number of keyframes help?
  • Joined: 14 Sep 2016
  • Posts: 71
  • Location: Lyon, France
Post Posted: Fri Oct 07, 2016 1:55 pm
Unfortunately no, but I think I can easily reduce the size of the code a bit, we'll see if it's enough.
  • Joined: 14 Sep 2016
  • Posts: 71
  • Location: Lyon, France
Post Posted: Fri Oct 07, 2016 3:39 pm
I attached v6.1, it should yield enough free space for your video.
player.asm (23.16 KB)

  • Joined: 01 May 2011
  • Posts: 371
Post Posted: Fri Oct 07, 2016 5:17 pm
That did it, thanks!

I'll attempt to squeeze it into 1MB along with a higher quality 4MB version.