View topic - aPLib decompressor improvements (plus other compressors)

  • Site Admin
  • Joined: 19 Oct 1999
  • Posts: 14685
  • Location: London
aPLib decompressor improvements (plus other compressors)
Posted: Tue Dec 29, 2015 6:47 pm
Last edited by Maxim on Wed Dec 30, 2015 5:19 pm; edited 1 time in total
[Split from http://www.smspower.org/forums/15763-HowAboutDevelopingSGGamesInC ]

Some sort of relevant updates:

- There are speed-optimised versions of the aplib decompressor by MetalBrain and Syx: http://www.cpcwiki.eu/forum/programming/aplib-decruncher-10-faster-)/ which you might want to look at
- I did some trials of ZX7 - its compression is generally worse than aPLib
- I also came across Exomizer v2, which seems to compress better than anything else I've found but needs a fairly big (156 bytes) RAM buffer

I haven't hooked these last two up to decompressors, let alone ones that go to VRAM, to compare the speed/space tradeoffs.
  • Joined: 07 Oct 2015
  • Posts: 114
Posted: Tue Dec 29, 2015 9:18 pm
I've been using Metalbrain's optimized code in my Spectrum projects since 2008. I haven't tried dwedit's original port on the good ol' ZX, so I can't really compare. Metalbrain is one of the most talented assembly coders I know, so I'm sure he's done a good job.

I've experimented with all those decompressors. I first used pucrunch, then tried aplib and exomizer. I've used exomizer in the Amstrad CPC as it comes bundled with the library I was using, the CPCRSLIB. It has the best compression ratio but decompressing is a bit slow and pretty hard on the stack.

I stuck with aplib circa late 2008 'cause it offered better compression than pucrunch, the decompressor was fast and small, and it required little memory. We've used it in all our Spectrum games.

It allowed us to pack the almost 300KB of data in Ninjajar! so that it fits entirely in the 128K of RAM of the ZX Spectrum.

If somebody has problems with appack.exe, you can try the old apack.exe compressor. I've been using it since 2008, and I just tried it to compress the binaries for SMS; the decompressor works fine with it. It only compresses, though (appack.exe also decompresses). Attached is my copy of apack.exe, which I can't seem to find elsewhere.
apack.zip (24.97 KB)

  • Site Admin
  • Joined: 19 Oct 1999
  • Posts: 14685
  • Location: London
Posted: Tue Dec 29, 2015 9:43 pm
I'm part way through adapting the faster aPLib unpacker to go straight to VRAM, although that is necessarily extremely slow on readback and seek. It's a bit tricky because it loads up the registers a bit more, plus it uses af' which may cause trouble for people expecting it to be left alone - which I'd guess is fairly rare in practice.
  • Joined: 05 Sep 2013
  • Posts: 3758
  • Location: Stockholm, Sweden
Posted: Wed Dec 30, 2015 8:48 am
10% faster? I wonder if it's worth it.
@Maxim: if we could work together on a VRAM depacker that uses a temporary RAM area (the size of which we could set via define) - [or the stack!] when readback from VRAM is needed, that would probably speed up things considerably and be worth rewriting the whole thing.
Do you know where I can read the details of the depacker algorithm?
  • Site Admin
  • Joined: 19 Oct 1999
  • Posts: 14685
  • Location: London
Posted: Wed Dec 30, 2015 9:29 am
I don't think the algorithm or format are documented apart from the source you see, and nobody thought to add many comments. It seems to be a bit-packed stream of raw data plus gamma-coded lookbacks, but I'm sure there's more to it than that. I'll see if I can add comments as I look at it this morning on the way to work. I suspect the lookback copying could use the stack, and we might avoid setting the write address all the time and stash the port in c, for a lot of time saved.
  • Joined: 07 Oct 2015
  • Posts: 114
Posted: Wed Dec 30, 2015 10:01 am
I can't remember the specific details, but when talking about this and other depackers in the past it was mentioned that, usually, the "window" needed during depacking was rather big, in the Kb area.

Personally, I don't think speed is an issue there. You can always mask it using different methods, plus it is a task to be performed "offline" (out of the game loop, I mean).

I'd rather leave it as is, unless you just want to do it for kicks :)

By the way, maybe this little proggie I made to check the % of free space in raw, headerless ROM is useful for somebody, so I'm sharing it here.

$ romview.exe raw_rom [# of banks]


If # of banks is not specified, the program calculates it.

It outputs a raw_rom.png file with a picture similar to this (btw, my current SG project).

Might be useful. Sources included. Compiles with freebasic.
output.png (7.08 KB)
romview.zip (109.12 KB)

  • Joined: 05 Sep 2013
  • Posts: 3758
  • Location: Stockholm, Sweden
Posted: Wed Dec 30, 2015 10:22 am
na_th_an wrote
when talking about this and other depackers in the past it was mentioned that, usually, the "window" needed during depacking was rather big, in the Kb area


Actually, I think we won't keep a full window, just a buffer for speeding up the VRAM->VRAM transfers. Even just a 16-byte buffer would save lots of time, as you won't have to keep setting the VRAM read address and the VRAM write address (two OUTs per operation!) for each byte you want to copy.

Your tool is nice. A black pixel is $00 and any other value is a white (green!) pixel, right?

Here's Waimanu Scary Monsters Saga image, for fun :) You can clearly see what's not compressed, there. Also, you can see how using superfree sections left free space only at the end of the last two banks.
WaimanuSMS_rom_image.png (20.96 KB)
Waimanu Scary Monsters Saga rom map

  • Joined: 07 Oct 2015
  • Posts: 114
Posted: Wed Dec 30, 2015 10:43 am
sverx wrote
Your tool is nice. A black pixel is $00 and any other value is a white (green!) pixel, right?


Exactly. Of course it's not perfect, as any zero-padded pool of data at the end of the ROM would look like empty space, but it's a decent solution to give yourself an idea of how much space you have to play with (so you can tell the musician "you have 16Kb, make anything you like with it!") :-)

It's inspired by the much better and more complex nessc by Shiru.
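
The idea behind the tool (any $00 byte counts as free, everything else as used) can be sketched in a few lines of Python. This is not the romview source, which is FreeBASIC and not shown in the thread; the function name and the 16KB bank size are assumptions made for illustration.

```python
def free_space_per_bank(rom, bank_size=16 * 1024):
    """Return the percentage of $00 bytes in each bank of a raw ROM image.

    Assumption: a zero byte means "free". As noted in the post above, real
    zero-padded data is therefore miscounted as empty space.
    """
    percentages = []
    for start in range(0, len(rom), bank_size):
        bank = rom[start:start + bank_size]
        percentages.append(100.0 * bank.count(0) / len(bank))
    return percentages


if __name__ == "__main__":
    # Tiny fake ROM with 16-byte "banks": the first is half used, the second is empty.
    fake_rom = bytes([1] * 8 + [0] * 8 + [0] * 16)
    print(free_space_per_bank(fake_rom, bank_size=16))
```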
  • Joined: 05 Sep 2013
  • Posts: 3758
  • Location: Stockholm, Sweden
Posted: Wed Dec 30, 2015 10:58 am
na_th_an wrote
it's a decent solution to give yourself an idea of how much space you have to play with


latest ihx2sms gives you details on how many bytes are used in your ROM, and that's really counting code and data, even data at zero.
  • Joined: 07 Oct 2015
  • Posts: 114
Posted: Wed Dec 30, 2015 12:18 pm
sverx wrote
na_th_an wrote
it's a decent solution to give yourself an idea of how much space you have to play with


latest ihx2sms gives you details on how many bytes are used in your ROM, and that's really counting code and data, even data at zero.


It seems that I have to update, then :) Thanks.
  • Site Admin
  • Joined: 19 Oct 1999
  • Posts: 14685
  • Location: London
Posted: Wed Dec 30, 2015 12:39 pm
I expect the linker will avoid gaps anyway, so the unused bytes will all be at the end.

I've gone through a Python aPLib depacker and used it to add copious comments to the code - which I'll post later. Basically, aPLib is a pure LZ encoder with one unusual feature: it can reuse the LZ offset around "gaps" in the LZ match. This can save a lot of bytes when you have similar runs of data. Apart from that, there's a bitstream of commands and variable-length-encoded parameters, which is interleaved byte-wise with byte-sized data (for unmatched bytes and also for 8 bits of the LZ offset). The slowness seems to come from two things:

1. The bitstream unpacking. The Z80 is a bit slow at this stuff, as you have to do it a bit at a time; plus the code costs a call per bit.
2. The VRAM copying. A small buffer would help a lot, maybe just 16 bytes on the stack.

The bitstream stuff can be helped by the optimisations linked above, at the cost of a larger decoder. I'll play with the VRAM buffer stuff after I get it working :)
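
To make the "bitstream interleaved byte-wise with data bytes" description concrete, here is a minimal Python sketch of such a reader. It assumes bits are taken most-significant-bit first from a tag byte and that the gamma code is "start at 1, shift in a data bit, continue while the next bit is 1", which matches my understanding of the reference aPLib depacker; the class and method names are made up for illustration.

```python
class BitByteReader:
    """Reads single bits and whole bytes from one interleaved stream."""

    def __init__(self, data):
        self.data = data
        self.pos = 0        # index of the next unread byte in the stream
        self.tag = 0        # current byte of command bits
        self.bits_left = 0  # how many bits of self.tag are still unused

    def read_byte(self):
        """Fetch a whole data byte (a literal, or the low 8 bits of an offset)."""
        value = self.data[self.pos]
        self.pos += 1
        return value

    def read_bit(self):
        """Fetch one command bit, pulling a new tag byte when the old one is used up."""
        if self.bits_left == 0:
            self.tag = self.read_byte()
            self.bits_left = 8
        self.bits_left -= 1
        bit = (self.tag >> 7) & 1
        self.tag = (self.tag << 1) & 0xFF
        return bit

    def read_gamma(self):
        """Variable-length number: always >= 2, one data bit plus one 'continue' bit per step."""
        result = 1
        while True:
            result = (result << 1) | self.read_bit()
            if not self.read_bit():
                return result


if __name__ == "__main__":
    reader = BitByteReader(bytes([0b10100000, 0x41]))
    # Two command bits, then a data byte, then the tag byte continues where it left off.
    print(reader.read_bit(), reader.read_bit(), hex(reader.read_byte()), reader.read_bit())
```

Every `read_bit` here corresponds to one `call ap_getbit` in the Z80 code, which is where the per-bit cost mentioned in point 1 comes from.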
  • Joined: 05 Sep 2013
  • Posts: 3758
  • Location: Stockholm, Sweden
Posted: Wed Dec 30, 2015 2:21 pm
Maxim wrote
Basically, aPLib is a pure LZ encoder with one unusual feature: it can reuse the LZ offset around "gaps" in the LZ match.


I don't get it. You mean this comment?
;if gamma code is 2, use old R0 offset, and a new gamma code for length


BTW, studying the current code, I can see that it's sometimes rather quirky, as in
  ld bc,#0
  call ap_getbitbc
  call ap_getbitbc
  call ap_getbitbc
  call ap_getbitbc

[...]

ap_getbitbc: ;doubles BC and adds the read bit
  sla c
  rl b
  call ap_getbit
  ret z
  inc bc
  ret

which is funny because it's doubling BC (and possibly adding 1) 4 times, after having set BC to 0. This means B will remain 0 no matter what...

this code would just do the same
xor a
.rept 4
  add a,a
  call ap_getbit    ; of course that function there should keep A register unchanged
  jr z,+
  inc a
+:
.endr
ld c,a
ld b,#0
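
The claim that B can never become non-zero is easy to check exhaustively. The following Python sketch models the four "double BC and add the bit" steps on a 16-bit value; the function name is hypothetical and this is only a model of the arithmetic, not of the Z80 flags.

```python
from itertools import product


def shift_in_four_bits(bits):
    """Model of four ap_getbitbc calls starting from BC = 0.

    Returns (B, C), the high and low bytes of the 16-bit result.
    """
    bc = 0
    for bit in bits:
        bc = ((bc << 1) | bit) & 0xFFFF  # sla c / rl b, then inc bc if the bit was set
    return bc >> 8, bc & 0xFF


if __name__ == "__main__":
    results = [shift_in_four_bits(bits) for bits in product((0, 1), repeat=4)]
    print(all(b == 0 for b, _ in results), max(c for _, c in results))
```

Since the result always fits in four bits, an 8-bit accumulator is enough, which is what the `.rept 4` version above relies on.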


edit: don't you feel we should split this topic? :)
  • Site Admin
  • Joined: 19 Oct 1999
  • Posts: 14685
  • Location: London
Posted: Wed Dec 30, 2015 5:34 pm
Topic split!

I found this: https://code.google.com/p/kabopan/source/browse/trunk/kbp/comp/aplib.py and made a hacked up decompressor here: https://github.com/maxim-zhao/aplib.py which attempts to document the format and also give you some analysis of a compressed file. I didn't look at the Python compressor yet, I reckon the opaque binary one does a decent job and I hate writing compressors :)

The codec can in fact support arbitrary-length distances for the LZ matches and their lengths, but practically it is limited to 16 bits and more practically to 16KB. We could therefore make some assumptions about the data to simplify and optimise the decompressor, for example that we don't need to handle the distances over 32000 bytes (or that could be conditional and off by default), and that smaller distances are much more likely.

There's a natural conflict between making a small decompressor and a fast one. The calls to get a bit four times are just smaller than your version.

The main problem with using a RAM cache to hold data during a VRAM to VRAM copy is that the copies very frequently overlap source and destination - as that's a way to get RLE, you emit one byte and then an n-1 byte copy at offset 1. That's going to be hard to deal with.
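
The overlap behaviour is easiest to see in a byte-by-byte model. This Python sketch (function name made up for illustration) copies from the output that has already been produced, one byte at a time, so a match at offset 1 keeps re-reading the byte it has just written.

```python
def lz_copy(output, offset, length):
    """Append `length` bytes copied from `offset` bytes back in `output`.

    The copy must go one byte at a time: when offset < length the source
    range runs into bytes that this same copy has only just written.
    """
    src = len(output) - offset
    for i in range(length):
        output.append(output[src + i])


if __name__ == "__main__":
    out = bytearray(b"A")
    lz_copy(out, offset=1, length=5)   # one literal plus an n-1 byte copy at offset 1
    print(out)                         # a run of six identical bytes

    out = bytearray(b"AB")
    lz_copy(out, offset=2, length=4)   # offset 2 repeats a two-byte pattern
    print(out)
```

A RAM cache that reads the whole source range up front would see stale data for everything past the first `offset` bytes, which is the difficulty described above.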
  • Joined: 05 Sep 2013
  • Posts: 3758
  • Location: Stockholm, Sweden
Posted: Thu Dec 31, 2015 9:48 am
I read your comments and they're quite clear, what I don't get is where in your (and dwedit's) asm code you're checking that it's time to fetch another byte for the bitstream. I mean, I can't find anywhere the count to 8 that I'm expecting... :|
  • Site Admin
  • Joined: 19 Oct 1999
  • Posts: 14685
  • Location: London
Posted: Thu Dec 31, 2015 10:51 am
Last edited by Maxim on Sun Jan 03, 2016 8:29 am; edited 1 time in total
In dwedit's code, mem.bits holds a single bit which masks the last byte read from the stream, which is itself held in mem.byte. The manipulation happens in ap_getbit. It grabs both the memory items into bc, then rotates the bits part. If that rotates out of bit 0, it sets carry and bit 7. The carry causes it to read the next byte into b. Then it ands them together, masking to the current bit and returning the result in the z flag.

In Metalbrain's version it's much more complicated and took me a while to figure out. The current bit stream is held in a throughout (it swaps to a' when 8-bit maths is needed) and a marker bit is set on the right side, via shifting through the carry flag. This allows a z check after shifting to determine that the bits have run out and then it grabs the next byte, rotates the signalling bit out of carry into a, and the next data bit out for use after the z check. It also optimises part of the process inline to every place a bit is retrieved, with a jump to a location specific handler every time when the bits run out. This is great for performance but ends up costing about 10% more in code size.
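
The marker-bit trick can be modelled in Python without any bit counter. This is a sketch of the idea as described above, not a transcription of Metalbrain's code: the 8-bit register `a` holds the remaining data bits followed by a single 1 marker, and it only becomes zero at the moment the marker itself is shifted out.

```python
class MarkerBitReader:
    """Bit reader that detects 'byte exhausted' with a zero test instead of a counter."""

    def __init__(self, data):
        self.data = data
        self.pos = 0
        self.a = 0x80  # marker only: the very first read triggers a refill

    def read_bit(self):
        carry = self.a >> 7            # like `add a,a`: the top bit goes to carry
        self.a = (self.a << 1) & 0xFF
        if self.a == 0:
            # The bit that just left was the marker (carry is 1), so the byte is used up.
            byte = self.data[self.pos]
            self.pos += 1
            # Like `rla`: the marker re-enters at bit 0, the first data bit leaves at bit 7.
            self.a = ((byte << 1) | carry) & 0xFF
            carry = byte >> 7
        return carry


if __name__ == "__main__":
    reader = MarkerBitReader(bytes([0xA5, 0x3C]))
    print("".join(str(reader.read_bit()) for _ in range(16)))
```

The result is the same most-significant-bit-first sequence a counter-based reader would give, which is why no "count to 8" appears anywhere in the assembly.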
  • Joined: 05 Sep 2013
  • Posts: 3758
  • Location: Stockholm, Sweden
Posted: Thu Dec 31, 2015 12:22 pm
Wow. There's still a great deal of stuff I have to learn...

Are you working on a faster implementation for VRAM?
  • Joined: 05 Sep 2013
  • Posts: 3758
  • Location: Stockholm, Sweden
Posted: Thu Dec 31, 2015 1:02 pm
Here's some tentative (and untested) code to handle VRAM reads to a buffer and VRAM writes from a buffer.
Read is easy: you pass HL as the source address and B as the number of bytes to copy to the buffer.
Write is slightly more complex: DE is the target address and C is the total number of bytes you want to write to VRAM. B is the number of bytes your buffer contains, so if B<C the routine starts over from the beginning of the buffer as many times as needed. Even with a buffer of just a few bytes it will easily be faster than other implementations.

.section "Get Bytes From VRAM" free
; read B bytes from VRAM (HL) into buffer
; IN: HL source VRAM address
; IN: B number of bytes (1 to sizeof(buffer))
; clobbers: AF,BC,HL
GetBytesFromVRAM:
  ld c,$bf
  di
  out (c),l
  out (c),h
  ei
  ld hl,buffer
-:in a,($be)
  ld (hl),a
  inc hl
  djnz -
  ret
.ends

.section "Write Bytes To VRAM" free
; write C bytes from first B bytes from buffer
; IN: DE destination VRAM address
; IN: B number of bytes to output from the buffer (1 to sizeof(buffer))
; IN: C number of total bytes to write (1 to 256)
; clobbers: AF,BC,HL,DE
WriteBytesToVRAM:
  ld a,e
  di
  out ($bf),a
  ld a,d
  or $40
  out ($bf),a
  ei
  ld d,b         ; save B into D
_restart:
  ld hl,buffer
-:ld a,(hl)
  out ($be),a
  dec c
  ret z          ; when C==0 we're done
  inc hl
  djnz -
  ld b,d         ; restore B
  jr _restart
.ends
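
The intended behaviour of the two routines, and why cycling through the buffer reproduces an overlapping copy, can be checked with a small Python model. VRAM is modelled as a plain bytearray and all function names are hypothetical; port timing, interrupts and the 256-byte limit on C are deliberately ignored.

```python
def get_bytes_from_vram(vram, src, count):
    """Model of GetBytesFromVRAM: read `count` bytes starting at `src` into a buffer."""
    return bytearray(vram[src:src + count])


def write_bytes_to_vram(vram, dest, buffer, b, c):
    """Model of WriteBytesToVRAM: write `c` bytes, cycling through the first `b` buffer bytes."""
    for i in range(c):
        vram[dest + i] = buffer[i % b]


def naive_copy(vram, dest, offset, count):
    """Reference behaviour: byte-by-byte LZ copy, re-reading freshly written bytes."""
    for i in range(count):
        vram[dest + i] = vram[dest - offset + i]


def buffered_overlap_copy(vram, dest, offset, count):
    """Read `offset` bytes once, then loop-write them (valid while offset fits the buffer)."""
    buffer = get_bytes_from_vram(vram, dest - offset, offset)
    write_bytes_to_vram(vram, dest, buffer, offset, count)


if __name__ == "__main__":
    a = bytearray(range(32)) + bytearray(32)
    b = bytearray(a)
    naive_copy(a, dest=32, offset=3, count=20)
    buffered_overlap_copy(b, dest=32, offset=3, count=20)
    print(a == b)
```

The model only covers the case where `offset` is no larger than the buffer; longer distances need the copy to be split up.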
  • Site Admin
  • Joined: 19 Oct 1999
  • Posts: 14685
  • Location: London
Posted: Thu Dec 31, 2015 2:02 pm
I've been commenting the fast implementation, plus adding in my modifications to make it target VRAM. So far it's broken though, I'm debugging that. The code is very heavily optimised, which makes it more fragile to things like changes to flags and unexpected dependencies on values left in registers that seemed unused.

I suspect the VRAM to VRAM copy is the slowest part, but as mentioned before the behaviour with overlapping source and destination is quite hard to get around, especially when the result is multiplicative. I guess you can buffer min(pointer delta, count, buffer size) bytes at first, but it would be best if you could logically buffer the repeated values and write them into place without ever re-reading them.
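
The "buffer min(pointer delta, count, buffer size) bytes" idea can be sketched as a chunked copy. In this Python model (names and the 16-byte default are assumptions) each chunk stands for one "set read address, read, set write address, write" round trip, and the function returns how many round trips were needed.

```python
def naive_copy(vram, dest, offset, count):
    """Reference: byte-by-byte copy that re-reads freshly written bytes."""
    for i in range(count):
        vram[dest + i] = vram[dest - offset + i]


def chunked_vram_copy(vram, dest, offset, count, buffer_size=16):
    """Copy in chunks of min(offset, remaining, buffer_size) bytes.

    Keeping each chunk no longer than `offset` guarantees that everything
    read into the buffer has already been written, so overlap stays correct.
    Returns the number of chunks (address set-ups) used.
    """
    src = dest - offset
    chunks = 0
    while count > 0:
        size = min(offset, count, buffer_size)
        buffer = bytes(vram[src:src + size])   # one burst of VRAM reads
        vram[dest:dest + size] = buffer        # one burst of VRAM writes
        src += size
        dest += size
        count -= size
        chunks += 1
    return chunks


if __name__ == "__main__":
    vram = bytearray(range(64)) + bytearray(64)
    print(chunked_vram_copy(vram, dest=64, offset=40, count=40))  # 16 + 16 + 8 bytes
    vram = bytearray(range(64)) + bytearray(64)
    print(chunked_vram_copy(vram, dest=64, offset=1, count=10))   # degenerates to 1 byte per chunk
```

With an offset of 1 the chunk size collapses to a single byte, so this scheme alone gains nothing for RLE-style matches; that is the case where writing the same buffered bytes repeatedly, without re-reading, would pay off.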
  • Joined: 05 Sep 2013
  • Posts: 3758
  • Location: Stockholm, Sweden
Posted: Thu Dec 31, 2015 2:22 pm
Maxim wrote
I guess you can buffer min(pointer delta, count, buffer size) bytes at first, but it would be best if you could logically buffer the repeated values and write them into place without ever re-reading them.


I prefer reading the data and storing it once, then writing it many times if needed. Also, you need less RAM (e.g. you can read 3 and write 250... RLE for free)
  • Site Admin
  • Joined: 19 Oct 1999
  • Posts: 14685
  • Location: London
Posted: Thu Dec 31, 2015 6:56 pm
I fixed my bug - my "VRAM ldir" function was trashing a, which as mentioned above is now an important store of the bitstream. After fixing that, I re-ran my bad benchmark (since the source graphics had changed):

; Compression  Data size  Ratio  Decompressor size  Load time (cycles)  Ratio
; None              9728  100%                  24              161365   100%
; PScompr           8338   86%                  54             1335193   827%
; Sonic 1           5507   57%                 162             1011588   627%
; PSGcompr          5029   52%                 223             1576965   977%
; PuCrunch          4005   41%                 414             3394510  2104%
; aPLib             3946   41%                 304             3552372  2201%
; aPLib-fast        3946   41%                 334             1789523  1109%


This is without any buffering of the VRAM reads, and it's already close to the performance of RLE decompressors. The code is attached, and seems to work correctly for the single small test case I used :) The method to work on is _ldir_vram_to_vram.
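
For anyone wanting to re-derive the percentage columns, they follow directly from the raw numbers in the table. This Python snippet recomputes them relative to the uncompressed row; only the figures quoted above are used, and the variable names are made up.

```python
# (name, data size in bytes, decompressor size in bytes, load time in cycles)
ROWS = [
    ("None",       9728,  24,  161365),
    ("PScompr",    8338,  54, 1335193),
    ("Sonic 1",    5507, 162, 1011588),
    ("PSGcompr",   5029, 223, 1576965),
    ("PuCrunch",   4005, 414, 3394510),
    ("aPLib",      3946, 304, 3552372),
    ("aPLib-fast", 3946, 334, 1789523),
]


def ratios(rows):
    """Return {name: (size %, time %)} relative to the first (uncompressed) row."""
    _, base_size, _, base_time = rows[0]
    return {
        name: (round(100 * size / base_size), round(100 * time / base_time))
        for name, size, _, time in rows
    }


if __name__ == "__main__":
    for name, (size_pct, time_pct) in ratios(ROWS).items():
        print(f"{name:<11} {size_pct:>4}% {time_pct:>5}%")
    by_name = {row[0]: row for row in ROWS}
    # Speed-up and code-size cost of the fast variant over the original aPLib decoder.
    print(by_name["aPLib"][3] / by_name["aPLib-fast"][3])
    print(by_name["aPLib-fast"][2] - by_name["aPLib"][2])
```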

Edit: I spent a bit of time on it but it's not looking hopeful. Even if I assume it's safe to over-read (mostly true) and over-write (not really), I still have to handle these issues:

1. Overlap of source and destination -> maximum bufferable amount is smaller
2. Buffer is smaller than the amount to be copied, including in the overlap scenario
3. Amount to be copied is not a multiple of the buffer size

It seems like it's going to come out as a lot of code to handle all this, at which point I'm suspicious of the speed gains.
aplib-z80-fast.asm (11.43 KB)

  • Joined: 08 Dec 2013
  • Posts: 200
Posted: Thu Dec 31, 2015 8:30 pm
Impressive speed increase!

Isn't aPLib proprietary though? (So we can't write our own compressors.)
  • Site Admin
  • Joined: 19 Oct 1999
  • Posts: 14685
  • Location: London
Posted: Thu Dec 31, 2015 9:38 pm
The format is documented, there's a Python compressor for example.
  • Joined: 07 Oct 2015
  • Posts: 114
Posted: Fri Jan 01, 2016 8:06 am
That's twice as fast and just 30 bytes bigger? That's what I call an improvement! Congratulations.
  • Site Admin
  • Joined: 19 Oct 1999
  • Posts: 14685
  • Location: London
Posted: Fri Jan 01, 2016 9:15 am
It's not my improvements, they come from Metalbrain and others in the forum linked above. I just integrated my VDP tweaks. Going through and documenting everything helped me to understand how well optimised it is (for example, every jump has been tuned to be jr or jp depending on the most likely value of the condition, with the more likely branch inlined). This means there's not much more to optimise in the algorithm (I gained about 1% by reordering some run length checks, at the cost of 2 bytes), but the VRAM copying is still a major hotspot which would require a fair chunk of code to improve.

As it is, it costs you ~340 bytes for the decompressor, which is a fairly big chunk in a small game (1% of a 32KB game) but the RAM use is pleasantly small (although it hits almost all the registers). You'll almost certainly make that up in the compression though (60% reduction seems typical, so you need less than 1KB of data to break even). The compression will get better as the size increases, though.
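
The break-even claim is simple arithmetic: the decompressor pays for itself once the bytes saved equal its own size. A quick Python check, using the ~340 byte and 60% figures from the paragraph above (the function name is made up):

```python
import math


def break_even_bytes(decompressor_size, reduction_percent):
    """Smallest amount of raw data for which the savings cover the decompressor itself."""
    return math.ceil(decompressor_size * 100 / reduction_percent)


if __name__ == "__main__":
    print(break_even_bytes(340, 60))      # raw bytes needed to break even
    print(100 * 340 / (32 * 1024))        # decompressor as a percentage of a 32KB ROM
```

So a little under 600 bytes of typical data is enough, comfortably inside the "less than 1KB" estimate.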
  • Joined: 07 Oct 2015
  • Posts: 114
Posted: Fri Jan 01, 2016 1:51 pm
It may be a footprint to consider in small games, but it helped me save more than 2Kb (decompressor included) and allowed me to add real graphics for the ending scene and some more variety. I think it's worth the price. I don't know what the average compression ratio for planar SMS tiles would be, but for bitmap SG tiles it's around the 55-70% mark (depending on the tiles, of course: fonts have the best compression ratio, and colour attributes, if wisely applied, may save tons as well), which is pretty good.
  • Joined: 05 Sep 2013
  • Posts: 3758
  • Location: Stockholm, Sweden
Posted: Sat Jan 02, 2016 10:58 am
Maxim wrote
the VRAM copying is still a major hotspot which would require a fair chunk of code to improve.


We could try addressing some of the cases first and see how much it improves things. For instance:
- if bytestocopy<=sizeof(buffer) and offset>=bytestocopy (which means a simple short copy with no overlap)
- if bytestocopy<=256 and offset<=sizeof(buffer)
you can directly use the code I posted before, as it handles overlaps (it will read 'offset' bytes into the buffer once and loop-write them).
We could also change that sample code so as to support more than 256 writes on overlap.

I believe longer runs of copies are less frequent and we can address these later.
  • Site Admin
  • Joined: 19 Oct 1999
  • Posts: 14685
  • Location: London
Posted: Sat Jan 02, 2016 12:44 pm
We could also make a compressor which limits runs to 256 bytes... but 256b is a lot of memory to reserve for this. Tile loading seems like something you might do while under a lot of memory pressure.

One other thing to consider is the interleaving, most SMS tile compressors deinterleave for better RLE matches so it might help here - but I suspect it'll be a nightmare to reinterleave while decompressing.
  • Joined: 05 Sep 2013
  • Posts: 3758
  • Location: Stockholm, Sweden
Posted: Sat Jan 02, 2016 10:14 pm
256 bytes is surely too much. I imagined that 16 bytes would be enough to speed things up a lot and small enough to be feasible. Also, when runs are longer than our buffer we could try breaking the calls into smaller segments.
As for plane deinterleaving, this would require another buffer, which is probably something we can't afford.
  • Site Admin
  • Joined: 19 Oct 1999
  • Posts: 14685
  • Location: London
Posted: Sat Jan 02, 2016 10:19 pm
Last edited by Maxim on Sun Jan 03, 2016 12:41 am; edited 1 time in total
It's a pessimisation for compression to limit lengths. The interleaving can be done at decompression time by incrementing by 4 after each byte, and wrapping back to the start, plus making similar changes to the reads. It would be really slow though.
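
As a rough illustration of the "increment by 4 and wrap back to the start" scheme, here is a Python sketch of bitplane deinterleaving and the matching reinterleave on output. It assumes four interleaved planes and, for simplicity, deinterleaves the whole data block at once rather than tile by tile; both the function names and that choice are assumptions, not something the thread specifies.

```python
def deinterleave(data, planes=4):
    """Group bytes by bitplane: all plane-0 bytes first, then plane 1, and so on."""
    return b"".join(data[p::planes] for p in range(planes))


def reinterleave(planar, planes=4):
    """Undo deinterleave() by stepping the write position by `planes` and wrapping.

    Assumes len(planar) is a multiple of `planes`.
    """
    out = bytearray(len(planar))
    pos = 0
    for value in planar:
        out[pos] = value
        pos += planes
        if pos >= len(out):
            pos = pos % planes + 1   # wrap back to the start of the next plane
    return bytes(out)


if __name__ == "__main__":
    tile_rows = bytes([0x10, 0x20, 0x30, 0x40, 0x11, 0x21, 0x31, 0x41])
    planar = deinterleave(tile_rows)
    print(planar.hex(), reinterleave(planar) == tile_rows)
```

On the real hardware every one of those position changes would be a VRAM address set-up, and LZ lookback reads would need the same remapping, which is where the slowness would come from.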

For an emulated game you can use cart RAM to relieve memory pressure, but I guess we want to avoid that sort of non standard hardware - it's a step towards having an on cart decompressor writing directly to the VDP at the maximum possible rate.
  • Joined: 05 Sep 2013
  • Posts: 3758
  • Location: Stockholm, Sweden
Posted: Sat Jan 02, 2016 10:31 pm
We've got an aPLib decompressor that works, now we've got one that works faster. I understand we can make that even faster... but are you thinking about creating a new format/compressor/decompressor that could outperform/compare to aPLib? That would be a really ambitious project, but I'm not sure we really need to go that far...
  • Site Admin
  • Joined: 19 Oct 1999
  • Posts: 14685
  • Location: London
Posted: Sat Jan 02, 2016 10:56 pm
If the Python compressor is up to scratch, it'd be easy to hack on it. There are a few things in aPLib which are complicating factors: the length adjustment is a bit arbitrary, and the ordering of the methods could be tuned to what turns up most often to save a few bits. The ability to address unlimited ranges is also something we might be able to restrict in favour of decompressor simplification. Or we should look further afield to other compressors, although from what I've been reading recently, aPLib is fairly competitive in the 8-bit small-memory space. There are much better coders than me operating in the ZX Spectrum and Amstrad CPC world. I've already hooked in Exomizer v2 via my EXE compressor wrapper and it outperforms aPLib (and has a decompressor optimised by the same people who did the aPLib improvements), but it uses a lot of RAM. Maybe there's space for something using a smaller RAM helper, or optimised for use from ROM (as those others may not be).
  • Joined: 05 Sep 2013
  • Posts: 3758
  • Location: Stockholm, Sweden
Posted: Sun Jan 03, 2016 10:33 am
Knowing that we will surely decompress less than 16KB anyway, we could use this info to slightly simplify the decompressor, and keep both the compressor and the format as they are. After all, your decompressor speed is already comparable with PSGaiden, which usually compresses less, and this means we're easily saving more even if aPLib code is bigger...
  • Joined: 08 Dec 2013
  • Posts: 200
Posted: Mon Jan 04, 2016 5:42 pm
Some observations:

sverx wrote
After all, your decompressor speed is already comparable with PSGaiden


RLE compression very quickly falls to pieces when complexity increases, whereas LZ compression can handle complexity that repeats (within the window) very well. Not only is aPLib an improvement over RLE, it will far exceed anything RLE could do as complexity increases.

IMAO: (In My Amateur Opinion)

Maxim wrote
We could also make a compressor which limits runs to 256 bytes... but 256b is a lot of memory to reserve for this


Not to everybody! A variety of compressors (or versions thereof) should exist for a variety of needs. For a project starting out, 256 bytes will be no issue at all. The buffer can also be used for other purposes when not decompressing -- for example, Sonic 1 has about 1.2 KB of unused RAM but even then I know that 256 bytes are used for the scroll cache and decompression could borrow that space.

I would not rule out the option of a large RAM buffer if speed gains could be made where present!

I'm also interested in the possibilities of pre-calculation / ROM-use. Macro assemblers could assemble the decompression routine according to the data itself! Also lookup tables and pre-calculated decompression dictionaries can be provided in the ROM, shifting computation from the Z80 to the PC.

In my usage scenario, the size of the decompressor is a low priority. Sonic 1 has 100+ KB of compressed data; with a level editor, I'm aiming to provide up to 1 MB of ROM; at that point, even spending 16 KB on decompression code/tables is still space saved!

If we as developers want to create an ultimate compression format for the SMS/Z80 then a wide variety of decompression options is the key to common adoption.

My 2p :)