View topic - Advantages of languages for asset converters

  • Joined: 05 Dec 2019
  • Posts: 47
  • Location: USA
Advantages of languages for asset converters
Post Posted: Thu Oct 06, 2022 1:47 pm
I've seen people use Python, C, and Java to convert visual assets and other game-specific data from an editable form to a form usable by an 8-bit game engine. So far, I've been using Python, and I've run into limits on larger projects that pack a 70 megapixel world into a 4 Mbit cartridge.

Interpreter slowness
Compiled languages and JIT-compiled languages can run bit packing code faster than a pure interpreter such as CPython. Writing a background map converter in a language other than Python might help reduce a game's build time when converting dozens of large PNG images (e.g. 2048x2048 pixels, roughly one Sonic act) to tile data as the first step of compressing a background map for use in a game engine. I use Make as a build director to convert only those visual assets that need it, based on the modification dates of their inputs. This helps for incremental builds but not when an update (such as a Git pull) means most of the background maps need to be reconverted.
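
To make "bit packing code" concrete, here is a minimal sketch of the kind of inner loop such a converter spends its time in. It assumes the Master System's 4-bit planar tile layout (32 bytes per 8x8 tile, four bitplane bytes per row, leftmost pixel in bit 7); the name `pack_tile_planar` is made up for illustration and is not taken from any existing tool.

```python
def pack_tile_planar(tile, bitplanes=4):
    """Pack an 8x8 tile of palette indices into planar bytes.

    tile is a list of 8 rows, each a list of 8 integers in
    range(2 ** bitplanes).  Assumed layout: for each row, one byte
    per bitplane (plane 0 first), leftmost pixel in bit 7.
    """
    out = bytearray()
    for row in tile:
        for plane in range(bitplanes):
            byte = 0
            for pixel in row:
                # Shift in this pixel's bit for the current plane
                byte = (byte << 1) | ((pixel >> plane) & 1)
            out.append(byte)
    return bytes(out)


# A tile whose left half is color 1 and whose right half is color 2
tile = [[1, 1, 1, 1, 2, 2, 2, 2] for _ in range(8)]
packed = pack_tile_planar(tile)
print(len(packed), packed[:4].hex())  # 32 f00f0000
```

Each tile costs 256 passes through the innermost statement, and a 2048x2048 image holds 65,536 tiles, so one image needs roughly 16.8 million of them. That is where a pure interpreter's per-operation overhead adds up.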

Associative array
The problem I have had with C is that it doesn't come with an associative array data structure (like C++ std::unordered_map or Python dict or Java java.util.HashMap) in the standard library. An associative array or map data structure maps keys, such as strings, to values of some sort. This is useful for deduplicating tiles in a level map converter or looking up symbols by name in a symbol table in an assembler or compiler. A dynamic array (like Python list or Java java.util.ArrayList) was easy enough to write as a wrapper around realloc(), but lack of any standard map container makes C a bit less approachable. POSIX has hsearch() in <search.h> whose implementation I could gank from musl, but it's restricted in other ways such as not being able to include byte 0x00 in a key.
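
As a reference point for what the C version would need to replicate, here is a minimal sketch of tile deduplication with a Python dict. The function name `dedupe_tiles` and the sample tiles are illustrative assumptions, not code from an actual converter.

```python
def dedupe_tiles(tiles):
    """Return (unique_tiles, tilemap) for a sequence of tile byte strings.

    unique_tiles lists each distinct tile once, in first-seen order.
    tilemap holds, for each input tile, its index into unique_tiles.
    """
    index_of = {}  # tile bytes -> index into unique_tiles
    unique_tiles = []
    tilemap = []
    for tile in tiles:
        idx = index_of.get(tile)
        if idx is None:
            idx = len(unique_tiles)
            index_of[tile] = idx
            unique_tiles.append(tile)
        tilemap.append(idx)
    return unique_tiles, tilemap


# Keys are arbitrary bytes, so 0x00 inside a key is not a problem
sky = bytes(32)
ground = bytes([0xFF, 0x00, 0x00, 0x00] * 8)
unique, tilemap = dedupe_tiles([sky, ground, sky, sky, ground])
print(len(unique), tilemap)  # 2 [0, 1, 0, 0, 1]
```

The point of the map is that each lookup takes roughly constant time on average, and that the key is raw tile data rather than a NUL-terminated string.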

Process startup time
Java is JIT compiled and has a HashMap. However, it has a relatively long startup time for each process. I suspect that VM startup time may cause problems when I'm invoking a lot of short Java programs, called by Make once for each of 200 levels and 60 enemy types. Does this pose a problem in practice? Is there a way to make a pool of virtual machines, one per core, each waiting to run a program? Or if I'm using Java, would I need to switch to a build director other than Make to convert the visual assets?

Runtime installation
In addition, it hasn't always been trivial to deploy Python on remote collaborators' Windows PCs. Some people suggest using PyInstaller to compile each Python program, and I've found four drawbacks to PyInstaller. First, PyInstaller doesn't cross-compile, meaning one needs to run PyInstaller in Python for Windows rather than Python for Linux. (I use Xubuntu, a distribution of GNU/Linux, for its faster I/O than Windows.) Second, programs compiled using PyInstaller's one-file option have had documented permissions problems when running in Wine. (I use Wine to test Windows builds of tools under Linux to make sure they behave the same way as the Linux build of the same tool without having to reboot all the time.) Third, programs compiled using PyInstaller are big. Hello World alone compiles to a 9 megabyte executable, which is already bigger than a Discord user without Nitro is allowed to send in a private message. Fourth, programs compiled using PyInstaller appear to take quite a while to start up; I suspect PyInstaller is intended more for long-running background processes or user-interactive programs than for a CLI program that does something for half a second, writes its output file, and ends.

Is there another language that offers fast execution for short and long programs and some of the fundamental data structures that C lacks? Or a commonly used library to give C some of the modern conveniences of newer languages' standard libraries?
  • Joined: 05 Dec 2019
  • Posts: 47
  • Location: USA
Post Posted: Thu Oct 06, 2022 2:16 pm
For the sake of completeness:

Deduplication and symbol table lookup are possible without a hash table or search tree. Doing so is O(n²), meaning that if there are twice as many symbols in a set, it'll take roughly four times as long to add them all.
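
A rough way to see the quadratic growth is to count comparisons in a list-scanning deduplicator. This is only an illustrative sketch; `dedupe_linear` is a made-up name, and the inputs are chosen as the worst case where every item is distinct.

```python
def dedupe_linear(items):
    """Deduplicate by scanning a list; also count equality comparisons."""
    unique = []
    comparisons = 0
    for item in items:
        found = False
        for seen in unique:
            comparisons += 1
            if seen == item:
                found = True
                break
        if not found:
            unique.append(item)
    return unique, comparisons


_, small = dedupe_linear(range(100))
_, large = dedupe_linear(range(200))
print(small, large)  # 4950 19900
```

Doubling the number of distinct items from 100 to 200 raises the comparison count from 4950 to 19900, about four times as many.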
  • Joined: 06 Mar 2022
  • Posts: 216
  • Location: London, UK
Post Posted: Thu Oct 06, 2022 2:50 pm
Last edited by willbritton on Thu Oct 06, 2022 2:55 pm; edited 2 times in total
Great post @PinoBatch!

I started writing my utilities in Python as I thought it was probably the most accessible (one of my drivers is being able to share code examples with people who might be beginners) but to be honest I really dislike the language.
That said, one might have assumed it should be able to deal relatively performantly with data at scale, given that it's still by far and away the most popular language of choice for commercial data science and spilling over into data engineering. Some other languages in that space which have been touted as more performant are things like Julia and I believe Elm.

Personally on my long list I plan to experiment with rewriting my utilities in two platforms, to see which one I prefer:

1) Go, which I think is a really nice mixture of relatively simple language paradigms, nice modern package management and extensive standard library as well as community, and my understanding is that the performance is as close as it gets to bare bones for what is a garbage collected environment.

2) TypeScript, running on Deno. Controversial perhaps, but despite JavaScript's many flaws, I do think TypeScript makes what is still a scripting language very close to feeling like something very strong. My understanding is that modern JavaScript execution engines do a pretty damn good job of being performant and I believe that Deno represents a significant improvement in terms of tooling and package management over Node. I've never used it but I would like to try it and I think for command line utilities it might work well. I also think it might be a good environment for beginners to use but this is just supposition at this point.
  • Joined: 06 Mar 2022
  • Posts: 216
  • Location: London, UK
Post Posted: Thu Oct 06, 2022 2:53 pm
P.S. as for the installation problem you mentioned, I'm running all my tools for my RetCon project in a Docker container that I'm building up. It's got all the assemblers and stuff on it, useful utilities, as well as the open source tool chain needed to develop the hardware. I've never tried docker on Windows but for Linux / MacOS it makes running up a comprehensive environment as simple as a single command line and I reckon it's for sure the way to go for my purposes at least.
  • Joined: 09 Jan 2012
  • Posts: 61
  • Location: Germany
Post Posted: Thu Oct 06, 2022 5:54 pm
CPython is currently much slower than other interpreters. With version 3.11 you will see the first performance boost; later versions will hopefully use JIT technology.
In some use cases PyPy is a faster alternative, but you cannot combine it with PyInstaller.
  • Site Admin
  • Joined: 08 Jul 2001
  • Posts: 8507
  • Location: Paris, France
Post Posted: Thu Oct 06, 2022 7:15 pm
If you are happy with C I would suggest using bare-minimum C++ aka take advantage of the availability of std::vector, std::unordered_map and a few helpers here and there, while not pushing it toward modern C++ (which is a lot of counterproductive ideology IMHO).

Another good solution is C# which is pretty fast and which unlike C/C++ has good standard libraries.
  • Site Admin
  • Joined: 19 Oct 1999
  • Posts: 14225
  • Location: London
Post Posted: Thu Oct 06, 2022 7:36 pm
I have the tools I’ve written myself - bmp2tile is in C#, but leaning on C DLLs for the compression libraries, which are often C first. But for bespoke tooling for a project I spent some time to migrate the PS retranslation tools from questionable C to first modern-ish C++ and then to Python. The cost of the interpreter is tiny, and the code is much more expressive when it can make use of collections and language features to operate on them. It’s also made it much easier to run the build on a CI platform - although I’ve yet to make such a build work cross-platform.

Since I want the tools to be open source, and not check binaries into Git, I found that having C or C++ as the language meant the build had to pay the cost of time to build the tools which is generally much slower than the time saved at execution compared to an interpreter. It’s also rather harder (on Windows) to assume the user has a C/C++ compiler.

Python is generally present on Windows and Linux machines, and in AppVeyor and GitHub Actions. It “just works” better than any other language I can think of in that respect.

Moving to makefiles, and learning to make the best makefiles possible, has been a much bigger time saver. Knowing that any one edit to an asset or script will trigger a build of just what has changed, and also that make -j will do the right thing, gives you a lot of confidence. It’s just a shame that nobody has managed to make a better version of make, and that I end up having to download my own tools binary package in order to run it. https://github.com/maxim-zhao/sms-build-tools
  • Joined: 06 Mar 2022
  • Posts: 216
  • Location: London, UK
Post Posted: Thu Oct 06, 2022 10:49 pm
Maxim wrote
It’s just a shame that nobody has managed to make a better version of make

+1 on that! I've used stuff like ant and (shudder) msbuild professionally but they are very tied to their respective frameworks and make is basically just a command line orchestrator but boy is it hard work to get rolling with.
  • Joined: 25 Feb 2013
  • Posts: 353
  • Location: Osaka
Post Posted: Fri Oct 07, 2022 12:06 am
Make is the way to go. Simple and trusty. Python is not that slow if you avoid using loops as much as possible and heavily use NumPy. In practice, this means keeping the logic/interface in Python and delegating the computationally intensive work to code that is not written in Python. I like to use pybind11 to implement in C++ what needs to be fast, and do the housekeeping stuff in Python. For things you want to run fast, as written above, C++ is a good choice. Portable, modern, does not tie you to a particular compiler vendor.
  • Joined: 05 Sep 2013
  • Posts: 3262
  • Location: Torino, then London, now Stockholm
Post Posted: Fri Oct 07, 2022 8:30 am
PinoBatch wrote
The problem I have had with C is that it doesn't come with an associative array data structure (like C++ std::unordered_map or Python dict or Java java.util.HashMap) in the standard library.


I've never needed to look at this myself, but I wonder whether there are libraries available that implement this feature?
  • Joined: 07 Jul 2021
  • Posts: 7
Post Posted: Sat Oct 08, 2022 7:42 am
PinoBatch wrote
Process startup time
Java is JIT compiled and has a HashMap. However, it has a relatively long startup time for each process. I suspect that VM startup time may cause problems when I'm invoking a lot of short Java programs, called by Make once for each of 200 levels and 60 enemy types. Does this pose a problem in practice? Is there a way to make a pool of virtual machines, one per core, each waiting to run a program? Or if I'm using Java, would I need to switch to a build director other than Make to convert the visual assets?


Using short Java programs may pose a problem if you are actually invoking them 260 times every build. But if you are using Make as a build director, they will only get invoked when you change an asset, so it is not really a problem.
Using a modern Java VM also helps, as performance is better than in the old times.
Worst case scenario (you need to invoke the program for every asset, every time), you can "easily" modify the main method of your short Java program to process every file within a folder to minimize VM startup time. Alternatively, you can try ahead-of-time compilation (GraalVM Native Image), but that sounds like overkill. My advice would be to check first whether VM startup time is really a problem within your build, then look for solutions.
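
The "process every file within a folder" idea is not specific to Java. Below is a minimal sketch of it in Python, chosen only for brevity. `convert_one` is a hypothetical placeholder that just copies bytes, and the file names and suffixes are assumptions for illustration.

```python
import tempfile
from pathlib import Path


def convert_one(src, dst):
    """Hypothetical stand-in for a real converter: copies bytes unchanged."""
    Path(dst).write_bytes(Path(src).read_bytes())


def convert_folder(in_dir, out_dir, pattern="*.png", suffix=".bin"):
    """Convert every matching file in in_dir within a single process."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(Path(in_dir).glob(pattern)):
        dst = out_path / (src.stem + suffix)
        convert_one(src, dst)
        written.append(dst)
    return written


# Demonstration on throwaway files: runtime startup is paid once, not per file
with tempfile.TemporaryDirectory() as tmp:
    src_dir = Path(tmp, "levels")
    src_dir.mkdir()
    for name in ("act1.png", "act2.png", "notes.txt"):
        (src_dir / name).write_bytes(b"placeholder")
    outputs = convert_folder(src_dir, Path(tmp, "out"))
    print([p.name for p in outputs])  # ['act1.bin', 'act2.bin']
```

One trade-off to keep in mind: a batch tool like this takes over part of the job Make was doing, so it would need its own up-to-date check if unchanged assets should still be skipped.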
  • Joined: 09 Mar 2021
  • Posts: 4
Post Posted: Sun Oct 09, 2022 4:40 am
PinoBatch wrote
I've seen people use Python, C, and Java to convert visual assets and other game-specific data from an editable form to a form usable by an 8-bit game engine. So far, I've been using Python, and I've run into limits on larger projects that pack a 70 megapixel world into a 4 Mbit cartridge.

Interpreter slowness
Compiled languages and JIT-compiled languages can run bit packing code faster than a pure interpreter such as CPython. Writing a background map converter in a language other than Python might help reduce a game's build time when converting dozens of large PNG images (e.g. 2048x2048 pixels, roughly one Sonic act) to tile data as the first step of compressing a background map for use in a game engine. I use Make as a build director to convert only those visual assets that need it, based on the modification dates of their inputs. This helps for incremental builds but not when an update (such as a Git pull) means most of the background maps need to be reconverted.


I've been using python for all these types of tasks in recent years, and yeah sometimes things have to be done in pure python rather than numpy/etc - that IS annoying *if* performance is a major factor for that section of the code.

I find the issue you bring up odd: that converting only updated files (i.e. with Make) helps, but that it is slow otherwise. How often do you really need to build everything from scratch? That seems like such a limited use case. What time are we talking here? 5, 10, 20, 60 minutes to rebuild everything from scratch? Are you obsessively doing 'make clean' every time or something? I'm not really seeing the issue here.

Quote
converting dozens of large PNG images (e.g. 2048x2048 pixels, roughly one Sonic act)

Why are your levels stored as large images?? Why not store them as map data and tile sets?

Quote

Associative array
The problem I have had with C is that it doesn't come with an associative array data structure (like C++ std::unordered_map or Python dict or Java java.util.HashMap) in the standard library. An associative array or map data structure maps keys, such as strings, to values of some sort. This is useful for deduplicating tiles in a level map converter or looking up symbols by name in a symbol table in an assembler or compiler. A dynamic array (like Python list or Java java.util.ArrayList) was easy enough to write as a wrapper around realloc(), but lack of any standard map container makes C a bit less approachable. POSIX has hsearch() in <search.h> whose implementation I could gank from musl, but it's restricted in other ways such as not being able to include byte 0x00 in a key.


You can make this in C yourself. It's close to trivial if you have experience creating data structures from scratch. But why is C an option but not C++?


I've switched to using Python for a few reasons: portability, ease of use, incredible library, and tkinter is built in. Portability is the most important reason, else I'd probably just go back to using C++. Spinning up a GUI for a quick app is a close second to "ease of use". I've recently written DMF and VGM converters in Python because it was just so much easier to write than in C/C++.



It seems like you're looking for similar attributes: portability (distribution) and ease of use (coding/data structures/etc). I honestly don't think you're going to find a perfect solution. On paper, Java seems like the most obvious choice, but I find in practice that Python is closer to that goal. Python does have Numba with a JIT and decorators you can use. It's only a matter of time before that becomes standard. Python is only going to get better simply because the library is just amazing and insane compared to anything else. I remember reading that 70% of the scientific/research community uses Python over MATLAB and R.


The whole Discord issue.. I just ended up doing the Nitro thing. It was more annoying to find an alternative solution than it is to pay a tiny monthly fee.
  • Joined: 08 Dec 2005
  • Posts: 488
  • Location: Melbourne, Australia
Post Posted: Mon Oct 24, 2022 5:09 am
I've thought about this a lot. Personally, I find Python the easiest language to write these kinds of tools in, but I've concluded that its performance isn't good enough and its distribution model makes it unsuitable for code that needs to run on other people's machines. A language that can compile to a standalone binary seems much more appropriate, and I narrowed it down to two options:

  1. Go. It's not as easy to write as Python, but is a simple language with good performance and a fairly comprehensive standard library. Downsides include the language being controlled by Google, and that it's hard to mix Go code with code written in other languages.

  2. C++. Specifically, following the Google Style Guide, which is a popular choice and permits language features similar to those used in professional game dev (for example, it disallows exceptions). Even with the style guide's restrictions though, C++ is still an extremely complex language - is that complexity worth dealing with when I don't need the absolute maximum performance?