Porting Cave Story to a Sun Ultra 1 Workstation

For the unaware, Cave Story is a platforming Metroidvania released by Daisuke Amaya in 2004. Besides being a fun game, it has several qualities that made porting it to a Sun Ultra 1 seem like a good idea. It’s a strictly 2D game, it can natively run at low resolutions (down to 320x240), there’s a high-quality decompilation available in the form of CSE2, and it was optimized for the low-end machines of its era. Surely a professional workstation from 1995 should be able to run it with ease.

Initial work

To avoid boxing myself in with regards to optimization, I decided to forgo the very convenient backends offered by CSE2’s portable branch. One of the two SDL backends may have seemed like a natural choice since at least one of them does ostensibly support old Solaris, but my ability to optimize them and fix things inside goes down dramatically when I don’t already understand the internals of the library, for obvious reasons. I also felt that despite my vague and uneducated dislike of X11, it seemed like something I should know how to program for, just as a general knowledge kind of thing. So I got started writing a raw X11 backend for CSE2.

Adding a new backend isn’t particularly hard; the backend system is quite well made and makes additions easy. The backends themselves are also relatively simple, so creating a new one isn’t all that much work, assuming you start from the right place. The backend system is structured like this:

  • Renderer
    • (various renderers including Software)
    • Window
      • (…)
      • Software
        • (various backends for displaying the software renderer’s pictures)
  • Audio
    • (various audio mixers including SoftwareMixer)
    • SoftwareMixer
      • (various backends for playing back SoftwareMixer’s samples)
  • Platform
    • (various backends for setting up and dealing with platform specifics)
  • Controller
    • (various backe—who cares, I’m not hooking up a gamepad to a Sun workstation)

These form a fairly simple interface to the game, which was originally built around DirectX. Given that I have zero faith in this mid-90s workstation’s 3D-acceleration capabilities, I went with the software renderer. It fairly straightforwardly takes in graphics from the game via the RenderBackend_UploadSurface function and stores them away, and has RenderBackend_Blit draw them to the screen one byte at a time. Transparency exists in the form of completely black pixels in the texture, and handling it is enabled via a boolean parameter to the blit function. When in non-transparent mode, lines are drawn via memcpy rather than a byte-by-byte loop. The renderer also does some less interesting things like drawing solid colors and small bitmap fonts.
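Roughly, the blit boils down to something like this (an illustrative sketch with made-up names, not CSE2’s exact code):

    #include <stddef.h>
    #include <string.h>

    /* Illustrative sketch of the software blit, not CSE2's exact code.
       Pixels are 3 bytes (RGB); completely black source pixels count as
       transparent when the transparent flag is set. */
    static void BlitSketch(const unsigned char *src, size_t src_pitch,
                           unsigned char *dst, size_t dst_pitch,
                           long width, long height, int transparent)
    {
        for (long y = 0; y < height; ++y)
        {
            const unsigned char *s = src + y * src_pitch;
            unsigned char *d = dst + y * dst_pitch;

            if (!transparent)
            {
                memcpy(d, s, width * 3);   /* opaque rows are copied in one go */
            }
            else
            {
                for (long x = 0; x < width; ++x, s += 3, d += 3)
                {
                    if (s[0] != 0 || s[1] != 0 || s[2] != 0)
                    {
                        d[0] = s[0];
                        d[1] = s[1];
                        d[2] = s[2];
                    }
                }
            }
        }
    }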

The first part of the backend I actually touched was the “window” part, which provides the rendering system with a framebuffer and opens a window for it on platforms where that’s appropriate. The other backend part I had to touch upon getting started was “platform”. A platform backend’s main jobs are taking keyboard input and keeping track of time. It’s also a convenient place to do setup prior to actually opening a window.

Let’s go over a few of the functions a window backend needs to provide:

  • WindowBackend_Software_CreateWindow takes in a title and window dimensions and makes a window exist somewhere.
  • WindowBackend_Software_GetFramebuffer takes a pointer to a size_t, sets it to the framebuffer’s pitch, then returns the framebuffer.
  • WindowBackend_Software_Display is called when the renderer is done drawing and the picture needs to go on the screen.
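In rough prototype form (paraphrased from the descriptions above; the real CSE2 declarations may differ in exact types and extra parameters), the interface looks something like:

    bool WindowBackend_Software_CreateWindow(const char *title, size_t width, size_t height);
    unsigned char *WindowBackend_Software_GetFramebuffer(size_t *pitch);
    void WindowBackend_Software_Display(void);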

Apart from confusion caused by X11’s lack of appropriate documentation (at least appropriate to my attention span), it was quite easy to open up a window. You open the default display, you ask it for its default screen, and you use the display and screen in a call to XCreateSimpleWindow. Well, no, you’re not done yet. You also have to map the window, which means telling the X server to display it. And one more thing: you have to do some magic with “WM_DELETE_WINDOW” and XSetWMProtocols before the window actually shows up, though this might only be on modern systems? I initially developed the X11 backend on my modern Linux computer since accessing the target machine required a 43 SEK transit ticket or a one hour walk.
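In code, the whole dance looks roughly like this (a minimal sketch with error handling omitted; the window size and title here are just placeholders):

    #include <X11/Xlib.h>

    Display *display = XOpenDisplay(NULL);            /* connect to the default display */
    int screen = DefaultScreen(display);

    Window window = XCreateSimpleWindow(display, RootWindow(display, screen),
                                        0, 0, 320, 240, 0,
                                        BlackPixel(display, screen),
                                        BlackPixel(display, screen));

    /* Ask the window manager to tell us about the close button
       instead of yanking the connection out from under us. */
    Atom wm_delete_window = XInternAtom(display, "WM_DELETE_WINDOW", False);
    XSetWMProtocols(display, window, &wm_delete_window, 1);

    XStoreName(display, window, "CSE2");
    XMapWindow(display, window);                      /* actually show the window */
    XFlush(display);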

Providing a framebuffer required first allocating an appropriately sized buffer that the game can draw into, then another one that X can use to draw on the screen. These use different pixel formats: Cave Story likes to work with pixels in RGB format, whereas X prefers BGRX, where X is discarded padding. I converted between the two in the WindowBackend_Software_Display function, rather poorly and inefficiently. An XImage is backed by an in-memory buffer, so drawing to it is easy: convert the software renderer’s framebuffer into X’s format, with the renderer’s framebuffer as the source and the XImage’s buffer as the destination, then XPutImage the result to the window and watch it appear.
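The conversion itself was nothing fancier than a loop along these lines (a sketch; variable names are illustrative):

    /* Per-frame conversion from the game's 3-byte RGB framebuffer into the
       XImage's 4-byte BGRX buffer. Names are illustrative. */
    for (size_t y = 0; y < height; ++y)
    {
        const unsigned char *src = game_framebuffer + y * game_pitch;   /* RGB  */
        unsigned char *dst = ximage_buffer + y * ximage_pitch;          /* BGRX */

        for (size_t x = 0; x < width; ++x, src += 3, dst += 4)
        {
            dst[0] = src[2];   /* B */
            dst[1] = src[1];   /* G */
            dst[2] = src[0];   /* R */
            dst[3] = 0;        /* X, discarded padding */
        }
    }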

At this point I was working on a modern Linux computer since I don’t always have access to our target machine, and the target machine also struggles with Git for reasons that aren’t easy to solve.

But on this modern Linux machine, this code seemed to work quite well. The game ran in its (very) little window at the right speed, which strangely enough is 50 FPS. Yes. This Japanese PC game from 2004 is designed to run at 50 FPS. No, I did not make a mistake, you did not read that wrong. Strange framerate aside, nothing was wrong and I was feeling quite proud of myself. Then came the real hardware testing.

The first test

Getting the game running on the actual Sun Ultra was a challenge in and of itself. Because it felt like it would take forever, and because I was still afraid of the file system collapsing and/or being completely filled up again (be thankful you’re not using Solaris 2.6’s UFS), I started by not actually running it on the Sun. Instead, I used X for what it was ostensibly designed for. I sent the game’s graphics over the network like an insane person.

After a little wrestling with X’s authorization magic, we were able to get the old classic xeyes running on my laptop with its window on the Sun, and then we tried Cave Story. Unfortunately the image was extremely wrong. There was movement, but no recognizable shapes.

After some thinking and xdpyinfoing we found out that the computer was running in an 8-bit color mode, which obviously didn’t match the backend’s pixel format of four bytes per pixel. So, we told the game to use what’s called an XVisual with 24-bit color, and that gave us a crash. Turns out that Sun’s version of X11 can’t just switch video modes when it’s needed, and runs in the least flexible mode by default. So after some reading that was not at all fun, I figured out how to make X start with a different color mode. There are instructions for doing that in the appendix and the Git repo.

Starting the game now, it… works, but the colors are wrong. In fact, in another version it seemed like two of the colors were supplied to the wrong channel and one was dropped. It seemed that way because it was true. As hinted at before, endianness strikes when you really should expect it but sometimes don’t. In this case I thought the color order referred to the order of bytes in memory, not the order within whole words. Oh well, with a quick change we can reverse the byte order. Apparently that was still wrong; the pixel format should actually be XBGR… which is not BGRX reversed. Hmm… and lo and behold, it works… slowly. The picture does show up, but pushing all these frames through the puny 10BASE-T Ethernet connection this machine most likely has is not pretty.

After copying over the source tree to the machine, I realized I would need some way to build it. Sun did not ship CMake, that’s for sure. Unfortunately the latest CMake binary we could find for the Sun was a whole major version and a bit behind where it needed to be. So instead, I mocked up a quick script in my language of choice (which I will not be telling you about for fear of coming across as a preacher) which generated a long but simple shell script that just ran g++ over and over on just the files I needed. This was eventually replaced with a partially hand-written Makefile.

When I first compiled, there were some minor problems with the one mandatory dependency, stb_image. The first problem was that apparently stdint.h, which provides sane and consistent names for integer data types, does not exist on the Sun. Evidently it was only standardized in C99. Luckily, faking it was an option.

#pragma once

typedef unsigned short uint16_t;
typedef short int16_t;
typedef unsigned int uint32_t;
typedef int int32_t;

That’s it, that fixed that. The second problem was that stb_image was trying to use thread-local variables, which Solaris 2.6 and/or this version of GCC apparently don’t support. This was fixed by adding #define STBI_NO_THREAD_LOCALS right before the point where stb_image.h is included. This was enough to get the game to compile. At this point, I was expecting to see the game run at full speed, or at least pretty close. Instead I got– wait…

ld.so.1: ./CSE2: fatal: libstdc++.so.6: open failed: No such file or directory

Oh, right… LD_LIBRARY_PATH=/usr/tgcware/lib was necessary so that the dynamic linker could find our C++ standard library. I bet there’s a better way, but this was deemed good enough. Now! I expected a pretty good speed; instead I got… roughly 30 FPS, sometimes higher, usually a bit lower. Like 28.

Now, that’s not unworkable; there are several points we could improve upon. But it’s not playable as-is. And this was with GCC’s -O2 optimization option turned on. -O3 was later tested, but I chose not to trust its stability and stuck to -O2.

Initial Optimization

The first idea was to skip performing the full screen format conversion by modifying the software renderer to work directly in X’s preferred color format. This required quite a few changes, since the software renderer draws pixels in quite a few places. First up, the framebuffer size and pitch given to the software renderer by the window backend needed to be adjusted to fit the extra dead byte per pixel. Then the various drawing functions were hacked to simply write an extra byte per pixel. RenderBackend_Blit’s transparency path got a pair of extra increments per pixel, and its opaque path had its straight-line memcpy adjusted.

I also altered the RenderBackend_UploadSurface function to convert its input to the 4-byte pixel format so that during the actual rendering process we’ll never have to touch the 3-byte format. The full screen conversion code was removed, and the new version tested. It got… roughly 30 FPS. Sometimes lower, usually a bit higher. Like 32.

Another flaw with this version was that it more or less reverted the color fix, since I was now working on my laptop again and hadn’t bothered to add any kind of endianness #ifdefs.

XShm: Making X programs realize it’s not 1987

The X Window System was originally designed to be able to transparently run on a different machine than the program in need of graphics. This has a great deal of benefits, but graphical performance is not one of them. As it turns out, we’ve been sending our whole framebuffer over a socket every single frame. After friend Hugo sent me a PDF containing vague information about various Sun video cards, I discovered that our video card had deprecated something called “Direct Xlib”. Even disregarding the name sounding a little familiar, this caused me to seize up with excitement. As my eyes flew across the screen, I could feel all 50 frames per second making themselves viable.

“Direct Xlib is not supported on the Creator accelerator. The X shared memory transport feature (new in Solaris 2.5) should be used instead.”

A quick search later, I had the knowledge to make what was to become the most important optimization making full speed gameplay viable.

The X Shared Memory Extension (or XShm for short) allows the X server and client to share memory buffers, which can be used to hold, say, an image. The extension is not entirely X’s creation, but instead works with part of the UNIX System V standard (check out man shmget). Using this functionality, we don’t need to send the image across the network, both processes just already have it. While writing the code that would use this, I discovered that Solaris 2.6 has a very low default limit on how big a shared memory buffer can get. Luckily, although this was slightly smaller than a 640x480 framebuffer, it was significantly larger than two 320x240 framebuffers, so there’s room for our framebuffer and plenty more to spare.

The actual process of setting up the shared memory buffers is a little clunky and involves some very UNIX function names, shmget and shmat, for getting an ID and process-local address respectively. We then have to bind these shared buffers to images. There were some problems initially, but eventually I realized that the resolution was set too high on the Sun. I had apparently been testing with a 640x480 config file from my laptop. I really should have been checking errno, but in my defense, checking errno correctly and thoroughly is annoying.
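For reference, the standard XShm recipe boils down to something like this (a sketch; error checking omitted, and on the Sun the image has to stay at 320x240 as discussed):

    #include <sys/ipc.h>
    #include <sys/shm.h>
    #include <X11/Xlib.h>
    #include <X11/extensions/XShm.h>

    /* Sketch of the usual XShm image setup; error checking omitted. */
    XShmSegmentInfo shminfo;
    XImage *image = XShmCreateImage(display, visual, depth, ZPixmap,
                                    NULL, &shminfo, 320, 240);

    shminfo.shmid = shmget(IPC_PRIVATE,
                           image->bytes_per_line * image->height,
                           IPC_CREAT | 0777);
    shminfo.shmaddr = image->data = (char *)shmat(shminfo.shmid, NULL, 0);
    shminfo.readOnly = False;

    XShmAttach(display, &shminfo);   /* the X server maps the same segment */

    /* Per frame, instead of XPutImage: */
    XShmPutImage(display, window, gc, image, 0, 0, 0, 0, 320, 240, False);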

So, how’s the performance? Roughly 50 FPS! Sometimes lower, usually locked at 50. Which if you remember, is the correct but very odd framerate. Occasionally the framerate counter would show 51 for a second, sometimes it would show 47 during heavy scenes. It’s not the ideal way to play, but it certainly runs!

Although… occasionally the screen would flash, showing an unfinished frame underneath. As it turns out, the X server draws directly from the buffer it’s given, exactly as promised. But that’s a problem when the game starts drawing the next frame before the X server has finished putting the last one up. This could be solved with double buffering, since the renderer does provide the window backend a means of switching out the buffer every frame, but at this point I felt like that could be solved later, and moved on.

Beginning the great odyssey of hearing things

Now that the game played more or less fine I turned my attention to sound. I really wanted the sound to work since Cave Story can’t really be called a complete experience without its excellent soundtrack. Sound on this era of Solaris was very simple, but also quite unpleasant. Instead of a proper API that can invoke a callback to fetch samples, it’s the program’s job to feed the audio buffer via a device file, /dev/audio. The game is not particularly friendly to this approach. It expects, quite reasonably, that knowing when to produce more sound is someone else’s problem, and simply provides the audio backend with a means of getting what it needs. Unfortunately, by failing to provide a sufficient audio API, Sun has made this my problem.

At first it sounds difficult to reconcile the needs of a dumb pipe with the needs of the game, but luckily it’s not as bare an interface as I may first have made it sound. There is some limited metadata which can be fetched. ioctl, everybody’s favorite system call*, can be called with a magic constant to get and set some basic information about the audio system’s state via a struct. Things like, “how many samples have you received?”, “any errors?”, and “how many zero-length buffers (end-of-file records) have you processed?”.
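On Solaris this goes through the audio_info struct from sys/audioio.h, roughly like so (a sketch; the specific settings shown are only examples, and error checking is omitted):

    #include <fcntl.h>
    #include <sys/audioio.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    int fd = open("/dev/audio", O_WRONLY);

    audio_info_t info;
    AUDIO_INITINFO(&info);               /* mark every field as "don't change" */
    info.play.sample_rate = 8000;
    info.play.channels    = 2;
    info.play.precision   = 16;
    info.play.encoding    = AUDIO_ENCODING_LINEAR;
    ioctl(fd, AUDIO_SETINFO, &info);

    ioctl(fd, AUDIO_GETINFO, &info);
    /* info.play.samples - how many samples the device has accepted
       info.play.error   - set when an underrun has occurred
       info.play.eof     - how many zero-length writes have been processed */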

Now I’ll admit, an “end-of-file record” was not something I had heard of before. A quick internet search later I felt quite validated, since the internet seemingly hadn’t heard of it either outside of outdated versions of FORTRAN (yes, there are allegedly non-outdated versions of FORTRAN), and they didn’t seem to match the description of “zero-length buffer”. Naturally, I completely ignored this and assumed that things would work out fine anyway.

The first idea I had for structuring the audio backend was to have a secondary thread for audio and have it go as hard as it could. It would check how many of those end-of-file records the audio system had processed, wait until that number increased, store the new number, feed /dev/audio another mouthful of audio, and delay using the usleep function for as long as it seemed like it would take to play that audio. Then it would repeat, see if it had played it, etc.

Things did not work out fine

The speaker remained dead silent. I tried quite a few things, and got plenty of suggestions from others on what to try, but couldn’t get the game to give it any sound at all. I assumed at this point that threading problems might be at fault, so I decided to take care of that.

To ensure there wouldn’t be race conditions with the main game thread, I made the audio thread pause the game thread while performing audio work. This makes sense to do since the normal expectation is that the game thread itself will be interrupted to fetch audio.

I tried implementing something like this and had bad results. And by “bad results” I mean that it crashed the whole process when the game thread received its pause signal. Even though I properly set up a handler for it. At this point I returned to optimizing the graphics. That felt especially pressing since my non-working audio tests had slowed the game down considerably, specifically due to the audio rendering parts. The screen flashing was also getting pretty bad.

Drawing completely transparent empty tiles considered harmful

I was sitting at work (internship but who’s really counting) with no tasks lined up, and I decided I might as well optimize the rendering code (this was many a moon ago, I no longer work there). I knew from looking at the sound code that sound would chew through CPU time like a dog through an expensive leather couch you’re holding onto for your rich uncle, so I felt I’d better prepare for that. I had recently watched an old marketing video for the Symbolics Joshua expert system, which was the big thing in 80s AI research/hype (it is no longer considered AI because we understand its limits and the hype died). At one point, the host Steve Rowley says:

“Whenever a program attempts a large number of anything and only two percent work, that should wave a little red flag in your head that says some pretty serious efficiency gains can be had.”

Now, this should be obvious to anyone who isn’t a prompt engineer, and admittedly the video was made to be somewhat boss-friendly. This was certainly obvious to me at least. But him saying it out loud had my brain automatically trying to find an example of where I had personally seen this, and I immediately went to CSE2’s software renderer. Specifically, the RenderBackend_Blit function. As it turns out, the only time the opaque path was ever taken was when drawing stage backgrounds. This is despite the fact that the vast majority of tiles on the screen are opaque. I had discovered this earlier when testing how performance would be if no transparent tiles were drawn at all, and only the background showed up. Now, I had an idea for how to optimize this.

Instead of assuming every tile is transparent, I could first check which tiles in a given texture need transparency and store the results. I quickly wrote a function to dump out uploaded textures to check my assumptions, and learned two things.

  1. Tiles are 16x16 pixels large. Further testing showed that larger objects are in fact drawn as multiple tiles, not big blocks.
  2. The top-left tile of a level graphics set is always completely black (i.e. transparent).

Just to sanity check that second one, I added a quick bypass to the blit function that skips rendering the top-left tile of the level graphics texture. It made no visible difference except to performance, which improved significantly (by this point I was testing on my laptop with the framelimiter hacked out to see just how much of a margin I had). It turns out that for every tile position on the screen that doesn’t have a visible tile, the game draws a completely empty tile anyway. This may have worked fine with DirectX, but it made quite a bit of trouble for the software renderer.

Considering how much of the game is drawn as 16x16 tiles, I decided that this optimization was only needed for those, and got to work. RenderBackend_UploadSurface got modified to compute a set of “cheap rectangles” for each upload. This goes through each 16x16 tile in the texture and checks whether the tile contains any completely black pixels. If it does not, its starting X and Y texture coordinates (combined into one 32-bit int) are stored in the surface’s “cheap rectangles” set. RenderBackend_Blit got a piece of code that checks if we’re drawing a 16x16 tile, and if we are, disables transparency if the tile is found in the cheap rectangles set. I also kept the little snippet that immediately skips the always-empty tile.
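In sketch form, the upload-time scan looks something like this (names like cheap_rects are illustrative, not the actual code):

    #include <set>
    #include <stdint.h>   /* or the hand-rolled stand-in from earlier */

    /* Sketch: scan each 16x16 tile of an uploaded surface and remember the ones
       with no completely black (i.e. transparent) pixels. Pixels are assumed to
       already be in the 4-byte format with the colour in the first three bytes
       (as in the BGRX layout). Names are illustrative. */
    std::set<uint32_t> cheap_rects;

    for (size_t tile_y = 0; tile_y + 16 <= surface_height; tile_y += 16)
    {
        for (size_t tile_x = 0; tile_x + 16 <= surface_width; tile_x += 16)
        {
            bool has_transparency = false;

            for (size_t y = 0; y < 16 && !has_transparency; ++y)
            {
                const unsigned char *row = pixels + (tile_y + y) * pitch + tile_x * 4;

                for (size_t x = 0; x < 16; ++x)
                {
                    const unsigned char *p = row + x * 4;

                    if (p[0] == 0 && p[1] == 0 && p[2] == 0)
                    {
                        has_transparency = true;
                        break;
                    }
                }
            }

            if (!has_transparency)
                cheap_rects.insert((uint32_t)((tile_x << 16) | tile_y));   /* packed X/Y */
        }
    }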

This change massively increased the game’s performance and gave an approximate 30 frames per second margin on the Sun Ultra. We had 30 extra frames per second to work with. I was extremely pleased with this and felt kind of silly for not implementing it earlier.

Making machines scream

Now that the game played nearly perfectly, save for the lack of audio and the occasional screen flashing that the lack of double buffering still caused, I finally got started on a serious attempt at doing the audio. I immediately started discovering my past mistakes. First, I was using the data fetched from /dev/audio via ioctl completely wrong, from having either misread or misremembered how the structs looked.

Second, I realized that we weren’t getting into the main part that actually wrote out the sound, so I had a look, and indeed the “end-of-file counter” was never being incremented. While waiting for a better solution, I commented out the code that waits for that condition.

Third, I realized that thread pausing just wasn’t gonna work here, and removed it. Fourth, I discovered that for unknown reasons that still elude me as of writing, using open and write on /dev/audio just didn’t work right, at all. Nothing went out except settings via ioctl. Switching to fopen and fputc somehow made the speaker come to life. I still had to set the ioctl settings, and I didn’t yet know about fileno (which can get the file descriptor number out of a FILE*), so I first opened /dev/audio with open, set the settings, then closed it. This will turn out to have consequences.

But at this point I was just happy to hear some audio come out of the speaker. It didn’t sound good at all, it wasn’t even recognizable as Cave Story’s music, but it did make all the eardrums in the room move.

Making machines cough

Several problems were encountered, most of which were very minor and due to brief slips of the mind, so this section will only recap the interesting bits (it’s totally not because I have no Git history for this part because I only committed the working code).

The software audio mixer that’s being used generates its samples as signed 16-bit samples stored in longs, which I originally didn’t know how to handle correctly. A lot of back and forth there had the speakers screaming like a cat that hasn’t been fed in 45 minutes; even when it sounded kind of okay, the sound was constantly clipping. This problem combined with the mess that was dealing with outputting a stereo signal to a mono output. Things were very wrong. Eventually, after being prodded by others in the room, I simply copied how the other backends handled the sample conversion, and that fixed the peaking problems. The trick was the way the samples were clipped. I’ve forgotten what the wrong way to do it was, but I was doing it. As for the stereo/mono problem, I eventually just turned on stereo output and that fixed it too… sort of.
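For reference, the conversion that finally behaved does roughly this (a sketch of the idea, not the literal backend code):

    /* Sketch: the mixer accumulates signed 16-bit samples in longs, so they can
       overshoot the 16-bit range and have to be clamped before output. */
    for (size_t i = 0; i < frame_count * 2; ++i)   /* 2 = stereo channels */
    {
        long sample = mix_buffer[i];

        if (sample > 32767)
            sample = 32767;
        else if (sample < -32768)
            sample = -32768;

        out_buffer[i] = (short)sample;
    }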

With this combination of settings (stereo, 8-bit depth, 8 kHz sampling rate, etc.), the sound was coming out all wrong. I soon discovered (but didn’t yet quite internalize) that despite being set to 8-bit mode, and indeed outputting 8-bit sound on the speaker, the audio device wanted 16-bit samples in. So I was essentially skipping over every other sample when trying to output 8-bit sound in the way that made sense. So I changed the code to output 16-bit samples, but for some reason left the settings intact. That worked, I guess.

What followed this was a lot of aimless flailing around trying to get the audio to actually work for more than the title screen music. Setting the delay between sound buffers to something lower than the playback rate would have the title screen music sound perfect, but got completely out of sync with the actual game. After a while, presumably the sound chip’s buffer would fill up and crash, or maybe start rejecting samples, though I’m not sure I ever saw this happen. Setting the delay too low would have it struggling to keep up with the game, constantly coughing and sputtering in between bursts of sound. Setting it to what was supposed to be the exact rate obviously didn’t work either, since the computer can’t be that precise.

Among the attempted solutions was one where the sound thread would write out some samples, and then wait for a buffer underrun by watching the audio system’s error counter. This didn’t work, and even if it had it wouldn’t have been any good. Eventually, I decided to go look for inspiration. I started reading the code for some other audio libraries, such as SDL’s audio subsystem. Audio library authors are mortal humans (or so they claim), so this had a decent chance of succeeding.

After my reading (more like quick glances at function names, to be honest), I decided that what I should do is have a function that waits until some number of samples have been consumed. With that I could just blast out some audio, wait for some amount’s worth of samples to be consumed, then blast out some more. I hacked this up using the sound system’s samples counter, waiting until some number of samples had been played before continuing.

This didn’t quite work. The concept seemed sound (pun intended only retroactively), but it would act like there was no speed limit, blasting out audio as fast as it could and getting desynced from the game. But this was still very close to correct, since the concept is obviously proven to work in other contexts, and the audio did sound right.

Making machines sing

So why did checking the number of samples played not work? It’s because there’s no “samples played” counter. That’s right, the “samples counter” counts the number of samples sent, not played. It’s completely useless for playback. I would have to figure out that “end-of-file counter”. Luckily, it turned out to actually be quite simple. After switching to fopen and getting the file descriptor integer from that, I tried doing a write again, just like before, with NULL as the source buffer and 0 for the length, plus a fflush afterwards, just to be sure. And would you believe it? It worked.

I overhauled the code a bit to make way for this feature that now just sorta worked. A counter for the number of sample buffers written out was added. The soundWait function waits until at least one more buffer has been played back since it was called, and it now returns the number of sound buffers that have been played back so far. The sample output function was changed to write an empty buffer using write after every buffer of samples, as well as increment the total written buffers counter.

For the main sound loop, we wait until a buffer has been played and save how many buffers have been played back. Then we make and write out new sound buffers until the amount written is at least three buffers ahead of the amount already played. Loop forever.
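Put together, the final loop is shaped roughly like this (a sketch with illustrative names like soundWait’s callers and MixOneBuffer, not the literal code):

    #include <stdio.h>
    #include <unistd.h>

    /* Sketch of the main sound loop; names are illustrative. soundWait blocks
       until at least one more zero-length "EOF" record has been consumed and
       returns the total number of buffers played back so far. */
    static void SoundLoop(FILE *audio_file)
    {
        unsigned long buffers_written = 0;

        for (;;)
        {
            unsigned long buffers_played = soundWait();

            /* Stay a few buffers ahead of playback so one hiccup doesn't underrun. */
            while (buffers_written < buffers_played + 3)
            {
                MixOneBuffer(mix_buffer);                           /* hypothetical mixer call  */
                fwrite(mix_buffer, sizeof(short), BUFFER_SAMPLES, audio_file);
                fflush(audio_file);                                 /* push the samples out     */
                write(fileno(audio_file), NULL, 0);                 /* zero-length write = EOF  */
                ++buffers_written;
            }
        }
    }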

This way, if the sound output is lagging behind, it has a safe margin of error. Simply make and write more sound buffers until it’s caught up to and ahead of playback. This worked almost perfectly. Occasionally there were still buffer underruns, but they were mostly tolerable.

Making machin—Alarm Clock (program exited)

When I say it worked, I mean it worked for usually a few minutes (it varied) until the process suddenly died, leaving one lone message on its terminal: Alarm Clock. This had started happening earlier, but I had dismissed it as probably being a sound buffer overrun crash. At this point that was obviously not the case. Some searching led to me finding out that this is the message presented when a program fails to handle a SIGALRM signal. I was fairly certain that nothing should be sending that signal, so into GDB I went.

It took some doing to get GDB to stop on SIGALRM. By default it will pass SIGALRM through to the program unless there’s no handler, in which case it will silently absorb it. After setting it to stop, I discovered something nobody wants to have to deal with. A bug in libc.

usleep was used to implement Backend_Delay, a function which the game uses to pass the time between frames and which my audio code uses to wait for audio to play. And in my backtrace I now saw that the SIGALRM that was crashing the game was coming from inside usleep. This discovery made me not very happy. Luckily, Hugo was there to dig up a copy of the source code to Solaris 2.6’s libc and managed to find the source to usleep. And right there, in plain C, was the bug, as clear and obvious as it could possibly be. A textbook race condition.

On Solaris (which seems to have derived its libc from BSD’s), usleep is implemented using the UNIX alarm timer and sigsuspend. The timer counts down from a set time, and when it’s finished it sends the calling process a SIGALRM. sigsuspend will, as the name implies, suspend the thread until a signal arrives. Of course, the calling program may be using SIGALRM for its own purposes, so to avoid interfering with it, usleep first saves both the state of the timer and SIGALRM’s signal handler before resetting the timer and setting SIGALRM’s handler to an empty function, since all it wants SIGALRM for is waking up from sigsuspend. When it wakes up, it restores these and returns.

This all makes for a very simple implementation of usleep that I’m sure some Berkeley student was very proud of. Unfortunately, this Berkeley student never expected multi-threading to exist, and the people implementing multi-threading just… forgot about usleep? As you might be able to tell from the above description of the code, if two threads are in usleep at once and one wakes up before the other, the one that wakes up afterwards is going to wake up into a process that has no SIGALRM handler, meaning it will—Alarm Clock.

This is such a textbook race condition that I find it incredible that it wasn’t caught and fixed. I can understand a user not knowing that signals and their handlers are per-process and not per-thread, but someone working on the C library needs to know this stuff. Once this had been figured out, the fix was simple: add an empty handler to SIGALRM. I did that, and Alarm Clock was no more.
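The workaround amounts to a couple of lines at startup (a sketch; on this system plain signal() is enough):

    #include <signal.h>

    /* The handler doesn't need to do anything; it only needs to exist so a
       stray SIGALRM from usleep's race can never kill the process. */
    static void AlarmHandler(int signum)
    {
        (void)signum;
    }

    /* Somewhere during startup, before any thread calls usleep: */
    signal(SIGALRM, AlarmHandler);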

Tying up some loose ends

The game occasionally freezes now

So the game has stopped crashing to Alarm Clock. That’s nice, but instead it’s taken to letting one of the two threads simply freeze. I was puzzled by this for quite some time, especially since the problem didn’t like to let itself happen when running through GDB, but eventually I managed to discover that it was freezing inside sigsuspend. Ugh, I bet that has a race condition too. I decided to take some advice I had received previously when complaining about usleep’s problems and just ditch it in favor of using select as a sleep.

The function select is intended to be used to wait for file descriptors to be ready. Say you’re waiting on a really slow storage device, or perhaps a network socket, and you don’t want to stop the entire program until it’s ready to be read from. select lets you specify lists of file descriptors and a timeout, and it will wait until either some of the file descriptors are ready, or until the timeout has passed.

Now, select does apparently have a limitation that makes it unsuitable for general use on modern systems, but I’m not interested in its primary purpose. The timeout is specified as a timeval, which consists of second and microsecond components, and it turns out that if select is called with all its lists being empty, it will simply wait for the timeout. Thanks to select being a system call, this all occurs inside the kernel meaning there’s hopefully a lower chance of process/thread-confusion-induced race conditions. The usage of this trick is very simple, here’s how I used it:

    struct timeval timeout;
    timeout.tv_sec = milliSeconds / 1000;
    timeout.tv_usec = (milliSeconds * 1000) % 1000000;
    select(0, NULL, NULL, NULL, &timeout);

Not only was this approach free of random freezing caused by Sun assuming nobody would ever use the thread features they provided, it also seems to be more accurate than usleep was. It seems like the game (and more importantly, the audio code) was delaying for too long before this change. Most of the buffer underruns that were left vanished just like that! The only ones left are the occasional single-buffer miss during level loading, and those are barely noticeable during gameplay.

Double buffering

So, the screen still flashes occasionally. That’s due to the game’s rendering overwriting the framebuffer that X is actively trying to draw to the screen. The fix for this was simple, and I had only been delaying it out of a combination of laziness and prioritizing other things. The solution is double buffering. Double buffering (or more precisely, double buffering with page flipping) basically works like this: you have two buffers and draw your graphics onto one of them. When the time comes to display, you swap the two buffers, so the one you just finished drawing goes on the display. You can then safely draw into the other buffer without worrying about clobbering the one that’s being shown.

Implementing this was rather simple, just a matter of adding a second XShm image buffer, tweaking code and names to deal with there now being two of them, and adding the swapping logic to the Display function.
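In sketch form, the Display function now ends up doing little more than this (illustrative names, not the literal code):

    /* Sketch of the page flip; names are illustrative. There are two
       XShm-backed XImages, and the game always draws into the one that
       X isn't currently showing. */
    XShmPutImage(display, window, gc, images[current_buffer],
                 0, 0, 0, 0, 320, 240, False);
    XFlush(display);

    current_buffer = 1 - current_buffer;   /* flip */
    framebuffer = (unsigned char *)images[current_buffer]->data;   /* draw here next frame */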

Audio bit depth

As you may have guessed from the weird behavior of /dev/audio from before (no, not that weird behavior, no, not that one either), we were already doing all the work of 16-bit audio; the setting was just left at 8 for whatever reason. I increased it to 16 and it worked perfectly without any tweaks.

Conclusion

The port is missing a few features. The game’s resolution is stuck at 320x240 due to both performance and default system limits on SHM buffer sizes. This is problematic both due to the very small window size at normal Solaris resolutions, and due to the abysmally ugly font the game has for English at 320x240. The characters are merely 6 pixels wide, and the font doesn’t make the greatest use of that space, which makes some lowercase characters borderline unreadable. The Japanese font is quite readable though, so this doesn’t feel like a particularly hard limitation, but perhaps it is.

There’s no way to make config files directly on the Sun without poking bytes directly into a file. This is due to the normal CSE2 config program being written with OpenGL and Dear ImGui in mind, which I have good reason to suspect wouldn’t work well in this environment. I could have written a basic C program to mitigate this, but I didn’t and instead opted to provide a default config file directly in the repo with known working settings.

The port is not very general, and although a soundless version has run on my modern Linux laptop several times, I can’t guarantee that it still works, since code kept getting changed back and forth between the versions due to me being too lazy to look up the proper C macros to use in ifdefs. There are also several rather ugly hacks touching endianness, so I have no idea if the little-endian version works anymore.

Despite these shortcomings, I’m very satisfied with how the port turned out and am reasonably proud of it (that is to say, I don’t mind if people call my code here crap, but I feel good about having written it). The game runs at its full yet weird speed, the audio works as close to flawlessly as possible, and the experience is reasonably seamless. And… now I’m wondering if an OpenGL renderer could work… Nah, not worth figuring out.

The source code can be downloaded from this repo. Binaries are not provided.