Attack of the subs

A little diversion while I wait for parts to arrive for my RAID data recovery and rebuild. Fingers crossed we don’t end up in a whole series on reverse engineering the Drobo BeyondRAID filesystem format. (Though that could be fun for sport, AFTER I’m back up and running.) Today’s target comes as another request. This time it’s 688 Attack Sub from Electronic Arts. Looks like there are a couple of different image asset files: PAK (with an accompanying PAL file) and EGA. So let’s dig in and see how far we get.

PAK File Format

-rw-r--r--  64000  4 Mar  1989 ALFA.PAK
-rw-r--r--    768  4 Mar  1989 ALFA.PAL

To kick things off we’ll begin with the PAK file which is exactly 64000 bytes in size, and has an accompanying PAL file of 768 bytes. This just screams being a 320×200 256 colour image. As the size is exactly the size of the raw image buffer, it’s clearly not compressed. So for this one, we’ll solve it without a single line of code. Instead we’ll fire up everyone’s favourite open-source image editing program, GIMP. For those that are unaware, GIMP can import raw image data. It has a number of options, but we want indexed data, and as a bonus it allows us to import a palette as well. So let’s enter in the particulars and see what we get.

ALFA.PAK (with ALFA.PAL) in the GIMP raw image data import window

Well that was easy! Not even going to bother with code here, as both import and export can be handled directly in GIMP. If only they could all be this easy, but then again this blog wouldn’t be alive again if they were.
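For anyone who would rather script the conversion than click through GIMP, here is a minimal sketch in C of what the import does. It assumes the PAL file is 256 RGB triplets (which matches the 768 byte size) and that the PAK file is one palette index per pixel in row order. Whether the palette components are 6-bit VGA DAC values (0-63) or full 8-bit values is not something I have confirmed, so that is left as the `pal_shift` parameter. The function name `pak_to_rgb` is just made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Expand 8-bit indexed pixels into packed RGB triplets.
 * pal holds 256 RGB entries (768 bytes). pal_shift is an assumption knob:
 * pass 2 if the palette stores 6-bit VGA DAC values (0-63), or 0 if the
 * values are already 8-bit. Which one applies here is unverified. */
static void pak_to_rgb(const uint8_t *pixels, size_t count,
                       const uint8_t pal[768], unsigned pal_shift,
                       uint8_t *rgb_out)
{
    for (size_t i = 0; i < count; i++) {
        const uint8_t *entry = &pal[(size_t)pixels[i] * 3];
        rgb_out[i * 3 + 0] = (uint8_t)(entry[0] << pal_shift);
        rgb_out[i * 3 + 1] = (uint8_t)(entry[1] << pal_shift);
        rgb_out[i * 3 + 2] = (uint8_t)(entry[2] << pal_shift);
    }
}
```

Writing the 192000 byte result after a `P6\n320 200\n255\n` header gives a binary PPM that most image viewers will open. If the picture comes out very dark, the palette is probably 6-bit and wants `pal_shift` set to 2.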

The EGA File Format

-rw-r--r--  5807  4 Mar  1989 MAIN.EGA

This one is going to take a little more work. We’re not even sure this is an image asset, though the file extension strongly suggests that it is. It would make sense as a graphic asset, since the PAK file would only work for VGA displays, though we have only one EGA file with the game. The size suggests that the file is compressed: at 4 bits per pixel, a 320×200 EGA image would require 32000 bytes, a 640×200 image 64000 bytes, and a 640×350 image 112000 bytes. This file is clearly much smaller than any of the standard EGA resolution options. Let’s open it up in our hex viewer and see what we can make of it.
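The arithmetic behind those sizes is simple enough to sketch. A 16 colour image needs 4 bits per pixel, and that holds whether the data is stored as packed nibbles or as four 1-bit planes. The helper name `ega_raw_bytes` is only for illustration.

```c
#include <stddef.h>

/* Bytes needed for an uncompressed 16-colour image: 4 bits per pixel,
 * regardless of whether it is stored packed or as four 1-bit planes. */
static size_t ega_raw_bytes(size_t width, size_t height)
{
    return width * height * 4 / 8;
}
```

Calling it with 320×200, 640×200 and 640×350 gives the 32000, 64000 and 112000 byte figures quoted above, all far larger than the 5807 bytes we actually have.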

File: MAIN.EGA  [5807 bytes]
Offset    x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 xA xB xC xD xE xF  Decoded Text
0000000x: 3F 01 C6 00 97 00 07 0C C0 CC 00 C0 00 0C 00 80  ? · · · · · · · · · · · · · · ·
0000001x: C0 13 0C C0 0C 0C 00 C0 00 0C 00 0C 0C CC 00 0C  · · · · · · · · · · · · · · · ·
0000002x: 00 00 CC C0 CC 00 80 C0 01 0C 00 80 0C 06 00 C0  · · · · · · · · · · · · · · · ·
0000003x: 0C 00 0C C0 CC 80 C0 15 00 C0 00 00 C0 0C 0C 00  · · · · · · · · · · · · · · · ·
        ⋮                                                 ⋮
0000167x: 00 9B 00 FF 00 9B 00 FF 00 9B 00 FF 00 9B 00 FF  · · · · · · · · · · · · · · · ·
0000168x: 00 9B 00 FF 00 9B 00 FF 00 9B 00 FF 00 9B 00 FF  · · · · · · · · · · · · · · · ·
0000169x: 00 9B 00 FF 00 9B 00 FF 00 9B 00 FF 00 9B 00 FF  · · · · · · · · · · · · · · · ·
000016Ax: 00 9B 00 FF 00 9B 00 FF 00 9B 00 FF 00 9B 00 EOF · · · · · · · · · · · · · · · EOF

3F 01: 0x013f (319) Width-1?
C6 00: 0x00c6 (198) Height-1?

While we don’t see the exact values we’d expect here, they are too close for this to be a coincidence. I suspect an off-by-one error in the code, as 319 is a really strange width: a typical ‘end – start’ calculation where they forgot to add one. So I’m going to assume 320 for the width and 199 for the height. The height is still strange, but being off by two would be a very unusual error. We can try different combinations once we get further down the road.

The rest of the data almost looks like it could be uncompressed, but the file size rules that out, and I see a lot of repeated patterns, so I suspect some form of RLE compression. The data also appears to be packed, so either we are looking at planar data with 8 pixels packed per byte in 4 separate planes, or we have packed 4 bits per pixel data with 2 pixels per byte.

Now the file length is an odd number, which would rule out a simple count/byte format like we saw with MPSshow. We also don’t see any obvious repeating tokens to indicate repeat markers like we see in the PIC file format. So that means we are looking at something new yet again.

So what makes me think it’s RLE and not something else? Mostly it’s because we still see repeating patterns; had this been some form of LZ compression, for example, we would not see such regular repeated patterns. Sure, it could be something else, but we have to start somewhere, and I think RLE is the most likely candidate as it’s fast and easy to implement.
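The header guess can be sketched in a few lines of C. This assumes the first four bytes really are two little-endian 16-bit words holding width-1 and height-1, which is only a working hypothesis at this stage. The `ega_dims` struct and both function names are invented for the example.

```c
#include <stdint.h>

/* Read a 16-bit little-endian value from two consecutive bytes. */
static uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Hypothetical header layout: two little-endian words storing
 * width-1 and height-1 (an assumption, not a confirmed spec). */
typedef struct {
    unsigned width;
    unsigned height;
} ega_dims;

static ega_dims ega_read_dims(const uint8_t *hdr)
{
    ega_dims d;
    d.width  = read_le16(hdr)     + 1u;
    d.height = read_le16(hdr + 2) + 1u;
    return d;
}
```

Fed the bytes 3F 01 C6 00 from the top of MAIN.EGA, this yields 320 by 199.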

Code based RLE

There are 3 main forms of RLE, and we’ve already ruled out two of them. In this third form, instead of a flag or token byte to represent that an encoding follows, a single bit is used (typically the MSB), allowing the count and flag to be encoded together in a single byte, at the cost of half the range. Just like we saw with LZSS and its pointers, some small offset is often added to the lengths to remove dead codes (lengths that make no sense). So that’s where we’ll start. We’ll write some code to parse through the stream of bytes in the file, interpreting the data as code based RLE, and summing up the projected decoded size. We’ll compare that against what we expect based on the assumed dimensions, and adjust things to see if we can get them to line up.

    size_t dst_pos = 0;
    size_t src_pos = 4; // we've consumed 4 bytes already for width & height
    while(src_pos < src_len) {
        uint8_t code = src[src_pos++]; // read in the 'code'
        if(code & 0x80) {
            code = code & 0x7f;        // mask off the flag bit
            src_pos++;                 // consume the byte to duplicate
            dst_pos += code;           // reflect its duplicated length
        } else {
            src_pos++;           // consume the copied byte
            dst_pos++;           // reflect the copied byte
        }
    }

For starters I’ve assumed that a flag bit of 1 indicates a run-length encoding; otherwise it’s just a naked byte. Let’s compile it and see what we get.

EGA image analyzer
Opening EGA Image: MAIN.EGA
File Size: 5807
Resolution: 320 x 199 (63680 pixels / 31840 bytes)
Processed: 5808 bytes
Calculated length: 101450 bytes

Well that clearly generates too much output for what we’re looking for. Also we’re ignoring that we can have a count even when the flag bit is not set. So instead of assuming only one byte, let’s assume count bytes of data are copied through from input to output.

    size_t dst_pos = 0;
    size_t src_pos = 4; // we've consumed 4 bytes already.
    while(src_pos < src_len) {
        uint8_t code = src[src_pos++]; // read in 'code'
        if(code & 0x80) {
            code = (code & 0x7f);      // mask off the flag bit
            src_pos++;                 // consume the byte to duplicate
            dst_pos += code;           // reflect its duplicated length
        } else {
            src_pos += code;           // consume the copied bytes
            dst_pos += code;           // reflect the copied bytes
        }
    }

EGA image analyzer
Opening EGA Image: MAIN.EGA
File Size: 5807
Resolution: 320 x 199 (63680 pixels / 31840 bytes)
Processed: 5807 bytes
Calculated length: 50658 bytes

That brings us closer, but we can’t really steer our choices this way, as each change alters which bytes we end up treating as codes, and thus completely alters the decision path. Still, it may be a rough indicator that we are headed in the correct direction, and I think this is the more logical choice. Now let’s consider a length value of zero, which doesn’t make much sense. Let’s add one so we get a range of 1-128 instead of 0-127. We can do this for both clauses.

    size_t dst_pos = 0;
    size_t src_pos = 4; // we've consumed 4 bytes already.
    while(src_pos < src_len) {
        uint8_t code = src[src_pos++]; // read in 'code'
        if(code & 0x80) {
            code = (code & 0x7f) + 1;  // mask off the flag bit, adjust to 1-128
            src_pos++;                 // consume the byte to duplicate
            dst_pos += code;           // reflect its duplicated length
        } else {
            code++;
            src_pos += code;           // consume the copied bytes
            dst_pos += code;           // reflect the copied bytes
        }
    }

EGA image analyzer
Opening EGA Image: MAIN.EGA
File Size: 5807
Resolution: 320 x 199 (63680 pixels / 31840 bytes)
Processed: 5807 bytes
Calculated length: 30250 bytes

Once again that seems to bring us closer. As I stated, it is not a reliable measure, but it can be an indicator. Now let’s consider when it makes sense to copy a byte vs run-length encode it. The threshold lies at 3: a run costs two bytes (the code plus the value), so for anything shorter than 3 it is at least as cheap to just copy the bytes through. So let’s alter our code for that part to use an adjustment of 3, giving us a run-length range of 3-130.

    size_t dst_pos = 0;
    size_t src_pos = 4; // we've consumed 4 bytes already.
    while(src_pos < src_len) {
        uint8_t code = src[src_pos++]; // read in 'code'
        if(code & 0x80) {
            code = (code & 0x7f) + 3;  // mask off the flag bit, adjust to 3-130
            src_pos++;                 // consume the byte to duplicate
            dst_pos += code;           // reflect its duplicated length
        } else {
            code++;
            src_pos += code;           // consume the copied bytes
            dst_pos += code;           // reflect the copied bytes
        }
    }

EGA image analyzer
Opening EGA Image: MAIN.EGA
File Size: 5807
Resolution: 320 x 199 (63680 pixels / 31840 bytes)
Processed: 5807 bytes
Calculated length: 31840 bytes

Looks like we have a winner! Had we not had a match here, I would have repeated the process above, but reversing the relationship for what the flag bit marked (so copy-through when 1, run-length when 0). We could also have played with the width and height values, but it seems our hypothesis there was correct.
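The trial and error above lends itself to a parameterised version of the analyzer, so that each hypothesis is one function call rather than a code edit. The following is a sketch only: the function name and parameters are mine, and it projects the output size without decoding anything, exactly as the snippets above do.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Walk a code-based RLE stream without decoding it and return the number
 * of bytes it would produce. run_bias and copy_bias are added to the
 * 7-bit count; run_when_set selects which flag state marks a run.
 * start is the offset of the first code byte (4 for MAIN.EGA). */
static size_t rle_projected_size(const uint8_t *src, size_t src_len,
                                 size_t start, unsigned run_bias,
                                 unsigned copy_bias, bool run_when_set)
{
    size_t src_pos = start;
    size_t dst_len = 0;
    while (src_pos < src_len) {
        uint8_t code = src[src_pos++];
        bool flag = (code & 0x80) != 0;
        unsigned count = code & 0x7fu;
        if (flag == run_when_set) {
            count += run_bias;
            src_pos += 1;          /* skip the byte to replicate */
        } else {
            count += copy_bias;
            src_pos += count;      /* skip the literal bytes */
        }
        dst_len += count;
    }
    return dst_len;
}
```

The winning combination above corresponds to `rle_projected_size(src, src_len, 4, 3, 1, true)`. Looping over a handful of biases and both flag polarities, and comparing each result against the expected 31840 bytes, would automate the search.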

Rendering the EGA file

To render the EGA file we need to rewrite our code to actually do the decoding work. And while we’re at it, let’s unpack the pixels to 1 pixel per byte. We’ll start by assuming the low nibble holds the leftmost pixel in each byte.

    size_t dst_pos = 0;
    size_t src_pos = 4; // we've consumed 4 bytes already.
    while(src_pos < src_len) {
        uint8_t code = src[src_pos++];
        if(code & 0x80) {                      // run-length marked
            code = (code & 0x7f) + 3;          // mask and adjust for range 3-130
            uint8_t val = src[src_pos++];      // get the value to replicate
            for(int i = 0; i < code; i++) {    // replicate for 'code' bytes
                dst[dst_pos++] = val & 0x0f;
                dst[dst_pos++] = (val >> 4) & 0x0f;
            } 
        } else {                               // copy-length
            code += 1;                         // adjust for range 1-128
            for(int i = 0; i < code; i++) {    // copy for 'code' bytes
                uint8_t val = src[src_pos++];  // pass the byte through
                dst[dst_pos++] = val & 0x0f;
                dst[dst_pos++] = (val >> 4) & 0x0f;
            } 
        }
    }

With the code written, let’s render to an image using the default EGA colours and see what we get.
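For reference, the default EGA colours are the standard 16 entry palette shared with CGA text mode. A sketch of the lookup as a C table follows; whether the game actually sticks with the default palette is an assumption, and the names here are made up for the example.

```c
#include <stdint.h>

/* The standard 16-colour EGA palette as 8-bit RGB triplets. */
static const uint8_t ega_default_rgb[16][3] = {
    {0x00, 0x00, 0x00}, {0x00, 0x00, 0xAA}, {0x00, 0xAA, 0x00}, {0x00, 0xAA, 0xAA},
    {0xAA, 0x00, 0x00}, {0xAA, 0x00, 0xAA}, {0xAA, 0x55, 0x00}, {0xAA, 0xAA, 0xAA},
    {0x55, 0x55, 0x55}, {0x55, 0x55, 0xFF}, {0x55, 0xFF, 0x55}, {0x55, 0xFF, 0xFF},
    {0xFF, 0x55, 0x55}, {0xFF, 0x55, 0xFF}, {0xFF, 0xFF, 0x55}, {0xFF, 0xFF, 0xFF},
};

/* Look up one unpacked pixel and write its RGB triplet.
 * Only the low 4 bits of index are used. */
static void ega_pixel_to_rgb(uint8_t index, uint8_t rgb[3])
{
    const uint8_t *entry = ega_default_rgb[index & 0x0f];
    rgb[0] = entry[0];
    rgb[1] = entry[1];
    rgb[2] = entry[2];
}
```

Entry 6 is the famous brown (AA 55 00) rather than the dark yellow the bit pattern would suggest.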

Clearly we have decoded the RLE data correctly. Looks like we need to swap nibbles, putting the leftmost pixel in the high nibble. It also looks like the image is encoded bottom to top. Let’s fix the pixel order first, and then we can deal with the line order.

    size_t dst_pos = 0;
    size_t src_pos = 4; // we've consumed 4 bytes already.
    while(src_pos < src_len) {
        uint8_t code = src[src_pos++];
        if(code & 0x80) {                      // run-length marked
            code = (code & 0x7f) + 3;          // mask and adjust for range 3-130
            uint8_t val = src[src_pos++];      // get the value to replicate
            for(int i = 0; i < code; i++) {    // replicate for 'code' bytes
                dst[dst_pos++] = (val >> 4) & 0x0f;
                dst[dst_pos++] = val & 0x0f;
            } 
        } else {                               // copy-length
            code += 1;                         // adjust for range 1-128
            for(int i = 0; i < code; i++) {    // copy for 'code' bytes
                uint8_t val = src[src_pos++];  // pass the byte through
                dst[dst_pos++] = (val >> 4) & 0x0f;
                dst[dst_pos++] = val & 0x0f;
            } 
        }
    }
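The loop above trusts the file completely, which is fine for a known-good asset but would run off the end of either buffer on a damaged one. As a sketch, here is the same decode wrapped in a function with bounds checks added. The function name, the return convention and the checks are my additions; the decoding rules are the ones worked out above.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode the RLE stream into one palette index per output byte.
 * Returns the number of pixels written, or 0 if the stream is
 * truncated or the output buffer is too small. */
static size_t ega_decode(const uint8_t *src, size_t src_len,
                         uint8_t *dst, size_t dst_cap)
{
    size_t src_pos = 4;   /* skip the two header words */
    size_t dst_pos = 0;
    while (src_pos < src_len) {
        uint8_t code = src[src_pos++];
        int is_run = (code & 0x80) != 0;
        /* runs cover 3-130 bytes, copies cover 1-128 bytes */
        size_t count = is_run ? (size_t)(code & 0x7f) + 3 : (size_t)code + 1;
        size_t needed = is_run ? 1 : count;           /* source bytes used */
        if (needed > src_len - src_pos) return 0;     /* truncated input */
        if (count * 2 > dst_cap - dst_pos) return 0;  /* output too small */
        for (size_t i = 0; i < count; i++) {
            uint8_t val = is_run ? src[src_pos] : src[src_pos + i];
            dst[dst_pos++] = (val >> 4) & 0x0f;       /* left pixel */
            dst[dst_pos++] = val & 0x0f;              /* right pixel */
        }
        src_pos += needed;
    }
    return dst_pos;
}
```

For MAIN.EGA the output buffer needs room for 320 × 199 = 63680 pixels, and a successful call should return exactly that number.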

Well that is certainly much better. Now all we need to do is reverse the order in which we process the lines as we render them to the image file. I won’t present code for that as it’s pretty straightforward. You can look at the Windows BMP code we wrote in my SPC File format post to see encoding in reverse line order.
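For readers who want the line reversal spelled out anyway, here is a minimal sketch that copies the decoded pixels into a second buffer with the rows in the opposite order. It is not the BMP writer from the SPC post, just a generic flip for output formats that store the top line first, and `flip_rows` is a name invented for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy an image of one byte per pixel from src to dst with the row
 * order reversed. src and dst must not overlap, and each must hold
 * width * height bytes. */
static void flip_rows(const uint8_t *src, uint8_t *dst,
                      size_t width, size_t height)
{
    for (size_t y = 0; y < height; y++) {
        memcpy(dst + y * width, src + (height - 1 - y) * width, width);
    }
}
```

With the decoded buffer from MAIN.EGA this would be called with a width of 320 and a height of 199.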

And that seems to be it! With respect to the height, 199 appears to be correct despite being an odd value. I’m guessing it was just a cheap way to squeeze a bit more compression by skipping over the last blank line, or just an implementation error. Conceivably they could have done the same at the top of the image as well for further reduction. Regardless of the reason, we have only 199 lines of image data in the file.

Another pair of relatively simple image formats that only took a couple of hours to decode. In my next post we’ll write an RLE encoder so the EGA image can be re-encoded, for the modding community. Unfortunately there do not appear to be any more bare EGA image assets, so if they exist, they must be bound into the executable files, which are themselves packed. I’ll leave that exercise to someone else, as it would be specific to the executable and not something general purpose. Until my next post I’ll leave you with what may be my favourite title screen of any game from the late 1980s.

Ouch my eye!