Return to Sender

So far we’ve been focusing on decoding and rendering a PIC⁸⁸ image from F15-SE2 into something standard tools can open, and we’ve accomplished that fairly successfully I think. Now it’s time to turn things around and start writing a PIC encoder, so that we can take a custom image and turn it into a PIC file the game can open. This post will focus on the pixel packing and RLE encoding, and the next one will be on the LZW compression.

Before we get into the weeds, we need to cover a few things. First is the palette. With F15-SE2 and F19 there is no apparent way to define a custom palette, so any images for those games will have to conform to their respective palettes. (We think we have a good one for F15, but have not looked into F19 yet.) With that said, once we move to an input file format that has a palette, I will write the encoder to output a companion PAL file in case it is being used with another title that does support PAL files.

Data Sources

In order to test and validate our code, we will need to have known data sets. Luckily we have most of what we need already from our decoding efforts when we made our stand-alone programs for each stage along the way. We can now use those outputs as inputs, and compare against the inputs that we fed our decoder programs. There is one exception, and that is the final stage where we rendered an image. To provide the necessary data here, I’ve made a version of the program that saves the raw pixel index data as a PIX file, that we can then use as input to our first stage.

Packing the pixels

Packing the pixels is a pretty straightforward process. It’s just a matter of reversing the process we used in the post on decoding, keeping in mind the order in which we load the nibbles. As we’ve already discussed the process in the earlier posts, and know the order, let’s jump straight to the implementation. We’re using a fixed set of values for the format byte here, simply because that’s all our source files have. This is fine for testing/validation of the code. The value is only included here because at the time of decoding we didn’t understand that byte yet, and were including it in the output files, passing it forward. We need to include it here again so that we can use checksums to confirm a match to the original input.

void save_raw(FILE *dst, FILE *src, uint width, uint height, bool isPacked) {

    // need to write the format byte
    if(isPacked) {
        fputc(0x0b, dst);
    } else {
        fputc(0xf5, dst);
    }

    if(isPacked) {
        width /= 2; // adjust the width: each output byte holds 2 input pixels
    }

    for(uint y = 0; y < height; y++) {
        for(uint x=0; x < width; x++) {

            uint8_t px = fgetc(src);
            if(feof(src)) { // we must be beyond the input data
                px = 0; // pad with 0 (black)
            }

            if(isPacked) {   // packed mode, write 2 pixels
                px &= 0x0f;
                uint8_t px2 = fgetc(src);
                if(feof(src)) { // we must be beyond the input data
                    px2 = 0; // pad with 0 (black)
                }
                px2 <<= 4;
                px |= px2; 
            }
            // write the (packed) pixel(s)
            fputc(px, dst);
        }
    }
}
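To make the nibble order explicit, here is a minimal in-memory sketch of the same packing step, together with its inverse. The helper names `pack_pair` and `unpack_pair` are my own for illustration and are not part of the tools from this series. The only thing taken from the listing above is the order: the first pixel read goes in the low nibble and the second in the high nibble.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: pack two 4-bit pixel indices into one byte.
// The first pixel lands in the low nibble, the second in the high nibble,
// matching the order used by save_raw above.
uint8_t pack_pair(uint8_t first, uint8_t second) {
    return (uint8_t)((first & 0x0f) | ((second & 0x0f) << 4));
}

// Hypothetical inverse, handy for a quick round-trip check.
void unpack_pair(uint8_t packed, uint8_t *first, uint8_t *second) {
    assert(first != NULL && second != NULL);
    *first = packed & 0x0f;   // low nibble is the first pixel
    *second = packed >> 4;    // high nibble is the second pixel
}
```

For example, pixel 0x3 followed by pixel 0xA packs to the single byte 0xA3.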

Not much to really worry about with this code, so let’s run it and see how it goes.

PIX to RAW image converter
Resolution: 320 x 200
Using Packed Pixel Arrangement
Opening PIX Image: 'TITLE16.PIX'	File Size: 64000
Creating RAW Image: 'TITLE16.RAW2'
Saving RAW Image

MD5 (TITLE16.RAW) = ca3658700923b6492d3e613ac56cca0a
MD5 (TITLE16.RAW2) = ca3658700923b6492d3e613ac56cca0a

No surprise here that we have a match. I did confirm with a few other files too, both packed and unpacked, and they all matched. Now on to the fun stuff.

Writing the RLE encoder

This is where things possibly become more challenging, as there are a few implementation variables that we will need to figure out. For example, we have to determine whether the MicroProse encoder checks that the run code is actually shorter than the input run, or whether it always uses the run code when 2 or more characters in a row match, and what it chooses to do when the lengths are the same. In reality it probably doesn’t matter, as the decoder would be fairly immune to these variances. However, our goal here is to generate the exact output, so we know that it will be decoded correctly by the game. If you’re new to the series, and are unfamiliar with RLE encoding, I suggest you go back and read my previous post on the topic, as I’m going to go straight into the implementation here.

One question we need to answer before we can begin is whether the encoder encodes by scanline or not. That is, do runs wrap from one scanline to the next, or do they always terminate on the scanline boundary? This is one of the variables that could actually break the decoder in the game. Luckily this should be fairly evident by looking at the RLE output files we have.

So above is the beginning of the data for 256PIT, and that file has the top half of the screen all 0x00, which is perfect for us to examine here. The image width is 320, and the maximal run is 255. Since 320 is not evenly divisible by 255, we should see something along the lines of 00 90 FF 00 90 41 repeating if the encoder terminated on the scanlines. We don’t see that here, so the encoding does not seem to be scanline terminated. We can probably answer our earlier question of how the encoder handles runs of 2 and 3 as well, just by looking at the data.
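The 00 90 FF 00 90 41 pattern falls out of simple arithmetic: a 320 byte scanline of one value needs a run of 255 (0xFF) followed by a run of the remaining 65 (0x41). A small sketch of that split, using a helper name (`scanline_runs`) that is purely illustrative:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RUN 255 // largest count that fits in the single count byte

// Hypothetical helper: split one scanline of `width` identical bytes into
// run counts, as a scanline-terminated encoder would have to.
// Stores up to `max` counts in `counts` and returns how many runs are needed.
size_t scanline_runs(unsigned width, uint8_t *counts, size_t max) {
    assert(counts != NULL);
    size_t n = 0;
    while (width > 0) {
        unsigned run = (width > MAX_RUN) ? MAX_RUN : width;
        if (n < max) counts[n] = (uint8_t)run; // keep only what fits
        n++;
        width -= run;
    }
    return n;
}
```

For a width of 320 this yields two runs, 0xFF and 0x41, which is the repeating pair we would expect to see for every blank scanline if runs stopped at the scanline boundary.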

In this section of another file (256REAR), we can clearly see the ‘P’ characters repeating up to lengths of 3, and the highlighted section shows that at a length of 4 the encoding is used. So we now know that the run needs to be longer than 3 before the encoding is used; otherwise the raw characters are simply output. The last variable I can think of is whether the encoder supports run-chaining in order to minimize expansion with runs of 0x90 in the input stream (though I doubt it, based on the codes we saw in the first capture). Our decoder would handle run-chaining without a problem. For now I’m going to assume not, but we can revisit this as we test more and more data. So far none of the examples I’ve looked at appear to have runs of 0x90 in them. So with that said, let’s get to the encoding.

Based on what we’ve observed in the data, and what we know of RLE, our encoder needs to take a byte from the input stream and save it, then count how many times that byte repeats until a different value is found or we reach the max of 255. At that point, if the count is >3 we output the byte, followed by 0x90, followed by the count. If the count is <=3 we loop and output the byte count times. After that we read the next byte and start the process over.
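The threshold of 3 also answers the earlier question of what happens when both encodings are the same length. A run code always costs 3 bytes (symbol, 0x90, count), so a run of 3 is a tie, and the data suggests the encoder sticks with the raw bytes in that case. A small sketch of that comparison, with names of my own choosing:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RUN_CODE_SIZE 3 // symbol + 0x90 token + count byte

// Hypothetical helper: bytes needed to emit `count` repeats as raw symbols.
size_t raw_cost(size_t count) {
    return count;
}

// Hypothetical helper: true only when the run code is strictly shorter than
// the raw bytes, which matches the "longer than 3" rule seen in the data.
bool use_run_code(size_t count) {
    assert(count <= 255); // a single run code cannot count higher than this
    return RUN_CODE_SIZE < raw_cost(count);
}
```

With this rule, runs of 2 and 3 stay raw and a run of 4 is the first to be encoded.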

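// Assumed definitions: these are not shown in this listing, so the
// names and types here are inferred from how the code below uses them.
#define RLE_TOKEN 0x90

typedef struct {
    uint8_t symbol;   // symbol currently being counted
    uint8_t count;    // how many times it has been seen so far
    bool encoding;    // true while a symbol is pending output
} rle_state_t;

void rle_init(rle_state_t *ctx) {
    ctx->symbol = 0;
    ctx->count = 0;
    ctx->encoding = false;
}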

// writes count instances of symbol to output
void rle_dump(FILE *dst, uint8_t symbol, uint8_t count) {
    while(count--) fputc(symbol, dst);
}

// writes a run to the output
void rle_run(FILE *dst, uint8_t symbol, uint8_t count) {
    fputc(symbol, dst);
    fputc(RLE_TOKEN, dst);
    fputc(count, dst);
}

// special case for when we need to encode the RLE_TOKEN in the output stream
void rle_escape(FILE *dst) {
    fputc(RLE_TOKEN, dst);
    fputc(0, dst);
}

void rle_drain(FILE *dst, rle_state_t *ctx) {
    if((false == ctx->encoding) || (0 == ctx->count)) return; // nothing to do
    uint8_t count = ctx->count;
    ctx->encoding = false;
    ctx->count = 0;
    if(count > 3) return rle_run(dst, ctx->symbol, count); // encode a run
    return rle_dump(dst, ctx->symbol, count); // encode the raw symbol
}

// will continually encode on successive calls based on input symbol
// if the input symbol is EOF, then any pending encoding is drained out
void rle_encode(FILE *dst, int symbol, rle_state_t *ctx) {
    if(EOF == symbol) { // we've reached the end, we need to drain out anything we may not have emitted yet.
        return rle_drain(dst, ctx);
    }

    if(RLE_TOKEN == symbol) { // handle the special case of the token value in the stream
        rle_drain(dst, ctx);
        return rle_escape(dst);
    }

    if(false == ctx->encoding) { // not currently in an encoding, store and count the symbol and exit
        ctx->encoding = true;
        ctx->symbol = symbol;
        ctx->count++;
        return;
    }

    if(ctx->symbol != symbol) { // symbols changed
        rle_drain(dst, ctx); // drain it out
        ctx->encoding = true; // store and count the new symbol
        ctx->symbol = symbol;
        ctx->count++;
        return;
    } 

    ctx->count++;
    if(255 == ctx->count) { // max count reached, drain it out
        return rle_drain(dst, ctx);
    }
}

The RLE encoder is somewhat more complex than the decoder, as there are a number of conditions that we have to watch for. First we need to be sure that when we reach the end of input, we flush out (or drain) any data we are still holding onto. We accomplish that here, by looking for EOF being passed in as the input symbol. Then we need to check to see if the symbol is the RLE_TOKEN itself, if so, we once again need to flush out what we have, and then spit out the escaped token value. Then we need to see if we have any data in our context, if not simply start up the context with the current symbol. If we do have data, then we need to check to see if it matches the current value passed in, if not, we need to flush out what we have, and reset the context to the new value. If the value does match, we simply count it. Finally, we check to see if we have reached the maximum possible count of 255, if so, we flush out what we have, and will start over with the next value that comes in.
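To see those rules applied to a concrete input, here is a compact in-memory sketch that follows the same decisions (escape 0x90, cap runs at 255, use a run code only above 3). It is a simplified stand-in written for illustration, not the streaming encoder above, and `rle_encode_buf` is a name I made up for it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOKEN 0x90 // run/escape marker, same value as RLE_TOKEN

// Hypothetical in-memory variant: encodes `n` bytes from `in` into `out`
// and returns the number of bytes written.
// `out` must have room for 2 * n bytes (worst case: every byte is 0x90).
size_t rle_encode_buf(const uint8_t *in, size_t n, uint8_t *out) {
    assert(in != NULL && out != NULL);
    size_t o = 0;
    size_t i = 0;
    while (i < n) {
        uint8_t sym = in[i];
        if (sym == TOKEN) {            // a literal 0x90 is escaped as 90 00
            out[o++] = TOKEN;
            out[o++] = 0;
            i++;
            continue;
        }
        size_t count = 1;              // count repeats, capped at 255
        while (i + count < n && in[i + count] == sym && count < 255) count++;
        if (count > 3) {               // run code: symbol, token, count
            out[o++] = sym;
            out[o++] = TOKEN;
            out[o++] = (uint8_t)count;
        } else {                       // short run: emit the raw symbols
            for (size_t k = 0; k < count; k++) out[o++] = sym;
        }
        i += count;
    }
    return o;
}
```

Feeding it the nine bytes 41 41 41 41 41 42 42 90 43 produces the eight bytes 41 90 05 42 42 90 00 43: the five 41s become a run code, the two 42s stay raw, the 90 is escaped, and the lone 43 passes through.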

With the code written, let’s give it a run, and compare the output RLE2 to what we expect from the RLE file we generated when writing the decompressor.

PIC RLE Compressor
Opening RAW Image: 'TITLE16.RAW'	File Size: 32001
Creating RLE Image: 'TITLE16.RLE2'
Compression Complete Without Error

% md5 TITLE16.RL*      
MD5 (TITLE16.RLE) = 4e0bbd4d021674875b77969c21e95100
MD5 (TITLE16.RLE2) = 4e0bbd4d021674875b77969c21e95100

We have a match! Let’s run a few more, to look for possible edge cases.

PIC RLE Compressor
Opening RAW Image: 'TITLE640.RAW'	File Size: 224001
Creating RLE Image: 'TITLE640.RLE2'
Compression Complete Without Error

% md5 TITLE640.RL*      
MD5 (TITLE640.RLE) = fe6305c590200014738aca7b87194b1a
MD5 (TITLE640.RLE2) = fe6305c590200014738aca7b87194b1a

Opening RAW Image: '256PIT.RAW' 	File Size: 64001
Creating RLE Image: '256PIT.RLE2'
Compression Complete Without Error

% md5 256PIT.RL*
MD5 (256PIT.RLE) = c893a7d8e8dc4ed7aa821b01f6a76d36
MD5 (256PIT.RLE2) = c893a7d8e8dc4ed7aa821b01f6a76d36

Opening RAW Image: '256REAR.RAW'	File Size: 320765
Creating RLE Image: '256REAR.RLE2'
Compression Complete Without Error

% md5 256REAR.RL*      
MD5 (256REAR.RLE) = c0a4ce46b636cf6899aa960c32a50a36
MD5 (256REAR.RLE2) = da01e3447764aaef8c460bd460e4a774

Well crap! I thought we were good there. I just had to test one more file! Okay, well better we found out now, than later, let’s see if we can figure out what’s going on.

Unexpected variances

-rw-r--r--  15377  6 Jun 16:07 256REAR.RLE
-rw-r--r--  14905 13 Jun 09:41 256REAR.RLE2

Well there’s the first problem: the files are of different sizes, by 472 bytes, which is nearly 500. That’s not good. How the heck can I pass all the other files and be that far off with this one? Something doesn’t smell right here. Let’s look at the two files and see if we can spot anything at the start or end that would indicate truncation. (original file is on the right)

Other than the length mismatch, the data looks to be good. Even the sequence of bytes looks the same. So somehow we are missing bytes from somewhere in the middle. My first thought was the escaping of 90. Searching for 90 00 in the original file didn’t reveal any runs of them, and I get the same number of them in the file we just made. Interestingly, though, the only file that has escaped 90s is the one that fails, but it only has 11, far short of the nearly 500. Since the data looks valid, let’s try running our newly created file back the other way, and see how that compares to the file we used as input here. (To avoid overwriting the original, I renamed the RLE2 file to TEST.RLE)

PIC RLE Decompressor
Opening PIC Image: 'TEST.RLE'	File Size: 14905
Creating RAW Image: 'TEST.RAW'
Decompression Complete

% md5 256REAR.RAW
MD5 (256REAR.RAW) = 605537052c628dd41acd5aaacbfd84a2
% md5 TEST.RAW
MD5 (TEST.RAW) = 605537052c628dd41acd5aaacbfd84a2

Now that is both comforting and strange. Seems the RLE code we produced is still valid code, as we get back to the same RAW data after decompressing it. So what the heck is going on? We need to dig deeper. First let’s figure out where the data is mismatched.

% cmp 256REAR.RLE 256REAR.RLE2
256REAR.RLE 256REAR.RLE2 differ: char 753, line 1

Okay, so let’s go and take a look at that location. (original file is on the right)

Indeed they are different, but something strange is going on here. Our file shows a 90 here, so we are in a run. Looking at the bytes before and after, we get C2 90 9A. When we look at the original file, we see that the 90 is the next byte, with the full sequence being C2 90 99, which is both one off in position and one less in count than what we have. In our file we have the symbol value (C2) at 02EF with the token at 02F0. The original file also has C2 at 02EF, but then it repeats C2 at 02F0, resulting in a sequence of C2 C2 90 99 instead of the C2 90 9A that one would expect and that we generate. So for some reason the official MicroProse encoder is breaking the run into 2 parts. And from the size difference, this is happening close to 500 times in this file, assuming the breaks are similar, with one byte output followed by a run for the same byte. All I can think of is that there must be some boundary condition that the MicroProse encoder is hitting that makes it flush what it has and start over. The strange thing is that 02F0 is not a typical boundary on a power of 2, so it’s not likely due to an output buffer. Let’s see how much input data that part of the file generates; perhaps that falls on a power of 2 boundary. Also, for completeness, the first point of mismatch happens well before the first escaped 90 in the code, so it’s not likely that it is somehow triggering something downstream.

% head -c 752 TEST.RLE > TRUNC.RLE
% ls -al TRUNC.RLE
-rw-r--r--     752 13 Jun 13:30 TRUNC.RLE

% rle2raw TRUNC.RLE          
PIC RLE decompressor
Opening PIC Image: 'TRUNC.RLE'	File Size: 752
Creating RAW Image: 'TRUNC.RAW'
Decompression Complete

% ls -al TRUNC.RAW
-rw-r--r--   20451 13 Jun 13:30 TRUNC.RAW

Nope! Even with subtracting off the format byte we’re still not landing on a typical power of 2, or a round number. So I have no clue why such an odd code generation would have come from the MicroProse encoder, or what boundary condition triggers it. I think I have no choice but to put a pin in this mystery for now. We can come back to it if we need to. We appear to be generating valid code, that does match the expectations. Hopefully whatever condition triggers this oddity in the MicroProse version isn’t based on the capabilities of the decoder within the game, meaning that our version should still work. Only one way to find out, and that is to finish the encoder and give it a try. So that is where we will pick up next time, writing the LZW compressor that is the final component we need to be able to create a PIC⁸⁸ file.
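If we do come back to this mystery, a first step could be to count how often the original file splits a run this way, and check that against the size difference. The sketch below walks an RLE stream and counts every place where a raw symbol is immediately followed by a run code for the same symbol (the C2 C2 90 99 shape). The function name is hypothetical, and I have not run it against 256REAR.RLE, so no result is claimed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOKEN 0x90 // run/escape marker

// Hypothetical helper: counts "X X 90 nn" shapes, where a raw symbol X is
// directly followed by a run code (nn != 0) for the same X.
size_t count_split_runs(const uint8_t *buf, size_t n) {
    assert(buf != NULL);
    size_t splits = 0;
    int prev_raw = -1;                 // last raw symbol, or -1 if none
    size_t i = 0;
    while (i < n) {
        if (buf[i] == TOKEN) {         // escape (90 00): skip token and count
            prev_raw = -1;
            i += 2;
            continue;
        }
        if (i + 2 < n && buf[i + 1] == TOKEN && buf[i + 2] != 0) {
            // run code: symbol, token, non-zero count
            if (prev_raw == (int)buf[i]) splits++;
            prev_raw = -1;
            i += 3;
        } else {                       // plain raw symbol
            prev_raw = buf[i];
            i += 1;
        }
    }
    return splits;
}
```

On the four bytes C2 C2 90 99 it reports one split, while C2 90 9A and a max-length continuation such as 00 90 FF 00 90 41 report none.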

This post is part of a series of posts surrounding my reverse engineering efforts of the PIC file format that MicroProse used with their games. Specifically F-15 Strike Eagle II (Though I plan to trace the format through other titles to see if and how it changes). To read my other posts on this topic you can use this link to my archive page for the PIC File Format which will contain all my posts to date on the subject.
