Run, Forrest! Run!

In my last post we left off with having a basic LZW decompressor up and running, now it’s time for tackling the RLE_{Run-Length Encoding}encoding of the data that the LZW compressor output. This brings us one step closer to rendering the image, which at this point is becoming critical in order to see if we are on the right path in decoding the PIC file format.

What is Run Length Encoding?

Run Length Encoding, or RLE for short, is one of the simplest and most basic forms of lossless data compression. As the name suggests RLE is a way to encode runs of symbols in order to reduce the amount of space they occupy. For low fidelity images like we have here, this can be very effective as we can have large areas of a single colour.

RLE can take many forms, in its most basic form it takes each input symbol, counts how many times it occurs in sequence before a new symbol is encountered, then outputs the symbol and count as a combined structure record, this process is repeated until the end of the input. This can be quite effective for data that has long runs with few transitions. It also makes decoding simple as each record is easily identified, and can be processed individually, non-linearly even. But for image data, especially low-fidelity images, where techniques like dithering are often used to try and create shading, this basic form could result in considerable expansion rather than compression.

Another common form uses an escape or marker symbol to indicate when a run is being specified, otherwise the symbols are simply passed through. In this form we can’t as easily decode from any arbitrary point, as there is some context as to the state, so some analysis of preceding symbols is needed. Nonetheless, this form does not suffer the same shortcoming as the basic form discussed earlier. However, it has its own shortcomings. If the marker symbol can appear in the data stream itself, a mechanism needs to be provided to be able to encode it. The downside is that this usually means that runs of the marker symbol itself cannot be run-length encoded. Meaning that if an input data stream had long runs of the marker symbol, expansion could occur. In this form the counting works the same as it does in the basic case above, except if the count is less than some threshold (usually 4 to prevent expansion), it simply outputs the token(s). If the count is greater than, or equal to, the threshold value then the symbol is output followed by the marker followed by the count

Writing the RLE decoder

The escaped is the form we are most interested in, based on what we have read, and can see in the data itself. Here we can see the sequences surrounding where the marker 90 is seen.

File: TITLE16.RLE  [12681 bytes]
Offset    x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 xA xB xC xD xE xF  Decoded Text
0000000x: 0B 00 90 44 88 08 88 00 08 88 90 07 08 80 08 80  · · · D · · · · · · · · · · · ·
0000001x: 00 80 00 88 08 08 88 88 88 80 80 00 88 80 88 00  · · · · · · · · · · · · · · · ·
0000002x: 90 13 80 08 00 90 5A 08 80 08 00 90 0A DD 90 24  · · · · · · Z · · · · · · · · $
0000003x: 00 90 13 08 00 90 5B 08 00 88 00 90 0A 4D 44 90  · · · · · · [ · · · · · · M D ·

Writing to decompress an escaped RLE stream is quite simple. For every byte that is input, it is copied to the output unless, it is the marker. If it is the marker, the next byte is read in, and if non-zero, this is the run count. As we have already output the symbol once, we reduce the count by one, and then loop for the remainder outputting the symbol which preceded the marker. If the count is 0, then we output the marker itself as the symbol.

typedef struct {
    uint8_t last;       // value of last symbol written to output
    bool    isToken;    // indicates if the last byte passed is the RLE token (was not written to output)
} rle_state_t;

void rle_init(rle_state_t *ctx) {
    ctx->last = 0;
    ctx->isToken = false;
}

void rle_expand(FILE *dst, uint8_t symbol, rle_state_t *ctx) {
    // first handle if last symbol was the RLE_TOKEN
    // if so, this symbol is the length value
    if(ctx->isToken) {
        ctx->isToken = false;

        // check for zero-length - special case to encode the RLE_TOKEN value
        // itself into the output datastream
        if(0 == symbol) {
            ctx->last = RLE_TOKEN;
            fputc(ctx->last, dst);
            return;
        }

        // we have a run-length, write it out to the output
        while(--symbol) {
            fputc(ctx->last, dst);
        }
        return;
    }

    // see if this is the RLE_TOKEN, if so sset state nothing more to do to 
    // do until next symbol arrives.
    if(RLE_TOKEN == symbol) {
        ctx->isToken = true;
        return;
    }

    // finally just output the symbol as-is if not handled by now
    ctx->last = symbol;
    fputc(symbol, dst);
}

void rle_decode(FILE *dst, FILE *src, rle_state_t *ctx) {

    rle_init(ctx); // initialize the state machine

    uint8_t symbol = fgetc(src); // grab the first symbol
    while( !feof(src) ) {
        rle_expand(dst, symbol, ctx); // handle the symbol
        symbol = fgetc(src); // grab the next symbol
    }
}

rle_decode() is our main entry point to run the RLE decoding. It expects file handles to the input and output streams, and a pointer to a state context structure, after that it will continue decompressing until the end of the input stream.

With the code done, there’s nothing left to do but let it rip!

PIC RLE decompressor
Opening PIC Image: 'TITLE16.RLE'	File Size: 12681
Creating RAW Image: 'TITLE16.RAW'
Decompression Complete Without Error

Well no surprises there, as our RLE code doesn’t have any error states. So let’s look at the data that came out.

File: TITLE16.RAW  [72326 bytes]
Offset    x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 xA xB xC xD xE xF  Decoded Text
0000000x: 0B 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  · · · · · · · · · · · · · · · ·
0000001x: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  · · · · · · · · · · · · · · · ·
0000002x: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  · · · · · · · · · · · · · · · ·
0000003x: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  · · · · · · · · · · · · · · · ·
0000004x: 00 00 00 00 00 88 08 88 00 08 88 88 88 88 88 88  · · · · · · · · · · · · · · · ·
0000005x: 88 08 80 08 80 00 80 00 88 08 08 88 88 88 80 80  · · · · · · · · · · · · · · · ·
0000006x: 00 88 80 88 00 00 00 00 00 00 00 00 00 00 00 00  · · · · · · · · · · · · · · · ·
0000007x: 00 00 00 00 00 00 00 80 08 00 00 00 00 00 00 00  · · · · · · · · · · · · · · · ·
0000008x: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  · · · · · · · · · · · · · · · ·
0000009x: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  · · · · · · · · · · · · · · · ·
000000Ax: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  · · · · · · · · · · · · · · · ·
000000Bx: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  · · · · · · · · · · · · · · · ·
000000Cx: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  · · · · · · · · · · · · · · · ·
000000Dx: 00 00 00 08 80 08 00 00 00 00 00 00 00 00 00 00  · · · · · · · · · · · · · · · ·
000000Ex: DD DD DD DD DD DD DD DD DD DD DD DD DD DD DD DD  · · · · · · · · · · · · · · · ·
000000Fx: DD DD DD DD DD DD DD DD DD DD DD DD DD DD DD DD  · · · · · · · · · · · · · · · ·

The data looks reasonably good here, except that most of the values are out of range for a 16 colour image, but they do appear to be regular sequences. Also based on the file size, we have way too much data. However I think the size problem is related to the LZW decompression, not the RLE. As for the data values here, not sure yet, let’s see what it looks like when we go to render it. Rendering will also let us see if we have any one-off errors in our run length code. Also I did look at one of the 256 colour images, and the data looks good there (except for size again). At this point I would expect 64001 bytes, for a fully decoded 320×200 1 byte per pixel image, including our format_identifier byte which I just keep copying forward for now, until I know what to do with it.

While we may not be producing an image just yet, we are on the verge of doing so. In my next post, we will take our RAW output and render the data into an image that we can then view.

This Post is the fourth in a series of posts surrounding my reverse engineering efforts of the PIC file format that MicroProse used with their games. Specifically F-15 Strike Eagle II (Though I plan to trace the format through other titles to see if and how it changes). To read my other posts on this topic you can use this link to my archive page for the PIC File Format which will contain all my posts to date on the subject.

Run, Forrest! Run!

What is Run Length Encoding?

Writing the RLE decoder

Share this:

Leave a comment Cancel reply