By the end of my last post we had established that the MicroProse PIC file format likely uses LZW (Lempel-Ziv-Welch) compression on top of RLE (Run-Length Encoding) to store the image data. Today we’re going to try to write some code to read in a PIC file, LZW decompress it, and output the raw RLE data, bringing us one step closer to seeing the image itself. In this post we’ll dig into some code on how to implement LZW decompression.
Before we can write an LZW decompressor, we first need to understand what it is. I’ll give a brief summary of how it works, but for a more thorough explanation I suggest reading this article: LZW Data Compression, written by Mark Nelson back in October 1989 for Dr. Dobb’s Journal. (Be sure to check out the follow-up update linked at the top of that page.) The Wikipedia page for LZW compression also has an excellent explanation.
First a little history…
Lempel-Ziv-Welch compression, or LZW for short, is an algorithm published by Terry Welch in 1984, based on the earlier LZ77 and LZ78 algorithms published by Abraham Lempel & Jacob Ziv in 1977 and 1978. Like its predecessors, LZW is a lossless, dictionary-based compression algorithm. That is to say, it uses a table of byte strings and their indices in that table to perform its compression, without losing a single bit of the source data once decompressed again. All three algorithms are still heavily used today in most of our lossless data compression formats. They are also ideal for low-fidelity images, where colour is indexed into a palette rather than expressed as a set of components (RGB or HSB, for example), as we see with most photographic image formats today.
One of the features of LZW and LZ78 is that neither the compressor nor the decompressor needs to know the dictionary ahead of time. LZW differs from its predecessors in that part of the dictionary is pre-initialized with codes covering all possible values of the input symbol (so 256 values for 8 bit input). This means that, unlike with LZ78, only codes appear in the output stream (LZ78 has to include both the code and the symbol). Typical LZW compression uses variable-length codes, starting at some predefined minimum (typically 9 bits) up to some maximum (typically 12 bits), though the minimum and maximum can vary depending on the application and available resources. One of the disadvantages of LZW is the amount of memory required to hold the dictionary.
Patent Drama: The LZW algorithm was patented by Sperry (succeeded by Unisys) back in 1983 (the patent expired in 2003), and became the subject of a lot of controversy on the internet in the mid-to-late 1990’s, as the .GIF (Graphics Interchange Format) file format, originally created by CompuServe in 1987, was widely used across the web for images. The GIF file format uses LZW and was not licensed, as CompuServe was unaware of the patent at the time. After several years of inaction, Unisys took several unpopular actions in the mid 1990’s to try and enforce its patent and collect royalties, first from CompuServe and later from any website using images that were possibly made by an unlicensed application. This ultimately spurred on the development of the patent-free .PNG (Portable Network Graphics) format we see widely used today. (I wonder if MicroProse ever took out a license for LZW?)
How does LZW compression work?
LZW compresses files by replacing repeated sequences of bytes with a single code value, thus reducing the overall number of bytes required to represent the sequence. These sequences, or strings, are built up over time as more and more of the file is processed. Because of how these strings are built up, and how the resultant codes are written out, it is possible for the decompressor to build up the same table of strings that the compressor used, without the actual table ever being included in the compressed stream. It is also important to note that new codes must be generated sequentially for the algorithm to work.
Upon startup, for both compression and decompression, the dictionary is initialized with the first 256 entries (0-255) set to simply their 8 bit counterparts. (We are assuming a general-purpose implementation here where the input symbols are bytes, or 8 bits wide.) Code 256 is the first multi-byte code in the table, though some implementations reserve one or more codes after the first 256 for in-band signalling of EOF (End Of File) or other events to the decompressor, so the first available code may be a few counts higher.
Compression: The compressor reads as many bytes as it can that match a string of bytes in the dictionary. Once it can no longer match, it outputs the code for the longest matching string it found (basically up to the previous byte), creates a new dictionary entry consisting of that string with the new byte appended, and then starts matching again from the new byte. This process is repeated until the end of the file is reached.
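We won’t actually need a compressor for this project, but seeing the loop in code form makes the process clearer. Here is a minimal, hedged sketch of the matching loop, assuming fixed 9 bit codes and a naive linear dictionary search; none of this is MicroProse’s code, and emit_code/find_code are just illustrative helpers:
#include <stdio.h>
#include <stdint.h>
#define SYMBOL_WIDTH 8
#define CODE_WIDTH 9
#define CODE_LIMIT (1 << CODE_WIDTH)
#define NO_CODE 0xFFFF
static struct { uint16_t prefix; uint8_t symbol; } dict[CODE_LIMIT];
static uint16_t next_code;
static uint32_t out_buffer = 0;
static uint8_t out_bits = 0;
// pack a 9 bit code into the output stream, low bits first (little-endian)
static void emit_code(FILE *dst, uint16_t code) {
    out_buffer |= (uint32_t)code << out_bits;
    out_bits += CODE_WIDTH;
    while(out_bits >= 8) {
        fputc(out_buffer & 0xFF, dst);
        out_buffer >>= 8;
        out_bits -= 8;
    }
}
// linear search for the code representing the string 'prefix' + 'symbol'
static uint16_t find_code(uint16_t prefix, uint8_t symbol) {
    for(uint16_t i = 1 << SYMBOL_WIDTH; i < next_code; i++)
        if((dict[i].prefix == prefix) && (dict[i].symbol == symbol))
            return i;
    return NO_CODE;
}
void lzw_compress(FILE *dst, FILE *src) {
    next_code = 1 << SYMBOL_WIDTH; // codes 0-255 are the bytes themselves
    int c = fgetc(src);
    if(EOF == c) return;
    uint16_t match = (uint16_t)c; // longest match so far, as a code
    while(EOF != (c = fgetc(src))) {
        uint16_t extended = find_code(match, (uint8_t)c);
        if(NO_CODE != extended) {
            match = extended;                 // keep extending the match
        } else {
            emit_code(dst, match);            // output the longest match found
            if(next_code < CODE_LIMIT) {      // record match + new byte
                dict[next_code].prefix = match;
                dict[next_code].symbol = (uint8_t)c;
                next_code++;
            }
            match = (uint16_t)c;              // restart the match at the new byte
        }
    }
    emit_code(dst, match);                    // flush the final match
    if(out_bits) fputc(out_buffer & 0xFF, dst); // flush any remaining bits
}
For example, compressing the bytes ‘ABABAB’ this way emits the codes 65, 66, 256, 256: six input bytes become four codes.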
Decompression: The decompressor reads in a code. If the code is less than 256 it is considered a terminating code, and the corresponding single byte is output; otherwise, the string referenced by the code is output. In either case a new dictionary entry is also created, consisting of the previous code’s string with the first byte of the current code’s string appended to it, mirroring the entry the compressor made at the same point.
Dealing with the variable bit width of codes: In order to minimize file size, many LZW implementations use a variable code width. Assuming an 8 bit input symbol size to the compressor, that means a minimum 9 bit code size in the compressed stream. The code width is increased by 1 bit whenever the table runs out of room at the current code width. As both the compressor and decompressor are building the table in sync, this can happen without any additional signalling. This continues until some limit on the code width is reached (typically 12 bits).
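Sketched in code, the check is small. Assuming the state structure carried a code_width field (the fixed-width decoder we write below doesn’t have one), it might look something like this, ignoring the off-by-one subtleties different implementations have about exactly when to switch:
// hypothetical sketch: called after each new dictionary entry is added;
// widens the code once 'next_code' no longer fits in 'code_width' bits
void lzw_update_width(lzw_state_t *ctx) {
    if((ctx->next_code >= (1u << ctx->code_width)) && (12 > ctx->code_width)) {
        ctx->code_width++; // e.g. 9 -> 10 -> 11 -> 12 bits
    }
}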
What happens when the maximum width is reached and the table is full? This can vary from implementation to implementation. Some implementations simply stop generating new codes and use the dictionary as-is. Others reset the table and start over from the beginning. The reset could be triggered automatically, the same as expansion, or via a special code value.
Startup: One special case, for both compression and decompression, is startup. As explained, new codes are always generated by taking the string of the last code and appending the current symbol to it. At startup there is no last code yet, so the first symbol is output without generating a new code. Normal evaluation begins with the second code read in.
Special case: There is one other special case when decompressing LZW. Under certain circumstances it is possible for the compressor to emit a code that the decompressor has not had a chance to create yet. This is a very specific case, and always takes a particular form, so it is easily checked for and worked around in the decompression code. The unknown code is always the string of the last code with the first character of that last string appended to it, and its value is always the next available code. For example, compressing the bytes ‘aaa’, the compressor outputs code 97 (‘a’) and adds ‘aa’ as code 256, then immediately matches ‘aa’ and outputs 256; the decompressor receives 256 before it has created that entry, but can reconstruct it as the previous string (‘a’) plus that string’s own first character.
Endianness: One thing we haven’t discussed is the endianness of the data; since we are looking at codes that are more than 8 bits wide, it is a relevant consideration. In my previous post I naively assumed little-endian ordering, as we are looking at an Intel x86/DOS based game, and appear to have gotten lucky. (With little-endian packing, the first 9 bit code is all of byte 0 plus the lowest bit of byte 1; big-endian packing would instead take byte 0 plus the highest bit of byte 1.) The data could very well have been arranged in big-endian format if the PIC file had been developed for use on other platforms before MicroProse started developing DOS games. So for the time being we will continue to assume little-endian ordering, unless something points us in another direction.
Writing the LZW decompressor
Time to get our hands dirty and write some code. Given that we have seen what look to be 9 bit codes, but we don’t know the extent of the bit widths, for this first pass we’re going to write a fixed code width decompressor to see how far we get. Then, if things start to get funky shortly after the table becomes full, we can go back in and implement variable width codes.
That brings us to the concept of a push or pull architecture for the decompressor. With a pull style architecture, one byte is returned each time the decompressor is called; this works great if you know the bounds of what you are decompressing. A push style will continue to output bytes of data until it runs out of compressed input. Not knowing the bounds of what we are dealing with yet, a push style makes more sense. This will also allow us to see if there is any data appended after the image data, which could easily be missed if we simply stopped at the edge of the image bounds. Also, as we do not have the other stages of the decoder written yet, we cannot easily stop at any bounds that might exist in the output data. So for those reasons we will implement a push style architecture for our decompressor.
I’m not going to post the full program here, but will try to highlight the most relevant parts. When this series concludes, I plan to post full working code to read, and possibly write, .PIC files. With that, let’s get into it. First we’ll look at the two critical structures for maintaining the dictionary and decompression state.
// LZW dictionary entry
typedef struct {
    uint16_t prefix;    // points to code that precedes this one in the chain
    uint8_t  symbol;    // this is the last symbol in the chain, for this code/entry
} lzw_dictionary_t;

typedef struct {
    uint8_t  bit_count;     // count of bits currently loaded in bit_buffer
    uint32_t bit_buffer;    // small working buffer for the input stream
    uint32_t next_code;     // value of next available code
    lzw_dictionary_t dictionary[LZW_TABLE_SIZE];    // string dictionary
    uint8_t  symbol_stack[LZW_SYMBOL_LIMIT];        // used for constructing the stream of bytes to output during expansion
} lzw_state_t;
Some notes on the dictionary structure. I’ve chosen to go with a linked-list approach to reduce the overall memory footprint. This isn’t strictly necessary with today’s systems typically having multiple gigabytes of RAM. However, back in the period the PIC files we are looking at come from, memory was on the order of a few megabytes, so code needed to be more conscious of its size. Had we implemented the dictionary as an array of strings, it would take up a substantial amount of memory (especially by 1989 standards). Generally speaking, the maximal length for a string is 2^bit_width − 255. Thus, assuming a fixed 9 bit code length, the longest string is 2^9 − 255 = 257 bytes long. In this approach we would also need to store the length of the string, so another 2 bytes. Our dictionary would therefore occupy over 64K of memory at a minimum (259 bytes/entry × 256 entries, assuming we don’t include the first 256 terminating entries). This could be greatly reduced by using dynamic allocation for each string, at the cost of speed and the complexity of memory management. Now, I’m sure there is some more practical limit on the length of a string, as there can only be one string that reaches the maximal length. However, we must allow for it, because in the case of compressing a large file of nothing but NULL bytes it is entirely possible.
With the linked-list approach, we have a one-way link to a parent, so we can only traverse the list in one direction. This is necessary as several descendant strings may refer to the same parent, so a two-way link is not practical here. With the structure shown above, each dictionary entry now consumes only 3 bytes (4 with typical padding for word alignment). That means our table is now only 1K in size for the same 9 bit code length (2K if we include the first 256 entries in the table as well). That’s much more manageable; in fact, supporting up to 12 bits would only need 16K, less than a quarter of what we needed with the simple approach for just a 9 bit code! The downside to this approach is that we now have to reconstruct each string by traversing the prefix links back to the terminating node. And as we can only traverse in a back-to-front fashion, we will need to use a LIFO (Last-In-First-Out) structure in order to re-order the output bytes into the correct order.
Given the structures above, we can now implement the decoder as follows, for a fixed 9 bit code length. For the purposes of determining the format’s extents, I have left the dictionary-full condition set to halt decompression; this will likely get fleshed out in different ways based on what we see when we try to decompress a file. For simplicity we will use file based I/O.
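One note before the code: the constants referenced below aren’t shown in the excerpts. For a fixed 9 bit code they would look something like this (the exact values and names in my full program may differ slightly):
#define SYMBOL_WIDTH     8                              // input symbols are bytes
#define LZW_CODE_WIDTH   9                              // fixed 9 bit codes, for now
#define LZW_CODE_MASK    ((1 << LZW_CODE_WIDTH) - 1)    // low 9 bits
#define LZW_LIMIT        (1 << LZW_CODE_WIDTH)          // first code that no longer fits
#define LZW_TABLE_SIZE   LZW_LIMIT                      // dictionary entry count
#define LZW_SYMBOL_LIMIT 258                            // longest possible string (257) plus one for the special case
#define LZW_EOC          0xFFFF                         // end-of-chain marker in the dictionary
#define LZW_EOF          0xFFFF                         // out-of-band EOF value from read_code()
enum { LZW_NOERROR = 0, LZW_UNEXPECTED_CODE, LZW_DECOMPRESSION_FAIL, LZW_STRING_TABLE_FULL };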
// adds a new entry into the dictionary, by appending 'symbol' to 'code'
void lzw_add_code(uint16_t code, uint8_t symbol, lzw_state_t *ctx) {
    ctx->dictionary[ctx->next_code].prefix = code;
    ctx->dictionary[ctx->next_code].symbol = symbol;
    ctx->next_code++;
}
// LZW state machine init
void lzw_init(lzw_state_t *ctx) {
    // initialize the chain code table
    for(unsigned int i = 0; i < (1 << SYMBOL_WIDTH); i++) {
        ctx->dictionary[i].prefix = LZW_EOC; // set all codes as end of chain initially
        ctx->dictionary[i].symbol = i & ((1 << SYMBOL_WIDTH) - 1);
    }
    ctx->bit_buffer = 0L;
    ctx->bit_count = 0;
    ctx->next_code = (1 << SYMBOL_WIDTH);
}
// stores the sequence of symbols back to the root for the chain identified by 'code'
// the returned sequence is in LIFO (last in first out) order
// NULL will be returned on failure
uint8_t *lzw_expand_code(uint16_t code, int offset, lzw_state_t *ctx) {
    int code_count = 0; // used to prevent a runaway condition
    uint8_t *sp = ctx->symbol_stack + offset;
    // trace the code sequence back to the terminating root symbol
    while(LZW_EOC != code) {
        *sp++ = ctx->dictionary[code].symbol;
        code = ctx->dictionary[code].prefix;
        // make sure we haven't read too many codes, if so abort
        if(LZW_SYMBOL_LIMIT <= code_count++) {
            return NULL;
        }
    }
    sp--;
    return sp;
}
// reads codes from the input file; returns LZW_EOF once
// the input is exhausted
uint16_t read_code(FILE *src, lzw_state_t *ctx) {
    // top up the working buffer with up to 32 bits of data
    // keep it full for as long as possible
    while((!feof(src)) && (24 >= ctx->bit_count)) {
        uint32_t new_bits = fgetc(src);
        if(!feof(src)) {
            ctx->bit_buffer |= (new_bits << ctx->bit_count);
            ctx->bit_count += 8;
        }
    }
    // not enough bits left to form a code, must be at EOF
    if(LZW_CODE_WIDTH > ctx->bit_count) {
        return LZW_EOF;
    }
    // extract the current code from the low LZW_CODE_WIDTH bits of the buffer
    uint16_t rval = ctx->bit_buffer & LZW_CODE_MASK;
    // discard the bits we just consumed
    ctx->bit_buffer >>= LZW_CODE_WIDTH;
    ctx->bit_count -= LZW_CODE_WIDTH;
    return rval;
}
// Main LZW decompression routine
int lzw_decompress(FILE *dst, FILE *src, lzw_state_t *ctx) {
    uint16_t new_code;
    uint16_t old_code = 0;
    uint8_t symbol = 0;
    uint8_t *sp;
    // startup condition, pass the first symbol through
    old_code = read_code(src, ctx);
    symbol = ctx->dictionary[old_code].symbol;
    fputc(symbol, dst);
    while (LZW_EOF != (new_code = read_code(src, ctx))) {
        // if the read code is greater than the next possible code, we are out of sync
        if(new_code > ctx->next_code) {
            return LZW_UNEXPECTED_CODE;
        }
        // special case for when the compressor emits a code before
        // the decompressor has had a chance to create it
        if(new_code == ctx->next_code) {
            // decode as the last code's string with its first symbol appended
            ctx->symbol_stack[0] = symbol;
            sp = lzw_expand_code(old_code, 1, ctx);
        } else {
            // normal chain decoding
            sp = lzw_expand_code(new_code, 0, ctx);
        }
        // make sure we didn't have a decompression failure
        if(NULL == sp) {
            return LZW_DECOMPRESSION_FAIL;
        }
        // output the 'string'
        // the chain is in LIFO order so we need to output from back to front
        symbol = *sp; // preserve the first symbol of the decoded string
        // output loop: 'lzw_expand_code' returns a pointer within
        // 'symbol_stack' on success, so walk back down to the base
        while(sp >= ctx->symbol_stack) {
            fputc(*sp--, dst);
        }
        // create a new code, if we can
        if(LZW_LIMIT > ctx->next_code) {
            lzw_add_code(old_code, symbol, ctx);
        } else {
            return LZW_STRING_TABLE_FULL;
        }
        old_code = new_code;
    }
    return LZW_NOERROR;
}
lzw_decompress() is our main entry point to initiate decompression. It expects file handles for the input and output streams, and a pointer to an initialized state context structure. After that, it will continue until the end of the input stream, or an error, whichever comes first.
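For completeness, the test driver around it is little more than this. (A sketch: the real program also deals with the PIC header, including the format_byte we’ll come back to, before handing the rest of the stream to the decompressor.)
int main(void) {
    static lzw_state_t state; // static, as the dictionary is fairly large
    FILE *src = fopen("TITLE16.PIC", "rb");
    FILE *dst = fopen("TITLE16.RLE", "wb");
    if((NULL == src) || (NULL == dst)) {
        return 1;
    }
    // (header / format_byte handling omitted here)
    lzw_init(&state);
    int result = lzw_decompress(dst, src, &state);
    fclose(dst);
    fclose(src);
    return result;
}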
Now that we have some code, let’s see what happens when we pass a file into it. Will we see something that resembles what we expect for RLE data, and will those first few bytes match what we saw when we looked at the data with our hex viewer?
PIC LZW decompressor
Opening PIC Image: 'TITLE16.PIC' File Size: 7529
Creating RLE Image: 'TITLE16.RLE'
Decompression Failed with error: LZW_UNEXPECTED_CODE
Debug Trace: new: 193 next: 192 pos: 172
Sadly not. The first thing I thought was that I had made an error somewhere in my implementation, but after reviewing it I don’t think so. Looking at the output closer, it appears that we are out of sync from fairly early on; I am not seeing what I would expect based on what we can see in the input with the hex viewer. It seems we may already be into the weeds of implementation variance. The fact that we quickly error out with an unexpected code leads me to believe that our startup condition may be wrong. Perhaps MicroProse didn’t start in the conventional way. Maybe they don’t pass the first byte through before starting normal parsing; this would result in one more code being generated, which could get us in sync. Now the question is: did they start by assuming a 0x00 as the start, or did they feed the format_byte in? The easiest to test is initializing with the zero, as all that requires is removing our 3 lines of code for the startup condition before the loop.
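In code, the startup block before the loop simply goes away, leaving the existing initializers to stand in for the missing ‘previous’ code (a sketch of the modified top of lzw_decompress()):
    uint16_t new_code;
    uint16_t old_code = 0;  // startup: act as if the previous code was 0x00
    uint8_t symbol = 0;     // ...and its string was a single zero byte
    uint8_t *sp;
    // startup read_code()/fputc() lines removed; decoding begins
    // directly with the main while loop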
PIC LZW decompressor
Opening PIC Image: 'TITLE16.PIC' File Size: 7529
Creating RLE Image: 'TITLE16.RLE'
Decompression Failed with error: LZW_STRING_TABLE_FULL
Debug Trace: new: 1a9 next: 200 pos: 295
Bingo! I think we have a winner here. After looking at the output data, it looks much more reasonable, right up to the end. So now the question is: does the dictionary freeze at this point, or does the code width expand? This is a little harder to check with my hex viewer, as the start of the expansion is not necessarily on a byte boundary. The easiest test is to let the decompression continue assuming a frozen dictionary.
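In the code, that’s just a matter of no longer treating a full table as an error at the bottom of the main loop:
        // create a new code, if we can; otherwise the dictionary is
        // frozen and we keep decoding with the codes we already have
        if(LZW_LIMIT > ctx->next_code) {
            lzw_add_code(old_code, symbol, ctx);
        }
        old_code = new_code;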
PIC LZW decompressor
Opening PIC Image: 'TITLE16.PIC' File Size: 7529
Creating RLE Image: 'TITLE16.RLE'
Decompression Complete Without Error
When I do that, it does actually run to completion without error. Looking at the data, it looks okay, but the proof will be when we try to decode the RLE data into an actual image, as that will make it easier from this point on to tell whether we have good data or not.
File: TITLE16.RLE [12681 bytes]
Offset x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 xA xB xC xD xE xF Decoded Text
0000000x: 0B 00 90 44 88 08 88 00 08 88 90 07 08 80 08 80 · · · D · · · · · · · · · · · ·
0000001x: 00 80 00 88 08 08 88 88 88 80 80 00 88 80 88 00 · · · · · · · · · · · · · · · ·
0000002x: 90 13 80 08 00 90 5A 08 80 08 00 90 0A DD 90 24 · · · · · · Z · · · · · · · · $
0000003x: 00 90 13 08 00 90 5B 08 00 88 00 90 0A 4D 44 90 · · · · · · [ · · · · · · M D ·
One thing I do notice when looking at the data is that the values are not constrained to 0-15, but otherwise it looks pretty good. I’m wondering if, for 16 colour graphics, they are packing 2 pixels per byte? We really won’t know until we can render an image, so this is probably a good place to stop and pick up in the next post by writing the RLE decoder. Also, as I’m not sure what the format_byte does at this point, it was just passed forward transparently.
This post is the third in a series of posts about my reverse engineering efforts on the PIC file format that MicroProse used with their games, specifically F-15 Strike Eagle II (though I plan to trace the format through other titles to see if and how it changes). To read my other posts on this topic, you can use this link to my archive page for the PIC File Format, which will contain all my posts to date on the subject.