Sometimes you need a hammer

In my last post, we wrapped up writing our Bellard-LZSS compressor, to facilitate eventually writing a PIC⁹³ encoder for Railroad Tycoon Deluxe (RRDX) from MicroProse. In this post we will take the next steps, debugging and validating the code which we have written. I have no doubt it will take some tweaking to get the compressor to generate the original compressed stream that we have as a reference, I just hope we are not too far off right out of the gate. So with that said, lets get into it.

First pass

It would be fantastic if things matched on the first try, but this is highly unlikely given the complexity of the compressor, and the number of variables that are at play here. Time to get our hands dirty and dig into it and see what we have.

Bellard-LZSS Compressor
Opening: 'val/LABS-0.PLN'	File Size: 32000 bytes
Creating: 'val/LABS-0.RLZ'
compressed: 32000 / 32000
wrote 2414 bytes

Well that’s a good sign, but the proof is in the output. First let’s try decompressing what we just compressed to make sure we have a valid stream.

Bellard-LZSS Decompressor
Decompressing: 'val/LABS-0.RLZ'	File Size: 2414 bytes
Creating: 'val/LABS-0.OUT'
consumed: 2414 / 2414
wrote 31076 bytes

Well that’s good news and bad news. The good news is that we appear to be generating a valid compressed data stream. The bad news is when decompressing that stream we are not getting back to the original file, as we can see from the output length here, and the input length above. Examining the sizes of the source LZSS file that was originally used to generate our input file, and the size of our compressed file we see differences there too (2376 / 2414). Curiously our compressed output is bigger, yet it is producing a smaller uncompressed file, so we must be making some different decisions too. So lets look at the compressed files to see where the differences are, to try and track down our errors.

Reference LZSS File:
File: val/LABS-0.LZSS  [2376 bytes]
Offset    x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 xA xB xC xD xE xF  Decoded Text
0000000x: 55 55 00 FF F8 FC FF F8 FC FF F8 FC FF F8 FC FF  U U · · · · · · · · · · · · · ·
0000001x: F8 FC FF F8 FC FF F8 FC DB 56 FF FE FF FF F8 27  · · · · · · · · · V · · · · · '
0000002x: FC B0 F8 75 C0 37 F8 27 0C B0 F8 FC B0 F8 FC 55  · · · u · 7 · ' · · · · · · · U
0000003x: B5 B0 F8 FC B0 F8 FC B0 F8 FC B0 F8 FC B0 F8 FC  · · · · · · · · · · · · · · · ·

Our LZSS Output:
File: val/LABS-0.RLZ  [2414 bytes]
Offset    x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 xA xB xC xD xE xF  Decoded Text
0000000x: 55 D5 00 FF F8 FD FF F8 FD FF F8 FD FF F8 FD FF  U · · · · · · · · · · · · · · ·
0000001x: F8 FD FF F8 FD F3 F8 F0 B6 55 FF FF F8 25 FC B0  · · · · · · · · · U · · · % · ·
0000002x: F8 73 C0 5F F0 25 0C B0 F8 FD B0 F8 FD B0 F8 FD  · s · _ · % · · · · · · · · · ·
0000003x: 55 6D B0 F8 FD B0 F8 FD B0 F8 FD B0 F8 FD B0 F8  U m · · · · · · · · · · · · · ·

Looks like we are failing already in our first compression unit. But we are close in size for both the compressed file and the re-decompressed file, so I think we’re actually pretty close. Clearly we have an error somewhere as we are not generating the correct decompressed output, so likely dropping some bytes somewhere. And we are likely making some different choices on how to encode certain things at certain points.

Debugging the LZSS compressor

I think the best way to tackle this problem is to instrument both our compressor and decompressor to give us some information about the decisions being made about the stream, and the associated inputs diving those decisions. Then we can try to look to see why we are making a different choice. So here we will run the decompressor against the reference LZSS stream. and our compressor on the output from that. But first let’s validate our decompressor to make sure it produces the same output as we got before, just to ensure we don’t have any bad data like before. (the decompressor was extracted from our PIC⁹³ code in my previous post)

MD5 (val/LABS-0.OUT) = 58dca3aa041e78f1f3ee62f6282595a0
MD5 (val/LABS-0.PLN) = 58dca3aa041e78f1f3ee62f6282595a0

I didn’t expect a failure here, but hey, can never be too careful. Okay now that we know the decompression program is good, we can move on to debugging the compression program. I’ve added prints to both programs to output length and offsets at each of the encoded types. To make things a little easier I limited it to the first 2 compression units only, for starters. As we know we are showing a difference already in the first compression unit, this should be enough to start with.

Decompressing:
Decompressing: 'val/LABS-0.LZSS'	File Size: 2376 bytes
Creating: 'val/LABS-0.OUT'
Literal: 00 (SP)
Long: 0001:2 (LD) [253]
Long: 0001:2 (LD) [253]
Long: 0001:2 (LD) [253]
Long: 0001:2 (LD) [253]
Long: 0001:2 (LD) [253]
Long: 0001:2 (LD) [253]
Long: 0001:2 (LD) [253]
Long: 0001:8 (SD)
Literal: ff (SP)
Long: 0001:2 (LD) [40]
Literal: fc (SP)
Long: 0050:2 (LD) [118]
Literal: c0 (SP)
Long: 00c9:2 (LD) [40]
Literal: 0c (SP)
Long: 0050:2 (LD) [253]
Long: 0050:2 (LD) [253]
consumed: 2376 / 2376
wrote 32000 bytes

Compressing:
Bellard-LZSS Compressor
Opening: 'val/LABS-0.OUT'	File Size: 32000 bytes
Creating: 'val/LABS-0.RLZ'
Literal: 00 (SP)
Long: 0001:256 (LD)
Long: 0001:256 (LD)
Long: 0001:256 (LD)
Long: 0001:256 (LD)
Long: 0001:256 (LD)
Long: 0001:256 (LD)
Long: 000d:243 (LD)
Literal: ff (SP)
Long: 0001:40 (LD)
Literal: fc (SP)
Long: 0050:118 (LD)
Literal: c0 (LP)
Long: 01a1:40 (LD)
Literal: 0c (LP)
Long: 0050:256 (LD)
Long: 0050:256 (LD)
Long: 0050:256 (LD)
Long: 0050:256 compressed: 32000 / 32000
wrote 2414 bytes

Well okay, that’s a bit of a surprise. Looks like for whatever reason the MicroProse compressor limits itself to runs of 253 bytes, and not 256 as we had accounted for. I’m thinking this is an implementation error on the part of MicroProse, but nonetheless we need to replicate it (though could very well exist in the Bellard implementation as well). Oddly, if anything, this should make the reference file a bit bigger than ours, and not the other way around that we see, so must be more significant differences later. But for now, let’s make this change, and re-run it to see what we get.

Decompressing:
Decompressing: 'val/LABS-0.LZSS'	File Size: 2376 bytes
Creating: 'val/LABS-0.OUT'
Literal: 00 (SP)
Long: 0001:2 (LD) [253]
Long: 0001:2 (LD) [253]
Long: 0001:2 (LD) [253]
Long: 0001:2 (LD) [253]
Long: 0001:2 (LD) [253]
Long: 0001:2 (LD) [253]
Long: 0001:2 (LD) [253]
Long: 0001:8 (SD)
Literal: ff (SP)
Long: 0001:2 (LD) [40]
Literal: fc (SP)
Long: 0050:2 (LD) [118]
Literal: c0 (SP)
Long: 00c9:2 (LD) [40]
Literal: 0c (SP)
Long: 0050:2 (LD) [253]
Long: 0050:2 (LD) [253]
consumed: 2376 / 2376
wrote 32000 bytes

Compressing:
Bellard-LZSS Compressor
Opening: 'val/LABS-0.OUT'	File Size: 32000 bytes
Creating: 'val/LABS-0.RLZ'
Literal: 00 (SP)
Long: 0001:253 (LD)
Long: 0001:253 (LD)
Long: 0001:253 (LD)
Long: 0001:253 (LD)
Long: 0001:253 (LD)
Long: 0001:253 (LD)
Long: 0001:253 (LD)
Long: 00f8:8 (SD)
Literal: ff (SP)
Long: 0001:40 (LD)
Literal: fc (SP)
Long: 0050:118 (LD)
Literal: c0 (LP)
Long: 01a1:40 (LD)
Literal: 0c (LP)
Long: 0050:253 (LD)
Long: 0050:253 (LD)
Long: 0050:253 compressed: 32000 / 32000
wrote 2417 bytes

Now we’re getting the correct match lengths, but the match positions are different. This may, or may-not be an actual problem. Though does complicate validation as we cannot use md5 on the compressed outputs to compare. Let’s decompress that last compressed version, and compare that output to the reference output. The highlighted line represents data 1772 bytes (06ec) into the decompressed data. Taking a look at that position should tell us if we have a corruption problem here or not.

Decompressing: 'val/LABS-0.RLZ'	File Size: 2417 bytes
Creating: 'val/LABS-0.OUT'
Literal: 00 (SP)
Long: 0001:2 (LD) [251]
Long: 0001:2 (LD) [251]
Long: 0001:2 (LD) [251]
Long: 0001:2 (LD) [251]
Long: 0001:2 (LD) [251]
Long: 0001:2 (LD) [251]
Long: 0001:2 (LD) [251]
Long: 00f8:8 (SD)
Literal: ff (SP)
Long: 0001:2 (LD) [38]
Literal: fc (SP)
Long: 0050:2 (LD) [116]
Literal: c0 (SP)
Long: 01a1:2 (LD) [38]
Literal: 0c (SP)
Long: 0050:2 (LD) [251]
Long: 0050:2 (LD) [251]
consumed: 2417 / 2417
wrote 31074 bytes

Reference PLN File:
File: val/LABS-0.PLN  [32000 bytes]
Offset    x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 xA xB xC xD xE xF  Decoded Text
0000000x: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  · · · · · · · · · · · · · · · ·
0000001x: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  · · · · · · · · · · · · · · · ·
0000002x: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  · · · · · · · · · · · · · · · ·
0000003x: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  · · · · · · · · · · · · · · · ·
        ⋮                                                 ⋮
000006Dx: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  · · · · · · · · · · · · · · · ·
000006Ex: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  · · · · · · · · · · · · · · · ·
000006Fx: 00 00 00 00 FF FF FF FF FF FF FF FF FF FF FF FF  · · · · · · · · · · · · · · · ·

Our decompressed LZSS Output:
File: val/LABS-0.OUT  [32000 bytes]
Offset    x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 xA xB xC xD xE xF  Decoded Text
0000000x: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  · · · · · · · · · · · · · · · ·
0000001x: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  · · · · · · · · · · · · · · · ·
0000002x: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  · · · · · · · · · · · · · · · ·
0000003x: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  · · · · · · · · · · · · · · · ·
        ⋮                                                 ⋮
000006Dx: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  · · · · · · · · · · · · · · · ·
000006Ex: 00 00 00 00 00 00 FF FF FF FF FF FF FF FF FF FF  · · · · · · · · · · · · · · · ·
000006Fx: FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF  · · · · · · · · · · · · · · · ·

Well it appears we found our first bug. We are not encoding the lengths we think we are, with long pointer and long lengths. Seems that we are off by two. Looking at the code below we can spot the problem at the highlighted line.

            } else { // encode as long pointer
                if(debug) {
                    printf("Long: %04x:%d ", distance, match_len);
                    fflush(stdout);
                }
                // send flags 0,1 - long pointer
                if(0 != (result = write_bit(0, ctx, dst))) goto finish;
                if(0 != (result = write_bit(1, ctx, dst))) goto finish;
                if(match_len > 253) match_len = 253; // short pointer can't encode more than 5 chars
                uint16_t encode_len = match_len - THRESHOLD;
                distance = (-((int16_t)distance) & 0x1fff);
                // send low byte of distance
                ctx->unit.data[ctx->byte_count++] = (distance & 0x00ff);
                distance >>= 5;             // shift in the upper 5 bits
                distance &= 0x00f8;         // mask off the low 3 bits
                if(encode_len <= 7) {       // long pointer can only have a length code up to 7
                    if(debug) {
                        printf("(SD)\n");
                        fflush(stdout);
                    }
                    distance |= encode_len; // encode the length to the distance
                    // send high byte of distance + length
                    ctx->unit.data[ctx->byte_count++] = (distance & 0x00ff); 
                } else {
                    if(debug) {
                        printf("(LD)\n");
                        fflush(stdout);
                    }
                    // send high byte of distance + '000'
                    ctx->unit.data[ctx->byte_count++] = (distance & 0x00ff);
                    encode_len--;
                    // send length-1
                    ctx->unit.data[ctx->byte_count++] = (encode_len & 0x00ff); 
                }
            }

At the highlighted line we are adjusting the encoded length by two, this is necessary for encoding short lengths, but not long lengths, and I forgot to adjust it back. (or restructure the code so that the adjustment only happens when outputting a short length code). This but could very well account for the mismatch of the decompressed sizes. Lets fix that quick by adding two back to the value (for now, will restructure later) and see what that does for us.

Compressing:
Bellard-LZSS Compressor
Opening: 'val/LABS-0.OUT'	File Size: 32000 bytes
Creating: 'val/LABS-0.RLZ'
Literal: 00 (SP)
Long: 0001:253 (LD)
Long: 0001:253 (LD)
Long: 0001:253 (LD)
Long: 0001:253 (LD)
Long: 0001:253 (LD)
Long: 0001:253 (LD)
Long: 0001:253 (LD)
Long: 00f8:8 (SD)
Literal: ff (SP)
Long: 0001:40 (LD)
Literal: fc (SP)
Long: 0050:118 (LD)
Literal: c0 (LP)
Long: 01a1:40 (LD)
Literal: 0c (LP)
Long: 0050:253 (LD)
Long: 0050:253 (LD)
Long: 0050:253 compressed: 32000 / 32000
wrote 2417 bytes
         
Decompressing:
Bellard-LZSS Decompressor
Decompressing: 'val/LABS-0.RLZ'	File Size: 2417 bytes
Creating: 'val/LABS-0.OUT'
Literal: 00 (SP)
Long: 0001:2 (LD) [253]
Long: 0001:2 (LD) [253]
Long: 0001:2 (LD) [253]
Long: 0001:2 (LD) [253]
Long: 0001:2 (LD) [253]
Long: 0001:2 (LD) [253]
Long: 0001:2 (LD) [253]
Long: 00f8:8 (SD)
Literal: ff (SP)
Long: 0001:2 (LD) [40]
Literal: fc (SP)
Long: 0050:2 (LD) [118]
Literal: c0 (SP)
Long: 01a1:2 (LD) [40]
Literal: 0c (SP)
Long: 0050:2 (LD) [253]
Long: 0050:2 (LD) [253]
consumed: 2417 / 2417
wrote 32000 bytes

Well that does indeed result in fixing our decompressed result in terms of length. But how does it compare to the original?

MD5 (val/LABS-0.PLN) = 58dca3aa041e78f1f3ee62f6282595a0
MD5 (val/LABS-0.OUT) = 58dca3aa041e78f1f3ee62f6282595a0

Fantastic! So now we are not just producing a valid compressed stream, but one that properly reflects the data it represents. Now why are we encoding different offsets? The first thing that comes to mind is that we are artificially limiting the match length to 253 bytes in the pointer cases, in reality this should be our STRINGSZ constant. So let’s make that change and see what happens, and set STRINGSZ to 253.

Compressing:
Opening: 'val/LABS-0.OUT'	File Size: 32000 bytes
Creating: 'val/LABS-0.RLZ'
Literal: 00 (SP)
Long: 0001:253 (LD)
Long: 0001:253 (LD)
Long: 0001:253 (LD)
Long: 0001:253 (LD)
Long: 0001:253 (LD)
Long: 0001:253 (LD)
Long: 0001:253 (LD)
Long: 00f5:8 (SD)
Literal: ff (SP)
Long: 0001:40 (LD)
Literal: fc (SP)
Long: 0050:118 (LD)
Literal: c0 (LP)
Long: 019e:40 (LD)
Literal: 0c (LP)
Long: 0050:253 (LD)
Long: 0050:253 (LD)
Long: 0050:253 compressed: 32000 / 32000
wrote 2417 bytes

Decompressing:
Decompressing: 'val/LABS-0.RLZ'	File Size: 2417 bytes
Creating: 'val/LABS-0.OUT'
Literal: 00 (SP)
Long: 0001:2 (LD) [253]
Long: 0001:2 (LD) [253]
Long: 0001:2 (LD) [253]
Long: 0001:2 (LD) [253]
Long: 0001:2 (LD) [253]
Long: 0001:2 (LD) [253]
Long: 0001:2 (LD) [253]
Long: 00f5:8 (SD)
Literal: ff (SP)
Long: 0001:2 (LD) [40]
Literal: fc (SP)
Long: 0050:2 (LD) [118]
Literal: c0 (SP)
Long: 019e:2 (LD) [40]
Literal: 0c (SP)
Long: 0050:2 (LD) [253]
Long: 0050:2 (LD) [253]
consumed: 2417 / 2417
wrote 32000 bytes

MD5 (val/LABS-0.PLN) = 58dca3aa041e78f1f3ee62f6282595a0
MD5 (val/LABS-0.OUT) = 58dca3aa041e78f1f3ee62f6282595a0

As expected we still get a matching output, but from the looks of it this did nothing to solve our offset problem. Looking into the code a little deeper, the insert_node() function which is responsible for finding our maximal length match, is always looking for a STRINGSZ sized match. It is also not necessarily finding matches in order of distance from the current position, therefore not always going to return the closest. Yet from the reference compressed stream, it appears that the closest possible match is used. Makes sense, you want the shortest possible pointers, so the smaller pointer types can be used when available for a given length.

Refactoring the string search

There’s probably a few different ways to accomplish this, but I’m electing to go for the brute force approach here, at least to test the theory. We’re going to scan back from the current position back to the extent of the sliding window looking for string matches, and their lengths. Tracking the maximum matching length we find, along with the earliest index where it happened. It will be comparatively slow, but it will work as a way to test my theory.

// finds the closest, but longest match for string pointed to be 'root'
// returns length and position of match in the context
// max   is the max length of string to compare - this shrinks during wind-down
// tabsz is the valid range of the sliding window, grows and shrinks at 
//       start and shutdown
static void find_string(int root, uint16_t max, uint16_t tabsz, lzss_ctx_t *ctx) {
    uint16_t pos = NIL;  // pos of longest match so far
    uint16_t len = 0;    // length of longest match so far
    uint8_t *ref = &ctx->window[root]; // our root string we are trying to match

    for(int i=1; i<=(tabsz - max) ; i++) {
        uint16_t p = (root - i) & TABLEMSK; // compare position
        uint8_t *str = &ctx->window[p];     // compare string
        uint16_t l; // compare length
        for(l=0; l<max; l++) {
            if(ref[l] != str[l]) break;
        }
        if(l > len) { // update our holding vars if we have a better match
            pos = p;
            if((len = l) == max) { // found max possible length, return
                ctx->match_len = l;
                ctx->match_pos = p;
                return; 
            }
        }
    }
    ctx->match_len = len;
    ctx->match_pos = pos;
}

The following changes were made to the main compression function. At the top basically all the tree initialization was removed, and replaced with a new var to track the valid window size. On first pass we don’t need to call find_string(). Instead the default match_length of zero will force a literal to be pushed out.

int lzss_compress(FILE *dst, FILE *src) {
    int result = 0;                    // return value at 'finish'
    lzss_ctx_t *ctx = NULL;            // the compression state machine context
    uint16_t cur = TABLESZ - STRINGSZ; // position for the current char/string
    uint16_t nxt = 0;                  // position where the next char will be inserted
    uint16_t len = 0;                  // length of bytes in the buffer
    int      byte;                     // temporary holder for when fgetc() 
                                       // is used to check for EOF
    uint16_t winsz = 0;                // valid range of sliding window

    // allocate our compression context, init all to 0
    if(NULL == (ctx = (lzss_ctx_t *)calloc(1, sizeof(lzss_ctx_t)))) {
        result = -2;
        goto finish;
    }
    init_state(ctx);

    // Read STRINGSZ bytes into the last STRINGSZ bytes of the buffer
    len = fread(&ctx->window[cur], 1, STRINGSZ, src);
    if(0 == len) {  // emptyfile, nothing to compress
        result = -3; // technically not an error
        goto finish;
    }
    winsz+=len;

Then at the bottom of the compression loop, I removed all of the tree related stuff again, and replaced it with management of the window size variable, and buffer length. Then just after the window update loops, I added a single call to find the next string.

        int i;
        for (i = 0; (i < match_len) && (EOF != (byte = getc(src))); i++) {
            ctx->window[nxt] = byte;  // read new bytes 
            bi++;
            // replicate bytes close to the end of the buffer, to make the compares easier
            if (nxt < STRINGSZ - 1) ctx->window[nxt + TABLESZ] = byte;
            
            // Since this is a ring buffer, increment the position modulo TABLESZ.
            nxt = (nxt + 1) & TABLEMSK;
            cur = (cur + 1) & TABLEMSK;
            winsz++;
            if(winsz > TABLESZ) winsz = TABLESZ;
        }
        while (i++ < match_len) {
            // After the end of text, no need to read
            // but buffer may not be empty.
            cur = (cur + 1) & TABLEMSK;
            if(len) len--;
            winsz--;
        }
        find_string(cur, len, winsz, ctx);
    } while(0 < len);

Nothing left to do but to give it a go and see how we did.

Bellard-LZSS Decompressor
Decompressing: 'val/LABS-0.LZSS'	File Size: 2376 bytes
Creating: 'val/LABS-0.OUT'
consumed: 2376 / 2376
wrote 32000 bytes
           
Bellard-LZSS Compressor
Opening: 'val/LABS-0.OUT'	File Size: 32000 bytes
Creating: 'val/LABS-0.RLZ'
compressed: 32000 / 32000
wrote 2376 bytes
 
Bellard-LZSS Decompressor
Decompressing: 'val/LABS-0.RLZ'	File Size: 2376 bytes
Creating: 'val/LABS-0.OUT'
consumed: 2376 / 2376
wrote 32000 bytes

MD5 (val/LABS-0.PLN)  = 58dca3aa041e78f1f3ee62f6282595a0
MD5 (val/LABS-0.OUT)  = 58dca3aa041e78f1f3ee62f6282595a0

MD5 (val/LABS-0.LZSS) = a2ec6535216d220678fc648b9447bcd4
MD5 (val/LABS-0.RLZ)  = a2ec6535216d220678fc648b9447bcd4

Well that was a total success! As we can see the generated LZSS compressed file has the same size, and the exact same signature, so it is identical to our reference. However after running it against some other files I’m still getting mismatches. So some work appears to still be needed. I will need to reconfirm as well that the source files don’t have any padding, as these were extracted from the PIC⁹³ files, which individually compresses each image plane.

I’m going to wrap it up here for this post, starting to get on the long side again. But we are getting super close to having this complete and validated. Just a little more work I think ans this one will be all squared away. So next time we will pick up with validating, and making any tweaks as necessary to hopefully get a pass, or at least reason out why we can’t have an exact match. I don’t think our compressed result is bad, only that our code makes some different choices than the reference at this point.

This post is part of a series of posts surrounding my reverse engineering efforts of the PIC file format that MicroProse used with their games. Specifically F-15 Strike Eagle II (Though I plan to trace the format through other titles to see if and how it changes). To read my other posts on this topic you can use this link to my archive page for the PIC File Format which will contain all my posts to date on the subject.

Ouch my eye!