
Shoving the toothpaste back into the tube

In my last post on the PIC93 version of the MicroProse PIC file format, we left off having successfully decompressed the LZSS-compressed image data within the file and reconstituted the image. That part was easy, as the code was pretty much already written for us in the form of the unlzexe GitHub repository. What wasn’t easy was getting to the point of identifying that Bellard-LZSS was actually the compression used, but in the end we got there. In this post we’re going to try to reverse the process. This could prove to be far more challenging, as we don’t have a full reference to look back on.

Where to begin?

I could probably try to reach out to Fabrice Bellard to see if he would be willing to release his source after all these years, and I may still do that, but that would be the easy way out. Part of the fun is the challenge of figuring it out for ourselves. It is also possible that the MicroProse implementation isn’t exact to Bellard’s, but close enough that the Bellard implementation can decode it. One option might be to reverse engineer LZEXE itself by looking at the generated machine code, but I don’t want to go that route; I’m not trying to recreate LZEXE, I just want to recreate the compression algorithm itself. We have a working decompression module, so we should be able to use it to validate our compression module, much in the same way we did with LZW.

Since we don’t have a reference implementation for the Bellard-LZSS compression algorithm, we’re going to use the Okumura-LZSS implementation as a guide, just as Fabrice Bellard did back in 1989. Most of the specifics should be fairly straightforward to implement, as we can see how the decoding of Bellard’s version differs from the Okumura version, but some specifics will still need figuring out.

| Data/Pointer Type | Total Byte Usage | Flag Bits Used | Stream Bytes Used | Distance Bits | Distance Range | Length Bits | Length Range |
|-------------------|------------------|----------------|-------------------|---------------|----------------|-------------|--------------|
| Literal uncompressed byte | 1.125 | 1 | 1 | N/A | N/A | N/A | N/A |
| Short distance | 1.5 | 4 | 1 | 8 | -1 – -256 | 2 | 2–5 |
| Long distance | 2.25 | 2 | 2 | 13 | -1 – -8,192 | 3 | 3–9 |
| Long distance/long length | 3.25 | 2 | 3 | 13 | -1 – -8,192 | 8 | 3–256 |

TABLE 1 – Table pulled from cosmodoc, slightly modified to better suit our needs.

For reference here is the main compression loop for the Okumura-LZSS implementation.

void Encode(void)
{
	int  i, c, len, r, s, last_match_length, code_buf_ptr;
	unsigned char  code_buf[17], mask;
	
	InitTree();  /* initialize trees */
	code_buf[0] = 0;  /* code_buf[1..16] saves eight units of code, and
		code_buf[0] works as eight flags, "1" representing that the unit
		is an unencoded letter (1 byte), "0" a position-and-length pair
		(2 bytes).  Thus, eight units require at most 16 bytes of code. */
	code_buf_ptr = mask = 1;
	s = 0;  r = N - F;
	for (i = s; i < r; i++) text_buf[i] = ' ';  /* Clear the buffer with
		any character that will appear often. */
	for (len = 0; len < F && (c = getc(infile)) != EOF; len++)
		text_buf[r + len] = c;  /* Read F bytes into the last F bytes of
			the buffer */
	if ((textsize = len) == 0) return;  /* text of size zero */
	for (i = 1; i <= F; i++) InsertNode(r - i);  /* Insert the F strings,
		each of which begins with one or more 'space' characters.  Note
		the order in which these strings are inserted.  This way,
		degenerate trees will be less likely to occur. */
	InsertNode(r);  /* Finally, insert the whole string just read.  The
		global variables match_length and match_position are set. */
	do {
		if (match_length > len) match_length = len;  /* match_length
			may be spuriously long near the end of text. */
		if (match_length <= THRESHOLD) {
			match_length = 1;  /* Not long enough match.  Send one byte. */
			code_buf[0] |= mask;  /* 'send one byte' flag */
			code_buf[code_buf_ptr++] = text_buf[r];  /* Send uncoded. */
		} else {
			code_buf[code_buf_ptr++] = (unsigned char) match_position;
			code_buf[code_buf_ptr++] = (unsigned char)
				(((match_position >> 4) & 0xf0)
			  | (match_length - (THRESHOLD + 1)));  /* Send position and
					length pair. Note match_length > THRESHOLD. */
		}
		if ((mask <<= 1) == 0) {  /* Shift mask left one bit. */
			for (i = 0; i < code_buf_ptr; i++)  /* Send at most 8 units of */
				putc(code_buf[i], outfile);     /* code together */
			codesize += code_buf_ptr;
			code_buf[0] = 0;  code_buf_ptr = mask = 1;
		}
		last_match_length = match_length;
		for (i = 0; i < last_match_length &&
				(c = getc(infile)) != EOF; i++) {
			DeleteNode(s);		/* Delete old strings and */
			text_buf[s] = c;	/* read new bytes */
			if (s < F - 1) text_buf[s + N] = c;  /* If the position is
				near the end of buffer, extend the buffer to make
				string comparison easier. */
			s = (s + 1) & (N - 1);  r = (r + 1) & (N - 1);
				/* Since this is a ring buffer, increment the position
				   modulo N. */
			InsertNode(r);	/* Register the string in text_buf[r..r+F-1] */
		}
		if ((textsize += i) > printcount) {
			printf("%12ld\r", textsize);  printcount += 1024;
				/* Reports progress each time the textsize exceeds
				   multiples of 1024. */
		}
		while (i++ < last_match_length) {	/* After the end of text, */
			DeleteNode(s);					/* no need to read, but */
			s = (s + 1) & (N - 1);  r = (r + 1) & (N - 1);
			if (--len) InsertNode(r);		/* buffer may not be empty. */
		}
	} while (len > 0);	/* until length of string to be processed is zero */
	if (code_buf_ptr > 1) {		/* Send remaining code. */
		for (i = 0; i < code_buf_ptr; i++) putc(code_buf[i], outfile);
		codesize += code_buf_ptr;
	}
	printf("In : %ld bytes\n", textsize);	/* Encoding is done. */
	printf("Out: %ld bytes\n", codesize);
	printf("Out/In: %.3f\n", (double)codesize / textsize);
}

As you can see, unlike the simple elegance of the decompressor, the compressor is far more complex. We will just have to take things one little piece at a time, and hopefully have a functioning compressor at the end of it all. The above code and the table above it, along with the decompression code, will serve as our guides in putting the compressor together. Also, from the table pulled from cosmodoc, we can see that we have 3 different pointer types, whereas the Okumura implementation has only one, so we have some work to do still.


Implementing the Bellard-LZSS compressor

To try to minimize major structural changes relative to the Okumura reference, we will use file streams for both input and output. Once we have a working algorithm, we can look at changing one or both to memory-based streams. For now we want to concentrate on making only the changes necessary to implement the Bellard variation, not on how we might want to optimize things.

Unlike decompression, where we don’t need to buffer any data, for compression we need to be able to prefix the 16 flag bits before the compressed bytes they represent; let’s call this a compression unit. Once a compression unit is full (all the flag bits used), we write the whole thing out. Now we need to determine what the maximal length of a compression unit is, so let’s consider a few cases from the decompression perspective. If flags were FFFF (all ones) then the data portion would be 16 bytes of uncompressed data; not likely, but we are considering extremes here to find the extents. Next would be the case of a 00 bit sequence; here two more flag bits are consumed, plus one byte of data. As 4 bits are consumed in this scenario, the data portion would only be 4 more bytes. The final case to consider is a 01 bit sequence, in which only 2 flag bits are consumed, along with 2-3 bytes of additional data. So in this case the worst case would be 24 bytes of data. That means we need a buffer that can hold our 16 bits of flags plus up to 24 bytes of data, 26 bytes total. For simplicity let’s round this up to 32, leaving us with the following 32 byte structure.

typedef struct {
    uint16_t flags;    // our 16 bit control flags
    uint8_t  data[30]; // our data for this compression unit
} lzss_unit_t;

Note there is nothing in there to record how many flag bits or bytes of additional data are actually valid; this needs to be managed externally, as it is not part of what is written out. Instead of tracking how many of the flag bits are used, we will just track the mask used to insert the bits; when it overflows to zero, we know that all 16 are used and we need to flush and start over. To do that, we will wrap the above structure in another to hold our working context (we will add to this structure as we go).

typedef struct {
    uint16_t mask;    // bit mask for the current flag bit.
    int byte_count;   // tracks the number of data bytes in the compression unit
    lzss_unit_t unit; // the current 'working' lzss compression unit
} lzss_ctx_t;

Since LZSS is a sliding window type compression, the next thing we need is the window itself. We were able to avoid this with the decompressor, since we decompressed to memory and used the memory buffer as our window. We could do this again with the compressor by having the entire uncompressed source in memory, but to simplify things we will manage it separately for now. The question now becomes: how large is the window? To answer that we need to consider the maximal reference offset we can produce. From the decompression code (or the table earlier) we can determine that the maximal reference offset is 13 bits (2^13 = 8192), meaning we need an 8K window. To facilitate quickly finding potential matching strings we will also need some form of tree structure; Okumura uses a binary tree, and we will just reuse that same code here, as it does not affect any of the other changes we need for the Bellard implementation. So our updated context structure now looks like this.

#define MAXBITS   (13)
#define TABLESZ   (1 << MAXBITS)
#define TABLEMSK  (TABLESZ - 1) // mask for modulo arithmetic on the sliding window
#define STRINGSZ  (256)         // maximal matching string length
#define THRESHOLD (2)           // break point for encoding literals or pointers
#define NIL       (TABLESZ)     // marker of empty tree entry
#define MAXFLAG   (1UL << 16)   // used for checking when FLAGS are full

typedef struct {
    uint16_t mask;    // bit mask for the current flag bit.
    int byte_count;   // tracks the number of data bytes in the compression unit 
    lzss_unit_t unit; // the current 'working' lzss compression unit

    // set by insert_node()
    uint16_t match_pos; // position of longest match
    uint16_t match_len; // length of longest match

    // search tree
    uint16_t parent[TABLESZ + 1];       // parent node of search tree
    uint16_t rchild[TABLESZ + 1 + 256]; // right child of search tree + 256 root entries
    uint16_t lchild[TABLESZ + 1];       // left child of search tree

    // sliding window
    uint8_t window[TABLESZ + STRINGSZ - 1];
} lzss_ctx_t;

Binary trees

Before we get into the main compression loop, we need a few support functions. These are largely rehashes of the Okumura versions; I’ve just tried to clean things up a bit and make them more readable by giving the variables more meaningful names. The biggest changes here are around the init function, which takes care of most of the initialization of our context.

/*
 * initialize the LZSS state context
 *
 * lchild[] and rchild[] can be left uninitialized (up to TABLESZ)
 * parent[] These must be set to NIL to mark as empty
 * rchild[] TABLESZ + (1 to 256) are set to NIL - these are the 256 root entries
 */
static void init_state(lzss_ctx_t *ctx) {
    // zero out the entire structure, including the buffers
    bzero(ctx, sizeof(lzss_ctx_t)); // zero out the buffer

    // Clear the buffer with any character that will appear often.
    for (int i = 0; i < TABLESZ - STRINGSZ; i++)
        ctx->window[i] = ' ';

    // initialize the 256 root entries
    for (int i = TABLESZ + 1; i <= TABLESZ + 256; i++) {
        ctx->rchild[i] = NIL;
    }

    // mark all entries as empty
    for (int i = 0; i < TABLESZ; i++) {
        ctx->parent[i] = NIL;
    }

    ctx->mask = 0x0001; 
}

Next is our insert_node(); this function is responsible for adding a new node, but also for returning the position of the maximal length match via the context. It is the main workhorse of our compressor, as it is responsible for finding the matches we reference back to.

//  Inserts string of length STRINGSZ, and returns the longest-match position
//  and length via the context variables match_pos and match_len.
//  If match_len = STRINGSZ, then removes the old node in favour of the new
//  one, because the old one will be deleted sooner.
static void insert_node(int root, lzss_ctx_t *ctx) {
    int  len, pos, cmp;
    uint8_t  *key;

    cmp = 1;
    key = &ctx->window[root];
    pos = TABLESZ + 1 + key[0];
    ctx->rchild[root] = ctx->lchild[root] = NIL;
    ctx->match_len = 0;
    for ( ; ; ) {
        if (cmp >= 0) {
            if (ctx->rchild[pos] != NIL)
                pos = ctx->rchild[pos];
            else {
                ctx->rchild[pos] = root; 
                ctx->parent[root] = pos;
                return;
            }
        } else {
            if (ctx->lchild[pos] != NIL)
                pos = ctx->lchild[pos];
            else {
                ctx->lchild[pos] = root;
                ctx->parent[root] = pos;
                return;
            }
        }
        for (len = 1; len < STRINGSZ; len++) {
            if ((cmp = key[len] - ctx->window[pos + len]) != 0)
                break;
        }
        if (len > ctx->match_len) {
            ctx->match_pos = pos;
            if ((ctx->match_len = len) >= STRINGSZ)
                break;
        }
    }
    ctx->parent[root] = ctx->parent[pos];
    ctx->lchild[root] = ctx->lchild[pos];
    ctx->rchild[root] = ctx->rchild[pos];
    ctx->parent[ctx->lchild[pos]] = root;
    ctx->parent[ctx->rchild[pos]] = root;
    if (ctx->rchild[ctx->parent[pos]] == pos)
        ctx->rchild[ctx->parent[pos]] = root;
    else
        ctx->lchild[ctx->parent[pos]] = root;
    ctx->parent[pos] = NIL;  // remove pos
}

Finally, we also need a way to remove nodes from the trees as they age out of the sliding window. That’s the last of the utility functions we need for managing our binary tree.

// deletes node 'node' from tree
static void delete_node(int node, lzss_ctx_t *ctx) {
    int  child;
    
    if (ctx->parent[node] == NIL)
        return;  // not in tree
    if (ctx->rchild[node] == NIL)
        child = ctx->lchild[node];
    else if (ctx->lchild[node] == NIL)
        child = ctx->rchild[node];
    else {
        child = ctx->lchild[node];
        if (ctx->rchild[child] != NIL) {
            do {
                child = ctx->rchild[child];
            } while (ctx->rchild[child] != NIL);
            ctx->rchild[ctx->parent[child]] = ctx->lchild[child];
            ctx->parent[ctx->lchild[child]] = ctx->parent[child];
            ctx->lchild[child] = ctx->lchild[node];
            ctx->parent[ctx->lchild[node]] = child;
        }
        ctx->rchild[child] = ctx->rchild[node];
        ctx->parent[ctx->rchild[node]] = child;
    }
    ctx->parent[child] = ctx->parent[node];
    if (ctx->rchild[ctx->parent[node]] == node)
        ctx->rchild[ctx->parent[node]] = child;
    else
        ctx->lchild[ctx->parent[node]] = child;
    ctx->parent[node] = NIL;
}

Flags and the compression unit

Now we come to one of the first implementation differences between the Okumura and Bellard versions. With Okumura there was only ever one flag bit per coded symbol: either a literal byte, or a coded pointer. With Bellard we have several types of symbols, and a variable number of flag bits. As such we will need to manage the flags (and compression unit) outside of the main loop, as they may need to be flushed and reset at any time. So, just as our decompression code had a read_bit() function, we will create a write_bit() function here to manage the flag bits and the compression unit.

// write a single bit to the flags word
// any non-zero value will be written as a 1 bit to flags
// if flags becomes full, the compression unit is dumped and reset.
int write_bit(uint8_t bit, lzss_ctx_t *ctx, FILE *dst) {
    if(0 != bit) {
        ctx->unit.flags |= ctx->mask;
    }
    ctx->mask <<= 1;

    // all bits used, write it out and reset the compression unit
    if(0 == ctx->mask) { // write the fields separately in case of padding
        if(1 != fwrite(&ctx->unit.flags, sizeof(uint16_t), 1, dst)) {
            return -1;
        }
        if(1 != fwrite(&ctx->unit.data, ctx->byte_count, 1, dst)) {
            return -1;
        }
        ctx->mask = 0x0001;  // start back at bit 0
        ctx->byte_count = 0; // reset the count
        bzero(&ctx->unit, sizeof(lzss_unit_t)); // clear the compression unit
    }
    return 0;
}

The main compression loop

With that done, I think we are ready to begin our main compression loop. First let’s set up the loop according to Okumura’s implementation, though I’m following the cleaner version that is part of Apple’s kernel sources. (This was the version I used as reference for the binary tree functions as well.)

int lzss_compress(FILE *dst, FILE *src) {
    int        result = 0;                // return value at 'finish'
    lzss_ctx_t *ctx = NULL;               // the compression state machine context
    uint16_t    cur = TABLESZ - STRINGSZ; // position for the current char/string
    uint16_t    nxt = 0;                  // position the next byte will be inserted
    uint16_t    len = 0;                  // length of bytes in the buffer
    int         byte;                     // temporary holder for when fgetc() is
                                          // used so we can check for EOF

    // allocate our compression context, init all to 0
    if(NULL == (ctx = (lzss_ctx_t *)calloc(1, sizeof(lzss_ctx_t)))) {
        result = -2;
        goto finish;
    }
    init_state(ctx); // properly initialize the context

    // Read STRINGSZ bytes into the last STRINGSZ bytes of the buffer
    len = fread(&ctx->window[cur], 1, STRINGSZ, src);
    if(0 == len) {   // empty file, nothing to compress
        result = -3; // technically not an error
        goto finish;
    }

    // Insert the STRINGSZ strings, each of which begins with one or more
    // 'space' characters.  Note the order in which these strings are
    // inserted.  This way, degenerate trees will be less likely to occur.
    for(int i = 1; i <= STRINGSZ; i++) {
        insert_node(cur - i, ctx);
    }

    // Finally, insert the whole string just read.
    // The global variables match_length and match_position are set.
    insert_node(cur, ctx);

    // the main compression loop
    do {
        int match_len = ctx->match_len;
        int match_pos = ctx->match_pos;
        if(match_len > len) { // can't match/remove more than we've read
            match_len = len;
        }


        // match cases go here


        // remove the bytes and update the tree
        int i;
        for (i = 0; (i < match_len) && (EOF != (byte = getc(src))); i++) {
            delete_node(nxt, ctx);    // Delete old strings and
            ctx->window[nxt] = byte;  // read new bytes 
            // replicate bytes close to the end of the buffer, 
            // to make the compares easier
            if (nxt < STRINGSZ - 1) ctx->window[nxt + TABLESZ] = byte;

            // Since this is a ring buffer, increment the position modulo TABLESZ.
            nxt = (nxt + 1) & TABLEMSK;
            cur = (cur + 1) & TABLEMSK;

            // Register the string in text_buf[r..r+STRINGSZ-1]
            insert_node(cur, ctx); 
        }
        while (i++ < match_len) {
            delete_node(nxt, ctx);
            // After the end of text, no need to read
            nxt = (nxt + 1) & TABLEMSK;
            cur = (cur + 1) & TABLEMSK;
            // but buffer may not be empty.
            if (--len) insert_node(cur, ctx);
        }
    } while(0 < len);

    // send any remaining compression unit data
    if(0 != ctx->byte_count) {
        if(1 != fwrite(&ctx->unit, sizeof(uint16_t) + ctx->byte_count, 1, dst)) {
            result = -4;
            goto finish;
        }
    }
    result = 0;

finish:
    if(NULL != ctx) free(ctx);

    return result;
}

The above is incomplete, but it represents the core operations of the loop in terms of handling the sliding window and updating the search trees. Now we will go through each of our match cases one by one to flesh them out, as this is where things get very different from Okumura.


Pointer sizes

The most basic case is an uncompressed literal byte. This happens whenever the match length is less than the threshold length; the threshold prevents expansion, as it is basically the break-even point between encoding a literal and encoding a pointer. That seems simple enough; however, with the Bellard implementation the threshold moves around a little. Generally speaking, the threshold is set to the size of the encoded match pointer. If the match is smaller than the size of the pointer, we just output the literal data; only when the match is longer than the size of the pointer do we output the pointer. The challenge here is that with Bellard we actually have 3 pointer sizes to worry about. The pointers vary both in the distance they can travel back in the window and in how long a length they can encode. Using Table 1 above, we can see that we have a short pointer, which can encode up to a length of 5 and travel at most 256 bytes back, and a long pointer, which can encode up to 9 bytes (or 256 for a long long) and can travel back the entire window size of 8,192 bytes.

The good news is I think we can treat a long-long the same as a long here, as it doesn’t come into play unless the match length is greater than 9 bytes, which is well beyond the threshold. So that leaves us with two different thresholds. For a short pointer the threshold is only 1, as at two bytes matched the short pointer is already smaller. For a long pointer the threshold is 2, as the match needs to be at least 3 bytes before we see a reduction in size. We have the additional challenge that the two pointers overlap in terms of length range, so length cannot be the only deciding factor. The other determining factor for which pointer to use is how many bits we need to encode the distance part of the pointer (how far back in the window we go to find the bytes we need). With that in mind, I think we can use the following simple logic to select between the two pointer types.

        uint16_t distance = (cur - match_pos) & TABLEMSK;
        if((distance <= 256) && (match_len <= 5)) { // short pointer range
            // handle short pointer cases here
        } else { // long pointer range
            // handle long (and long long) cases here
        }

Encoding literals

With that in place, we can finally start to look at encoding some data. As mentioned earlier, the simplest case is when we encode a literal byte of data. Not only is it the simplest case, it is also our most important one: without being able to encode literals, we couldn’t build up a dictionary for the decompressor to use to reconstruct the file. As stated earlier, our threshold value wiggles around a bit between the pointer sizes, but the handling is exactly the same. THRESHOLD is defined as having a value of 2 and is used for both cases, but in the second case the comparison is less-than-or-equal instead of just less-than. To encode a literal we write a 1 as the flag bit, and then a single byte of data to the output stream. That leaves us with the following for the short pointer case.

            if(match_len < THRESHOLD) { // too short to encode, send as literal
                match_len = 1; // send 1 byte
                // send flag 1 - literal
                if(0 != (result = write_bit(1, ctx, dst))) goto finish;
                ctx->unit.data[ctx->byte_count++] = ctx->window[cur];
            }

For the long pointer case we have this. (identical except for the comparison in the if statement)

            if(match_len <= THRESHOLD) { // too short to encode, send as literal
                match_len = 1; // send 1 byte
                // send flag 1 - literal
                if(0 != (result = write_bit(1, ctx, dst))) goto finish;
                ctx->unit.data[ctx->byte_count++] = ctx->window[cur];
            }

Now is where things start to get interesting, so we will focus on one pointer type at a time.


The short pointer

Short pointers work by using a couple of extra flag bits to encode the length, giving a range of 0-3; since we cannot encode anything less than 2, we add 2 to this, leaving us with a range of 2-5. Before we can encode the length, however, we need to encode the pointer type. This is done by sending 2 zero flag bits for a short pointer, followed by the two bits for the length, meaning it takes 4 flag bits in total to encode a short pointer. We then encode 8 bits of distance into the output data stream. That leaves us with something that looks like this. (Note that this is implemented as the else condition to the if case looking to encode a literal above.)

            } else { // encode as short pointer
                // send flags 0,0 - short pointer
                if(0 != (result = write_bit(0, ctx, dst))) goto finish;
                if(0 != (result = write_bit(0, ctx, dst))) goto finish;
                if(match_len > 5) match_len = 5; // short pointer can't encode more than 5 chars
                uint8_t encode_len = match_len - THRESHOLD;
                // send flags x,x - length
                if(0 != (result = write_bit(encode_len & 0x02, ctx, dst))) goto finish;
                if(0 != (result = write_bit(encode_len & 0x01, ctx, dst))) goto finish;
                // send the distance pointer
                ctx->unit.data[ctx->byte_count++] = (-((int16_t)distance) & 0x00ff);
            }

The long pointer

Unlike the short pointer, the long pointer uses only 2 flag bits to identify it; the rest is done with additional bytes in the data stream. A long pointer encodes 3 length bits and 13 distance bits into a 2-byte word in the data stream. This gives a full reach for distance of 1-8,192 bytes. The 3 length bits give us a length range of 0-7; again, like the short pointer, we add 2 to this as we cannot encode less than 2 bytes (3 in this case), giving us a range of 2-9 bytes. A 3-bit length code of zero here is reserved to indicate a long-long (or control) pointer, which leaves us with a range of 3-9 bytes for the length. The encoding used for length and distance is a little odd: the first byte is the low 8 bits of the distance, the low 3 bits of the second byte are the length, and the upper 5 bits of the second byte are the remaining distance bits. The result is something like the following. (Again, this is implemented as an else case to the literal check.)

            } else { // encode as long pointer
                // send flags 0,1 - long pointer
                if(0 != (result = write_bit(0, ctx, dst))) goto finish;
                if(0 != (result = write_bit(1, ctx, dst))) goto finish;
                if(match_len > 256) match_len = 256; // can't encode more than 256 chars
                uint16_t encode_len = match_len - THRESHOLD; 
                distance = (-((int16_t)distance) & 0x1fff);
                // send low byte of distance
                ctx->unit.data[ctx->byte_count++] = (distance & 0x00ff);
                distance >>= 5;             // move distance bits 8-12 down to bits 3-7
                distance &= 0x00f8;         // clear the low 3 bits to make room for the length
                if(encode_len <= 7) {       // long pointer can only have a length code up to 7
                    distance |= encode_len; // encode the length to the distance
                    // send high byte of distance + length
                    ctx->unit.data[ctx->byte_count++] = (distance & 0x00ff);
                }
            }

The long long and control pointer

The long long pointer, or long-distance/long-length pointer, is also used for some in-band signalling via certain reserved values: a value of zero marks EOF, and a value of one marks a segment change. This second code ensures that the segment pointers don’t fall out of range relative to the offset in the 16-bit DOS world; I think we can largely ignore it for now, as I don’t think it will come into play with any of the data we are compressing. Any other value is the length value − 1, leaving a range of 3-256 bytes. To encode this longer length, a length code of zero is placed in the distance/length byte of a long pointer, and then another length byte is pushed into the data stream, for a total of 3 data bytes. The end result will look something like the following. (This is set up as an else case to the encoded-length check above.)

                } else {
                    // send high byte of distance + '000'
                    ctx->unit.data[ctx->byte_count++] = (distance & 0x00ff);
                    encode_len++; // encode_len was length - THRESHOLD; bump to length - 1
                    // send length-1
                    ctx->unit.data[ctx->byte_count++] = (encode_len & 0x00ff); 
                }

The above code only encodes an actual length pointer, the special case control code for EOF is handled separately, when we exit the compression loop, just before we flush out the final compression unit.


Encoding End of File

To encode EOF we must send a long-long style pointer, with a value of zero as the final length byte. Generally speaking the distance bits have no meaning here, but the files we have to work with always seem to end with the sequence 00 F0 00, so we will do the same in our code. The code to encode the EOF marker is placed just after we exit the compression loop, but before we flush out the last compression unit from the buffer.

    } while(0 < len);

    // Encode EOF
    if(0 != (result = write_bit(0, ctx, dst))) goto finish;
    if(0 != (result = write_bit(1, ctx, dst))) goto finish;
    ctx->unit.data[ctx->byte_count++] = 0x00; // offset (low) 00
    ctx->unit.data[ctx->byte_count++] = 0xF0; // offset (high) F0
    ctx->unit.data[ctx->byte_count++] = 0x00; // length 0

    // send remaining compression unit
    if(0 != ctx->byte_count) {


That is it for the compressor, for now. This post is getting on the long side again, so next time we will pick up with testing, debugging, and verification of the code we’ve written here.

By Thread


