PIC: The Next Generation

Time to take on the last of the MicroProse PIC variants we have uncovered so far, PIC⁹³. This variant stands out from the others a bit, as it is the first update to the format that we’ve seen in 3 years. Also, to the best of our knowledge, the only released title ever to use it was Railroad Tycoon Deluxe (RRDX). RRDX is also one of the last titles to use the PIC format that we can see, and even then, the titles released after RRDX seem to have gone back to the PIC⁹⁰ variant. Another oddity is that this format seems, at first glance, more closely related to the PIC⁸⁹ format than to PIC⁹⁰, PIC⁸⁹ being the format used by the game’s predecessor, Railroad Tycoon. With that said, let’s dig into it and see if we can reveal its secrets.

First Looks at PIC⁹³

While we did look at PIC⁹³ a little when we first looked at RRDX, it’s time to take a closer and more detailed look. Let’s start by looking at several of the PIC files to see if any common items stand out. The first thing we see is the 4 byte signature that we saw on our previous look. These bytes are the same in all the files, so if they serve some other function, that is unclear at this stage. After the signature we see a pair of 16 bit little-endian values, 640 and 400; these are clearly the image dimensions.

File: DIFFS0.PIC  [95516 bytes]
Offset    x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 xA xB xC xD xE xF  Decoded Text
0000000x: 00 48 38 00 80 02 90 01 00 00 00 04 04 05 05 05  · H 8 · · · · · · · · · · · · ·
0000001x: 06 07 06 06 0A 0A 0A 0D 0D 0D 0F 0F 0F 04 06 09  · · · · · · · · · · · · · · · ·
0000002x: 00 02 01 0C 0A 09 04 02 02 0B 09 08 0C 08 07 0A  · · · · · · · · · · · · · · · ·
0000003x: 07 06 0F 09 09 0D 08 07 6A 5E 9D 52 00 FF F8 33  · · · · · · · · j ^ · R · · · 3

File: LABS.PIC  [6644 bytes]
Offset    x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 xA xB xC xD xE xF  Decoded Text
0000000x: 00 48 38 00 80 02 90 01 00 00 0A 0F 0F 05 00 0A  · H 8 · · · · · · · · · · · · ·
0000001x: 00 00 0A 0A 0A 00 00 08 08 08 0A 05 00 0A 0A 0A  · · · · · · · · · · · · · · · ·
0000002x: 05 05 05 05 05 0F 05 0F 05 00 0C 0F 0F 05 05 0F  · · · · · · · · · · · · · · · ·
0000003x: 0E 0B 00 00 00 0F 0F 0F 48 09 55 55 00 FF F8 FC  · · · · · · · · H · U U · · · ·

File: LOCOS0.PIC  [8554 bytes]
Offset    x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 xA xB xC xD xE xF  Decoded Text
0000000x: 00 48 38 00 80 02 90 01 00 00 0A 0F 0F 05 00 0A  · H 8 · · · · · · · · · · · · ·
0000001x: 00 00 0A 0A 0A 00 00 08 08 08 0A 05 00 0A 0A 0A  · · · · · · · · · · · · · · · ·
0000002x: 05 05 05 05 05 0F 05 0F 05 00 0B 0F 0F 05 05 0F  · · · · · · · · · · · · · · · ·
0000003x: 0E 0B 00 00 00 0F 0F 0F F6 05 55 55 00 FF F8 FC  · · · · · · · · · · U U · · · ·

File: WOODTRSL.PIC  [79808 bytes]
Offset    x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 xA xB xC xD xE xF  Decoded Text
0000000x: 00 48 38 00 80 02 90 01 02 02 02 09 0D 0D 06 04  · H 8 · · · · · · · · · · · · ·
0000001x: 04 0A 0E 0F 07 07 07 0A 07 06 0A 0A 0A 09 0D 0F  · · · · · · · · · · · · · · · ·
0000002x: 09 0A 05 07 09 09 0A 07 04 07 03 03 06 07 05 0B  · · · · · · · · · · · · · · · ·
0000003x: 08 06 09 03 03 0F 0F 0F 2E 45 7D F7 FF FF F8 33  · · · · · · · · . E } · · · · 3

File: WRECK1.PIC  [68500 bytes]
Offset    x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 xA xB xC xD xE xF  Decoded Text
0000000x: 00 48 38 00 80 02 90 01 00 00 00 02 02 02 05 05  · H 8 · · · · · · · · · · · · ·
0000001x: 05 07 07 07 0A 0A 0A 0D 0D 0D 0A 05 00 07 0C 0E  · · · · · · · · · · · · · · · ·
0000002x: 0A 0A 0A 05 07 05 01 04 01 0C 0A 09 0B 0A 07 06  · · · · · · · · · · · · · · · ·
0000003x: 08 09 0F 0C 06 0F 0F 0F 7E 55 55 57 00 FF F8 FC  · · · · · · · · ~ U U W · · · ·

00 48 38 00: File signature [00]"H8"[00]
80 02: Image width 0x0280 (640)
90 01: Image height 0x0190 (400)
     : Unknown nibble data

After the image dimensions, the next 48 bytes curiously only have the lower nibble set to anything. These bytes are too regular and repetitive to be compressed data. This is reminiscent of the dithering tables we found in both PIC⁸⁹ and PIC⁹⁰. But the number 48 is odd… in a few of the examples above, the values sometimes appear as triplets, suggesting possibly RGB palette data. However, the range is not what I would expect: palette data is normally 0-63 for VGA with its 18 bit DAC [RGB (6:6:6)], or 0-255 for the newer (at the time) SVGA with its 24 bit DAC [RGB (8:8:8)] registers. Here it would only be 0-15, meaning only a 12 bit DAC [RGB (4:4:4)]. Then I remembered that one of the README files that came with the game had a little about the history and motivation behind RRDX; perhaps that can give us some direction.

History:
I thought some of you would like to know how we came to publish Railroad Tycoon Deluxe. Our Japanese development group released Railroad Tycoon in November of 1991. The game was a direct port to the NEC 9801 from the IBM. Our Japanese customers looked at the EGA graphics and said “Uggh!” or the Japanese equivalent. This motivated us to rework the graphics system for a 640×400 16 color graphics mode, (NEC Format.)
Jeff Billings/MicroProse

Well that is interesting, so RRDX is essentially a port of the Japanese PC-98 release. (so PC -> PC-98 -> PC) We do see in the header that the image is indeed 640×400, and if those 48 bytes are palette, that would follow the 16 colours mentioned. “NEC Format” hmmm…

The other display controller is set to slave mode and connected to 256 KB of planar video memory, allowing it to display 640 × 400 pixel graphics with 16 colors out of a palette of 4096. This video RAM is divided into pages (2 pages × 4 planes × 32 KB in 640 × 400 with 16 colors)
Wikipedia: PC-98

After doing a bit of searching, and as we can see above, it appears that the PC-98 (NEC 9801) did only have a 12 bit DAC (max 4096 colours). That would suggest that those 48 bytes are indeed palette data for the 16 colours used in the image. Given that we appear to see NEC specific palette data in the image header, I suspect we may see more NEC specifics in this format, differentiating this variant even more from its ancestors. As it stands, we have a PIC⁹³ header structure that looks something like this.

typedef struct {
    uint8_t r;
    uint8_t g;
    uint8_t b;
} pal_t;

#define PIC93SIG (0x00384800) // [00]"H8"[00] as UINT32 
typedef struct {
    uint32_t sig;    // file magic signature [00]"H8"[00]
    uint16_t width;  // image width
    uint16_t height; // image height
    pal_t pal[16];   // RGB(4:4:4) palette
    uint8_t data[];  // remainder of file
} pic93_t;
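
To make the header layout concrete, here is a minimal sketch of reading it from a byte buffer. The names `pic93_info_t`, `rd_le16` and `pic93_parse_header` are my own for illustration and are not from any MicroProse tool. It assumes the R, G, B byte order guessed at in the struct above, and it widens each 4 bit channel to 8 bits by multiplying by 17 (so 0xF becomes 0xFF). Rejecting palette bytes with the upper nibble set is a design choice based only on what we observed in the sample files.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PIC93_HDR_SIZE 56u /* 4 sig + 2 width + 2 height + 48 palette */

typedef struct { uint8_t r, g, b; } rgb8_t;

typedef struct {
    uint16_t width;
    uint16_t height;
    rgb8_t   pal[16]; /* palette widened to 8 bits per channel */
} pic93_info_t;

/* Read a little-endian 16 bit value from a byte buffer. */
static uint16_t rd_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Parse the 56 byte header. Returns 0 on success, or -1 on a short
   buffer, a bad signature, or a palette byte above 0x0F. */
static int pic93_parse_header(const uint8_t *buf, size_t len,
                              pic93_info_t *out)
{
    assert(buf != NULL && out != NULL);
    if (len < PIC93_HDR_SIZE)
        return -1;
    if (buf[0] != 0x00 || buf[1] != 'H' || buf[2] != '8' || buf[3] != 0x00)
        return -1;

    out->width  = rd_le16(buf + 4);
    out->height = rd_le16(buf + 6);

    for (int i = 0; i < 16; i++) {
        const uint8_t *c = buf + 8 + 3 * i;
        if ((c[0] | c[1] | c[2]) & 0xF0)
            return -1; /* only the low nibble was ever set in our samples */
        /* ASSUMPTION: byte order is R, G, B. 0x0..0xF maps to 0x00..0xFF. */
        out->pal[i].r = (uint8_t)(c[0] * 17);
        out->pal[i].g = (uint8_t)(c[1] * 17);
        out->pal[i].b = (uint8_t)(c[2] * 17);
    }
    return 0;
}
```

Fed the first 56 bytes of LABS.PIC from the dump above, this reports a 640×400 image, and the final palette entry (0F 0F 0F) widens to full white.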

The PIC⁹³ File Layout

With the 48 bytes resolved as palette data, what follows looks like it could be a 16 bit block length value, similar to what we see with PIC⁹⁰, just without any kind of block identifier before it, not counting the file header. Looking at the next block we don’t see any identifier either, but again, those first 16 bits fall within the file when added to the current position. Interestingly, we end up at exactly EOF after a few blocks, so this is looking promising. Also interesting is the 0x5555 that seems to appear after the length value for each block.

File: LABS.PIC  [6644 bytes]
Offset    x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 xA xB xC xD xE xF  Decoded Text
0000000x: 00 48 38 00 80 02 90 01 00 00 0A 0F 0F 05 00 0A  · H 8 · · · · · · · · · · · · ·
0000001x: 00 00 0A 0A 0A 00 00 08 08 08 0A 05 00 0A 0A 0A  · · · · · · · · · · · · · · · ·
0000002x: 05 05 05 05 05 0F 05 0F 05 00 0C 0F 0F 05 05 0F  · · · · · · · · · · · · · · · ·
0000003x: 0E 0B 00 00 00 0F 0F 0F 48 09 55 55 00 FF F8 FC  · · · · · · · · H · U U · · · ·
    ⋮
0000098x: F0 00 60 07 55 55 FF FF F8 FC FF F8 FC FF F8 FC  · · ` · U U · · · · · · · · · ·
    ⋮
000010Ex: 00 F0 00 AC 60 07 55 55 FF FF F8 FC FF F8 FC FF  · · · · ` · U U · · · · · · · ·
    ⋮
0000184x: F8 C8 00 F0 00 AC AC 01 55 55 FF FF F8 FC FF F8  · · · · · · · · U U · · · · · ·
    ⋮
000019Fx: 00 F0 00 F0 [EOF]                                · · · · [EOF]


48 09: Block Length? 0x0948 (2376) : 0x003A + 0x0948 = 0x0982 (2434)
60 07: Block Length? 0x0760 (1888) : 0x0984 + 0x0760 = 0x10E4 (4324)
60 07: Block Length? 0x0760 (1888) : 0x10E6 + 0x0760 = 0x1846 (6214)
AC 01: Block Length? 0x01AC (428)  : 0x1848 + 0x01AC = 0x19F4 (6644)

Writing a quick program to chain through the file using the first 16 bits as a block length and the next 16 bits as a magic value, I tested to see if the pattern holds for other files.

Explorer for MicroProse PIC93 Files
Analyzing: 'LABS.PIC'	File Size: 6644 bytes
Image: 640x400
Pos: 56
Block 0 Length: 2376
magic: 5555
Pos: 2434
Block 1 Length: 1888
magic: 5555
Pos: 4324
Block 2 Length: 1888
magic: 5555
Pos: 6214
Block 3 Length: 428
magic: 5555
Pos: 6644 [EOF]

Explorer for MicroProse PIC93 Files
Analyzing: 'DIFFS0.PIC'	File Size: 95516 bytes
Image: 640x400
Pos: 56
Block 0 Length: 24170
magic: 529d
Pos: 24228
Block 1 Length: 25496
magic: fffd
Pos: 49726
Block 2 Length: 21774
magic: fffd
Pos: 71502
Block 3 Length: 24012
magic: ff1d
Pos: 95516 [EOF]

Explorer for MicroProse PIC93 Files
Analyzing: 'WRECK1.PIC'	File Size: 68500 bytes
Image: 640x400
Pos: 56
Block 0	Length: 21886
magic: 5755
Pos: 21944
Block 1	Length: 20856
magic: 5755
Pos: 42802
Block 2	Length: 11042
magic: 5755
Pos: 53846
Block 3	Length: 14652
magic: ff55
Pos: 68500 [EOF]

Now that has to be more than just a coincidence. It does look, though, like the magic value we saw in the first file was just a coincidence, as it does not carry over to other files. So we have a block length, but not a signature or block type marker. Also, given that the data changes immediately after the length from block to block and file to file, that is likely the start of the compressed data for that block. What we saw as a seemingly magic value was likely just the result of a common sequence in the input data for those blocks.

Now that we’ve identified four separate blocks of data in the file, what is in those blocks? We’ve already seen a palette in the header, so none are likely to be that. We also know the graphics are 16 colour, so there is no need for a 256 colour palette. If we go by the PIC⁹⁰ format blocks as a guide, that leaves us with the two image packing formats… it is not likely that they would use both in a single file, as it would be the same data after all. Then there is the CGA and EGA dithering; uncompressed, both of those are smaller than any of the blocks we see here, so that is not likely either. Not to mention these are 16 colour graphics, so at most we would need just the CGA table. That brings us back to the “NEC Format” mentioned in the README earlier. From the Wikipedia page for the PC-98, we know the graphics memory is arranged as four separate image planes, so it is likely that each of these blocks is one of the image planes.
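
If each block really is one bit plane, then once decompressed, turning four planes back into one palette index per pixel is straightforward. The sketch below shows the general planar to chunky idea only. It assumes that plane p supplies bit p of the colour index and that the leftmost pixel sits in the most significant bit of each byte; neither has been confirmed for this format, and the function name `planes_to_indices` is purely illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Combine four 1 bit planes into one palette index (0..15) per pixel.
   ASSUMPTIONS (unverified for PIC93): plane p holds bit p of the index,
   and the leftmost pixel of each byte is its most significant bit.
   `out` must have room for plane_bytes * 8 entries. */
static void planes_to_indices(const uint8_t *planes[4], size_t plane_bytes,
                              uint8_t *out)
{
    assert(planes != NULL && out != NULL);
    for (size_t i = 0; i < plane_bytes; i++) {
        for (int bit = 0; bit < 8; bit++) {
            uint8_t mask = (uint8_t)(0x80u >> bit);
            uint8_t idx = 0;
            for (int p = 0; p < 4; p++) {
                if (planes[p][i] & mask)
                    idx |= (uint8_t)(1u << p);
            }
            out[i * 8 + bit] = idx;
        }
    }
}
```

For a 640×400 image each plane would be 640 × 400 / 8 = 32,000 bytes, so `out` would need 256,000 entries.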

At this point, I’m not seeing any more commonality in the blocks to suggest more uncompressed data fields. That leaves us with a block structure that looks like the following. While I’ve called it a block here, given what we now believe, ‘plane’ is probably a better name for it.

typedef struct {
    uint16_t len;      // length of compressed data for this block
    uint8_t lz_data[]; // compressed data stream
} pic93_block_t;
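
For reference, the chain walk described earlier can be sketched as below. This is not the explorer program that produced the listings above, just a minimal version of the same idea: start right after the 56 byte header, read a little-endian 16 bit length, skip that many bytes, and repeat until we either land exactly on EOF or run off the end. The names `pic93_span_t` and `pic93_walk_blocks`, and the cap of four blocks, are my own assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PIC93_HDR_SIZE   56u
#define PIC93_MAX_BLOCKS 4   /* ASSUMPTION: one block per image plane */

typedef struct {
    size_t   offset; /* file offset of the first compressed byte */
    uint16_t len;    /* length of the compressed data */
} pic93_span_t;

/* Walk the length-prefixed blocks that follow the header.
   Returns the number of blocks if the chain ends exactly at EOF,
   or -1 if the file is too short or the chain overruns it. */
static int pic93_walk_blocks(const uint8_t *buf, size_t size,
                             pic93_span_t spans[PIC93_MAX_BLOCKS])
{
    assert(buf != NULL && spans != NULL);
    if (size < PIC93_HDR_SIZE)
        return -1;

    size_t pos = PIC93_HDR_SIZE;
    int n = 0;
    while (pos < size) {
        if (n == PIC93_MAX_BLOCKS || size - pos < 2)
            return -1;
        uint16_t len = (uint16_t)(buf[pos] | (buf[pos + 1] << 8));
        pos += 2;
        if (len > size - pos)
            return -1; /* block would run past EOF */
        spans[n].offset = pos;
        spans[n].len = len;
        n++;
        pos += len;
    }
    return n;
}
```

Each `spans[i]` then points at what we believe is the compressed data for one plane, which is all we need to split the blocks out into separate files.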

The PIC⁹³ Compression Scheme

Having resolved the file format down to the block level, what’s left looks like the compressed data itself. We still don’t see our typical 0B at the start to indicate the LZW maximum bit-width. Perhaps MicroProse did away with that, since the only value we ever saw in the wild was 11. For now I’m going to proceed with the assumption that what follows the block length is compressed image plane data.

So using the parsing program I wrote, I had it extract each block/plane into its own file and add the 0xF5 format identifier to make it look like an unpacked PIC⁸⁸ file. Packing makes no sense here, as we are already split into planes. If the compression scheme is what we have seen before with the other PIC variants, we should be able to render these planes to a file. The resultant image will be garbage because everything was separated into planes, but it will validate the decompression. If we successfully decompress, then I will have to write some custom code to handle the planar nature of what we have here. Given the 640×400 image size, and that each plane holds one bit of each pixel, each file should decompress to 32,000 bytes (640 × 400 / 8). I’ll fake this to our decoder program by specifying an 80×400 image (though any combination resulting in 32,000 will work).

PIC88 to PPM image converter
Resolution: 80 x 400
Image Buffer: 32000
Opening PIC Image: 'LABS-0.PLN'
File Size: 2377
Creating PPM Image: 'LABS-0.PPM'
Decoding ended with error: PIC_DECOMPRESSION_FAIL
Saving PPM Image

PIC88 to PPM image converter
Resolution: 80 x 400
Image Buffer: 32000
Opening PIC Image: 'LABS-1.PLN'
File Size: 1889
Creating PPM Image: 'LABS-1.PPM'
Decoding ended with error: PIC_DECOMPRESSION_FAIL
Saving PPM Image

PIC88 to PPM image converter
Resolution: 80 x 400
Image Buffer: 32000
Opening PIC Image: 'LABS-2.PLN'
File Size: 1889
Creating PPM Image: 'LABS-2.PPM'
Decoding ended with error: PIC_DECOMPRESSION_FAIL
Saving PPM Image

PIC88 to PPM image converter
Resolution: 80 x 400
Image Buffer: 32000
Opening PIC Image: 'LABS-3.PLN'
File Size: 429
Creating PPM Image: 'LABS-3.PPM'
Decoding ended with error: PIC_DECOMPRESSION_FAIL

Well that’s unfortunate. I’m not even going to bother posting the output here as even if it worked, the pixel data would seem like random noise. Either this isn’t the start of the compressed stream, or a totally different compression scheme, or bit width, is being used here. Let’s view it as a bit stream and break it into different code widths to confirm or rule out the bit-width.

55       55       00       FF       F8       FC
01010101 01010101 00000000 11111111 11111000 11111100   BE order
10101010 10101010 00000000 11111111 00011111 00111111   LE order

Little Endian (LE) Shift Order
10101 01010 10101 00000 00001 11111 11000 11111 00111    5 bits
   15                                                    symbol

101010 101010 101000 000000 111111 110001 111100 111111  6 bits
    15                                                   symbol

1010101 0101010 1000000 0001111 1111000 1111100          7 bits
     55                                                  symbol

101010101 010101000 000000111 111110001 111100111        9 bits
      155                                                symbol

1010101010 1010100000 0000111111 1100011111             10 bits
       155                                              symbol
       
10101010101 01010000000 00111111110 00111110011         11 bits
        555                                             symbol

Big Endian (BE) Shift Order
01010 10101 01010 10000 00001 11111 11111 11000 11111    5 bits
0A    15                                                 symbol

010101 010101 010100 000000 111111 111111 100011 111100  6 bits
15                                                       symbol

0101010 1010101 0100000 0001111 1111111 1100011          7 bits
2A                                                       symbol

010101010 101010100 000000111 111111111 100011111        9 bits
0AA       154                                            symbol

0101010101 0101010000 0000111111 1111111000             10 bits
155                                                     symbol

01010101010 10101000000 00111111111 11110001111         11 bits
2AA                                                     symbol
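
The breakdown above was done by hand, but it is easy to reproduce in code. Here is a minimal sketch of a bit reader that pulls a code of a given width from either shift order. The name `read_code` and its parameters are illustrative only. In LE (LSB-first) order the first bit read becomes the least significant bit of the code; in BE (MSB-first) order it becomes the most significant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read `width` bits starting at bit offset `bitpos` from `buf`.
   msb_first == 0: LE shift order, bits are taken from the low end of
   each byte and the first bit read is the code's least significant bit.
   msb_first != 0: BE shift order, bits are taken from the high end of
   each byte and the first bit read is the code's most significant bit. */
static uint32_t read_code(const uint8_t *buf, size_t len, size_t bitpos,
                          unsigned width, int msb_first)
{
    assert(buf != NULL && width <= 32 && bitpos + width <= len * 8);
    uint32_t code = 0;
    for (unsigned i = 0; i < width; i++) {
        size_t k = bitpos + i;
        unsigned shift = msb_first ? 7u - (unsigned)(k & 7) : (unsigned)(k & 7);
        uint32_t bit = (buf[k >> 3] >> shift) & 1u;
        if (msb_first)
            code = (code << 1) | bit;
        else
            code |= bit << i;
    }
    return code;
}
```

Run against the six bytes 55 55 00 FF F8 FC, the LE 9 bit read gives 0x155 and the first two BE 9 bit reads give 0x0AA and 0x154, matching the hand-worked rows.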

Taking the first 6 bytes and converting them to a bit stream, everything from 5 to 11 bits fails immediately with the very first code being out of range, except for a couple that fail on the 2nd code when BE shift order is assumed. So I think we can rule out symbol width (4 or 8 bits), LZW bit-width, and bit order as the culprit. That leaves us with either a different compression algorithm, or this not actually being the start of the compressed stream. I’m leaning more towards the former than the latter, given what we are seeing here. Taking a look at just the compressed stream again, one thing I notice is a high occurrence of ‘F’ in the data.

File: LABS-0.RAW  [2376 bytes]
Offset    x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 xA xB xC xD xE xF  Decoded Text
0000000x: 55 55 00 FF F8 FC FF F8 FC FF F8 FC FF F8 FC FF  U U · · · · · · · · · · · · · ·
0000001x: F8 FC FF F8 FC FF F8 FC DB 56 FF FE FF FF F8 27  · · · · · · · · · V · · · · · '
0000002x: FC B0 F8 75 C0 37 F8 27 0C B0 F8 FC B0 F8 FC 55  · · · u · 7 · ' · · · · · · · U
0000003x: B5 B0 F8 FC B0 F8 FC B0 F8 FC B0 F8 FC B0 F8 FC  · · · · · · · · · · · · · · · ·

This reminds me of the pattern we see in the MicroProse F15-SE2 (Desert Storm) compressed binaries, though it is more pronounced here. (The first 32 bytes are uncompressed header, so ignore those.)

File: EGAME.EXE  [56842 bytes]
Offset    x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 xA xB xC xD xE xF  Decoded Text
0000000x: 4D 5A 0A 00 70 00 00 00 02 00 A7 1F FF FF 38 29  M Z · · p · · · · · · · · · 8 )
0000001x: 80 00 00 00 0E 00 B0 0D 1C 00 00 00 4C 5A 39 31  · · · · · · · · · · · · L Z 9 1
0000002x: FD 0F 00 FF F8 0E 55 8B EC 83 EC 06 C7 46 FE F0  · · · · · · U · · · · · · F · ·
0000003x: FF F5 FB FC F0 04 C4 5E FC 26 8B 07 A3 F8 0F 34  · · · · · · · ^ · & · · · · · 4
0000004x: 9D C7 06 F6 9D EC F4 42 67 FC F0 F4 40 67 0E 12  · · · · · · · B g · · · @ g · ·
0000005x: C4 1E EE 26 FF 77 BF F7 1A E8 42 06 83 C4 02 F2  · · · & · w · · · · B · · · · ·
0000006x: FD 1E E8 34 F2 F8 0A 1C E8 FE FF 26 F2 FF 8A 47  · · · 4 · · · · · · · & · · · G
0000007x: 24 A2 32 8C 26 83 7F 78 01 1B C0 7F 86 F7 D8 A3  $ · 2 · & · · x · · · · · · · ·
0000008x: 86 00 E8 6D 3B E6 ED FF FF 72 01 75 13 8B C3 8C  · · · m ; · · · · r · u · · · ·
0000009x: C2 05 48 00 52 50 9A BE 0C E1 FF A7 11 CA 04 EB  · · H · R P · · · · · · · · · ·
000000Ax: 08 B0 80 A2 E5 56 A2 E4 FF 3E 56 E8 F2 01 9A FA  · · · · · V · · · > V · · · · ·
000000Bx: 0E 8B 22 AA FD 24 9A 58 10 30 E0 F3 A8 78 C2 38  · · " · · $ · X · 0 · · · x · 8
000000Cx: 1F E6 02 73 0E B8 0C 00 C7 EF 0F E9 97 FF EB 0C  · · · s · · · · · · · · · · · ·
000000Dx: B8 10 F2 F8 09 CB 8B 47 20 A3 04 A0 C3 B0 9A 0E  · · · · · · · G   · · · · · · ·
000000Ex: 00 A6 E8 52 87 19 01 ED 0F 87 F8 09 11 87 F8 09  · · · R · · · · · · · · · · · ·
000000Fx: AA 87 FC E8 F8 3A 80 3E 84 7F 78 13 75 19 C6 06  · · · · · : · > · · x · u · · ·

MicroProse used a tool called LZEXE, which is based on the LZSS (Lempel–Ziv–Storer–Szymanski) algorithm, so they were definitely aware of the algorithm. It tends to show a lot of ‘F’s in the compressed stream due to the negative indices it uses for the lookups. The algorithm was very popular in the early 1990s after Haruhiko Okumura released an implementation into the public domain in 1989, which makes it a very good candidate for what was used here. Now I’m not sure that either the LZEXE or the MicroProse implementation matches the published reference, but I suspect it is something similar. I think we’ll put a pin in LZSS for another post. It’s a topic I was planning to cover down the road anyway. Even if it ends up not being LZSS, I’m fairly certain it’s not LZW either.
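
For orientation only, here is a minimal sketch of a decoder in the style of the published LZSS reference as I understand it: a 4096 byte ring buffer, a flag byte consumed LSB first where 1 means a literal and 0 means a two byte (position, length) pair, and match lengths of 3 to 18. To be clear, I am not claiming that LZEXE or PIC⁹³ uses this exact layout; that remains to be worked out. The function name `lzss_decode` and the buffer based interface are my own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LZSS_N         4096 /* ring buffer size */
#define LZSS_F         18   /* longest match */
#define LZSS_THRESHOLD 2    /* matches are stored as length - 3 */

/* Decode an LZSS stream of the classic reference layout (ASSUMED, and
   not confirmed to be what PIC93 uses). Returns bytes written to out. */
static size_t lzss_decode(const uint8_t *in, size_t in_len,
                          uint8_t *out, size_t out_cap)
{
    assert(in != NULL && out != NULL);
    uint8_t ring[LZSS_N];
    memset(ring, 0, sizeof ring);
    memset(ring, ' ', LZSS_N - LZSS_F);  /* reference pre-fills with spaces */
    unsigned r = LZSS_N - LZSS_F;        /* write position starts at 0xFEE */
    unsigned flags = 0;
    size_t ip = 0, op = 0;

    while (op < out_cap) {
        flags >>= 1;
        if ((flags & 0x100u) == 0) {     /* all 8 flag bits used up */
            if (ip >= in_len) break;
            flags = in[ip++] | 0xFF00u;
        }
        if (flags & 1u) {                /* literal byte */
            if (ip >= in_len) break;
            uint8_t c = in[ip++];
            out[op++] = c;
            ring[r] = c;
            r = (r + 1) & (LZSS_N - 1);
        } else {                         /* (position, length) pair */
            if (in_len - ip < 2) break;
            unsigned pos = in[ip] | ((in[ip + 1] & 0xF0u) << 4);
            unsigned len = (in[ip + 1] & 0x0Fu) + LZSS_THRESHOLD + 1;
            ip += 2;
            for (unsigned k = 0; k < len && op < out_cap; k++) {
                uint8_t c = ring[(pos + k) & (LZSS_N - 1)];
                out[op++] = c;
                ring[r] = c;
                r = (r + 1) & (LZSS_N - 1);
            }
        }
    }
    return op;
}
```

As a tiny worked example, the six bytes 07 41 42 43 EE F3 decode to "ABCABCABC" with this sketch: three literals, then a match of length 6 at ring position 0xFEE.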

Given that the compression scheme is likely not LZW, along with all the other structural differences and the link to only one MicroProse title, this version of the PIC file appears to be more of a cousin than a descendant of the PIC⁸⁸ format we started with. Thus I think I’m going to remove it from the PIC lineage and treat it as its own separate format, as it doesn’t share any of the base compression code, or structure, with the other variants. Don’t fear though, we will still cover this in another post; it’s just not going to be a part of what we are assembling right now in the form of tools to read and write the MicroProse PIC format for many of their PC based games from the early 1990s.

This post is part of a series of posts surrounding my reverse engineering efforts of the PIC file format that MicroProse used with their games. Specifically F-15 Strike Eagle II (Though I plan to trace the format through other titles to see if and how it changes). To read my other posts on this topic you can use this link to my archive page for the PIC File Format which will contain all my posts to date on the subject.
