Welcome to the Matrix

As we left off in my last entry, we had determined that the format was not the Pictor PC Paint .PIC file format as I had hypothesized. One of the first things to do is to search the Internet to see what information is already known about the format. As I mentioned in my previous post, the .PIC format used with Darklands has apparently already been documented by someone, so that is likely a good starting place. The next thing to do is to look into the files we have with a hex viewer to see if there are any common elements in the files, or anything else that stands out. This will also allow us to see if they are similar in any way to what has already been documented with the Darklands format. Given that Darklands was released in 1992, a few years after F15-SE2 (1989), the formats could be entirely different despite sharing the same file extension. However, even if they are different, it’s reasonable to hypothesize that the files would share the same core genetics, with the later version being a more evolved version of the earlier one. That is, unless MicroProse made some radical changes along the way. So let’s take a look.

The first thing I like to do when reverse engineering a file format is to take a look at it with a hex editor to see what I’m dealing with, even before really searching the web for any information. This will help me gauge if what I find online will be helpful or not. As we are dealing with images, I’m hoping there is some sort of header data within the file that will show up as mostly common between several of the files. The image data itself will be different from one file to the next, so it is not of much help. With that in mind I quickly scanned through all the .PIC files that come with F15-SE2, looking only at the first 64 bytes or so.
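For anyone without a hex editor handy, a quick look at the start of a file can be sketched in a few lines of Python. This is only a minimal stand-in for the viewer used in this post; the function name `hexdump_head` and its layout are my own illustration.

```python
def hexdump_head(data, count=64, width=16):
    """Format the first `count` bytes of `data` as offset, hex bytes and text."""
    lines = []
    for offset in range(0, min(count, len(data)), width):
        row = data[offset:min(offset + width, count)]
        hex_part = " ".join(f"{b:02X}" for b in row)
        # Show printable ASCII as-is and everything else as a dot.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in row)
        lines.append(f"{offset:08X}: {hex_part:<{width * 3 - 1}}  {text_part}")
    return "\n".join(lines)
```

Reading a file with `open(name, "rb").read()` and passing the result to `hexdump_head` gives four rows of 16 bytes for any file of 64 bytes or more.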

For reference here are the .PIC files we have with the game:

256LEFT.PIC  256RIGHT.PIC  COCKPIT.PIC  HISCORE.PIC  MEDAL.PIC  RIGHT.PIC     WALL.PIC
256PIT.PIC   ADV.PIC       DEATH.PIC    LABS.PIC     PROMO.PIC  TITLE16.PIC
256REAR.PIC  ARMPIECE.PIC  DESK.PIC     LEFT.PIC     REAR.PIC   TITLE640.PIC

The first thing I noticed is that all the files seem to start with 0x0b, except for the ones that have the ‘256’ prefix or the ‘640’ suffix in their name. Given that all these different files appear to be twins of other files that do have the 0x0b leading byte, and judging from their names, I think we can deduce that these files are alternate versions with different colour depths and/or resolutions. These alternate files all seem to start with 0xf5 as the first byte instead. No other bytes in the files seem to be consistent, which leads me to believe we are either straight into the image data, or the rest of the header is compressed by some mechanism. I doubt that it would be encrypted; this was 1989 after all.

Here is a quick sampling of the first 64 bytes of four different files (two pairs of twins). Some of these samples appear to share more than just the first byte, but when going through the other files those similarities drop away, leaving only the first byte as common. The values themselves also do not look like reasonable image header data.
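The comparison described here can also be done mechanically. The sketch below groups files by their first byte and lists the offsets at which a set of files agree. The function names are my own, and the sample data is just the first 8 bytes of each of the four dumps shown in this post, not the complete files.

```python
from collections import defaultdict

def group_by_first_byte(files):
    """Map each leading byte value to the sorted names of files starting with it."""
    groups = defaultdict(list)
    for name, data in sorted(files.items()):
        if data:
            groups[data[0]].append(name)
    return dict(groups)

def common_offsets(blobs, limit=64):
    """Return the offsets (below `limit`) where every blob holds the same byte."""
    blobs = list(blobs)
    if not blobs:
        return []
    length = min(limit, *(len(b) for b in blobs))
    return [i for i in range(length) if len({b[i] for b in blobs}) == 1]

# First 8 bytes of each sample, copied from the dumps in this post.
SAMPLES = {
    "COCKPIT.PIC": bytes.fromhex("0B 55 20 FD 0B 38 50 20"),
    "256PIT.PIC": bytes.fromhex("F5 00 20 FD 0B 38 50 20"),
    "TITLE16.PIC": bytes.fromhex("0B 00 20 11 41 84 00 11"),
    "TITLE640.PIC": bytes.fromhex("F5 00 20 ED 41 10 B0 01"),
}
```

On these four samples alone a few offsets still coincide, which is exactly why the full set of files has to be checked before treating any byte as a header field.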

Presumed 16 colour (320×200) file:

File: COCKPIT.PIC  [3039 bytes]
Offset    x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 xA xB xC xD xE xF  Decoded Text
0000000x: 0B 55 20 FD 0B 38 50 20 C1 83 06 13 16 5C 88 90  · U   · · 8 P   · · · · · \ · ·
0000001x: A1 C2 86 10 1F 4A 74 48 31 62 C5 89 16 33 62 DC  · · · · · J t H 1 b · · · 3 b ·
0000002x: 78 B1 A3 46 87 DC EE 40 D2 F0 0F D2 9A 3F 22 5B  x · · F · · · @ · · · · · ? " [
0000003x: 94 C4 23 32 C4 BD 92 17 1E 18 80 00 00 12 15 40  · · # 2 · · · · · · · · · · · @

Presumed 256-colour (320×200) file:

File: 256PIT.PIC  [5760 bytes]
Offset    x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 xA xB xC xD xE xF  Decoded Text
0000000x: F5 00 20 FD 0B 38 50 20 C1 83 06 13 16 5C 88 90  · ·   · · 8 P   · · · · · \ · ·
0000001x: A1 C2 86 10 1F 4A 74 48 31 62 C5 89 16 33 62 DC  · · · · · J t H 1 b · · · 3 b ·
0000002x: 78 B1 A3 46 8F 1C 3F 8A 0C 49 12 A4 C9 91 27 4B  x · · F · · ? · · I · · · · ' K
0000003x: A2 5C A9 32 61 2E 0D 90 12 6C 98 89 01 D2 82 0C  · \ · 2 a . · · · l · · · · · ·

Presumed 320×200 (16 colour) file:

File: TITLE16.PIC  [7529 bytes]
Offset    x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 xA xB xC xD xE xF  Decoded Text
0000000x: 0B 00 20 11 41 84 00 11 80 82 90 0E 20 00 B4 10  · ·   · A · · · · · · ·   · · ·
0000001x: 00 20 00 04 0B 22 42 04 E8 21 45 83 90 26 30 0C  ·   · · · " B · · ! E · · & 0 ·
0000002x: A8 65 21 82 80 0A BA 41 22 11 70 C2 47 48 5B 3E  · e ! · · · · A " · p · G H [ >
0000003x: 62 54 D0 84 08 24 11 D4 4A 9E F4 72 12 01 34 97  b T · · · $ · · J · · r · · 4 ·

Presumed 640×480 (16 colour) file:

File: TITLE640.PIC  [25175 bytes]
Offset    x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 xA xB xC xD xE xF  Decoded Text
0000000x: F5 00 20 ED 41 10 B0 01 41 48 11 0E 16 20 C8 10  · ·   · A · · · A H · · ·   · ·
0000001x: C0 C1 03 0D 21 41 74 08 A0 E2 C1 05 07 13 3C 8C  · · · · ! A t · · · · · · · < ·
0000002x: 38 D1 22 45 8F 1E 21 2D 0C D8 F1 23 C5 86 04 3D  8 · " E · · ! - · · · # · · · =
0000003x: 36 54 59 F1 A4 49 94 2E 61 CA 8C F9 B2 E5 CC 99  6 T Y · · I · . a · · · · · · ·

Beyond the first byte always being either 0x0b or 0xf5, the data does not appear to be RLE encoded, even when I look deeper into the file. That is not to say it is not RLE encoded, just that if it is, there appears to be another layer of encoding/compression on top of it.

So while we don’t see any identifiable header information describing the file resolution yet, we can make some educated guesses. The game is intended to run on CGA, EGA, and VGA based systems. Most likely a common resolution was chosen between them, as we don’t see all the files having twins. The only ones with twins seem to be the 256 colour versions (ignoring the TITLE graphics for the moment), so we might as well start with those. (For reference on video modes, colour depths, and resolutions, see: IBM PC Family – BIOS Video Modes.)

In order to get 256 colours on a VGA system, mode 0x13 is used, which has a resolution of 320×200^[1]. 320×200 also happens to be the common resolution in colour mode for all three adapters, with VGA and EGA both supporting 16 colours (mode 0x0d), while CGA only supports 4 colours at 320×200 (mode 0x04). I suspect that the graphics files are 320×200 in 16 colours, and are then somehow decimated for CGA mode to give the best results. This can be seen when running the game in each of the available modes. Interestingly, when running in EGA mode, the title screen seems to be using the TITLE640 graphic, which surprised me and seems to suggest that the file is actually 640×350 (mode 0x10) rather than the 640×480 (VGA only) I had initially thought.

Unfortunately I wasn’t able to get the VM to emulate a card with less than 128KB of video RAM, so I wasn’t able to see what happens at that point, but I suspect it would fall back and use TITLE16. (EGA can only support 640×350 in 16 colours on cards that have 128KB or more.)

CGA also uses TITLE16 but appears to decimate this down to the 4 colours it has available. The game appears to be using mode 0x04 with palette 1 in high intensity, with black set as the background colour. This means only black, bright cyan, bright magenta, and white can be displayed.
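I don’t know how the game actually performs this decimation. Purely as an illustration of the idea, here is one naive approach: map each of the 16 standard colours to whichever of the four available CGA colours is nearest in RGB space. The RGB values are the commonly quoted defaults for the 16-colour palette; the nearest-colour rule and the function name are my own assumptions, not something recovered from the game.

```python
# Commonly quoted default RGB values for the 16 CGA/EGA colours.
EGA_RGB = [
    (0, 0, 0), (0, 0, 170), (0, 170, 0), (0, 170, 170),
    (170, 0, 0), (170, 0, 170), (170, 85, 0), (170, 170, 170),
    (85, 85, 85), (85, 85, 255), (85, 255, 85), (85, 255, 255),
    (255, 85, 85), (255, 85, 255), (255, 255, 85), (255, 255, 255),
]

# Mode 0x04, palette 1, high intensity, black background:
# black, bright cyan, bright magenta, white.
CGA_PALETTE_1_HIGH = [0, 11, 13, 15]

def nearest_cga_index(ega_index):
    """Return the CGA pixel value (0-3) closest in RGB to a 16-colour index.

    Hypothetical nearest-colour mapping; ties resolve to the lowest slot.
    """
    r, g, b = EGA_RGB[ega_index]

    def distance(slot):
        pr, pg, pb = EGA_RGB[CGA_PALETTE_1_HIGH[slot]]
        return (r - pr) ** 2 + (g - pg) ** 2 + (b - pb) ** 2

    return min(range(4), key=distance)
```

A mapping like this loses a lot of detail on its own, which is presumably why some form of dithering helps.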

When comparing the two title screens we can clearly see that the VGA/EGA version of the title is using more than 200 lines of vertical resolution, whereas the CGA version is most definitely 200 lines. This suggests the 640×350 resolution.

Comparison between Renderings of *“Wall”* on VGA vs CGA

When looking at the main menu screen and comparing between the EGA/VGA version and the CGA version we can clearly see the decimation that is used. It appears they also perform some simple dithering with certain colours.^[2]

Having all the graphics being 320×200 has the added advantage that they can use the 16 colour image easily in the 256 colour modes with little or no work. They only need to maintain the first 16 entries of the 256 colour palette to be the same as what 16 colour mode would be, and this is actually the mode 0x13 default configuration. With that, I think it is safe to say all the graphic assets are 320×200, with the exception of the one ‘640’ title graphic file which is 640×350. We can also safely say that all images are 16 colour, with the exception of those that have the ‘256’ prefix.
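As a sanity check on these conclusions, we can work out how big each image would be uncompressed and compare that with the file sizes from the dumps earlier in this post. The sketch assumes tightly packed pixels (4 bits per pixel for 16 colours, 8 bits for 256) and the resolutions deduced above; the names are my own.

```python
def raw_size(width, height, bits_per_pixel):
    """Bytes needed for an uncompressed image with tightly packed pixels."""
    return width * height * bits_per_pixel // 8

# name: (file size in bytes, assumed width, assumed height, assumed bits per pixel)
SAMPLES = {
    "COCKPIT.PIC": (3039, 320, 200, 4),
    "256PIT.PIC": (5760, 320, 200, 8),
    "TITLE16.PIC": (7529, 320, 200, 4),
    "TITLE640.PIC": (25175, 640, 350, 4),
}

def compression_ratios(samples):
    """Ratio of assumed raw size to actual file size for each sample."""
    return {
        name: raw_size(w, h, bpp) / size
        for name, (size, w, h, bpp) in samples.items()
    }
```

Under these assumptions every file is at least four times smaller than its raw image, which supports the idea that the data is compressed. The same arithmetic gives 112,000 bytes for 640×350 in 16 colours, which is more than 64KB and lines up with the 128KB requirement mentioned earlier.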

At this point I think we have enough to start looking at what we can find on the web and how it compares to what we have. The first major hit we get is this text file, written back in 2002 by Joel “Quadko” McIntyre, describing the Darklands .PIC Image File Format. Reading through the document, it looks like this is not the same format at all. Then I noticed this little gem at the bottom of the file:

	<ChunkData=
		<Identifier="X0">
		<Length=Variable>
		<InternalData=
			<ImageWidth=Variable>
			<ImageHeight=Variable>
			<FormatIdentifyer="0b">
			<CompressedBitstream=Variable>
		>
	>

My eye caught that the “FormatIdentifyer” equals 0x0b, which is the first byte we see in most of our files. The compressed image data follows immediately after it, which makes sense for what we are seeing as well. Could it be that the Darklands version is just an onion wrapper around the version we are looking at with our files? The document doesn’t mention the 0xf5 we are also seeing, but this can’t be just a coincidence. Since that byte is just a “format identifier”, perhaps the 0xf5 variant wasn’t used anymore? Regardless, it’s worth looking more into how the data is compressed.

According to the document we found, the image is RLE (run-length encoded) and then compressed with some unknown compression scheme. Well, that certainly jibes with what we are seeing. Now let’s see if we can find anything else online that might help identify the compression scheme used. Another one of the top hits of my initial search was this post on a ‘Civilization’ forum by user darkpanda (Civilization was released in 1991, bringing this format closer and closer to our 1989 date). In the post he suggests that LZW (Lempel-Ziv-Welch) compression was used. LZW is a dictionary-based compression algorithm, and this should be relatively easy for us to verify without writing any new code just yet. LZW streams consist of a series of codes of some bit width. If the scheme uses variable-length encoding, that width will start at a minimum and increase only as necessary when the compressor needs to make room for more codes. As such, for our initial look at the start of the stream, we can assume a fixed-width encoding. Assuming an 8-bit input symbol width, our minimal code width will most likely be 9 bits. Luckily I have a viewer that allows me to look at data with arbitrary widths.

With LZW, the initial dictionary of codes is set to all possible single input symbols (so in this case 0x00 to 0xff), expanded out to the code width. This means that all codes that are just an input symbol will have all bits above the symbol width set to 0. The first new code will be our maximum input symbol value plus some amount, but suffice it to say that any new code will have at least one of the bits above the symbol width set to 1. (So, assuming an 8-bit symbol and a 9-bit code, any new code will be in the range 256-511 [0x100-0x1ff].)
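To make the split between literal codes and new codes concrete, here is a minimal textbook LZW encoder. It assumes the first new code is 0x100 and that there are no clear or end codes; that is a simplification of mine and not necessarily how the MicroProse variant behaves.

```python
def lzw_encode(data):
    """Textbook LZW: returns a list of codes (no clear/end codes, no bit packing)."""
    # Initial dictionary: every single byte maps to its own value (0x00-0xFF).
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 0x100  # assumed first free code
    codes = []
    current = b""
    for value in data:
        candidate = current + bytes([value])
        if candidate in dictionary:
            current = candidate
        else:
            codes.append(dictionary[current])
            dictionary[candidate] = next_code
            next_code += 1
            current = bytes([value])
    if current:
        codes.append(dictionary[current])
    return codes
```

Encoding `b"ABABABA"` yields the literals 0x041 and 0x042 followed by the dictionary codes 0x100 and 0x102, so anything with the ninth bit set is a reference to a previously seen sequence.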

File: TITLE16.PIC  [7529 bytes]  Skipped Prefix Bytes: 1  Symbol: 9 bits LE
Offset     00  01  02  03  04  05  06  07  08  09  0A  0B  0C  0D  0E  0F  10  11  
00000000: 000 090 044 088 008 088 000 105 090 007 008 080 10B 000 080 000 104 105 
00000012: 088 088 080 10F 114 106 090 013 10C 101 05A 10B 008 101 00A 0DD 090 024 
00000024: 101 013 11F 090 05B 11F 118 00A 04D 044 090 022 0D4 125 127 05E 127 008 
00000036: 0D0 12E 024 00D 101 07A 137 123 08D 101 011 10F 127 066 12D 123 131 090
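For readers without a viewer that handles arbitrary widths, the same view can be reproduced with a small bit unpacker. This sketch assumes what the dump header states: one prefix byte is skipped and the codes are 9 bits wide, packed least significant bit first. The function name is my own.

```python
def unpack_codes(data, width=9, skip=1):
    """Split `data` (after `skip` prefix bytes) into little-endian `width`-bit codes."""
    codes = []
    bit_buffer = 0
    bit_count = 0
    mask = (1 << width) - 1
    for byte in data[skip:]:
        # New bytes are appended above the bits still waiting in the buffer.
        bit_buffer |= byte << bit_count
        bit_count += 8
        while bit_count >= width:
            codes.append(bit_buffer & mask)
            bit_buffer >>= width
            bit_count -= width
    return codes

# First 10 bytes of TITLE16.PIC, copied from the hex dump earlier in this post.
TITLE16_HEAD = bytes.fromhex("0B 00 20 11 41 84 00 11 80 82")
```

Running `unpack_codes(TITLE16_HEAD)` gives the first eight codes of the dump above: 000 090 044 088 008 088 000 105.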

In the dump above, the codes of 0x100 and above generally increase in value as we move through the stream. And whenever a code above 0x100 is repeated, the value that follows it is always different from what we’ve seen before (0x101 is a good example). This is consistent with what I would expect for LZW compression at the start of a stream. Codes will become increasingly random as more data is compressed and the dictionary increases in size. Furthermore, the document mentions that the RLE scheme uses 0x90 as the token to indicate a run. In the short segment above we can see several occurrences of 0x90, which bodes well for this being LZW compressed with a code width of 9 bits, at least initially, and ultimately conforming to the same encode/compress scheme as described in the Darklands format.
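The eyeball test can be tightened a little. In a textbook LZW stream a decoder adds one dictionary entry per code read after the first, so the code at position i can never exceed the first free code plus i minus 1. The sketch below applies that bound to the 72 codes from the dump, assuming the first free code is 0x100 and that no clear or end codes are present; both are assumptions of mine at this stage.

```python
# The 72 nine-bit codes from the TITLE16.PIC dump above.
CODES = [
    0x000, 0x090, 0x044, 0x088, 0x008, 0x088, 0x000, 0x105, 0x090,
    0x007, 0x008, 0x080, 0x10B, 0x000, 0x080, 0x000, 0x104, 0x105,
    0x088, 0x088, 0x080, 0x10F, 0x114, 0x106, 0x090, 0x013, 0x10C,
    0x101, 0x05A, 0x10B, 0x008, 0x101, 0x00A, 0x0DD, 0x090, 0x024,
    0x101, 0x013, 0x11F, 0x090, 0x05B, 0x11F, 0x118, 0x00A, 0x04D,
    0x044, 0x090, 0x022, 0x0D4, 0x125, 0x127, 0x05E, 0x127, 0x008,
    0x0D0, 0x12E, 0x024, 0x00D, 0x101, 0x07A, 0x137, 0x123, 0x08D,
    0x101, 0x011, 0x10F, 0x127, 0x066, 0x12D, 0x123, 0x131, 0x090,
]

def lzw_violations(codes, first_free=0x100):
    """Return (index, code) pairs that reference an entry not yet definable."""
    # Code i may be at most first_free + i - 1 (the entry being created right now).
    return [(i, c) for i, c in enumerate(codes) if c > first_free - 1 + i]
```

None of the 72 codes breaks the bound, and the literal 0x090 shows up 7 times. Neither result proves the stream is LZW, but random data would be unlikely to pass the first check this cleanly.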

I think this is a good place to stop for this post. In the next post we’ll look into the specifics of the LZW compression, and see if we can decompress the data to leave behind just the RLE encoded data.

This post is the second in a series of posts surrounding my reverse engineering efforts of the PIC file format that MicroProse used with several of their PC/DOS games, in this case specifically F-15 Strike Eagle II (though I plan to trace the format through other titles to see if and how it changes). To read my other posts on this topic you can use this link to my archive page for the PIC File Format, which will contain all my posts to date on the subject.

Footnotes:

1: Later games sometimes used a tweaked version often referred to as mode-X to change the memory mapping for performance and/or to get a bit more resolution. However, I don’t believe that is being used in this case.

2: While I didn’t outline it, I did try substituting some of the other files for the TITLE16 file and the WALL file (the graphic used for the main menu screen) to see what happened in the different video modes, and to confirm that the files all appear to be 16 colour.
