September 17, 2024

PIC as we know it

Time to stop procrastinating and distracting myself with other formats, and time to put the MicroProse PIC file format to rest, at least with what we know about it so far. This post is meant to serve as a formal document for the PIC format, covering everything we know so far, and possibly making a few changes too. If you haven’t followed along with the whole adventure over the past four months, you may want to read that first; you can find all the PIC related posts here, though I will try to link to relevant posts as I go. With that said, let’s get down to business.

Fixing the fragility problem

Before we get into the technical details of the MicroProse PIC file format, I want to make one last change to the naming, so that (hopefully) we don’t have problems down the road. There was a flaw in my naming scheme of using the asset dates or release year to mark the format: we kept finding earlier and earlier instances, so we had to rename a couple of times. I did make a post not that long ago here, where we made the latest change to the names. In that post I said I was freezing the names, and would use caveats to explain any date-based discrepancy going forward. That was not the first time we had to rename the variants, however, and that last change has been sitting like an itch at the back of my brain ever since. This is also largely driven by the finding that the PC-98 version of Civilization was using the same PIC format as Railroad Tycoon Deluxe. So I’ve decided to abandon date-based naming altogether and go with something new, as you’ll see below.

The MicroProse PIC Image File Format

Foreword

The following document represents about 4 months of research and work to reverse engineer the MicroProse PIC file format used with many of their game titles in the late 1980’s and through the mid 1990’s. The result is what I believe to be the most comprehensive and complete documentation on the format to date.

I have documented every variation of the format I was able to find, predominantly for the PC / DOS platform though it does extend to the PC-98 platform as well. Unfortunately I have not had the opportunity to examine assets from ports of titles to other platforms to see if they still used PIC, or perhaps even used a new version of PIC (as we found with PC-98). My hope by publishing this is to help preserve these files and some of the amazing artwork trapped within them, as well as to enable modding of these old titles so fans can breathe new life back into them.

Version 1 (PIC^v1)

PIC^v1, or PIC version 1, is the earliest form of the PIC format we’ve seen, and the first version of the format we looked at. We previously identified this version as PIC⁸⁸; it is the one variant whose name didn’t change over the course of our discoveries, until now. So far we’ve identified the following MicroProse titles that use the PIC^v1 format.

Game Title	Platform	Release Year	Asset Date
F-15 Strike Eagle II	PC	1989	Jun 89
F-15 Strike Eagle II Desert Storm	PC	1991	Mar 91
F-19 Stealth Fighter	PC	1988	May 88
F-117A Stealth Fighter	PC	1991	Aug 91
Gunship 2000	PC	1991	Sep 91
M1 Tank Platoon*	PC	1989

*See the “PIC Adjacent” Section below for details on the PIC format usage with M1 Tank Platoon

PIC^v1 Structure

PIC^v1 is the most basic form of the PIC format; all the subsequent versions (except PIC⁹⁸) essentially wrap themselves around it like the layers of an onion. The file structure consists of a single byte format identifier, followed by an RLE+LZW compressed stream of data. PIC^v1 has no way to specify the dimensions of the image, so this information must be provided externally in order to properly decode a PIC^v1 image (the most common resolution is 320×200).

typedef struct {
    int8_t format;     // Format Identifier
    uint8_t lz_data[]; // RLE+LZW compressed stream
} mp_picV1_t;

Format Identifier

The format identifier is an 8bit signed value, with its absolute value representing the maximum code width for the LZW compressed stream that follows. The sign of the value indicates if the data is a 4bit packed pixel format or linear pixel format. Positive indicates pixels are packed two per byte (4bit packed), while a negative value indicates a linear arrangement (8bits per pixel). The pixel packing arrangement is discussed in the “Compression” section below. (In subsequent versions of PIC the format identifier is always positive, and does not indicate the pixel packing arrangement, only the maximum LZW code width) The most common identifier values we have seen are:

0B (11) and F5 (-11)

Both values indicate a maximum code width of 11 bits, though other values have been seen as well. Valid range is 9 to 11 (-9 to -11).
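As a quick illustration, the identifier can be split into its two pieces of information as shown below. This is a minimal sketch: the function name `picv1_parse_format` is hypothetical, and rejecting values outside 9 to 11 simply reflects the range we have observed so far.

```c
#include <stdint.h>

/* Hypothetical helper: split a PIC^v1 format identifier into the LZW
   maximum code width and the pixel packing flag.
   Returns 0 on success, -1 if the width is outside the observed 9-11 range. */
int picv1_parse_format(int8_t format, int *max_bits, int *packed4)
{
    int bits = (format < 0) ? -format : format;   /* absolute value = max code width */
    if (bits < 9 || bits > 11)
        return -1;
    *max_bits = bits;
    *packed4 = (format > 0);   /* positive: 4bit packed, negative: 8bit linear */
    return 0;
}
```

With this, 0x0B gives an 11 bit, 4bit packed image, and 0xF5 gives an 11 bit, 8bit linear image.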

Compressed stream

The lz_data contains the variable length RLE+LZW compressed data. This compression scheme is common for PIC^v1 to PIC^v3, see the “compression” section below for further detail.

Version 2 (PIC^v2)

PIC^v2, or PIC version 2, is the first evolution of the PIC format we encountered in our timeline. We initially identified this version as PIC⁹⁰ and later as PIC⁸⁹, and first took a closer look at it here. PIC^v2 appears to have several sub-types associated with it that control how the data is encoded, and what additional data is included. So far we’ve identified the following MicroProse titles that use the PIC^v2 format.

Game Title	Platform	Release Year	Asset Date
Sid Meier’s Covert Action	PC	1990	May 90
Sid Meier’s Railroad Tycoon	PC	1990	Feb 90
Silent Service II	PC	1990	Jun 90
Sword of the Samurai	PC	1989

PIC^v2 Structure

With PIC^v2 we have a more complete structure, including the image dimensions within the file. The header below effectively defines PIC^v2 and is common to all sub-types. Like PIC^v1, PIC^v2 uses the same RLE+LZW compression scheme. In fact PIC^v2 is essentially a wrapper around PIC^v1, except that the format identifier is now just a max-bits indicator for the LZW code width; the pixel packing method is now defined by the sub-type.

Header

The PIC^v2 header is 6 bytes long and is common to all the PIC^v2 sub-types; every PIC^v2 file starts with it.

typedef struct {
    uint16_t sub_type;   // sub-type identifier 
    uint16_t width;      // image width in pixels
    uint16_t height;     // image height in pixels
    uint8_t  data[];     // remainder of PIC sub-type data
} mp_picV2_t;

Sub-Type identifier

The sub_type field is a 16bit value that identifies the sub-type of the image. To date we’ve discovered 4 different sub-type values: 0x06, 0x07, 0x0E, and 0x0F. These types seem to exist in pairs denoting whether the pixel encoding is packed or linear. Generally speaking, if the least significant bit of the type value is 1 the pixel data is 4bit packed, and if it is 0 the data is 8bit linear. We’ll go through each of the sub-types below.

Image Dimensions

The image width and height are 16bit values representing the pixel dimensions of the image.

Sub-Types `06` and `07`

Types 6 and 7 are the most basic sub-types we’ve seen, and define the rest of the data field above as being the same as a PIC^v1 file. Type 6 files hold 8bit linear pixel data, and Type 7 files hold 4bit packed pixel data.

typedef struct { // PICv2 sub-type 6 & 7 structure
    uint8_t max_bits;  // maximum code width for LZW data
    uint8_t lz_data[]; // RLE+LZW compressed stream
} mp_picV2_basic_data_t;

Max Bits

The max_bits field is an 8bit unsigned value that replaces the format identifier of the PIC^v1 structure as the prelude to the LZW compressed stream. As the value is unsigned now, it can only represent positive values, and defines the maximum number of bits for a code in the LZW compressed stream. Typical value here is 11 (0x0b) but the valid range is 9-11.

Compressed stream

The lz_data contains the variable length RLE+LZW compressed data. This compression scheme is common for PIC^v1 to PIC^v3, see the “compression” section below for further detail.

Sub-Types `0E` and `0F`

Types E and F add some more metadata to the image in the data field above: a CGA dithering table for down-converting 16 colour EGA images to 4 colour CGA. The dithering table is followed by the same data as a PIC^v1 file. Type E files hold 8bit linear pixel data, and Type F files hold 4bit packed pixel data.

typedef struct { // PICv2 sub-type E & F structure
    uint8_t cga_dither[16]; // 16 entry dithering table
    uint8_t max_bits;       // maximum code width for LZW data
    uint8_t lz_data[];      // RLE+LZW compressed stream
} mp_picV2_dithered_data_t;

Dithering table

The cga_dither field is a 16 byte table that tells how to handle each of the 16 EGA colours in 4 colour CGA mode. Each 8bit entry is split into 2 nibbles, where each nibble specifies one of the four CGA colours to use for the given EGA colour. Which of the two is used depends on the pixel’s X-Y location on the screen: on even numbered lines, even pixel addresses use the low nibble and odd pixel addresses use the high nibble, and the relationship is swapped on odd lines, producing the basic checker-board dither effect. If no dithering is wanted for a colour, both the low and high nibbles should be set to the same value.

cga = (((x ^ y) & 0x01) == 0) ? cga_dither[px] & 0x03 : (cga_dither[px] >> 4) & 0x03;
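Applied to a whole image, that lookup could look something like the sketch below. It assumes the EGA image has already been decoded to one byte per pixel in row-major order; the function name `cga_dither_image` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert an EGA image (one byte per pixel, values 0-15,
   row-major) into CGA colours 0-3 using a PIC^v2 cga_dither table. */
void cga_dither_image(const uint8_t cga_dither[16], const uint8_t *ega,
                      uint8_t *cga, size_t width, size_t height)
{
    for (size_t y = 0; y < height; y++) {
        for (size_t x = 0; x < width; x++) {
            uint8_t entry = cga_dither[ega[y * width + x] & 0x0F];
            /* (x ^ y) even: low nibble, odd: high nibble (checker-board) */
            cga[y * width + x] = (((x ^ y) & 1) == 0)
                                     ? (uint8_t)(entry & 0x03)
                                     : (uint8_t)((entry >> 4) & 0x03);
        }
    }
}
```

A table entry such as 0x21 alternates between CGA colours 1 and 2, while 0x33 always produces colour 3.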

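Max Bits

The max_bits field is identical to the one used by sub-types 6 and 7: an 8bit unsigned value giving the maximum LZW code width, typically 11 (0x0B), with a valid range of 9-11.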

Compressed stream

The lz_data contains the variable length RLE+LZW compressed data. This compression scheme is common for PIC^v1 to PIC^v3, see the “compression” section below for further detail.

Consolidated Reference Structures

For reference, here is what the combined headers look like for each of the PIC^v2 sub-type groups. These structures should be considered unaligned and packed, meaning there is no additional padding between members. All members are placed in the order in which they appear in the file.

typedef struct { // Full structure for PICv2 types 6 & 7
    uint16_t sub_type;      // sub-type identifier 
    uint16_t width;         // image width in pixels
    uint16_t height;        // image height in pixels
    uint8_t max_bits;       // maximum code width for LZW data
    uint8_t lz_data[];      // RLE+LZW compressed stream
} mp_picV2_basic_t;

typedef struct { // Full structure for PICv2 types E & F
    uint16_t sub_type;      // sub-type identifier 
    uint16_t width;         // image width in pixels
    uint16_t height;        // image height in pixels
    uint8_t cga_dither[16]; // 16 entry dithering table
    uint8_t max_bits;       // maximum code width for LZW data
    uint8_t lz_data[];      // RLE+LZW compressed stream
} mp_picV2_dithered_t;
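Since the structures are packed, reading them field by field avoids any compiler padding. The following is a minimal parsing sketch for a whole PIC^v2 file held in memory. The names `picv2_info_t`, `rd16le`, and `picv2_parse` are hypothetical, and the little-endian byte order of the 16bit fields is an assumption (natural for DOS titles, but not stated above).

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t sub_type, width, height;
    int packed4;               /* 1: two 4bit pixels per byte */
    const uint8_t *cga_dither; /* 16 entry table for types E/F, else NULL */
    uint8_t max_bits;
    const uint8_t *lz_data;    /* RLE+LZW compressed stream */
    size_t lz_len;
} picv2_info_t;

static uint16_t rd16le(const uint8_t *p) /* assumed little-endian */
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Hypothetical helper: parse a PIC^v2 file. Returns 0 on success,
   -1 on truncation or an undocumented sub-type. */
int picv2_parse(const uint8_t *buf, size_t len, picv2_info_t *info)
{
    size_t pos = 6;
    if (len < pos + 1)
        return -1;
    info->sub_type = rd16le(buf);
    info->width = rd16le(buf + 2);
    info->height = rd16le(buf + 4);
    switch (info->sub_type) {
    case 0x06: case 0x07:
        info->cga_dither = NULL;
        break;
    case 0x0E: case 0x0F:
        if (len < pos + 16 + 1)
            return -1;
        info->cga_dither = buf + pos;
        pos += 16;
        break;
    default:
        return -1;
    }
    info->packed4 = info->sub_type & 1;   /* odd sub-types are 4bit packed */
    info->max_bits = buf[pos++];
    info->lz_data = buf + pos;
    info->lz_len = len - pos;
    return 0;
}
```

Rejecting unknown sub-types outright is a deliberate choice here; a more forgiving reader could fall back to the least-significant-bit rule.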

Version 3 (PIC^v3)

PIC^v3, or PIC version 3, is the final evolution of the PIC format we’ve seen that uses the original LZW compression scheme, and it is the most widely used version we’ve seen to date. We initially identified this version as PIC⁹¹ and later as PIC⁹⁰, and first took a closer look at it here. PIC^v3 is the first version to have a fully tagged structure, allowing various types of image related data to be encoded alongside the pixel data itself. So far we’ve identified the following MicroProse titles that use the PIC^v3 format.

Game Title	Platform	Release Year	Asset Date
Darklands	PC	1992	Jun 92
F-14 Fleet Defender	PC	1994	Feb 94
F-15 Strike Eagle III	PC	1992	Sep 92
Hyperspeed	PC	1991	Jul 90
Knights of the Sky	PC	1990
Lightspeed	PC	1990	Jul 90
Magic: The Gathering	PC	1997
Sid Meier’s Civilization	PC	1991	Nov 91

PIC^v3 Structure

PIC^v3 files consist of one or more tagged blocks of data. Each block begins with the same common header identifying the block type via the tag, and its length. A valid PIC file must contain one of the image types, and can optionally contain one or more of the other defined block types, but only one of any given base type. The blocks may appear in any order.

Block Header

Each block starts with a common 4 byte header identifying the block’s type and length.

typedef struct {  // PicV3 General Block Header
    char block_id[2];  // block tag
    uint16_t length;   // length of the block
    uint8_t data[];    // remaining block data
} mp_picV3_block_t;

Block Tag

The block_id field is a 2 byte character array containing the tag identifier for the block. The first character of the tag indicates the base type of the block, and the 2nd character indicates a sub-type. To date only 4 different block tags have been encountered (5 if you include the sub-type), and so far only the base image “X” type has an additional sub-type. The pattern for the tags appears to be an upper case letter for the base type and a single digit for the sub-type. The tags we have seen so far are “C0”, “E0”, “M0”, “X0”, and “X1”. Only one of the image types can appear in a single file.

Length

The length field is a 16bit value containing the length in bytes of the remainder of the block (it does not include the tag or length fields). This value can be used to skip ahead to the next block if the current one is not needed.
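Because every block carries its own length, a reader can walk the file without understanding every block type. The sketch below shows one way to do that. The names `picv3_block_t` and `picv3_next_block` are hypothetical, and the little-endian length field is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    char tag[3];            /* the two tag characters plus a terminating NUL */
    uint16_t length;        /* bytes of block data after the 4 byte header */
    const uint8_t *data;    /* start of the block data */
} picv3_block_t;

/* Hypothetical helper: read the block at *pos and advance *pos past it.
   Assumes little-endian lengths. Returns 1 if a block was read,
   0 at the end of the buffer, -1 if the block is truncated. */
int picv3_next_block(const uint8_t *buf, size_t len, size_t *pos, picv3_block_t *blk)
{
    if (*pos >= len)
        return 0;
    if (len - *pos < 4)
        return -1;
    const uint8_t *p = buf + *pos;
    blk->tag[0] = (char)p[0];
    blk->tag[1] = (char)p[1];
    blk->tag[2] = '\0';
    blk->length = (uint16_t)(p[2] | (p[3] << 8));
    if (len - *pos - 4 < blk->length)
        return -1;
    blk->data = p + 4;
    *pos += 4 + (size_t)blk->length;   /* skip straight to the next block */
    return 1;
}
```

A caller can loop until this returns 0, dispatching on tag[0] ('C', 'E', 'M', or 'X') and simply skipping any block it does not recognise.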

Block Types `C0` and `E0` – CGA and EGA Dithering

I’ve grouped base types C0 and E0 together here because they share the same physical layout and perform the same basic function, just for different video modes: type C0 is for CGA dithering, while type E0 is for EGA dithering. The function is identical to that of Types E and F in PIC^v2. This block holds dithering data for converting an image of up to 256 colours down to 16 colours for EGA, or 4 colours for CGA.

typedef struct { // CGA and EGA dither maps
    uint8_t first;        // index of first dither entry
    uint8_t last;         // index of last dither entry
    uint8_t dither_map[]; // last-first+1 dithering entries
} mp_picV3_dither_t;

Index Range

The first and last values of the dithering block are 8bit values denoting the start and end indices of the dithering data that follows, giving a range of 0-255. The number of entries in the table is last-first+1, so the table can contain any contiguous sub-set of the 256 colour range, up to the full 256 entries.

Dithering table

The dither_map field is a table that tells how to handle dithering for each of the image colours (up to 256) in 16 colour EGA mode for the E0 block, and 4 colour CGA mode for the C0 block. Each 8bit entry is split into 2 nibbles, where each nibble specifies one of the destination mode colours (CGA or EGA) to use for the given colour. Which of the two is used depends on the pixel’s X-Y location on the screen: on even numbered lines, even pixel addresses use the low nibble and odd pixel addresses use the high nibble, and the relationship is swapped on odd lines, producing the basic checker-board dither effect. If no dithering is wanted for a colour, both the low and high nibbles should be set to the same value. For the CGA map the nibble values range from 0-3, and for the EGA map they range from 0-15. The number of entries in the table is last-first+1, so each table can hold up to 256 entries, and the entry for colour px is found at position px-first.

color = (((x ^ y) & 0x01) == 0) ? dither_map[px - first] & 0x0F : (dither_map[px - first] >> 4) & 0x0F;
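Since the table may only cover a sub-range of colours, a lookup has to offset by first and check the bounds. Here is a small sketch working directly on the C0/E0 block data (the bytes after the 4 byte block header). The name `picv3_dither_pixel` is hypothetical, and returning -1 for colours outside the range is my own choice, since what the games do in that case is unknown.

```c
#include <stdint.h>

/* Hypothetical helper: look up the dithered colour for image colour `px` at
   screen position (x, y) in a C0/E0 payload (first, last, map...).
   `mask` is 0x03 for C0 (CGA) or 0x0F for E0 (EGA).
   Returns -1 if px is outside [first, last]. */
int picv3_dither_pixel(const uint8_t *payload, uint8_t px, unsigned x, unsigned y, uint8_t mask)
{
    uint8_t first = payload[0], last = payload[1];
    if (px < first || px > last)
        return -1;
    uint8_t entry = payload[2 + (px - first)];   /* table starts at colour `first` */
    return (((x ^ y) & 1u) == 0) ? (entry & mask) : ((entry >> 4) & mask);
}
```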

Block Type `M0` – Palette data

The M0 palette block holds up to 256 colour entries that define the palette for the image. The presence of this block removes the need to provide an external palette for 8bit colour images, though it is not guaranteed to be present in any image; the game can, and often does, rely on an already established palette. The format of the data is exactly the same as a bare .PAL file, except that it includes start and end indices, allowing sub-ranges to be defined.

typedef struct {
    uint8_t r;              // 8bit Red component value (0-63)
    uint8_t g;              // 8bit Green component value (0-63)
    uint8_t b;              // 8bit Blue component value (0-63)
} pal_t;

typedef struct { // PicV3 Palette Block 
    uint8_t first;          // index of first palette entry
    uint8_t last;           // index of last palette entry
    pal_t palette_data[];   // last-first+1 RGB entries
} mp_picV3_palette_t;

Index Range

The first and last values of the palette block are 8bit values denoting the start and end indices of the RGB table data that follows, giving a range of 0-255. The number of entries in the table is last-first+1, so the table can contain any contiguous sub-set of the 256 colour range, up to the full 256 entries.

Palette Data

Each entry in the palette_data table consists of three 8bit values for red, green, and blue respectively. The values typically range from 0-63 (the range of the standard VGA 18bit DAC), but in some later titles they may range from 0-255, as 24bit DACs became more common.
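For display on modern hardware the palette usually needs to be merged into a full 256 entry table and scaled to 8 bits per component. The sketch below does both. The function name `picv3_load_palette` is hypothetical, and because nothing in the block says which value range is in use, the caller has to choose via `is_6bit`. The (v << 2) | (v >> 4) scaling is one common way to map 0-63 onto 0-255, not something taken from the games.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy an M0 block payload (first, last, RGB...) into a
   256 entry table. If `is_6bit` is set, 0-63 VGA DAC values are scaled to
   0-255. Returns the number of entries copied, or -1 on a malformed payload. */
int picv3_load_palette(const uint8_t *payload, size_t len, uint8_t rgb[256][3], int is_6bit)
{
    if (len < 2)
        return -1;
    int first = payload[0], last = payload[1];
    if (last < first)
        return -1;
    int count = last - first + 1;
    if (len - 2 < (size_t)count * 3)
        return -1;
    for (int i = 0; i < count; i++) {
        for (int c = 0; c < 3; c++) {
            uint8_t v = payload[2 + i * 3 + c];
            if (is_6bit) {
                v &= 0x3F;
                v = (uint8_t)((v << 2) | (v >> 4));   /* 63 -> 255, 0 -> 0 */
            }
            rgb[first + i][c] = v;
        }
    }
    return count;
}
```

Entries outside first..last are left untouched, which matches the idea of the game layering a partial palette over an already established one.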

Block Types `X0` and `X1` – Image data

The X0 and X1 block types contain the compressed image data, with X0 holding 8bit data at 1 pixel per byte and X1 holding 4bit data at 2 pixels per byte. The image block also carries the image dimensions and the max-bits field for the LZW compression. Like PIC^v1, PIC^v3 uses the same RLE+LZW compression scheme. In fact PIC^v3 is essentially a wrapper around PIC^v1, except that the format identifier is now just a max-bits indicator for the LZW code width; the pixel packing method is now defined by the block type.

typedef struct { // PicV3 Image Block
    uint16_t width;    // image width in pixels
    uint16_t height;   // image height in pixels
    uint8_t max_bits;  // maximum code width for LZW data
    uint8_t lz_data[]; // RLE+LZW compressed stream
} mp_picV3_image_t;

Image Dimensions

The image width and height are 16bit values representing the pixel dimensions of the image.

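Max Bits

The max_bits field works exactly as it does in PIC^v2: an 8bit unsigned value giving the maximum code width for the LZW stream, typically 11 (0x0B), with a valid range of 9-11.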

Compressed stream

The lz_data contains the variable length RLE+LZW compressed data. This compression scheme is common for PIC^v1 to PIC^v3, see the “compression” section below for further detail.
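Once a block’s tag is known, its payload can be decoded into the image parameters, including how many bytes the fully decompressed data should occupy (useful for sizing the output buffer before running LZW and RLE). This is a sketch only: `picv3_image_t` and `picv3_parse_image` are hypothetical names, the fields are assumed little-endian, and the packed size uses the padded stride described in the Pixel Packing section.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t width, height;
    uint8_t max_bits;
    int packed4;             /* 1 for X1 (two pixels per byte) */
    size_t raw_size;         /* bytes expected after LZW + RLE decoding */
    const uint8_t *lz_data;
    size_t lz_len;
} picv3_image_t;

/* Hypothetical helper: parse an X0/X1 payload (bytes after the 4 byte block
   header). `sub` is the second tag character, '0' or '1'.
   Returns 0 on success, -1 on error. */
int picv3_parse_image(char sub, const uint8_t *payload, size_t len, picv3_image_t *img)
{
    if ((sub != '0' && sub != '1') || len < 5)
        return -1;
    img->width = (uint16_t)(payload[0] | (payload[1] << 8));
    img->height = (uint16_t)(payload[2] | (payload[3] << 8));
    img->max_bits = payload[4];
    img->packed4 = (sub == '1');
    /* packed lines are padded to (width + 1) / 2 bytes */
    size_t stride = img->packed4 ? ((size_t)img->width + 1) / 2 : img->width;
    img->raw_size = stride * img->height;
    img->lz_data = payload + 5;
    img->lz_len = len - 5;
    return 0;
}
```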

Consolidated Reference Structures

For reference, here is what the combined headers look like for each of the PIC^v3 blocks. These structures should be considered unaligned and packed, meaning there is no additional padding between members. All members are placed in the order in which they appear in the file.

typedef struct { // Full structure for PICv3 "C0"/"E0" block (dithering)
    char block_id[2];       // tag "C0" or "E0"
    uint16_t length;        // length of the block
    uint8_t first;          // index of first dither entry
    uint8_t last;           // index of last dither entry
    uint8_t dither_map[];   // last-first+1 dithering entries
} mp_picV3_dither_block_t;

typedef struct { // Full structure for PICv3 "M0" block (palette)
    char block_id[2];       // tag "M0"
    uint16_t length;        // length of the block
    uint8_t first;          // index of first palette entry
    uint8_t last;           // index of last palette entry
    pal_t palette_data[];   // last-first+1 RGB entries
} mp_picV3_palette_block_t;

typedef struct { // Full structure for PICv3 "X0"/"X1" block (image)
    char block_id[2];       // tag "X0" or "X1"
    uint16_t length;        // length of the block
    uint16_t width;         // image width in pixels
    uint16_t height;        // image height in pixels
    uint8_t max_bits;       // maximum code width for LZW data
    uint8_t lz_data[];      // RLE+LZW compressed stream
} mp_picV3_image_block_t;

PC-98 Version (PIC⁹⁸)

PIC⁹⁸, or PIC for PC-98, is a complete departure from the other PIC versions; it uses an entirely new structure and compression scheme. We first looked at PIC⁹⁸ here, originally identified this version as PIC⁹³, and later renamed it to PIC⁹¹ when we discovered the connection with Civilization for PC-98. This version appears to be made specifically for, and closely tied to the capabilities of, the NEC PC-9801 (PC-98), hence the name I’ve chosen for it. Despite being made for the PC-98, this format did find its way back to the PC platform with Railroad Tycoon Deluxe. So far we’ve identified the following MicroProse titles that use the PIC⁹⁸ format.

Game Title	Platform	Release Year	Asset Date
Sid Meier’s Civilization	PC-98	1992
Sid Meier’s Railroad Tycoon*	PC-98	1991
Sid Meier’s Railroad Tycoon Deluxe*	PC	1993	Jun 93

*Railroad Tycoon Deluxe is a PC port of the PC-98 version of Railroad Tycoon (which was itself a port, and graphical overhaul, of the original PC title)

PIC⁹⁸ Structure

PIC⁹⁸ files consist of a well defined header followed by four image plane blocks of LZSS compressed data. This arrangement appears to support only 16 colours, though unlike EGA images, the 16 colours are fully programmable via a palette table in the header. This is the first version of PIC to have a file signature that can be used to identify the format.

Header

PIC⁹⁸ files begin with a fixed 24 byte header defining the dimensions and the palette for the image, as well as having a defining signature tag for the file.

typedef struct {
    uint8_t r;         // Red component value 0-15
    uint8_t g;         // Green component value 0-15
    uint8_t b;         // Blue component value 0-15
} pal_t;

typedef struct { // Pic98 Header
    char sig[4];       // [00-"H8"-00] Pic98 signature
    uint16_t width;    // image width in pixels
    uint16_t height;   // image height in pixels
    pal_t pal[16];     // RGB palette for this image (4 bits per component)
    uint8_t data[];    // block data
} mp_pic98_t;

Signature

All PIC⁹⁸ files begin with the sequence 00 “H8” 00 or in hex 00 48 38 00. It is unknown if there is any meaning to this value, or if the value provides any additional definition to the structure of the format, only the “H8” version has been seen in the wild to date.

Image Dimensions

The image width and height are 16bit values representing the pixel dimensions of the image.

Palette

The palette for a PIC⁹⁸ file always consists of sixteen RGB entries, each made of three 8bit component values. However, the valid range for each component is only 4 bits (0-15), as the PC-98 only had a 12bit DAC in its video system, giving 4096 possible colours to choose the 16 palette entries from.
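A reader can use the signature to reject non-PIC⁹⁸ data before going any further. The sketch below checks the signature and reads the dimensions, using the offsets of the mp_pic98_t structure (4 signature bytes, then width and height). The names `pic98_read_header` and `pic98_scale4` are hypothetical, the little-endian byte order is an assumption, and scaling a 4bit component by 17 (so 15 becomes 255) is just one convenient way to display these colours today.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: check the 00 'H' '8' 00 signature and read the image
   dimensions that follow it (assumed little-endian).
   Returns 0 on success, -1 if the buffer is not a PIC98 file. */
int pic98_read_header(const uint8_t *buf, size_t len, uint16_t *width, uint16_t *height)
{
    static const uint8_t sig[4] = {0x00, 0x48, 0x38, 0x00};
    if (len < 8)
        return -1;
    for (int i = 0; i < 4; i++)
        if (buf[i] != sig[i])
            return -1;
    *width = (uint16_t)(buf[4] | (buf[5] << 8));
    *height = (uint16_t)(buf[6] | (buf[7] << 8));
    return 0;
}

/* Scale a 4bit palette component (0-15) to 8 bits (0-255): 15 * 17 = 255. */
uint8_t pic98_scale4(uint8_t v)
{
    return (uint8_t)((v & 0x0F) * 17);
}
```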

Plane Block

Each PIC⁹⁸ file contains four plane blocks of data after the header. Each plane block is prefixed with its size, and compressed individually. One thing to note is that plane blocks must start on a 16bit word boundary, meaning that the previous block may need to be padded by a byte to maintain alignment. This is the first time we’ve seen enforcement of data alignment in any of the PIC versions.

typedef struct { // Pic98 Plane Block
    uint16_t length;   // length of lz_data for plane
    uint8_t lz_data[]; // LZSS compressed plane data
} mp_pic98_plane_t;

Block Size

The length field is a 16bit value containing the length in bytes of the remainder of the block (it does not include the length field itself). This value is needed in order to locate the next block, as there are no other identifying markers.
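Locating the four planes therefore comes down to following the length fields while honouring the word alignment. Below is a sketch of that walk. The name `pic98_find_planes` is hypothetical, and several details are assumptions rather than documented facts: lengths are little-endian, alignment is measured from the start of the file, and a padding byte is not counted in the preceding block’s length.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: locate the four LZSS plane blocks. `pos` is the file
   offset of the first plane block (just after the header).
   Assumes little-endian lengths, word alignment relative to the start of the
   file, and that padding bytes are not counted in `length`.
   Returns 0 on success, -1 if the data is truncated. */
int pic98_find_planes(const uint8_t *buf, size_t len, size_t pos,
                      const uint8_t *lz[4], uint16_t lz_len[4])
{
    for (int p = 0; p < 4; p++) {
        if (pos & 1)
            pos++;                       /* skip the alignment padding byte */
        if (pos + 2 > len)
            return -1;
        uint16_t n = (uint16_t)(buf[pos] | (buf[pos + 1] << 8));
        pos += 2;
        if (n > len - pos)
            return -1;
        lz[p] = buf + pos;               /* compressed data for plane p (bit p) */
        lz_len[p] = n;
        pos += n;
    }
    return 0;
}
```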

Compressed Data

The lz_data for PIC⁹⁸ utilizes a planar pixel arrangement that is then LZSS compressed. This is a major departure from the previous PIC versions we’ve looked at which use LZW compression for the image data. PIC⁹⁸ also does not use an underlying RLE compression like the previous versions. The arrangement here appears to closely mimic the arrangement of the graphics hardware on the PC-9801 itself. Further detail can be found in the “compression” section below in the Planar Pixels and LZSS Compression sections.

PIC Compression

All PIC files are compressed images. In the case of PIC^v1 – PIC^v3 multiple layers of compression are applied to try to minimize the file size. With the exception of PIC⁹⁸ all PIC files use the same compression stack that looks like the following. (When compressing an 8bit image, the pixel packing stage is bypassed)

[PIC^v1-v3] <=> LZW Compression <=> RLE Encoding <=> Pixel Packing (4bit only) <=> [RAW Image]

For PIC⁹⁸ files, different compression and packing methods are used, so the compression stack looks a bit different (note that PIC⁹⁸ files are 4bit only images).

[PIC⁹⁸] <=> LZSS Compression × 4 <=> Pixel Planar <=> [RAW Image]

Details for each of the individual stages will be discussed below.

Pixel Packing (PIC^v1–PIC^v3)

To minimize the space used by a 16 colour (or fewer) image, pixels can be packed two to a byte, instantly cutting the storage size in half. PIC files do this by storing the leftmost pixel of a pixel pair in the low nibble (bits 0-3) of a byte, and the rightmost pixel of the pair in the upper nibble (bits 4-7). Pixel packing is not (more accurately, cannot be) performed on 8bit colour images. Pixel packing is the first stage in the compression pipeline when compressing a RAW image, and the last stage when decompressing. More detail on the packing and associated code can be found here.

One thing that didn’t come up in the discovery phase was how to handle odd pixel widths. While I didn’t mention it in my blog posts at the time, the code needs to account for this, as you can’t store just a nibble of data. It wasn’t an issue with any of the images we worked with, as those all tended to be full-screen images whose lines always ended on an even boundary. As such the ‘stride’ (bytes per line) of the packed image was always exactly half the width. This is not possible with an odd width, as each line would end in the middle of a byte. While you could start the next line in the remaining nibble, this does not make much sense, as it makes pixel addressing more difficult if the packed format is retained for processing. Instead, each line is padded out to an even boundary. This extra pixel’s worth of data is not valid image data and should be discarded when unpacking. When packing, the padding should be set to either 0 or the background colour. The net effect is that the packed stream of data will look like it is one pixel wider than what was specified.

int stride = (width + 1) / 2;

The size of the resulting packed stream will be stride * height bytes, not width * height / 2. When unpacking you don’t need to allocate extra space, unless you don’t discard the padding as you go; in that case you will need to allocate height extra bytes, and then come back and shift all the data back to its correct position (or have your code handle stride and width separately at all times).
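The packing rules, including the odd-width padding, can be sketched as a pair of functions. The names `pic_pack4` and `pic_unpack4` are hypothetical; the unpacked image is assumed to be one byte per pixel in row-major order, and the padding nibble is written as colour 0.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pack 4bit pixels two per byte, left pixel in the low
   nibble, padding odd-width lines with colour 0. Returns the packed size. */
size_t pic_pack4(const uint8_t *pixels, int width, int height, uint8_t *packed)
{
    int stride = (width + 1) / 2;
    for (int y = 0; y < height; y++) {
        for (int i = 0; i < stride; i++) {
            int x = i * 2;
            uint8_t left = pixels[y * width + x] & 0x0F;
            uint8_t right = (x + 1 < width) ? (pixels[y * width + x + 1] & 0x0F) : 0;
            packed[y * stride + i] = (uint8_t)(left | (right << 4));
        }
    }
    return (size_t)stride * (size_t)height;
}

/* Hypothetical helper: unpack, discarding the padding nibble on odd widths. */
void pic_unpack4(const uint8_t *packed, int width, int height, uint8_t *pixels)
{
    int stride = (width + 1) / 2;
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            uint8_t b = packed[y * stride + x / 2];
            pixels[y * width + x] = (x & 1) ? (uint8_t)(b >> 4) : (uint8_t)(b & 0x0F);
        }
    }
}
```

For a 3×2 image the packed data is 4 bytes (stride 2), not 3, which is exactly the stride * height versus width * height / 2 difference described above.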

RLE Encoding (PIC^v1–PIC^v3)

The next level of compression used by PIC (Versions 1-3) files is to Run Length Encode (RLE) the data. RLE encoding takes runs of repeated values and replaces them with a shorter code that can then be used on the decompression side to recreate the longer run. RLE is a very lightweight and fast compression method. All PIC (Versions 1-3) images use RLE compression regardless of the pixel packing method.

The type of RLE encoding used by the PIC file format uses a special control token value (0x90 in this case) to indicate when a run is being encoded. An encoded run occupies 3 bytes in the stream with the following sequence:

VV 90 CC

Here VV is the repeated byte in the run, and CC is the count (length) of the run minus one. So if the count is set to 5, the decoded length is 6; equivalently, to encode a run of 6 repeated bytes, you set the count to 5.

Because of this form of encoding, the token value needs to be reserved, and a means must be provided to encode data that happens to have the same value as the token. This is done with a count value of 00, since it makes no sense to encode a run of one byte. In fact it makes no sense to encode a run shorter than 4 bytes, as that would result in expansion, or at best no gain. So the following sequence:

90 00

Would be seen in the stream whenever a 90 is in the actual data being encoded. This unfortunately results in a slight expansion, but hopefully the chosen token is a rare value in the actual data stream, and the slight expansion is far outweighed by compression elsewhere. More detail on the RLE encoding can be found here.
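Here is a minimal sketch of both directions, written from the description above; `pic_rle_decode` and `pic_rle_encode` are hypothetical names. Two details are my own assumptions rather than documented behaviour: the encoder only emits runs of 4 to 256 bytes and never run-length encodes the token value itself (each 0x90 is escaped individually), and the decoder treats an escaped 0x90 as the new previous byte for any run that follows.

```c
#include <stddef.h>
#include <stdint.h>

#define PIC_RLE_TOKEN 0x90

/* Decode "VV 90 CC" runs and "90 00" escapes. Returns bytes written,
   or (size_t)-1 on truncated input or output overflow. */
size_t pic_rle_decode(const uint8_t *in, size_t in_len, uint8_t *out, size_t out_cap)
{
    size_t i = 0, o = 0;
    uint8_t last = 0;
    while (i < in_len) {
        uint8_t b = in[i++];
        if (b != PIC_RLE_TOKEN) {                /* plain literal */
            if (o >= out_cap) return (size_t)-1;
            out[o++] = last = b;
            continue;
        }
        if (i >= in_len) return (size_t)-1;      /* token without a count */
        uint8_t count = in[i++];
        if (count == 0) {                        /* escaped literal 0x90 */
            if (o >= out_cap) return (size_t)-1;
            out[o++] = last = PIC_RLE_TOKEN;     /* assumption: becomes 'last' */
        } else {                                 /* repeat previous byte count more times */
            if (count > out_cap - o) return (size_t)-1;
            for (unsigned k = 0; k < count; k++) out[o++] = last;
        }
    }
    return o;
}

/* Encode; `out` must hold at least 2 * n bytes (worst case: every byte is 0x90). */
size_t pic_rle_encode(const uint8_t *in, size_t n, uint8_t *out)
{
    size_t i = 0, o = 0;
    while (i < n) {
        uint8_t v = in[i];
        size_t run = 1;
        while (i + run < n && in[i + run] == v && run < 256) run++;
        if (v != PIC_RLE_TOKEN && run >= 4) {    /* shorter runs would not save space */
            out[o++] = v;
            out[o++] = PIC_RLE_TOKEN;
            out[o++] = (uint8_t)(run - 1);       /* count = run length minus one */
            i += run;
        } else if (v == PIC_RLE_TOKEN) {
            out[o++] = PIC_RLE_TOKEN;            /* escape the token value */
            out[o++] = 0x00;
            i++;
        } else {
            out[o++] = v;
            i++;
        }
    }
    return o;
}
```

For example, the bytes 01 01 01 01 01 02 90 03 03 encode to 01 90 04 02 90 00 03 03, and decode back to the original.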

LZW Compression (PIC^v1–PIC^v3)

The final stage in the compression pipeline for PIC^v1 to PIC^v3 images, or the first stage when decompressing, is Lempel-Ziv-Welch (LZW) compression, which we first talked about in this post. The LZW compressor is a pretty standard implementation of the algorithm; the only oddity to keep track of is that code 256, which is the first new code that would be generated when compressing or decompressing, is reserved, so the actual first code is 257. However, after a table reset, which happens when we reach the maximum code size and exhaust all the codes, code 256 is no longer reserved. The other aspect we’ve talked about with the PIC format is the max-bits value that is part of every PIC file before the LZW stream begins. This value sets the maximum code size for the given stream. The valid range looks to be 9-11 bits, with 11 bits being the most common setting in files from MicroProse.
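To tie the pieces together, here is a minimal LZW decoder sketch; its output is the RLE stream that still has to be run through the RLE decoder. It assumes 256 literal codes (one per byte value), with code 256 reserved until the first reset. Several further details are assumptions on my part rather than documented facts: codes are read least significant bit first starting at 9 bits, the width grows when the next free code reaches the current width’s limit, and when the dictionary fills at max_bits the decoder drops back to 9 bits and treats the next code as a fresh literal. The names `lzw_read_code` and `pic_lzw_decode` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define LZW_MAX_CODES 2048   /* 2^11, the largest max_bits observed */

/* Read `width` bits, least significant bit first (an assumption).
   Returns -1 when fewer than `width` bits remain. */
static int lzw_read_code(const uint8_t *in, size_t in_len, size_t *bitpos, int width)
{
    if (*bitpos + (size_t)width > in_len * 8)
        return -1;
    int code = 0;
    for (int i = 0; i < width; i++, (*bitpos)++)
        code |= ((in[*bitpos >> 3] >> (*bitpos & 7)) & 1) << i;
    return code;
}

/* Hypothetical PIC LZW decoder sketch. Returns the number of bytes written
   to `out`, or (size_t)-1 on malformed input or overflow. */
size_t pic_lzw_decode(const uint8_t *in, size_t in_len, int max_bits,
                      uint8_t *out, size_t out_cap)
{
    uint16_t prefix[LZW_MAX_CODES];
    uint8_t suffix[LZW_MAX_CODES];
    uint8_t stack[LZW_MAX_CODES];
    size_t bitpos = 0, o = 0;
    int width = 9, base = 257, next = 257;   /* 256 reserved on the first pass */
    int prev = -1;                           /* no previous code yet */
    uint8_t first = 0;                       /* first byte of the last string */

    if (max_bits < 9 || max_bits > 11)
        return (size_t)-1;
    for (;;) {
        int code = lzw_read_code(in, in_len, &bitpos, width);
        if (code < 0)
            break;                                   /* out of input */
        if (code > 255 && code < base)
            return (size_t)-1;                       /* reserved code */
        int cur = code, sp = 0;
        if (code >= next) {                          /* KwKwK case */
            if (code != next || prev < 0)
                return (size_t)-1;
            stack[sp++] = first;
            cur = prev;
        }
        while (cur > 255) {                          /* walk the chain back */
            stack[sp++] = suffix[cur];
            cur = prefix[cur];
        }
        stack[sp++] = (uint8_t)cur;
        first = (uint8_t)cur;
        if (o + (size_t)sp > out_cap)
            return (size_t)-1;
        while (sp > 0)
            out[o++] = stack[--sp];
        if (prev >= 0) {                             /* add prev + first byte */
            prefix[next] = (uint16_t)prev;
            suffix[next] = first;
            next++;
        }
        prev = code;
        if (next >= (1 << width)) {
            if (width < max_bits) {
                width++;
            } else {                                 /* table full: reset */
                width = 9;
                base = next = 256;
                prev = -1;
            }
        }
    }
    return o;
}
```

As a small check, the 9 bit codes 65, 66, 257, 259 (packed as 41 84 04 1C 08) decode to "ABABABA", exercising both a dictionary hit and the code-not-yet-defined case.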

Planar Pixels (PIC⁹⁸)

With PIC⁹⁸, pixels are packed in an entirely different manner than in PIC^v1–PIC^v3. Here they are decomposed onto 4 separate planes, with each plane holding 1 of each pixel’s 4 bits, meaning that each byte on a plane holds a single bit from 8 different pixels. The most significant bit of each plane byte holds the leftmost of its 8 pixels. The planes are ordered within the PIC⁹⁸ file from the least significant bit to the most significant bit, meaning that the plane holding bit 0 of every pixel is stored first. You can find how we determined the bit and plane order here.
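Once the four planes have been decompressed, rebuilding one byte per pixel is a matter of gathering one bit from each plane. The sketch below does that. The name `pic98_planes_to_pixels` is hypothetical, and it assumes each plane row is padded to a whole number of bytes ((width + 7) / 8), which would only matter for widths that are not a multiple of 8.

```c
#include <stdint.h>

/* Hypothetical helper: merge four decompressed bit planes into one byte per
   pixel (values 0-15). planes[0] holds bit 0 of every pixel. Assumes each
   plane row is padded to a whole number of bytes. */
void pic98_planes_to_pixels(const uint8_t *planes[4], int width, int height, uint8_t *pixels)
{
    int stride = (width + 7) / 8;               /* bytes per plane row */
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            int byte = y * stride + x / 8;
            int shift = 7 - (x % 8);            /* MSB is the leftmost pixel */
            uint8_t v = 0;
            for (int p = 0; p < 4; p++)
                v |= (uint8_t)(((planes[p][byte] >> shift) & 1) << p);
            pixels[y * width + x] = v;
        }
    }
}
```

With plane bytes F0, CC, AA, and 01 (planes 0 to 3), the leftmost pixel comes out as 7 and the rightmost as 8.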

LZSS Compression (PIC⁹⁸)

Finally, for PIC⁹⁸ the compression scheme is Lempel–Ziv–Storer–Szymanski (LZSS), which we first explored in this post. More important, though, is that the particular implementation of the LZSS algorithm used is the one created by Fabrice Bellard for his LZEXE compression utility. In this implementation the LZSS flag bits are collected into 16bit control words that are placed before the “compression unit” they control in the stream. I highly recommend reading my posts on writing the LZSS compressor to get a better understanding of the implementation details, and the specifics of the PIC⁹⁸ implementation.

PIC Aliases

In our explorations we’ve seen a few different extensions used with the PIC file format, likely to denote how the files are used within the game. This list should not be considered complete or definitive; some of these extensions could be used with other data formats in other titles.

.SPR – Sprite files. These are full screen PIC images containing various sprites. These are typically found with some of the older titles using the PIC^v1 format.
.SPK – Sprite Pack? We’ve only seen this one with one or two titles, and it consists of several PIC^v1 files concatenated together.
.MAP – Map Image. I’ve seen this with a few titles, usually contained within a container file. Mostly these seemed to be PIC^v3.

Companion Formats

There is one main companion format to the PIC file, and that is the PAL (palette) file. It is not always present; in fact, PAL files are relatively rare. They really only apply when 256 colour images are involved, as CGA and EGA palettes are defined differently. PAL files are typically 768 bytes in size, containing 256 RGB values. Each component value is typically limited to the range 0-63, corresponding to the limits of the VGA’s hardware DAC, though in some later titles this changed to 0-255. PAL files generally only exist for PIC^v1 and PIC^v2 images, as both PIC^v3 and PIC⁹⁸ are capable of containing their palettes internally.

#include <stdint.h>

typedef struct {
    uint8_t r;  // Red component value typically 0-63
    uint8_t g;  // Green component value typically 0-63
    uint8_t b;  // Blue component value typically 0-63
} pal_t;

typedef struct {
    pal_t pal[256]; // full 8 bit palette
} pal_file_t;
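Because most PAL files store 6-bit DAC values while some later titles use the full 8-bit range, a loader has to decide how to scale them. The sketch below is one way to do that, assuming a raw 768 byte buffer in the layout above. The function names are hypothetical, and the range test (any component above 63 means the palette is already 8-bit) is our own heuristic, not something the file records.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAL_ENTRIES 256
#define PAL_BYTES   (PAL_ENTRIES * 3)   /* 768 bytes: r, g, b per entry */

/* Heuristic: a 6-bit VGA DAC palette never has a component above 63. */
int pal_is_6bit(const uint8_t pal[PAL_BYTES]) {
    for (size_t i = 0; i < PAL_BYTES; i++)
        if (pal[i] > 63) return 0;
    return 1;
}

/* Scale a 0-63 component to 0-255 by copying its top two bits into the
   low bits, so 0 maps to 0 and 63 maps to 255. */
uint8_t pal_expand6(uint8_t v) {
    assert(v <= 63);
    return (uint8_t)((v << 2) | (v >> 4));
}

/* Convert a PAL buffer in place to 0-255 components if it looks 6-bit. */
void pal_to_8bit(uint8_t pal[PAL_BYTES]) {
    if (!pal_is_6bit(pal)) return;          /* already 0-255: leave as is */
    for (size_t i = 0; i < PAL_BYTES; i++)
        pal[i] = pal_expand6(pal[i]);
}
```

The heuristic can misfire on a genuinely 8-bit palette that happens to be very dark (every component at or below 63), so when the title is known it is safer to hard-code the range.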

PIC Adjacent

We’ve also recently come across a couple of PIC-like formats. Both were found within the container files of M1 Tank Platoon. They most closely resemble the PIC^v1 format, and may even rely on it, but neither fully conforms to PIC^v1 or any of the other variants.

PK Image Files

We first discovered and explored the PK image files while investigating another variant of the .CAT container format included with M1 Tank Platoon. There we had three separate container files, each holding the main images for the game, one for each of the three graphics modes the game supports. (Which file belonged to which mode was not identifiable by name; we only discovered it by looking at the images.)

256 Colour PK Images

This version of the PK image is actually fully compliant with the PIC^v1 format. If this were the only variant of the file we had found, we would have classified it as an alias format.

16 Colour and 4 Colour PK Images

Unlike its 256 colour counterpart, the 4bit packed variant used for the CGA and EGA images breaks the PIC^v1 format rules. Instead of a positive format byte indicating the 4bit packed pixel arrangement, the value is left negative, indicating an 8bit linear arrangement, even though the data is actually 4bit packed. To decode this variant, the format byte must be corrected to indicate the 4bit packed pixel arrangement. Alternatively, the image can be decoded at half the horizontal resolution and unpacked afterwards. Apart from this, the format is fully compatible with PIC^v1 once the correction is made.
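A minimal sketch of both approaches follows. It assumes, as described for the MAX format identifier below, that the absolute value of the format byte is the maximum LZW code width, so the correction is just a sign flip; it also assumes the normal PIC^v1 packing, where the low nibble is the leftmost pixel. The function names are hypothetical, and the LZW and RLE decoding itself is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Option 1: flip a negative (linear) format byte to its positive (4bit
   packed) form before handing the data to a PIC^v1 decoder. */
int8_t pk_fix_format_byte(int8_t format_byte) {
    assert(format_byte != INT8_MIN);            /* -128 has no positive int8_t */
    return format_byte < 0 ? (int8_t)-format_byte : format_byte;
}

/* Option 2: decode as 8bit linear at half width, then expand each byte into
   two pixels. PIC order: the low nibble is the leftmost pixel. */
void pk_unpack_row(const uint8_t *packed, size_t packed_len, uint8_t *pixels) {
    assert(packed != NULL && pixels != NULL);
    for (size_t i = 0; i < packed_len; i++) {
        pixels[2 * i]     = packed[i] & 0x0F;   /* left pixel  */
        pixels[2 * i + 1] = packed[i] >> 4;     /* right pixel */
    }
}
```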

MAX Image Files

Like the PK files above, we discovered the MAX images while exploring M1 Tank Platoon. Unlike PK, however, this format does not conform to the standard PIC compression pipeline. A MAX image is not even really a file; it is just a named block of data within a larger container file. These container files have no extension, but are fairly similar to a CAT file: an index of the named assets (without file extensions in this case), followed by a block of data for each asset in the file. The container files are all named in the form [graphics adapter]MAXn, where [graphics adapter] is a 4 character label for the adapter in question, with a trailing underscore added if the label is shorter than 4 characters, and ‘n’ is a single digit. The files are numbered 1-4 (1-5 for the MCGA files). For more information about the container format and how we decoded the MAX images, you can read about it here.
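As an illustration of the naming scheme, the hypothetical helper below builds a container name from an adapter label and a file number, padding labels shorter than 4 characters with trailing underscores. MCGA is the only label the text confirms; “EGA” in the test is our assumption about how a 3 character label would look under the rule described.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "[adapter]MAXn" into out (9 bytes including the terminator): the
   adapter label is padded to 4 characters with underscores, n is one digit.
   Returns 0 on success, -1 for a label or number that does not fit. */
int max_container_name(const char *adapter, int n, char out[9]) {
    assert(adapter != NULL && out != NULL);
    size_t len = strlen(adapter);
    if (len == 0 || len > 4 || n < 0 || n > 9) return -1;
    snprintf(out, 9, "%s%.*sMAX%d", adapter, (int)(4 - len), "____", n);
    return 0;
}
```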

In conversation, my friend PixelWings pointed out that the name MAX in the filenames likely refers to Max Remington, who was the main graphics developer at MicroProse back in the day. In that spirit, I’m treating the image format within as a MAX image, with the container file simply named after what it contains; we’ve seen this with other titles that have files like “PICS.CAT” containing the PIC assets.

So while the MAX image format is not really a file, for convenience we can treat it as one: we can extract the named data blocks from the container file as individual MAX files (again, a name we’ve given them). The MAX format appears to make three departures from the normal PIC compression stack. First, the format byte is always negative, indicating a linear pixel arrangement; this is consistent with the image width being stored in bytes rather than pixels (half the horizontal resolution in pixels for 4bit packed images). Second, the 4bit packing order is the opposite of the normal PIC order: in a MAX file the high nibble is the leftmost pixel, while in a normal PIC file it is the low nibble (which may explain why the format byte is always negative, so that the PIC 4bit unpacker is bypassed). Finally, and probably the biggest deviation, MAX images do not apply the usual RLE compression to the data stream before LZW compression. The type of MAX file, 4bit (4 or 16 colour) or 8bit (256 colour), can only be determined from the container file it is in; there is no definitive marker in the MAX image header to indicate it. The compression path for a MAX file looks like the following:

[MAX] <=> LZW Compression <=> Pixel Packing (4 bit only) <=> [RAW Image]

MAX Structure

Structurally, MAX is in some ways closer to PIC^v2 than PIC^v1, in that it contains image dimension information. However, it appears to return to a signed format byte rather than an unsigned max-bits value.

The MAX format has a 5 byte header that defines the image dimensions (width in bytes, height in lines) and the LZW compression bit width. A flag bit also indicates whether palette data follows the image data in the compressed stream.

#include <stdint.h>

typedef struct { // MAX image format
    uint16_t width;          // image width in bytes
    uint16_t flagged_height; // image height in lines, + palette flag bit
    int8_t   format_byte;    // Format Identifier
    uint8_t  lz_data[];      // LZW compressed stream
} mp_max_t;

Image Dimensions

The width and flagged_height fields are 16bit values giving the image dimensions in bytes and lines respectively, rather than in pixels as with the other PIC variants. The most significant bit of flagged_height must be masked off to get the actual height of the image; that bit is the palette flag instead.

Palette Flag

The image palette flag is encoded as a single bit in the most significant bit position of the flagged_height value. When this bit is set to 1 it indicates that a 256 entry RGB palette (768 bytes) is included after the image data within the compressed data stream.

Format Identifier

The format identifier is an 8bit signed value whose absolute value is the maximum code width for the LZW compressed stream that follows. For MAX files this value must be negative, indicating linear pixel data in PIC^v1 terms, regardless of the pixel packing method actually used. (The valid range is still likely -9 to -11.)
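Putting the three header fields together, a minimal parser might look like the sketch below. It assumes the 16bit fields are little-endian, as on the x86 DOS machines these titles targeted, and reads them byte by byte so it does not depend on struct packing or host byte order. The type and function names are hypothetical, and the -9 to -11 check reflects the “likely” range above rather than a confirmed rule, so it may need relaxing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_HEADER_SIZE  5
#define MAX_PALETTE_FLAG 0x8000u

typedef struct {
    uint16_t       width_bytes;  /* width in bytes, not pixels */
    uint16_t       height;       /* height in lines, palette flag removed */
    int            has_palette;  /* 1 if a 768 byte palette follows the pixels */
    int            max_bits;     /* maximum LZW code width */
    const uint8_t *lz_data;      /* start of the LZW stream */
    size_t         lz_len;
} max_header_t;

/* Returns 0 on success, -1 if the block is too short or the format byte
   is not a plausible (negative) MAX value. */
int max_parse_header(const uint8_t *buf, size_t len, max_header_t *h) {
    assert(buf != NULL && h != NULL);
    if (len < MAX_HEADER_SIZE) return -1;
    uint16_t width   = (uint16_t)(buf[0] | (buf[1] << 8));
    uint16_t flagged = (uint16_t)(buf[2] | (buf[3] << 8));
    int8_t   format  = (int8_t)buf[4];
    if (format > -9 || format < -11) return -1;   /* assumed valid range */
    h->width_bytes = width;
    h->height      = (uint16_t)(flagged & ~MAX_PALETTE_FLAG);
    h->has_palette = (flagged & MAX_PALETTE_FLAG) != 0;
    h->max_bits    = -format;
    h->lz_data     = buf + MAX_HEADER_SIZE;
    h->lz_len      = len - MAX_HEADER_SIZE;
    return 0;
}
```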

Compressed stream

The lz_data field holds the variable length, LZW-only compressed data. The LZW compression is the same as with PIC, but unlike normal PIC files the data is not RLE encoded before LZW compression. See the “compression” section above for further detail on the LZW compression.

256 Colour MAX Images

The 256 colour MAX image is the only variant where we see the palette flag come into play. Not all 256 colour images have the flag bit set, but when it is set, the palette is always 768 bytes. This means that when decompressing, the buffer needs to be 768 bytes larger than the calculated size of the image (width × height). The palette data is in the same format we see with PAL files and the other PIC versions that support palette data, with a range of 0-63 for each colour component.
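In other words, the decompression buffer size follows directly from the header. The sketch below computes it and locates the palette trailing the pixel data, assuming the stream has already been LZW decoded into a buffer of that size. The names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PALETTE_BYTES 768   /* 256 entries x 3 components, 0-63 each */

/* Bytes the LZW decoder should produce: pixel data (width is already in
   bytes) plus the optional palette that follows it. */
size_t max_decoded_size(uint16_t width_bytes, uint16_t height, int has_palette) {
    size_t size = (size_t)width_bytes * height;
    return has_palette ? size + MAX_PALETTE_BYTES : size;
}

/* Returns a pointer to the palette inside a decoded buffer, or NULL if the
   image carries none. */
const uint8_t *max_palette(const uint8_t *decoded, uint16_t width_bytes,
                           uint16_t height, int has_palette) {
    assert(decoded != NULL);
    if (!has_palette) return NULL;
    return decoded + (size_t)width_bytes * height;
}
```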

16 Colour and 4 Colour MAX Images

In the 4bit MAX images, the width value is half what we would expect, because it is expressed in bytes rather than pixels and the data is packed two pixels per byte. These images also need a different 4bit unpacker, because the pixels are packed into the nibbles in the opposite order to a normal PIC file (here the high nibble is the leftmost pixel).
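A sketch of that reversed unpacker follows, expanding a 4bit MAX image into one pixel per byte with the high nibble first. The function names are hypothetical; each output row is twice the stored width in bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Expand a MAX row of width_bytes packed bytes into 2 * width_bytes pixels.
   MAX order: the HIGH nibble is the leftmost pixel (the reverse of PIC). */
void max_unpack_row(const uint8_t *packed, size_t width_bytes, uint8_t *pixels) {
    assert(packed != NULL && pixels != NULL);
    for (size_t i = 0; i < width_bytes; i++) {
        pixels[2 * i]     = packed[i] >> 4;     /* left pixel  */
        pixels[2 * i + 1] = packed[i] & 0x0F;   /* right pixel */
    }
}

/* Unpack a whole image; pixels must hold 2 * width_bytes * height bytes. */
void max_unpack_image(const uint8_t *packed, size_t width_bytes, size_t height,
                      uint8_t *pixels) {
    for (size_t y = 0; y < height; y++)
        max_unpack_row(packed + y * width_bytes, width_bytes,
                       pixels + y * 2 * width_bytes);
}
```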

References

The findings documented here for the MicroProse PIC file format did not happen in a vacuum, of course, so I want to acknowledge a few people who went down this road before me and made their findings known.

Joel “Quadko” McIntyre, for his initial dive and his Darklands .PIC Image File Format document (PIC^v3). His format description clued me in to a few key details that let me get started much more quickly.
“Darkpanda”, for their post over on the CivFanatics forum identifying the compression format as LZW, once again saving several steps and a lot of time in the process.

Final words

That pretty much sums up all the knowledge of the MicroProse PIC file format that we’ve accumulated over the last 4 months. The next few posts will concentrate on pulling together all the prototype code we’ve written to produce a fully functional PIC library capable of reading and writing any of the versions. Along with it, we’ll produce some example/utility applications that use the library to convert to and from more convenient file formats.

By Thread

File Formats, MAX File Format, MicroProse, PIC File Format, Reverse Engineering

Posted by:

canadianavenger

Timeline

7 responses to “PIC as we know it”

andreas86x

September 18, 2024 at 3:45 pm

The amount of research and dedication you have put into this project is absolutely outstanding! And it is a real service to the preservation, and by extension the value, of our favorite games! To finally be able to see their artwork in its purest form, to learn how the visuals were built up (sprites, layers, etc.), and also to create new content in the form of big mods or just small changes that can improve the experience. I am very thankful for all your time and effort here, and can see many good things to come for the future of these games, as I am sure others do too!

RedMike

April 16, 2025 at 3:07 am

I really wish I’d found this before spending a week figuring out just the PIC format used in Sid Meier’s Covert Action (PICv2) and the LZW quirks they have. Really useful info here.

FYI I’m now trying to figure out the PAN format from Sid Meier’s Covert Action, which seems to be PIC-adjacent. As near as I can tell, it contains multiple PIC-format images, sometimes preceded by a bunch of custom data (always 502 bytes) that defines things like the background colour for transparency, and always followed by a bunch of custom data at the end that seems to be animation info (what to display after which frame skips). I’ve spent a good while on it now, and while I know which bytes seem to encode X/Y/image ID for the animation data, I can’t figure out how that image ID maps to the actual images (it’s clearly not offsets or a numeric index). I’ll reply back with any more info if I figure it out, since it’s definitely PIC-adjacent.

canadianavenger

April 16, 2025 at 8:58 am

Indeed it is related to the PIC format. I’ve looked at it briefly, and have extracted a frame or two, but have not done an in-depth reversal of the format yet. It is on my list of formats to visit in the future though.

reallybff67fce3d

April 17, 2025 at 4:45 am

I’ve spent a good few hours on it so far and the most I’ve managed to really do is just extract the images themselves, but the animation data itself has been really hard to figure out.

There’s a data block at the end of each file that seems to encode X/Y positions, movement types, and somehow encodes which image to use, but as far as I can tell it’s not an offset to which image or a counter (unless the counter doesn’t start at 0/1).

Changing X/Y position bytes seems easy enough and has the expected behaviour in-game. But as soon as I touch what looked like image IDs I start to get really bizarre behaviours (images changing from one frame to another, the movement changing as well as the image changing, etc).

The individual elements in that block reference each other via offsets from the start of the data block, that much I figured out, and individual elements start with 0x05. But they seem to be a variable size with no clear length byte/end marker, so I can’t even figure out how the program would figure out how many entries there even are (unless that’s embedded in the EXE file, which I hope it isn’t), or where the data block starts. Maybe it’s compressed in some way that isn’t immediately apparent to me.

I’m probably going to take a break from this for a bit and come back to it later on.
canadianavenger

April 17, 2025 at 9:07 am

I doubt any of the information is hard-coded in the EXE. But I have seen MicroProse compute things like element counts at read time, rather than encoding them directly in the data. I’ll see if I can take a quick poke at the format again this weekend to see if I can spot anything that might be of help. (I’m also wondering if PAN has various versions as PIC did, as the format was used with several titles.)

reallybff67fce3d

April 17, 2025 at 9:40 am

The EXE file has a ton of game info embedded in itself rather than in separate files; that’s why I said that. It’s not offsets and stuff, but e.g. the CRIME.DTA files define the participant/events/etc that are generic, but the mission sets that combine those CRIME.DTA files to become specific (what gets stolen/etc) are just defined in the EXE directly, not as a separate file. I’m working on a modding editor for it, and that’s been an unwelcome surprise because it means the editor might have to modify the EXE to change the game in meaningful ways. But hopefully not relevant for PAN.

I don’t know about versions but I’m also not sure if any other MicroProse game has PAN files? I don’t have a lot of them so I can’t really check.

If you want to have a look, just to share the info I’ve got so far if it helps speed some things up:
- TITLE2/CREDITS/BLDNG have a 502 byte data segment after a type/subtype uint16/byte before the first image starts with 07 00; they have type/subtype 1 and 2, 1 and 0, or 4/5 and 2
  - The only info I’ve figured out is that on TITLE2, the first byte in that data segment is the DOS palette colour to use as the ‘background’ for transparent drawing
- BUSTOUT has type/subtype 3 and 1 but starts with an image directly (although a weird one) as 07 00
- The rest are all type/subtype 1 and 1 and start with an image directly as 07 00
- There’s no apparent separator between images, but you can pretty reliably detect 07 00 and use the width/height in the image to figure out the expected size as you decode it.
- Some of the images have an odd width (e.g. 191) which then decodes wrong (diagonal skewing) unless you add 1 to it. I’m assuming that’s a marker for something on the image, maybe whether to clear the area before drawing on top or not (it seems to be moving images that have this)
- There are a couple places where 07 00 shows up in the image data itself so identifying 07 00 is not enough
- After all the images there’s a segment that starts with 00 05 (and probably until EOF) that seems to contain info on what images should show up when/where
  - Based on TITLE2, chunks start with 00 05 followed by some number of XX YY ZZ 05 until 00 00 00 00; in TITLE2 the first one of these has 13 entries, which I think map to the 13 silhouettes that move at the start of the intro (based on changing parts of it and examining the result). Different chunks seem to have different sizes between 00 05 and the next 00 05, but it’s unclear why.
  - The first set of YY ZZ in the row is a pointer from the start of this section and points to an 05 further down the list; changing this value to another entry makes the entry look and behave like the other entry, though it at least keeps its current position.
  - After these there’s chunks of 05 XX YY ZZ WW where WW seems to repeat in a pattern, in TITLE2 it’s consistently 2 then 4 then 2 then 1, and the number of entries like this divides by 4 correctly. Changes to the values seem to change the behaviour in very odd ways (appearing/disappearing sooner or later than expected, changing which image is shown, etc). I _thought_ there might be 4 of these for each previous entry but that’s not accurate. None of these seem to be offsets.
- The initial data segment is missing for most animations so it’s not important to figuring out the base properties, so looking at just the final data segment, I haven’t seen anything that could be a pointer to where an image starts in the file (or even an offset from the start of the first image, or a counter of the Xth image in the file).
  - I did make some headway by changing e.g. BD 04 to DF 04, which makes the ‘C’ in the intro Covert Action text turn into an ‘o’; changing DF 04 to 01 05 changed the ‘o’ to a ‘v’.
  - And I realised that BD 04, taken as an offset from the start of this data segment, points to 0x2375, which is the start of an 05 chunk, and changing data in that chunk still affected the ‘C’ in the intro
  - But changing the data in that 0x2375 chunk seemed to be bizarre, like it would have odd effects (change the image multiple times from C to other images, trigger a different movement, make it pop in/out way too early/late, etc)
  - From there I couldn’t find a way to map it to which image in the set it should use.
  - I also haven’t found anything like a “count” of these sections, or a consistent way to move backwards into the file from EOF and not overshoot into image data, other than potentially looking for 00 05 (but some of the PAN files do have that in image data I believe, although maybe not if uncompressed first)
Sorry, bit of a wall of text, but I don’t really have another way to send this info across.
canadianavenger

April 17, 2025 at 10:29 am

Oh, I didn’t mean to suggest that there was no data hard-coded in the EXE, only that I doubt any PAN-related data is (with the possible exception of the palette and a fixed transparency colour). There most certainly is hard-coded data, though I suspect it is mostly gameplay related. As for the PAN format itself, I did take a quick look, and the version with CA does appear to be different from the other ones I’ve looked at, so there are at least 2 variants, possibly more, as the format is used with several titles. Thanks for sharing what you have discovered; no worries about the long text. When I poked my head into a CA .PAN file, I was able to quickly spot a PICv2 header after a long chunk of zero bytes. You’re welcome to reach out to me on Discord if you like; my username is .canadianavenger
