Time to stop procrastinating and distracting myself with other formats, and put the MicroProse PIC file format to rest – at least with what we know about it so far. This post serves as a formal document for the PIC format, covering everything we know so far, and possibly making a few changes too. If you haven’t followed along with the whole adventure over the past four months, you may want to read that first; you can find all the PIC related posts here, though I will try to link to relevant posts as I go. With that said, let’s get down to business.
Fixing the fragility problem
Before we get into the technical details of the MicroProse PIC file format, I want to make one last change to the naming, so that we don’t have problems down the road (hopefully). My naming scheme, which used the asset dates or release year to mark each variant, had a flaw: we kept finding earlier and earlier instances, so we had to rename the variants a couple of times. Not that long ago I made a post here where we made the latest change to the names. In that post I said I was freezing the names, and would use caveats to explain any date based discrepancies going forward. That was not the first time we had to rename the variants, however, and that last change has been sitting at the back of my brain ever since, like an itch I had to scratch. This is also largely driven by the finding that the PC-98 version of Civilization was using the same PIC format as Railroad Tycoon Deluxe. So I’ve decided to abandon the date based naming altogether and go with something new, as you’ll see below.
The MicroProse PIC Image File Format
Contents
- Foreword
- Version 1 (PICv1)
- Version 2 (PICv2)
- Version 3 (PICv3)
- PC-98 Version (PIC98)
- PIC Compression
- PIC Aliases
- Companion Formats
- PIC Adjacent
- References
Foreword
The following document represents about four months of research and work to reverse engineer the MicroProse PIC file format used with many of their game titles from the late 1980s through the mid 1990s. The result is what I believe to be the most comprehensive and complete documentation on the format to date.
I have documented every variation of the format I was able to find, predominantly for the PC / DOS platform, though it does extend to the PC-98 platform as well. Unfortunately I have not had the opportunity to examine assets from ports of these titles to other platforms to see if they still used PIC, or perhaps even a new version of PIC (as we found with PC-98). My hope in publishing this is to help preserve these files and some of the amazing artwork trapped within them, as well as to enable modding of these old titles so fans can breathe new life back into them.
Version 1 (PICv1)
PICv1, or PIC version 1, is the earliest form of the PIC format we’ve seen, and the first version of the format we looked at. We previously identified this version as PIC88; it is the one variant that didn’t change names over the course of our discoveries, until now. So far we’ve identified the following MicroProse titles as using the PICv1 format.
| Game Title | Platform | Release Year | Asset Date |
|---|---|---|---|
| F-15 Strike Eagle II | PC | 1989 | Jun 89 |
| F-15 Strike Eagle II Desert Storm | PC | 1991 | Mar 91 |
| F-19 Stealth Fighter | PC | 1988 | May 88 |
| F-117A Stealth Fighter | PC | 1991 | Aug 91 |
| Gunship 2000 | PC | 1991 | Sep 91 |
| M1 Tank Platoon* | PC | 1989 | |
*See the “PIC Adjacent” section below for details on PIC format usage in M1 Tank Platoon.
PICv1 Structure
PICv1 is the most basic form of the PIC format; for the most part, all the subsequent versions (except PIC98) essentially wrap themselves around it like the layers of an onion. The file consists of a single byte format identifier followed by an RLE+LZW compressed stream of data. PICv1 has no way to specify the dimensions of the image, so this information needs to be provided externally in order to properly decode a PICv1 image (the most common resolution is 320×200).
typedef struct {
int8_t format; // Format Identifier
uint8_t lz_data[]; // RLE+LZW compressed stream
} mp_picV1_t;
Format Identifier
The format identifier is an 8bit signed value whose absolute value is the maximum code width for the LZW compressed stream that follows. The sign of the value indicates whether the data uses a 4bit packed or a linear pixel format: a positive value indicates pixels are packed two per byte (4bit packed), while a negative value indicates a linear arrangement (8 bits per pixel). The pixel packing arrangement is discussed in the “PIC Compression” section below. (In subsequent versions of PIC this value is always positive and does not indicate the pixel packing arrangement, only the maximum LZW code width.) The most common identifier values we have seen are:
0B (11) and F5 (-11)
Both values indicate a maximum code width of 11 bits, though other values have been seen as well. Valid range is 9 to 11 (-9 to -11).
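As a sketch of how a decoder might interpret this byte, the C function below splits the signed identifier into the max-bits value and the packing flag, rejecting values outside the 9 to 11 range seen so far. The names picv1_parse_format and picv1_format_t are hypothetical, not taken from any MicroProse code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded form of the PICv1 format identifier. */
typedef struct {
    int max_bits;  /* maximum LZW code width (absolute value of the byte) */
    bool packed4;  /* true: 4bit packed pixels, false: 8bit linear pixels */
} picv1_format_t;

/* Interprets the first byte of a PICv1 file. Returns false for values
   outside the observed 9..11 (or -9..-11) range. */
bool picv1_parse_format(uint8_t raw, picv1_format_t *out) {
    /* Reinterpret the raw byte as signed without relying on an int8_t cast. */
    int format = (raw < 0x80) ? (int)raw : (int)raw - 256;
    int bits = (format < 0) ? -format : format;

    if (bits < 9 || bits > 11)
        return false;
    out->max_bits = bits;
    out->packed4 = (format > 0);
    return true;
}
```

For example, 0x0B yields max_bits 11 with packed pixels, while 0xF5 yields max_bits 11 with linear pixels.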
Compressed stream
The lz_data field contains the variable length RLE+LZW compressed data. This compression scheme is common to PICv1 through PICv3; see the “PIC Compression” section below for further detail.
Version 2 (PICv2)
PICv2, or PIC version 2, is the first evolution of the PIC format we encountered in our timeline. We initially identified this version as PIC90 and later as PIC89, and first took a closer look at it here. PICv2 appears to have several sub-types that control how the data is encoded and what additional data is included. So far we’ve identified the following MicroProse titles as using the PICv2 format.
| Game Title | Platform | Release Year | Asset Date |
|---|---|---|---|
| Sid Meier’s Covert Action | PC | 1990 | May 90 |
| Sid Meier’s Railroad Tycoon | PC | 1990 | Feb 90 |
| Silent Service II | PC | 1990 | Jun 90 |
| Sword of the Samurai | PC | 1989 | |
PICv2 Structure
With PICv2 we have a more complete structure, including the image dimensions within the file. The header below effectively defines PICv2 and is common to all sub-types. PICv2 uses the same RLE+LZW compression scheme as PICv1. In fact, PICv2 is essentially a wrapper around PICv1, except that the format identifier is now just a max-bits indicator for the LZW code width; the pixel packing method is now defined by the sub-type.
Header
The PICv2 header is 6 bytes long and is common to all the PICv2 sub-types. All PICv2 files start with this 6 byte header.
typedef struct {
uint16_t sub_type; // sub-type identifier
uint16_t width; // image width in pixels
uint16_t height; // image height in pixels
uint8_t data[]; // remainder of PIC sub-type data
} mp_picV2_t;
Sub-Type identifier
The sub_type field is a 16bit value that identifies the sub-type of the image. To date we’ve discovered 4 different sub-type values: 0x06, 0x07, 0x0E, and 0x0F. These types seem to exist in pairs, denoting whether the pixel encoding is packed or linear. Generally speaking, it appears that if the least significant bit of the type value is 1 the pixel data is 4bit packed, and if it is 0 the data is 8bit linear. We’ll go through each of the sub-types below.
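A decoder could turn these observations into flags with a sketch like the one below. It deliberately matches only the four known values instead of testing individual bits, since the meaning of the other bits is not established. The names picv2_parse_subtype and picv2_subtype_t are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of what a PICv2 sub-type implies. */
typedef struct {
    bool packed4;     /* 4bit packed pixels (odd sub-types) vs 8bit linear */
    bool has_dither;  /* 16 byte CGA dither table precedes max_bits */
} picv2_subtype_t;

/* Returns false for sub-types not seen so far, rather than guessing. */
bool picv2_parse_subtype(uint16_t sub_type, picv2_subtype_t *out) {
    switch (sub_type) {
    case 0x06:
    case 0x07:
        out->has_dither = false;
        break;
    case 0x0E:
    case 0x0F:
        out->has_dither = true;
        break;
    default:
        return false;
    }
    out->packed4 = (sub_type & 0x01) != 0;  /* LSB set: 4bit packed */
    return true;
}
```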
Image Dimensions
The image width and height are 16bit values representing the pixel dimensions of the image.
Sub-Types 06 and 07
Types 6 and 7 are the most basic sub-types we’ve seen, and define the rest of the data field above as being laid out the same way as a PICv1 file. Type 6 files contain 8bit linear pixel data, and Type 7 files contain 4bit packed pixel data.
typedef struct { // PICv2 sub-type 6 & 7 structure
uint8_t max_bits; // maximum code width for LZW data
uint8_t lz_data[]; // RLE+LZW compressed stream
} mp_picV2_basic_data_t;
Max Bits
The max_bits field is an 8bit unsigned value that replaces the format identifier of the PICv1 structure as the prelude to the LZW compressed stream. As the value is unsigned now, it can only represent positive values, and defines the maximum number of bits for a code in the LZW compressed stream. Typical value here is 11 (0x0b) but the valid range is 9-11.
Compressed stream
The lz_data field contains the variable length RLE+LZW compressed data. This compression scheme is common to PICv1 through PICv3; see the “PIC Compression” section below for further detail.
Sub-Types 0E and 0F
Types E and F add some more metadata to the image in the data field above. In this case a CGA dithering table is included for down-converting 16 colour EGA images to 4 colour CGA. The dithering table is followed by the same data as a PICv1 file. Type E files contain 8bit linear pixel data, and Type F files contain 4bit packed pixel data.
typedef struct { // PICv2 sub-type E & F structure
uint8_t cga_dither[16]; // 16 entry dithering table
uint8_t max_bits; // maximum code width for LZW data
uint8_t lz_data[]; // RLE+LZW compressed stream
} mp_picV2_dithered_data_t;
Dithering table
The cga_dither field is a 16 byte table that specifies how to render each of the 16 EGA colours in 4 colour CGA mode. Each 8 bit entry is split into 2 nibbles, where each nibble specifies one of the four CGA colours to use for the given EGA colour. Which of the two is used depends on the pixel’s X-Y location on the screen: on even numbered lines the low nibble is used for even pixel addresses and the high nibble for odd pixel addresses, and on odd numbered lines the relationship is reversed, producing the basic checkerboard dither effect. If no dithering is wanted for a colour, both the low and high nibbles need to be set to the same value.
px = (((x ^ y) & 0x01) == 0) ? (cga_dither[px] & 0x03) : ((cga_dither[px] >> 4) & 0x03);
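Applied to a whole image, the lookup might look like the sketch below. It assumes the EGA image has already been unpacked to one pixel per byte; the function names picv2_cga_pixel and picv2_dither_image are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>

/* Maps one 4bit EGA colour at screen position (x, y) to a 2bit CGA colour:
   the low nibble is used where (x ^ y) is even, the high nibble otherwise. */
uint8_t picv2_cga_pixel(const uint8_t cga_dither[16], uint8_t ega,
                        unsigned x, unsigned y) {
    uint8_t entry = cga_dither[ega & 0x0F];
    return (((x ^ y) & 1u) == 0) ? (uint8_t)(entry & 0x03)
                                 : (uint8_t)((entry >> 4) & 0x03);
}

/* Converts a whole image of unpacked EGA pixels (one per byte) to CGA colours. */
void picv2_dither_image(const uint8_t *ega, size_t width, size_t height,
                        const uint8_t cga_dither[16], uint8_t *cga) {
    for (size_t y = 0; y < height; y++)
        for (size_t x = 0; x < width; x++)
            cga[y * width + x] = picv2_cga_pixel(cga_dither, ega[y * width + x],
                                                 (unsigned)x, (unsigned)y);
}
```

With an entry of 0x21, for instance, the colour alternates between CGA colours 1 and 2 in a checkerboard.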
Max Bits
The max_bits field is an 8bit unsigned value that replaces the format identifier of the PICv1 structure as the prelude to the LZW compressed stream. As the value is unsigned now, it can only represent positive values, and defines the maximum number of bits for a code in the LZW compressed stream. Typical value here is 11 (0x0b) but the valid range is 9-11.
Compressed stream
The lz_data field contains the variable length RLE+LZW compressed data. This compression scheme is common to PICv1 through PICv3; see the “PIC Compression” section below for further detail.
Consolidated Reference Structures
For reference, here is what the combined headers look like for each of the PICv2 sub-type groups. These structures should be considered unaligned and packed, meaning that there is no additional padding between any of the members. All members are placed in the order in which they appear in the file.
typedef struct { // Full structure for PICv2 types 6 & 7
uint16_t sub_type; // sub-type identifier
uint16_t width; // image width in pixels
uint16_t height; // image height in pixels
uint8_t max_bits; // maximum code width for LZW data
uint8_t lz_data[]; // RLE+LZW compressed stream
} mp_picV2_basic_t;
typedef struct { // Full structure for PICv2 types E & F
uint16_t sub_type; // sub-type identifier
uint16_t width; // image width in pixels
uint16_t height; // image height in pixels
uint8_t cga_dither[16]; // 16 entry dithering table
uint8_t max_bits; // maximum code width for LZW data
uint8_t lz_data[]; // RLE+LZW compressed stream
} mp_picV2_dithered_t;
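Because the structures above are packed, a decoder usually reads the fields byte by byte rather than casting a struct over the buffer. The sketch below does that for both sub-type groups. It assumes the 16bit fields are little-endian (the native x86 byte order of the DOS targets), which is not stated explicitly above; picv2_parse and picv2_info_t are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parsed view of a PICv2 file held in memory. */
typedef struct {
    uint16_t sub_type, width, height;
    const uint8_t *cga_dither;  /* 16 entries for sub-types E/F, NULL for 6/7 */
    uint8_t max_bits;
    const uint8_t *lz_data;     /* start of the RLE+LZW stream */
    size_t lz_len;
} picv2_info_t;

/* Assumes little-endian 16bit fields (the native x86 byte order). */
static uint16_t rd16le(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Returns 0 on success, -1 on truncated input or an unknown sub-type. */
int picv2_parse(const uint8_t *buf, size_t len, picv2_info_t *info) {
    size_t off = 6;
    if (len < off + 1)
        return -1;
    info->sub_type = rd16le(buf);
    info->width = rd16le(buf + 2);
    info->height = rd16le(buf + 4);

    if (info->sub_type == 0x0E || info->sub_type == 0x0F) {
        if (len < off + 16 + 1)
            return -1;
        info->cga_dither = buf + off;
        off += 16;
    } else if (info->sub_type == 0x06 || info->sub_type == 0x07) {
        info->cga_dither = NULL;
    } else {
        return -1;
    }
    info->max_bits = buf[off++];
    info->lz_data = buf + off;
    info->lz_len = len - off;
    return 0;
}
```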
Version 3 (PICv3)
PICv3, or PIC version 3, is the final evolution of the PIC format we’ve seen that uses the original LZW compression scheme, and it is the most widely used version we’ve seen to date. We initially identified this version as PIC91 and later as PIC90, and first took a closer look at it here. PICv3 is the first version with a fully tagged structure, allowing various types of image related data to be encoded alongside the pixel data itself. So far we’ve identified the following MicroProse titles as using the PICv3 format.
| Game Title | Platform | Release Year | Asset Date |
|---|---|---|---|
| Darklands | PC | 1992 | Jun 92 |
| F14 Fleet Defender | PC | 1994 | Feb 94 |
| F-15 Strike Eagle III | PC | 1992 | Sep 92 |
| Hyperspeed | PC | 1991 | Jul 90 |
| Knights of the Sky | PC | 1990 | |
| Lightspeed | PC | 1990 | Jul 90 |
| Magic: The Gathering | PC | 1997 | |
| Sid Meier’s Civilization | PC | 1991 | Nov 91 |
PICv3 Structure
A PICv3 file consists of one or more tagged blocks of data. Each block begins with the same common header identifying the block type via its tag, and its length. A valid PIC file must contain one of the image block types, and can optionally contain one or more of the other defined block types, but only one block of any given base type. The blocks may appear in any order.
Block Header
Each block starts with a common 4 byte header identifying the block’s type and length.
typedef struct { // PicV3 General Block Header
char block_id[2]; // block tag
uint16_t length; // length of the block
uint8_t data[]; // remaining block data
} mp_picV3_block_t;
Block Tag
The block_id field is a 2 byte character array containing the tag identifier for the block. The first character of the tag indicates the base type of the block, and the second character indicates a sub-type. To date only 4 different base block types have been encountered (5 distinct tags if you count the sub-type), and so far only the base image “X” type has an additional sub-type. The pattern for the tags appears to be an upper case letter for the base type and a single digit for the sub-type. With that, the tags we have seen so far are “C0”, “E0”, “M0”, “X0”, and “X1”. Only one of the image types can appear in a single file.
Length
The length field is a 16bit value containing the length in bytes of the remainder of the block (so it does not include the tag or length fields). This value can be used to skip ahead to the next block if the current one is not needed.
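Skipping from block to block by length is enough to locate any block in a file. A minimal sketch, assuming the length is little-endian as on the original x86 targets; picv3_count_blocks and picv3_find_block are hypothetical helpers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumes the 16bit length is little-endian, as on the original x86 targets. */
static uint16_t rd16le(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Counts the blocks in a PICv3 file, or returns -1 if a block header or
   payload runs past the end of the buffer. */
int picv3_count_blocks(const uint8_t *buf, size_t len) {
    int count = 0;
    size_t off = 0;
    while (off < len) {
        if (len - off < 4)
            return -1;
        uint16_t blen = rd16le(buf + off + 2);
        if (len - off - 4 < blen)
            return -1;
        off += 4 + (size_t)blen;  /* skip tag, length and payload */
        count++;
    }
    return count;
}

/* Returns the payload of the first block tagged `tag` (e.g. "M0") and stores
   its length in *out_len, or returns NULL if absent or malformed. */
const uint8_t *picv3_find_block(const uint8_t *buf, size_t len,
                                const char *tag, uint16_t *out_len) {
    size_t off = 0;
    while (off + 4 <= len) {
        uint16_t blen = rd16le(buf + off + 2);
        if (len - off - 4 < blen)
            return NULL;
        if (memcmp(buf + off, tag, 2) == 0) {
            *out_len = blen;
            return buf + off + 4;
        }
        off += 4 + (size_t)blen;
    }
    return NULL;
}
```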
Block Types C0 and E0 – CGA and EGA Dithering
I’ve grouped types C0 and E0 together here because they share the same physical layout and perform the same basic function, just for different video modes: type C0 is for CGA dithering, while type E0 is for EGA dithering. The function is identical to that of sub-types E and F in PICv2. This block holds dithering data for converting an image of up to 256 colours down to 16 colours for EGA, or 4 colours for CGA.
typedef struct { // CGA and EGA dither maps
uint8_t first; // index of first dither entry
uint8_t last; // index of last dither entry
uint8_t dither_map[]; // last-first+1 dithering entries
} mp_picV3_dither_t;
Index Range
The first and last fields of the dithering block are 8 bit values denoting the start and end indices of the 8bit dithering data that follows, giving a range of 0-255. The number of entries in the table is last-first+1, so the table can contain any contiguous subset of the 256 colour range, up to the full 256 entries.
Dithering table
The dither_map field is a table that specifies how to dither each of the image colours (up to 256) in 16 colour EGA mode for the E0 block, and in 4 colour CGA mode for the C0 block. Each 8 bit entry is split into 2 nibbles, where each nibble specifies one of the destination mode colours (CGA or EGA) to use for the given colour. Which of the two is used depends on the pixel’s X-Y location on the screen: on even numbered lines the low nibble is used for even pixel addresses and the high nibble for odd pixel addresses, and on odd numbered lines the relationship is reversed, producing the basic checkerboard dither effect. If no dithering is wanted for a colour, both the low and high nibbles need to be set to the same value. For the CGA map the nibble values range from 0-3, and for the EGA map they range from 0-15. The number of entries in the table is last-first+1, meaning that each table can hold up to 256 entries.
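// dither_map is indexed relative to first, so this assumes first <= px <= last
px = (((x ^ y) & 0x01) == 0) ? (dither_map[px - first] & 0x0F) : ((dither_map[px - first] >> 4) & 0x0F);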
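A bounds-checked version of this lookup might look like the sketch below. It takes the block payload that follows the 4 byte block header (first, last, then the dither map) and indexes the map relative to first. What the game does with colours outside first..last is not covered above, so the sketch simply reports them as -1; picv3_dither_lookup is a hypothetical name.

```c
#include <stdint.h>

/* Looks up the dithered colour for `colour` at (x, y) in a C0 or E0 block
   payload (first, last, dither_map[]). Returns the nibble (0-3 for CGA, 0-15
   for EGA), or -1 if the payload is truncated or the colour is not covered. */
int picv3_dither_lookup(const uint8_t *payload, uint16_t len, uint8_t colour,
                        unsigned x, unsigned y) {
    if (len < 2)
        return -1;
    uint8_t first = payload[0], last = payload[1];
    if (last < first || len < 2u + (unsigned)(last - first) + 1u)
        return -1;
    if (colour < first || colour > last)
        return -1;  /* behaviour for uncovered colours is not documented */
    uint8_t entry = payload[2 + (colour - first)];
    return (((x ^ y) & 1u) == 0) ? (entry & 0x0F) : ((entry >> 4) & 0x0F);
}
```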
Block Type M0 – Palette data
The M0 palette block holds up to 256 colour entries defining the palette for the image. The presence of this block removes the need to provide an external palette for 8bit colour images, though its presence is not guaranteed for any image; the game can, and often does, rely on an already established palette. The format of the data is the same as a bare .PAL file, except that it includes start and end indices, allowing sub-ranges to be defined.
typedef struct {
uint8_t r; // 8bit Red component value (0-63)
uint8_t g; // 8bit Green component value (0-63)
uint8_t b; // 8bit Blue component value (0-63)
} pal_t;
typedef struct { // PicV3 Palette Block
uint8_t first; // index of first palette entry
uint8_t last; // index of last palette entry
pal_t palette_data[]; // last-first+1 RGB entries
} mp_picV3_palette_t;
Index Range
The first and last fields of the palette block are 8 bit values denoting the start and end indices of the RGB table that follows, giving a range of 0-255. The number of entries in the table is last-first+1, so the table can contain any contiguous subset of the 256 colour range, up to the full 256 entries.
Palette Data
Each entry in the palette_data table consists of three 8 bit values for red, green, and blue respectively. The values typically range from 0-63 (the range of the standard VGA 18bit DAC), but in some later titles they may range from 0-255 as 24bit DACs became more common.
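Most modern image tools expect 0-255 components, so a converter typically widens the 0-63 values. The sketch below copies an M0 payload into a 256 entry table, optionally scaling 6bit values with (v << 2) | (v >> 4), which maps 63 to 255. Whether a given title uses 6bit or 8bit values has to be decided by the caller; picv3_read_palette is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Copies an M0 payload (first, last, RGB triples) into rgb[first..last].
   If scale6 is non-zero, 0-63 components are widened to 0-255.
   Returns the number of entries copied, or -1 if the payload is malformed. */
int picv3_read_palette(const uint8_t *payload, uint16_t len, int scale6,
                       uint8_t rgb[256][3]) {
    if (len < 2)
        return -1;
    uint8_t first = payload[0], last = payload[1];
    if (last < first)
        return -1;
    size_t count = (size_t)(last - first) + 1;
    if (len < 2 + count * 3)
        return -1;
    for (size_t i = 0; i < count; i++) {
        for (int c = 0; c < 3; c++) {
            uint8_t v = payload[2 + i * 3 + (size_t)c];
            if (scale6) {
                v &= 0x3F;
                v = (uint8_t)((v << 2) | (v >> 4));
            }
            rgb[first + i][c] = v;
        }
    }
    return (int)count;
}
```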
Block Types X0 and X1 – Image data
The X0 and X1 block types contain the compressed image data: X0 holds 8bit data at 1 pixel per byte, and X1 holds 4bit data at 2 pixels per byte. The image block also carries the image dimensions and the max-bits field for the LZW compression. PICv3 uses the same RLE+LZW compression scheme as PICv1. In fact, the image block is essentially a wrapper around PICv1, except that the format identifier is now just a max-bits indicator for the LZW code width; the pixel packing method is defined by the block type.
typedef struct { // PicV3 Image Block
uint16_t width; // image width in pixels
uint16_t height; // image height in pixels
uint8_t max_bits; // maximum code width for LZW data
uint8_t lz_data[]; // RLE+LZW compressed stream
} mp_picV3_image_t;
Image Dimensions
The image width and height are 16bit values representing the pixel dimensions of the image.
Max Bits
The max_bits field is an 8bit unsigned value that replaces the format identifier of the PICv1 structure as the prelude to the LZW compressed stream. As the value is unsigned now, it can only represent positive values, and defines the maximum number of bits for a code in the LZW compressed stream. Typical value here is 11 (0x0b) but the valid range is 9-11.
Compressed stream
The lz_data field contains the variable length RLE+LZW compressed data. This compression scheme is common to PICv1 through PICv3; see the “PIC Compression” section below for further detail.
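To size the output buffer before decompressing, a decoder needs to know how many bytes the RLE+LZW stream should expand to. A hypothetical sketch that parses an X0/X1 payload, assuming little-endian fields and the odd-width line padding described under Pixel Packing (picv3_parse_image and picv3_image_t are illustrative names):

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t width, height;
    uint8_t max_bits;
    int packed4;             /* 1 for X1 (4bit packed), 0 for X0 (8bit) */
    const uint8_t *lz_data;
    size_t lz_len;
    size_t decoded_size;     /* bytes expected after LZW and RLE decoding */
} picv3_image_t;

/* Assumes little-endian 16bit fields, as on the original x86 targets. */
static uint16_t rd16le(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Parses the payload of an X0/X1 block; `sub` is the tag's second character. */
int picv3_parse_image(char sub, const uint8_t *payload, uint16_t len,
                      picv3_image_t *img) {
    if (len < 5 || (sub != '0' && sub != '1'))
        return -1;
    img->width = rd16le(payload);
    img->height = rd16le(payload + 2);
    img->max_bits = payload[4];
    img->packed4 = (sub == '1');
    img->lz_data = payload + 5;
    img->lz_len = (size_t)len - 5;
    /* Packed lines are padded to a whole byte, so odd widths round up. */
    size_t stride = img->packed4 ? ((size_t)img->width + 1) / 2 : img->width;
    img->decoded_size = stride * img->height;
    return 0;
}
```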
Consolidated Reference Structures
For reference, here is what the combined headers look like for each of the PICv3 blocks. These structures should be considered unaligned and packed, meaning that there is no additional padding between any of the members. All members are placed in the order in which they appear in the file.
typedef struct { // Full structure for PICv3 "C0"/"E0" block (dithering)
char block_id[2]; // tag "C0" or "E0"
uint16_t length; // length of the block
uint8_t first; // index of first dither entry
uint8_t last; // index of last dither entry
uint8_t dither_data[]; // last-first+1 dithering entries
} mp_picV3_dither_block_t;
typedef struct { // Full structure for PICv3 "M0" block (palette)
char block_id[2]; // tag "M0"
uint16_t length; // length of the block
uint8_t first; // index of first palette entry
uint8_t last; // index of last palette entry
pal_t palette_data[]; // last-first+1 RGB entries
} mp_picV3_palette_block_t;
typedef struct { // Full structure for PICv3 "X0"/"X1" block (image)
char block_id[2]; // tag "X0" or "X1"
uint16_t length; // length of the block
uint16_t width; // image width in pixels
uint16_t height; // image height in pixels
uint8_t max_bits; // maximum code width for LZW data
uint8_t lz_data[]; // RLE+LZW compressed stream
} mp_picV3_image_block_t;
PC-98 Version (PIC98)
PIC98, or PIC for PC-98, is a complete departure from the other PIC versions; it uses an entirely new structure and compression scheme. We first looked at PIC98 here, originally identified this version as PIC93, and later renamed it PIC91 when we discovered the connection with Civilization for PC-98. This version appears to be made specifically for, and closely tied to the capabilities of, the NEC PC-9801 (PC-98), hence the name I’ve chosen for it. Despite being made for the PC-98, this format did find its way back to the PC platform with Railroad Tycoon Deluxe. So far we’ve identified the following MicroProse titles as using the PIC98 format.
| Game Title | Platform | Release Year | Asset Date |
|---|---|---|---|
| Sid Meier’s Civilization | PC-98 | 1992 | |
| Sid Meier’s Railroad Tycoon* | PC-98 | 1991 | |
| Sid Meier’s Railroad Tycoon Deluxe* | PC | 1993 | Jun 93 |
*Railroad Tycoon Deluxe is a PC port of the PC-98 version of Railroad Tycoon (which was itself a port and graphical overhaul of the original PC title).
PIC98 Structure
PIC98 files consist of a well defined header followed by four image plane blocks of LZSS compressed data. This arrangement appears to only support 16 colours, though unlike EGA images, the 16 colours are fully programmable via a palette table found in the header. This is the first version of PIC to have a file signature that can be used to identify the format.
Header
PIC98 files begin with a fixed 24 byte header defining the dimensions and the palette for the image, as well as having a defining signature tag for the file.
typedef struct {
uint8_t r; // Red component value 0-15
uint8_t g; // Green component value 0-15
uint8_t b; // Blue component value 0-15
} pal_t;
typedef struct { // Pic98 Header
char sig[4]; // [00-"H8"-00] Pic98 signature
uint16_t width; // image width in pixels
uint16_t height; // image height in pixels
pal_t pal[16]; // RGB palette for this image (4 bits per component)
uint8_t data[]; // block data
} mp_pic98_t;
Signature
All PIC98 files begin with the sequence 00 “H8” 00, or in hex 00 48 38 00. It is unknown whether this value has any meaning, or whether it provides any additional definition of the structure of the format; only the “H8” version has been seen in the wild to date.
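Since this is the first PIC version with a signature, it is the easiest to detect. A minimal probe that also reads the dimensions following the signature, assuming little-endian fields as used by the PC-98's x86-family CPU; pic98_probe is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static const uint8_t PIC98_SIG[4] = {0x00, 0x48, 0x38, 0x00};  /* 00 "H8" 00 */

/* Returns true if the buffer starts with the PIC98 signature and, if so,
   reads the width and height that follow it (assumed little-endian). */
bool pic98_probe(const uint8_t *buf, size_t len,
                 uint16_t *width, uint16_t *height) {
    if (len < 8 || memcmp(buf, PIC98_SIG, sizeof PIC98_SIG) != 0)
        return false;
    *width = (uint16_t)(buf[4] | (buf[5] << 8));
    *height = (uint16_t)(buf[6] | (buf[7] << 8));
    return true;
}
```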
Image Dimensions
The image width and height are 16bit values representing the pixel dimensions of the image.
Palette
The palette for a PIC98 file always consists of sixteen RGB entries, with each component stored as an 8bit value. However, the valid range for each colour component is only 4 bits (0-15), as the PC-98 only had a 12bit DAC in its video system, giving a palette of 4096 possible colours (of which this format can use 16 at a time).
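When converting a PIC98 palette for a modern display, each 4bit component has to be widened to 8 bits. One common approach, sketched here with a hypothetical helper, replicates the nibble so the full 0-255 range is covered:

```c
#include <stdint.h>

/* Widens a 4bit PC-98 palette component (0-15) to 8 bits.
   Multiplying by 17 is the same as (v << 4) | v, so 15 maps to 255. */
static inline uint8_t pc98_scale4(uint8_t v) {
    return (uint8_t)((v & 0x0F) * 17);
}
```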
Plane Block
Each PIC98 file contains four plane blocks of data after the header. Each plane block is prefixed with its size, and compressed individually. One thing to note is that plane blocks must start on a 16bit word boundary, meaning that the previous block may need to be padded by a byte to maintain alignment. This is the first time we’ve seen enforcement of data alignment in any of the PIC versions.
typedef struct { // Pic98 Plane Block
uint16_t length; // length of lz_data for plane
uint8_t lz_data[]; // LZSS compressed plane data
} mp_pic98_plane_t;
Block Size
The length field is a 16bit value containing the length in bytes of the remainder of the block (so it does not include the length field itself). This value is needed in order to locate the next block, as there are no other identifying markers.
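Locating the four plane blocks therefore means following the length fields and re-aligning to an even offset before each block. The sketch below assumes little-endian lengths, offsets measured from the start of the file, and a padding byte that follows the payload without being counted in its length. The header size is passed in rather than hard-coded; pic98_find_planes is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Locates the four plane blocks that follow the header. Assumes
   little-endian lengths, file-relative offsets, and that any padding byte
   follows a payload and is not counted in its length. */
int pic98_find_planes(const uint8_t *buf, size_t len, size_t header_size,
                      const uint8_t *planes[4], uint16_t lengths[4]) {
    size_t off = header_size;
    for (int i = 0; i < 4; i++) {
        off = (off + 1) & ~(size_t)1;      /* round up to a 16bit boundary */
        if (off > len || len - off < 2)
            return -1;
        uint16_t n = (uint16_t)(buf[off] | (buf[off + 1] << 8));
        if (len - off - 2 < n)
            return -1;
        planes[i] = buf + off + 2;
        lengths[i] = n;
        off += 2 + (size_t)n;
    }
    return 0;
}
```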
Compressed Data
The lz_data for PIC98 uses a planar pixel arrangement that is then LZSS compressed. This is a major departure from the previous PIC versions, which use LZW compression for the image data. PIC98 also does not use an underlying RLE stage like the previous versions. The arrangement here appears to closely mimic the layout of the graphics hardware on the PC-9801 itself. Further detail can be found in the Planar Pixels and LZSS Compression subsections of the “PIC Compression” section below.
PIC Compression
All PIC files are compressed images. In the case of PICv1 – PICv3, multiple layers of compression are applied to minimize the file size. With the exception of PIC98, all PIC files use the same compression stack, which looks like the following (when compressing an 8bit image, the pixel packing stage is bypassed):
[PICv1-v3] <=> LZW Compression <=> RLE Encoding <=> Pixel Packing (4bit only) <=> [RAW Image]
For PIC98 files, different compression and packing methods are used, so the compression stack looks a bit different (note that PIC98 files are 4 bit only images):
[PIC98] <=> LZSS Compression × 4 <=> Pixel Planar <=> [RAW Image]
Details for each of the individual stages will be discussed below.
Pixel Packing (PICv1–PICv3)
In order to minimize the space used by a 16 colour (or fewer) image, pixels can be packed two to a byte, instantly cutting the storage size in half. PIC files do this by storing the leftmost pixel of a pixel pair in the low nibble (bits 0-3) of a byte, and the rightmost pixel of the pair in the high nibble (bits 4-7). Pixel packing is not (more accurately, cannot be) performed on 8 bit colour images. Pixel packing is the first stage in the compression pipeline when compressing a RAW image, and the last stage when decompressing. More detail on the packing and associated code can be found here.
One thing that didn’t come up in the discovery phase was how to handle odd pixel widths. While I didn’t mention it in my blog posts at the time, the code needs to account for this, as you can’t store just a nibble of data. It wasn’t an issue with any of the images we worked with, as those all tended to be full-screen images with an even width. As such, the ‘stride’ (bytes per line) of a packed image was always exactly half the width. This is not possible with an odd width, as each line would end in the middle of a byte. While you could start the first pixel of the next line in the remaining nibble, this does not make much sense, as it makes pixel addressing more difficult if the packed format is retained for processing. Instead, each line is padded out to an even boundary. This extra pixel’s worth of data is not valid image data and should be discarded when unpacking. When packing, the padding should be set to either 0 or whatever the background colour is. The net effect is that the packed stream of data looks like it is one pixel wider than what was specified.
int stride = (width + 1) / 2; // bytes per packed line, rounded up for odd widths
The size of the resultant packed stream will be stride * height bytes, not width * height / 2. When unpacking you don’t need to allocate extra space, unless you don’t discard the padding as you go. In that case you will need to allocate height extra bytes, and then come back and shift all the data to its correct position (or have your code handle stride and width separately at all times).
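A minimal sketch of packing and unpacking with a separate stride, following the nibble order and line padding described above. The function names are hypothetical, and the unpacked image is assumed to hold one pixel per byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes per packed line: odd widths are rounded up to a whole byte. */
static size_t pic_stride(size_t width) { return (width + 1) / 2; }

/* Packs one-pixel-per-byte values (0-15) two per byte: left pixel in the low
   nibble, right pixel in the high nibble. `pad` fills the spare nibble of
   odd-width lines (e.g. 0 or the background colour). */
void pic_pack4(const uint8_t *pixels, size_t width, size_t height,
               uint8_t pad, uint8_t *packed) {
    size_t stride = pic_stride(width);
    for (size_t y = 0; y < height; y++) {
        for (size_t x = 0; x < width; x += 2) {
            uint8_t lo = pixels[y * width + x] & 0x0F;
            uint8_t hi = (x + 1 < width) ? (pixels[y * width + x + 1] & 0x0F)
                                         : (pad & 0x0F);
            packed[y * stride + x / 2] = (uint8_t)(lo | (hi << 4));
        }
    }
}

/* Unpacks a stride-padded stream back to one pixel per byte, dropping the
   padding nibble at the end of odd-width lines. */
void pic_unpack4(const uint8_t *packed, size_t width, size_t height,
                 uint8_t *pixels) {
    size_t stride = pic_stride(width);
    for (size_t y = 0; y < height; y++) {
        for (size_t x = 0; x < width; x++) {
            uint8_t b = packed[y * stride + x / 2];
            pixels[y * width + x] = (x & 1) ? (uint8_t)(b >> 4)
                                            : (uint8_t)(b & 0x0F);
        }
    }
}
```

For a 3×2 image the packed stream is 4 bytes (stride 2), not 3.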
RLE Encoding (PICv1–PICv3)
The next level of compression used by PIC (Versions 1-3) files is to Run Length Encode (RLE) the data. RLE encoding takes runs of repeated values and replaces them with a shorter code that can then be used on the decompression side to recreate the longer run. RLE is a very lightweight and fast compression method. All PIC (Versions 1-3) images use RLE compression regardless of the pixel packing method.
The type of RLE encoding used by the PIC file format uses a special control token value (0x90 in this case) to indicate when a run is being encoded. An encoded run occupies 3 bytes in the stream with the following sequence:
VV 90 CC
Where VV is the repeated byte value in the run, and CC is the count (length) of the run minus one. That is, if the count is set to 5, the decoded run length is 6; conversely, to encode a run of 6 repeated bytes, you set the count to 5.
Because of this form of encoding, the token value needs to be reserved, and a means must also be provided to encode data bytes that have the same value as the token. This is done with a count value of 00, since it makes no sense to encode a run of one byte; in fact, it makes no sense to encode a run of fewer than 4 bytes, as that would result in expansion or no gain. This means that the following sequence:
90 00
Would be seen in the stream whenever a 90 is in the actual data being encoded. This unfortunately results in a slight expansion, but hopefully the chosen token is a rare value in the actual data stream, and the slight expansion is far outweighed by compression elsewhere. More detail on the RLE encoding can be found here.
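A decoder for this scheme, written to follow the conventions exactly as described above (a non-zero CC adds CC further copies of the byte just emitted, so the run is CC+1 long, and 90 00 is a literal 0x90), could be sketched as follows. The name pic_rle_decode is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define PIC_RLE_TOKEN 0x90

/* Expands PIC RLE data into dst. Returns the number of bytes written;
   decoding stops early if dst fills up or a token is truncated. */
size_t pic_rle_decode(const uint8_t *src, size_t src_len,
                      uint8_t *dst, size_t dst_cap) {
    size_t in = 0, out = 0;
    uint8_t last = 0;  /* most recently emitted byte (VV of a run) */

    while (in < src_len && out < dst_cap) {
        uint8_t b = src[in++];
        if (b != PIC_RLE_TOKEN) {
            dst[out++] = last = b;            /* ordinary literal */
            continue;
        }
        if (in >= src_len)
            break;                            /* token without a count */
        uint8_t count = src[in++];
        if (count == 0) {
            dst[out++] = last = PIC_RLE_TOKEN;  /* escaped literal 0x90 */
        } else {
            /* VV was already emitted as a literal; add CC more copies. */
            for (uint8_t i = 0; i < count && out < dst_cap; i++)
                dst[out++] = last;
        }
    }
    return out;
}
```

For example, 05 90 05 expands to six 05 bytes, and 01 90 00 02 expands to 01 90 02.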
LZW Compression (PICv1–PICv3)
The final stage in the compression pipeline for PICv1 to PICv3 images, or the first stage when decompressing, is Lempel-Ziv-Welch (LZW) compression, which we first talked about in this post. The LZW compressor is a pretty standard implementation of the algorithm; the only oddity to keep track of is that code 256 (0x100), which would be the first new code generated when compressing or decompressing, is reserved, so the actual first new code is 257. However, after a table reset, which happens when we reach the maximum code size and exhaust all the codes, code 256 is no longer reserved. The other aspect we’ve talked about with the PIC format is the max-bits value that precedes the LZW stream in every PIC file. This value sets the maximum code size for the given stream. The valid range looks to be 9-11 bits, with 11 bits being the most common setting in files from MicroProse.
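The sketch below shows a decoder with these properties. Several details are assumptions of the sketch rather than facts established above, and should be checked against real files: codes are packed LSB-first; the width starts at 9 bits and grows when the next free code reaches 1 << width (the Unix compress convention); the reserved code is 256 (the first code past the 256 single-byte literals), with 257 as the first new code; a reserved 256 in the stream ends decoding; and a reset returns to 9 bits with 256 as the next free code. The output still has to go through the RLE stage. pic_lzw_decode is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal bit reader: codes are assumed to be packed LSB-first. */
typedef struct { const uint8_t *src; size_t len; size_t bitpos; } bitreader_t;

static int read_code(bitreader_t *br, int width) {
    if (br->bitpos + (size_t)width > br->len * 8)
        return -1;                       /* not enough bits left for a code */
    int code = 0;
    for (int i = 0; i < width; i++) {
        size_t bit = br->bitpos + (size_t)i;
        if (br->src[bit >> 3] & (1u << (bit & 7)))
            code |= 1 << i;
    }
    br->bitpos += (size_t)width;
    return code;
}

/* Decodes the LZW layer only. Returns the number of bytes written to dst. */
size_t pic_lzw_decode(const uint8_t *src, size_t src_len, int max_bits,
                      uint8_t *dst, size_t dst_cap) {
    uint16_t prefix[2048];
    uint8_t suffix[2048], stack[4096];
    bitreader_t br = { src, src_len, 0 };
    int width = 9, next_code = 257, reserved = 1, prev = -1;
    uint8_t first_byte = 0;
    size_t out = 0;

    if (max_bits < 9 || max_bits > 11)
        return 0;
    while (out < dst_cap) {
        int code = read_code(&br, width);
        if (code < 0 || (reserved && code == 256))
            break;                       /* end of data, or reserved code */
        size_t sp = 0;
        int c = code;
        if (code >= next_code) {         /* not yet in the table (KwKwK case) */
            if (prev < 0 || code > next_code)
                break;                   /* corrupt stream */
            stack[sp++] = first_byte;
            c = prev;
        }
        while (c >= 256) {               /* walk the prefix chain to a literal */
            stack[sp++] = suffix[c];
            c = prefix[c];
        }
        stack[sp++] = (uint8_t)c;
        first_byte = (uint8_t)c;
        while (sp > 0 && out < dst_cap)  /* emit the string in forward order */
            dst[out++] = stack[--sp];
        if (prev >= 0) {                 /* new entry: prev + first byte */
            prefix[next_code] = (uint16_t)prev;
            suffix[next_code] = first_byte;
            next_code++;
        }
        prev = code;
        if (next_code >= (1 << width)) {
            if (width < max_bits) {
                width++;
            } else {                     /* table exhausted: reset */
                width = 9;
                next_code = 256;
                reserved = 0;
                prev = -1;
            }
        }
    }
    return out;
}
```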
Planar Pixels (PIC98)
With PIC98, pixels are arranged in an entirely different manner than in PICv1–PICv3. Here they are decomposed onto 4 separate planes, with each plane holding 1 bit of each pixel’s 4 bits, meaning that each byte on a plane holds a single bit from 8 different pixels. Within each plane byte, the most significant bit holds the leftmost of those 8 pixels. The planes are ordered within the PIC98 file from the least significant bit to the most significant bit, meaning that the plane holding bit 0 of every pixel is stored first. You can find how we determined the bit and plane order here.
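Once the four planes are decompressed, rebuilding the 4bit pixel values is a matter of collecting one bit from each plane. A sketch under a simplifying assumption: the planes are treated as one continuous bit stream (true when the width is a multiple of 8), so any per-line padding is not handled. pic98_planes_to_pixels is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Rebuilds 4bit pixel values (one per output byte) from four decompressed
   planes. planes[0] holds bit 0 of every pixel and planes[3] holds bit 3;
   the most significant bit of each plane byte is the leftmost pixel. */
void pic98_planes_to_pixels(const uint8_t *const planes[4], size_t npixels,
                            uint8_t *pixels) {
    for (size_t p = 0; p < npixels; p++) {
        size_t byte = p / 8;
        unsigned shift = 7u - (unsigned)(p % 8);
        uint8_t value = 0;
        for (int k = 0; k < 4; k++)
            value |= (uint8_t)(((planes[k][byte] >> shift) & 1u) << k);
        pixels[p] = value;
    }
}
```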
LZSS Compression (PIC98)
Finally, the compression scheme for PIC98 is Lempel–Ziv–Storer–Szymanski (LZSS), which we first explored in this post. More importantly, the particular implementation of the LZSS algorithm used is the one created by Fabrice Bellard for his LZEXE compression utility. In this implementation the LZSS flag bits are collected into 16bit control words that precede the “compression unit” they control in the stream. I highly recommend reading my posts on writing the LZSS compressor to get a better understanding of the implementation details, and the specifics of the PIC98 implementation.
PIC Aliases
In our explorations we’ve seen a few different extensions used with the PIC file format, likely to denote how they are used within the game. This list should not be considered complete, or definitive, some of these extensions could be used with other data formats on other titles.
- .SPR – Sprite files. These are full screen PIC images containing various sprites. These are typically found with some of the older titles using the PICv1 format.
- .SPK – Sprite Pack? We’ve only seen this one with one or two titles, and it consists of several PICv1 files concatenated together.
- .MAP – Map Image. I’ve seen this with a few titles, usually contained within a container file. Mostly these seemed to be PICv3.
Companion Formats
There is one main companion format to the PIC file, and that is the PAL (palette) file. It is not always present; in fact, PAL files are relatively rare. PAL files really only apply when 256 colour images are involved, as for CGA and EGA the palettes are defined differently. PAL files are typically 768 bytes in size, containing 256 RGB values. Each component value is typically limited to the range 0-63, corresponding to the limits of the VGA’s hardware DAC, though with some later titles this changed to 0-255. PAL files generally only exist for PICv1 and PICv2 images, as both PICv3 and PIC98 are capable of containing their palettes internally.
typedef struct {
uint8_t r; // Red component value typically 0-63
uint8_t g; // Green component value typically 0-63
uint8_t b; // Blue component value typically 0-63
} pal_t;
typedef struct {
pal_t pal[256]; // full 8 bit palette
} pal_file_t;
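Since most PAL files use the 0-63 VGA range, a loader usually wants to scale the values up to 8 bits for modern use. The sketch below reads a 768 byte PAL file into the structures above and expands 6bit components by replicating the top bits (so 63 becomes 255). The check for files already in the 0-255 range is my own heuristic assumption, not something the format marks, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint8_t r, g, b; /* component values, typically 0-63 on disk */
} pal_t;

typedef struct {
    pal_t pal[256]; /* full 8 bit palette */
} pal_file_t;

/* Expands a 6-bit VGA DAC component (0-63) to 8 bits (0-255) by copying
 * the top bits into the low bits, so 0 -> 0 and 63 -> 255. */
static uint8_t expand6(uint8_t v) {
    return (uint8_t)((v << 2) | (v >> 4));
}

/* Fills out from 768 raw bytes (256 RGB triplets), scaling to 0-255.
 * Heuristic assumption: if any byte exceeds 63 the data is taken to be
 * 8-bit already and copied as-is (a very dark 0-255 palette would be
 * misread as 6-bit). */
static void pal_from_bytes(const uint8_t raw[768], pal_file_t *out) {
    int six_bit = 1;
    for (size_t i = 0; i < 768; i++) {
        if (raw[i] > 63) {
            six_bit = 0;
            break;
        }
    }
    for (size_t i = 0; i < 256; i++) {
        uint8_t r = raw[i * 3], g = raw[i * 3 + 1], b = raw[i * 3 + 2];
        out->pal[i].r = six_bit ? expand6(r) : r;
        out->pal[i].g = six_bit ? expand6(g) : g;
        out->pal[i].b = six_bit ? expand6(b) : b;
    }
}

/* Loads a PAL file from disk. Returns 0 on success, -1 on error. */
static int load_pal(const char *path, pal_file_t *out) {
    uint8_t raw[768];
    FILE *f = fopen(path, "rb");
    if (f == NULL) return -1;
    size_t n = fread(raw, 1, sizeof raw, f);
    fclose(f);
    if (n != sizeof raw) return -1;
    pal_from_bytes(raw, out);
    return 0;
}
```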
PIC Adjacent
We’ve also recently come across a couple of PIC-like formats. Both were found within the container files of M1 Tank Platoon, and they most closely resemble the PICv1 format, or perhaps even rely on it, but neither fully conforms to PICv1 or any of the other variants.
PK Image Files
We first discovered and explored the PK image files while investigating another variant of the .CAT container format included with M1 Tank Platoon. There we had 3 separate container files, each holding the main images for the game. The three container files correspond to the three graphics modes supported by the game. (These aren’t identifiable by name; we only worked out which was which by looking at the images.)
256 Colour PK Images
This version of the PK image is actually fully compliant with the PICv1 format. If this were the only variant of the file we found, we would have classified it as an alias format.
16 Colour and 4 Colour PK Images
Unlike its 256 colour counterpart, the 4bit packed variant used for the CGA and EGA images breaks the PICv1 format rules. Instead of the format byte holding a positive value to indicate the 4bit packed pixel arrangement, it is left negative, indicating an 8bit linear arrangement, even though the data is actually 4bit packed. To decode this variant, the format byte needs to be corrected to indicate the 4bit packed pixel arrangement. Alternatively, the image can be decoded at half the horizontal resolution and then unpacked afterwards. Otherwise this format is fully compatible with PICv1 once the correction is made.
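Both decoding approaches for the 4bit PK images can be sketched in a few lines. The first helper flips the format byte positive, assuming (as with the other variants) that its absolute value is the maximum LZW code width and only the sign selects the pixel arrangement. The second unpacks data decoded at half horizontal resolution, using the normal PIC nibble order where the low nibble is the leftmost pixel. Both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fix-up for 4-bit PK images: makes a negative format byte
 * positive so a PICv1 decoder selects the 4-bit packed arrangement.
 * Assumes the magnitude (max LZW code width) must stay unchanged. */
static int8_t pk_fix_format(int8_t format_byte) {
    return format_byte < 0 ? (int8_t)-format_byte : format_byte;
}

/* Unpacks 4-bit pixels in normal PIC order, where the LOW nibble of each
 * byte is the leftmost pixel. src holds n_bytes of half-width data as
 * decoded in linear mode; dst must hold 2 * n_bytes pixels. */
static void unpack4_pic(const uint8_t *src, size_t n_bytes, uint8_t *dst) {
    for (size_t i = 0; i < n_bytes; i++) {
        dst[2 * i] = src[i] & 0x0F;  /* left pixel */
        dst[2 * i + 1] = src[i] >> 4; /* right pixel */
    }
}
```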
MAX Image Files
Like the PK files above, we discovered the MAX images while exploring M1 Tank Platoon. Unlike the PK format, this version does not conform to the standard PIC compression pipeline. The MAX image file is not even really a file; it’s just a named block of data within a larger container file. These container files have no extension, though they are fairly similar to a CAT file: an index of the named assets (without file extensions in this case), followed by a block of data for each asset. These files all have names of the form [graphics adapter]MAXn, where [graphics adapter] is a 4 character label for the adapter in question, padded with a trailing underscore if shorter than 4 characters, and ‘n’ is a single digit: the files are numbered 1-4 (1-5 for the MCGA files). For more information about the container format, and how we decoded the MAX images, you can read about it here.
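The container naming scheme is regular enough to parse mechanically. This is a hypothetical helper, assuming the names are available as NUL-terminated strings; it splits a name such as CGA_MAX1 into its adapter label and file number.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical parser for container names of the form [adapter]MAXn:
 * a 4 character adapter label padded with trailing underscores, the
 * literal "MAX", then a single digit. Returns the digit, or -1 if the
 * name does not match. The label, minus padding, is copied to adapter. */
static int parse_max_name(const char *name, char adapter[5]) {
    if (strlen(name) != 8 || strncmp(name + 4, "MAX", 3) != 0 ||
        !isdigit((unsigned char)name[7]))
        return -1;
    memcpy(adapter, name, 4);
    adapter[4] = '\0';
    /* strip the trailing underscore padding */
    for (int i = 3; i >= 0 && adapter[i] == '_'; i--)
        adapter[i] = '\0';
    return name[7] - '0';
}
```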
In conversation, my friend PixelWings pointed out that the name MAX in the filenames likely refers to Max Remington, who was the main graphics developer at MicroProse back in the day. In that spirit, I’m considering the image format within to be a MAX image, with the container file simply named after what it contains; we’ve seen this with other titles that have files like “PICS.CAT” containing the PIC assets.
So while the MAX image format is not really a file, for the sake of convenience we can treat it as one. We can extract the named data blocks from the container file as individual MAX files, which again is a name we’ve given them ourselves. The MAX format seems to make three departures from the normal PIC compression stack. First, the format byte is always set to a negative value, indicating a linear pixel arrangement; this is also supported by the image width being given in bytes rather than pixels (so half the horizontal resolution in pixels for 4bit packed images). The next major departure is that the 4bit packing order is the opposite of the normal PIC order: the high nibble is the leftmost pixel in a MAX file, while the low nibble is the leftmost pixel in a normal PIC file (which may explain why the format byte is always negative, so that the PIC 4bit unpacker is bypassed). Finally, and probably the biggest deviation, MAX images do not apply the usual RLE compression to the data stream before LZW compression. The type of MAX file, 4bit (4 or 16 colour) or 8bit (256 colour), can only be determined from the container file it is in; there are no other definitive markers in the MAX header to indicate this. The compression path for a MAX file looks like the following:
[MAX] <=> LZW Compression <=> Pixel Packing (4 bit only) <=> [RAW Image]
MAX Structure
Structurally, MAX is closer to PICv2 than to PICv1, in that it contains image dimension information. But it then goes back to a signed format byte instead of an unsigned max-bits value.
The MAX format has a 5 byte header that defines the image width (in bytes), the image height (in lines), and the LZW compression bit width. A flag bit in the height field also indicates the presence of palette data after the image data in the compressed stream.
typedef struct { // MAX image format
uint16_t width; // image width in bytes
uint16_t flagged_height; // image height in lines, + palette flag bit
int8_t format_byte; // Format Identifier
uint8_t lz_data[]; // LZW compressed stream
} mp_max_t;
Image Dimensions
The width and flagged_height fields are 16bit values giving the image dimensions in bytes and lines respectively, rather than in pixels as with the other PIC variants. The most significant bit of flagged_height must be masked off to get the actual image height, as that bit is the palette flag.
Palette Flag
The image palette flag is encoded as a single bit in the most significant bit position of the flagged_height value. When this bit is set to 1 it indicates that a 256 entry RGB palette (768 bytes) is included after the image data within the compressed data stream.
Format Identifier
The format identifier is an 8bit signed value, whose absolute value is the maximum code width for the LZW compressed stream that follows. For MAX files this value must be negative, which in PICv1 terms indicates linear pixel data, regardless of the pixel packing actually used. The valid range is still likely -9 to -11.
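Putting the header fields together, a minimal parser for an extracted MAX block might look like this. The 16bit fields are assumed to be little-endian (typical for DOS data, but not confirmed above), and max_header_t and parse_max_header are hypothetical names. It rejects non-negative format bytes but does not enforce the -9 to -11 range, since that range is only likely.

```c
#include <stddef.h>
#include <stdint.h>

/* Decoded MAX header (hypothetical helper type). */
typedef struct {
    uint16_t width;         /* image width in bytes */
    uint16_t height;        /* image height in lines, palette flag removed */
    int has_palette;        /* 1 if a 768 byte palette follows the pixels */
    int max_code_bits;      /* |format_byte|: maximum LZW code width */
    const uint8_t *lz_data; /* start of the LZW-only compressed stream */
    size_t lz_size;         /* bytes remaining after the header */
} max_header_t;

/* Parses the 5 byte MAX header, assuming little-endian 16-bit fields.
 * Returns 0 on success, -1 if the block is too short or the format byte
 * is not negative. */
static int parse_max_header(const uint8_t *buf, size_t size, max_header_t *h) {
    if (size < 5) return -1;
    uint16_t width = (uint16_t)(buf[0] | (buf[1] << 8));
    uint16_t flagged_height = (uint16_t)(buf[2] | (buf[3] << 8));
    int format_byte = buf[4] >= 0x80 ? buf[4] - 256 : buf[4]; /* signed 8-bit */
    if (format_byte >= 0) return -1; /* MAX always uses a negative value */
    h->width = width;
    h->height = flagged_height & 0x7FFF;             /* mask off the flag */
    h->has_palette = (flagged_height & 0x8000) != 0; /* MSB is the flag */
    h->max_code_bits = -format_byte;
    h->lz_data = buf + 5;
    h->lz_size = size - 5;
    return 0;
}
```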
Compressed stream
The lz_data field holds the variable length, LZW-only compressed data. The LZW scheme is the same one PIC uses, but the data has not been through the RLE pass that PIC files normally apply before LZW compression; see the “PIC Compression” section above for further detail on the LZW compression.
256 Colour MAX Images
This version of the MAX image is the only one where the palette flag comes into play. Not all 256 colour images have the flag bit set, but when it is set, the palette is always 768 bytes. This means the decompression buffer needs to be 768 bytes larger than the calculated size of the image. The palette data is in the same format as PAL files and the other PIC versions that support palette data, with a range of 0-63 for each colour component.
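The buffer sizing described above is simple arithmetic, but it is easy to get wrong when the width is in bytes. As a worked example, a 320×200 256 colour image with the flag set needs 320 × 200 + 768 = 64,768 bytes. The helpers below are hypothetical and take the already-decoded header values.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_PALETTE_BYTES 768 /* 256 RGB triplets, 0-63 per component */

/* Bytes needed for a decompressed MAX image. The width is already in
 * bytes, so width * height covers 4-bit packed data too; the palette,
 * when flagged, follows the image data. */
static size_t max_output_size(uint16_t width, uint16_t height, int has_palette) {
    return (size_t)width * height + (has_palette ? MAX_PALETTE_BYTES : 0);
}

/* Returns a pointer to the palette at the tail of a decompressed buffer,
 * or NULL if the image has no palette. */
static const uint8_t *max_palette(const uint8_t *decoded, uint16_t width,
                                  uint16_t height, int has_palette) {
    return has_palette ? decoded + (size_t)width * height : NULL;
}
```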
16 Colour and 4 Colour MAX Images
This version of the MAX image is where we see the width value at half of what we would expect, because it is expressed in bytes rather than pixels and the data is packed two pixels per byte. It is also the version that needs a different 4bit unpacker, because the pixels are packed into the nibbles in the opposite order to a normal PIC file (the high nibble is the leftmost pixel here).
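A MAX specific 4bit unpacker only differs from the normal PIC one in which nibble comes first. This sketch takes the header width in bytes and writes one pixel per output byte, with the high nibble as the leftmost pixel; the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Unpacks 4-bit MAX pixel data, where the HIGH nibble of each byte is the
 * leftmost pixel (the reverse of a normal PIC file). width_bytes is the
 * header width, so each row is width_bytes * 2 pixels wide; dst must hold
 * width_bytes * 2 * height bytes. */
static void unpack4_max(const uint8_t *src, size_t width_bytes, size_t height,
                        uint8_t *dst) {
    for (size_t y = 0; y < height; y++) {
        for (size_t x = 0; x < width_bytes; x++) {
            uint8_t b = src[y * width_bytes + x];
            size_t out = y * width_bytes * 2 + x * 2;
            dst[out] = b >> 4;       /* left pixel */
            dst[out + 1] = b & 0x0F; /* right pixel */
        }
    }
}
```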
References
The findings documented here for the MicroProse PIC file format did not happen in a vacuum, of course, so I want to make a few acknowledgements to the people who went down this road before me and made their findings known.
- Joel “Quadko” McIntyre, for his initial dive and his Darklands .PIC Image File Format document (PICv3). His format description tipped me off to a few important details that let me get started much more quickly.
- “Darkpanda”, for their post on the CivFanatics forum identifying the compression format as LZW, once again saving several steps and a lot of time in the process.
Final words
That pretty much sums up all the knowledge of the MicroProse PIC file format that we’ve accumulated over the last 4 months. The next few posts will concentrate on pulling together all the prototype code we’ve written to produce a fully functional PIC library capable of reading and writing any of the versions. Alongside it we’ll produce some example/utility applications that use the library to convert to and from a more convenient file format.