==============================
PRX file format
(Multi-level file format)
==============================
by Anthony Kozar
http://www.anthonykozar.net/
Jan. 25, 2012
==============================

Each of the various worlds and other level sets of LR2 is stored in a
file with the .PRX extension.  The PRX extension is short for "PRS
(Presage) Format Resource File" and appears to be a generic container
file format capable of storing multiple game resources of one or more
types in the same file.

In this file, I will present an overview of the PRX file format with
special notes about how it pertains to the game files that store
multiple game levels.  (I will sometimes call these multi-level files
"world files").

The PRX file is roughly made up of these sections:

    PRX header
    Table of Contents
    "PRS Format Resource File" string, etc.
    One or more Resource Chunks (each with header and data)

Values in PRX files are in little-endian byte order in both the Mac and
Windows versions of the game.  Many values, especially in the table of
contents, are 4 bytes long.  The example values below are taken from
GearL.PRX.

It will also probably be instructive to examine the source code for my
PRX Utilities programs.  The files PRXfunc.h and PRXfunc.cpp show
working data structures and functions for reading in PRX files (even if
my understanding is incomplete).

---------------------------------
Location $0 (byte 0):  PRX HEADER
---------------------------------

What I am calling the "PRX header" begins at byte 0 and is $90 (144)
bytes long.  It begins with $01 in the first byte and then 137 bytes of
zero in every PRX file that I have looked at.

    01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ; could $01 be a version number
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ; for the PRX format ?
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00 00 00 00 00 00 00 00 00 00

This is followed by six bytes at location $8A (138) containing two
copies of the number of resources in the file:

    28 00           ; 40 rsrcs in file (2-byte LE integer)
    28 00 00 00     ; 40 again (2 or 4-byte LE integer?)

The number of resources in the world files is generally twice the number
of levels contained in the file.  Two exceptions are WackyL.PRX which
has three resources for every level and CreditL.PRX which has only one
resource for each level.  All of this is explained below.

-------------------------------------------
Location $90 (byte 144):  TABLE OF CONTENTS
-------------------------------------------

This section is an array of short entries providing an index to the
contents of the rest of the file.  The first entry seems to be a "dummy"
entry:

    01 00 00 00     ; table index (1)
    00 00 00 00     ; unknown; always zero ?
    FF FF FF FF     ; offset (-1)
    00 00 00 00     ; resource type in ASCII
    00 00 00 00     ; resource ID
    00 00 00 00     ; length of data portion of rsrc

Because of this dummy entry, the number of entries in the table of
contents is actually one more than the number of resources in the file.

After this follow the other entries, one per resource in the file.  Each
of these entries is 24 bytes long and follows the general structure
outlined above.  I describe each of the values in more detail after the
following examples (from GearL.PRX).

    02 00 00 00     ; table index (2)
    00 00 00 00     ; always zero ?
    4C 00 00 00     ; offset of 1st rsrc (see notes below!)
    4C 56 4C 00     ; rsrc type: 'LVL\0' in ASCII
    51 46 00 00     ; rsrc ID: $4651 is 18001
    5E 74 00 00     ; data length: 29,790 bytes

    03 00 00 00     ; table index (3)
    00 00 00 00
    C6 74 00 00     ; offset of 2nd rsrc minus $194
    58 50 4B 00     ; rsrc type: 'XPK\0'
    51 46 00 00     ; rsrc ID: 18001
    3E FA 00 00     ; data length: 64062

And here are the last two entries in GearL.PRX:

    28 00 00 00     ; table index (40)
    00 00 00 00
    C4 9A 17 00     ; offset of 39th rsrc
    4C 56 4C 00     ; rsrc type: 'LVL\0'
    64 46 00 00     ; rsrc ID: 18020
    4C 5E 00 00     ; data length: 24140

    00 00 00 00     ; table index (0)
    00 00 00 00
    2C F9 17 00     ; offset of 40th and last rsrc
    58 50 4B 00     ; rsrc type: 'XPK\0'
    64 46 00 00     ; rsrc ID: 18020
    F4 FB 00 00     ; data length: 64500

TABLE INDEX:
The table index is a 4-byte integer and always seems to increase
sequentially from 1 (although it may not need to do so?).  The exception
is that the last entry in the TOC always has an index of 0, at least in
the PRX files that come with LR2.  This seems just to be a signal that
it is the last entry.

UNKNOWN:
The next 4 bytes are always zero in the files that I have looked at.  If
the field serves some other purpose, it is not apparent.

OFFSET:
The offset field is a 4-byte integer that measures the length in bytes
from the 'P' in the string "PRS Format Resource File" (see below) to the
beginning of the resource's DATA within this file (not to that
resource's header as described below).  These values are not absolute
measurements from the beginning of the file but are relative to the end
of the table of contents.

In between the end of the TOC and the data of the first resource comes a
$30 (48) byte block of mostly constant values (see below) plus the
header of the first resource which is $1C (28) bytes long.  Thus, the
offset of the first resource in a PRX file appears to always be listed
as $4C in the TOC but the actual offset from the beginning of the file
depends on the size of the table of contents.
As a result, the amount that needs to be added to the values in the
table of contents to find the resources will vary from file to file.

I prefer to calculate offsets to the headers of the individual resources
instead of their data.  So, the absolute offset of any resource header
can be calculated as

    TOC offset value + size of header + size of TOC + 48 - offset of 1st rsrc
                       ($90 or 144)                        (always $4C or 76)

which is

    TOC offset value + 116 + (24 * (num of rsrcs + 1))

You must add one to the number of resources in the file because of the
dummy entry in the TOC.  Just add 28 to this offset to find the
resource's data directly.

(Thanks to an old email from Stephen Appleby, I finally figured out
where the value of $4C is coming from, which had previously seemed a
mystery).

RESOURCE TYPE CODE:
Next is a 4-byte code consisting of three uppercase ASCII letters
followed by a null character ('\0' in C syntax = $00) that identifies
what type of resource the entry is for.  The type code matches the
appropriate filename extension when the resource is a type that could be
stored in its own file (e.g. 'LVL' for level files or 'AIF' for AIFF
files).  Perhaps some or even all of the other resource types were also
stored as separate files during development.  A complete list of all
resource types in LR2 is given below.

RESOURCE ID:
All of the resources in the game have numerical ID numbers assigned to
them.  The combination of resource type code and resource ID appears to
be unique within each PRX file but not among all of the PRX files for
LR2.  Some of the Type/ID combinations that occur more than once appear
to be identical resources that have been duplicated in two or more PRX
files.  In other cases, such as the LVL and XPK resources for each
world, some of the same ID numbers are used for different resources in
each world's PRX files.  Sometimes related resources of different types
share the same ID.
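
As a worked illustration of the TOC fields and the offset arithmetic
described under OFFSET above, here is a minimal Python sketch.  It is
not taken from my PRX Utilities; the function and field names are made
up for this example, and it assumes the layout described in this
document (144-byte header, 24-byte TOC entries, 48-byte "PRS" block,
28-byte resource headers).

```python
import struct

PRX_HEADER_SIZE = 0x90    # 144-byte PRX header
TOC_ENTRY_SIZE = 24       # six 4-byte little-endian fields per entry
RSRC_HEADER_SIZE = 0x1C   # 28-byte header in front of each resource's data


def parse_toc_entry(raw):
    """Unpack one 24-byte TOC entry (field names are hypothetical)."""
    index, unknown, offset, rtype, rsrc_id, length = struct.unpack("<IIi4sII", raw)
    return {
        "index": index,
        "unknown": unknown,
        "offset": offset,  # signed, so the dummy entry reads as -1
        "type": rtype.rstrip(b"\0").decode("ascii"),
        "id": rsrc_id,     # may carry the extra $0040xxxx bits in some files
        "length": length,  # data only, without the 28-byte header
    }


def absolute_data_offset(toc_offset, num_resources):
    """File offset of a resource's data.

    TOC offsets are measured from the 'P' of "PRS Format Resource File",
    which sits right after the header and the TOC (including the dummy entry).
    """
    prs_string_start = PRX_HEADER_SIZE + TOC_ENTRY_SIZE * (num_resources + 1)
    return prs_string_start + toc_offset


def absolute_header_offset(toc_offset, num_resources):
    """File offset of a resource's 28-byte header.

    Equivalent to: TOC offset value + 116 + 24 * (num of rsrcs + 1).
    """
    return absolute_data_offset(toc_offset, num_resources) - RSRC_HEADER_SIZE
```

For GearL.PRX, with 40 resources and a first TOC offset of $4C, this
gives 144 + 24 * 41 + 76 - 28 = 1176 for the first resource header and
1204 for its data.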

I am calling this 4-byte integer in each TOC entry the "resource ID",
but whether it is truly 4 bytes wide or only 2 bytes wide is somewhat
unclear.  I have definitely observed in the file types that occur in
both little-endian and big-endian byte order (LVL and LRS saved game
files) that resource ID numbers get byte-swapped as 4-byte values.
However, all of the ID numbers as they are used elsewhere in PRX and LVL
files have $0000 for the most significant two bytes.

In addition, I have also observed that sometimes this 4-byte value in
the TOC does not correspond exactly to the "real" resource ID.  I don't
know why, but the best I can determine is that for some resources
(independent of type) the "ID value" in the TOC appears to be the sum of
$00400000 and the resource ID as it is used everywhere else.  I.e.,
sometimes a resource with ID $0000xxxx will show up as $0040xxxx in the
TOC entry.  Therefore, when searching the TOC of a PRX file for a
resource by ID, it seems necessary to mask this field with the value
$0000FFFF (or treat it as a 2-byte field).

LENGTH:
The last field in each TOC entry is a 4-byte integer that specifies the
length of the resource in bytes.  This value is the length of only the
resource's data and does not include the 28-byte header that begins each
resource later in the PRX file.

-----------------------------------------------------
Location (varies):  "PRS Format Resource File" string
-----------------------------------------------------

After the table of contents comes a 48-byte block beginning with the
ASCII string:

    50 52 53 20 46 6F 72 6D 61 74 20 52 65 73 6F 75   ; "PRS Format Resource File"
    72 63 65 20 46 69 6C 65

followed by

    0D 0A 00 00 00 00 00 1A     ; this is all constant
    00 00 00 00 00 00 00 00
    00 00 00 00
    28 00 00 00                 ; the num of rsrcs again

The first 44 bytes of this "section" are identical in every PRX file
that I have examined.
$0D 0A 00 might be a carriage return, linefeed, and null character
terminating the string "PRS Format Resource File".  But what the purpose
of the byte $1A might be, I cannot guess.  At least for PRX files in
LR2, it appears to always be the same.

The last 4 bytes always seem to be another copy of the number of
resources in the file.  (Whether this is a 2 or 4 byte count, I am not
certain, but it seems unlikely to matter).

(I am wondering if an earlier version of this file format started here
with the string "PRS Format Resource File", and the TOC was grafted on
later to create a "PRs eXtended" format.  Several indications suggest
this possibility to me).

------------------------------------
Location (varies):  RESOURCE CHUNKS
------------------------------------

Finally, after all of the above, come the actual resources.  Each
resource "chunk" includes a 28-byte header and then a variable-length
data section.  Each resource type has its own data structure, of course.
Some types are always a constant length but most are not.

Resources are stored sequentially in the order they appear in the TOC
without any extra space between them in all of the PRX files that come
with LR2.  Since the TOC maintains offsets and lengths for each
resource, it is conceivable that the resources could occur out of order
or that extra space could be left in the file (say from the deletion of
a resource without rewriting the entire file).  I have not tested these
possibilities however.

The 28-byte header that begins each resource chunk mostly reiterates
some of the metadata from the TOC.  Here is the header from the first
resource in GearL.PRX.

    4C 56 4C 00     ; rsrc type: 'LVL\0' in ASCII
    51 46 00 00     ; rsrc ID: 18001
    00 00 00 00     ; always 12 bytes of zero ?
    00 00 00 00
    00 00 00 00
    00 00 00 00     ; high bytes of ID ("flags")
    7A 74 00 00     ; length of entire rsrc

The resource type is the same as in the table of contents.
The resource ID follows and always seems to correspond to the
least-significant two bytes of the ID in the TOC but could still be a
4-byte value here.  The next 12 bytes are always zero in my experience.
Then comes a 4-byte value that is usually zero but is $00400000 if the
resource ID in the TOC is of the form $0040xxxx.  I have verified, at
least for the LR2 PRX files, that the sum of this field and the ID
number here in the resource header is always equal to the resource ID in
the TOC.

Since $00400000 has only one bit set to 1, I am conjecturing that this
field contains one or more boolean "flags" that determine some
meta-properties about how the resources should be handled by the game or
Presage's resource editor.  Only one "flag" bit is used here and I doubt
that it will be possible to figure out its purpose.

The final 4-byte field of the header is a little-endian integer
specifying the length of the entire resource chunk INCLUDING the 28-byte
header.  Note that this differs from the length value in the TOC, which
does not include the header bytes.  (Stephen Appleby views this as the
offset from this resource's data to the next resource's data, which
makes sense too).

After the header comes the resource data.  Some resource types are
simply embedded files and their data section has the same structure as
the corresponding file type.  Other resource types don't have a format
that I recognize and several are likely unique to Presage games.  An
introduction to the LR2 resource types and their data structures is
given in the next two sections.

-------------------------------
Resource Types and Distribution
-------------------------------

Various types of resources are found in the PRX files.  Here is the list
of all type codes that I have found:

    AIF     sound resource (in AIFF format with header)
    LVL     a single game level
    LVS     list of levels in another PRX file
    MLT     ??
    MSK     ?? (graphics masks ?)
    RDT     ?? (references XPKs from other files ?)
    SDT     ??
    SID     "sound ID" ? (metadata about AIF resources ?)
    SSA     ?? (associates a group of sounds together for ?)
    TSD     ??
    XMV     animation (movie) resource
    XPK     graphics resource
    XUI     user interface components ?

Multi-level "world" files such as GearL.PRX typically alternate between
LVL and XPK resources and I believe that the XPKs are the corresponding
preview images shown for each level on the game's "Select Level" screen.
This explains why, with two exceptions, there are twice as many
resources in these files as there are levels.  WackyL.PRX alternates 19
LVLs with 19 XPKs but then has 19 additional XPKs for the "goofy" easter
egg, and thus has three resources for each level.  (Thanks to Toastline
for suggesting that the easter egg explains the extra XPKs).
CreditL.PRX has no XPK resources because the Credits world levels are
not listed or previewed anywhere.  The resource IDs of each LVL and
matching XPK resources are the same.

The LVS resource type is only found in the Levels.PRX file.  The LVS
resource is the structure that groups levels together into "worlds" or
other level sets.  Stephen Appleby's "LR2 World Builder" program works
by combining individual level files into PRX files and then rewriting
Levels.PRX to reference the new worlds.  It appears to only be possible
to replace worlds, not add completely new ones.  (The game executable
probably has some logic that makes assumptions about the available
worlds instead of relying completely on Levels.PRX).

Audio files (those that end with "A.PRX" or "a.PRX") typically alternate
between SID and AIF resources with occasional SSA resources.  As
indicated, the AIF resources are complete AIFF sound files including the
AIFF format header that begins with the characters "FORM".  Because some
of the SID resources appear to occur in pairs with the AIFs, I am
guessing that the SIDs could be metadata about the AIFs.  My first wild
guess about the SSA resources was that they could be playback data for
the "3D sound" engine (SSA = "Surround sound audio" ????).
But after looking at them, they appear to have something to do with
grouping multiple sounds together.

Graphics files (those that end with "G.PRX" or "g.PRX") contain both XPK
and XMV resources as well as several of the unknown types.  The XMV
resources are not very big themselves but usually appear to be paired
with a larger XPK that may contain the frames of the animation. (?)

Types "MLT", "MSK", "RDT", and "SDT" only occur in the graphics files
and Glrs.PRX.  Each of these files has one RDT resource except for
Gfg.PRX which does not have an RDT.  RDTs do contain the resource IDs of
the XPKs for "bricks" from other files but I still don't understand how
they are used.

There are only two MSK resources, both in Gog.PRX.  Based on the type
code and their location in a graphics file, I would guess that the MSK
resources are graphics masks.  Another guess based purely on the type
code: perhaps "MLT" resources are related to the Editor feature that
groups several bricks together ??

These files contain large numbers of SDT resources: Gbg.PRX (259),
Glrs.PRX (279 of 298), Gog.PRX (119), compared to 0-22 for other files.
I had guessed that maybe the few SDTs in some of the graphics files were
a way to link sound resources to the graphics resources in those files
that "make" those sounds.  But since noticing that Glrs.PRX contains a
preponderance of SDTs and only a few other resources, I think it is less
likely and I am quite curious as to what the SDTs are for.

There are only two TSD resources, one in each of Gbg.PRX and Glrs.PRX.

The XUI resource type is found only in Editor.PRX (3) and UI.PRX (14).
They probably coordinate the display of XPK resources in those files and
link them to event processing code in the game and editor executables.
(Just a guess).
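
Per-file counts like the ones quoted above can be obtained by walking
the table of contents and tallying the type codes.  The following is a
rough Python sketch, not code from my PRX Utilities; the function name
is made up, and it assumes the resource count at $8A and the 24-byte TOC
entries starting at $90 that are described earlier in this document.

```python
import struct
from collections import Counter

TOC_START = 0x90      # table of contents begins right after the PRX header
TOC_ENTRY_SIZE = 24   # six 4-byte fields per entry
TYPE_FIELD = 12       # byte position of the type code within an entry


def count_resource_types(data):
    """Tally the type codes listed in the TOC of a PRX file image."""
    # First copy of the resource count: 2-byte little-endian value at $8A.
    (num_resources,) = struct.unpack_from("<H", data, 0x8A)
    counts = Counter()
    # Entry 0 of the table is the dummy entry, so real entries are 1..N.
    for i in range(1, num_resources + 1):
        start = TOC_START + TOC_ENTRY_SIZE * i + TYPE_FIELD
        type_code = data[start:start + 4].rstrip(b"\0").decode("ascii")
        counts[type_code] += 1
    return counts


# Hypothetical usage, assuming a copy of the game file is at hand:
#     with open("GearL.PRX", "rb") as f:
#         print(count_resource_types(f.read()))
```

The function only reads the header and TOC, so it does not depend on
the resource chunks themselves being laid out in TOC order.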

(The 56th resource in Goa.PRX (ID $22C4) is the evil laugh for "glazed
donut" :)

---------------------
Resource Data Formats
---------------------

One common feature of the resource data is that each type appears to
have a unique 4-byte signature (a "magic number") that identifies that
type at the beginning of the data section.  For example, the signature
for a LVL resource is $17 1B 7E D8 which is the same as the first four
bytes of any .lvl file created by the LR2 Editor on Windows.  (The Mac
version of the Editor writes the bytes in reverse order: $D8 7E 1B 17,
signaling that the rest of the file will use big-endian byte order too).
These signatures can be used to identify the type of any resource when
it is stored in an individual file.

The LVL resources in the PRX files contain data in the same format as a
single-level file saved by the LR2 Editor (in little-endian format, even
on the Mac).  Similarly, for an AIF resource, the data section is an
ordinary AIFF format file embedded into the PRX file.

The XPK graphics data has been decoded by Toastline and I have
discovered certain features or patterns within the data of some of the
other types, but the structure and purpose of many of the resource types
are still unknown to me.  See these other accompanying documents for
more information:

    10_LVL_file_format.txt
    21_Levels_PRX_contents.txt  (includes description of "LVS" rsrc type)
    40_RDT_resource_type.txt
    41_SID_SSA_resource_types.txt
    42_XPK_Graphics_Format.html
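
The LVL signature check described in this section can be sketched in a
few lines of Python.  This is only an illustration under the assumptions
stated above (the two byte sequences given for the Windows and Mac
Editors); the constant and function names are my own and the signatures
of the other resource types are not covered.

```python
# LVL signature as written by the Windows Editor and as found in PRX files.
LVL_SIG_LE = bytes([0x17, 0x1B, 0x7E, 0xD8])
# The Mac Editor writes the same four bytes in reverse order.
LVL_SIG_BE = LVL_SIG_LE[::-1]


def lvl_byte_order(data):
    """Return "little", "big", or None if data lacks the LVL signature."""
    signature = bytes(data[:4])
    if signature == LVL_SIG_LE:
        return "little"
    if signature == LVL_SIG_BE:
        return "big"
    return None
```

The returned strings match the byteorder argument of Python's
int.from_bytes, so the result can be passed along when decoding the
multi-byte values that follow the signature.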