Difference between revisions of "PAK (Metroid Prime)"

From Retro Modding Wiki
Jump to: navigation, search
m (it's probably not actually that minor)
(File Order)
Line 122: Line 122:
 
Additionally, in Prime 2 any files can be left uncompressed if the compressed file is larger than the uncompressed one. This is not the case in Prime 1 (although custom repacking implementations should probably do this regardless).
 
Additionally, in Prime 2 any files can be left uncompressed if the compressed file is larger than the uncompressed one. This is not the case in Prime 1 (although custom repacking implementations should probably do this regardless).
  
== File Order ==
+
== Optimizing Load Times ==
  
File order matters significantly and easily makes the difference between a pak loading quickly and optimally, or taking 30+ seconds on every door. While more research is required to figure out exactly how files should be ordered to optimize loading, the game generally clusters together resources that are used together, and a file's dependencies generally appear directly before the file itself. The game also often duplicates assets that are used multiple times within the same pak; there might be 10-11 copies of the same model file in order to allow the game to more easily find and load that file when it needs it.
+
When rebuilding new paks, it's extremely important that asset order be optimized in order to minimize the distance the game has to seek on the disc in order to load any given asset. A poorly optimized pak can easily have 30+ second load times on small rooms. There are two main things that are done in order to keep load times fast.
 +
 
 +
* In world paks, assets are often duplicated so they appear multiple times within the pak; if an asset is used by a lot of different areas this can help reduce seek distance, but has the downside of bloating the total file size of the pak. To ensure the best balance between the two, asset duplication was likely flagged on a per-area basis, as large areas tend to have a lot of duplicate resources, while smaller ones that load quickly anyway don't.
 +
* Assets are generally grouped by load order - things that are used together are adjacent in the pak, and an asset's dependencies typically appear immediately preceding the asset itself. This grouping helps ensure that even when assets aren't duplicated the game will still be able to do just one large seek and then load a bunch of assets at once, instead of having to do a separate seek for each asset.
 +
 
 +
For every asset loaded, the game checks every copy of the asset in the pak and loads whichever duplicate is the shortest distance from the last read position (the end of the last asset that was loaded). When loading an area, this is the order the game loads assets in:
 +
* First the game goes down the area dependency list in the [[MLVL (File Format)|MLVL]] file and loads every file in the list for every active layer. The game follows the exact order specified by the list.
 +
* After loading all area dependencies, the area itself is loaded.
 +
* Finally the game loads assets being used by the new area that are missing from the list. This primarily means [[SCAN (File Format)|SCAN]] files, the skybox model, and assets being used by PlayerActor animsets (this avoids loading PlayerActor assets for suits that the player doesn't have). If anything else is missing from the list, they will still be loaded, but this step will cause the game to hang until the load is finished.
 +
* Note that any assets that are already in memory will be skipped.
 +
 
 +
'''Note:''' There are some discrepancies between the order assets appear in the pak and the actual load order specified by the MLVL file. There might be some more research needed to fully explain all the quirks behind Retro's pak optimization.
  
 
== Tools ==
 
== Tools ==

Revision as of 01:34, 10 August 2016

See PAK (File Format) for the other revisions of this format.

The .pak format in Metroid Prime and Metroid Prime 2 is a fairly simple packfile format; files are stored with a 32-bit file ID, and can optionally be compressed with zlib (Metroid Prime) or LZO1X-999 (Metroid Prime 2). The assets in a pak are split into two groups: named resources and dependencies. In general, only named resources are accessed directly by the game; the rest are dependencies of the named resources, and are accessed indirectly in the process of parsing those files.

Note that the Metroid Prime 3 E3 prototype uses this pak format as well, but has 64-bit file IDs. This is the only difference; the version number is still the same, so there isn't an easy way to check for this variation of the format.


Morphball render.png This file format is almost completely documented
Some research is needed to figure out the optimal way to order files to optimize load times.


Format

Header

The pak starts with a short 8-byte header:

Offset Size Description
0x0 4 Version number. Always 0x00030005.
0x4 4 Unused; always 0
0x8 End of header

Named Resources

The named resource table lists files that the game has direct access to. On world paks, this will generally only contain the MLVL file; in other paks, this table is usually quite a bit larger. Note that in non-world paks, the names are hardcoded and are how the game knows where to find the files; if you repack, you need to make sure you keep the names the same.

This section of the file begins with a 32-bit named resource count, followed by a table.

Offset Size Description
0x0 4 File type fourCC.
0x4 4 File ID.
0x8 4 Name length (NL)
0xC NL Name; not zero-terminated
0xC + NL End of entry

Resource Table

This could be considered the main table of contents of the pak. This table begins with a 32-bit resource count, followed by one entry per file; each entry is 0x14 bytes large.

Offset Size Description
0x0 4 Compression flag; this will either be 0 or 1, with 1 denoting a compressed file.
0x4 4 File type fourCC.
0x8 4 File ID
0xC 4 Size (note: always a multiple of 32. The end of the file is padded with 0xFF in order to be 32-byte aligned.)
0x10 4 Offset
0x14 End of entry

Compression

Compressed files begin with their 32-bit decompressed size, followed by the compressed file data. Metroid Prime uses zlib, which is easily recognized from zlib's 0x78DA header at the start of the compressed data. Each file is compressed as a single zlib stream.

Metroid Prime 2 uses segmented LZO1X-999, and gets slightly more complicated. Metroid Prime 2's compressed files are split up into multiple segments of compressed data, each of which is 0x4000 bytes large when decompressed (except for the last one) and should be compressed and decompressed separately. Each segment begins with a 16-bit size value before its actual compressed data begins.

Only certain formats are compressed. The following formats are always compressed:

The following formats are compressed when their uncompressed size is at least 0x400 bytes (0x40 bytes in the MP3 prototype):

Additionally, in Prime 2 any files can be left uncompressed if the compressed file is larger than the uncompressed one. This is not the case in Prime 1 (although custom repacking implementations should probably do this regardless).

Optimizing Load Times

When rebuilding new paks, it's extremely important that asset order be optimized in order to minimize the distance the game has to seek on the disc in order to load any given asset. A poorly optimized pak can easily have 30+ second load times on small rooms. There are two main things that are done in order to keep load times fast.

  • In world paks, assets are often duplicated so they appear multiple times within the pak; if an asset is used by a lot of different areas this can help reduce seek distance, but has the downside of bloating the total file size of the pak. To ensure the best balance between the two, asset duplication was likely flagged on a per-area basis, as large areas tend to have a lot of duplicate resources, while smaller ones that load quickly anyway don't.
  • Assets are generally grouped by load order - things that are used together are adjacent in the pak, and an asset's dependencies typically appear immediately preceding the asset itself. This grouping helps ensure that even when assets aren't duplicated the game will still be able to do just one large seek and then load a bunch of assets at once, instead of having to do a separate seek for each asset.

For every asset loaded, the game checks every copy of the asset in the pak and loads whichever duplicate is the shortest distance from the last read position (the end of the last asset that was loaded). When loading an area, this is the order the game loads assets in:

  • First the game goes down the area dependency list in the MLVL file and loads every file in the list for every active layer. The game follows the exact order specified by the list.
  • After loading all area dependencies, the area itself is loaded.
  • Finally the game loads assets being used by the new area that are missing from the list. This primarily means SCAN files, the skybox model, and assets being used by PlayerActor animsets (this avoids loading PlayerActor assets for suits that the player doesn't have). If anything else is missing from the list, they will still be loaded, but this step will cause the game to hang until the load is finished.
  • Note that any assets that are already in memory will be skipped.

Note: There are some discrepancies between the order assets appear in the pak and the actual load order specified by the MLVL file. There might be some more research needed to fully explain all the quirks behind Retro's pak optimization.

Tools