Skip to content

Fixed-Size Chunking

Beta

Fixed-size chunking is a beta feature. The overall functionality is usable, but specifications and behavior may change in future releases. Note that fixed_auto is in alpha status; see its section below for details

Fixed-size chunking splits files at predictable byte-offset boundaries, with every chunk being exactly the configured size (the last chunk may be smaller if the file size is not a multiple of the chunk size)

What Fixed-Size Chunking Is

Fixed-size chunking is conceptually simple: the file is divided into equal-sized pieces from start to end. Each piece is hashed and stored independently, just like CDC chunks

Unlike CDC, chunk boundaries do not shift when content is inserted or deleted in the middle of the file. Any edit before the end of a chunk changes that chunk's hash entirely, and any insertion or deletion causes all subsequent chunks to shift, potentially invalidating a large number of previously stored chunks

This means fixed-size chunking is generally inferior to CDC for files with arbitrary edits. Its benefit is only realized in scenarios where the file's write pattern is well-aligned to chunk boundaries

For example, with fixed_4k applied to a Minecraft region file:

+----------------------------------------------------------------------+
|                    file (e.g. r.0.0.mca)                             |
+------+------+------+------+------+------+------+------+------+-- - --+
| 4KiB | 4KiB | 4KiB | 4KiB | 4KiB | 4KiB | 4KiB | 4KiB | 4KiB |  ...  |
|  c1  |  c2  |  c3  |  c4  |  c5  |  c6  |  c7  |  c8  |  c9  |       |
+------+------+------+------+------+------+------+------+------+-- - --+

Each 4 KiB chunk corresponds to one internal page of the region file. When only a few game chunks change between backups, only the corresponding pages are dirtied, and the rest of the chunks are identical to those already stored

Available Algorithms

Algorithm Chunk Size Typical Use Case
fixed_4k 4 KiB Minecraft region files (.mca): each region file is organized in 4 KiB pages, so changes in one chunk only invalidate that 4 KiB page
fixed_32k 32 KiB General intermediate granularity
fixed_128k 128 KiB Append-only files: growth at the tail only creates new trailing chunks, leaving all previous chunks intact
fixed_1m 1 MiB Very large append-only files: even lower metadata overhead than fixed_128k, useful when fine-grained deduplication is not required
fixed_auto 128 KiB / 4 KiB Adaptive fixed-size strategy that uses the previous backup's same-path chunk layout to limit metadata growth while keeping some 4 KiB reuse

fixed_4k

The 4KiB chunk size aligns with the internal page structure of Minecraft's Anvil region files (.mca). In theory, modifying a small number of chunks in the game only dirties a limited number of 4 KiB pages, making fixed_4k capable of the finest-grained deduplication for region files

However, fixed_4k has serious practical drawbacks:

  • extremely high metadata overhead: a 1 GiB file requires roughly 262 144 chunk records
  • poor I/O performance: each chunk requires a separate read-write cycle during backup

Unless the file is very large and only a tiny number of pages change per backup, fixed_4k is unlikely to be worth the cost

fixed_32k

A middle-ground option. Metadata overhead is 32× lower than fixed_4k but granularity is also much coarser

fixed_128k

The 128 KiB chunk size is well-suited for files that grow by appending data at the end. When new data is appended, only the trailing chunks change; all preceding chunks retain the same hash and are reused

This makes fixed_128k a reasonable alternative to CDC for pure append-write files

fixed_1m

The 1 MiB chunk size further reduces metadata overhead compared to fixed_128k, at the cost of coarser deduplication granularity. It is suitable for extremely large append-only files where even the 128 KiB metadata overhead becomes a concern

For most use cases, fixed_128k or CDC variants are preferred. Consider fixed_1m only when the file is very large and write patterns are exclusively append-only

fixed_auto

Alpha

fixed_auto is in alpha status and is not well optimized for performance

fixed_auto walks the file in 128 KiB windows. For each full window, it checks the previous backup's same-path chunk layout at the same offset:

  • if the previous window was one 128 KiB chunk and the current content is unchanged, it keeps one 128 KiB chunk
  • if the previous window was one 128 KiB chunk and the current content changed, it stores the current window as thirty-two 4 KiB chunks
  • if the previous window was thirty-two 4 KiB chunks, it compares the 4 KiB hashes first; when none changed, it stores one 128 KiB chunk, otherwise it keeps thirty-two 4 KiB chunks

Missing previous data, direct blobs, irregular previous layouts, and incomplete tail windows are stored as one chunk for that window

With this, fixed_auto can achieve the following effect: for parts of a file that keep changing, it performs chunk-level deduplication at 4 KiB granularity; for other parts, it performs chunk-level deduplication at 128 KiB granularity

Since region files (.mca) in Minecraft saves are modified at 4 KiB granularity, fixed_auto is expected to achieve deduplication close to fixed_4k without introducing excessive metadata overhead

Poor Candidates

Fixed-size chunking is a poor choice for:

  • files that are frequently modified in the middle or beginning (insertion/deletion shifts all subsequent chunks)
  • files with completely unpredictable byte-level change patterns
  • files where the chunk size does not align with any meaningful internal structure

No Extra Dependencies

Fixed-size chunking has no additional Python dependency requirements. It is available as long as Prime Backup is installed