Chunking Recommendations
This page lists common chunking rule recommendations for Minecraft save scenarios
Not all of these rules are suitable for the default configuration. The default configuration should stay conservative and only cover file types with stable benefits, few dependencies, and low false-positive risk
Recommendation Table¶
| File type | patterns | Recommended algorithm | Threshold | Rating |
|---|---|---|---|---|
| Minecraft Anvil files | *.mca |
fixed_auto |
256 KiB |
★★★★★ |
| Log files | *.log |
fixed_128k |
10 MiB |
★★★★★ |
| SQLite main databases | *.db, *.sqlite, *.sqlite3 |
fastcdc_32k |
20 MiB |
★★★☆☆ |
| SQLite WAL files | *.db-wal, *.sqlite-wal, *.sqlite3-wal, *.wal |
fixed_128k |
10 MiB |
★★★☆☆ |
| JSONL record files | *.jsonl |
fixed_128k |
10 MiB |
★★★☆☆ |
| JSON / YAML state files | *.json, *.yaml, *.yml |
fastcdc_32k |
20 MiB |
★★☆☆☆ |
Minecraft Anvil Files¶
Minecraft world data such as region, entities, and poi data is usually stored in .mca files
These files are internally organized around 4 KiB pages. When the world is running, only some chunks, entities, or POI data change, so many pages from old backups may be reusable
fixed_auto is recommended. It uses 128 KiB as the default granularity, then falls back to 4 KiB granularity after detecting changed windows,
which provides a reasonable balance between metadata overhead and reuse quality
Example configuration:
{
"algorithm": "fixed_auto",
"file_size_threshold": 262144,
"patterns": [
"*.mca"
]
}
SQLite Main Databases¶
Many plugins use SQLite to store permissions, economy data, records, map caches, or other structured data
Common suffixes for main database files include .db, .sqlite, and .sqlite3.
These files are usually not pure append-write files; database pages may be modified, moved, or rewritten in the middle of the file
fastcdc_32k is recommended. CDC determines chunk boundaries from content, so it adapts better than fixed-size chunking to local changes inside databases
This rule is not recommended for the default configuration, because it introduces the optional pyfastcdc dependency,
and not every .db file is large enough or has stable reuse benefit
Example configuration:
{
"algorithm": "fastcdc_32k",
"file_size_threshold": 20971520,
"patterns": [
"*.db",
"*.sqlite",
"*.sqlite3"
]
}
SQLite WAL Files¶
When SQLite runs in WAL mode, new writes are first recorded into WAL files, with common suffixes such as .db-wal, .sqlite-wal, .sqlite3-wal, and .wal
The write pattern of WAL files is usually close to tail append, so fixed-size chunking can steadily reuse earlier old content while avoiding the CDC dependency
fixed_128k is recommended. Compared with 4 KiB or 32 KiB granularity, it can significantly reduce the number of chunks and is safer for large WAL files
Note that WAL files may be truncated or recreated after checkpointing. If a WAL file is often reset completely between backups, chunking benefit will decrease
Example configuration:
{
"algorithm": "fixed_128k",
"file_size_threshold": 10485760,
"patterns": [
"*.db-wal",
"*.sqlite-wal",
"*.sqlite3-wal",
"*.wal"
]
}
Log Files¶
Log files usually end with .log, and are common in the server itself, MCDR, plugins, or mods
Large logs are typically appended at the tail. For this kind of file, fixed-size chunk boundaries for old chunks are not affected by appended content, so existing content can keep being reused
fixed_128k is recommended. It does not require CDC, and its metadata overhead is much lower than smaller fixed-size chunks
Example configuration:
{
"algorithm": "fixed_128k",
"file_size_threshold": 10485760,
"patterns": [
"*.log"
]
}
JSONL Record Files¶
Some plugins or helper tools use JSONL to record events, statistics, or history data, where each line is usually an independent record
If the file mainly grows by appending records, it behaves similarly to log files and is suitable for fixed-size chunking
fixed_128k is recommended. It can steadily reuse old content while avoiding the optional CDC dependency
Example configuration:
{
"algorithm": "fixed_128k",
"file_size_threshold": 10485760,
"patterns": [
"*.jsonl"
]
}
JSON / YAML State Files¶
.json, .yaml, and .yml files are common in the plugin ecosystem, but most configuration files are small and not worth chunking
Chunking is only worth considering when these files become large state files. For example, some plugins write player data, claim data, statistics, or cache data into a single large text file
fastcdc_32k is recommended. When JSON / YAML is rewritten, insertions, deletions, or field changes may shift later content,
and CDC can preserve reusable regions better than fixed-size chunking
Use this kind of rule carefully. If the file reorders fields, refreshes many timestamps, or rewrites the whole file every time it is saved, chunking benefit is usually unstable
Example configuration:
{
"algorithm": "fastcdc_32k",
"file_size_threshold": 20971520,
"patterns": [
"*.json",
"*.yaml",
"*.yml"
]
}
Poor Candidates¶
The following files are usually not suitable as chunking rule targets:
- compressed files, such as
.gz,.zip,.zst - archives or packages, such as
.jar - images, map tiles, or other media files
- many very small configuration files
- files that are randomly rewritten as a whole on every save
Even if these files are chunked, they usually only increase the overhead of hashing, database records, and pack entries