Garage provides a set of repair operations that help preserve the durability of your data and fix the inconsistencies that can arise in a distributed system. This guide explains what each operation does and when to use it.

General Syntax

Repair operations take the following general form:
garage repair <repair_name>
A repair is only launched when the --yes flag is passed as confirmation:
garage repair --yes <repair_name>
By default, a repair runs only on the node your CLI connects to. To run it on all nodes, add the -a flag:
garage repair -a --yes <repair_name>
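The forms above can be wrapped in a small helper so that the confirmation flag is never forgotten. The following is a minimal sketch, not something shipped with Garage: the function name run_repair and the GARAGE_DRY_RUN variable are hypothetical, and only the garage repair, --yes and -a syntax comes from this guide.

```shell
# Hypothetical helper (not part of Garage).
# Usage: run_repair local|all <repair_name> [args...]
# With GARAGE_DRY_RUN=1 the command is printed instead of executed.
run_repair() {
    if [ "$#" -lt 2 ]; then
        echo "usage: run_repair local|all <repair_name> [args...]" >&2
        return 2
    fi
    scope="$1"
    shift
    case "$scope" in
        local) set -- garage repair --yes "$@" ;;      # node the CLI connects to
        all)   set -- garage repair -a --yes "$@" ;;   # every node in the cluster
        *)
            echo "unknown scope: $scope" >&2
            return 2
            ;;
    esac
    if [ "${GARAGE_DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

For example, GARAGE_DRY_RUN=1 run_repair all tables prints garage repair -a --yes tables, which can be reviewed before running it for real.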

Data Block Operations

Data Store Scrub

Scrubbing examines each individual data block and verifies its correctness by checking its hash. Corrupted blocks (caused by bitrot or accidental datastore manipulation) are restored from nodes that hold valid copies.
Automatic scheduling: Garage automatically schedules a scrub every 25 to 35 days (the interval is randomized to spread load).
View the next scheduled scrub:
garage worker get
Manual scrub:
garage repair --yes scrub start
Check scrub status:
  1. Find the scrub worker task ID:
    garage worker list
    
  2. View detailed statistics:
    garage worker info <scrub_task_id>
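The two steps above can be chained in a script. The sketch below assumes, without confirmation from this guide, that garage worker list prints one worker per line with the numeric task ID in the first column and the word "scrub" somewhere in the worker name; check the output of your Garage version before relying on it. The function name scrub_task_ids is hypothetical.

```shell
# Reads `garage worker list` output on stdin and prints the task IDs of
# workers whose line mentions "scrub".
# ASSUMPTION: the first column is a numeric task ID and the worker name
# contains "scrub"; this output format is not specified in this guide.
scrub_task_ids() {
    awk '$1 ~ /^[0-9]+$/ && tolower($0) ~ /scrub/ { print $1 }'
}

# Example (requires a running Garage node):
#   garage worker list | scrub_task_ids | while read -r id; do
#       garage worker info "$id"
#   done
```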
    
Pause a scrub:
garage repair --yes scrub pause
A paused scrub resumes automatically after 24 hours, so that a cluster never runs indefinitely without regular scrubs.
Adjust scrub intensity: if scrubbing puts too much load on a node and degrades performance, raise the tranquility value:
garage worker set scrub-tranquility <value>
Higher tranquility values create longer pauses between block verifications, reducing load but extending total scrub time.

Block Check and Resync

This operation fixes cases where:
  • Nodes reference blocks that don’t exist on disk
  • Nodes have on-disk blocks that are no longer referenced
Run it with:
garage repair --yes blocks
Recommended: Run this after changing cluster layout, once metadata tables have synchronized (usually a few hours after garage layout apply).

Inspecting Lost Blocks

In rare situations, data blocks may be unavailable from the entire cluster. These errors are stored in a “block resync errors” list.
List blocks with resync errors:
garage block list-errors
Errors typically fall into three categories:
  1. Metadata inconsistency: Block still referenced but object was deleted
  2. Transient error: Block couldn’t be fetched due to temporary issue (network failure)
  3. Permanent loss: Block not available anywhere in the cluster
Get information about a specific block:
garage block info <block_hash>
This shows which objects reference the block, which helps distinguish between the categories above.
Retry fetching a block immediately:
garage block retry-now <block_hash>
Or retry all errored blocks:
garage block retry-now --all
If a data block is permanently lost (category 3), you must declare the S3 objects that reference it as unrecoverable and delete them:
garage block purge --yes <block_hash>
This is a destructive operation and should only be used when data is definitely unrecoverable.
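The triage described above starts by inspecting every errored block. As a rough sketch, the helpers below extract hashes from the garage block list-errors output and run garage block info on each one. Both function names and the GARAGE_DRY_RUN variable are hypothetical, and the parsing assumes that each error line begins with the block hash in hexadecimal, which this guide does not specify. Purging is deliberately left out: it is destructive and should remain a manual decision.

```shell
# Prints the first column of every line that looks like a hex block hash.
# ASSUMPTION: `garage block list-errors` starts each error line with the hash.
errored_block_hashes() {
    awk '$1 ~ /^[0-9a-f]+$/ && length($1) >= 16 { print $1 }'
}

# Reads hashes on stdin and shows which objects reference each block.
# With GARAGE_DRY_RUN=1 the commands are printed instead of executed.
inspect_errored_blocks() {
    while IFS= read -r hash; do
        if [ "${GARAGE_DRY_RUN:-0}" = "1" ]; then
            echo "garage block info $hash"
        else
            # </dev/null keeps the command from consuming the hash list
            garage block info "$hash" </dev/null
        fi
    done
}

# Example (requires a running Garage node):
#   garage block list-errors | errored_block_hashes | inspect_errored_blocks
```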

Rebalancing Data Directories

In multi-HDD setups, ensure data blocks are balanced between storage locations:
garage repair --yes rebalance
When to use:
  • After adding storage locations
  • After changing storage location capacities
  • To move data out of read-only locations
Once finished, Garage knows the single possible location for each block, improving access speed.

Metadata Operations

Metadata Snapshotting

It’s good practice to set up automatic snapshotting of your metadata database file to recover from corruption.
Garage can take snapshots of the metadata database itself (since v0.9.4).
Take a manual snapshot:
garage meta snapshot
Take snapshots on all nodes:
garage meta snapshot --all
Configure automatic snapshots: Set metadata_auto_snapshot_interval in your configuration file:
metadata_auto_snapshot_interval = "24h"
Taking a snapshot is intensive as it requires a full copy of the database file. Consider using filesystem-level snapshots (ZFS, BTRFS) if available.
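As an illustration of the filesystem-level alternative, the sketch below builds a timestamped ZFS snapshot name for the dataset that holds the metadata directory. The dataset name tank/garage-meta, the garage-meta- prefix and the function name are hypothetical; zfs snapshot <dataset>@<name> is the standard ZFS command, but this guide does not describe a specific procedure, so treat this as a starting point only.

```shell
# Builds "<dataset>@garage-meta-<UTC timestamp>".
# $1 = ZFS dataset holding the Garage metadata directory (hypothetical name)
# $2 = optional timestamp, defaults to the current UTC time
meta_snapshot_name() {
    ts="${2:-$(date -u +%Y%m%dT%H%M%SZ)}"
    echo "$1@garage-meta-$ts"
}

# Example (needs ZFS and sufficient privileges):
#   zfs snapshot "$(meta_snapshot_name tank/garage-meta)"
```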
For recovery instructions, see Corrupted Metadata Recovery.

Metadata Table Resync

Garage automatically resyncs metadata tables every hour, using Merkle trees to efficiently find differences between nodes.
Manual table resync:
garage repair --yes tables
Run on all nodes:
garage repair -a --yes tables
This is quick to run (typically less than a minute) and recommended before upgrades.

Metadata Table Reference Fixes

In rare cases where nodes are unavailable, references between objects may break. For example, deleted objects may still have underlying versions or data blocks.
Fix orphan versions:
garage repair --yes versions
This checks that every version belongs to a non-deleted object and purges the orphans.
Fix orphan block references:
garage repair --yes block-refs
This checks that every block reference belongs to a non-deleted object version. Orphan block references are purged, which allows the corresponding blocks to be garbage-collected.
Fix block reference counters:
garage repair --yes block-rc
This checks that the reference counter of each block matches the actual number of non-deleted entries in the block reference table.

Complete Repair List

Here’s a summary of all available repair operations:
Command     | Description                          | Run After
------------|--------------------------------------|---------------------
scrub start | Verify integrity of all blocks       | Automatic
blocks      | Resync block references and files    | Layout changes
tables      | Full sync of metadata tables         | Before upgrades
versions    | Fix orphan versions                  | Suspected corruption
block-refs  | Fix orphan block references          | Suspected corruption
block-rc    | Recalculate block reference counters | Suspected corruption
rebalance   | Balance blocks across HDDs           | Multi-HDD changes

Best Practices

  1. Run garage repair tables before major operations or upgrades
  2. Monitor scrub operations - they should complete successfully
  3. Investigate non-zero block_resync_errored_blocks metrics immediately
  4. Snapshot metadata regularly using automatic snapshots or filesystem features
  5. Run garage repair blocks after layout changes
  6. Test repairs on non-production clusters first

Monitoring Repairs

Key metrics to monitor:
# Should be zero or quickly return to zero
block_resync_errored_blocks 0

# Normal to be non-zero during operations
block_resync_queue_length 42

# Track repair progress
table_merkle_updater_todo_queue_length 0
See Monitoring and Metrics for complete details.
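A check on the first metric can be scripted. The sketch below reads metrics in the text format shown above from standard input and reports whether block_resync_errored_blocks is non-zero. The function name, messages and exit codes are hypothetical, and how you fetch the metrics (for example from Garage's metrics endpoint) depends on your deployment.

```shell
# Reads Prometheus-style metric lines on stdin.
# Exit status: 0 = no errored blocks, 1 = errored blocks, 2 = metric missing.
check_resync_errors() {
    awk -v name="block_resync_errored_blocks" '
        /^#/ { next }                      # skip comment lines
        index($1, name) == 1 {
            rest = substr($1, length(name) + 1)
            # accept the bare name or the name followed by {labels}
            if (rest == "" || substr(rest, 1, 1) == "{") {
                total += $NF
                seen = 1
            }
        }
        END {
            if (!seen)     { print "UNKNOWN: metric not found"; exit 2 }
            if (total > 0) { print "ALERT: " total " block(s) with resync errors"; exit 1 }
            print "OK: no resync errors"
        }
    '
}
```

The function only reads standard input, so it can be fed from a saved metrics file or from whatever HTTP client you use to scrape the node.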
