This page explores how Garage manages data internally, covering the distributed systems concepts and design decisions that power its architecture.

Core concepts

Garage’s architecture is built on proven distributed systems research:

  • Dynamo ring: Consistent hashing ring for data distribution (paper)
  • CRDTs: Conflict-free Replicated Data Types for eventual consistency (paper)
  • Quorum consensus: Majority-based consistency without leader election
  • Gossip protocol: Decentralized cluster membership and failure detection

For more background, see this presentation (French).

Request routing logic

Data retrieval requests to Garage endpoints (S3 API and websites) are resolved to an individual object in a bucket. Since objects are replicated to multiple nodes, Garage must ensure consistency before answering requests.

Using quorum to ensure consistency

Garage ensures consistency by attempting to establish a quorum with the data nodes responsible for the object. When a majority of data nodes have provided metadata on an object, Garage can answer the request.
  1. Request arrives: A client makes a request for an object in a bucket
  2. Query preferred nodes: Assuming 3 replicas (recommended), make requests to the two preferred nodes for object metadata
  3. Try backup node: If one of the two initial requests fails, try the third node
  4. Establish quorum: Check that metadata from at least 2 nodes match
  5. Verify not deleted: Check that the object hasn’t been marked deleted
  6. Return data:
    • If object is small enough: Answer with inline data from metadata
    • Otherwise: Get data blocks from preferred nodes and answer with assembled object
Garage dynamically determines which nodes to query based on health, preference, and which nodes actually host the data.
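
The quorum part of the steps above can be sketched as follows. This is a minimal illustration, not Garage's implementation: `read_with_quorum` and the `fetch_metadata` callback are hypothetical names, metadata is compared by plain equality, and nodes are queried one after another rather than in parallel.

```python
from collections import Counter
from typing import Callable, Hashable, Optional, Sequence


def read_with_quorum(
    nodes: Sequence[str],
    fetch_metadata: Callable[[str], Optional[Hashable]],
    quorum: int = 2,
) -> Optional[Hashable]:
    """Return the metadata that `quorum` nodes agree on, or None.

    `nodes` is ordered by preference. `fetch_metadata` is a hypothetical
    callback that returns None when a node cannot be reached.
    """
    votes: Counter = Counter()
    for node in nodes:
        answer = fetch_metadata(node)
        if answer is not None:
            votes[answer] += 1
        # Stop as soon as enough nodes agree, so backup nodes are only
        # contacted while no quorum exists yet.
        if votes:
            value, count = votes.most_common(1)[0]
            if count >= quorum:
                return value
    return None
```

With three replicas and a quorum of 2, the third node is only contacted when one of the first two fails or disagrees. A caller would then check the deletion marker and either return inline data or fetch the data blocks.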

No primary concept

Garage has no concept of “primary” nodes. Any healthy node with the data can be used as long as a quorum is reached for the metadata. Benefits:
  • Better load distribution
  • No single point of failure
  • Faster responses (use closest/fastest node)

Node management

Node health monitoring

Garage maintains cluster health through active monitoring:
  1. TCP session management: Keeps a TCP session open to each node
  2. Periodic pinging: Regularly pings all nodes to verify connectivity
  3. Failure detection: Marks nodes as failed if:
    • Connection cannot be established
    • Node fails to answer multiple pings
  4. Routing decisions: Failed nodes are not used for quorum or internal requests
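
As a rough sketch of the failure-detection rule above, the following tracks connection state and consecutive missed pings per peer. The names `PeerStatus` and `usable_peers` are hypothetical, and the threshold of 3 missed pings is an assumption made for illustration, since the text only says "multiple".

```python
from dataclasses import dataclass

# Assumed threshold for illustration; the text only says "multiple pings".
MAX_MISSED_PINGS = 3


@dataclass
class PeerStatus:
    connected: bool = True
    missed_pings: int = 0

    def record_ping(self, answered: bool) -> None:
        # An answered ping resets the counter; a missed one increments it.
        self.missed_pings = 0 if answered else self.missed_pings + 1

    @property
    def is_failed(self) -> bool:
        return (not self.connected) or self.missed_pings >= MAX_MISSED_PINGS


def usable_peers(peers: dict[str, PeerStatus]) -> list[str]:
    """Names of peers that may be used for quorum or internal requests."""
    return sorted(name for name, status in peers.items() if not status.is_failed)
```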

Node preference

Garage prioritizes which nodes to query according to specific criteria:
  1. Self-preference: A node always prefers itself if it can answer the request
  2. Same-zone preference: The node prioritizes nodes in the same zone
  3. Latency optimization: Finally, nodes with the lowest latency are prioritized
This ordering keeps traffic as local as possible: first the node itself, then nodes in its zone, then the lowest-latency remote nodes.
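
One way to express these three criteria is a single sort key, compared in order. The sketch below assumes hypothetical names (`Candidate`, `order_by_preference`) and a measured latency per node; it is not taken from Garage's code.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Candidate:
    node_id: str
    zone: str
    latency_ms: float


def order_by_preference(
    candidates: Iterable[Candidate], self_id: str, self_zone: str
) -> list[Candidate]:
    """Sort candidates: self first, then same zone, then lowest latency."""
    return sorted(
        candidates,
        # False sorts before True, so matching nodes come first.
        key=lambda c: (c.node_id != self_id, c.zone != self_zone, c.latency_ms),
    )
```

Because tuples compare element by element, latency only breaks ties between nodes that are equally local.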

Garbage collection

Garbage collection is critical for maintaining data integrity. A faulty garbage collection procedure caused critical bug #39, which led to significant improvements in PR #135.
Rationale: We must ensure Garage’s safety by preventing premature deletion of needed data.

Table entry garbage collection

The Entry trait for table entries defines an is_tombstone() function that returns true for deleted entries.

CRDT semantics and tombstones: CRDT semantics keep all tombstones by default because they’re necessary for reconciliation:
  • If node A has a tombstone superseding value x
  • And node B has value x
  • Node A must keep the tombstone to properly delete x at node B
  • Otherwise, value x would flow back from B to A (deleted item reappears)
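
The scenario above can be reproduced with a last-writer-wins register, one of the simplest CRDTs. This is only an illustration: `LwwEntry` and `merge` are hypothetical Python stand-ins, and Garage's actual table entries use their own CRDT types.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LwwEntry:
    timestamp: int
    value: Optional[str]  # None marks a tombstone (deleted entry)

    def is_tombstone(self) -> bool:
        return self.value is None


def merge(a: Optional[LwwEntry], b: Optional[LwwEntry]) -> Optional[LwwEntry]:
    """Merge two replicas' views of one entry; a missing entry always loses."""
    if a is None:
        return b
    if b is None:
        return a
    # Highest timestamp wins; the remaining fields only break ties
    # deterministically so that merge(a, b) == merge(b, a).
    return max(a, b, key=lambda e: (e.timestamp, e.is_tombstone(), e.value or ""))


value_x = LwwEntry(timestamp=1, value="x")
tombstone = LwwEntry(timestamp=2, value=None)

# Node A kept its tombstone: syncing with node B keeps x deleted.
kept = merge(tombstone, value_x)
# Node A dropped its tombstone too early: x flows back from node B.
dropped = merge(None, value_x)
```

Here `kept` is the tombstone, while `dropped` is the old value x, which is exactly the reappearing item described above.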
Safe tombstone deletion: The garbage collector (implemented in table/gc.rs) can delete a tombstone only when all nodes responsible for storing the entry are aware of the tombstone’s existence, which ensures that none of them can still hold a superseded version.
Garage uses atomic database operations (compare-and-swap and transactions) to ensure only properly propagated tombstones are deleted.
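
A minimal sketch of this rule, assuming a plain dictionary in place of the database and a hypothetical function name `try_collect_tombstone`: the tombstone is removed only if every responsible node has acknowledged it and the stored value is still the exact tombstone that was propagated (the compare-and-swap part). A real implementation would perform the comparison and deletion atomically in the database.

```python
from typing import Dict, Set


def try_collect_tombstone(
    store: Dict[str, bytes],
    key: str,
    seen_tombstone: bytes,
    responsible_nodes: Set[str],
    acked_nodes: Set[str],
) -> bool:
    """Delete `key` only if the tombstone is fully propagated and unchanged."""
    # Every node responsible for the entry must know about the tombstone.
    if not responsible_nodes <= acked_nodes:
        return False
    # Compare-and-swap: skip deletion if the entry changed since it was read.
    if store.get(key) != seen_tombstone:
        return False
    del store[key]
    return True
```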

Data block garbage collection

Blocks in the data directory are reference-counted. Safety measures (introduced in PR #135):

  • Tombstone delay: 24-hour delay before anything is garbage collected in a table
  • Block deletion delay: 10-minute interval after a block’s reference count (RC) reaches zero before the block can be deleted

Why 10 minutes for blocks? This is a compromise between:
  • Security: Prevents race conditions like bug #39
  • Practicality: Avoids disk space explosion during rebalancing (which would occur with a 24-hour delay)
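
The block deletion delay can be illustrated with a small reference counter that remembers when each count dropped to zero. The class and method names below are hypothetical and time is passed in explicitly as seconds; only the 10-minute figure comes from the text.

```python
BLOCK_GC_DELAY_SECS = 10 * 60  # 10-minute delay from the text


class BlockRefCounts:
    def __init__(self) -> None:
        self._count: dict[str, int] = {}
        self._zero_since: dict[str, float] = {}

    def incref(self, block_hash: str) -> None:
        self._count[block_hash] = self._count.get(block_hash, 0) + 1
        # A new reference cancels any pending deletion.
        self._zero_since.pop(block_hash, None)

    def decref(self, block_hash: str, now: float) -> None:
        count = self._count.get(block_hash, 0)
        if count == 0:
            raise ValueError(f"block {block_hash} has no references")
        self._count[block_hash] = count - 1
        if count == 1:
            # Count just reached zero: start the deletion delay.
            self._zero_since[block_hash] = now

    def deletable(self, now: float) -> list[str]:
        """Blocks whose count has been zero for at least the delay."""
        return sorted(
            h for h, since in self._zero_since.items()
            if now - since >= BLOCK_GC_DELAY_SECS
        )
```

A block that is re-referenced within the delay, for instance by a concurrent upload of the same data, is never reported as deletable.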

Rebalancing considerations

When a partition is moving between nodes during rebalancing, the offload process takes time.
The challenge: During the offload interval, the GC doesn’t check with the offloading node before deleting tombstones. If that node hasn’t received the tombstone before offload completes, old data could reappear.

Current solution: The 24-hour delay works under the assumption that rebalances complete within 24 hours.

Future improvements: In distributed systems, time-based assumptions are generally considered bad practice (synchrony assumptions). To maximize Garage’s applicability, we’d like to:
  1. Find a way to safely disable GC during data shuffling
  2. Safely detect when shuffling has terminated
  3. Resume GC after completion
This introduces protocol complexity and hasn’t been tackled yet.

Consistency model

Garage uses eventual consistency with quorum-based operations:
  • Writes: Succeed when acknowledged by a majority of replicas
  • Reads: Return data when a majority of replicas agree on object state
  • Conflicts: Resolved through CRDT merge semantics
  • Tombstones: Maintained until safe to delete
This model prioritizes availability and partition tolerance over strong consistency (AP in CAP theorem).
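
The reason majority quorums are useful can be checked by brute force: with 3 replicas, any 2 nodes that acknowledged a write share at least one node with any 2 nodes answering a read. The sketch below is a generic illustration of that overlap property with hypothetical function names, not a description of Garage's code.

```python
from itertools import combinations


def majority(replicas: int) -> int:
    """Smallest number of nodes that forms a majority."""
    return replicas // 2 + 1


def quorums_always_overlap(replicas: int, read_quorum: int, write_quorum: int) -> bool:
    """True if every read set shares at least one node with every write set."""
    nodes = range(replicas)
    return all(
        set(readers) & set(writers)
        for readers in combinations(nodes, read_quorum)
        for writers in combinations(nodes, write_quorum)
    )
```

For 3 replicas, `majority(3)` is 2 and `quorums_always_overlap(3, 2, 2)` holds, whereas reading from a single node (`quorums_always_overlap(3, 1, 2)`) can miss the latest write.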

Further reading