This page explores how Garage manages data internally, covering the distributed systems concepts and design decisions that power its architecture.
Core concepts
Garage’s architecture is built on proven distributed systems research:
Dynamo ring
Consistent hashing ring for data distribution (paper)
CRDTs
Conflict-free Replicated Data Types for eventual consistency (paper)
Quorum consensus
Majority-based consistency without leader election
Gossip protocol
Decentralized cluster membership and failure detection
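To make the ring idea concrete, here is a minimal sketch of generic consistent hashing in Rust. It is not Garage's actual layout algorithm: the `Ring` type, the `nodes_for` method, the virtual-node count, and the use of the standard library's `DefaultHasher` are all assumptions chosen for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Hash any value to a position on the ring (illustrative hash, not Garage's).
fn ring_position<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical ring: sorted positions, each owned by a node name.
struct Ring {
    positions: BTreeMap<u64, String>,
}

impl Ring {
    /// Place each node at `vnodes` pseudo-random positions on the ring.
    fn new(nodes: &[&str], vnodes: u32) -> Self {
        let mut positions = BTreeMap::new();
        for node in nodes {
            for i in 0..vnodes {
                positions.insert(ring_position(&(node, i)), node.to_string());
            }
        }
        Ring { positions }
    }

    /// Walk clockwise from the key's position, collecting distinct nodes
    /// until `replicas` of them have been found (or the ring is exhausted).
    fn nodes_for(&self, key: &str, replicas: usize) -> Vec<String> {
        let start = ring_position(&key);
        let mut found: Vec<String> = Vec::new();
        let clockwise = self
            .positions
            .range(start..)
            .chain(self.positions.range(..start));
        for (_, node) in clockwise {
            if found.len() >= replicas {
                break;
            }
            if !found.contains(node) {
                found.push(node.clone());
            }
        }
        found
    }
}
```

With four nodes and three replicas, `Ring::new(&["a", "b", "c", "d"], 16).nodes_for("bucket/object.txt", 3)` returns three distinct node names, and the same key always maps to the same nodes as long as the ring is unchanged.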
Request routing logic
Data retrieval requests to Garage endpoints (S3 API and websites) are resolved to an individual object in a bucket. Since objects are replicated to multiple nodes, Garage must ensure consistency before answering requests.
Using quorum to ensure consistency
Garage ensures consistency by attempting to establish a quorum with the data nodes responsible for the object. When a majority of data nodes have provided metadata on an object, Garage can answer the request.
Query preferred nodes
Assuming 3 replicas (recommended), make requests to the two preferred nodes for object metadata
Garage dynamically determines which nodes to query based on health, preference, and which nodes actually host the data.
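The quorum read described above can be sketched as follows. This is a simplified, synchronous model rather than Garage's real RPC code: the `Replica` struct, its `healthy` and `version` fields, and the `read_metadata` function are hypothetical names, and the slice is assumed to already be sorted by node preference.

```rust
/// Smallest number of replicas that forms a strict majority.
fn quorum_size(replicas: usize) -> usize {
    replicas / 2 + 1
}

/// Hypothetical view of one replica of an object's metadata.
struct Replica {
    /// Whether the node is currently considered reachable.
    healthy: bool,
    /// Metadata version this replica would return if queried.
    version: u64,
}

/// Query healthy replicas in preference order until a quorum has answered.
/// Returns the newest version among the answers, or `None` when too few
/// replicas are reachable to form a quorum.
fn read_metadata(replicas: &[Replica]) -> Option<u64> {
    let needed = quorum_size(replicas.len());
    let mut answers: Vec<u64> = Vec::new();
    for replica in replicas.iter().filter(|r| r.healthy) {
        answers.push(replica.version);
        if answers.len() == needed {
            // Quorum reached: the newest version seen wins.
            return answers.iter().copied().max();
        }
    }
    None
}
```

With 3 replicas the quorum size is 2, so one unreachable node does not block reads, while two unreachable nodes do.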
No primary concept
Garage has no concept of “primary” nodes. Any healthy node with the data can be used as long as a quorum is reached for the metadata. Benefits:
- Better load distribution
- No single point of failure
- Faster responses (use closest/fastest node)
Node management
Node health monitoring
Garage maintains cluster health through active monitoring:
- TCP session management: Keeps a TCP session open to each node
- Periodic pinging: Regularly pings all nodes to verify connectivity
- Failure detection: Marks nodes as failed if:
  - Connection cannot be established
  - Node fails to answer multiple pings
- Routing decisions: Failed nodes are not used for quorum or internal requests
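The failure-detection rules in this list can be modeled with a small state tracker. The sketch below is illustrative only: `NodeHealth`, `NodeStatus`, and `usable_nodes` are hypothetical names, and `MAX_MISSED_PINGS = 3` is an assumed threshold, since the text only says "multiple pings".

```rust
/// Assumed threshold; the documentation only says "multiple pings".
const MAX_MISSED_PINGS: u32 = 3;

#[derive(Debug, Clone, Copy, PartialEq)]
enum NodeStatus {
    Up,
    Failed,
}

/// Hypothetical per-node health record.
struct NodeHealth {
    connected: bool,
    missed_pings: u32,
}

impl NodeHealth {
    fn new() -> Self {
        NodeHealth { connected: true, missed_pings: 0 }
    }

    /// Record whether the TCP session could be (re)established.
    fn record_connection(&mut self, established: bool) {
        self.connected = established;
    }

    /// Record a ping result; any answer resets the miss counter.
    fn record_ping(&mut self, answered: bool) {
        if answered {
            self.missed_pings = 0;
        } else {
            self.missed_pings += 1;
        }
    }

    /// A node is failed if it is disconnected or missed too many pings.
    fn status(&self) -> NodeStatus {
        if !self.connected || self.missed_pings >= MAX_MISSED_PINGS {
            NodeStatus::Failed
        } else {
            NodeStatus::Up
        }
    }
}

/// Keep only the nodes that may be used for quorum or internal requests.
fn usable_nodes(nodes: &[(&'static str, NodeHealth)]) -> Vec<&'static str> {
    nodes
        .iter()
        .filter(|(_, health)| health.status() == NodeStatus::Up)
        .map(|(name, _)| *name)
        .collect()
}
```

Resetting the counter on every answered ping means a single dropped ping does not mark a node as failed; only consecutive misses do.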
Node preference
Garage prioritizes which nodes to query according to specific criteria:
Garbage collection
Garbage collection is critical for maintaining data integrity. A faulty procedure was the cause of critical bug #39, leading to significant improvements in PR #135.
Table entry garbage collection
The Entry trait for table entries defines an is_tombstone() function that returns true for deleted entries.
CRDT semantics and tombstones:
CRDT semantics keep all tombstones by default because they’re necessary for reconciliation:
- If node A has a tombstone superseding value x
- And node B has value x
- Node A must keep the tombstone to properly delete x at node B
- Otherwise, value x would flow back from B to A (deleted item reappears)
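The scenario above can be reproduced with a toy last-writer-wins CRDT. This is a sketch under stated assumptions, not Garage's table code: `DemoEntry`, `Value`, and `sync` are hypothetical names, entries are ordered by timestamp with a deterministic tie-break, and only the `is_tombstone()` method name comes from the text.

```rust
/// Hypothetical entry payload: either a live value or a deletion marker.
/// Variant order matters: at equal timestamps a tombstone wins the merge.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
enum Value {
    Present(String),
    Tombstone,
}

/// Hypothetical table entry, compared by timestamp first, then by value.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct DemoEntry {
    timestamp: u64,
    value: Value,
}

impl DemoEntry {
    fn is_tombstone(&self) -> bool {
        matches!(self.value, Value::Tombstone)
    }

    /// CRDT merge: keep the greater entry. Taking a maximum is commutative,
    /// associative, and idempotent, so replicas converge.
    fn merge(&self, other: &DemoEntry) -> DemoEntry {
        std::cmp::max(self, other).clone()
    }
}

/// What a node stores after exchanging one entry with a peer.
fn sync(local: Option<&DemoEntry>, remote: Option<&DemoEntry>) -> Option<DemoEntry> {
    match (local, remote) {
        (Some(l), Some(r)) => Some(l.merge(r)),
        (Some(e), None) | (None, Some(e)) => Some(e.clone()),
        (None, None) => None,
    }
}
```

If node A still holds the tombstone, `sync(Some(&tombstone), Some(&x))` yields the tombstone and the deletion propagates. If A has already discarded it, `sync(None, Some(&x))` yields `x` again, which is exactly the reappearance problem.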
The garbage collector (table/gc.rs) can delete tombstones UNDER CERTAIN CONDITIONS:
All nodes responsible for storing this entry are aware of the tombstone’s existence, ensuring they cannot hold a superseded version.
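Expressed as a predicate, the condition is that the set of nodes that have acknowledged the tombstone covers every responsible node. The function below is a hypothetical illustration of that check, not the logic in table/gc.rs; `can_collect_tombstone` and its parameters are invented names.

```rust
use std::collections::HashSet;

/// Returns true only if every node responsible for the entry has
/// acknowledged the tombstone (hypothetical helper for illustration).
fn can_collect_tombstone(responsible: &[&str], acknowledged: &HashSet<&str>) -> bool {
    responsible.iter().all(|node| acknowledged.contains(node))
}
```

A single missing acknowledgment is enough to keep the tombstone, because that node might still hold the superseded value.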
Data block garbage collection
Blocks in the data directory are reference-counted. Safety measures (introduced in PR #135):
Tombstone delay
24-hour delay before anything is garbage collected in a table
Block deletion delay
10-minute interval after RC reaches zero before blocks can be deleted
- Security: Prevents race conditions like bug #39
- Practicality: Avoids disk space explosion during rebalancing (which would occur with a 24-hour delay)
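The two delays can be sketched with a small reference counter driven by an explicit clock. Only the 24-hour and 10-minute values come from the text; `BlockRc`, `tombstone_old_enough`, and the use of a `Duration` as logical time are assumptions made so the example stays deterministic.

```rust
use std::time::Duration;

/// Delay before a table tombstone may be garbage collected (from the text).
const TOMBSTONE_GC_DELAY: Duration = Duration::from_secs(24 * 60 * 60);
/// Delay after the reference count reaches zero (from the text).
const BLOCK_DELETE_DELAY: Duration = Duration::from_secs(10 * 60);

/// A tombstone is only eligible for GC once it is old enough.
fn tombstone_old_enough(age: Duration) -> bool {
    age >= TOMBSTONE_GC_DELAY
}

/// Hypothetical reference counter for one data block.
struct BlockRc {
    count: u64,
    /// Logical time at which the count reached zero, if it is zero now.
    zero_since: Option<Duration>,
}

impl BlockRc {
    fn new() -> Self {
        BlockRc { count: 0, zero_since: None }
    }

    /// A new reference cancels any pending deletion.
    fn incr(&mut self) {
        self.count += 1;
        self.zero_since = None;
    }

    /// Dropping the last reference starts the deletion timer.
    fn decr(&mut self, now: Duration) {
        if self.count > 0 {
            self.count -= 1;
            if self.count == 0 {
                self.zero_since = Some(now);
            }
        }
    }

    /// The block may be deleted only after the full delay has elapsed.
    fn is_deletable(&self, now: Duration) -> bool {
        match self.zero_since {
            Some(t) => now.saturating_sub(t) >= BLOCK_DELETE_DELAY,
            None => false,
        }
    }
}
```

Because `incr` clears the timer, a block that is referenced again within the 10-minute window is never deleted, which is the kind of race the delay is meant to absorb.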
Rebalancing considerations
The challenge: During the offload interval, the GC doesn’t check with the offloading node before deleting tombstones. If that node hasn’t received the tombstone before offload completes, old data could reappear.
Current solution: The 24-hour delay works under the assumption that rebalances complete within 24 hours.
Future improvements: In distributed systems, time-based assumptions are generally considered bad practice (synchrony assumptions). To maximize Garage’s applicability, we’d like to:
- Find a way to safely disable GC during data shuffling
- Safely detect when shuffling has terminated
- Resume GC after completion
Consistency model
Garage uses eventual consistency with quorum-based operations:
- Writes: Succeed when acknowledged by a majority of replicas
- Reads: Return data when a majority of replicas agree on object state
- Conflicts: Resolved through CRDT merge semantics
- Tombstones: Maintained until safe to delete
This model prioritizes availability and partition tolerance over strong consistency (AP in CAP theorem).
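A short arithmetic sketch shows why majority quorums on both reads and writes are useful: any two majorities of the same replica set share at least one node, so a read quorum always includes a replica that saw the latest acknowledged write. The function names below are hypothetical and the model ignores timing and node failures between operations.

```rust
/// Strict majority of `n` replicas.
fn majority(n: usize) -> usize {
    n / 2 + 1
}

/// A write succeeds once a majority of replicas have acknowledged it.
fn write_succeeds(acks: &[bool]) -> bool {
    let acked = acks.iter().filter(|&&ok| ok).count();
    acked >= majority(acks.len())
}

/// Quorums of sizes `r` and `w` out of `n` replicas always share at
/// least one replica when r + w > n.
fn quorums_overlap(n: usize, r: usize, w: usize) -> bool {
    r + w > n
}
```

For 3 replicas, `majority(3)` is 2 and `quorums_overlap(3, 2, 2)` holds because 2 + 2 > 3.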
Further reading
- Gateway configuration - Learn about gateway node deployment
- Cluster layout management - Understand cluster topology configuration
- Old design draft - Historical design documentation