This page presents the theoretical background on which Garage is built and describes other software storage solutions and why they didn’t meet our requirements.
Context
Data storage is critical: done badly, or hit by a hardware failure, it leads to data loss. Traditional solutions have limitations:
Single machine
Filesystems combined with RAID protect data on one machine, but a machine failure takes the storage offline, and a single machine limits scalability
Expensive scaling
Limits can be pushed with expensive specialized machines, but we target off-the-shelf hardware
Distributed storage helps solve both availability and scalability problems on these machines.
Storage abstraction categories
Many distributed storage solutions exist, categorized by the abstraction they provide:
Block storage
What it is: The lowest-level abstraction, like exposing your raw hard drive over the network.
Requirements:
- Very low latencies
- Stable, often dedicated networks
Benefits:
- Operating system can manipulate disk devices with minimal constraints
- Supports any filesystem and exotic features
Examples:
- iSCSI
- Fibre Channel
- OpenStack Cinder (uniform API proxy)
File storage
What it is: A higher-level abstraction: the storage system is itself one filesystem among others, rather than a raw device that supports any of them.
Characteristics:
- Often relaxes some POSIX constraints
- Many applications remain compatible without modification
- Can run databases (like MariaDB) over them, albeit slowly
Examples:
- CephFS (read RADOS whitepaper)
- Lustre
- LizardFS
- MooseFS
- GlusterFS
- OpenStack Manila (uniform API proxy)
Object storage
What it is: The highest-level abstraction, which acknowledges that POSIX filesystem APIs are poorly suited to distributed filesystems.
Key differences:
Eventual consistency
Strong consistency is dropped in favor of eventual consistency, which copes better with high latencies and unreliable networks
WAN-optimized
Often described as “filesystem for the WAN” (pioneered by S3)
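To make eventual consistency concrete, here is a minimal sketch of a last-writer-wins register, one of the simplest conflict-free replicated data types. It is an illustration only, not Garage's actual data model: the class name, the field names, and the site names "paris" and "lyon" are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LwwRegister:
    """Last-writer-wins register: a value tagged with (timestamp, node_id)."""

    value: str
    timestamp: int  # time of the write (assumed comparable across nodes)
    node_id: str    # tie-breaker when two writes share a timestamp

    def merge(self, other: "LwwRegister") -> "LwwRegister":
        # Keep the write with the larger (timestamp, node_id) pair.
        # Taking the maximum of a total order is commutative, associative
        # and idempotent, so replicas converge whatever the delivery order.
        if (other.timestamp, other.node_id) > (self.timestamp, self.node_id):
            return other
        return self


# Two sites accept concurrent writes while they cannot reach each other.
paris = LwwRegister("v1-from-paris", timestamp=10, node_id="paris")
lyon = LwwRegister("v2-from-lyon", timestamp=12, node_id="lyon")

# Once they exchange state, both sides hold the same value.
assert paris.merge(lyon) == lyon.merge(paris)
assert paris.merge(lyon).value == "v2-from-lyon"
```

No coordination between the two sites is needed at write time, which is why this style of design tolerates slow or unreliable links better than strong consistency does.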
Existing object storage solutions
MinIO
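Website: min.io
Shared goal:
- Self-contained & lightweight
Choices MinIO made that Garage does not pursue:
- Storage optimizations through erasure coding
- POSIX/Filesystem compatibility through strong consistency
As a result, MinIO misses several properties we want: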
| Property | Issue | Impact |
|---|---|---|
| Simple | Erasure coding creates severe limitations on adding/removing drives | Complex cluster management |
| Internet enabled | Strong consistency makes it latency-sensitive | Poor performance over WAN |
| Geo-distribution | No knowledge of “sites” | Can’t distribute data to minimize site failure |
OpenStack Swift
Website: docs.openstack.org/swift
Issues:
- Requires around 8 GB of RAM to start
- Too resource-intensive for hyperconverged infrastructure
- Not Simple to operate
Ceph
Website: ceph.io/ceph-storage/object-storage
This review applies to the entire Ceph stack: RADOS paper, Ceph Object Storage module, RADOS Gateway, etc.
Additional issues:
- Industry-oriented design makes it far from Simple to operate
- Not Self-contained & lightweight
- Hard to integrate in hyperconverged infrastructure
Ceph and MinIO are closer to each other than they are to Garage or OpenStack Swift.
Pithos
GitHub: github.com/exoscale/pithos
Architecture:
- S3 proxy in front of Cassandra (also worked with ScyllaDB)
- Stored data directly in Cassandra
Drawbacks of this design:
- Complicates development
- Complicates deployment
- Complicates operations
- Reduces flexibility
Riak CS
Website: docs.riak.com/riak/cs
Analysis not yet written.
IPFS
Website: ipfs.io
IPFS has design goals radically different from Garage.
- IPFS focuses on content-addressed, distributed, peer-to-peer file sharing
- Garage focuses on S3-compatible object storage with geo-replication
- Different use cases and consistency models
Why Garage is different
Garage fills a unique niche by combining:
Geo-distribution first
Built from the ground up for multi-site deployments over Internet connections
Eventual consistency
Embraces eventual consistency through CRDTs instead of fighting it
True simplicity
No external dependencies, minimal resource requirements, simple operations
Replication over erasure coding
Simpler data management and placement at the cost of storage efficiency
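The storage-efficiency cost of replication can be put in numbers. The sketch below compares the raw bytes stored per useful byte under full replication and under erasure coding with k data shards plus m parity shards. The two configurations are hypothetical examples chosen because both survive the loss of any two pieces; they are not taken from Garage's or MinIO's defaults.

```python
def replication_overhead(copies: int) -> float:
    """Raw bytes stored per useful byte with full replication."""
    return float(copies)


def erasure_overhead(data_shards: int, parity_shards: int) -> float:
    """Raw bytes stored per useful byte with k data + m parity shards."""
    return (data_shards + parity_shards) / data_shards


# Hypothetical configurations, both tolerating the loss of any 2 pieces.
print(replication_overhead(3))  # 3 full copies -> prints 3.0
print(erasure_overhead(4, 2))   # 4 data + 2 parity shards -> prints 1.5
```

In this example replication stores twice as much raw data for the same fault tolerance. In exchange, every copy is a complete object that can be placed, moved, or read on its own.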
Research background
Garage is built on proven distributed systems research:
- Dynamo: Consistent hashing and eventual consistency (paper)
- Maglev: Consistent hashing for load balancing (paper)
- CRDTs: Conflict-free Replicated Data Types (paper)
- LBFS: Used as basis for Seafile
- Various other academic projects that never saw effective implementation
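As an illustration of the consistent hashing idea mentioned for Dynamo, here is a minimal ring-style sketch with virtual nodes. It is a generic textbook version under stated assumptions, not Garage's implementation and not Maglev's table-based variant. The `HashRing` class, the node names, and the choice of 64 virtual nodes are all hypothetical.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable 64-bit ring position (Python's built-in hash() is salted per process).
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes, vnodes: int = 64):
        # Each node gets `vnodes` points on the ring to even out the load.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def lookup(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


keys = [f"object-{i}" for i in range(1000)]
before = HashRing(["node-a", "node-b", "node-c"])
after = HashRing(["node-a", "node-b", "node-c", "node-d"])

moved = [k for k in keys if before.lookup(k) != after.lookup(k)]
# Every key that changed owner moved to the new node; the rest stay put.
assert all(after.lookup(k) == "node-d" for k in moved)
print(f"{len(moved)} of {len(keys)} keys moved")
```

The point of the technique is visible in the final assertion: adding a node only moves the keys that the new node takes over, instead of reshuffling the whole keyspace.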