This page presents the theoretical background on which Garage is built and describes other software storage solutions and why they didn’t meet our requirements.

Context

Data storage is critical: done badly, or when hardware fails, it leads to data loss. Traditional solutions have limitations:

Single machine

Filesystems combined with RAID protect data on one machine, but a machine failure still takes the storage offline, and a single machine limits scalability

Expensive scaling

Limits can be pushed with expensive specialized machines, but we target off-the-shelf hardware
Our focus: Non-specialized, off-the-shelf machines that can be as low-powered and failure-prone as a Raspberry Pi.
Distributed storage helps solve both availability and scalability problems on these machines.

Storage abstraction categories

Many distributed storage solutions exist, categorized by the abstraction they provide:

Block storage

What it is: The lowest-level abstraction, comparable to exposing your raw hard drive over the network.
Requirements:
  • Very low latencies
  • Stable, often dedicated networks
Advantages:
  • Operating system can manipulate disk devices with minimal constraints
  • Supports any filesystem and exotic features
Examples:

File storage

What it is: A higher-level abstraction that exposes a filesystem, which is used like any other filesystem on the machine.
Characteristics:
  • Often relaxes some POSIX constraints
  • Many applications remain compatible without modification
  • Can run databases (like MariaDB) over them, albeit slowly
Examples:
  • CephFS (read RADOS whitepaper)
  • Lustre
  • LizardFS
  • MooseFS
  • GlusterFS
  • OpenStack Manila (uniform API proxy)

Object storage

What it is: The highest-level abstraction, which acknowledges that POSIX filesystem APIs are poorly suited to distributed storage.
Key differences:

Eventual consistency

Strong consistency is dropped in favor of eventual consistency, which copes better with high latencies and unreliable links

WAN-optimized

Often described as a “filesystem for the WAN”, an approach pioneered by S3
The standard: The S3 HTTP REST API acts as an industry standard, although the source code of Amazon S3 itself is not open.

Existing object storage solutions

MinIO

Website: min.io
Shared goals:
  • Self-contained & lightweight
Conflicting choices: MinIO selected two of our non-goals:
  • Storage optimizations through erasure coding
  • POSIX/Filesystem compatibility through strong consistency
By pursuing these non-goals, MinIO misses several of the properties we want.
Where MinIO falls short:
| Property | Issue | Impact |
|---|---|---|
| Simple | Erasure coding creates severe limitations on adding/removing drives | Complex cluster management |
| Internet enabled | Strong consistency makes it latency-sensitive | Poor performance over WAN |
| Geo-distribution | No knowledge of “sites” | Can’t distribute data to minimize site failure |
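
To make the Geo-distribution row concrete, here is a minimal sketch of what knowledge of “sites” allows: replicas are spread over as many distinct sites as possible, so losing one site loses at most one copy. The node names, zone labels and the `place_replicas` helper are hypothetical, and this is not Garage's actual layout algorithm.

```python
from collections import defaultdict

def place_replicas(nodes, replicas):
    """Pick up to `replicas` nodes, spread over as many zones as possible.

    `nodes` maps a node name to its zone (site) label.
    """
    by_zone = defaultdict(list)
    for node, zone in sorted(nodes.items()):
        by_zone[zone].append(node)

    chosen = []
    # Round-robin over zones: take one node per zone per pass, so a zone is
    # reused only after every other zone with free nodes holds a replica.
    while len(chosen) < replicas and any(by_zone.values()):
        for zone in sorted(by_zone):
            if by_zone[zone] and len(chosen) < replicas:
                chosen.append(by_zone[zone].pop(0))
    return chosen

# Hypothetical four-node cluster spanning three sites.
nodes = {
    "node-a": "paris", "node-b": "paris",
    "node-c": "lyon", "node-d": "brussels",
}
placement = place_replicas(nodes, 3)
print(placement)
print({nodes[n] for n in placement})  # one replica per site
```

A system with no notion of sites could just as well pick `node-a` and `node-b` for two of the three copies, which would put both at risk in a single site failure.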

OpenStack Swift

Website: docs.openstack.org/swift
Fails the Self-contained & lightweight goal.
Issues:
  • Requires around 8GB of RAM to start
  • Too resource-intensive for hyperconverged infrastructure
  • Not Simple to operate
Similarity to AWS S3: Swift and Pithos are probably the designs closest to AWS S3, thanks to their consistent hashing ring approach.
Why not Swift: Tests by the ACIDES project showed that Swift consumes far more resources (CPU + RAM) than we can afford. Additionally, Swift was not designed for geo-distribution.
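
For readers unfamiliar with the consistent hashing ring mentioned here, the following is a minimal, generic sketch of the technique. It is not the code of Swift, Pithos, S3 or Garage; the `HashRing` class, the node names and the choice of 64 virtual nodes per node are assumptions made for illustration.

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    # Map a string to a 64-bit position on the ring.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

class HashRing:
    def __init__(self, nodes, vnodes=64):
        # Each node owns `vnodes` points on the ring to even out the load.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def lookup(self, key: str) -> str:
        # A key belongs to the first point clockwise from its hash,
        # wrapping around at the end of the ring.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing(["node-a", "node-b", "node-c"])
bigger = HashRing(["node-a", "node-b", "node-c", "node-d"])

keys = [f"object-{i}" for i in range(1000)]
moved = sum(ring.lookup(k) != bigger.lookup(k) for k in keys)
print(f"{moved} of {len(keys)} keys moved after adding node-d")
```

The property that matters for cluster management is that adding a node only moves keys onto the new node; keys never shuffle between the existing nodes.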

Ceph

Website: ceph.io/ceph-storage/object-storage
This review applies to the entire Ceph stack: RADOS paper, Ceph Object Storage module, RADOS Gateway, etc.
Core design decision: Ceph was designed for POSIX/Filesystem compatibility, which requires strong consistency.
Cascading effects:
1. Strong consistency requirement

POSIX compatibility demands strong consistency

2. Latency sensitivity

Strong consistency makes Ceph latency-sensitive

3. Fails Internet-enabled goal

High latency sensitivity prevents effective operation over Internet connections
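
A rough way to see why waiting on remote replicas hurts over the Internet: in a simplified model where a write is sent to all replicas in parallel and completes once a fixed number of them acknowledge it, the write is as slow as the slowest replica it has to wait for. The round-trip times below are made-up figures, and this is not a model of Ceph's actual protocol; protocols that need several sequential round trips pay this cost several times.

```python
def write_latency_ms(replica_rtts_ms, acks_required):
    """Latency of a write that waits for `acks_required` acknowledgements.

    Requests go out in parallel, so the write completes when the
    `acks_required`-th fastest replica has answered.
    """
    if not 1 <= acks_required <= len(replica_rtts_ms):
        raise ValueError("acks_required must be between 1 and the replica count")
    return sorted(replica_rtts_ms)[acks_required - 1]

# Hypothetical round-trip times from the coordinating node to each replica.
lan = [0.3, 0.4, 0.5]    # three replicas in one datacenter
wan = [0.3, 25.0, 80.0]  # three replicas in three distant sites

for name, rtts in (("LAN", lan), ("WAN", wan)):
    print(
        name,
        "wait for all:", write_latency_ms(rtts, 3), "ms;",
        "wait for majority:", write_latency_ms(rtts, 2), "ms",
    )
```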
Additional issues:
  • Industry-oriented design makes it far from Simple to operate
  • Not Self-contained & lightweight
  • Hard to integrate in hyperconverged infrastructure
Ceph and MinIO are closer to each other than they are to Garage or OpenStack Swift.

Pithos

GitHub: github.com/exoscale/pithos
Pithos has been abandoned and should probably not be used.
Architecture:
  • S3 proxy in front of Cassandra (also worked with ScyllaDB)
  • Stored data directly in Cassandra
Why it was abandoned: From the designers themselves: storing data in Cassandra showed limitations that justified abandoning the project.
Pithos v2: They built a closed-source version 2 that doesn’t store blobs in the database (only metadata) but didn’t communicate further on it.
Why we didn’t adopt their v2 design: It doesn’t fit our Self-contained & lightweight and Simple properties. It would:
  • Complicate development
  • Complicate deployment
  • Complicate operations
  • Reduce flexibility

Riak CS

Website: docs.riak.com/riak/cs
Analysis not yet written.

IPFS

Website: ipfs.io
IPFS has design goals radically different from Garage.
We have a detailed blog post discussing the differences: Garage vs IPFS
Key differences:
  • IPFS focuses on content-addressed, distributed, peer-to-peer file sharing
  • Garage focuses on S3-compatible object storage with geo-replication
  • Different use cases and consistency models

Why Garage is different

Garage fills a unique niche by combining:

Geo-distribution first

Built from the ground up for multi-site deployments over Internet connections

Eventual consistency

Embraces eventual consistency through CRDTs instead of fighting it

True simplicity

No external dependencies, minimal resource requirements, simple operations

Replication over erasure coding

Simpler data management and placement at the cost of storage efficiency
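
The storage-efficiency cost of replication can be put in numbers with a short calculation. With n full copies, every byte of user data occupies n bytes on disk; with erasure coding using k data shards and m parity shards, it occupies (k + m) / k bytes. The parameters below are illustrative only, not defaults of Garage or of any other system.

```python
from fractions import Fraction

def replication_overhead(copies):
    """Bytes stored on disk per byte of user data with full copies."""
    return Fraction(copies)

def erasure_overhead(data_shards, parity_shards):
    """Bytes stored on disk per byte of user data with k+m erasure coding."""
    return Fraction(data_shards + parity_shards, data_shards)

# Both configurations survive the loss of any two copies or shards.
print(float(replication_overhead(3)))  # 3.0
print(float(erasure_overhead(4, 2)))   # 1.5
```

In this example replication stores twice as much as erasure coding for the same failure tolerance. In exchange, each copy is a complete object that can be read, placed and repaired on its own.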

Research background

Garage is built on proven distributed systems research:
  • Dynamo: Consistent hashing and eventual consistency (paper)
  • Maglev: Consistent hashing for load balancing (paper)
  • CRDTs: Conflict-free Replicated Data Types (paper)
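
As an illustration of the CRDT idea, here is a minimal sketch of one of the simplest such types, a last-writer-wins register. Replicas can merge their states in any order and any number of times and still converge on the same value. The `LwwRegister` class and its tie-breaking rule are assumptions made for this example, not Garage's actual data structures.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LwwRegister:
    timestamp: int
    value: str

    def merge(self, other: "LwwRegister") -> "LwwRegister":
        # Keep the state with the larger (timestamp, value) pair. Comparing
        # the value as a tie-breaker makes the result independent of the
        # order in which the two states are merged.
        return max(self, other, key=lambda r: (r.timestamp, r.value))

# Three replicas saw different writes, two of them with the same timestamp.
a = LwwRegister(1, "draft")
b = LwwRegister(2, "final")
c = LwwRegister(2, "alt")

# Merging in different orders converges to the same state.
print(a.merge(b).merge(c))
print(c.merge(a).merge(b))
```

Because the merge is commutative, associative and idempotent, replicas do not need to coordinate before accepting a write; they only need to exchange and merge states eventually.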
Historical attempts: Many research projects attempted similar goals. For example:
  • LBFS: Used as basis for Seafile
  • Various other academic projects that never saw effective implementation
Garage takes these research concepts and creates a practical, production-ready implementation focused on real-world deployments.