Garage is a stateful clustered application where all nodes communicate and share data structures. This makes upgrades more complex than stateless applications, requiring careful planning and execution.
Understanding Upgrade Types
There are two types of upgrades:
- Minor upgrade: Protocols and data structures remain the same
- Major upgrade: Protocols or data structures changed
Version Numbering
You can identify the upgrade type from the version number:
- Major upgrade: First nonzero component changes (e.g., v0.7.2 → v0.8.0)
- Minor upgrade: First nonzero component stays the same (e.g., v0.8.0 → v0.8.1)
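The numbering rule above can be expressed as a small shell function. This is a minimal sketch, assuming plain dotted numeric versions with an optional leading v; the function name upgrade_type is hypothetical and not part of Garage.

```bash
# Hypothetical helper: classify an upgrade as "major" or "minor".
# Rule from the text: the upgrade is major when the first nonzero
# component of the version changes, otherwise it is minor.
upgrade_type() {
  local old="${1#v}." new="${2#v}." o n
  while [ -n "$old" ]; do
    # Peel off the leading component of each version.
    o="${old%%.*}"; old="${old#*.}"
    n="${new%%.*}"; new="${new#*.}"
    # A difference at or before the first nonzero component is major.
    if [ "$o" != "${n:-0}" ]; then
      echo major
      return 0
    fi
    # Reached the first nonzero component and it is unchanged.
    if [ "$o" != "0" ]; then
      echo minor
      return 0
    fi
  done
  echo minor
}
```

For example, upgrade_type v0.7.2 v0.8.0 prints major, and upgrade_type v0.8.0 v0.8.1 prints minor.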
Monitoring Current Versions
The garage_build_info Prometheus metric shows which Garage versions are running in your cluster.
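One way to read this metric without a full Prometheus setup is to filter the text-format metrics output of each node. The sketch below assumes the metric carries a version label; the function name garage_versions, the host name, and the port in the usage comment are illustrative assumptions, so check them against your own configuration.

```bash
# Hypothetical helper: print the distinct Garage versions found in
# Prometheus text-format metrics read from stdin.
# Assumption: garage_build_info exposes a label named "version".
garage_versions() {
  grep '^garage_build_info{' \
    | sed -n 's/.*[{,]version="\([^"]*\)".*/\1/p' \
    | sort -u
}

# Usage (host and port are placeholders for your admin/metrics endpoint):
#   curl -s http://node1.example:3903/metrics | garage_versions
```

More than one line of output means the cluster is currently running mixed versions.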
Minor Upgrades
Minor upgrades do not require cluster downtime.
Preparation
- Read the changelog at git.deuxfleurs.fr/Deuxfleurs/garage/releases
- Test on a staging cluster if possible
- Check cluster health
- Verify that repairs have completed: the repair workers should be in the Done state.
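The health and repair checks above could be scripted roughly as follows. This is a sketch, assuming the garage CLI is on the PATH and can reach the local daemon; the function name preupgrade_check is hypothetical, and the choice of the tables repair is an assumption about which repair you want to run, so confirm it with garage repair --help for your version.

```bash
# Hypothetical helper: run the pre-upgrade checks in order and stop
# at the first failure.
preupgrade_check() {
  # Show node connectivity and layout status.
  garage status || return 1
  # Launch a table repair on all nodes (assumed repair target).
  garage repair --all-nodes --yes tables || return 1
  # List background workers; repair workers should end up in the Done state.
  garage worker list
}
```

The worker list is printed rather than parsed, since its exact output format can differ between versions. Rerun garage worker list until the repair workers report Done before starting the upgrade.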
Upgrade Process
Upgrade nodes one by one:
- Stop the Garage daemon
- Install the new binary
- Update configuration if needed
- Restart the daemon
- Repeat for next node
The cluster remains available during the entire process. Take your time between nodes to monitor for any issues.
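The node-by-node procedure can be automated with a loop like the one below. It is only a sketch under several assumptions that are not from this guide: nodes are reachable over SSH, Garage runs as a systemd unit named garage, and the binary lives at /usr/local/bin/garage. All names and paths are placeholders.

```bash
# Hypothetical values: adjust for your deployment.
NEW_BINARY="${NEW_BINARY:-./garage-new}"
INSTALL_PATH="${INSTALL_PATH:-/usr/local/bin/garage}"
PAUSE_SECONDS="${PAUSE_SECONDS:-60}"

# Stop, replace the binary, and restart Garage on a single node.
upgrade_node() {
  local node="$1"
  scp "$NEW_BINARY" "$node:/tmp/garage.new" || return 1
  ssh "$node" "sudo systemctl stop garage \
    && sudo install -m 0755 /tmp/garage.new $INSTALL_PATH \
    && sudo systemctl start garage"
}

# Upgrade the given nodes strictly one after another.
rolling_upgrade() {
  local node
  for node in "$@"; do
    echo "Upgrading $node"
    upgrade_node "$node" || { echo "Upgrade failed on $node, stopping" >&2; return 1; }
    # Give the node time to rejoin before touching the next one.
    sleep "$PAUSE_SECONDS"
    ssh "$node" garage status || return 1
  done
}
```

Calling rolling_upgrade node1 node2 node3 stops at the first node that fails, leaving the remaining nodes on the old version so the problem can be investigated.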
Major Upgrades
With preparation, major upgrades can be done with minimal downtime, but the simplest method is to take the cluster offline during the migration.
Method 1: Full Downtime (Recommended)
This is the safest approach.
Step 1: Preparation
- Disable API access (in reverse proxy or configuration)
- Verify cluster is idle - check for active requests
- Check cluster health
Step 2: Shutdown and Backup
- Stop all Garage nodes
- Back up metadata folders on all nodes
Data blocks are immutable and don’t need backing up, but metadata must be preserved to enable rollback.
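With Garage stopped, the metadata folder can be archived with standard tools. The following is a minimal sketch to run on each node; the function name backup_metadata and the example paths are assumptions, and the real metadata location is whatever metadata_dir points to in your configuration.

```bash
# Hypothetical helper: archive a metadata directory into a timestamped
# tarball and print the archive path. Run only while Garage is stopped.
backup_metadata() {
  local meta_dir="$1" dest_dir="$2"
  local archive
  archive="$dest_dir/garage-meta-$(uname -n)-$(date +%Y%m%dT%H%M%S).tar.gz"
  mkdir -p "$dest_dir" || return 1
  # Store the directory under its own name so it restores next to the original.
  tar -czf "$archive" -C "$(dirname "$meta_dir")" "$(basename "$meta_dir")" || return 1
  echo "$archive"
}

# Usage (paths are placeholders):
#   backup_metadata /var/lib/garage/meta /var/backups/garage
```

Keeping the node name and timestamp in the file name makes it harder to restore the wrong node's metadata during a rollback.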
Step 3: Upgrade
- Install new binary and update configuration on all nodes
- Start all Garage nodes
- Run migrations if needed
Step 4: Verification
- Check cluster health
- Re-enable API access
- Monitor cluster load and application behavior
Method 2: Minimal Downtime (Advanced)
Minimal downtime is possible by coordinating a simultaneous restart of all nodes. The downtime is limited to the time needed for all nodes to stop and start (typically less than a minute).
Step 1: Preparation
- Check cluster health
- Back up metadata on all nodes
If automatic snapshotting is enabled, Garage only keeps the last two snapshots. Consider disabling automatic snapshots until the upgrade is confirmed successful.
- Save the cluster_layout file from any node (it’s the same on all nodes and can be copied while Garage is running).
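The online backup steps above might look like this in practice. This sketch assumes a Garage version that provides the garage meta snapshot command and that the cluster_layout file sits directly in the metadata directory; the function name and paths are hypothetical.

```bash
# Hypothetical helper: trigger a metadata snapshot on every node, then
# keep a copy of the cluster layout file from the local node.
snapshot_and_save_layout() {
  local meta_dir="$1" dest_dir="$2"
  # Ask all nodes to snapshot their metadata database while running.
  garage meta snapshot --all || return 1
  # The layout file is identical on all nodes, so one copy is enough.
  mkdir -p "$dest_dir" && cp "$meta_dir/cluster_layout" "$dest_dir/"
}

# Usage (paths are placeholders):
#   snapshot_and_save_layout /var/lib/garage/meta /var/backups/garage
```

Copy the resulting snapshots somewhere outside the snapshot directory if automatic snapshotting stays enabled, since older snapshots are rotated out.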
Step 2: Preparation
- Prepare new binaries and configuration files on all nodes
Step 3: Coordinated Restart
- Restart all nodes simultaneously in the new version
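A coordinated restart can be approximated by issuing the restart on every node in parallel and waiting for all of them. This is a sketch, assuming SSH access, a systemd unit named garage, and that the new binary is already installed at the path the unit uses; none of these details come from this guide.

```bash
# Hypothetical helper: restart Garage on all given nodes at once and
# report failure if any node did not restart cleanly.
restart_all() {
  local node pid pids="" failed=0
  for node in "$@"; do
    # Launch each restart in the background so they overlap.
    ssh "$node" "sudo systemctl restart garage" &
    pids="$pids $!"
  done
  # Collect the exit status of every background restart.
  for pid in $pids; do
    wait "$pid" || failed=1
  done
  return "$failed"
}
```

Calling restart_all node1 node2 node3 returns a nonzero status if any restart fails, which is the signal to check that node before re-enabling traffic.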
Step 4: Post-Upgrade
- Run any required migrations as described in the version-specific documentation. A migration is one of two kinds:
- Online: Can run on live nodes during normal operation
- Offline: Requires taking nodes offline again one by one
Troubleshooting
Nodes Not Communicating After Upgrade
Cause: Nodes upgraded at different times using incompatible RPC protocols.
Solution: Complete the upgrade on all remaining nodes as quickly as possible.
Migration Fails
Cause: Cluster state incompatible with new version.
Solution:
- Review migration logs for specific errors
- Restore from metadata backups if necessary
- Consult version-specific upgrade documentation
Cluster Performance Degraded
Cause: New version resyncing data or running background migrations.
Solution:
- Check garage stats -a for ongoing operations
- Monitor garage worker list for active background tasks
- Allow time for stabilization (may take hours for large clusters)
Version-Specific Guides
Major version upgrades may require special procedures. Check the “Working Documents” section for:
- v0.7.x → v0.8.x migration guide
- v0.8.x → v0.9.x migration guide
- v0.9.x → v1.0.x migration guide
Best Practices
- Always test upgrades in a staging environment first
- Back up metadata before any major upgrade
- Read the changelog thoroughly
- Monitor during and after upgrades
- Upgrade during low-traffic periods
- Document your upgrade procedure for your specific deployment
- Have a rollback plan with metadata backups ready
Rollback Procedure
If an upgrade fails:
- Stop all Garage nodes
- Restore metadata backups on all nodes
- Reinstall previous version binaries
- Restore previous configuration files
- Restart all nodes
- Verify cluster health
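The metadata restore step of the rollback can be sketched as below, to be run on each node while Garage is stopped. It assumes the backup is a gzip-compressed tarball whose top-level directory has the same name as the metadata directory; the function name restore_metadata and the paths are hypothetical.

```bash
# Hypothetical helper: restore a metadata directory from a tarball,
# keeping the failed state aside instead of deleting it.
restore_metadata() {
  local archive="$1" meta_dir="$2"
  local parent
  parent="$(dirname "$meta_dir")"
  [ -f "$archive" ] || { echo "archive not found: $archive" >&2; return 1; }
  # Move the post-upgrade metadata out of the way for later inspection.
  if [ -e "$meta_dir" ]; then
    mv "$meta_dir" "$meta_dir.failed-$(date +%Y%m%dT%H%M%S)" || return 1
  fi
  # Extract the backup next to where the original directory lived.
  tar -xzf "$archive" -C "$parent"
}

# Usage (paths are placeholders):
#   restore_metadata /var/backups/garage/garage-meta-node1.tar.gz /var/lib/garage/meta
```

Keeping the failed metadata directory costs disk space but preserves evidence for diagnosing why the upgrade did not succeed.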