GlusterFS overview

GlusterFS is a distributed file system. Think of it as a replacement for traditional file storage (a single NFS/Samba server), an alternative to Microsoft’s DFS, or a modern take on a SAN. It really shines when you have multiple locations that each need a file server holding the same data, kept continually in sync. It is also well suited to storing virtual machine disks, since replicating the disk images across hosts makes them highly available.

You can use GlusterFS in replica, distributed, or distributed-replica mode. Replica is where a copy of file a is located on every GlusterFS host in the volume. Distributed is where file a is on one host and file b is on another, so each file is stored only once. Distributed-replica is a combination of both: the hosts are grouped into replica sets, files are distributed across the sets, and each file is replicated to every host within its set.
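To make the three modes concrete, here is a toy Python model of where a file would land. It is only an illustration: the function name `place`, the host names, and the CRC32-modulo hash are all assumptions made for this sketch, and GlusterFS uses its own hashing scheme rather than this one.

```python
import zlib


def place(filename, hosts, mode, replica_count=2):
    """Return the hosts that hold `filename` in this toy model.

    This is NOT GlusterFS's real placement algorithm; a CRC32 modulo
    stands in for it so the three modes can be compared side by side.
    """
    key = zlib.crc32(filename.encode("utf-8"))

    if mode == "replica":
        # Every host keeps a full copy of every file.
        return list(hosts)

    if mode == "distributed":
        # Each file lives on exactly one host.
        return [hosts[key % len(hosts)]]

    if mode == "distributed-replica":
        if len(hosts) % replica_count != 0:
            raise ValueError("host count must be a multiple of replica_count")
        # Group hosts into replica sets, then pick one set per file.
        sets = [hosts[i:i + replica_count]
                for i in range(0, len(hosts), replica_count)]
        return sets[key % len(sets)]

    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    hosts = ["host1", "host2", "host3", "host4"]  # hypothetical names
    for mode in ("replica", "distributed", "distributed-replica"):
        print(mode, place("file-a", hosts, mode))
```

With four hosts, replica returns all four, distributed returns a single host, and distributed-replica returns one pair of hosts.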

To get started with GlusterFS, all you need is commodity hardware. Nothing has to match, not even the hard drive space. GlusterFS will configure the storage allocation pool automatically. I do recommend at least a 1 Gbps NIC and a high-bandwidth link between locations. Partitioning also needs some thought: have separate mounts for /var/log and /data. Keeping /data as the location of your shares makes adding and removing nodes consistent with the documentation.

You need at least 2 GlusterFS hosts for a replica or distributed volume and at least 4 for distributed-replica, and the host count should be a multiple of the replica count (a multiple of 2 for two-way replication). If you plan on serving virtual machines off of the GlusterFS volume, multiples of 3 are recommended. Clusters can also be geographically bound so that if one node fails, your clients will connect to another gluster server in that region rather than just any gluster node.
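The host-count rules can be written down as a small check. This is a sketch of the rules of thumb in this post, not something GlusterFS ships; `MIN_HOSTS` and `check_host_count` are names made up for the example.

```python
# Minimum host counts per mode, as stated in this post (not an official table).
MIN_HOSTS = {"replica": 2, "distributed": 2, "distributed-replica": 4}


def check_host_count(mode, host_count, replica_count=2):
    """Return a list of problems; an empty list means the count looks fine."""
    if mode not in MIN_HOSTS:
        raise ValueError(f"unknown mode: {mode}")

    problems = []
    if host_count < MIN_HOSTS[mode]:
        problems.append(f"{mode} needs at least {MIN_HOSTS[mode]} hosts")
    if mode == "distributed-replica" and host_count % replica_count != 0:
        problems.append(f"host count must be a multiple of {replica_count}")
    return problems


if __name__ == "__main__":
    print(check_host_count("distributed-replica", 4))
    print(check_host_count("distributed-replica", 5))
    # Three-way replication, as suggested above for virtual machine storage.
    print(check_host_count("distributed-replica", 6, replica_count=3))
```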

The quick start documentation goes over setting up two nodes, peering them together, mounting the volume on your client via the GlusterFS protocol, and creating 100 files. In total, this is about six commands.
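For a feel of what those steps look like, the sketch below builds the command sequence as strings without running anything. The host names `server1` and `server2`, the volume name `gv0`, the brick path `/data/brick1/gv0`, the mount point `/mnt`, and the helper name `quick_start_commands` are placeholders chosen for this example; check the quick start guide for your GlusterFS version before running any of the commands.

```python
import shlex


def quick_start_commands(servers, volume="gv0",
                         brick="/data/brick1/gv0", mount_point="/mnt"):
    """Return (where_to_run, command) pairs for a small replicated volume.

    Nothing is executed; every name here is a placeholder for illustration.
    """
    if len(servers) < 2:
        raise ValueError("need at least two servers")

    first, *others = servers
    steps = []

    # Peer the remaining servers from the first one.
    for peer in others:
        steps.append((first, shlex.join(["gluster", "peer", "probe", peer])))

    # Create the brick directory on every server.
    for server in servers:
        steps.append((server, shlex.join(["mkdir", "-p", brick])))

    # Create and start a volume replicated across all servers.
    bricks = [f"{server}:{brick}" for server in servers]
    steps.append((first, shlex.join(
        ["gluster", "volume", "create", volume,
         "replica", str(len(servers)), *bricks])))
    steps.append((first, shlex.join(["gluster", "volume", "start", volume])))

    # Mount on the client with the native GlusterFS client.
    steps.append(("client", shlex.join(
        ["mount", "-t", "glusterfs", f"{first}:/{volume}", mount_point])))

    # Create 100 test files on the mounted volume (plain shell loop).
    steps.append(("client",
                  f"for i in $(seq -w 1 100); do touch {mount_point}/file-$i; done"))
    return steps


if __name__ == "__main__":
    for where, command in quick_start_commands(["server1", "server2"]):
        print(f"[{where}] {command}")
```

Building the list first makes it easy to review each command before pasting it into a shell on the right machine.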

For managing a large cluster of GlusterFS servers, you may want to take a look at Heketi, which manages the lifecycle of GlusterFS volumes. Facebook also developed a tool called AntFarm, but it is currently closed source.