GlusterFS – Configuration, Performance, and Redundancy

by | Nov 18, 2015

GlusterFS is a growing distributed filesystem and is now part of Red Hat Storage Server. GlusterFS, along with Ceph (not covered in this article), provides a software-based filesystem that avoids a central metadata lookup. Because file locations are computed by a hash algorithm that runs on the clients, the bottleneck of searching metadata for every file we attempt to read is removed. This also allows for easy replication, even to nodes outside of the current infrastructure. As GlusterFS runs in user space, there is no need to worry about kernel updates or changes, as there is with older distributed filesystems.

GlusterFS is open source, its speeds are as fast as or faster than most enterprise solutions, and it is easy to use. It also brings a cloud-style storage approach to commodity hardware. This makes the solution appealing to startups, non-profits, or those running a mostly cloud-based infrastructure. Another bonus is that files are stored as ordinary files on EXT3/4, so there is no data migration when moving to a new version of Gluster, and you can simply strip away GlusterFS without converting filesystems if Gluster is not working for you.

Below is a quick tutorial on a small, 2-node GlusterFS filesystem that provides redundant storage for two load-balanced Apache servers. We will be running Ubuntu Server 14.04.3. This setup will act as a mirrored RAID NAS device.

Clients: websrv01, websrv02

GlusterFS: gluster01, gluster02

We are going to add the Gluster PPA (personal package archive) on all 4 of our Linux servers, which will allow us to pull down the packages.

sudo apt-get install software-properties-common
sudo add-apt-repository ppa:gluster/glusterfs-3.5
sudo apt-get update

Configuring the Servers
Here we will go ahead and configure the GlusterFS servers. Let’s first install the GlusterFS server package.

sudo apt-get install glusterfs-server

Once the package is installed on both servers, we will establish connectivity and trust between the 2 hosts.

gluster@gluster01:~$ sudo gluster peer probe gluster02
peer probe: success.
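
Before creating a volume it can help to confirm that the probe really left both nodes connected. The sketch below is one way to do that: a small helper (the name count_connected_peers is made up for this example) counts the peers that gluster peer status reports as connected. It is run here against sample text in the format that command prints, so it works without a cluster; the commented line shows how it might be used on a real node.

```shell
#!/bin/sh
# Count the peers reported as connected.
# Reads `gluster peer status` output on stdin and prints the count.
count_connected_peers() {
    grep -cF 'State: Peer in Cluster (Connected)'
}

# Sample input in the format printed by `gluster peer status`.
sample='Number of Peers: 1

Hostname: gluster02
Uuid: ea354d6a-f7a9-4ef6-abec-3de9fa013c11
State: Peer in Cluster (Connected)'

printf '%s\n' "$sample" | count_connected_peers

# On a real node (assumes the gluster CLI is installed):
#   sudo gluster peer status | count_connected_peers
```

For the 2-node setup in this tutorial, a count of 1 on either node means the other node is in the cluster and reachable.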

Now that both nodes trust each other, we can begin creating our storage volume. The directory each node contributes to a volume is known as a brick. As we are using GlusterFS to create a redundant volume for our web servers, we will use the replica option, not stripe. Here we specify the volume name, the number of replicas we are mirroring across, the transport method (TCP), and the brick paths where the volume data will be stored.

gluster@gluster01:~$ sudo gluster volume create webdata replica 2 transport tcp gluster01:/webdata gluster02:/webdata force
volume create: webdata: success: please start the volume to access data

All that is left is to start our new volume.

gluster@gluster01:~$ sudo gluster volume start webdata
volume start: webdata: success
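
The create command above follows a simple pattern: one brick per host, with the replica count equal to the number of bricks. As a rough sketch of that pattern, the helper below (build_create_cmd is a hypothetical name, not part of Gluster) only prints the command it would run, so it can be reviewed before anything touches the cluster.

```shell
#!/bin/sh
# Print (do not run) a `gluster volume create` command for a replicated volume.
# usage: build_create_cmd VOLUME BRICK_PATH HOST...
build_create_cmd() {
    volume=$1
    brick_path=$2
    shift 2
    replica=$#      # one brick per host, so replica count = number of hosts
    bricks=""
    for host in "$@"; do
        bricks="$bricks $host:$brick_path"
    done
    echo "gluster volume create $volume replica $replica transport tcp$bricks force"
}

# Reproduces the command used in this tutorial.
build_create_cmd webdata /webdata gluster01 gluster02
```

Printing the command first is a deliberate choice here: the trailing force skips some of the CLI's safety checks, so it is worth reading the brick list before running it with sudo.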

Configuring the Clients
First, we again simply install our packages.

sudo apt-get install glusterfs-client

Now I am going to create a directory to mount the GlusterFS volume on, and add an entry to fstab so that the volume is mounted at every boot.

webadmin@websrv02:~$ sudo mkdir /webdata
webadmin@websrv02:~$ sudo vi /etc/fstab
#Add the following entry
gluster01:/webdata      /webdata        glusterfs       defaults,_netdev       0       0
webadmin@websrv02:~$ sudo mount -a
webadmin@websrv02:~$ df -hT
Filesystem                    Type            Size  Used Avail Use% Mounted on
/dev/mapper/websrv02--vg-root ext4             14G  1.8G   11G  15% /
none                          tmpfs           4.0K     0  4.0K   0% /sys/fs/cgroup
udev                          devtmpfs        486M  4.0K  486M   1% /dev
tmpfs                         tmpfs           100M  408K   99M   1% /run
none                          tmpfs           5.0M     0  5.0M   0% /run/lock
none                          tmpfs           497M     0  497M   0% /run/shm
none                          tmpfs           100M     0  100M   0% /run/user
/dev/sda1                     ext2            236M   68M  156M  31% /boot
gluster01:/webdata            fuse.glusterfs   47G  1.6G   43G   4% /webdata
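
If the fstab entry is added by a script rather than by hand in vi, it is worth making the step safe to repeat. The following is a minimal sketch, assuming a helper named ensure_line (invented for this example) that appends a line only when that exact line is missing. It is demonstrated on a scratch file rather than the real /etc/fstab.

```shell
#!/bin/sh
# Append a line to a file only if that exact line is not already present.
# usage: ensure_line FILE LINE
ensure_line() {
    file=$1
    line=$2
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Demonstrated against a scratch file instead of /etc/fstab.
scratch=$(mktemp)
entry='gluster01:/webdata /webdata glusterfs defaults,_netdev 0 0'
ensure_line "$scratch" "$entry"
ensure_line "$scratch" "$entry"   # second call changes nothing
cat "$scratch"
rm -f "$scratch"
```

Against the real file this would need root, and a mistake in /etc/fstab can stop a machine from booting cleanly, so a backup copy first is a sensible precaution.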

Take note that it does not matter which host you point the mount to. On websrv01 I will mount from gluster02 instead.

webadmin@websrv01:~$ df -hT
Filesystem                    Type            Size  Used Avail Use% Mounted on
/dev/mapper/websrv01--vg-root ext4             14G  1.9G   11G  15% /
none                          tmpfs           4.0K     0  4.0K   0% /sys/fs/cgroup
udev                          devtmpfs        486M  4.0K  486M   1% /dev
tmpfs                         tmpfs           100M  408K   99M   1% /run
none                          tmpfs           5.0M     0  5.0M   0% /run/lock
none                          tmpfs           497M     0  497M   0% /run/shm
none                          tmpfs           100M     0  100M   0% /run/user
/dev/sda1                     ext2            236M   68M  156M  31% /boot
gluster02:/webdata            fuse.glusterfs   47G  1.6G   43G   4% /webdata

Since the hash algorithm runs on the client, in the event we lose gluster02 our client will reach out to gluster01 for the file, regardless of which host was named in the mount. I will add some test data on websrv01 and check that we can see those files on websrv02.

#These permission changes are bad practice; they are here only to demonstrate.
webadmin@websrv01:/webdata$ sudo chmod 775 /webdata/
webadmin@websrv01:/webdata$ sudo chown root:webadmin /webdata/
webadmin@websrv01:/webdata$ touch data{1..40}
webadmin@websrv01:/webdata$ ls
data1   data11  data13  data15  data17  data19  data20  data22  data24  data26  data28  data3   data31  data33  data35  data37  data39  data40  data6  data8
data10  data12  data14  data16  data18  data2   data21  data23  data25  data27  data29  data30  data32  data34  data36  data38  data4   data5   data7  data9

webadmin@websrv02:~$ cd /webdata/
webadmin@websrv02:/webdata$ ls
data1   data11  data13  data15  data17  data19  data20  data22  data24  data26  data28  data3   data31  data33  data35  data37  data39  data40  data6  data8
data10  data12  data14  data16  data18  data2   data21  data23  data25  data27  data29  data30  data32  data34  data36  data38  data4   data5   data7  data9
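
Comparing two ls listings by eye works for 40 files but not for thousands. One possible shortcut, sketched below, is to reduce each listing to a checksum and compare the two values. The helper name listing_checksum is an assumption for this example, and it only compares file names, not file contents. It is shown on two scratch directories so it runs anywhere.

```shell
#!/bin/sh
# Print a checksum of the sorted file names in a directory.
# usage: listing_checksum DIR
listing_checksum() {
    ls -A "$1" | sort | cksum
}

# Scratch directories standing in for /webdata on the two clients.
dir_a=$(mktemp -d)
dir_b=$(mktemp -d)
touch "$dir_a/data1" "$dir_a/data2" "$dir_b/data1" "$dir_b/data2"

if [ "$(listing_checksum "$dir_a")" = "$(listing_checksum "$dir_b")" ]; then
    echo "listings match"
else
    echo "listings differ"
fi
rm -rf "$dir_a" "$dir_b"

# On the real clients, run this on each one and compare the output:
#   listing_checksum /webdata
```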

Already, in a matter of minutes, we have a redundant, fast and flexible storage pool for our web servers. Now let us make sure that only our two web servers can mount this volume, as the current setup allows any client to connect to it. We only do this from 1 storage node, as the setting will replicate to the other. Note: Make sure you have functional DNS, add the entries manually, or use IP addresses. I ended up adding the hosts to /etc/hosts once my DNS decided to stop resolving for no reason.

gluster@gluster01:~$ sudo gluster volume set webdata auth.allow websrv01,websrv02

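Now, for additional information regarding the status of your GlusterFS cluster, we will go into the Gluster console, which is started by running sudo gluster with no arguments. The first listing below is from the volume info command.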

Type: Replicate
Volume ID: f253dab7-8d10-49a5-aec7-da4f4f755a8d
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Brick1: gluster01:/webdata
Brick2: gluster02:/webdata
Options Reconfigured:
auth.allow: websrv01,websrv02

gluster> peer status
Number of Peers: 1

Hostname: gluster02
Uuid: ea354d6a-f7a9-4ef6-abec-3de9fa013c11
State: Peer in Cluster (Connected)

gluster> volume profile webdata start
Starting volume profile on webdata has been successful
gluster> volume profile webdata info
Brick: gluster01:/webdata
Cumulative Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us             40     RELEASE
      0.00       0.00 us       0.00 us       0.00 us             55  RELEASEDIR

    Duration: 2100 seconds
   Data Read: 0 bytes
Data Written: 0 bytes

Interval 0 Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us             40     RELEASE
      0.00       0.00 us       0.00 us       0.00 us             55  RELEASEDIR

    Duration: 2100 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
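
The profile output gets long once real traffic hits the volume. As a rough sketch, the awk filter below totals the number of calls per file operation from the cumulative sections only. The function name fop_calls is made up for this example, and the field positions are assumed from the listing above, so they may need adjusting for other Gluster releases.

```shell
#!/bin/sh
# Sum "No. of calls" per file operation (Fop) across the Cumulative Stats
# sections of `gluster volume profile VOLUME info` output read from stdin.
fop_calls() {
    awk '
        /^Cumulative Stats:/ { in_cumulative = 1; next }
        /^Interval/          { in_cumulative = 0 }
        # Data rows have 9 fields: pct, avg us, min us, max us, calls, fop
        in_cumulative && NF == 9 && $3 == "us" { calls[$9] += $8 }
        END { for (fop in calls) print fop, calls[fop] }
    ' | sort
}

# Sample rows copied from the listing above.
fop_calls <<'EOF'
Brick: gluster01:/webdata
Cumulative Stats:
      0.00       0.00 us       0.00 us       0.00 us             40     RELEASE
      0.00       0.00 us       0.00 us       0.00 us             55  RELEASEDIR
Interval 0 Stats:
      0.00       0.00 us       0.00 us       0.00 us             40     RELEASE
      0.00       0.00 us       0.00 us       0.00 us             55  RELEASEDIR
EOF

# On a real node (assumes the gluster CLI is installed):
#   sudo gluster volume profile webdata info | fop_calls
```

Skipping the interval sections avoids counting the same calls twice, since the interval figures are already included in the cumulative ones.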

Gluster aims to keep things as simple as possible, and I would be hard pressed to find another solution that makes it this simple. Now, playing devil’s advocate, let’s say we need to account for added traffic, or we have added a new database to this filesystem. We need to quickly add another node to the cluster to support our new load. We quickly spin up a new VM, install the GlusterFS server package, and begin to scale the cluster.

Here we connect and trust our new node, which will hold the new brick.

gluster@gluster01:~$ sudo gluster peer probe gluster03
peer probe: success.

From here we simply add the node to the volume we already have created.

gluster@gluster01:~$ sudo gluster volume add-brick webdata replica 3 gluster03:/webdata force
volume add-brick: success

The gluster volume status command will in most cases be the go-to troubleshooting command. Here we can see that the data is replicated to the new node and which port each brick is listening on.

root@gluster03:/webdata# ls
data1   data11  data13  data15  data17  data19  data20  data22  data24  data26  data28  data3   data31  data33  data35  data37  data39  data40  data6  data8
data10  data12  data14  data16  data18  data2   data21  data23  data25  data27  data29  data30  data32  data34  data36  data38  data4   data5   data7  data9

gluster@gluster01:~$ sudo gluster volume status
Status of volume: webdata
Gluster process                                         Port    Online  Pid
Brick gluster01:/webdata                                49152   Y       2773
Brick gluster02:/webdata                                49152   Y       2713
Brick gluster03:/webdata                                49152   Y       2624
NFS Server on localhost                                 2049    Y       3041
Self-heal Daemon on localhost                           N/A     Y       3048
NFS Server on gluster02                                 2049    Y       2756
Self-heal Daemon on gluster02                           N/A     Y       2763
NFS Server on gluster03                                 2049    Y       2636
Self-heal Daemon on gluster03
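
When troubleshooting, the column that matters most in this output is Online. The sketch below is one way to pull out only the bricks that are not online. The function name offline_bricks is hypothetical, the offline row in the sample is invented for illustration, and the filter assumes Online is the second-to-last column, as it is in the listing above.

```shell
#!/bin/sh
# Print the bricks whose Online column is not "Y".
# Reads `gluster volume status` output on stdin.
offline_bricks() {
    awk '$1 == "Brick" && NF >= 4 && $(NF-1) != "Y" { print $2 }'
}

# Sample in the layout shown above; the gluster02 row is a made-up
# example of an offline brick, not output captured from this cluster.
offline_bricks <<'EOF'
Status of volume: webdata
Gluster process                                         Port    Online  Pid
Brick gluster01:/webdata                                49152   Y       2773
Brick gluster02:/webdata                                N/A     N       N/A
Brick gluster03:/webdata                                49152   Y       2624
EOF

# On a real node (assumes the gluster CLI is installed):
#   sudo gluster volume status webdata | offline_bricks
```

An empty result means every brick reported itself as online, which makes the helper easy to wire into a cron job or monitoring check.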

Our new node is added to the cluster and our filesystems are mirrored. This shows how easily we are able to scale. I took down both gluster01 and gluster02 just to prove redundancy.

root@websrv02:/webdata# touch newfile

root@gluster03:/webdata# ls -ltr newfile
-rw-r--r-- 2 root root 0 Oct 21 22:34 newfile

With Gluster communicating over TCP, we can create a disaster recovery solution just as simply as we did here. We are able to move private clouds to public clouds or create redundancy between AWS sites.

Final Thoughts:
It’s fun to see next-generation filesystems in action. The ability to stand up this type of solution on commodity hardware is a big plus for startups or those looking to cut costs. We are already seeing GlusterFS in major companies like Pandora, which runs a GlusterFS-only environment.

The last workload figures I saw were 75 million users listening to 13 million audio files, scaling to petabytes with a network throughput of 50GB/s. Other companies report running GlusterFS environments into the double-digit petabytes. Currently, I believe these solutions are mostly focused on cloud and business continuity applications. As it stands, GlusterFS is battling against Ceph (Canonical/Inktank) in the hash-based distributed filesystem department. Time will tell who will come out on top as the fight to deeply integrate with OpenStack continues. It also seems that small-file performance on these types of filesystems is impacted by network and I/O latency.

In a future blog article I will cover load balancing our Apache servers and wrap up a completely redundant small web application.
