Here’s Part 1, which is really background information.
Part 2 will be actually doing stuff, so if you don’t want to read how/why I ended up here, skip to Part 2.
A few days ago I yet again ran out of space on my server. Normally this would just mean deleting a load of junk files, but I’ve been doing that for months and I’m now at the point where there are no junk files left to delete. So it’s time to increase the storage. Unfortunately, problem #2: I currently have 4 SATA drives in the server, taking up all the connections. Expanding wouldn’t otherwise be a problem, as I originally set up the drives with LVM to handle large storage requirements, but now there’s nowhere to turn to increase the capacity in this server.
So instead I thought I’d have a look at the alternatives. I’d been hoping to move some of the server services over to Pis ever since I first heard about the Raspberry Pi project (long before they were released); I knew I could make good use of lots of them.
After a little searching I found Gluster, and it seems to be ideal for what I need. At this point I should say that I fully expect any USB drive connected to the Pi to be slower than SATA on my server, but for my use slower doesn’t matter. I doubt this would suit everyone, but I think Gluster is a good option even on beefy servers/PCs.
I have an idea that needs testing, so I set up 2 Pis with 2 USB sticks each for storage. A simple apt-get install glusterfs-server gets me moving quickly, and after reading 2 guides, http://www.linuxjournal.com/content/two-pi-r?page=0,0 and http://www.linuxuser.co.uk/tutorials/create-your-own-high-performance-nas-using-glusterfs, I quickly have a working system. I’m not going to go into the steps, as both links give the details, but here’s my layout:
2 x 8GB SD cards
2 x 4GB USB sticks
2 x 512MB USB sticks
So each Pi has an SD card, a 4GB USB stick and a 512MB USB stick.
As I prefer to encrypt all data on my drives, I installed cryptsetup and used luksFormat to get the USB sticks ready (more on this later). I then unlocked the USB sticks on both Pis, which maps the drives to /dev/mapper/USB1_Crypt and /dev/mapper/USB2_Crypt.
I then mounted them at /mnt/USB1 and /mnt/USB2 (both already had ext4 filesystems), with USB1 being the 4GB stick.
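As a rough sketch, the unlock-and-mount step described above might look like the following. The device names /dev/sda1 and /dev/sdb1 are assumptions (check yours with lsblk), and run and unlock_and_mount are illustrative helpers, not part of cryptsetup. By default the commands are only printed; set DRY_RUN=0 to execute them as root. The one-off luksFormat step is left as a comment because it wipes the stick.

```shell
# Print each command by default; set DRY_RUN=0 to really run them (as root).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# One-off preparation, DESTROYS everything on the device (assumed name):
#   cryptsetup luksFormat /dev/sda1

# unlock_and_mount DEVICE MAPPER_NAME MOUNTPOINT
unlock_and_mount() {
    run cryptsetup luksOpen "$1" "$2"    # creates /dev/mapper/MAPPER_NAME
    run mkdir -p "$3"
    run mount "/dev/mapper/$2" "$3"
}

# /dev/sda1 and /dev/sdb1 are guesses at the two sticks' device names.
unlock_and_mount /dev/sda1 USB1_Crypt /mnt/USB1
unlock_and_mount /dev/sdb1 USB2_Crypt /mnt/USB2
```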
The Pis are called Gluster-1 and Gluster-2. I know my DHCP and DNS are working, so I can use the names instead of IP addresses when setting up the gluster peers and volumes.
Once I’d created the Gluster volume (I tested both stripe and replica) I mounted testvol (the Gluster volume) at /media/testvol.
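The peer and volume setup described above can be sketched roughly as below. This is not copied from either guide: the hostnames, brick path and volume name come from the text, while run and make_testvol are illustrative helpers. Treat the exact syntax as an assumption for your version; newer Gluster releases may refuse a brick that is itself a mount point unless you append force. Commands are only printed unless DRY_RUN=0 is set.

```shell
# Print each command by default; set DRY_RUN=0 to really run them (as root).
run() { if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi; }

# make_testvol TYPE   where TYPE is "stripe" or "replica"; run on Gluster-1
make_testvol() {
    run gluster peer probe Gluster-2
    run gluster volume create testvol "$1" 2 Gluster-1:/mnt/USB1 Gluster-2:/mnt/USB1
    run gluster volume start testvol
    run mkdir -p /media/testvol
    run mount -t glusterfs Gluster-1:/testvol /media/testvol
}

make_testvol replica
```

Switching between the two layouts tested here is then just a matter of passing stripe or replica, after stopping and deleting the previous volume.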
I made a quick file and could see it from both servers. It’s also interesting looking in the /mnt/USB1/ folder (but do NOT change anything in here): while using a stripe volume the new file only appears on 1 of the Pis, and creating more files puts some on Gluster-1 and some on Gluster-2. So, to see what happens, I rebooted Gluster-1 and watched the files in /media/testvol. Yep, all the files that are in testvol and actually stored on Gluster-1 disappeared from the file list. As soon as Gluster-1 was back (decrypted and with /mnt/USB1 mounted) the files were back.
So this is looking pretty good, and adding more storage was easy (I didn’t go into rebalancing the files across new bricks, but I don’t see this being a huge problem). Next I moved on to using replica instead of stripe (my ultimate intention is to use both). Here’s where the fun begins: I didn’t actually delete the test files in /mnt/USB1 from either server, I just stopped the gluster volume and deleted it. Creating a new gluster volume in replica mode was easy, and a few seconds after I started it gluster synced up the files already in /mnt/USB1 on both Pis, so in /media/testvol I could still see all the files and access them fine. Then I rebooted Gluster-2 and checked the files in /media/testvol on Gluster-1. Yep, all the files were there (there was a 2-3 second delay on ls; I assume I was previously connected to Gluster-2 and it had to work out that it wasn’t there anymore, but from then on it was fine).
I have a working replica system 🙂 and it’s taken me no time at all to get running. I brought Gluster-2 back online, then thought: what if I change files while one of them is offline? So I rebooted Gluster-1, changed some of the text in a few of the test files in /media/testvol and created a new test file. Then I brought Gluster-1 back online, and here’s where things fall apart!!!
The new test file is on both and fine, and the existing files I hadn’t changed are all fine too. But the files I changed I can’t access anymore; I’m getting cat: testy1: Input/output error. This isn’t good, so I checked the testy1 file in /mnt/USB1 on both Pis: Gluster-2 has the changes (as expected, since it was online when I made them) and Gluster-1 has the original.
So I head over to Google to find out how I’d fix such a scenario (a friend told me years ago “you’re never the first person to have a problem”, so google every error message). Yep, there’s information on it; it’s called split-brain. The first solution I found was to delete the incorrect copy of the file from /mnt/USB1. This didn’t solve my problem: testy1 was recreated but was still giving an I/O error. A little more reading says that gluster creates hard links and you need to go and delete these too, apparently in a .glusterfs folder, but I couldn’t find it. There is also a heal function that would show me which files are affected, but not on my system: gluster doesn’t know heal. WHY????
gluster -V tells me I’m running an old version from 2013 (I think 3.2.8, but that’s from memory); the latest version is 3.5.0.
I also came across some information saying that replica volumes using 3 or more bricks can use quorum to resolve issues with file differences (I think this has to be set up though; I can see situations where doing so automatically could cause data loss, such as log files that are still being written on both sides and hold different data, rather than one copy simply having stopped being written to).
Anyway, it looks like having 3.5.0 with a heal function would be beneficial, and I always prefer things to be up to date where possible anyway.
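For what it’s worth, both the heal command and the .glusterfs folder appear to have arrived with Gluster 3.3, which would explain why neither exists on a 3.2 install. On a newer version, the usual manual fix for a split-brain file is to remove the bad copy from the brick together with its hard link, whose path is derived from the file’s GFID. The sketch below only builds that path and deletes nothing; gfid_path is a hypothetical helper name and the GFID shown is made up.

```shell
# On Gluster 3.3+ (not the 3.2 install described here), list affected files:
#   gluster volume heal testvol info split-brain
# and read a file's GFID straight from the brick (needs root):
#   getfattr -n trusted.gfid -e hex /mnt/USB1/testy1

# gfid_path BRICK GFID_HEX: print the hard link path inside .glusterfs
gfid_path() {
    hex=${2#0x}                                    # drop the leading 0x
    a=$(printf '%s\n' "$hex" | cut -c1-2)          # first directory level
    b=$(printf '%s\n' "$hex" | cut -c3-4)          # second directory level
    uuid=$(printf '%s\n' "$hex" |
        sed -E 's/^(.{8})(.{4})(.{4})(.{4})(.{12})$/\1-\2-\3-\4-\5/')
    echo "$1/.glusterfs/$a/$b/$uuid"
}

# Made-up GFID, purely for illustration.
gfid_path /mnt/USB1 0x0123456789abcdef0123456789abcdef
```

Both the file on the brick and the path printed here would need removing on the server holding the wrong copy before the good copy can be healed across.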
Looking at the Gluster download page, there’s a simple way to add gluster to the Debian apt system. So I followed the steps to add it and ran apt-get update. Now another problem: it’s not able to download the gluster package info, even though I can access the link it’s using just fine. Then something jumps out at me: “Architectures: amd64”. The Raspberry Pi uses ARM, so this isn’t built for it. Now I have no idea if this is right or wrong, but it makes sense to me.
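A quick way to test that hunch is to compare the Pi’s own architecture with the Architectures line of the repository’s Release file. In the sketch below, repo_supports is a hypothetical helper and Release stands for a locally downloaded copy of that file.

```shell
# repo_supports RELEASE_FILE ARCH: succeed if ARCH is listed in the file
repo_supports() {
    grep '^Architectures:' "$1" | grep -qw "$2"
}

# Example, assuming ./Release was fetched from the repository:
#   repo_supports Release "$(dpkg --print-architecture)" \
#       || echo "no packages for this architecture"
```

On Raspbian, dpkg --print-architecture reports armhf, so a Release file that lists only amd64 would fail this check.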
Like many other things, it’s time to manually compile and install.
So there’s the background (sorry it took so long). Part 2 is going to run through manually installing gluster on the Pi.