Raspberry PI + GlusterFS (Part 4)

In Part 1 I mentioned encrypting my disks but didn't go into it, so here I'm going to run through encrypting and decrypting a drive, and using it with GlusterFS.
Part 2 was an attempted (and failed) install of the latest GlusterFS server (3.5.0).
Part 3 covered installing the GlusterFS server using the new information from Ashley.

To recap, I'm using the following:
2 Pis
2 8GB SD cards
2 4GB USB sticks
2 512MB USB sticks.

As yet we haven't set up any Gluster volumes, and this is all on a pretty fresh system.

First we need to install some tools we’ll be using.

apt-get install cryptsetup pv

I know my 4GB USB stick is on /dev/sda and the 512MB one is on /dev/sdb. I'll only be concentrating on the 4GB stick here, but if you're following along, make sure you're using the correct paths. Using the wrong paths can wipe your data.

I don't want any partitions on the stick (I'll be encrypting the whole drive).

fdisk -l

This shows me I've got a few partitions on the stick:

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   ?   778135908  1919645538   570754815+  5b  Unknown
/dev/sda2   ?   168689522  2104717761   968014120   65  Novell Netware 386
/dev/sda3   ?  1869881465  3805909656   968014096   79  Unknown
/dev/sda4   ?  2885681152  2885736650       27749+   d  Unknown

I can’t remember what this stick was used for (to my knowledge I’ve never used Novell partitions), but we’ll delete them all.

fdisk /dev/sda
d
1
d
2
d
3
d
w
My partitions were listed 1-4 so it was nice and easy. You can rerun the fdisk -l command to check they’ve all gone.
This step wasn’t strictly necessary but I always like to make sure I’m working with the correct drives.
With the drive empty of partitions I like to unplug it and plug it back in (keep everything fresh). Note: if you do reconnect the drive, make sure you're still working with the correct /dev/sd* path, as sometimes this can change.
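If you're not sure which device node the stick came back as, lsblk will list the drives with their sizes, which makes it easy to spot the 4GB stick (the column list here is just a suggestion):

lsblk -o NAME,SIZE,TYPE,MOUNTPOINT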
Now run
cryptsetup -y -v luksFormat /dev/sda
This creates a new encryption key for the drive (note: this is not how you add new keys to an existing drive, so only do this once!).
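As an aside, if you later want to add a second passphrase to the same drive (rather than reformatting it), my understanding is that's done with luksAddKey, along these lines:

cryptsetup luksAddKey /dev/sda

It will ask for an existing passphrase before letting you set the new one.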
Then we need to unlock the drive for use
cryptsetup luksOpen /dev/sda USB1_Crypt
/dev/sda is the drive path
USB1_Crypt is what we’re going to be labelling the decrypted drive.
You'll be prompted for the drive passphrase that you just created. If it's successful it doesn't actually tell you; it just drops you back to a prompt. From here on we won't be doing any drive work on /dev/sda, as that would be outside the encrypted layer; we'll be using /dev/mapper/USB1_Crypt instead.
We can check it’s unlocked with
ls -l /dev/mapper/
You should see something similar to
lrwxrwxrwx 1 root root       8 May 13 18:39 USB1_Crypt -> ../dm-1
You can also check the status using
cryptsetup -v status USB1_Crypt
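If you're curious about the LUKS header itself (cipher, key slots and so on), you should also be able to dump it from the raw device:

cryptsetup luksDump /dev/sda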
Now that the drive has an encryption key and is unlocked, we'll write a bunch of data across it:
pv -tpreb /dev/zero | dd of=/dev/mapper/USB1_Crypt bs=128M
Writing zeros to a drive is generally considered bad for data security, but we're writing them through the encrypted mapping rather than to the actual stick, so what lands on the stick will be encrypted data.
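If you don't have pv installed, newer versions of dd can report progress on their own; something along these lines should do the same job (status=progress needs a reasonably recent dd, so treat this as an alternative sketch rather than a drop-in for every Pi image):

dd if=/dev/zero of=/dev/mapper/USB1_Crypt bs=128M status=progress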
Once the data has finished writing we’ll create a new filesystem on the encrypted disk
mkfs.ext4 /dev/mapper/USB1_Crypt
You don’t have to use ext4, but I generally do.
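If it helps you keep track of which stick is which, you can also give the filesystem a label at the same time; something like this should work (the label name USB1 is just an example):

mkfs.ext4 -L USB1 /dev/mapper/USB1_Crypt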
That's the USB stick encrypted.
We close the encrypted drive and remove it from /dev/mapper/ with

cryptsetup luksClose USB1_Crypt

If all you wanted was an encrypted drive, that's it: you can unlock the drive on any system with cryptsetup installed and then mount away.

So far we've encrypted the entire USB stick, written a bunch of encrypted data across it, created a new filesystem, and closed the stick.
Now we're ready to mount the stick for Gluster to use.
We're going to create a folder to mount the drive into

mkdir /mnt/USB1
We’ll open the encrypted Drive again using

cryptsetup luksOpen /dev/sda USB1_Crypt
Then mount the decrypted drive

mount /dev/mapper/USB1_Crypt /mnt/USB1
If you run

ls -l /mnt/USB1
You should see the lost+found directory on the filesystem.
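You can also confirm the mount and see how much space the stick gives you with:

df -h /mnt/USB1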
I should mention again that I've been running through this process on both Pis, and to keep things simple I'm using the same mount point, /mnt/USB1, on both systems.
Now it’s time to get GlusterFS running with these drives.
So while on Gluster-1 (the first Pi), issue the command

gluster peer probe Gluster-2
This should find and add the peer Gluster-2 and you can check with

gluster peer list
and

gluster peer status
Now, because I always want each Gluster system referring to the other by name, from Gluster-2 I issue

gluster peer probe Gluster-1
This updates the Gluster-1 peer to its name rather than its IP address. There's nothing wrong with using IP addresses if you're using statically assigned IPs on your Pis, but I wouldn't recommend it if your IP addresses come from DHCP.
With GlusterFS knowing about both Gluster-1 and Gluster-2, we can create a new volume (it's important that /mnt/USB1 has been mounted on both systems before proceeding).
On either Pi you can create a new replica volume with

gluster volume create testvol replica 2 Gluster-1:/mnt/USB1 Gluster-2:/mnt/USB1
This will create a new volume called testvol using /mnt/USB1 on both Pis. The folder /mnt/USB1 is now referred to as a brick, and volumes consist of bricks.
Now we start the volume

gluster volume start testvol
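To check the volume actually came up, these should show the bricks and whether they're online (using the testvol name from above):

gluster volume info testvol
gluster volume status testvol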
Finally we need somewhere to mount the gluster filesystem

mkdir /media/testvol
Then we mount it

mount.glusterfs Gluster-1:/testvol /media/testvol
It doesn't matter which host we use in this command; apparently it's only used to pull the list of bricks for this volume, and the load is then balanced across them.
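If you're worried about the host named in the mount command being offline when you mount, there is a mount option to name a fallback server; as a sketch (the exact option name, backupvolfile-server here, may differ between GlusterFS versions):

mount -t glusterfs -o backupvolfile-server=Gluster-2 Gluster-1:/testvol /media/testvol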
Now you can write data to /media/testvol. If you've mounted the volume on both Pis you will see the files on both.
You can also

ls -l /mnt/USB1
to see the actual files on the stick (do not do anything more than read the files in /mnt/USB1; playing in this folder directly can cause issues, and you should only be using /media/testvol from now on).
If instead of replica you used stripe, you'll be able to see all the files in /media/testvol but only some of the files in /mnt/USB1 on each Pi.
Shutting down one of the Pis in a replica-mode volume won't show any difference in /media/testvol (and hopefully the new 3.5.0 version won't cause you as much of a headache if files get updated while one Pi is offline, though it is likely to need manual intervention to fix; maybe a future post 🙂 when I get that far), but in striped mode with one of the Pis offline you'll notice files in /media/testvol have gone missing. For this reason I'm hoping to use both stripe and replica, to keep files available across multiple Pis and allow me to increase the storage space easily.
Replicating across 2 drives will mean I will need to add new storage 2 drives at a time.
Replicating across 3 drives would mean I need to add 3 new drives each time.
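When the time does come to grow the volume, my understanding is that bricks are added in multiples of the replica count and the data is then rebalanced; a sketch with a hypothetical second pair of sticks prepared exactly like /mnt/USB1 and mounted as /mnt/USB2 on each Pi:

gluster volume add-brick testvol Gluster-1:/mnt/USB2 Gluster-2:/mnt/USB2
gluster volume rebalance testvol start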
Just to make things easy, I'll list the commands to decrypt and mount again after the Pi has been rebooted:

cryptsetup luksOpen /dev/sda USB1_Crypt

mount /dev/mapper/USB1_Crypt /mnt/USB1

mount.glusterfs Gluster-1:/testvol /media/testvol
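To save typing those each time, a minimal wrapper script along these lines should work (same device, names and mount points as above; it will still prompt for the LUKS passphrase and needs to run as root):

#!/bin/bash
# Unlock the encrypted stick (prompts for the passphrase)
cryptsetup luksOpen /dev/sda USB1_Crypt
# Mount the decrypted device where the brick lives
mount /dev/mapper/USB1_Crypt /mnt/USB1
# Mount the gluster volume itself
mount.glusterfs Gluster-1:/testvol /media/testvol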

2 thoughts on “Raspberry PI + GlusterFS (Part 4)”

  1. interesting reading 🙂 thanks for posting.

    I am playing around with GlusterFS and RPis just now. Installing GlusterFS is easier now.

    There are no more linked posts; I would be curious to hear any updates.

    1. Hi Gordon,
      Unfortunately no more posts on it. I can't really remember playing with it lol, my memory isn't the greatest, which is why I try to document things I think I may need to go back to.
      I do know that I didn't end up using GlusterFS, and a few years ago did a similar setup with MooseFS using 7x Pi 4s. I like Moose in the sense that you can use different drives (size-wise) and it just seemed all around easier. The downsides however: as a free user you're reliant on a single live meta server (I did enquire about a price for Pro but they never got back to me), and because it stores the files in its own way, if something goes wrong you can't just go and grab the files from ext4/xfs like you could with a Gluster system (I believe). The Pis still aren't up to the task of mass storage imo; it's workable, but I was using MooseFS as a 30TB backup for other drives and it just took longer to copy stuff in and then sync up. However, out of 6 disks 3 failed! They were all brand new disks and are currently with WD (hitting 3 months) waiting on replacements (that reminds me, I need to go shout at them again this month). Sadly I had important files set at 3 copies and less important at 2 copies. With 3 disk failures it was just too much and I wrote off all the data (thankfully it was just backups). I think the system itself tried to do what it should, i.e. disk 1 failed so the chunks it was holding were now below the target of 2 or 3 copies, so Moose started copying the endangered chunks to other disks, but this then hit a problem when disk #2 died. Any files set to 2 copies that were on disks 1 & 2 at that point were lost. Files with 3 copies still had a chance, but then disk #3 died, so any files with chunks only on disks 1, 2 & 3 were lost at that point.
      I semi take the blame here, because I was testing stuff and didn't set up any alerts or monitoring. The problems started about a month before I checked it.
      I think if the Pis could handle decent transfer speeds then it might have been able to get things back in sync before #2 and then #3 failed. It looks like it had been days/weeks syncing stuff. Ultimately the drives shouldn't have failed; they were all new and spent about 95% of their time doing hardly anything.

      It's something I hope to revisit as I really do need to sort some kind of large shared storage (cheap) but no plans to atm.
      I'd be interested to hear how you got on with GlusterFS though. I tried it years ago as a backend to multiple webservers, the idea being that each server would be part of Gluster holding a copy of the files, so the webserver only had to go locally but changes would be automatically synced etc. I encountered problems as the servers decided to speak to each other to get the files, and I couldn't see an easy way of saying Server A gets files from Server A first, then any other if A isn't available. I ended up with A getting files from A, B, C, D and just all around increasing network traffic. Day to day that would be fine, but in the middle of a traffic spike it just ground it all to a halt.
