GlusterFS woes

If your looking for the gluster error ‘brick2.mount_dir not present’ 
jump to the end

Time for another post 🙂

I’ve been using DigitalOcean for some time now, and I’m still tweaking my setup. Once thing I really hope they sort soon is proper private lan between your own droplets, for now we just have to use a vpn between them.

Being responsible for a new website can give a lot of headaches, especially when you have to try to guess just how popular it will be. So about a year ago I setup a new droplet to host the new site, testing it was going well and I increased the droplet before we launched to handle a spike. Sadly I underestimated just how busy it would get, based on the numbers I was given I think I was about right but unfortunately those numbers were way off.

But each failure it just another learning curve 🙂 so as quickly as possible that was fixed, then the site got to normal volume so we scaled it back down (yes it’s a whole cost exercise especially when your paying for it). Then we had the lead up to Christmas, in an attempt to not repeat the problems at launch, I changed the whole configuration so that I could (if needed) take a server down while staying partly operational. This kind of worked and was needed when some brightspark promoted the site a day early and we hadn’t scaled back up!

Come the new year I decided it was time to seriously sort the infrastructure for the site. It now has an online store and it’s important it keep running, it’s not just a blog anymore. So I put in place the following setup (working around various obstacles).

DNS:

  • All websites name servers are pointing to cloudflare and they handle the first web connection. It works really well on their free tier, and changes (adding new servers) are pretty quick to take effect.

DigitalOcean Droplets:

  • 1x Server running as a load balancer.
  • 2x Servers running as webservers.
  • 1x Server running as database server.
  • 1x Server running as email (not quite running).
  • At the same time as making this setup I decided to ditch apache and move to nginx, so loadbalancer and webservers are running that.

Software:

  • 4x Nginx (loadbalancer and webservers, and installed on database server for stats).
  • 2x Syncthing (webservers) to keep the www folders in sync.
  • 1x MySQL Server.
  • 4x OpenVPN (connections between loadbalancer and webservers, webservers and database).
  • 1x Redis Server (for session data, I tried nginx load balancing options but it still screwed up if I had to take one of the web servers out for maintenance, so installed Redis on the Database server.

As this progressed I dropped the VPN between loadbalancer and webservers and just use HTTPS/SSL instead. Syncthing already has it’s own SSL built-in so I could leave that over the semi-private LAN. but I really would like to change MySQL to be encrypted and drop the vpn from that too, but find info on doing this for wordpress seems to be non-existent at the moment.

Roll forward a few months, this has been working but still has areas to resolve. Such as syncthing: yes it keeps the folders in sync and is actually really good that I can also store them on another system easily. but it doesn’t listen to the OS for changes to files. Instead it polls every x seconds. Although there’s nothing much changing, updating plugins became a problem if you click the update button it downloads but then nginx sends your next request to another server and now the plugin.zip isn’t there so wordpress throws an error.

My whole reason for running syncthing was I wanted the files to be available on each server independently. So if Server A goes down it doesn’t matter Server B has all the files locally anyway. NFS would still give me a single point of failure. On looking into resolving this though I remembers GlusterFS. I’d played with it a long time ago, but dropped it as a solution (can’t remember what I was doing or why it wasn’t working). Now it’s time to try it again. downside I’m back to needing VPN’s and OpenVPN isn’t the easiest to quickly add a new server.

So I’ve done the following:

  • Added a new server just for the files (I don’t like gluster being in a 2 replica incase there’s a problem, there should be a majority who thinks they are holding the correct file).
  • Swapped out OpenVPN for Tinc, I have to say one of the best decisions. yes there are downsides, it creates a mesh (only doable with OpenVPN by running quagga for manually forcing routes) but I have no idea which Server is actually connected to which Servers. There’s no VPN status and I can’t see how much traffic has gone between 2 particular servers (iptables helps but it’s not 100%)
  • Added another new server for nagios and central logging.

There were a load of changes within a few weeks of each other, but I now have a setup I’m confident I can scale more quickly than ever before. Yes it has single points (load balancer, mysql) but I know if the load balancer has a problem it’s pretty static so can be wiped and redeployed quickly, as well as it will take a few minutes to open the webservers to the world and let cloudflare hit them directly. So MySQL is the real problem and I’ll be addressing that one soon enough.

 

So now onto today’s problem 🙂

I’ve had gluster running a few weeks, and I have our testing website (for theme changes etc) setup on our webservers behind the loadbalancer. The last few days I’ve need to do more extensive testing than just changing bits in a theme, so I’ve decided to split the tester site onto it’s own droplet (still behind the loadbalancer and with VPN to the databases). I thought I may as well make use of Gluster here too (yes it would be in a 2 replica setup itself and the fileserver. I don’t like that idea). So I brought up a new server and configured it: new users, firewall rules, tinc, nginx, php, etc.

I added gluster and copied the /etc/hosts entries over from the other servers. All looked good. I gluster peer probe ServerX and it worked, gluster peer status and I could see it fine. but on trying to add a new volume:

gluster volume create xxx-yyy-zzz replica 2 transport tcp FILESERVER:/GLUSTER/xxx.yyy-zzz TESTSERVER:/GLUSTER/xxx.yyy-zzz force

I was getting the error:

volume create: xxx-yyy-zzz: failed: Commit failed on localhost. Please check the log file for more details.

Checking the logs on both servers would show (maybe slight variation):

[2015-07-28 16:00:41.612907] E [glusterd-hooks.c:328:glusterd_hooks_run_hooks] 0-management: Failed to open dir /var/lib/glusterd/hooks/1/create/pre, due to No such file or directory
[2015-07-28 16:00:41.614499] E [glusterd-volume-ops.c:1811:glusterd_op_create_volume] 0-management: brick2.mount_dir not present
[2015-07-28 16:00:41.614587] E [glusterd-syncop.c:1288:gd_commit_op_phase] 0-management: Commit of operation 'Volume Create' failed on localhost

I tried a series of things to fix it:

I thought maybe the /GLUSTER/xxx-yyy-zzz needed to be created (I already made /GLUSTER) – Nope

  • I detached the peer and reattached – No.
  • I reboot the file server and test server – No.
  • I detached, reboot, reattached – No.
  • I tried creating the volume with just the test server and no replica – No.
  • I tried creating the volume on just the fileserver with no replica – Yes.

So the problem is point to the new system, but it’s a brand new system. They’re peers and connected.

  • I tried uninstalling and reinstalling gluster – No.
  • I tried uninstalling, purging and reinstalling – No.
  • I tried uninstalling, purging, manually deleting the /var/lib/gluster (probably a mistake that I didn’t detach first :() and reinstalling – No.
  • I have no idea why this wont WORK!!!!!

Let’s go further back, check the VPN, ping the servers.

  • Ping fileserver from testserver – Yes.
  • Ping testserver from fileserver – Yes/Hang on that’s the wrong IP!! Yes I’d copied an entry from webserverB into /etc/hosts, update the name but missed the IP address. Idiot! correct that. Ping – OK.
  • Try gluster again – Yes.

So if you’re having problems and seeing brickX.mount_dir not present make sure your DNS between servers is correct.

I don’t really know how the peer probe worked, but I think I must have done that from a servers who’s hosts was correct

Todays WordPress Adventures

Well it had to happen at some point, today I had a nice email from DigitalOcean saying they’ve disabled networking on one of my servers. This was because it’s ip address had been reported to RBL’s by several other servers.

Looking at the logs they included I was beating the s**t out of others wp-admin login pages. Now I know I wasn’t doing it personally, it was the first time in a long time I was in bed early and this seemed to start at 2am.

Luckily I could access my Droplet using the Console page, so after login I sat thinking ‘um…..’ where exactly do you start. The server normally has quite a bit of traffic so the logs are always cluttered. Needle in a haystack springs to mind.

I decided to run htop and see if the server was doing much without any traffic coming in. Oh yes /usr/bin/host is eating resources. So do I kill it or not. I decided not to at this point. Without networking I’m not doing anymore harm, and leaving it running may help find out what’s calling it.

It was a good call. I can’t give details of everything I did, I spent a hour hours checking through stuff. I do remember checking lsof and finding a link between a process id for host and a file within wpallimport’s uploads directory. So I had a look in there, followed by some further searching of google. 1 file in particular .sd0 seems to bring back results and this seems to be what’s caused it.

To get my server running again, I disabled the entire site within apache that was affected (luckily not a major site) and reboot the server. Once I was happy there were no cronjobs or anything calling on this script I mailed DigitalOcean and asked them to re-enable networking. They’re pretty speedy and within 15 mins had done it. A further reboot and my servers back online minus the one site I’ve disabled.

I expect the cleanup for this is going to take weeks of checking files, against backups while keeping as much as possible online.

I’m pretty confident I know what’s caused it, an out of date wordpress install with an out of date wpallimport install. It really goes to show that you have to check old stuff and keep it upto date.

The most annoying thing for me is that WordPress has a multisite option (which I use on 2 installs) and this allows me to keep plugins and everything upto date easily of sub-sites that are barely used. but it doesn’t extend to multiple domains which would really allow wordpress to be used across all my domains from one central console and then everything would be kept upto date in one go.

I know there’s a plugin for multisite domains, but I feel this is more of a hack of the wordpress system rather than wordpress properly designed to function with this in mind. I don’t want to install it and encounter many more problems.

It’s very bad admin on my part not having kept this site upto date, I’ll be the first to admit that but it’s easy to forget about installs you don’t use regularly. There must be some kind of nagios plugin to alert me to out of date plugins/versions for wordpress so I’ll be looking for that later in the week 🙂

Raspberry PI + Maplin WH1080 Weatherstation

Download and write to SD Card the debian image for Raspberry PI.
Boot and connect via SSH (Putty from Windows)
Change to root using sudo su – alternative use sudo in front of commands. Using sudo su – is bad, but I always do it.

Run

apt-get update
apt-get upgrade

Download the latest Debian Weewx version (http://sourceforge.net/projects/weewx/files/) using wget. Then run

dpkg -i weewx_2.6.4-1_all.deb

This will most likely throw error errors about missing dependencies. Install them using apt-get install

dpkg -i weewx_2.6.4.1_all.deb
(Reading database ... 75409 files and directories currently installed.)
Preparing to replace weewx 2.6.4-1 (using weewx_2.6.4.1_all.deb) ...
Unpacking replacement weewx ...
dpkg: dependency problems prevent configuration of weewx:
 weewx depends on python-configobj (>= 4.5); however:
  Package python-configobj is not installed.
 weewx depends on python-cheetah (>= 2.0); however:
  Package python-cheetah is not installed.
 weewx depends on python-imaging (>= 1.1.6); however:
  Package python-imaging is not installed.
 weewx depends on python-usb (>= 0.4); however:
  Package python-usb is not installed.
dpkg: error processing weewx (--install):
 dependency problems - leaving unconfigured
Errors were encountered while processing:
 weewx

So in my case I run
apt-get install python-configobj python-cheetah python-imaging python-usb
During the install you’ll be asked several configuration questions, fill them in (You can always edit the config file later if you make a mistake).

For the maplin WH1080, Select the FineOffsetUSB.

Once installed weewx should attempt to start (if not you can start it with /etc/init.d/weewx start). Check the syslog for any errors
tail /var/log/syslog -n 50
If all goes well you should see something like

wxengine: Initializing weewx version 2.6.4


wxengine: Using Python 2.7.3 (default, Mar 18 2014, 05:13:23) #012[GCC 4.6.3]


wxengine: pid file is /var/run/weewx.pid


wxengine: Using configuration file /etc/weewx/weewx.conf


wxengine: Loading station type FineOffsetUSB (weewx.drivers.fousb)


fousb: driver version is 1.6


fousb: polling mode is PERIODIC


fousb: polling interval is 60


fousb: altitude is 4.2672 meters


fousb: pressure offset is 0.0


fousb: found station on USB bus=001 device=005


wxengine: StdConvert target unit is 0x1


wxengine: Record generation will be attempted in 'hardware'


wxengine: The archive interval in the configuration file (300) does not match the station hardware interval (60).


wxengine: Using archive interval of 60 seconds


archive: Created and initialized table 'archive' in database 'weewx.sdb'


wxengine: Using archive database: archive_sqlite


stats: Created schema for statistical database


stats: stats database up to date.


wxengine: Using stats database: stats_sqlite


wxengine: Starting up weewx version 2.6.4

If you encounter errors you can edit your weewx configuration using nano -w /etc/weex/weewx.conf
Once you’ve finished editing press ctrl+x (to exit), then y(to save), then enter(same filename). Then restart weewx using /etc/init.d/weewx restart
If all has gone well you may also see entries in your syslog like
weewx[12559]: archive: added record 2014-09-20 18:32:44 UTC (1411237964) to database 'weewx.sdb'; table 'archive'
You should also have /var/www/weewx loaded with files.
As we haven’t yet installed a webserver though you can’t view them.
We’ll install Apache2 Server to handle our webpages.
apt-get install apache2
Once apache is installed open the weewx pages by visiting http://{raspberry pi ip address}/weewx from your browser. e.g. http://192.168.1.52/weewx
This setup does leave weewx running as root, not really something you would do for a system running on the internet. but outside the scope of securing your server for here.
Checkout http://www.weewx.com/docs/usersguide.htm#installing for more info on installing, problems and further guides for configuring your weewx installation.
Also checkout http://davies-barnard.co.uk/2013/12/weewx-rasp/ for a better looking skin template (the link is in one of the comments).

Raspberry PI + RTorrent +Apache2 + RUTorrent

I’ve been using one of my PI’s as a torrent server for some time. Recently I decided to refresh the entire system. This will NOT go into the legalities of downloading anything, I expect everyone to only be using this for downloading raspberry images 🙂

Version Info:
2014-01-07-wheezy-raspbian.img
libtorrent 0.13.2
rtorrent 0.9.2
rutorrent 3.6

I’m going to assume you can SSH to your PI, and recommend you get all the latest updates before you start. I’m also going to be naughty and be running all the commands as root.
sudo su –

Then we’ll get the stuff needed to compile rtorrent and a few things needed for the plugins

apt-get install subversion build-essential automake libtool libcppunit-dev libcurl3-dev libsigc++-2.0-dev libxmlrpc-c-dev unzip unrar-free curl libncurses-dev apache2 php5 php5-cli php5-curl libapache2-mod-scgi mediainfo ffmpeg screen

While your waiting may as well get a coffee. With that all finished we’re going to grab the rtorrent packages.

mkdir /root/rtorrent
cd /root/rtorrent

wget http://libtorrent.rakshasa.no/downloads/libtorrent-0.13.2.tar.gz
wget http://libtorrent.rakshasa.no/downloads/rtorrent-0.9.2.tar.gz
wget http://dl.bintray.com/novik65/generic/rutorrent-3.6.tar.gz
wget http://dl.bintray.com/novik65/generic/plugins-3.6.tar.gz

tar xvf libtorrent-0.13.2.tar.gz
tar xvf rtorrent-0.9.2.tar.gz
tar xvf rutorrent-3.6.tar.gz
tar xvf plugins-3.6.tar.gz

Now that you’ve got everything extracted it’s time to compile and install libtorrent

cd /root/rtorrent/libtorrent-0.13.2
./autogen.sh
./configure
make
make install

With libtorrent installed it’s time to compile and install rtorrent

cd /root/rtorrent/rtorrent-0.9.2
./autogen.sh
./configure --with-xmlrpc-c
make
make install
ldconfig

Once you’ve reached this bit, we’re finished with the hanging around. We’ll now install rutorrent and it’s plugins

cd /root/rtorrent
rm /var/www/index.html
cp -r rutorrent/* /var/www/
cp -r plugins/* /var/www/plugins
chown -R www-data:www-data /var/www
a2enmod scgi
service apache2 restart

We’ll next create a new user account for rtorrent to run as

adduser -m -r rtorrent

Now we switch to the new user account to add the required rtorrent directories and config

su rtorrent
mkdir .sessions
mkdir complete
mkdir torrents
mkdir watch
nano -w .rtorrent.rc

Copy and Paste the following:-

# This is an example resource file for rTorrent. Copy to
# ~/.rtorrent.rc and enable/modify the options as needed. Remember to
# uncomment the options you wish to enable.

# Maximum and minimum number of peers to connect to per torrent.
#min_peers = 40
#max_peers = 100

# Same as above but for seeding completed torrents (-1 = same as downloading)
#min_peers_seed = 10
#max_peers_seed = 50

# Maximum number of simultanious uploads per torrent.
#max_uploads = 15

# Global upload and download rate in KiB. "0" for unlimited.
download_rate = 0
upload_rate = 100

# Default directory to save the downloaded torrents.
directory = ~/torrents

# Default session directory. Make sure you don't run multiple instance
# of rtorrent using the same session directory. Perhaps using a
# relative path?
session = ~/.sessions

# Watch a directory for new torrents, and stop those that have been
# deleted.
schedule = watch_directory,5,5,load_start=~/watch/*.torrent
schedule = untied_directory,5,5,stop_untied=~/watch/*.torrent

# Close torrents when diskspace is low.
schedule = low_diskspace,5,10,close_low_diskspace=200M

# Stop torrents when reaching upload ratio in percent,
# when also reaching total upload in bytes, or when
# reaching final upload ratio in percent.
# example: stop at ratio 2.0 with at least 200 MB uploaded, or else ratio 20.0
#schedule = ratio,60,60,"stop_on_ratio=200,200M,2000"
#schedule = ratio,5,5,"stop_on_ratio=1,1M,10"
ratio.enable=
ratio.min.set=1
ratio.max.set=2
ratio.upload.set=1K
system.method.set = group.seeding.ratio.command, d.close=, d.stop=

# Set Schedules
#schedule = throttle_1,00:10:00,24:00:00,download_rate=0
#schedule = throttle_2,07:50:00,24:00:00,download_rate=200

# Stop Seeding When complete
#system.method.set_key = event.download.finished,1close_seeding,d.close=
#system.method.set_key = event.download.finished,2stop_seeding,d.stop=

# The ip address reported to the tracker.
#ip = 127.0.0.1
#ip = rakshasa.no

# The ip address the listening socket and outgoing connections is
# bound to.
##bind = 127.0.0.1
#bind = rakshasa.no

# Port range to use for listening.
port_range = 51515-51520

# Start opening ports at a random position within the port range.
#port_random = no

# Check hash for finished torrents. Might be usefull until the bug is
# fixed that causes lack of diskspace not to be properly reported.
#check_hash = no

# Set whetever the client should try to connect to UDP trackers.
use_udp_trackers = yes

# Alternative calls to bind and ip that should handle dynamic ip's.
#schedule = ip_tick,0,1800,ip=rakshasa
#schedule = bind_tick,0,1800,bind=rakshasa

# Encryption options, set to none (default) or any combination of the following:
# allow_incoming, try_outgoing, require, require_RC4, enable_retry, prefer_plaintext
#
# The example value allows incoming encrypted connections, starts unencrypted
# outgoing connections but retries with encryption if they fail, preferring
# plaintext to RC4 encryption after the encrypted handshake
#
# encryption = allow_incoming,enable_retry,prefer_plaintext
#encryption = allow_incomming,try_outgoing

# Enable DHT support for trackerless torrents or when all trackers are down.
# May be set to "disable" (completely disable DHT), "off" (do not start DHT),
# "auto" (start and stop DHT as needed), or "on" (start DHT immediately).
# The default is "off". For DHT to work, a session directory must be defined.
#
# dht = auto
dht = off

# UDP port to use for DHT.
#
# dht_port = 6881

# Enable peer exchange (for torrents not marked private)
#
# peer_exchange = yes
peer_exchange = no

#
# Do not modify the following parameters unless you know what you're doing.
#

# Hash read-ahead controls how many MB to request the kernel to read
# ahead. If the value is too low the disk may not be fully utilized,
# while if too high the kernel might not be able to keep the read
# pages in memory thus end up trashing.
#hash_read_ahead = 10

# Interval between attempts to check the hash, in milliseconds.
#hash_interval = 100

# Number of attempts to check the hash while using the mincore status,
# before forcing. Overworked systems might need lower values to get a
# decent hash checking rate.
#hash_max_tries = 10

#Added for rutorrent stuff
encoding_list = UTF-8
#scgi_local = /tmp/rpc.socket
#schedule = chmod,0,0,"execute=chmod,777,/tmp/rpc.socket"
scgi_port = localhost:5000

# Start The Plugins when Rtorrent Starts not when the page is first opened. If apache service is restart separately the plugins are likely to be stopped. Only really needed for RSS feeds.
execute = {sh,-c,/usr/bin/php /var/www/php/initplugins.php &}

Save and Exit (ctrl+x then y then enter)
We now need to perform a test run of rtorrent.

rtorrent

It should start without any problems. You may get a few warnings inside rtorrent, but it should still be running. To Exit press ctrl+q.
You should now exit the rtorrent user.

exit

Finally we’re going to setup rtorrent to automatically start when the PI is powered up.

nano -w /etc/init.d/rtorrent

Copy and Paste the Following:-

#!/bin/bash

# To start the script automatically at bootup type the following command
# update-rc.d torrent defaults 99

RTUSER=rtorrent
TORRENT=/usr/local/bin/rtorrent

case $1 in
start)
#display to user that what is being started
echo "Starting rtorrent..."
sleep 4
#start the process and record record it's pid
rm /home/rtorrent/.sessions/rtorrent.lock
start-stop-daemon --start --background --pidfile /var/run/rtorrent.pid --make-pidfile --exec /bin/su -- -c "/usr/bin/screen -dmUS torrent $TORRENT" $RTUSER
## start-stop-daemon --start --background --exec /usr/bin/screen -- -dmUS torrent $TORRENT
#output failure or success
#info on how to interact with the torrent
echo "To interact with the torrent client, you will need to reattach the screen session with following command"
echo "screen -r torrent"
if [[ $? -eq 0 ]]; then
echo "The process started successfully"
else
echo "The process failed to start"
fi
;;

stop)
#display that we are stopping the process
echo "Stopping rtorrent"
#stop the process using pid from start()
start-stop-daemon --stop --name rtorrent
#output success or failure
if [[ $? -eq 0 ]]; then
echo "The process stopped successfully"
else
echo "The process failed to stop"
fi
;;

*)
# show the options
echo "Usage: {start|stop}"
;;
esac

Save and Exit (ctrl+x then y then enter)
Then run


chmod +x /etc/init.d/rtorrent

update-rc.d rtorrent defaults

And that’s it. You could now start rtorrent using “/etc/init.d/rtorrent start”, but it’s just as easy to reboot and test that the startup scripts runs. Once you’ve reboot or started rtorrent you can access the webpage at http://{ip-address or name} Notes:-

This setup is meant to run internally, as such there is no security on the apache setup.

Personally I forward ports 51515-51520 on the router onto the PI, this makes a difference in download speed (much quicker) but as it's opening ports it's a security risk so you'll have to decided whether or not to.

I run this setup behind a vpn using ipredator.se, if there's any demand I'll write up another guide on how to configure that and ensure your traffic is locked to only go over the vpn.

otrs forbidden installer.pl or index.pl

Currently moving an OTRS installation from one server to another. This installation has been running fine for months, I’ll be upgrading as part of the process from 3.2.6 to 3.3.1 but to keep the migration simple I thought I’d just tar up the current OTRS installation and dump the database. Copy them over to the new server extract the files and restore the database and permissions.
This all went well, so I configured apache (copied the existing apache config from the old server). Restart the apache server and tried accessing the page. Kept getting ‘Forbidden’ messages, everything pointing towards permissions, so checked and reran otrs.SetPermissions still no joy. As nothing seemed to be moving forward I decided to wipe the install, and perform a fresh install. Did this but then found that apache just wouldn’t start. This was down to alot of entries in various failes pointing to /opt/otrs/ rather than my installed /usr/local/otrs/ Once I sorted this I was back to encountering ‘Forbidden’ messages again.
After alot of poking around and searching the net I found:-
http://httpd.apache.org/docs/2.4/upgrading.html
I hadn’t considered that apache may have been different between servers, a quick ‘apache2 -v’ on each server confirmed one server running 2.2 and the new running 2.4.

So to solve this I had to replace:-
        Order allow,deny
        Allow from all

with

        Require all granted

all through the apache config. After a service apache2 restart this let me get to the installer.pl so with that working, I’ll be back to extracting the OTRS and database from the old server.
I’ll apologise for any mistypes, I’ve written all this from memory, while having a well deserved coffee.
Old system was running Ubuntu 12.04 new is 13.10?