dpkg: error processing package libc6:amd64 (--configure):

A nice easy one for 2am 🙁 but it took me hours to work out (and it really shouldn’t have).

Trying to update servers using apt-get upgrade. It listed about 6 packages to be upgraded, but kept throwing an error:

Setting up libc6:amd64 (2.23-0ubuntu4) ...
sh: echo: I/O error
sh: echo: I/O error
dpkg: error processing package libc6:amd64 (--configure):
subprocess installed post-installation script returned error exit status 1
Errors were encountered while processing:
libc6:amd64
E: Sub-process /usr/bin/dpkg returned an error code (1)

I spent a good few hours searching Google and trying to work out what had been installed that could be causing an issue with libc6. I was fairly certain nothing had been (this was only 2 days after a migration to new servers, and I was hitting the problem on 2 of the 4 servers even though they all have extremely similar setups).

By chance I noticed this in the output of df -h:

Filesystem Size Used Avail Use% Mounted on
udev 489M 0 489M 0% /dev
tmpfs 100M 100M 0 100% /run
/dev/vda1 30G 16G 13G 55% /
tmpfs 497M 0 497M 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 497M 0 497M 0% /sys/fs/cgroup
tmpfs 100M 0 100M 0% /run/user/1000

I’ll give you a clue “look at /run” 🙁

Something was eating all the space. So I tried to run ncdu to look for large files (I know there are other ways, but I like ncdu), but I hadn’t installed it on this new server and couldn’t install it with apt-get broken.
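
For reference, a plain du one-liner would have done the same job without installing anything; a minimal sketch (not the exact command I ran):

# Show the biggest things under /run, staying on that filesystem (-x)
du -xh /run 2>/dev/null | sort -h | tail -n 15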

Thinking /run was bound to be causing some issues (I still wasn’t sure if it was causing this particular one), I rebooted the server (bad move!). It locked up and had to be power cycled. Thankfully it’s a droplet, and with DigitalOcean I can power cycle easily (I did try the console but it wouldn’t connect).

Anyway, after the reboot /run started at 5% used but quickly grew to 70%. I did manage to install ncdu though, and with that I knew the problems I was having with apt-get were being caused by a full /run.

After a quick (very quick) look with ncdu /run I could see hhvm.hhbc taking up approximately 85MB+.

A quick check of the config and I could see HHVM was configured to write its bytecode cache there. So I adjusted the config to put it in /var/cache/hhvm/hhvm.hhbc instead and updated the systemd service to create /var/cache/hhvm and set its owner.
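
For anyone hitting the same thing, the change boiled down to something like the sketch below. The ini key, file paths, unit name and the www-data owner are assumptions about a fairly standard HHVM install rather than a copy of my exact config:

# Point HHVM's bytecode cache at /var/cache instead of /run
echo 'hhvm.repo.central.path = /var/cache/hhvm/hhvm.hhbc' >> /etc/hhvm/server.ini

# Make sure the directory exists with the right owner before the service starts
mkdir -p /etc/systemd/system/hhvm.service.d
cat > /etc/systemd/system/hhvm.service.d/cache-dir.conf <<'EOF'
[Service]
ExecStartPre=/bin/mkdir -p /var/cache/hhvm
ExecStartPre=/bin/chown www-data:www-data /var/cache/hhvm
EOF

systemctl daemon-reload
systemctl restart hhvm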

Another reboot, everything seems fine and /run is now at 3% used.

And I’ve run apt-get upgrade successfully.

I’m thankful I noticed; I really thought I’d screwed something up on these 2 servers while migrating, and I could see another night of rebuilding servers ahead of me.

Moral of the story: check you’re not out of space when you get weird errors (yes, the I/O errors should have rung some bells, but hey, it was 2am).

APT not automatically updating on DigitalOcean

I’ve recently noticed a problem on 3 of my DigitalOcean servers: the APT package lists are not automatically updating every day. I try to keep all servers up to date and rely on Nagios to tell me when there are packages that need updating, and that’s the main reason I noticed something was broken.

The 3 servers in question are newer builds than the rest of the system and don’t have nearly as much installed as the others, so at first I didn’t pay too much attention when other servers were going into warning state in Nagios for updates but these 3 weren’t. However, I would still connect to these servers and run my normal command:-

apt-get update && apt-get upgrade && apt-get dist-upgrade && apt-get autoremove

A few times these servers did install updates, and I just thought it must have been my timing, that the package lists hadn’t yet been updated by the cron.daily job.

But after this happened a few times, I decided not to run the above and see how long it would take for Nagios to throw an alert on these servers. It never did, and that got me a little worried.

Over the last few days I’ve been diagnosing what’s wrong. I started out by making sure cron was working properly, then kept an eye on the file timestamps:

ls -ltrh /var/lib/apt/lists/

Eventually I got to /etc/cron.daily/apt and checked through what it was doing on the working servers compared to the broken ones. I turned on VERBOSE and got a bit of info when running /etc/cron.daily/apt, but it seemed to exit quite quickly.
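
If you want the same sort of trace, running the script through the shell with tracing switched on produces the '+' lines shown below; this is just how I'd reproduce it, rather than a command lifted from the cron job itself:

# Trace the daily apt script to see where it bails out
sh -x /etc/cron.daily/apt 2>&1 | tee /tmp/apt-cron-trace.log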

Comparing it to a working server, the important bit seemed to be around:

+ apt-config shell Debdelta APT::Periodic::Download-Upgradeable-Packages-Debdelta
+ eval
+ [ 1 -eq 0 ]
+ do_cache_backup 0

On the broken servers I was getting:

+ [ 0 -eq 0 ]
+ [ 0 -eq 0 ]
+ [ 0 -eq 0 ]
+ [ 0 -eq 0 ]

Then it would exit. Further investigation showed a few settings were being populated on the working servers but not on the broken ones.

So I compared the /etc/apt/apt.conf.d/ directory on both servers and found the following files missing from the broken servers:

00aptitude

Aptitude::Get-Root-Command "sudo:/usr/bin/sudo";

10periodic

APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Download-Upgradeable-Packages "0";
APT::Periodic::AutocleanInterval "0";

15update-stamp

APT::Update::Post-Invoke-Success {"touch /var/lib/apt/periodic/update-success-stamp 2>/dev/null || true";};

20archive

APT::Archives::MaxAge "30";
APT::Archives::MinAge "2";
APT::Archives::MaxSize "500";

99update-notifier

DPkg::Post-Invoke {"if [ -d /var/lib/update-notifier ]; then touch /var/lib/update-notifier/dpkg-run-stamp; fi; if [ -e /var/lib/update-notifier/updates-available ]; then echo > /var/lib/update-notifier/updates-available; fi "; };

I think the main one is 10periodic, but I didn’t fancy spending days/weeks adding each file and waiting to see what happened, so I added them all in one go.
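
For anyone doing the same, putting the files back is just a case of dropping them into /etc/apt/apt.conf.d/ with the contents listed above. For example, 10periodic (values copied from my working server, so treat them as a starting point rather than gospel):

cat > /etc/apt/apt.conf.d/10periodic <<'EOF'
APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Download-Upgradeable-Packages "0";
APT::Periodic::AutocleanInterval "0";
EOF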

Then I ran /etc/cron.daily/apt again, and this time it took a lot longer to run; I’ve written this entire post and it’s still running, as opposed to exiting within a few seconds earlier.

I’m pretty confident that this has solved my APT package lists not automatically updating, and provided it has, Nagios will start warning on these 3 servers too.

Ansible Part 1

I already have a Droplet for management, so I’m going to be using that for Ansible, plus a new droplet to test some deployments. I’ve done the following on the management server:

add-apt-repository ppa:rquillo/ansible
apt-get update
apt-get install ansible

I have no doubt that I’ll learn I’ve done bits wrong as I get further in, but I’m going to start with the following hosts config:

[initial]
C3PO-1

[balancers]

[backends]

[databases]

[fileservers]

So the first thing I’m going to tackle is setting up new users. I’ve been playing with it for about an hour; although I got the user created very quickly, I hit a problem with my setup. I need to deploy multiple SSH keys for some users (we use different keys on PCs, laptops and mobiles). Every example I found seemed to either a) pull a key from a file, or b) use just one key.

After quite a bit of playing and trying different things I found a way. This in turn meant I had to slightly change the create-user bit of the playbook.

Here’s the users.yml playbook so far:

---
- hosts: all

  tasks:
    - name: Add Users from group_vars file
      action: user name={{ item.name }} password={{ item.password }} shell={{ item.shell }} state={{ item.state }} update_password=always
      with_items: users

    - name: Add SSH User Keys from group_vars files
      authorized_key: user={{ item.0.name }} key='{{ item.1 }}'
      with_subelements:
        - users
        - authorized

It uses a group_vars file (group_vars/initial):

---

users:
  - name: NAME
    password: HASHED_PASSWD
    authorized:
      - ssh-rsa SSH_KEY_1
      - ssh-rsa SSH_KEY_2
      - ssh-rsa SSH_KEY_3
      - ssh-rsa SSH_KEY_4
    shell: /bin/bash
    state: present

I did think about pulling the keys in from the authorized_keys files, but not all users are allowed on the management droplet, so I’d have to keep key files for them anyway, and if I’m going that far I may as well just keep them in group_vars. It doesn’t look as nice if you cat the file, but it’s structured and makes sense.

The last thing I want to do is add the users to some groups.

This was nice and easy: add the groups to the group_vars file

    groups: sudo,www-data

Then change the users.yml to add the groups

    action: user name={{ item.name }} password={{ item.password }} shell={{ item.shell }} state={{ item.state }} groups={{ item.groups }} update_password=always

I was worried that it might screw up the user’s own group, as the manual says it deletes all other groups except the primary. Since I haven’t told it a primary using ‘group’ I thought that might be a problem, but thankfully it wasn’t; this just worked.
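
As an aside, if you ever do need to keep whatever secondary groups a user already has, the user module also takes an append option. A quick ad-hoc sketch (the group names and the 'initial' host group are just the examples from this post, not part of my playbook):

# Add the listed groups without stripping any existing secondary groups
ansible initial -m user -a "name=NAME groups=sudo,www-data append=yes" -u root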

Well, I thought at this point I was pretty much finished. Ha, not a chance. I added a user for Ansible to connect as, since I’ll be removing root SSH access. This means I’m going to need to let Ansible sudo, so I’d best sort that now. Here’s a bit of code I found and changed slightly; make sure you change the username in the sudoers.d/ansible file.

Appended to the end of users.yml

  - name: Ensure /etc/sudoers.d directory is present
    file: path=/etc/sudoers.d state=directory

  - name: Ensure /etc/sudoers.d is scanned by sudo
    action: lineinfile dest=/etc/sudoers regexp="#includedir\s+/etc/sudoers.d" line="#includedir /etc/sudoers.d"

  - name: Add ansible user to the sudoers
    action: 'lineinfile dest=/etc/sudoers.d/ansible state=present create=yes regexp="ansible .*" line="USERNAME ALL=(ALL) NOPASSWD: ALL"'

  - name: Ensure /etc/sudoers.d/ansible file has correct permissions
    action: file path=/etc/sudoers.d/ansible mode=0440 state=file owner=root group=root

On the first run you’d connect as root. After that you would use:

ansible-playbook users.yml -u USERNAME -s
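
For the very first run, before the ansible user and its sudoers entry exist, the equivalent is to connect as root; something like this (-k prompts for the SSH password, and can be dropped if root already has your key):

# First run only: connect as root, no sudo needed
ansible-playbook users.yml -u root -k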

All working. It’s taken about 2 hours to get to the point of deploying a couple of users automatically. I’m not so sure this has saved me time in the long run 🙂 but it’s the first step in a much bigger project. I’m kind of glad it wasn’t just copying and pasting other people’s code; stuff broke, and that gave me a chance to understand a bit more.

Part 2 will be coming soon. There we’ll lock down SSHd and apply the default firewall rules.

New to Ansible

So if you read my last post (it was really long, sorry), you’ll have seen the current deployment right at the end. I had tried a few configuration managers to be able to deploy/scale the whole system, but they really overcomplicated the whole thing. Chef looked really good (I can’t remember the other one I tried), but it was problematic and just didn’t suit.

Instead I kept the scripts I had written for the time being. They are in no way good enough to share, as they are very customised to my setup, but they achieve what I need. However, running them takes quite a bit of initial manual work.

So what do I need from a system:-

  1. It has to just work, not go installing stuff it depends on to run.
  2. It has to be able to split the setup into an initial and running level.
  3. It has to be able to be told easily about a new server and what role it will be, then do the initial setup and onto the relevant running level.
  4. It MUST be simple to use and understand. I’m getting a bit sick of having to read through how to configure weird stuff because someone decided to do things completely differently to how you’d expect it to work.
  5. It must have very low resource requirements. I want to run this from a management droplet that already runs nagios, so it can’t sit there just eating resources while it has nothing to do.

I saw a very quick video on Ansible with someone using it for Raspberry Pis (something else I have far too many of) and thought right away that I should look into it. So here I am; I’m going to do some testing with Ansible.

Initially all I’m looking for it to do is handle the initial setup that I already have scripts for.

  1. Create a new user, add it to the sudo group, set a password and copy the SSH keys.
  2. Reconfigure sshd to deny root connections.
  3. Set the server’s timezone.
  4. Set the keypad option in nanorc.
  5. Set the .bashrc for root to use a red prompt.
  6. Install some basic packages such as screen and htop
  7. Create a swapfile
  8. Install and configure NTP
  9. Setup IPTables (I don't use UFW; I prefer to deploy an iptables rules file and have it restored when the loopback interface comes up, sketched just below).
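
For item 9, the iptables bit boils down to something like this; the rules file path is just my own convention, and this is a sketch rather than the eventual playbook task:

# Save the current ruleset once it's the way you want it
iptables-save > /etc/iptables.rules

# Then /etc/network/interfaces restores it when the loopback interface comes up:
#
#   auto lo
#   iface lo inet loopback
#       pre-up iptables-restore < /etc/iptables.rules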

I’ve had a quick look on the DigitalOcean community (I love the resources there), but the stuff about using Ansible seems a little more ‘throw it all in one file’ rather than properly splitting things out like I saw in the video. I think splitting out each task is a must to be able to understand what’s going on and make changes.

The video I’m referring to is https://www.youtube.com/watch?v=ZNB1at8mJWY

So that’s the start, let’s get going and see what I can screw up

Part 1 is Here

GlusterFS woes

If you’re looking for the gluster error ‘brick2.mount_dir not present’, jump to the end.

Time for another post 🙂

I’ve been using DigitalOcean for some time now, and I’m still tweaking my setup. One thing I really hope they sort soon is a proper private LAN between your own droplets; for now we just have to use a VPN between them.

Being responsible for a new website can give you a lot of headaches, especially when you have to try to guess just how popular it will be. So about a year ago I set up a new droplet to host the new site; testing was going well, and I increased the droplet size before we launched to handle a spike. Sadly I underestimated just how busy it would get; based on the numbers I was given I think I was about right, but unfortunately those numbers were way off.

But every failure is just another learning curve 🙂 so that was fixed as quickly as possible, then the site got back to normal volume so we scaled it back down (yes, it’s a whole cost exercise, especially when you’re paying for it). Then we had the lead-up to Christmas; in an attempt not to repeat the problems at launch, I changed the whole configuration so that I could (if needed) take a server down while staying partly operational. This kind of worked, and was needed when some brightspark promoted the site a day early and we hadn’t scaled back up!

Come the new year I decided it was time to seriously sort out the infrastructure for the site. It now has an online store and it’s important it keeps running; it’s not just a blog anymore. So I put in place the following setup (working around various obstacles).

DNS:

  • All the websites’ name servers point to Cloudflare, and they handle the first web connection. It works really well on their free tier, and changes (adding new servers) take effect pretty quickly.

DigitalOcean Droplets:

  • 1x Server running as a load balancer.
  • 2x Servers running as webservers.
  • 1x Server running as database server.
  • 1x Server running as email (not quite running).
  • At the same time as building this setup I decided to ditch Apache and move to nginx, so the loadbalancer and webservers are running that.

Software:

  • 4x Nginx (loadbalancer and webservers, and installed on database server for stats).
  • 2x Syncthing (webservers) to keep the www folders in sync.
  • 1x MySQL Server.
  • 4x OpenVPN (connections between loadbalancer and webservers, webservers and database).
  • 1x Redis server (for session data; I tried nginx’s load-balancing options, but it still screwed up if I had to take one of the web servers out for maintenance, so I installed Redis on the database server).

As this progressed I dropped the VPN between the loadbalancer and webservers and just use HTTPS/SSL instead. Syncthing already has its own SSL built in, so I could leave that over the semi-private LAN. I really would like to change MySQL to be encrypted and drop the VPN from that too, but info on doing this for WordPress seems to be non-existent at the moment.

Roll forward a few months: this has been working but still has areas to resolve, such as Syncthing. Yes, it keeps the folders in sync, and it’s actually really good that I can also store them on another system easily, but it doesn’t listen to the OS for changes to files; instead it polls every x seconds. Although there’s not much changing, updating plugins became a problem: if you click the update button it downloads, but then nginx sends your next request to another server where the plugin.zip isn’t there, so WordPress throws an error.

My whole reason for running Syncthing was that I wanted the files to be available on each server independently, so if Server A goes down it doesn’t matter; Server B has all the files locally anyway. NFS would still give me a single point of failure. While looking into resolving this, though, I remembered GlusterFS. I’d played with it a long time ago but dropped it as a solution (I can’t remember what I was doing or why it wasn’t working). Now it’s time to try it again. The downside: I’m back to needing VPNs, and OpenVPN isn’t the easiest for quickly adding a new server.

So I’ve done the following:

  • Added a new server just for the files (I don’t like gluster being in a 2-replica setup in case there’s a problem; there should be a majority who think they are holding the correct file).
  • Swapped out OpenVPN for Tinc, which I have to say was one of the best decisions. It creates a mesh (only doable with OpenVPN by running quagga to manually force routes), but there are downsides: I have no idea which server is actually connected to which servers, there’s no VPN status, and I can’t see how much traffic has gone between 2 particular servers (iptables helps, but it’s not 100%).
  • Added another new server for nagios and central logging.

There were a load of changes within a few weeks of each other, but I now have a setup I’m confident I can scale more quickly than ever before. Yes, it has single points of failure (loadbalancer, MySQL), but I know that if the loadbalancer has a problem it’s pretty static, so it can be wiped and redeployed quickly, and it would only take a few minutes to open the webservers to the world and let Cloudflare hit them directly. So MySQL is the real problem, and I’ll be addressing that one soon enough.

 

So now onto today’s problem 🙂

I’ve had gluster running for a few weeks, and I have our testing website (for theme changes etc.) set up on our webservers behind the loadbalancer. The last few days I’ve needed to do more extensive testing than just changing bits in a theme, so I’ve decided to split the tester site onto its own droplet (still behind the loadbalancer and with a VPN to the databases). I thought I may as well make use of Gluster here too (yes, it would be in a 2-replica setup between itself and the fileserver; I don’t like that idea). So I brought up a new server and configured it: new users, firewall rules, tinc, nginx, php, etc.

I added gluster and copied the /etc/hosts entries over from the other servers. All looked good. I ran gluster peer probe ServerX and it worked; gluster peer status and I could see it fine. But on trying to add a new volume:

gluster volume create xxx-yyy-zzz replica 2 transport tcp FILESERVER:/GLUSTER/xxx.yyy-zzz TESTSERVER:/GLUSTER/xxx.yyy-zzz force

I was getting the error:

volume create: xxx-yyy-zzz: failed: Commit failed on localhost. Please check the log file for more details.

Checking the logs on both servers would show (maybe slight variation):

[2015-07-28 16:00:41.612907] E [glusterd-hooks.c:328:glusterd_hooks_run_hooks] 0-management: Failed to open dir /var/lib/glusterd/hooks/1/create/pre, due to No such file or directory
[2015-07-28 16:00:41.614499] E [glusterd-volume-ops.c:1811:glusterd_op_create_volume] 0-management: brick2.mount_dir not present
[2015-07-28 16:00:41.614587] E [glusterd-syncop.c:1288:gd_commit_op_phase] 0-management: Commit of operation 'Volume Create' failed on localhost

I tried a series of things to fix it:

I thought maybe the /GLUSTER/xxx-yyy-zzz directory needed to be created (I had already made /GLUSTER) – Nope. Then:

  • I detached the peer and reattached – No.
  • I reboot the file server and test server – No.
  • I detached, reboot, reattached – No.
  • I tried creating the volume with just the test server and no replica – No.
  • I tried creating the volume on just the fileserver with no replica – Yes.

So the problem points to the new system, but it’s a brand new system. They’re peers and connected.

  • I tried uninstalling and reinstalling gluster – No.
  • I tried uninstalling, purging and reinstalling – No.
  • I tried uninstalling, purging, manually deleting the /var/lib/gluster (probably a mistake that I didn’t detach first :() and reinstalling – No.
  • I have no idea why this won’t WORK!!!!!

Let’s go further back, check the VPN, ping the servers.

  • Ping fileserver from testserver – Yes.
  • Ping testserver from fileserver – Yes… hang on, that’s the wrong IP!! Yes, I’d copied an entry from webserverB into /etc/hosts, updated the name but missed the IP address. Idiot! Corrected that. Ping – OK.
  • Try gluster again – Yes.

So if you’re having problems and seeing brickX.mount_dir not present, make sure your DNS between servers is correct.

I don’t really know how the peer probe worked, but I think I must have done that from a server whose hosts file was correct.
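
A sanity check worth running on every peer before blaming gluster itself: confirm what each hostname actually resolves to on each box. Something like this (hostnames are placeholders) would have caught my bad /etc/hosts entry straight away:

# What does this box think each peer's name resolves to?
for h in fileserver testserver; do
    printf '%s -> ' "$h"
    getent hosts "$h"
done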

xlib_shm Zoneminder: Got unexpected memory map file size X, expected Y

After upgrading my Ubuntu system, first from 12.04 to 13.10 and then on to 14.04.1, I started getting problems with xlib_shm.
It wouldn’t start up; all the cameras were working fine from the web interface, but nothing was going to the TV.

So I started troubleshooting and found the following error coming back at me.

shared_data_size=12166644
mem_size=884
Got unexpected memory map file size 4056640, expected 884

It doesn’t really make any sense though; the error takes you down the path of increasing shmmax and shmall. But this was all working fine before the update and those values hadn’t been changed. Still, I increased them just in case.
No joy.
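
For reference, the shared-memory bump that error normally points you at looks like the lines below; the values are only examples, and in my case it made no difference anyway:

# Temporarily raise the SysV shared memory limits (bytes and pages respectively)
sysctl -w kernel.shmmax=268435456
sysctl -w kernel.shmall=2097152
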
I also tried to recompile xlib_shm in case too much had changed (having been upgraded a few times). This unfortunately fell flat on its face when it wouldn’t compile; there’s stuff missing (we’ll come back to that).
After going round in circles for quite a while, I found while checking the camera configs that each camera had been set to 8-bit greyscale. I set them back to 24-bit colour and restarted. Hey presto! My TV is once again filled with cameras.
I do still have a problem at the moment though: although the cams are now displaying, they are frozen. I’m not sure if this is more to do with how I start xlib_shm; I think it may be crashing out, but not completely enough to take the display away.
So if you’re having problems with xlib_shm after upgrading either your OS or Zoneminder, check that your cams are set up properly. I almost missed the colour setting as it’s nighttime and they are grey anyway 🙂
As for the compile problems, it looks as though upgrading through 2 versions of Ubuntu has removed some packages. I’ll have to try and work out which, in case I need to compile again.

Mining DarkCoin on Linux using Digital Ocean

A friend recently asked me about DarkCoin, and I had to admit I hadn’t heard anything about it. Sure, I have some bitcoins, but only really to say I have them. So when he asked I just thought `ah, there’s another one`. He then asked how someone would mine these things, so I showed him a few YouTube videos of pretty impressive setups.

Anyway, to the point: I ran through a quick demonstration of how the miner would work and set one up on a Virtual Server (Droplet). So for anyone that’s interested, here goes (p.s. I can’t say you’ll make a fortune from this method; it’s more proof of concept).

We’ll be using DigitalOcean for our server, so you’ll need an account. Sign up here. There are always some promo codes for credit; at present SSDMAY10 gives you an extra $10. I personally use DigitalOcean to host a few of my projects and they’re great for being able to test ideas.

Once you have your account set up, create a new Droplet. I normally use the latest Ubuntu.
The specs I’m testing this on are:-

Hostname : Testy
Size : 512MB, 1CPU, 20GB SSD, 1TB XFER $5pm
Region : Amsterdam 2
Image : Ubuntu 14.04 x64
SSH Key : testy (optional, you may not have added any ssh keys so you can skip this selection)
Settings : Enable VirtIO - Checked, Private Networking - Unchecked, Enable Backups - Unchecked.

Once created you will be emailed your root password (unless you selected SSH Keys).
More CPUs would be better, but this one’s just for testing it out.

Using PuTTY, log in to your new Droplet.

Now we’ll install some dependencies and download the miner’s git repository.

apt-get update
apt-get install build-essential m4 libssl-dev libdb++-dev libboost-all-dev libminiupnpc-dev git automake libcurl4-openssl-dev screen

git clone https://github.com/ig0tik3d/darkcoin-cpuminer-1.2c.git

cd darkcoin-cpuminer-1.2c
chmod a+x autogen.sh
./autogen.sh
./configure
make
make install
cd ~

If all has gone well you can run the miner using

minerd -a X11 -o stratum+tcp://drkpool.com:3333 -u bighippo999.testy -p password

This should start producing output showing your hash rate and any accepted shares.

One thing to note: the above command will test under MY pool login. You will not receive any coins doing this, and it should only be used as a test. You can stop it using CTRL+C.

As long as that all seems to be OK, we should now join a pool. I’m currently using the DarkCoin Official Pool; I have no idea how ‘official’ it is, but I was drawn to the graphs 🙂 I did briefly run a miner on Windows using this guide and received coins directly into my wallet, whereas using the pool I’ll have to cash anything out to my wallet and that will incur a charge. I would have also shown using the servers from the other guide, but at present they seem to be down; I suppose that’s why this pool does charge a small admin % on each transaction. I can’t say I mind for good service.

Once you’ve signed up and have your details just run the miner again

e.g. minerd -a X11 -o server:port -u user.worker -p password

Replacing server:port user.worker and password with your own details.

If you’re concerned about your pool failing, here’s a handy script I came across:

#!/bin/bash
## Miner Failover Script

## Will continuously try each pool until one responds, ordered by priority.

#GLOBAL
## Set options
RETRIES="3"
# Don't use the name SECONDS here: it's a special bash variable that keeps
# counting up, which would inflate the retry pause over time.
RETRY_PAUSE="5"
MINERD="minerd"

#POOL1
## Set userpass information for first pool
USERPASS1="name.worker:password"
URL1="stratum+tcp://lotterymining.com"

#POOL2
## Set userpass information for second pool
USERPASS2="name.worker:password"
URL2="stratum+tcp://www.drkpool.com:3333"

while :
do
$MINERD -a X11 --retries=$RETRIES --retry-pause=$RETRY_PAUSE --userpass=$USERPASS1 --url=$URL1
$MINERD -a X11 --retries=$RETRIES --retry-pause=$RETRY_PAUSE --userpass=$USERPASS2 --url=$URL2
done

You can copy and paste this into a new document,

e.g. nano -w startmining.sh

Paste the code.
CTRL+X, then Y, then Enter will exit, saving the file.
Then issue

chmod +x startmining.sh

To make the file executable. Lastly, it’s always better to run processes you want to keep running (but also be able to check on) in a screen.
The command

screen -d -m -S MINER minerd -a X11 -o stratum+tcp://drkpool.com:3333 -u bighippo999.testy -p password

Will start the miner in a screen, and you can view this with

screen -r

Then either disconnect with CTRL+A, then D or stop the miner using CTRL+C.
If you used the above script you can start that in a screen with

screen -d -m -S MINERPOOL ./startmining.sh

Again to view it

screen -r

Then either disconnect with CTRL+A, then D or stop the miner using CTRL+C.

EDIT:
The other pool I’m using came back online; check it out here. This would be started with the following:

minerd -a X11 -o http://q30.qhor.net:7903 -u Xn7JauXvQmorx82yEN2EMvbt3dX45uoiCh -p password

I hope you’ve found some of this useful. If you feel like sending me a few DarkCoins, my address is Xn7JauXvQmorx82yEN2EMvbt3dX45uoiCh and my BitCoin address is 1DpfEhiVjNM4WZT49X18m3vXyUvGpuvz9i.

Nagios Twitter Alerts

I’ve had Nagios running for years, so I decided to play around with the alerts.
Twitter seemed the obvious choice: it’s easy for people to follow the Twitter account that’s publishing the alerts, and great if you actively use Twitter (I don’t; this was more of a ‘how would you’ than a need).
First thing is to register a Twitter account that Nagios will publish as. I set up https://twitter.com/NagiosStarB
Once you’ve registered you need to edit your profile and add a mobile phone number (this is needed before you can change the app permissions later; once you’ve done that you can delete the mobile number).
Now head over to https://apps.twitter.com/ and create a new app.
Fill in Name, Description and Website (this isn’t particularly important, as we’re not pushing this app out to users).
You’ll be taken straight into the new app (if not, simply click on it).

We need to change the Access Level, Click on ‘modify app permissions’

I chose ‘Read, Write and Access direct messages’, although ‘Read and Write’ would be fine. Click ‘Update settings’ (if you didn’t add your mobile number to your account earlier, you’ll get an error).
Now click ‘API Keys’

You need to copy the API key and API secret (please don’t try to use mine).
Now click ‘create my access token’ close to the bottom of the page.

You also need to copy your ‘Access token’ and ‘Access token secret’.

Now we move onto the notification script.
Login to your Nagios server via SSH.
You need to ensure you have python-dev & python-pip installed.

apt-get install python-dev python-pip
pip install tweepy

Then cd into your Nagios libexec folder (mine’s at /usr/local/nagios/libexec):

cd /usr/local/nagios/libexec/

We now add a new file called twitternagiosstatus.py

nano -w twitternagiosstatus.py

Copy and paste the following code into the file

#!/usr/bin/env python2.7
# tweet.py by Alex Eames http://raspi.tv/?p=5908
import tweepy
import sys
import logging

# Setup Debug Logging
logging.basicConfig(filename='/tmp/twitternagios.log',level=logging.DEBUG)
logging.debug('Starting Debug Log')

# Consumer keys and access tokens, used for OAuth
consumer_key = 'jNgRhCGx7NzZn1Cr01mucA'
consumer_secret = 'nTUDfUo0jH2oYyG8i6qdyrQXfwQ6QXT7dwjVykrWho'
access_token = '2360118330-HP5bbGQgTw5F1UIN3qOjdtvqp1ZkhxlHroiETIQ'
access_token_secret = 'rXjXwfoGGNKibKfXHw9YYL927kCBQiQL58Br0qMdaI5tB'

# OAuth process, using the keys and tokens
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

# Creation of the actual interface, using authentication
api = tweepy.API(auth)

if len(sys.argv) >= 2:
    tweet_text = sys.argv[1]
    logging.debug('Argument #1 ' + tweet_text)

    if len(tweet_text) <= 140:
        logging.debug('Tweeting: ' + tweet_text)
        api.update_status(tweet_text)
    else:
        print "tweet sent truncated. Too long. 140 chars Max."
        logging.debug('Too Long. Tweet sent truncated.')
        api.update_status(tweet_text[0:140])

Replace consumer_key with your API key, consumer_secret with your API secret, access_token with your access token and access_token_secret with your Access token secret.
Now save and exit the editor.

CTRL+x then Y then Enter.

With the file saved, we need to make it executable.

chmod +x twitternagiosstatus.py

You can now test that the script works by typing

./twitternagiosstatus.py "testy testy"

You should now be able to see the Tweet on your new account (you may need to refresh the page).
If all has gone well so far, you can now add your Nagios Configuration.
Change Directory into your nagios etc

cd /usr/local/nagios/etc/

Edit your commands.cfg (mine is inside objects)

nano -w objects/commands.cfg

Where you choose to place the new configurations doesn’t really matter, but to keep things in order I chose to put them just below the email commands.
Copy and paste the following

# 'notify-host-by-twitter' command definition
define command{
        command_name    notify-host-by-twitter
        command_line    /usr/local/nagios/libexec/twitternagiosstatus.py "$NOTIFICATIONTYPE$: $HOSTALIAS$ is $HOSTSTATE$"
}

# 'notify-service-by-twitter' command definition
define command{
        command_name    notify-service-by-twitter
        command_line    /usr/local/nagios/libexec/twitternagiosstatus.py "$NOTIFICATIONTYPE$: $SERVICEDESC$ ON $HOSTALIAS$ is $SERVICESTATE$"
}

You can adjust the specifics by adding other $...$ macros (use the email notification commands as an example). Save and exit

CTRL+x, then Y, then ENTER

Now we add a new contact. Edit contacts.cfg

nano -w objects/contacts.cfg

Copy and Paste the following

define contact{
        contact_name                    nagios-twitter
        alias                           Nagios Twitter

        service_notification_period     24x7
        host_notification_period        24x7
        service_notification_options    w,u,c,r,f
        host_notification_options       d,u,r,f,s
        service_notification_commands   notify-service-by-twitter
        host_notification_commands      notify-host-by-twitter
        }

define contactgroup{
        contactgroup_name       nagiostwitter
        alias                   Nagios Twitter Notifications
        members                 nagios-twitter
        }

I decided to create a specific contact and contact group for this, but you can adjust as you wish and add the contact to other contact groups if you like.
Now for the last bit:
Add the new contact group to the hosts and services, templates, or host-groups and service-groups.
How you decide to do this will depend on how you’ve set out your hosts, services, templates and contacts. For me, I edit each of the host files and add contact_groups nagiostwitter to each host and service.
(IMPORTANT: this will override settings that are inherited from templates, so if you already have email notifications active you’ll either have to add nagiostwitter to the template or add the existing contact groups here as well. Don’t forget the list is comma-delimited.)
An example host of mine

define host{
        use                     linux-server            ; Name of host template to use.
                                                        ; This host definition will inherit all variables that are defined
                                                        ; in (or inherited by) the linux-server host template.
        host_name               excalibur
        alias                   Excalibur
        address                 192.168.1.27
        parents                 switch-netgear8
        hostgroups              linux-servers
        statusmap_image         linux40.gd2
        contact_groups          nagiostwitter,sysadm
        }

An example service on this host

define service{
        use                             generic-service         ; Name of service template to use.
        host_name                       excalibur
        service_description             PING
        check_command                   check_ping!100.0,20%!500.0,60%
        contact_groups                  nagiostwitter,sysadm
        }

That’s it. Hopefully, if all’s done right, you can restart the Nagios service:

/etc/init.d/nagios restart

Now your Twitter feed will start to be populated with each alert. I can’t emphasise enough that if the Nagios configuration is done wrong you may break other alerts that are already set up.
I really need to thank http://raspi.tv/2013/how-to-create-a-twitter-app-on-the-raspberry-pi-with-python-tweepy-part-1#install as I used it as a starting point.

UPDATE:
A few weeks ago I received an email from twitter telling me my application had been blocked for write operations. It also said to check the Twitter API Terms of Service. I didn’t think this would cause a problem, I’m not spamming anyone other than myself or users I’ve asked to follow the alerts. So I read the Terms of Service, and it’s all fine. I raised a support request with Twitter and had a very quick response saying “Twitter has automated systems that find and disable abusive API keys in bulk. Unfortunately, it looks like your application got caught up in one of these spam groups by mistake. We have reactivated the API key and we apologize for the inconvenience.”
This did stop my alerts for a few days though, so just be aware of this.

UPDATE 2:
Thanks to a comment from Claudio suggesting that messages over 140 characters be truncated. I’ve incorporated this recommendation into the code above.

Raspberry Pi + rTorrent + Apache2 + ruTorrent

I’ve been using one of my Pis as a torrent server for some time. Recently I decided to refresh the entire system. This will NOT go into the legalities of downloading anything; I expect everyone to only be using this for downloading Raspberry Pi images 🙂

Version Info:
2014-01-07-wheezy-raspbian.img
libtorrent 0.13.2
rtorrent 0.9.2
rutorrent 3.6

I’m going to assume you can SSH to your Pi, and recommend you get all the latest updates before you start. I’m also going to be naughty and run all the commands as root:
sudo su -

Then we’ll get the stuff needed to compile rtorrent and a few things needed for the plugins

apt-get install subversion build-essential automake libtool libcppunit-dev libcurl3-dev libsigc++-2.0-dev libxmlrpc-c-dev unzip unrar-free curl libncurses-dev apache2 php5 php5-cli php5-curl libapache2-mod-scgi mediainfo ffmpeg screen

While you’re waiting you may as well get a coffee. With that all finished, we’re going to grab the rtorrent packages.

mkdir /root/rtorrent
cd /root/rtorrent

wget http://libtorrent.rakshasa.no/downloads/libtorrent-0.13.2.tar.gz
wget http://libtorrent.rakshasa.no/downloads/rtorrent-0.9.2.tar.gz
wget http://dl.bintray.com/novik65/generic/rutorrent-3.6.tar.gz
wget http://dl.bintray.com/novik65/generic/plugins-3.6.tar.gz

tar xvf libtorrent-0.13.2.tar.gz
tar xvf rtorrent-0.9.2.tar.gz
tar xvf rutorrent-3.6.tar.gz
tar xvf plugins-3.6.tar.gz

Now that you’ve got everything extracted it’s time to compile and install libtorrent

cd /root/rtorrent/libtorrent-0.13.2
./autogen.sh
./configure
make
make install

With libtorrent installed it’s time to compile and install rtorrent

cd /root/rtorrent/rtorrent-0.9.2
./autogen.sh
./configure --with-xmlrpc-c
make
make install
ldconfig

Once you’ve reached this bit, we’re finished with the hanging around. We’ll now install rutorrent and its plugins.

cd /root/rtorrent
rm /var/www/index.html
cp -r rutorrent/* /var/www/
cp -r plugins/* /var/www/plugins
chown -R www-data:www-data /var/www
a2enmod scgi
service apache2 restart

We’ll next create a new user account for rtorrent to run as

adduser -m -r rtorrent

Now we switch to the new user account to add the required rtorrent directories and config

su rtorrent
mkdir .sessions
mkdir complete
mkdir torrents
mkdir watch
nano -w .rtorrent.rc

Copy and Paste the following:-

# This is an example resource file for rTorrent. Copy to
# ~/.rtorrent.rc and enable/modify the options as needed. Remember to
# uncomment the options you wish to enable.

# Maximum and minimum number of peers to connect to per torrent.
#min_peers = 40
#max_peers = 100

# Same as above but for seeding completed torrents (-1 = same as downloading)
#min_peers_seed = 10
#max_peers_seed = 50

# Maximum number of simultanious uploads per torrent.
#max_uploads = 15

# Global upload and download rate in KiB. "0" for unlimited.
download_rate = 0
upload_rate = 100

# Default directory to save the downloaded torrents.
directory = ~/torrents

# Default session directory. Make sure you don't run multiple instance
# of rtorrent using the same session directory. Perhaps using a
# relative path?
session = ~/.sessions

# Watch a directory for new torrents, and stop those that have been
# deleted.
schedule = watch_directory,5,5,load_start=~/watch/*.torrent
schedule = untied_directory,5,5,stop_untied=~/watch/*.torrent

# Close torrents when diskspace is low.
schedule = low_diskspace,5,10,close_low_diskspace=200M

# Stop torrents when reaching upload ratio in percent,
# when also reaching total upload in bytes, or when
# reaching final upload ratio in percent.
# example: stop at ratio 2.0 with at least 200 MB uploaded, or else ratio 20.0
#schedule = ratio,60,60,"stop_on_ratio=200,200M,2000"
#schedule = ratio,5,5,"stop_on_ratio=1,1M,10"
ratio.enable=
ratio.min.set=1
ratio.max.set=2
ratio.upload.set=1K
system.method.set = group.seeding.ratio.command, d.close=, d.stop=

# Set Schedules
#schedule = throttle_1,00:10:00,24:00:00,download_rate=0
#schedule = throttle_2,07:50:00,24:00:00,download_rate=200

# Stop Seeding When complete
#system.method.set_key = event.download.finished,1close_seeding,d.close=
#system.method.set_key = event.download.finished,2stop_seeding,d.stop=

# The ip address reported to the tracker.
#ip = 127.0.0.1
#ip = rakshasa.no

# The ip address the listening socket and outgoing connections is
# bound to.
##bind = 127.0.0.1
#bind = rakshasa.no

# Port range to use for listening.
port_range = 51515-51520

# Start opening ports at a random position within the port range.
#port_random = no

# Check hash for finished torrents. Might be usefull until the bug is
# fixed that causes lack of diskspace not to be properly reported.
#check_hash = no

# Set whetever the client should try to connect to UDP trackers.
use_udp_trackers = yes

# Alternative calls to bind and ip that should handle dynamic ip's.
#schedule = ip_tick,0,1800,ip=rakshasa
#schedule = bind_tick,0,1800,bind=rakshasa

# Encryption options, set to none (default) or any combination of the following:
# allow_incoming, try_outgoing, require, require_RC4, enable_retry, prefer_plaintext
#
# The example value allows incoming encrypted connections, starts unencrypted
# outgoing connections but retries with encryption if they fail, preferring
# plaintext to RC4 encryption after the encrypted handshake
#
# encryption = allow_incoming,enable_retry,prefer_plaintext
#encryption = allow_incomming,try_outgoing

# Enable DHT support for trackerless torrents or when all trackers are down.
# May be set to "disable" (completely disable DHT), "off" (do not start DHT),
# "auto" (start and stop DHT as needed), or "on" (start DHT immediately).
# The default is "off". For DHT to work, a session directory must be defined.
#
# dht = auto
dht = off

# UDP port to use for DHT.
#
# dht_port = 6881

# Enable peer exchange (for torrents not marked private)
#
# peer_exchange = yes
peer_exchange = no

#
# Do not modify the following parameters unless you know what you're doing.
#

# Hash read-ahead controls how many MB to request the kernel to read
# ahead. If the value is too low the disk may not be fully utilized,
# while if too high the kernel might not be able to keep the read
# pages in memory thus end up trashing.
#hash_read_ahead = 10

# Interval between attempts to check the hash, in milliseconds.
#hash_interval = 100

# Number of attempts to check the hash while using the mincore status,
# before forcing. Overworked systems might need lower values to get a
# decent hash checking rate.
#hash_max_tries = 10

#Added for rutorrent stuff
encoding_list = UTF-8
#scgi_local = /tmp/rpc.socket
#schedule = chmod,0,0,"execute=chmod,777,/tmp/rpc.socket"
scgi_port = localhost:5000

# Start The Plugins when Rtorrent Starts not when the page is first opened. If apache service is restart separately the plugins are likely to be stopped. Only really needed for RSS feeds.
execute = {sh,-c,/usr/bin/php /var/www/php/initplugins.php &}

Save and Exit (ctrl+x then y then enter)
We now need to perform a test run of rtorrent.

rtorrent

It should start without any problems. You may get a few warnings inside rtorrent, but it should still be running. To Exit press ctrl+q.
You should now exit the rtorrent user.

exit

Finally, we’re going to set up rtorrent to automatically start when the Pi is powered up.

nano -w /etc/init.d/rtorrent

Copy and Paste the Following:-

#!/bin/bash

# To start the script automatically at bootup type the following command
# update-rc.d rtorrent defaults

RTUSER=rtorrent
TORRENT=/usr/local/bin/rtorrent

case $1 in
start)
#display to user that what is being started
echo "Starting rtorrent..."
sleep 4
#remove a stale session lock (if any) and start the process, recording its pid
rm -f /home/rtorrent/.sessions/rtorrent.lock
start-stop-daemon --start --background --pidfile /var/run/rtorrent.pid --make-pidfile --exec /bin/su -- -c "/usr/bin/screen -dmUS torrent $TORRENT" $RTUSER
STATUS=$?
## start-stop-daemon --start --background --exec /usr/bin/screen -- -dmUS torrent $TORRENT
#info on how to interact with the torrent
echo "To interact with the torrent client, you will need to reattach the screen session with the following command"
echo "screen -r torrent"
#output success or failure (checked against start-stop-daemon, not the echo above)
if [[ $STATUS -eq 0 ]]; then
echo "The process started successfully"
else
echo "The process failed to start"
fi
;;

stop)
#display that we are stopping the process
echo "Stopping rtorrent"
#stop the process using pid from start()
start-stop-daemon --stop --name rtorrent
#output success or failure
if [[ $? -eq 0 ]]; then
echo "The process stopped successfully"
else
echo "The process failed to stop"
fi
;;

*)
# show the options
echo "Usage: {start|stop}"
;;
esac

Save and Exit (ctrl+x then y then enter)
Then run


chmod +x /etc/init.d/rtorrent

update-rc.d rtorrent defaults

And that’s it. You could now start rtorrent using “/etc/init.d/rtorrent start”, but it’s just as easy to reboot and test that the startup script runs. Once you’ve rebooted or started rtorrent, you can access the web page at http://{ip-address or name}

Notes:-

This setup is meant to run internally; as such there is no security on the Apache setup (see the sketch after these notes if you want to add some basic auth).

Personally I forward ports 51515-51520 on the router to the Pi; this makes a difference to download speed (much quicker), but as it's opening ports it's a security risk, so you'll have to decide whether or not to.

I run this setup behind a VPN using ipredator.se; if there's any demand I'll write up another guide on how to configure that and ensure your traffic is locked to only go over the VPN.
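
If you do want a little protection on the web UI, bolting HTTP basic auth onto Apache is quick. A minimal sketch, assuming the stock Apache 2.2 layout used above (the username and file paths are just examples):

apt-get install apache2-utils
htpasswd -c /etc/apache2/.rutorrent-htpasswd rtorrent   # prompts for a password

cat > /etc/apache2/conf.d/rutorrent-auth <<'EOF'
<Location />
    AuthType Basic
    AuthName "ruTorrent"
    AuthUserFile /etc/apache2/.rutorrent-htpasswd
    Require valid-user
</Location>
EOF

service apache2 restart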

OTRS forbidden error on installer.pl or index.pl

I’m currently moving an OTRS installation from one server to another. This installation has been running fine for months. I’ll be upgrading from 3.2.6 to 3.3.1 as part of the process, but to keep the migration simple I thought I’d just tar up the current OTRS installation, dump the database, copy them over to the new server, extract the files and restore the database and permissions.
This all went well, so I configured Apache (copied the existing Apache config from the old server), restarted it and tried accessing the page. I kept getting ‘Forbidden’ messages, everything pointing towards permissions, so I checked and reran otrs.SetPermissions; still no joy. As nothing seemed to be moving forward I decided to wipe the install and perform a fresh one. I did this, but then found that Apache just wouldn’t start. This was down to a lot of entries in various files pointing to /opt/otrs/ rather than my installed /usr/local/otrs/. Once I sorted this I was back to encountering ‘Forbidden’ messages again.
After a lot of poking around and searching the net I found:-
http://httpd.apache.org/docs/2.4/upgrading.html
I hadn’t considered that Apache might be a different version between servers; a quick ‘apache2 -v’ on each confirmed one server running 2.2 and the new one running 2.4.
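
If you need to check the same thing, confirming the version and finding every place that still uses the old directives is quick; the grep is just my way of doing it rather than anything official:

apache2 -v                                   # confirm which Apache version each server runs
grep -Rn "Order allow,deny" /etc/apache2/    # list everything still using 2.2-style access rules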

So to solve this I had to replace:-
        Order allow,deny
        Allow from all

with

        Require all granted

all through the Apache config. After a service apache2 restart this let me get to installer.pl, so with that working I’ll be back to extracting the OTRS files and database from the old server.
I’ll apologise for any mistypes; I’ve written all this from memory while having a well-deserved coffee.
The old system was running Ubuntu 12.04, the new one 13.10 (I think).