As I’m sure I’ve posted before, I use OpenVPN and Quagga to build up my network. After recently updating all my Ubuntu servers something strange happened: Quagga, which had been pretty rock solid, started screwing up.
Previously I’ve had the odd problem where a VPN would drop out and somehow fail to come back up, so I scripted some VPN checks to confirm each link was up. If a link is down, the script restarts it. This has been working fine on each server, and with Quagga running, routes around the entire network just keep working.
That was until the recent updates on each system, which seem to have introduced a fault with Quagga. Although Quagga remains running, all the routes disappear and just won’t come back. An error does get logged in one of the logs (I’ll try to find what the error was and update here), but the Quagga watchdog doesn’t see a problem since everything is still running. So I’ve put together a little script below that checks the routing table; if there are no entries relating to other (non-local) networks, it assumes Quagga has gone faulty and restarts it.
nano -w /usr/sbin/check_quagga_routes.sh
#!/bin/bash
checktime=$(date)
echo "$checktime : Checking Routing..." >> /var/log/connection.info
# Look for /24 routes that are not on the local interface (eth0)
routing=$(route -n | grep -F 255.255.255.0 | grep -v eth0)
if [ -z "$routing" ]
then
# No Routes to VPNs Detected. Restart Quagga
/etc/init.d/quagga restart
# log the restart
echo "$checktime : VPN Routes NOT Detected. Restarting Quagga!" >> /var/log/connection.info
fi
echo "$checktime : Routing Check Complete." >> /var/log/connection.info
exit 0
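As a rough sketch, the same test can be pulled out into a small function that reads the routing table from standard input, which makes it easy to try against a saved copy of `route -n` output without touching a live server. The function name `has_vpn_routes` is made up for illustration, and it assumes the usual `route -n` column layout (Genmask in the third column, Iface in the eighth). Unlike the grep version, it compares those fields exactly rather than matching anywhere on the line.

```shell
# Sketch only: has_vpn_routes is a hypothetical helper, not part of Quagga.
# Reads `route -n` style output on stdin and succeeds (exit status 0) if at
# least one route has netmask 255.255.255.0 on an interface other than eth0.
has_vpn_routes() {
    awk '$3 == "255.255.255.0" && $8 != "eth0" { found = 1 }
         END { exit !found }'
}
```

In the script above, the check could then be written as `if ! route -n | has_vpn_routes; then ... fi`, with the restart and logging lines unchanged inside the `if`.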
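Make the script executable (chmod +x /usr/sbin/check_quagga_routes.sh). The crontab entry, which goes in /etc/crontab since it includes the user to run as, is: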
*/1 * * * * root /usr/sbin/check_quagga_routes.sh > /dev/null 2>&1
This should mean that I won’t have to manually restart Quagga again if the fault occurs. Hopefully whatever broke in the update will be fixed, but as far as I can see there’s no harm in leaving this in place.
Normally I’d opt for Nagios to run a check and, on failure, run a handler script. But since all the Nagios checks and handlers run across the VPN, Nagios is pretty useless as soon as the routes go down. So this has to run locally on each of the servers.