dpkg: error processing package libc6:amd64 (--configure):

A nice easy one for 2am 🙁 but it took me hours to work out (and it really shouldn’t have).

Trying to update servers using apt-get upgrade. It listed about 6 packages to be upgraded, but kept throwing an error:

Setting up libc6:amd64 (2.23-0ubuntu4) ...
sh: echo: I/O error
sh: echo: I/O error
dpkg: error processing package libc6:amd64 (--configure):
subprocess installed post-installation script returned error exit status 1
Errors were encountered while processing:
libc6:amd64
E: Sub-process /usr/bin/dpkg returned an error code (1)

I spent a good few hours searching google and trying to work out what had been installed that could be causing an issue with libc6, though I was fairly certain nothing had been (only because this was 2 days after a migration to new servers, and I was hitting the problem on 2 of the 4 servers, which have extremely similar setups).

By chance I noticed this in the output of df -h:

Filesystem Size Used Avail Use% Mounted on
udev 489M 0 489M 0% /dev
tmpfs 100M 100M 0 100% /run
/dev/vda1 30G 16G 13G 55% /
tmpfs 497M 0 497M 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 497M 0 497M 0% /sys/fs/cgroup
tmpfs 100M 0 100M 0% /run/user/1000

I’ll give you a clue “look at /run” 🙁

Something was eating all the space. So I tried to run ncdu to look for large files (I know there are other ways, but I like ncdu), but I hadn’t installed it on this new server and couldn’t install it with apt-get broken.
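
For the record, you can get much the same answer without installing anything; something like this with plain du (du and sort -h are standard on Ubuntu):

# list the biggest items under /run, human readable, largest last
du -xah /run 2>/dev/null | sort -h | tail -n 20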

Thinking /run was bound to be causing some issues (still not sure if it was causing this particular issue), I rebooted the server (bad move!). It locked up and had to be power cycled. Thankfully it’s a droplet, and with DigitalOcean I can power cycle easily (I did try the console but it wouldn’t connect).

Anyway, after the reboot /run started at 5% used but quickly grew to 70%. I did manage to install ncdu though, and with that I knew the problems I had with apt-get were being caused by a full /run.

After a quick (very quick) look at ncdu /run I could see hhvm.hhbc taking up approx 85MB+.

A quick check of the config and I could see HHVM was configured to write its bytecode cache there. So I adjusted the config to put it in /var/cache/hhvm/hhvm.hhbc instead and updated the systemd service script to create /var/cache/hhvm and set its owner.
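
For anyone hitting the same thing, the change looks roughly like the below. The ini file location, unit name and www-data owner are assumptions for a fairly stock HHVM install, so check your own setup (and run systemctl daemon-reload afterwards):

# /etc/hhvm/server.ini (or wherever hhvm.repo.central.path is set on your install)
hhvm.repo.central.path = /var/cache/hhvm/hhvm.hhbc

# systemd drop-in, e.g. /etc/systemd/system/hhvm.service.d/override.conf
# create the cache dir and hand it to the user HHVM runs as, before each start
[Service]
ExecStartPre=/bin/mkdir -p /var/cache/hhvm
ExecStartPre=/bin/chown www-data:www-data /var/cache/hhvm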

Another reboot, everything seems fine and /run is now at 3% used.

And I’ve run apt-get upgrade successfully.

I’m thankful that I noticed; I really thought I’d screwed something up on these 2 servers while migrating, and I could see another night of rebuilding new servers ahead of me.

Moral of the story: check you’re not out of space when you get weird errors (yes, the I/O error should have rung some bells, but hey, it was 2am).

Nginx + WordPress + Fail2Ban + CloudFlare

I hate being woken at 2am to be told “we’re under attack!”. Well, that was pretty much this morning 🙁

Now to be fair, it probably wouldn’t even have been noticed. Our servers and setup handled it very well, and it was only noticed during a switch-over to a backup server with fewer resources during maintenance.

On checking the logs I could see lots of attempts like:

X.Y.X.Y - - [28/Aug/2016:03:12:16 +0100] "POST /wp-login.php HTTP/1.0" 200 5649 "-" "-"

We’re talking 114,893 requests to just the one server. My first instinct was to add the offending IP address to our IPTables BLOCK list and just drop the traffic altogether. Except this no longer works with CloudFlare, since the IP addresses the firewall sees are theirs, not the offender’s!

No problem, CloudFlare deals with this(!?) I can just turn on ‘Under Attack’ mode and let them handle it. This is where you quickly learn it doesn’t really do much. Personally I got a lovely 5 second delay when accessing our websites with ‘Under Attack’ activated, but our servers were still being bombarded with requests. So I added the offending IP addresses to the firewall on CloudFlare. Surely that will block them from even reaching our servers! While I can’t say it had no effect, out of the number of IP addresses I had added, some were still hitting our servers quite heavily.

So the question becomes ‘How can I drop the traffic at the nginx level?’. Nginx is configured to restore the real IP addresses, so I should be able to block the real offenders here, not CloudFlare.
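
(For anyone who hasn’t done that part: restoring the real client IP behind CloudFlare uses nginx’s realip module, roughly as below. The two ranges shown are only examples; pull the full, current list from https://www.cloudflare.com/ips/.)

# in the http block -- one set_real_ip_from line per CloudFlare range
set_real_ip_from 173.245.48.0/20;
set_real_ip_from 103.21.244.0/22;
# ...the rest of the CloudFlare ranges...
real_ip_header CF-Connecting-IP;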

Turns out to be pretty easy. Add:

### Block spammers and other unwanted visitors ###
include blockips.conf;

Into the http section in /etc/nginx/nginx.conf

http {

### Block spammers and other unwanted visitors ###
include blockips.conf;

...

}

Then create the /etc/nginx/blockips.conf:

deny X.Y.X.Y;

Just add a deny line for each offending IP, then reload nginx (service nginx reload). I’d recommend testing your new config first (nginx -t).

Now with that done on both servers I’m still seeing requests, but they are now getting 403 errors and aren’t allowed to hit any of the websites on our servers 🙂 After another 5 minutes of attack they clearly gave up, as all the requests stopped.

But we’re not done. What if they’re just changing IP addresses? We need this to be automatic. Enter Fail2Ban. I haven’t used this in some time, but I know what I need it to do:

  1. Check all the websites log files for wp-login.php attempts
  2. Write to /etc/nginx/blockips.conf
  3. Reload nginx

Should be quite simple. Turns out it is, but it took hours trying to figure out how, since all the guides seem to assume you’ll be using Fail2Ban to configure IPTables.

Here are the files/configuration I added for Fail2Ban.

/etc/fail2ban/filter.d/wordpress-auth.conf:

# WordPress brute force auth filter
#
# Block IPs trying to auth wp wordpress
#
# Matches e.g.
# X.Y.X.Y - - [28/Aug/2016:03:12:16 +0100] "POST /wp-login.php HTTP/1.0" 200 5649 "-" "-"
#

[Definition]
failregex = ^<HOST> .* "POST /wp-login.php.*200
ignoreregex =

/etc/fail2ban/jail.d/wordpress-auth.conf:

[wordpress-auth]
enabled = true
filter = wordpress-auth
action = wordpress-auth
logpath = /var/log/nginx/*access*.log
bantime = 1200
maxretry = 8

/etc/fail2ban/action.d/wordpress-auth.conf:

# Fail2Ban configuration file based on dummy.conf
#
# Author: JD
#
#

[Definition]

# Option: actionstart
# Notes.: command executed once at the start of Fail2Ban.
# Values: CMD
#
actionstart = touch /etc/nginx/blockips.conf
              service nginx reload

# Option: actionstop
# Notes.: command executed once at the end of Fail2Ban
# Values: CMD
#
# don't do this: actionstop = echo "" > /etc/nginx/blockips.conf
#                service nginx reload
actionstop =

# Option: actioncheck
# Notes.: command executed once before each actionban command
# Values: CMD
#
actioncheck =

# Option: actionban
# Notes.: command executed when banning an IP. Take care that the
# command is executed with Fail2Ban user rights.
# Tags: See jail.conf(5) man page
# Values: CMD
#
actionban = echo "deny <ip>;" >> /etc/nginx/blockips.conf
            service nginx reload

# Option: actionunban
# Notes.: command executed when unbanning an IP. Take care that the
# command is executed with Fail2Ban user rights.
# Tags: See jail.conf(5) man page
# Values: CMD
#
actionunban = sed -i "/<ip>/d" /etc/nginx/blockips.conf
              service nginx reload

[Init]

init = 123

It’s pretty simple and will need some tweaking. I’m not so sure 8 requests in 20 mins is very good; we do have customers on some of our sites who regularly forget their password. The regex does look for the 200 code; I read that a successful auth would actually give a 304. Not sure if this is correct, so it will need some testing. I also found other information on how to change the code to a 403 for a failed login attempt. I think this would be a huge plus, but I’m not looking into that tonight.

A few tests using curl and I can see Fail2Ban has done its job: it added the IP to the nginx blockips file and reloaded nginx 😀 I’m not too worried about syncing this across servers; as shown tonight they balance well, and I’m confident that within a few minutes of being bombarded they would both independently block the IP.
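
If you want to test it yourself, something along these lines does the trick from a spare machine (the URL and loop count are just placeholders; point it at one of your own sites and expect to ban yourself):

# hammer wp-login.php with bad credentials until the ban kicks in
for i in $(seq 1 10); do
  curl -s -o /dev/null -w "%{http_code}\n" -d "log=admin&pwd=wrong" https://example.com/wp-login.php
done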

So there’s my morning of working around using CloudFlare but still keeping some form of block list. Hope this helps someone. Please comment to let me know if you found any of this useful/worth reading.

WooCommerce API: No order data specified to edit order

Yesterday I received my barcode scanner. After a little playing (scanning everything in sight) I got to work on programming a Raspberry Pi in Python to use the barcode scanner to update orders in WooCommerce.

I’m not going to go into full details in this post on how, but I will write it up soon.

A BIG problem I hit though was updating orders using the WooCommerce API. I kept getting error code 400 ‘No order data specified to edit order’. I’d followed the example but changed it to fit my code, i.e. don’t print the output but return it for error checking.

Searching google for that exact phrase gave just 1 result, and it’s someone commenting they are getting the error with no response (and it’s 4 months old on a 2 year old post).

After looking through the API code and trying to follow along (I’m not the best programmer, but I can generally follow what it’s trying to do), I found it looking for ‘order’.

So after looking at how the bulk update is done and with a bit of playing I found that instead of

data = {'status': 'completed'}

you need

data = {"order":{"status":"completed"}}

Hey presto, it works. My specific code is now:

def WooCommerceUpdateOrder(order, data):
   global wcapi
   putstring = "orders/%s" % order
   result = wcapi.put(putstring, data)
   return result

which is called using

data = { 'order': { "status": "completed" } }
updateresult = WooCommerceUpdateOrder(order, data)
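
For completeness, the wcapi object comes from the official WooCommerce Python library; here’s a minimal sketch of the surrounding setup (the store URL, keys and the legacy v3 API version are placeholders/assumptions, so swap in your own):

from woocommerce import API

# placeholder credentials -- use your own store URL and keys
wcapi = API(
    url="https://example.com",
    consumer_key="ck_xxxxxxxx",
    consumer_secret="cs_xxxxxxxx",
    version="v3"
)

data = {"order": {"status": "completed"}}
print(WooCommerceUpdateOrder("123", data).json())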

Hope this helps someone. I’ll post the full Python program for the barcode reader later in the week. Basically it uses a Pi, a Pi LCD screen with 4 buttons, a barcode reader and speakers. 🙂

APT not automatically updating DigitalOcean

I’ve recently noticed a problem on 3 of my Digital Ocean servers: the APT package lists are not automatically updating every day. I try to keep all servers up to date, and rely on Nagios to inform me when there are packages needing to be updated, and that’s the main reason I noticed something was broken.

The 3 servers in particular are newer builds than the rest of the system, and they don’t have nearly as much installed as the others, so at first I didn’t pay too much attention when other servers were going into warning state on Nagios indicating updates but these 3 weren’t. However, I would still connect to these servers and run my normal command:

apt-get update && apt-get upgrade && apt-get dist-upgrade && apt-get autoremove

A few times these servers did install updates and I just thought it must have been my timing, that the package lists hadn’t yet been updated by the cron.daily.

But after this happening a few times, I decided to not run the above and see how long these servers would take for nagios to throw an alert. It never did and that got me a little worried.

Over the last few days I’ve been diagnosing what’s wrong. I started out by making sure cron was working properly, then kept an eye on the file timestamps:

ls -ltrh /var/lib/apt/lists/

Eventually I got to /etc/cron.daily/apt and checked through what it was doing on the working servers compared to the broken ones. I turned on VERBOSE and got a bit of info when running /etc/cron.daily/apt, but it seemed to exit quite quickly.
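
(If you want the same kind of ‘+’ trace output shown below, one way to get it is to run the script by hand with shell tracing turned on, e.g.:)

sh -x /etc/cron.daily/apt 2>&1 | less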

Comparing it to a working server the important bit seemed to be around

+ apt-config shell Debdelta APT::Periodic::Download-Upgradeable-Packages-Debdelta
+ eval
+ [ 1 -eq 0 ]
+ do_cache_backup 0

On the broken servers I was getting:

+ [ 0 -eq 0 ]
+ [ 0 -eq 0 ]
+ [ 0 -eq 0 ]
+ [ 0 -eq 0 ]

Then it would exit. Further investigation showed a few settings were being populated on the working servers but not on the broken ones.

So I compared the directory /etc/apt/apt.conf.d/ on both servers and found the following files missing from the broken servers:

00aptitude

Aptitude::Get-Root-Command "sudo:/usr/bin/sudo";

10periodic

APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Download-Upgradeable-Packages "0";
APT::Periodic::AutocleanInterval "0";

15update-stamp

APT::Update::Post-Invoke-Success {"touch /var/lib/apt/periodic/update-success-stamp 2>/dev/null || true";};

20archive

APT::Archives::MaxAge "30";
APT::Archives::MinAge "2";
APT::Archives::MaxSize "500";

99update-notifier

DPkg::Post-Invoke {"if [ -d /var/lib/update-notifier ]; then touch /var/lib/update-notifier/dpkg-run-stamp; fi; if [ -e /var/lib/update-notifier/updates-available ]; then echo > /var/lib/update-notifier/updates-available; fi "; };

I think the main one is 10periodic, but I didn’t fancy spending days/weeks adding each one and waiting to see what happened, so I added them all in one go.

Then I ran /etc/cron.daily/apt again, and this time it took a lot longer to run, i.e. I’ve written this entire post and it’s still running, as opposed to exiting within a few seconds earlier.

I’m pretty confident that this has now solved my APT package lists not automatically updating, and providing it has, Nagios will start warning on these 3 servers now.

Redis Sentinel PHP Session Handler

In the last few weeks I’ve been rebuilding servers & all services. I like to do this every so often to clear out any old junk, and take the opportunity to improve/upgrade systems and documentation.

This time around it’s kind of a big hit though. While I’ve had some services running with HA in mind, most would require some manual intervention to recover. So the idea this time was to complete what I’d previously started.

So far there’s 2 main changes I’ve made:

  1. Move from MySQL Master/Master setup to MariaDB cluster.
  2. Get redis working as HA (which is why you’re here).

I’ll bore everyone in another post with the issues on MariaDB cluster. But this one concentrates on Redis.

The original setup was 2 redis servers running, 1 master and 1 slave, with the PHP session handler configured against a hostname which was listed in /etc/hosts. However, this time, as I’m running a MariaDB cluster, it kind of made sense to try out a redis cluster (don’t stop reading yet). After reading lots, I decided on 3 servers, each running a master and 2 slaves. A picture here would probably help, but you’ll just have to visualize. 1, 2 & 3 are servers, Mx is master of x, Sx is slave of x. So I had 1M1, 1S2, 1S3, 2S1, 2M2, 2S3, 3S1, 3S2, 3M3.

This worked, in that if server 1 died, its master role was taken over by either 2 or 3, and some Nagios checks and handlers brought it back as the master once it was back online. Now I have no idea if this was really a good setup (I didn’t test it for long enough), but 1 of the problems I encountered was where the PHP sessions ended up. I (wrongly) thought the PHP session handler would work with the cluster to PUT and GET the data, so I gave it all 3 master addresses. Reading up on redis, if the server you’re asking doesn’t have the data it will tell you which one does, so I thought it’s not really a problem if 1M1 goes down and the master becomes 2M1, because the other 2 masters will know and will say the data is over there. In manual testing this worked, but PHP sessions don’t seem to work with being redirected (and this is also a problem later).

So after seeing this as a problem, I thought a cluster was a bit overkill anyway, and simplifying it to 1 master and 2 slaves would be fine.

I wiped out the cluster configuration and started again, but I knew I was also going to need sentinel this time to manage which is the master (I’d looked at it before, but went for cluster instead. Time to read up again).

After getting a master up and running and then adding 2 slaves, I pointed PHP sessions at all 3 servers (again, a mistake). I was hoping (again) that the handler would be sensible enough to connect to each, and if it’s a slave (read only), detect that it can’t write and move to the next. It doesn’t. It happily writes errors in the logs for UNKNOWN.

So I need a way for the session handlers to also know which is the current master, and just use this.

My setup is currently 3 MariaDB/redis servers S1, S2 & S3 and 2 Nginx servers N1 & N2.

I decided to install redis-sentinel on all 5, with a quorum of 3. The important bit in my sentinel config is:

sentinel client-reconfig-script mymaster /root/SCRIPTS/redis-reconfigure.sh
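
For context, that line sits in an otherwise fairly standard sentinel config, roughly like this (the master IP and timeout values are placeholders; the quorum of 3 matches what I said above):

sentinel monitor mymaster 10.0.0.10 6379 3
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 60000
sentinel client-reconfig-script mymaster /root/SCRIPTS/redis-reconfigure.sh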

and the redis-reconfigure.sh script:

#!/bin/bash
adddate() {
	while IFS= read -r line; do
		echo "$(date) $line"
	done
}

addrecord() {
	echo "## Auto Added REDIS-MASTER ##" &gt;&gt; /etc/hosts
	echo "$1 REDIS-MASTER" &gt;&gt; /etc/hosts
}

deleterecord() {
	sed -i '/REDIS-MASTER/d' /etc/hosts
}

# <master-name> <role> <state> <from-ip> <from-port> <to-ip> <to-port>
# $1 $2 $3 $4 $5 $6 $7

if [ "$#" -eq "7" ]; then
	if grep -q "REDIS-MASTER" /etc/hosts; then
		echo "Delete and Re-Add $6 REDIS-MASTER" | adddate &gt;&gt; /var/log/redis/reconfigure.log
		deleterecord
		addrecord "$6"
	else
		echo "Add $6 REDIS-MASTER" | adddate &gt;&gt; /var/log/redis/reconfigure.log
		addrecord "$6"
	fi
fi

Basically this script is run whenever the master changes (I need to add some further checks to make sure <role> and <state> are valid, but this is being done for quick testing).

I’ve then changed the session path:

session.save_path = "tcp://REDIS-MASTER:6379"

and again in the pool.d/www.conf:

php_admin_value[session.save_path] = "tcp://REDIS-MASTER:6379"

Quite simply, I’m back to using a hosts entry which points at the master. But the good thing (which I wasn’t expecting) was that I don’t have to restart php-fpm for it to detect the change in IP address.

It’s probably not the most elegant way of handling sessions, but it works for me. The whole point of this is to be able to handle spikes in traffic by adding more servers N3, N4, etc., and providing they are sentinels with the script, they will each continue to point to the master.

I did think about writing this out as a step by step guide with each configuration I use, but the configs are pretty standard and as it’s a bit more advanced than just installing a single redis to handle sessions, I think the above info should really only be needed by anyone with an idea of what’s where anyway.

I still don’t really get why the redis stuff for PHP doesn’t follow what the redis server tells it, i.e. the data is over there. I suppose it will evolve. If you’re programming up your own use of redis like the big boys then you’ll be programming for that. I feel like I’m doing something wrong or missing something; I’m certain I’m not the only one who uses redis for PHP sessions across multiple servers behind a load balancer, but there is very little I could find googling beyond those that just drop a single redis instance into place. As this becomes a single point of failure, it’s pretty bizarre to me that every guide seems to go this way.

If anyone does read this and actually wants a step by step, let me know in the comments.

Just to finish off I’ll give an idea of my current (it’s always changing) topology.

2 Nginx Load Balancers (1 active, 1 standby)

2 Nginx Web Servers with PHP (running syncthing to keep files in sync, I tried Gluster FS which everyone raves about, but I found it crippling so I’m still looking for a better solution).

3 Database Servers (running MariaDB in cluster setup and Redis in 1 Master, 2 Slaves configuration).

1 Monitoring Server (running nagios, checking pings, disk space, processes, ports, users, ram, cpu, MariaDB cluster status, Redis, that the DB & redis are accessible from the Web Servers, vpns, dns and probably a load more).

Raspberry 1-Wire Resolution

More of a pastebin post.

https://www.raspberrypi.org/forums/viewtopic.php?f=37&t=115775

This is an important URL for changing the resolution on a DS18B20 via the Raspberry Pi. Pay attention to the note re the RPi 2 and changing the code.

I also changed the GPIO pin to 4 to save rewiring, but disabled the config.txt option for 1-wire and disabled the w1-gpio and w1-therm entries in /etc/modules just in case while running this program.
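
For reference, the normal 1-wire setup those lines refer to looks something like this on Raspbian (re-enable it once you’re done; gpiopin=4 matches the pin I mention above):

# /boot/config.txt
dtoverlay=w1-gpio,gpiopin=4

# /etc/modules
w1-gpio
w1-therm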

It seems to be working; I now have the resolution set at 12 bits on a new sensor I received that was working in 0.5°C steps.

Ansible Part 1

I already have a Droplet for management so I’m going to be using this for Ansible and a new droplet to test some deployments. I’ve done the following on the server:

add-apt-repository ppa:rquillo/ansible
apt-get update
apt-get install ansible

I have no doubt that I will learn I’ve done bits wrong as I get further in, but I’m going to start with the following hosts config

[initial]
C3PO-1

[balancers]

[backends]

[databases]

[fileservers]
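
A quick sanity check that the inventory and SSH access work before writing any playbooks (this assumes root SSH is still enabled on the new droplet at this point):

ansible initial -m ping -u root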

So the first thing I’m going to tackle is setting up new users. I’ve been playing with it for about an hour; although I got the user created very quickly, I hit a problem with my setup. I need to send multiple SSH keys for some users (we use different keys on PCs, laptops and mobiles). Every example I found seemed to either a) want to pull a key from a file, or b) just use one key.

After quite some playing, and trying different things I found a way. This in turn meant I had to slightly change the create user bit of the playbook.

Here’s the users.yml playbook so far

---
- hosts: all

  tasks:
    - name: Add Users from group_vars file
      action: user name={{ item.name }} password={{ item.password }} shell={{ item.shell }} state={{ item.state }} update_password=always
      with_items: users

    - name: Add SSH User Keys from group_vars files
      authorized_key: user={{ item.0.name }} key='{{ item.1 }}'
      with_subelements:
        - users
        - authorized

and uses a group_vars file (group_vars/initial)

---

users:
  - name: NAME
    password: HASHED_PASSWD
    authorized:
      - ssh-rsa SSH_KEY_1
      - ssh-rsa SSH_KEY_2
      - ssh-rsa SSH_KEY_3
      - ssh-rsa SSH_KEY_4
    shell: /bin/bash
    state: present

I did think about pulling the keys in from the authorized_keys files, but not all users are allowed on the management server, so I’d have to keep files for them, and if I’m going that far I may as well just keep them in group_vars. It doesn’t look as nice if you cat the file, but it’s structured and makes sense.

The last thing I want to do is add the user to some groups.

This was nice and easy. Add the groups to the group_vars file:

    groups: sudo,www-data

Then change the users.yml to add the groups

    action: user name={{ item.name }} password={{ item.password }} shell={{ item.shell }} state={{ item.state }} groups={{ item.groups }} update_password=always

I was worried that it might screw up the user’s own group, as the manual says it deletes all other groups except the primary. Since I haven’t told it a primary using ‘group’ I thought it might be a problem, but thankfully it’s not; this just worked.

Well, I thought at this point I was pretty much finished. Ha, not a chance. I added a user for Ansible to connect as, since I’ll be removing root SSH access. This means I’m going to need to let Ansible sudo, so I’d best sort that now. Here’s a bit of code I found and changed slightly. Make sure you change the username in the sudoers.d/ansible file.

Appended to the end of users.yml

  - name: Ensure /etc/sudoers.d directory is present
    file: path=/etc/sudoers.d state=directory

  - name: Ensure /etc/sudoers.d is scanned by sudo
    action: lineinfile dest=/etc/sudoers regexp="#includedir\s+/etc/sudoers.d" line="#includedir /etc/sudoers.d"

  - name: Add ansible user to the sudoers
    action: 'lineinfile dest=/etc/sudoers.d/ansible state=present create=yes regexp="ansible .*" line="USERNAME ALL=(ALL) NOPASSWD: ALL"'

  - name: Ensure /etc/sudoers.d/ansible file has correct permissions
    action: file path=/etc/sudoers.d/ansible mode=0440 state=file owner=root group=root

On first run you’d use root to connect up. Then you would use

ansible-playbook users.yml -u USERNAME -s

All working. It’s taken about 2 hours to get to the point of deploying a couple of users automatically. I’m not so sure this has saved me time in the long run 🙂 but it’s the first step in a much bigger project. I’m kind of glad it wasn’t just copying and pasting other people’s code and that stuff broke; it gave me a chance to understand a bit more.

Part 2 will be coming soon. There we’ll lock down SSHd and apply the default firewall rules.

New to Ansible

So if you read my last post (it was really long, sorry), you’ll see right at the end the current deployment. I had tried a few managers to be able to deploy/scale the whole system, but they really overcomplicated the whole thing. Chef looked really good (I can’t remember the other one), but it was problematic and just didn’t suit.

Instead I kept with the scripts I had written for the time being. They are in no way good enough to share as they are very customised to my setup but they achieve what I need. However to run them takes quite a bit of initial manual work.

So what do I need from a system?

  1. It has to just work, not go installing stuff it depends on to run.
  2. It has to be able to split the setup into an initial and running level.
  3. It has to be able to be told easily about a new server and what role it will be, then do the initial setup and onto the relevant running level.
  4. It MUST be simple to use and understand. I’m getting a bit sick of having to read through how to configure weird stuff because someone decided to do things completely differently to how you’d expect it to work.
  5. It must have very low resource requirements. I want to run this from a management droplet that already runs nagios, so it can’t sit there just eating resources while it has nothing to do.

I saw a very quick video on Ansible with someone using it for Raspberry PI’s (something else I have far too many of) and thought right away I should look into that. So here I am. I’m going to do some testing with Ansible.

Initially all I’m looking for it to do is handle the initial setup that I already have scripts for.

  1. Create a new user, add it to the sudo group, set a password and copy the SSH keys.
  2. Reconfigure sshd to deny root connections.
  3. Set the server’s timezone.
  4. Set the keypad option in nanorc.
  5. Set the .bashrc for root to use a red prompt.
  6. Install some basic packages such as screen and htop
  7. Create a swapfile
  8. Install and configure NTP
  9. Setup IPTables (I don’t use UFW, I prefer to deploy an iptables file and have this restored when the loopback interface is loaded).

I’ve had a quick look on the DigitalOcean community (I love the resources there), but the stuff about using Ansible seems a little more ‘throw it all in one file’ rather than properly splitting it out like I saw in the video. I think splitting out each task is a must to be able to understand what’s going on and make changes.

The video I’m referring to is https://www.youtube.com/watch?v=ZNB1at8mJWY

So that’s the start, let’s get going and see what I can screw up

Part 1 is Here

How I started using WordPress

I should make it clear from the outset, this post isn’t going to be solving anything. I’ve spent about 3 days working on stuff and this is just bugging me.

Let’s go back to around this time last year. A couple of friends and I were working on how to get some money back from a facebook page one of them had setup a few years prior. I had been working with him on it since a few weeks after he set it up, and we’d pushed around ideas to sell some merchandise alongside it a few times, but never really got anywhere.

He had made a rash decision one night to use an ‘online website creation’ provider to get a site online. From the start I hated it. Not the idea, I think it was about time to get our own site running, but he spent a few weeks tweaking about 8 pages to look really good, using their WYSIWYG editor. Then he wanted me to change a few things in the code that weren’t right. It was an absolute nightmare! I can’t remember the name of the site but I think it had a W E X somewhere in the title.

It was a paid “solution”, and I think cost about £40 per month by the time he’d added a mailing list option to capture email addresses (not actually handle any emails) and a few other bits.

Coming from more of an IT position, my main concern was around load/spikes being handled. There was very little information about how well they could handle this (and I think we found the reason why). We finally put this live posting the link to about 40k followers.

Watching the nice google analytics (that I had to add to each page, because they didn’t have a drop-in tracking code option), within seconds we hit around 150 hits per second. This continued for a few hours, but 2 problems became apparent:

  1. The site was struggling, and we would probably have been hitting a higher number. We were getting positive feedback and people understood it was busy, but I still wasn’t happy: we were paying for them to handle this and they just weren’t.
  2. And this really ties into point 1. He’d set up a site that was pretty static! There was nothing to get people coming back for more. Yes there was a news page that we could update, but other than the mailing list form there was nothing interactive. (So hitting point 1, static content should really have been able to be handled 100x better).

Anyway we took that for what it was, a basic site with a bit of info and something to get us started.

We already had a ‘Shop coming soon’ page, so the next thing we were trying to figure out was what are we going to sell and how?

Initially we thought t-shirts, and started looking at some of the ways we could do this. 2 main providers seemed to jump out zazzle and cottonpress (I think, it was a while ago). While both had some good offerings, neither really grabbed us. I can’t remember which, but one of them deleted an image we uploaded for copyright (it was our logo, and we had it plastered everywhere and the account was signed up with a domain name using the same logo), but they wanted us to fill in and snailmail/fax some copyright forms and reupload the logo. Considering we were only seeing what we could do at the time, we decided to drop them as an option. If we have to jump through these hoops with everything we do, we’ll spend more time filling in their paperwork than anything else.

Time went on, visits to the site died down (did I mention lack of content/interactivity), and we still hadn’t sorted out products, a store, a business.

We continued with the facebook page and still poked around ideas on how to get a site/shop running. I spent a good few weeks working with oscommerce (I’d previously used it a little for another project idea, but it never went live) and finally had something to show, a semi working shop front (it had no products).

We discussed that neither of us really had a clue about setting up a proper business. I’m all IT and have no interest in writing business plans or doing business meetings (I should mention that a previous role I was an IT Manager and regularly had to be part of “grown up” meetings, I’m a Tech, I hate people, I hate meetings, give me something broken – I’ll fix it, give me a problem – I’ll work out a solution. But in NO WAY do I want to be taking part in any more business meetings).

A few months later I was helping him move. Another friend of his was also helping. I’d met him once before but didn’t get chatting then. He mentioned he was in his last year of university and was studying business and finance! Just how he hadn’t thought of this before I don’t know, but instantly we knew he was coming on board 🙂

We spent about 3 hours in McDonalds discussing what we had, what we’d like to do, and just how crap we’d been so far. Within this conversation we talked about selling t-shirts/mugs/bags etc. Just like a genie out of a lamp our 3rd comes out with ‘oh my almost father in-law does printing stuff, I know he does mugs. Shall I speak to him?’ Just like a match made in heaven, we suddenly had our missing piece! Someone who should have more of an idea on the business side (or at least know someone to ask) and connections to a printer for the kind of stuff we want to sell. You just couldn’t make it up: he’d known this guy for a few years and never thought to ask him about business stuff.

Things started moving forward, slowly at first but at least they were moving. We met up with “almost” father in-law and went through some designs and processes. We setup a business. I continued work on the shop website and we took down the other that was costing too much and not really doing anything.

In around October time we were set. Nothing spectacular, about 9 mugs and a few t-shirts. The mugs would be the easiest, we just send the order to “almost” and he takes care of printing, packaging and sending. The t-shirts would be a little different as we’d have to get a template made for them, and couldn’t afford the cost until we had some orders in.

We launched, I had tried to over-spec the server(s) but in itself this was tricky. There were no real stats on how well oscommerce could perform on certain hardware. And scalable VPS’s such as DigitalOcean’s current system just didn’t exist. Scaling would mean taking a new 30 day server and moving everything over to it. Certainly not a 5 min job, and definitely not something to start an hour after we’ve launched. We’ll just have to bite the bullet and see.

My memory of launch night is fuzzy to say the least. I think I’d been working 36-48 hours trying to finish stuff off. I had a big list of checks and can’t remember doing half of them.

Our page audience had grown to about 70k, so I was very nervous. We launched the shop and watched. Ping, an email – it’s an order; ping, another; ping, another. It was working. I have to say the server(s) held up pretty well. It wasn’t without problems, we did start seeing the site timing out on new connections for about 10 mins, but a swift kick of Apache sorted that and it didn’t cause a problem again.

Finally we were running. The feedback again was good. We had a bunch of concerns like

  1. Will the system work
  2. Will the servers(s) hold up
  3. What happens if it goes mental and we sell a thousand mugs
  4. Can we really do this

I think all in all it went well. We could have done better, but it also could have been a lot worse. We ran with oscommerce for a few months. Shortly after launching the shop I had a discussion on just how we were going to get facebook and the shop incorporated. There was no obvious answer, then we hit a problem. One of the posts to facebook got reported (it’s a humorous page, and we only ever post stuff sent in to us), and this showed us just how reliant we are on facebook. Suddenly we’re all logged out and the poster was blocked for 24 hours. Luckily facebook pretty much left the page alone (just deleted the one post), so we played on it that one of our admins was in the dog house for an earlier post. But it still didn’t take long to realise that if facebook wanted they could delete the page on a whim and we’d suddenly lose all our content and fans!

This just didn’t sit right with me, and I started looking at how the hell to get a backup of OUR page and content. There was nothing. So I started looking at how we could do things differently. Enter WORDPRESS.

I’d seen this name floating about for the last few months, but never really saw the point in using it. I don’t blog, we don’t blog, so what’s the point? (I’m still not entirely sure I understand the point), but it’s close to one of the best things I’ve spent weeks fiddling with.

I’d installed wordpress on our VPS for me to have a look around. It still wasn’t a site that we could really use, but as a CMS maybe I could find a way to connect to facebook and back up our stuff. There must be people who do this, right? WRONG. There’s loads of plugins for wordpress & facebook, but I’ve only ever found 1 that takes your page and puts it as posts. To make matters worse, it’s flaky as f**k, hadn’t been worked on in god knows how long, and the very few comments in the code are in Chinese.

Now I would never describe myself as a coder. I’ve used Delphi and VB for writing some functional programs in the past, and had to program a few in VB.net when I was IT Manager (the old problem/solution thing). I could also write some ASP and PHP; really most of my stuff was dragging and dropping boxes and programming them up for stuff. I did quite a bit of database stuff within them, but that was it. There was absolutely no such thing as using classes (I don’t even think they existed). But as part of my job I had a dev team who did develop in PHP and VB.net, and they were always amazed, when trying to tell me how something couldn’t work, that I could not only follow along but tell them why they were wrong, and on several occasions when something broke could actually read their code and work out (normally the simple thing) a temporary fix.

And so it begins: I now have no dev team, just a bunch of PHP code and classes that really didn’t make much sense to me. Bit by bit I managed to work out what each bit was doing, then moved on to changing it so that it would run for us. I know it will seem like simple stuff (especially looking back), but things like:

  • Changing a hard coded loop to only pull 10 facebook posts, to take a limit from a setup interface where you can specify how many to pull.
  • Adding in Date ranges to pull from and to.
  • Improving the cron job, so it pulls from the time of the latest post it has + 1 second.
  • Downloading any attached image and saving it to the server (huge accomplishment).
  • Changing the posts content that gets published and updating links back to the post it just posted.

There’s loads of stuff I’ve had to do to this plugin to get it working, let alone better and working for us. Eventually I’d finished (you’re never really finished; I have a list of new changes to get to sometime). After running it on a fresh wordpress install, we suddenly have a complete backup of our facebook page, around 9k posts and images, all sat in wordpress, and what’s more, automatically grabbing new stuff 🙂

I showed off my new achievement; personally I don’t think it was appreciated just how much time and effort I had put into this, but it went down really well. We now had a blog! A blog alongside our store, and it was really starting to come together.

Over the next few weeks I kept working on improving the blog while managing the shop. Then suddenly a new disaster: our server had for some reason gone offline. Trying to connect via the backup terminal access just gave me a blank screen; something was wrong and I couldn’t get access to see what. To make matters worse, our provider had very nicely decided to cut back on its 24/7 support, and now only operated 8-8. At around midnight, it’s not exactly what you want to be finding out that the support has changed and no-one bothered to tell you. The ONLY thing I could do was email them, and hope someone picked it up soon. They didn’t! I spent a good few hours trying everything I could think of to get hold of someone or find a way to the console, but nothing.

This had the effect of making me sleep through my alarm at 8am, but I woke at 9am and called them. After a few choice words, I was assured the tech team would look at it right away. I was so tired I fell back to sleep and woke again about midday. The first thing I did was open our site, or I should say attempted to! It was still down!! Another call, more choice words, and me advising them I’m not going anywhere and if they cut me off I’ll just keep calling until I can speak to a "Tech", then explaining the problem and what I had tried to the tier 1 monkey, quickly got me escalated. I couldn’t stop laughing when I finally did get to tier 2; their tier 1 had placed me on hold to get someone, then come back to me and said 'I need you to take this call, this bloke knows what the f**k he's on about, I'd put him to tier 3 but I can't direct transfer', to which I replied 'Yes and I know how to work a phone system, Line 1 is the customer, you should be on Line 2 for that conversation' 🙂 I have to be honest, just that mistake made my day.

Tier 1, 2 and the manager that called me back an hour later were mortified, but as I explained to him, I’ve been a senior tech on phone support, I’ve been an IT manager; I’m guessing I hit a newish person and scared the crap out of them. I only cared about getting this back online. To be fair to tier 2, I was connected to the console while he was apologising. (This part really could have had its own post.)

Anyway, getting over that failure I started looking for another VPS provider. I had no problem with their VPS and generally it was a very stable system, but 8-8 support with no out-of-hours ‘we’re really f**ked’ option forced my hand. It had gone down very badly with the others that this had cost us money, and there was no way I could argue it as I agreed the situation was crap.

I found another provider and started moving stuff, but it just wasn’t right. It was actually a previous colleague’s company, but something just wasn’t right. So I kept looking. Then I found Digital Ocean. Initially I started using them to test some wordpress plugins, but I loved that I could very quickly bring up new servers in a matter of minutes. This surely had to be better than waiting hours. And it was. Testing was going well, so I started moving everything over. Everything just worked, and where I had to contact their support for a few little things (1 account related, can’t remember the others), I had a reply very quickly, sometimes within minutes, other times within 30 mins. I couldn’t fault their support and I wasn’t bounced around; they knew exactly what I needed and sorted it.

So here we have a medium spec’d Digital Ocean server, running our WordPress and OsCommerce solutions and handling both pretty well.

But being one to never settle, I kept tweaking stuff and looking at our options. I set up another server (droplet) for testing, and another wordpress install later I was going through trying out the ecommerce plugins. I was blown away with WooCommerce! Yes, OsCommerce worked for us, and yes, I had put in quite a bit of time customising it and getting it to work with our processes, but the whole feel of the interface was crap. Woocommerce was like a breath of fresh air. It had a bunch of functionality, there’s loads more plugins, it’s far easier to customise, it works from the wordpress themes, and it fits right in with our blog and doesn’t look disjointed.

I proposed we move over to this and it went down well. Well enough in fact that the others wanted to get more involved; we spent weeks working on changes to the theme (that we’d paid about $50 for), and I moved the shop over and made it live without telling our facebook audience. We started getting some sales via Woocommerce, and it was obvious that this just integrated well.

We were going to have a relaunch to show off the new blog and store. I think I managed to p*ss the others off when WordPress brought in a new standard theme that worked even better with Woocommerce and I changed to it to show them. It was obvious that it did, and that we should stick with it, but it also meant the last few weeks of customising were wiped out (and they still bring up the time I wiped out a few weeks’ work when I changed the theme).

I would never say wordpress/woocommerce is perfect; I’ve found many issues along the way and had to find workarounds for a lot of stuff. I still don’t truly feel like I know what I’m doing and there’s no way that we use wordpress to its full potential. Currently we have the blog and shop running, with somewhere in the region of 10k posts and around 15k sales. We still don’t publish to the blog independently of facebook/twitter but it’s on the roadmap.

One thing that has caught us out a few times is DigitalOcean scaling. Because we very often have little traffic, I always keep the servers scaled down with the intention of boosting them up before we push anything new. On at least 2 occasions, we’ve forgotten this and overloaded our site.

I’ve also gone through a few different configurations just trying to find the best solution.

1st We had 1 server, that was mid range and just worked, but I knew this alone wouldn’t handle the traffic.

2nd I brought up 2 web servers and a database server. This wasn’t an ideal setup, loadbalancing was at DNS level, syncing was done via cron jobs, and the whole thing held together via a VPN to keep database connections secure. This had a bunch of problems.

Next I moved back to a single web server but kept a separate database. This was better, and around the time DigitalOcean allowed you to scale up easier (but not down you still had to wipe out the server to do so).

Because having a single web server just wasn’t enough, I went back to 2, but added in the new(ish) cloudflare CDN in front of the servers. This really helped (though I’m still not convinced it really does CDN for us).

As part of the above, I tried incorporating GlusterFS (absolute disaster). From every web search I did, GlusterFS looks to be THE solution. In practice for us it took a website responding (with some heavy graphics) in 3 seconds at the longest, 2 secs average, to 30 secs at the longest, 18 secs average. I know everyone raves on about how great it is and how, if it’s slow, it’s something you’ve done. I don’t believe this for a second. I’ve spent days at a time trying to make it better, but the simple truth is if the files are pulled locally I get the 2/3 secs above. When using a Gluster mount point to the files (which are still actually local, Gluster on both web servers, mounted back to themselves), I get the 18-30 secs. Both web servers have a private lan connection to use gluster in the same Digital Ocean location and NO amount of tweaking or testing seems to ever really improve this. It was only made worse during testing when I took down one of the servers, so that the other could only use itself to serve the files; this managed to take out the mountpoint until I restarted, and still it served up the pages slowly. I thought the whole point in using Gluster (at least for me) was HA, no single point of failure. Having both servers offline if one goes down does not seem very HA to me.

The ONE thing I really want Digital Ocean to sort out is their private lan. In order to solve the issue of anyone else on the private lan being able to see my traffic between servers I’ve had to use VPNs between them. This adds complications to the entire setup, and a private lan per account would be very welcome.

The setup I’m currently in the middle of deploying is:

a) Cloudflare

b) 2xNginx loadbalance proxies (these also serve up maintenance pages if they can’t connect back)

c) 2xNginx Backend servers

d) 2xMySql+Redis Servers

e) 2xNFS Servers

I’m happy with the load balancers, though I would love for DigitalOcean to offer a proper loadbalancing solution.

The MySQL servers took some config to get replication working properly while also using SSL for the connection to each other and from the backend web servers.

I still haven’t managed to configure MySQL to be HA from the web servers, so at the moment this would be a manual switch. I’ve found HyperDB for wordpress, which should resolve this, but since I had to slightly change the wordpress config to do SSL for MySQL, and HyperDB doesn’t seem to be able to use SSL, I need to work out how to do this. I find this really weird, as one of the suggestions is to have your database remote; I really would have thought being remote (especially if using something like Amazon for the database) would mean you’d want to use SSL to keep your database traffic secure. It seems strange that this isn’t a fundamental option in HyperDB (unless I’m just not seeing it).

And the last part, the NFS servers: I still need to find out how to keep these in sync (without using Gluster). I’ve previously used Syncthing to keep servers in sync; it works, but is pretty much held together with tape (my configuration of it, not the actual program). Once I have the NFS synced I also need to find a way for the web servers to use both of them for HA.

I do feel like this configuration is the closest to the best I can achieve on a budget. Once I have the MySQL and NFS stuff worked out, I will then be able to scale any server without completely taking the site offline, which will really help in being able to deal with spikes. It is now much easier to scale with Digital Ocean, but I’d still really want to know that doing so, or taking a server out for maintenance, is fine because everything will just keep running.

If you’ve got this far, I really thank you for reading. I hope the next couple of posts will be my solutions to the MySQL SSL and NFS problems. It’s now 2:41am and I think I’ve been writing this for about 2 hours, so I’m going to sleep 🙂 Leave a comment if you got this far; include the words ‘sleep deprived’ so I don’t think it’s spam.

Setting Featured image on WordPress Posts in BULK

I’m in the middle of changing another site to a new theme. The problem being that the front page uses the featured image from each post to build up the display, and none of the posts have a featured image set. The site uses a facebooktowordpress plugin (heavily customised; the original no longer pulls the images, and really couldn’t handle pulling what was needed).

With this plugin, each post on facebook is posted to the blog, the first image on the post (this needs changing to all images) is downloaded to the wordpress server, and the source of the post then uses the local image. But at no point did I ever anticipate needing featured images to be set.

So here’s the code I’ve put together to:

  • Pull a list of posts without a thumbnail/featured image set.
  • Pull the contents of each post in the list and look for img src
  • Download the image
  • Add the image to the media library against the current post
  • Pull the ID of the attachment and set it as the featured image

Again, this is meant to be run from the command line, NOT as a plugin.

<?php
 $counter=0;
 $limit = 20;
 $updated=0;

 if( php_sapi_name() !== 'cli' ) {
 die("Meant to be run from command line");
 }

 function find_wordpress_base_path() {
 $dir = dirname(__FILE__);
 do {
 //it is possible to check for other files here
 if( file_exists($dir."/wp-config.php") ) {
 return $dir;
 }
 } while( $dir = realpath("$dir/..") );
 return null;
 }

 define( 'BASE_PATH', find_wordpress_base_path()."/" );
 define('WP_USE_THEMES', false);
 global $wp, $wp_query, $wp_the_query, $wp_rewrite, $wp_did_header;
 require(BASE_PATH . 'wp-load.php');
 echo "Site URL: " . get_site_url() . "\n\r";
 echo "Base: " . find_wordpress_base_path()."/\n\r";
 echo "Upload DIR: " . wp_upload_dir()['path'] . "\n\r";
 echo "Posts: " . wp_count_posts()->publish."\n\r";

 $query = array (
 'posts_per_page' => -1,
 'post_type' => 'post',
 'meta_key' => '_thumbnail_id',
 'meta_compare' => 'EXISTS',
 );
 $my_query = new WP_Query($query);
 $posts_with_thumbs = $my_query->post_count;
 echo "Posts with thumbnail_id: " . $posts_with_thumbs . "\n\r";

 $query = array (
 'posts_per_page' => -1,
 'post_type' => 'post',
 'meta_key' => '_thumbnail_id',
 'meta_compare' => 'NOT EXISTS',
 );
 $my_query = new WP_Query($query);
 $posts_without_thumbs = $my_query->post_count;
 echo "Posts without thumbnail_id: " . $posts_without_thumbs . "\n\r";

 $counter = 0;
 while(( $my_query->have_posts() ) and ($counter < $limit)) {
 $my_query->the_post();
 echo "\n\r\n\r";
 echo "Post ID: " . $my_query->post->ID . "\n\r";
 echo "Post Title: " . $my_query->post->post_title . "\n\r";
 $content = $my_query->post->post_content;
 $sub="";
 $video_id="";
 $sub_image=false;
 $sub_changed=false;
 $sub_youtube=false;

 if ( strpos($content, 'img src') !== false ) {
 $re = "/<img.*?src='([^\"]*)'.*\/>/i";
 preg_match_all($re, $content, $matches);
 $sub = $matches[1][0];
 echo "Image URL: " . $sub . "\n\r";
 $sub_image=true;
 if ( substr( $sub,0,11 ) === "/wp-content" ) {
 $sub = get_site_url() . $sub;
 echo "Real URL: " . $sub . "\n\r";
 $sub_changed=true;
 }
 }

 if ( $sub == "" ) {
 if (strpos($content, 'youtube') !== false) {
 $re = "/\[[{embed\}]]([^\"]*)\[\/embed\]/i";
 preg_match_all($re, $content, $matches);
 $sub = $matches[1][0];
 echo "Youtube Detected. URL: " . $sub . "\n\r";
 $re = "/watch\?v=([^\"]*)\[\/embed\]/i";
 preg_match_all($re, $content, $matches);
 $video_id = $matches[1][0];
 $sub = "http://img.youtube.com/vi/". $video_id . "/hqdefault.jpg";
 echo "Youtube Thumbnail: " . $sub . "\n\r";
 $sub_youtube = true;
 }
 }

 echo "Content: " . $content . "\n\r";

 if ( ($sub_image === true) || ($sub_youtube === true) ) {
echo "Image or Youtube detected!\n\r";
 if ($sub_image === true) {
echo "Image!!\n\r";
 $media = media_sideload_image($sub, $my_query->post->ID, $my_query->post->post_title);
 } elseif ($sub_youtube === true) {
echo "Youtube!!\n\r";
 // Download file to temp location
 $tmp = download_url( $sub );
 // Set variables for storage
 // fix file filename for query strings
 preg_match('/[^\?]+\.(jpg|JPG|jpe|JPE|jpeg|JPEG|gif|GIF|png|PNG)/', $sub, $matches);
// $file_array['name'] = basename($matches[0]);
 $file_array['name'] = $video_id . ".jpg";
 $file_array['tmp_name'] = $tmp;
 // If error storing temporarily, unlink
 if ( is_wp_error( $tmp ) ) {
 @unlink($file_array['tmp_name']);
 $file_array['tmp_name'] = '';
 }
 // do the validation and storage stuff
 $media = media_handle_sideload( $file_array, $my_query->post->ID, "YouTube: " . $my_query->post->post_title );
 // If error storing permanently, unlink
 if ( is_wp_error($media) ) {@unlink($file_array['tmp_name']);}
 }

 if(!empty($media) && !is_wp_error($media)){
 echo "File Downloaded!\n\r";
 $args = array(
 'post_type' => 'attachment',
 'posts_per_page' => 1,
 'post_status' => 'any',
 'post_parent' => $my_query->post->ID,
 );

 $attachments = new WP_Query($args);
 while( $attachments->have_posts() ) {
 echo "Attachment ID: " . $attachments->post->ID . "\n\r";
 set_post_thumbnail($my_query->post->ID, $attachments->post->ID);
 if ($sub_image) {
 $atturl = wp_get_attachment_url($attachments->post->ID);
 $atturl = preg_replace("(^https?:)", "", $atturl );
 echo "Attachment URL: " . $atturl . "\n\r";
 if ($sub_changed) {
 $sub = str_replace(get_site_url(), "", $sub);
 }
 $newcontent = str_replace($sub,$atturl,$content);

 if ($newcontent != $content){
 $update_post = array(
 'ID' => $my_query->post->ID,
 'post_content' => $newcontent,
 );
 wp_update_post($update_post);
 }
 }
 $updated++;
 break;
 }
 echo "\n\r";
 }
 }
 $counter++;
 }
 echo "With Thumbs: " . $posts_with_thumbs . " and " . $posts_without_thumbs . " Without.\n\r";
 echo "Updated: " . $updated . " of " . $counter . "\n\r";
?>
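
To run it, save it as a file somewhere inside your WordPress directory tree (it walks up from its own location looking for wp-config.php) and call it with the CLI PHP binary; the path and filename here are just examples:

cd /var/www/example.com
php set-featured-images.php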

BE WARNED: this is designed to make changes to your wordpress posts. The usual advice about taking backups is a MUST, both of the database and your wordpress www folder.

  • The “Limit” is set low, you’ll need to adjust this as needed.
  • Within the script there is
[[{embed\}]]

You will need to change this to

[ embed\ ]

WITHOUT the spaces; wordpress decided I was trying to embed something, so I can’t paste it here without a slight change.

  • I’ve tried to keep it as generic as possible, so there shouldn’t be any mention of my site in forced links or searches. When it searches for an image it also looks to see if it’s local to the site (using /wp-content instead of a full URL); it should get around this using get_site_url(), but you could force your domain or a different path here if needed.
  • It also looks for any youtube content in the post and tries to pull the relevant image. I just wish it would pull one with a play button on it (I don’t want to use CSS to get around this).
  • It handles the youtube download differently (this was made over a few days, fixing problems as they came up), but I think it’s needed to give each image a unique filename.
  • You will have to run it several times, probably increasing the limit each time. On the 2nd run it again checks all the previous posts that it couldn’t get an image for, so you should end up with ‘0 updated of x’, where x is probably all your posts without an image. I’ll be updating it to use a (random of a few) image relevant to the site so everything has a featured image.