Setting Featured image on WordPress Posts in BULK

I’m in the middle of changing another site to a new theme. The problem being the front page uses the feature image from each post to build up the display. None of the posts have a feature image set. It uses a facebooktowordpress plugin (heavily customised the original no longer pull the images, and really couldn’t handle pulling what was needed).

With this plugin each post on facebook is posted to the blog, the first image (this need changing to all images) on the post is downloaded to the wordpress server and then the source of the post uses the local image. But at no point did I ever anticipate needing featured images to be set.

So here’s the code I’ve put together to:

  • Pull a list of posts without a thumbnail/featured image set.
  • Pull the contents of each post in the list and look for img src
  • Download the image
  • Add the image to the media library against the current post
  • Pull the ID of the attachment and set it as the featured image

This is meant again to run from the command line NOT as a plugin.

<?php
 $counter=0;
 $limit = 20;
 $updated=0;

 if( php_sapi_name() !== 'cli' ) {
 die("Meant to be run from command line");
 }

 function find_wordpress_base_path() {
 $dir = dirname(__FILE__);
 do {
 //it is possible to check for other files here
 if( file_exists($dir."/wp-config.php") ) {
 return $dir;
 }
 } while( $dir = realpath("$dir/..") );
 return null;
 }

 define( 'BASE_PATH', find_wordpress_base_path()."/" );
 define('WP_USE_THEMES', false);
 global $wp, $wp_query, $wp_the_query, $wp_rewrite, $wp_did_header;
 require(BASE_PATH . 'wp-load.php');
 echo "Site URL: " . get_site_url() . "\n\r";
 echo "Base: " . find_wordpress_base_path()."/\n\r";
 echo "Upload DIR: " . wp_upload_dir()['path'] . "\n\r";
 echo "Posts: " . wp_count_posts()->publish."\n\r";

 $query = array (
 'posts_per_page' => -1,
 'post_type' => 'post',
 'meta_key' => '_thumbnail_id',
 'meta_compare' => 'EXISTS',
 );
 $my_query = new WP_Query($query);
 $posts_with_thumbs = $my_query->post_count;
 echo "Posts with thumbnail_id: " . $posts_with_thumbs . "\n\r";

 $query = array (
 'posts_per_page' => -1,
 'post_type' => 'post',
 'meta_key' => '_thumbnail_id',
 'meta_compare' => 'NOT EXISTS',
 );
 $my_query = new WP_Query($query);
 $posts_without_thumbs = $my_query->post_count;
 echo "Posts without thumbnail_id: " . $posts_without_thumbs . "\n\r";

 $counter = 0;
 while(( $my_query->have_posts() ) and ($counter < $limit)) {
 $my_query->the_post();
 echo "\n\r\n\r";
 echo "Post ID: " . $my_query->post->ID . "\n\r";
 echo "Post Title: " . $my_query->post->post_title . "\n\r";
 $content = $my_query->post->post_content;
 $sub="";
 $video_id="";
 $sub_image=flase;
 $sub_changed=false;
 $sub_youtube=false;

 if ( strpos($content, 'img src') !== false ) {
 $re = "/<img.*?src='([^\"]*)'.*\/>/i";
 preg_match_all($re, $content, $matches);
 $sub = $matches[1][0];
 echo "Image URL: " . $sub . "\n\r";
 $sub_image=true;
 if ( substr( $sub,0,11 ) === "/wp-content" ) {
 $sub = get_site_url() . $sub;
 echo "Real URL: " . $sub . "\n\r";
 $sub_changed=true;
 }
 }

 if ( $sub == "" ) {
 if (strpos($content, 'youtube') !== false) {
 $re = "/\[[{embed\}]]([^\"]*)\[\/embed\]/i";
 preg_match_all($re, $content, $matches);
 $sub = $matches[1][0];
 echo "Youtube Detected. URL: " . $sub . "\n\r";
 $re = "/watch\?v=([^\"]*)\[\/embed\]/i";
 preg_match_all($re, $content, $matches);
 $video_id = $matches[1][0];
 $sub = "http://img.youtube.com/vi/". $video_id . "/hqdefault.jpg";
 echo "Youtube Thumbnail: " . $sub . "\n\r";
 $sub_youtube = true;
 }
 }

 echo "Content: " . $content . "\n\r";

 if ( ($sub_image === true) || ($sub_youtube === true) ) {
echo "Image or Youtube detected!\n\r";
 if ($sub_image === true) {
echo "Image!!\n\r";
 $media = media_sideload_image($sub, $my_query->post->ID, $my_query->post->post_title);
 } elseif ($sub_youtube === true) {
echo "Youtube!!\n\r";
 // Download file to temp location
 $tmp = download_url( $sub );
 // Set variables for storage
 // fix file filename for query strings
 preg_match('/[^\?]+\.(jpg|JPG|jpe|JPE|jpeg|JPEG|gif|GIF|png|PNG)/', $sub, $matches);
// $file_array['name'] = basename($matches[0]);
 $file_array['name'] = $video_id . ".jpg";
 $file_array['tmp_name'] = $tmp;
 // If error storing temporarily, unlink
 if ( is_wp_error( $tmp ) ) {
 @unlink($file_array['tmp_name']);
 $file_array['tmp_name'] = '';
 }
 // do the validation and storage stuff
 $media = media_handle_sideload( $file_array, $my_query->post->ID, "YouTube: " . $my_query->post->post_title );
 // If error storing permanently, unlink
 if ( is_wp_error($media) ) {@unlink($file_array['tmp_name']);}
 }

 if(!empty($media) && !is_wp_error($media)){
 echo "File Downloaded!\n\r";
 $args = array(
 'post_type' => 'attachment',
 'posts_per_page' => 1,
 'post_status' => 'any',
 'post_parent' => $my_query->post->ID,
 );

 $attachments = new WP_Query($args);
 while( $attachments->have_posts() ) {
 echo "Attachment ID: " . $attachments->post->ID . "\n\r";
 set_post_thumbnail($my_query->post->ID, $attachments->post->ID);
 if ($sub_image) {
 $atturl = wp_get_attachment_url($attachments->post->ID);
 $atturl = preg_replace("(^https?:)", "", $atturl );
 echo "Attachment URL: " . $atturl . "\n\r";
 if ($sub_changed) {
 $sub = str_replace(get_site_url(), "", $sub);
 }
 $newcontent = str_replace($sub,$atturl,$content);

 if ($newcontent != $content){
 $update_post = array(
 'ID' => $my_query->post->ID,
 'post_content' => $newcontent,
 );
 wp_update_post($update_post);
 }
 $updated++;
 break;
 }
 }
 echo "\n\r";
 }
 }
 $counter++;
 }
 echo "With Thumbs: " . $posts_with_thumbs . " and " . $posts_without_thumbs . " Without.\n\r";
 echo "Updated: " . $updated . " of " . $counter . "\n\r";
?>

BE WARNED this is designed to make changes to your wordpress posts. The usual about taking backups is a MUST. both of the database and your wordpress www folder.

  • The “Limit” is set low, you’ll need to adjust this as needed.
  • Within the script there is
[[{embed\}]]

You will need to change this to

[ embed\ ]

WITHOUT the spaces, wordpress decided I was trying to embed something so I can’t paste it without a slight change.

  • I’ve tried to keep it as generic as possible, so there shouldn’t be any mention of my site in forced links or searches. When it searches for an image it also looks to see if it’s local to the site using /wp-content instead of a full url, it should get around this problem using get_site_url() but you could force your domain or a different path if needed here.
  • It also looks for any youtube content in the post and tries to pull the relevant image. I just wish it would pull one with a play button on it (I dont want to use CSS to get around this)
  • It handles the youtube download differently (this was made over a few days fixing problems as they come up) but I think it’s needed to give each image a unique filename.
  • You will have to run several times probably increasing the limit each time, on 2nd run it again checks all the previous ones that it couldn’t get an image for, so you should end up with 0 updated of x where x is probably all your posts without an image. I’ll be updating it to use a/(random of a few) image relevant to the site so everything has a featured image.

WordPress Posts without Featured Thumbnail Count

I have another script I’ve been building up to download the first  embedded image in a post, add it to the media library, attach it to the post and set it as featured image.

Here’s some quick code to query how many posts do and do not have a featured image

<?php
 if( php_sapi_name() !== 'cli' ) {
 die("Meant to be run from command line");
 }

 function find_wordpress_base_path() {
 $dir = dirname(__FILE__);
 do {
 //it is possible to check for other files here
 if( file_exists($dir."/wp-config.php") ) {
 return $dir;
 }
 } while( $dir = realpath("$dir/..") );
 return null;
 }

 define( 'BASE_PATH', find_wordpress_base_path()."/" );
 define('WP_USE_THEMES', false);
 global $wp, $wp_query, $wp_the_query, $wp_rewrite, $wp_did_header;
 require(BASE_PATH . 'wp-load.php');
 echo "Base: " . find_wordpress_base_path()."/\n\r";
 echo "Upload DIR: " . wp_upload_dir()['path'] . "\n\r";
 echo "Posts: " . wp_count_posts()->publish."\n\r";

 $query = array (
 'posts_per_page' => -1,
 'post_type' => 'post',
 'meta_key' => '_thumbnail_id',
 'meta_compare' => 'EXISTS',
 );
 $my_query = new WP_Query($query);
 $posts_with_thumb = $my_query->post_count;
 echo "Posts with thumbnail_id: " . $posts_with_thumb . "\n\r";

 $query = array (
 'posts_per_page' => -1,
 'post_type' => 'post',
 'meta_key' => '_thumbnail_id',
 'meta_compare' => 'NOT EXISTS',
 );
 $my_query = new WP_Query($query);
 $posts_without_thumb = $my_query->post_count;
 echo "Posts without thumbnail_id: " . $posts_without_thumb . "\n\r";
?>

It’s made to run from the terminal NOT as a wordpress plugin, just place it in the root wordpress folder and run with php <filename>

It will output something like

Base: /var/www/{folder}/
Upload DIR: /var/www/{folder}/wp-content/uploads
Posts: 8547
Posts with thumbnail_id: 6824
Posts without thumbnail_id: 1723

Took a few seconds to return, it’s probably not the best way of doing it, but worked for what I wanted. I’d like to give credit for each bit of code but really no idea what I got the different bits from.

GlusterFS woes

If your looking for the gluster error ‘brick2.mount_dir not present’ 
jump to the end

Time for another post 🙂

I’ve been using DigitalOcean for some time now, and I’m still tweaking my setup. Once thing I really hope they sort soon is proper private lan between your own droplets, for now we just have to use a vpn between them.

Being responsible for a new website can give a lot of headaches, especially when you have to try to guess just how popular it will be. So about a year ago I setup a new droplet to host the new site, testing it was going well and I increased the droplet before we launched to handle a spike. Sadly I underestimated just how busy it would get, based on the numbers I was given I think I was about right but unfortunately those numbers were way off.

But each failure it just another learning curve 🙂 so as quickly as possible that was fixed, then the site got to normal volume so we scaled it back down (yes it’s a whole cost exercise especially when your paying for it). Then we had the lead up to Christmas, in an attempt to not repeat the problems at launch, I changed the whole configuration so that I could (if needed) take a server down while staying partly operational. This kind of worked and was needed when some brightspark promoted the site a day early and we hadn’t scaled back up!

Come the new year I decided it was time to seriously sort the infrastructure for the site. It now has an online store and it’s important it keep running, it’s not just a blog anymore. So I put in place the following setup (working around various obstacles).

DNS:

  • All websites name servers are pointing to cloudflare and they handle the first web connection. It works really well on their free tier, and changes (adding new servers) are pretty quick to take effect.

DigitalOcean Droplets:

  • 1x Server running as a load balancer.
  • 2x Servers running as webservers.
  • 1x Server running as database server.
  • 1x Server running as email (not quite running).
  • At the same time as making this setup I decided to ditch apache and move to nginx, so loadbalancer and webservers are running that.

Software:

  • 4x Nginx (loadbalancer and webservers, and installed on database server for stats).
  • 2x Syncthing (webservers) to keep the www folders in sync.
  • 1x MySQL Server.
  • 4x OpenVPN (connections between loadbalancer and webservers, webservers and database).
  • 1x Redis Server (for session data, I tried nginx load balancing options but it still screwed up if I had to take one of the web servers out for maintenance, so installed Redis on the Database server.

As this progressed I dropped the VPN between loadbalancer and webservers and just use HTTPS/SSL instead. Syncthing already has it’s own SSL built-in so I could leave that over the semi-private LAN. but I really would like to change MySQL to be encrypted and drop the vpn from that too, but find info on doing this for wordpress seems to be non-existent at the moment.

Roll forward a few months, this has been working but still has areas to resolve. Such as syncthing: yes it keeps the folders in sync and is actually really good that I can also store them on another system easily. but it doesn’t listen to the OS for changes to files. Instead it polls every x seconds. Although there’s nothing much changing, updating plugins became a problem if you click the update button it downloads but then nginx sends your next request to another server and now the plugin.zip isn’t there so wordpress throws an error.

My whole reason for running syncthing was I wanted the files to be available on each server independently. So if Server A goes down it doesn’t matter Server B has all the files locally anyway. NFS would still give me a single point of failure. On looking into resolving this though I remembers GlusterFS. I’d played with it a long time ago, but dropped it as a solution (can’t remember what I was doing or why it wasn’t working). Now it’s time to try it again. downside I’m back to needing VPN’s and OpenVPN isn’t the easiest to quickly add a new server.

So I’ve done the following:

  • Added a new server just for the files (I don’t like gluster being in a 2 replica incase there’s a problem, there should be a majority who thinks they are holding the correct file).
  • Swapped out OpenVPN for Tinc, I have to say one of the best decisions. yes there are downsides, it creates a mesh (only doable with OpenVPN by running quagga for manually forcing routes) but I have no idea which Server is actually connected to which Servers. There’s no VPN status and I can’t see how much traffic has gone between 2 particular servers (iptables helps but it’s not 100%)
  • Added another new server for nagios and central logging.

There were a load of changes within a few weeks of each other, but I now have a setup I’m confident I can scale more quickly than ever before. Yes it has single points (load balancer, mysql) but I know if the load balancer has a problem it’s pretty static so can be wiped and redeployed quickly, as well as it will take a few minutes to open the webservers to the world and let cloudflare hit them directly. So MySQL is the real problem and I’ll be addressing that one soon enough.

 

So now onto today’s problem 🙂

I’ve had gluster running a few weeks, and I have our testing website (for theme changes etc) setup on our webservers behind the loadbalancer. The last few days I’ve need to do more extensive testing than just changing bits in a theme, so I’ve decided to split the tester site onto it’s own droplet (still behind the loadbalancer and with VPN to the databases). I thought I may as well make use of Gluster here too (yes it would be in a 2 replica setup itself and the fileserver. I don’t like that idea). So I brought up a new server and configured it: new users, firewall rules, tinc, nginx, php, etc.

I added gluster and copied the /etc/hosts entries over from the other servers. All looked good. I gluster peer probe ServerX and it worked, gluster peer status and I could see it fine. but on trying to add a new volume:

gluster volume create xxx-yyy-zzz replica 2 transport tcp FILESERVER:/GLUSTER/xxx.yyy-zzz TESTSERVER:/GLUSTER/xxx.yyy-zzz force

I was getting the error:

volume create: xxx-yyy-zzz: failed: Commit failed on localhost. Please check the log file for more details.

Checking the logs on both servers would show (maybe slight variation):

[2015-07-28 16:00:41.612907] E [glusterd-hooks.c:328:glusterd_hooks_run_hooks] 0-management: Failed to open dir /var/lib/glusterd/hooks/1/create/pre, due to No such file or directory
[2015-07-28 16:00:41.614499] E [glusterd-volume-ops.c:1811:glusterd_op_create_volume] 0-management: brick2.mount_dir not present
[2015-07-28 16:00:41.614587] E [glusterd-syncop.c:1288:gd_commit_op_phase] 0-management: Commit of operation 'Volume Create' failed on localhost

I tried a series of things to fix it:

I thought maybe the /GLUSTER/xxx-yyy-zzz needed to be created (I already made /GLUSTER) – Nope

  • I detached the peer and reattached – No.
  • I reboot the file server and test server – No.
  • I detached, reboot, reattached – No.
  • I tried creating the volume with just the test server and no replica – No.
  • I tried creating the volume on just the fileserver with no replica – Yes.

So the problem is point to the new system, but it’s a brand new system. They’re peers and connected.

  • I tried uninstalling and reinstalling gluster – No.
  • I tried uninstalling, purging and reinstalling – No.
  • I tried uninstalling, purging, manually deleting the /var/lib/gluster (probably a mistake that I didn’t detach first :() and reinstalling – No.
  • I have no idea why this wont WORK!!!!!

Let’s go further back, check the VPN, ping the servers.

  • Ping fileserver from testserver – Yes.
  • Ping testserver from fileserver – Yes/Hang on that’s the wrong IP!! Yes I’d copied an entry from webserverB into /etc/hosts, update the name but missed the IP address. Idiot! correct that. Ping – OK.
  • Try gluster again – Yes.

So if you’re having problems and seeing brickX.mount_dir not present make sure your DNS between servers is correct.

I don’t really know how the peer probe worked, but I think I must have done that from a servers who’s hosts was correct