Quest for The One Blog, Part 8

My current thinking is that I should be looking into two distinct blogging platforms. One for the “live” blog, the most recent stuff, where people can theoretically interact and leave comments on a daily basis, and consume an RSS feed.

The second platform would be for the “archives”: mainly static pages of older posts for reference, without much interactivity.

Regardless of how I proceed with the “live” blog, the idea of using a “static site generator” for the older archives appeals to me, or perhaps one of the Markdown-based platforms I’ve previously mentioned. I keep reading that Hugo is excellent for this. But before I can even think of going down that or any other road, I need to know how much work is involved in exporting some sixteen years’ worth of blog posts to Markdown format.

I briefly mentioned in Part 4 a WordPress plugin that is intended to export WordPress posts to Hugo, a static site generator. When I gave the instructions a cursory glance, I had the impression that it would be difficult to get working, so I put off trying it. I finally dove in and gave it a shot yesterday. It’s actually easier than I thought it would be. But it’s still far from a “one-click” solution.

The first step is installing the WordPress To Hugo Exporter plugin. That link takes you to the GitHub home of the plugin. You may notice there is no large flashing button that says “install this plugin by clicking here!” You may also have noticed, as I did, that if you go to the WordPress Plugin page and try to add the plugin from the WordPress ecosystem, it’s not there. That means we have to roll up our sleeves and install this plugin the old-fashioned way, by uploading files directly to our web host. It goes something like this:

  • Download the plugin files from GitHub by clicking on the green “Clone or Download” button and selecting “Download ZIP.”
  • Unzip the plugin zip file to a local directory.
  • Use your FTP software of choice (I use FileZilla) and connect to your web host.
  • Upload the plugin files to the wp-content/plugins folder of your WordPress site. (Create a new directory there for the plugin files, e.g. hugo-export.)
  • Open the WordPress Dashboard Plugins page, and activate the new plugin.

Once the plugin is activated, you’ll have a new menu under Tools for exporting posts. All you have to do at this point is select that menu, and a zip file containing all of your posts (and associated images) will be created for you.

Turning To The Console

At least, it will if your site isn’t very big. Endgame Viable, it turns out, has over 1000 posts on it from 2012 through today. That’s too big, and when I tried to use the Export menu from the Dashboard, the web page timed out long before the plugin finished processing.

If the web page times out, that means rolling up our sleeves even further and digging into the rest of the instructions on that GitHub page up above. We’re going to have to use SSH and connect to our web host and type some Linux commands to finish this thing.

I won’t go into the exact steps here, because if you’ve reached this point, you can probably follow the “Command-Line Usage” instructions on that GitHub page just as well as I can.

I did encounter one issue that needs to be addressed, though: my web host only had PHP 5.4 available on the command line, which does not meet the requirements for using the export tool. I was quite sure that newer versions of PHP were installed on the host server, because you can choose to run PHP 7.1 from cPanel, which I’m sure I had already done for my web site.

After some Googling, I found that newer versions of PHP were installed in /opt/php**/ directories on the server, e.g. /opt/php56/ or /opt/php71/. I created an alias in .bashrc according to instructions I found on this StackOverflow page. Then I was able to run the newer version of PHP on the command line and run the export tool. (I probably didn’t need to make an alias, but it saved me from typing /opt/php71/bin/php every time instead of just php.)
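If you'd rather not poke around the server by hand, the search for a newer PHP binary could be scripted. This is only a rough sketch in Python, assuming the /opt/phpNN/bin/php layout I found on my host (yours may well differ). The function name newest_php is made up for illustration.

```python
import glob
import re


def newest_php(paths):
    """Return the path whose phpNN directory has the highest version, or None."""
    versioned = []
    for path in paths:
        # Pull the digits out of the "phpNN" directory name.
        match = re.search(r"/php(\d+)/", path)
        if match:
            digits = match.group(1)
            # Assumption: first digit is the major version, the rest is the
            # minor version, so "56" -> (5, 6) and "71" -> (7, 1).
            versioned.append(((int(digits[0]), int(digits[1:] or 0)), path))
    return max(versioned)[1] if versioned else None


if __name__ == "__main__":
    # Assumed layout: /opt/php56/bin/php, /opt/php71/bin/php, and so on.
    print(newest_php(glob.glob("/opt/php*/bin/php")))
```

Whatever path it prints is the one to point the alias at, or to type out in full when running the export tool.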

It took a long time to export all of my blog posts, but eventually the tool created a zip file that I was able to download. It came out to a healthy 1.5 GB. Almost all of that space was taken up by images, because it makes a copy of every image referenced by every post on the blog. Which, it turns out, is a lot of files, particularly since WordPress takes every image you upload and breaks it into multiple copies of varying sizes.
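To get a feel for how much of that bulk is resized copies, the filenames can be sorted into originals and variants. Here is a minimal sketch, assuming the resized copies follow the -WIDTHxHEIGHT naming that WordPress uses (for example internet-radio-1024x683.jpg next to internet-radio.jpg). The helper names are hypothetical.

```python
import re

# Assumption: resized copies end in -WIDTHxHEIGHT just before the extension.
RESIZED = re.compile(r"-\d+x\d+(\.[A-Za-z0-9]+)$")


def is_resized_copy(filename):
    """True if the filename looks like a WordPress-generated resized copy."""
    return RESIZED.search(filename) is not None


def original_name(filename):
    """Strip the -WIDTHxHEIGHT suffix, leaving the original upload's name."""
    return RESIZED.sub(r"\1", filename)
```

One caveat: an original upload that happens to be named something like banner-800x600.jpg would be mistaken for a resized copy, so it's worth checking that original_name() actually exists on disk before deleting anything.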

Inside the zip file I found the Holy Grail I had been looking for: A very long list of individual Markdown files, each one containing a blog post! 1,059 of them, to be precise. From July 4, 2012 through today. Well, through Saturday. I don’t know when I’ll be publishing this post. It was the exact thing I needed to get started on building an archive site.

But Not Quite

I loaded some of the Markdown files and looked them over. Yep, that’s my writing all right. No HTML angle brackets in sight. Except, oh, wait … well, it turns out there are a few imperfections in the conversion.

The first thing I don’t like is what it does to all the apostrophes. Every apostrophe turned into “&#8217;” which will be fun to try to escape so you can actually see it. That’s ampersand-pound (or hashtag for the kids)-8217-semicolon. It breaks up the readability of the text just a tiny little bit.

A similar treatment was given to quotation marks, dashes, and a variety of other symbols. As a programmer, I know exactly why it happened and why it’s best for everyone to store web text like that. But as a writer I don’t want to see it, and it’s one reason Markdown was invented in the first place. I’ll need to figure out a way to fix as much of that as I can.
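One possible way to clean this up is Python's standard html module, which can turn numeric and named entities back into the characters they stand for. A minimal sketch (decode_entities is just a name I picked):

```python
import html


def decode_entities(markdown_text):
    """Convert HTML entities such as &#8217; back into real characters."""
    # html.unescape handles numeric entities (&#8217;) and named ones (&amp;).
    return html.unescape(markdown_text)
```

Running each exported .md file through this should restore the curly apostrophes, quotes, and dashes. The catch is that it decodes everything, including an &lt; that was deliberately escaped inside a code sample, so posts that talk about HTML would need a second look afterward.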

The second thing I don’t like is what happens to images. This is probably a situation where there’s little to be done about it. But what in my mind should have looked something like this:

[my-image.jpg]

Actually looks like this:

<figure class="wp-block-image"><img src="https://d2jkbzrop6wflm.cloudfront.net/img/2019/09/internet-radio.jpg" alt="" class="wp-image-9867" srcset="https://d2jkbzrop6wflm.cloudfront.net/img/2019/09/internet-radio-1024x683.jpg 1024w, https://d2jkbzrop6wflm.cloudfront.net/img/2019/09/internet-radio-300x200.jpg 300w, https://d2jkbzrop6wflm.cloudfront.net/img/2019/09/internet-radio-768x512.jpg 768w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure> 

That doesn’t really “fit in” with what’s supposed to be a plain text representation. But I don’t think there’s much I can do about that. It’s hard to make a plain text version of an image.
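That said, if the goal were only to make these blocks easier on the eyes, one option might be to collapse the simple cases into Markdown's own image syntax. This is a sketch under some assumptions: it only handles a figure that wraps a single img tag, and it throws away the class, srcset, and sizes attributes, which may or may not be acceptable. The function name is hypothetical.

```python
import re

# Matches a <figure> that wraps exactly one <img>, capturing the src value.
FIGURE = re.compile(
    r'<figure[^>]*>\s*<img\s+[^>]*?src="([^"]+)"[^>]*?>\s*</figure>',
    re.IGNORECASE,
)
ALT = re.compile(r'\balt="([^"]*)"', re.IGNORECASE)


def figure_to_markdown(text):
    """Replace simple <figure><img></figure> blocks with ![alt](src)."""
    def replace(match):
        alt = ALT.search(match.group(0))
        return "![{}]({})".format(alt.group(1) if alt else "", match.group(1))
    return FIGURE.sub(replace, text)
```

Figures with captions or anything else inside them won't match the pattern and are left exactly as they were, which seems like the safer failure mode.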

Beyond the markup, I’m not fond of the absolute link addresses, either. I would have preferred relative addresses. But perhaps that’s okay.
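If the absolute addresses do turn out to be a problem, stripping the scheme and host is not hard. A small sketch using Python's urllib.parse, where the set of hosts to treat as "mine" is something you would supply yourself, and to_relative is a made-up name:

```python
from urllib.parse import urlsplit


def to_relative(url, hosts):
    """Strip scheme and host from url if its host is one of ours."""
    parts = urlsplit(url)
    if parts.netloc not in hosts:
        return url  # leave links to other sites alone
    relative = parts.path or "/"
    if parts.query:
        relative += "?" + parts.query
    if parts.fragment:
        relative += "#" + parts.fragment
    return relative
```

Note that this produces root-relative addresses (starting with a slash), so the images would still need to live at the same paths on the new site.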

On another matter, I like that draft posts were included in the migration, because I don’t want to lose them. But they are mixed in with the published posts. The only way to distinguish them is to look for the “draft: true” tag at the top of the file. I’d prefer they were moved to a separate directory.
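Separating the drafts after the fact looks scriptable. The sketch below assumes each exported file starts with a front matter block delimited by lines of three dashes, and that drafts carry a literal “draft: true” line inside it. The directory arguments and function names are placeholders, not anything the plugin provides.

```python
import shutil
from pathlib import Path


def is_draft(markdown_text):
    """True if the front matter (between the first two '---' lines) has draft: true."""
    lines = markdown_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return False
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of front matter; ignore anything in the post body
        if line.strip().lower() == "draft: true":
            return True
    return False


def move_drafts(posts_dir, drafts_dir):
    """Move every draft .md file from posts_dir into drafts_dir; return their names."""
    drafts = Path(drafts_dir)
    drafts.mkdir(parents=True, exist_ok=True)
    moved = []
    # sorted() builds the full list first, so moving files is safe mid-loop.
    for path in sorted(Path(posts_dir).glob("*.md")):
        if is_draft(path.read_text(encoding="utf-8")):
            shutil.move(str(path), str(drafts / path.name))
            moved.append(path.name)
    return moved
```

Checking only the front matter matters here, since a published post that merely mentions “draft: true” in its body (like this one) shouldn't be swept up with the drafts.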

While the plugin is a great start, I’m going to need to do some further customization to make it work the way I want it to.

This post is part of The Quest for The One Blog. Next up: Part 9 - Hugo.

This page is a static archival copy of what was originally a WordPress post. It was converted from HTML to Markdown format before being built by Hugo. There may be formatting problems that I haven't addressed yet. There may be problems with missing or mangled images that I haven't fixed yet. There may have been comments on the original post, which I have archived, but I haven't quite worked out how to show them on the new site.
