Archive for February, 2009

How to Convert Your Blog From SubText to Wordpress

No matter what anyone tells you, there’s more to changing blog engines than just clicking a few buttons and importing data. It tends to be a fair amount more complicated than that. In my last post, I highlighted some of the reasons, both rational and not-so-rational, for my decision to change from SubText to Wordpress.

It’s only been about a week or so, but I’ve learned a huge amount of “stuff”, for lack of a better term, and thus far I am rather pleased with the transition to Wordpress. For the benefit of others, I thought I would document my move from SubText 1.9.3 to Wordpress 2.7 so that anyone who decides to try it can retrace my steps and hopefully experience a little less pain than I did.

When undertaking this sort of project, there’s really only one viable place to start looking for information on how to do this. Google. I found a number of links that I thought might be helpful and used them as a general starting point for deciding what to do, and how to do it.

Subtext to Wordpress:
http://www.ageektrapped.com/blog/subtext-to-wordpress-converting-blog-engines/
http://www.copyandwaste.com/2008/09/15/hello-goodbye-subtext-to-wordpress/
http://blog.digitaltinder.net/2008/12/exporting-blogml-from-subtext-21-and-importing-blogml-into-wordpress-27/

Blogger to Wordpress:
http://www.aaronlerch.com/blog/2007/08/23/breaking-up-moving-blog-engines/

DasBlog to Wordpress:
http://www.kavinda.net/2008/10/23/migrating-from-dasblog-to-wordpress.html

Wordpress to SubText:
http://betterthaneveryone.com/archive/2007/11/04/wordpress-to-subtext-done.aspx

Obviously the first set of links was much more helpful than the others, but for reference purposes, seeing how others dealt with converting to Wordpress, or their reasons for moving away from it, was somewhat enlightening. After running through the instructions in each of the first set of links, I came to one conclusion: none of them were going to work for me directly, because of the myriad problems I was running into that they did not address. I felt there was more hand waving than hand holding. I like to hold hands, so here’s how I converted from SubText 1.9.3 to Wordpress 2.7.1.

At a high level, the idea is to export from SubText to some intermediate format, and then import that into Wordpress. In this case, the intermediate format which is probably the most straightforward to use is a BlogML XML file. The rationale is that you want to be able to keep all of your content and save time doing the conversion. The last blog engine transition I did was from CityDesk to SubText and it involved a lot of copy/pasting. Not fun.

BlogML is supposed to be a standard of some sort for moving your blog content from one platform to another. Unfortunately, the development is somewhat stagnant and not much of anything has gone on in quite some time. Their roadmap as of today indicates that version 3.0 is expected to be released in mid-2008. It’s early 2009 and version 2.5 is the only thing out there.

Another potential sticky point is that Wordpress does not ship with a BlogML import module. Fortunately, a fellow blogger named Aaron Lerch built one and there are several variations floating around which fix a few different bugs. I’ll be offering up my own version in order to fix a couple more.

So, to reiterate, the idea is to export from SubText to BlogML, then import the XML into Wordpress. Easier said than done.

Problem #1: Exporting to BlogML.
The BlogML export in SubText doesn’t appear to work. At least it didn’t at first, in the version of SubText I was using, which for the record was 1.9.3. Fortunately, I discovered almost by accident that the BlogML export feature fails if you instruct it to include embedded content, which is the default. I’m not sure exactly what that is meant to cover, but from a bit of research, I gather that embedded content includes things like Flash files, YouTube videos, or maybe even local images. In any case, including the embedded content caused the export to fail. I unchecked the box to include embedded content, and voilà: my BlogML XML file was ready to download.

Apologies to those of you using embedded content, but I really didn’t look too far into this. The cursory research on what embedded content was led me to believe that I didn’t have any on my blog and could probably safely ignore it. Your mileage may vary.
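Before moving on to the import, it can be worth confirming that the exported file is well-formed XML and actually contains your posts. Below is a minimal Python sketch of that check. It assumes the BlogML 2.x layout, where each entry is a `<post>` element and each comment is a `<comment>` element. It matches on local tag names, so it does not depend on the exact namespace URI. The function names are illustrative only.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Union


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def summarize_blogml(xml_data: Union[str, bytes]) -> Dict[str, int]:
    """Count <post> and <comment> elements in a BlogML export.

    ET.fromstring raises ET.ParseError if the export is not well-formed,
    which is a quick way to catch a truncated or broken file.
    """
    root = ET.fromstring(xml_data)
    counts = {"post": 0, "comment": 0}
    for element in root.iter():
        name = local_name(element.tag)
        if name in counts:
            counts[name] += 1
    return counts


def summarize_blogml_file(path: str) -> Dict[str, int]:
    # Read as bytes so the parser follows the file's own encoding declaration.
    with open(path, "rb") as handle:
        return summarize_blogml(handle.read())
```

Calling `summarize_blogml_file("blogml.xml")` (hypothetical filename) should report roughly the post count your old blog shows. A `ParseError` or a zero count suggests the export needs another look.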

Problem #2: Using Aaron Lerch’s BlogML Importer for Wordpress.
This seemed flaky at first, and it wasn’t at all clear why it wasn’t working. I’d get the file upload textbox as the instructions described and attempt to upload my file. Then the fields would disappear, and my browser would act as if nothing was wrong and it was done doing what it was supposed to do. I tried a few different browsers and got the same result with Firefox, IE 7, and Google Chrome.

It turns out that the BlogML import seems to use a fair amount of memory. My BlogML XML file was about 1.6MB. After digging through the apache error logs on my web server, I found that the web page was requesting about 32MB of memory to parse the XML file and the web server was denying that request, as it was limited to much less in terms of memory.

I really don’t have any idea why it requires so much memory to parse the BlogML file. As a quick ballpark, the required memory is about 20 times the size of your BlogML file: in my case, 1.6MB × 20 is about 32MB of RAM. If my BlogML file were 5MB, I would likely need 100MB of memory or more.

The quick fix to this issue was to add the following line of code to the blogml.php file:

ini_set("memory_limit", "64M"); // raise PHP's memory limit for this script only

You could always bump it up to 128M or higher, if needed. The alternative is to modify your php.ini file and alter the memory_limit for every PHP script the server runs. This blog import was only going to be done once, though, so I saw no need to allocate additional resources that weren’t really necessary. The machine has them to spare, but there’s no point in wasting them.
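The sizing rule of thumb above can be written as a tiny Python helper. The 20x ratio is the rough figure from this import. The 2x headroom factor is an assumption, not a measured value. The helper rounds the result up to a power of two so it can be pasted straight into the ini_set() call. For the 1.6MB file described here, it comes out to 64M, the same value used above.

```python
# Rough sizing for PHP's memory_limit when running the BlogML importer.
# PARSE_RATIO is the ~20x ballpark from this post (1.6MB file -> ~32MB);
# HEADROOM is an assumed safety margin, not a measured figure.
PARSE_RATIO = 20
HEADROOM = 2


def suggested_memory_limit(blogml_mb: float) -> str:
    """Return a memory_limit string such as "64M" for ini_set()."""
    needed_mb = blogml_mb * PARSE_RATIO * HEADROOM
    limit = 16  # start at a small power of two and double until it fits
    while limit < needed_mb:
        limit *= 2
    return f"{limit}M"
```

By the same arithmetic, a 5MB export would call for 256M.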

You can download the XPath.class.php and blogml.php files that I used from here.

Problem #3: File upload problems
Once the BlogML importer seemed to be working, I immediately ran into a permissions issue: the importer was unable to save my uploaded file to the web server. I poked around a lot, and the “fix” most often recommended was to change the permissions on the /wp-content/uploads directory to 777, which makes it writable by every user on the system. Forgive me for working in the security field, or even being remotely security-minded at all, but that’s the single most ridiculous suggestion I’ve ever heard.

If the suggestion had been made by only one person, I could possibly dismiss it as ignorance. But numerous people were describing this approach as not only common but the recommended fix. Sorry, folks. It’s not. I found that the most straightforward approach was to give the apache user ownership of the uploads directory. The problem went away immediately, and nothing had to be made world-writable.
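Here is the difference in concrete terms. Mode 777 sets the "others may write" bit, while handing the directory to the web server user lets you keep something like 755. Below is a small Python sketch of both ideas. It assumes the web server runs as a user named `apache` (Debian-style systems use `www-data` instead) and that the ownership change is run as root. The helper names and the path in the usage comment are hypothetical.

```python
import os
import shutil
import stat


def is_world_writable(mode: int) -> bool:
    """True if the 'others' write bit is set (it is part of the final 7 in 777)."""
    return bool(mode & stat.S_IWOTH)


def give_to_web_server(path: str, user: str = "apache") -> None:
    """Make the web server user the owner instead of opening the
    directory to everyone. Must be run as root."""
    shutil.chown(path, user=user)
    # Owner (the web server) can read, write, and enter the directory;
    # group and others can only read and enter it.
    os.chmod(path, 0o755)


# Usage (hypothetical path):
# give_to_web_server("/var/www/blog/wp-content/uploads")
```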

Problem #4: Link redirection
This one could have been a total nightmare, but it wasn’t nearly as bad as it could have been. If you’ve been running a blog for any length of time, the hope is that other people have linked to it. Even better, there’s a steady stream of traffic headed your way. To keep that traffic from drying up quickly, you’re going to need to set up URL redirection using a .htaccess file on your new web server, redirecting pages from /2008/12/25/its-christmas-time.aspx to something like /2008/12/25/its-christmas-time/.

That means you need to know exactly what every single internal link on your site is, and exactly where it should go. Once you know where all your links are, you add a RewriteRule to your .htaccess file for each of them. Each RewriteRule performs the redirection at the web server level, sending the browser a 301 status code to indicate that the page has permanently moved (a 302, by contrast, signals only a temporary move).

This should have been easier than it was, but I wasn’t using pretty URLs in SubText, so I had to suffer through this part of it. It didn’t take long before I came up with what I felt was an adequate solution. I poked around a lot using Google and Yahoo, looking for web tools that would crawl my site and find all of my page links for me, but I didn’t find anything terribly helpful. Finally, I gave up and decided to roll my own.

Using my trusty Perl skills, I wrote a website crawler, which I pointed at my original blog. After reading in the main page, it parsed out every link on the page. If the link was local to my domain, it retrieved the contents of that page and recursively continued to do so until it had followed every single link on my website that pointed back to itself. I ignored image references and relative URLs. I also ignored any link to an external website, as I have no control over those links anyway.

Given that there was a page in SubText containing a list of archived links, this solution worked really well. I was able to capture every single link on the site, and for each URL, I was able to obtain the title of the page. This made building my .htaccess file pretty hassle-free. It was still a little tedious, but for a few hundred links it only took a couple of hours to search the content of my new blog for the titles I had captured and match them up to the original URLs.
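The original crawler is Perl and isn't reproduced in this post, so the following is not that script. It is a minimal Python sketch of the same idea: a same-site crawler that records each page's title. It follows only absolute links that start with the site's base URL, skipping relative links, external sites, and links to image files. It uses an explicit work list rather than literal recursion, which avoids Python's recursion limit on a large archive. The fetch function is passed in so the logic can be exercised without a network. `crawl`, `fetch_url`, and `LinkAndTitleParser` are hypothetical names, not taken from subTextCrawler.pl.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, List
from urllib.request import urlopen

IMAGE_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".bmp")


class LinkAndTitleParser(HTMLParser):
    """Collects href values from <a> tags and the text of <title>."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def crawl(base_url: str, fetch: Callable[[str], str]) -> Dict[str, str]:
    """Map every same-site URL reachable from base_url to its page title.

    base_url should end with "/" so that prefix matching stays on the site.
    """
    titles: Dict[str, str] = {}
    pending = [base_url]
    while pending:
        url = pending.pop()
        if url in titles:
            continue  # already visited
        parser = LinkAndTitleParser()
        parser.feed(fetch(url))
        titles[url] = parser.title.strip()
        for href in parser.links:
            href = href.split("#", 1)[0]  # drop in-page anchors
            # Only absolute links back to this site; relative links,
            # external sites, and image files are ignored.
            if href.startswith(base_url) and not href.lower().endswith(IMAGE_SUFFIXES):
                pending.append(href)
    return titles


def fetch_url(url: str) -> str:
    """Download a page over HTTP for use with crawl()."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")
```

Against a live site, the call would look like `crawl("http://www.example.com/", fetch_url)`, with your own base URL substituted for the placeholder.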

Here is a link to the Perl code that I used for this. I’m releasing it under the GPL 3 license, so feel free to hack away at it. To use it, install the Perl libraries it depends on (if you don’t already have them) and set the primaryURL and TLD variables. Run subTextCrawler.pl, and it will spit out a bunch of half-written RewriteRule lines for your .htaccess file.

The assumption is that you have your Wordpress site up and running and have imported your BlogML file. I used a temporary domain pointer for this, so I was able to take the title printed on each RewriteRule line and search for the corresponding URL on my new Wordpress site.
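Each captured URL becomes the pattern side of a RewriteRule, and only the destination has to be filled in by hand. The following Python sketch shows one way to produce those half-written lines from (URL, title) pairs. The Perl script's actual output format isn't shown in this post, so the `TODO` placeholder, the comment layout, and the function name are assumptions. The title goes on its own comment line because Apache does not allow a comment on the same line as a directive.

```python
import re
from typing import Iterable, List, Tuple
from urllib.parse import urlsplit


def half_written_rules(pages: Iterable[Tuple[str, str]]) -> List[str]:
    """Turn (old URL, page title) pairs into .htaccess lines whose
    destination (TODO) still has to be filled in by hand."""
    lines: List[str] = []
    for url, title in pages:
        # In a root .htaccess the leading slash is not part of the matched path.
        path = urlsplit(url).path.lstrip("/")
        # re.escape turns "." into "\." so it only matches a literal dot.
        pattern = "^" + re.escape(path) + "$"
        lines.append(f"# {title}")
        lines.append(f"RewriteRule {pattern} TODO [R=301,L,NC]")
    return lines
```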

I could have gotten much fancier and searched my Wordpress site using the title from the SubText site and completely automated it, but I’ll leave that for someone else to do. I’m just trying to get you most of the way there.

subTextCrawler.pl
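For the fancier, fully automated approach, note that Wordpress builds a post's default slug from its title. For many posts, the new URL can therefore be guessed rather than searched for. The Python sketch below approximates that with a simplified slugify: it lowercases, drops apostrophes and punctuation, and joins words with hyphens. It then takes the date from SubText's archive/YYYY/MM/DD/ path. It assumes a /%year%/%monthnum%/%day%/%postname%/ permalink structure on the Wordpress side. It is only a rough stand-in for Wordpress's own sanitize_title(), which also handles accents, entities, and duplicate slugs, so every generated target should still be spot-checked. The title in the test is a hypothetical example.

```python
import re
import unicodedata
from typing import Optional

# SubText post URLs in this post look like archive/2005/08/21/1.aspx.
POST_PATH = re.compile(r"^archive/(\d{4})/(\d{2})/(\d{2})/[^/]+\.aspx$")


def slugify(title: str) -> str:
    """Simplified approximation of Wordpress's sanitize_title()."""
    ascii_title = (
        unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    )
    ascii_title = ascii_title.lower().replace("'", "")
    words = re.findall(r"[a-z0-9]+", ascii_title)
    return "-".join(words)


def full_rule(old_path: str, title: str) -> Optional[str]:
    """Build a complete RewriteRule for a SubText post URL.

    Returns None for URLs that are not individual posts (e.g. month archives).
    """
    match = POST_PATH.match(old_path)
    if match is None:
        return None
    year, month, day = match.groups()
    target = f"/{year}/{month}/{day}/{slugify(title)}/"
    return f"RewriteRule ^{re.escape(old_path)}$ {target} [R=301,L,NC]"
```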

Problem #5: Learning how to actually build a .htaccess file.
I was pretty stupid the first time I was working with my .htaccess file. It turns out that there are two things you need to keep in mind when using Wordpress. First, Wordpress expects to be able to modify this file, so making .htaccess owned by apache solved the first issue. Second, the .htaccess file is automatically filled in with a set of rules dictated by your Permalink preferences. Whenever you browse to the Permalink preferences page within Wordpress, this file is read, parsed, and then rewritten, all without clicking a save button.

It’s pretty irritating to put all of your RewriteRule lines in there, only to find they don’t work, and not realize it’s because the file is being overwritten whenever you browse to a specific admin page in Wordpress. The Wordpress section of your .htaccess file should look something like this:

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress

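The trick to adding your own RewriteRule lines is to put them in a separate set of instructions above the ones created by Wordpress. Wordpress only rewrites the lines between its # BEGIN WordPress and # END WordPress markers, so anything outside them survives a visit to the Permalinks page. Placing your rules first also gives them a chance to redirect the old .aspx URLs before Wordpress’s catch-all rule sends every unknown path to index.php. I’m a fan of examples, so here’s some of what I ended up with: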

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^archive/2005/08\.aspx$ /2005/08/ [R=301,L,NC]
RewriteRule ^archive/2005/08/21/1\.aspx$ /2005/08/21/day-11-starting-a-new-business/ [R=301,L,NC]
RewriteRule ^archive/2005/08/22/2\.aspx$ /2005/08/22/day-12-the-website/ [R=301,L,NC]
# ...
# one rule per line of subTextCrawler.pl output
# ...
</IfModule>

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>

# END WordPress
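If you regenerate your redirects more than once, placing them by hand gets old. Below is a minimal Python sketch that writes a block of custom rules directly above the `# BEGIN WordPress` marker and leaves Wordpress's own section untouched. If that marker is missing, the rules are simply prepended. The `# BEGIN Redirects` / `# END Redirects` markers and the function name are made up for this example. Wordpress ignores them, and they let the script replace its own earlier block on a rerun instead of stacking duplicates.

```python
from typing import List

WP_MARKER = "# BEGIN WordPress"
# Hypothetical markers for our own block; Wordpress ignores them.
OWN_BEGIN = "# BEGIN Redirects"
OWN_END = "# END Redirects"


def add_redirects(htaccess: str, rules: List[str]) -> str:
    """Return .htaccess text with `rules` placed above Wordpress's block.

    Any block previously written by this function is replaced, so it can
    be rerun safely.
    """
    lines = htaccess.splitlines()
    # Drop an earlier copy of our block, if present.
    if OWN_BEGIN in lines and OWN_END in lines:
        start, end = lines.index(OWN_BEGIN), lines.index(OWN_END)
        del lines[start : end + 1]
    block = [
        OWN_BEGIN,
        "<IfModule mod_rewrite.c>",
        "RewriteEngine On",
        "RewriteBase /",
        *rules,
        "</IfModule>",
        OWN_END,
    ]
    insert_at = lines.index(WP_MARKER) if WP_MARKER in lines else 0
    lines[insert_at:insert_at] = block
    return "\n".join(lines) + "\n"
```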

Conclusion:

Hopefully, someone out there finds this retelling of my experience useful and can save themselves a great deal of time and effort. Between the links above where people explained their processes, and my retelling of the problems that I ran into, you should at least have some answers as to how to tackle some of the problems you might run into.

Eventually, the Google Blog Converters project may make it a little easier to migrate your data between blog engines, but it’s really just not there yet. Good luck!

Abandoning SubText

Apologies for the lateness of this post. I wanted to post it last Tuesday, which was when I made my new blog engine live. Unfortunately, I developed double ear infections that evening, which rapidly turned into a pair of ruptured ear drums, lots of pain, some hearing loss, 5 days of bed rest, and prescriptions involving Percocet and antibiotics. I can’t hear much at the moment, but I’m happy about it and don’t feel any pain. I don’t really know how that works, but it does. I slept for a few hours this afternoon, so now it’s the middle of the night and I can’t sleep. I feel like I’m on the mend for the time being. Monday I go under the knife for unrelated surgery, which will put me back on bed rest for several days. Back to my originally scheduled blog post…

It’s pretty amazing that it’s so incredibly easy to start a blog these days. Unfortunately, writing and maintaining a blog is like getting married. Once you choose a blogging platform, you’re essentially stuck, and changing blogging platforms is about as painless as getting divorced. In the best case, it’s not any fun. In the worst, you lose pretty much everything you ever had.

For more than a year now, I’ve been considering changing blog engines. Interestingly enough, that time frame also coincides quite nicely with my dramatic drop-off in blog posts. Why put more work into a blog if you’re just digging your hole deeper? I knew that every blog post I was going to add was just going to increase the amount of work I’d need to do to perform the conversion and decrease the motivation to actually move to another platform.

However, last week I decided to finally bite the bullet and just get it over with. After all, I’d been absent from my blogging duties for nearly a year. With my master’s degree now out of the way, I really didn’t have a good excuse to put it off any longer. So I started at the most obvious place imaginable for how to convert my blog from SubText to something else. Google.

I suppose I should back up a little bit and explain my reasons for abandoning SubText. After all, I do a fair amount of .NET development and SubText is written in .NET with a SQL Server back end. Let me put it bluntly. I had higher expectations for SubText as a platform.

Issue #1: Annoying bugs
The example I can point to immediately, because it was the most irritating once I finally found the cause, was a paging error in the categories. If you had 9 categories, it worked fine. If you had 10, it wouldn’t show any categories. If you had 11, you were OK again. So for a while I had 11 categories, one of which I never, ever used.

But you’re a .NET developer! It’s an open source platform! You could have fixed it yourself.

I’m sorry, but I have this thing about software. It just needs to work. Time and time again, I hear developers scream about how great open source software is because if there’s a bug, you can go into the code and fix it yourself. That’s nice. Honestly, I think that’s really great. But when push comes to shove, I really don’t have the time to waste making features work that should have worked to begin with. The release I was using was version 1.9.3.

Issue #2: Excessive Complexity
I opened up the source code for version 1.9.3, which is the version of SubText I was on, and there were 10 projects. TEN! Maybe I was just spoiled by the classic ASP programming model, where you had one ASP page, maybe a couple of include files for commonly used functions, and that was about as complicated as it got for any given web page. I’m as much a fan of the .NET framework as the next Windows programmer, but ten projects for a blogging engine just seems excessive. Just finding the right project is sometimes a chore, and the documentation for SubText leaves a little to be desired. You also need to download additional components to get SubText to compile.

I’ve always hated it when you want to compile something and you’re missing a dependency, which has another dependency, which also has another dependency. Eventually, you’ve installed just about everything and are 9 dependencies deep, only to find out that there are bugs which won’t let it compile anyway. Then you have to fix those too, because someone didn’t account for something like 64-bit processors, which were years and years away when the code was written. Occasionally, you’ll also find that due to some obscure bug in version 9.4.2 of dependency 34 in the sixth level of hell, it won’t compile, but only on your machine, as evidenced by the following directives buried in some include file:

#ifdef YOURMACHINE
#error "This won't compile on your machine because we think you're a loon!"
#else
#error "Not your computer this time, but we still think you're a loon!"
#endif

In this case, I actually already had the required additional components from Microsoft on one of my machines, but not all of them. This wasn’t terribly painful, just another minor annoyance.

In the end, I found the guts of it just too incomprehensible to work with easily. I’m sure that given enough time, I could have done so, but time is not something that I have a lot of, so I simply didn’t bother.

Issue #3: Roadmap, or lack thereof
Another point in the defense of SubText is the fact that I was running version 1.9.3 and the most recent version is 2.1. Perhaps the most recent version would have made a difference, but I’m guessing not. There have only been 4 new releases since I installed the version I have, and that was 4 years ago. The roadmap for the product could use some work, as could the communication of when releases are actually going to happen.

Knowing what is going to be available in version 2.0 is nice. Knowing when I should expect to actually see version 2.0 would be nicer. Yes, I know it’s an open source product, but I really don’t care. I see no reason why open source projects shouldn’t be held to the same standards as commercial software. If they want to be treated as if they’re just as good, they need to be just as good, and that includes release schedules.

Issue #4: Plug-ins for extensibility
Something else I wanted was to be able to extend the blog with plug-ins, and there just weren’t any for SubText. Were they coming, and if so, when? I had no idea. Supposedly in version 2.0, whenever that was going to be. (See Issue #3)

Issue #5: Comment spam
Honestly, this should have been a no-brainer. Didn’t anyone working on SubText see the amount of spam coming into their own blogs? The amount of comment spam I get on my own blog is pretty ridiculous, and the expressions that you’re supposedly able to add to help filter it simply don’t work. (Yes, another annoying bug; see Issue #1.)

Issue #6: Haphazard looking website
If you browse the website for SubText, one thing you’re going to notice is that depending on which links you click, you get conflicting information. For example, the main page states that version 2.1 is out. If you click on Docs, then About, and then “Requirements”, the page lists the requirements for Subtext 1.5 and 1.9 (when it is released).

Eventually, I realized that none of these things were getting any better. I could get involved and try to help out, but I really just don’t have the time. Maybe when I’m independently wealthy, I’ll revisit this course of action, but for now it’s easiest to just abandon SubText and move to a blog engine that better suits my needs.

It took me all of 10 minutes to decide to switch to Wordpress. I’ve used it in the past for other blog projects, and it worked really well. I did a quick install on one of my Linux servers, kicked the tires a bit, and decided that it was certainly going to be worth whatever trouble I had to endure. In fact, I installed a plug-in called WP-Syntax in less than 5 minutes and got syntax highlighting and line numbering on the C preprocessor code you see above. And so it is that you’re reading this today on a new Wordpress blog engine.

My next post (possibly not until after my surgery) will cover all the fun things I encountered while trying to get away from SubText and into Wordpress.
