hmmm....

Migrating Post Tags From Wordpress to Octopress

I’ve migrated from Wordpress to Octopress and used the Jekyll wordpress migrator to move my posts over. Unfortunately, this doesn’t preserve post tags. The output looks like this:

1
2
3
4
5
6
7
8
---
layout: post
title: Devops w/ Perl @ Linode PPW Talk Slides
wordpress_id: 330
wordpress_url: http://michael.thegrebs.com/?p=330
date: 2011-10-27 14:46:31 -04:00
---
Earlier this month I gave a talk about...

Having the wordpress post id means extracting the post tags from the db should be quite easy. First we define our desired output:

1
2
3
4
5
6
7
8
9
---
layout: post
title: Devops w/ Perl @ Linode PPW Talk Slides
wordpress_id: 330
wordpress_url: http://michael.thegrebs.com/?p=330
date: 2011-10-27 14:46:31 -04:00
tags: geek perl slides
---
Earlier this month I gave a talk about ...

The tags field really can appear anywhere in this YAML fragment but I chose to throw it at the end. With 103 posts to loop over, run a query and insert a new line a short script makes sense. The real win for our script though is using the Tie::File module which presents each file as an array with an element for each line.

From there it’s simply a matter of reverse engineering the Wordpress schema to come up with a query that will return a space delimited list of tags for a given post id.

1
2
3
4
5
SELECT GROUP_CONCAT(t.`name` SEPARATOR " ") FROM `wp_term_relationships` r
INNER JOIN `wp_term_taxonomy` tax ON r.`term_taxonomy_id` = tax.`term_taxonomy_id`
INNER JOIN `wp_terms` t ON tax.`term_id` = t.`term_id`
WHERE r.`object_id` = ?
AND tax.`taxonomy` = "post_tag";

Throw that query in a script that iterates over the list of files and uses Tie::File to add a line to the post and we get:

(word2octo-tags.pl) download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
#!/usr/bin/perl 

use 5.010;
use strict;
use warnings;

use DBI;
use Tie::File;

my $path_to_posts = 'source/_posts';

my $dbh = DBI->connect('DBI:mysql:db_name:db_host','db_user','db_pass');

my $get_tags = $dbh->prepare(q{
    SELECT GROUP_CONCAT(t.`name` SEPARATOR " ") FROM `wp_term_relationships` r
    INNER JOIN `wp_term_taxonomy` tax ON r.`term_taxonomy_id` = tax.`term_taxonomy_id`
    INNER JOIN `wp_terms` t ON tax.`term_id` = t.`term_id`
    WHERE r.`object_id` = ?
    AND tax.`taxonomy` = "post_tag";
    });


for my $file (glob "$path_to_posts/*.markdown") {
    say $file;
    add_tags_to_file($file);
}

sub add_tags_to_file {
    my $filename = shift;
    tie my @file, 'Tie::File', $filename or die "No tie: $!";
    my $tags;
    while (my ($rec, $data) = each @file) {
        if ($data =~ m/^wordpress_id: (\d+)$/) {
            $tags = get_tag($1);
        }
        if( $data =~ /^date:/) {
            splice @file, $rec + 1, 0, $tags if $tags;
            return;
        }

    }
}

sub get_tag {
    my $post = shift;
    $get_tags->execute($post);
    my ($tags) = $get_tags->fetchrow_array();
    return "tags: " . $tags if $tags;
}

Comments