Migrating Post Tags from WordPress to Octopress

I’ve migrated from WordPress to Octopress and used the Jekyll wordpress migrator to move my posts over. Unfortunately, this doesn’t preserve post tags. The output looks like this:

nn

1n2n3n4n5n6n7n8n
---nlayout: postntitle: Devops w/ Perl @ Linode PPW Talk Slidesnwordpress_id: 330nwordpress_url: http://michael.thegrebs.com/?p=330ndate: 2011-10-27 14:46:31 -04:00n---nEarlier this month I gave a talk about...n

nnn

Having the wordpress post id means extracting the post tags from the db should be quite easy. First we define our desired output:

nn

1n2n3n4n5n6n7n8n9n
---nlayout: postntitle: Devops w/ Perl @ Linode PPW Talk Slidesnwordpress_id: 330nwordpress_url: http://michael.thegrebs.com/?p=330ndate: 2011-10-27 14:46:31 -04:00ntags: geek perl slidesn---nEarlier this month I gave a talk about ...n

nnn

The tags field really can appear anywhere in this YAML fragment but I chose to throw it at the end. With 103 posts to loop over, run a query and insert a new line a short script makes sense. The real win for our script though is using the Tie::File module which presents each file as an array with an element for each line.

nnnnn

From there it’s simply a matter of reverse engineering the WordPress schema to come up with a query that will return a space delimited list of tags for a given post id.

nn

1n2n3n4n5n
SELECT GROUP_CONCAT(t.`name` SEPARATOR " ") FROM `wp_term_relationships` rnINNER JOIN `wp_term_taxonomy` tax ON r.`term_taxonomy_id` = tax.`term_taxonomy_id`nINNER JOIN `wp_terms` t ON tax.`term_id` = t.`term_id`nWHERE r.`object_id` = ?nAND tax.`taxonomy` = "post_tag";n

nnn

Throw that query in a script that iterates over the list of files and uses Tie::File to add a line to the post and we get:

nn

(word2octo-tags.pl) download
n

1n2n3n4n5n6n7n8n9n10n11n12n13n14n15n16n17n18n19n20n21n22n23n24n25n26n27n28n29n30n31n32n33n34n35n36n37n38n39n40n41n42n43n44n45n46n47n48n49n
#!/usr/bin/perl nnuse 5.010;nuse strict;nuse warnings;nnuse DBI;nuse Tie::File;nnmy $path_to_posts = 'source/_posts';nnmy $dbh = DBI->connect('DBI:mysql:db_name:db_host','db_user','db_pass');nnmy $get_tags = $dbh->prepare(q{n    SELECT GROUP_CONCAT(t.`name` SEPARATOR " ") FROM `wp_term_relationships` rn    INNER JOIN `wp_term_taxonomy` tax ON r.`term_taxonomy_id` = tax.`term_taxonomy_id`n    INNER JOIN `wp_terms` t ON tax.`term_id` = t.`term_id`n    WHERE r.`object_id` = ?n    AND tax.`taxonomy` = "post_tag";n    });nnnfor my $file (glob "$path_to_posts/*.markdown") {n    say $file;n    add_tags_to_file($file);n}nnsub add_tags_to_file {n    my $filename = shift;n    tie my @file, 'Tie::File', $filename or die "No tie: $!";n    my $tags;n    while (my ($rec, $data) = each @file) {n        if ($data =~ m/^wordpress_id: (d+)$/) {n            $tags = get_tag($1);n        }n        if( $data =~ /^date:/) {n            splice @file, $rec + 1, 0, $tags if $tags;n            return;n        }nn    }n}nnsub get_tag {n    my $post = shift;n    $get_tags->execute($post);n    my ($tags) = $get_tags->fetchrow_array();n    return "tags: " . $tags if $tags;n}n

nn