Migrating from WordPress to Django

This is the system I devised to move my blog posts from WordPress to a custom Django application. It looks a bit intricate, but it took no more than an evening of coding.

Start from the obvious: the WordPress XML export

The native WordPress export functionality gets you a custom XML file that mixes your post data with a lot of information you do not necessarily want to keep, because it is needed only by WordPress internals (for example, post ids).

Instead of writing a single program that would parse the XML and create content for the new system directly, I thought it would be simpler, and more convenient for future use, to have two: one program that writes a subset of the XML data out as JSON, and another that interfaces with the Django database API and loads the required data from the JSON.

This way you obtain a clean JSON file, without WordPress-specific cruft, which can also work as a sort of intermediate backup.

Python standard lib all the way

I decided the Python standard lib had everything I needed: etree for XML and the json module. etree is quite an interesting API, though like most XML APIs it seems geared towards extracting a simple piece of information from the document, not transforming the whole document into something else. I think SAX is still the best for that, once you get the hang of it.

Anyway, this time I had conveniently reduced the task to what the XML library was actually good at, so I only had to extract the tag list, the post title and the post slug, create a Python dictionary for each post with this data, and dump that to JSON. The code was probably not very efficient, but there was not a lot of data either.
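
To give an idea of what this first step can look like, here is a minimal sketch rather than the script I actually ran. It assumes the usual layout of a WordPress export: each post is an item element, the slug lives in wp:post_name, and tags are category elements whose domain attribute is "post_tag". The namespace URI bound to the wp prefix is also an assumption, since it changes with the export version, so check the top of your own file. The function names are made up for the example.

```python
import json
import xml.etree.ElementTree as ET

# Assumption: the URI bound to the wp: prefix. It depends on the export
# version, so copy the real one from the <rss> element of your own file.
WP_NS = {"wp": "http://wordpress.org/export/1.2/"}


def extract_posts(xml_text, ns=WP_NS):
    """Return one dictionary per <item>, holding its title, slug and tags."""
    root = ET.fromstring(xml_text)
    posts = []
    for item in root.iter("item"):
        # Categories and tags share the <category> element; keep only tags.
        tags = [
            cat.text
            for cat in item.findall("category")
            if cat.get("domain") == "post_tag"
        ]
        posts.append({
            "title": item.findtext("title", default=""),
            "slug": item.findtext("wp:post_name", default="", namespaces=ns),
            "tags": tags,
        })
    return posts


def posts_to_json(xml_text):
    """Serialize the extracted posts as an indented JSON string."""
    return json.dumps(extract_posts(xml_text), indent=2)
```

Writing the result of posts_to_json to a file gives you the intermediate JSON, one object per post.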

Then it is only a matter of turning the JSON back into Python (json.load), then running through the list of dictionaries that json.load returns and saving objects through the Django database API.
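
The loading step can be sketched along these lines. This is an illustration under assumptions, not my actual code: it supposes a hypothetical Post model with title and slug fields and a many-to-many tags relation, and a hypothetical Tag model with a name field. The model classes are passed in as arguments so the function does not depend on any particular app layout. The manager calls get_or_create and update_or_create are part of the standard Django queryset API.

```python
import json


def load_posts(fp, post_model, tag_model):
    """Read posts from an open JSON file and save them via the Django ORM.

    post_model and tag_model are hypothetical Django model classes:
    the post has `title`, `slug` and a many-to-many `tags` relation,
    the tag has a `name`. Returns the number of posts processed.
    """
    count = 0
    for entry in json.load(fp):
        # The slug is the lookup key, so running the import twice
        # updates the existing post instead of creating a duplicate.
        post, _created = post_model.objects.update_or_create(
            slug=entry["slug"],
            defaults={"title": entry["title"]},
        )
        for name in entry["tags"]:
            tag, _created = tag_model.objects.get_or_create(name=name)
            post.tags.add(tag)
        count += 1
    return count
```

In a standalone script you would need Django's settings configured before importing the models, and then call something like load_posts(open("posts.json"), Post, Tag), where the file name is again only an example.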