This post is a response to something on Tom Johnson’s blog, which I’ve been groovin’ on regularly and highly recommend.

In a recent post there, we got into a discussion about the use of XML. Naturally, I was all over that. I felt really bad that I talked so much but never addressed a direct question of Tom’s:

“Here’s what I am interested in with XML. How can I extract pages on a Mediawiki wiki and package them up into selective guides and then publish them in a book-like PDF format?”

I felt like I was ‘flooding’ the comments, so I moved this over here.

A fond memory of Paris, but god knows how they pulled it off! – Photo by Noz Urbina


I’m afraid I don’t know the specifics of getting content out of Mediawiki. I’ve never implemented it, nor have I reviewed it in depth, but I doubt it’s simple. XML isn’t a format in the way others are; it’s a meta-format. Saying something is XML is like saying a food is ‘French’ rather than saying it’s a ‘croissant’. You may know how to make French food generally, but that doesn’t mean you know how to bake, much less bake croissants. I’m sure there’s a better analogy than that, but that’s what I’ve got for now. So you can’t just go to ‘XML’ any more than you can make a ‘French meal’ without, at some point, nailing down exactly which recipe you’re going to cook. Even DITA is somewhat flexible. You can go to DITA, but not all DITA files are equal. It’s like FrameMaker or Word with or without the judicious use of styles: same format, very different beast.

The reason there’s no ready-made tool to do this is that there isn’t much point in one. Like a machine that cooks for you: unless you’re specific in setting it up, the chances it’ll cook what you want are slim. What happens in practice is that people make a mapping from the Mediawiki (source) markup to the XML (target) markup that *they* want, as simple or as complex as their business case dictates. Once you’ve got the extraction routine in place, you’ll get an XML file to your specifications from whichever wiki entries you like. You then need a publishing tool chain set up to handle that file. If you don’t already have a system to publish your XML through, there’s not a lot of point in going with generic XML. DITA is better in this regard.
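To make the ‘mapping’ idea a bit more concrete, here’s a minimal sketch in Python. It assumes you’ve already fetched a page’s raw wikitext (how you get it out of Mediawiki isn’t shown), and it handles only a tiny, hypothetical subset of markup: `== Heading ==` lines, `* bullet` lines and plain paragraphs, mapped onto a simplified DITA-style `<topic>`. The function name `wikitext_to_topic` and the mapping rules are my own illustration, not an existing tool; a real extraction routine would cover far more (links, tables, templates) and target whatever structure your business case calls for.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical mapping rules for a tiny subset of MediaWiki markup:
#   "== Title ==" -> <section><title>Title</title> ... </section>
#   "* item"      -> <ul><li>item</li></ul>
#   any other text -> <p>text</p>
HEADING = re.compile(r"^(=+)\s*(.*?)\s*\1\s*$")


def wikitext_to_topic(topic_id: str, title: str, wikitext: str) -> ET.Element:
    """Map a small subset of wikitext onto a simplified DITA-style topic."""
    topic = ET.Element("topic", id=topic_id)
    ET.SubElement(topic, "title").text = title
    body = ET.SubElement(topic, "body")

    container = body     # where new blocks go: the body or the current section
    current_list = None  # the open <ul> while consecutive bullets are read

    for raw in wikitext.splitlines():
        line = raw.strip()
        if not line:
            current_list = None  # a blank line ends any open list
            continue
        heading = HEADING.match(line)
        if heading:
            # Every heading level becomes a flat <section>; nesting is ignored.
            container = ET.SubElement(body, "section")
            ET.SubElement(container, "title").text = heading.group(2)
            current_list = None
        elif line.startswith("*"):
            if current_list is None:
                current_list = ET.SubElement(container, "ul")
            ET.SubElement(current_list, "li").text = line.lstrip("*").strip()
        else:
            ET.SubElement(container, "p").text = line
            current_list = None
    return topic


if __name__ == "__main__":
    sample = "Intro paragraph.\n== Setup ==\n* Install the tool\n* Run it\nDone."
    topic = wikitext_to_topic("setup", "Setup guide", sample)
    print(ET.tostring(topic, encoding="unicode"))
```

The point isn’t the code itself; it’s that every rule in it is a decision *you* make about what your target XML should look like, which is exactly why no generic tool can make those decisions for you.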

DITA gets around this to a great degree by letting you use whatever DITA-compliant applications you want (there are far more off-the-shelf DITA tools than tools for generic XML). You could also apply DITA filtering attributes to turn certain content ‘on’ or ‘off’ depending on your intended audience.
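As a rough illustration of what those filtering attributes do, here’s a Python sketch that mimics a DITAVAL ‘exclude’ rule on the `audience` attribute. In a real tool chain you’d hand a `.ditaval` file (something like `<val><prop att="audience" val="expert" action="exclude"/></val>`) to a DITA processor such as the DITA Open Toolkit rather than write this yourself. The `filter_audience` function is hypothetical and only covers `exclude` on that one attribute.

```python
import xml.etree.ElementTree as ET


def filter_audience(root: ET.Element, excluded: set[str]) -> ET.Element:
    """Remove elements whose audience values are all in `excluded`.

    Mirrors the DITAVAL rule that an element is excluded only when every
    value of its filtering attribute is set to exclude. Elements without
    an audience attribute are always kept. Modifies `root` in place.
    """
    # Snapshot the tree first so removing children doesn't disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            values = child.get("audience", "").split()
            if values and all(v in excluded for v in values):
                parent.remove(child)
    return root


if __name__ == "__main__":
    topic = ET.fromstring(
        '<topic id="install"><title>Installing</title><body>'
        "<p>Run the installer.</p>"
        '<p audience="expert">Or build from source.</p>'
        '<p audience="novice expert">Restart when prompted.</p>'
        "</body></topic>"
    )
    # Build the 'novice' edition of the guide by excluding expert-only content.
    novice_guide = filter_audience(topic, {"expert"})
    print(ET.tostring(novice_guide, encoding="unicode"))
```

Here the expert-only paragraph drops out, while the paragraph tagged for both novices and experts stays. Same source, different guide, which is the ‘selective guides’ part of Tom’s question.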

This might sound a bit opaque or vague; I’m not sure of your existing level of XML knowledge. Unfortunately, you need a critical mass of XML understanding to intuit the details. Back to my cooking analogy: if you’ve mastered meat dishes and someone asks you how to cook a cheese soufflé, you might not be able to answer. If they asked you how to make salami, you might not be able to make them understand your answer unless they already had a certain background.

Maybe I should have gone with music instead of cooking as my analogy? Anyway, unless you have a recurring reason to do it, you (probably) wouldn’t go to the trouble of setting up XML.


Industrialising content processes is highly beneficial, but you need a defined reason (a business case) that makes sense.

DITA/XML is an ‘industrialisation’ of content processes. You don’t set up a whole content strategy and governance process for something you’re only going to do once (unless that thing is a truly big undertaking). You don’t buy a 100-gallon mixing machine if you’re only going to use it once; you just get out ten 10-gallon pots and pull a few all-nighters. You don’t set up an assembly line to make a sculpture. The same goes for XML: it makes sense when it’s going to be applied to a certain volume of content, or used by people (or small teams) expert enough to handle every aspect themselves.

PS – Call for Speakers for Congility 2011 is still open!