Happy Holidays everyone!

I was woken up today at 5:30am by a head full of XML.

This doesn’t happen to me that often, but probably more than it happens to most people…

This blog presupposes a certain amount of knowledge of XML and some of DITA, so if you don’t know what either is, I suggest backtracking a bit.  If you know XML but not DITA you can check my Intro to DITA webinar for a crash course.

I’ve been thinking about this for a while but a recent conversation at X-Pubs 2009 with a salesman from one of the major DITA-supporting Component Content Management Vendors really kicked me in the butt to write this. 

We were discussing a competitor of his who has implemented a Mark Logic (i.e. XQuery-based XML native) base for their system.  This blog is not an endorsement of any given product, but a hurrah for a company (any company) that takes an approach we’ve been nagging them to take for a couple of years now.  It is an approach we have in fact already implemented as bespoke developments on Mark Logic and open-source databases, but one that is poorly addressed by the vendor community.

The response from the first CCMS vendor was (paraphrasing), “So what? With DITA surely you just manage at the topic level?”

The answer is: so lots of stuff.

Lots of Stuff

My colleague Mark Poston did a webinar a few months ago through X-Pubs walking people through a case study where two features in particular demonstrated a heavy reliance on XQuery.  (Sorry about the audio quality, and the zooming that the screen-share software does. It’s a bit raspy, so I suggest speakers instead of headphones, and you have to listen carefully as the text has been squished somewhat.)

For those of you who don’t know, XQuery is a technology that has just (finally!) started to make waves.  Those of us in the industry have been hearing about it since the standard started development circa 8 years ago, but fine tuning it for release, tool support and industry understanding mean its “time” is only just dawning. 

Although of course I recommend you take the time to watch Mark’s whole session, only certain bits are directly illustrative of today’s rant.  If you check out the webinar from about 47:30 on, it shows a DITA document (although the fact it’s DITA is not relevant) with database integration where a 3rd party XML data service is being interwoven into a user-created file.  Neat stuff. 

From about 51:45 the system shows how you can mark up strings of XML (across element boundaries) to define areas of text that require specialist Subject Matter Expert (SME) approval.  In this case, data that requires copyright clearance needs to go to a copyright editor for administration, management and release. 

As far as I’m concerned, by far the coolest part of this whole shooting match is when the Copyright editor is shown their optimised view of the content.  This type of interface with XML is very possible, but with fully topic-based access management, it’s not easy.

Early in the webinar in the MS Word DITA integration and on the left hand side of the management console, we see a DITAmap (basically a ToC (table of contents) laying out the hierarchy of DITA topics).  In the Copyright area, we see a view of a document that is hierarchical, but has only two levels and has nothing to do with the ToC of the document in question.  It is organised by the company that the copyright editor needs to talk to, not the document’s ordering, and shows:

  • the status of each individual bit of Copyright-requiring information, with buttons to change it between approved, waiting, rejected, etc.
  • a preview of the text in question
  • a “Context Zoom” where if the Copyright editor needs more context to understand the text they are looking at, they can “zoom out” level by level in the XML file
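For the technically curious, here is a minimal sketch of how a view like that can fall out of a tree query. It is not the code from the case study (that was built on XQuery); it is a Python stand-in using only the standard library. The `rights-owner` and `rights-status` attributes, the sample content and the function names are all invented for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample: the rights-owner / rights-status attributes are
# invented for this sketch, not taken from DITA or from the system shown.
DOC = """<topic id="t1">
  <body>
    <section id="s1">
      <p id="p1">Intro. <ph rights-owner="Acme" rights-status="waiting">Quoted Acme passage</ph> More text.</p>
      <p id="p2"><ph rights-owner="Globex" rights-status="approved">Globex caption</ph></p>
    </section>
    <section id="s2">
      <p id="p3"><ph rights-owner="Acme" rights-status="approved">Second Acme passage</ph></p>
    </section>
  </body>
</topic>"""

root = ET.fromstring(DOC)
# ElementTree has no parent pointers, so build a child -> parent lookup once.
parent_of = {child: parent for parent in root.iter() for child in parent}


def group_by_owner(tree_root):
    """Group marked fragments by rights owner, ignoring the document's ToC."""
    groups = defaultdict(list)
    for el in tree_root.iter():
        owner = el.get("rights-owner")
        if owner is not None:
            groups[owner].append(el)
    return dict(groups)


def zoom_out(el, levels):
    """Walk up `levels` ancestors, stopping at the document root."""
    for _ in range(levels):
        if el not in parent_of:
            break
        el = parent_of[el]
    return el


def preview(el):
    """Flatten an element to plain text with whitespace normalised."""
    return " ".join("".join(el.itertext()).split())


if __name__ == "__main__":
    for owner, fragments in sorted(group_by_owner(root).items()):
        print(owner)
        for frag in fragments:
            print("  [%s] %s" % (frag.get("rights-status"), preview(frag)))
            print("    context:", preview(zoom_out(frag, 1)))
```

The point of the sketch is that the two-level view (owner, then fragment) never consults the document hierarchy, and the "zoom" is just a walk up the ancestors of whatever fragment the editor is looking at.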

It is worth noting here that the whole point of using Quark XML Author was to avoid using the term “XML” at all. As far as users are concerned they are not editing XML, they are simply marking up content from point A to point B. XML is the enabler to a much more powerful solution than Word or an ordinary component-based CMS, but is made to look simple for the user. The power lies under the hood, and that’s where XQuery is invaluable.

When Reality Attacks

So, we’re seeing here a real-world use case, which I can say from client experience is not unique to this customer or industry, where we are managing XML fragments that:

  • are not at all bind-able by topics individually
  • cannot be grouped and ‘hived off’ into topics in advance for planned reuse; by definition they can’t exist until authoring-time.

For those in the know, and especially for the second example, this is seriously, seriously, difficult to pull off with a purely topic-based paradigm, and to an extent without our friend XQuery. 

I don’t want to make this blog all about XQuery, but the relevance of what you’re seeing here is that it essentially requires that you treat your whole DITAmap, if not your whole content set of many DITAmaps and Topics, as a single, query-able XML file: Each part of each document is accessible at any time for recombination, management and on-the-fly preview. Simple to say, hard to grasp just how useful it is.
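To make "a single, query-able XML file" a little more concrete, here is a rough sketch, again in Python as a stand-in for what an XQuery engine does natively and at scale. The file names, the sample topics and the `assemble` and `find_fragments` helpers are assumptions of mine for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical content set: file names and topic content are invented.
TOPICS = {
    "install.dita": '<task id="install"><title>Install</title><taskbody>'
                    '<context><note type="warning">Disconnect power first.</note></context>'
                    '</taskbody></task>',
    "tune.dita": '<concept id="tune"><title>Tuning</title><conbody>'
                 '<p>Defaults suit most sites.</p>'
                 '<note type="warning">Do not exceed 40 threads.</note>'
                 '</conbody></concept>',
}
MAP = ('<map><topicref href="install.dita">'
       '<topicref href="tune.dita"/></topicref></map>')


def assemble(map_xml, topics):
    """Resolve each topicref into the map so everything sits in one tree."""
    root = ET.fromstring(map_xml)
    # Snapshot the topicrefs first so we are not mutating while iterating.
    for ref in list(root.iter("topicref")):
        source = topics.get(ref.get("href"))
        if source is not None:
            ref.insert(0, ET.fromstring(source))
    return root


def find_fragments(combined, tag, **attrs):
    """Return every element with this tag and these attribute values."""
    hits = []
    for el in combined.iter(tag):
        if all(el.get(name) == value for name, value in attrs.items()):
            hits.append(el)
    return hits


combined = assemble(MAP, TOPICS)
# Every warning in the whole map, regardless of which topic it lives in.
warnings = [el.text for el in find_fragments(combined, "note", type="warning")]
```

Once the map and its topics are one tree, a warning buried in a concept is as reachable as one in a task, and nobody had to decide in advance that warnings were a reusable unit.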

When Are Topic-based Systems Good Enough?

As you may know, Mekon work with many CCMS vendors and Technical Communications teams, and will sing the praises of our favourites when they are being applied in the right context.  Most, however, are optimised to manage the internal operations of a Technical Communications team, and hold on to the Topic / Data Module or an MRU (Minimum Reusable Unit) concept.  When you want to get fancier with your XML, either on the collaboration side (as in the copyright example) or on the delivery side, they rely on downstream systems. In a way that is fine, but being able to manipulate your content into new deliverables is part of the creation process, and therefore I’d say should be addressed in the CCMS creation-management layer. 

In the case study I referenced, Topic-management and reuse were not major drivers; it was collaboration and delivery that were key. So we developed this system on top of a native XQuery Engine and XML repository (Mark Logic), because Topics alone weren’t good enough to get the job done.

Anything that bound us completely into a topic-based management paradigm on the back end, without the ability to:

  • easily and quickly display fragments as discrete manageable objects to the user
  • combine content that doesn’t exist in topics (e.g. data feeds) seamlessly
  • cheaply and quickly generate custom interfaces for viewing and managing these new “fragment objects”

would have made it difficult to deliver the end-to-end user experience we needed (to authors, managers and SME editors), and would have been an inefficient use of our resources.  Poor use of our resources of course also means poor value for the client. So, we didn’t do it.

Horses for Courses

Your horse might be a native XQuery supporting CCMS, or one that integrates tightly with a native XML-based system, if you want full management capability – status, validation, workflow – on things that aren’t discrete topics in the system. 

The same applies if you are starting to think about how you can make your content experience more engaging than big flat PDFs and want to do things like this:

  • Allow user compilation of their desired content, directly, regardless of whether you as a content provider decided that that was a “Topic” or not – theoretically, even if you didn’t think to apply any special metadata to it.
  • Maybe tie this into a content licensing/purchases system if it’s IP with a £$€ value attached?
  • Allow internal collaborative users direct and flexible access to only parts of the content, e.g.:
    • warnings
    • statements in proposals or specifications that legal or R&D might need to review/query
    • anything: maybe you just want a user to be able to add custom metadata to any given element anywhere at any time!
  • Dynamic extraction of compiled content (Automatically generated topics?) based on semantics found in other documents:
    • E.g., you have lots of documents that talk about tools and prerequisites for doing certain tasks, and you want to generate a dynamic list of all the tools or prerequisites across multiple topics. (For those going, “Just use XSLT”: yes, there’s definite potential there, but it is not as dynamic, as you’ve got to write the XSLT in advance, n’est-ce pas? Arguments invited.)
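As a rough illustration of that last example, here is a sketch of a cross-topic index where the element to collect is chosen at request time rather than baked in beforehand. It is Python standing in for an XQuery call, and it is a toy: `prereq` is a standard DITA task element, but `tool` is a hypothetical specialised element, and the sample topics and the `index_by_element` function are invented.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical topics. <tool> is an invented element for this sketch.
TOPICS = [
    '<task id="replace-filter"><taskbody><prereq>Power off the unit</prereq>'
    '<steps><step><cmd>Open the cover with a <tool>Torx T20 driver</tool>.</cmd></step></steps>'
    '</taskbody></task>',
    '<task id="replace-fan"><taskbody><prereq>Power off the unit</prereq>'
    '<steps><step><cmd>Undo the screws with a <tool>Torx T20 driver</tool> '
    'and lift the fan with <tool>pliers</tool>.</cmd></step></steps>'
    '</taskbody></task>',
]


def index_by_element(topic_sources, element_name):
    """Map each distinct element text to the sorted ids of topics using it."""
    index = defaultdict(set)
    for source in topic_sources:
        topic = ET.fromstring(source)
        for el in topic.iter(element_name):
            text = " ".join("".join(el.itertext()).split())
            index[text].add(topic.get("id"))
    return {text: sorted(ids) for text, ids in sorted(index.items())}


if __name__ == "__main__":
    # The element name is just a parameter, so a user could pick it at run time.
    for name in ("tool", "prereq"):
        print(name, index_by_element(TOPICS, name))
```

Nothing here depends on anyone having declared tools or prerequisites to be topics, or having tagged them with extra metadata; the semantics already in the markup are enough.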

The Counter-Argument

Did you notice that I mentioned the system in the case study was built on DITA?  That means that in fact the whole system is, technically, topic-based.  So, it is important that I clarify: it is not topics themselves that are the problem, but lazy CMS developers who look to topics as a way out of providing more flexible content management paradigms to their users.  I think it’s an extension of the difficulty in breaking the document mind-set.  They think that, although you’re managing component content, the desired output, all interdepartmental collaboration, and all user manipulation of the content are always going to be on whole topic chunks, or worse, flattened, rendered documents.

The most common argument is ‘not enough customers are demanding that’, which counts for a huge amount in a software company (I worked in them for ½ my career, I know).   But, try a similar argument: “Our horse-drawn carriage customers don’t ask for carriages that can do 70 km/h for 3 hours at a stretch without a break.”  If that made any sense we’d have no cars.  People don’t ask for what they can’t envisage, but that doesn’t mean they don’t need it, or won’t buy it when you show it to them.

The line between publishing and creation is blurring as wikis, Web 2.0 and Intelligent Enterprise Content sweep over us in waves.  We need to be able to preserve our semantics right up to the user so that we can remix, annotate, and user-categorise it to our heart’s content.  Even if you’re in Medical, Military or Pharma, your users are still allowed to make favourites, search by categories or give paragraph-specific comments that you need to deal with on a sub-topic basis. 

If you get 800 comments on different paragraphs across 1000 topics in a map, do you really want to deal with each of those individually by opening up each topic, or would you like the comment and its paragraph displayed in a summary view for you to step through? You decide.  This is where XQuery provides capabilities that purely topic-based CCMSs cannot (so readily) support.
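Here is a small sketch of what such a summary view boils down to: join each comment to the paragraph it points at, without the reviewer opening a single topic. As before, this is Python standing in for a query against the repository. The comment records, the topic content and the `comment_summary` function are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical topics and comment records, invented for this sketch.
TOPICS = {
    "t1": '<topic id="t1"><body><p id="p1">Fit the seal.</p>'
          '<p id="p2">Torque to <b>12 Nm</b>.</p></body></topic>',
    "t2": '<topic id="t2"><body><p id="p1">Check for leaks.</p></body></topic>',
}
COMMENTS = [
    {"topic": "t1", "para": "p2", "text": "Is 12 Nm right for the small model?"},
    {"topic": "t2", "para": "p1", "text": "How long should we wait?"},
    {"topic": "t1", "para": "p2", "text": "Units missing in the PDF."},
]


def comment_summary(topics, comments):
    """Pair each commented paragraph's text with all comments made on it."""
    paragraphs = {}
    for topic_id, source in topics.items():
        for p in ET.fromstring(source).iter("p"):
            paragraphs[(topic_id, p.get("id"))] = "".join(p.itertext()).strip()
    summary = {}
    for comment in comments:
        key = (comment["topic"], comment["para"])
        entry = summary.setdefault(
            key, {"paragraph": paragraphs.get(key), "comments": []}
        )
        entry["comments"].append(comment["text"])
    return summary


if __name__ == "__main__":
    for (topic_id, para_id), entry in comment_summary(TOPICS, COMMENTS).items():
        print("%s/%s: %s" % (topic_id, para_id, entry["paragraph"]))
        for text in entry["comments"]:
            print("   -", text)
```

The reviewer steps through paragraphs that have comments, in one list, and the topic boundaries only show up as labels.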

When Topic-based Management is A-OK

Tech Pubs applications that try to increment up one step from your existing document processes can get by fine with topics.  There’s so much we’ve just grown accustomed to living with, that vendors don’t need to fix the things buyers don’t realise are broken.  Review tools in most CCMSs are poor to non-existent, and still vendors get away with not implementing better ones because the functions they provide are still so much better than the doldrums of unstructured, copy-paste-based processes.

Say you have multiple doc types (like install manual, admin manual and box top labelling) and across 4 products (3 types x 4 products = 12 instances) there are lots of shared pre-definable components.  Conref is a sub-topic reuse mechanism that can help you handle all your internal workings quite well.  

Sometimes you may want to start here, and then evolve your use of semantics and XML up over time. Such approaches are supported by the DITA folks with the DITA Maturity Model and this is completely valid. 

I for one still think the platforms should at least offer the under-the-hood architecture – and by implication, the flexibility in the front-end UI – that will allow you to integrate easily into CM regardless of your background, and to grow as an organisation without necessarily outgrowing your CCMS in the first 3-5 years. 

Thinking ‘we just manage topics’ is a lazy design attitude and cheats the customers out of features that could really save them time and frustration.