For those familiar with DITA, the name Michael Priestley speaks for itself. One of the founding fathers of the standard, which he has helped shape since its beginnings in 2000, he is an Enterprise Content Technology Strategist at IBM, where he started working twenty-two years ago. His main role is to lead the strategy for how IBM’s content infrastructure connects content organizations and supports the coordination of content and content processes.

Michael, how does DITA fit into your work at IBM?

DITA isn’t actually part of my job description, but it is part of IBM’s Enterprise Content strategy as an important structured content standard. I continue to work on the standard, particularly in the area of Lightweight DITA, which is especially relevant to the cross-organizational scope of my current role.

Let’s travel back in time: what was the driver for developing DITA?

IBM had been using a proprietary SGML language called IBMIDDoc, roughly comparable to DocBook in scope and intent. As XML emerged, the corporate team took the opportunity to revisit the architecture and formed a workgroup with representation from every IBM site. At the time I was an Information Developer and Architect, and I was chosen to represent my site as a user, and a problematic one at that, since my group was not actually using IBMIDDoc. We had rejected SGML and chosen to author directly in HTML. My background was authoring topic-oriented content in HTML chunks, following IBM guidelines for content typing that used three basic types: concept, task, and reference.

While I was a writer, I developed XML skills because the product I was documenting was XML-centric. This allowed me to bring both XML expertise and an appreciation of what authors like me really needed. As we began revisiting the SGML solution, I pushed strongly for specific features: a topic-oriented, information-typed solution with collection-based link management, and an information typing architecture called specialization that allowed inheritance of behaviors. I worked part-time on the initial design, then full-time on the initial transforms to HTML and on our first DITA user guide.

What were the core objectives of the first version of DITA?

I wanted to reconcile two schools of thought: the SGML community, which was primarily focused on books and PDF, and the HTML community, which was primarily focused on the web and online help. Both were single-sourcing: SGML groups produced websites from their SGML source, while the HTML users had a convoluted process to go from HTML to PDF. I was aiming for a single architecture that focused on the content and was separate from the deliverable. I also wanted to separate the linking relationships, the metadata, and the overall navigation structure from the pieces of usable and reusable content. We needed strong typing. I viewed DITA as a chance to propagate our best practices around content typing, while leaving the door open to further type development as our needs evolved.

What do you mean by a content type?

Initially we had just three types (concept, task, and reference), all at the level of a content chunk we call a topic. Today there are many content types available. I like to think that content at any level can have a type and structure, and can therefore be a content type: at the collection level, like an installation manual or a campaign; at the page level, like a tutorial or a whitepaper; at the topic level, like a product overview or a task; or even at the block or phrase level, like a set of steps or a title field.

When did the DITA standard grow outwards and involve other firms?

Very early on. We designed the proof of concept in 2001, shortly after the IBM workgroup was put together. That got other companies interested, and some of them implemented it even before IBM did. Nokia and Ixiasoft were among the early adopters.

How is DITA used at IBM today?

The chief users are still the technical writers, a total of 1500 full-time professional authors using the Oxygen® editor. They publish to a web application that we call the IBM Knowledge Center.

How many topics does IBM manage?

On the source side it’s around 4 million in our central CMS, with maybe another 2 or 3 million elsewhere, untracked. On the publication side it’s higher than it should be because of legacy content: around 60 million in English alone, and several times that counting other languages.

Which DITA version is IBM using?

We’re using some features of 1.3, but mostly 1.2. Every new capability involves a trade-off: additional complexity versus additional function. We work with our authors and try to respect their requirements.

How would you break down the outputs of your DITA content?

HTML is number one. There is still a lot of PDF, as a supplementary format to comply with legal accessibility rules. EPUB is coming up fast and is considered a strategic direction because of the benefits it brings in accessibility and responsiveness.

What is IBM’s next step in deploying DITA?

The big one is around Lightweight DITA, especially implementing the DITA model in a non-XML fashion, for example with a Markdown-compatible implementation of DITA. We have content that needs to be editable in a GitHub environment, without access to a special editor and using default GitHub permissions and infrastructure. We know we have Markdown requirements, and the Lightweight DITA mapping from XML DITA to Markdown DITA is already being used. Thanks to the DITA Open Toolkit, we already have support for Markdown input to and output from DITA. We are also able to publish content to HTML5 with semantic markup that maps back to DITA and is consumable as DITA. This lets us simplify processing flows and relationships. You can have a site that manages its content in any source format, but then publishes it as Lightweight DITA in whichever form (XML, HTML, Markdown…) is easiest, and that published content becomes consumable by DITA processing without any extra work.

What is Lightweight DITA?

It’s a simplified schema for topics and maps, with fewer elements, tighter content models, and a simplified specialization architecture for defining new types. Where full DITA offers three ways of doing something, Lightweight DITA offers only one. That simplification makes it possible to implement DITA without XML, for example using Markdown or HTML5. This brings the advantages of structured authoring to where people are already creating content, rather than trying to get every author onto one content platform. Lightweight DITA can be the glue that ties together many different authoring platforms across a company. It can also be an on-ramp to full DITA. Lightweight DITA is particularly attractive to companies that need a faster ROI and an easier learning curve.
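To make the idea of one model with several serializations more concrete, here is a minimal Python sketch that renders a small XML topic as Markdown. It is illustrative only: the topic uses a hypothetical handful of element names modeled on Lightweight DITA's XML form (`topic`, `title`, `shortdesc`, `body`, `p`, `ol`, `ul`, `li`), and the `topic_to_markdown` function is an assumption for this example. It is neither the official Lightweight DITA Markdown mapping nor the DITA Open Toolkit's implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical topic using a small subset of element names modeled on
# Lightweight DITA's XML form; real documents carry a DOCTYPE and more markup.
XDITA_TOPIC = """<topic id="install">
  <title>Installing the product</title>
  <shortdesc>Install the product before first use.</shortdesc>
  <body>
    <p>Before you begin, check the system requirements.</p>
    <ol>
      <li><p>Download the installer.</p></li>
      <li><p>Run the installer.</p></li>
    </ol>
  </body>
</topic>"""


def text_of(elem):
    """Collapse all text inside an element into one whitespace-normalized string."""
    return " ".join("".join(elem.itertext()).split())


def topic_to_markdown(xml_source: str) -> str:
    """Render a minimal XML topic as Markdown (a sketch, not an official mapping)."""
    topic = ET.fromstring(xml_source)
    # The topic title becomes a level-1 heading.
    lines = [f"# {text_of(topic.find('title'))}", ""]
    # In this sketch, the short description becomes the first paragraph.
    shortdesc = topic.find("shortdesc")
    if shortdesc is not None:
        lines += [text_of(shortdesc), ""]
    body = topic.find("body")
    if body is not None:
        for block in body:
            if block.tag == "p":
                lines += [text_of(block), ""]
            elif block.tag in ("ul", "ol"):
                # Ordered lists get "1." style markers, unordered lists get "-".
                for n, item in enumerate(block.findall("li"), start=1):
                    marker = f"{n}." if block.tag == "ol" else "-"
                    lines.append(f"{marker} {text_of(item)}")
                lines.append("")
    return "\n".join(lines).rstrip() + "\n"


print(topic_to_markdown(XDITA_TOPIC))
```

The design point the sketch tries to capture is the one made above: because the element set is small and each structure has only one way to be expressed, a mapping between serializations can stay short and predictable. A real mapping would also have to cover maps, metadata, and the round trip from Markdown back to XML.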

Is Lightweight DITA available?

Some parts are, with the caveat that it is still in development. We released draft document types for topic and map last year, and they are already incorporated into some authoring tools, such as Oxygen®, Simply XML®, and FontoXML®. We have also created two test sets of Lightweight DITA specializations: one for marketing content types and one for learning and training content types. So a company can implement Lightweight DITA today, bearing in mind that it will evolve.

You mentioned the 1500 technical writers using DITA at IBM; is it adopted elsewhere within your organization?

Some smaller teams are using DITA, for example in our announcement letter process. Other teams are interested. It tends to be cyclical: when teams are in a position to revisit their infrastructure and content choices, DITA is on the list of things they look at. Lightweight DITA facilitates conversions because it is perceived as simpler and less technical than full DITA. It lets teams free their content from its dependency on a platform without requiring them to leave that platform. They get access to tools outside their system by using Lightweight DITA as an interchange standard, which can also be used for sharing and reusing content. As an Enterprise Content Technology Strategist, that’s where I’m really focused: on the connections between the silos. I don’t care what is inside their black box, as long as their box can exchange content with the other boxes on my diagram.

How has IBM managed the localization of DITA content?

We translate into about ten standard languages and about fifty languages in total; our local offices decide on a per-product basis whether content needs to be translated. We opted early on to integrate our linguistic capabilities: our in-house translators have supported DITA from day one, which means DITA and localization are optimally integrated.

WhP specializes in DITA content localization.