Over the past few years, here at O'Reilly we've slowly been migrating much of the production for our frontlist from FrameMaker over to DocBook XML. There are some real benefits to working with content that's in XML, and along the way we also found some interesting ways to capture many of those benefits within our legacy FrameMaker workflow. The topic bubbled up during a recent internal conversation, and I realized there wasn't much publicly posted tying the (open source) pieces together. This post aims to fix that.
FrameMaker includes a text-based file format called "Maker Interchange Format" or MIF (pdf). Its primary purpose is to provide a backward-compatible file format, so that earlier versions of Frame can open files created with newer versions (discarding features and content that are not supported). A secondary benefit is that such a text-based format allows for easy manipulation using standard text tools.
Here's a small snippet of MIF:
<Para <Unique 87211> <PgfTag `Body'>
<PgfNumString `'>
<ParaLine
<String `Second'>
<Char EmDash>
<String `and this is the biggie'>
<Char EmDash>
<String `the event handler will run even if you save '>
> # end of ParaLine
The syntax is a bit awkward (though infinitely more readable than Adobe's similar "INX" format for InDesign), but with a bit of practice, it's quite simple to use text-processing tools like sed to make changes that would be difficult or just time-consuming to do from within Frame itself. However, more complex changes require ever more acrobatic maneuvers when using basic text-processing tools. The more state you have to track while navigating one of these 100K+ line documents, the more you wish you could just use XPath.
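To give a flavor of the simple, sed-style kind of edit, here's a minimal Python sketch that renames a paragraph tag everywhere it appears in a MIF file. The function name and the "BodyFirst" tag are made up for illustration, and the pattern assumes the tag name contains no escaped quote characters.

```python
import re

# MIF strings open with a backtick and close with an apostrophe.
PGF_TAG = re.compile(r"<PgfTag `([^']*)'>")

def rename_pgf_tag(mif_text, old, new):
    """Rename one paragraph tag wherever it appears (hypothetical helper)."""
    def swap(match):
        if match.group(1) == old:
            return "<PgfTag `%s'>" % new
        return match.group(0)  # leave every other tag untouched
    return PGF_TAG.sub(swap, mif_text)

sample = "<Para <Unique 87211> <PgfTag `Body'>\n<PgfNumString `'>\n"
print(rename_pgf_tag(sample, "Body", "BodyFirst"))
```

This works because the change needs no context. As soon as the edit depends on where in the document you are (say, only Body paragraphs inside a particular table), a one-line pattern stops being enough.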
And indeed you'll notice that MIF looks awfully close to XML, though it's not quite XML. But it's close enough that with a bit of hacking using a Perl module, I was able to coerce MIF into an XML version. That same snippet above as XML (we settled on "MX" as the file-format extension and name) looks like this:
<Para> <Unique>87211</Unique>
<PgfTag>`Body'</PgfTag>
<PgfNumString>`'</PgfNumString>
<ParaLine>
<String>`Second'</String>
<Char>EmDash</Char>
<String>`and this is the biggie'</String>
<Char>EmDash</Char>
<String>`the event handler will run even if you save '</String>
</ParaLine>
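To make the mapping concrete, here's a rough Python sketch of the idea. It is not the Perl module or the production converter, just an illustration: every "<Tag" becomes an opening element, every ">" closes the most recently opened one, and strings and bare values become text. The function name is hypothetical, and the string pattern assumes MIF strings never contain a raw apostrophe.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

TOKEN = re.compile(r"""
    <(?P<open>\w+)                    # statement opener such as <Para
  | (?P<close>>)                      # the > that ends a statement
  | (?P<string>`[^']*')               # MIF string, backtick to apostrophe
  | (?P<comment>\#[^\n]*)             # comment, runs to end of line
  | (?P<data>[^\s<>`\#][^<>`\#\n]*)   # bare value such as 87211 or EmDash
""", re.VERBOSE)

def mif_to_mx(mif_text):
    """Convert MIF statements to XML elements (illustrative sketch only)."""
    out, stack = [], []
    for m in TOKEN.finditer(mif_text):
        kind = m.lastgroup
        if kind == "open":
            stack.append(m.group("open"))
            out.append("<%s>" % stack[-1])
        elif kind == "close":
            if stack:
                out.append("</%s>" % stack.pop())
        elif kind in ("string", "data"):
            # Keep the MIF quotes as-is; escape &, < and > for XML.
            out.append(escape(m.group(kind).strip()))
        # Comments are dropped.
    while stack:
        # Close anything a fragment left open, so the result is well-formed.
        out.append("</%s>" % stack.pop())
    return "".join(out)

SAMPLE = """<Para <Unique 87211> <PgfTag `Body'>
<ParaLine
<String `Second'>
<Char EmDash>
<String `and this is the biggie'>
> # end of ParaLine
"""

mx = mif_to_mx(SAMPLE)
ET.fromstring(mx)  # raises ParseError if the output is not well-formed
print(mx)
```

The stack is what a plain regex substitution lacks: a bare ">" in MIF says nothing about which statement it closes, so the converter has to remember what is currently open.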
As soon as it's in XML, you're able to bring the vast XML toolset to bear on filtering, searching, processing, and transforming those documents (which also explains the palpable excitement when Microsoft added an XML file format to Word).
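As a small taste of what that buys you, here's a sketch that pulls the plain text out of every Body paragraph using the limited XPath support in Python's standard library. The wrapper element, the second paragraph, and the helper name are invented for the example; only the first paragraph mirrors the snippet above.

```python
import xml.etree.ElementTree as ET

# Hypothetical MX document: <MX> is a made-up wrapper element.
MX = """<MX>
<Para><Unique>87211</Unique>
<PgfTag>`Body'</PgfTag>
<ParaLine>
<String>`Second'</String>
<Char>EmDash</Char>
<String>`and this is the biggie'</String>
</ParaLine>
</Para>
<Para><Unique>87212</Unique>
<PgfTag>`Heading'</PgfTag>
<ParaLine><String>`Saving'</String></ParaLine>
</Para>
</MX>"""

# Map MIF character names to real characters (only one needed here).
CHARS = {"EmDash": "\u2014"}

def paragraph_text(root, tag="Body"):
    """Yield (unique id, plain text) for each paragraph with the given tag."""
    # The predicate keeps only <Para> elements whose <PgfTag> child matches.
    for para in root.findall(".//Para[PgfTag=\"`%s'\"]" % tag):
        parts = []
        for node in para.findall("./ParaLine/*"):
            if node.tag == "String":
                parts.append(node.text[1:-1])  # strip the MIF quotes
            elif node.tag == "Char":
                parts.append(CHARS.get(node.text, ""))
        yield para.findtext("Unique"), "".join(parts)

root = ET.fromstring(MX)
for unique, text in paragraph_text(root):
    print(unique, text)
```

The "only paragraphs tagged Body" condition is a single path expression here. With line-oriented tools, the same filter means tracking which paragraph you are inside as you scan.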
The Perl method worked perfectly as a proof of concept, but for technical reasons was just too slow for practical use. One of our then-engineers, Andy Bruno, wrote some software to accomplish the same result much more quickly.
As for the return trip from XML back into MIF, because the input was XML we were able to use a very simple XSLT transform.
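We did this step in XSLT, but the logic is small enough to show as a Python stand-in for readers who want to see the shape of it. This is a sketch under the assumption that every element is either a leaf holding a value or a container holding only child elements; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

def mx_to_mif(elem, depth=0):
    """Serialize an MX element back to MIF statements (illustrative sketch)."""
    pad = " " * depth
    children = list(elem)
    if not children:
        # Leaf: <Tag>value</Tag> becomes <Tag value>
        value = (elem.text or "").strip()
        return "%s<%s %s>" % (pad, elem.tag, value)
    # Container: open the statement, emit the children, then close it.
    lines = ["%s<%s" % (pad, elem.tag)]
    lines.extend(mx_to_mif(child, depth + 1) for child in children)
    lines.append("%s> # end of %s" % (pad, elem.tag))
    return "\n".join(lines)

para_line = ET.fromstring(
    "<ParaLine><String>`Second'</String><Char>EmDash</Char></ParaLine>"
)
print(mx_to_mif(para_line))
```

Because the MX form keeps the MIF quoting inside the element text, the return trip doesn't have to guess which values were strings and which were bare keywords.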
While we are certainly very enthusiastic about migrating much of our production over into a pure XML workflow (which means true single sourcing for print, web, and ebook), there are still a lot of legacy FrameMaker documents out there, and I hope some of these simple but powerful tools find continued use somewhere (and we do continue to use them regularly for a shrinking portion of our titles).
The tools and related links:
Tools of Change for Publishing is a division of O'Reilly Media, Inc.
© 2009, O'Reilly Media, Inc. | (707) 827-7000 / (800) 998-9938