Sunday, May 16, 2010

Is PDF an open standard?

Is PDF an open standard? Duff Johnson, CEO of Appligent Document Solutions, weighs in:

On May 13, Adobe founders John Warnock and Charles Geschke stepped up to the microphone with a response to Steve Jobs' open letter about Flash. They argue that Adobe has acted on open standards, while Apple offers mere words.

At the outset, I must acknowledge that I owe my livelihood to the genius of these two gentlemen. The inventors of PostScript and PDF and the founders of Adobe Systems, Warnock and Geschke are gods in my pantheon. They are the founding fathers of technologies that have been instrumental in making computers relevant to the everyday operations of modern government and business.

That said, claims about PDF being a true open standard need to be placed in context.

Adobe Systems has published the PDF Reference, the rulebook for PDF developers, since 1993. At the very beginning, if you wanted to make, view or manipulate PDFs, you bought the book in a store for a few dollars. Soon it was (and still is) available online at no charge.

On July 1, 2008, version 1.7 of the PDF Reference was rewritten as ISO 32000, a document maintained by committees under the auspices of the International Organization for Standardization (ISO). These committees are made up of individual representatives of interested parties, who meet openly under parliamentary rules. Anyone can observe and participate. Although Adobe Systems is obviously heavily invested in the outcome of the committee's decisions, it has only one vote at the table, the same as any other participant.

By now, the rulebook for PDF is relatively mature and precise in its language. It was not always so. Adobe's very openness, its willingness to let third parties make their own PDFs before the PDF Reference was a mature document, was and continues to be a source of pain.

[Figure: Three of five PDF viewers displayed this PDF incorrectly.]

When millions of PDF files from hundreds of different applications started flying around, two major problems with the rulebook for PDF emerged.

First, while the Reference set rules, it was not a cookbook; it included no recipes for how to create content on a PDF page.

Second, the Reference was ambiguous in some areas, and it gave other matters too little consideration or left them unaddressed entirely.

Adobe's software had to cope with these vagaries in real-world documents, so Adobe wrote more rules, crafting specific implementation details to address the issues it encountered in practice.

These new rules, however, lived in the software, not in the Reference. As the Reference developed, Adobe's implementation and the published rules began to diverge. It became possible to create a “legal” PDF file that otherwise perfectly serviceable software couldn't handle quite right (or handled dead wrong). In fact, because the early versions of the PDF Reference were so vague (relatively speaking), the range of oddities that were legal in a PDF was very wide indeed. A lot of sloppy PDF software was (and still is) written for this reason.

I remember discussing this problem with Adobe developers in the late 1990s. First and foremost, we all knew PDF had to be reliable. PDFs had to display the same way on-screen and in print, no matter the platform. The problem with these “legal” but otherwise oddball PDF files was that if they displayed incorrectly in Adobe Reader, Adobe (not the PDF's producer) would get the blame.

A pattern took hold in which poorly structured PDF files roamed around in the wild, and the problem has worsened over time. As PDF has grown more popular, more and more applications of widely varying quality produce bad PDF.

Adobe's solution was to engineer Adobe Reader to handle all the various oddball PDF files out there. It's one of the main reasons Adobe Reader is a larger application to download and install than its rivals. Reader includes lots of code to deal with the thousands of different types of exceptions to “good” PDF that Reader users worldwide can and will encounter on a regular basis.

In its attempt to ensure that even the sloppiest PDF files still worked, Adobe created a situation in which developers could use (and have used) Adobe Reader as the reference implementation for their PDF software.

In 2010, there is still no alternative to Adobe Reader when it comes to validating third-party software.

As the vice-chair of ISO 32000, I am bothered by that, and if your organization relies on PDF being an International Standard, it should bother you too.

To make the final move in ensuring PDF is a durable international standard, Adobe should release its suite of PDF files used to test Adobe Reader. This could take several forms, the simplest of which would be a collection of PNG images demonstrating the authoritative rendering of example PDF pages.
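To make the reference-image idea concrete, here is a minimal Python sketch of how a viewer's rendering of a test page might be checked against an authoritative PNG. It assumes both images have already been decoded into rows of (R, G, B) tuples (rendering the PDF page and decoding the PNG are outside its scope). The function names, the per-channel tolerance and the pass threshold are hypothetical choices for illustration, not anything Adobe or ISO has specified.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]      # (R, G, B), each 0-255
Image = List[List[Pixel]]         # rows of pixels

def mismatch_fraction(rendered: Image, reference: Image, tolerance: int = 8) -> float:
    """Fraction of pixels where any channel differs by more than `tolerance`.

    Hypothetical helper: the tolerance is an assumed allowance for harmless
    anti-aliasing differences between renderers.
    """
    # Both images must have identical dimensions to be compared pixel by pixel.
    if len(rendered) != len(reference) or any(
        len(row) != len(ref_row) for row, ref_row in zip(rendered, reference)
    ):
        raise ValueError("rendered and reference images must have the same dimensions")
    total = 0
    bad = 0
    for row, ref_row in zip(rendered, reference):
        for px, ref_px in zip(row, ref_row):
            total += 1
            if any(abs(a - b) > tolerance for a, b in zip(px, ref_px)):
                bad += 1
    return bad / total if total else 0.0

def passes(rendered: Image, reference: Image,
           tolerance: int = 8, max_mismatch: float = 0.001) -> bool:
    """Hypothetical pass/fail rule: at most `max_mismatch` of pixels may differ."""
    return mismatch_fraction(rendered, reference, tolerance) <= max_mismatch

if __name__ == "__main__":
    white, gray = (255, 255, 255), (128, 128, 128)
    reference = [[white, white], [white, white]]
    rendered = [[white, white], [white, gray]]   # one pixel rendered wrongly
    print(mismatch_fraction(rendered, reference))  # 0.25
    print(passes(rendered, reference))             # False
```

The small per-channel tolerance is one way to let different viewers pass despite minor smoothing differences, while still failing pages where content is missing, misplaced or drawn in the wrong color.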

This test suite should be referenced in ISO 32000-2, the forthcoming update to the International Standard for PDF.

Only when this step is taken will it become possible to validate software against the open standard of ISO 32000 without the proprietary Adobe Reader, an objective that is fundamental to the project of PDF as an International Standard.

Establishing an open test suite will make PDF truly an open standard in the spirit of Warnock and Geschke's letter. The advantages for consumers will be substantial. Adobe and other software developers can produce conversion software to repair older, non-conforming files.

With no further excuse for sloppy code, non-compliant software will tend to die away, removing a major source of problems.

PDF will become truly reliable, based not only on an International Standard but on one that can be readily validated.

Adobe will have begun the process of freeing itself from supporting old (and now invalid) PDFs, and will eventually be able to redirect engineering resources away from propping up other people's software and into creative development.

I can't imagine a world without PDF; if it didn't exist it would have to be invented. PDF is indeed an open standard, but it's incomplete. It's time to finish the story and end the practice of making Adobe Reader a de facto reference implementation.

What do you think?
