Unlike many of my journalistic confrères, I did not seize on this when it came out: “OOXML and Office 2007 Conformance: a Smoke Test”, even though the following tantalising result emerged:
Such a test is only indicative, of course, but a few tentative conclusions can be drawn:
Word documents generated by today's version of MS Office 2007 do not conform to ISO/IEC 29500
Making them conform to the STRICT schema is going to require some surgery to the (de)serialisation code of the application
Making them conform to the TRANSITIONAL will require less of the same sort of surgery (since they're quite close to conformant as-is)
The main reason I did not join in the playground cries of “nah-nah, I told you so” was that I noticed this at the end of the same post:
To repeat the exercise with ISO/IEC 26300:2006 (ODF 1.0) and a popular implementation of OpenDocument. Will anybody be brave enough to predict what kind of result that exercise will have?
Again, only tentative conclusions can be drawn from a smoke test (readers unfamiliar with this term as applied to software testing are recommended to read the Wikipedia article on it before grumbling about the depth of the test, please).
For ISO/IEC 26300:2006 (ODF) in general, we can say that the standard itself has a defect which prevents any document claiming validity from being actually valid. Consequently, there are no XML documents in existence which are valid to ISO ODF.
Even if the schema is fixed, we can see that OpenOffice.org 2.4.0 does not produce valid XML documents. This is to be expected and is a mirror-case of what was found for MS Office 2007: while MS Office has not caught up with the ISO standard, OpenOffice has rather bypassed it (it aims at its consortium standard, just as MS Office does).
Any fool could see that one coming – and I offer myself as proof. But it takes rather more than foolishness to critique what exactly these results mean, since it hinges on extremely subtle (and – whisper it – rather dull) technicalities about what should be tested against what and how.
Fortunately, Rob Weir either finds this kind of stuff deeply exciting, or is encouraged by his employer, IBM – an ODF supporter – to get stuck in anyway. And my, how he gets stuck in:
In the end we should put this in perspective. Can OpenOffice produce valid ODF documents? Yes, it can, and I have given an example. Can OpenOffice produce invalid documents? Yes, of course. For example when it writes out a .DOC binary file, it is not even well-formed XML. And we've seen one example, where via a conversion from OOXML, it wrote out an ODF 1.1 document that failed validation. But conformance for an application does not require that it is incapable of writing out an invalid document. Conformance requires that it is capable of writing out a valid document. And of course, success for an ODF implementation requires that its conformance to the standard is sufficient to deliver on the promises of the standard, for interoperability.
It is interesting to recall the study that Dagfinn Parnas did a few years ago. He analyzed 2.5 million web pages. He found that only 0.7% of them were valid markup. Depending on how you write the headlines, this is either an alarming statement on the low formal quality of web content, or a reassuring thought on the robustness of well-designed applications and systems. Certainly the web seems to have thrived in spite of the fact that almost every web page is in error according to the appropriate web standards. In fact I promise you that the page you are reading now is not valid, and neither is Alex Brown's, nor SC34's, nor JTC1's, nor Ecma's, nor ISO's, nor the IEC's.
So I suggest that ODF has a far better validation record than HTML and the web have, and that is an encouraging statement. In any case, Alex Brown's dire pronouncements on ODF validity have been weighed in the balance and found wanting.
Now, as far as I am able to follow the intricacies of Weir's argument, ODF seems to emerge pretty well from his analysis. But that's not really what interests me here. The point is that standards and conformance to them are incredibly fiddly, complicated things, best left to experts who can argue over the niceties. And precisely because they are complicated, and the issues are subtle, the average IT manager on the Clapham Omnibus is going to be hopelessly overtaxed by the whole area – just like me.
What this means in practice is that Microsoft will be able to get away with blue murder, by deftly moving between all the different kinds of standards – ECMA standards, putative ISO standards, de facto standards etc. – until people charged with procuring office software will simply throw their hands up and sign on the dotted line for another ten years of Microsoft Office bondage.
Now, you might argue that the ODF side can play the same games, and I agree that some of the more, er, commercially-minded outfits might well be tempted. But there's a big difference from the OOXML world in that ODF is today a key part of the free software world. As such, there are crates of nitpickers and argumentative technical pub bores who really care about ODF and its inner wonders, and will delight in pouncing on such inaccuracies – not least because there is no love lost between them and the commercial side of things. If Sun or IBM or anyone else misbehaves, somebody will spot it, and blog about it.
OOXML, on the other hand, is essentially a product of one company, with practically no open source community around it (Novell hardly counts). This means there are unlikely to be many voices from within that community to point out inconvenient truths about OOXML conformance and suchlike, and how it really fares compared to ODF. That will be left to the few people on the outside who are both willing and able to provide critiques backed up by personal research – people like Rob Weir.
Checking standard conformance is a dirty job, but someone's got to do it.