Britain’s National Archives has embarked on a plan to make terabytes of government data, locked up mostly in proprietary Microsoft file formats, viewable to the public in its original form.
The National Archives, the repository for government records, holds digital records in hundreds of esoteric file formats, but the vast majority of its data is stored in legacy Microsoft Office formats, said Natalie Ceeney, chief executive.
Changes in software and operating systems have made viewing those documents in their original format impossible, said David Thomas, director of technology and chief information officer for the National Archives.
"We're not building a museum of old computers here," Thomas said. "We want to make it [National Archive content] readable on current desktop technology."
Microsoft has offered to help the National Archives use Virtual PC 2007, a virtualisation product that allows multiple operating systems, including legacy OSes such as Windows 95, to run on a single piece of hardware. Microsoft is also providing older versions of Windows as well as Office applications for the project.
The technology would offer the public the advantage of viewing documents in the form they were created, which can add context and depth, Ceeney said.
The National Archives receives much of its government information through a secure intranet, and that data is backed up to tape, Thomas said. Tape storage is the cheapest and the most robust way to keep data, so there are no old floppy disks around, he said. So far, the National Archives has about 580 terabytes of digital data.
Eventually, the National Archives hopes to have a system in which a citizen could use a computer at the National Archives running Virtual PC 2007 to view, for example, an older Microsoft Word document in its original form. A further step would be to create a way for people to do that over the Internet, he said.
At the National Archives, in Kew, Microsoft argued hard for the default Office 2007 file format, Open XML (extensible markup language).