Old file formats battle extinction threat

The European Union is co-funding a project to prevent application file formats becoming extinct.


A group of national libraries, research institutes, IBM and Microsoft has formed the PLANETS (Preservation and Long-term Access through NETworked Services) consortium to avert a threatened digital black hole as electronic document file formats fall out of use and vanish. Documents created by outdated and out-of-use applications may no longer be readable, as their file formats are, in effect, extinct. This problem will get worse in the future.

For example, the original word processing file formats from IBM, Wang, Digital Equipment and other vendors are no longer used. Unless modern word processing packages can import them, or emulation software can access the files, the electronic records are unavailable.

This problem is far larger than the inexorable replacement of older storage media, such as the floppy disk, by newer media. That problem can, in theory, be solved simply by copying everything to Plasmon's UDO, which has a 50-plus year lifespan. But there is no point in doing this if, after fifty years, the original software needed to read and display the stored information no longer exists.

Adam Farquhar, Head of e-Architecture at the British Library, which is leading the PLANETS Project, said: “As past and current computer hardware and software becomes obsolete, digital information reliant on this technology becomes increasingly hard to find, view, search and re-use. There is a growing consensus on the need to act now to avoid a gaping hole in our cultural and scientific record.”

A PLANETS statement says 'The PLANETS consortium estimates that EU member countries produce around 5 billion documents per year; of this total, around 2 percent (100 million documents per year) comprise information that is worth archiving. Around 2 million documents out of this sub-total are held in formats that constitute a long-term preservation risk. Taking into account the production costs of these documents – along with the estimated worth of the information to others – many millions of Euros-worth of information currently languishes in endangered formats.'

The EU has agreed to help back the PLANETS project for four years so that it can devise and deliver a way around this problem. It will contribute 8.6 million euros of the total project spend of 14 million euros.

A PLANETS presentation mentions the IBM concept of a Universal Virtual Computer (UVC). This is based on a research paper by Raymond Lorie of IBM's Almaden Research Centre.

This paper states: "an application program written today would generate a data file, which is archived for the future. In order for the file to be understood at a later date, a program P would also be archived, which can decode the data and present it to the client in an understandable form. Program P would be written for a UVC machine. ... In the future, the UVC Interpreter interprets the UVC instructions that emulate the old instructions; that emulation essentially produces an equivalent of the old machine, which then executes the original application code. The execution yields the same results as the original program."

The UVC is actually software which would run on a current or future computer system. A Wikipedia note on the UVC states: "in contrast with normal (virtual machines) the UVC is designed to be universal. That is, it offers a platform independent layer that will always remain the same. In this way, programs developed for the UVC are guaranteed to run anytime, in present and future."

The process requires that file formats used for storing electronic documents - meaning text, images, sounds, music, video, etc. - must be understood and documented so that decode software can read a stored file and represent it faithfully via the UVC.
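The idea can be illustrated with a toy sketch in Python. This is not the real UVC instruction set, which the article does not describe. The opcodes (READ, SUB, EMIT, JUMP, HALT), the "plus 3" legacy format and all names below are assumptions invented purely for illustration. The sketch shows a small, fixed interpreter standing in for the UVC, an archived data file, and an archived decoder "program P" written only in that interpreter's instructions.

```python
# Toy illustration only: a made-up instruction set, NOT the real UVC.

def run(program, data):
    """Interpret a list of (opcode, operand) pairs over archived bytes."""
    out = []   # decoded output characters
    pos = 0    # read position in the archived data
    acc = 0    # single accumulator register
    pc = 0     # program counter
    while pc < len(program):
        op, arg = program[pc]
        if op == "READ":
            # Load the next data byte, or jump to `arg` when data runs out.
            if pos >= len(data):
                pc = arg
                continue
            acc = data[pos]
            pos += 1
        elif op == "SUB":
            acc = (acc - arg) % 256
        elif op == "EMIT":
            out.append(chr(acc))
        elif op == "JUMP":
            pc = arg
            continue
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode {op!r}")
        pc += 1
    return "".join(out)


# Hypothetical "legacy format": each character stored as its code plus 3.
archived_data = bytes(ord(c) + 3 for c in "PLANETS")

# Hypothetical archived decoder, stored alongside the data file.
decoder = [
    ("READ", 4),     # 0: read a byte; go to HALT at end of data
    ("SUB", 3),      # 1: undo the format's +3 shift
    ("EMIT", None),  # 2: output the decoded character
    ("JUMP", 0),     # 3: loop back for the next byte
    ("HALT", None),  # 4: stop
]

decoded = run(decoder, archived_data)
print(decoded)
```

In this sketch, only the `run` function would need to be rewritten for a future machine. The archived data and decoder would stay untouched, which is the property the UVC approach relies on.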

Farquhar says national libraries and archives across Europe have a legal responsibility to safeguard digital information. They must provide sustained access to cultural and scientific knowledge and need to act to ensure that today’s digital information will be accessible for future generations.

The PLANETS project is their way of preventing widespread digital document extinction.
