In 2012, the British Film Institute launched an ambitious programme to digitise archive film footage in a bid to preserve Britain's moving-image history. 'Unlocking Film Heritage' would see more than 10,000 films from regional and national archives digitised, preserved and ultimately made available to the public.
It is no exaggeration to say the work the BFI archivists are undertaking is preventing film heritage from being lost in time. Without the appropriate technology, tools and skill sets, the content on the 'carriers' (industry lingo for the physical objects that store the content) will be locked there forever. The team has a phrase for the phenomenon: "stranded in the analogue domain".
"All videotape formats are effectively obsolete, unsupported, not manufactured, so it's a race against time to digitise the collection from the BFI plus all the collections from around the UK, to get them off those videotapes at high volume, as quickly and cheaply as we can, to preserve them in the digital domain," Stephen McConnachie, head of data for collections and information at the BFI, explained to Computerworld UK.
The original five-year Unlocking Film Heritage plan began with a lengthy procurement process that led the BFI's archive team to choose tape storage from Spectra Logic, brokered by systems integration specialists OvationData.
"We went through this procurement for that five-year public project to digitise 10,000 films and we started building a solution that is really a bunch of systems, so we tend to call it an infrastructure," McConnachie says.
When McConnachie finished his master's degree in film and television in 1995, he travelled to London and got a job working with the BFI, mostly opening film cans, looking at what was in them, documenting them, and adding them to the collection - the analogue world of moving image archiving.
But he gradually became more interested in the systems that document and manage the content, so he "went from being the person opening the can to look at the film, to being the person developing and managing the systems those people use," before moving on to his current work preserving the "1s and 0s" created from the "old, delicate films and videos".
Among the gems he draws attention to is a Victorian-era striptease, shot outdoors on a living-room 'set' because of the lighting and filming restrictions of the time, which brings to life a period of history usually confined to the pages of books.
The BFI also found a surprise hit in a film that documented the development of Milton Keynes, with thousands of views across the BFI's various web channels.
Once the procurement process for Unlocking Film Heritage was settled, the team set about building, troubleshooting and configuring the infrastructure, which took about two years. When it was all working, they began an 'ingest' project to fill it with the national collection data. That took another two years, and the stored collection currently amounts to about 2 petabytes. The project saw archivists bring together and digitise regional and national films from stores all over Britain.
This initial five-year project drew to a close in 2017, but the BFI has since started a new preservation programme, Heritage 2022, which aims to digitise videotapes and will reuse the infrastructure first created for Unlocking Film Heritage.
Two Spectra Logic T950 tape libraries were installed at a BFI site in Hertfordshire, physically separated for resilience and joined by a 10GB per second optical fibre cable.
They are configured with LTO-6 tape drives and media as well as IBM TS1150 tapes for "media diversity", which adds resilience by not relying on a single technology remaining supported. One of the libraries can scale to store 20PB without expansion, which the BFI hopes will future-proof the systems for between five and 10 years. But more on that later.
A major problem for archivists is the sheer size of the files created for digital preservation. A single film scanned at 2K can reach 2TB, and as resolution rises to 4K and beyond, file sizes grow too.
"When you scan a film you take a really big resolution photograph of every frame, and that essentially is our preservation master," says McConnachie. "There's a clone of that which exists in the two data tape libraries. We create what we call a proxy, and that proxy is a lower bitrate, lower quality, much smaller viewing copy.
"For everything we ingest into our system, we create that access copy, and that is stored in a big NAS - a huge storage server in the middle of our infrastructure. That stores them online in a big spinning disk server and that makes them available to the internet with loads of access web applications," McConnachie says, adding that there are facilities at BFI Southbank where the public can watch archive material on screens.
Image by James Cumpsty
Meanwhile, a REST API gateway for the Spectra Logic systems, called BlackPearl, sits between the BFI team and the tape libraries themselves. McConnachie says this has been a "massive benefit" because it drastically simplified the team's ability to write its own applications and to interact and integrate with the system.
"We can now put data in and get data out and ask questions about the data," he explains. "That's kind of revolutionised our workflow - we are able to build huge pipelines of automations and really fast processes in ways that would have been very expensive before in different models. We'd have to pay companies to build integrations. But now we can do it ourselves."
Race against time
The "mammoth" task of rescuing film from obsolete formats is, unsurprisingly, not without its challenges.
"Literally 30 years before [obsolete consumer video format Betamax] there was the broadcast sector's video formats that were used to create television programmes," McConnachie says. "They were used to broadcast them, they were used to store them.
"So you have hundreds of formats, all of them completely obsolete. And by obsolete what we mean is you cannot buy the machines to play them on the market. All of the manufacturing has stopped. In other words, you can only buy the machines second-hand - and if you have the technical skills to keep them going you can keep digitising."
There is, of course, only a limited supply of obsolete machines. The machines and their spare parts are extremely difficult to track down, so the team scours second-hand sellers on eBay to buy up the remaining stock, keeping it in storage for the inevitable wear, tear and breakdowns.
The layers on video heads, for example, wear every time a tape passes across them. Then there are the mechanisms that move the tape, and the buttons that operate the machines.
"All of that wears out," says McConnachie. "As you digitise hundreds and hundreds and hundreds of things it wears and wears and wears, and if you can't buy parts then you have to cannibalise machines you have in stock to replace parts."
Image by James Cumpsty
"The second major challenge is the skills required to take those machines apart, replace parts, and clean them. They need constant cleaning, because materials come off the tape. In other words, the skills to repair the tape are also now 50 years old.
"What that means is as broadcasters stopped using tape - they're all file-based now - as storage tape usage stopped, people retired, or left the business, or found different work, so what you have is not just a machinery gap but a skills gap.
"There is a retirement tidal wave hitting the sector, where all of the people who are broadcast engineers and technicians for 20, 30, 40 years are approaching retirement age. That's a big challenge for us because we have a lot of tapes still to digitise so what we try and do is get the people with these skills to train younger staff in their teams, pass down their knowledge and document it.
"But it's very complex because there's hundreds of formats with hundreds of machines. That's our challenge, with not only hardware but with skills."
According to McConnachie, the BFI had to upskill existing staff, recruit from outside the organisation, and change department structures and job descriptions to meet all of these challenges.
But the work is fulfilling, even when these problems come thick and fast. When the team solves them and the pipeline is running as it should, the result is a "humming factory of digitisation, documentation, ingest, preservation, and access" for future generations to enjoy.
"When you see that whole A-Z of the lifecycle it's really satisfying. When you see it running smoothly, you see the film on the shelf, you barcode scan it, you get it to its digitisation room and it gets turned into 0s and 1s and they go into a system... then you see someone at the Southbank site sit down, and watch that film on a screen. That's a very satisfying job."
The anatomy of a film
Scanning a film to digitise it means using a machine that works much like a projector, but instead of shining light through the film to display the images, the device captures each frame as an ultra-high-resolution photograph. For a single feature film, this 'scanner' can produce 150,000 ultra-high-resolution images.
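To put those numbers in context, here is a rough size calculation. It is a sketch under stated assumptions, not the BFI's actual scan specification: it assumes full-aperture frame dimensions of 2048×1556 pixels for 2K and 4096×3112 for 4K, 10-bit RGB packed into 4 bytes per pixel (a common DPX layout), no compression and no file headers, together with the 150,000-frame count quoted for a feature film.

```python
def frame_bytes(width: int, height: int, bytes_per_pixel: int = 4) -> int:
    """Uncompressed size of one scanned frame, ignoring file headers.

    The default of 4 bytes per pixel assumes three 10-bit RGB samples
    packed into one 32-bit word, a common DPX layout (an assumption,
    not a documented BFI setting).
    """
    return width * height * bytes_per_pixel


def film_bytes(frames: int, width: int, height: int) -> int:
    """Total uncompressed size of a scanned film at the given resolution."""
    return frames * frame_bytes(width, height)


FRAMES = 150_000  # frame count quoted for a single feature film
TB = 10**12  # decimal terabytes

# Assumed full-aperture scan dimensions in pixels.
RESOLUTIONS = {"2K": (2048, 1556), "4K": (4096, 3112)}

for label, (width, height) in RESOLUTIONS.items():
    size = film_bytes(FRAMES, width, height)
    print(f"{label}: {size / TB:.2f} TB")
```

Under those assumptions a 2K feature comes to roughly 1.9TB, in line with the 2TB figure McConnachie's team works with, and doubling the resolution in both dimensions for 4K quadruples it.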
Image by James Cumpsty
"Imagine storing 150,000 very high quality images to represent a single feature film," says McConnachie. "You really quickly learn that's a massive undertaking, even to move 150,000 images in one safe block across a network, you can very easily cause problems and performance issues, your network can be hugely slow, you can drop pixels or bits. So you have to be really careful when you're moving every bit of every pixel of every image from every frame of the film, safely and securely.
"It's a really complex object to preserve. You have to make sure you're doing that right, because if you do it wrong you've spent a lot of money doing all the work to preserve something that comes out unusable."
The 10GB per second connection that the BFI built with OvationData helps move those high-bitrate files across.
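How long does a single film take to cross that link? The sketch below is a back-of-envelope estimate, not a measurement. The link is described as 10GB per second, but network links are often rated in gigabits, so the sketch computes both readings; the 70% efficiency figure is a purely hypothetical allowance for protocol and disk overheads.

```python
def transfer_seconds(size_bytes: float, link_bits_per_s: float,
                     efficiency: float = 1.0) -> float:
    """Seconds to move size_bytes over a link running at a fraction of line rate."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return size_bytes * 8 / (link_bits_per_s * efficiency)


FILM_BYTES = 2 * 10**12  # one 2TB film scan, the upper figure quoted earlier
LINKS = {
    "10 Gb/s": 10 * 10**9,  # ten gigabits per second
    "10 GB/s": 80 * 10**9,  # ten gigabytes per second, expressed in bits
}

for name, bits_per_s in LINKS.items():
    for efficiency in (1.0, 0.7):  # 0.7 is a hypothetical overhead allowance
        minutes = transfer_seconds(FILM_BYTES, bits_per_s, efficiency) / 60
        print(f"{name} at {efficiency:.0%} of line rate: {minutes:.1f} min")
```

At the full line rate of 10 gigabits per second, a 2TB film takes about 27 minutes; read as 10 gigabytes per second, it takes under four.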
The BFI deals mostly in extremely large, uncompressed formats, unlike the compressed files most consumers are familiar with. That leads to yet more complications: the organisation had to procure an enterprise-grade firewall able to handle such heavy traffic, especially because at the end of the pipeline the files are prepared for free access on the web.
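McConnachie's warning that a careless transfer can drop pixels or bits points to a standard safeguard in digital preservation: fixity checking. The BFI's own checks are not described here, so the following is a generic sketch, assuming SHA-256 checksums and a folder of per-frame files (the `frame_*.dpx` names are hypothetical). It hashes every file before and after a copy and fails loudly if anything differs.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path
from typing import Dict


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks (1 MiB by default) so frames never sit wholly in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest(folder: Path) -> Dict[str, str]:
    """Map each file's path (relative to folder) to its SHA-256 checksum."""
    return {
        path.relative_to(folder).as_posix(): sha256_of(path)
        for path in sorted(folder.rglob("*"))
        if path.is_file()
    }


def copy_with_fixity(src: Path, dst: Path) -> Dict[str, str]:
    """Copy a folder of frames, then verify every file arrived unchanged."""
    before = manifest(src)
    shutil.copytree(src, dst)
    after = manifest(dst)
    if after != before:
        damaged = sorted(name for name in before if after.get(name) != before[name])
        raise OSError(f"fixity check failed; damaged or missing: {damaged}")
    return before


# Usage: three placeholder "frames" copied within a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    scan = Path(tmp) / "scan"
    scan.mkdir()
    for i in range(3):
        (scan / f"frame_{i:06d}.dpx").write_bytes(bytes([i]) * 1024)
    verified = copy_with_fixity(scan, Path(tmp) / "copy")
    print(f"{len(verified)} frames copied and verified")
```

Hashing in fixed-size chunks keeps memory use flat however large each frame is, and a manifest like this can be stored alongside the preservation master and re-checked later to catch silent corruption.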
If preservation is about sustaining the past for generations to come, then future-proofing any system that is built or procured poses intrinsic challenges, especially in the fast-moving digital world.
"When we procure the systems we give ourselves the ambition to buy and build something that would be reasonably sustainable for a 20-year window," says McConnachie. "When we proposed a 20-year number, people were very cautious because the sector moves so quickly, the domain moves so quickly, that even a 20-year assumption is very ambitious. Some of these technologies come and go within 10 years.
"You have to be really careful as an archivist - we have got films in the collection from 1895, so when you think how long we've kept the physical things safely and make them usable, you have to think: are we confident we are doing this correctly to the point where in 120 years these digits will be reusable?"
Even so, 20 years in digital preservation is "like an infinity". The technology won't stand still: minor refreshes are planned every two years, with potentially large rejuvenation projects every five years.
"The basic architecture, in other words, the data tape libraries with some of their components - we're hopeful we can extend them to a 20-year window before we really need to tear it up and start again," he says. "Put it that way."
A useful component of the Spectra Logic solution in this respect is the BlackPearl interface mentioned earlier, which means data can be easily retrieved and even handled automatically with scripts. For the BFI, the fact that its own developers could work with the system without outside help was a major advantage of the Spectra Logic machines.
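The BlackPearl API itself is not reproduced here. As a hedged illustration of why a gateway that lets a team "put data in and get data out and ask questions about the data" makes in-house automation cheap, the sketch below codes a tiny ingest step against a hypothetical `ArchiveGateway` interface, with an in-memory stand-in so it runs locally. None of these names come from Spectra Logic; a real integration would go through the vendor's own interface.

```python
from typing import Dict, Iterable, List, Protocol


class ArchiveGateway(Protocol):
    """Hypothetical put/get/query interface; NOT the BlackPearl API."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...

    def exists(self, key: str) -> bool: ...


class InMemoryGateway:
    """Local stand-in for a tape gateway, so the pipeline can be exercised."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def exists(self, key: str) -> bool:
        return key in self._objects


def pending(gateway: ArchiveGateway, keys: Iterable[str]) -> List[str]:
    """Ask a question of the archive: which of these keys are not stored yet?"""
    return [key for key in keys if not gateway.exists(key)]


def ingest_batch(gateway: ArchiveGateway, batch: Dict[str, bytes]) -> List[str]:
    """Store every item that is still pending; return the keys written.

    Re-running the same batch is safe: already-archived items are skipped.
    """
    written = pending(gateway, batch)
    for key in written:
        gateway.put(key, batch[key])
    return written


# Usage with hypothetical object keys and placeholder bytes.
gateway = InMemoryGateway()
batch = {
    "film-0001/preservation-master": b"\x00" * 16,
    "film-0002/preservation-master": b"\x01" * 16,
}
print(ingest_batch(gateway, batch))  # both keys are written on the first run
print(ingest_batch(gateway, batch))  # [] on a re-run: nothing left to do
```

Because the pipeline depends on only three operations, swapping the in-memory stand-in for a client of a real storage system, or for a different vendor later on, leaves the ingest logic untouched.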
"One of the things you do when you build a digital preservation system is you plan for divorce," says McConnachie. "In fact you write a pre-nup before you get into the business, because you know no system in this domain will be forever. We kind of wrote the pre-nup by buying open, standardised technologies... so if we make it to 20 years I think we'll be very happy."