The European Bioinformatics Institute plans to cut its storage footprint by 70 percent over the next two years, according to technical services head Steven Newhouse.
EBI, which specialises in storing and analysing large sets of scientific data, is part of the European Molecular Biology Laboratory, one of Europe's leading life sciences laboratories.
It manages 55 petabytes of data and receives about 12 million requests a month for life sciences data from global researchers. Data volumes grow by about 50 percent every year, Newhouse told ComputerworldUK.
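To give a sense of scale, the figures above can be compounded forward. The sketch below is a back-of-the-envelope projection only: it assumes the 50 percent annual growth rate stays constant and applies to the full 55 petabytes. Neither assumption comes from Newhouse, and the function name is purely illustrative.

```python
def project_storage(current_pb: float, annual_growth: float, years: int) -> list[float]:
    """Return the projected data volume in PB for year 0 through `years`.

    Assumes simple compound growth at a constant annual rate.
    """
    return [current_pb * (1 + annual_growth) ** year for year in range(years + 1)]


if __name__ == "__main__":
    # 55 PB today, growing 50 percent a year, over the two-year planning window.
    for year, volume in enumerate(project_storage(55, 0.50, 2)):
        print(f"Year {year}: {volume:.2f} PB")
```

Under those assumptions the institute would hold roughly 124 petabytes in two years' time, more than double today's volume.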
Newhouse’s 50-strong team have already virtualised the institute’s compute and analysis, but over the next two years they will also virtualise its database infrastructure using Delphix’s ‘Data as a Service’ platform.
The institute has more than 500 databases at the moment, mainly Oracle but also MySQL, PostgreSQL and NoSQL systems, he explained.
The databases are hosted across three data centres: one main data centre in Hinxton for research and production, an external data centre in Hemel Hempstead for public-facing services, and a smaller disaster recovery data centre, according to Newhouse.
“Previously our database activities weren’t virtualised. We ran a large number of physical servers, had been virtualising that physical infrastructure, but had no tech that would help us do the snapshotting, cloning and so on of databases themselves.
“It [the Delphix platform] will make it easier to test the databases, in terms of taking snapshots of databases for developers to go off and alter,” Newhouse said.
The platform will save “a lot of money” on storage, reduce the amount of data shipped between data centres and “save about 70 percent of disc space”, he added.
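The article does not describe how the Delphix platform works internally. As a general illustration of why snapshot-based clones can need far less disk than full copies, here is a minimal copy-on-write sketch. The class names and the block-level model are hypothetical and are not Delphix's API.

```python
class Snapshot:
    """Read-only set of data blocks shared by every clone made from it."""

    def __init__(self, blocks: dict[int, bytes]):
        self.blocks = dict(blocks)


class Clone:
    """Writable view over a snapshot that stores only the blocks it changes."""

    def __init__(self, base: Snapshot):
        self.base = base
        self.delta: dict[int, bytes] = {}

    def read(self, block_id: int) -> bytes:
        # Prefer the clone's own version; fall back to the shared snapshot.
        if block_id in self.delta:
            return self.delta[block_id]
        return self.base.blocks[block_id]

    def write(self, block_id: int, data: bytes) -> None:
        # The snapshot is never modified; only the changed block is stored.
        self.delta[block_id] = data

    def extra_blocks(self) -> int:
        return len(self.delta)


if __name__ == "__main__":
    snapshot = Snapshot({i: b"original" for i in range(1000)})
    developer_copies = [Clone(snapshot) for _ in range(5)]
    developer_copies[0].write(42, b"changed")

    full_copy_blocks = len(developer_copies) * len(snapshot.blocks)
    shared_blocks = len(snapshot.blocks) + sum(c.extra_blocks() for c in developer_copies)
    print(f"Full copies: {full_copy_blocks} blocks; shared clones: {shared_blocks} blocks")
```

In this toy model each developer copy costs only the blocks it alters, which is the general reason this kind of cloning can cut both storage and the data shipped between sites. The real saving depends on how much each copy diverges.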
The institute helps to support the NHS-led 100,000 Genomes Project, Newhouse explained.
This research centres on personalised medicine, which treats patients according to their specific genetic profile rather than their general condition alone.
“That obviously has the potential to help target the right drugs to the right person for the right disease, and provide much greater cost effectiveness than conventional techniques at the moment.
“It’s about getting the volume of data to make statistical inferences on how diseases behave with different genetic profiles…it’s an area that’s growing rapidly,” Newhouse said.
“The cost of sequencing machines and their capacity has dropped dramatically in recent years. When the first human genome was sequenced fifteen years ago, it cost $3 billion and took ten years to do. Now that can be done for about $1,000 and takes days, if not hours.”