Predicting Ebola spread with HPCs paying off

Computer models predict where West Africa’s health ministries and the US Department of Defense should put resources on the ground to combat Ebola.


Infectious disease kills over 13 million people every year and Ebola alone has claimed over 9,000 lives since its outbreak in West Africa last year.

Behind the scenes, the largest - and deadliest - Ebola epidemic on record is being monitored by the Virginia Bioinformatics Institute (VBI), which has been sending the US Department of Defense and West Africa’s ministries of health forecasts on vaccine production and real-time infection spread using big data analysis. This has allowed the ministries and US aid workers to decide where resources are needed to combat the outbreak.

The institute is a research body that studies genomes and disease to find vaccines, medicines and diagnostic tools and to better understand how disease spreads, using computational models, databases and bioinformatics tools.

While other scientists have similarly visualised which countries are most susceptible to an Ebola outbreak (one predicted that Europe is more likely to suffer an outbreak than the US, for example), this model is one of the first to be applied by aid workers in the fight against the disease.

In one instance, the US Department of Defense rang the institute requesting advice on where to place new emergency treatment units within three days’ time. Scientists analysed variables such as road infrastructure in West Africa and outbreak “hotspots” to suggest where military transport planes should land.

“It was critical to provide answers within the Department of Defense’s decision cycle as a later answer was the same as no answer at all,” said Keith Bisset, senior research scientist at Virginia Bioinformatics Institute.

The computational model for outbreaks uses virtual cities on local, regional and global levels, “infused with a tremendous amount of demographics collected from census, power, mobile phone, transportation, social networks and other data,” said Bisset.

But it was the institute’s ability to expand its population framework quickly when called on by the US Department of Defense that helped it model Ebola.

Flexible and scalable

To support the outbreak modelling, the research body needed a mix of computations, with compute and storage flexible enough to handle different workloads. It used four main simulation engines, which vary in speed and in the granularity of their data collection. For example, one system checks social networks every second - working at a slower pace but offering more detailed information.

“Understanding all the nuances is the art behind which simulation engine we choose to perform different computational models,” said Bryan Lewis, computational epidemiologist.

The dedicated Ebola rapid response team of 30 needed to scale its storage at a moment’s notice to keep pace with escalating data growth. Just one instance of the population requires 10TB, and it may be run up to 15 times with varying simulations.

The team used a variety of HPC modelling tools developed in-house, including EpiFast, EpiSimdemics and Indemics, an interactive SQL-based simulation query tool. It also used the open-source data analysis tools pandas and Python to forecast outbreaks and estimate what resources would be necessary in certain areas.

It used a variety of other applications, including Sibel, a public health response tool; Epinome, a public health training tool; SIV, a population visualisation tool; Flu Caster, a crowd-sourced surveillance tool; and Virus Tracker, a live action simulation application.

To support this, and to scale up when necessary, VBI used the Shadowfax computer cluster, which has 2,500 cores and almost 1PB of high-performance storage in a parallel file system built on two DDN Storage Fusion Architecture GRIDScaler appliances.
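For a rough sense of scale, the figures quoted above can be combined in a back-of-envelope calculation. The sketch below is illustrative only: it assumes the worst case, in which every run keeps its own full 10TB copy of the population, and it rounds "almost 1PB" up to 1,000TB. The function and constant names are hypothetical and not part of VBI's tooling.

```python
# Back-of-envelope storage estimate from the figures quoted in the article.
TB_PER_POPULATION_INSTANCE = 10  # one instance of the population requires 10TB
MAX_RUNS = 15                    # it may be run up to 15 times
CLUSTER_STORAGE_TB = 1000        # "almost 1PB", rounded up (assumption)


def peak_storage_tb(tb_per_instance: float, runs: int) -> float:
    """Worst case: each run stores its own full copy (an assumption)."""
    return tb_per_instance * runs


peak = peak_storage_tb(TB_PER_POPULATION_INSTANCE, MAX_RUNS)
share = peak / CLUSTER_STORAGE_TB
print(f"Peak storage: {peak:.0f} TB ({share:.0%} of roughly 1 PB)")
```

Under these assumptions, a single fully replicated batch of simulations would need about 150TB, roughly 15 percent of the cluster's storage.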

The institute hopes to make its synthetic global population and other informatics research available to researchers worldwide. Currently, it shares the model with 75 other scientists, but it hopes to use DataDirect Networks’ WOS object storage to efficiently share, protect and distribute huge volumes of its data, including unstructured social media data such as Twitter posts.

Image: Ebola response training by the US Department of Defense. Credit: Flickr/Army Medicine