Computers will struggle to support genetic research for public health services

A shortage of programmer skills may hinder the application of genetic research in treating mass populations for disease.


Fifteen years since scientists first mapped a human genome, the public health service is adjusting to the change in culture toward DNA-based treatments. Already, some breast cancer therapies can be eliminated based on a person’s genetic makeup (as they may not work on some mutated genes, for example).

Public fascination over what our genetic makeup might spell for our health hit a high in 2013 when Hollywood actress Angelina Jolie had a mastectomy after realising she carried a BRCA mutation that put her at risk for breast and ovarian cancer.

But while genome sequencing - the mapping of a genome to help scientists understand how genes influence a person’s health - has entered the mainstream, a computational skills shortage could hinder its wider application in public health services.

This is because “old-school” programmers who can work creatively with high performance computers are a dying breed, warns Dr Robert Esnouf, head of the research computing core at the Wellcome Trust Centre for Human Genomics.

The number of engineers who understand how to grow computer systems beyond a few boxes is a “fairly small pool of people,” Dr Esnouf warns, “and the seriousness of that is just about to emerge.”

Genome sequencing requires robust computer systems. At Oxford University, where the Wellcome Trust’s genomic centre is based, Esnouf’s high performance computing (HPC) system consists of two blade-based clusters, a dedicated InfiniBand network and a DDN GRIDScaler storage system. It supports the work of over 100 researchers, who use it simultaneously.

The department produces over 500 genomes a year - each one using about 30GB on disk when compressed. It stores around 20,000 genomes taking up half a petabyte.
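As a rough check on those figures, the storage arithmetic can be sketched in a few lines of Python. This is a back-of-the-envelope estimate only: it assumes decimal units (1PB = 1,000,000GB) and a uniform 30GB per genome, and the function name is purely illustrative.

```python
# Back-of-the-envelope storage estimate using the figures quoted above.
# Assumptions: decimal units (1 PB = 1,000,000 GB) and a flat 30 GB per genome.
GB_PER_PB = 1_000_000


def archive_size_pb(genome_count: int, gb_per_genome: float) -> float:
    """Estimated storage in petabytes for a number of compressed genomes."""
    return genome_count * gb_per_genome / GB_PER_PB


stored_pb = archive_size_pb(20_000, 30)      # genomes currently stored
yearly_growth_pb = archive_size_pb(500, 30)  # new genomes produced per year

print(f"Archive: ~{stored_pb:.2f} PB")
print(f"Growth:  ~{yearly_growth_pb:.3f} PB per year")
```

At 30GB each, 20,000 genomes come to about 0.6PB, broadly in line with the "half a petabyte" quoted, while 500 new genomes a year add roughly 15TB.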

The system works well for academic purposes and within the budget constraints of a university - even if it is one of the leading faculties in the world.

But genetic research like this could only be applied in a clinical setting (for the mass population) if these systems are able to scale. Dr Esnouf warns that programmers capable of writing that code may be nowhere to be found in the future.

“There is a real requirement to build better algorithms - algorithms that scale to huge population numbers that spread across thousands of computers.

“That will be a bottleneck in applying genetics in clinical situations because those algorithms will have to be built completely from scratch,” Esnouf says.

Death of the mainframe

“I tend to feel I’m rather old-school,” Esnouf admits. “We were born and bred on Spectrums and BBC Microcomputers, for example. You really got your hands dirty, felt how a computer ran and how it worked.”

But today’s computing training focuses heavily on iterations and writing lines of code correctly rather than ensuring that the code performs well.

The more slapdash approach, Esnouf says, allowed greater creativity as engineers pushed small computers to their limits.

The arrival of coding in classrooms is a step in the right direction, but the shift towards app design rather than mainframe engineering sugar-coats a mounting skills gap.

“Things that get children or students interested in programming is thinking of ways to get machines working – there is huge creativity where there is that slightly different way of thinking that makes a spectacular programmer – and that hasn’t been at the forefront of teaching.

“Courses are concerned with algorithmic correctness and unit testing and object oriented coding and good practice to reduce the number of bugs, which are all terribly important, but the performance aspect and how it maps onto hardware architecture has been neglected,” Esnouf says.

The first sequenced genome cost over $3 billion to produce in 2000. Now there are machines on the market that can produce 18,000 genomes a year at a fraction of the price.

The more we learn about common mutations that result in diseases like cancer, the more data scientists will need to process, compared with the rarer genetic diseases that were first sequenced during the infancy of the technology. The struggle for computers to support that will be the main challenge in making disease avoidable in the future.

There are new technologies on the horizon that will revolutionise our understanding of our genes, including looking at sequences of a single molecule of DNA - eliminating guesswork about where change is happening within gene pairs. Whether that can be applied to the mass population depends on programmers passing those skills on.

Image: Angelina Jolie took the decision to have a mastectomy following the discovery she carried a gene mutation that often causes breast and ovarian cancer. Flickr/Gage Skidmore
