Learning from Openness: Partial Sharing is not Enough

Recently, I wrote about the interesting Open Humans Network, which aims to make it easier for people to share various forms of biological data. That's become a hugely important area. The hope is that by applying big data analytical techniques to the increasingly large stores of genetic data now being produced by low-cost sequencing, it will be possible to come up with personalised medicines designed for a specific genetic make-up, rather than the treatments aimed at the average patient that we have now.

However, as an interesting editorial in Nature notes, for these personalised genome projects to thrive, they need to share their data. That's beginning to happen, but with some bumps along the road:

The principals behind one genetic data-sharing project unveiled last week have described their initiative as a model of “scientific openness” that offers “broader access” to genetic data. Indeed, the name of the project — BRCA Share — trades on the idea of data freedom. The initiative focuses on clinical data concerning mutations in the genes BRCA1 and BRCA2, which increase risk of breast and ovarian cancer.

Alas, BRCA Share's idea of sharing is not what most people in the world of openness would take the term to mean:

[BRCA Share] will not share data with similar efforts such as ClinVar, a US National Institutes of Health-funded initiative that is making linked genetic and medical data publicly available for all. Quest says that BRCA Share cannot contribute to ClinVar because its data are structured differently.

That's clearly just an excuse, and a pretty feeble one at that: transforming structured datasets from one format into another is bread-and-butter stuff for data scientists. As Nature comments:

The episode showcases an uncomfortable truth about personalized medicine: everyone agrees that large data sets are crucial, and everyone is racing to collect them. The larger the data set, the more useful. The most useful of all would be one huge database containing all available data. But even though all parties recognize the value of it, many are choosing not to share, and this holds back medical progress.

You might have hoped that people would have learned the lesson taught by three decades of free software: everyone benefits when information is freely and fully shared. But it seems that some are still reluctant to let go completely.

Follow me @glynmoody on Twitter or identi.ca, and +glynmoody on Google+
