In Praise of Modularity


One of Linus Torvalds' greatest contributions to free software – and, indeed, to software in general – came about purely by chance. As he told me back in 1996, reflecting on how the Linux kernel had come about and grown:

the way the whole system is set up - which ... happened kind of by itself - is that if you change one driver it's really so localised that it should never impact anything else.

The reason was simple:

It has to be so, when there's people all over the world doing this. I can't meet them - there's no weekly brainstorming session where everybody gets together and discusses things.

In other words, because he was working from his bedroom in Helsinki with a team of coders scattered around the world, linked only by the Internet and whom he had never met, the work on the kernel had to be parcelled out in a very precise way. This, in turn, meant that the interfaces between the parcels had to be kept very clean: there was no scope for “fudges” to make things work.

The knock-on consequence of this well-organised code was that new modules could be swapped in extremely easily. So when somebody came up with a better solution to a particular part of the kernel, it was relatively easy to adopt it. This drove extremely rapid evolution of the code while preserving its robustness.

Modularity, and the re-use it allows, have since become two of the hallmarks of open source software. Indeed, the idea of modularity has proved so successful that it has been applied at the next level up, in the creation of distributions. In contrast to the monolithic approach of, say, Windows, where elements are tightly integrated across packages and applications, the free software approach is to keep everything very loosely coupled, allowing maximum flexibility without sacrificing compatibility.

The idea has also been successfully applied far beyond software. Perhaps the best example is the Human Genome Project, started in 1990 with the aim of elucidating the 3 billion or so DNA “letters” that go to make up the totality of human genomic information.

This huge project only became possible when the work of sequencing different chromosomes was apportioned to different laboratories around the world, connected across the Internet. Each laboratory broke down individual chromosomes into tiny DNA fragments, which were then stitched together before being uploaded to a central genomic database. The modular nature of DNA meant that any stretch of code could always be resequenced if necessary (for example, to improve the accuracy, or to provide a check) and simply dropped into the overall genome, just like a software module.

Knowledge, too, can be modularised, as this wiki page from the Open Knowledge Foundation suggests:

At present knowledge development displays very little componentization but as the underlying pool of raw, 'unpackaged', information continues to increase there will be increasing emphasis on componentization and reuse it supports. (One can conceptualize this as a question of interface vs. the content. Currently 90% of effort goes into the content and 10% goes into the interface. With components this will change to 90% on the interface 10% on the content).

The change to a componentized architecture will be complex but, once achieved, will revolutionize the production and development of open knowledge.

At the opposite end of the modularisation spectrum is the pure data set. Here there is no “interface”, to use the Open Knowledge Foundation's term: it is all content. Large-scale databases not only fail to tap into the benefits of modularity, they also have a huge security downside. Since they are conceived of as a single, huge mass of data, they tend to be moved around as such – and lost as such, as the recent mislaying by HMRC of the records of 25 million people demonstrated only too painfully.

Against this background, it is striking that the Data Sharing Review Report, written by the Information Commissioner, Richard Thomas, and the Director of the Wellcome Trust, Mark Walport, and released last week, seems to take no account of this dimension. Here's what it has to say on the subject:
