Last month I wrote about the UK government's "Making Open Data Real" consultation. That's actually just the first part of a double-headed enquiry into open government data. The other part concerns "Data Policy for a Public Data Corporation" (PDC).
As you may recall, this PDC is a key – and potentially contentious – element of the UK government's generally forward-looking approach to opening up its data. The question is to what extent some public data will not really be open, but sold – and whether private enterprise will be allowed to profit from that fact.
As the Executive Summary to the present consultation explains:
In January this year, Government announced its intention to create a Public Data Corporation (PDC). This would bring together data-rich organisations with the aims of:
providing a more consistent approach towards access to and accessibility of public sector information, balancing the desire for more data free at the point of use whilst ensuring affordability and value for taxpayers;
creating a centre of excellence driving further efficiencies in the public sector;
and creating a vehicle that can attract private investment.
The document raises questions under four main headings: PDC approach to access and release (which has already been largely addressed in the preceding consultation); Charging for PDC information; Licensing; and Regulatory oversight. Each section concludes with a number of questions touching on those areas.
Given its importance, I would urge you to read the (mercifully short) consultation document, available online and as a PDF download, and then respond, either via an online questionnaire or via email. But you'll need to hurry – the deadline is 27 October. As usual, I've added my response below.
In the following response to the UK government's "Data Policy for a Public Data Corporation", I have concentrated on three sections: Charging for PDC information; Licensing; and Regulatory oversight. I have not commented on the PDC approach to access and release section because I have submitted my thoughts on this area to the related "Making Open Data Real" consultation.
The key issue of Section 4 is the economics of open data: how should datasets that cost non-trivial amounts to gather be made available? It seems to me that there is a danger in trying to combine two distinct approaches here: making open data freely available, and trying to make money from it directly. This will end up being unsatisfactory from both viewpoints.
Instead, it would be far better to make all the data freely available, for any kind of use (including commercial): that would maximise the chances of a rich ecosystem growing up around it. That, in turn, would generate increased taxes – indirect revenue, rather than direct. If the data is not completely open – for example, if there are constraints on what can be done with it – the ecosystem that grows up around it is likely to be constrained too, and the tax revenues far lower.
A good example of this approach would be the genomic data produced by the Human Genome Project (HGP). Following the Bermuda Agreement, this was all placed in the public domain as soon as it was gathered – that is, no attempt was made to derive income from it directly. The wisdom of that approach has been shown by a study from Battelle Technology Partnership Practice, which estimates that the $3.8 billion investment in the HGP has resulted in a $796 billion economic impact, and created 310,000 jobs (report available from http://www.genome.gov/27544383).
Of course, there is no guarantee that releasing government data freely would have quite such a multiplier effect, but it is important to bear in mind that trying to save money by locking down data may be a classic example of being penny wise and pound foolish.
Moreover, just because all the data is made freely available does not mean that organisations like the Ordnance Survey or the Met Office would be unable to generate income from it. As has been shown many times in other fields where information is given away, this simply means that the opportunities to make money move to services built around that information. The key point is that the organisations gathering that information typically have a first-mover advantage; coupled with a greater depth of analytical skills, that places them in a very strong position to compete with external users.
That's particularly the case for organisations like the Ordnance Survey or the Met Office, which have world-class reputations. They could build on these to offer higher-level services based on data that might well include material from other sources as well as their own.
This approach also greatly simplifies the licensing issues discussed in Section 5. If as much data as possible is made freely available under a liberal licence (such as the excellent Open Government Licence), then many of the complications discussed in this part of the consultation disappear. This simplicity is another powerful reason to aim for full data release.
As for Section 6's questions about regulatory oversight, that oversight might usefully be coupled with the role of an Open Data Commissioner raised in the earlier consultation. Making PDC data freely available under liberal licensing simplifies questions of oversight, since they can then be subsumed into the more general issues of open government data. Again, simpler solutions come with many ancillary benefits.
To summarise: the example of the Human Genome Project has shown how making even evidently valuable data freely available can kickstart the creation of an enormous associated ecosystem, with huge knock-on benefits in terms of employment and tax revenues. Moreover, making data available for commercial re-use would also feed into the UK government's desire to promote an Internet startup culture in the UK.
It is increasingly being recognised that data is the key raw material for digital startups; restricting its flow through complicated licensing will simply place obstacles in the way of entrepreneurs. Providing as wide a range of public data as possible will, by contrast, inspire more startups to explore ways of building on that public resource and creating an ecosystem around it (as has already happened in the US in the case of open geodata). The more such startups there are, the greater the likelihood of an entrepreneurial spirit taking root in the UK, with further positive knock-on effects for the wider economy.