Open data: where the movement started and where it’s headed

Giuseppe Sollazzo
© Giuseppe Sollazzo

Giuseppe Sollazzo, Senior Systems Analyst at St George's, University of London, and an open data expert, explains the history of open data, the barriers to opening up data, and his thoughts on its future.


What do hurricanes and massive lumps of fat have in common? The answer is: open data.

The following two stories illustrate where the wider data agenda is going.

The defining moment of the open data movement

The eleventh named storm of the 2005 Atlantic hurricane season, Katrina hit the coast of Louisiana with immense destructive force. The devastation it brought to New Orleans was described as the "worst catastrophe in US history" by Michael Chertoff, then Homeland Security Secretary.

The impact was made worse by the slow reaction of an ill-prepared government. In what was described as an 'almost stateless' context, volunteers in the region began to organise relief.

With the goal of finding out which areas were most affected, an army of geeks scraped every piece of potentially useful information from government websites: building permits, population estimates, the parcel layer, and so on. The data was assembled in spreadsheets and shared. Open data was born: a community-led effort to take information from the state and give it back to the community.

New York City goes data-first

According to Wikipedia, fatbergs are 'congealed lumps of fat, sanitary wipes and similar items found in sewer systems'. Although fatbergs might one day be mined as biofuel, they currently represent a huge public health issue because they clog city sewers. The New York Times reported: "The city has spent more than $18 million in the past five years on wipe-related equipment problems. The volume of materials extracted from screening machines at the city’s wastewater treatment plants has more than doubled since 2008, an increase attributed largely to the wipes."

One cause of this trend in New York is the impact of waste management regulations, with their difficult bureaucracy and high costs. Combined with an estimated 25,000 restaurants, 11,000 of them in Manhattan alone, this is a recipe for disaster.

In 2013, John Feinblatt and Michael Flowers, respectively City Hall's Chief Planning Advisor and Chief Analytics Officer, decided it was time to think differently. They realised that inspectors could not possibly be sent to 25,000 restaurants, so they turned to data. As reported by the Financial Times, a small team of data scientists was given access to the 60 data sources, both open and private, that constitute the city's intelligence on the problem: waste management, pollution, restaurant licences, and so on. The scientists mined the data and produced a "treasure map" of likely culprits: a heat map of the restaurants most likely to be responsible for illegally dumping oil.
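Neither the FT report nor this article spells out the scientists' actual method, but the general shape of such an analysis is easy to sketch. The Python fragment below is a minimal, purely illustrative toy: the records, field names (such as has_grease_disposal_contract), distance radius and scoring weights are all invented for the example, and a real pipeline would join dozens of sources and use far more sophisticated geospatial methods.

# Illustrative sketch only, not the city's actual pipeline: combine
# hypothetical restaurant-licence and sewer-complaint records, score each
# restaurant by proximity to complaints, and bin the scores into a coarse
# grid that could be rendered as a heat map.
from collections import defaultdict
from math import hypot

# Toy stand-ins for two of the many data sources mentioned above.
restaurants = [
    {"id": "R1", "lat": 40.742, "lon": -73.989,
     "has_grease_disposal_contract": False},   # invented field
    {"id": "R2", "lat": 40.751, "lon": -73.981,
     "has_grease_disposal_contract": True},
]
complaints = [  # e.g. geocoded reports of sewer blockages
    {"lat": 40.7425, "lon": -73.9885},
    {"lat": 40.7418, "lon": -73.9892},
]

def risk_score(restaurant, complaints, radius=0.002):
    """Naive score: count complaints within `radius` degrees, and weight
    them more heavily if the restaurant has no registered grease-disposal
    contract (both the radius and the weights are arbitrary)."""
    nearby = sum(
        1 for c in complaints
        if hypot(restaurant["lat"] - c["lat"],
                 restaurant["lon"] - c["lon"]) < radius
    )
    weight = 1.0 if restaurant["has_grease_disposal_contract"] else 2.0
    return nearby * weight

# Bin scores into a coarse lat/lon grid: the "treasure map".
grid = defaultdict(float)
for r in restaurants:
    cell = (round(r["lat"], 3), round(r["lon"], 3))
    grid[cell] += risk_score(r, complaints)

# Highest-risk cells first; a visualisation layer would colour these.
for cell, score in sorted(grid.items(), key=lambda kv: -kv[1]):
    print(cell, score)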

The result of this success? A massive programme of transformation in government operations in New York City: similar techniques are now being used to catch tobacco smugglers and to prevent crime. Success rates are up.

The two waves of open data

There is a clear tension between these two open data stories: a shift in emphasis from "open" to "data". Open data is no longer just a community-led affair; it is becoming an increasingly complex business, one that has engaged leadership and produced changes in capability and organisational structure.

Open data has come in two waves: the "open" wave and the "data" wave.

In the first wave, transparency and community activism were the major motivators. As such, the demand for data centred mostly on spending and other datasets that the public sector itself made little use of. In many cases, datasets were assembled solely for release, and suffered huge accuracy issues as a result. The first wave of open data was characterised by a sequence of isolated data releases collected in catalogues such as data.gov and data.gov.uk.

The second wave is about data release processes and frameworks, and about structures of data accountability, quality assurance, and data authority.

The Health and Social Care Information Centre has a very well-documented process around data creation, use and release, although it remains a rare example; the future will see more of this. Regardless of the ethos of access to information that inspired early open data releases, the major motivator in this wave is the effective running of data-based operations. Consequently, accuracy will cease to be the main worry, while ethical issues, especially privacy, will come to the fore.

The second wave is not quite there yet

The transition to the second wave is happening, but it is far from complete. The vision of Government-as-a-Platform and the discussions about a UK National Information Infrastructure point clearly in this direction, but some huge barriers still need to be addressed.

Licensing, for example, is still quite confused (and confusing). The public sector most commonly uses the Open Government Licence (OGL). What should happen when third-party data turns up in datasets released into the wild is still unclear, and has resulted in embarrassment for organisations more than once.

The Secretary of State for Justice's 2013 Code of Practice on Datasets goes as far as encouraging the use of the Non-Commercial Government Licence (a more restrictive version of the OGL), and creates the expectation that UK public authorities should charge for datasets released in response to FOI requests. Ordnance Survey, despite its recent embrace of the OGL, is still causing policy headaches.

The need for clarity

Where does this leave the community? There is a dire need for clarity. A layered licensing approach, similar to that of Creative Commons, could be a rabbit out of the hat, but it will achieve little without a clear and sound statement on public bodies' statutory obligations regarding open data.

The FOI Commission might give some indication in this respect. Such an approach requires skills and capability building and, of course, operations that are data-first yet user-centric.

Paul Maltby's appointment as Director of Data at the Cabinet Office is welcome news, but the civil service is, again, without a Chief Data Officer. Open data still lacks compelling examples of datasets being put to positive use, and the community needs to concentrate on producing them as a way to keep focus. Most importantly, data as a whole remains something of a taboo topic for the general public, and tackling that is surely a task for the next CDO.

What next

The future is already sketched out, and the community is as alive as ever.

Open Data Camp, an unconference that gathered over 100 people in Winchester last year, will take place on 10-11 October 2015 in Manchester, where the hosts, Trafford Council, have established an "intelligence lab" tasked with both reviewing the council's internal procedures and using data to engage the local community through captivating "data surgeries".

Despite the difficulties, the confusion and the slow pace at which things often happen in the civil service, the country is moving to a more data-based operational model, and the increasing demand for transparency could strengthen the agenda in unexpected ways.

Giuseppe Sollazzo is Senior Systems Analyst at St George's, University of London and an expert on open data.
