Open the data, save the world


I am a bad blogger. I write the start to so many posts and don’t finish, but as I was walking to work this morning, listening to a song called Storm by GY!BE, a thought occurred. ‘Maybe, to be more open, I need to write about being more open’. So here it is.

One of the things I have found in my year and a bit in ONS and the wider GDS is that people love a slogan. Especially one on a sticker. Often in quite stern, joyless fonts, with clipped, punchy thoughts extolling the virtues of achieving loftier and loftier goals. Over time I have found them overbearing. Less a call to action and more a reminder of yet another thing I *should* be doing to be better. Some of them have helped (Be Bold is a favourite of mine) and the one that has stuck with me the most is within the design manual for government projects and in my head for The Web. “Make things open, it makes things better”. Seven little words and everything that I want to be, all wrapped up together.

So. How can I help make things open and better? My job is to run the ONS website(s). This, it has to be said, is a genuine privilege. Every day I get to work with some of the smartest people I have ever met, helping to make sense of the world. It is also a privilege because statistics have meaning. They help people make choices. They help people make Big Choices. That means I feel an obligation/duty/*something* to make sure the statistics are made available in the most accessible manner. [1]

Attempting to do something, anything, in the best possible way is daunting. Attempting it with the information used to decide how much your train ticket costs, or where a hospital is built, is frankly terrifying. Terrifying and important.

Now ‘open’ and ‘digital’ can mean many things, but I wanted to draw together some of the work we are doing in the Stats Palace to be open, to show our working out and make our information part of a wider web of links and meaning. This isn’t a right and wrong thing, but an attempt to describe how one project has tried to be open, because, well, that seems open. 

Customise My Data 

We are currently working on some new functionality for the ONS website. The project name is ‘customise my data’. Currently the ONS world runs on Excel files. Thousands of them. This work aims to make a consistent backbone of data within the site that will give users easy routes to break down existing datasets into smaller chunks, and to create an API that allows access to our actual numbers (not an API that ends with a request for an xls file). The project is positioned to connect the numbers produced to the people who need to use them, and in the right contexts.
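To make "break down existing datasets into smaller chunks" a little more concrete, here is a minimal sketch in Python. It is not the project's code: the `customise` function, the dimension names and the numbers are all made up for illustration, and it assumes the data has already been reshaped from spreadsheets into one row per observation.

```python
# Hypothetical tidy observations: one row per value, one key per dimension.
# The figures are placeholders, not real ONS statistics.
OBSERVATIONS = [
    {"geography": "Wales", "year": "2015", "sex": "female", "value": 10},
    {"geography": "Wales", "year": "2016", "sex": "female", "value": 11},
    {"geography": "England", "year": "2016", "sex": "female", "value": 12},
    {"geography": "England", "year": "2016", "sex": "male", "value": 13},
]


def customise(rows, **wanted):
    """Keep only the rows whose dimensions match every requested set of values."""
    return [
        row for row in rows
        if all(row.get(dim) in values for dim, values in wanted.items())
    ]


# Ask for a smaller chunk: England only, 2016 only, both sexes.
subset = customise(OBSERVATIONS, geography={"England"}, year={"2016"})
for row in subset:
    print(row)
```

The point of the sketch is the shape of the request: a user (or a machine) names the dimensions they care about and gets numbers back, rather than a whole spreadsheet to pick apart.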

Open Data    

(The mighty) Leigh Dodds helped us put together some open data publishing principles and they are, I think, pretty darn good. 

Make data discoverable - this might seem obvious, but it's something that could be, and needs to be, so much better. Don't hide data in Excel when you can make it open to people. This means indexing the information in datasets, marking them as datasets (as a *thing*) and describing what they are about.

Ensure reuse rights are always clear - easy. Put it in a footer and pop it in a response header. 

Help users cite their sources. Always present data in context

I am very interested in this at the moment. Some PROV, some CSVW and a lot of work to ensure every broken-down section of information has a permanent identifier, so that everyone else can point at it. Context is also important in a world where data isn't always consumed within a closed platform, where data is woven into the fabric of decisions. How do fact checkers know what caveats apply to data? How do you make changes to data once it is out of the box?

Make datasets legible

Always. 

Data should be useful for everyone

No user gets left behind. 

Make data part of the web

This one is the one that gets me in the idealistic feels every time. I am, I think it is safe to say, a fan of the ol’ WWWs. It is pretty much the most beautiful concept you can get. Democratic, universal access to universal knowledge. Could you create a more powerful idea than that? Making information available as part of the web is the key though. We can all make an island. We can casually monetise the exhaust fumes from user data. But that is not open. Open is making information that plays well with all users. This can be someone with a browser looking at HTML, or it can be a machine asking a question and expecting a defined response. It also needs to exist as part of a Web of information and be linked to other things. No registration, open access. That is the vision.
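Two of the principles above, marking a dataset as a *thing* and putting reuse rights in a response header, can be sketched in a few lines of Python. This is an illustration under assumptions, not a description of how the ONS site does it: the schema.org `Dataset` vocabulary and the `license` link relation are real, but the function names, the example.org URL and the choice of the Open Government Licence v3.0 are mine.

```python
import json

# Assumption: the Open Government Licence v3.0 is the licence in force.
LICENCE_URL = (
    "http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/"
)


def describe_dataset(title, description, csv_url):
    """Describe a dataset with schema.org terms so crawlers can index it as a thing."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": title,
        "description": description,
        "license": LICENCE_URL,
        "distribution": [
            {
                "@type": "DataDownload",
                "encodingFormat": "text/csv",
                "contentUrl": csv_url,
            }
        ],
    }


def licence_headers():
    """Response headers that state the reuse rights on every response."""
    return {"Link": f'<{LICENCE_URL}>; rel="license"'}


# Placeholder title, description and URL, purely for illustration.
doc = describe_dataset(
    "Example dataset",
    "A placeholder description of what the numbers are about.",
    "https://example.org/datasets/example.csv",
)
print(json.dumps(doc, indent=2))
print(licence_headers())
```

The JSON would typically sit in the dataset's HTML page for search engines to read, and the header would travel with every API response, so a person with a browser and a machine with a question both get the same answer about what the data is and what they may do with it.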

Open Standards

Standards are important. Obvs. They make sure the world fits together. Ensure that widget A and sprocket B clip into place, just so. On the web they are vital. Not everything has to use every standard, but if you want information to co-exist in a powerful way, describe it using the conventions of the web, not those of your organisation or your own head.

Recently, we've been thinking about this with the publishing of statistics. This has led to a small collection of important choices. 

Showing the provenance of provenance. It is vital that information around statistics has context. To this end, metadata is our love letter to the future, so we had best use our best stationery. StatDCAT-AP is a good framework to work with here. The context comes next, so let's throw in some CSV on the Web. I heart heart this so much. Machine-readable tabular data with the ability to ensure annotations travel with the data? Yes please and thank you.
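For anyone who has not met CSV on the Web, the idea is a small JSON file that sits next to the CSV and describes it. Below is a minimal sketch that builds one in Python. The `@context`, `url`, `tableSchema`, `columns`, `datatype` and `notes` terms come from the W3C CSVW vocabulary; the `csvw_metadata` helper, the column names, the file name and the caveat text are hypothetical, and the shape of each note is my own choice since the specification leaves that open.

```python
import json


def csvw_metadata(csv_url, title, columns, caveats):
    """Build a CSV on the Web metadata document for a single table."""
    return {
        "@context": "http://www.w3.org/ns/csvw",
        "url": csv_url,
        "dc:title": title,
        # Annotations that should travel with the data, one object per caveat.
        "notes": [{"rdfs:comment": text} for text in caveats],
        "tableSchema": {
            "columns": [
                {"name": name, "titles": label, "datatype": datatype}
                for name, label, datatype in columns
            ]
        },
    }


# Placeholder table description, purely for illustration.
meta = csvw_metadata(
    "example.csv",
    "Example dataset",
    [
        ("geography", "Geography", "string"),
        ("year", "Year", "gYear"),
        ("value", "Value", "integer"),
    ],
    ["Placeholder caveat: figures for the latest year are provisional."],
)
print(json.dumps(meta, indent=2))
```

By convention the output would be saved alongside the data as `example.csv-metadata.json`, so that a pipeline fetching the CSV can also fetch its column types and caveats without a human reading a footnote.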

Open source

In some ways, the obvious one. We do this well in UK Gov. For us, that has meant producing an open source data publishing platform. I would love more people to use it.

Open working

The teams I am involved with have tried to do so much of this. Is it important? Yes. It is vital. Agile teams (and goodness knows we need to be Agile. Making complex data systems is difficult and we need all the help from process that we can get) are, almost by default, units of open data. The carefully constructed user stories create obvious things to track and point at. So:

We publish our roadmap here, keep it updated and blog about changes

We write sprint notes

We tell people what we have learned

We make every prototype available

We make ourselves available

We request feedback

This is important. Not through some pious, holier-than-thou approach, but for two big reasons:

1) Being open is about sharing. We never say we have got things right, but we share them so others can learn from and with us. We don't have to solve every problem from scratch, and open sharing helps.

2) We do this because these are your statistics and this is what democracy looks like. 

So, to recap: I am working on an open-by-default Agile project that processes statistical datasets and turns them into open-standards-compliant formats. Those formats will allow programmatic access to low-level data and in-browser customisation of datasets to meet exact user needs, and the project generates metadata so that automated pipelines can validate the context of statistics.

Open the data, show your working out, save the world? I genuinely think it could.

(ps - we are hiring)


[1] I am not a statistician by any means, so here I am referring to the best possible publication.