In the past few years a number of governments have launched open data portals. These sites, like www.data.gov or data.vancouver.ca, share data that government agencies collect, in machine-readable formats (i.e. formats you can work with on your computer).
Increasingly, people approach me and ask: what makes for a good open data portal? Great question. Now that a number of these sites are out there, we are starting to learn what makes one more or less effective. A good starting point is the 8 Open Government principles and, for those newer to this discussion, the 3 laws of open data (also available in German, Japanese, Chinese, Spanish, Dutch and Russian).
But beyond that, I think there are some pretty tactical things data portal owners should be thinking about. So here are some issues I’ve noticed that I hope will be helpful.
1. It’s all about automating the back end
Probably the single greatest mistake I’ve seen governments make is that, in the rush to get some PR or meet an artificial deadline, they create a data portal in which the data must be updated manually. This means a public servant must run around copying the data out of one system, converting it (and possibly scrubbing it of personal and security-sensitive information) and then posting it to the data portal.
There are a few interrelated problems with this approach. Yes, it allows you to get a site up quickly but… it isn’t sustainable. Most government IT departments don’t have a spare body who can do this work part time, and the problem only gets worse if the site grows to include hundreds or thousands of data sets.
Consequently, this approach is likely to generate ill-will towards the government, especially from the very community of people who could and should be your largest supporters: local tech advocates and developers.
Consider New York: from what I can tell, the data on its site is not regularly updated, and the grumblings are getting louder. I’ve heard similar grumblings from developers and citizens in Canadian cities where open data portals get trumpeted despite infrequent updates and few available data sets.
If you are going to launch an open data portal, make sure you’ve figured out how to automate the data updates first. It is harder to do, but essential. In the early days, open data sites often live and die based on the engagement of a relatively small community of early adopters: the people who will initially make the data come alive and build broader awareness. Frustrate that community and the initiative will have a harder time gaining traction.
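To make the point about automation concrete, here is a minimal sketch of what a scheduled export job might look like, written in Python. It assumes a hypothetical internal SQLite table called `permits` and a hypothetical list of personal columns (`owner_name`, `owner_phone`); a real government system, its schema and its privacy rules will all differ. The job pulls the current rows, drops the personal columns, and replaces the CSV file the portal serves, so nobody has to copy, scrub and post the data by hand.

```python
import csv
import io
import os
import sqlite3
import tempfile

# Hypothetical column names treated as personal information; a real
# department would maintain this list together with its privacy office.
PERSONAL_FIELDS = {"owner_name", "owner_phone"}


def fetch_records(conn: sqlite3.Connection) -> list[dict]:
    """Pull the current rows from the (hypothetical) internal 'permits' table."""
    conn.row_factory = sqlite3.Row
    rows = conn.execute("SELECT * FROM permits ORDER BY permit_id").fetchall()
    return [dict(row) for row in rows]


def scrub(records: list[dict], personal_fields=PERSONAL_FIELDS) -> list[dict]:
    """Return copies of the records with the personal columns dropped."""
    return [{k: v for k, v in r.items() if k not in personal_fields} for r in records]


def to_csv(records: list[dict]) -> str:
    """Serialize records to CSV text, using the first record's keys as the header."""
    if not records:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def publish(conn: sqlite3.Connection, path: str) -> str:
    """One export run: fetch, scrub, and atomically replace the published file."""
    text = to_csv(scrub(fetch_records(conn)))
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file first so the portal never serves a half-written file.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", newline="", encoding="utf-8") as f:
        f.write(text)
    os.replace(tmp_path, path)
    return text


if __name__ == "__main__":
    # Demo with an in-memory stand-in for the internal system.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE permits (permit_id INTEGER, address TEXT, owner_name TEXT, owner_phone TEXT)"
    )
    conn.execute("INSERT INTO permits VALUES (1, '100 Main St', 'A. Smith', '555-0100')")
    print(publish(conn, "permits.csv"))
```

Run on a schedule (for example nightly from cron), a job along these lines keeps the portal in step with the source system with essentially no ongoing staff time. Writing to a temporary file and swapping it in with `os.replace` means visitors never download a partially written file.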
2. Keep the barriers low
Both the 8 principles and 3 laws talk a lot about licensing. Obviously there are those who would like the licenses on many existing portals to be more open, but in most cases the licenses are pretty good.
What you shouldn’t do is require users to register. If the data is open, you don’t care who is using it and indeed, as a government, you don’t want the hassle of tracking them. Also, don’t call your data open if users must belong to an educational institution or a non-profit. That is, by definition, not open data. (I’m looking at you, StatsCan: it’s not liberated data if only a handful of people can look at it. Sadly, you’re not the only site to do this.) Worst of all is one website where, in order to access the online catalogue, you have to fax in a form outlining who you are.
This is the antithesis of how an open data portal should work.
3. Think like (or get help from) good librarians and designers
The real problem is sites that demand too much of users just to get to the data. Readers of this blog know how I feel about Statistics Canada’s website: the data always seems to be one more click away. And that’s assuming you can locate the data you are interested in at all, which often seems impossible.
And yes, I know that Statistics Canada’s phone operators are very helpful and can help you locate datasets quickly, but I submit to you that this is a symptom of a problem. If every time I went to Amazon.com I had to call a help desk to find the book I was interested in, I don’t think we’d be talking about how great Amazon’s help desk was. We’d be talking about how crappy their website is.
The point here is that an open data site is likely to grow. Indeed, data.gov and data.gov.uk now have thousands of data sets. To be navigable, sites like these need excellent design. More importantly, they need a new breed of librarian, one capable of thinking in the online space, to help create a system where data sets can be easily and quickly located.
This is rarely a problem early on (Vancouver has 140 data sets up and Washington, DC around 250; catalogues of that size can still be trawled through without a sophisticated system). But you may want to sit down with a designer and a librarian during these early stages to think about how the site might evolve, so that you don’t create problems in the future.
Finally, I think good open data portals want, and even encourage, feedback. I like that data.vancouver.ca has a survey on the site which asks people what data sets they would be interested in seeing made open.
But more importantly, this is an area where governments can benefit. No data set is perfect. Most have a typo here or there. Once people start using your data they are going to find mistakes.
The best approach is not to pretend that the information is perfect (it isn’t, and the public will have less confidence in you if you pretend otherwise). Instead, ask to be notified about errors. Remember, you are using this data internally, so any errors are hurting your own planning and analysis. By harnessing the eyes of the public you will be able to identify and fix problems more quickly.
And while I’m sure we all agree this is probably not the case, maybe the fact that the data is public will create a small added incentive to fix it quickly. Maybe.