Okay, let’s geek out on some open data portal stats from data.gc.ca. I’ve got three parts to this review: First, an assessment on how to assess the value of data.gc.ca. Second, a look at what are the most downloaded data sets. And third, some interesting data about who is visiting the portal.
Before we dive in, a thank you to Jonathan C sent me some of this data to me the other day after requesting it from Treasury Board, the ministry within the Canadian Government that manages the government’s open data portal.
1. Assessing the Value of data.gc.ca
Here is the first thing that struck me. Many governments talk about how they struggle to find methodologies to measure the value of open data portals/initiatives. Often these assessments focus on things like number of apps created or downloaded. Sometimes (and incorrectly in my mind) pageviews or downloads are used. Occasionally it veers into things like mashups or websites.
However, one fairly tangible value of open data portals is that they cheaply resolve some access to information requests - a point I’ve tried to make before. At the very minimum they give scale to some requests that previously would have been handled by slow and expensive access to information/freedom of information processes.
Let me share some numbers to explain what I mean.
The Canada Government is, I believe, only obligated to fulfill requests that originate within Canada. Drawing from the information in the charts later in this post, let’s say assume there were a total of 2200 downloads in January and that 1/3 of these originated from Canada – so a total of 726 “Canadian” downloads. Thanks to some earlier research, I happen to know that the office of the information commissioner has assessed that the average cost of fulfilling an access to information request in 2009-2010 was $1,332.21.
So in a world without an open data portal the hypothetical cost of fulfilling these “Canadian” downloads as formal access to information requests would have been $967,184.46 in January alone. Even if I’m off by 50%, then the cost – again, just for January – would still sit at $483,592.23. Assuming this is a safe monthly average, then over the course of a year the cost savings could be around $11,606,213.52 or $5,803,106.76 – depending on how conservative you’d want to be about the assumptions.
Of course, I’m well aware that not every one of these downloads would been an information request in a pre-portal world – that process is simply to burdensome. You have to pay a fee, and it has to be by check (who pays for anything by check any more???) so many of these users would simply have abandoned their search for government information. So some of these savings would not have been realized. But that doesn’t mean there isn’t value. Instead the open data portal is able to more cheaply reveal latent demand for data. In addition, only a fraction of the government’s data is presently on the portal – so all these numbers could get bigger still. And finally I’m only assessing downloads that originated inside Canada in these estimates.
So I’m not claiming that we have arrived at a holistic view of how to assess the value of open data portals – but even the narrow scope of assessment I outline above generates financial savings that are not trivial, and this is to say nothing of the value generated by those who downloaded the data – something that is much harder to measure – or of the value of increased access to Canadians and others.
2. Most Downloaded Datasets at data.gc.ca
This is interesting because… well… it’s just always interesting to see what people gravitate towards. But check this out…
Data sets like the Anthropogenic disturbance footprint within boreal caribou ranges across Canada may not seem interesting, but the ground breaking agreement between the Forest Products Association of Canada and a coalition of Environmental Non-Profits – known as the Canadian Boreal Forest Agreement (CBFA) – uses this data set a lot to assess where the endangered woodland caribou are most at risk. There is no app, but the data is critical in both protecting this species and in finding a way to sustainably harvest wood in Canada. (note, I worked as an adviser on the CBFA so am a) a big fan and b) not making this stuff up).
It is fascinating that immigration and visa data tops the list. But it really shouldn’t be a surprise. We are of course, a nation of immigrants. I’m sure that immigration and visa advisers, to say nothing of think tanks, municipal governments, social service non-profits and English as a second language schools are all very keen on using this data to help them understand how they should be shaping their services and policies to target immigrant communities.
There is, of course, weather. The original open government data set. We made this data open for 100s of years. So useful and so important you had to make it open.
And, nice to see Sales of fuel used for road motor vehicles, by province and territory. If you wanted to figure out the carbon footprint of vehicles, by province, I suspect this is a nice dataset to get. Probably is also useful for computing gas prices as it might let you get a handle on demand. Economists probably like this data set.
All this to say, I’m less skeptical than before about the data sets in data.gc.ca. With the exception of weather, these data sets aren’t likely useful to software developers – the group I tend to hear most from – but then I’ve always posited that apps were only going to be a tiny part of the open data ecosystem. Analysis is king for open data and there does appear to be people out there who are finding data of value for analyses they want to make. That’s a great outcome.
Here are the tables outlining the most popular data sets since launch and (roughly) in February.
Top 10 most downloaded datasets, since launch
|1||Permanent Resident Applications Processed Abroad and Processing Times (English)||Citizenship and Immigration Canada||4730|
|2||Permanent Resident Summary by Mission (English)||Citizenship and Immigration Canada||1733|
|3||Overseas Permanent Resident Inventory (English)||Citizenship and Immigration Canada||1558|
|4||Canada – Permanent residents by category (English)||Citizenship and Immigration Canada||1261|
|5||Permanent Resident Applicants Awaiting a Decision (English)||Citizenship and Immigration Canada||873|
|6||Meteorological Service of Canada (MSC) – City Page Weather||Environment Canada||852|
|7||Meteorological Service of Canada (MSC) – Weather Element Forecasts||Environment Canada||851|
|8||Permanent Resident Visa Applications Received Abroad – English Version||Citizenship and Immigration Canada||800|
|9||Water Quality Indicators – Reports, Maps, Charts and Data||Environment Canada||697|
|10||Canada – Permanent and Temporary Residents – English version||Citizenship and Immigration Canada||625|
Top 10 most downloaded datasets, for past 30 days
3. Visitor locations
So this is just plain fun. There is not a ton to derive from this – especially as IP addresses can, occasionally, be misleading. In addition, this is page view data, not download data. But what is fascinating is that computers in Canada are not the top source of traffic at data.gc.ca. Indeed, Canada’s share of the traffic is actually quite low. In fact, in January, just taking into account the countries in the chart (and not the long tail of visitors) Canada accounted for only 16% of the traffic to the site. That said, I suspect that downloads were significantly higher from Canadian visitors – although I have no hard evidence of this, just a hypothesis.