Okay, let’s geek out on some open data portal stats from data.gc.ca. I’ve got three parts to this review: First, a look at how to assess the value of data.gc.ca. Second, a look at which data sets are downloaded most. And third, some interesting data about who is visiting the portal.
Before we dive in, a thank you to Jonathan C, who sent me some of this data the other day after requesting it from Treasury Board, the ministry within the Canadian Government that manages the government’s open data portal.
1. Assessing the Value of data.gc.ca
Here is the first thing that struck me. Many governments talk about how they struggle to find methodologies to measure the value of open data portals/initiatives. Often these assessments focus on things like number of apps created or downloaded. Sometimes (and incorrectly in my mind) pageviews or downloads are used. Occasionally it veers into things like mashups or websites.
However, one fairly tangible value of open data portals is that they cheaply resolve some access to information requests – a point I’ve tried to make before. At the very minimum they give scale to some requests that previously would have been handled by slow and expensive access to information/freedom of information processes.
Let me share some numbers to explain what I mean.
The Canadian Government is, I believe, only obligated to fulfill requests that originate within Canada. Drawing from the information in the charts later in this post, let’s assume there were a total of 2200 downloads in January and that roughly a third (33%) of these originated from Canada – so a total of 726 “Canadian” downloads. Thanks to some earlier research, I happen to know that the Office of the Information Commissioner has assessed that the average cost of fulfilling an access to information request in 2009-2010 was $1,332.21.
So in a world without an open data portal the hypothetical cost of fulfilling these “Canadian” downloads as formal access to information requests would have been $967,184.46 in January alone. Even if I’m off by 50%, then the cost – again, just for January – would still sit at $483,592.23. Assuming this is a safe monthly average, then over the course of a year the cost savings could be around $11,606,213.52 or $5,803,106.76 – depending on how conservative you’d want to be about the assumptions.
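For anyone who wants to check or tweak the arithmetic, here is a minimal sketch of the estimate in Python. The inputs are the figures quoted above; the function name and the `haircut` parameter are my own labels for illustration, and the 33% share is an assumption chosen because it reproduces the 726 figure (an exact third of 2200 would be about 733).

```python
from decimal import Decimal

# Figures quoted in the post; the 33% share is an assumption that
# reproduces the 726 "Canadian" downloads used in the text.
TOTAL_DOWNLOADS_JANUARY = 2200
CANADIAN_SHARE = Decimal("0.33")
COST_PER_ATI_REQUEST = Decimal("1332.21")  # average cost, 2009-2010


def estimate_savings(total_downloads, canadian_share, cost_per_request,
                     haircut=Decimal("0")):
    """Return (canadian_downloads, monthly_cost, annual_cost).

    `haircut` is a hypothetical knob: 0.5 means "assume I'm off by 50%".
    """
    canadian_downloads = int(total_downloads * canadian_share)
    monthly = canadian_downloads * cost_per_request * (1 - haircut)
    # Assumes January is a representative month for the whole year.
    return canadian_downloads, monthly, monthly * 12


for label, haircut in (("as stated", Decimal("0")), ("off by 50%", Decimal("0.5"))):
    n, monthly, annual = estimate_savings(
        TOTAL_DOWNLOADS_JANUARY, CANADIAN_SHARE, COST_PER_ATI_REQUEST, haircut
    )
    print(f"{label}: {n} downloads, ${monthly:,.2f}/month, ${annual:,.2f}/year")
```

Changing the share, the cost per request, or the haircut shows how sensitive the headline number is to each assumption, which is really the point of the exercise.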
Of course, I’m well aware that not every one of these downloads would have been an information request in a pre-portal world – that process is simply too burdensome. You have to pay a fee, and it has to be by check (who pays for anything by check any more???) so many of these users would simply have abandoned their search for government information. So some of these savings would not have been realized. But that doesn’t mean there isn’t value. Instead the open data portal is able to more cheaply reveal latent demand for data. In addition, only a fraction of the government’s data is presently on the portal – so all these numbers could get bigger still. And finally I’m only assessing downloads that originated inside Canada in these estimates.
So I’m not claiming that we have arrived at a holistic view of how to assess the value of open data portals – but even the narrow scope of assessment I outline above generates financial savings that are not trivial, and this is to say nothing of the value generated by those who downloaded the data – something that is much harder to measure – or of the value of increased access to Canadians and others.
2. Most Downloaded Datasets at data.gc.ca
This is interesting because… well… it’s just always interesting to see what people gravitate towards. But check this out…
Data sets like the Anthropogenic disturbance footprint within boreal caribou ranges across Canada may not seem interesting, but the groundbreaking agreement between the Forest Products Association of Canada and a coalition of Environmental Non-Profits – known as the Canadian Boreal Forest Agreement (CBFA) – uses this data set a lot to assess where the endangered woodland caribou are most at risk. There is no app, but the data is critical in both protecting this species and in finding a way to sustainably harvest wood in Canada. (Note: I worked as an adviser on the CBFA so am a) a big fan and b) not making this stuff up.)
It is fascinating that immigration and visa data tops the list. But it really shouldn’t be a surprise. We are of course, a nation of immigrants. I’m sure that immigration and visa advisers, to say nothing of think tanks, municipal governments, social service non-profits and English as a second language schools are all very keen on using this data to help them understand how they should be shaping their services and policies to target immigrant communities.
There is, of course, weather. The original open government data set. We made this data open for 100s of years. So useful and so important you had to make it open.
And, nice to see Sales of fuel used for road motor vehicles, by province and territory. If you wanted to figure out the carbon footprint of vehicles, by province, I suspect this is a nice dataset to get. Probably is also useful for computing gas prices as it might let you get a handle on demand. Economists probably like this data set.
All this to say, I’m less skeptical than before about the data sets in data.gc.ca. With the exception of weather, these data sets aren’t likely useful to software developers – the group I tend to hear most from – but then I’ve always posited that apps were only going to be a tiny part of the open data ecosystem. Analysis is king for open data and there do appear to be people out there who are finding data of value for analyses they want to make. That’s a great outcome.
Here are the tables outlining the most popular data sets since launch and (roughly) in February.
Top 10 most downloaded datasets, since launch
| Rank | Dataset | Department | Downloads |
|---|---|---|---|
| 1 | Permanent Resident Applications Processed Abroad and Processing Times (English) | Citizenship and Immigration Canada | 4730 |
| 2 | Permanent Resident Summary by Mission (English) | Citizenship and Immigration Canada | 1733 |
| 3 | Overseas Permanent Resident Inventory (English) | Citizenship and Immigration Canada | 1558 |
| 4 | Canada – Permanent residents by category (English) | Citizenship and Immigration Canada | 1261 |
| 5 | Permanent Resident Applicants Awaiting a Decision (English) | Citizenship and Immigration Canada | 873 |
| 6 | Meteorological Service of Canada (MSC) – City Page Weather | Environment Canada | 852 |
| 7 | Meteorological Service of Canada (MSC) – Weather Element Forecasts | Environment Canada | 851 |
| 8 | Permanent Resident Visa Applications Received Abroad – English Version | Citizenship and Immigration Canada | 800 |
| 9 | Water Quality Indicators – Reports, Maps, Charts and Data | Environment Canada | 697 |
| 10 | Canada – Permanent and Temporary Residents – English version | Citizenship and Immigration Canada | 625 |
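To put a number on how much immigration data dominates the since-launch list, here is a rough sketch in Python that tallies the table above by department. The download counts are copied from the table; the function name is just a label I made up for illustration, and the tally only covers the top 10, not the whole portal.

```python
from collections import defaultdict

# (department, downloads) pairs copied from the since-launch top 10 table.
TOP_10_SINCE_LAUNCH = [
    ("Citizenship and Immigration Canada", 4730),
    ("Citizenship and Immigration Canada", 1733),
    ("Citizenship and Immigration Canada", 1558),
    ("Citizenship and Immigration Canada", 1261),
    ("Citizenship and Immigration Canada", 873),
    ("Environment Canada", 852),
    ("Environment Canada", 851),
    ("Citizenship and Immigration Canada", 800),
    ("Environment Canada", 697),
    ("Citizenship and Immigration Canada", 625),
]


def downloads_by_department(rows):
    """Sum download counts per department."""
    totals = defaultdict(int)
    for department, downloads in rows:
        totals[department] += downloads
    return dict(totals)


totals = downloads_by_department(TOP_10_SINCE_LAUNCH)
grand_total = sum(totals.values())
for department, downloads in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{department}: {downloads} ({downloads / grand_total:.0%} of top-10 downloads)")
```

On these numbers, Citizenship and Immigration Canada accounts for roughly 83% of the downloads in the top 10, with Environment Canada making up the rest.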
Top 10 most downloaded datasets, for past 30 days
3. Visitor locations
So this is just plain fun. There is not a ton to derive from this – especially as IP addresses can, occasionally, be misleading. In addition, this is page view data, not download data. But what is fascinating is that computers in Canada are not the top source of traffic at data.gc.ca. Indeed, Canada’s share of the traffic is actually quite low. In fact, in January, just taking into account the countries in the chart (and not the long tail of visitors) Canada accounted for only 16% of the traffic to the site. That said, I suspect that downloads were significantly higher from Canadian visitors – although I have no hard evidence of this, just a hypothesis.
In a more recent blog post (https://eaves.ca/2012/03/09/access-to-information-open-data-and-the-problem-with-convergence/), David mentioned the email I sent responding to the above post. Here it is:
You’ve worked hard to bring open government data and should be
congratulated. But in your enthusiasm you’ve made some mistakes in your recent blog post.
First you wrote: “one fairly tangible value of open data portals is that
they cheaply resolve some access to information requests.”
That claim is not supported by the post. I’m not
sure why published datasets would reduce FOIs. There’s even evidence to
suggest that posting FOIed
documents on-line does not reduce the total number of FOIs. In BC, the
provincial government has a policy of making documents responsive to FOI
downloadable at http://www.openinfo.gov.bc.ca.
In the six months this policy has been implemented, the BC Government has seen
the total number of FOIs increase by 28% compared to the same period last year.
You also estimate the value of http://www.data.gc.ca
by comparing the costs of access via an opendata portal (which you assume to be $0) with the method of access
defined by the Access to Information Act. The estimate you provide is based on how
much it would have cost to access all the data through the ATI Act:
726 downloads of files from Canada for January x $1332.21/ATI = $967,184.46
Over a year, you estimate that to be upwards of $11,606,213
However, it’s important to recognize that once information is ordered
through the Access to Information Act, other people can subsequently have the
department send them the responsive documents. The catalogues of ATIs
you can order are posted on-line. Here is Citizenship and Immigration Canada’s catalogue:
I’m sure there’s a way of reducing the transaction cost of subsequent
orders even further (the Office of the Information Commissioner used to have an
It’s important to note that the estimates you provide appear to be
based on approximately 10 data files on http://www.data.gc.ca
that were downloaded multiple times. For example, “Permanent
Resident Applications Processed Abroad and Processing Times”, was downloaded
481 times in January 2012. You are assuming subsequent access of
the same data files will cost as much as the initial access. That’s wrong, as I explained above.
So the lowest estimate would be:
10 data files x $1332.21/ATI in one month = $13,322 and not $967,184
And in a year, it would still be $13,322 and not $11,606,213
Currently, subsequent orders via the ATI system have a transaction cost. Those aren’t nearly as much as initial costs. And as
I said, I’m sure there are ways of reducing them even further.
Rather than compare FOI legislation and Open Gov Data as if it’s
“one or the other”, do you think there’s a way of talking about how the
two might converge?
Mark