Calculating the Value of Canada’s Open Data Portal: A Mini-Case Study

Okay, let’s geek out on some open data portal stats from data.gc.ca. I’ve got three parts to this review: first, a look at how to assess the value of data.gc.ca; second, a look at the most downloaded data sets; and third, some interesting data about who is visiting the portal.

Before we dive in, a thank you to Jonathan C, who sent me some of this data the other day after requesting it from Treasury Board, the ministry within the Canadian government that manages the government’s open data portal.

1. Assessing the Value of data.gc.ca

Here is the first thing that struck me. Many governments talk about how they struggle to find methodologies to measure the value of open data portals/initiatives. Often these assessments focus on things like number of apps created or downloaded. Sometimes (and incorrectly in my mind) pageviews or downloads are used. Occasionally it veers into things like mashups or websites.

However, one fairly tangible value of open data portals is that they cheaply resolve some access to information requests –  a point I’ve tried to make before. At the very minimum they give scale to some requests that previously would have been handled by slow and expensive access to information/freedom of information processes.

Let me share some numbers to explain what I mean.

The Canadian government is, I believe, only obligated to fulfill requests that originate within Canada. Drawing from the information in the charts later in this post, let’s assume there were a total of 2200 downloads in January and that roughly a third (33%) of these originated from Canada – so a total of 726 “Canadian” downloads. Thanks to some earlier research, I happen to know that the Office of the Information Commissioner has assessed that the average cost of fulfilling an access to information request in 2009-2010 was $1,332.21.

So in a world without an open data portal the hypothetical cost of fulfilling these “Canadian” downloads as formal access to information requests would have been $967,184.46 in January alone. Even if I’m off by 50%, then the cost – again, just for January – would still sit at $483,592.23. Assuming this is a safe monthly average, then over the course of a year the cost savings could be around $11,606,213.52 or $5,803,106.76 – depending on how conservative you’d want to be about the assumptions.
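If you want to play with the assumptions yourself, here is a rough Python sketch of the arithmetic above. The two inputs (726 downloads and $1,332.21 per request) come straight from this post; the function name and the `discount` parameter are just labels I made up for the illustration, and the whole thing is a back-of-envelope estimate, not an official costing method.

```python
from decimal import Decimal

# Inputs taken from the post; both are assumptions, not measured savings.
COST_PER_REQUEST = Decimal("1332.21")  # average cost of one ATI request, 2009-2010
CANADIAN_DOWNLOADS = 726               # assumed "Canadian" downloads in January


def monthly_savings(downloads, cost_per_request, discount=Decimal("1")):
    """Hypothetical cost of handling every download as a formal ATI request.

    `discount` scales the result down, e.g. Decimal("0.5") for the
    "even if I'm off by 50%" scenario.
    """
    return (downloads * cost_per_request * discount).quantize(Decimal("0.01"))


full = monthly_savings(CANADIAN_DOWNLOADS, COST_PER_REQUEST)
half = monthly_savings(CANADIAN_DOWNLOADS, COST_PER_REQUEST, Decimal("0.5"))

print(f"January, as assumed:   ${full:,}")
print(f"January, off by 50%:   ${half:,}")
print(f"Annualized, as assumed: ${full * 12:,}")
print(f"Annualized, off by 50%: ${half * 12:,}")
```

Changing `CANADIAN_DOWNLOADS` or the discount is the quickest way to see how sensitive the headline number is to each assumption.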

Of course, I’m well aware that not every one of these downloads would have been an information request in a pre-portal world – that process is simply too burdensome. You have to pay a fee, and it has to be by check (who pays for anything by check any more???), so many of these users would simply have abandoned their search for government information. So some of these savings would not have been realized. But that doesn’t mean there isn’t value. Instead, the open data portal is able to more cheaply reveal latent demand for data. In addition, only a fraction of the government’s data is presently on the portal – so all these numbers could get bigger still. And finally, I’m only assessing downloads that originated inside Canada in these estimates.

So I’m not claiming that we have arrived at a holistic view of how to assess the value of open data portals – but even the narrow scope of assessment I outline above generates financial savings that are not trivial, and this is to say nothing of the value generated by those who downloaded the data – something that is much harder to measure – or of the value of increased access to Canadians and others.

2. Most Downloaded Datasets at data.gc.ca

This is interesting because… well… it’s just always interesting to see what people gravitate towards. But check this out…

Data sets like the Anthropogenic disturbance footprint within boreal caribou ranges across Canada may not seem interesting, but the groundbreaking agreement between the Forest Products Association of Canada and a coalition of environmental non-profits – known as the Canadian Boreal Forest Agreement (CBFA) – uses this data set a lot to assess where the endangered woodland caribou are most at risk. There is no app, but the data is critical both in protecting this species and in finding a way to sustainably harvest wood in Canada. (Note: I worked as an adviser on the CBFA, so I am a) a big fan and b) not making this stuff up.)

It is fascinating that immigration and visa data tops the list. But it really shouldn’t be a surprise. We are, of course, a nation of immigrants. I’m sure that immigration and visa advisers, to say nothing of think tanks, municipal governments, social service non-profits and English as a second language schools, are all very keen on using this data to help them understand how they should be shaping their services and policies to target immigrant communities.

There is, of course, weather. The original open government data set. We have made this data open for 100s of years. So useful and so important you had to make it open.

And, nice to see Sales of fuel used for road motor vehicles, by province and territory. If you wanted to figure out the carbon footprint of vehicles, by province, I suspect this is a nice dataset to get. It’s probably also useful for computing gas prices, as it might let you get a handle on demand. Economists probably like this data set.

All this to say, I’m less skeptical than before about the data sets in data.gc.ca. With the exception of weather, these data sets aren’t likely useful to software developers – the group I tend to hear most from – but then I’ve always posited that apps were only going to be a tiny part of the open data ecosystem. Analysis is king for open data and there do appear to be people out there who are finding data of value for analyses they want to make. That’s a great outcome.

Here are the tables outlining the most popular data sets since launch and (roughly) in February.

Top 10 most downloaded datasets, since launch

RANK | DATASET | DEPARTMENT | DOWNLOADS
1 | Permanent Resident Applications Processed Abroad and Processing Times (English) | Citizenship and Immigration Canada | 4730
2 | Permanent Resident Summary by Mission (English) | Citizenship and Immigration Canada | 1733
3 | Overseas Permanent Resident Inventory (English) | Citizenship and Immigration Canada | 1558
4 | Canada – Permanent residents by category (English) | Citizenship and Immigration Canada | 1261
5 | Permanent Resident Applicants Awaiting a Decision (English) | Citizenship and Immigration Canada | 873
6 | Meteorological Service of Canada (MSC) – City Page Weather | Environment Canada | 852
7 | Meteorological Service of Canada (MSC) – Weather Element Forecasts | Environment Canada | 851
8 | Permanent Resident Visa Applications Received Abroad – English Version | Citizenship and Immigration Canada | 800
9 | Water Quality Indicators – Reports, Maps, Charts and Data | Environment Canada | 697
10 | Canada – Permanent and Temporary Residents – English version | Citizenship and Immigration Canada | 625

Top 10 most downloaded datasets, for past 30 days

RANK | DATASET | DEPARTMENT | DOWNLOADS
1 | Permanent Resident Applications Processed Abroad and Processing Times (English) | Citizenship and Immigration Canada | 481
2 | Sales of commodities of large retailers – English version | Statistics Canada | 247
3 | Permanent Resident Summary by Mission – English Version | Citizenship and Immigration Canada | 207
4 | CIC Operational Network at a Glance – English Version | Citizenship and Immigration Canada | 163
5 | Gross domestic product at basic prices, communications, transportation and trade – English version | Statistics Canada | 159
6 | Anthropogenic disturbance footprint within boreal caribou ranges across Canada – As interpreted from 2008-2010 Landsat satellite imagery | Environment Canada | 102
7 | Canada – Permanent residents by category – English version | Citizenship and Immigration Canada | 98
8 | Meteorological Service of Canada (MSC) – City Page Weather | Environment Canada | 61
9 | Sales of fuel used for road motor vehicles, by province and territory – English version | Statistics Canada | 52
10 | Government of Canada Core Subject Thesaurus – English Version | Library and Archives Canada | 51
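
To put a rough number on how much immigration data dominates, here is a small Python sketch that totals the since-launch top 10 by department. The rows are copied from the since-launch table above; the function name `downloads_by_department` is just a label for this illustration, and the shares only describe the top 10, not the portal as a whole.

```python
from collections import Counter

# (department, downloads) rows copied from the "since launch" top-10 table.
SINCE_LAUNCH = [
    ("Citizenship and Immigration Canada", 4730),
    ("Citizenship and Immigration Canada", 1733),
    ("Citizenship and Immigration Canada", 1558),
    ("Citizenship and Immigration Canada", 1261),
    ("Citizenship and Immigration Canada", 873),
    ("Environment Canada", 852),
    ("Environment Canada", 851),
    ("Citizenship and Immigration Canada", 800),
    ("Environment Canada", 697),
    ("Citizenship and Immigration Canada", 625),
]


def downloads_by_department(rows):
    """Sum the download counts for each department."""
    totals = Counter()
    for department, downloads in rows:
        totals[department] += downloads
    return totals


totals = downloads_by_department(SINCE_LAUNCH)
top10_total = sum(totals.values())

# Print each department's share of the top-10 downloads, largest first.
for department, downloads in totals.most_common():
    print(f"{department}: {downloads} ({downloads / top10_total:.0%})")
```

Swapping in the rows from the past-30-days table gives a quick way to compare how the mix shifts once Statistics Canada data sets enter the list.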

3. Visitor locations

So this is just plain fun. There is not a ton to derive from this – especially as IP addresses can, occasionally, be misleading. In addition, this is page view data, not download data. But what is fascinating is that computers in Canada are not the top source of traffic at data.gc.ca. Indeed, Canada’s share of the traffic is actually quite low. In fact, in January, just taking into account the countries in the chart (and not the long tail of visitors) Canada accounted for only 16% of the traffic to the site. That said, I suspect that downloads were significantly higher from Canadian visitors – although I have no hard evidence of this, just a hypothesis.

[Chart: datagcca-december-visits]

• Total visits since launch: 380,276 user sessions

10 thoughts on “Calculating the Value of Canada’s Open Data Portal: A Mini-Case Study”

  8. Mark

    In a more recent blog post (http://eaves.ca/2012/03/09/access-to-information-open-data-and-the-problem-with-convergence/), David mentioned the email I sent responding to the above post. Here it is:

    ——————————————

    Hi David,

    You’ve worked hard to bring open government data and should be
    congratulated. But in your enthusiasm you’ve made some mistakes in your recent blog
    posting: http://eaves.ca/2012/03/08/calculating-the-value-of-canadas-open-data-portal-a-mini-case-study

    First you wrote: “one fairly tangible value of open data portals is that
    they cheaply resolve some access to information requests.”

    That claim is not supported by the post. I’m not sure why published
    datasets would reduce FOIs. There’s even evidence to suggest that
    posting FOIed documents on-line does not reduce the total number of
    FOIs. In BC, the provincial government has a policy of making
    documents responsive to FOI downloadable at http://www.openinfo.gov.bc.ca.
    In the six months this policy has been implemented, the BC Government
    has seen the total number of FOIs increase by 28% compared to the same
    period last year.

    You also estimate the value of http://www.data.gc.ca
    by comparing the costs of access via an opendata portal (which you assume to be $0) with the method of access
    defined by the Access to Information Act. The estimate you provide is based on how
    much it would have cost to access all the data through the ATI Act:

    726 downloads of files from Canada for January x $1332.21/ATI = $967,184.46

    Over a year, you estimate that to be upwards of $11,606,213

    However, it’s important to recognize that once information is ordered
    through the Access to Information Act, other people can subsequently have the
    department send them the responsive documents.  The catalogues of ATIs
    you can order are posted on-line. Here is Citizenship and Immigration Canada’s catalogue:

    http://www.cic.gc.ca/english/department/atip/completed.asp

    I’m sure there’s a way of reducing the transaction cost of subsequent
    orders even further (the Office of the Information Commissioner used to have an
    on-line form).

    It’s important to note that the estimates you provide appear to be
    based on approximately 10 data files on http://www.data.gc.ca
    that were downloaded multiple times.  For example, “Permanent
    Resident Applications Processed Abroad and Processing Times”, was downloaded
    481 times in January 2012.   You are assuming subsequent access of
    the same data files will cost as much as the initial access.  That’s wrong, as I explained above.

    So the lowest estimate would be:

    10 data files x $1332.21/ATI in one month = $13,322 and not $967,184

    And in a year, it would still be $13,322 and not $11,606,213

    Currently, subsequent orders via the ATI system have a transaction cost. Those aren’t nearly as much as initial costs. And as
    I said, I’m sure there are ways of reducing them even further.

    Rather than compare FOI legislation and Open Gov Data as if it’s
    “one or the other”, do you think there’s a way of talking about how the
    two might converge?

    Mark

     
