Statistics Canada Data to become OpenData – Background, Winners and Next Steps

As some of you learned last night, Embassy Magazine broke the story that all of Statistics Canada’s online data will not only be made free, but released under the Government of Canada’s Open Data License Agreement (updated and reviewed earlier this week) that allows for commercial re-use.

This decision has been in the works for months, and while it does not appear to have been formally announced, Embassy Magazine does appear to have managed to get a Statistics Canada spokesperson to confirm it is true. I have a few thoughts about this story: some background, who wins from this decision and, most importantly, some hope for what it will, and won’t, lead to next.

Background

In the Embassy article, the spokesperson claimed this decision had been in the works for years, something that is probably technically true. Such a decision – or something akin to it – has likely been contemplated a number of times. And there have been a number of trials and projects that have allowed for some data to be made accessible, albeit under fairly restrictive licenses.

But it is less clear that the culture of open data has arrived at StatsCan, and less clear to me that this decision was internally driven. I’ve met many a StatsCan employee who encountered enormous resistance while advocating for open data. I remember pressing the issue during a talk at one of the department’s middle managers’ conferences in November of 2008 and seeing half the room nod vigorously in agreement, while the other half crossed its arms in strong disapproval.

Consequently, with the federal government increasingly interested in open data, coupled with a desire to have a good news story coming out of StatsCan after last summer’s census debacle, and with many decisions in Ottawa happening centrally, I suspect this decision occurred outside the department. This does not diminish its positive impact, but it does mean that a number of the next steps, many of which will require StatsCan to adapt its role, may not happen as quickly as some will hope, as the organization may take some time to come to terms with the new reality and the culture shift it will entail.

This may be compounded by the fact that there may be tougher news on the horizon for StatsCan. With every department required to submit proposals to cut its budget by either 5% or 10%, and with StatsCan having already seen a number of its programs cut, there may be fewer resources in the organization to take advantage of the opportunity that making its data open creates, or even just to adjust to what has happened.

Winners (briefly)

The winners from this decision are, of course, consumers of StatsCan’s data. Indirectly, this includes all of us, since provincial and local governments are big consumers of StatsCan data and so now – assuming it is structured in such a manner – they will have easier (and cheaper) access to it. This is also true of large companies and non-profits which have used StatsCan data to locate stores, target services and generally allocate resources more efficiently. The opportunity now opens for smaller players to also benefit.

Indeed, this is the real hope. That a whole new category of winners emerges. That the barrier to use for software developers, entrepreneurs, students, academics, smaller companies and non-profits will be lowered in a manner that will enable a larger community to make use of the data and therefore create economic or social goods.

Such a community, however, will take time to evolve, and will benefit from support.

And finally, I think StatsCan is a winner. This decision brings it more profoundly into the digital age. It opens up new possibilities and, frankly, pushes a culture change that I believe is long overdue. I suspect times are tough at StatsCan, although not as a result of this decision. If anything, this decision creates room to rethink how the department works and thinks.

Next Steps

The first thing everybody will be waiting for is to see exactly what data gets shared, in what structure and to what detail. Indeed, this question arose a number of times on Twitter, with people posting tweets such as “Cool. This is all sorts of awesome. Are geo boundary files included too, like Census Tracts and postcodes?” We shall see. My hope is yes and I think the odds are good. But I could be wrong, at which point all this could turn into the most overhyped data story of the year. (Which actually matters now that data analysts are one of the fastest growing categories of jobs in North America.)

Second, open data creates an opportunity for a new and more relevant role for StatsCan to a broader set of Canadians. Someone from StatsCan should talk to the data group at the World Bank about their transformation after they launched their open data portal (I’d be happy to make the introduction). That data portal now accounts for a significant portion of all the Bank’s web traffic, and the group is going through a dramatic transformation, realizing they are no longer curators of data for Bank staff and a small elite group of clients around the world but curators of economic data for the world. I’m told that, while the change has not been easy, a broader set of users has brought a new sense of purpose and identity. The same could be true of StatsCan. Rather than just an organization that serves the Government of Canada and a select group of clients, StatsCan could become the curator of data for all Canadians. This is a much more ambitious but, I’d argue, more democratized and important goal.

And it is here that I hope other next steps will unfold. In the United States (which has had free census data for as long as anyone I talked to can remember), whenever new data is released the Census Bureau runs workshops around the country, educating people on how to use and work with its data. StatsCan and a number of other partners already do some of this, but my hope is that there will be much, much more of it. We need a society that is significantly more data literate, and StatsCan, along with the universities, colleges and schools, could have a powerful role in cultivating this. Tracey Lauriault over at the DataLibre blog has been a fantastic advocate of such an approach.

I also hope that StatsCan will take its role as data curator for the country very seriously and think of new ways that its products can foster economic and social development. Offering APIs into its data sets would be a logical next step, something that would allow developers to embed census data right into their applications and ensure the data was always up to date. No one is expecting this to happen right away, but it was another question that arose on Twitter after the story broke, so one can see that new types of users will be interested in new, and more efficient, ways of accessing the data.

But I think, most importantly, the next step will need to come from us citizens. This announcement marks a major change in how StatsCan works. We need to be supportive, particularly at a time of budget cuts. While we are grateful for open data, it would be a shame if the institution that makes it all possible was reduced to a shell of its former self. Good quality data – and analysis to inform public policy – is essential to a modern economy, society, and government. Now that we will have free access to what our tax dollars have already paid for, let’s make sure that it stays that way, by ensuring both that it continues to be available and that there continues to be a quality institution capable of collecting and analyzing it.

(sorry for typos – it’s 4am, will revise in the morning)

13 thoughts on “Statistics Canada Data to become OpenData – Background, Winners and Next Steps”

  1. Pingback: G A N I S » Blog Archive » Statistics Canada Moving to Make Data Free

  2. Nadya Repin

    I think this is excellent news, and must be followed up by a complete overhaul of the StatsCan website so that the free data can actually be found.

  3. Data Dude

    About bloody time!  Maybe we can try to catch up and exceed what the American Census Bureau has been doing for years…not open data, but we are in the right direction…FINALLY!

  4. Diane Dyson

    Some of this data has already been made easier to access through the Canadian Council for Social Development’s Community Social Data Strategy, which provides low-cost data to municipalities and non-profits through a group purchasing plan. This announcement takes it one step further – a real surprise considering the strong messaging out of Stats Can in recent years about how everything has to be done in a cost recovery fashion – and underlines your point that cuts may come. This is a great decision in principle. We’ll see how it falls out fiscally.

  5. Eddie

    Nice move, but what about data from Environment Canada, why isn’t it included? NOAA in the U.S. continues to trump Environment Canada given how liberal NOAA is in sharing its data with the world. 

    1. David Eaves

      Eddie,

      Thank you for the comment. Have you actually gone to data.gc.ca and looked for data sets? Or are you just responding to this blog post? I say this because there is already a large number of data sets from Environment Canada on data.gc.ca. There can and should be more, but I suspect that you are responding to this blog post – which is referencing some newly released data – but haven’t done any research. You might find you are pleasantly surprised. If not, I’d love to hear that too – let me know what you are looking for.

  6. Tracey P. Lauriault

    I just sent a note to clarify what is free and what is not, as we are getting mixed messages. CAPDU will also be seeking clarification, as they too are getting mixed messages from the agency. The Data Liberation Initiative has also been asked, as they were sharing something quite different.

    I do not believe this will cover small geographies (Dissemination areas), nor other surveys, nor Trade Division Data, maybe E-Stat.  We are also not sure if this means that the data that have already been purchased can now be shared nor if the cost of custom orders will go up.  We also do not know how they will share these!  People want SPSS files, SAS files and people have become accustomed to Beyond20/20.  The community profiles are interesting, however, you could not order the entire set for Canada you had to go community by community, so that part is also not quite worked out.

    In terms of long discussions at StatCan, my understanding is that it has only been in the past couple of years that this has been a discussion; even my favourite former Chief Statistician is on tape saying that cost recovery is essential to Statistics Canada.

    I really do think it was a result of the Census debacle.

    Thanks for sharing the article.

  7. Pingback: Strata Week: New open-data initiatives in Canada and the UK - O'Reilly Radar
