As some of you learned last night, Embassy Magazine broke the story that all of Statistics Canada’s online data will not only be made free, but released under the Government of Canada’s Open Data License Agreement (updated and reviewed earlier this week) that allows for commercial re-use.
This decision has been in the works for months, and while it does not appear to have been formally announced, Embassy Magazine does appear to have managed to get a Statistics Canada spokesperson to confirm it is true. I have a few thoughts about this story: Some background, who wins from this decision, and most importantly, some hope for what it will, and won’t lead to next.
In the embassy article, the spokesperson claimed this decision had been in the works for years, something that is probably technically true. Such a decision – or something akin to it – has likely been contemplated a number of times. And there have been a number of trials and projects that have allowed for some data to be made accessible albeit under fairly restrictive licenses.
But it is less clear that the culture of open data has arrived at StatsCan, and less clear to me that this decision was internally driven. I’ve met many a Statscan employee who encountered enormous resistance while advocating for data open. I remember pressing the issue during a talk at one of the department’s middle managers conference in November of 2008 and seeing half the room nod vigorously in agreement, while the other half crossed it arms in strong disapproval.
Consequently, with the federal government increasingly interested in open data, coupled with a desire to have a good news story coming out of statscan after last summer census debacle, and with many decisions in Ottawa happening centrally, I suspect this decision occurred outside the department. This does not diminish its positive impact, but it does mean that a number of the next steps, many of which will require StatsCan to adapt its role, may not happen as quickly as some will hope, as the organization may take some time to come to terms with the new reality and the culture shift it will entail.
This may be compounded by the fact that there may be tougher news on the horizon for StatsCan. With every department required to have submitted proposal to cut their budgets by either 5% and 10%, and with StatsCan having already seen a number of its programs cut, there may be fewer resources in the organization to take advantage of the opportunity making its data open creates, or even just adjust to what has happened.
The winners from this decision are of course, consumers of statscan’s data. Indirectly, this includes all of us, since provincial and local governments are big consumers of statscan data and so now – assuming it is structured in such a manner – they will have easier (and cheaper) access to it. This is also true of large companies and non-profits which have used statscan data to locate stores, target services and generally allocate resources more efficiently. The opportunity now opens for smaller players to also benefit.
Indeed, this is the real hope. That a whole new category of winners emerges. That the barrier to use for software developers, entrepreneurs, students, academics, smaller companies and non-profits will be lowered in a manner that will enable a larger community to make use of the data and therefor create economic or social goods.
Such a community, however, will take time to evolve, and will benefit from support.
And finally, I think StatsCan is a winner. This decision brings it more profoundly into the digital age. It opens up new possibilities and, frankly, pushes a culture change that I believe is long over due. I suspect times are tough at StatsCan – although not as a result of this decision – this decision creates room to rethink how the department works and thinks.
The first thing everybody will be waiting for is to see exactly what data gets shared, in what structure and to what detail. Indeed this question arose a number of times on twitter with people posting tweets such as “Cool. This is all sorts of awesome. Are geo boundary files included too, like Census Tracts and postcodes?” We shall see. My hope is yes and I think the odds are good. But I could be wrong, at which point all this could turn into the most over hyped data story of the year. (Which actually matters now that data analysts are one of the fastest growing categories of jobs in North America).
Second, open data creates an opportunity for a new and more relevant role for StatsCan to a broader set of Canadians. Someone from StatsCan should talk to the data group at the World Bank around their transformation after they launched their open data portal (I’d be happy to make the introduction). That data portal now accounts for a significant portion of all the Bank’s web traffic, and the group is going through a dramatic transformation, realizing they are no longer curators of data for bank staff and a small elite group of clients around the world but curators of economic data for the world. I’m told a new, while the change has not been easy, a broader set of users have brought a new sense of purpose and identity. The same could be true of StatsCan. Rather than just an organization that serves the government of Canada and a select groups of clients, StatsCan could become the curators of data for all Canadians. This is a much more ambitious, but I’d argue more democratized and important goal.
And it is here that I hope other next steps will unfold. In the United States, (which has had free census data for as long as anyone I talked to can remember) whenever new data is released the census bureau runs workshops around the country, educating people on how to use and work with its data. StatsCan and a number of other partners already do some of this, but my hope is that there will be much, much more of it. We need a society that is significantly more data literate, and StatsCan along with the universities, colleges and schools could have a powerful role in cultivating this. Tracey Lauriault over at the DataLibre blog has been a fantastic advocate of such an approach.
I also hope that StatsCan will take its role as data curator for the country very seriously and think of new ways that its products can foster economic and social development. Offering APIs into its data sets would be a logical next step, something that would allow developers to embed census data right into their applications and ensure the data was always up to date. No one is expecting this to happen right away, but it was another question that arose on twitter after the story broke, so one can see that new types of users will be interested in new, and more efficient ways, of accessing the data.
But I think most importantly, the next step will need to come from us citizens. This announcement marks a major change in how StatsCan works. We need to be supportive, particularly at a time of budget cuts. While we are grateful for open data, it would be a shame if the institution that makes it all possible was reduced to a shell of its former self. Good quality data – and analysis to inform public policy – is essential to a modern economy, society, and government. Now that we will have free access to what our tax dollars have already paid for, let’s make sure that it stays that way, by both ensure it continues to be available, and that there continues to be a quality institution capable of collecting and analyzing it.
(sorry for typos – it’s 4am, will revise in the morning)