Making StatsCan Data Free: Assessing the Cost

Regular readers of my blog will know that I’ve advocated that StatsCan’s data – and particularly its Census data – should be made open (i.e. free, unlicensed, and downloadable in multiple formats). Presently, despite the fact that Canadian tax dollars pay to collect this data (a sadly diminishing amount and quality of it), it is not open.

The main argument I hear for why StatsCan’s data should not be free is that the department depends on the revenue the data generates.

So exactly how much revenue are we talking about? Thanks to the help of some former public servants I’ve been able to go over the publicly available numbers. The basic assessment – which I encourage people to verify and challenge – is that it turns out not to be a huge number.

The most interesting figure in StatsCan’s finances is the revenue it generates from its online database (i.e. data downloaded from its website). So how much revenue is it? Well, in 2007/2008, it was $559,000.

That’s it. For $559,000 in lost government revenue Canadians could potentially have unlimited access to the StatsCan census database their tax dollars paid to collect and organize. I suspect this is a tiny fraction of the value (and tax revenue) that might be generated by economic activity if this data were free.

Worse, the $559,000 is not profit. From what I can tell it is only revenue. Consequently, it doesn’t factor in the costs StatsCan has to absorb to run and maintain a checkout system on its website, collect credit card info, bill people, etc… I’m willing to bet almost anything that the cost of these functions either exceeds $559,000 a year or comes pretty close. So the net cost of making the data free could end up being even less.

StatsCan makes another $763,000 selling Statistics Canada publications (these are 243 data releases of the 29 major economic indicators StatsCan measures and the 5 census releases it does annually – in short, these are non-customized reports). So for $1,322,000 Canadians could get access to both the online data StatsCan has and the reports the organization generates. This is such a laughably (or depressingly) small number that it raises the question – why are we debating this? (Again, this is revenue, not profit, so the cost could be much lower.)

Of course, the figure that you’ll often hear cited is $100M in revenue. So what accounts for the roughly 100x difference between the above number and the alleged revenue? Well, in 2007/08 StatsCan did make $103,155,000, but this was from value-added (i.e. customized) reports. This is a very, very different product than the basic data that is available on its website. My sources tell me this is not related to downloaded data.
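
Since I encourage people to verify the numbers, here is a minimal Python sketch of the arithmetic. The variable names are my own, and the figures are simply the 2007/08 numbers quoted above, not an independent reading of StatsCan’s accounts.

```python
# Revenue figures for 2007/08 as quoted in this post (Canadian dollars).
online_database = 559_000      # data downloaded from the website
publications = 763_000         # non-customized reports
custom_reports = 103_155_000   # value-added (customized) reports

# Revenue at stake if the basic data and standard reports were free.
basic_data_total = online_database + publications

# How many times larger the custom-report revenue is than that total.
ratio = custom_reports / basic_data_total

print(f"Online data plus publications: ${basic_data_total:,}")
print(f"Custom reports are about {ratio:.0f}x that total")
```

Swapping in figures from another fiscal year is just a matter of changing the three constants at the top.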

I think we should concede that if StatsCan’s entire database were made open and free it would impact some of this revenue. But this would also be a good thing. Why is this? Let’s break it down:

  1. Increase Capacity and Data Literacy: By making a great deal of data open and free, StatsCan would make it easier for competitors to enter the marketplace. More companies and individuals could analyze the country’s census and other data, and more “ordinary” Canadians than ever would be able to access the database (again, that their tax dollars paid to create). This might include groups like senior high school and university students, non-profits and everyday citizens who want to know more about their country. So yes, StatsCan would have more competitors, but the country might also benefit from having a more data-literate population (and thus more potential consumers).
  2. Increase Accessibility of Canadian Data to Marginalized Groups: An increase in the country’s analysis capacity would drop the price for such work. This would make it cheaper and easier for more marginal groups to benefit from this data – charities, religious groups, NGOs, community organizations, individuals, etc…
  3. Improve Competitiveness: It would also be good for Canadian competitiveness: companies would have to spend less to understand and sell into the Canadian market. This would lower the cost of doing business in Canada – helpful to consumers and the Canadian economy.
  4. StatsCan would not lose all or even most of its business: Those at StatsCan who fear the organization would be overwhelmed by a more open world should remember that not all the data can be shared. Some data – particularly economic data gathered from companies – is sensitive and confidential. As a result there will be some data that StatsCan retains exclusive access to, and thus a monopoly over its analysis. More importantly, I suspect that were StatsCan data made open the demand for data analysis would grow, so arguably new capacity might end up being devoted to new demand, not existing demand.
  5. It will Reduce the Cost of Government: Finally, the crazy thing about StatsCan is that it sells its data and services to other Ministries and layers of government. This means that governments are paying people to move taxpayer money between government ministries and jurisdictions. This is a needless administrative cost that drives up everybody’s taxes and poorly allocates scarce government resources (especially at the local level). If every town and city in Canada pays $50 to $1,000 to access StatsCan data, that may not seem like much, but in reality we are paying that plus their and StatsCan’s staff time to manage all these transactions, enforce compliance, etc… all of which probably costs far, far more.

So in summary, the cost to Canada of releasing this data will likely be pretty marginal, while the benefits could be enormous.

At best, it costs half a million dollars in forgone revenue. Given the improved access and enormous benefits, this is a pittance to pay.

At worst, StatsCan would lose maybe $20-30 million – this is a real nightmare scenario that assumes much greater competition in the marketplace (again, a lot of assumptions in this scenario). Of course the improved access to data would lead to economic benefits that would far, far surpass this lost revenue, so the net benefit for the country would be big, but the cost to StatsCan would be real. Obviously, it would be nice if this decline in revenue were offset by improved funding for StatsCan (something a government that was genuinely concerned about Canadian economic competitiveness would jump at doing). However, given the current struggles StatsCan faces on the revenue front (cuts across the board) I could see how a worst-case scenario would be nerve-racking to the department’s senior public servants, who are also still reeling from the Long Form Census debacle.

Ultimately, however, I think the worst-case scenario is unlikely. Moreover, in either scenario the benefits are significant.

Bonus Material:

Possibly the most disconcerting part of the financial reports on StatsCan on Treasury Board’s website was the stakeholder consultation associated with access to StatsCan’s database. It claimed that:

Usability and client satisfaction survey were conducted with a sample of clients in early 2005. Declared level of satisfaction with service was very high.

This is stunning. I’ve never talked to anyone who has had a satisfactory experience on StatsCan’s website (in contrast to their phone support – which everyone loves). I refer to the StatsCan site as the place where what you want is always one click away.

I’m willing to bet a great deal that the consultations were with existing long-term customers – the type of people that have experience using the website. My suspicion is that if a broader consultation were conducted with potential users (university students, community groups, people like me and you, etc…) the numbers would tank. I dare you to try to use their website. It is virtually unnavigable.

Indeed, had StatsCan made its website and data more accessible, I suspect the department would have engaged more Canadians and had more stakeholders. This would have been the single most powerful thing it could have done to protect itself from cuts and decisions like the Long Form fiasco.

I know this post may anger a number of people at StatsCan. I’m genuinely sorry. I know the staff work hard, are dedicated and are exceedingly skilled and professional. This type of feedback is never flattering – particularly in public. It is because you are so important to the unity, economy and quality of life in our country that it is imperative we hold you to the highest possible bar – not just in the quality of the data you collect (there you already excel) but in the way you serve and engage Canadians. In this, I hope that you get the support you need and deserve.

9 thoughts on “Making StatsCan Data Free: Assessing the Cost”

  1. Pingback: Tweets that mention Making StatsCan Data Free: Assessing the Cost | eaves.ca -- Topsy.com

  2. MotoDC

    Well written, and very good points. I was surprised to find much of their data a pay-for affair when I knew it was collected using tax dollars. I especially agree with the website usability comments. That thing is as frustrating as heck!!

  3. Pingback: datalibre.ca · Abolition to StatCan Cost Recovery Policy on the Census

  4. Hugh

    I’ve spent as much time trying to understand how to purchase StatsCan data as I have spent learning how to actually use it — do I want a standard variable tabulation? a custom tabulation? perhaps a semi-custom tabulation? How many “cells” would that be, and what is the cost per cell at that volume tier?

    The result is that I don’t want to risk either the financial cost or the time cost to access data I’m not 100% certain I could effectively use. That limits experimentation and innovation.

    Also, yes, the website seems dedicated to obfuscating data availability.

  5. Robert

    I cannot believe we have to pay a fee to access StatsCan data. This is a joke, WTF are our tax dollars going for? I needed some IPPI data for some analysis and they want $69 for a few series of data. What a pain in the ass. I am so used to using US BLS data (which is free). Unbelievable. $559,000 in web revenue! Total joke. I bet they spend that much trying to administer the billing!

  6. Pingback: StatsCan’s free data costs $2M – a rant | eaves.ca

  7. epidemic31

    Great post, great synopsis of the issue. I linked to here while looking for the upcoming ‘free’ data release, and found your well-written argument. Kudos.

  8. Pingback: GTA Technology Topics, Tips & Tricks: Reminder Statistics Canada Making Data Free February 1st | The Modern MLIS
