Tag Archives: open data

The Challenge of Open Data and Metrics

One promise of open data is its ability to inform citizens and consumers about the quality of local services. At the Gov 2.0 Summit yesterday the US Department of Health and Human Services announced it was releasing data on hospitals, nursing homes and clinics in the hope that developers will create applications that show citizens and consumers how their local hospital stacks up against others. In short, how good, or even how safe, is their local hospital?

In Canada we already have some experience with this type of measuring. The Fraser Institute publishes an annual report card on school performance in Alberta, BC, Ontario and Washington. (For those unfamiliar with the Fraser Institute, it is a right-wing think tank based in Vancouver with, shall we say, dubious research credentials but strong ideological and fundraising goals.)

Perhaps unsurprisingly, private schools do rather well in the Fraser Institute’s report card. Indeed it would appear (and I may be off by one here) that the top 18 schools on the list are all private. This supports a narrative, consistent with the Fraser Institute’s outlook, that private schools are inherently better than state-run schools. But, of course, that would be a difficult conclusion to sustain. Private schools tend to be populated with kids from wealthy families with better-educated parents, kids who have been given a blessed head start in life. Also, and not noted in the report card, many private schools are comfortable turfing out under-performing or unruly students. This means that the “delayed advancement rate,” one critical metric of a school’s performance, is dramatically less impacted than at a public school that cannot as easily send students packing.

Indeed, the Fraser Institute’s report card is rife with problems, something that teachers unions and, say, equally ideological but left-oriented think tanks like the Centre for Policy Alternatives are all too happy to point out.

While I loathe the Fraser Institute’s simplistic report card and think it is of dubious value to parents, I do like that they are at least trying to give parents some tool by which to measure schools. The notion that schools, teachers and education quality can’t be measured, or are too complicated to measure, is untenable. I suspect few parents – especially those in jobs where they themselves are evaluated – believe it. Nor does such a position help parents assess the quality of education their child is receiving. Parents may understand, be sympathetic to or even agree that this is a complicated issue, but the success of Ontario’s school locator makes clear that many of them want and like these tools.

Ultimately the problem here isn’t the open data (despite what critics of the Ontario Government’s school comparison website would have you believe). Besides, are we now going to hide or suppress data so that parents can’t assess their kids’ schools? Nor is the problem school report cards per se. If anything, the problem is that the Fraser Institute has had the field all to itself. If teachers groups, other think tanks, or any other group believes that the Fraser Institute’s report cards are too crude, why not design a better one? The data is available (and the government could easily be pressured to make more of it available). Why don’t teachers’ groups share with parents the metrics by which they believe parents should evaluate and compare schools? What this issue could use is some healthy competition and debate – one that generates more options and tools for parents.

The challenge for government is to make data more easily available. By making educational data more accessible, less time, IT skill and energy is needed to organize the data, and precious resources can instead be focused on developing and visualizing the scoring methodology. This certainly seems to be Health and Human Services’ approach: lower transaction costs, galvanize a variety of assessment applications and foster a healthy debate. It would be nice if ministries of education in Canada took a similar view.

But the second half of that challenge is also important, and groups outside of government need to recognize that they can have a role – and that there are consequences to not participating. The mistake is to ask how to deal with groups like the Fraser Institute that use crude metrics; instead we need to encourage more groups, and our own organizations, to contribute to the debate, to give it more nuance, and to create better tools. Leaving the field to the Fraser Institute is a dangerous strategy, one that will serve few people. This is all the more true since in the future we are likely to have more, not less, data about education, health and a myriad of other services and programs.

So, the challenge for readers is – will your organization participate?


Are you a Public Servant? What are your Open Data Challenges?

A number of governments have begun to initiate open data and open government strategies. With more governments moving in this direction a growing number of public servants are beginning to understand the issues, obstacles, challenges and opportunities surrounding open data and open government.

Indeed, these challenges are why many of these public servants frequent this blog.

This is precisely why I’m excited to share that, along with the Sunlight Foundation, the Personal Democracy Forum, Code for America, and GovLoop, I am helping Socrata with a recently launched survey aimed at government employees at the national, regional and local levels in the US and abroad about the progress of Open Data initiatives within their organization.

If you are a government employee please consider taking time to help us understand the state of Open Data in government. The survey is comprehensive, but given how quickly this field – and the policy questions that come with it – is expanding, I think the collective result of our work could be useful. So, with all that said, I know you’re busy, but I hope you’ll consider taking 10 minutes to fill out the survey. You can find it at: http://www.socrata.com/benchmark-study.

Creating effective open government portals

In the past few years a number of governments have launched open data portals. These sites, like www.data.gov or data.vancouver.ca, share data that government agencies collect in machine-readable formats (i.e. formats you can play with on your computer).
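As a toy illustration of what “machine readable” buys you (the dataset below is invented, but it is the kind of CSV file a portal might publish):

```python
import csv
import io

# A tiny, invented CSV of the kind an open data portal might publish.
sample = """name,neighbourhood,lat,lon
Kitsilano Branch Library,Kitsilano,49.264,-123.157
Main Street Skatepark,Mount Pleasant,49.254,-123.101
"""

# Because the format is machine readable, a few lines of standard-library
# code turn it into structured records any program can work with.
rows = list(csv.DictReader(io.StringIO(sample)))
print(rows[0]["name"])  # first facility's name
print(len(rows))        # number of records
```

That is the whole point: once the data is in a format like this, anyone can map it, graph it, or build an app on it without retyping anything.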

Increasingly, people approach me and ask: what makes for a good open data portal? Great question. And now that we have a number of sites out there we are starting to learn what makes a site more or less effective. A good starting point for any of this is the 8 Open Government principles, and for those newer to this discussion, there are the 3 laws of open data (also available in German, Japanese, Chinese, Spanish, Dutch and Russian).

But beyond that, I think there are some pretty tactical things data portal owners should be thinking about. So here are some issues I’ve noticed, and thoughts I hope might be helpful.

1. It’s all about automating the back end

Probably the single greatest mistake I’ve seen governments make is, in the rush to get some PR or meet an artificial deadline, creating a data portal in which the data must be updated manually. This means that a public servant must run around copying the data out of one system, converting it (and possibly scrubbing it of personal and security-sensitive information) and then posting it to the data portal.

There are a few interrelated problems with this approach. Yes, it allows you to get a site up quickly, but it isn’t sustainable. Most government IT departments don’t have a spare body who can do this work part time, even less so if the data site were to grow to include hundreds or thousands of data sets.

Consequently, this approach is likely to generate ill-will towards the government, especially from the very community of people who could and should be your largest supporters: local tech advocates and developers.

Consider New York: here is a site where – from what I can tell – the data is not regularly updated, and grumblings are getting louder. I’ve heard similar grumblings from developers and citizens in Canadian cities where open data portals get trumpeted despite infrequent updates and few available data sets.

If you are going to launch an open data portal, make sure you’ve figured out how to automate the data updates first. It is harder to do, but essential. In the early days open data sites often live and die based on the engagement of a relatively small community of early adopters – the people who will initially make the data come alive and build broader awareness. Frustrate that community and the initiative will have a harder time gaining traction.
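A minimal sketch of what that automation might look like. This is a hypothetical example, not any city’s actual pipeline: the table, columns and file name are invented, and a stand-in SQLite database plays the part of the internal system. The point is simply that a small script like this, run nightly by a scheduler (cron or similar), replaces the public servant copying data around by hand.

```python
import csv
import sqlite3

def export_dataset(conn, out_path):
    """Pull the current data out of an internal system and write the
    machine-readable file the portal serves, with no human in the loop."""
    cur = conn.execute(
        "SELECT permit_id, street, issued_date FROM permits ORDER BY permit_id"
    )
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([col[0] for col in cur.description])  # header row
        writer.writerows(cur)  # remaining rows straight from the cursor

# Stand-in for the real internal database, with invented sample records.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE permits (permit_id INTEGER, street TEXT, issued_date TEXT)")
conn.executemany(
    "INSERT INTO permits VALUES (?, ?, ?)",
    [(1, "Main St", "2010-07-01"), (2, "Cambie St", "2010-07-02")],
)

export_dataset(conn, "permits.csv")
```

Scrubbing personal information and uploading the file to the portal would be additional steps in the same job; the key design choice is that the whole chain runs unattended, so adding the hundredth data set costs roughly the same as adding the first.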

2. Keep the barriers low

Both the 8 principles and 3 laws talk a lot about licensing. Obviously there are those who would like the licenses on many existing portals to be more open, but in most cases the licenses are pretty good.

What you shouldn’t do is require users to register. If the data is open, you don’t care who is using it and indeed, as a government, you don’t want the hassle of tracking them. Also, don’t call your data open if users must belong to an educational institution or a non-profit. That is by definition data that is not open (I’m looking at you, StatsCan; it’s not liberated data if only a handful of people can look at it – sadly, you’re not the only site to do this). Worst is one website where, in order to access the online catalogue, you have to fax in a form outlining who you are.

This is the antithesis of how an open data portal should work.

3. Think like (or get help from) good librarians and designers

The real problem is when sites demand too much of users to even gain access to the data. Readers of this blog know my feelings regarding Statistics Canada’s website: the data always seems to be one more click away. That is, if you can even locate the data you are interested in, which often seems impossible.

And yes, I know that Statistics Canada’s phone operators are very helpful and can help you locate datasets quickly – but I submit to you that this is a symptom of the problem. If every time I went to Amazon.com I had to call a help desk to find the book I was interested in, I don’t think we’d be talking about how great Amazon’s help desk was. We’d be talking about how crappy their website is.

The point here is that an open data site is likely to grow. Indeed, looking at data.gov and data.gov.uk these sites now have thousands of data sets on them. In order to be navigable they need to have excellent design. More importantly, you need to have a new breed of librarian – one capable of thinking in the online space – to help create a system where data sets can be easily and quickly located.

This is rarely a problem early on (Vancouver has 140 data sets up, Washington DC around 250; these can still be trawled through without a sophisticated system). But you may want to sit down with a designer and a librarian during these early stages to think about how the site might evolve, so that you don’t create problems in the future.

4. Feedback

Finally, I think good open data portals want, and even encourage, feedback. I like that data.vancouver.ca has a survey on the site asking people what data sets they would be interested in seeing made open.

But more importantly, this is an area where governments can benefit. No data set is perfect. Most have a typo here or there. Once people start using your data they are going to find mistakes.

The best approach is not to pretend the information is perfect (it isn’t, and the public will have less confidence in you if you pretend this is true). Instead, ask to be notified about errors. Remember, you are using this data internally, so any errors are negatively impacting your own planning and analysis. By harnessing the eyes of the public you will be able to identify and fix problems more quickly.

And, while I’m sure we all agree this is probably not the case, maybe the fact that the data is public will create a small added incentive to fix it quickly. Maybe.


Interview on Open Source, Open Gov & Open Data with CSEDEV

The other week – in the midst of boarding a plane(!) – I did an interview with CSEDEV on some thoughts around open data, open government and open source.

The kind people at CSEDEV have written up the interview in paraphrased form and published it as three short blog posts: part 1 here, part 2 here and part 3 here.

Part of what makes this interesting to me is how a broader set of people are becoming interested in open government. Take CSEDEV, for example. Here is an Ottawa-based software firm focused on enterprise solutions. It’s part of an increasing number of software companies and IT consulting firms taking note of the open government and open data meme. Indeed, another concrete example is that Lagan, a large supplier of 311 systems, announced the other week that it would support the Open311 standard. This dramatically alters the benefits of a 311 system and its capacity to serve as a platform and innovation driver for a city.

But, even more exciting, the meme is starting to spread beyond IT and software. I was recently asked to write an article on what open data and open government means for business more generally, here in BC. (Will link to it, when published)

These moments represent an important shift in the open data and open government debate. With vendors and consultants taking notice governments can more easily push for, and expect, off the shelf solutions that support open government initiatives. Not only could this reduce cost to government and improve access for public servants and citizens, it could also be a huge boost for open standards which prove to be transformative to the management of information in the public sector.

Exciting times. Watch the open government space – now that it’s linked to IT, it’s beginning to gain speed.

The week in review (or… why I blog and a thank you)

Here are a few snippets of comments, emails and other communications I’ve had this week in response to specific posts or just the blog in general. Each one touches on why I love blogging and my readers, and why this blog has come to mean so much to me.

Venting, and finding out you’re not alone…

So, yesterday I got a little bit into a hate-on for Statistics Canada’s website. It wasn’t the first time, and pretty much every time I do it I find another soul out there who’s had their soul crushed by the website as well. Take this comment from last week:

Re: Stats Canada’s website being unusable. I completely frickin agree. God. Has anyone in government actually tried to use that website? An econ professor gave our class an assigment last year that involved looking stuff up on Statscan. Half of our class failed the assignment because they gave up and the other half had the wrong data, but got the marks anyways for trying. I think he actually took that assigment off of the grading at the end. It’s a bloody gong show…

Sometimes it makes me feel more human knowing that others are out there struggling with the same thing. StatsCan does great work… I just wish they made it accessible.

…and then having some kind souls find some solutions for you.

But as nice as knowing you’re not alone is, even better is how often the internet connects you to others who just happen to have that esoteric piece of knowledge that saves the day.

I agree, Stats Can is one of the worst government websites out there (specifically those stupid CANSIM tables), one that, as a policy analyst with XXXXXXXX Canada, i frequently have to use to get data. I had the data for XXXXXXX and it wasn’t hard to get it for the country.

This kind soul led me straight to a completely different page on StatsCan that happened to have the data I was looking for. (For those interested, it was here.)

And they weren’t the only one. Another reader posted a link to the data over twitter…

Thank god there is an army of good-natured amateur and professional experts experienced in navigating the byzantine structure of the StatsCan website!

So… thank you! I’m going to try to grind out an updated pan-North American version of the Fatness Index this weekend.

Impacting Policy

But this week also had that other rewarding ingredient I love: hearing that a post helped, incrementally, foster better public policy. This came in via email from a public servant about yesterday’s blog post:

Your blog today provided a good example in a meeting with government colleagues about the benefits of opening data. It illustrates the implications of not releasing data to the public (e.g. stifling innovation)… It resonated well with them.

This is a huge part of why I blog. Part of it is to explore ideas, part of it is to introduce ideas and thoughts, but a big piece of it is to enable public servants to do just this: to help small internal government meetings (on subjects like open data) go a little more smoothly.

So to everyone out there – policy wonks, students, public servants, politicians and ordinary, engaged citizens – thank you. It was a good week. We wrote some good posts and some good comments, had an original story on the stupidity of the census, and maintained sanity in the face of the StatsCan website. Thank you everyone for making it so fun. Hope you all have a great weekend. – Dave

Fatness Index 2 years on: the good, the bad, the ugly

Two years ago I saw that Richard Florida and Andrew Sullivan had re-posted a map created by calorielab that color-coded US states by weight.

As I found it interesting I created a North America-wide map that included Canadian data (knowing it probably wouldn’t be a perfect apples-to-apples comparison). The map and subsequent blog post turned into one of my best-viewed pages, with well over 20,000 pageviews.

The very cool people over at CalorieLab informed me that they have released an updated version of the American map (posted below; you can see the original at their site here). Not too much has changed, but after looking at the map I have a few comments.

CalorieLab’s release of an updated version of the map has triggered a few thoughts and some lessons that I think should matter to policy makers, health-care professionals and citizens in general. Here they are:

The Good

The amazing people at CalorieLab. When I created the map two years ago I didn’t even check to see if their work was copyrighted. Although the data was public domain, I copied CalorieLab’s colour palette, as I was trying to create a “mash-up” of their work with Canadian data. I wanted the maps to look similar. My map was a derivative work.

Did the people at Calorielab freak out? No. Quite the opposite. They reached out, said thank you and asked if I needed help.

It seems this year they’ve gotten even cooler. I don’t remember what the original map’s license was, but with the publishing of their 2010 update they wrote:

CalorieLab’s United States of Obesity 2010 map is licensed for use by anyone in any media and can be downloaded in various formats (small GIF, large GIF, SVG, EPS).

There’s a line directed specifically at people like me. It says: please, use this map! Not only is the license open but they’ve provided the map in lots of formats (which is great, because two years ago I had to recreate the thing from scratch and it took hours).

So naturally you are wondering: where is David’s 2010 mash-up North American Fatness Index?

The Bad

The bad is that trying to find the Canadian data is a pain. A couple of times a year I get a cool idea for a visual or graph that Statistics Canada data might help me create. In minutes I’m on their webpage and, within 5 minutes, I’m walking away from my computer fearing I might throw it out the window.

StatsCan’s website may be the worst, most inaccessible government website in the western world. Whatever data you are looking for always seems to be at least one more click away.

I spent an hour trying to find data that StatsCan allegedly wants me to find. (This in an era of Google, where I can generally find data people don’t want me to find in minutes.) Ultimately, I think I found the relevant data on overweight/obesity figures by province (but who knows! Should I be choosing peer group A, or B, or C, D, E, F, G? None of which have labels explaining what they mean!).

The Ugly

Sadly, it gets worse. Even if you a) locate the data on StatsCan’s website and b) it is free, it will probably still be inaccessible. The only way the data can be viewed is with a Beyond 20/20 Professional Browser. You need to learn a new software package, one 99.9% of Canadians have never heard of, and that only works on a PC (I’m on a Mac). The data I want is pretty simple: a CSV file, or even an Excel spreadsheet, would be sufficient – something the average Canadian could access. But I guess it is not to be.
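To underline how low the bar is: had the table been published as a plain CSV, pulling out the figures would take a few lines of standard-library code on any platform. The numbers below are invented placeholders, not real StatsCan figures; the point is the format, not the data.

```python
import csv
import io

# Invented placeholder figures, standing in for an obesity-rate table
# that could have been published as a plain CSV file.
sample = """province,overweight_or_obese_pct
British Columbia,45.1
Alberta,52.3
Ontario,50.4
"""

# Parse the table into a province -> rate lookup.
rates = {row["province"]: float(row["overweight_or_obese_pct"])
         for row in csv.DictReader(io.StringIO(sample))}

# Any spreadsheet, script or mapping tool could now use the numbers directly.
highest = max(rates, key=rates.get)
print(highest, rates[highest])
```

No special browser, no new software package: just a text file and any tool the reader already has.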

So I give up.

You win, StatsCan. There are tens of thousands of Canadians like me who would love to do interesting things with the data our tax dollars paid to collect, but even when your data is free and “open,” it isn’t. You’ve enjoyed tremendous support in the last month from those Canadians who understand why you are important (including me), but many Canadians have had to go up a steep learning curve around why they should care. I might suggest they’d have gotten up that curve faster if they too could have used your data.

Healthcare professionals, students, countless others and I could paint innumerable stories explaining Canadians and Canada to one another – helping us grasp our history, our social and health challenges, as well as simply who we are. But we can’t.

In the end I’m still one of your biggest supporters, but frankly even I feel alienated.

Note: If someone wants to help me get this data, I’ll take a crack at recreating the map again; otherwise, as I said before, I give up.

How Science Is Rediscovering "Open" And What It Means For Government

Pretty much everybody in government should read the fantastic New York Times article Sharing of Data Leads to Progress on Alzheimer’s. On one hand the article is a window into what has gone wrong with science – how, all too frequently, a process that used to be competitive but open and problem-focused has become competitive but closed and intellectual-property driven (one need only look at scientific journals to see how slow and challenging the process has become).

But strip away the talk about the challenges and opportunities for science. At its core, this is an article about something more basic and universal. This is an article about open data.

Viewed through this lens it is a powerful case study for all of us. It is a story of how one scientific community’s (re)discovery of open principles can yield powerful lessons and analogies for the private sector and, more importantly the public sector.

Consider first, the similarities in problems. From the article:

Dr. Potter had recently left the National Institutes of Health and he had been thinking about how to speed the glacial progress of Alzheimer’s drug research.

“We wanted to get out of what I called 19th-century drug development — give a drug and hope it does something,” Dr. Potter recalled in an interview on Thursday. “What was needed was to find some way of seeing what was happening in the brain as Alzheimer’s progressed and asking if experimental drugs could alter that progression.”

Our governments are struggling too. They are caught with 20th-century organizational, decision-making and accountability structures. More to the point, they move at a glacial speed. On the one hand we should be worried about a government that moves too quickly, but a government that is too slow to respond to crises or to address structural problems is one that will lose the confidence of the public. Moreover, as in healthcare, many of the simpler problems have been addressed; citizens are looking for solutions to more complex problems. As with the scientists and Alzheimer’s, we may need new models to speed up the process of understanding and testing solutions for these issues.

To overcome this 19th-century approach – and achieve the success they currently enjoy – the scientists decided to do something radical.

The key to the Alzheimer’s project was an agreement as ambitious as its goal: not just to raise money, not just to do research on a vast scale, but also to share all the data, making every single finding public immediately, available to anyone with a computer anywhere in the world.

No one would own the data. No one could submit patent applications, though private companies would ultimately profit from any drugs or imaging tests developed as a result of the effort.

Consider this. Here a group of private sector companies recognized that intellectual property slows down innovation. The solution: dilute the intellectual property, focus on sharing data and knowledge, and understand that those who contribute most will be best positioned to capitalize on the gains at the end.

Sadly, this is the same problem faced within governments. Sometimes it has to do with actual intellectual property (something I’ve recently argued our governments should abandon). However, the real challenge isn’t about formal rules; it is more subtle. In complex, siloed organizations where knowledge is power, the incentive is not to share knowledge and data but to use the information you have strategically, in a limited fashion, to maximize influence. The result: data is kept as a scarce but strategic asset. This is a theme I tackled both in my chapter in Open Government and in blog posts like this one.

In short, the real challenge is structural and cultural. Scientists had previously existed in a system where reputation (and career advancement) was built by hoarding data and publishing papers. While the individual incentives were okay, collectively this behavior was a disaster. The problem was not getting solved.

Today, it would appear that publishing is still important, but there are reputational effects from being the person or group who shares data. Open data is itself a currency. This is hardly surprising: if you are sharing data it means you are doing lots of work, which means you are likely knowledgeable. As a result, those with a great deal of experience are respected, but there remains the opportunity for those with radical ideas and new perspectives to test hypotheses and gain credibility by using the open data.

Unsurprisingly, this shift wasn’t easy:

At first, the collaboration struck many scientists as worrisome — they would be giving up ownership of data, and anyone could use it, publish papers, maybe even misinterpret it and publish information that was wrong.

Wow, does that sound familiar. This is invariably the first question government officials ask when you begin talking about open data. The answer, both in the scientific community and for government, is that you either believe in the peer-review process and public debate, or you don’t. Yes, people might misrepresent the data, or publish something that is wrong, but the bigger and more vibrant the community, the more likely people will find and point out the errors quickly. This is what innovation looks like: people try out ideas; sometimes they are right, sometimes they are wrong. But the more data you make available to people, the more ideas can be tested and the faster the cycle of innovation can proceed.

Whether it is behind the firewall or open to the public, open data is core to accelerating the spread of ideas and the speed of innovation. These scientists are rediscovering that fact, as are some of our governments. We’ve much to learn and do, but the case is becoming stronger and stronger that this is the right thing to do.

Creating Open Data Apps: Lessons from Vantrash Creator Luke Closs

Last week, as part of the Apps for Climate Action competition (which is open to anyone in Canada), I interviewed the always awesome Luke Closs. Luke, along with Kevin Jones, created VanTrash, a garbage pickup reminder app that uses open data from the City of Vancouver. In the interview, Luke shares some of the lessons learned while creating an application using open data.

As the deadline for the Apps for Climate Action competition approaches (August 8th), we thought this might help those who are thinking about throwing their hat in the ring at the last minute.

Some key lessons from Luke:

  • Don’t boil the ocean: Keep it simple – do one thing really, really well.
  • Get a beta up fast: Try to scope something you can get a rough version of working in a day or evening – that is a sure sign that it is doable
  • Beta test: On friends and family. A lot.
  • Keep it fun: do something that develops a skill or lets you explore a technology you’re interested in

Open Canada – Hello Globe and Mail?

Richard Poynder has a wonderful (and detailed) post on his blog Open and Shut about the state of open data in the UK. Much of it covers arguments about why open data matters economically and democratically (the case I’ve been making as well). It is worthwhile reading for policy makers and engaged citizens.

There is however a much more important lesson buried in the article. It is in regard to the role of the Guardian newspaper.

As many of you know, I’ve been advocating for open data at all levels of government, and in particular at the federal level. This is why I and others created datadotgc.ca: if the government won’t create an open data portal, we’ll create one for them. The goal, of course, was to show the government that it already does open data, and that it could do a lot, lot more (a v2 of the site, offering much cooler functionality, is in the works).

What is fascinating about Poynder’s article is the important role the Guardian has played in bringing open data to the UK. Consider this small excerpt from his post.

For The Guardian the release of COINS marks a high point in a crusade it began in March 2006, when it published an article called “Give us back our crown jewels” and launched the Free Our Data campaign. Much has happened since. “What would have been unbelievable a few years ago is now commonplace,” The Guardian boasted when reporting on the release of COINS.

Why did The Guardian start the Free Our Data campaign? Because it wanted to draw attention to the fact that governments and government agencies have been using taxpayers’ money to create vast databases containing highly valuable information, and yet have made very little of this information publicly available.

The lesson here is that a national newspaper in the UK played a key role in pressuring a system of government virtually identical to our own (now also governed by a minority, Conservative-led government) to release one of the most important datasets in its possession – the Combined Online Information System (COINS). This on top of postal codes and the sort of data we would find in Stats Canada’s databases.

All this leads me to ask one simple question: where is the Globe and Mail? I’m not sure its editors have written a single piece calling for open data (am I wrong here?). Indeed, I’m not even sure the issue is on their radar. The paper has certainly done nothing close to launching a “national campaign.” It could do the Canadian economy, democracy and journalism a world of good. Open data can be championed by individual advocates such as myself, but a large media player raising the issue time and time again brings the type of pressure few individuals can muster.

All this to say, if the Globe ever gets interested, I’m here. Happy to help.

Canadian Open Cities Update

For those who have not been following the news there have been a couple of exciting developments on the open data front at the municipal level in Canada.

First off, the City of Edmonton has launched its Apps competition, details can be found at the Apps4Edmonton website.

Second, it looks like the City of London, Ontario may do a pilot of open data – thanks to the vocal activism of local developers and community organizers, the Mayor of London expressed interest in doing a pilot at the London Changecamp. As mentioned, there is a vibrant and active community in London, Ontario, so I hope this effort takes flight.

Third, and older news, is that Ottawa has approved doing open data, so keep an eye on this website as things begin to take shape.

The final municipal update is the outlier… It turns out that although Calgary passed a motion to do open data a few months ago, the roll-out keeps getting delayed by a small group of city councillors. The reasons are murky, especially since I’m told by local activists that the funds have already been allocated and that everything is set to go. I will be watching this unfold with interest.

Finally, unrelated to municipal data, but still important(!): Apps4Climate Action has extended its deadline due to continued interest in the contest. The new submission deadline is August 8th.

Hope everyone has a great weekend. Oh, and if you haven’t already, please join the Facebook group “let’s get 100,000 Canadians to opt out of yellow pages delivery.” Already, in less than a week, over 800 Canadians have successfully opted out of receiving the yellow pages. Hope you’ll join too.