
Learning from Libraries: The Literacy Challenge of Open Data

We didn’t build libraries for a literate citizenry. We built libraries to help citizens become literate. Today we build open data portals not because we have public-policy-literate citizens, but so that citizens may become literate in public policy.

Yesterday, in a brilliant article on The Guardian website, Charles Arthur argued that a global flood of government data is being opened up to the public (sadly, not in Canada) and that we are going to need an army of people to make it understandable.

I agree. We need a data-literate citizenry, not just a small elite of hackers and policy wonks. And the best way to cultivate that broad-based literacy is not to release in small or measured quantities, but to flood us with data. To provide thousands of niches that will interest people in learning, playing and working with open data. But more than this we also need to think about cultivating communities where citizens can exchange ideas as well as involve educators to help provide support and increase people’s ability to move up the learning curve.

Interestingly, this is not new territory.  We have a model for how to make this happen – one from which we can draw lessons or foresee problems. What model? Consider a process similar in scale and scope that happened just over a century ago: the library revolution.

In the late 19th and early 20th century, governments and philanthropists across the western world suddenly became obsessed with building libraries – lots of them. Everything from large ones like the New York Public Library to small ones like the thousands of tiny, one-room county libraries that dot the countryside. Big or small, these institutions quickly became treasured and important parts of any city or town. At the core of this project was the belief that literate citizens would be both more productive and more effective citizens.

But like open data, this project was not without controversy. It is worth noting that at the time some people argued libraries were dangerous: they could spread subversive ideas – especially about sexuality and politics – and giving citizens access to knowledge out of context would render them dangerous to themselves and society at large. Remember, ideas are a dangerous thing. And libraries are full of them.

Cora McAndrews Moellendick, a Master of Library Studies student who draws on the work of Geller, sums up the challenge beautifully:

…for a period of time, censorship was a key responsibility of the librarian, along with trying to persuade the public that reading was not frivolous or harmful… many were concerned that this money could have been used elsewhere to better serve people. Lord Rodenberry claimed that “reading would destroy independent thinking.” Librarians were also coming under attack because they could not prove that libraries were having any impact on reducing crime, improving happiness, or assisting economic growth, areas of keen importance during this period… (Geller, 1984)

Today when I talk to public servants, think tank leaders and others, most grasp the benefit of “open data” – of having the government share the data it collects. A few, however, talk about the problem of just handing data over to the public. Some question whether the activity is “frivolous or harmful.” They ask “what will people do with the data?” “They might misunderstand it” or “They might misuse it.” Ultimately they argue we can only release this data “in context”. Data, after all, is a dangerous thing. And governments produce a lot of it.

As in the 19th century, these arguments must not prevail. Indeed, we must do the exact opposite. Charges of “frivolousness” or a desire to ensure data is only released “in context” are code to obstruct or shape data portals to ensure that they only support what public institutions or politicians deem “acceptable”. Again, we need a flood of data, not only because it is good for democracy and government, but because it increases the likelihood of more people taking interest and becoming literate.

It is worth remembering: We didn’t build libraries for an already literate citizenry. We built libraries to help citizens become literate. Today we build open data portals not because we have a data- or public-policy-literate citizenry, but so that citizens may become literate in data, visualization, coding and public policy.

This is why coders in cities like Vancouver and Ottawa come together for open data hackathons, to share ideas and skills on how to use and engage with open data.

But smart governments should not rely only on small groups of developers to make use of open data. Forward-looking governments – those that want an engaged citizenry, a 21st-century workforce and a creative, knowledge-based economy in their jurisdiction – will reach out to universities, colleges and schools and encourage them to get their students using, visualizing, writing about and generally engaging with open data: not only to help others understand its significance, but to foster a sense of empowerment and opportunity among a generation that could create the public policy hacks that will save lives, make public resources more efficient and effective, and make communities more livable and fun. The recent paper published by the University of British Columbia students who used open data to analyze graffiti trends in Vancouver is a perfect early example of this phenomenon.

When we think of libraries, we often just think of a building with books. But 19th century libraries mattered not only because they had books, but because they offered literacy programs, book clubs, and other resources to help citizens become literate and thus more engaged and productive. Open data catalogs need to learn the same lesson. While they won’t require the same centralized and costly approach as the 19th century, governments that help foster communities around open data, that encourage their school system to use it as a basis for teaching, and that support their citizens’ efforts to write and suggest their own public policy ideas will, I suspect, benefit from happier and more engaged citizens, along with better services and stronger economies.

So what is your government/university/community doing to create its citizen army of open data analysts?

Apps for Climate Action Update – Lessons and some new sexy data

Okay, so I’ll be the first to say that the Apps4Climate Action data catalog has not always been the easiest to navigate and some of the data sets have not been machine readable, or even data at all.

That, however, is starting to change.

Indeed, the good news is threefold.

First, the data catalog has been tweaked and now has better search and an improved capacity to sort out non-machine readable data sets. It is a great example of a government starting to think like the web, iterating and learning as the program progresses.

Second, and more importantly, new and better data sets are starting to be added to the catalog. Most recently the Community Energy and Emissions Inventories were released in Excel format. This data shows carbon emissions for all sorts of activities and infrastructure at a very granular level. Want to compare the GHG emissions of a duplex in Vancouver versus a duplex in Prince George? Now you can.
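To give a sense of how approachable that comparison becomes once the inventory is in a spreadsheet, here is a minimal sketch in Python (using pandas). The file name and column names are my assumptions about how a CEEI export might be laid out, not the actual schema – check the download from the catalog:

```python
# Minimal sketch: comparing community GHG emissions from a CEEI spreadsheet.
# File name and column names are assumptions, not the real CEEI schema.
import pandas as pd

# Load the inventory (the real file may spread data across several sheets)
ceei = pd.read_excel("ceei_2007.xls", sheet_name=0)

# Filter to duplexes in the two communities we want to compare
duplexes = ceei[
    (ceei["community"].isin(["Vancouver", "Prince George"]))
    & (ceei["building_type"] == "duplex")
]

# Total tonnes of CO2-equivalent per community
print(duplexes.groupby("community")["tonnes_co2e"].sum())
```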

Moreover, this is the first time any government has released this type of data at all, not to mention making it machine readable. So not only have the app possibilities (how green is your neighborhood, rate my city, calculate my GHG emissions) all become much more realizable, but any app using this data will be among the first in the world.

Finally, probably one of the most positive outcomes of the app competition to date is largely hidden from the public. The fact that members of the public have been asking for better data or even for data sets at all(!) has made a number of public servants realize the value of making this information public.

Prior to the competition, making data public was a compliance problem: something you did, but figured no one would ever look at or read. Now, for a growing number of public servants, it is an innovation opportunity. Someone may take what the government produces and do something interesting with it. Even if they don’t, someone is nonetheless taking an interest in your work – something that has rewards in and of itself. This, of course, doesn’t mean that things will improve overnight, but it does help advance the goal of getting government to share more machine readable data.

Better still, the government is reaching out to stakeholders in the development community and soliciting advice on how to improve the site and the program, all in a cost-effective manner.

So even within the Apps4Climate Action project we see some of the changes the promise of Government 2.0 holds for us:

  • Feedback from community participants driving the project to adapt
  • Iterations of development conducted “on the fly” during a project or program
  • Successes and failures quickly resulting in improvements (release of more data, a better website)
  • Shifting culture around disclosure and cross-sector innovation
  • All on a timeline that can be measured in weeks

Once this project is over I’ll write more on it, but wanted to update people, especially given some of the new data sets that have become available.

And if you are a developer or someone who would like to do a cool visualization with the data, check out the Apps4Climate Action website or drop me an email – I’m happy to talk you through your idea.

Open Data: An Example of the Long Tail of Public Policy at Work

As many readers know, Vancouver passed what has locally been termed the Open3 motion a year ago and has had an open data portal up and running for several months.

Around the world, much of the attention around open data initiatives has focused on the development of applications like Vancouver’s Vantrash, Washington DC’s Stumble Safely or Toronto’s Childcare locator. But the other use of data portals is to actually better understand and analyze phenomena in a city – all of which can potentially lead to a broader diversity of perspectives, better public policy and a more informed public and/or decision makers.

I was thus pleased to find out about another example of what I’ve been calling the Long Tail of Public Policy when I received an email from Victor Ngo, a student at the University of British Columbia who just completed his 2nd year in the Human Geography program with an Urban Studies focus (He’s also a co-op student looking for a summer job – nudge to the City of Vancouver).

It turns out that last month, he and two classmates did a project on graffiti occurrence and its relationship to land use, crime rates, and socio-economic variables. As Victor shared with me:

It was a group project I did with two other members in March/April. It was for an introductory GIS class and given our knowledge, our analysis was certainly not as robust and refined as it could have been. But having been responsible for GIS analysis part of the project, I’m proud of what we accomplished.

The “Graffiti sites” shapefile was very instrumental to my project. I’m a big fan of the site and I’ll be using it more in the future as I continue my studies.

So here we have University students in Vancouver using real city data to work on projects that could provide some insights, all while learning. This is another small example of why open data matters. This is the future of public policy development. Today Victor may be a student, less certain about the quality of his work (don’t underestimate yourself, Victor) but tomorrow he could be working for government, a think tank, a consulting firm, an insurance company or a citizen advocacy group. But wherever he is, the open data portal will be a resource he will want to turn to.

With Victor’s permission I’ve uploaded his report, Graffiti in the Urban Everyday – Comparing Graffiti Occurrence with Crime Rates, Land Use, and Socio-Economic Indicators in Vancouver, to my site so anyone can download it. Victor has said he’d love to get people’s feedback on it.

And what was the main drawback of using the open data? There wasn’t enough of it.

…one thing I would have liked was better crime statistics, in particular, the data for the actual location of crime occurrence. It would have certainly made our analysis more refined. The weekly Crime Maps that the VPD publishes is an example of what I mean:

http://vancouver.ca/police/CrimeMaps/index.htm

You’re able to see the actual location where the crime was committed. We had to tabulate data from summary tables found at:

http://vancouver.ca/police/organization/planning-research-audit/neighbourhood-statistics.html

To translate: essentially the city releases this information in a non-machine-readable format, meaning that citizens, public servants at other levels of government and (I’m willing to wager) City of Vancouver public servants outside the police department have to recreate the data in a digital format. What a colossal waste of time and energy. Why not just share the data in a structured digital way? The city already makes it public; why not make it useful as well? This is what Washington DC (search crime) and San Francisco have done.
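To make the re-keying problem concrete, here is a rough sketch of what tabulating those summary tables involves. pandas can scrape HTML tables directly, though the result is brittle and the table layout here is an assumption on my part – exactly the kind of manual extraction a structured release would make unnecessary:

```python
# Rough sketch of the re-keying problem: extracting the VPD's published
# summary tables back into structured data. The table layout is an assumption.
import pandas as pd

url = ("http://vancouver.ca/police/organization/"
       "planning-research-audit/neighbourhood-statistics.html")

# read_html pulls every <table> on the page into a DataFrame -- brittle,
# and unnecessary if the city released the data in a structured format
tables = pd.read_html(url)
crime_by_neighbourhood = tables[0]
print(crime_by_neighbourhood.head())
```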

I hope that more apps get created in Vancouver, but as a public policy geek, I’m also hoping that more reports like these (and the one Bing Thom architects published on the future of Vancouver also using data from the open data catalog) get published. Ultimately, more people learning, thinking, writing and seeking solutions to our challenges will create a smarter, more vibrant and more successful city. Isn’t that what you’d want your city government (or any government, really…) to do?

On Journalism & Crowdsourcing: the good, the bad, the ugly

Last week the Vancouver Sun (my local paper) launched a laudable experiment. They took all of the campaign finance data from the last round of municipal elections in the Lower Mainland (the Greater Vancouver area in Canada) and posted a significant amount of it on their website. This is exactly the type of thing I’ve been hoping that newspapers would do more of in Canada (much like British newspapers – especially The Guardian – have done). I do think there are some instructive lessons, so here is a brief list of what I think is good, bad and ugly about the experiment.

The Good:

That it is being done at all. For newspapers in Canada to do anything other than simply repackage text that was (or wasn’t) going to end up in the newsprint sadly still counts as innovation here. Seriously, someone should be applauding the Vancouver Sun team. I am. I hope you will too. Moreover, enabling people to do some rudimentary searches is interesting – mostly as people will want to see who the biggest donors are. Of course, it is no surprise to learn that in many cases the biggest donors in municipal elections (developers) give to all the major parties or players… just to cover their bets. Also interesting is that they’ve invited readers to contribute: “If you find something interesting in the database that you want to share with other readers, go to The Sun’s Money & Influence blog at vancouversun.com/influence and post a comment.” The paper is looking for people to sniff out news stories.

The Bad:

While it is great that the Vancouver Sun has compiled this data, it will be interesting to see who, if anyone, uses it. A major barrier here is the social contract between the paper and those it is looking to engage. The paper won’t actually let you access the data – only run basic searches. This is because they don’t want readers running off and doing something interesting with the data on another website. But this constraint also means you can’t visualize it (for example, put it into a spreadsheet and graph it) or try to analyze it in some interesting ways. Increasingly, our world isn’t one where we tell the story only in words; we tell it visually with graphs, charts and visuals… that is the real opportunity here.
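To illustrate what is being left on the table, here is a minimal sketch of the kind of chart any reader could produce in minutes if the data were downloadable. The file and column names are assumptions about how a campaign finance export might look:

```python
# Sketch of what becomes possible once the data is downloadable rather than
# search-only. Column names ("donor", "amount") are assumptions.
import pandas as pd
import matplotlib.pyplot as plt

donations = pd.read_csv("municipal_donations.csv")

# Top ten donors across all parties and municipalities
top = donations.groupby("donor")["amount"].sum().nlargest(10)
top.plot(kind="barh", title="Top 10 donors, Lower Mainland municipal elections")
plt.tight_layout()
plt.show()
```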

I know a few people who would love to do something interesting with the data (like John Jensen or Luke Closs), if they could access it. I also understand that the Vancouver Sun wants the discussion to take place on their page. But if you want people to use the data and do something interesting with it, you have to let them access it: that means downloading it or offering up an API (this is what The Guardian, a newspaper that is serious about letting people use their data, does). The Sun could have distributed the data with an attribution license, so that anybody who used the API had to at least link back to The Sun. But I don’t know a single person out there who, with or without a license, wouldn’t have linked back to the Sun, thanked them, and driven a bunch of traffic to them. Moreover, if The Sun had a more open approach, it could likely have enlisted people to do data entry on campaign donations in other districts around the province. Instead, many of the pages for this story sit blank. There are few comments – some, like these two, that are not relevant, and the occasional gem like this one. There is also one from John Jensen, an open data hackathon regular who has been trying to visualize this data for months but has been unable to, since typing up all the data is time consuming.

At the end of the day, if you want readers to create content for you, to sniff out stories and sift through data, you have to trust them, and that means giving them real access. I can imagine that feels scary. But I think it would work out.

The Ugly:

The really ugly part about this story is that the Vancouver Sun needed to do all this data entry in the first place. Since campaigns are legally required to track donations, most track them using… Microsoft Excel. Then, because the province requires that candidates disclose donations, the city in which the candidate is running insists that they submit the list of donations in print. Then that form gets scanned and saved as a PDF. If, of course, the province’s campaign finance laws were changed so as to require you to submit your donations in an electronic format, then all of the data entry the Sun had to do would disappear and suddenly anyone could search and analyze campaign donations. In short, even though this system is supposed to create transparency, we’ve architected it to be opaque. The information is all disclosed; we’ve just ensured that it is very difficult and expensive to sort through. Sadly, I’m not confident that the BC Election Task Force is going to change that, although I did submit this as a recommendation.

Some Ideas:

1) I’d encourage the Vancouver Sun to make available the database they’ve cobbled together. If they did, I know I would be willing to help bring together volunteers to add donation data from more municipalities and to help create some nice visualizations of the data. I also think it would spark a larger discussion, both on their website and elsewhere across the internet (and possibly even in other mediums) around the province. This could become a major issue. I even suspect that there would be a number of people at the next open data hackathon who would take this issue up.

2) Less appealing is to scrape the data set off the Vancouver Sun’s website and then do something interesting with it. I would, of course, encourage whoever did that to attribute the good work of the Vancouver Sun, link back to them and even encourage readers to go and participate in their discussion forum.

CIO Summit recap and links

Yesterday I was part of a panel at the CIO Summit, a conference for the CIOs of the various ministries of the Canadian Government. There was lots more I would have liked to have shared with the group, so I’ve attached some links here as a follow up for those in (and not in) attendance, to help flesh out some of my thoughts:

1. Doing mini-GCPEDIAcamps or WikiCamps

So what is a “camp“? Check out Wikipedia! “A term commonly used in the titles of technology-related unconferences, such as Foo Camp and BarCamp.” In short, it is an informal gathering of people who share a common interest who gather to share best practices or talk about the shared interest.

There is interest in GCPEDIA across the public service but many people aren’t sure how to use it (in both the technical and social sense). So let’s start holding small mini-conferences to help socialize how people can use GCPEDIA and help get them online. Find a champion, organize informally, do it at lunch, and ensure there are connected laptops or computers on hand. And do it more than once! Above all, a networked, peer-based platform requires a networked learning structure.

2. Send me an Excel spreadsheet of structured data sets on your ministry’s website

As I mentioned, a community of people have launched datadotgc.ca. If you are the CIO of a ministry that has structured data sets (e.g. CSV, Excel spreadsheets, KML, SHAPE files – things that users can download and play with, so not PDFs!) drop the URLs of their locations into an email or spreadsheet and send it to me! I would love to have your ministry well represented on the front page graph on datadotgc.ca.

3. Some links to ideas and examples I shared

– Read about how open data helped find/push the CRA to locate $3.2B in lost tax revenue.

– Read about how open data needs to be part of the stimulus package.

– Why GCPEDIA could save the public service here.

– Check out Vantrash; openparliament is another great site too.

– The open data portals I referenced: the United States, the United Kingdom, The World Bank, & Vancouver’s

4. Let’s get more people involved in helping Government websites work (for citizens)

During the conference I offered to help organize some Government DesignCamps to help ensure that CLF 3 (or whatever the next iteration will be called) helps Canadians navigate government websites. There are people out there who would offer up free advice – sometimes out of love, sometimes out of frustration – that, regardless of their motivation, could be deeply, deeply helpful. Canada has a rich and talented design community including people like this – why not tap into it? More importantly, it is a model that has worked when done right. This situation is very similar to the genesis of the original TransitCamp in Toronto.

5. Push your department to develop an Open Source procurement strategy

The fact is, if you aren’t even looking at open source solutions you are screening out part of your vendor ecosystem and failing in your fiduciary duty to explore all options to deliver value to taxpayers. Right now governments only seem to know how to pay LOTS of money for IT. You can’t afford to do that anymore. GCPEDIA is available to every government employee, has 15,000 users today and could easily scale to 300,000 (we know it can scale because Wikipedia is way, way bigger). All this for the cost of $60K in consulting fees and $1.5M in staff time. That is cheap. Disruptively cheap. Any alternative would have cost you $20M+ and, if scaled, I suspect $60M+.

Not every piece of software should necessarily be open source, but you need to consider the option. Already, on the web, more and more governments are looking at open source solutions.

Help with datadotgc.ca

For regular readers of my blog, I promise not to talk too much about datadotgc.ca here at eaves.ca. I am going to today since I’ve received a number of requests from people asking if and how they could help, so I wanted to lay out what is on my mind at the moment and, if people have time/capacity, how they could help.

The Context

Next Wednesday I’ll be doing a small presentation to all the CIOs of the federal public service. During that presentation I’d like to either go live to datadotgc.ca or at least show an up-to-date screenshot (if there is no internet). It would be great to have more data sets in the site at that time so I can a) impress upon this group how little machine readable data there is in Canada versus other countries (especially the UK and US) and b) show them what an effective open data portal should look like.

So what are the datadotgc.ca priorities at this moment?

1. Get more data sets listed in datadotgc.ca

There is a list of machine readable data sets known to exist in the federal government that has been posted here. For coders, the CKAN API is relatively straightforward to use – see the sketch below. There is also an import script that allows one to bulk import data lists into datadotgc.ca, as well as instructions posted here in the datadotgc.ca google group.
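For the curious, here is a minimal sketch of what registering a data set through a CKAN-style REST API can look like. The endpoint and field names are assumptions based on CKAN’s general package schema – check the instructions in the google group before running anything against ca.ckan.net:

```python
# Minimal sketch of registering a data set via a CKAN-style REST API.
# Endpoint, payload fields and auth scheme are assumptions -- consult the
# datadotgc.ca google group documentation for the real details.
import requests

package = {
    "name": "example-federal-dataset",        # hypothetical identifier
    "title": "Example Federal Data Set",
    "url": "http://example.gc.ca/data/example.csv",
    "tags": ["federal", "machine-readable"],
}

resp = requests.post(
    "http://ca.ckan.net/api/rest/package",
    json=package,
    headers={"Authorization": "YOUR-API-KEY"},  # you need a key for the site
)
print(resp.status_code, resp.text)
```

For many data sets at once, the import script mentioned above is the better route.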

2. Better document how to bulk add data sets.

While the above documentation is good, I’d love to have some documentation and scripts that are specific to datadotgc.ca/ca.ckan.net. I’m hoping to recruit some help with this tonight at the Open Data hackathon, but if you are interested, please let me know.

3. Build better tools

One idea I had, which I have shared with Steve T., is to develop a Jetpack add-on for Firefox that, when you are on a government page, scans for links to certain file types (SHAPE, XLS, etc…) and then lets you know if they are already in datadotgc.ca. If not, it would provide a form to “liberate the dataset” without forcing the user to leave the government website. This would make it easier for non-developers to add datasets to datadotgc.ca.
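The add-on itself would be written in JavaScript, but the core link-scanning logic is simple enough to sketch (here in Python, for consistency with the other examples). The page URL is a placeholder and the catalog-lookup step is left out:

```python
# Sketch of the add-on's core logic: find links to structured-data file
# types on a government page. The URL below is a placeholder.
import requests
from bs4 import BeautifulSoup

DATA_EXTENSIONS = (".csv", ".xls", ".kml", ".shp", ".zip")

def find_data_links(page_url):
    """Return links on page_url that point at structured-data files."""
    soup = BeautifulSoup(requests.get(page_url).text, "html.parser")
    return [
        a["href"] for a in soup.find_all("a", href=True)
        if a["href"].lower().endswith(DATA_EXTENSIONS)
    ]

for link in find_data_links("http://www.example.gc.ca/some-ministry-page"):
    print("Candidate for liberation:", link)
```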

4. Locate machine readable data sets

Of course, we can only add to datadotgc.ca data sets that we know about, so if you know about a machine readable data set that could be liberated, please add it! If there are many and you don’t know how, ping me, or add it directly to the list in the datadotgc.ca google group.

Open Government interview and panel on TVO's The Agenda with Steve Paikin

My interview on TVO’s The Agenda with Steve Paikin has been uploaded to YouTube (BTW, it is fantastic that The Agenda has a YouTube channel where it posts all its interviews. Kudos!). If you live outside Ontario, or were wrapped up in the Senators-Pens playoff game that was on at the same time (which obviously we destroyed in the ratings), I thought I’d throw it up here as a post in case it is of interest. The first clip is a one-on-one interview between myself and Paikin. The second clip is the discussion panel that occurred afterward with myself, senior editor of Reason magazine Katherine Mangu-Ward, American Prospect executive editor Mark Schmitt and the Sunlight Foundation’s Policy Director John Wonderlich.

Hope you enjoy!

One on one interview with Paikin:

Panel Discussion:

Datadotgc.ca Launched – the opportunity and challenge

Today I’m really pleased to announce that we’ve launched datadotgc.ca, a volunteer driven site I’m collaboratively creating with a small group of friends and, I hope, a growing community that, if you are interested, may include you.

As many of you already know I, and many other people, want our governments to open up and share their data, in useful, structured formats that people can actually use or analyze. Unlike our American and British peers, the Canadian Federal (and provincial…) government(s) currently have no official, coordinated effort to release government data.

I think that should change.

So rather than merely complain that we don’t have a data.gov or data.gov.uk in Canada, we decided to create one ourselves. We can model what we want our governments to do and even create limited versions of the service ourselves. So that is what we are doing with this site. A stab at showing our government, and Canada, what a federal open data portal could and should look like – one that I’m hoping people will want to help make a success.

Two things to share.

First, what’s our goal for the site?

  • Be an innovative platform that demonstrates how government should share data.
  • Create an incentive for government to share more data by showing ministers, public servants and the public which ministries are sharing data, and which are not.
  • Provide a useful service to citizens interested in open data by bringing all the government data together into one place to make it easier to find.

Second, our big challenge.

As Luke C., a datadotgc.ca community member, said to me: getting the site up is the easier part. The real challenge is building a community of people who will care for it and help make it a living, growing and evolving success. There is lots of work still to be done here. But if you feel passionate about open government and are interested in joining our community, we’d love to have you. At the moment, especially as we are still getting the infrastructure to support the community in place, we are convening at a google group here.

So what are some of the things I think are a priority in the short term?

  • Adding or bulk scraping in more data sets so the site more accurately displays what is available
  • Locating data sets that are open and ready to be “liberated”
  • Documenting how to add or scrape in a data set to allow people to help more easily
  • Implementing a more formal bug and feature tracker
  • Adding lots of other functionality – “request a closed data set”, for example – that I, at least, would like to see (and I’m sure there are lots more ideas out there)

As Clay Shirky once noted about any open source project, datadotgc.ca is powered by love. If people love the site and love what it is trying to accomplish, then we will have a community interested in helping make it a success. I know I love datadotgc.ca – and so my goal is to help you love it too, and to do everything I can to make it as easy as possible for you to make whatever contribution you’d like to make. Creating a great community is the hardest but best part of any project. We are off to a great start, and I hope to maybe see you on the google group.

Finally, just want to thank everyone who has helped so far, including the fine people at Raised Eyebrow Web Studio, Luke Closs, and a number of fantastic coders from the Open Knowledge Foundation. There are also some great people over at the Datadotgc.ca Google Group who have helped scrape data, tested for bugs and been supportive and helpful in so many ways.

Case Study: How Open data saved Canada $3.2 Billion

Note: I’ll be on TVO’s The Agenda with Steve Paikin tonight talking about Government 2.0.

Why does open data matter? Rather than talk in abstract terms, let me share a well documented but little known story about how open data helped expose one of the biggest tax frauds in Canada’s history.

It begins in early 2007, when a colleague was asked by a client to do an analysis of the charitable sector in Toronto. Considering it a simple consulting project, my colleague called the Canada Revenue Agency (CRA) and asked for all the 2005 T3010s – the Annual Information Returns where charities disclose to the CRA their charitable receipts and other information – for Toronto. After waiting several weeks and answering a few questions, the CRA passed along the requested information.

After spending time cleaning up the data, my colleague eventually had a working Excel spreadsheet and began to analyze the charitable sector in the Greater Toronto Area. One afternoon, on a lark, they decided to organize the charities by size of tax-receipted charitable donations.
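The sorting step itself is trivial once the data is in usable shape – which is rather the point. Here is a sketch of the same ranking, with hypothetical file and column names standing in for the cleaned-up CRA data:

```python
# Sketch of the analysis described above: rank charities by tax-receipted
# donations. File and column names are hypothetical stand-ins.
import pandas as pd

t3010 = pd.read_csv("t3010_toronto_2005.csv")

# Sort by receipted donations, largest first, and look at the top 15
ranked = t3010.sort_values("tax_receipted_donations", ascending=False)
print(ranked[["charity_name", "tax_receipted_donations"]].head(15))
```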

At this point it is important to understand something about scale. The United Way of Greater Toronto is one of the biggest charities in North America; indeed, its most recent annual charitable donation drive was the biggest on the continent. In 2008 – the year the financial crisis started – the United Way of Greater Toronto raised $107.5 million.

So it was with some surprise that after sorting the charities by 2005 donation amounts my colleague discovered that the United Way was not first on the list. It wasn’t even second.

It was third.

This was an enormous surprise. Somewhere in Toronto, without anyone being aware of it, two charities had raised more money than the United Way (which in 2005 raised $96.1M). The larger one, the International Charity Association Network (ICAN), raised $248M in 2005. The other, the Choson Kallah Fund of Toronto, had receipts of $120M (up from $6M in 2003).

Indeed, four of the top 15 charities on the list, including the Millennium Charitable Foundation and Banyan Tree, were unknown to my colleague, someone who had been active in the Toronto charitable community for over a decade.

All told, my colleague estimated that these illegally operating charities alone sheltered roughly half a billion dollars in 2005. Indeed, newspapers later confirmed that in 2007, fraudulent donations were closer to a billion dollars a year, with some 3.2 billion dollars illegally sheltered, a sum that accounts for 12% of all charitable giving in Canada.

Think about this. One billion dollars. A year. That is almost 0.6% of the Federal Government’s annual budget.

My colleague was eager to make sure that CRA was taking action on these organizations, but it didn’t look that way. The tax frauds were still identified by CRA as qualified charities and were still soliciting donors with the endorsement of government. They knew that a call to CRA’s fraud tip line was unlikely to prompt swift action. The Toronto Star had been doing its own investigations into other instances of charity fraud and had been frustrated by CRA’s slow response.

My colleague took a different route. They gave the information to the leadership of the charitable sector, and those organizations as a group took it to the leadership at CRA. From late 2007 right through 2009, the CRA charities division – now under new leadership – has systematically shut down charity tax shelters, and it continues to do so. One by one, the International Charity Association Network, Banyan Tree Foundation, Choson Kallah Fund, the Millennium Charitable Foundation and others identified by my colleague have lost their charitable status. A reported $3.2 billion in tax receipts claimed by 100,000 Canadian tax filers has so far been disallowed or is being questioned. A class action suit launched by thousands of donors against the organizers and law firm of Banyan Tree Foundation was recently certified. It’s a first. Perhaps the CRA was already investigating these cases. It must build its cases carefully: if a case ends up in court and the CRA fails to present it successfully, it could help legalize a tax loophole. The agency may just have been moving cautiously. But perhaps it did not know.

This means that, at best, government data – information that should be made more accessible and open in an unfettered and machine readable format – helped reveal one of the largest tax evasion scandals in the country’s history. But if the CRA was already investigating, public scrutiny of this data served a different purpose: helping to bring these issues out into the open and forcing the CRA to take public action (suspending these organizations’ right to solicit more donations) sooner rather than later. Essentially, from before 2005 through 2007, dozens of charities were operating illegally. Had the data about their charitable receipts been available for the public’s routine review, someone might have taken notice and raised a fuss earlier. Perhaps a website tracking donations might even have been launched. This would have exposed those charities that had abnormally large donations with few programs to explain them. Moreover, it might have given some of the 100,000 Canadians now being audited a tool for evaluating the charities they were giving money to.

In the computer world there is something called Linus’ Law, which states: “given enough eyeballs, all bugs (problems) are shallow.” The same could be said about many public policy or corruption issues. For many data sets, citizens should not have to make a request. Nor should we have to answer questions about why we want the data. It should be downloadable in its entirety. Not trapped behind some unhelpful search engine. When data is made readily available in machine readable formats, more eyes can look at it. This means that someone on the ground, in the community (like, say, Toronto), who knows the sector is more likely to spot something a public servant in another city might not see because they don’t have the right context or bandwidth. And if that public servant is not allowed to talk about the issue, then citizens can share this information with one another.

This is the power of open data: The power to find problems in complicated environments, and possibly even to prevent them from emerging.

Opening Parliament and other big announcements

This is going to be an exciting week for online activists seeking to make government more open and engaged.

First off, openparliament.ca launched yesterday. This is a fantastic site with a lot going for it – go check it out (after reading my other updates!). And huge kudos to its creator, Michael Mulley. It is just another great example of how our democratic institutions can be hacked to better serve our needs – to make them more open, accessible and engaging. There is a ton of stuff that could be built on top of Michael’s site and others, like Howdtheyvote. I’ve written more about this in a piece on the Globe’s website titled If You Won’t Tell Us About Our MPs, We’ll Do It For You.

Second, as a follow-on to the launch of openparliament.ca, I’ve been meaning to share for some time that I’ve been having conversations with the House of Parliament IT staff over the past couple of months. About a month ago, parliament IT staff agreed to start sharing the Hansard, MPs’ bios, committee calendars and a range of other information via XML (sorry for not sharing this sooner, things have been a little crazy). They informed me that they would start doing this before the year is over – so I suspect it won’t happen in the next couple of months, but will happen at some point in the next 6 months. This is a huge step forward for the House and hopefully not the last (also, there is no movement on the Senate as of yet). There are still a ton more ways that information about the proceedings of Canada’s democracy could be made more easily available, but we have some important momentum with great sites like those listed above, and internal recognition of the need to share more data. I’ll be having further conversations with some of the staff over the coming months, so I will try to update people on progress as I find out.

Finally, I am gearing up to launch datadotgc.ca. This is a project I’ve been working on for quite some time with a number of old and new allies. Sadly, the Canadian government does not have an open data policy and there is no political effort to create a data.gc.ca like that created by the Obama administration (http://www.data.gov/) or by the British Government (http://data.gov.uk/). So I, along with a few friends, have decided to create one for them. I’ll have an official post on this tomorrow. Needless to say, I’m excited. We are still looking for people to help us populate the site with open government data sets – and have even located some that we need help scraping – so if you are interested in contributing feel free to join the datadotgc.ca google group and we can get you password access to the site.