Tag Archives: opendata

My Canadian Open Government Consultation Submission

Attached below is my submission to the Open Government Consultation conducted by Treasury Board over the last couple of weeks. A remarkable number of submissions were made by citizens, which you can explore on the Treasury Board website. In addition, Tracey Lauriault has tracked some of the submissions on her website.

I actually wish the submissions on the Government website were both searchable and could be downloaded in their entirety. That way we could re-organize, visualize, search and parse them, and generally play with the submissions so as to make the enormous number of answers easier to navigate and read. I can imagine a lot of creative ways people could re-format all that text and make it much more accessible and fun.

Finally, for reference, in addition to my submission I wrote this blog post a couple of months ago suggesting goals the government could set for itself as part of its Open Government Partnership commitments. Happily, since writing that post, the government has moved on a number of those recommendations.

So, below is my response to the government’s questions (in bold):

What could be done to make it easier for you to find and use government data provided online?

First, I want to recognize that a tremendous amount of work has been done to get the present website and number of data sets up online.

FINDING DATA:

My advice on making data easier to find: engage Socrata to create the front end. Socrata has an enormous amount of experience in how to share government data effectively. Consider http://data.oregon.gov – a site that is clean, easy to navigate and offers a number of ways to access and engage the government’s data.

More specifically, what works includes:

1. Effective search: a simple search mechanism returns all results
2. Good filters: Because the data is categorized by type (internal vs. external, charts, maps, calendars, etc.) it is much easier to filter. One thing not seen on Socrata that would be helpful is the ability to sort by ministry.
3. Preview: Once I choose a data set I’m given a preview of what it looks like, which lets me assess whether or not it is useful
4. Social: Here there is a ton on offer
– I’m able to sort data sets by popularity – being able to see what others find interesting is, in and of itself, interesting.
– Being able to easily share data sets via email, Twitter or Facebook means I’m more likely to find something interesting because friends will tell me about it
– Data sets can also be commented upon, so I can see what others think of the data, whether they find it useful or not, and for what.
– Finally, it would be nice if citizens could add metadata, to make it easier for others to do keyword searches. If the government were worried about the wrong metadata being added, it could always offer a search with crowdsourced metadata included or excluded
5. Tools: Finally, there are a large number of tools that make it easier to quickly play with and make use of the data, regardless of one’s skills as a developer. This makes the data much more accessible to the general public.

USING DATA

Finding data is only part of the problem; being able to USE the data is a much bigger issue.

Here the single most useful thing would be to offer APIs into government data. My own personal hope is that one day there will be a large number of systems, both within and outside of government, that integrate government data right into their applications. For example, as I blogged about here – https://eaves.ca/2011/02/18/sharing-critical-information-with-public-lessons-for-governments/ – product recall data would be fantastic to have as an API: major retailers could simply query the API every time they scan inventory in a warehouse or at the point of sale, and any product that appears on the list could then be automatically removed. Internally, border and customs officials could also query the API when scanning exports to ensure that nothing exported has been recalled.
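To make the idea concrete, here is a minimal sketch of how a retailer’s point-of-sale system might check scanned products against such an API. The endpoint, parameters and response fields are all assumptions invented for illustration – no such federal recall API existed at the time of writing.

```python
import requests

# Hypothetical endpoint and response layout -- purely illustrative.
RECALL_API = "https://api.example.gc.ca/recalls"

def is_recalled(upc: str) -> bool:
    """Check a scanned product's UPC against the (hypothetical) recall list."""
    resp = requests.get(RECALL_API, params={"upc": upc}, timeout=5)
    resp.raise_for_status()
    return len(resp.json().get("recalls", [])) > 0

if __name__ == "__main__":
    if is_recalled("065633134201"):  # made-up UPC
        print("Product is recalled -- pull it from the shelf.")
```

The same query could run in a warehouse scanner, a cash register, or a customs system; that is the point of exposing the list as an API rather than a document.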

Second, if companies and non-profits are going to invest in using open data, they need assurances both that they are legally allowed to use the data and that the data isn’t going to suddenly disappear on them. This means a robust license that is clear about reuse. The government would be wise to adopt the OGL or even improve on it. Better still, helping establish a standardized open data license for Canada, and ideally internationally, could help reduce some legal uncertainty for more conservative actors.

More importantly, and missing from Socrata’s sites, would be a way of indicating how secure a data set’s longevity is. For example, data sets that are required by legislation – such as the NPRI – are the least likely to disappear, whereas data sets like the long-form census, which have no legal protection, could be seen as higher risk.

 

How would you use or manipulate this data?

I’m already involved in a number of projects that use and share government data. Among these are Emitter.ca, which maps and shares NPRI pollution data, and Recollect.net, which shares garbage calendar information.

While I’ve seen dramatically different uses of data, for me personally, I’m interested mostly in using data for thinking and writing about public policy issues. Indeed, much has been made of the use of data in “apps” but I think it is worth noting that the single biggest use of data will be in analysis – government officials, citizens, academics and others using the data to better understand the world around them and lobby for change.

This all said, there are some data sets that are of particular use to people. These include:

1. Data sets on sensitive issues: this includes health, inspection and performance data (say, surgery outcomes for specific hospitals, or restaurant inspections); crime and procurement data are also often in great demand.
2. Dynamic real-time data: data that is frequently updated (such as border, passport renewal or emergency room wait times). This data, shared in the right way, can often help people adjust schedules and plans or reallocate resources more effectively. Obviously this requires an API (see the sketch after this list).
3. Geodata: Because GIS standards are very mature it is easy to “mash up” geo data to create new maps or offer new services. These common standards mean that geo data from different sources will work together or can be easily compared. This is in sharp contrast to, say, budget data, where there are few common standards around naming and organizing the data, making it harder to share and compare.
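As an illustration of why the API matters for real-time data, here is a minimal sketch of a client that polls a wait-times feed and picks the shortest queue. The URL and JSON layout are assumptions made up for the example – no such federal feed existed at the time of writing.

```python
import requests

# Hypothetical real-time feed of border crossing wait times (illustrative only).
WAIT_TIMES_URL = "https://api.example.gc.ca/border/wait-times"

def shortest_wait(crossings):
    """Return the crossing with the shortest current wait."""
    return min(crossings, key=lambda c: c["wait_minutes"])

if __name__ == "__main__":
    data = requests.get(WAIT_TIMES_URL, timeout=5).json()
    best = shortest_wait(data["crossings"])
    print(f"Shortest wait: {best['name']} ({best['wait_minutes']} min)")
```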

What could be done to make it easier for you to find government information online?

It is absolutely essential that all government records be machine readable.

Some of the most deplorable moments in open government occur when the government shares documents with the press, citizens or parliamentary officers in paper form. The first and most important thing to make government information easier to find online is to ensure that it is machine readable and keyword searchable. If it does not meet this criterion, I increasingly question whether or not it can be declared open.

As part of its Open Government Partnership commitments, it would be great for the government to guarantee that every request for information made of it will be answered with a digital version of the document that can be searched.

Second, the government should commit to making every document it publishes available online. For example, I remember in 2009 being told that if I wanted a copy of the Health Canada report “Human Health in a Changing Climate: A Canadian Assessment of Vulnerabilities and Adaptive Capacity” I had to request a CD, which was then mailed to me with a PDF copy of the report on it. Why was the report not simply available for download? Because the Minister had ordered it not to appear on the website. Instead, I, as a taxpayer, had to watch more of my tax dollars be wasted so that someone could receive my mail, process it, and then mail me a custom-burned CD. Enabling ministers to create barriers to accessing government information, simply because they do not like the contents, is an affront to the use of taxpayer dollars and our right to access information.

Finally, allow government scientists to speak directly to the media about their research.

It has become a recurring embarrassment. Scientists who work for Canada publish an internationally recognized, groundbreaking paper that provides some insight about the environment or geography of Canada, and journalists must talk to government scientists from other countries in order to get the details. Why? Because the Canadian government blocks access. Canadians have a right to hear the perspectives of the scientists their tax dollars paid for – and to enjoy the opportunity to be as well informed as the government on these issues.

Thus, lift the ban that blocks government scientists from speaking with the media.

 

Do you have suggestions on how the Government of Canada could improve how it consults with Canadians?

1. Honour Consultation Processes that have started

The process of public consultation is insulted when the government itself intervenes to bring the process into disrepute. The first thing the government could do to improve how it consults is not sabotage processes that are already ongoing. The recent letter from Natural Resources Minister Joe Oliver regarding the public consultation on the Northern Gateway Pipelines has damaged Canadians’ confidence in the government’s willingness to engage in and make effective use of public consultations.

2. Focus on collecting and sharing relevant data

It would be excellent if the government shared relevant data from its data portal on the public consultation webpage. For example, in the United States the government shares a data set with the number and location of spills generated by Enbridge pipelines; similar data for Canada would be ideal to share alongside a consultation. Also useful would be economic figures, job figures for the affected regions, and perhaps data from nearby parks (visitation numbers, acres of land, KML/shape boundary files). Indeed, data about the pipeline route itself, which could be downloaded and viewed in Google Earth, would be interesting. In short, there are all sorts of ways in which open data could help power public consultations.
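To give a sense of how simple that last piece could be, here is a minimal sketch that writes a route out as a KML file anyone can open in Google Earth. The coordinates are placeholders, not the actual pipeline route.

```python
# Write a line of (longitude, latitude) points out as KML for Google Earth.
# The coordinates below are placeholders, not real route data.
route = [(-123.1, 49.3), (-122.5, 49.9), (-121.8, 50.4)]

coords = " ".join(f"{lon},{lat},0" for lon, lat in route)
kml = f"""<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Placemark>
    <name>Proposed route (illustrative)</name>
    <LineString><coordinates>{coords}</coordinates></LineString>
  </Placemark>
</kml>"""

with open("route.kml", "w") as f:
    f.write(kml)
```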

3. Consultations should be ongoing

It would be great to see a 311-like application for the federal government: something that, when loaded up, would use GPS to identify the services, infrastructure or other resources near the user that are operated by the federal government, and allow the user to give feedback right then and there. Such “ongoing” public feedback could then be used as data when a formal public consultation process is kicked off.

 

Are there approaches used by other governments that you believe the Government of Canada could/should model?

1. The UK government’s expense disclosure, and its release of the COINS database more generally, is probably the most radical act of government transparency to date. Given the government’s interest in budget cuts this is one area that might be of great interest to pursue.

2. For critical data sets – those that are either required by legislation or essential to the operation of a ministry or the government generally – it would be best to model the cities of Chicago or Washington, DC and foster the creation of a data warehouse where this data could be easily shared both internally and externally (as privacy and security permit). These cities are leading governments in this space because they have tackled both the technical challenges (getting the data onto a platform where it can be shared easily) and the governance challenges (managing data sets from various departments on a shared piece of infrastructure).

 

Are there any other comments or suggestions you would like to make pertaining to the Government of Canada’s Open Government initiative?

Some additional ideas:

Redefine Public as Digital: Pass an Online Information Act

a) Any document the government produces should be available digitally, in a machine-readable format. The sham that the government can produce 3,000-10,000 printed pages about Afghan detainees or the F-35 and claim it is publicly disclosing information must end.

b) Any data collected for legislative reasons must be made available – in machine readable formats – via a government open data portal.

c) Any information that is ATIPable must be made available in a digital format, and any excess costs of generating that information can be borne by the requester up until a certain date (say 2015), at which point the excess costs will be borne by the ministry responsible. There is no reason why, in a digital world, there should be any cost to extracting information – indeed, I fear a world where the government can’t cheaply locate and copy its own information for an ATIP request, as it would suggest it can’t get that information for its own operations.

Use Open Data to drive efficiency in Government Services: Require the provinces to share health data – particularly hospital performance – as part of the next funding agreement under the Canada Health Act.

Comparing hospitals to one another is always a difficult task, and open data is not a panacea. However, more data about hospitals is rarely harmful, and there are a number of issues on which it would be downright beneficial. The most obvious of these is deaths caused by infection. The number of deaths that occur due to infections in Canadian hospitals is a growing problem (sigh, if only open data could help ban the antibacterial wipes that are helping propagate them). Having open data that allows league tables to show the scope and location of the problem would likely cause many hospitals to rethink processes and, I suspect, save lives.

Open data can supply some of the competitive pressure that is often lacking in a public healthcare system. It could also better educate Canadians about their options within that system, as well as make them more aware of its benefits.

Reduce Fraud: Creating a Death List

In an era where online identity is a problem, it is surprising to me that I’m unable to locate a database of expired social insurance numbers. Being able to query a list of social insurance numbers that belong to dead people might be a simple way to prevent fraud. Interestingly, the United States has just such a list available for free online. (Side fact: known as the Social Security Death Index, this database is also beloved by genealogists, who use it to trace ancestry.)
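If such a list were published as a simple open file, checking it would be trivial. Here is a minimal sketch assuming a hypothetical CSV of expired numbers; no such Canadian file exists, and the file name and column name are made up for the example.

```python
import csv

# Assumes a hypothetical CSV of expired social insurance numbers,
# one per row under a "sin" column. Purely illustrative.
def load_death_index(path="expired_sins.csv"):
    with open(path, newline="") as f:
        return {row["sin"] for row in csv.DictReader(f)}

def flag_possible_fraud(sin, death_index):
    """Return True if an application uses a number belonging to a deceased person."""
    return sin in death_index

if __name__ == "__main__":
    index = load_death_index()
    print(flag_possible_fraud("123-456-789", index))
```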

Open Budget and Actual Spending Data

For almost a year the UK government has published all spending data, month by month, for each government ministry (down to £500 in some cases, £25,000 in others). Moreover, as an increasing number of local governments are required to share their spending data, it has led to savings, as governments begin to learn what other ministries and governments are paying for similar services.

Create a steering group of leading Provincial and Municipal CIOs to develop common schemas for core data about the country.

While open data is good, open data organized the same way across different departments and provinces is even better. When data is organized the same way it is easier for citizens to compare one jurisdiction against another, and for software solutions and online services to emerge that use that data to enhance the lives of Canadians. The Federal Government should use its convening authority to bring together some of the country’s leading government CIOs to establish common data schemas for things like crime, healthcare, procurement, and budget data (a hypothetical sketch of what such a schema might look like follows below). The list of what could be worked on is virtually endless, but those four areas all represent data sets that are frequently requested, so they might make for a good starting point.
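For illustration only, here is what a tiny piece of a shared schema for crime incident data might look like, along with a check that a published record conforms to it. The field names are entirely my own invention, not an existing standard.

```python
# An entirely hypothetical shared schema for crime incident records.
# If every jurisdiction used the same field names and units, comparing
# cities would be a one-liner rather than a data-cleanup project.
CRIME_INCIDENT_SCHEMA = {
    "incident_id": str,   # unique within the publishing jurisdiction
    "jurisdiction": str,  # e.g. "Vancouver, BC"
    "category": str,      # from an agreed national code list
    "occurred_at": str,   # ISO 8601 timestamp
    "latitude": float,
    "longitude": float,
}

def validate(record: dict) -> bool:
    """Check that a record has every agreed field with the agreed type."""
    return all(
        field in record and isinstance(record[field], expected)
        for field, expected in CRIME_INCIDENT_SCHEMA.items()
    )
```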

Open Data in BC – Good & Bad Examples from Bikes to Libraries

Some small examples of open data use and public servants who do and don’t understand open data from the Province of British Columbia to the City of Vancouver.

Open Libraries?

For the past several years – ever since the open motion was passed in Vancouver – the city has been releasing more and more data sets. One data set I’ve encouraged them to proactively release is library data – the catalog, what books are popular, etc. Others have made the request and, in fact, some of the catalog data is available, if you know where to look – but it isn’t licensed. This hasn’t stopped people from creating cool things – like this awesome Firefox Greasemonkey script that shows whether a book you are looking at on Amazon’s site is available at your local VPL branch – but it has driven these innovations underground, discouraged them, and made them difficult to maintain.

I’ve even had meetings with Vancouver Public Library (VPL) officials who ranged from deeply opposed to indifferent about sharing their data, usually on the grounds of privacy and security. How releasing the library’s catalog, offering an API into the catalog, or showing the number of times a book has been checked out threatens privacy is beyond me. Mostly I suspect it is driven by the fact that they don’t want anything competing with their website and software – pretty much the opposite approach to innovation than that taken by the leading cities and governments.

The reluctance of VPL to share its data – given that a) it is a community-supported library and b) City Council passed a motion explicitly directing city staff to make their data open – is all the more surprising (I mean, even ICBC gave me bike accident data). This is why I was excited to see that the Provincial Government of British Columbia has taken the opposite view. Recently it released locations and statistics for public libraries across BC for 2006-2009. Sadly, it does not include collections data or the number of checkouts for each book (which would of course be awesome), but it does provide lat/longs for every library and a great deal of data on each library system – and sometimes individual branches – such as staffing levels, budget data and usage counts (again, not by resource). It’s a good start and something I hope people will want to play with. Of course, getting an API into the actual catalog is the real goal – the things my friends talk about doing to enable them and their kids to better use the library…

Speaking of playing…

Bike Accident Data Keeps Generating Discussion

It is wonderful to see that blog posts and analysis resulting from Eric Promislow’s BC bike accident map continue to emerge. Eric created his map during the December 3rd Open Data Hackathon, when he visualized bike accident data I managed to get from the Insurance Corporation of British Columbia and uploaded to Buzzdata. (Eric subsequently got automobile accident data and mapped that too.) Another example appeared last week, when the map and data proved useful to Stephen Wehner, who used them in a recent blog post to supplement some anecdotal data around accidents in his neighborhood.

It’s a wonderful example of how local citizens can begin to see the risks and problems in their neighborhoods, and arm themselves with real data when they want to complain to their councilperson, MLA, MP or other representative.

Solving the Common Standards problem in the Open Data Space

Last year, during my Open Government Data Camp keynote speech on The State of Open Data 2011, I mentioned how I thought the central challenge for open data was shifting from getting data open (still a big issue, but a battle that is starting to be won) to getting all that open data into common standards and schemas so that use (be it apps, analysis or other uses) can be scaled across jurisdictions.

Looks like someone out there is trying to turn that challenge into a business opportunity.

Listpoint, a UK-based company, has launched a platform with the goal of creating translators between various established specs. As they point out in an email I saw from them:

“The Listpoint reference data management platform is a repository for data standards in the shape of code lists. Listpoint will help interpret open data by providing its underlying metadata and schema in machine readable format. E.g. mapping ISO country codes and Microsoft Country codes to provide a translation layer for systems to surface a single view of data.”
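The core of that idea – mapping one code list onto another so different systems can agree on a single view of the data – can be illustrated in a few lines. The mappings and field names below are a made-up toy example, not Listpoint’s actual reference data.

```python
# Toy illustration of a "translation layer" between two code lists.
# The sample mapping is invented; real reference data would come from
# the code-list repository itself.
ISO_TO_VENDOR = {
    "CA": "Canada",
    "GB": "United Kingdom",
    "US": "United States",
}

def translate(records, code_field="country", mapping=ISO_TO_VENDOR):
    """Rewrite a code-list field so downstream systems agree on its values."""
    for record in records:
        record[code_field] = mapping.get(record[code_field], record[code_field])
    return records

print(translate([{"country": "CA", "spend": 100}]))
```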

Interesting stuff… and exactly the type of challenge we need solved if we are going to scale the open data revolution.

The Future of Academic Research

Yesterday, Nature – one of the world’s premier scientific journals – recognized University of British Columbia scientist Rosie Redfield as one of the top 10 science newsmakers of 2011.

The reason?

After posting a scathing attack on her blog about a paper that appeared in the journal Science, Redfield decided to attempt to recreate the experiment and has been blogging about her effort over the past year. As Nature describes it:

…that month, Redfield took matters into her own hands: she began attempting to replicate the work in her lab at the University of British Columbia in Vancouver, and documenting her progress on her blog (http://rrresearch.fieldofscience.com).

The result has been a fascinating story of open science unfolding over the year. Redfield’s blog has become a virtual lab meeting, in which scientists from around the world help to troubleshoot her attempts to grow and study the GFAJ-1 bacteria — the strain isolated by Felisa Wolfe-Simon, lead author of the Science paper and a microbiologist who worked in the lab of Ronald Oremland at the US Geological Survey in Menlo Park, California.

While I’m excited about Redfield’s blog (more on that below), we should pause and note that the above paragraph is a very, very sad reminder of the state of affairs in science. I find the term “open science” to be an oxymoron. The scientific process only works when it is, by definition, open. There is, quite arguably, no such thing as “closed science.” And yet it is a reflection of how 18th-century the entire science apparatus remains that Redfield’s awesome experiment is just that – an experiment. We should celebrate her work, and ask ourselves, why is this not the norm?

So first, to celebrate her work… when I look at Redfield’s blog, I see exactly what I hope the future of scientific – and indeed all academic – research will look like. Here is someone who is constantly updating her results and sharing what she is doing with her peers, as well as getting input and feedback from colleagues and others around the world. Moreover, she plays to the medium’s strengths. While rigorous, she remains inviting and, from my reading, creates a more honest and human view into the world of science. I suspect that this might be much more attractive (and inspiring) to potential scientists. Consider these two lines from one of her recent posts:

So I’m pretty sure I screwed something up.  But what?  I used the same DNA stock tube I’ve used many times before, and I definitely remember putting 3 µl of DNA into each assay tube.  I made fresh sBHI + novobiocin plates using pre-made BHI agar,, and I definitely remember adding the hemin (4 ml), NAD (80 µl) and novobiocin (40 µl) to the melted agar before I poured the plates.

and

UPDATE:  My novobiocin plates had no NovR colonies because I had forgotten to add the required hemin supplement to the agar!  How embarrassing – I haven’t made that mistake in years.

and then this blog post title:

Some control results! (Don’t get excited, it’s just a control…)

Here is someone literally walking through her thought process in a thorough, readable way. Can you imagine anything more helpful for a student or young scientist? And the posts! Wonderfully detailed walk-throughs of what has been tried, progress made and setbacks uncovered. And what about the candor! The admission of error and the attempts to figure out what went wrong. It’s the type of thinking I see from great hackers as well. It’s also the type of dialogue and discussion you won’t see in a formal academic paper, but it is exactly what I believe every field (from science, to non-profits, to business) needs more of.

Reading it all, and I’m once again left wondering. Why is this the experiment? Why isn’t this the norm? Particularly at publicly funded universities?

Of course, the answer lies in another question, one I first ran into over a year ago reading this great blog post by Michael Clarke on Why Hasn’t Scientific Publishing Been Disrupted Already? As he so rightly points out:

When Tim Berners-Lee created the Web in 1991, it was with the aim of better facilitating scientific communication and the dissemination of scientific research. Put another way, the Web was designed to disrupt scientific publishing. It was not designed to disrupt bookstores, telecommunications, matchmaking services, newspapers, pornography, stock trading, music distribution, or a great many other industries…

…The one thing that one could have reasonably predicted in 1991, however, was that scientific communication—and the publishing industry that supports the dissemination of scientific research—would radically change over the next couple decades.

And yet it has not.

(Go read the whole article, it is great.) Mathew Ingram also has a great piece on this, published half a year later, called So when does academic publishing get disrupted?

Clarke has a great breakdown of all of this, but my own opinion is that scientific journals survive not because they are an efficient means of transmitting knowledge (they are not – Redfield’s blog shows there are much, much faster ways to spread knowledge). Rather, journals survive in their current form because they are the only rating system scientists (and, more importantly, universities) have to deduce effectiveness, and thus who should get hired, fired, promoted and, most importantly, funded. Indeed, I suspect journals actually impede (and definitely slow) scientific progress. In order to get published, scientists regularly hold back sharing and disclosing discoveries and, more often still, data, until they can shape it in such a way that a leading journal will accept it. Indeed, try to get any scientist to publish their data in machine-readable formats – even after they have published with it – it’s almost impossible… (notice there are no data catalogs on any major scientific journal’s website…). The dirty secret is that this is because they don’t want others using it in case it contains some juicy insight they have so far missed.

Don’t believe me? Just consider this New York Times article on the breakthroughs in Alzheimer’s research. The whole article is about a big breakthrough in the scientific research process. What was it? That the scientists agreed they would share their data:

The key to the Alzheimer’s project was an agreement as ambitious as its goal: not just to raise money, not just to do research on a vast scale, but also to share all the data, making every single finding public immediately, available to anyone with a computer anywhere in the world.

This is unprecedented? This is the state of science today? In an era where we could share everything, we opt to share as little as possible. This is the destructive side of the scientific publishing process that is linked to performance.

It is also the sad reason why it is a veteran, established researcher closer to the end of her career who is blogging this way, and not a young, up-and-coming researcher trying to establish herself and get tenure. This type of blog is too risky for one’s career. Today “open” science is not a path forward. It actually hurts you in a system that prefers more inefficient methods of spreading insights, research and data, but is good at creating readily understood rankings.

I’m thrilled that Rosie Redfield has been recognized by Nature (which clearly enjoys the swipe at Science, its competitor). I’m just sad that today’s culture of science and universities means there aren’t more like her.

 

Bonus material: If you want to read an opposite view, here is a seriously self-interested defence of the scientific publishing industry that was totally stunning to read. It’s fascinating that this man and Michael Clarke share the same server. If you look in the comments of that post, there is a link to this excellent post by a researcher at a university in Cardiff that I think is a great counterpoint.

 

Why is Finding a Post Box so Hard?

Sometimes it is the small things that show how government just gets it all so wrong.

Last Thursday The Daily Show’s Wyatt Cenac had a little bit on the US Post Office and its declining fortunes as people move away from mail. There is no doubt that the post office’s days are numbered, but that doesn’t mean the decline has to be as steep as it is. Besides, there are things it could be doing to make it a little easier to use its services (and god knows it should be doing anything it can to be more appealing).

Take, for example, the humble post office box. They can be frustratingly hard to locate. Consider Broadway and Cambie – one of the busiest intersections in Vancouver – where there is no post box at the intersection. (I eventually found one a block east on Broadway, but I carried a letter around for three weeks before I did.)

In short, why is there no digital map (or, for techies, an API) of post box locations? I could imagine all sorts of people who might make use of it. Wouldn’t it be nice to just find out where the closest post box is to where you’re standing? More importantly, it might actually help the post office attract a few extra customers. It certainly wouldn’t hurt customer service. I’ve wondered for a couple of years why it doesn’t publish this data set.
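Finding the closest box really is just a few lines of code once the locations are open data. Here is a minimal sketch; the box coordinates are made up, and in practice they would come from Canada Post or a crowdsourced source like WherePost.ca.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

def nearest_post_box(here, boxes):
    """Return the post box closest to the user's (lat, lon)."""
    return min(boxes, key=lambda b: haversine_km(here[0], here[1], b["lat"], b["lon"]))

# Illustrative data only.
boxes = [{"id": 1, "lat": 49.263, "lon": -123.114},
         {"id": 2, "lat": 49.268, "lon": -123.100}]
print(nearest_post_box((49.262, -123.115), boxes))
```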

Turns out I’m not the only one with this frustration. My friend Steven Tannock has channeled his frustration into a simple app called WherePost.ca. It’s a simple website – optimized for mobile phone use – that allows users to add post boxes as well as find the one nearest to them. In short, Steven is trying to create a public data set of post box locations by crowdsourcing the problem. If Canada Post won’t be helpful… we’ll help one another.

Launched on Thursday with 20 post box locations, there are now over 400 boxes mapped (mostly in the Vancouver area), with several dozen users contributing. In addition, Steven tells me users in at least two other countries have asked for new icons so they can add post boxes where they live. It seems Canadians aren’t the only ones frustrated about not knowing where the nearest post box is.

The ideal, of course, would be for Canada Post to publish an API of all post box locations. I suspect, however, that either they don’t actually know where they all are in digital form (at which point they should really help Steven, as he is doing them a huge service), or revealing their locations will be seen as sacrificing some important IP that people should pay for. Remember, this is an organization that refuses to make postal code data open, a critical data set for companies, non-profits and governments.

This isn’t the world’s fanciest app, but its simplicity is what makes it so great, and so useful. Check it out at WherePost.ca and… of course, add a post box if you see one.

 

 

Open Government Consultation, Twitter Townhalls & Doing Advocacy Wrong

Earlier this week the Canadian Federal Government launched its consultation process on Open Government. This is an opportunity for citizens to comment and make suggestions about what data the federal government should make open and what information it should share, and to provide feedback on how it can consult more effectively with Canadians. The survey (which, handily, can be saved midway through completion) contains a few straightforward multiple-choice questions and about eight open-ended questions, which I’ve appended to the end of this post so that readers can reflect on them before starting to fill out the form.

In addition to the online consultations, Tony Clement – the Minister responsible for the Open Government file – will host a Twitter townhall on Open Government this Thursday (December 15). Note! The townhall will be hosted by the Treasury Board Twitter accounts @TBS_Canada (English) and @SCT_Canada (French), not Minister Clement’s personal (and better known) Twitter account. The townhall will first take place in French from 4-4:45pm EST using the hashtag #parlonsgouvert and then in English from 5-5:45pm EST using the hashtag #opengovchat.

Some of you may have also noticed that Democracy Watch issued a strongly worded press release last week with the (somewhat long) headline “Federal Conservatives break all of their international Open Government Partnership commitments by failing to consult with Canadians about their draft action plan before meeting in Brazil this week.” This seems to have prompted the CBC to write this article.

Now, to be clear, I’m a strong advocate for Open Government, and there are plenty of things this government could be criticized for not being open about. However, to be credible – especially around issues of transparency and disclosure – one must be factual. And Democracy Watch did more than just stretch the truth. The simple fact is that while I too wish the government’s consultations had happened sooner, this does not mean it has broken all of its Open Government Partnership commitments. Indeed, it hasn’t broken any of its commitments. A careful read of the Open Government Partnership requirements would reveal that the recent December meeting was to share draft plans (including the plans by which to consult). The deadline that Democracy Watch is screaming about does not occur until March of 2012.

It would have been fair to say the government has been slow in fulfilling its commitments, but to say it has broken any of them is flatly not true. Indeed, the charge feels particularly odd given that in the past two weeks the government signed on to greater aid transparency via IATI and released an additional 4,000 data sets, including virtually all of StatsCan’s data, giving Canadian citizens, non-profits, other levels of government and companies access to important data sets relevant for social, economic and academic purposes.

Again, there are plenty of things one could talk about when it comes to transparency and the government. Yes, the consultation could have gotten off the ground faster. And yes, there is much more to be done. But this screaming headline is somewhat off base. Publishing it damages the credibility of the organization making the charge, and risks hurting the credibility of open government advocates in general.

 

List of Open Ended Questions in the Open Government Consultation.

1. What could be done to make it easier for you to find and use government data provided online?

2. What types of open data sets would be of interest to you? Please pick up to three categories below and specify what data would be of interest to you.

3. How would you use or manipulate this data?

4. What could be done to make it easier for you to find government information online?

7. Do you have suggestions on how the Government of Canada could improve how it consults with Canadians?

8. Are there approaches used by other governments that you believe the Government of Canada could/should model?

9. Are there any other comments or suggestions you would like to make pertaining to the Government of Canada’s Open Government initiative?

 

Open Data Day 2011 – Recaps from Around the World

This last Saturday was International Open Data Day with hackathons taking place in cities around the world.

How many, you ask? We can’t know for certain, but organizers posted events to the wiki in over 50 cities around the world. Given the number of tweets with the #odhd hashtag, and the locations they were coming from, I don’t think we were far off that mark. If you assume 20 people at each event (some had many more – for instance there were over 100 in Ottawa, Vancouver had close to 50, and there were 120+ in New York), it’s safe to say more than 1,000 people were hacking on open data projects around the world.

It’s critical to understand that Open Data Day is a highly decentralized event. All the work that makes it a success (and I think it was a big success) is in the hands of local organizers who find space, rally participants, push them to create stuff and, of course, try to make the day as fun as possible. Beyond their hard work and dedication there isn’t much, if any, organization. No boss. No central authority. No patron or sponsor to say thank you. So if you know any of the fine people who attended, or even more importantly helped organize an event, please shake their hand or shoot them a thank you. I know I’m intensely grateful to see there are so many others out there who care about this issue, who want to connect, learn, meet new people, have fun and, of course, make something interesting. Given the humble beginnings of this event, we’ve had two very successful years.

So what about the day? What was accomplished? What Happened?

Government Motivator

I think one of the biggest accomplishments of Open Data Day has been how it has become a motivator – a sort of deadline – for governments keen to share more open data. Think about this: a group of volunteers around the world is moving governments to share more data – to make public assets more open to reuse. For example, in Ireland, Fingal County Council released data around trees, parking, playing pitches and mobile libraries for the day. In Ontario, Canada, staff for the Region of Waterloo worked extra hard to get their open data portal up in time for the event. And it wasn’t just local governments. The Government of BC launched new high-value data sets in anticipation of the event, and the Federal Government of Canada launched 4,000 new data sets with International Open Data Day in mind. Meanwhile, the open data evangelist at Data.gov was prepared to open up data sets for anyone who had a specific request.

While governments should always be working to make more data available, I think we can all appreciate the benefits of having a deadline, and Open Data Day has become just that for more and more governments.

In other places, Open Data Day turns into a place where governments can converse with developers and citizens about why open data matters, and do research into what data the public is interested in. This is exactly what happened in Enschede in the Netherlands, where local city staff worked with participants on prioritizing data sets to make open.

Local Events & Cool Hacks

A lot of people have been blogging about, or sharing videos of, Open Data Day events around the world. I’ve seen blog posts and news articles on events in places such as Madrid, Victoria BC, Oakland, Mexico City, Vancouver, and New York City. If there are more, please email them to me or post them on the wiki.

I haven’t been able to keep track of all the projects that got worked on, but here is a sampling of some that I’ve seen via Twitter, the wiki and other forums:

Hongbo: The Emergency Location Locator

In Cotonou, Benin, open data day participants developed a web application called Hongbo, the Goun word for “gate.” Hongbo enables users to locate the nearest hospitals, drugstores and police stations. As they noted on the open data day wiki, the data sets for this application were public but not easily accessible. They hope Beninese citizens can use it to quickly identify who to call or where to go in emergencies.

Tweet My Council

In Sydney, Australia, participants created Tweetmycouncil, a fantastically simple application that lets a user know which jurisdiction they are standing in. Simply send a tweet with the hashtag #tmyc and the app will work out where you are and which council’s jurisdiction you are in, and send you a tweet with the response.

Mexican Access to Information Tracker

In Mexico City one team created an application to compare Freedom of Information requests between different government departments. This could be a powerful tool for citizens and journalists. (Github repo)

Making it Easier for the Next Guy

In another project out of Mexico City, a team from Oaxaca created an API that generates a JSON file for any public data set. It would be great for this team to connect with Max Ogden and talk about Gut.
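The underlying idea – take any tabular public data set and republish it as JSON so developers can use it directly – is simple enough to sketch in a few lines. This is an illustration of the concept, not the team’s actual code.

```python
import csv
import json

def csv_to_json(csv_path, json_path):
    """Convert a tabular public data set (CSV) into a JSON file of records."""
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))
    with open(json_path, "w") as f:
        json.dump(rows, f, indent=2)
    return len(rows)

# e.g. csv_to_json("pollution.csv", "pollution.json")
```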

Making it Even Easier for the Next Guy

Speaking of which, Max Ogden in Oakland shared more on Gut, which is less a classic app than a process that enables users to convert data between different formats. It had a number of people excited, including open data developers at other locations such as Luke Closs and Mike West.

Mapping Census Data in Barcelona

A team of hackers in Barcelona mapped census tracts so they could be visualized, showing things like, say, the number of parks per census tract. You can find the data sets they used in Google Fusion Tables here.

Foreign Aid Visualizations

In London, UK and in Seattle (and possibly other places) developers were also very keen on the growing amount of aid data being made available in a common structure thanks to IATI. In Seattle developers created this very cool visualization of US aid over the last 50 years. I know the London team has visualizations of their own they’d like to share shortly.

Food Hacking!

One interesting thing about Open Data Day is how it bridges some very different communities. One of the most active are the food hackers, who came out in force in both New York and Vancouver.

In New York a whole series of food-related tools, apps and visualizations got developed, most of which are described here and here. The sheer quantity of participants (120+) and projects developed is astounding, but also fantastic is how inclusive their event was, with lots of people not just working on apps, but analyzing data and creating visualizations to help others understand an issue they share a common passion for: the Food Bill. Please do click on those links to see some of the fun visuals created.

The Ultimate Food API

In Vancouver, the team at FoodTree – who hosted the hackathon there – focused on shipping an API for developers interested in large food datasets. You can find their preliminary API and datasets on GitHub. You can also track the work they’ve done on their Open Food Wiki.

Homelessness

In Victoria, BC a team created a map of local walk-in community services that you can check out at http://ourservices.ca/.

BC Emergency Tweeting System

Another team in Victoria, BC focused on creating Twitter hashtags for each official place in the province, in the hope that the province’s Provincial Emergency Program could make use of them.

Mapping Shell’s Oils Spills in Nigeria

The good people at the Open Knowledge Foundation worked on getting a ton more data into the Datahub, but they also had people learning how to visualize data, one of whom created this visualization of oil spills in Nigeria. Always great to see people experimenting and learning!

Mapping Vancouver’s Most Dangerous Intersections for Bikes

Open data hacking and bike accident data have a long history together, and for this hackathon I uploaded five years’ worth of bike accident data I managed to get from ICBC to Buzzdata. As a result – even though I couldn’t be present in Vancouver – two different developers took it and mapped it. You can see @ngriffiths’ map here, and @ericp’s will be up soon. It was interesting to learn that Broadway and Cambie is the most dangerous intersection in the city for cyclists.

Looking Forward

Last year Open Data Day attracted individual citizens: those with a passion for an issue (like food) or who want to make their government more effective or citizens’ lives a little easier. However, this year we already started to see the community grow – the team at Socrata hosted a hackathon at their offices in Seattle, and Buzzdata had people online trying to help people share their data. In addition to these private companies, some of the more established non-profits were out in force. The Open Knowledge Foundation had a team working on making openspending.org more accessible, while MySociety helped a team in Canada set up a local version of MapIt.

For those who think that open data can change the world, or even build medium-sized economic ecosystems, overnight: we need to reset those expectations. But it is growing. No longer are participants just citizens and hacktivists – there are real organizations and companies participating. Few, but they are there. My hope is that this trend will continue; that Open Data Day will continue to have meaning for individuals and hackers, but will also be something that larger, more established organizations, non-profits and companies use as a rallying point as well. Something to shoot for next year.

Feedback

As I mentioned at the beginning, Open Data Day is a very decentralized event. We are, of course, not wedded to that approach and I’d love to hear feedback from people, good or bad, about what worked or didn’t work. Please do feel free to email me, post it to the mailing list or simply comment below.

 

 

Postscript

Finally, some of you may have noticed I became conspicuously absent on the day. I want to apologize to everyone. My partner went into labour on Friday night, and by early Saturday morning it was obvious that my open data day was going to be spent with her. Our baby was 11 days overdue, so we really thought we’d be in the clear by Dec 3rd… but our baby had other plans. The good news is that despite 35 hours of labour, mom and baby are doing well!

StatsCan's free data costs $2M – a rant

So the other day a reader sent me an email pointing me to a story in iPolitics titled “StatsCan anticipates $2M loss from move to open data” and asked me what I thought.

Frustrated, was my response.

$2M is not a lot of money. Not in a federal budget of almost $200B. And the number may have been less: the StatsCan person quoted in the article called this expected loss of revenue a “maximum net loss.” This may mean that the loss from making the data free does not take into account the fact that StatsCan’s expenditures may also go down. For instance, if StatsCan no longer has to handle as many financial transactions or chase down invoices, the reduction in staff and other overhead (unrelated to its core mission, by the way) could result in lower operating costs not reflected in the $2M cited above.

Moreover, it is still unclear to me where the $2M figure comes from. As I noted in a blog post earlier this year, StatsCan’s own reports outlined that its online database (the one just made free) generated $559,000 in revenue (not profit) in 2007-08 and was estimated to generate $525,000 in revenue in 2010-11. Where does the extra $1.5M come from? I’m open to the fact that I’m reading these reports incorrectly… but it is hard to see how.

But all this is really an aside.

What really, really, really frustrates me is the hard number of $2M. It is a pittance.

This is the unbearable cost that’s been holding up open StatsCan data for years? This may be the tiniest golden goose ever killed. Maybe more like a lame duck. Can anyone believe the loss of $2M (or $500K) was going to break the organization?

Give me a break.

What a colossal lack of imagination and sense of economic and social prosperity on the part of every government since Mulroney (who made StatsCan engage in cost recovery). In the United States open statistical data has helped businesses, the social sector, local and state governments, as well as researchers and academics. Heck, even Canadian teachers tell me that they’ve been forced to train students on US data because they couldn’t afford Canadian data. All this lost innovation, efficiency, jobs and social benefit for a measly $2M (if that). Oh, lack of vision, at all levels! Both at the top of the political order, and within StatsCan, which has been reluctant to go down this route for years.

Now that we see the “cost” this battle seems more pathetic than ever.

Sigh. Rant over.

Using Open Data to Map Vancouver’s Trees

This week, in preparation for the International Open Data Hackathon on Saturday, the Vancouver Parks Board shared one neighborhood of its tree inventory database (which I’ve uploaded to Buzzdata) so that we could at least see how it might be leveraged by citizens.

What’s interesting is how valuable this data is already (and why it should be open). As it stands this data could be used by urban landscape students and architects, environmentalists, and of course academics and scientists. I could imagine this data would even be useful for analyzing something as obscure as the impact of trees’ albedo effect on the city’s climate. Of course, locked away in the city’s data warehouse, none of those uses are possible.

However, as I outlined in this blog post, having lat/long data would open up some really fun possibilities that could promote civic engagement. People could adopt trees, care for them, water them, and report problems about a specific tree to city hall. But to do all this we need to take the city’s data and make it better – specifically, identify the latitude and longitude of each tree. In addition to helping citizens, it might make the inventory more useful to the city (if they chose to use it) as well as help out the other stakeholders I outlined above.

So here’s what I’ve scoped out would be ideal to do.

Goal

Create an app that would allow citizens to identify the latitude and longitude of trees that are in the inventory.

Data Background

A few things about the city’s tree inventory data: while they don’t have an actual lat/long for each individual tree, they do register trees by city address. (Again, you can look at the data yourself here.) This means that we can narrow the number of trees down based on proximity to the user.

Process

So here is what I think we need to be able to do (a rough sketch of the first two steps follows the list):

  1. Convert the addresses in the inventory into a format that can be located within Google Maps
  2. Just show the trees attached to addresses that are either near the user (on a mobile app), or near addresses that are currently visible within Google Maps (on a desktop app).
  3. Enable the user to add a lat/long to a specific tree’s identification number.
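Here is a minimal sketch of steps 1 and 2, assuming the inventory is a CSV with an “address” column and using the free Nominatim geocoder; the real inventory’s column names may differ, and step 3 would simply write the chosen (lat, lon) back against the tree’s identification number.

```python
import csv
from math import radians, sin, cos, asin, sqrt

import requests

def geocode(address):
    """Turn a street address into (lat, lon) using the free Nominatim geocoder."""
    resp = requests.get(
        "https://nominatim.openstreetmap.org/search",
        params={"q": f"{address}, Vancouver, BC", "format": "json", "limit": 1},
        headers={"User-Agent": "tree-mapper-sketch"},
        timeout=10,
    )
    hits = resp.json()
    return (float(hits[0]["lat"]), float(hits[0]["lon"])) if hits else None

def distance_m(a, b):
    """Rough great-circle distance in metres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371000 * asin(sqrt(h))

def trees_near(user_loc, inventory_path="trees.csv", radius_m=100):
    """Geocode each tree's address and keep those near the user.
    The 'address' column name is an assumption about the inventory's layout."""
    nearby = []
    with open(inventory_path, newline="") as f:
        for row in csv.DictReader(f):
            loc = geocode(row["address"])
            if loc and distance_m(user_loc, loc) <= radius_m:
                row["approx_lat"], row["approx_lon"] = loc
                nearby.append(row)
    return nearby
```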

Awesome local superstar coder/punk rock star Duane Nickull whipped together a web app that allows one to map lat/longs. So, based on that, I could imagine a desktop app that allows you to map trees remotely. This obviously would not work for many trees, but it would work for a large number.

[Screenshot: Tree Mapper web app]

You’ll notice in the right-hand corner I’ve created an illustrative list of trees to choose from. Obviously, given the cross-section of the city we are looking at, it would be much longer, but if you were zoomed in all the way I could imagine it being no longer than 5-20.

I’ve also taken the city’s data and parsed it in a way that I think makes it easier for users to understand.

[Screenshot: the city’s tree data, parsed into plainer language]

This isn’t mind-blowing stuff, but it is helpful. I mean, who knew that dbh (diameter at breast height) was an actual technical term for measuring tree diameters! I’ve also thrown in some hyperlinks (it would be nice to have images people can reference) so users can learn about the species and, ideally, even see a photo to compare against.

[Screenshot: Tree Mapper – choosing a tree and assigning a lat/long]

So, in short, you can choose a tree, locate it in Google Maps and assign a lat/long to it. In Google Maps, where you can zoom in even closer than in ESRI, you can really pick out individual trees.

In addition to a desktop web app, I could imagine something similar for the iPhone: it locates you using GPS, identifies what trees are likely around you, and gives you a list such as the one on the right-hand side of the screenshot above. The user then picks a tree from the list that they think they’ve identified, stands next to the tree, and presses a button in the app that assigns the lat/long of where they are standing to that tree.

Canada’s Foreign Aid Agency signs on to IATI: Aid Data get more transparent

Last night, while speaking at the High Level Forum on Aid Effectiveness in Busan, Korea, Minister of International Cooperation Bev Oda announced that Canada would be signing on to the International Aid Transparency Initiative (IATI).

So what is IATI and why does this matter?

IATI has developed a common, open and international standard for sharing foreign aid data. By signing on to IATI, Canada is agreeing to publish all the data about its projects and who it funds in a form and structure that makes it easy to compare with others who use the IATI standard. This should make it easier to understand where Canadian aid money ends up, in turn allowing analysts to spot efficiencies as well as compare funding and efforts across donor and recipient countries and other stakeholders. In short, aid data should become easier to understand, to compare, and to use.
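To give a flavour of what a shared, machine-readable standard enables, here is a rough sketch that totals published transactions by recipient country from an IATI activities file. The element names reflect my reading of the IATI XML schema and should be treated as approximate rather than authoritative.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

def spend_by_country(iati_xml_path):
    """Sum transaction values by recipient country from an IATI activities file.
    Element names are my reading of the IATI schema -- treat as approximate."""
    totals = defaultdict(float)
    root = ET.parse(iati_xml_path).getroot()
    for activity in root.findall("iati-activity"):
        country = activity.find("recipient-country")
        code = country.get("code") if country is not None else "unknown"
        for value in activity.findall("transaction/value"):
            totals[code] += float(value.text or 0)
    return dict(totals)

# e.g. spend_by_country("cida-activities.xml")
```

Because every IATI publisher uses the same structure, the same few lines work on any donor’s file – which is precisely the point of a common standard.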

In the medium term it should also make the data available on CIDA’s open data portal (already helpful to non-profits, development groups and students) even more useful.

This is an enormous win for the good people at Engineers Without Borders, as well as the team at Publish What You Fund. Both groups have been working hard for over a year, talking Canadian politicians and public servants through the ins and outs – as well as the benefits – of signing on to IATI. I’ve been working with both groups as well, pushing IATI when meeting with federal ministers (I recommended we make it part of our Open Government Partnership goals) and writing supportive op-eds in newspapers, so needless to say I’m excited about this development.

This really is good news. As governments become increasingly aware of the power data can have in facilitating cooperation and coordination as well as in improving effectiveness and efficiency, it will be critical to push standards around structuring and sharing data so that such coordination can happen easily across and between jurisdictions. IATI is a great example of such an effort and I hope there are more of these, with Canada taking an early lead, in the months and years ahead.