Monthly Archives: March 2012

Next Generation Open Data: Personal Data Access

Background

This Monday I had the pleasure of being in Mexico City for the OECD’s High Level Meeting on e-Government. CIOs from a number of countries were present – including Australia, Canada, the UK and Mexico (among others). But what really got me going was a presentation by Chris Vein, the Deputy United States Chief Technology Officer for Government Innovation.

In his presentation he referenced work around the Blue Button and the Green Button – both efforts I was previously familiar with. But my conversation with Chris sparked several new ideas and reminded me of just how revolutionary these initiatives are.

For those unacquainted with them, here’s a brief summary:

The Blue Button Initiative emerged out of the US Department of Veterans Affairs (VA) with a simple goal – create a big blue button on their website that would enable a logged-in user to download their health records. That way they can share those records with whomever they wish – a new doctor, a hospital, an application – or even just look at them themselves. The idea has been deemed so good, so important and so popular that it is now being championed as an industry standard, something that not just the VA but all US health providers should do.

The Green Button Initiative is similar. I first read about it on ReadWriteWeb under the catchy and insightful title “Green Button” Open Data Just Created an App Market for 27M US Homes. Essentially the Green Button would enable users to download their energy consumption data from their utility. In the United States 9 utilities have already launched Green Buttons and an app ecosystem – applications that would enable people to monitor their energy use – is starting to emerge. Indeed Chris Vein talked about one app that enabled a user to see their thermostat in real time and then assess the financial and environmental implications of raising and/or lowering it. I personally see the Green Button evolving into an API that you can give others access to… but that is a detail.

Why it Matters

Colleagues like Nigel Shadbolt in the UK have talked a lot about enabling citizens to get their data out of websites like Facebook. And Google has its own very laudable Data Liberation Front run by great guy and werewolf expert, Brian Fitzpatrick. But what makes the Green Button and Blue Button initiatives unique and important is that they create a common industry standard for sharing consumer data. This creates incentives for third parties to develop applications and websites that can analyze this data because these applications will scale across jurisdictions. Hence the ReadWriteWeb article’s focus on a new market. It also makes the data easy to share. Healthcare records downloaded using the Blue Button are easily passed on to a new doctor or a new hospital since people can now design systems to consume these healthcare records. Most importantly, it gives individuals the option of sharing these records themselves so they don’t have to wait for lumbering bureaucracies.

This is a whole new type of open data. Open not to the public but to the individual to whom the data really belongs.

A Proposal

I would love to see the Blue Button and Green Button initiatives spread to companies and jurisdictions outside the United States. There is no reason why, for example, there cannot be Blue Buttons on provincial health care websites in Canada, or in the UK. Nor is there any reason why provincial energy corporations like BC Hydro or Bullfrog Energy (there’s a progressive company that would get this) couldn’t implement the Green Button. Doing so would enable Canadian software developers to create applications that could use this data, help citizens and tap into the US market. Conversely, Canadian citizens could tap into applications created in the US.

The opportunity here is huge. Not only could this revolutionize citizens’ access to their own health and energy consumption data, it would reduce the costs of sharing health care records, which in turn could potentially create savings for the industry at large.

Action

If you are a consumer, tell your local health agency, insurer and energy utility about this.

If you are an energy utility or Ministry of Health and are interested in this – please contact me.

Either way, I hope this is interesting. I believe there is huge potential in Personal Open Data, particularly around data currently held by crown corporations and in critical industries, like healthcare.

When Industries Get Disrupted: Toronto Real Estate Board’s Sad Campaign

As some of my readers know I’ve been engaged by the real estate industry at various points over the last year to share thoughts about how they might be impacted in a world where listings data might be more open.

So I was saddened to read the other day about this misleading campaign the Toronto Real Estate Board (TREB) has launched against the Competition Bureau. It’s got all the makings of a political attack ad: ominous warnings, supportive polling and a selective use of facts. You can check it out at the Protectyourprivacy.ca website. (As an aside, those concerned with online issues like myself should be beating ourselves up for letting TREB snag that URL. There are literally dozens of more compelling uses for that domain, from Bill C-30 to advocacy around privacy settings in Facebook or Google.)

The campaign does, however, make a wonderful mini-case study in how some industries react when confronted with disruptive change. They don’t try to innovate out of the problem, they go to the lawyers (and the pollsters and marketers). To be fair, not everyone in the Real Estate industry is behaving this way. Over the past several months I’ve had the real pleasure of meeting many, many real estate agents across the country who have been finding ways to innovate.

Which is why I suspect this campaign is actually quite divisive. Indeed, since the public doesn’t really know or care who does what in the real estate industry, they’re just going to lump everyone in together. Consequently, if this campaign backfires (and there is a risk that if anyone pays attention to it, it could) then the entire industry could be tarred, not just those at TREB.

So what is the big scary story? Well according to TREB the Competition Bureau has gone rogue and is going to force Canadians to disclose their every personal detail to the world! Specifically, in the words of the Protectyourprivacy.ca website:

The Competition Bureau is trying to dismantle the safeguards for consumers’ personal and private information.

If they get their way, your sensitive personal home information could be made publicly available to anyone on the internet.

Are your alarm bells going off yet? If you’re like me, the answer is probably yes. But, like me, it is not for any of the reasons TREB wants.

To begin with, Canada has a fairly aggressive Privacy Commissioner who is very willing to speak her mind. I suspect she (and possibly her provincial counterparts) were consulted before the Competition Commissioner issued her request. And like most Canadians I likely trust the Privacy Commissioner more than TREB. She’s been fairly quiet.

But of course, why speculate about issues! Let’s go straight to the source. What did the Competition Bureau actually ask for? Well you can find all the relevant documents here (funny how TREB’s campaign website does not link to any of these), but check it out yourself. Here is my breakdown of the issue:

1. This is actually about enabling new services – TREB essentially uses MLS – the online listing service where you look for homes, as a mechanism to prevent new ways of looking for homes online from emerging. I suspect that consumers are not well served by this outcome. That is certainly how the Competition Bureau feels.

2. The Competition Bureau is not asking for information like your name and street address to be posted online for all to see (although I actually think consumers should be given that choice). Indeed you can tell a lawyer was involved in drafting the protectyourprivacy.ca website. There are all these strategically inserted “could’s” as in “your sensitive personal home information could be made publicly available.” Err… that’s a fair degree less alarming.

What the Competition Bureau appears to want is to enable brokers’ clients to browse homes on a password-protected site (called a “virtual office website”). Here they could get more details than what is currently available to the public at large on MLS. However, even these password-protected sites might not include things like the current occupant’s name. They would however (or at least hopefully) include previous sales prices, as knowing the history of the market is quite helpful. I think most consumers agree that a little more transparency around pricing in the real estate industry would be good for consumers.

3. Of course, anything that happens on such a website would still have to comply with privacy laws and would, ultimately, still require the seller’s consent.

According to TREB however, implementing these recommendations will lead to mayhem and death. Literally. Here is a quote from their privacy officer:

“There is a real possibility of break-ins and assaults; you only have to read the headlines to imagine what might happen. You hear stories about realtors getting attacked and killed. Can you imagine if we put that information out there about consumers? You can only imagine the headlines.”

Happily the Globe confirmed that the Toronto Police department is not aware of realtors being targeted for attack.

But here is the real punchline. Everything the Competition Commissioner is asking for already exists in places like Nova Scotia or across the entire United States.

Here’s what these lucky jurisdictions have not experienced: a rash of violence resulting from burglars and others browsing homes online (mostly because if they were going to do that… they could JUST USE GOOGLE STREET VIEW.).

And here’s what they have experienced: an explosion in new and innovative ways to browse, buy and sell homes. From Trulia to Zillow to Viewpoint consumers can get a radically better online experience than what is available in Toronto.

I suspect that if consumers actually hear about this campaign many – including most under the age of 40 – are going to see it as an effort by an industry to protect itself from new competition, not as an effort to protect them. If the story does break that way, it will be evidence to many consumers that the gap between them and the Real Estate industry is growing, not shrinking.

Some upcoming talks

Citizen Surveillance and the Coming Challenge for Public Institutions

Data.gc.ca – Data Sets I found that are interesting, and some suggestions

Yesterday was the one year anniversary of the Canadian federal government’s open data portal. Over the past year government officials have been continuously adding to the portal, but as it isn’t particularly easy to browse data sets on the website, I’ve noticed a lot of people aren’t aware of what data is now available (self included!). Consequently, I want to encourage people to scan the available data sets and blog about ones that they think might be interesting to them personally, to others, or to communities of interest they may know.

Such an undertaking has been rendered MUCH easier thanks to the data.gc.ca administrators’ decision to publish a list of all the data sets available on the site. Turns out, there are 11680 data sets listed in this file. Of course, reviewing all this data took me much longer than I thought it would (and to be clear, I didn’t explore each one in detail), but the process has been deeply interesting. Below are some thoughts, ideas and data sets that have come out of this exploration – I hope you’ll keep reading, and that it will be of interest to ordinary citizens, prospective data users and managers of open government data portals.

A TagCloud of the Data Sets on data.gc.ca

Some Brief Thoughts on the Portal (and for others thinking about exploring the data)

Trying to review all the data sets on the portal is an enormous task and trying to do it has taught me some lessons about what works and what doesn’t. The first is that, while the search function on the website is probably good if you have a keyword or a specific data set you are looking for, it is much easier to browse the data in Excel than on the website. What was particularly nice about this is that, in Excel, the data was often clustered by type. This made it easy to spot related data sets. A great example of this: when I found the data on “Building permits, residential values and number of units, by type of dwelling” I could immediately see there were about 12 other data sets on building permits available.
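For those who would rather script this than scroll through a spreadsheet, here is a minimal Python sketch of the same "spot the related data sets" trick. The list of titles is an assumption for illustration: three of them appear in this post, the other two are made up, and the real list file has many more columns and rows.

```python
# Illustrative titles only; two of the "Building permits" titles are invented.
titles = [
    "Building permits, residential values and number of units, by type of dwelling",
    "Crime statistics, by detailed offences",
    "Building permits, non-residential values by type of structure",  # hypothetical
    "Legal aid applications, by status and type of matter",
    "Building permits, values by activity sector",  # hypothetical
]


def related(all_titles, keyword):
    """Return the titles containing the keyword (case-insensitive), sorted."""
    needle = keyword.lower()
    return sorted(t for t in all_titles if needle in t.lower())


matches = related(titles, "building permits")
for title in matches:
    print(title)
```

Sorting the matches is what reproduces the clustering effect: related titles end up next to each other, just as they do in the spreadsheet.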

Another issue that became clear to me is the problem of how a data set is classified. For example, because of the way the data is structured (really as a report) the Canadian Dairy Exports data has a unique data file for every month and year (you can look at May 1988 as an example). That means each month is counted as a unique “data set” in the catalog. Of course, French and English versions are also counted as unique. This means that what I would consider to be a single data set “Canadian Dairy Exports Month Dairy Year from 1988 to present” actually counts as 398 data sets. This has two outcomes. First, it is hard to imagine anyone wants the data for just one month. This means a user looking for longitudinal data on this subject has to download 199 distinct data sets (very annoying). Why not just group it into one? Second, given that governments like to keep score about how many data sets they share – counting each month as a unique data set feels… unsportsmanlike. To be clear, this outcome is an artifact of how Agriculture Canada gathers and exports this data, but it is an example of the types of problems an open data catalog needs to come to grips with.
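To make the counting problem concrete, here is a rough Python sketch of how a catalog could collapse per-month, per-language files into one logical data set. Everything about the input is an assumption: the `(title, language)` rows and the "Name - Month Year" title pattern are invented for illustration and are not the actual data.gc.ca catalog format.

```python
import re
from collections import defaultdict

# Hypothetical catalog rows: (title, language). Not the real catalog layout.
catalog = [
    ("Canadian Dairy Exports - May 1988", "en"),
    ("Canadian Dairy Exports - June 1988", "en"),
    ("Canadian Dairy Exports - May 1988", "fr"),
    ("Canadian Dairy Exports - June 1988", "fr"),
    ("Civil Aircraft Register Database", "en"),
    ("Civil Aircraft Register Database", "fr"),
]

# Matches a trailing " - <Month> <Year>" so monthly files share one series name.
MONTH_SUFFIX = re.compile(r"\s*-\s*[A-Za-z]+ \d{4}$")


def group_into_series(rows):
    """Map each series name to the set of (title, language) files it contains."""
    series = defaultdict(set)
    for title, language in rows:
        series[MONTH_SUFFIX.sub("", title)].add((title, language))
    return dict(series)


series = group_into_series(catalog)
for name, files in sorted(series.items()):
    print(f"{name}: {len(files)} catalog entries, 1 logical data set")
```

On the real numbers from above, the same grouping would turn 398 catalog entries (199 months in each of two languages) into a single data set.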

Finally, many users – particularly, but not exclusively, developers – are looking for data that is up to date. Indeed, real time data is particularly sexy since its dynamic nature means you can do interesting things with it. Thus it was frustrating to occasionally find data sets that were no longer being collected. A great example of this was the Provincial allocation of corporate taxable income, by industry. This data set jumped out at me as I thought it could be quite interesting. Sadly, StatsCan stopped collecting data on this in 1987 so any visualization will have limited use today. This is not to say data like this should be pulled from the catalog, but it might be nice to distinguish between data sets that are being collected on an ongoing basis versus those that are no longer being updated.

Data Sets I found Interesting

Just quickly before I begin, some quick thoughts on my very unscientific methodology for identifying interesting data sets.

First, browsing the data sets really brought home to me how many will be interesting to different groups – we really are in the world of the long tail of public policy. As a result, there is lots of data that I think will be interesting to many, many people that is not on this list.
Second, I tried not to include too much of StatsCan’s data. StatsCan data already has a fairly well developed user base. And while I’m confident that base is going to get bigger still now that its data is free, I figure there are already a number of people who will be sharing/talking about it.
Finally, I’ve tried to identify some data sets that I think would make for good mashups or apps. This isn’t easy with federal government data sets since they tend to be more aggregate and high-level than, say, municipal data sets… but I’ve tried to tease out what I can. That said, I’m sure there is much, much more.

New GeoSpatial API!

So the first data set is a little bit of a cheat since it is not on the open data portal, but I was emailed about it yesterday and it is so damn exciting, I’ve got to share it. It is a recently released public BETA of a new RESTful API from the very cool people at GeoGratis that provides a consolidated access point to several repositories of geospatial data and information products including GeoGratis, GeoPub and Mirage. (huge thank you to the GeoGratis team for sending this to me).

Documentation can be found here (and in French here) and a sample search client that demonstrates some of its functionality and how to interact with the API can be found here. Formats include ATOM, HTML Fragment, CSV, RSS, JSON, and KML. (So you can see results – for example – in Google Earth by using the KML format (example here).)

I’m also told that these fine folks have been working on a geolocation service, so you can do sexy things like search by place name, by NTS map or by the first three characters of a postal code. Documentation will be posted here in English and French. Super geeks may notice that there is a field in the JSON called CGNDBkey. I’m also told you can use this key to select an individual place name according to the Canadian Geographic Names Board. Finally, you can also search all their metadata through search engines like Google (here is a sample search for gold they sent me).

All data is currently licensed under GeoGratis.

The National Pollutant Release Inventory

Description: The National Pollutant Release Inventory (NPRI) is Canada’s public inventory of pollutant releases (to air, water and land), disposals and transfers for recycling.

Notes: This is the same data set (but updated) that we used to create emitter.ca. I frankly feel like the opportunities around this data set, for environmentalists, investors (concerned about regulatory and lawsuit risks), the real estate industry, and others, are enormous. The public could be very interested in this.

Greenhouse Gas Emissions Reporting Program

Description: The Greenhouse Gas Emissions Reporting Program (GHGRP) is Canada’s legislated, publicly-accessible inventory of facility-reported greenhouse gas (GHG) data and information.

Notes: What’s interesting here is that while it doesn’t have lat/longs, it does have facility names and addresses. That means you should be able to cross reference it with the NPRI (which does have lat/longs) to be able to plot where the big greenhouse gas emitters are on a map. I think the same people as the NPRI might be interested in this data.
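Here is a rough sketch, in Python, of what that cross reference could look like. It assumes the two files can be matched on a cleaned-up facility name plus address, which I have not verified against the real data; the records, field names and numbers below are all invented for illustration.

```python
def normalize(text):
    """Lowercase and collapse punctuation/whitespace so near-identical strings match."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return " ".join(cleaned.split())


# Hypothetical records; the real NPRI and GHGRP files use their own column names.
npri = [
    {"facility": "Example Smelter Inc.", "address": "1 Main St., Anytown",
     "lat": 49.1, "lon": -117.7},
    {"facility": "Sample Refinery", "address": "200 River Rd, Otherville",
     "lat": 53.5, "lon": -113.3},
]
ghgrp = [
    {"facility": "EXAMPLE SMELTER INC", "address": "1 Main St, Anytown",
     "co2e_tonnes": 120000},
    {"facility": "Unmatched Plant", "address": "9 Nowhere Ave",
     "co2e_tonnes": 50000},
]


def attach_coordinates(ghg_rows, npri_rows):
    """Copy lat/lon onto each GHG row whose name and address match an NPRI row."""
    index = {(normalize(r["facility"]), normalize(r["address"])): r
             for r in npri_rows}
    located, unmatched = [], []
    for row in ghg_rows:
        key = (normalize(row["facility"]), normalize(row["address"]))
        match = index.get(key)
        if match:
            located.append({**row, "lat": match["lat"], "lon": match["lon"]})
        else:
            unmatched.append(row)
    return located, unmatched


located, unmatched = attach_coordinates(ghgrp, npri)
print(len(located), "facilities located;", len(unmatched), "left to match by hand")
```

Exact matching on cleaned strings will miss facilities whose names are spelled differently in the two files, so expect a pile of leftovers that need fuzzier matching or a manual look.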

The Canadian Ice Thickness Program

Description: The Ice Thickness program dataset documents the thickness of ice on the ocean. Measurements begin when the ice is safe to walk on and continue until it is no longer safe to do so. This data can help gauge the impact of global warming and is relevant to shipping data in the north of Canada.

Notes: Students interested in global warming… this could make for some fun visualization.

Argo: Canadian Tracked Data

Description: Argo Data documents some of the approximately 3000 profiling floats that were deployed around the world. Once at sea, the float sinks to a preprogrammed target depth of 2000 meters for a preprogrammed period of time. It then floats to the surface, taking temperature and salinity values during its ascent at set depths. The Canadian Tracked Argo Data describes the Argo programme in Canada and provides data and information about Canadian floats.

Notes: Okay, so I can think of no use for this data, but I just thought it was so awesome that people are doing this that I totally geeked out.

Civil Aircraft Register Database

Description: Civil Aircraft Register Database – this file contains the current mark, aircraft and owner information of all Canadian civil registered aircraft.

Notes: Here I really think there could be a geeky app. Just a simple app that you can type an aircraft’s number into and it will tell you the owner and details about the plane. I actually think the government could do a lot of work with this data. If regulatory and maintenance data were made available as well – then you’d have a powerful app that would tell you a lot about the planes you fly in. At a minimum it would be of interest to flight enthusiasts.
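The core of such an app is small. Below is a minimal Python sketch of the lookup, assuming the register can be read as a CSV with a mark, a model and an owner per row. The column names and the sample rows are made up; the real file has its own layout.

```python
import csv
import io

# Hypothetical extract; the real register file has its own columns and layout.
SAMPLE_REGISTER = """mark,model,owner
C-FABC,Example 172,Jane Doe
C-GXYZ,Sample DHC-2,Northern Air Ltd.
"""


def canonical_mark(raw):
    """Reduce 'c-fabc', 'CFABC' or ' C FABC ' to the same key, 'CFABC'."""
    return "".join(ch for ch in raw.upper() if ch.isalnum())


def load_register(text):
    """Index register rows by canonical mark."""
    reader = csv.DictReader(io.StringIO(text))
    return {canonical_mark(row["mark"]): row for row in reader}


def lookup(register, raw_mark):
    """Return the row for a mark, or None when it is not in the register."""
    return register.get(canonical_mark(raw_mark))


register = load_register(SAMPLE_REGISTER)
print(lookup(register, "c-gxyz"))
```

Normalizing the mark before the lookup matters because people will type what they see painted on the tail in all sorts of ways.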

Real Time Hydrometric Data Tool

Description: Real Time Hydrometric Data Tool – this site provides public access to real-time hydrometric (water level and streamflow) data collected at over 1700 locations in Canada. These data are collected under a national program jointly administered under federal-provincial and federal-territorial cost-sharing agreements. It is through partnerships that the Water Survey of Canada program has built a standardized and credible environmental information base for Canada. This dataset contains both current and historical datasets. The current month can be viewed in an HTML table, and historical data can be downloaded in CSV format.

Notes: So ripe for an API! What is cool is that the people at Environment Canada have integrated it into Google Maps. I could imagine fly fishermen and communities at risk of flooding being interested in this data set.

Access to information data sets

Description: 2006-2010 Access to Information and Privacy Statistics (With the previous years here, here and here.) is a compilation of statistical information about access to information and privacy submitted by government institutions subject to the Access to Information Act and the Privacy Act for 2006-2010.

Notes: I’d love to crunch this stuff again and see who’s naughty and nice in the ATIP world…

Poultry and Forestry data

No links, BECAUSE THERE IS SO MUCH OF IT. Anyone interested in the Poultry or Forestry industry will find lots of data… obviously this stuff is useful to people who analyze these industries but I suspect there are a couple of “A” university level papers hidden in that data set as well.

Building Permits

There is tons on building permits and construction. Actually, this is one of the benefits of looking at the data in a spreadsheet: it is easy to see other related data sets.

StatsCan

It really is amazing how much Statistics Canada data there is. Even reviewing something like the supply and demand of natural gas liquids got me thinking about the wealth of information trapped in there. One thing I do hope StatsCan starts to do is geolocate its data whenever possible.

Crime Data

As this has been in the news I couldn’t help but include it. It’s nice that any citizen can look at the crime data direct from StatsCan to see how our crime rate is falling (which is why we should build more expensive prisons): Crime statistics, by detailed offences. Of course unreported crime, which we all know is climbing at 3000% a year, is not included in these stats.

Legal Aid Applications

Legal aid applications, by status and type of matter. This was interesting to me since, here in BC there is much talk about funding for the Justice system and yet, the number of legal aid applications has remained more or less flat over the past 5 years.

National Broadband Coverage data

Description: The National Broadband Coverage Data represents broadband coverage information, by technology, for existing broadband service providers as of January 2012. Coverage information for Broadband Canada Program projects is included for all completed projects. Coverage information is aggregated over a grid of hexagons, which are each 6 km across. The estimated range of unserved / underserved population within each hexagon location is included.

Notes: What’s nice is that there is lat/long data attached to all this, so mapping it, and potentially creating a heat map is possible. I’m certain the people at OpenMedia might appreciate such a map.

Census Consolidated Subdivision

Description: Census Consolidated Subdivision Cartographic Boundary Files portrays the geographic limits used for the 2006 census dissemination. The Census Consolidated Subdivision Boundary Files contain the boundaries of all 2,341 census consolidated subdivisions.

Notes: Obviously this one is on every data geek’s radar, but just in case you’ve been asleep for the past 5 months, I wanted to highlight it.

Non-Emergency Surgeries, distribution of waiting times

Description: Non-emergency surgeries, distribution of waiting times, household population aged 15 and over, Canada, provinces and territories

Notes: Would love to see this at the hospital and clinic level!

Border Wait Times

Description: Estimates Border Wait Times (commercial and travellers flow) for the top 22 Canada Border Services Agency land border crossings.

Notes: Here I really think there is an app that could be made. At the very least there is something that could tell you historical averages and ideally, could be integrated into Google and Bing maps when calculating trip times… I can also imagine a lot of companies that export goods to the US are concerned about this issue and would be interested in better data to predict the costs and times of shipping goods. Big potential here.
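As a starting point, the "historical averages" piece is only a few lines of Python. The sketch below assumes you have already collected wait time observations as (crossing, timestamp, minutes) tuples; that shape, the crossing names and the numbers are all made up, and the real feed will need its own parsing.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical observations: (crossing, ISO timestamp, wait in minutes).
observations = [
    ("Crossing A", "2012-03-05T08:00", 20),
    ("Crossing A", "2012-03-12T08:30", 40),
    ("Crossing A", "2012-03-05T14:00", 10),
    ("Crossing B", "2012-03-05T08:15", 5),
]


def average_wait_by_hour(rows):
    """Average the wait for each (crossing, hour of day) pair."""
    buckets = defaultdict(list)
    for crossing, stamp, minutes in rows:
        hour = datetime.fromisoformat(stamp).hour
        buckets[(crossing, hour)].append(minutes)
    return {key: mean(values) for key, values in buckets.items()}


averages = average_wait_by_hour(observations)
for (crossing, hour), wait in sorted(averages.items()):
    print(f"{crossing} at {hour:02d}:00 averages {wait} minutes")
```

Bucketing by hour of day is a deliberately crude choice. A trip planner would probably also want day of week and holidays, but even this much would tell you whether 8 am or 2 pm is the better time to cross.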

Okay, that’s my list. Hope it inspires you to take a look yourself, or play with some of the data listed above!

Sharing ideas about data.gc.ca

As some of you may remember, the other week I suggested that on its one year anniversary we hack data.gc.ca – specifically, that people share what data sets they find most interesting on the website, especially as it is hard to search it.

Initially I’d uploaded a list of all the data sets on the catalog to buzzdata. However the other day the data.gc.ca administrators added a data set that is a list of all the data sets available on the site (meta, I know). This new list is, apparently, an even more robust and up to date list than the one I shared earlier and is available in both official languages.

If you do end up finding data you think is particularly interesting, creating a list of your favourite data sets, doing a mash up or visualization or (most ambitiously) creating a better way to search data.gc.ca please send me your results, a link, or at least an email. I’ll be posting what I find interesting tonight or tomorrow morning and would love to link to anything anyone else has done too!

Want to Find Government Innovation? US Military is often leading the way.

When it comes to seeing what trends will impact government in 20-30 years I’m a big fan of watching the US military. They may do a lot of things wrong but, when it comes to government, they are on the bleeding edge of being a “learning organization.” It often feels like they are less risk averse, more likely to experiment and (as noted) more likely to learn than almost any government agency I can think of (hint: those things may be interconnected). Few people realize that to rise above Colonel in many military organizations, you must have at least a master’s degree. Many complete PhDs. And these schools often turn into places where people challenge authority and the institution’s conventional thinking.

Part of it, I suspect, has to do with the whole “people die when you make mistakes” aspect of their work. It may also have to do with the seriousness with which they take their mandate. And part of it has to do with the resources they have at their disposal.

But regardless of the cause, I find they are often at the cutting edge of ideas in the public sector. For example, I can’t think of a government organization that empowers the lowest echelons of its employee base more than the US military. Their network centric vision of the world means those on the front lines (both literally and figuratively) are often empowered, trusted and strongly supported with tools, data and technology to make decisions on the fly. In an odd way, the very hierarchical system that the rest of government has been modeled on, has really transcended into something different. Still very hierarchical but, at the same time, networked.

Frankly, if a 1-800 call operator at Service Canada or the guy working at the DMV in Pasadena, CA (or even just their managers) had 20% of the autonomy of a US sergeant, I suspect government would be far more responsive and innovative. Of course, Service Canada or the California DMV would have to have a network centric approach to their work… and that’s a ways off since it demands serious cultural change, the hardest thing to shift in an org.

Anyways… long rant. Today I’m interested in another smart call the US military is making that government procurement departments around the world should be paying attention to (I’m especially looking at you Public Works – pushers of Beehive over GCPEDIA). This article, Open source helicopters trivialize Europe’s ODF troubles, on Computer World’s Public Sector IT blog outlines how the next generation of US helicopters will be built on an open platform. No more proprietary software that binds hardware to a particular vendor.

Money quote from the piece:

Weapons manufacturers and US forces made an unequivocal declaration for royalty-free standards in January through the FACE (Future Airborne Capabilities Environment) Consortium they formed in response to US Defence Secretary Leon Panetta’s call for a “common aircraft architecture and subsystems”.

“The FACE Standard is an open, nonproprietary technical specification that is publicly available without restrictive contracts, licensing terms, or royalties,” the Consortium announced from its base at The Open Group, the industry association responsible for the POSIX open Unix specification.

“In business terms, the open standards specified for FACE mean that programmers are freely able to use them without monetary remuneration or other obligation to the standards owner,” it said.

While business software producers have opposed governments that have tried to implement identical open standards policies with the claim it will handicap innovation and dampen competition, the US military is embracing open standards for precisely the opposite reasons.

So suddenly we are going to have an open source approach to innovation and program delivery (helicopter manufacturing, operation and maintenance) at major scale. Trust me, if the US military is trying to do this with helicopters you can convert your proprietary intranet to an open source wiki platform. I can’t believe the complexity is as great. But the larger point here is that this approach could be used to think about any system a government wants to develop, from earthquake monitoring equipment to healthcare systems to transit passes. From a “government as a platform” perspective this could be a project to watch. Lots of potential lessons here.

Access to Information, Open Data and the Problem with Convergence

In response to my post yesterday one reader sent me a very thoughtful commentary that included this line at the end:

“Rather than compare [Freedom of Information] FOI legislation and Open Gov Data as if it’s “one or the other”, do you think there’s a way of talking about how the two might converge?”

One small detail:

So before diving into the meat let me start by saying I don’t believe anything in yesterday’s post claimed open data was better or worse than Freedom of Information (FOI, often referred to in Canada as Access to Information or ATI). Seeing FOI and open data as competing suggests they are similar tools. While they have similar goals – improving access – and there may be some overlap, I increasingly see them as fundamentally different tools. This is also why I don’t see an opportunity for convergence in the short term (more on that below). I do, however, believe open data and FOI processes can be complementary. Indeed, I’m hopeful open data can alleviate some of the burden placed on FOI systems, which are often slow. In Canada, for example, government departments regularly violate rules around disclosure deadlines. If anything, this complementary nature was the implicit point in yesterday’s post (which I could have made more explicit).

The Problem with Convergence:

As mentioned above, the overarching goals of open data and FOI systems are similar – to enable citizens to access government information – but the two initiatives are grounded in fundamentally different approaches to dealing with government information. From my view FOI has become a system of case by case review while open data is seeking to engage in an approach of “pre-clearance.”

Part of this has to do with what each system is reacting to. FOI was born, in part, out of a reaction to scandals in the mid 20th century which fostered public support for a right to access government information.

FOI has become a powerful tool for accessing government information. But the infrastructure created to manage it has also had some perverse effects. In some ways FOI has, paradoxically, made it harder to gain access to government information. I remember talking to a group of retired reporters who talked about how it was easier to gain access to documents in the pre-FOI era since there were no guidelines and many public servants saw most documents as “public” anyways. The rules around disclosure today – thanks in part to FOI regimes – mean that governments can make closed the “default” setting for government information. In the United States the Ashcroft Memo serves as an excellent example of this problem. In this case the FOI legislation actually becomes a tool that helps governments withhold documents, rather than enable citizens to gain legitimate access.

But the bigger problem is that the process by which access to information requests are fulfilled is itself burdensome. While relevant and necessary for some types of information it is often overkill for others. And this is the niche that open data seeks to fill.

Let me pause to stress, I don’t share the above to disparage FOI. Quite the opposite. It is a critical and important tool and I’m not advocating for its end. Nor am I arguing that open data can – in the short or even medium term – solve the problems raised above.

This is why, over the short term, open data will remain a niche solution – a fact linked to its origins. Like FOI, open data has its roots in government transparency. However, it also evolved out of efforts to tear down antiquated intellectual property regimes to facilitate the sharing of data/information (particularly between organizations and governments). Thus the emphasis was not on case by case review of documents, but rather on clearing rights to categories of information, both created and to be created in the future. In other words, this is about granting access to the outputs of a system, not access to individual documents.

Another way of thinking about this is that open data initiatives seek to leverage the benefits of FOI while jettisoning its burdensome process. If a category of information can be pre-cleared in advance and in perpetuity for privacy, security and IP concerns then FOI processes – essential for individual documents and analysis – become unnecessary and one can reduce the transaction costs to citizens wishing to access the information.

Maybe, in the future, the scope of these open data initiatives could become broader, and I hope they will. Indeed, there is ample evidence to suggest that technology could be used to pre-clear or assess the sensitivity of any government document. An algorithm that assesses a mixture of who the author is, the network of people who reviewed it and a scan of the words would probably allow one to ascertain whether a document could be released to an ATIP request in seconds, rather than weeks. It could at least give a risk profile and/or strip out privacy related information. These types of reforms would be much more disruptive (in the positive sense) to FOI legislation than open data.
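To make the idea of a “risk profile” a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `Document` fields, the word list, the weights and the SIN-like number pattern are hypothetical placeholders, not anything drawn from a real ATIP process or an existing tool. It only shows the general shape of combining the three signals mentioned above (author, reviewers, words) into a score, plus a crude pass that strips one kind of personal identifier.

```python
import re
from dataclasses import dataclass

# Hypothetical placeholders: a real system would need vetted term lists,
# calibrated weights and far more careful privacy detection.
SENSITIVE_TERMS = {"cabinet", "confidential", "secret", "investigation"}
ID_PATTERN = re.compile(r"\b\d{3}[- ]?\d{3}[- ]?\d{3}\b")  # nine-digit, SIN-like number


@dataclass
class Document:
    author_level: int           # 0 = public-facing role, 2 = highly sensitive role
    reviewer_levels: list[int]  # same scale, one entry per reviewer
    text: str


def risk_score(doc: Document) -> float:
    """Blend author, reviewer and word signals into a score between 0 and 1."""
    words = re.findall(r"[a-z']+", doc.text.lower())
    hits = sum(1 for word in words if word in SENSITIVE_TERMS)
    term_rate = hits / len(words) if words else 0.0
    reviewer_level = max(doc.reviewer_levels, default=0)
    score = (
        0.3 * (doc.author_level / 2)
        + 0.3 * (reviewer_level / 2)
        + 0.4 * min(1.0, term_rate * 20)
    )
    return round(score, 3)


def strip_identifiers(text: str) -> str:
    """Replace SIN-like numbers with a redaction marker."""
    return ID_PATTERN.sub("[REDACTED]", text)


memo = Document(0, [0], "Weekly weather summary for the public website")
briefing = Document(2, [1, 2], "Secret cabinet briefing on the investigation")

print(risk_score(memo), risk_score(briefing))
print(strip_identifiers("SIN 123-456-789 on file"))
```

A score like this would not decide anything on its own; at best it could sort documents into “probably fine to release quickly” and “needs a human look”, which is the time saving being imagined here.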

But all that said, just getting the current focus of open data initiatives right would be a big accomplishment. And, even if such initiatives could be expanded, there are limits. I am not so naive as to believe that government can be entirely open. Nor am I sure that would be an entirely good outcome. When trying to foster new ideas or assess how to balance competing interests in society, a private place to initiate and play with ideas may be essential. And despite the ruminations above, the limits of government IT systems mean there will remain a lot of information – particularly non-data information like reports and analysis – that we won’t be able to “pre-clear” for sharing and downloading. Consequently an FOI regime – or something analogous – will continue to be necessary.

So rather than replace or converge with FOI systems, I hope open data will, for the short to medium term, actually divert information out of the FOI system, not because it competes, but because it offers a simpler and more efficient means of sharing (for both government and citizens) certain types of information. That said, open data initiatives offer none of the protections or rights of FOI and so this legislation will continue to serve as the fail safe mechanism should a government choose to stop sharing data. Moreover, FOI will continue to be a necessary tool for documents and information that – for all sorts of reasons (privacy, security, cabinet confidence, etc.) – cannot fall under the rubric of an open data initiative. So convergence… not for now. But co-existence feels both likely and helpful for both.

Calculating the Value of Canada’s Open Data Portal: A Mini-Case Study

Okay, let’s geek out on some open data portal stats from data.gc.ca. I’ve got three parts to this review: First, a look at how to assess the value of data.gc.ca. Second, a look at the most downloaded data sets. And third, some interesting data about who is visiting the portal.

Before we dive in, a thank you to Jonathan C, who sent me some of this data the other day after requesting it from Treasury Board, the ministry within the Canadian Government that manages the government’s open data portal.

1. Assessing the Value of data.gc.ca

Here is the first thing that struck me. Many governments talk about how they struggle to find methodologies to measure the value of open data portals/initiatives. Often these assessments focus on things like number of apps created or downloaded. Sometimes (and incorrectly in my mind) pageviews or downloads are used. Occasionally it veers into things like mashups or websites.

However, one fairly tangible value of open data portals is that they cheaply resolve some access to information requests – a point I’ve tried to make before. At the very minimum they give scale to some requests that previously would have been handled by slow and expensive access to information/freedom of information processes.

Let me share some numbers to explain what I mean.

The Canadian Government is, I believe, only obligated to fulfill requests that originate within Canada. Drawing from the information in the charts later in this post, let’s assume there were a total of 2200 downloads in January and that roughly a third (33%) of these originated from Canada – so a total of 726 “Canadian” downloads. Thanks to some earlier research, I happen to know that the office of the information commissioner has assessed that the average cost of fulfilling an access to information request in 2009-2010 was $1,332.21.

So in a world without an open data portal the hypothetical cost of fulfilling these “Canadian” downloads as formal access to information requests would have been $967,184.46 in January alone. Even if I’m off by 50%, then the cost – again, just for January – would still sit at $483,592.23. Assuming this is a safe monthly average, then over the course of a year the cost savings could be around $11,606,213.52 or $5,803,106.76 – depending on how conservative you’d want to be about the assumptions.
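For anyone who wants to check or tweak the arithmetic, here is a minimal sketch in Python. The inputs are the figures from this post (2200 downloads, a Canadian share of 33%, which is what yields the 726 figure, and $1,332.21 per request); the function and variable names are purely illustrative, and `Decimal` is used only so the cents come out exactly.

```python
from decimal import Decimal

# Inputs taken from the post; the names are illustrative.
TOTAL_DOWNLOADS = 2200                 # assumed downloads in January
CANADIAN_SHARE = Decimal("0.33")       # roughly a third originate in Canada
COST_PER_REQUEST = Decimal("1332.21")  # average cost per request, 2009-2010


def monthly_savings(downloads, share, cost, discount=Decimal("1")):
    """Hypothetical cost of handling the Canadian downloads as formal requests."""
    canadian_downloads = downloads * share
    return (canadian_downloads * cost * discount).quantize(Decimal("0.01"))


full = monthly_savings(TOTAL_DOWNLOADS, CANADIAN_SHARE, COST_PER_REQUEST)
half = monthly_savings(TOTAL_DOWNLOADS, CANADIAN_SHARE, COST_PER_REQUEST, Decimal("0.5"))

print(f"January, as stated:    ${full:,}")
print(f"January, off by 50%:   ${half:,}")
print(f"Full year, as stated:  ${full * 12:,}")
print(f"Full year, off by 50%: ${half * 12:,}")
```

Changing `CANADIAN_SHARE` or the `discount` argument is the quickest way to see how sensitive the estimate is to the assumptions.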

Of course, I’m well aware that not every one of these downloads would have been an information request in a pre-portal world – that process is simply too burdensome. You have to pay a fee, and it has to be by check (who pays for anything by check any more???) so many of these users would simply have abandoned their search for government information. So some of these savings would not have been realized. But that doesn’t mean there isn’t value. Instead the open data portal is able to more cheaply reveal latent demand for data. In addition, only a fraction of the government’s data is presently on the portal – so all these numbers could get bigger still. And finally I’m only assessing downloads that originated inside Canada in these estimates.

So I’m not claiming that we have arrived at a holistic view of how to assess the value of open data portals – but even the narrow scope of assessment I outline above generates financial savings that are not trivial, and this is to say nothing of the value generated by those who downloaded the data – something that is much harder to measure – or of the value of increased access to Canadians and others.

2. Most Downloaded Datasets at data.gc.ca

This is interesting because… well… it’s just always interesting to see what people gravitate towards. But check this out…

Data sets like the Anthropogenic disturbance footprint within boreal caribou ranges across Canada may not seem interesting, but the groundbreaking agreement between the Forest Products Association of Canada and a coalition of Environmental Non-Profits – known as the Canadian Boreal Forest Agreement (CBFA) – uses this data set a lot to assess where the endangered woodland caribou are most at risk. There is no app, but the data is critical in both protecting this species and in finding a way to sustainably harvest wood in Canada. (Note: I worked as an adviser on the CBFA so am a) a big fan and b) not making this stuff up.)

It is fascinating that immigration and visa data tops the list. But it really shouldn’t be a surprise. We are, of course, a nation of immigrants. I’m sure that immigration and visa advisers, to say nothing of think tanks, municipal governments, social service non-profits and English as a second language schools, are all very keen on using this data to help them understand how they should be shaping their services and policies to target immigrant communities.

There is, of course, weather. The original open government data set. We have made this data open for 100s of years. So useful and so important you had to make it open.

And, nice to see Sales of fuel used for road motor vehicles, by province and territory. If you wanted to figure out the carbon footprint of vehicles, by province, I suspect this is a nice dataset to get. Probably is also useful for computing gas prices as it might let you get a handle on demand. Economists probably like this data set.

All this to say, I’m less skeptical than before about the data sets in data.gc.ca. With the exception of weather, these data sets aren’t likely useful to software developers – the group I tend to hear most from – but then I’ve always posited that apps were only going to be a tiny part of the open data ecosystem. Analysis is king for open data and there do appear to be people out there who are finding data of value for analyses they want to make. That’s a great outcome.

Here are the tables outlining the most popular data sets since launch and (roughly) in February.

Top 10 most downloaded datasets, since launch

RANK	DATASET	DEPARTMENT	DOWNLOADS
1	Permanent Resident Applications Processed Abroad and Processing Times (English)	Citizenship and Immigration Canada	4730
2	Permanent Resident Summary by Mission (English)	Citizenship and Immigration Canada	1733
3	Overseas Permanent Resident Inventory (English)	Citizenship and Immigration Canada	1558
4	Canada – Permanent residents by category (English)	Citizenship and Immigration Canada	1261
5	Permanent Resident Applicants Awaiting a Decision (English)	Citizenship and Immigration Canada	873
6	Meteorological Service of Canada (MSC) – City Page Weather	Environment Canada	852
7	Meteorological Service of Canada (MSC) – Weather Element Forecasts	Environment Canada	851
8	Permanent Resident Visa Applications Received Abroad – English Version	Citizenship and Immigration Canada	800
9	Water Quality Indicators – Reports, Maps, Charts and Data	Environment Canada	697
10	Canada – Permanent and Temporary Residents – English version	Citizenship and Immigration Canada	625
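One quick way to see how heavily immigration data dominates is to total the “since launch” table by department. The sketch below does that in Python; the rows are copied from the table above, while the function name and the idea of measuring a “share of the top 10” are just illustrative choices, not an official metric.

```python
from collections import defaultdict

CIC = "Citizenship and Immigration Canada"
EC = "Environment Canada"

# (dataset, department, downloads) rows copied from the "since launch" table.
TOP_10_SINCE_LAUNCH = [
    ("Permanent Resident Applications Processed Abroad and Processing Times", CIC, 4730),
    ("Permanent Resident Summary by Mission", CIC, 1733),
    ("Overseas Permanent Resident Inventory", CIC, 1558),
    ("Canada – Permanent residents by category", CIC, 1261),
    ("Permanent Resident Applicants Awaiting a Decision", CIC, 873),
    ("Meteorological Service of Canada (MSC) – City Page Weather", EC, 852),
    ("Meteorological Service of Canada (MSC) – Weather Element Forecasts", EC, 851),
    ("Permanent Resident Visa Applications Received Abroad", CIC, 800),
    ("Water Quality Indicators – Reports, Maps, Charts and Data", EC, 697),
    ("Canada – Permanent and Temporary Residents", CIC, 625),
]


def downloads_by_department(rows):
    """Sum the download counts for each department."""
    totals = defaultdict(int)
    for _dataset, department, downloads in rows:
        totals[department] += downloads
    return dict(totals)


totals = downloads_by_department(TOP_10_SINCE_LAUNCH)
grand_total = sum(totals.values())

for department, count in sorted(totals.items(), key=lambda item: -item[1]):
    print(f"{department}: {count} downloads ({count / grand_total:.1%} of the top 10)")
```

Note this only covers the ten data sets listed, so the shares say nothing about the rest of the portal.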

Top 10 most downloaded datasets, for past 30 days

RANK	DATASET	DEPARTMENT	DOWNLOADS
1	Permanent Resident Applications Processed Abroad and Processing Times (English)	Citizenship and Immigration Canada	481
2	Sales of commodities of large retailers – English version	Statistics Canada	247
3	Permanent Resident Summary by Mission – English Version	Citizenship and Immigration Canada	207
4	CIC Operational Network at a Glance – English Version	Citizenship and Immigration Canada	163
5	Gross domestic product at basic prices, communications, transportation and trade – English version	Statistics Canada	159
6	Anthropogenic disturbance footprint within boreal caribou ranges across Canada – As interpreted from 2008-2010 Landsat satellite imagery	Environment Canada	102
7	Canada – Permanent residents by category – English version	Citizenship and Immigration Canada	98
8	Meteorological Service of Canada (MSC) – City Page Weather	Environment Canada	61
9	Sales of fuel used for road motor vehicles, by province and territory – English version	Statistics Canada	52
10	Government of Canada Core Subject Thesaurus – English Version	Library and Archives Canada	51

3. Visitor locations

So this is just plain fun. There is not a ton to derive from this – especially as IP addresses can, occasionally, be misleading. In addition, this is page view data, not download data. But what is fascinating is that computers in Canada are not the top source of traffic at data.gc.ca. Indeed, Canada’s share of the traffic is actually quite low. In fact, in January, just taking into account the countries in the chart (and not the long tail of visitors) Canada accounted for only 16% of the traffic to the site. That said, I suspect that downloads were significantly higher from Canadian visitors – although I have no hard evidence of this, just a hypothesis.

• Total visits since launch: 380,276 user sessions

Attack of the Drones – How Surveillance May Change our Culture

I’ve been following the rise of do it yourself (DIY) drones for a few years now, ever since Chris Anderson, the editor of Wired magazine, introduced me to the topic in a podcast. And yes, I’m talking about flying drones… Like those the US Air Force uses to monitor – and attack – enemy forces in Afghanistan. Except, in the case of DIY drones, they are smaller, cheaper, and are being built by a growing legion of hobbyists, companies and enthusiasts all around the world, many of whom are sharing open source UAV plans that can be downloaded off the internet.

You may not know it, but there could be drones in your neighborhood. And this has real implications.

Take, for example, a story that really grabbed my attention a few weeks ago. An animal rights group called SHARK chose to deploy a drone to monitor a live pigeon shoot taking place on private land. It turns out the mere presence of their drone caused the shoot to be cancelled. To begin with, that says a lot about the drone’s effectiveness. But what was really interesting was how, in frustration, one or some of the shooters then hid and shot the drone out of the sky.

Think about this.

Here you have a group using what is essentially a mini-helicopter to monitor an activity taking place on what is private property. Then, in response, the other party fires live rounds at the drone and causes it to crash. And all of this is playing out near a US highway (not a major one, but still, a public road). This is a privacy, legal, and public safety nightmare. The policy and societal implications are significant.

And this is not an isolated use. As the Economist pointed out in its excellent write up on civilian drones in this week’s Quarterly Technology Review, drones are already being used by an environmental group to locate and track Japanese whalers. In the US several police forces already operate drones – including one in Texas which, frighteningly, has the capacity to launch grenades. George Clooney funds a non-profit that uses satellites to monitor Sudan in an effort to prevent atrocities through transparency. Can drones be far behind?

I share all this because, these days, people are often most frightened by the state’s growing interest in monitoring what we do online. Here in Canada, for example, the government has proposed a law that would require telecommunications firms to have the ability to record, and save, everyone’s online activities. But technology to monitor people offline, in the physical world, is also evolving. More importantly, it is becoming available to ordinary citizens. This will have real impacts.

As my friend Luke C. pointed out the other day, it is entirely conceivable that, in 5-7 years, there could be drones that would follow your child as he walks to school. You can, of course, already choose to monitor your child by giving them a cell phone and tracking the GPS device within it, but a drone would have several advantages. It would be harder for someone to destroy or “disconnect” from your child. It could also record and save remotely everything that is going on – in order to prevent anyone from harassing or bullying them. It might even remind them to look both ways before crossing the street, in case they forget. Or, because of its high vantage point, it could pick out and warn your child of cyclists and cars they failed to observe. Once your kid is safely at school the drone could whiz home and recharge in time to walk them home at the end of the day. This may all seem creepy to you, but if such a drone cost $100, how many parents do you think would feel like it was “the responsible thing to do”? I suspect a great many. Even if it was only 5% of parents… that would be a lot of drones.

And of course there are thousands of other uses. Protestors might want a drone observing them, just so that any police brutality could be carefully recorded for later. Cautious adults may want one hovering over them, especially when going into an unfamiliar or unsafe neighborhood. Or maybe you’ll want one for your elderly parents… just in case something happens to them? It’d be good to be able to pull them up on a live feed, from anywhere.

If you went back 20 years and told someone you were going to give them a device that would enable their government to locate them within a few feet at any given moment, they would likely have imagined some Orwellian future. But this is, functionally, what any smart phone can do. Looking forward 20 years, I ask myself: would my child feel monitored if he has a drone helping him get to school? Or maybe he will feel unsafe without it? Or maybe it will feel like his Hogwarts owl, a digital pet? Or maybe all of these outcomes? I’m not sure the answer is obvious.

My larger point is that the pressure to create the surveillance society isn’t going to come exclusively from the state. Indeed, we may find ourselves in a surveillance society not because the state demands it, but because we want the tools for our own useful and/or selfish ends. Some people may argue that this may level the playing field between citizens and the state or powerful organizations. I hope that is true. But maybe the mass adoption of such tools will simply normalize surveillance in our society and culture. That might, in turn, make it easier for the state, or other organizations, or just everyone else, to monitor us.

What I do know is that our government, our police forces, our neighborhoods are wholly unprepared for this. That’s okay, they have some time. But it is coming. At some point we will be living in a society where the technology will exist to enable anyone to deploy a drone that can observe anyone else in a public space, and maybe even in a private space. The challenges and complexities for policy makers are significant, and the implications for our communities, probably even more so. Either way it’s going to make many people’s lives a lot more complicated.

Note, I suspect there are typos in this, but it is 2 am and WordPress already deleted the first draft of this post, killing a couple hours of work… so my capacity and patience are low. I hope you’ll forgive me a little.

eaves.ca

if writing is a muscle, this is my gym