The Uncertain Future of Open Data in the Government of Canada

Some thoughts on the relaunched data.gc.ca

Yesterday, I talked about what I thought was the real story that got missed in the fanfare surrounding the relaunch of data.gc.ca. Today I’ll talk about the new data.gc.ca itself.

Before I begin, there is an important disclaimer to share (to be open!). Earlier this year Treasury Board asked me to chair five public consultations across Canada to gather feedback on both its open data program and data.gc.ca in particular. As such, I solicited peoples suggestions on how data.gc.ca could be improved – as well as shared my own – but I was not involved in the creation of data.gc.ca. Indeed the first time I saw the site was on Tuesday when it launched. My role was merely to gather feedback. For those curious you can read the report I wrote here.

There is, I’m happy to say, much to commend about the new open data portal. Of course, aesthetically, it is much easier on the eye, but this is really trivial compared to a number of other changes.

The most important shift relates to the desire of the site to foster community. Users can now register with the site as well as rate and comment on data sets. There are also places like the Developers’ Corner which contains documentation that potential users might find helpful and a sort of app store where government agencies and citizens can posts applications they have created. This shift mirrors the evolution of data.gov, data.gov.uk and DataBC which started out as data repositories but sought to foster and nurture a community of data users. The critical piece here is that simply creating the functionality will probably not be sufficient, in the US, UK and BC it has required dedicated community managers/engagers to help foster such a community. At present it is unclear if that exists behind the website at data.gc.ca.

The other two noteworthy improvements to the site are an improved search and the availability of API’s. While not perfect, the improved search is nonetheless helpful as previously it was basically impossible to find anything on the site. Today a search for “border time” and a border wait time data set is the top result. However, search for “border wait times” and “Biogeochemical exploration using Douglas-fir tree tops in the Mabel Lake area, southern British Columbia (NTS 82L09 and 10)” becomes the top hit with actual border wait time data set pushed down to fifth. That said the search is still a vast improvement and this alone could be a boon to policy wonks, researchers and developers who elect to make use of the site.

The introduction of APIs is another interesting development. For the uninitiated an API (application programming interface) provides continuous access to updated data, so rather than downloading a file, it is more like you are plugging into a socket that delivers data, rather than electricity. The aforementioned border wait time data set is a fantastic example. It is less of a “data set” than of a “data stream” providing the most recent updates of border wait times, like what you would see on the big signs across the highway as you approach the border. By providing it through the open data site it would not, for example, be impossible for Google Maps to scan this data set daily, understand how border wait times fluctuate and incorporate these delays in its predicted travel times. Indeed, it could even querry the API in real time and tell you how long it will take to drive from Vancouver to Seattle, with border delays taken into account. The opportunity for developers and, equally intriguing, government employees and contractors, to build applications a top of these APIs is, in my mind, quite exciting. It is a much, much cheaper and flexible approach than how a lot of government software is currently built.

I also welcome the addition of the ability to search Access to Information (ATIP) requests summaries. That said, I’d like for there to be more than just the summaries, that actually responses would be nice, particularly given that ATIP requests likely represent information people have identified as important. In addition, the tool for exploring government expenditures is interesting, but it is weirdly more notable because, as far as I can tell, none of the data displayed in the tool can be downloaded, meaning it is not very open.

Finally, I will briefly note that the license is another welcome change. For more on that I recommend checking out Teresa Scassa’s blog post on it. Contrary to my above disclaimer I have been more active on this side of things, and hope to have more to share on that another time.

I’m sure, as I and others explore the site in the coming days we will discover more to like and dislike about it, but it is a helpful step forward and another signal that open data is, slowly, being baked into the public service as a core service.

The Real News Story about the Relaunch of data.gc.ca

6 Replies

As many of my open data friends know, yesterday the government launched its new open data portal to great fanfare. While there is much to talk about there – something I will dive into tomorrow – that was not the only thing that happened yesterday.

Indeed, I did a lot of media yesterday between flights and only after it was over did I notice that virtually all the questions focused on the relaunch of data.gc.ca. Yet it is increasingly clear that for me, the much, much bigger story of the portal relaunch was the Prime Minister announcing that Canada would adopt the Open Data Charter.

In other words, Canada just announced that it is moving towards making all government data open by default. Moreover, it even made commitments to make specific “high value” data sets open in the next couple of years.

As an aside, I don’t think the Prime Minister’s office has ever mentioned open data – as far as I can remember, so that was interesting in of itself. But what is still more interesting is what the Prime Minister committed Canada to. The open data charter commits the government to make data open by default as well as four other principles including:

Quality and Quantity
Useable by All
Releasing Data for Improved Governance
Releasing Data for Innovation

In some ways Canada has effectively agreed to implement the equivalent to Presidential Executive Order on Open Data the White House announced last month (and that I analyzed in this blog post). Indeed, the charter is more aggressive than the executive order since it goes on to layout the need to open up not just future data, but also current “high value” data sets. Included among these are data sets the Open Knowledge Foundation has been seeking to get opened via its open data census, as well as some data sets I and many others have argued should be made open, such as the company/business register. Other suggested high value data sets include data on crime, school performance, energy and environment pollution levels, energy consumption, government contracts, national budgets, health prescription data and many, many others. Also included on the list… postcodes – something we are presently struggling with here in Canada.

But the charter wasn’t all the government committed to. The final G8 communique contained many interesting tidbits that again, highlighted commitments to open up data and adhere to international data schemas.

Among these were:

Corporate Registry Data: There was a very interesting section on “Transparency of companies and legal arrangements” which is essentially on sharing data about who owns companies. As an advisory board member to OpenCorporates, this was music to my ears. However, the federal government already does this, the much, much bigger problem is with the provinces, like BC and Quebec that make it difficult or expensive to access this data.
Extractive Industries Transparency Initiative: A commitment that “Canada will launch consultations with stakeholders across Canada with a view to developing an equivalent mandatory reporting regime for extractive companies within the next two years.” This is something I fought to get included into our OGP commitment two years ago but failed to succeed at. Again, I’m thrilled to see this appear in the communique and look forward to the government’s action.
International Aid Transparency Initiative (IATI) and Busan Common Standard on Aid Transparency,: A commitment to make aid data more transparent and downloadable by 2015. Indeed, with all the G8 countries agreed to taking this step it may be possible to get greater transparency around who is spending what money, where on aid. This could help identify duplication as well as in assessments around effectiveness. Given how precious aid dollars are, this is a very welcome development. (h/t Michael Roberts of Acclar.org)

So lots of commitments, some on the more vague side (the open data charter) but some very explicit and precise. And that is the real story of yesterday, not that the country has a new open data portal, but that a lot more data is likely going to get put into that portal over then next 2-5 years. And a tsunami of data could end up in it over the next 10-25 years. Indeed, so much data, that I suspect a portal will no longer be a logical way to share it all.

And therein lies the deeper business and government story in all this. As I mentioned in my analysis of the White House Executive Order that made open data default, the big change here is in procurement. If implemented, this could have a dramatic impact on vendors and suppliers of equipement and computers that collect and store data for the government. Many vendors try to find ways to make their data difficult to export and share so as to lock the government in to their solution. Again, if (and this is a big if) the charter is implemented it will hopefully require a lot of companies to rethink what they offer to government. This is a potentially huge story as it could disrupt incumbents and lead to either big reductions in the costs of procurement (if done right) or big increases and the establishment of the same, or new, impossible to work with incumbents (if done incorrectly).

There is potentially a tremendous amount at stake in how the government handles the procurement side of all this, because whether it realizes it or not, it may have just completely shaken up the IT industry that serves it.

Postscript: One thing I found interesting about the G8 communique was how many times commitments about open data and open data sets occurred in the section that had nothing to do with open data. Will be interesting if that is a trend that continues at the next G8 meeting. Indeed, I wouldn’t be surprised is a specific open data section disappears and instead these references just become part of various issue related commitments.

Policy-Making in a Big Data World

What Traffic Lights Say About the Future of Regulation

1 Reply

I have a piece up on TechPresident about some crazy regulations that took place in Florida that put citizens at greater risk all so the state and local governments can make more money.

Here’s a chunk:

In effect, what the state of Florida is saying is that a $20 million increase in revenue is worth an increase in risk of property damage, injury and death as a result of increased accidents. Based onnational statistics, there are likely about 62 deaths and 5,580 injuries caused by red light running in Florida each year. If shorter yellow lights increased that rate by 10 percent (far less than predicted by the USDOT) that could mean an additional 6 deaths and 560 injuries. Essentially the state will raise a measly extra $35,000 for each injury or death its regulations help to cause, and possibly far less.

The Past, Present and Future of Sensor Journalism

4 Replies

This weekend I had the pleasure of being invited to the Tow Centre for Digital Journalism at the Columbia Journalism School for a workshop on sensor journalism.

The workshop (hashtag #towsenses) brought together a “community of journalists, hackers, makers, academics and researchers to explore the use of sensors in journalism; a crucial source of information for investigative and data journalists.” And, it was fascinating to talk about what role sensors – from the Air Quality Egg to aerial drones – should, could or might play in journalism. Even more fun with a room full of DIYers, academics and journalists with interesting titles such as “applications division manager” or “data journalist.” Most fascinating was a panel on the ethics of sensors in journalism of which I hope to write about another time.

There is, of course, a desire to treat sensors as something new in journalism. And for good reason. Much like I’m sure there were early adopters of camera’s in the newsroom, cameras probably didn’t radically change the newsroom until they were (relatively) cheap, portable and gave you something your audience wanted. Today we may be experiencing something similar with sensors. The costs of creating sophisticated sensors is falling and/or other objects, like our cell phones, can be repurposed to be sensors. The question is… like cameras’ how can the emergence of sensors help journalists? And how might they distract them?

My point is, well, they already do sensor journalism. Indeed, I’d argue that somewhere between 5-15% of many news broadcasts are consumed with sensor journalism. At the very minimum the weather report is a form of sensor journalism. The meteorological group is a part of the news media organization that is completely reliant on sensors to provide it with information which it must analyze and turn into relevant information for its audience. And it is a very specific piece of knowledge that matters to the audience. They are not asking for how the weather came about, but merely and accurate prediction of what the weather will be. For good or (as I feel) for ill, there is not a lot of discussions about climate change on the 6 o’clock news weather report. (As an aside Clay Johnson cleverly pointed out that weather data may also be the government’s oldest, most mature and economically impactful open data set).

Of course weather data is not the only form of sensor journalism going on on a daily basis. Traffic reports frequently rely on sensors, from traffic counting devices to permanently mounted visual sensors (cameras!) that allow one to count, measure, and even model and predict traffic. There may still be others.

So there are already some (small) parts of the journalism world that are dependent on sensors. Of course, some of you may not consider traffic reports and weather reports to be journalism since it is not, well, investigative journalism. But these services are important, have tended to be part of news gathering organizations and are in constant demand by consumers. And while demand may not always the most important metric, it is an indication that this matters to people. My broader point here is that, there is part of the media community that is used to dealing with a type of sensor journalism. Yes, it has low ethical risk (we aren’t pointing these sensors at humans really) but it does mean there are policies, processes, methodologies and practices for thinking about sensors that may exist in news organizations, if not in the newsroom.

It is also a window in the the types of stories that sensors have, at least in the past, been good at helping out with. Specifically there seem to be two criteria: things that both occur at, and that a large number of people want to know about at, a high frequency. Both weather and traffic fit the bill, lots of people want to know about them, often twice a day, if not more frequently. So it might be worth thinking about, what are the other types of issues or problems that interest journalist that do, or could conform, with that criteria? In addition, if we are able to lower the cost of gathering and analyzing the data, does it become feasible, or profitable to serve smaller, niche audiences?

None of this is to say that sensors can’t, won’t or shouldn’t be used to cover investigative journalism projects. The work Public Labs did in helping map the extent of the oil spill along the gulf coast is a fantastic example of where sensors may be critical in journalism (as well as advocacy and evidence building) as has been the example of groups like Safecast and others who monitored radioactivity levels in Japan after the Fukushima disaster. Indeed I think the possibilities of sensors in investigative journalism are both intriguing, and potentially very, very bright. I just love for us to build off of work that is already being done – even if it is in the (journalistically) mundane space of traffic and weather rather than imagine we are beginning with an entirely blank slate.

Some Nice Journalistic Data Visualization – Global’s Crude Awakening

2 Replies

Over at Global, David Skok and his team have created a very nice visualization of the over 28,666 crude oil spills that have happened on Alberta pipelines over the last 37 years (that’s about two a day). Indeed, for good measure they’ve also visualized the additional 31,453 spills of “other” substance carried by Alberta pipeline (saltwater, liquid petroleum, etc..)

They’ve even created a look up feature so you can tackle the data geographically, by name, or by postal code. It is pretty in depth.

Of course, I believe all this data should be open. Sadly, they have to get at it through a complicated Access to Information Request that appears to have consumed a great deal of time and resources and that would probably only be possible by a media organizations with the dedicated resources (legal and journalistic) and leverage to demand it. Had this data been open there would have still been a great deal of work to parse, understand and visualize it, but it would have helped lower the cost of development.

In fact, if you are curious about how they got the data – and the sad, sad, story it involved – take a look at the fantastic story they wrote about the creation of their oilspill website. This line really stood out for me:

An initial Freedom of Information request – filed June 8, 2012, the day after the Sundre spill – asked Alberta Environment and Sustainable Resource Development for information on all reported spills from the oil and gas industry, from 2006 to 2012.

About a month later, Global News was quoted a fee of over $4,000 for this information. In discussions with the department, it turned out this high fee was because the department was unable to provide the information in an electronic format: Although it maintained a database of spills, the departmental process was to print out individual reports on paper, and to charge the requester for every page.

So the relevant government department has the data in a machine readable form. It just chooses to only give it out in a paper form. Short of simply not releasing the data at all it is hard to imagine a more obstructionist approach to preventing the public from accessing environmental data their tax dollars paid to collect and that is supposed to be in the public interest. You essentially look at thousands of pieces of paper and re-enter tens, if not hundreds of thousands, of data points into spreadsheets. This is a process designed to prevent you from learning anything and frustrating potential users.

Let’s hope that when the time comes for the Global team to update this tool and webpage there will be open data they can download and access to the task is a little easier.

Awesome Simple Open Data use case – Welcome Wagon for New Community Businesses

3 Replies

A few weeks ago I was at an event in Victoria, British Columbia at event where people were discussing the possibilities, challenges and risk of open data. During the conversation, one of the participants talked about how they wanted an API for business license applications from the city.

This is a pretty unusual request – people have told me about their desire for business licenses data especially at the provincial/state and national level, but also at the local level. However, they are usually happy with a data dump once a year or quarter since they generally want to analyze the data for urban planning or business planning reasons. But an API – which would mean essentially constant access to the data and the opportunity to see changes to the database in real time (e.g. if a business registered or moved) – was different.

The reason? The individual – who was an entrepreneur and part of the local Business Improvement Area – wanted to be able to offer a “welcome wagon” to other new businesses in his community. If he knew when a business opened up he could reach out and help make them welcome them to the neighborhood. He thought it was always nice when shopkeepers knew one another but didn’t always know what was going on even a few blocks away because, well, he was often manning his own shop. I thought it was a deeply fascinating example of how open data could help foster community and is something I would have never imagined.

Food for thought and wanted to share.

Duffy, the Government and the problem with “no-notes” meetings

1 Reply

So, for my non-canadian readers, there is a significant scandal brewing up here in Canadaland, regarding a senator, who claimed certain expenses he was not allowed to (to the tune of $90,000) and then had that debt paid for by the Prime Minister’s chief of staff (who has now resigned).

This short paragraph in an article by the star captures the dark nature of the exchange:

…once the repayment was made, Duffy stopped cooperating with independent auditors examining his expense claims. When a Senate committee met behind closed doors in early May to write the final report on the results of the audit, Conservatives used their majority to soften the conclusions about Duffy’s misuse of taxpayers’ money.

So, it was money designed to make a potential minor scandal go away.

Now the opposition party ethics critic Charlie Angus is calling for an RCMP investigation. From the same story:

“Where is the paper trail? Who was involved,” Angus said.

“If there is a signed agreement between Mr. Duffy and Mr. Wright or Mr. Duffy and the Prime Minister’s Office, we need to see that documentation,” he said.

And herein lies an interesting rub. This government is infamous for holding “no-note” meetings. I’ve been told about such meetings on numerous occasions by public servants. Perhaps they are making it up. But I don’t think so. The accusations have happened too many times.

So maybe there is a paper trail between the Prime Minister’s Office and Senator Duffy. There were lawyers involved… so one suspects there was. But who knows. Maybe there isn’t. And the lack of a paper trail won’t give many people confidence. Indeed, with Duffy now no longer in the Conservative Caucus and with the Chief of Staff resigned everyone is now looking at the Prime Minister, people are now starting to focus on the Prime Minister. What did he know?

And therein lies the bigger rub. In an environment where there is no paper trail one cannot audit who knew what or who is responsible. This means everyone is responsible, all the way to the top. So a “no-notes” meeting can be good for keeping the public from knowing certain decisions, and who made them, but the approach fails once the public finds out about one of these decisions and starts to care. If there is no smoking email in which someone else claims responsibility, it is going to be hard for this not to reach the Prime Minister’s desk.

Politicians and political staff will sometimes forget that the rules and processes around meetings – which appear to be designed to simply promote a degree of (annoying to them) transparency and accountability – are as much about creating a record for the public as they are about protecting the people involved.

Thoughts on the White House Executive Order on Open Data

4 Replies

As those steeped in the policy wonk geekery of open data are likely already aware, last Thursday the President of the United States issued an Executive Order Making Open and Machine Readable the New Default for Government Information.

This is, quite frankly, a big deal. Further down in the post I’ve got some links and some further explanations why.

That said, the White House called and asked if I would be willing to provide some context about the significance of the order – which I did. You can read my reaction, along with those of a number of people I respect, here. Carl Malamud is, as always, the most succinct and dramatic.

Here are some further thoughts:

Relevant Links

A link to the press release

White House CTO, Todd Park, and CIO, Steve VanRoekel, explaining Open Government Data and its significance.

A fact sheet on the announcement

A link to the Executive Order

A policy memo about the Executive

And, perhaps most interestingly, a link to Project Open Data, a site with a TON of resources about doing open data within government, including job descriptions, best practices and even tools (e.g. code) you can download to help with an open data deployment in at the city, state or national level. Indeed if you are a public servant reading this (and, I know many of you are), I strongly encourage you to take a look at this site. Tools here could save you tens to hundreds of thousands of dollars in software development costs for a range of projects. I love this example, for one: Kickstart – “A simple WordPress plugin to help agencies kickstart their open data efforts by allowing citizens to browse existing datasets and vote for suggested priorities.”

What the White House did right

Here is the genius of this executive order. At its core it deals with something that is hard to communicate to a lot of people in a meaningful way. Here is the executive order for dummies version: This is essentially a core change to procurement and information publication. From a procurement perspective it basically means from now on, if you work in the US government and you buy a computer or software that is going to store or collect data, it sure as hell better be able to export it in a way that others can re-use it. From a information publication perspective, having the ability to publish the data is not sufficient, you actually have to publish the data.

This change is actually quite wide ranging. So much so that it could be hard for many people to understand its significance. This is why I love the emphasis on what I would refer to as strategic data sets – data sets on healthcare, education, energy and safety. While the order pertains to data that is much, much broader than this, talking about datasets like the 5-Star Safety Ratings System about almost every vehicle in America or data on most appliances’ Energy Star rating brings it down to earth. This is information the average American can wrap their head around and agree should be made more widely available.

The point is that while I’m in favour of making government data more available, I’m particularly interested in using it to drive for policy outcomes that are in the public interest. Finding better ways to get people safety, health, energy or education data in their hands at the moment they are making an important decision is something open data can facilitate. If, when you are making a purchase or about to create a new project, there is some software that can filter your choices by safety rating or prompt you to rethink your criteria in a way that will enhance your safety or reduce your carbon footprint, I find that compelling. So more availability to government data for research or even just access… yes! But access to specific data sets with the goal of improving specific outcomes is also very important, and this is clearly one of the goals of this order.

What this executive order is not

It is important to note what this Executive Order is not. While I think it can help citizens make better choices, improve access to some types of information, offer researchers and policy wonks more data to test theories and propose solutions and improve productivity within and outside government, I do not think it will not change politics in America. Had this order existed, it would not have magically prevented the Iraq War by, for example, making CIA analysis more scrutable. Nor will it directly rein in lobbyists or make money matter less. This is more about changing the way government works, not the effect that politics has on government decisions. Maybe it will have that impact in the long run (or the opposite impact), but it will be through second and third order effects that I’m all too happy to confess that I currently don’t see.

No one is claiming that this release somehow makes the US government “open” – there are still lots of examples about policies and processes in the White House that require greater transparency. Transparency and openness in government move on several axes. Progress along one axis does not automatically mean there is progress along all axes. And even progress will foster new challenges and demand new types of vigilance. For example, I also suspect, over time, the order may impact what data governments elect to collect, if, by default, it is to be made open. The order could, in some cases, make the data more political, something I’ve argued here.

This is to say that there is no panacea and this order does not create some perfectly transparent government. But it is an important step, and one that other governments should be looking at closely. It is an effort to reposition government to better participate in and be relevant in a data driven and networked world, and it does foster a level of access around a class of information, data, that is too often kept hidden from citizens. For that reason, it is worthy of much praise.

eaves.ca

if writing is a muscle, this is my gym