Tag Archives: opendata

The Uncertain Future of Open Data in the Government of Canada

Open data is arguably at its high-water mark in the Government of Canada. Data.gc.ca has been refreshed, a rash of new data sets has been made available and, more importantly, the government has signed the Open Data Charter, committing it to making data “open” by default.

In other words, there is a lot of momentum in the right direction. So what could go wrong?

The answer…? Everything.

The reason is the upcoming cabinet shuffle.

I confess that Minister Clement and I have not agreed on all things. I believe – as the evidence shows us – that needle injection sites such as Insite make communities safer, save lives and make it easier for drug users to get help. As Health Minister, Clement did not. I argued strongly against the dismantling of the mandatory long form census, noting its demise would make our government dumber and, ultimately, more expensive. As Industry Minister, Minister Clement was responsible for the end of a reliable long form census.

However, when it comes to open data, Minister Clement has been a powerful voice in a government that has, on many occasions, looked for ways to make access to information harder, not easier. Indeed, open data advocates have been lucky to have had two deeply supportive ministers, Clement and, prior to him, Stockwell Day (who also felt strongly about this issue and was incredibly responsive to many of my concerns when I shared them). This run, though, may be ending.

With the Government in trouble, there is widespread acceptance that a major cabinet re-shuffle will be in order. While Minister Clement has been laying a lot of groundwork for the upcoming negotiations with the public sector unions and for a rethink of how the public service could be more effective and accountable, he may not be sticking around to see this work (that I’m sure the government sees as essential) through to the end. Nor may he want to. Treasury Board remains a relatively inward facing ministry and it would not surprise me if both the Minister and the PMO were interested in moving him to a portfolio that was more outward and public facing. Only a notable few politicians dream of wrestling with public servants and figuring out how to reform the public service. (Indeed, Reg Alcock is the only one I can think of.)

If the Minister is moved it will be a real test for the sustainability of open data at the federal level. Between the Open Data charter, the expertise and team built up within Treasury Board and hopefully some educational work Minister Clement has done within his own caucus, ideally there is enough momentum and infrastructure in place that the open data file will carry on. This is very much what I hope to be the case.

But much may depend on who is made President of the Treasury Board. If that role changes, open data advocates may find themselves busy not doing new things, but rather safeguarding gains already made.

 

Some thoughts on the relaunched data.gc.ca

Yesterday, I talked about what I thought was the real story that got missed in the fanfare surrounding the relaunch of data.gc.ca. Today I’ll talk about the new data.gc.ca itself.

Before I begin, there is an important disclaimer to share (to be open!). Earlier this year Treasury Board asked me to chair five public consultations across Canada to gather feedback on both its open data program and data.gc.ca in particular. As such, I solicited people’s suggestions on how data.gc.ca could be improved – as well as shared my own – but I was not involved in the creation of data.gc.ca. Indeed, the first time I saw the site was on Tuesday when it launched. My role was merely to gather feedback. For those curious, you can read the report I wrote here.

There is, I’m happy to say, much to commend about the new open data portal. Of course, aesthetically, it is much easier on the eye, but this is really trivial compared to a number of other changes.

The most important shift relates to the desire of the site to foster community. Users can now register with the site as well as rate and comment on data sets. There are also places like the Developers’ Corner, which contains documentation that potential users might find helpful, and a sort of app store where government agencies and citizens can post applications they have created. This shift mirrors the evolution of data.gov, data.gov.uk and DataBC, which started out as data repositories but sought to foster and nurture a community of data users. The critical piece here is that simply creating the functionality will probably not be sufficient. In the US, UK and BC it has required dedicated community managers/engagers to help foster such a community. At present it is unclear if that exists behind the website at data.gc.ca.

The other two noteworthy improvements to the site are an improved search and the availability of APIs. While not perfect, the improved search is nonetheless helpful, as previously it was basically impossible to find anything on the site. Today, search for “border time” and a border wait time data set is the top result. However, search for “border wait times” and “Biogeochemical exploration using Douglas-fir tree tops in the Mabel Lake area, southern British Columbia (NTS 82L09 and 10)” becomes the top hit, with the actual border wait time data set pushed down to fifth. That said, the search is still a vast improvement and this alone could be a boon to policy wonks, researchers and developers who elect to make use of the site.

The introduction of APIs is another interesting development. For the uninitiated, an API (application programming interface) provides continuous access to updated data, so rather than downloading a file, it is more like you are plugging into a socket that delivers data rather than electricity. The aforementioned border wait time data set is a fantastic example. It is less of a “data set” than a “data stream” providing the most recent updates of border wait times, like what you would see on the big signs across the highway as you approach the border. By providing it through the open data site it would not, for example, be impossible for Google Maps to scan this data set daily, understand how border wait times fluctuate and incorporate these delays in its predicted travel times. Indeed, it could even query the API in real time and tell you how long it will take to drive from Vancouver to Seattle, with border delays taken into account. The opportunity for developers and, equally intriguing, government employees and contractors, to build applications atop these APIs is, in my mind, quite exciting. It is a much, much cheaper and more flexible approach than how a lot of government software is currently built.
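To make the file-versus-socket distinction a little more concrete, here is a minimal Python sketch. Everything in it is hypothetical: the crossing names, the wait times and the `fetch_wait_times` function are stand-ins for whatever the real border wait time API returns, and the feed is simulated so the example runs without a network connection.

```python
from typing import Callable, Dict


# Hypothetical stand-in for a live API call. A real client would make an
# HTTP request here; the crossing names and numbers are invented.
def fetch_wait_times() -> Dict[str, int]:
    """Return the current wait, in minutes, for each border crossing."""
    return {"Peace Arch": 35, "Pacific Highway": 20}


def estimate_trip_minutes(
    drive_minutes: int,
    crossing: str,
    fetch: Callable[[], Dict[str, int]] = fetch_wait_times,
) -> int:
    """Add the latest reported border delay to a base driving time."""
    waits = fetch()  # ask the feed at request time, not a saved file
    return drive_minutes + waits.get(crossing, 0)


def best_crossing(fetch: Callable[[], Dict[str, int]] = fetch_wait_times) -> str:
    """Pick the crossing with the shortest current wait."""
    waits = fetch()
    return min(waits, key=waits.get)


if __name__ == "__main__":
    # 150 minutes of driving is an assumed figure for illustration.
    print(estimate_trip_minutes(150, "Peace Arch"))
    print(best_crossing())
```

The point of passing `fetch` in as a function is that the estimate is recomputed from fresh data on every call, which is the difference between a stream and a downloaded file.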

I also welcome the addition of the ability to search Access to Information (ATIP) request summaries. That said, I’d like for there to be more than just the summaries; the actual responses would be nice, particularly given that ATIP requests likely represent information people have identified as important. In addition, the tool for exploring government expenditures is interesting, but it is weirdly more notable because, as far as I can tell, none of the data displayed in the tool can be downloaded, meaning it is not very open.

Finally, I will briefly note that the license is another welcome change. For more on that I recommend checking out Teresa Scassa’s blog post on it. Contrary to my above disclaimer I have been more active on this side of things, and hope to have more to share on that another time.

I’m sure, as I and others explore the site in the coming days, we will discover more to like and dislike about it, but it is a helpful step forward and another signal that open data is, slowly, being baked into the public service as a core service.

 

The Real News Story about the Relaunch of data.gc.ca

As many of my open data friends know, yesterday the government launched its new open data portal to great fanfare. While there is much to talk about there – something I will dive into tomorrow – that was not the only thing that happened yesterday.

Indeed, I did a lot of media yesterday between flights and only after it was over did I notice that virtually all the questions focused on the relaunch of data.gc.ca. Yet it is increasingly clear to me that the much, much bigger story of the portal relaunch was the Prime Minister announcing that Canada would adopt the Open Data Charter.

In other words, Canada just announced that it is moving towards making all government data open by default. Moreover, it even made commitments to make specific “high value” data sets open in the next couple of years.

As an aside, I don’t think the Prime Minister’s office has ever mentioned open data, as far as I can remember, so that was interesting in and of itself. But what is still more interesting is what the Prime Minister committed Canada to. The open data charter commits the government to make data open by default as well as to four other principles:

  • Quality and Quantity
  • Usable by All
  • Releasing Data for Improved Governance
  • Releasing Data for Innovation

In some ways Canada has effectively agreed to implement the equivalent of the Presidential Executive Order on Open Data that the White House announced last month (and that I analyzed in this blog post). Indeed, the charter is more aggressive than the executive order since it goes on to lay out the need to open up not just future data, but also current “high value” data sets. Included among these are data sets the Open Knowledge Foundation has been seeking to get opened via its open data census, as well as some data sets I and many others have argued should be made open, such as the company/business register. Other suggested high value data sets include data on crime, school performance, energy and environment pollution levels, energy consumption, government contracts, national budgets, health prescription data and many, many others. Also included on the list… postcodes – something we are presently struggling with here in Canada.

But the charter wasn’t all the government committed to. The final G8 communique contained many interesting tidbits that, again, highlighted commitments to open up data and adhere to international data schemas.

Among these were:

  • Corporate Registry Data: There was a very interesting section on “Transparency of companies and legal arrangements” which is essentially on sharing data about who owns companies. As an advisory board member to OpenCorporates, this was music to my ears. However, the federal government already does this. The much, much bigger problem is with the provinces, like BC and Quebec, that make it difficult or expensive to access this data.
  • Extractive Industries Transparency Initiative: A commitment that “Canada will launch consultations with stakeholders across Canada with a view to developing an equivalent mandatory reporting regime for extractive companies within the next two years.” This is something I fought, unsuccessfully, to get included in our OGP commitment two years ago. Again, I’m thrilled to see this appear in the communique and look forward to the government’s action.
  • International Aid Transparency Initiative (IATI) and Busan Common Standard on Aid Transparency: A commitment to make aid data more transparent and downloadable by 2015. Indeed, with all the G8 countries agreeing to take this step, it may be possible to get greater transparency around who is spending what money, and where, on aid. This could help identify duplication as well as help in assessing effectiveness. Given how precious aid dollars are, this is a very welcome development. (h/t Michael Roberts of Acclar.org)

So lots of commitments, some on the more vague side (the open data charter) but some very explicit and precise. And that is the real story of yesterday: not that the country has a new open data portal, but that a lot more data is likely going to get put into that portal over the next 2-5 years. And a tsunami of data could end up in it over the next 10-25 years. Indeed, so much data that I suspect a portal will no longer be a logical way to share it all.

And therein lies the deeper business and government story in all this. As I mentioned in my analysis of the White House Executive Order that made open data default, the big change here is in procurement. If implemented, this could have a dramatic impact on vendors and suppliers of equipment and computers that collect and store data for the government. Many vendors try to find ways to make their data difficult to export and share so as to lock the government in to their solution. Again, if (and this is a big if) the charter is implemented, it will hopefully require a lot of companies to rethink what they offer to government. This is a potentially huge story as it could disrupt incumbents and lead to either big reductions in the costs of procurement (if done right) or big increases and the establishment of the same, or new, impossible to work with incumbents (if done incorrectly).

There is potentially a tremendous amount at stake in how the government handles the procurement side of all this, because whether it realizes it or not, it may have just completely shaken up the IT industry that serves it.

 

Postscript: One thing I found interesting about the G8 communique was how many times commitments about open data and open data sets occurred in sections that had nothing to do with open data. It will be interesting to see if that trend continues at the next G8 meeting. Indeed, I wouldn’t be surprised if a specific open data section disappears and instead these references just become part of various issue related commitments.

 

 

 

Some Nice Journalistic Data Visualization – Global’s Crude Awakening

Over at Global, David Skok and his team have created a very nice visualization of the more than 28,666 crude oil spills that have happened on Alberta pipelines over the last 37 years (that’s about two a day). Indeed, for good measure they’ve also visualized the additional 31,453 spills of “other” substances carried by Alberta pipelines (saltwater, liquid petroleum, etc.).
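As a quick check on the “about two a day” figure, here is the arithmetic in Python, using only the spill count and time span quoted above. Leap days are ignored, which is an assumption that does not change the result at this precision.

```python
SPILLS = 28_666       # crude oil spills cited in the visualization
YEARS = 37            # period covered
DAYS_PER_YEAR = 365   # leap days ignored

spills_per_year = SPILLS / YEARS
spills_per_day = SPILLS / (YEARS * DAYS_PER_YEAR)

print(f"{spills_per_year:.0f} spills per year")
print(f"{spills_per_day:.2f} spills per day")
```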

They’ve even created a look up feature so you can tackle the data geographically, by name, or by postal code. It is pretty in depth.

Of course, I believe all this data should be open. Sadly, they had to get at it through a complicated Access to Information Request that appears to have consumed a great deal of time and resources and that would probably only be possible for a media organization with the dedicated resources (legal and journalistic) and leverage to demand it. Had this data been open there would have still been a great deal of work to parse, understand and visualize it, but it would have helped lower the cost of development.

In fact, if you are curious about how they got the data – and the sad, sad, story it involved – take a look at the fantastic story they wrote about the creation of their oilspill website. This line really stood out for me:

An initial Freedom of Information request – filed June 8, 2012, the day after the Sundre spill – asked Alberta Environment and Sustainable Resource Development for information on all reported spills from the oil and gas industry, from 2006 to 2012.

About a month later, Global News was quoted a fee of over $4,000 for this information. In discussions with the department, it turned out this high fee was because the department was unable to provide the information in an electronic format: Although it maintained a database of spills, the departmental process was to print out individual reports on paper, and to charge the requester for every page.

So the relevant government department has the data in a machine readable form. It just chooses to only give it out in a paper form. Short of simply not releasing the data at all, it is hard to imagine a more obstructionist approach to preventing the public from accessing environmental data their tax dollars paid to collect and that is supposed to be in the public interest. You essentially have to look at thousands of pieces of paper and re-enter tens, if not hundreds of thousands, of data points into spreadsheets. This is a process designed to prevent you from learning anything and to frustrate potential users.

Let’s hope that when the time comes for the Global team to update this tool and webpage there will be open data they can download and access so the task is a little easier.

 

Awesome Simple Open Data use case – Welcome Wagon for New Community Businesses

A few weeks ago I was at an event in Victoria, British Columbia where people were discussing the possibilities, challenges and risks of open data. During the conversation, one of the participants talked about how they wanted an API for business license applications from the city.

This is a pretty unusual request. People have told me about their desire for business license data, especially at the provincial/state and national level, but also at the local level. However, they are usually happy with a data dump once a year or quarter since they generally want to analyze the data for urban planning or business planning reasons. But an API – which would mean essentially constant access to the data and the opportunity to see changes to the database in real time (e.g. if a business registered or moved) – was different.

The reason? The individual – who was an entrepreneur and part of the local Business Improvement Area – wanted to be able to offer a “welcome wagon” to other new businesses in his community. If he knew when a business opened up, he could reach out and help welcome them to the neighborhood. He thought it was always nice when shopkeepers knew one another, but he didn’t always know what was going on even a few blocks away because, well, he was often manning his own shop. I thought it was a deeply fascinating example of how open data could help foster community and is something I would have never imagined.
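The welcome wagon idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the licence numbers, field names and sample businesses are invented, and the two snapshots stand in for what a city's business licence feed might return on two successive checks.

```python
from typing import Dict, List

# Each snapshot maps a licence number to a business record. The field names
# and sample records are invented for illustration.
Snapshot = Dict[str, Dict[str, str]]


def newly_licensed(previous: Snapshot, current: Snapshot) -> List[Dict[str, str]]:
    """Return records present in the current snapshot but not the previous one."""
    new_ids = sorted(set(current) - set(previous))
    return [current[licence_id] for licence_id in new_ids]


last_check: Snapshot = {
    "BL-1001": {"name": "Example Books", "street": "First St"},
}
this_check: Snapshot = {
    "BL-1001": {"name": "Example Books", "street": "First St"},
    "BL-1002": {"name": "Example Bakery", "street": "Second St"},
}

for business in newly_licensed(last_check, this_check):
    print(f"Say hello to {business['name']} on {business['street']}")
```

With a yearly or quarterly dump, the same comparison would surface a new neighbour months after they opened. With an API it could run every day, which is what makes the welcome timely.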

Food for thought and wanted to share.

 

The Value of Open Data – Don’t Measure Growth, Measure Destruction

Alexander Howard – who, in my mind, is the best guy covering the Gov 2.0 space – pinged me the other night to ask “What’s the best evidence of open data leading to economic outcomes that you’ve seen?”

I’d like to hack the question because – I suspect – many people will be looking to measure “economic outcomes” in ways that I think will be so narrow as to be unhelpful. For example, if you are wondering what the big companies are going to be that come out of the open data movement and/or what are the big savings that are going to be found by government via sifting through the data, I think you are probably looking for the wrong indicators.

Why? Part of it is because the number of “big” examples is going to be small.

It’s not that I think there won’t be any. For example, several years ago I blogged about how FOIed (or, in Canada, ATIPed) data that should have been open helped find $3.2B in evaded tax revenues channeled through illegal charities. It’s just that this is probably not where the wins will initially take place.

This is in part because most data for which there was likely to be an obvious and large economic impact (e.g. spawning a big company or saving a government millions) will have already been analyzed or sold by governments before the open data movement came along. On the analysis side of the question – if you are very confident a data set could yield tens or hundreds of millions in savings… well… you were probably willing to pay SAS or some other analytics firm 30-100K to analyze it. And you were probably willing to pay SAP a couple of million (a year?) to set up the infrastructure to just gather the data.

Meanwhile, on the “private sector company” side of the equation – if that data had value, there were probably eager buyers. In Canada, for example, census data – useful for planning where to locate stores or how to engage in marketing and advertising effectively – was sold because the private sector made it clear it was willing to pay to gain access to it. (Sadly, this was bad news for academics, non-profits and everybody else, for whom it should have been free, as it was in the US.)

So my point is that a great deal of the (again) obvious low hanging fruit has probably been picked long before the open data movement showed up, because governments – or companies – were willing to invest some modest amounts to create the benefits that picking those fruit would yield.

This is not to say I don’t think there are diamonds in the rough out there – data sets that will reveal significant savings – but I doubt they will be obvious or easy finds. Nor do I think that billion dollar companies are going to spring up around open datasets overnight since – by definition – open data has low barriers to entry to any company that adds value to them. One should remember it took Red Hat two decades to become a billion dollar company. Impressive, but it is still tiny compared to many of its rivals.

And that is my main point.

The real impact of open data will likely not be in the economic wealth it generates, but rather in its destructive power. I think the real impact of open data is going to be in the value it destroys and so in the capital it frees up to do other things. Much like Red Hat is a fraction of the size of Microsoft, Open Data is going to enable new players to disrupt established data players.

What do I mean by this?

Take SeeClickFix. Here is a company that, leveraging the Open311 standard, is able to provide many cities with a 311 solution that works pretty much out of the box. 20 years ago, this was a $10 million+ problem for a major city to solve, and wasn’t even something a small city could consider adopting – it was just prohibitively expensive. Today, SeeClickFix takes what was a 7 or 8 digit problem, and makes it a 5 or 6 digit problem. Indeed, I suspect SeeClickFix almost works better in a small to mid-sized government that doesn’t have complex work order software and so can just use SeeClickFix as a general solution. For this part of the market, it has crushed the cost out of implementing a solution.

Another example. And the one I’m most excited about. Look at CKAN and Socrata. Most people believe these are open data portal solutions. That is a mistake. These are data management companies that happen to have simply made “sharing” (or “open”) a core design feature. You know who does data management? SAP. What Socrata and CKAN offer is a way to store, access, share and engage with data previously gathered and held by companies like SAP at a fraction of the cost. A SAP implementation is a 7 or 8 (or god forbid, 9) digit problem. And many city IT managers complain that doing anything with data stored in SAP takes time and it takes money. CKAN and Socrata may have only a fraction of the features, but they are dead simple to use, and make it dead simple to extract and share data. More importantly, they make these costly 7 and 8 digit problems potentially become cheap 5 or 6 digit problems.

On the analysis side, again, I do hope there will be big wins – but what I really think open data is going to do is lower the costs of creating lots of small wins – crazy numbers of tiny efficiencies. If SAP and SAS were about solving the 5 problems that could create 10s of millions in operational savings for governments and companies, then Socrata, CKAN and the open data movement are about finding the 1000 problems on which you can save between $20,000 and $1M. For example, when you look at the work that Michael Flowers is doing in NYC, his analytics team is going to transform New York City’s budget. They aren’t finding $30 million in operational savings, but they are generating a steady stream of very solid 6 to low 7 digit savings, project after project. (This is to say nothing of the lives they help save with their work on ambulances and fire safety inspections.) Cumulatively, over time, these savings are going to add up to a lot. But there probably isn’t going to be a big bang. Rather, we are getting into the long tail of savings. Lots and lots of small stuff… that is going to add up to a very big number, while no one is looking.
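To put rough numbers on the long tail, here is the arithmetic as a short Python sketch. The counts (5 big problems, 1000 small ones) and the $20,000 to $1M range come from the paragraph above; treating each big win as exactly $10 million is my own assumed floor for “10s of millions”, so read the output as an order-of-magnitude illustration and not a forecast.

```python
# Figures from the paragraph above; $10M per big win is an assumed floor
# for "10s of millions".
BIG_PROBLEMS = 5
BIG_SAVING_EACH = 10_000_000

SMALL_PROBLEMS = 1_000
SMALL_SAVING_LOW = 20_000
SMALL_SAVING_HIGH = 1_000_000

big_total = BIG_PROBLEMS * BIG_SAVING_EACH
long_tail_low = SMALL_PROBLEMS * SMALL_SAVING_LOW
long_tail_high = SMALL_PROBLEMS * SMALL_SAVING_HIGH

print(f"Five big wins:       ${big_total:,}")
print(f"Long tail, low end:  ${long_tail_low:,}")
print(f"Long tail, high end: ${long_tail_high:,}")
```

Under these assumptions the low end of the long tail ($20 million) is the same order of magnitude as five $10 million wins ($50 million), and the high end ($1 billion) is twenty times larger.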

So when I look at open data, yes, I think there is economic value. Lots and lots of economic value. Hell, tons of it.

But it isn’t necessarily going to happen in a big bang, and it may take place in the creative destruction it fosters and so the capital it frees up to spend on other things. That may make it potentially harder to measure (I’m hoping some economist much smarter than me is going to tell me I’m wrong about that) but that’s what I think the change will look like.

Don’t look for the big bang, and don’t measure the growth in spending or new jobs. Rather let’s try to measure the destruction and cumulative impact of a thousand tiny wins. Cause that is where I think we’ll see it most.

Postscript: Apologies again for any typos – it’s late and I’m just desperate to get this out while it is burning in my brain. And thank you Alex for forcing me to put into words something I’ve been thinking about saying for months.

 

Canada Post and the War on Open Data, Innovation & Common Sense (continued, sadly)

Almost exactly a year ago I wrote a blog post on Canada Post’s War on the 21st Century, Innovation & Productivity. In it I highlighted how Canada Post launched a lawsuit against a company – Geocoder.ca – that recreates the postal code database via crowdsourcing. Canada Post’s case was never strong, but then, that was not their goal. As a large, taxpayer backed company, the point wasn’t to be right, it was to use the law as a way to financially bankrupt a small innovator.

This case matters – especially to small start ups and non-profits. Open North – a non-profit on whose board of directors I sit – recently explored what it would cost to use Canada Post’s postal code database on represent.opennorth.ca, a website that helps identify elected officials who serve a given address. The cost? $9,000 a year, nowhere near what it could afford.

But that’s not it. There are several non-profits that use Represent to help inform donors and other users of their website about which elected officials represent geographies where they advocate for change. The licensing cost if you include all of these non-profits and academic groups? $50,000 a year.

This is not a trivial sum, and it is very significant for non-profits and academics. It is also a window into why Canada Post is trying to sue Geocoder.ca – which offers a version of its database for… free. That a private company can offer a similar service at a fraction of the cost (or for nothing) is, of course, a threat.

Sadly, I wish I could report good news on the one year anniversary of the case. Indeed, I should be!

This is because what should have been the most important development was how the Federal Court of Appeal made it even more clear that data cannot be copyrighted. This probably made it clear to Canada Post’s lawyers that they were not going to win and made it even more obvious to us in the public that the lawsuit against geocoder.ca – which has not been dropped – was completely frivolous.

Sadly, Canada Post’s reaction to this erosion of its position was not to back off, but to double down. Recognizing that they likely won’t win a copyright case over postal code data, they have decided:

a) to assert that they hold a trademark on the words ‘postal code’

b) to name Ervin Ruci – the operator of Geocoder.ca – as a defendant in the case, as opposed to just his company.

The second part shows just how vindictive Canada Post’s lawyers are, and reveals the true nature of this lawsuit. This is not about protecting a trademark. This is about sending a message about legal costs and fees. This is a predatory lawsuit, funded by you, the taxpayer.

But part a is also sad. Having seen the writing on the wall around its capacity to win the case around data, Canada Post has suddenly decided – 88 years after it first started using “Postal Zones” and 43 years after it started using “Postal Codes” – to assert a trademark on the term? (You can read more on the history of postal codes in Canada here.)

Moreover, the legal implications if Canada Post actually won the case would be fascinating. It is unclear that anyone would be allowed to solicit anybody’s postal code – at least if they mentioned the term “postal code” – on any form or website without Canada Post’s express permission. It leads one to ask: does the federal government have Canada Post’s express permission to solicit postal code information on tax forms? On passport renewal forms? On any form they have ever published? Because if not, they are, if I understand Canada Post’s claim correctly, in violation of Canada Post’s trademark.

Given the current government’s goal to increase the use of government data and spur innovation, will they finally intervene in what is an absurd case that Canada Post cannot win, that is using taxpayer dollars to snuff out innovators, that increases the costs for academics doing geospatial oriented social research and that creates a great deal of uncertainty about how anyone online, be they non-profits, companies, academics or governments, can use postal codes?

I know of no other country in the world that has to deal with this kind of behaviour from their postal service. The United Kingdom compelled its postal service to make postal code information public years ago. In Canada, we handle the same situation by letting a taxpayer subsidized monopoly hire expensive lawyers to launch frivolous lawsuits against innovators who are not breaking the law.

That is pretty telling.

You can read more about this, and see the legal documents, on Ervin Ruci’s blog. There has also been good coverage of this story at canada.com.