Category Archives: technology

New Zealand: The World’s Lab for Progressive Tech Legislation?

Cross-posted with TechPresident.

One of the nice advantages of having a large world with lots of diverse states is the range of experiments it offers us. Countries (or regions within them) can try out ideas, and if they work, others can copy them!

For example, in the world of drug policy, Portugal effectively decriminalized virtually all drugs. The results have been dramatic, and much of them positive. Some of the changes include a 17% decline in HIV diagnoses amongst drug users and a drop in drug use among adolescents (13-15 yrs). For those interested, you can read more about this in a fantastic report by the Cato Institute written by Glenn Greenwald back in 2009, before he started exposing the unconstitutional and dangerous activities of the NSA. Now, more than a decade later, there have been increasing demands to decriminalize and even legalize drugs, especially in Latin America. But even the United States is changing, with both the states of Washington and Colorado opting to legalize marijuana. The lessons of Portugal have helped make the case, not by penetrating the public's imagination per se, but by showing policy elites that decriminalization not only works but saves lives and money. Little Portugal may one day be remembered for changing the world.

I wonder if, ten years from now, we might see a similar paper written about New Zealand's technology policy. It may be that a number of Kiwis will counter the arguments in this post by exposing all the reasons why I'm wrong (which I'd welcome!), but at a glance, New Zealand would probably be the place I'd send a public servant or politician wanting to know more about how to do technology policy right.

So why is that?

First, for those who missed it, this summer New Zealand banned software patents. This is a stunning and entirely sensible accomplishment. Software patents, and the legal morass and drag on innovation they create, are an enormous problem. The idea that Amazon can patent "1-click" (i.e. the idea that you pre-store someone's credit card information so they can buy an item with a single click) is, well, a joke. This is a grand innovation that should be protected for years?

And yet, I can't think of a single other OECD member country that is likely to pass similar legislation. This means that it will be up to New Zealand to show that the software world will survive just fine without patents and that the economy will not suddenly explode into flames. I also struggle to think of an OECD country where one of the most significant industry groups – the Institute of IT Professionals – would not only support such a measure but help push for its passage:

The nearly unanimous passage of the Bill was also greeted by Institute of IT Professionals (IITP) chief executive Paul Matthews, who congratulated [Commerce Minister] Foss for listening to the IT industry and ensuring that software patents were excluded.

Did I mention that the bill passed almost unanimously?

Second, New Zealanders are further up the learning curve when it comes to their government's – and foreign governments' – dangerous willingness to illegally surveil them online.

The arrest of Kim Dotcom over MegaUpload has sparked some investigations into how closely the country's police and intelligence services follow the law. (For an excellent timeline of the Kim Dotcom saga, check out this link.) This is because Kim Dotcom was illegally spied on by New Zealand's intelligence services and police force, at the behest of the United States, which is now seeking to extradite him. The arrest and subsequent fallout piqued public interest and led to investigations, including the Kitteridge report (PDF), which revealed that "as many as 88 individuals have been unlawfully spied on" by the country's Government Communications Security Bureau.

I suspect the Snowden documents and subsequent furor surprised New Zealanders less than many of their counterparts in other countries, since they were less a bombshell than another data point on a trend line.

I don't want to overplay the impact of the Kim Dotcom scandal. It has not, as far as I can tell, led to a complete overhaul of the rules that govern intelligence gathering and online security. That said, I suspect it has created a political climate that may be more (healthily) distrustful of government intelligence services and the intelligence services of the United States. As a result, it is likely that politicians have been more sensitive to this matter for a year or two longer than elsewhere, and that public servants are more accustomed to assessing policies through the lens of their impact on the rights and privacy of citizens than in many other countries.

Finally (and this is somewhat related to the first point), New Zealand has, from what I can tell, a remarkably strong open source community. I'm not sure why this is the case, but I suspect that people like Nat Torkington – an open source and open data advocate in New Zealand – and others like him play a role in it. More interestingly, this community has had influence across the political spectrum. The centre-left Labour Party deserves much of the credit for the patent reform, while the centre-right New Zealand National Party has embraced open data. The country was among the first to embrace open source as a viable option when procuring software, and in 2003 the government developed an official open source policy to help clear the path for greater use of open source software. This contrasts sharply with my experience in Canada where, as late as 2008, open source was still seen by many government officials as a dangerous (some might say cancerous?) option that needed to be banned and/or killed.

All this is to say that both in public (e.g. in civil society and the private sector) and within government there is greater expertise around thinking about open source solutions, and so an ability to ask different questions about intellectual property and definitions of the public good. While I recognize that this exists in many countries now, it has existed longer in New Zealand than in most, which suggests that it enjoys greater acceptance in senior ranks and that there is greater experience in thinking about and engaging these perspectives.

I share all this for two reasons:

First, I would keep my eye on New Zealand. This is clearly a place where something is happening in a way that may not be possible in other OECD countries. The small size of its economy (and so its relative lack of importance to the major proprietary software vendors), combined with sufficient policy agreement among both the public and elites, enables the country to overcome the internal and external lobbying and pressure that would likely sink similar initiatives elsewhere. And while New Zealand's influence may be limited, don't underestimate the power of example. Portugal also has limited influence, but its example has helped show the world that the US-led narrative on the "war on drugs" can be countered. In many ways this is often how it has to happen. Innovation, particularly in policy, often comes from the margins.

Second, if a policy maker, public servant or politician comes to me and asks who they should talk to around digital policy, I increasingly find myself pointing to New Zealand as the most compelling place. I have similar advice for PhD students. Indeed, if what I'm arguing is true, we need research to describe, better than I have, the conditions that led to this outcome, as well as the impact these policies are having on the economy, government and society. Sadly, I have no names to give to those I suggest this idea to, but I figure they'll find someone in the government to talk to since, as a bonus to all this, I've always found New Zealanders to be exceedingly friendly.

So keep an eye on New Zealand – it could be the place where some of the most progressive technology policies first get experimented with. It would be a shame if no one noticed.

(Again, if some New Zealanders want to tell me I'm wrong, please do. Obviously, you know your country better than I do.)

Thesis Question Idea: Probing Power & Promotions in the Public Service

Here’s an idea for a PhD candidate out there with some interest in government or HR and some quant skills.

Imagine you could access a sensible slice of the HR history of a 300,000+ person organization, so you could see when people were promoted and where they moved within the organization.

I'm not sure if it would work, but the Government Electronic Directory Service (GEDS), essentially a "white pages" of Canada's national government, could prove to be such a dataset. The service is actually designed to let people find one another within government. However, this also means it could potentially allow someone to track the progress of public servants' careers, since you can see the different titles an employee holds each time they change jobs (and thus get a new title and phone number in GEDS). While not a perfect match, job titles generally map to pay scales and promotions, making them an imperfect but still useful metric for career trajectory.
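To make that concrete, here is a minimal sketch of how periodic GEDS snapshots could be stitched into per-employee career histories. Everything in it is hypothetical: I'm assuming dated CSV exports exist, and the folder layout and column names (employee_id, title, org_unit) are invented purely for illustration.

```python
# A rough sketch only: stitch periodic GEDS snapshots into per-employee career
# histories. The folder, file naming (one dated CSV per snapshot) and column
# names (employee_id, title, org_unit) are all hypothetical stand-ins.
from pathlib import Path
import pandas as pd

frames = []
for path in sorted(Path("geds_snapshots").glob("*.csv")):
    snapshot = pd.read_csv(path)
    snapshot["snapshot_date"] = pd.to_datetime(path.stem)  # e.g. "2013-06-01.csv"
    frames.append(snapshot)

history = pd.concat(frames).sort_values(["employee_id", "snapshot_date"])

# A change in title or organizational unit between snapshots is a (noisy)
# proxy for a promotion or a lateral move.
history["prev_title"] = history.groupby("employee_id")["title"].shift()
history["prev_unit"] = history.groupby("employee_id")["org_unit"].shift()
history["career_event"] = history["prev_title"].notna() & (
    (history["title"] != history["prev_title"])
    | (history["org_unit"] != history["prev_unit"])
)

# How many moves does a typical career in the data contain?
print(history.groupby("employee_id")["career_event"].sum().describe())
```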

The screen shot below is for a random name I tried. I've attempted to preserve the privacy of the employee, which, in truth, isn't really necessary, since anyone can access GEDS and so the data isn't actually private to begin with.

[Screenshot: a GEDS entry for a randomly chosen employee]

There are a number of interesting questions I could imagine an engaged researcher asking with such data. For example, where are the glass ceilings: are there particular senior roles that seem harder for women to get promoted into? Who are the super mentors: is there a manager whose former charges always seem to go on to lofty careers? Are there power cliques: are there super public servants around whom others cluster and whose promotions or career moves are linked? Are there career paths that are more optimal, or suboptimal? Or, worse, is one's path predetermined early on by where and in what role one enters the public service? And (frighteningly), could you create a predictive algorithm that accurately forecasts who might be promoted?
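On that last question, the modelling itself would not need to be exotic. Below is a minimal baseline sketch; the data is entirely synthetic and every feature and label is invented just to show the shape such an analysis could take, not to suggest what would actually predict promotion.

```python
# Toy promotion-prediction baseline on synthetic data. Every column here is
# invented for illustration; a real study would derive features from the
# career histories assembled above.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000
people = pd.DataFrame({
    "years_in_service": rng.integers(0, 35, n),
    "moves_so_far": rng.integers(0, 8, n),
    "org_unit_size": rng.integers(5, 500, n),
})
# Fake label: promotion odds loosely tied to mobility, just so the example runs.
logit = -2.0 + 0.4 * people["moves_so_far"] - 0.02 * people["years_in_service"]
people["promoted_within_2y"] = rng.random(n) < 1 / (1 + np.exp(-logit))

X = people[["years_in_service", "moves_so_far", "org_unit_size"]]
y = people["promoted_within_2y"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Held-out AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```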

These types of questions could be enormously illuminating and shed important light on how the public service works. Indeed, this data set would be important not only to issues of equity and fairness within the public service, but also to training and education. In many ways, I wish the public service itself would look at this data to learn about itself.

Of course, given that there is not, effectively, a pan-government HR group (that I'm aware of), it is unlikely that anyone is thinking about the GEDS data in a pan-government and longitudinal way (more likely there are HR groups organized by ministry that just focus on their ministry's employees). All this, in my mind, would make doing this research in an academic institution all the more important.

I'm sure there are fears that would drive opposition to this. Privacy is an obvious one (this is why I'm saying an academic, or the government itself, should do this). Another might be lawsuits. Suppose such a study did discover institutional sexism? Or that some other group of people were disproportionately passed over for roles in a way that suggested unfair treatment? If this hypothetical study were able to quantify this discrimination in a new way, could it then be used to support lawsuits? I've no idea. Nor do I think I care. I'd rather have a government that was leveraging its valuable talent in the most equitable and effective way than one that stayed blind to understanding itself in order to avoid a possible lawsuit.

The big if, of course, is whether snapshots of the GEDS database have been saved over the years, either on purpose or inadvertently (backups?). It is also possible that some geek somewhere has been scraping GEDS on a nightly, weekly or monthly basis. The second big if is: would anyone be willing to hand the data over? I'd like to think that the answer would be yes, particularly for an academic whose proposal had been successfully vetted by an Institutional Review Board.

If anyone ever decides to pursue this, I'd be happy to talk to you more about ideas I have. Also, I suspect there may be other levels of government with similar applications. Maybe this would work more easily on a smaller scale.

Announcing the 311 Data Challenge, soon to be launched on Kaggle

The Kaggle – SeeClickFix – Eaves.ca 311 Data Challenge. Coming Soon.

I'm pleased to share that, in conjunction with SeeClickFix and Kaggle, I'll be sponsoring a predictive data competition using 311 data from four different cities. My hope is that – if we can demonstrate that there are some predictive and socially valuable insights to be gained from this data – we might be able to persuade cities to work together to share data insights and help everyone become more efficient, address social inequities and tackle other city problems 311 data might enable us to explore.

Here’s the backstory and some details in anticipation of the formal launch:

The Story

Several months back Anthony Goldbloom, the founder and CEO of Kaggle – a predictive data competition firm – approached me asking if I could think of something interesting that could be done in the municipal space around open data. Anthony generously offered to waive all of Kaggle’s normal fees if I could come up with a compelling contest.

After playing around with some ideas I reached out to Ben Berkowitz, co-founder of SeeClickFix (one of the world’s largest implementers of the Open311 standard) and asked him if we could persuade some of the cities they work for to share their data for a competition.

Thanks to the hard work of Will Cukierski at Kaggle as well as the team at SeeClickFix we were ultimately able to generate a consistent data set with 300,000 lines of data involving 311 issues spanning 4 cities across the United States.

In addition, while we hoped many of those who might choose to participate in a municipal open data challenge would do so out of curiosity or a desire to better understand how cities work, SeeClickFix and I agreed to collectively put up $5,000 in prize money to help raise awareness about the competition and hopefully stoke some media (as well as broader participant) interest.

The Goal

The goal of the competition will be to predict the number of votes, comments and views an issue is likely to generate. To be clear, this is not a prediction that is going to radically alter how cities work, but it could be genuinely useful to communications departments, helping them predict problems that are particularly thorny or worth proactively communicating to residents about. In addition – and this remains unclear – my own hope is that it could help us understand discrepancies in how different socio-economic or other groups use online 311, and so enable city officials to more effectively respond to complaints from marginalized communities.
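For anyone wondering what a first attempt might look like, here is a minimal baseline sketch. It assumes a training file with hypothetical columns (city, category, created_time and the three targets); the actual competition data and evaluation metric may well differ.

```python
# A baseline sketch for the 311 prediction task. The file name and column
# names are assumptions; the real competition data may be organized differently.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

train = pd.read_csv("311_train.csv", parse_dates=["created_time"])

# Simple features: where the issue was filed, what kind of issue it is, and when.
X = pd.get_dummies(train[["city", "category"]], drop_first=True)
X["hour"] = train["created_time"].dt.hour
X["weekday"] = train["created_time"].dt.weekday

scores = {}
for target in ["num_votes", "num_comments", "num_views"]:
    y = np.log1p(train[target])  # counts are heavy-tailed, so model the log
    model = GradientBoostingRegressor(random_state=0)
    scores[target] = -cross_val_score(
        model, X, y, cv=3, scoring="neg_root_mean_squared_error"
    ).mean()

print(scores)  # cross-validated RMSE of the log-counts for each target
```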

In addition, there will be a smaller competition around visualizing the data.

The Bigger Goal

There is, however, for me, a potentially bigger goal. To date, as far as I know, predictive algorithms on 311 data have only ever been attempted within a city, not across cities. At a minimum, it has not been attempted in a way in which the results are public and become a public asset.

So while the specific problem this contest addresses is relatively humble, I see it as creating a larger opportunity for academics, researchers, data scientists and curious participants to figure out whether we can develop predictive algorithms that work for multiple cities. Because if we can, then these algorithms could be a shared common asset. Each algorithm would become a tool not just for one housing non-profit or city program, but for all sufficiently similar non-profits or city programs. This could be exceptionally promising – as well as potentially reveal new behavioral or incentive risks that would need to be thought about.

Of course, discovering that every city is unique and that work is not easily transferable, or that predictive models cluster by city size, or by weather, or by some other variable is also valuable, as this would help us understand what types of investments can be made in civic analytics and what the limits of a potential commons might be.

So be sure to keep an eye on the Kaggle page (I’ll link to it) as this contest will be launching soon.

Beyond Property Rights: Thinking About Moral Definitions of Openness

“The more you move to the right the more radical you are. Because everywhere on the left you actually have to educate people about the law, which is currently unfair to the user, before you even introduce them to the alternatives. You aren’t even challenging the injustice in the law! On the right you are operating at a level that is liberated from identity and accountability. You are hacking identity.” – Sunil Abraham

I have a new piece up on TechPresident titled: Beyond Property Rights: Thinking About Moral Definitions of Openness.

This piece, as well as the really fun map I recreated, is based on a conversation with Sunil Abraham (@sunil_abraham), the Executive Director of the Centre for Internet and Society in Bangalore.

If you find this map interesting… check the piece out here.

[Image: map of open]


Some thoughts on the relaunched data.gc.ca

Yesterday, I talked about what I thought was the real story that got missed in the fanfare surrounding the relaunch of data.gc.ca. Today I’ll talk about the new data.gc.ca itself.

Before I begin, there is an important disclaimer to share (to be open!). Earlier this year Treasury Board asked me to chair five public consultations across Canada to gather feedback on both its open data program and data.gc.ca in particular. As such, I solicited people's suggestions on how data.gc.ca could be improved – as well as shared my own – but I was not involved in the creation of data.gc.ca. Indeed, the first time I saw the site was on Tuesday when it launched. My role was merely to gather feedback. For those curious, you can read the report I wrote here.

There is, I’m happy to say, much to commend about the new open data portal. Of course, aesthetically, it is much easier on the eye, but this is really trivial compared to a number of other changes.

The most important shift relates to the site's desire to foster community. Users can now register with the site as well as rate and comment on data sets. There are also places like the Developers' Corner, which contains documentation that potential users might find helpful, and a sort of app store where government agencies and citizens can post applications they have created. This shift mirrors the evolution of data.gov, data.gov.uk and DataBC, which started out as data repositories but sought to foster and nurture a community of data users. The critical piece here is that simply creating the functionality will probably not be sufficient; in the US, UK and BC it has required dedicated community managers/engagers to help foster such a community. At present it is unclear whether that exists behind the website at data.gc.ca.

The other two noteworthy improvements to the site are an improved search and the availability of APIs. While not perfect, the improved search is nonetheless helpful, as previously it was basically impossible to find anything on the site. Today a search for "border time" returns a border wait time data set as the top result. However, search for "border wait times" and "Biogeochemical exploration using Douglas-fir tree tops in the Mabel Lake area, southern British Columbia (NTS 82L09 and 10)" becomes the top hit, with the actual border wait time data set pushed down to fifth. That said, the search is still a vast improvement, and this alone could be a boon to policy wonks, researchers and developers who elect to make use of the site.

The introduction of APIs is another interesting development. For the uninitiated, an API (application programming interface) provides continuous access to updated data; rather than downloading a file, it is more like plugging into a socket that delivers data instead of electricity. The aforementioned border wait time data set is a fantastic example. It is less a "data set" than a "data stream", providing the most recent border wait times, like what you would see on the big signs across the highway as you approach the border. By providing it through the open data site, it becomes possible, for example, for Google Maps to scan this data set daily, understand how border wait times fluctuate and incorporate these delays into its predicted travel times. Indeed, it could even query the API in real time and tell you how long it will take to drive from Vancouver to Seattle, with border delays taken into account. The opportunity for developers and, equally intriguing, government employees and contractors, to build applications atop these APIs is, in my mind, quite exciting. It is a much, much cheaper and more flexible approach than how a lot of government software is currently built.
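To make the contrast with downloading a file concrete, here is roughly what consuming such a stream looks like from a developer's perspective. The endpoint URL and the response fields below are placeholders I invented for illustration, not the actual data.gc.ca API.

```python
# Illustration only: polling a border wait time feed the way an application might.
# The URL and the JSON field names are hypothetical, not the real API.
import time

import requests

FEED_URL = "https://example.gc.ca/api/border-wait-times"  # placeholder endpoint

while True:
    response = requests.get(FEED_URL, timeout=10)
    response.raise_for_status()
    for crossing in response.json().get("crossings", []):  # assumed response shape
        print(crossing.get("name"), "-", crossing.get("wait_minutes"), "minutes")
    time.sleep(15 * 60)  # re-check every 15 minutes rather than downloading a file once
```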

I also welcome the addition of the ability to search Access to Information (ATIP) request summaries. That said, I'd like there to be more than just the summaries – the actual responses would be nice, particularly given that ATIP requests likely represent information people have identified as important. In addition, the tool for exploring government expenditures is interesting, but, weirdly, it is more notable because, as far as I can tell, none of the data displayed in the tool can be downloaded, meaning it is not very open.

Finally, I will briefly note that the license is another welcome change. For more on that I recommend checking out Teresa Scassa's blog post on it. Contrary to my above disclaimer, I have been more active on this side of things, and hope to have more to share on that another time.

I’m sure, as I and others explore the site in the coming days we will discover more to like and dislike about it, but it is a helpful step forward and another signal that open data is, slowly, being baked into the public service as a core service.


The Real News Story about the Relaunch of data.gc.ca

As many of my open data friends know, yesterday the government launched its new open data portal to great fanfare. While there is much to talk about there – something I will dive into tomorrow – that was not the only thing that happened yesterday.

Indeed, I did a lot of media yesterday between flights, and only after it was over did I notice that virtually all the questions focused on the relaunch of data.gc.ca. Yet it is increasingly clear to me that the much, much bigger story of the portal relaunch was the Prime Minister announcing that Canada would adopt the Open Data Charter.

In other words, Canada just announced that it is moving towards making all government data open by default. Moreover, it even made commitments to make specific “high value” data sets open in the next couple of years.

As an aside, I don't think the Prime Minister's office has ever mentioned open data before – as far as I can remember – so that was interesting in and of itself. But what is still more interesting is what the Prime Minister committed Canada to. The Open Data Charter commits the government to make data open by default, as well as to four other principles:

  • Quality and Quantity
  • Useable by All
  • Releasing Data for Improved Governance
  • Releasing Data for Innovation

In some ways Canada has effectively agreed to implement the equivalent of the Presidential Executive Order on Open Data the White House announced last month (and that I analyzed in this blog post). Indeed, the charter is more aggressive than the executive order since it goes on to lay out the need to open up not just future data, but also current "high value" data sets. Included among these are data sets the Open Knowledge Foundation has been seeking to get opened via its open data census, as well as some data sets I and many others have argued should be made open, such as the company/business register. Other suggested high value data sets include data on crime, school performance, energy and environment pollution levels, energy consumption, government contracts, national budgets, health prescription data and many, many others. Also included on the list… postcodes – something we are presently struggling with here in Canada.

But the charter wasn't all the government committed to. The final G8 communique contained many interesting tidbits that, again, highlighted commitments to open up data and adhere to international data schemas.

Among these were:

  • Corporate Registry Data: There was a very interesting section on "Transparency of companies and legal arrangements," which is essentially about sharing data about who owns companies. As an advisory board member to OpenCorporates, this was music to my ears. However, the federal government already does this; the much, much bigger problem is with the provinces, like BC and Quebec, that make it difficult or expensive to access this data.
  • Extractive Industries Transparency Initiative: A commitment that "Canada will launch consultations with stakeholders across Canada with a view to developing an equivalent mandatory reporting regime for extractive companies within the next two years." This is something I fought to get included in our OGP commitment two years ago but did not succeed at. Again, I'm thrilled to see this appear in the communique and look forward to the government's action.
  • International Aid Transparency Initiative (IATI) and Busan Common Standard on Aid Transparency: A commitment to make aid data more transparent and downloadable by 2015. Indeed, with all the G8 countries agreeing to take this step, it may be possible to get greater transparency around who is spending what money, and where, on aid. This could help identify duplication as well as aid in assessments of effectiveness. Given how precious aid dollars are, this is a very welcome development. (h/t Michael Roberts of Acclar.org)

So, lots of commitments, some on the vaguer side (the Open Data Charter) but some very explicit and precise. And that is the real story of yesterday: not that the country has a new open data portal, but that a lot more data is likely going to get put into that portal over the next 2-5 years. And a tsunami of data could end up in it over the next 10-25 years. Indeed, so much data that I suspect a portal will no longer be a logical way to share it all.

And therein lies the deeper business and government story in all this. As I mentioned in my analysis of the White House Executive Order that made open data the default, the big change here is in procurement. If implemented, this could have a dramatic impact on vendors and suppliers of the equipment and computers that collect and store data for the government. Many vendors try to find ways to make their data difficult to export and share so as to lock the government into their solution. Again, if (and this is a big if) the charter is implemented, it will hopefully require a lot of companies to rethink what they offer to government. This is a potentially huge story, as it could disrupt incumbents and lead to either big reductions in the costs of procurement (if done right) or big increases and the establishment of the same, or new, impossible-to-work-with incumbents (if done incorrectly).

There is potentially a tremendous amount at stake in how the government handles the procurement side of all this, because whether it realizes it or not, it may have just completely shaken up the IT industry that serves it.


Postscript: One thing I found interesting about the G8 communique was how many times commitments about open data and open data sets occurred in sections that had nothing to do with open data. It will be interesting to see if that trend continues at the next G8 meeting. Indeed, I wouldn't be surprised if a specific open data section disappears and these references instead just become part of various issue-related commitments.


What Traffic Lights Say About the Future of Regulation

I have a piece up on TechPresident about some crazy regulatory changes in Florida that put citizens at greater risk, all so the state and local governments can make more money.

Here’s a chunk:

In effect, what the state of Florida is saying is that a $20 million increase in revenue is worth an increase in risk of property damage, injury and death as a result of increased accidents. Based on national statistics, there are likely about 62 deaths and 5,580 injuries caused by red light running in Florida each year. If shorter yellow lights increased that rate by 10 percent (far less than predicted by the USDOT) that could mean an additional 6 deaths and 560 injuries. Essentially the state will raise a measly extra $35,000 for each injury or death its regulations help to cause, and possibly far less.
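The arithmetic in that passage is easy to check for yourself; here is the back-of-envelope version (the 10 percent increase is, as the quote notes, an assumption well below the USDOT's prediction).

```python
# Back-of-envelope check of the figures quoted above.
deaths_per_year = 62          # estimated red-light-running deaths in Florida
injuries_per_year = 5580      # estimated red-light-running injuries in Florida
extra_revenue = 20_000_000    # the quoted $20 million revenue increase

increase = 0.10               # assumed 10 percent rise from shorter yellow lights
extra_incidents = increase * (deaths_per_year + injuries_per_year)  # ~564
revenue_per_incident = extra_revenue / extra_incidents              # ~$35,000

print(round(extra_incidents), round(revenue_per_incident))
```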

The Past, Present and Future of Sensor Journalism

This weekend I had the pleasure of being invited to the Tow Centre for Digital Journalism at the Columbia Journalism School for a workshop on sensor journalism.

The workshop (hashtag #towsenses) brought together a "community of journalists, hackers, makers, academics and researchers to explore the use of sensors in journalism; a crucial source of information for investigative and data journalists." It was fascinating to talk about what role sensors – from the Air Quality Egg to aerial drones – should, could or might play in journalism. It was even more fun with a room full of DIYers, academics and journalists with interesting titles such as "applications division manager" or "data journalist." Most fascinating was a panel on the ethics of sensors in journalism, which I hope to write about another time.

There is, of course, a desire to treat sensors as something new in journalism. And for good reason. Much as I'm sure there were early adopters of cameras in the newsroom, cameras probably didn't radically change the newsroom until they were (relatively) cheap, portable and gave you something your audience wanted. Today we may be experiencing something similar with sensors. The cost of creating sophisticated sensors is falling, and/or other objects, like our cell phones, can be repurposed as sensors. The question is… like cameras, how can the emergence of sensors help journalists? And how might they distract them?

My point is, well, newsrooms already do sensor journalism. Indeed, I'd argue that somewhere between 5-15% of many news broadcasts is consumed with sensor journalism. At the very minimum the weather report is a form of sensor journalism. The meteorological group is a part of the news media organization that is completely reliant on sensors to provide it with information, which it must analyze and turn into relevant information for its audience. And it is a very specific piece of knowledge that matters to the audience. They are not asking how the weather came about, but merely for an accurate prediction of what the weather will be. For good or (as I feel) for ill, there is not a lot of discussion about climate change on the 6 o'clock news weather report. (As an aside, Clay Johnson cleverly pointed out that weather data may also be the government's oldest, most mature and most economically impactful open data set.)

Of course weather data is not the only form of sensor journalism going on on a daily basis. Traffic reports frequently rely on sensors, from traffic counting devices to permanently mounted visual sensors (cameras!) that allow one to count, measure, and even model and predict traffic. There may still be others.

So there are already some (small) parts of the journalism world that are dependent on sensors. Of course, some of you may not consider traffic reports and weather reports to be journalism since they are not, well, investigative journalism. But these services are important, have tended to be part of news gathering organizations and are in constant demand by consumers. And while demand may not always be the most important metric, it is an indication that this matters to people. My broader point here is that there is a part of the media community that is used to dealing with a type of sensor journalism. Yes, it has low ethical risk (we aren't really pointing these sensors at humans), but it does mean there are policies, processes, methodologies and practices for thinking about sensors that may exist in news organizations, if not in the newsroom.

It is also a window into the types of stories that sensors have, at least in the past, been good at helping out with. Specifically, there seem to be two criteria: things that occur at high frequency, and that a large number of people want to know about at high frequency. Both weather and traffic fit the bill: lots of people want to know about them, often twice a day, if not more frequently. So it might be worth thinking about what other types of issues or problems that interest journalists do, or could, conform to those criteria. In addition, if we are able to lower the cost of gathering and analyzing the data, does it become feasible, or profitable, to serve smaller, niche audiences?

None of this is to say that sensors can't, won't or shouldn't be used in investigative journalism projects. The work Public Labs did in helping map the extent of the oil spill along the gulf coast is a fantastic example of where sensors may be critical in journalism (as well as in advocacy and evidence building), as is the example of groups like Safecast and others who monitored radioactivity levels in Japan after the Fukushima disaster. Indeed, I think the possibilities of sensors in investigative journalism are both intriguing and potentially very, very bright. I'd just love for us to build off work that is already being done – even if it is in the (journalistically) mundane space of traffic and weather – rather than imagine we are beginning with an entirely blank slate.


The Value of Open Data – Don’t Measure Growth, Measure Destruction

Alexander Howard – who, in my mind, is the best guy covering the Gov 2.0 space – pinged me the other night to ask “What’s the best evidence of open data leading to economic outcomes that you’ve seen?”

I'd like to hack the question because – I suspect – many people will be looking to measure "economic outcomes" in ways that are too narrow to be helpful. For example, if you are wondering what the big companies are going to be that come out of the open data movement, and/or what big savings governments are going to find by sifting through the data, I think you are probably looking for the wrong indicators.

Why? Part of it is because the number of “big” examples is going to be small.

It's not that I think there won't be any. For example, several years ago I blogged about how FOIed (or, in Canada, ATIPed) data that should have been open helped find $3.2B in evaded tax revenues channeled through illegal charities. It's just that this is probably not where the wins will initially take place.

This is in part because most data for which there was likely to be an obvious and large economic impact (e.g. spawning a big company or saving a government millions) will have already been analyzed or sold by governments before the open data movement came along. On the analysis side of the question – if you are very confident a data set could yield tens or hundreds of millions in savings… well… you were probably willing to pay SAS or some other analytics firm $30-100K to analyze it. And you were probably willing to pay SAP a couple of million (a year?) to set up the infrastructure to just gather the data.

Meanwhile, on the "private sector company" side of the equation – if that data had value, there were probably eager buyers. In Canada, for example, census data – which helps with planning where to locate stores or how to engage in marketing and advertising effectively – was sold because the private sector made it clear they were willing to pay to gain access to it. (Sadly, this was bad news for academics, non-profits and everybody else, for whom it should have been free, as it was in the US.)

So my point is that a great deal of the (again) obvious low-hanging fruit had probably been picked long before the open data movement showed up, because governments – or companies – were willing to invest some modest amounts to create the benefits that picking that fruit would yield.

This is not to say I don't think there are diamonds in the rough out there – data sets that will reveal significant savings – but I doubt they will be obvious or easy finds. Nor do I think that billion-dollar companies are going to spring up around open data sets overnight since – by definition – open data has low barriers to entry for any company that adds value to it. One should remember it took Red Hat two decades to become a billion-dollar company. Impressive, but it is still tiny compared to many of its rivals.

And that is my main point.

The real impact of open data will likely not be in the economic wealth it generates, but rather in its destructive power. I think the real impact of open data is going to be in the value it destroys and so in the capital it frees up to do other things. Much like Red Hat is a fraction of the size of Microsoft, open data is going to enable new players to disrupt established data players.

What do I mean by this?

Take SeeClickFix. Here is a company that, leveraging the Open311 standard, is able to provide many cities with a 311 solution that works pretty much out of the box. 20 years ago, this was a $10 million+ problem for a major city to solve, and wasn’t even something a small city could consider adopting – it was just prohibitively expensive. Today, SeeClickFix takes what was a 7 or 8 digit problem, and makes it a 5 or 6 digit problem. Indeed, I suspect SeeClickFix almost works better in a small to mid-sized government that doesn’t have complex work order software and so can just use SeeClickFix as a general solution. For this part of the market, it has crushed the cost out of implementing a solution.

Another example – and one I'm most excited about. Look at CKAN and Socrata. Most people believe these are open data portal solutions. That is a mistake. These are data management companies that happen to have simply made "sharing" (or "open") a core design feature. You know who does data management? SAP. What Socrata and CKAN offer is a way to store, access, share and engage with data previously gathered and held by companies like SAP at a fraction of the cost. A SAP implementation is a 7 or 8 (or, god forbid, 9) digit problem. And many city IT managers complain that doing anything with data stored in SAP takes time and money. CKAN and Socrata may have only a fraction of the features, but they are dead simple to use, and they make it dead simple to extract and share data. More importantly, they make these costly 7 and 8 digit problems potentially become cheap 5 or 6 digit problems.
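As a small, concrete illustration of "dead simple to extract and share": CKAN portals expose an action API over plain HTTP, so pulling dataset metadata out takes a few lines. The portal URL below is just an example to point the sketch at; this is not a tour of everything the API can do.

```python
# Pulling dataset metadata out of a CKAN portal via its action API.
# The portal URL is just an example; any CKAN-backed site exposes the same routes.
import requests

CKAN_SITE = "https://demo.ckan.org"  # example portal

# List the datasets the portal holds.
datasets = requests.get(f"{CKAN_SITE}/api/3/action/package_list", timeout=10).json()["result"]
print(len(datasets), "datasets; first few:", datasets[:3])

# Fetch one dataset's metadata, including direct links to its downloadable resources.
package = requests.get(
    f"{CKAN_SITE}/api/3/action/package_show",
    params={"id": datasets[0]},
    timeout=10,
).json()["result"]
for resource in package["resources"]:
    print(resource.get("format"), resource.get("url"))
```

That, in miniature, is the cost difference: no vendor-specific export process, just a URL anyone can hit.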

On the analysis side, again, I do hope there will be big wins – but what I really think open data is going to do is lower the cost of creating lots of small wins – crazy numbers of tiny efficiencies. If SAP and SAS were about solving the 5 problems that could create tens of millions in operational savings for governments and companies, then Socrata, CKAN and the open data movement are about finding the 1,000 problems for which you can save between $20,000 and $1M. For example, when you look at the work Michael Flowers is doing in NYC, his analytics team is going to transform New York City's budget. They aren't finding $30 million in operational savings, but they are generating a steady stream of very solid 6 to low 7 digit savings, project after project. (This is to say nothing of the lives they help save with their work on ambulances and fire safety inspections.) Cumulatively, over time, these savings are going to add up to a lot. But there probably isn't going to be a big bang. Rather, we are getting into the long tail of savings. Lots and lots of small stuff… that is going to add up to a very big number while no one is looking.

So when I look at open data, yes, I think there is economic value. Lots and lots of economic value. Hell, tons of it.

But it isn't necessarily going to happen in a big bang, and it may take place in the creative destruction it fosters and so in the capital it frees up to spend on other things. That may make it harder to measure (I'm hoping some economist much smarter than me is going to tell me I'm wrong about that), but that's what I think the change will look like.

Don’t look for the big bang, and don’t measure the growth in spending or new jobs. Rather let’s try to measure the destruction and cumulative impact of a thousand tiny wins. Cause that is where I think we’ll see it most.

Postscript: Apologies again for any typos – it’s late and I’m just desperate to get this out while it is burning in my brain. And thank you Alex for forcing me to put into words something I’ve been thinking about saying for months.


Canada Post and the War on Open Data, Innovation & Common Sense (continued, sadly)

Almost exactly a year ago I wrote a blog post on Canada Post's War on the 21st Century, Innovation & Productivity. In it I highlighted how Canada Post launched a lawsuit against a company – Geocoder.ca – that recreates the postal code database via crowdsourcing. Canada Post's case was never strong, but then, that was not their goal. As a large, taxpayer-backed company, the point wasn't to be right; it was to use the law as a way to financially bankrupt a small innovator.

This case matters – especially to small start-ups and non-profits. Open North – a non-profit on whose board of directors I sit – recently explored what it would cost to use Canada Post's postal code database on represent.opennorth.ca, a website that helps identify the elected officials who serve a given address. The cost? $9,000 a year – nowhere near what it could afford.

But that's not all. There are several non-profits that use Represent to help inform donors and other users of their websites about which elected officials represent the geographies where they advocate for change. The licensing cost if you include all of these non-profits and academic groups? $50,000 a year.

This is not a trivial sum, and it is very significant for non-profits and academics. It is also a window into why Canada Post is trying to sue Geocoder.ca – which offers a version of its database for… free. That a private company can offer a similar service at a fraction of the cost (or for nothing) is, of course, a threat.

Sadly, I wish I could report good news on the one-year anniversary of the case. Indeed, I should be able to!

This is because what should have been the most important development was the Federal Court of Appeal making it even clearer that data cannot be copyrighted. This probably made it clear to Canada Post's lawyers that they were not going to win, and made it even more obvious to those of us in the public that the lawsuit against Geocoder.ca – which has not been dropped – was completely frivolous.

Sadly, Canada Post's reaction to this erosion of its position was not to back off, but to double down. Recognizing that they likely won't win a copyright case over postal code data, they have decided:

a) to assert that they hold a trademark on the words 'postal code'

b) to name Ervin Ruci – the operator of Geocoder.ca – as a defendant in the case, as opposed to just his company.

The second part shows just how vindictive Canada Post's lawyers are, and reveals the true nature of this lawsuit. This is not about protecting trademark. This is about sending a message about legal costs and fees. This is a predatory lawsuit, funded by you, the taxpayer.

But part a) is also sad. Having seen the writing on the wall around its capacity to win the case around data, Canada Post has suddenly decided – 88 years after it first started using "Postal Zones" and 43 years after it started using "Postal Codes" – to assert a trademark on the term? (You can read more on the history of postal codes in Canada here.)

Moreover, the legal implications if Canada Post actually won the case would be fascinating. It is unclear that anyone would be allowed to solicit anybody's postal code – at least if they mentioned the term "postal code" – on any form or website without Canada Post's express permission. It leads one to ask: does the federal government have Canada Post's express permission to solicit postal code information on tax forms? On passport renewal forms? On any form it has ever published? Because if not, then, if I understand Canada Post's claim correctly, it is in violation of Canada Post's trademark.

Given the current government's goal to increase the use of government data and spur innovation, will it finally intervene in what is an absurd case that Canada Post cannot win – one that uses taxpayer dollars to snuff out innovators, increases the cost for academics doing geospatially oriented social research, and creates a great deal of uncertainty about how anyone online, be they non-profits, companies, academics or governments, can use postal codes?

I know of no other country in the world that has to deal with this kind of behaviour from its postal service. The United Kingdom compelled its postal service to make postal code information public years ago. In Canada, we handle the same situation by letting a taxpayer-subsidized monopoly hire expensive lawyers to launch frivolous lawsuits against innovators who are not breaking the law.

That is pretty telling.

You can read more about this, and see the legal documents, on Ervin Ruci's blog; the story has also received good coverage at canada.com.