Category Archives: public policy

Depression and Decline: American Irresponsibility is Ending the American Era with a Bang

Despite the assurances of US Treasury Secretary Timothy Geithner, it is increasingly likely there will be no debt deal. The United States is going to default on its debt. I know it sounds crazy, but I believe it is going to happen. If it does, this will be the black swan event no one imagined or was prepared to contemplate. Its impacts are going to be significant. Possibly immeasurable.

For history, August 2nd, 2011 could end up marking the end of the American Era. Sadly, it will not have been inevitable; it will have been entirely self-inflicted, and it may now be irreversible. Even if an agreement is reached tomorrow, I suspect the world will increasingly be unwilling to entrust the role of global financial system caretaker to the United States. The world has lost faith in America. And why not? Its Congress has demonstrated that it can no longer be trusted with the responsibility of global financial management. Indeed, even America's closest allies have had their confidence shaken.

The economic and geopolitical ramifications of this outcome cannot be overestimated.

Economically, we may now be closer to a global depression than at any time since the 1930s. For all the talk of the financial crisis being a near miss, this could potentially be much, much worse, simply because the consequences fall outside our predictive models.

What is clear is that America is trapped. In the short term, spending less will devastate its population. Today more Americans (18.1%) than ever use food stamps. It takes American workers 40 weeks (and rising) to find a job, twice as long as in any previous recession. 1 in every 6 Americans uses Medicaid. Any cuts to these services will have an immediate and harsh effect on the quality of life of a huge number of Americans.

Longer term, America cannot restart its economy. Already the top 5% of Americans by income account for 37% of all consumer outlays. This is unsurprising given the top 5% of Americans account for 34.7% of all income. This is similar to 1929, when the top 5% accounted for about a third of all personal income. This is precisely the type of economic structure that John Kenneth Galbraith argues, in The Great Crash, 1929, transformed the great crash into the great depression. Rather than being able to rely on a broad consumer base to power economic growth, the United States then (as now) was dependent on a high level of investment and luxury consumer spending driven by a small elite. The crash caused that elite to seize up, leaving the American economy paralyzed.

In other words, the Bush tax cuts may have killed the US economically, and possibly geopolitically. By killing the surpluses they have broken the US treasury. By radically curtailing wealth redistribution they have fatally eroded the capacity of the US domestic economy to power new growth. Combine this with two wars that have sapped trillions of taxpayer dollars, and it is hard not to see a United States more ill-prepared than at any time in its history to deal with an economic crisis. The only question that may remain is how much of the rest of the world it drags down with it.

Of course economic decline could become a leading indicator for political decline.

When I arrived at grad school in 1998 to study international relations, the field had spent much of the previous decade grappling with the issue of American decline. Books like Paul Kennedy's The Rise and Fall of the Great Powers and Lester Thurow's Head to Head seemed to suggest that, economically and militarily, the United States was in at least relative decline as the world's leading power.

But then the successes of the US economy – coupled with the turnaround in the size of the US government's debt – meant that as a peer, China felt a long way off, while Brazil and India seemed more distant still. Europe was too old, disorganized and unambitious to matter. Russia was fading quickly from the scene. Suddenly, decline theory was itself in decline.

But today the writings of Kennedy feel even more urgent. America, with or without a raised debt ceiling, cannot afford its empire, or the means to protect it. It may be able to find allies to help shoulder the burden – today the central challenge of 21st-century geopolitics is the integration of India into the Western alliance, something that proceeds apace. But if it defaults (and maybe even if it does not), its capacity to raise money at a reasonable rate, should a major conflict arise, may be compromised. War, for America, is going to get more expensive because investors may be more nervous.

I want to clearly state that I don't write any of this with any glee. Leftish non-Americans who relish a world without US hegemony should look at the period after Britain's decline, or any other period of hegemonic decline. Such periods generally aren't pretty. Indeed, they are often unstable, violent and nasty. Not something any country should wish for, especially smaller countries (such as my own – Canada). Moreover, while there is no immediate peer that could take America's place, it isn't clear that the most likely candidate – China – is one that most people would feel more comfortable with. Be careful what you wish for.

I hope that I’m wrong. I hope a deal will be reached. And that if it is, or if it isn’t, the impact on the markets will be minimal or non-existent. Or maybe I just need to have more confidence in what I often tell others: do not underestimate America. As Sir Winston Churchill famously noted: “Americans can always be counted on to do the right thing…after they have exhausted all other possibilities.” And maybe they’ll have enough time, to boot.

But I genuinely fear that in the haze of summer this crisis, as much as it has spurred some scary headlines, remains a sleeper. That we are confronting the mother of all black swans, and that a period of financial turmoil that will make the last two years look like a merry ride could be upon us. Worse, that that financial turmoil will lead to other, greater military and/or political turmoil.

These are scary times.

I can honestly say I have never written a blog post that I hope I’m more wrong about.

Update: The Atlantic has a great article about the origins of the deficit, published later this morning, that is worth reading and includes a reference to this fantastic graph from a few months ago.

The State of Open Data Licenses in Canada and where to go from here

(for readers less interested in Open Data – I promise something different tomorrow)

In February I wrote about how 2011 would be the year of the license for Canada’s open data community. This has indeed been the case. For public servants and politicians overseeing the various open data projects happening in Canada and around the world, here is an outline of where we are and what I hope will happen next. For citizens, I hope this will serve as a primer and help explain why this matters. For non-Canadians, I hope this can help you strategize about how to deal with the different levels of government in your own country.

This is important stuff, and will be important to ensure success in the next open data challenge: aligning different jurisdictions around common standards.

Why Licenses Matter

Licenses matter because they determine how you are able to use government data – a public asset. As I outlined in the three laws of open data, data is only open if it can be found, played with and shared. The license deals with the last of these. If you take government data and find some flaw in it, or use it to improve a service, it means nothing if you are not able to share what you create with others. The more freedom you have in doing this, the better.

What we want from the license regime (and from your government)

There are a number of interests one is trying to balance in creating a license regime. You want it to be:

  • Open: there should be maximum freedom for reuse (see above, and this blog post)
  • Secure: it should offer governments appropriate protections for privacy and security
  • Simple: to keep legal costs low and make it easier for everyone to understand
  • Standardized: so my work is accessible across jurisdictions
  • Stable: so I know that the government won’t change the rules on me

At the moment, two licenses in Canada meet these tests: the Public Domain Dedication and License (PDDL), used by Surrey, Langley and Winnipeg (for its transit data), and the BC government open data portal license (which is a copy of the UK Open Government Licence).

Presently a bunch of licenses do not. This includes the Government of Canada Open Data Licence Agreement for Unrestricted Use of Canada’s Data (couldn’t they have chosen a better name? For a real critique of why it falls short, read this blog post). It also includes the variants of the license created by Vancouver and now used by Toronto, Ottawa and Edmonton (among others). Full disclosure: I was peripherally involved in the creation of this license – it was necessary at the time.

Neither of these licenses is standardized, both contain restrictions not found in the UK/BC Open Government Licence or the PDDL, and both are anything but simple. Nor are they stable: at any time the government can revoke them. In other words, many developers and companies interested in open data dislike them immensely.

Where do we go from here?

At the moment there are a range of licenses available in Canada – this undermines the ability of developers to create software that uses open data across multiple jurisdictions.

First, the launch of BC’s open data portal and its use of the UK Open Government Licence has reset the debate in this country. The Federal government, which has an awkward, onerous and unloved license, should stop trying to create a new license that simply adds unnecessary complexity and creates confusion for software developers. (I detail the voluminous problems with the Federal license here.)

Instead, the Feds should adopt the UK Open Government Licence and push for it to be a standard, both for the provinces and federal government agencies and for other Commonwealth countries. Their refusal to adopt the UK license is deeply puzzling. They have offered no explanation of why they can’t; indeed, it would be interesting to hear what the Federal Government believes it knows that the UK government (which has been doing this for much longer) and the BC government don’t.

What I predict will happen is that more and more provinces will adopt the UK license and increasingly the Feds will look isolated and ridiculous. Barring some explanation, this silliness should end.

At the municipal level, things are more complicated. If you look at the open data portals of Vancouver, Toronto, Edmonton and Ottawa (sometimes referred to as the G4) you’ll notice each has a similar paragraph:

The Cities of Vancouver, Edmonton, Ottawa and Toronto have recently joined forces to collaborate on an “Open Data Framework”. The project aims to enhance current open data initiatives in the areas of data standards and terms of use agreements. Please contact us for further information.

This paragraph has been sitting on these sites for well over a year now (approaching two years), but in terms of data standards and common terms of use the G4 has, to date, produced nothing tangible for end users. (Full disclosure: I have sat in on some of these meetings.) The G4 cities, which were leaders, are now languishing with a license that actually puts them in the middle, not the front, of the pack. They remain ahead of the bulk of Canadian cities that have no open data but, in terms of license, behind the aforementioned Surrey, Langley and Winnipeg (for its transit data).

These second-generation open data cities either had fewer resources or drew the right lessons, and have leap-frogged the G4 cities by adopting the PDDL – something they did because it essentially outsourced the management of the license to a competent third party. It maximized the usefulness of their data while limiting their costs, all while giving them the same level of protection.

The UK and BC versions of the Open Government Licence could work for the cities, but the PDDL is a better license, and it is well managed. If the cities were to adopt the OGL it wouldn’t be the end of the world, but it also isn’t necessary. It probably makes more sense for them to simply follow the new leaders in the space and adopt the PDDL, as it is less restrictive and easier to adopt.

Thus, speaking personally, the ideal situation in Canada would be that:

  • the Federal and Provincial governments adopt the UK/BC Open Government Licence. I’d love to live in a world where they adopted the PDDL, but my conversations with them lead me to believe this simply is not likely in the near to mid term. I think 99% of software developers out there will agree that the Open Government Licence is an acceptable substitute; and
  • the municipalities push to adopt the PDDL. Already several municipalities have done this and the world has not ended. The bar has been set.

The worst outcome would be:

  • the G4 municipalities invent some new license. The last thing the world needs is another open data license to confuse users and increase legal costs.
  • the federal government continues along the path of evolving its own license. Its license was born broken and is unnecessary.

Sadly, I see little evidence for optimism at the federal level. However, I’m optimistic about the cities and provinces. The fact that most new open data portals at the municipal level have adopted the PDDL suggests that many in these governments “get it”. I also think the launch of data.gov.bc.ca will spur other provinces to be intelligent about their license choice.


Province of BC launches Open Data Catalog: What works

As revealed yesterday, the province of British Columbia became the first provincial government in Canada to launch an open data portal.

It’s still early but here are some things that I think they’ve gotten right.

1. License: Getting it Right (part 1)

Before anything else, this is probably the single biggest good news story for Canadians interested in the opportunities around open data. If the license is broken, it pretty much doesn’t matter how good the data is: it essentially gets put in a legal straightjacket and cannot be used. For BC’s open data portal this, happily, is not the case.

There are actually two good news stories here.

The first is that the license is good. Obviously my preference would be for everything to be unlicensed and in the public domain, as it is in the United States. Short of that, however, the most progressive license out there is the UK government’s Open Government Licence for Public Sector Information. Happily, the BC government has essentially copied it. This means that much of BC’s open data can be used for commercial purposes, political advocacy, personal use and so forth. In short, the restrictions are minimal and, I believe, acceptable. The license addresses the concerns I raised back in March when I said 2011 would be the year of open data licenses in Canada.

2. License: The Virtuous Convergence (part 2)

The other great thing is that this is a standardized license. The BC government didn’t invent something new; they copied something that already worked. This is music to the ears of many, as it means applications and analysis developed in British Columbia can be ported seamlessly to other jurisdictions that use the same license. At the moment, that means all of the United Kingdom. There has been some talk of making the UK Open Government Licence (OGL) a standard that can be used across the Commonwealth – that, in my mind, would be a fantastic outcome.

My hope is that this will also put pressure on other jurisdictions to improve their licenses, converge them with BC/UK’s, or adopt a better license still. With the exception of the City of Surrey, which uses the PDDL, the BC government’s license is far superior to the licenses being used by other jurisdictions: the municipal licenses based on Vancouver’s (used by Vancouver, Edmonton, Ottawa, Toronto and a few others) and the Federal Government’s open data license (used by Treasury Board and CIDA) are both much more restrictive. Indeed, my real hope is that BC’s move will snap the Federal Government out of its funk, make it realize its own licenses are confusing, problematic and a waste of time, and encourage it to contribute to making the UK’s OGL a new standard for all of Canada. It would be much better than what they have on offer.

3. Tools for non-developers

Another nice thing about the data.gov.bc.ca website is that it provides tools for non-developers, so that they can play with, and learn from, some of the data. This is, of course, standard fare on most newer open data portals – indeed, it seems to be the primary focus of Socrata, a company that specializes in creating open government data portals. The goal everywhere is to increase the number of people who can make use of the data.

4. Meaty Data – Including Public Accounts

One of the charges sometimes leveled against open data portals is that they don’t publish data that is important, or that could drive substantive public policy debates. While this is not true of what has happened in the UK and the United States, the charge is probably somewhat fair in Canada. While I’m still exploring the data available on data.gov.bc.ca, one thing seems clear: there is a commitment to getting the more “high-value” data sets out to the public. For example, I’ve already noticed you can download the Consolidated Revenue Fund Detailed Schedules of Payments-FYE10-Suppliers, which for the fiscal year 2009-2010 details the payees who received $25,000 or more from the government. I also noticed that the Provincial Obstacles to Fish Passage data are available for download – something I hope our friends in the environmental movement will find helpful. There is also an entire section dedicated to data on the provincial educational system; I’ll be exploring that in more detail.

I wanted to publish this for now; I’m definitely keen to hear others’ thoughts and comments on the data portal, data sets you find interesting and helpful, or anything else. If you are building an app using this data, or doing an analysis that is made easier because of the data on this site, I’d love to hear from you.

This is a big step for the province. I’m sure I’ll discover some shortcomings as I dive deeper, but this is a solid start and, I hope, an example to other provinces about what is possible.

The Audacity of Shaw: How Canada's Internet just got Worse

It is really, really, really hard to believe. But as bad as internet access is in Canada, it just got worse.

Yesterday, Shaw Communications, a Canadian telecommunications company and internet service provider (ISP) that operates mostly in Western Canada, announced it is launching Movie Club, a new service to compete with Netflix.

On the surface this sounds like a good thing. More offerings should mean more competition, more choice and lower prices. All things that would benefit consumers.

Look only slightly closer and you learn the very opposite is going on.

This is because, as the article points out:

“…subscribers to Movie Club — who initially can watch on their TV or computer, with phones and tablets planned to come on line later — can view content without it counting against their data plan.

“There should be some advantage to you being a customer,” Bissonnette said.”

The very reason the internet has been such an amazing part of our lives is that every service that is delivered on it is treated equally. You don’t pay more to look at the Vancouver Sun’s website than you do to look at eaves.ca or CNN or to any other website in the world. For policy and technology geeks this principle of equality of access is referred to as net neutrality. The idea is that ISPs (like Shaw) should not restrict or give favourable access to content, sites, or services on the internet.

But this is precisely what Shaw is doing with its new service.

This is because ISPs in Canada charge what are called “overages.” This means that if you use the internet a lot – say you watch a lot of videos – at a certain point you will exceed a “cap” and Shaw charges you extra, beyond your fixed monthly fee. If, for example, you use Netflix (which is awesome and cheap: for $8 a month you get unlimited access to a huge quantity of content), you will obviously be watching a large number of videos, and the likelihood of exceeding the cap is quite high.
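To make the arithmetic concrete, here is a minimal sketch of how a cap-and-overage bill works. The flat fee, cap size and per-gigabyte rate are hypothetical round numbers, not Shaw’s actual prices:

```python
# Hypothetical usage-based billing: a flat monthly fee plus a
# per-GB charge for everything transferred beyond the cap.
# None of these numbers are Shaw's real prices.

def monthly_bill(gb_used, base_fee=50.0, cap_gb=100, overage_per_gb=2.0):
    """Return the month's charge under a cap-and-overage plan."""
    overage_gb = max(0, gb_used - cap_gb)
    return base_fee + overage_gb * overage_per_gb

# A light user stays under the cap and pays only the flat fee...
print(monthly_bill(60))   # 50.0
# ...while a heavy streaming month blows well past it.
print(monthly_bill(180))  # 50.0 + 80 * 2.0 = 210.0
```

The point of the Movie Club announcement is that video streamed from Shaw’s own service never counts toward `gb_used`, while a Netflix byte always does – the same bandwidth, priced differently depending on whose service it carries.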

What Shaw has announced is that if you use their service – Movie Club – none of the videos you watch will count against your cap. In other words they are favouring their service over that of others.

So why should you care? Because, in short, Shaw is making the internet suck. It wants to turn your internet from the awesome experience where you have unlimited choice and can try any service that is out there into the experience of cable, where your choice is limited to the channels they choose to offer you. Today they’ll favour their movie service over (the much better) Netflix. But tomorrow they may decide: hey, you are using Skype instead of our telephone service, so people who use “our Skype” will get cheaper access than people who use Skype. Shaw is effectively applying a tax on innovative and disruptively cheap new services on the internet so that you don’t use them. They are determining – through pricing – what you can and cannot do with your computer, while elsewhere in the world people will be using cool new disruptive services that give them better access to more fun content, for cheaper. Welcome to the sucky world of Canada’s internet.

Doubling down on Audacity: The Timing

Of course, what makes this all the more obscene is that Shaw has announced this service at the very moment the CRTC – the body that regulates Canada’s internet service providers – is holding hearings on usage-based billing. One of the reasons Canada’s internet providers say they have to charge “overages” for those who use the internet a lot is that there isn’t enough bandwidth. But how is it that there is enough bandwidth for their own services?

As Steve Anderson of OpenMedia – a consumer advocacy group – shared with me yesterday: “It’s a huge abuse of power,” and “The launch of this service at the time when the CRTC is holding a hearing on pricing regulation should be seen as a slap in the face to the CRTC, and the four hundred and ninety one thousand Canadians that signed the Stop The Meter petition.”

My own feeling is the solution is pretty simple. We need to get the ISPs out of the business of delivering content. Period. Their job should be to deliver bandwidth, and nothing else. You do that, you’ll have them competing over speed and price very, very quickly. Until then the incentive of ISPs isn’t to offer good internet service, it’s to do the opposite, it’s to encourage (or force) users to use the services they offer over the internet.

For myself, I’m a Shaw customer and a Netflix customer. Until now I’ve had nothing to complain about with either. Now, apparently, I have to choose between the two. I can tell you right now who is going to win. Over the next few months I’m going to be moving my internet service to another provider. Maybe I’ll still get cable TV from Shaw, I don’t know, but my internet service is going to a company that gives me the freedom to choose the services I want and that doesn’t ding me with fees that, apparently, I’m being charged under false pretenses. I’ll be telling my family members, friends and pretty much everyone I know to do the same.

Shaw, I’m sorry it had to end this way. But as a consumer, it’s the only responsible thing to do.

Lots of Open Data Action in Canada

A lot of movement on the open data (and not so open data) front in Canada.

Canadian International Development Agency (CIDA) Open Data Portal Launched

Some readers may remember that last week I wrote a post about the imminent launch of CIDA’s open data portal. The site is now live and has a healthy amount of data on it. It is a solid start to what I hope will become a robust site. I’m a big believer – and a supporter of the excellent advocacy efforts of the good people at Engineers Without Borders – in the idea that the open data portal would be greatly enhanced if CIDA started publishing its data in compliance with the emerging international standard of the International Aid Transparency Initiative, as these 20 leading countries and organizations have.

If anyone creates anything using this data, I’d love to see it. One simple start might be to try using the Open Knowledge Foundation’s open source Where Does my Money Go code, to visualize some of the spending data. I’d be happy to chat with anyone interested in doing this, you can also check out the email group to find some people experienced in playing with the code base.
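To give a feel for how simple a first pass at this kind of spending data can be, here is a sketch that totals disbursements by country from a CSV extract. The column names and figures are invented for illustration – check the actual headers on CIDA’s portal before adapting it:

```python
import csv
import io
from collections import defaultdict

# Stand-in for a downloaded CSV of project spending; the columns
# and rows here are hypothetical, not CIDA's actual schema.
sample = io.StringIO(
    "country,project,amount\n"
    "Ghana,Water access,120000\n"
    "Ghana,Education,80000\n"
    "Haiti,Reconstruction,250000\n"
)

# Sum spending per recipient country.
totals = defaultdict(float)
for row in csv.DictReader(sample):
    totals[row["country"]] += float(row["amount"])

for country, amount in sorted(totals.items()):
    print(f"{country}: {amount:,.0f}")
```

A table like this – spending rolled up by country, sector or year – is exactly the input that visualization tools like Where Does my Money Go expect.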

Improved License on the CIDA open data portal and data.gc.ca

One thing I noticed with the launch of the CIDA open data portal was that its license is remarkably better than the license at data.gc.ca – which struck me as odd, since I know the Feds like to be consistent about these types of things. It turns out that the data.gc.ca license has been updated as well, and the two are identical. This is good news, as some of the things that were broken in the previous license have been fixed. But not all. The best license out there remains the license at data.gov (that’s a trick, because data.gov has no license – it is all public domain! Tricky, eh? Nice!), but if you are going to have a license, the UK Open Government Licence used at data.gov.uk is more elegant, freer and satisfies a number of the concerns I cite above and have heard people raise.

So this new data.gc.ca license is a step in the right direction, but still behind the open gov leaders (teaching lawyers new tricks sadly takes a long time, especially in government).

Great site, but not so open data: Wellbeing Toronto

Interestingly, the City of Toronto has launched a fabulous new website called Wellbeing Toronto. It is definitely worth checking out. The main problem, of course, is that while it is interesting to look at, the underlying data is, sadly, not open. You can’t play with the data – say, mash it up with your own (or another jurisdiction’s) data. This is disappointing, as I believe a number of non-profits in Toronto would likely find the underlying data quite helpful/important. I have, however, been told that the underlying data will be made open. It is something I hope to check in on again in a few months; I fear it may never get prioritized, so it may be up to Torontonians to hold the Mayor and council’s feet to the fire to ensure it gets done.

Parliamentary Budget Office (PBO) launches (non-open) data website

It seems the PBO is also getting in on the data action with the launch of a beta site that allows you to “see” budgets from the last few years. I know that the Parliamentary Budget Office has been starved of resources, so it deserves to be congratulated for taking this first, important step. Also interesting is that the data has no license on the website, which could make it the most liberally licensed open data portal in the country. The site does have big downsides, though. First, the data can only be “looked” at; there is no obvious (simple) way to download it and start playing with it. More oddly still, the PBO requires that users register with their email address to view the data. This seems beyond odd – downright creepy, actually. First, parliament’s budget should be free and open, and one should not need to hand over an email address to access it. Second, the email addresses collected appear to serve no purpose (unless the PBO intends to start spamming us), other than to tempt bad people to hack the site so they can steal a list of email addresses.

Why not create an Open311 add-on for Ushahidi?

This is not a complicated post. Just a simple idea: Why not create an Open311 add-on for Ushahidi?

So what do I mean by that, and why should we care?

Many readers will be familiar with Ushahidi, a non-profit that develops open source mapping software enabling users to collect and visualize data on interactive maps. Its history is now fairly famous. As the Wikipedia article about it outlines: “Ushahidi.com (Swahili for ‘testimony’ or ‘witness’) is a website created in the aftermath of Kenya’s disputed 2007 presidential election (see 2007–2008 Kenyan crisis) that collected eyewitness reports of violence sent in by email and text-message and placed them on a Google map.” Ushahidi’s mapping software has also proved to be an important resource in a number of crises since the Kenyan election, most notably during the Haitian earthquake. Here is a great 2-minute video on how Ushahidi works.

But mapping of this type isn’t only important during emergencies. Indeed, it is essential to the day-to-day operations of many governments, particularly at the local level. While many citizens in developed economies may be unaware of it, their cities are constantly mapping what is going on around them. Broken infrastructure – leaky pipes, water mains, clogged gutters, potholes – along with social issues such as crime, homelessness, and business and liquor license locations are constantly being updated. More importantly, citizens are often the source of this information: their complaints are the data that end up driving these maps. The gathering of this data generally falls under the rubric of what are termed 311 systems, since in many cities you can call 311 either to tell the city about a problem (e.g. a noise complaint, a service request, or broken infrastructure) or to request information about pretty much any of the city’s activities.

This matters because 311 systems have generally been expensive and cumbersome to run. The beautiful thing about Ushahidi is that:

  1. it works: it has a proven track record of enabling citizens in developing countries to share data, using even the simplest of devices, both with one another and with agencies (like humanitarian organizations)
  2. it scales: Haiti and Kenya are pretty big places, and they generated a fair degree of traffic. Ushahidi can handle it.
  3. it is lightweight: Ushahidi’s technical footprint (yep, making that term up right now) is relatively light. The infrastructure required to run it is not overly complicated.
  4. it is relatively inexpensive: as a result of (3) it is also relatively cheap to run, being both lightweight and built on a lot of open source software
  5. Oh, and did I mention IT WORKS.

This is pretty much the spec you would want to meet if you were setting up a 311 system in a city with very few resources that was interested in starting to gather data about citizen demands and/or trying to monitor newly-invested-in infrastructure. Of course, to transform Ushahidi into a tool for mapping 311-type issues, you’d need some sort of spec describing what that would look like. Fortunately, Open311 already does just that, and it is supported by some of the large 311 system providers – such as Lagan and Motorola – as well as some of the disruptors, such as SeeClickFix. Indeed, there is an Open311 API specification that any developer could use as the basis for the add-on to Ushahidi.
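To give a flavour of what that spec asks for, here is a sketch of assembling the body of an Open311 (GeoReport v2) `POST /requests.json` call – the kind of message an Ushahidi add-on would need to emit when a citizen files a report. The API key and service code below are placeholders; real values come from each city’s published service list:

```python
from urllib.parse import urlencode

def build_service_request(api_key, service_code, lat, lon, description):
    """Assemble the form-encoded body for an Open311 GeoReport v2
    POST /requests.json call."""
    params = {
        "api_key": api_key,
        "service_code": service_code,  # city-specific code, e.g. for "pothole"
        "lat": f"{lat:.6f}",
        "long": f"{lon:.6f}",  # the spec names this field "long", not "lon"
        "description": description,
    }
    return urlencode(params)

# Hypothetical report: a pothole in downtown Vancouver.
body = build_service_request(
    "hypothetical-key", "001", 49.2827, -123.1207,
    "Pothole on the east side of the intersection",
)
print(body)
```

An add-on would POST this body to the city’s endpoint and then poll the returned `service_request_id` for status updates – which is exactly the report/track loop Ushahidi already models.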

Many cities – even those in developing countries – could probably already afford SeeClickFix, so there may already be a solution at the right price point in this space. But maybe not; I don’t know. More importantly, an Open311 module for Ushahidi could get local governments – or, better still, local tech developers in developing economies – interested in and contributing to the Ushahidi code base, further strengthening the project. And while the code would be globally accessible, innovation and implementation could continue to happen at the local level, helping drive the local economy and boosting know-how. The model here, in my mind, is OpenMRS, which has spawned a number of small tech startups across Africa that manage the implementation and servicing of OpenMRS installations at medical clinics in countries across the region.

I think this is a potentially powerful idea for stakeholders in local governments and startups (especially in developing economies) and our friends at Ushahidi. I can see that my friend Philip Ashlock at Open311 had a similar thought a while ago, so the Open311 people are clearly interested. It could be that the right ingredients are already in place to make some magic happen.

Mind. Prepare to be blown away. Big Data, Wikipedia and Government.

Okay, super psyched about this. Back at the Strata Conference in Feb (in San Diego) I introduced my long-time uber-quant friend, and now Wikimedia Foundation data scientist, Diederik Van Liere to fellow Gov2.0 thinker Nicholas Gruen (Chairman) and Anthony Goldbloom (Founder and CEO) of an awesome new company called Kaggle.

As usually happens when awesome people get together… awesomeness ensued. Mind. Be prepared to be blown.

So first, what is Kaggle? They’re a company that helps companies and organizations post their data and run competitions, so that it can be scrutinized by the world’s best data scientists in pursuit of a specific goal. Perhaps the most powerful example of a Kaggle competition to date was their HIV prediction competition, in which they asked contestants to use a data set to find markers in the HIV sequence which predict a change in the severity of the infection (as measured by viral load and CD4 counts).

Until Kaggle showed up, the best science to date had a prediction rate of 70% – a feat that had taken years to achieve. In 90 days contributors to the contest were able to achieve a prediction rate of 77% – a 10% improvement. I’m told that achieving a similar increment had previously taken something close to a decade. (Data geeks can read how the winner did it here and here.)

Diederik and Anthony have created a similar competition, but this time using Wikipedia participation data. As the competition page outlines:

This competition challenges data-mining experts to build a predictive model that predicts the number of edits an editor will make in the five months after the end date of the training dataset. The dataset is randomly sampled from the English Wikipedia dataset from the period January 2001 – August 2010.

The objective of this competition is to quantitatively understand what factors determine editing behavior. We hope to be able to answer, using these predictive models, questions such as why people stop editing or increase their pace of editing.

This is, of course, a subject matter that is dear to me, as I’m hoping we can do similar analysis in open source communities – something Diederik and I have tried to theorize about with Wikipedia and actually do with Bugzilla data.

There is a grand prize of $5000 (along with a few others) and, amazingly, already 15 participants and 7 submissions.
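For readers wondering what a submission even looks like: a natural first baseline (my own toy illustration, not anything from the actual competition) is to assume recent behavior continues – predict that each editor makes as many edits in the next five months as they made in the last five months of the training window. The data shape below is invented.

```python
from collections import Counter

# Toy baseline for an edit-prediction contest: assume each editor's recent
# pace continues. edit_log is an invented shape: (editor_id, month) pairs,
# where month is an integer index into the training period.

def baseline_prediction(edit_log, window_start):
    """Return {editor_id: predicted_edits} from edits made since window_start."""
    recent = Counter(editor for editor, month in edit_log if month >= window_start)
    return dict(recent)

# Editors 'a' and 'b' over months 1..10; predict from the last 5 months (6..10).
log = [("a", 2), ("a", 7), ("a", 9), ("b", 3), ("b", 4)]
print(baseline_prediction(log, window_start=6))  # {'a': 2}
```

Real entries will obviously do far better than this, but beating a dumb baseline like it is exactly how progress in these competitions gets measured.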

Finally, I hope public policy geeks, government officials and politicians are paying attention. There is power in data, and an opportunity to use it to find efficiencies and opportunities. Most governments probably don’t even know how to approach an organization like Kaggle or how to run a competition like this, even though (or perhaps because) it is so fast, efficient and effective.

It shouldn’t be this way.

If you are in government (or any org), check out Kaggle. Watch. Learn. There is huge opportunity here.

12:10pm PST – UPDATE: More Michael Bay-sized awesomeness. Within 36 hours of the Wikipedia challenge being launched, the leading submission has improved on internal Wikimedia Foundation models by 32.4%.

CIDA announces Open Data portal: What it means to Canadians

For those who missed it, the Canadian International Development Agency (CIDA) has announced it is launching an open data portal.

This is exciting news. On Monday I was interviewed about the initiative by Embassy Magazine which published the resulting article (behind their paywall) here.

As (I hope) the interview conveys, I’m cautiously optimistic about the Minister’s announcement. I’m conservative in my reaction only because we don’t actually know what the Minister has announced. At the moment the CIDA open data page is, quite literally, a blank slate. I feel positive because pretty much anything that gets more information about Canada’s aid budget available online is a step in the right direction. I’m cautious however, because the text from the Minister’s speech leads me to believe that she is using the term “open data” to describe something that may, in fact, not be open data.

Donors and partner countries must be accountable to their citizens, absolutely, but both must also be accountable to each other.

Transparency underpins these accountabilities.

With this in mind, today I am pleased to announce the Open Data Portal on the CIDA website that will make our searchable database of roughly 3,000 projects quick and simple to access.

The Open Data portal will put our country strategies, evaluations, audits and annual statistical and results reports within easy reach.

One of the core elements of the definition of “open data” is that it be machine readable. I need to actually get the “data” (e.g. an Excel spreadsheet, or a database I can download and/or access) so that I can play with it, mash it up, analyze it, etc… It isn’t clear that this is on offer. The Minister’s announcement talks about a database that allows you to search, and quickly download, reports on the 3,000 projects that CIDA funds or operates. A report, however, is not data. It may cite data, it may (and hopefully does) even contain data in charts or tables, but if what we are getting is access to reports, then this is not an open data portal.
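The difference is easy to show. If CIDA published project data as, say, CSV – the column names below are hypothetical, since we don’t know the agency’s actual schema – anyone could aggregate and compare it in a few lines. No amount of downloadable PDF reports allows this.

```python
import csv
import io

# What "machine readable" buys you: given project data as CSV (the column
# names are hypothetical -- CIDA's actual schema is unknown), anyone can
# aggregate, compare and mash it up. You cannot do this with a report.

sample = io.StringIO(
    "project_id,country,sector,budget_cad\n"
    "P001,Haiti,Health,1200000\n"
    "P002,Haiti,Education,800000\n"
    "P003,Kenya,Health,500000\n"
)

totals = {}
for row in csv.DictReader(sample):
    totals[row["country"]] = totals.get(row["country"], 0) + int(row["budget_cad"])

print(totals)  # {'Haiti': 2000000, 'Kenya': 500000}
```

Swap “total budget by country” for “projects per sector” or “spending over time” and the same few lines answer a different question – which is the whole point of data over documents.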

What I hope is happening – and what I advocated for in an op-ed in the Toronto Star – is that the Minister is launching a true open data portal which will share actual data – not analysis – with Canadians. More importantly, I hope this means Canada will be joining the efforts of Publish What You Fund, as it pushes donor organizations to share their aid data in a single common structure, so that budgets, contributions, projects, timelines, geography and other information about aid can be compared across countries, agencies, and organizations.

Open data, especially in an internationally recognized, standardized format, matters because no one is going to read all 10,000 reports about all 3,000 projects CIDA funds. However, if we had access to the data, in a structured manner, there are those at non-profits, in universities and colleges and in the media (among other places) who could map the projects, compare budgets and results more clearly, compare our efforts against those of other countries, and do their own analysis to, say, find duplication and overlap. I don’t, for a second, believe that 99.9% of Canadians will use CIDA’s open data portal, but the 0.1% who do will be able to create products that can inform the rest of us, and allow us to better understand Canada’s role in the world. In other words, an open data portal could be empowering and educational for a broad number of people. Access to 10,000 reports, while a good step, simply won’t be able to create a similar outcome on any scale. The difference is, quite frankly, dramatic.

So let’s wait and see. I’m excited that the Minister of International Cooperation is using the language of open data – it means that she and her staff understand it has currency. What I also hope is that they understand its meaning. So far we have no evidence either way, but I remain cautiously optimistic: they should, after all, realize the significance of the language they are using. Either way, they have set high expectations among those of us who think about, talk about and work in this area. As a Canadian, I’m hoping those expectations get fulfilled.

The next Open Data battle: Advancing Policy & Innovation through Standards

With the possible exception of weather data, the most successful open data set out there at the moment is transit data. It remains the data with which developers have experimented and innovated the most. Why is this? Because it’s been standardized. Ever since Google and the City of Portland created the General Transit Feed Specification (GTFS), any developer that creates an application using GTFS transit data can port their application to 100+ cities around the world, with tens and even hundreds of millions of potential users. Now that’s scale!
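That portability is literal: because GTFS fixes the file and column names (stop_times.txt with trip_id, departure_time, stop_id, and so on), the same code works against any participating city’s feed. The rows below are made up, but the GTFS field names are real.

```python
import csv
import io

# Why a standard matters: GTFS fixes the schema, so the same code reads
# stop_times.txt from any of the 100+ participating cities. The rows below
# are invented; the column names are genuine GTFS fields.

feed = io.StringIO(
    "trip_id,arrival_time,departure_time,stop_id,stop_sequence\n"
    "T1,08:00:00,08:00:30,S42,1\n"
    "T2,08:15:00,08:15:30,S42,1\n"
    "T3,08:05:00,08:05:30,S99,1\n"
)

def next_departures(stop_times, stop_id, after):
    """Departure times at stop_id later than the given HH:MM:SS string."""
    rows = csv.DictReader(stop_times)
    return sorted(r["departure_time"] for r in rows
                  if r["stop_id"] == stop_id and r["departure_time"] > after)

print(next_departures(feed, "S42", "08:01:00"))  # ['08:15:30']
```

Write this once and your “next bus” app runs in Portland, Vancouver or any other GTFS city – that is the scale a standard buys.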

All in all, the benefits of a standard data structure are clear. A public good is more effectively used, citizens enjoy better service, and companies (both Google and the numerous smaller companies that sell transit-related applications) generate revenue, pay salaries, etc…

This is why, with a number of jurisdictions now committed to open data, I believe it is time for advocates to start focusing on the next big issue: how do we get different jurisdictions to align around standard structures so as to increase the number of people to whom an application or analysis will be relevant? Having cities publish open data sets is a great start and has led to real innovation, but next-generation open data, and the next leaps in innovation, will require more standards.

The key, I think, is to find areas that meet three criteria:

  • Government Data: Is there relevant government data about the service or issue that is available?
  • Demand: Is this a service for which there is regular demand? (this is why transit is so good, millions of people touch the service on a daily basis)
  • Business Model: Is there a business that believes it can use this data to generate revenue (either directly or indirectly)?

 

 

[Diagram: the open data sweet spot – the overlap of Government Data, Demand and Business Model]

Two comments on this.

First, I think we should look at this model because we want to find places where the incentives are right for all the key stakeholders. The wrong way to create a data structure is to get a bunch of governments together to talk about it. That process will take 5 years… if we are lucky. Remember, the GTFS emerged because Google and Portland got together; after that, everybody else bandwagoned because the value proposition was so high. This remains, in my mind, not the perfect model, but the fastest and most efficient way to get more common data structures. I also recognize it won’t work for everything, but it can give us more successes to point to.

Which leads me to point two. Yes, at the moment, I think the target in the middle of this model is relatively small. But I think we can make it bigger. The GTFS shows cities, citizens and companies that there is value in open data. What we need are more examples, so that a) more business models emerge and b) more government data is shared in a structured way across multiple jurisdictions. The bottom and right-hand circles in this diagram can – and if we are successful, will – move. In short, I think we can create this dynamic:

[Diagram: the sweet spot growing as more business models emerge and more structured government data is shared]

So, what does this look like in practice?

I’ve been trying to think of services that fall in various parts of the diagram. A while back I wrote a post about using open restaurant inspection data to drive down health costs – specifically, about finding a government to work with Yelp!, Bing or Google Maps, Urban Spoon or another company to integrate the inspection data into their application. That, for me, is an example of something that fits in the middle. Governments have the data, it’s a service citizens could touch on a regular basis if the data appeared in their workflow (e.g. Yelp! or Bing Maps), and for those businesses it either helps drive search revenue or gives their product a competitive advantage. The Open311 standard (sadly missing from my diagram) and the emergence of SeeClickFix strike me as another excellent example, right on the inside edge of the sweet spot.

Here’s a list of what else I’ve come up with at the moment:

[Diagram: candidate services and data sets mapped onto the model]

You can also now see why I’ve been working on Recollect.net – our garbage pick up reminder service – and helping develop a standard around garbage scheduling data – the Trash & Recycling Object Notation. I think it is a service around which we can help explain the value of common standards to cities.
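To illustrate the kind of standard I mean (without reproducing the actual Trash & Recycling Object Notation – the JSON shape below is entirely made up for illustration), the idea is that once pickup schedules share any common format, one reminder service can serve every city that publishes it.

```python
import json

# Illustration only: an invented JSON shape for a garbage pickup schedule
# (this is NOT the real Trash & Recycling Object Notation schema). The point
# is that any shared format lets one reminder service work across many cities.

schedule_json = """
{"zone": "north-7",
 "pickups": [{"date": "2011-07-04", "flags": ["garbage"]},
             {"date": "2011-07-11", "flags": ["garbage", "recycling"]}]}
"""

def next_pickup(schedule, today):
    """Return the first pickup on or after `today` (an ISO date string)."""
    upcoming = [p for p in schedule["pickups"] if p["date"] >= today]
    return min(upcoming, key=lambda p: p["date"]) if upcoming else None

schedule = json.loads(schedule_json)
print(next_pickup(schedule, "2011-07-05")["date"])  # 2011-07-11
```

A reminder service like Recollect then only needs this one lookup, plus a way to fetch each city’s published schedule file.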

You’ll notice that I’ve put “democracy data” (e.g. agendas, minutes, legislation, hansards, budgets, etc…) in the area where I don’t think there is a business model. I’m not fully convinced of this – I could see a business model in the media space – but I’m trying to be conservative in my estimate. In any case, that is the type of data the good people at the Sunlight Foundation are trying to get liberated, so there are, at least, non-profit efforts concentrated there in America.

I also put real estate in a category where I don’t think there is real consumer demand. What I mean by this isn’t that people don’t want it – they do – but that they are only really interested in it maybe 2-4 times in their life. It doesn’t have the high touch point of transit or garbage schedules, or of traffic and parking. I understand that there are businesses to be built around this data – I love Viewpoint.ca, a site that mashes open data up with real estate data to create a compelling real estate website – but I don’t think it is a service people will get attached to, because they will only use it infrequently.

Ultimately, I’d love to hear ideas from people on what might fit in this sweet spot (if you are comfortable sharing the idea, of course). Partly this is because I’d love to test the model more. The other reason is that I’m engaged with some governments interested in getting more strategic about their open data use, and so these types of opportunities could become reality.

Finally, I just hope you find this model compelling and helpful.

If the Prime Minister Wants Accountable Healthcare, let's make it Transparent too

Over at the Beyond the Commons blog Aaron Wherry has a series of quotes from recent speeches on healthcare by Canadian Prime Minister Stephen Harper in which the one constant keyword is… accountability.

Who can blame him?

Take everyone promising to limit growth to a still unsustainable 6% (gulp) and throw in some dubiously costly projects ($1 billion spent on e-health records in Ontario when an open source solution – VistA – could likely have been implemented at a fraction of the cost) and the obvious question is… what is the country going to do about healthcare costs?

I don’t want to claim that open data can solve the problem. It can’t. There isn’t going to be a single solution. But I think it could help spread best practices, improve customer choice and service as well as possibly yield other potential benefits.

Anyone who’s been around me for the last month knows about my restaurant inspection open data example (which could also yield healthcare savings), but I think we can go bigger. A Federal Government that is serious about accountability in healthcare needs to build a system where that accountability isn’t just between the provinces and the feds; it needs to be between the healthcare system and its users: us.

Since the feds usually attach several provisions to their healthcare dollars, the one I’d like to see is an open data provision: one where provinces and hospitals are required to track and make open a whole set of performance data – in machine-readable formats, in a common national standard – that anyone in Canada (or around the world) can download and access.

Some of the data I’d love to see mandated to be tracked and shared, includes:

  • Emergency Room wait times – in real time.
  • Wait times, by hospital, for a variety of operations
  • All budget data, down to the hospital or even unit level – let’s allow the public to do a cost-per-patient analysis for every unit in the country
  • Survival rates for various surgeries (obviously controversial, since some hospitals with the lowest rates are actually the best because they get the hardest cases – but let’s trust the public with the data)
  • Inspection data – especially if we launched something akin to the Institute for Healthcare Improvement’s 5 Million Lives Campaign
  • I’m confident there is much, much more…

I can imagine a slew of services and analyses emerging from these – if nothing else, a citizenry that is better informed about the true state of its healthcare system. Even something as simple as being able to check ER wait times at all the hospitals near you, so you can drive to the one where the wait is shortest. That would be nice.
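The ER example really is that simple once the data exists. A sketch, with entirely invented numbers and an assumed common feed format:

```python
# Sketch of the "shortest ER wait near me" idea: if hospitals published wait
# times in a common machine-readable format, the lookup is trivial. The
# hospitals and minutes below are invented for illustration.

wait_times = {
    "St. Paul's": 95,         # minutes, from a hypothetical real-time feed
    "Vancouver General": 40,
    "Mount Saint Joseph": 70,
}

def shortest_wait(waits):
    """Return (hospital, minutes) for the shortest current ER wait."""
    return min(waits.items(), key=lambda kv: kv[1])

print(shortest_wait(wait_times))  # ('Vancouver General', 40)
```

Everything hard about this service is institutional – getting hospitals to publish the feed in a common standard – not technical.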

Of course, if the Prime Minister wants to go beyond accountability and think about how data could directly reduce costs, he might take a look at one initiative launched south of the border.

If he did, he might be persuaded to demand that the provinces share a set of anonymized patient records to see if academics or others in the country might be able to build better models for how we should manage healthcare costs. In January of this year I witnessed the launch of the $3 million Heritage Health Prize at the O’Reilly Strata Conference in San Diego. It is a stunningly ambitious, but realistic, effort. As the press release notes:

Contestants in the challenge will be provided with a data set consisting of the de-identified medical records of 100,000 patients from the 2008 calendar year. Contestants will then be required to create a predictive algorithm to predict who was hospitalized during the 2009 calendar year. HPN will award the $3 million prize (more than twice what is paid for the Nobel Prize in medicine) to the first participant or team that passes the required level of predictive accuracy. In addition, there will be milestone prizes along the way, which will be awarded to teams leading the competition at various points in time.

In essence, Heritage Health is doing for patient management what Netflix (through the $1M Netflix Prize) did for movie selection. It’s crowdsourcing the problem to get better results.

The problem is, any algorithm developed by the winners of the Heritage Health Prize will belong to… Heritage Health. This means the benefits of this innovation cannot accrue to Canadians (or anyone else). So why not launch a prize of our own? We have more data, I suspect our data is better (not limited to a single state), and we could place the winning algorithm in the public domain so that it can benefit all of humanity. If Canadian data helped find efficiencies that lowered healthcare costs and improved healthcare outcomes for everyone in the world… it could be the biggest contribution to global healthcare by Canada since Frederick Banting discovered insulin and rescued diabetics everywhere.

Of course, open data – and sharing (even anonymized) patient data – would be a radical experiment for government: something new, bold and different. But 6% growth is itself unsustainable, and Canadians need to see that their government can do something bold, new and innovative. These initiatives would fit the bill.