Tag Archives: opendata

International Open Data Hackathon 2011: Better Tools, More Data, Bigger Fun

Last year, with only a month of notice, a small group of passionate people announced we’d like to do an international open data hackathon and invited the world to participate.

We were thinking small but fun. Maybe 5 or 6 cities.

We got it wrong.

In the end, people from over 75 cities around the world offered to host an event. Better still, we definitely heard from people in over 40 of them. It was an exciting day.

Last week, after tracking down a few of the city organizers’ email addresses, I asked them if we should do it again. Every one of them came back and said: yes.

So it is official. This time we have two months’ notice. December 3rd will be Open Data Day.

I want to be clear, our goal isn’t to be bigger this year. That might be nice if it happens. But maybe we’ll only have 6-7 cities. I don’t know. What I do want is for people to have fun, to learn, and to engage those who are still wrestling with the opportunities around open data. There is a world of possibilities out there. Can we seize on some of them?

Why.

Great question.

First off, we’ve got more data. Thanks to more and more enlightened governments in more and more places, there’s a greater amount of data to play with. Whether it is Switzerland, Kenya or Chicago, there’s never been more data available to use.

Second, we’ve got better tools. With a number of governments using Socrata, there are more APIs out there for us to leverage. ScraperWiki has gotten better, and new tools like BuzzData, TheDataHub and Google’s Fusion Tables are emerging every day.

And finally, there is growing interest in making “openness” a core part of how we measure governments. Open data has a role to play in driving this debate. Done right, we could make the first Saturday in December “Open Data Day”: a chance to explain, demo, and invite to play the policy makers, citizens, businesses and non-profits who don’t yet understand the potential. Let’s raise the world’s data literacy and have some fun. I can’t think of a better way than with another global open data hackathon – a maker’s-faire-like opportunity for people to celebrate open data by creating visualizations, writing up analyses, building apps or doing whatever they want with data.

Of course, like last time, hopefully we can make the world a little better as well. (more on that coming soon)

How.

The basic premise for the event is simple, relying on five principles.

1. Together. It can be as big or as small, as long or as short, as you’d like it, but we’ll be doing it together on Saturday, December 3rd, 2011.

2. It should be open. Around the world I’ve seen hackathons filled with different types of people, exchanging ideas, trying out new technologies and starting new projects. Let’s be open to new ideas and new people. Chris Thorpe in the UK has done amazing work getting a young and diverse group hacking. I love Nat Torkington’s words on the subject. Our movement is stronger when it is broader.

3. Anyone can organize a local event. If you are keen to help organize one in your city, or just to participate, add your name to the relevant city on this wiki page. Wherever possible, try to keep it to one per city; let’s build some community and get new people together. Which city or cities you share with is up to you, as is how you do it. But let’s share.

4. You can work on anything that involves open data. That could be a local or global app, a visualization, a proposed standard for common data sets, or scraping data from a government website to make it available to others on BuzzData.

It would be great to have a few projects people can work on around the world – building stuff that is core infrastructure for future projects. That’s why I’m hoping someone in each country will create a local version of mySociety’s MapIt web service for their country (see the sketch below for a sense of what it does). It will give us one common project, and raise the profile of a great organization and a great project.
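For anyone wondering what MapIt actually does, here is a quick sketch (in Python, using the requests library) that queries mySociety’s UK instance. The coordinates are just central London, and the response fields shown should be treated as illustrative:

```python
# A sketch of what a MapIt instance offers: give it a point, get back the
# administrative areas containing it. This hits mySociety's UK instance;
# a local clone would expose the same interface over another country's
# boundaries. Treat the exact field names as illustrative.
import requests

# longitude, latitude for central London, in WGS84 (SRID 4326)
url = "https://mapit.mysociety.org/point/4326/-0.1276,51.5072"
areas = requests.get(url, timeout=10).json()

for area in areas.values():
    print(f"{area['name']} ({area['type_name']})")
```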

We also hope to be working with Random Hacks of Kindness, who’ve always been so supportive, ideally supplying data that they will need to run their applications.

5. Let’s share ideas across cities on the day. Each city’s hackathon should do at least one demo, brainstorm or proposal that it shares in an interactive way with members of a hackathon in at least one other city. This could be via video stream, Skype or chat; anything, as long as we get to know one another and share the cool projects and ideas we are hacking on. There are some significant challenges to making this work: timezones, languages, culture, technology… but who cares, we are problem solvers; let’s figure out a way to make it work.

Like last year, let’s not try to boil the ocean. Let’s have a bunch of events, where people care enough to organize them, and try to link them together with a simple short connection/presentation. Above all, let’s raise some awareness, build something and have some fun.

What next?

1. If you are interested, sign up on the wiki. We’ll move to something more substantive once we have the numbers.

2. Reach out and connect with others in your city on the wiki. Start thinking about the logistics. And be inclusive. If someone new shows up, let them help too.

3. Share with me your thoughts. What’s got you excited about it? If you love this idea, let me know, and blog/tweet/status update about it. Conversely, tell me what’s wrong with any or all of the above. What’s got you worried? I want to feel positive about this, but I also want to know how we can make it better.

4. Localization. If there is bandwidth locally, I’d love for people to translate this blog post and repost it locally. (Let me know, as I’ll try cross-posting it here, or at least link to it.) It is important that this not be an English-language-only event.

5. If you want a place to chat with others about this, feel free to post comments below. Also, the Open Knowledge Foundation’s Open Data Day mailing list will be the place where people can share news and help one another out.

Once again, I hope this will sound like fun to a few committed people. Let me know what you think.

The Geopolitics of the Open Government Partnership: the beginning of Open vs. Closed

Aside from one or two notable exceptions, there hasn’t been a ton of press about the Open Government Partnership (OGP). This is hardly surprising. The press likes to talk about corruption and bad government; people getting together to actually address these things is far less sexy.

But even where good coverage exists, analysts and journalists are, I think, misunderstanding the nature of the partnership and its broader implications should it take hold. Presently it is generally seen as a do-good project, one that will help fight corruption and hopefully lead to better governance (both of which I hope will be true). However, the Open Government Partnership isn’t just about doing good; it has real strategic and geopolitical purposes.

In fact, the OGP is, in part, about a 21st century containment strategy.

For those unfamiliar with 20th century containment, a brief refresher. Containment refers to a strategy outlined by a US diplomat, George Kennan, who, while posted in Moscow, wrote the famous Long Telegram, in which he outlined the need for a more aggressive policy to deal with an expansionist post-WWII Soviet Union. He argued that such a policy would need to isolate the USSR politically and strategically, in part by positioning the United States as an example in the world that other countries would want to work with. While discussions of “containment” often focus on its military aspects and the eventual arms race, it was equally influential in prompting the ideological battle between the USA and USSR as they sought to demonstrate whose “system” was superior.

So I repeat: the OGP is part of a 21st century containment policy. And I’d go further – it is an effort to forge a new axis around which America specifically, and a broader democratic camp more generally, may seek to organize allies and rally its camp. It abandons the now outdated free-market/democratic vs. state-controlled/communist axis in favour of a more subtle, but more appropriate, one: open vs. closed.

The former axis makes little sense in a world where authoritarian governments often embrace (quasi) free markets in order to stay in power, and even adopt some of the basic trappings of a democracy. The Open Government Partnership is part of an effort to redefine and shift the goal posts around what makes for a free-market democracy. Elections and a marketplace clearly no longer suffice; the OGP essentially sets a new bar at which a state must (in theory) allow itself to be transparent enough to provide its citizens with information (and thus power). In short, a state can’t simply have some of the trappings of a democracy; it must be democratic and open.

But that also leaves the larger question: who is being contained? To find the answer, take a look at the list of OGP participants. And then consider who isn’t, and likely never could be, invited to the party.

OGP members: Albania, Azerbaijan, Brazil, Bulgaria, Canada, Chile, Colombia, Croatia, Czech Republic, Dominican Republic, El Salvador, Estonia, Georgia, Ghana, Guatemala, Honduras, Indonesia, Israel, Italy, Jordan, Kenya, Korea, Latvia, Liberia, Lithuania, Macedonia, Malta, Mexico, Moldova, Mongolia, Montenegro, Netherlands, Norway, Peru, Philippines, Romania, Slovak Republic, South Africa, Spain, Sweden, Tanzania, Turkey, Ukraine, United Kingdom, United States, Uruguay.

Notably absent: China, Iran, Russia, Saudi Arabia (indeed, much of the Middle East) and Pakistan.

*India is not part of the OGP but was involved in much of the initial work; while it has withdrawn (for domestic political reasons), I suspect it will stay involved tangentially.

So first, what you have here is a group of countries that are broadly democratic. Indeed, if you were going to have a democratic caucus in the United Nations, it might look something like this. (There are some players on that list that are struggling, but for them the OGP is another opportunity to consolidate and reinforce the gains they’ve made, as well as push for new ones.)

In this regard, the OGP should be seen as an effort by the United States and some allies to find common ground, as well as a philosophical touch point, that not only separates them from rivals but makes their camp more attractive to deal with. It’s no trivial coincidence that on the day of the OGP launch the President announced that the United States’ first fulfilled commitment would be its decision to join the Extractive Industries Transparency Initiative (EITI). The EITI commits American oil, gas and mining companies to disclose payments made to foreign governments, which would make corruption much more difficult.

This is America essentially signalling to African people and their leaders: do business with us, and we will help prevent corruption in your country. We will let you know if officials get paid off by our corporations. The obvious counterpoint to this is… the Chinese won’t.

It’s also why Brazil is a co-chair, and the idea was prompted during a meeting with India. This is an effort to bring the most important BRIC countries into the fold.

But even outside the BRICs, the second thing you’ll notice about the list is the number of Latin American, and in particular African, countries included. Between the OGP, the fact that the UK is making government transparency a criterion for its foreign aid, and the fact that the World Bank is increasingly moving in the same direction, the forces for “open” are laying out one path for development and aid in Africa – one that rewards governance and, ideally, creates opportunities for African citizens. Again, the obvious counterpoint is… the Chinese won’t.

It may sound hard to believe, but the OGP is much more than a simple pact designed to make heads of state look good. I believe it has real geopolitical aims and may be the first overt, ideological salvo in what I believe will be the geopolitical axis of Open versus Closed. This is about finding ways to compete for the hearts and minds of the world in a way that China, Russia, Iran and others simply cannot. And, while I agree we can debate the “openness” of the various signing countries, I like the idea of a world in which states compete to be more open. We could do worse.

Canada Joins the Open Government Partnership

I’m in New York today for the launch of the Open Government Partnership, and it looks as though Canada is now a signatory (or at least has signed a letter of intent).

No commitments are outlined, but I will link to them when they are posted.

The Open Government Partnership was launched by the White House and the State Department earlier this year with 8 founding countries. The goal is to get a coalition of governments around the world to commit to implementing a series of initiatives to improve government transparency, effectiveness and accountability. You can read more here.

For those interested, the launch of the event will be livestreamed here. If you’re at the event, I’ll be hosting the lunch on “How to identify and prioritize core classes of information for public disclosure.”

Updated: here’s a video…

Research Request – Transit Study

After writing yesterday’s post on the economics of open data and transit, I’ve really been reflecting on a research question that emerged in the piece: does having transit data embedded in Google Maps increase ridership?

My hypothesis is that it would… but I did some googling on the topic and couldn’t find anything written on the subject, let alone something that had been rigorously researched and would stand up to peer review. This leads me to believe it could be a great research project. I’m willing to bet that some transit authorities, and Google, would be enormously interested in the results.

Obviously there are a number of variables that might impact public transit ridership: budgets, fleet size growth or cutbacks, the economy, population growth, etc. That said, I’m sure there is someone out there who could devise a methodology that would account for these factors and still allow us to tell whether becoming available in Google Maps impacts a city’s ridership levels.

The helpful thing is that there are lots of data points to play with. A brief scan of the public transit feed lists suggests that roughly 150 cities provide Google with GTFS data for their transit schedules. That’s a lot of cities to play with, and it would allow a study to offset regional variations. I’m also confident that each of the transit authorities mentioned in the list publishes its ridership levels (or that they could be FOIAed/ATIPed).
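To make that concrete, here is a minimal sketch of one possible design: a two-way fixed effects (difference-in-differences) regression. The file name and columns are hypothetical stand-ins for whatever ridership and Google Maps launch-date data a researcher actually assembles:

```python
# A minimal sketch of one possible study design. The file name and columns
# (city, year, annual_trips, gmaps_year) are hypothetical stand-ins for
# data a researcher would have to assemble.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("ridership.csv")  # one row per city per year

# 1 once the city's schedule is live in Google Maps, 0 before
df["post_gmaps"] = (df["year"] >= df["gmaps_year"]).astype(int)
df["log_trips"] = np.log(df["annual_trips"])

# City fixed effects absorb stable local differences (fleet size, geography);
# year fixed effects absorb common shocks (the economy, fuel prices).
model = smf.ols("log_trips ~ post_gmaps + C(city) + C(year)", data=df)
result = model.fit(cov_type="cluster", cov_kwds={"groups": df["city"]})

# The coefficient reads roughly as the % change in trips after a launch.
print(result.summary().tables[1])
```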

If anyone has done this study, please let me know – I’d love to know more. If not, and someone is interested in doing it, please go for it! I’m definitely happy to offer whatever support I can.

The Economics of Open Data – Mini-Case, Transit Data & TransLink

TransLink, the company that runs public transit in the region where I live (Vancouver/Lower Mainland), has launched a real-time bus tracking app that uses GPS data to figure out how far away the bus you are waiting for really is. This is great news for everyone.

Of course for those interested in government innovation and public policy it also leads to another question. Will this GPS data be open data?

Presently TransLink does make its transit schedule “open” under a non-commercial license (you can download it here). I can imagine a number of senior TransLink officials (and the board) scratching their heads asking: “Why, when we are short of money, would we make our data freely available?”

The answer is that TransLink should make its current data, as well as its upcoming GPS data, open and available under a license that allows for both non-commercial and commercial re-use, not just because it is the right thing to do, but because the economics of it make WAY MORE SENSE FOR TRANSLINK.

Let me explain.

First, there are not a lot of obvious ways TransLink could generate wealth directly from its data. But let’s take two possible opportunities: the first involves selling a transit app to the public (or advertising in such an app); the second involves selling a “next bus” service to companies (say, coffee shops or other organizations) that believe showing this information might be a convenience to their employees or customers.

TransLink has already abandoned doing paid apps – instead it maintains a mobile website at m.translink.ca – but even if it created an app and charged $1 per download, the revenue would be pitiful. Assuming a very generous customer base of 100,000 users, TransLink would generate maybe $85,000 (once Apple takes its cut from the iPhone downloads, and assuming zero cut for Android). But remember, this is not a yearly revenue stream; it is one-time. Maybe 10-20,000 people upgrade their phones or arrive in Vancouver and decide to download each year, so your year-on-year revenue is maybe $15K. Over a five-year period, then, TransLink ends up with an extra, say, $145,000. Nothing to sneeze at, but not notable either.

In contrast, a free application encourages use, so there is also a cost to not giving it away. Having transit data more readily available might cause some people to choose transit over, say, walking, taking a taxi or driving. Last year TransLink handled 211.3 million trips. Let’s assume that wider access to the data meant a 0.1% increase in the number of trips – a tiny increase, but it means 211,300 more trips. Assuming each rider pays a one-zone $2.50 fare, that would translate into additional revenue of $528,250. Over the same five-year period cited above, that’s revenue of $2.641M – much better than $145,000. And this is just counting the money, to say nothing of less congested roads, less smog and a lower carbon footprint for the region…

When this analysis is applied to licensing data it produces the same result. Will UBC pay to have TransLink’s real-time data on terminals in the Student Union building? I doubt it. Would some strategically placed coffee shops? Possibly. Obviously organizations would have to pay for the signs, but adding an annual “data license fee” to a display’s cost would cause some to opt out. And once you take into account managing the signs, legal fees, dealing with the contracts and going through the sales process, it is almost inconceivable that TransLink would make more money from these agreements than it would from simply having more signs everywhere, created by other people, that generated more customers for its actual core business: moving people from A to B for a fee. Just to show you the numbers: if shops that weren’t willing to pay for the data put up “next bus” screens that generated a mere 1,000 new regular bus users who did only 40 one-way trips a year (or 40,000 new trips), this would equal revenue of $100,000 every year at no cost to TransLink. Someone else could install and maintain the signs; no contracts or licenses would need to be managed.
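For the spreadsheet-inclined, here is the back-of-envelope arithmetic from the last few paragraphs in one place. Every input is an assumption made in this post, not a TransLink figure:

```python
# The back-of-envelope numbers from the preceding paragraphs, spelled out.
# Every input is an assumption made in this post, not a TransLink figure.

FARE = 2.50  # one-zone cash fare

# Scenario 1: a paid $1 app. Roughly $85K in year one (after app store
# cuts), then ~$15K/year from upgrades and newcomers.
app_revenue_5yr = 85_000 + 4 * 15_000
print(f"Paid app, 5 years:      ${app_revenue_5yr:>9,}")        # $145,000

# Scenario 2: open data nudges ridership up 0.1% on 211.3M annual trips.
extra_trips = 211_300_000 * 0.001
open_revenue_1yr = extra_trips * FARE
print(f"Open data, 1 year:      ${open_revenue_1yr:>9,.0f}")    # $528,250
print(f"Open data, 5 years:     ${5 * open_revenue_1yr:>9,.0f}")  # $2,641,250

# Scenario 3: free 'next bus' screens create 1,000 new regular riders
# making 40 one-way trips a year each.
screen_revenue_1yr = 1_000 * 40 * FARE
print(f"Free screens, per year: ${screen_revenue_1yr:>9,.0f}")  # $100,000
```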

From a cost recovery perspective it is almost impossible to imagine a scenario where TransLink is better off not allowing commercial re-use of its data.

My point is that TransLink should not be focused on making a few bucks from licensing its data (which it doesn’t do right now anyway). It should be focused on shifting the competitive value in the marketplace from access to accessibility.

Being the monopoly holder of transit data does not benefit TransLink. All it means is that fewer people see and engage with its data. When it makes the data open and available, “access” is no longer the defining advantage. When anybody (e.g. TransLink, Google, independent developers) can access the data, the marketplace shifts from competing on access to competing on accessibility. Consumers don’t turn to whoever has the data; they turn to whoever makes the data easiest to use.

For example, TransLink has noted that in 2011 it will have a record number of trips. Part of me wonders to what degree the increase in trips over the past few years is a result of making transit data accessible in Google Maps. (Has anyone done a study on this in any jurisdiction?) The simple fact is that Google Maps is radically easier to use for planning transit journeys than TransLink’s own website, AND THAT IS A GOOD THING FOR TRANSLINK. Now imagine if lots of organizations were sharing TransLink’s data: the local Starbucks and Blenz Coffee, colleges and universities, busy buildings downtown. Indeed, the real crime right now is that TransLink has handed Google a de facto monopoly: Google is allowed to use the data for commercial re-use. Local tax-paying developers…? Not so, according to the license they have to click through.

TransLink, you want a world where everyone is competing (including against you) on accessibility. In the end… you win, with greater use and revenue.

But let me go further. There are other benefits to having TransLink share its data for commercial re-use.

Procurement

Some riders will note that there are already bus stops in Vancouver which display “next bus” data (e.g. how many minutes away the next bus is). If TransLink made its next-bus data freely available via an API, it could conceivably alter the procurement process for buying and maintaining these signs. Any vendor could see how the data is structured and so take over the management of the signs, and/or experiment with creating more innovative or cheaper ways of manufacturing them.
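To make the idea concrete, here is a sketch of what any vendor’s sign (or a coffee shop’s screen) might do against such an API. The endpoint and JSON fields are invented for illustration – TransLink publishes no such API today:

```python
# Sketch of a 'next bus' display polling a hypothetical open API. The URL
# and response fields are invented for illustration; no such TransLink
# endpoint exists today.
import time

import requests

API = "https://api.example-transit.ca/v1/stops/{stop_id}/estimates"

def next_buses(stop_id):
    """Return upcoming departures, e.g. [{'route': '99', 'minutes': 4}, ...]."""
    resp = requests.get(API.format(stop_id=stop_id), timeout=5)
    resp.raise_for_status()
    return resp.json()["estimates"]

while True:  # a sign just loops and redraws
    for bus in next_buses("51479"):
        print(f"Route {bus['route']:>4}: {bus['minutes']} min")
    time.sleep(30)  # poll every 30 seconds
```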

The same is true of creating the RFP for TransLink’s website. With the data publicly available, TransLink could simply ask developers to mock up what they think is the most effective way of displaying the data. More development houses might be enticed to respond to the RFP, increasing the likelihood of innovation and putting downward pressure on fees.

Analysis

Of course, making GPS data free could have an additional benefit. Local news companies might be able to use the buses’ GPS data to calculate traffic flow rates and so predict traffic jams. Might they be willing to pay TransLink for the data? Maybe, but again, probably not enough to justify the legal and sales overhead. Moreover, TransLink would benefit from this analysis, as it could use the reports to adjust its schedule and notify its drivers of problems beforehand. And everyone would benefit as better-informed commuters changed their behaviour (including by taking transit!), reducing congestion, smog, carbon footprint, etc.

Indeed, the analysis opportunities using GPS data are potentially endless – and much of the work might be done by bloggers and university students. One could imagine correlating actual bus/subway times with any number of other data sets (crime, commute times, weather) to yield interesting information that could help TransLink with its planning. There is no world in which TransLink has the resources to do all this analysis itself, so enabling others to do it can only benefit it.

Conclusion

So if you are at TransLink/Coast Mountain Bus Company (or any transit authority in the world), this post is for you. Here’s what I suggest as next steps:

1) Add a GPS bus tracking API to your open data portal.

2) Change your license. Drop the non-commercial part. It hurts your business more than you realize and is anti-competitive (why can Google use the data for a commercial application while residents of the Lower Mainland cannot?). My suggestion: adopt the BC Government Open Government Licence or the PDDL.

3) Add an RSS feed to your GTFS data. Like Google, we’d all like to know when you update your data. Given we live here and are users, it would be nice to extend the same service to us as you do to them.

4) Maybe hold a Transit Data Camp where you could invite local developers and entrepreneurs to meet your staff and encourage people to find ways to get transit data into the hands of more Lower Mainlanders and drive up ridership!


Open Data and New Public Management

This morning I got an email thread pointing to an article by Justin Longo: #Opendata: Digital-Era Governance Thoroughbred or New Public Management Trojan Horse? I’m still digesting it all, but wanted to share some initial thoughts.

The article begins with a discussion of the benefits of open data, but its real goal is to argue that open data is a pawn in a game to revive the New Public Management reform agenda:

My hypothesis, based on a small but growing number of examples highlighting political support for open data, is that some advocates—particularly politicians, but not exclusively—are motivated by beliefs (both explicit and unconscious) forged in the New Public Management (NPM) reform agenda.

From this perspective, support for more open data aims at building coalitions of citizen consumers who are encouraged to use open data to expose public service decisions, highlight perceived performance issues, increase competition within the public sector, and strengthen the hand of the citizen as customer.

What I found disappointing is the article’s one-dimensional approach to the problem: open data may support a theory/approach to public management disliked by the author; consequently (inferring from the article’s title and tone), it must be bad. This is akin to saying that any technology that could be used to advance an approach I don’t support must be opposed.

In addition, I’d say that exposing public service decisions, highlighting perceived performance issues, increasing competition within the public sector, and strengthening the hand of the citizen as customer are goals I don’t necessarily oppose, certainly not categorically. Moreover, I would hope such goals are not exclusively the domain of NPM. Do we want a society where government’s performance issues are not highlighted? Or where public service decisions are kept secret?

These are not binary choices. You can support the outcomes highlighted above and simultaneously believe in other approaches to public sector management and/or be agnostic about the size of government. Could open data be used to advance NPM? Possibly (although I’m doubtful). But it can also be used to accomplish a lot of other good, and potentially to advance other approaches as well. Let’s not conflate a small subset of the ways open data can be used, or a small subset of its supporters, with the entire project, and then lump them all into a single school of thought around public service management.

Moreover, I’ve always argued that the biggest users and beneficiaries of open data would be government – and in particular the public service. While open data could be used to build “coalitions of citizen consumers who are encouraged to use open data to expose public service decisions,” it will also be used by public servants to better understand citizens’ needs, be more responsive and allocate resources more effectively. Moreover, those “citizen consumers” will probably be effective in helping them achieve this task. The alternative is to have better shared data internally (which will eventually happen), an outcome that might allow the government to achieve these efficiencies but will also radically increase the asymmetry in the relationship between the government and its citizens and, worse, between the elites that do have privileged access to this data and the citizenry (see Taggart below).

So ignoring tangible benefits because of a potential fear feels very problematic. It takes me back to Kevin Kelly and What Technology Wants… this is an attempt to prevent an incredibly powerful technology because of a threat it poses to how the public sector works. Of course, it presumes that a) you can prevent the technology and b) not acting will allow the status quo or some other preferred approach to prevail. Again, there are outcomes much, much worse than NPM that are possible – indeed likely, given evolving public expectations, demographics and fiscal constraints (and again, I don’t believe that open data leads directly to NPM).

In this regard, the article sets up a false choice. Open data is going to reshape all theories of public management. To claim it supports, or is biased in favour of, one outcome is, I think, beyond premature. But more importantly, it is to miss the forest for the trees and the much bigger fish we need to fry. The always thoughtful Chris Taggart summed much of this up beautifully in an email thread:

I think the title — making it out to be a choice between a thoroughbred or Trojan Horse — says it all. It’s a false dichotomy, as neither of those are what the open data advocates are suggesting it is, nor do most of us believe that open data is solution to all our problems (far from it — see some of my presentations[1]).

It also seems to offer a choice between New Public Management (which I think Emer Coleman does a fairly good job of illuminating in her paper[2]) and the brave new world of Digital Era Governance, which is also to misunderstand the changes being brought about in society, with or without open government data.
The point is not that open data is the answer to our problem but society’s chance to stay in the game (and even then, the odds are arguably against it). We already have ever increasing numbers of huge closed databases, many made up of largely government data, available to small number of people and companies.
This leads to an asymmetry of power and friction that completely undermines democracy; open data is not a sufficiency to counteract that, but I think it is a requirement.

It’s possible I’ve misunderstood Longo’s article – and he is just across the strait at the University of Victoria, so hopefully we can grab a beer and talk it through. But my sense is that this article is much more about a political battle between New Public Management and Digital Era Governance, in which open data is being used as a pawn. As an advocate, I’m not wholly comfortable with that, as I think it risks misrepresenting it.

Open Source Data Journalism – Happening now at BuzzData

(there is a section on this topic focused on governments below)

A hint of how social data could change journalism

Anyone who’s heard me speak in the last six months knows I’m excited about BuzzData. This week, while still in limited-access beta, the site is showing hints of its potential – and it still has only a few hundred users.

First, what is BuzzData? It’s a website that allows data to be easily uploaded and shared among any number of users. (For hackers: it’s essentially GitHub for data, but more social.) It makes it easy for people to copy data sets, tinker with them, share the results back with the original master, and mash them up with other data sets, all while engaging with those who care about that data set.

So, what happened? Why is any of this interesting? And what does it have to do with journalism?

Exactly a month ago, Svetlana Kovalyova of Reuters had her article – Food prices to remain high, UN warns – republished in the Globe and Mail. The piece essentially outlined how, because of local conditions in a number of regions, food commodity prices were easing in the short term even as they remained high.

Someone at the Globe and Mail decided to go a step further and upload the data – the annual food price indices from 1990 to the present – to BuzzData, presumably so they could play around with it. This is nothing complicated; it’s a pretty basic chart. Nonetheless, a dozen or so users started “following” the dataset, and about 11 days ago one of them, David Joerg, asked:

The article focused on short-term price movements, but what really blew me away is: 1) how the price of all these agricultural commodities has doubled since 2003 and 2) how sugar has more than TRIPLED since 2003. I have to ask, can anyone explain WHY these prices have gone up so much faster than other prices? Is it all about the price of oil?

He then did a simple visualization of the data.

[Image: Joerg’s visualization of the food price data]

In response, someone from the Globe and Mail named Mason answered:

Hi David… did you create your viz based on the data I posted? I can’t answer your question but clearly your visualization brought it to the forefront. Thanks!

But of course, in a process that mirrors what often happens in the open source community, another “follower” of the data showed up and refined the work of the original commentator. In this case, one Alexander Smith noted:

I added some oil price data to this visualization. As you can see the lines for everything except sugar seem to move more or less with the oil. It would be interesting to do a little regression on this and see how close the actual correlation is.

The first thing to note is that Smith has added data, “mashing in” the oil price per barrel, so the data set has been made richer. In addition, his graph is quite nice, as it makes the correlation more visible than the graph by Joerg, which only referenced the Oil Price Index. It also becomes apparent, looking at this chart, how much of an outlier sugar really is.

[Image: Smith’s visualization of oil and food prices]

Perhaps some regression is required, but Smith’s graph is pretty compelling. What’s more interesting is that the price of oil is not once mentioned in the article as a driver of food commodity prices. So maybe it’s not relevant. But maybe it deserves more investigation – and a significantly better piece, one that would provide better information to the public, could be written in the future. In either case, this discussion, conducted by non-experts simply looking at the data, helped surface some interesting leads.
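Smith’s “little regression” would be easy to run, too. Here is a sketch using pandas; the file and column names are hypothetical stand-ins for the indices shared on BuzzData plus the oil prices he mashed in:

```python
# A sketch of the check Smith proposes: how tightly does each food price
# index track the price of oil? File and column names are hypothetical
# stand-ins for the BuzzData dataset.
import pandas as pd

df = pd.read_csv("food_and_oil_prices.csv", parse_dates=["month"])
# assumed columns: month, cereals, dairy, meat, oils, sugar, crude_oil

food_cols = ["cereals", "dairy", "meat", "oils", "sugar"]
correlations = df[food_cols].corrwith(df["crude_oil"])
print(correlations.round(2).sort_values(ascending=False))

# High coefficients across the board, with sugar the laggard, would match
# what the two charts suggest by eye. Correlation isn't causation, though.
```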

And therein lies the power of social data.

With even only a handful of users, a deeper, better analysis of the story has taken place. Why? Because people are able to access the data and look at it directly. If you’re a follower of Julian Assange of WikiLeaks, you might call this scientific journalism. Maybe it is, maybe it isn’t, but it certainly is a much more transparent way of doing analysis – and a potential audience builder. Imagine if hundreds or thousands of readers were engaged in the data underlying a story. What would that do to the story? What would that do to journalism? With BuzzData it also becomes easier to imagine a data journalist who spends a significant amount of their time on BuzzData, working with a community of engaged pro-ams trying to find hidden meaning in the data they amass.

Obviously, this back and forth isn’t game-changing. No smoking gun has been found. But I think it hints at a larger potential, one that would be very interesting to see unlocked.

More than Journalism – I’m looking at you, government

Of course, it isn’t just media companies that should be paying attention. For years I argued that governments – and especially politicians – interested in open data have an unhealthy appetite for applications. They like the idea of sexy apps on smart phones enabling citizens to do cool things. To be clear, I think apps are cool too. I hope in cities and jurisdictions with open data we see more of them.

But open data isn’t just about apps. It’s about the analysis.

Imagine a city’s budget up on BuzzData. Imagine the flow rates of the water or sewage system, or the inventory of trees. Think of how a community of interested and engaged “followers” could supplement that data, analyze it, visualize it. Maybe they would be able to explain it to others better, find savings or potential problems, or develop new forms of risk assessment.

It would certainly make for an interesting discussion. If 100, or even just 5, new analyses were to emerge, maybe none of them would be helpful or provide any insights. But I have my doubts. I suspect it would enrich the public debate.

It could be that the analysis would become as sexy as the apps. And that’s an outcome that would warm this policy wonk’s soul.

The State of Open Data Licenses in Canada and where to go from here

(for readers less interested in Open Data – I promise something different tomorrow)

In February I wrote that 2011 would be the year of the license for Canada’s open data community. This has indeed been the case. For public servants and politicians overseeing the various open data projects happening in Canada and around the world, here is an outline of where we are and what I hope will happen next. For citizens, I hope this will serve as a primer and help explain why this matters. For non-Canadians, I hope this can help you strategize about how to deal with the different levels of government in your own country.

This is important stuff, and will be critical to success in the next open data challenge: aligning different jurisdictions around common standards.

Why Licenses Matter

Licenses matter because they determine how you are able to use government data – a public asset. As I outlined in the three laws of open data, data is only open if it can be found, played with and shared. The license deals with the last of these. If you take government data and find some flaw, or use it to improve a service, it means nothing if you are not able to share what you create with others. The more freedom you have in doing this, the better.

What we want from the license regime (and for your government)

There are a couple of interests one is trying to balance in creating a license regime. You want the license to be:

  • Open: there should be maximum freedom for reuse (see above, and this blog post)
  • Secure: it should offer governments appropriate protections for privacy and security
  • Simple: to keep legal costs low and make it easier for everyone to understand
  • Standardized: so my work is accessible across jurisdictions
  • Stable: so I know that the government won’t change the rules on me

At the moment, two licenses in Canada meet these tests: the Public Domain Dedication and License (PDDL), used by Surrey, Langley and Winnipeg (for its transit data), and the BC government’s open data portal license (which is a copy of the UK Open Government Licence).

Presently, a bunch of licenses do not. This includes the Government of Canada Open Data Licence Agreement for Unrestricted Use of Canada’s Data (couldn’t they have chosen a better name? For a real critique, read this blog post). It also includes the variants of the license created by Vancouver and now used by Toronto, Ottawa and Edmonton (among others). Full disclosure: I was peripherally involved in the creation of this license – it was necessary at the time.

Neither of these licenses is standardized, both contain restrictions not found in the UK/BC Open Government Licence or the PDDL, and they are anything but simple. Nor are they stable: at any time the government can revoke them. In other words, many developers and companies interested in open data dislike them immensely.

Where do we go from here?

At the moment there are a range of licenses available in Canada – this undermines the ability of developers to create software that uses open data across multiple jurisdictions.

First, the launch of BC’s open data portal and its use of the UK Open Government Licence has reset the debate in this country. The Federal government, which has an awkward, onerous and unloved license, should stop trying to create a new license that simply adds unnecessary complexity and creates confusion for software developers. (I detail the voluminous problems with the Federal license here.)

Instead, the Feds should adopt the UK Open Government Licence and push for it to be a standard, both for the provinces and federal government agencies and for other Commonwealth countries. Their refusal to adopt the UK license is deeply puzzling. They have offered no explanation of why they can’t; indeed, it would be interesting to hear what the Federal Government believes it knows that the UK government (which has been doing this for much longer) and the BC government don’t.

What I predict will happen is that more and more provinces will adopt the UK license, and the Feds will look increasingly isolated and ridiculous. Barring some explanation, this silliness should end.

At the municipal level, things are more complicated. If you look at the open data portals of Vancouver, Toronto, Edmonton and Ottawa (sometimes referred to as the G4), you’ll notice each has a similar paragraph:

The Cities of Vancouver, Edmonton, Ottawa and Toronto have recently joined forces to collaborate on an “Open Data Framework”. The project aims to enhance current open data initiatives in the areas of data standards and terms of use agreements. Please contact us for further information.

This paragraph has been sitting on these sites for well over a year now (approaching two), but in terms of data standards and common terms of use the G4 has, to date, produced nothing tangible for end users. (Full disclosure: I have sat in on some of these meetings.) The G4 cities, which were leaders, are now languishing with a license that actually puts them in the middle, not the front, of the pack. They remain ahead of the bulk of Canadian cities that have no open data but, in terms of license, behind the aforementioned Surrey, Langley and Winnipeg (for its transit data).

These second-generation open data cities either had fewer resources or drew the right lessons, and have leapfrogged the G4 cities by adopting the PDDL – something they did because it essentially outsources the management of the license to a competent third party. It maximizes the effectiveness of their data while limiting their costs, all while giving them the same level of protection.

The UK and BC versions of the Open Government Licence could work for the cities, but the PDDL is a better license, and it is well managed. If the cities were to adopt the OGL it wouldn’t be the end of the world, but it also isn’t necessary. It probably makes more sense for them to simply follow the new leaders in the space and adopt the PDDL, as it is less restrictive and easier to adopt.

Thus, speaking personally, the ideal situation in Canada would be that:

  • the Federal and Provincial Governments adopt the UK/BC Open Government Licence. I’d love to live in a world where they adopted the PDDL, but my conversations with them lead me to believe this simply is not likely in the near to mid term. I think 99% of software developers out there will agree that the Open Government Licence is an acceptable substitute; and
  • the municipalities push to adopt the PDDL. Already several municipalities have done this and the world has not ended. The bar has been set.

The worst outcomes would be:

  • the G4 municipalities invent some new license. The last thing the world needs is another open data license to confuse users and increase legal costs.
  • the federal government continues along the path of evolving its own license. Its license was born broken and is unnecessary.

Sadly, I see little evidence for optimism at the federal level. However, I’m optimistic about the cities and provinces. The fact that most new open data portals at the municipal level have adopted the PDDL suggests that many in these governments “get it”. I also think the launch of data.gov.bc.ca will spur other provinces to be intelligent about their license choice.


Province of BC launches Open Data Catalog: What works

As revealed yesterday, the province of British Columbia became the first provincial government in Canada to launch an open data portal.

It’s still early but here are some things that I think they’ve gotten right.

1. License: Getting it Right (part 1)

Before anything else, this is probably the single biggest good news story for Canadians interested in the opportunities around open data. If the license is broken, it pretty much doesn’t matter how good the data is; it essentially gets put in a legal straitjacket and cannot be used. For BC’s open data portal this, happily, is not the case.

There are actually two good news stories here.

The first is that the license is good. Obviously my preference would be for everything to be unlicensed and in the public domain, as it is in the United States. Short of that, however, the most progressive license out there is the UK Government’s Open Government Licence for Public Sector Information. Happily, the BC government has essentially copied it. This means that much of BC’s open data can be used for commercial purposes, political advocacy, personal use and so forth. In short, the restrictions are minimal and, I believe, acceptable. The license addresses the concerns I raised back in March when I said 2011 would be the year of open data licenses in Canada.

2. License: The Virtuous Convergence (part 2)

The other great thing is that this is a standardized license. The BC government didn’t invent something new; they copied something that already worked. This is music to the ears of many, as it means applications and analysis developed in British Columbia can be ported seamlessly to other jurisdictions that use the same license. At the moment, that means all of the United Kingdom. There has been some talk of making the UK Open Government Licence (OGL) a standard that can be used across the Commonwealth – that, in my mind, would be a fantastic outcome.

My hope is that this will also put pressure on other jurisdictions to improve their licenses, converge on the BC/UK license, or adopt a better license still. With the exception of the City of Surrey, which uses the PDDL, the BC government’s license is far superior to the licenses being used by other jurisdictions: the municipal licenses based on Vancouver’s (used by Vancouver, Edmonton, Ottawa, Toronto and a few others) and the Federal Government’s open data license (used by Treasury Board and CIDA) are both much more restrictive. Indeed, my real hope is that BC’s move will snap the Federal Government out of its funk, make it realize its own licenses are confusing, problematic and a waste of time, and encourage it to contribute to making the UK’s OGL a new standard for all of Canada. It would be much better than what it has on offer.

3. Tools for non-developers

Another nice thing about the data.gov.bc.ca website is that it provides tools for non-developers, so that they can play with, and learn from, some of the data. This is, of course, standard fare on most newer open data portals – indeed, it seems to be the primary focus of Socrata, a company that specializes in creating open government data portals. The goal everywhere is to increase the number of people who can make use of the data.

4. Meaty Data – Including Public Accounts

One of the charges sometimes leveled against open data portals is that they don’t publish data that is important, or that could drive substantive public policy debates. While this is not true of what has happened in the UK and the United States, that charge is probably somewhat fair in Canada. While I’m still exploring the data available on data.gov.bc.ca, one thing seems clear: there is a commitment to getting more “high-value” data sets out to the public. For example, I’ve already noticed you can download the Consolidated Revenue Fund Detailed Schedules of Payments-FYE10-Suppliers, which details the payees who received $25,000 or more from the government in fiscal year 2009-2010. I also noticed that the Provincial Obstacles to Fish Passage data are available for download – something I hope our friends in the environmental movement will find helpful. There is also an entire section dedicated to data on the provincial educational system; I’ll be exploring that in more detail.

I wanted to publish this for now, but I’m definitely keen to hear others’ thoughts and comments on the data portal, data sets you find interesting and helpful, or anything else. If you are building an app using this data, or doing an analysis that is made easier because of the data on this site, I’d love to hear from you.

This is a big step for the province. I’m sure I’ll discover some shortcomings as I dive deeper, but this is a solid start and, I hope, an example to other provinces about what is possible.

Using Data to Make Firefox Better: A mini-case study for your organization

I love Mozilla. Any reader of this blog knows it. I believe in its mission, I find the organization totally fascinating and its processes engrossing. So much so that I spend a lot of time thinking about it – and, hopefully, finding ways to contribute.

I’m also a big believer in data. I believe in the power of evidence-based public policy (hence my passion about the long-form census) and in the ability of data to help organizations develop better products, and people make smarter decisions.

Happily, a few months ago I was able to merge these two passions: analyzing data in an effort to help Mozilla understand how to improve Firefox. It was fun. But more importantly, the process says a lot about the potential for innovation open to organizations that cultivate an engaged user community.

So what happened?

In November 2010, Mozilla launched a visualization competition that asked: How do people use Firefox? As part of the competition, they shared anonymous data collected from Test Pilot users (people who agreed to share anonymous usage data with Mozilla). Working with my friend (and quant genius) Diederik Van Liere, I analyzed the impact of add-on memory consumption on browser performance to find out which add-ons use the most memory and thus are most likely slowing down the browser (and frustrating users!). (You can read about our submission here.)
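For flavour, here is roughly the shape of that analysis, sketched with pandas. The file and column names are illustrative, not Mozilla’s actual Test Pilot schema:

```python
# Roughly the shape of the analysis: rank add-ons by average memory use
# across Test Pilot sessions. File and column names are illustrative,
# not Mozilla's actual Test Pilot schema.
import pandas as pd

samples = pd.read_csv("testpilot_addon_memory.csv")
# assumed columns: session_id, addon_name, memory_kb

ranking = (
    samples.groupby("addon_name")["memory_kb"]
    .mean()
    .sort_values(ascending=False)
)
print(ranking.head(10))  # the ten hungriest add-ons
```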

But doing the analysis wasn’t enough. We wanted Mozilla engineers to know we thought users should be shown the results, so they could make more informed choices about which add-ons they download. Our hope was to put pressure on add-on developers to make sure they weren’t ruining Firefox for their users. To do that, we visualized the data by making a mock-up of Mozilla’s add-on website – with our data inserted.

[Image: mock-up of the add-ons site with memory data inserted]

For our efforts, we won an honourable mention. But winning a prize is far, far less cool than actually changing behaviour. So last week, during a trip to Mozilla’s offices in Mountain View, I was thrilled when one of the engineers pointed out that the add-on site now has a page listing the add-ons that most slow down Firefox’s start-up time.

[Image: the Slow Performing Add-ons page on the Firefox add-ons site]

(Sidebar: Anyone else find it ironic that “FastestFox: Browse Faster” is #5?)

This is awesome! Better still, in April, Mozilla launched an add-on performance improvement initiative to help reduce the negative impact add-ons can have on Firefox. I have no idea if our submission to the visualization competition helped kick-start this project; I’m sure there were many smart people at Mozilla already thinking about this. Maybe it was already underway? But I like to believe our ideas helped push their thinking – or, at least, validated some of them. And of course, I hope it continues to. I still believe that the above-cited data shouldn’t be hidden on a webpage well off the beaten path, but should be located right next to every add-on. That’s the best way to create the right feedback loops, and it is in line with Mozilla’s manifesto: empowering users.

Some lessons (for Mozilla, companies, non-profits and governments)

First lesson: innovation comes from everywhere. So why aren’t you tapping into it? Diederik and I are all too happy to dedicate some cycles to thinking about ways to make Firefox better. If you run an organization that has a community of interested people larger than your employee base (I’m looking at you, governments), why aren’t you finding targeted ways to engage them – not in endless brainstorming exercises, but in innovation challenges?

Second, get strategic about using data. A lot of people (including myself) talk about open data. Open data is good. But it can’t hurt to be strategic about it as well. I tried to argue for this in the government and healthcare space with this blog post. Data-driven decisions can be made in lots of places; what you need to ask yourself is: What data are you collecting about your product and processes? What, of that data, could you share, to empower your employees, users, suppliers, customers, whoever, to make better decisions? My sense is that the companies (and governments) of the future are going to be those that react both quickly and intelligently to emerging challenges and opportunities. One key to being competitive will be to have better data to inform decisions. (Again, this is the same reason why, over the next two decades, you can expect my country to start making worse and worse decisions about social policy and the economy – they simply won’t know what is going on).

Third, if you are going to share, get a data portal. In fact, Mozilla needs an open data portal (there is a blog post coming on that). Mozilla has always relied on volunteer contributors to help write Firefox and submit patches for bugs. The same is true for analyzing its products and processes. An open data portal would enable more people to help find ways to keep Firefox competitive. Of course, this is also true for governments and non-profits (to help find efficiencies and new services) and for companies.

Finally, reward good behaviour. If contributors submit something you end up using… let them know! Maybe the idea Diederik and I submitted never informed anything the add-on group was doing; maybe it did. But if it did… why not let us know? We are so pumped about the work they are doing, we’d love to hear more about it. Finding out by accident seems like a lost opportunity to engage interested stakeholders. Moreover, back then Diederik was thinking about his next steps – now he works for the Wikimedia Foundation. It made me realize how an innovation challenge could be a great way to spot talent.