Category Archives: open data

Smarter Ways to Have School Boards Update Parents

Earlier this month the Vancouver School Board (VSB) released an iPhone app that – helpfully – will use push notifications to inform parents about school holidays, parent interviews, and scheduling disruptions such as snow days. The app is okay, though a little clunky to use, and a lot of the data – such as professional days – while helpful in an app, would be even more helpful as an iCal feed parents could subscribe to in their calendars.
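To make the point concrete, here is a minimal sketch of how a board could publish such a feed using nothing but Python’s standard library – the dates and the board’s domain below are made up for illustration:

    # Sketch: generate an iCal (.ics) feed of professional days that
    # parents could subscribe to in their calendars. Dates are hypothetical.
    from datetime import date

    events = [
        ("Professional Day - schools closed", date(2012, 2, 24)),
        ("Professional Day - schools closed", date(2012, 4, 30)),
    ]

    lines = ["BEGIN:VCALENDAR", "VERSION:2.0", "PRODID:-//School Board//Calendar//EN"]
    for i, (summary, day) in enumerate(events):
        lines += [
            "BEGIN:VEVENT",
            "UID:%d@example-school-board.ca" % i,  # hypothetical domain
            "DTSTART;VALUE=DATE:%s" % day.strftime("%Y%m%d"),
            "SUMMARY:%s" % summary,
            "END:VEVENT",
        ]
    lines.append("END:VCALENDAR")

    with open("professional_days.ics", "w") as f:
        f.write("\r\n".join(lines))

Any calendar application that can subscribe to an .ics URL would then pick up these dates automatically.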

That said, the VSB deserves credit for having the vision to develop an app. Positively, the VSB app team hopes to add new features, such as letting parents know about after-school activities like concerts, plays and sporting events.

This is a great innovation and without a doubt, other school boards will want apps of their own. The problem is, this is very likely to lead to an enormous amount of waste and duplication. The last thing citizens want is for every school board to be spending $15-50K developing iPhone apps.

Which leads to a broader opportunity for the Minister of Education.

Were I the Education Minister, I’d have my technology team recreate the specs of the VSB app and put out an RFP for it, but under an open source license and using PhoneGap so it would work on both iPhone and Android. In addition, I’d ensure it could offer reminders – like we do at recollect.net – so that people could get email or text messages without a smartphone at all.

I would then propose the ministry cover 60% of the development and yearly upkeep costs. The other 40% would be covered by the school boards interested in joining the project. Thus, assuming the app had a development cost of $40K and a yearly upkeep of $5K, if only one school board signed up it would have to pay $16K for the app (a pretty good deal) and $2K a year in upkeep. But if 5 school districts signed up, each would only pay $3.2K in development costs and $400 a year in upkeep costs. Better still, the more that sign up, the cheaper it gets for each of them. I’d also propose a governance model in which those who contribute money for development would have the right to elect a sub-group to oversee the feature roadmap.
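The arithmetic is easy enough to check – a quick sketch using the 60/40 split and the $40K/$5K figures from above:

    # Cost-sharing sketch: the ministry covers 60%, and participating
    # school boards split the remaining 40% evenly.
    def per_board_cost(n_boards, dev_cost=40000, upkeep=5000, ministry_share=0.6):
        boards_share = 1.0 - ministry_share
        return (boards_share * dev_cost / n_boards,
                boards_share * upkeep / n_boards)

    for n in (1, 5, 10):
        dev, up = per_board_cost(n)
        print("%2d board(s): $%.0f each up front, $%.0f/year each" % (n, dev, up))
    #  1 board(s): $16000 each up front, $2000/year each
    #  5 board(s): $3200 each up front, $400/year each
    # 10 board(s): $1600 each up front, $200/year each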

Since the code would be open source other provinces, school districts and private schools could also use the app (although not participate in the development roadmap), and any improvements they made to the code base would be shared back to the benefit of BC school districts.

Of course, by signing up to the app project school boards would be committing to ensuring their schools shared up-to-date notifications about the relevant information – probably a best practice they should be following anyways. This process change is where the real work lies. However, a simple webform (included in the price) would cover much of the technical side of that problem. Better still, the Ministry of Education could offer its infrastructure for hosting and managing any data the school boards wish to collect and share, further reducing costs and, equally important, ensuring the data was standardized across the participating school boards.

So why should the Ministry of Education care?

First, creating new ways to update parents about important events – like when report cards are issued, so that parents know to ask for them – helps improve education outcomes. That should probably be reason enough, but there are other reasons as well.

Second, it would allow the ministry, and the school boards, to collect some new data: professional day dates, average number of snow days, frequency of emergency disruptions, and the number of parents in a district interested in these types of notifications. Over time, this data could reveal important information about educational outcomes.

But the real benefit would be both the cost savings and enabling less well-resourced school districts to benefit from technological innovation that wealthier districts will likely pursue if left to their own devices. Given there are 59 English school districts in BC, if even half of them spent $30K developing their own iPhone apps, almost $1M would collectively be spent on software development. By spending $24K, the ministry ensures this $1M instead gets spent on teachers, resources and schools. Equally important, less tech-savvy or well-equipped school districts would be able to participate and benefit.

Of course, if the Vancouver School Board were smart, it would open source its app, approach the Ministry of Education and offer it as the basis of such a venture. Doing that wouldn’t just put it at the head of the class, it would help everyone get smarter, faster.

Open Data and New Public Management

This morning I got an email thread pointing to an article by Justin Longo on #Opendata: Digital-Era Governance Thoroughbred or New Public Management Trojan Horse? I’m still digesting it all but wanted to share some initial thoughts.

The article begins with a discussion of the benefits of open data, but its real goal is to argue that open data is a pawn in a game to revive the New Public Management reform agenda:

My hypothesis, based on a small but growing number of examples highlighting political support for open data, is that some advocates—particularly politicians, but not exclusively—are motivated by beliefs (both explicit and unconscious) forged in the New Public Management (NPM) reform agenda.

From this perspective, support for more open data aims at building coalitions of citizen consumers who are encouraged to use open data to expose public service decisions, highlight perceived performance issues, increase competition within the public sector, and strengthen the hand of the citizen as customer.

What I found disappointing is the article’s one-dimensional approach to the problem: open data may support a theory/approach to public management disliked by the author, and consequently (inferring from the article’s title and tone) it must be bad. This is akin to saying any technology that could be used to advance an approach I don’t support must be opposed.

In addition, I’d say that exposing public service decisions, highlighting perceived performance issues, increasing competition within the public sector, and strengthening the hand of the citizen as customer are goals I don’t necessarily oppose – certainly not categorically. Moreover, I would hope such goals are not exclusively the domain of NPM. Do we want a society where government’s performance issues are not highlighted? Or where public service decisions are kept secret?

These are not binary choices. You can support the outcomes highlighted above and simultaneously believe in other approaches to public sector management and/or be agnostic about the size of government. Could open data be used to advance NPM? Possibly (although I’m doubtful). But it definitely can also be used to accomplish a lot of other good and potentially advance other approaches as well. Let’s not conflate a small subset of ways open data can be used, or a small subset of its supporters, with the entire project, and then lump them all into a single school of thought around public service management.

Moreover, I’ve always argued that the biggest users and beneficiaries of open data would be government – and in particular the public service. While open data could be used to build “coalitions of citizen consumers who are encouraged to use open data to expose public service decisions”, it will also be used by public servants to better understand citizens’ needs, be more responsive and allocate resources more effectively. Moreover, those “citizen consumers” will probably be effective in helping public servants achieve this task. The alternative is to have better shared data internally (which will eventually happen), an outcome that might allow the government to achieve these efficiencies but will also radically increase the asymmetry in the relationship between the government and its citizens and, worse, between the elites that have privileged access to this data and the citizenry (see Taggart below).

So ignoring tangible benefits because of a potential fear feels very problematic. It all takes me back to Kevin Kelly and What Technology Wants… this is an attempt to hold back an incredibly powerful technology because of a threat it poses to how the public sector works. Of course, it presumes that a) you can prevent the technology and b) not acting will allow the status quo or some other preferred approach to prevail. Again, there are outcomes much, much worse than NPM that are possible (again, I don’t believe that open data leads directly to NPM) and, I would argue, indeed likely, given evolving public expectations, demographics, and fiscal constraints.

In this regard, the article sets up a false choice. Open data is going to reshape all theories of public management. To claim it supports, or biases in favour of, one outcome is, I think, beyond premature. But more importantly, it is to miss the forest for the trees and the much bigger fish we need to fry. The always thoughtful Chris Taggart summed much of this up beautifully in an email thread:

I think the title — making it out to be a choice between a thoroughbred or Trojan Horse — says it all. It’s a false dichotomy, as neither of those are what the open data advocates are suggesting it is, nor do most of us believe that open data is a solution to all our problems (far from it — see some of my presentations[1]).

It also seems to offer a choice between New Public Management (which I think Emer Coleman does a fairly good job of illuminating in her paper[2]) and the brave new world of Digital Era Governance, which is also to misunderstand the changes being brought about in society, with or without open government data.
The point is not that open data is the answer to our problem but society’s chance to stay in the game (and even then, the odds are arguably against it). We already have ever increasing numbers of huge closed databases, many made up of largely government data, available to small number of people and companies.
This leads to an asymmetry of power and friction that completely undermines democracy; open data is not a sufficiency to counteract that, but I think it is a requirement.

It’s possible I’ve misunderstood Longo’s article – and he is just across the strait at the University of Victoria, so hopefully we can grab a beer and talk it through. But my sense is this article is much more about a political battle between New Public Management and Digital Era Governance in which open data is being used as a pawn. As an advocate, I’m not wholly comfortable with that, as I think it risks misrepresenting open data.

DataBC Hackathon this Saturday – inviting the public.

This Saturday, August 27, 2011, the Province of British Columbia is partnering with the Mozilla Foundation and OpenDataBC to host an open data hackathon.

The hackathon will be taking place at Mozilla Labs Vancouver. Their address is:
163 West Hastings Street, Suite 200
Vancouver, BC V6B 1H5
(in the very beautiful Flack Building)

So three things:

First, as many of you are probably aware, the province recently launched a data portal, so there is a lot of new data to play with. In addition, the City of Vancouver continues to update its open data portal, so there is new data there as well. It will be interesting to see what people want to work on.

Second, please do not fret if you are not a developer. As we pointed out last year in the lead up to the open data day international hackathon, there are lots of ways non-developers can contribute – the easiest being… having ideas! So please come on by. I’m definitely going to be there (and I’m no coder) and look forward to seeing familiar and new faces.

Finally, and more to the point, if you are a company, non-profit, or citizen who has a mashup, analysis, research paper, product, app or pretty much anything else you’d like to create but need data from the province, definitely swing by. I’m sure the staff on hand will be very keen to hear about what you want to do and see if they can make the data available in the near future.

The organizers are hoping that people will RSVP through their contact form (use the ‘other’ subject line), but if you decide last minute to come join, don’t be shy.

Hope to see you there!


Open Source Data Journalism – Happening now at BuzzData

(there is a section on this topic focused on governments below)

A hint of how social data could change journalism

Anyone who’s heard me speak in the last 6 months knows I’m excited about BuzzData. This week, while still in limited access beta, the site is showing hints of its potential – and it still has only a few hundred users.

First, what is BuzzData? It’s a website that allows data to be easily uploaded and shared among any number of users. (For hackers – it’s essentially GitHub for data, but more social.) It makes it easy for people to copy data sets, tinker with them, share the results back with the original master, and mash them up with other data sets, all while engaging with those who care about that data set.

So, what happened? Why is any of this interesting? And what does it have to do with journalism?

Exactly a month ago Svetlana Kovalyova of Reuters had her article – Food prices to remain high, UN warns – republished in the Globe and Mail. The piece essentially outlined that food commodity prices were easing because of local conditions in a number of regions, even as they were expected to remain high.

Someone at the Globe and Mail decided to go a step further and upload the data – the annual food price indices from 1990 to the present – onto the BuzzData site, presumably so they could play around with it. This is nothing complicated; it’s a pretty basic chart. Nonetheless, a dozen or so users started “following” the dataset and, about 11 days ago, one of them, David Joerg, asked:

The article focused on short-term price movements, but what really blew me away is: 1) how the price of all these agricultural commodities has doubled since 2003 and 2) how sugar has more than TRIPLED since 2003. I have to ask, can anyone explain WHY these prices have gone up so much faster than other prices? Is it all about the price of oil?

He then did a simple visualization of the data.

[Figure: FoodPrices – Joerg’s visualization of the annual food price indices]

In response, someone from the Globe and Mail named Mason answered:

Hi David… did you create your viz based on the data I posted? I can’t answer your question but clearly your visualization brought it to the forefront. Thanks!

But of course, in a process that mirrors what often happens in the open source community, another “follower” of the data showed up and refined the work of the original commentator. In this case, an Alexander Smith noted:

I added some oil price data to this visualization. As you can see the lines for everything except sugar seem to move more or less with the oil. It would be interesting to do a little regression on this and see how close the actual correlation is.

The first thing to note is that Smith has added data, “mashing in” the oil price per barrel. So now the data set has been made richer. In addition, his graph is quite nice, as it makes the correlation more visible than the graph by Joerg, which only referenced the Oil Price Index. It also becomes apparent, looking at this chart, how much of an outlier sugar really is.

[Figure: oilandfood – Smith’s visualization of the food price indices with oil price data added]

Perhaps some regression is required, but Smith’s graph is pretty compelling. What’s more interesting is that the price of oil is never once mentioned in the article as a driver of food commodity prices. So maybe it’s not relevant. But maybe it deserves more investigation – and a significantly better piece, one that would provide better information to the public, could be written in the future. In either case, this discussion, conducted by non-experts simply looking at the data, helped surface some interesting leads.
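For anyone who wants to take Smith up on the regression idea, the first pass is only a few lines – here assuming the merged dataset has been exported from BuzzData as a CSV, with file and column names that are entirely hypothetical:

    # Rough sketch of the correlation Smith suggests: how well does the
    # oil price track each food price index? Column names are hypothetical.
    import pandas as pd

    df = pd.read_csv("food_and_oil_indices.csv")
    for col in ["food", "meat", "dairy", "cereals", "sugar"]:
        r = df["oil"].corr(df[col])  # Pearson correlation coefficient
        print("%-8s r = %.2f" % (col, r))
    # If the chart is right, sugar should show a markedly lower r than
    # the other indices.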

And therein lies the power of social data.

With even only a handful of users, a deeper, better analysis of the story has taken place. Why? Because people are able to access the data and look at it directly. If you’re a follower of Julian Assange of WikiLeaks, you might call this scientific journalism. Maybe it is, maybe it isn’t, but it certainly is a much more transparent way of doing analysis – and a potential audience builder. Imagine if hundreds or thousands of readers were engaged in the data underlying a story. What would that do to the story? What would that do to journalism? With BuzzData it also becomes less difficult to imagine a data journalist who spends a significant amount of their time in BuzzData, working with a community of engaged pro-ams trying to find hidden meaning in the data they amass.

Obviously, this back and forth isn’t game changing. No smoking gun has been found. But I think it hints at a larger potential, one that it would be very interesting to see unlocked.

More than Journalism – I’m looking at you government

Of course, it isn’t just media companies that should be paying attention. For years I have argued that governments – and especially politicians – interested in open data have an unhealthy appetite for applications. They like the idea of sexy apps on smart phones enabling citizens to do cool things. To be clear, I think apps are cool too. I hope in cities and jurisdictions with open data we see more of them.

But open data isn’t just about apps. It’s about the analysis.

Imagine a city’s budget up on BuzzData. Imagine the flow rates of the water or sewage system. Or the inventory of trees. Think of how a community of interested and engaged “followers” could supplement that data, analyze it, visualize it. Maybe they would be able to explain it to others better, to find savings or potential problems, or to develop new forms of risk assessment.

It would certainly make for an interesting discussion. If 100 or even just 5 new analyses were to emerge, maybe none of them would be helpful, or would provide any insights. But I have my doubts. I suspect it would enrich the public debate.

It could be that the analysis would become as sexy as the apps. And that’s an outcome that would warm this policy wonk’s soul.

The State of Open Data Licenses in Canada and where to go from here

(for readers less interested in Open Data – I promise something different tomorrow)

In February I wrote about how 2011 would be the year of the license for Canada’s open data community. This has indeed been the case. For public servants and politicians overseeing the various open data projects happening in Canada and around the world, here is an outline of where we are and what I hope will happen next. For citizens, I hope this will serve as a primer and help explain why this matters. For non-Canadians, I hope this can help you strategize how to deal with the different levels of government in your own country.

This is important stuff, and will be key to ensuring success in the next open data challenge: aligning different jurisdictions around common standards.

Why Licenses Matter

Licenses matter because they determine how you are able to use government data – a public asset. As I outlined in the three laws of open data, data is only open if it can be found, be played with and be shared. The license deals with the last of these. If you are able to take government data, find some flaw or use it to improve a service, it means nothing if you are not able to share what you create with others. The more freedom you have in doing this, the better.

What we want from the license regime (and from your government)

There are several interests one is trying to balance in creating a license regime. You want the license to be:

  • Open: there should be maximum freedom for reuse (see above, and this blog post)
  • Secure: it offers governments appropriate protections for privacy and security
  • Simple: it keeps legal costs low and makes it easier for everyone to understand
  • Standardized: so my work is accessible across jurisdictions
  • Stable: so I know the government won’t change the rules on me

At the moment, two licenses in Canada meet these tests: the Public Domain Dedication and License (PDDL), used by Surrey, Langley and Winnipeg (for its transit data), and the BC government open data portal license (which is a copy of the UK Open Government License).

Presently, a number of licenses do not. This includes the Government of Canada Open Data Licence Agreement for Unrestricted Use of Canada’s Data (couldn’t they have chosen a better name? For a real critique, read this blog post). It also includes the variants of the license created by Vancouver and now used by Toronto, Ottawa and Edmonton (among others). Full disclosure: I was peripherally involved in the creation of this license – it was necessary at the time.

Neither of these licenses is standardized, both contain restrictions not found in the UK/BC Open Government License or the PDDL, and they are anything but simple. Nor are they stable: at any time the government can revoke them. In other words, many developers and companies interested in open data dislike them immensely.

Where do we go from here?

At the moment there are a range of licenses available in Canada – this undermines the ability of developers to create software that uses open data across multiple jurisdictions.

First, the launch of BC’s open data portal and its use of the UK Open Government License has reset the debate in this country. The Federal government, which has an awkward, onerous and unloved license should stop trying to create a new license that simply adds unnecessary complexity and creates confusion for software developers. (I detail the voluminous problems with the Federal license here.)

Instead, the Feds should adopt the UK Open Government License and push for it to become a standard, both for the provinces and federal government agencies and for other Commonwealth countries. Their refusal to adopt the UK license is deeply puzzling. They have offered no explanation of why they can’t; indeed, it would be interesting to hear what the Federal Government believes it knows that the UK government (which has been doing this for much longer) and the BC government don’t.

What I predict will happen is that more and more provinces will adopt the UK license, and the Feds will look increasingly isolated and ridiculous. Barring some explanation, this silliness should end.

At the municipal level, things are more complicated. If you look at the open data portals of Vancouver, Toronto, Edmonton and Ottawa (sometimes referred to as the G4) you’ll notice each has a similar paragraph:

The Cities of Vancouver, Edmonton, Ottawa and Toronto have recently joined forces to collaborate on an “Open Data Framework”. The project aims to enhance current open data initiatives in the areas of data standards and terms of use agreements. Please contact us for further information.

This paragraph has been sitting on these sites for well over a year now (approaching two years), but in terms of data standards and common terms of use the G4 has, to date, produced nothing tangible for end users. (Full disclosure: I have sat in on some of these meetings.) The G4 cities, which were leaders, are now languishing with a license that actually puts them in the middle, not the front, of the pack. They remain ahead of the bulk of Canadian cities that have no open data but, in terms of license, behind the aforementioned cities of Surrey, Langley and Winnipeg (for its transit data).

These second-generation open data cities either had fewer resources or drew the right lessons, and they have leap-frogged the G4 cities by adopting the PDDL – something they did because it essentially outsourced the management of the license to a competent third party. It maximized the usefulness of their data and limited their costs, all while giving them the same level of protection.

The UK and BC versions of the Open Government License could work for the cities, but the PDDL is a better license, and it is well managed. If the cities were to adopt the OGL it wouldn’t be the end of the world, but it also isn’t necessary. It probably makes more sense for them to simply follow the new leaders in the space and adopt the PDDL, as it is less restrictive and easier to adopt.

Thus, speaking personally, the ideal situation in Canada would be that:

  • the Federal and Provincial Governments adopt the UK/BC Open Government License. I’d love to live in a world where they adopted the PDDL, but my conversations with them lead me to believe this simply is not likely in the near to mid term. I think 99% of software developers out there will agree that the Open Government License is an acceptable substitute; and
  • the municipalities push to adopt the PDDL. Already several municipalities have done this and the world has not ended. The bar has been set.

The worst outcome would be:

  • the G4 municipalities invent some new license. The last thing the world needs is another open data license to confuse users and increase legal costs.
  • the federal government continues along the path of evolving its own license. Its license was born broken and is unnecessary.

Sadly, I see little evidence for optimism at the federal level. However, I’m optimistic about the cities and provinces. The fact that most new open data portals at the municipal level have adopted the PDDL suggests that many in these governments “get it”. I also think the launch of data.gov.bc.ca will spur other provinces to be intelligent about their license choice.


Province of BC launches Open Data Catalog: What works

As revealed yesterday, the province of British Columbia became the first provincial government in Canada to launch an open data portal.

It’s still early but here are some things that I think they’ve gotten right.

1. License: Getting it Right (part 1)

Before anything else, this is probably the single biggest good news story for Canadians interested in the opportunities around open data. If the license is broken, it pretty much doesn’t matter how good the data is: it essentially gets put in a legal straitjacket and cannot be used. For BC’s open data portal, happily, this is not the case.

There are actually two good news stories here.

The first is that the license is good. Obviously my preference would be for everything to be unlicensed and in the public domain, as it is in the United States. Short of that, however, the most progressive license out there is the UK Government’s Open Government License for Public Sector Information. Happily, the BC government has essentially copied it. This means that much of BC’s open data can be used for commercial purposes, political advocacy, personal use and so forth. In short, the restrictions are minimal and, I believe, acceptable. The license addresses the concerns I raised back in March when I said 2011 would be the year of open data licenses in Canada.

2. License: The Virtuous Convergence (part 2)

The other great thing is that this is a standardized license. The BC government didn’t invent something new; it copied something that already worked. This is music to the ears of many, as it means applications and analysis developed in British Columbia can be ported seamlessly to other jurisdictions that use the same license. At the moment, that means all of the United Kingdom. There has been some talk of making the UK Open Government License (OGL) a standard that can be used across the Commonwealth – that, in my mind, would be a fantastic outcome.

My hope is that this will also put pressure on other jurisdictions to improve their licenses, converge on the BC/UK license, or adopt a better license still. With the exception of the City of Surrey, which uses the PDDL, the BC government’s license is far superior to the licenses being used by other jurisdictions: the municipal licenses based on Vancouver’s (used by Vancouver, Edmonton, Ottawa, Toronto and a few others) and the Federal Government’s open data license (used by Treasury Board and CIDA) are both much more restrictive. Indeed, my real hope is that BC’s move will snap the Federal Government out of its funk, make it realize its own licenses are confusing, problematic and a waste of time, and encourage it to contribute to making the UK’s OGL a new standard for all of Canada. It would be much better than what it has on offer.

3. Tools for non-developers

Another nice thing about the data.gov.bc.ca website is that it provides tools for non-developers, so that they can play with, and learn from, some of the data. This is, of course, standard fare on most newer open data portals – indeed, it seems to be the primary focus of Socrata, a company that specializes in creating open government data portals. The goal everywhere is to increase the number of people who can make use of the data.

4. Meaty Data – Including Public Accounts

One of the charges sometimes leveled against open data portals is that they don’t publish data that is important, or that could drive substantive public policy debates. While this is not true of what has happened in the UK and the United States, the charge is probably somewhat fair in Canada. While I’m still exploring the data available on data.gov.bc.ca, one thing seems clear: there is a commitment to getting the more “high-value” data sets out to the public. For example, I’ve already noticed you can download the Consolidated Revenue Fund Detailed Schedules of Payments-FYE10-Suppliers, which details the payees who received $25,000 or more from the government in the 2009-2010 fiscal year. I also noticed that the Provincial Obstacles to Fish Passage are available for download – something I hope our friends in the environmental movement will find helpful. There is also an entire section dedicated to data on the provincial educational system; I’ll be exploring that in more detail.
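As a sense of how low the barrier to a first analysis is, here is a quick sketch, assuming the schedule is exported as a CSV – the file and column names are hypothetical:

    # First-pass look at the supplier payments schedule: who received
    # the most? File and column names are hypothetical.
    import pandas as pd

    payments = pd.read_csv("crf_payments_fye10_suppliers.csv")
    top20 = (payments.groupby("supplier")["amount"]
                     .sum()
                     .sort_values(ascending=False)
                     .head(20))
    print(top20)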

I wanted to publish this for now, and am definitely keen to hear others’ thoughts and comments on the data portal, data sets you find interesting and helpful, or anything else. If you are building an app using this data, or doing an analysis that is made easier because of the data on this site, I’d love to hear from you.

This is a big step for the province. I’m sure I’ll discover some shortcomings as I dive deeper, but this is a solid start and, I hope, an example to other provinces about what is possible.

Using Data to Make Firefox Better: A mini-case study for your organization

I love Mozilla. Any reader of this blog knows it. I believe in its mission, I find the organization totally fascinating and its processes engrossing. So much so I spend a lot of time thinking about it – and hopefully, finding ways to contribute.

I’m also a big believer in data. I believe in the power of evidence-based public policy (hence my passion about the long-form census) and in the ability of data to help organizations develop better products, and people make smarter decisions.

Happily, a few months ago I was able to merge these two passions: analyzing data in an effort to help Mozilla understand how to improve Firefox. It was fun. But more importantly, the process says a lot about the potential for innovation open to organizations that cultivate an engaged user community.

So what happened?

In November 2010, Mozilla launched a visualization competition that asked: How do People Use Firefox? As part of the competition, they shared anonymous data collected from Test Pilot users (people who agreed to share anonymous usage data with Mozilla). My friend (and quant genius) Diederik Van Liere and I analyzed the impact of add-on memory consumption on browser performance to find out which add-ons use the most memory and thus are most likely slowing down the browser (and frustrating users!). (You can read about our submission here.)
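The core of the analysis was conceptually simple – something like the following sketch, though the Test Pilot export and the column names here are simplified and hypothetical:

    # Simplified sketch of our add-ons analysis: rank add-ons by average
    # memory consumption across Test Pilot sessions. The CSV layout and
    # column names are hypothetical.
    import pandas as pd

    sessions = pd.read_csv("test_pilot_sessions.csv")
    stats = sessions.groupby("addon_name")["memory_mb"].agg(["mean", "count"])
    stats = stats[stats["count"] >= 100]  # ignore rarely observed add-ons
    print(stats.sort_values("mean", ascending=False).head(10))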

But doing the analysis wasn’t enough. We wanted Mozilla engineers to know we thought users should be shown the results – so they could make more informed choices about which add-ons they download. Our hope was to put pressure on add-on developers to make sure they weren’t ruining Firefox for their users. To do that, we visualized the data by making a mock-up of their website – with our data inserted.

[Figure: FF-memory-visualizations2.001 – mock-up of the Firefox add-ons site with memory consumption data inserted]

For our efforts, we won an honourable mention. But winning a prize is far, far less cool than actually changing behaviour. So last week, during a trip to Mozilla’s offices in Mountain View, I was thrilled when one of the engineers pointed out that the add-on site now has a page listing the add-ons that most slow down Firefox’s start-up time.

[Figure: screenshot of the “Slow Performing Add-ons” page on the Firefox add-ons site]

(Sidebar: Anyone else find it ironic that “FastestFox: Browse Faster” is #5?)

This is awesome! Better still, in April, Mozilla launched an add-on performance improvement initiative to help reduce the negative impact add-ons can have on Firefox. I have no idea if our submission to the visualization competition helped kick-start this project; I’m sure there were many smart people at Mozilla already thinking about this. Maybe it was already underway? But I like to believe our ideas helped push their thinking – or, at least, validated some of their ideas. And of course, I hope it continues to. I still believe that the above-cited data shouldn’t be hidden on a webpage well off the beaten path, but should be located right next to every add-on. That’s the best way to create the right feedback loops, and is in line with Mozilla’s manifesto – empowering users.

Some lessons (for Mozilla, companies, non-profits and governments)

First lesson. Innovation comes from everywhere. So why aren’t you tapping into it? Diederik and I are all too happy to dedicate some cycles to thinking about ways to make Firefox better. If you run an organization that has a community of interested people larger than your employee base (I’m looking at you, governments), why aren’t you finding targeted ways to engage them, not in endless brainstorming exercises, but in innovation challenges?

Second, get strategic about using data. A lot of people (including myself) talk about open data. Open data is good. But it can’t hurt to be strategic about it as well. I tried to argue for this in the government and healthcare space with this blog post. Data-driven decisions can be made in lots of places; what you need to ask yourself is: What data are you collecting about your product and processes? What, of that data, could you share, to empower your employees, users, suppliers, customers, whoever, to make better decisions? My sense is that the companies (and governments) of the future are going to be those that react both quickly and intelligently to emerging challenges and opportunities. One key to being competitive will be to have better data to inform decisions. (Again, this is the same reason why, over the next two decades, you can expect my country to start making worse and worse decisions about social policy and the economy – they simply won’t know what is going on).

Third, if you are going to share, get a data portal. In fact, Mozilla needs an open data portal (a blog post on this is coming). Mozilla has always relied on volunteer contributors to help write Firefox and submit patches for bugs. The same is true for analyzing its products and processes. An open data portal would enable more people to help find ways to keep Firefox competitive. Of course, this is also true for governments and non-profits (to help find efficiencies and new services) and for companies.

Finally, reward good behaviour. If contributors submit something you end up using… let them know! Maybe the idea Diederik and I submitted never informed anything the add-on group was doing; maybe it did. But if it did… why not let us know? We are so pumped about the work they are doing, we’d love to hear more about it. Finding out by accident seems like a lost opportunity to engage interested stakeholders. Moreover, at the time Diederik was thinking about his next steps – now he works for the Wikimedia Foundation. It made me realize how an innovation challenge could be a great way to spot talent.

It’s the icing, not the cake: key lesson on open data for governments

At the 2010 GTEC conference I did a panel with David Strigel, the Program Manager of the Citywide Data Warehouse (CityDW) at the District of Columbia Government. During the introductory remarks David recounted the history of Washington DC’s journey to open data.

Interestingly, that journey began not with open data, but with an internal problem. Back around 2003 the city had a hypothesis that towing away abandoned cars would reduce crime rates in the immediate vicinity, thereby saving more money in the long term than the cost of towing. In order to assess the program’s effectiveness, city staff needed to “mash up” longitudinal crime data against service request data – specifically, requests to remove abandoned cars. Alas, the data sets were managed by different departments, so this was a tricky task. As a result, the city’s IT department negotiated bilateral agreements with both departments to host their datasets in a single location. Thus the DC Data Warehouse was born.
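The kind of question the city was asking becomes straightforward once the two datasets sit side by side – a minimal sketch, with both table layouts invented purely for illustration:

    # Sketch of the DC-style analysis: compare crime counts near a block
    # before and after an abandoned car was towed. Datasets and column
    # names are hypothetical.
    import pandas as pd

    crimes = pd.read_csv("crime_incidents.csv", parse_dates=["date"])
    tows = pd.read_csv("abandoned_car_requests.csv", parse_dates=["resolved_date"])

    def crimes_near(block_id, start, end):
        """Count reported crimes on a block within a date window."""
        mask = ((crimes["block_id"] == block_id) &
                (crimes["date"] >= start) & (crimes["date"] < end))
        return mask.sum()

    window = pd.Timedelta(days=90)
    tows["before"] = [crimes_near(b, d - window, d)
                      for b, d in zip(tows["block_id"], tows["resolved_date"])]
    tows["after"] = [crimes_near(b, d, d + window)
                     for b, d in zip(tows["block_id"], tows["resolved_date"])]
    print(tows[["before", "after"]].mean())  # did crime fall after towing?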

Happily, the data demonstrated the program was cost-effective. Building on this success, the IT department began negotiating more bilateral agreements with other departments to host their data centrally. In return for giving up stewardship of the data, the departments retained governance rights, reduced their costs, and received additional, more advanced analytics from the IT group. Over time the city’s data warehouse became vast. As a result, when DC decided to open up its data it was, relatively speaking, easy to do. The data was centrally located and was already being shared and used as a platform internally. Extending this platform externally (while not trivial) was a natural step.

In short, the deep problem that needed to be solved wasn’t open data; it was information management. Getting the information management and governance policies right was essential for DC to move quickly. Moreover, this problem strikes at the heart of what it means to be government. Knowing what data you have, where it is, and having a governance structure that allows it to be shared internally (as well as externally) is a challenge every government is going to face if it wants to be efficient, relevant and innovative in the 21st century. In other words, information management is the cake. Open data – which I believe is essential – is the sweet icing you smother on top of that dense cake you’ve put in place.

Okay, with that said, two points flow from this.

First: Sometimes, governments that “do” open data start off by focusing on the icing. The emphasis is on getting data out there and then, after the fact, figuring out a governance model that makes sense. This is a viable strategy, but it does have real risks. When sharing data isn’t a core function but rather a feature tacked on at the end, the policy and technical infrastructure may be pretty creaky. In addition, developers may not want to innovate on top of your data platform because they may (rightly) question the level of commitment. One reason DC’s data catalog works is because it has internal users. This gives the data stability and a sense of permanence. On the upside, the icing is politically sexier, so it may help marshal resources to drive a broader rethink of data governance. Either way, at some point you’ve got to tackle the cake; otherwise, things are going to get messy. Remember, it took DC 7 years to develop its cake before it put icing on it. But that was making it from scratch. Today, thanks to new services (armies of consultants work on this), tools (e.g. Socrata) and models (e.g. Washington, DC), you can make that cake following a recipe and even use cake mix. As David Strigel pointed out, today he could do it in a fraction of the time.

Second: more darkly, one lesson to draw from DC is that a government’s capacity to do open data may be a pretty good proxy for its ability to share information and coordinate across different departments. If your government can’t do open data in a relatively quick time period, it may mean it simply doesn’t have the infrastructure in place to share data internally all that effectively either. In a world where government productivity needs to rise in order to deal with budget deficits, that could be worrying.

Lots of Open Data Action in Canada

A lot of movement on the open data (and not so open data) front in Canada.

Canadian International Development Agency (CIDA) Open Data Portal Launched

Some readers may remember that last week I wrote a post about the imminent launch of CIDA’s open data portal. The site is now live and has a healthy amount of data on it. It is a solid start to what I hope will become a robust site. I’m a big believer – and a supporter of the excellent advocacy efforts of the good people at Engineers Without Borders – in the idea that the open data portal would be greatly enhanced if CIDA started publishing its data in compliance with the emerging international standard of the International Aid Transparency Initiative, as these 20 leading countries and organizations have.

If anyone creates anything using this data, I’d love to see it. One simple start might be to use the Open Knowledge Foundation’s open source Where Does My Money Go code to visualize some of the spending data. I’d be happy to chat with anyone interested in doing this; you can also check out the email group to find people experienced in playing with the code base.
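Even short of the full Where Does My Money Go stack, a first visualization of the spending data is only a few lines – a sketch, with the file and column names below being hypothetical:

    # Quick look at CIDA spending by sector. File and column names are
    # hypothetical.
    import pandas as pd
    import matplotlib.pyplot as plt

    spending = pd.read_csv("cida_projects.csv")
    by_sector = spending.groupby("sector")["amount"].sum().sort_values()
    by_sector.plot(kind="barh", title="CIDA spending by sector")
    plt.tight_layout()
    plt.savefig("cida_by_sector.png")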

Improved License on the CIDA open data portal and data.gc.ca

One thing I noticed with the launch of the CIDA open data portal was that its license was remarkably better than the license at data.gc.ca – which struck me as odd, since I know the feds like to be consistent about these types of things. It turns out the data.gc.ca license has been updated as well, and the two are identical. This is good news, as some of the issues with the previous license have been fixed. But not all. The best license out there remains the license at data.gov (that’s a trick, because data.gov has no license – it is all public domain! Tricky eh…? Nice!), but if you are going to have a license, the UK Open Government License used at data.gov.uk is more elegant, freer and satisfies a number of the concerns I cite above and have heard people raise.

So this new data.gc.ca license is a step in the right direction, but still behind the open gov leaders (teaching lawyers new tricks sadly takes a long time, especially in government).

Great site, but not so open data: Wellbeing Toronto

Interestingly, the City of Toronto has launched a fabulous new website called Wellbeing Toronto. It is definitely worth checking out. The main problem, of course, is that while it is interesting to look at, the underlying data is, sadly, not open. You can’t play with the data, such as mashing it up with your own (or another jurisdiction’s) data. This is disappointing, as I believe a number of non-profits in Toronto would likely find the underlying data quite helpful/important. I have, however, been told that the underlying data will be made open. It is something I hope to check in on again in a few months, as I fear it may never get prioritized, so it may be up to Torontonians to hold the Mayor and council’s feet to the fire to ensure it gets done.

Parliamentary Budget Office (PBO) launches (non-open) data website

It seems the PBO is also getting in on the data action with the launch of a beta site that allows you to “see” budgets from the last few years. I know that the Parliamentary Budget Office has been starved of resources, so they deserve to be congratulated for taking this first, important step. Also interesting is that the data has no license on the website, which could make it the most liberally licensed open data portal in the country. The site does have big downsides, though. First, the data can only be “looked” at; there is no obvious (simple) way to download it and start playing with it. More oddly still, the PBO requires that users register with their email address to view the data. This seems beyond odd – downright creepy, actually. First, parliament’s budget should be free and open, and one should not need to hand over an email address to access it. Second, the email addresses collected appear to serve no purpose (unless the PBO intends to start spamming us), other than to tempt bad people to hack the site so they can steal a list of email addresses.

Mind. Prepare to be blown away. Big Data, Wikipedia and Government.

Okay, super psyched about this. Back at the Strata Conference in February (in San Diego) I introduced my long-time uber-quant friend, and now Wikimedia Foundation data scientist, Diederik Van Liere to fellow Gov2.0 thinker Nicholas Gruen (Chairman) and to Anthony Goldbloom (Founder and CEO) of an awesome new company called Kaggle.

As usually happens when awesome people get together… awesomeness ensued. Mind. Be prepared to be blown.

So first, what is Kaggle? It’s a company that helps organizations post their data and run competitions, with the goal of having that data scrutinized by the world’s best data scientists towards some specific end. Perhaps the most powerful example of a Kaggle competition to date was its HIV prediction competition, in which contestants were asked to use a data set to find markers in the HIV sequence that predict a change in the severity of the infection (as measured by viral load and CD4 counts).

Until Kaggle showed up, the best science to date had a prediction rate of 70% – a feat that had taken years to achieve. In 90 days, contributors to the contest were able to achieve a prediction rate of 77% – a 10% improvement. I’m told that achieving a similar increment had previously taken something close to a decade. (Data geeks can read how the winner did it here and here.)

Diederik and Anthony have created a similar competition, but this time using Wikipedia participation data. As the competition page outlines:

This competition challenges data-mining experts to build a predictive model that predicts the number of edits an editor will make in the five months after the end date of the training dataset. The dataset is randomly sampled from the English Wikipedia dataset from the period January 2001 – August 2010.

The objective of this competition is to quantitatively understand what factors determine editing behavior. We hope to be able to answer questions, using these predictive models, why people stop editing or increase their pace of editing.

This is, of course, a subject matter that is dear to me, as I’m hoping we can do similar analysis in open source communities – something Diederik and I have tried to theorize about with Wikipedia and actually do with Bugzilla data.
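For a sense of what a first-cut entry might look like, a common baseline in contests like this is pure persistence: assume an editor’s next five months look like their last five. Everything below – including the training file layout – is a hypothetical sketch, not an actual entry:

    # Naive baseline for the Wikipedia participation challenge: predict
    # an editor's edits over the next five months from their edits over
    # the last five. The training file layout is hypothetical.
    import pandas as pd

    edits = pd.read_csv("training.csv", parse_dates=["timestamp"])
    cutoff = edits["timestamp"].max() - pd.DateOffset(months=5)
    recent = edits[edits["timestamp"] >= cutoff]

    prediction = recent.groupby("editor_id").size().rename("predicted_edits")
    prediction.to_csv("submission.csv", header=True)
    # Serious entries improve on this by modelling decay in activity,
    # editor tenure, reverts and so on.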

There is a grand prize of $5000 (along with a few others) and, amazingly, already 15 participants and 7 submissions.

Finally, I hope public policy geeks, government officials and politicians are paying attention. There is power in data, and an opportunity to use it to find efficiencies and opportunities. Most governments probably don’t even know how to approach an organization like Kaggle or run a competition like this, despite (or because?) it being so fast, efficient and effective.

It shouldn’t be this way.

If you are in government (or any org), check out Kaggle. Watch. Learn. There is huge opportunity here.

12:10pm PST – UPDATE: More Michael Bay-sized awesomeness. Within 36 hours of the Wikipedia challenge being launched, the leading submission has improved on internal Wikimedia Foundation models by 32.4%.