How Dirty is Your Data? Greenpeace Wants the Cloud to be Greener

My friends over at Greenpeace recently published an interesting report entitled “How Dirty is Your Data? A Look at the Energy Choices That Power Cloud Computing.”

For those who think that cloud computing is an environmentally friendly business, let’s just say… it’s not without its problems.

What’s most interesting is the huge opportunity the cloud presents for changing the energy sector – especially in developing economies. Consider the following factoids from the report:

  • Data centres to house the explosion of virtual information currently consume 1.5-2% of all global electricity; this is growing at a rate of 12% a year.
  • The IT industry points to cloud computing as the new, green model for our IT infrastructure needs, but few companies provide data that would allow us to objectively evaluate these claims.
  • The technologies of the 21st century are still largely powered by the dirty coal power of the past, with over half of the companies rated herein relying on coal for between 50% and 80% of their energy needs.

The 12% growth rate is astounding: at that pace, data centre electricity consumption roughly doubles every six years. It essentially makes data centres the fastest growing segment in the energy business – so the choices these companies make around how they power their server farms will dictate what the energy industry invests in. If they are content with coal – we’ll burn more coal. If they demand renewables, we’ll end up investing in renewables, and that’s what will end up powering not just server farms, but lots of things. It’s a powerful position big data and the cloud hold in the energy marketplace.

And of course, the report notes that many companies say many of the right things:

“Our main goal at Facebook is to help make the world more open and transparent. We believe that if we want to lead the world in this direction, then we must set an example by running our service in this way.”

– Mark Zuckerberg

But then Facebook is patently not transparent about where its energy comes from, so it is not easy to assess how good or bad they are, or how they are trending.

Indeed it is worth looking at Greenpeace’s Clean Cloud report card to see – just how dirty is your data?

[Image: Greenpeace Clean Cloud report card]

I’d love to see a session at the upcoming (or next year’s) Strata Big Data Conference on, say, “How to Use Big Data to Make Big Data Greener.” Maybe even a competition to that effect, if there were some data that could be shared? Or maybe just a session where Greenpeace could present its research and engage the community.

Just a thought. Big data has got some big responsibilities on its shoulders when it comes to the environment. It would be great to see them engage on it.

Lessons for Open Source Communities: Making Bug Tracking More Efficient

This post is a discussion about making bug tracking in Bugzilla for the Mozilla project more efficient. However, I believe it is applicable to any open source project or even companies or governments running service desks (think 311).

Almost exactly a year ago I wrote a blog post titled Some thoughts on improving Bugzilla, in which I made several suggestions for improving the workflow in Bugzilla. Happily, a number of those ideas have been implemented.

One, however, remains outstanding and, I believe, creates an unnecessary amount of triage work as well as a terrible experience for end users. My understanding is that while the bug could not be resolved last year for a few reasons, there is growing interest (exemplified originally in the comment field of my original post) in tackling it once again. This is my attempt at a rallying cry to get that process moving.

For those who are already keen on this idea and don’t want to read anything more below: this refers to bug 444302.

The Challenge: Dealing with Support Requests that Arrive in Bugzilla

I first had this idea last summer while talking to the triage team at the Mozilla Summit. These are the guys who look at the firehose of bugs being submitted to Mozilla every day. They have a finite amount of time, so anything we can do to automate their work is going to help them, and the project, out significantly.

Presently, I’m told that Mozilla gets a huge number of bugs submitted that are not actually bugs, but support issues. This creates several challenges.

First, it means that support related issues, as opposed to real problems with the software, are clogging up the bug tracking system. This increases the amount of noise in the system – making it harder for everyone to find the information they need.

Second, it means the triage team has to spend time filtering bugs that are actually support issues. Not a good use of their time.

Third, it means that users who have real support issues but submit them accidentally through Bugzilla get a terrible experience.

This last one is a real problem. If you are a user, feeling frustrated (and possibly not behaving as your usual rational self – we’ve all been there) because your software is not working the way you expect, and you then submit what a triage person considers a support issue (resolved as INVALID), you get an email that looks like this:


If I’m already cheesed that my software isn’t doing what I want, getting an email that says “Invalid” and “Verified” is really going to cheese me off. That of course presumes I even know what this email means. More likely, I’ll be thinking that some ancient machine in the bowels of Mozilla, using software created in the late 1990s, received my plea and has, in its 640K confusion, spammed me. (I mean, look at it… from a user’s perspective!)

The Proposal: Re-Automating the Process for a Better Result

Step 1: My sense is that this issue – especially problem #3 – could be resolved by simply creating a new resolution. I’ve opted to call it “Support” but am happy to name it something else.

This feels like a simple fix, and it would quickly move a lot of the bugs that are cluttering up Bugzilla… out.

Step 2: Query the text of bugs marked “support” against Mozilla’s database. Then insert the results in an email that goes back to the user. I’m imagining something that might look like this:

[Image: mock-up of a SUMO support hand-off email]

Such an email has several advantages:

First, if these are users who’ve submitted inappropriate bugs and who really need support, sending them a Bugzilla email isn’t going to help them; they aren’t even going to know how to read it.

Second, there is an opportunity to explain to them where they should go for help – I haven’t done that explicitly enough in this email – but you get the idea.

Third, because we’ve done a query of the Mozilla support database (SUMO), we are able to include some support articles that might resolve their issue.

Fourth, if this really is a bug from a more sophisticated user, we give them a hyperlink back to Bugzilla so they can make a note or comment.

What I like about this is it is customized engagement at a low cost. More importantly, it helps unclutter things while also making us more responsive and creating a better experience for users.

Next Steps:

It’s my understanding that this is all pretty doable. After last year’s post there were several helpful comments, including this one from Bugzilla expert Gervase Markham:

The best way to implement this would be a field on SUMO where you paste a bug number, and it reaches out, downloads the Bugzilla information using the Bugzilla API, and creates a new SUMO entry using it. It then goes back and uses the API to automatically resolve the Bugzilla bug – either as SUPPORT, if we have that new resolution, or INVALID, or MOVED (which is a resolution Bugzilla has had in the past for bugs moved elsewhere), or something else.

The SUMO end could then send them a custom email, and it could include hyperlinks to appropriate articles if the SUMO engine thought there were any.
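To make the workflow Gerv describes a little more concrete, here is a rough sketch of what the hand-off might look like. It assumes a modern Bugzilla REST endpoint, a hypothetical SUMO search URL, and the proposed SUPPORT resolution – it is meant to illustrate the flow, not to represent the actual Mozilla tooling:

```python
import requests

BUGZILLA = "https://bugzilla.mozilla.org/rest"            # Bugzilla REST API (assumed available)
SUMO_SEARCH = "https://support.mozilla.org/api/2/search"  # hypothetical SUMO search endpoint
API_KEY = "YOUR-BUGZILLA-API-KEY"                         # needs rights to edit bugs

def migrate_to_support(bug_id):
    """Pull a misfiled bug, find related support articles, and resolve the bug."""
    # 1. Fetch the bug's summary from Bugzilla.
    bug = requests.get(f"{BUGZILLA}/bug/{bug_id}").json()["bugs"][0]
    query = bug["summary"]

    # 2. Search the SUMO knowledge base for articles that might answer the question.
    results = requests.get(SUMO_SEARCH, params={"q": query}).json().get("results", [])
    links = "\n".join(f"- {a['title']}: {a['url']}" for a in results[:3])

    # 3. Build the friendly email described above (actual sending omitted here).
    email_body = (
        f'Thanks for your report about "{query}".\n'
        "It looks like a support question rather than a bug, so here are some "
        f"articles that may help:\n{links}\n"
        "If you believe this really is a bug, you can add a comment at "
        f"https://bugzilla.mozilla.org/show_bug.cgi?id={bug_id}"
    )

    # 4. Resolve the bug: SUPPORT if that resolution exists, otherwise INVALID or MOVED.
    requests.put(
        f"{BUGZILLA}/bug/{bug_id}",
        params={"api_key": API_KEY},
        json={"status": "RESOLVED", "resolution": "SUPPORT"},
    )
    return email_body
```

The point is that everything involved is already exposed through APIs, so the triage team’s share of the work stays as small as setting a resolution.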

And Tyler Downer noted in this comment that there may be a dependency bug (#577561) that would also need resolving:

Gerv, I love you point 3. Exactly what I had in mind, have SUMO pull the relevant data from the bug report (we just need BMO to autodetect firefox version numbers, bug 577561 ;) and then it should have most of the required data. That would save the user so much time and remove a major time barrier. They think “I just filed a bug, now they want me to start a forum thread?” If it does it automatically, the user would be so much better served.

So, if there is interest in doing this, let me know. I’m happy to support any discussion, whether it takes place in the comment stream of the bug, the comments below, or somewhere else that might be helpful (maybe I should dial in on this call?). Regardless, this feels like a quick win, one that would better serve Mozilla users, teach them (over time) to go to the right place for support, and improve the Bugzilla workflow. It might be worth implementing even for a bit, and we can assess any positive or negative feedback after six months.

Let me know how I can help.

Additional Resources

Bug 444302: Provide a means to migrate support issues that are misfiled as bugs over to the support.mozilla.com forums.

My previous post: Some thoughts on improving Bugzilla. The comments are worth checking out.

Mozilla’s Bugzilla Wiki Page

How the WSJ's former owners could REALLY screw Rupert Murdoch

When the News of the World scandal began to really explode at the beginning of the month some intrepid reporter went and tracked down members of the Bancroft family – the former owners of the Wall Street Journal – and asked them if they regretted selling their controlling stock of the newspaper to Rupert Murdoch’s News Corporation.

Many did.

Since then, there has been some talk about the recourse available to the Bancrofts with an emphasis on a toothless special committee – which is supposed to ensure editorial independence – and how it could create some headaches for News Corp. I doubt it will matter.

But if the Bancrofts really do care about the Journal – or if a sub-segment of them do – there is something much more powerful they could do to screw Murdoch.

Offer to buy it back.

Remember, News Corporation paid $60 per share to the Bancrofts for their stake in Dow Jones & Company (the publisher of the Wall Street Journal). This represented an enormous 67% premium, or $2.24 billion, over the market valuation. In 2009, a mere 14 months later, News Corp wrote down the value of the purchase by almost half, accepting a loss of $2.8 billion. In other words, the value of Dow Jones is now back to, or even below, the $35 a share it was at when the Bancrofts sold it.

Why not offer to buy it back at a theoretical valuation of $40 a share? The Bancrofts (or the members of the family that want to) certainly have some of the capital. They would essentially be using Murdoch’s own money to regain control of the WSJ at two-thirds the price they were paid for it. Clearly they would need to find other investors to be part of the group, but they could form the core of a new investor group.

Of course, you would say: Rupert Murdoch would never sell his crown jewel. But that’s the fun of it.

This week’s Economist references Nomura investment analyst Michael Nathanson’s assessment of News Corporation after the scandal:

Michael Nathanson, an analyst at Nomura, separated News Corporation into three hypothetical companies: a good one, based on television; a bad one, which makes films; and a downright toxic one, which runs newspapers. He suggests investors focus on the former.

An offer by the Bancrofts would force Murdoch to tell investors what type of company he intends to run. Refusing to sell the Wall Street Journal could confirm investors’ worst fears that Murdoch intends to cling to his newspaper empire. Worse, he would have to do this at a moment when defending that option is most difficult for him. It could further weaken him as CEO in the eyes of investors and potentially speed up efforts to replace him. So even if the Bancrofts were rebuffed, it would allow them to extract some revenge for how Murdoch treated the Journal after their departure. On the flip side, if accepted, the Bancrofts would recapture their prized asset at a fraction of what they were paid for it.

Of course, in The Man Who Owns the News, biographer Michael Wolff hardly paints the Bancrofts as a united group. Quite the opposite. So I’ll admit the above scenario does not feel all that likely: the Bancroft capital is no longer sufficiently concentrated. But it would have made for an interesting power play to observe.

Depression and Decline: American Irresponsibility is Ending the American Era with a Bang

Despite the assurances of US Treasury Secretary Timothy Geithner, it is increasingly likely there will be no debt deal. The United States is going to default on its debt. I know it sounds crazy, but I believe it is going to happen. If it does, this is the black swan event no one imagined or was prepared to contemplate. Its impacts are going to be significant. Possibly immeasurable.

For history, August 2nd, 2011 could end up marking the end of the American Era. Sadly, it will not have been inevitable; it will have been entirely self-inflicted, and it may now be irreversible. Even if an agreement is reached tomorrow, I suspect the world will increasingly be unwilling to entrust the role of global financial system caretaker to the United States. The world has lost faith in America. And why not? Its Congress has demonstrated that it can no longer be trusted with the responsibility of global financial management. Indeed, even its closest allies have had their confidence shaken.

The economic and geopolitical ramifications of this outcome cannot be overstated.

Economically, we may now be closer to a global depression than at any time since the 1930s. For all the talk of the financial crisis being a near miss, this could potentially be much, much worse, simply because the consequences fall outside our predictive models.

What is clear is that America is trapped. In the short term, spending less will devastate its population. Today more Americans (18.1%) than ever use food stamps. It takes American workers 40 weeks (and rising) to find a job, twice as long as in any previous recession. 1 in every 6 Americans uses Medicaid. Any cuts to these services will have an immediate and harsh effect on the quality of life of a huge number of Americans.

Longer term, America cannot restart its economy. Already the top 5% of Americans by income account for 37% of all consumer outlays. This is unsurprising given the top 5% of Americans account for 34.7% of all income. This is similar to 1929, when the top 5% accounted for roughly a third of all personal income. This is precisely the type of economic structure that John Kenneth Galbraith argues, in The Great Crash, 1929, transformed the great crash into the great depression. Rather than being able to rely on a broad consumer base to power economic growth, the United States then (as now) was dependent on a high level of investment and luxury consumer spending driven by a small elite. The crash caused that elite to seize up, leaving the American economy paralyzed.

In other words, the Bush tax cuts may have killed the US economically, and possibly geopolitically. By killing the surpluses they have broken the US Treasury. By radically curtailing wealth redistribution they have fatally eroded the capacity of the US domestic economy to power new growth. Combine this with two wars that have sapped trillions of taxpayer dollars, and it is hard not to see a United States more ill-prepared than at any time in its history to deal with an economic crisis. The only question that may remain is how much of the rest of the world it drags down with it.

Of course economic decline could become a leading indicator for political decline.

When I arrived at grad school in 1998 to study international relations, the field had spent much of the previous decade grappling with the issue of American decline. Books like The Rise and Fall of the Great Powers and Lester Thurow’s Head to Head seemed to suggest that, economically and militarily, the United States was in, at the very least, relative decline as the world’s leading power.

But then the successes of the US economy – coupled with the turnaround in the size of the US government’s debt – meant that, as a peer, China felt a long way off, while Brazil and India seemed more distant still. Europe was too old, disorganized and unambitious to matter. Russia was fading quickly from the scene. Suddenly, decline theory was itself in decline.

But today the writings of Kennedy feel even more urgent. America, with or without a raised debt ceiling, cannot afford its empire, or the means to protect it. It may be able to find allies to help shoulder the burden – today the central challenge of 21st century geopolitics is the integration of India into the Western Alliance, something that proceeds apace. But if it defaults (and maybe even if it does not), its capacity to raise money at a reasonable rate, should a major conflict arise, may be compromised. War, for America, is going to get more expensive because investors may be more nervous.

I want to clearly state that I don’t write any of this with any glee. Leftish non-Americans who relish a world without US hegemony should look at the period after Britain’s decline, or any other period of hegemonic decline. They generally aren’t pretty. Indeed, they are often unstable, violent and nasty. Not something any country should wish for, especially smaller countries (such as my own – Canada). Moreover, while there is no immediate peer that could take America’s place, it isn’t clear that the most likely candidate – China – is one that most people would feel more comfortable with. Be careful what you wish for.

I hope that I’m wrong. I hope a deal will be reached. And that if it is, or if it isn’t, the impact on the markets will be minimal or non-existent. Or maybe I just need to have more confidence in what I often tell others: do not underestimate America. As Sir Winston Churchill famously noted: “Americans can always be counted on to do the right thing… after they have exhausted all other possibilities.” And maybe they’ll have enough time to boot.

But I genuinely fear that in the haze of summer this crisis, as much as it has spurred some scary headlines, remains a sleeper. That we are confronting the mother of all black swans, and that a period of financial turmoil that will make the last two years look like a merry ride could be upon us. Worse, that that financial turmoil will lead to other, greater military and/or political turmoil.

These are scary times.

I can honestly say I have never written a blog post that I hope I’m more wrong about.

Update: The Atlantic has a great article worth reading about the origins of the deficit, published later this morning, that includes a reference to this fantastic graph from a few months ago.

Why I’m Struggling with Google+

So it’s been a couple of weeks since Google+ launched and I’ll be honest, I’m really struggling with the service. I wanted to give it a few weeks before writing anything, which has been helpful in letting my thinking mature.

First, before my Google friends get upset, I want to acknowledge the reason I’m struggling has more to do with me than with Google+. My sense is that Google+ is designed to manage personal networks. In terms of social networking, the priority, like at Facebook, is on a soft version of the word “social,” e.g. making the experience friendly and social, not necessarily efficient.

And I’m less interested in the personal experience than in the learning/professional/exchanging experience. Mark Jones, the global communities editor for Reuters, completely nailed what drives my social networking experience in a recent Economist special on the news industry: “The audience isn’t on Twitter, but the news is on Twitter.” Exactly! That’s why I’m on Twitter. Cause that’s where the news is. It is where the thought leaders are interacting and engaging one another. Which is a very different activity than socializing. And I want to be part of all that. Getting intellectually stimulated and engaged – and maybe even, occasionally, shaping ideas.

And that’s what threw me initially about Google+. Because of where I’m coming from, I (like many people) initially focused on sharing updates, which invited comparing Google+ to Twitter, not Facebook. That was a mistake.

But if Google+ is about being social above all else, it is going to be more like Facebook than Twitter. And therein lies the problem. As a directory, I love Facebook. It is great for finding people, checking up on their profile and seeing what they are up to. For some people it is good for socializing. But as a medium for sharing information… I hate Facebook. I so rarely use it, it’s hard to remember the last time I checked my stream intentionally.

So I’m willing to accept that part of the problem is me. But I’m sure I’m not alone, so if you are like me, let me try to further break down why I (and maybe you too) am struggling.

Too much of the wrong information, too little of the right information.

The first problem with Google+ and Facebook is that they have both too much of the wrong information, and too little of the right information.

What do I mean by too much of the wrong? What I love about Twitter is its 140 character limit. Indeed, I’m terrified to read over at Mathew Ingram’s blog that some people are questioning this limit. I agree with Mathew: changing Twitter’s 140 character limit is a dumb idea. Why? For the same reason I thought it made sense back in March of 2009, before Google+ was even a thought:

What I love about Twitter is that it forces writers to be concise. Really concise. This in turn maximizes efficiency for readers. What is it Mark Twain said? “I didn’t have time to write a short letter, so I wrote a long one instead.” Rather than having one, or even thousands, of readers read something that is excessively long, the lone drafter must take the time and energy to make it short. This saves lots of people time and energy. By saying what you’ve got to say in 140 characters, you may work more, but everybody saves.

On the other hand, while I want a constraint over how much information each person can transmit, I want to be able to view my groups (or circles) of people as I please.

Consider the screen shot of TweetDeck below. Look how much information is being displayed in a coherent manner (of my choosing). It takes me maybe, maybe 30-60 seconds to scan all this. In one swoop I see what friends are up to, some of my favourite thought leaders, some columnists I respect… it is super fast and efficient. Even on my phone, switching between these columns is a breeze.

[Image: TweetDeck screenshot showing multiple Twitter columns]

But now look at Google+. There are comments under each item… but I’m not sure I really care to see them. Rather than the efficient stream of content I want, I essentially have a stream of content I didn’t ask for. Worse, I can see, what, maybe 2-5 items per screen, and of course I can’t see multiple circles on a single screen.

[Image: Google+ stream screenshot]

Obviously, some of this is because Google+ doesn’t have any applications to display it in alternative forms. I find the Twitter homepage equally hard to use. So some of this could be fixed if (and hopefully when) Google makes its Google+ API public.

But it can’t solve some underlying problems. Because an item can be almost as long as the author wants, and there can be comments, Google+ doesn’t benefit from Twitter’s 140 character limit. As one friend put it, rather than looking at a stream of content, I’m looking at a blog in which everybody I know is a writer submitting content and in which an indefinite number of comments may appear. I’ll be honest: that’s not really a blog I’m interested in reading. Not because I don’t like the individual authors, but because it’s simply too much information, shared inefficiently.

Management Costs are too high

And herein lies the second problem. The management costs of Google+ are too high.

I get why “circles” can help solve some of the problems outlined above. But, as others have written, it creates a set of management costs that I really can’t be bothered with. Indeed this is the same reason Facebook is essentially broken for me.

One of the great things about Twitter is that it’s simple to manage: follow or don’t follow. I love that I don’t need people’s permission to follow them. At the same time, I understand that this is not ideal for managing divergent social groups. A lot of people live lives much more private than mine or want to be able to share just among distinct, small groups of friends. When I want to do this, I go to email… that’s because the groups in my life are always shifting and it’s simple to just pick the email addresses. Managing circles and keeping track of them feels challenging for personal use. So Google+ ends up taking too much time to manage, which is, of course, also true of Facebook…

Using circles to manage for professional reasons makes way more sense. That is essentially what I’ve got with Twitter lists. The downside here is that re-creating these lists is a huge pain.

And now one unfair reason with some insight attached

Okay, so going to the Google+ website is a pain, and I’m sure it will be fixed. But presently my main Google account is centered around my eaves.ca address, and Google+ won’t work with Google Apps accounts, so I have to keep flipping to a Gmail account I loathe using. That’s annoying but not a deal breaker. The bigger problem is my Google+ social network is now attached to an email account I don’t use. Worse, it isn’t clear I’ll ever be able to migrate it over.

My Google experience is Balkanizing and it doesn’t feel good.

Indeed, this hits on a larger theme: Early on, I often felt that one of the promises of Google was that it was going to give me more opportunities to tinker (like what Microsoft often offers in its products), but at the same time offer a seamless integrated operating environment (like what Apple, despite or because of their control freak evilness, does so well). But increasingly, I feel the things I use in Google are fractured and disconnected. It’s not the end of the world, but it feels less than what I was hoping for, or what the Google brand promise suggested. But then, this is what everybody says Larry Page is trying to fix.

And finally a bonus fair reason that’s got me ticked

Now I also have a reason for actively disliking Google+.

After scanning my address book and social network, it asked me if I wanted to add Tim O’Reilly to a circle. I follow Tim as a thought leader on Twitter, so naturally I thought – let’s get his thoughts via Google+ as well. It turns out, however, that Tim does not have a Google+ account. Later, when I decided to post something, a default setting I failed to notice sent emails to everyone in my circles without a Google+ account. So now I’m inadvertently spamming Tim O’Reilly, who, frankly, doesn’t need to get crap spam emails from me or anyone. I’m feeling bad for him because I suspect I’m not the only one doing it. He’s got 1.5 million followers on Twitter. That could be a lot of spam.

My fault? Definitely in part. But I think there’s a chunk of blame that can be heaped onto a crappy UI that wanted that outcome. In short: uncool, and not really aligned with the Google brand promise.

In the end…

I remember initially, I didn’t get Twitter; after first trying it briefly I gave up for a few months. It was only after the second round that it grabbed me and I found the value. Today I’m struggling with Google+, but maybe in a few months, it will all crystallize for me.

What I get is that it is an improvement on Facebook, which seems to be becoming the new AOL – a sort of gardened-off internet that is still connected but doesn’t really want you off in the wilds having fun. Does Google+ risk doing the same to Google? I don’t know. But at least circles are clearly a much better organizing system than anything Facebook has on offer (which I’ve really failed to get into). It’s far more flexible and easier to set up. But these features, and their benefits, are still not sufficient to overcome the cost of setting it up and maintaining it…

Ultimately, if everybody moves, I’ll adapt, but I way prefer the simplicity of Twitter. If I had my druthers, I’d just post everything to Twitter and have it auto-post over to Google+ and/or Facebook as well.

But I don’t think that will happen. My guess is that for socially driven users (e.g. the majority of people) the network effects probably keep them at Facebook. And does Google+ have enough features to pull the more alpha type user away? I’m not sure. I’m not seeing it yet.

But I hope they try, as a little more competition in the social networking space might be good for everyone, especially when it comes to privacy and crazy end-user agreements.

The State of Open Data Licenses in Canada and where to go from here

(for readers less interested in Open Data – I promise something different tomorrow)

In February I wrote how 2011 would be the year of the license for Canada’s open data community. This has indeed been the case. For public servants and politicians overseeing the various open data projects happening in Canada and around the world, here is an outline of where we are, and what I hope will happen next. For citizens I hope this will serve as a primer and help explain why this matters. For non-Canadians, I hope this can help you strategize how to deal with the different levels of government in your own country.

This is important stuff, and it will be essential to ensuring success in the next open data challenge: aligning different jurisdictions around common standards.

Why Licenses Matter

Licenses matter because they determine how you are able to use government data – a public asset. As I outlined in the three laws of open data, data is only open if it can be found, be played with and be shared. The license deals with the last of these. If you are able to take government data, find some flaw or use it to improve a service, it means nothing if you are not able to share what you create with others. The more freedom you have in doing this, the better.

What we want from the license regime (and for your government)

There are several interests one is trying to balance in creating a license regime. You want it to be:

  • Open: there should be maximum freedom for reuse (see above, and this blog post)
  • Secure: it offers governments appropriate protections for privacy and security
  • Simple: to keep legal costs low and make it easier for everyone to understand
  • Standardized: so my work is accessible across jurisdictions
  • Stable: so I know that the government won’t change the rules on me

At the moment, two licenses in Canada meet these tests: the Public Domain Dedication and License (PDDL), used by Surrey, Langley and Winnipeg (for its transit data), and the BC government open data portal license (which is a copy of the UK Open Government Licence).

Presently a bunch of licenses do not. This includes the Government of Canada Open Data Licence Agreement for Unrestricted Use of Canada’s Data (couldn’t they have chosen a better name? For a real critique of its problems, read this blog post). It also includes the variants of the license created by Vancouver and now used by Toronto, Ottawa and Edmonton (among others). Full disclosure: I was peripherally involved in the creation of this license – it was necessary at the time.

Neither of these licenses is standardized; both have restrictions not found in the UK/BC Open Government Licence or the PDDL, and they are anything but simple. Nor are they stable: at any time the government can revoke them. In other words, many developers and companies interested in open data dislike them immensely.

Where do we go from here?

At the moment there are a range of licenses available in Canada – this undermines the ability of developers to create software that uses open data across multiple jurisdictions.

First, the launch of BC’s open data portal and its use of the UK Open Government Licence has reset the debate in this country. The Federal government, which has an awkward, onerous and unloved license, should stop trying to create a new license that simply adds unnecessary complexity and creates confusion for software developers. (I detail the voluminous problems with the Federal license here.)

Instead the Feds should adopt the UK Open Government Licence and push for it to be a standard, both for the provinces and federal government agencies, as well as for other Commonwealth countries. Their refusal to adopt the UK license is deeply puzzling. They have offered no explanation of why they can’t; indeed, it would be interesting to hear what the Federal Government believes it knows that the UK government (which has been doing this for much longer) and the BC government don’t.

What I predict will happen is that more and more provinces will adopt the UK license and increasingly the Feds will look isolated and ridiculous. Barring some explanation, this silliness should end.

At the municipal level, things are more complicated. If you look at the open data portals of Vancouver, Toronto, Edmonton and Ottawa (sometimes referred to as the G4) you’ll notice each has a similar paragraph:

The Cities of Vancouver, Edmonton, Ottawa and Toronto have recently joined forces to collaborate on an “Open Data Framework”. The project aims to enhance current open data initiatives in the areas of data standards and terms of use agreements. Please contact us for further information.

This paragraph has been sitting on these sites for well over a year now (approaching almost two years), but in terms of data standards and common terms of use the G4 has, to date, produced nothing tangible for end users. (Full disclosure: I have sat in on some of these meetings.) The G4 cities, which were leaders, are now languishing with a license that actually puts them in the middle, not the front, of the pack. They remain ahead of the bulk of Canadian cities that have no open data but, in terms of license, behind the aforementioned cities of Surrey, Langley and Winnipeg (for its transit data).

These second-generation open data cities either had fewer resources or drew the right lessons, and have leap-frogged the G4 cities by adopting the PDDL – something they did because it essentially outsourced the management of the license to a competent third party. It maximized the effectiveness of their data while limiting their costs, all while giving them the same level of protection.

The UK and BC versions of the Open Government Licence could work for the cities, but the PDDL is a better license. Also, it is well managed. If the cities were to adopt the OGL it wouldn’t be the end of the world, but it also isn’t necessary. It probably makes more sense for them to simply follow the new leaders in the space and adopt the PDDL, as it is less restrictive and easier to adopt.

Thus, speaking personally, the ideal situation in Canada would be that:

  • the Federal and Provincial Governments adopt the UK/BC Open Government Licence. I’d love to live in a world where they adopted the PDDL, but my conversations with them lead me to believe this simply is not likely in the near to mid term. I think 99% of software developers out there will agree that the Open Government Licence is an acceptable substitute; and
  • the municipalities push to adopt the PDDL. Already several municipalities have done this and the world has not ended. The bar has been set.

The worst outcome would be:

  • the G4 municipalities invent some new license. The last thing the world needs is another open data license to confuse users and increase legal costs.
  • the federal government continues along the path of evolving its own license. Its license was born broken and is unnecessary.

Sadly, I see little evidence for optimism at the federal level. However, I’m optimistic about the cities and provinces. The fact that most new open data portals at the municipal level have adopted the PDDL suggests that many in these governments “get it”. I also think the launch of data.gov.bc.ca will spur other provinces to be intelligent about their license choice.

Province of BC launches Open Data Catalog: What works

As revealed yesterday, the province of British Columbia became the first provincial government in Canada to launch an open data portal.

It’s still early but here are some things that I think they’ve gotten right.

1. License: Getting it Right (part 1)

Before anything else happens, this is probably the single biggest good news story for Canadians interested in the opportunities around open data. If the license is broken, it pretty much doesn’t matter how good the data is; it essentially gets put in a legal straitjacket and cannot be used. For BC’s open data portal this, happily, is not the case.

There are actually two good news stories here.

The first is that the license is good. Obviously my preference would be for everything to be unlicensed and in the public domain, as it is in the United States. Short of that, however, the most progressive license out there is the UK Government’s Open Government Licence for public sector information. Happily, the BC government has essentially copied it. This means that much of BC’s open data can be used for commercial purposes, political advocacy, personal use and so forth. In short, the restrictions are minimal and, I believe, acceptable. The license addresses the concerns I raised back in March when I said 2011 would be the year of open data licenses in Canada.

2. License: The Virtuous Convergence (part 2)

The other great thing is that this is a standardized license. The BC government didn’t invent something new; they copied something that already worked. This is music to the ears of many, as it means applications and analysis developed in British Columbia can be ported seamlessly to other jurisdictions that use the same license. At the moment, that means all of the United Kingdom. There has been some talk of making the UK Open Government Licence (OGL) a standard that can be used across the Commonwealth – that, in my mind, would be a fantastic outcome.

My hope is that this will also put pressure on other jurisdictions to either improve their licenses, converge them with BC/UK, or adopt a better license still. With the exception of the City of Surrey, which uses the PDDL, the BC government’s license is far superior to the licenses being used by other jurisdictions: the municipal licenses based on Vancouver’s license (used by Vancouver, Edmonton, Ottawa, Toronto and a few others) and the Federal Government’s open data license (used by Treasury Board and CIDA) are both much more restrictive. Indeed, my real hope is that BC’s move will snap the Federal Government out of its funk, make it realize its own licenses are confusing, problematic and a waste of time, and encourage it to contribute to making the UK’s OGL a new standard for all of Canada. It would be much better than what it has on offer.

3. Tools for non-developers

Another nice thing about the data.gov.bc.ca website is that it provides tools for non-developers, so that they can play with, and learn from, some of the data. This is, of course, standard fare on most newer open data portals – indeed, it seems to be the primary focus of Socrata, a company that specializes in creating open government data portals. The goal everywhere is to increase the number of people who can make use of the data.

4. Meaty Data – Including Public Accounts

One of the charges sometimes leveled against open data portals is that they don’t publish data that is important, or that could drive substantive public policy debates. While this is not true of what has happened in the UK and the United States, that charge is probably somewhat fair in Canada. While I’m still exploring the data available on data.gov.bc.ca, one thing seems clear: there is a commitment to getting the more “high-value” data sets out to the public. For example, I’ve already noticed you can download the Consolidated Revenue Fund Detailed Schedules of Payments-FYE10-Suppliers, which for the fiscal year 2009-2010 details the payees who received $25,000 or more from the government. I also noticed that the Provincial Obstacles to Fish Passage are available for download – something I hope our friends in the environmental movement will find helpful. There is also an entire section dedicated to data on the provincial educational system; I’ll be exploring that in more detail.

I wanted to publish this for now; I’m definitely keen to hear others’ thoughts and comments on the data portal, data sets you find interesting and helpful, or anything else. If you are building an app using this data, or doing an analysis that is made easier because of the data on this site, I’d love to hear from you.

This is a big step for the province. I’m sure I’ll discover some shortcomings as I dive deeper, but this is a solid start and, I hope, an example to other provinces about what is possible.

Using Data to Make Firefox Better: A mini-case study for your organization

I love Mozilla. Any reader of this blog knows it. I believe in its mission, I find the organization totally fascinating and its processes engrossing. So much so I spend a lot of time thinking about it – and hopefully, finding ways to contribute.

I’m also a big believer in data. I believe in the power of evidence-based public policy (hence my passion about the long-form census) and in the ability of data to help organizations develop better products, and people make smarter decisions.

Happily, a few months ago I was able to merge these two passions: analyzing data in an effort to help Mozilla understand how to improve Firefox. It was fun. But more importantly, the process says a lot about the potential for innovation open to organizations that cultivate an engaged user community.

So what happened?

In November 2010, Mozilla launched a visualization competition that asked: How do People Use Firefox? As part of the competition, they shared anonymous data collected from Test Pilot users (people who agreed to share anonymous usage data with Mozilla). Working with my friend (and quant genius) Diederik Van Liere, we analyzed the impact of add-on memory consumption on browser performance to find out which add-ons use the most memory and thus are most likely slowing down the browser (and frustrating users!). (You can read about our submission here).
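The heart of the analysis was straightforward aggregation: group the Test Pilot samples by add-on and compare average memory footprints and startup costs. A rough sketch of that kind of calculation is below; the file and column names are invented for illustration and do not reflect the real Test Pilot schema:

```python
import pandas as pd

# Hypothetical Test Pilot export: one row per sample of a running browser session.
# File and column names are illustrative, not the real Test Pilot schema.
samples = pd.read_csv("testpilot_addon_samples.csv")
# expected columns: addon_name, memory_mb, startup_ms

# Average memory footprint and startup cost per add-on, worst offenders first.
by_addon = (
    samples.groupby("addon_name")[["memory_mb", "startup_ms"]]
    .mean()
    .sort_values("memory_mb", ascending=False)
)

# The ten add-ons most likely to be slowing the browser down.
print(by_addon.head(10))
```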

But doing the analysis wasn’t enough. We wanted Mozilla engineers to know we thought that users should be shown the results – so they could make more informed choices about which add-ons they download. Our hope was to put pressure on add-on developers to make sure they weren’t ruining Firefox for their users. To do that we visualized the data by making a mock-up of their website – with our data inserted.

[Image: mock-up of the Firefox add-ons site with memory-use data inserted]

For our efforts, we won an honourable mention. But winning a prize is far, far less cool than actually changing behaviour or encouraging an actual change. So last week, during a trip to Mozilla’s offices in Mountain View, I was thrilled when one of the engineers pointed out that the add-on site now has a page where they list add-ons that most slow down Firefox’s start up time.

[Image: Slow Performing Add-ons page on the Firefox add-ons site]

(Sidebar: Anyone else find it ironic that “FastestFox: Browse Faster” is #5?)

This is awesome! Better still, in April, Mozilla launched an add-on performance improvement initiative to help reduce the negative impact add-ons can have on Firefox. I have no idea if our submission to the visualization competition helped kick-start this project; I’m sure there were many smart people at Mozilla already thinking about this. Maybe it was already underway? But I like to believe our ideas helped push their thinking – or, at least, validated some of their ideas. And of course, I hope it continues to. I still believe that the above-cited data shouldn’t be hidden on a webpage well off the beaten path, but should be located right next to every add-on. That’s the best way to create the right feedback loops, and is in line with Mozilla’s manifesto – empowering users.

Some lessons (for Mozilla, companies, non-profits and governments)

First lesson. Innovation comes from everywhere. So why aren’t you tapping into it? Diederik and I are all too happy to dedicate some cycles to thinking about ways to make Firefox better. If you run an organization that has a community of interested people larger than your employee base (I’m looking at you, governments), why aren’t you finding targeted ways to engage them, not in endless brainstorming exercises, but in innovation challenges?

Second, get strategic about using data. A lot of people (including myself) talk about open data. Open data is good. But it can’t hurt to be strategic about it as well. I tried to argue for this in the government and healthcare space with this blog post. Data-driven decisions can be made in lots of places; what you need to ask yourself is: What data are you collecting about your product and processes? What, of that data, could you share, to empower your employees, users, suppliers, customers, whoever, to make better decisions? My sense is that the companies (and governments) of the future are going to be those that react both quickly and intelligently to emerging challenges and opportunities. One key to being competitive will be to have better data to inform decisions. (Again, this is the same reason why, over the next two decades, you can expect my country to start making worse and worse decisions about social policy and the economy – they simply won’t know what is going on).

Third, if you are going to share, get a data portal. In fact, Mozilla needs an open data portal (a blog post on that is coming). Mozilla has always relied on volunteer contributors to help write Firefox and submit patches to bugs. The same is true for analyzing its products and processes. An open data portal would enable more people to help find ways to keep Firefox competitive. Of course, this is also true for governments and non-profits (to help find efficiencies and new services) and for companies.

Finally, reward good behaviour. If contributors submit something you end up using… let them know! Maybe the idea Diederik and I submitted never informed anything the add-on group was doing; maybe it did. But if it did… why not let us know? We are so pumped about the work they are doing, we’d love to hear more about it. Finding out by accident seems like a lost opportunity to engage interested stakeholders. Moreover, back at the time, Diederik was thinking about his next steps – now he works for the Wikimedia Foundation. But it made me realize how an innovation challenge could be a great way to spot talent.

The Audacity of Shaw: How Canada's Internet just got Worse

It is really, really, really hard to believe. But as bad as internet access is in Canada, it just got worse.

Yesterday, Shaw Communications, a Canadian telecommunications company and internet service provider (ISP) that works mostly in Western Canada, announced it is launching Movie Club, a new service to compete with Netflix.

On the surface this sounds like a good thing. More offerings should mean more competition, more choice and lower prices. All things that would benefit consumers.

Look only slightly closer and you learn the very opposite is going on.

This is because, as the article points out:

“…subscribers to Movie Club — who initially can watch on their TV or computer, with phones and tablets planned to come on line later — can view content without it counting against their data plan.

“There should be some advantage to you being a customer,” Bissonnette said.”

The very reason the internet has been such an amazing part of our lives is that every service that is delivered on it is treated equally. You don’t pay more to look at the Vancouver Sun’s website than you do to look at eaves.ca or CNN or to any other website in the world. For policy and technology geeks this principle of equality of access is referred to as net neutrality. The idea is that ISPs (like Shaw) should not restrict or give favourable access to content, sites, or services on the internet.

But this is precisely what Shaw is doing with its new service.

This is because ISPs in Canada charge what are called “overages.” This means if you use the internet a lot, say you watch a lot of videos, at a certain point you will exceed a “cap” and Shaw charges you extra, beyond your fixed monthly fee. If, for example, you use Netflix (which is awesome and cheap, for $8 a month you get unlimited access to a huge quantity of content) you will obviously be watching a large number of videos, and the likelihood of exceeding the cap is quite high.

What Shaw has announced is that if you use their service – Movie Club – none of the videos you watch will count against your cap. In other words they are favouring their service over that of others.

So why should you care? Because, in short, Shaw is making the internet suck. It wants to turn your internet from the awesome experience where you have unlimited choice and can try any service that is out there into the experience of cable, where your choice is limited to the channels they choose to offer you. Today they’ll favour their movie service over (the much better) Netflix service. But tomorrow they may decide… hey, you are using Skype instead of our telephone service; people who use “our Skype” will get cheaper access than people who use Skype. Shaw is effectively applying a tax on new, innovative and disruptively cheap services on the internet so that you don’t use them. They are determining – through pricing – what you can and cannot do with your computer, while elsewhere in the world people will be using cool new disruptive services that give them better access to more fun content, for cheaper. Welcome to the sucky world of Canada’s internet.

Doubling down on Audacity: The Timing

Of course what makes this all the more obscene is that Shaw has announced this service at the very moment the CRTC – the body that regulates Canada’s internet service providers – is holding hearings on usage-based billing. One of the reasons Canada’s internet providers say they have to charge “overages” for those who use the internet a lot is that there isn’t enough bandwidth. But how is it that there is enough bandwidth for their own services?

As Steve Anderson of OpenMedia – a consumer advocacy group – shared with me yesterday: “It’s a huge abuse of power,” and “The launch of this service at the time when the CRTC is holding a hearing on pricing regulation should be seen as a slap in the face to the CRTC, and the four hundred and ninety one thousand Canadians that signed the Stop The Meter petition.”

My own feeling is the solution is pretty simple. We need to get the ISPs out of the business of delivering content. Period. Their job should be to deliver bandwidth, and nothing else. You do that, you’ll have them competing over speed and price very, very quickly. Until then the incentive of ISPs isn’t to offer good internet service, it’s to do the opposite, it’s to encourage (or force) users to use the services they offer over the internet.

For myself, I’m a Shaw customer and a Netflix customer. Until now I’ve had nothing to complain about with either. Now, apparently, I have to choose between the two. I can tell you right now who is going to win. Over the next few months I’m going to be moving my internet service to another provider. Maybe I’ll still get cable TV from Shaw, I don’t know, but my internet service is going to a company that gives me the freedom to choose the services I want and that doesn’t ding me with fees that, apparently, I’m being charged under false pretenses. I’ll be telling my family members, friends and pretty much everyone I know to do the same.

Shaw, I’m sorry it had to end this way. But as a consumer, it’s the only responsible thing to do.

It's the icing, not the cake: key lesson on open data for governments

At the 2010 GTEC conference I did a panel with David Strigel, the Program Manager of the Citywide Data Warehouse (CityDW) at the District of Columbia Government. During the introductory remarks David recounted the history of Washington DC’s journey to open data.

Interestingly, that journey began not with open data, but with an internal problem. Back around 2003 the city had a hypothesis that towing away abandoned cars would reduce crime rates in the immediate vicinity, thereby saving more money in the long term than the cost of towing. In order to assess the program’s effectiveness, city staff needed to “mash up” longitudinal crime data against service request data – specifically, requests to remove abandoned cars. Alas, the data sets were managed by different departments, so this was a tricky task. As a result the city’s IT department negotiated bilateral agreements with both departments to host their datasets in a single location. Thus the DC Data Warehouse was born.
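For those curious about what that kind of mash-up actually involves once the datasets sit side by side, here is a rough sketch. The file names, column names and 90-day window are all invented for illustration; DC’s actual schema and method were surely different:

```python
import pandas as pd

# Hypothetical extracts from the shared warehouse; file, column and window choices
# are invented for illustration and are not DC's actual schema or method.
crimes = pd.read_csv("crime_incidents.csv", parse_dates=["date"])          # block_id, date
tows = pd.read_csv("abandoned_car_requests.csv", parse_dates=["closed"])   # block_id, closed

def crime_delta(row, window=pd.Timedelta(days=90)):
    """Crimes on the same block in the 90 days after a tow, minus the 90 days before."""
    nearby = crimes[crimes.block_id == row.block_id]
    before = ((nearby.date >= row.closed - window) & (nearby.date < row.closed)).sum()
    after = ((nearby.date >= row.closed) & (nearby.date < row.closed + window)).sum()
    return after - before

tows["crime_change"] = tows.apply(crime_delta, axis=1)

# A negative average suggests towing abandoned cars is associated with less nearby crime.
print(tows["crime_change"].mean())
```

The analytics themselves are trivial; the hard part, as the rest of this post argues, was getting two departments’ data into one place under a governance model that allowed the join at all.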

Happily, the data demonstrated the program was cost effective. Building on this success, the IT department began negotiating more bilateral agreements with different departments to host their data centrally. In return for giving up stewardship of the data, the departments retained governance rights, reduced their costs, and received additional, more advanced analytics from the IT group. Over time the city’s data warehouse became vast. As a result, when DC decided to open up its data it was, relatively speaking, easy to do. The data was centrally located and was already being shared and used as a platform internally. Extending this platform externally (while not trivial) was a natural step.

In short, the deep problem that needed to be solved wasn’t open data. It was information management. Getting the information management and governance policies right was essential for DC to move quickly. Moreover, this problem strikes at the heart of what it means to be government. Knowing what data you have, where it is, and having it under a governance structure that allows it to be shared internally (as well as externally) is a problem every government is going to face if it wants to be efficient, relevant and innovative in the 21st century. In other words, information management is the cake. Open data – which I believe is essential – is the sweet icing you smother on top of that dense cake you’ve put in place.

Okay, with that said, two points flow from this.

First: Sometimes, governments that “do” open data start off by focusing on the icing. The emphasis is on getting data out there and then, after the fact, figuring out a governance model that will make sense. This is a viable strategy, but it does have real risks. When sharing data isn’t a core function but rather a feature tacked on at the end, the policy and technical infrastructure may be pretty creaky. In addition, developers may not want to innovate on top of your data platform because they may (rightly) question the level of commitment. One reason DC’s data catalog works is because it has internal users. This gives the data stability and a sense of permanence. On the upside, the icing is politically sexier, so it may help marshal resources to help drive a broader rethink of data governance. Either way, at some point, you’ve got to tackle the cake; otherwise, things are going to get messy. Remember, it took DC seven years to develop its cake before it put icing on it. But that was making it from scratch. Today, thanks to new services (there are armies of consultants on this), tools (e.g. Socrata) and models (e.g. Washington, DC) you can make that cake following a recipe and even use cake mix. As David Strigel pointed out, today he could do it in a fraction of the time.

Second: More darkly, one lesson to draw from DC is that the capacity of a government to do open data may be a pretty good proxy for its ability to share information and coordinate across different departments. If your government can’t do open data in a relatively quick time period, it may mean it simply doesn’t have the infrastructure in place to share data internally all that effectively either. In a world where government productivity needs to rise in order to deal with budget deficits, that could be worrying.