Open Data Advice for Librarian Coders

For the purposes of this post we will ignore how remarkably badass-sounding the word librarian becomes when “coder” is added to the end of it.

I recently had a librarian who had just picked up some coding skills email me and ask how they could get into Open Data. (If, by the way, you are interested in someone with those skills, I’d be happy to connect you.) Here is an edited version of my response:

I actually think that librarians are exactly what the open data space needs. It is interesting to think about the consulting services and organizational problems that exist in a world of almost infinite data – and particularly in a world with large amounts of open data.

Open Data portals are getting more and more data sets – but I’m not sure anyone has meaningfully figured out how to organize all that data. Or, at least, how to make it searchable in a way that works for a broad cross-section of people, as opposed to a narrow, highly specialized type of user (e.g. an academic or domain expert). Nor has anyone – as far as I can tell – cracked the problem (or developed the human hacks necessary) around acquiring and organizing metadata: actually getting it, and then making it understandable to a range of audiences without losing its nuance.
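
To make the metadata problem concrete, here is a minimal sketch of the kind of record a portal needs for every single data set. The field names are illustrative (loosely inspired by vocabularies like DCAT), not any particular portal’s schema:

```python
# A hypothetical metadata record for one data set on an open data portal.
# Field names are illustrative, loosely inspired by vocabularies like DCAT.
dataset = {
    "title": "Street trees",
    "description": "Location and species of every city-maintained tree.",
    "publisher": "Parks Department",
    "update_frequency": "annual",
    "keywords": ["trees", "urban forestry", "parks"],  # what an expert searches for
    "plain_language_tags": ["shade", "my street"],     # what a resident searches for
    "caveats": "Excludes trees on private land; species field ~90% complete.",
}
```

Almost all of the hard work is in the last three fields – making the same data findable by experts and residents alike without losing its nuance – which is exactly the kind of work librarians are trained to do.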

Certainly when I facilitated consultations on open data in a range of cities across Canada the issue of searchability and finding data came up over and over again. It was a very strong theme.

These are serious problems. And they are probably going to get much, much worse as more data becomes available.

Open Contracting Workshop in Montreal on Friday

This was sent to me by Michael Roberts who has been doing great work in the Open Data and Open Standards space in Canada. Please check this out:

What: Open Contracting Data Standard: Stakeholder Workshop

When: 31st January 2014, 9am – 3pm

Where: Hilton Montreal Bonaventure 

Hashtag (always required): #opencontractingdata

Register at: http://bit.ly/ocds-stakeholders

About the Open Contracting Data Standards project:

Over the course of 2014 the Open Contracting Partnership (OCP), the World Wide Web Foundation and a wide range of stakeholders will be working to develop the first version of a global open data standard for publishing information on public contracts. The OCP believes that increased disclosure and participation in public contracting will make contracting more competitive and fair, will improve contract performance, and will better secure development outcomes. The development of an open contracting data standard is a vital step in joining up data across sectors and silos, allowing it to become truly socially useful. It will result in increased transparency around contracting processes and will empower citizens to hold governments to account.

Stakeholder workshop:

We invite you to join us on 31st January 2014 for a stakeholder workshop to:

  • Explore the goals and potential for an Open Contracting Data Standard;
  • Identify opportunities for involvement in the development of the standard;
  • Shape core activities on the 2014 standard development road map;

Outline agenda (tbc):

  • 8.30am – 9.00am: Welcome and coffee
  • 9.00am – 10.30am: The Open Contracting Road Map
    Including the history of the Open Contracting initiative; an introduction to the data standard project; and an exploration of data standard development so far.
  • 10.30am – 10.45am: Coffee break
  • 10.45am – 11.45am: Shaping the vision & identifying stakeholders
    Participative small group discussions focussed on outlining short and long-term visions for an open contracting data standard, and identifying the roles for key stakeholders to play in the development of the standard.
  • 11.45am – 12.30pm: Ways of working and key issues: an open development approach
    Introducing the collaborative tools available for engaging with the development of the data standard, and identifying the key issues to be addressed in the coming months (the basis for task-groups in the afternoon).
  • 12.30pm – 1.30pm: Lunch
  • 1.30pm – 3.00pm: Task groups
    Small group work on specific issues, including the future governance of a standard, shared identifiers (e.g. organisational identifiers), data formats, demand side workshops, supply side research and related standards.

Who is the meeting for?

This meeting is designed to provide an opportunity for anyone interested in the Open Contracting Data Standard work to learn more about it. There will be discussions tailored to both providers of contracting data and users of that data, as well as discussions focussed on connecting the Open Contracting Data Standard with related open data projects, including IATI, Open Spending and open data on companies.

Open Data Day Google+ Hang Out

With just about a month to go until Open Data Day, things are going well. Quite a few cities have already been added to the open data day wiki.

This year we thought we would try something new: on January 21st we are going to host a “Get Ready For Open Data Day 2014!” Google hangout.

The goal of the hangout is to help people thinking about organizing an event in their city get a sense of what others are doing, ask questions about what has worked in the past, and just learn more about what is possible on the day.

We’ll be hosting a 30-60 minute event on Tuesday, January 21 (at 11:00am EST / 8:00am PST / 16:00 GMT / 17:00 CET) with myself, Heather Leson and Beatrice Martini, focused on:

  1. What is Open Data Day – History
  2. Planning tips
  3. Open Q&A

There is likely a limit to how many people we can host on the hangout so please let us know if you’d like to participate.

And if you are interested in connecting with others – especially those who have run open data day events before – please consider joining the mailing list!

Santa Claus, Big Data and Asymmetric Learning

Any sufficiently advanced technology is indistinguishable from magic.

– Arthur C. Clarke’s Third Law of Prediction

This Christmas I had a wonderfully simple experience of why asymmetric rates of learning matter so much, and a simple way to explain it to friends and colleagues.

I have a young son. This is his first Christmas where he’s really aware of the whole Christmas thing: that there is a Santa Claus, there is a tree, people are being extra nice to one another. He’s loving it.

Naturally, part of the ritual is a trip to visit Santa Claus and so the other day, he embarked on his first visit with the big guy. Here’s a short version of the transcript:

Santa: “Hello Alec, would you like to talk to Santa?”

Alec: (with somewhat shy smile…) “Yes.” 

Santa: “So Alec, do you like choo-choo trains?”

Alec: (smile, eyes wide) “Yes.”

Santa: “Do you like Thomas the choo-choo train?”

Alec: (practically giggling, eyes super wide) “Yes!”

Reflecting on this conversation, all I can think is… no wonder kids believe in Santa Claus. I mean, here’s Alec’s first interaction with Santa ever and the guy, with no prompting, knows his name, knows one of his favourite things in the world, and even a specific type of toy related to that thing. Combine this with an appealing user interface (costumes, Christmas theme, and visitors treated like they’re very special) and of course it appears to actually be like magic. Alec must have thought Santa knew him personally.

It is easy to pretend this is something that only happens to two-year-olds, but pretty much everyone I’ve met has, at one point, felt that LinkedIn or Facebook was uncanny (for better or worse) in predicting a preference or connection.

The reality is that there is just a massive asymmetry in learning. Most kids will have a single “Santa” interaction (or learning opportunity) a season – at most they’ll have two or three. A Santa in a shopping mall, meanwhile, is going to meet thousands of kids (and thus have thousands of learning opportunities). Between these interactions and some basic research, any Santa worth his salt is going to pick up pretty quickly on what boys and girls tend to like, and develop quickly testable hypotheses about what any given kid wants. Compared to the kids he meets, Santa is swimming in a world of big data: lots of interactions and learning opportunities that allow him to appear magically aware.
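
For the curious, the dynamic is easy to simulate. Below is a minimal sketch – the toy categories and probabilities are entirely invented – of a “Santa” who mostly leads with whichever opening guess has worked best so far:

```python
import random

# Invented toy categories and the (hidden) share of kids who love each.
# Listed worst-first so the guesser starts with no useful prior.
TOYS = {"dinosaurs": 0.10, "blocks": 0.20, "dolls": 0.30, "trains": 0.40}

def santa_hit_rate(n_visits: int, explore: float = 0.1) -> float:
    """Fraction of visits where the opening guess delights the child.
    Epsilon-greedy: usually lead with the guess that has worked best
    so far, occasionally try something else to keep learning."""
    wins = {toy: 0 for toy in TOYS}
    tries = {toy: 1 for toy in TOYS}  # start at 1 to avoid division by zero
    hits = 0
    for _ in range(n_visits):
        if random.random() < explore:
            guess = random.choice(list(TOYS))
        else:
            guess = max(TOYS, key=lambda t: wins[t] / tries[t])
        tries[guess] += 1
        if random.random() < TOYS[guess]:  # this child loves the guessed toy
            wins[guess] += 1
            hits += 1
    return hits / n_visits

# One or two visits per season vs. a mall Santa's thousands.
print(f"a kid's sample (n=3):     {santa_hit_rate(3):.2f}")
print(f"a mall Santa (n=10,000):  {santa_hit_rate(10_000):.2f}")
```

With a handful of visits the guesser barely beats its worst opening line; with thousands it converges on the most popular toy. Nothing magical – just more learning opportunities.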

The point here is that, as users, it is important for us to remember these things are not magic, but are driven by some pretty understandable tools. In addition, while we can’t always diminish the asymmetry in the rate of learning between users and the services they use, just understanding the dynamic can help demystify a service in a way that can be empowering to the user. I’m not going to ruin the magic of Santa’s tricks with my son, at least not for a few years – but I’m quite happy (and feel we are collectively responsible) to do so when it comes to online services.

Finally, it does show the power that increasing the rate of transactions can have on how quickly a system can learn. I’m spending more and more time trying to think about how systems – be they governments, non-profits or companies – can capture interactions and learning moments to help them become more effective, without pretending that it is magical – or infantilizing their users.

Open Data for Development Challenge on Jan 27-28

This just came across my email via Michael Roberts who has been doing great work in this space.


Open Data for Development Challenge
January 27–28, 2014 — Montreal, Canada

Do you want to share your creative ideas and cutting-edge expertise, and make a difference in the world?
Do you want to help Canadians and the world understand how development aid is spent and what its impact is?
Do you want to be challenged and have fun at the same time?

If so, take the Open Data for Development Challenge!

This unique 36-hour “codathon” organized by Foreign Affairs, Trade and Development Canada will bring together Canadian and international technical experts and policy makers to generate new tools and ideas in the fields of open data and aid transparency and contribute to innovative solutions to the world’s pressing development challenges.

The event will feature keynote speakers Aleem Walji, Director of the World Bank’s Innovation Labs, and Mark Surman, Executive Director of the Mozilla Foundation. It will have two related dimensions:

  • Technical challenges that involve building applications to make existing open aid and development-related data more useful. Proposed topics include building a data viewer compatible with multilingual data, creating a publishing tool suitable for use by mid-sized Canadian non-profit organizations, developing and testing applications for open contracting, and taking a deep dive into the procurement data of the World Bank Group. There is room for challenges proposed by the community. Proposals should be submitted through the event website no later than January 8th. Challenges will be published prior to the event, along with key datasets and other related information, to enable participants to prepare for the event.
  • Policy discussions on how open data and open government can enable development results. This would include the use of big data in development programming, the innovative ways in which data can be mapped and visualized for development, and the impact of open data on developing countries.

The international aid transparency community will be encouraged to take promising tools and ideas from the event forward for further research and development.

An overview of the draft program is attached. The event will be in English and French, with interpretation provided in the plenary sessions and panel discussions.

We invite you to register, at no cost, at this website as soon as possible and no later than January 10. A message confirming your registration and providing additional information about the venue and accommodation will be sent to confirmed participants. Please wait for this confirmation before making any travel arrangements. Participants are asked to make their own accommodation arrangements. A limited number of guest rooms will be available to event participants at a preferential rate.

To find out more about the Open Data for Development Challenge, please go to DFATD’s website.

Open Data Day 2014 is Coming Feb 22 – Time to Join the Fun!

So, with much help from various community members (who reminded me that we need to get this rolling – looking at you, Heather Leson), I’m pleased to say we are starting to gear up for Open Data Day 2014 on February 22nd.

From its humble beginnings as a conversation between a few friends interested in promoting and playing with open data, Open Data Day grew last year to locally organized events in over 100 cities around the world. Check out this video of last year’s open data day in Kathmandu.

What makes Open Data Day work? Mostly you. It is a global excuse for people in communities like yours to come together and organize an event that meets their needs. Whether that is a hackathon, a showcase and fair, lectures, workshops for local NGOs and businesses, training on data, or meetings with local politicians – people are free to organize around whatever they think their community needs. You can read more about how Open Data Day works on our website.

Want to join in on the fun? I thought you’d never ask. Listed below are some different ways you can help make Open Data Day 2014 a success in your community!

A) How can I let EVERYONE know about open data day?

I love the enthusiasm. Here’s a tweet you can send:

#OpenData Day is community powered in a timezone near you.  http://opendataday.org/ #ODD2014

Yes, our hashtag is #ODD2014. Cause we are odd. And cause we love open data.

B) I’d like to participate!

Great! If you are interested in participating, check out the Open Data Day wiki. We’ve just unlocked the pages so cities haven’t been added yet, but feel free to add your city to the list and put down your name as interested in participating. You can even check who organized the event last year to see if they are interested in doing it again.

C) Forget about participating, I want to coordinate an Open Data Day event in my city.

Whoa! Very exciting! Here’s a short checklist of what to do:

  • If you didn’t organize one last year, check to see if anyone in your city did. It would be good to connect with them first.
  • Read the Open Data Day website. Basically, pick up on our vibe: we want Open Data Day to work for everyone, from novices who know little about data to experts like Kaggle participants and uber geeks like Bruce Schneier. These events have always been welcoming and encouraging – it is part of the design challenge.
  • Okay, now add your city to the list, let people know where it will be taking place (or that you are working on securing space), let them know a rough agenda, what to expect, and how they can contribute.
  • Add yourself to the 2014 Open Data Day map. (Hint: Wikipedia lists lat/long in the information sidebar of each city’s wiki page: “Coordinates: 43°42′N 79°24′W” – see the conversion sketch after this list.)
  • Join the Open Data Day mailing list. Organizers tend to share best practices and tips here. It’s not serious, really just a help and support group.
  • Check out resources like this and this about how to organize a successful event.
  • Start spreading the news!
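
For anyone wiring those Wikipedia coordinates into the map, here is a minimal sketch – the parsing is my own, tuned only to the format shown in the hint above – for converting that degrees-and-minutes string into the decimal form most mapping tools expect:

```python
import re

def dms_to_decimal(coord: str) -> tuple[float, float]:
    """Parse a Wikipedia-style coordinates string such as
    "43°42′N 79°24′W" into decimal (lat, lon)."""
    pattern = re.compile(r"(\d+)°(\d+)′(?:(\d+)″)?\s*([NSEW])")
    values = []
    for deg, mins, secs, hemi in pattern.findall(coord):
        value = int(deg) + int(mins) / 60 + int(secs or 0) / 3600
        values.append(-value if hemi in "SW" else value)  # S and W are negative
    lat, lon = values
    return lat, lon

print(dms_to_decimal("43°42′N 79°24′W"))  # (43.7, -79.4) – Toronto
```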

D) I want to help more! How can Open Data Day work more smoothly everywhere?

Okay, for the truly hardcore: you’re right, we need help. Open Data Day has grown. This means we’ve outgrown a whole bunch of our infrastructure… like our webpage! Everyone involved in this is a volunteer so… we have some extra heavy lifting we need help with. This includes:

a. Website template update: The current Open Data Day template was generously donated by Mark Dunkley (thank you!!!). We’d love to have it scale a little better and to refresh the content. You can see the code on github here. Email me if you are interested. Skills required: CSS, design

b. Translation: Can you help translate the ODD site into your language? You can submit the requests on github or send a document to heather.leson at okfn dot org with the content. She’ll do the github stuff if that’s beyond you.

c. Map: Leaflet and layers helpers wanted! We’d like a map geek to help correct geolocation and keep the 2014 map fresh with accurate geo for all the locations. Github repo is here and the event list is here.
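
If it helps anyone thinking about taking this on, here is a minimal sketch of the kind of transformation involved: turning an extract of the event list into GeoJSON, the format Leaflet consumes most readily. The cities and coordinates below are placeholders, not the real 2014 list:

```python
import json

# Hypothetical extract of the event list: (city, lat, lon).
events = [
    ("Vancouver", 49.25, -123.10),
    ("Kathmandu", 27.70, 85.30),
    ("Montreal", 45.50, -73.57),
]

# Note GeoJSON's [lon, lat] coordinate order – a classic gotcha.
feature_collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"city": city},
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
        }
        for city, lat, lon in events
    ],
}

with open("odd2014.geojson", "w") as f:
    json.dump(feature_collection, f, indent=2)
```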

What’s next?

I’m really looking forward to this year… I’ve lots more thoughts I’ll be sharing shortly.

Plus, I can’t wait to hear from you!

The End of Canada Post and the Coming War for Your Mailbox

As pretty much everyone in Canada learned yesterday (and no one outside the country cares to know), Canada Post, the country’s national postal service, will phase out home mail delivery by 2019. The reason? It’s obvious. The internet has hammered mail volumes. There was 20% less mail delivered in 2012 than in 2006, and 6 percentage points of that 20% decline occurred in 2012 alone, suggesting the pace is accelerating.

First, I’m really quite happy about (the long-term implications of) the demise of home delivery. For me, Canada Post has become a state-sanctioned spamming infrastructure. When the little red dot on my mailbox rubs off (as it recently did), wanted mail makes up at most 20% of what I receive; unsolicited mail accounts for the other 80%. Indeed, the average Canadian household got 1,178 flyers in 2010 – about 22 a week. And that doesn’t even count unaddressed mail.

I shudder to think of the colossal waste of paper and energy created by the production, shipping, delivery and recycling of this vast, endlessly circulating pulp forest. All the more so given that less than 3% of it probably ever gets looked at, much less read.

The problem is, in the short term at least, things may get worse. Or at least messier. One way of thinking about this change is that your front door just got massively deregulated. I suspect a whole new level of unwanted and unsolicited mail spam is about to hit the more densely populated swaths of the country. So much so that I expect we are going to see – in fact, demand – new legislation to regulate physical spam.

Let me explain.

Up until now the cheapest way to send you spam – unsolicited mail or even just targeted advertising – was via the post person. Indeed, Canada Post has long depended on this generally unwanted mail. You may remember that in May, in one of the saddest public campaigns ever launched, Canada Post tried persuading Canadians that junk mail was good for them.

One of the big advantages of junk mail was that, however fleetingly, it ended up in your home. Shift delivery to a mailbox outside the house, however, and you get this:

[Photo: a community mailbox overflowing with discarded flyers – Toronto Star file photo]

So there are two implications of the change. The first, as the photo above testifies, is that some – maybe even many – advertisers will feel their mailers are less effective. They will, of course, actually know this, since the ROI on mailers is a pretty exact, and measured, science.

The second is that the largest player in the business of delivering pulp to people’s homes will have retreated away from… the home. That leaves a big demand to be filled by new entrants.

Thus, it is quite conceivable that Canada Post will see its junk mail volumes decline faster still. However, I suspect that while addressed mail will decline, unaddressed mail – what you and I think of as flyers – could increase. These flyers, delivered by private players, have the enormous benefit of going right to your front door, just like good old junk mail did. Oh, and the deliverers of these flyers don’t have pesky policies that stop them from delivering items to houses with signs that say “no junk mail.”

What does that mean? Well, hopefully, in the long term, junk mail proves a less and less effective means of selling things. But I suspect there will always be an advantage to shoveling 30 flyers a week onto your front stoop. So I can imagine another long-term trend. In 2010 the government passed anti-spam legislation focused on the digital form of spam. While I’m quite confident this law will have close to zero impact on digital spammers, I suspect that for a growing number of Canadians there is little difference between online and offline spam. So much so that it would not surprise me if an uptick in unsolicited flyers and mailers to people’s doors – where they no longer get actual mail – made real anti-spam legislation a political winner. Indeed, a clever opposition party, wanting to highlight the more ill-conceived elements of the government’s plan, burnish its environmental credentials and own the idea early, might even propose it.

Will we get there? I don’t know. But if we just unleashed a wave of new spam and flyers on Canadians, I hope some new tool emerges that allows Canadians to say no to unsolicited junk.

The Importance of Open Data Critiques – thoughts and context

Over at the Programmable City website Rob Kitchin has a thoughtful blog post on open data critiques. It is very much worth reading and wider discussion. Two things are worth doing in response. First, it is important for the open data community – and advocates in particular – to acknowledge the responsibility we have in debates about open data. Second, I’d like to examine some of the critiques raised and discuss those I think misfire and those that deserve deeper dives.

Open Data as Dominant Discourse

During my 2011 keynote at Open Government Data camp I talked about how the open data movement was at an inflection point:

For years we have been on the outside, yelling that open data matters. But now we are being invited inside.

Two years later the transition is more than complete. If you have any doubts, consider this picture:

[Image: world leaders discussing open data – open data as dominant discourse]

Once you have these people talking about things like a G8 Open Data Charter, you are no longer on the fringes. Not even remotely.

It also means understanding the challenges around open data has never been more important. We – open data advocates – are now complicit in what many of the above (mostly) men decide to do around open data. Hence the importance of Rob’s post. Previously, those with power were dismissive of open data – you had to scream to get their attention. Today, those same actors want to act now and go far. Point them (or the institutions they represent) in the wrong direction and/or frame an issue incorrectly and you could have a serious problem on your hands. Consequently, the responsibility of advocates has never been greater. This is even more the case as open data has spread: local variations matter, and what works in Vancouver may not always be appropriate in Nairobi or London.

I shouldn’t have to say this but I will, because it matters so much: Read the critiques. They matter. They will make you better, smarter, and above all, more responsible.

The Four Critiques – a breakdown

Reading the critiques and agreeing with them is, of course, not the same thing. Rob cites four critiques of open data: funding and sustainability, politics of the benign and empowering the empowered, utility and usability, and neoliberalisation and marketisation of public services. Some of these I think miss the real concerns and risks around open data, others represent genuine concerns that everyone should have at the forefront of their thinking. Let me briefly touch on each one.

Funding and sustainability

This one strikes me as the least effective criticism. Outside the World Bank I’ve not heard of many examples of governments effectively selling their data to make money. I would be very interested in examples to the contrary – it would make for a great list and would enlighten the discussion, although not, I suspect, in ways that would make either side of the discussion happy.

The little research that has been done into this subject suggests that charging for government data almost never yields much money, and often actually serves as a loss-creating mechanism. Indeed, a 2001 KPMG study of Canadian geospatial data found governments almost never made money from data sales if purchases by other levels of government were not included. Again in Canada, Statistics Canada argued for years that it couldn’t “afford” to make its data open (free) as it needed the revenue. However, it turned out that the annual sum generated by these sales was around $2 million – hardly a major contributor to its bottom line. And of course, this does not count the money that had to go towards salaries and systems for tracking buyers and users, chasing down invoices, etc.

The disappointing line in the critique however was this:

de Vries et al. (2011) reported that the average apps developer made only $3,000 per year from apps sales, with 80 percent of paid Android apps being downloaded fewer than 100 times.  In addition, they noted that even successful apps, such as MyCityWay which had been downloaded 40 million times, were not yet generating profits.

Ugh. First, apps are not what is going to make open data interesting or sexy. I suspect they will make up maybe 5% of the ecosystem. The real value is going to be in analysis and in enhancing other services. It may also be in the costs open data eliminates (and thus the capital and time it frees up), not in the companies it creates – something I outlined in Don’t Measure the Growth, Measure the Destruction.

Moreover, this is the internet. The average doesn’t mean anything. The average webpage probably gets 2 page views per day; that hardly means there aren’t lots of very successful webpages. The distribution is not a bell curve, it’s a long tail, so it is hard to see what the average tells us other than that the cost of experimentation is very, very low. It tells us very little about whether there are, or will be, successful uses of open data.
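
To see why, here is a toy illustration – every number below is invented – of how little the mean tells you about a long-tailed distribution:

```python
import random

random.seed(42)

# A made-up long-tailed (Pareto) distribution of "app revenue": most apps
# earn almost nothing, a handful earn a great deal.
revenues = sorted(1_000 * random.paretovariate(1.2) for _ in range(10_000))

mean = sum(revenues) / len(revenues)
median = revenues[len(revenues) // 2]
top_1pct_share = sum(revenues[-100:]) / sum(revenues)

print(f"mean revenue:    ${mean:,.0f}")
print(f"median revenue:  ${median:,.0f}")
print(f"share of all revenue held by the top 1%: {top_1pct_share:.0%}")
```

In a distribution like this the mean sits far above the median, and a sliver of apps captures a huge share of the value – so quoting the “average developer” tells you almost nothing about whether successful uses exist.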

Politics of the benign and empowering the empowered

This is the most important critique and it needs to be engaged. There are definitely cases where data can serve to further marginalize at-risk communities. In addition, there are data sets that, for reasons of security and privacy, should not be made open. I’m not interested in publishing the locations of women’s shelters or, worse, the list of families taking refuge in them. Nor do I believe that open data will always serve to challenge the status quo or create greater equality. Even at its most reductionist – if one believes that information is power, then a greater ability to access and make use of information makes one more powerful – the creation of new information will create winners and losers.

There are, however, two things that give me some hope in this space. The first is that, when it comes to open data, the axis of competition among providers usually centers on accessibility. For example, the Socrata platform (a provider of open data portals to governments) invests heavily in creating tools that make government data accessible and usable to the broadest possible audience. This is not a claim that all communities are being engaged (far from it) or that a great deal more work cannot be done, but there is a desire to show greater use, which drives some data providers to try to find ways to engage new communities.

The second is that if we want to create a data-literate society – and I think we do, for reasons of good citizenship, social justice and economic competitiveness – people need the data first, to learn and play with. One of my most popular blog posts is Learning from Libraries: The Literacy Challenge of Open Data, in which I point out that one of the best ways to help people become data literate is to give them more interesting data to play with. We didn’t build libraries after everyone knew how to read; we built them beforehand, with the goal of having them as places that could facilitate learning and education. Of course, libraries also often have strong teaching components, and we definitely need more of this. Figuring out who to engage, and how it can be done most effectively, is something I’m deeply interested in.

There are also things that often depress me. I struggle to think of technologies that did not empower the empowered – at least initially. From the cell phone to the car to the printing press to open source software, all these inventions have helped billions of people, but their benefits did not distribute themselves evenly, especially at first. So the question cannot be reduced to “will open data empower the empowered?” but to what degree, and where, and with whom. I’ve seen plenty of evidence where data has enabled small groups of people to protect their communities or make more transparent the impact (or lack thereof) of a government regulation. Open data expands the number of people who can use government information for their own ends – this, I believe, is a good thing – but that does not mean we shouldn’t be constantly looking for ways to ensure that it does not reinforce structural inequity. Achieving perfect distribution of the benefits of a new technology, or even a public policy, is almost impossible, so we cannot make the perfect the enemy of the good. But none of that hides the fact that there are real risks – and responsibilities as advocates – that need to be considered here. This is an issue that will need to be constantly engaged.

Utility and Usability

Some of the issues around usability I’ve addressed above in the accessibility piece – for some portals (that genuinely want users) the axis of evolution is pointed in the right direction, with governments and companies (like Socrata) trying to embed more tools in the website to make the data more usable.

I also agree with the central concern (not a critique) of this section, which is that rather than creating a virtuous circle, poorly thought out and poorly launched open data portals will create negative “doom loops” in which poor quality data begets little interest, which begets less data. However, the concern, in my mind, focuses on too narrow a problem.

One of the big reasons I’ve been an advocate of open data is a desire not just to help citizens, non-profits and companies gain access to information that could help them with their missions, but to change the way government deals with its data so that it can share it internally more effectively. I often cite a public servant I know who had a summer intern spend three weeks surfing the national statistical agency’s website to find data they knew existed but could not locate because of terrible design and search. A poor open data site is not just a sign that the public can’t access or effectively use government data; it usually suggests that the government’s own employees can’t access or effectively use their data either. This is deeply frustrating to many public servants.

Thus, the most important outcome created by the open data movement may have been making governments realize that data represents an asset class of which they have had little understanding (outside, sadly, the intelligence sector, which has been all too aware of this) and for which they have had little policy and governance (outside, say, the GIS space and some personal records categories). Getting governments to think about data as a platform (yes, I’m a fan of government as a platform for external use, but above all for internal use) is, in my mind, one way we can both enable public servants to get better access to information and simultaneously attack the huge vendors (like SAP and Oracle) whose $100 million implementations often silo off data, rarely produce the results promised and are so obnoxiously expensive it boggles the mind (Clay Johnson has some wonderful examples of the roughly 50% of large IT projects that fail).

The key to all this is that open data can’t be something you slap on top of a big IT stack. I try to explain this in It’s the Icing Not the Cake, another popular blog post, about why Washington DC was able to launch an effective open data program so quickly (one which was, apparently, so effective at bringing transparency to procurement data that the subsequent mayor rolled it back). The point is that governments need to start thinking in terms of platforms if – over the long term – open data is going to work. And government needs to start thinking of itself as the primary consumer of the data being served on that platform. Steve Yegge’s brilliant and sharp-witted rant on how Google doesn’t get platforms is an absolute must-read in this regard for any government official – the good news is you are not alone in not finding this easy. Google struggles with it as well.

My main point: let’s not play at the edges and merely define this challenge as one of usability. It is a much, much bigger problem than that. It is a big, deep, culture-changing BHAG of a problem that needs tackling. If we get it wrong, then the big government vendors and the inertia of bureaucracy win. If we get it right, we could save taxpayers millions while enabling a more nimble, effective and responsive government.

Neoliberalisation and Marketisation of Government

If you have not read Jo Bates’ article “Co-optation and contestation in the shaping of the UK’s Open Government Data Initiative,” I highly recommend it. There are a number of arguments in the article I’m not sure I agree with (and which feel softened by her conclusion – so do read it all first). For example, the notion that open data has been co-opted into an “ideologically framed mould that champions the superiority of markets over social provision” strikes me as lacking nuance. One of the things open data can do is create public recognition of a publicly held data set and of the need to protect it against being privatized. What I suspect is that both things could be true simultaneously – there can be increased recognition of the importance of a public asset while also recognizing the increased social goods and market potential in leveraging said asset.

However, there is one thing Bates is absolutely correct about: open data does not come into an empty playing field. It will be used by actors – on both the left and right – to advance their cause. So I too am uncomfortable with those who believe open data is going to somehow depoliticize government or politics – indeed, I made a similar argument in a piece in Slate on the politics of data. As I try to point out there, you can only create a perverse, gerrymandered electoral district that looks like this…

[Image: a gerrymandered electoral district in Chicago]

…if you’ve got pretty good demographic data about the target communities you want to engage (or avoid). Data – even open data – doesn’t magically make things better. There are instances where open data can, I believe, create positive outcomes by shifting incentives in appropriate ways… but similarly, it can help all sorts of actors find ways to satisfy their own goals, which may not be aligned with yours – or even society at large’s.

This makes voices like Bates’ deeply important, since they challenge those of us interested in open data to constantly evaluate the language we use, the coalitions we form and the priorities that get made, in ways I think are profoundly valuable. Indeed, if you get to the end of Bates’ article, there is a list of recommendations that I don’t think anyone I work with around open data would find objectionable – quite the opposite, they would agree they are completely critical.

Summary

I’m so grateful to Rob for posting this piece. It has helped me put into words some thoughts I’ve had, both about the open data criticisms and about the important role the critiques play. I try hard to be a critical advocate of open data – one who engages the risks and challenges it poses. I’m not perfect, and balancing these two goals – advocacy with a critical view – is not easy, but I hope this sheds some light on the ways I’m trying to balance them, and possibly helps others do more of it as well.

What Werewolf teaches us about Trust & Security

After sharing the idea behind this post with Bruce Schneier, I’ve been encouraged to think a little more about what Werewolf can teach us about trust, security and rational choices in communities that are being infiltrated by a threat, or are at risk of it. I’m not a security expert, but I do spend a lot of time thinking about negotiation, collaboration and trust, and so thought I’d pen some thoughts. The more I write below, the more I feel Werewolf could be a fun teaching tool. This is something I hope we can do “research” on at Berkman next week.

For those unfamiliar with Werewolf (also known as mafia), it’s very simple:

At the start of the game each player is secretly assigned a role by a facilitator. Typically there are 3 werewolves (who make up one team) and around 15 villagers, including one seer and one healer (who make up the other team).

Each turn of the game has two alternating phases. The first phase is “night,” during which everyone covers their eyes. The facilitator then “wakes” the werewolves, who agree on a single villager to “murder.” The werewolves then return to sleep. The seer “wakes” up and points at one sleeping player, and the facilitator informs the seer whether that player is a werewolf or a villager. The seer then goes back to sleep. Finally, the healer “wakes” up and selects one person to “heal.” If that person was chosen to be murdered by the werewolves during the night, they are saved and do not die.

The second phase is “day”; it starts with everyone “waking up” (uncovering their eyes). The facilitator identifies who has been murdered (assuming they were not healed). That person is immediately eliminated from the game. The surviving players – i.e. the remaining villagers and the werewolves hidden among them – then debate who among them is a werewolf. The “day” ends with a vote to eliminate a suspect (who is also immediately removed from the game).

Play continues until all of the werewolves have been eliminated, or until the werewolves outnumber the villagers.
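
For anyone who wants to play with these dynamics, here is a minimal simulation of the game loop above under deliberately crude assumptions – no seer, no healer, and a village that votes at random – which isolates what the werewolves’ hidden coordination alone is worth:

```python
import random

def play_werewolf(n_villagers: int = 15, n_wolves: int = 3) -> str:
    """Toy version of the loop above: wolves kill a villager each night,
    the village eliminates a random suspect each day. No seer, no healer,
    no debate."""
    players = ["wolf"] * n_wolves + ["villager"] * n_villagers
    random.shuffle(players)
    while True:
        # Night: the wolves agree on one villager to eliminate.
        victims = [i for i, p in enumerate(players) if p == "villager"]
        players.pop(random.choice(victims))
        if players.count("wolf") >= players.count("villager"):
            return "wolves"
        # Day: the village votes out a random suspect.
        players.pop(random.randrange(len(players)))
        if "wolf" not in players:
            return "villagers"
        if players.count("wolf") >= players.count("villager"):
            return "wolves"

games = 10_000
wolf_wins = sum(play_werewolf() == "wolves" for _ in range(games))
print(f"wolves win {wolf_wins / games:.0%} of games with random village votes")
```

With random votes the wolves should win the overwhelming majority of games – everything below is about the strategies villagers use to beat those odds.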

You can see why Werewolf raises interesting questions about trust systems. Essentially, the game is about whether or not the villagers can figure out who is lying: who is claiming to be a villager but is actually a werewolf. This creates a lot of stress and theatre. With the right people, it is a lot of fun.

There are, however, a number of interesting lessons that come out of Werewolf that make it a fun tool for thinking about trust, organization and cooperation. Many strategies – including some that are quite ruthless – are entirely rational under these conditions. Here are some typical ones:

1. Kill the Newbies

If you are playing werewolf for the first time and people find out, the village will kill you. For first-time players – and I remember this well – it sucked. It felt deeply unfair… but on further analysis it is also rational.

Villagers have only a few rounds to figure out who the werewolves are, and there are strategies and tactics that greatly improve their odds. The less familiar you are with those strategies, the more you threaten the group’s ability to defeat the werewolves. This makes the calculus for dealing with newbies easy: at best the group is eliminating a werewolf, at worst it is eliminating someone who hurts its odds of winning. Hence, they get eliminated.

I assume similar behaviour takes place when a network gets compromised. Maybe new nodes are cut off quickly, leaving the established nodes to start testing one another to see if they can be trusted. Of course, the variable could be different; a threat could spark a network to kill connections to all nodes that, say, have outdated firmware. The point is that such activities, while sweeping, unfair and likely punishing to many “innocent” members, can feel quite rational to those inside the group or network.

2. Noise Can be Helpful

The most important villager is the seer, since they are the only one who can know – with certainty – who is a werewolf and who is a villager. Their challenge is to communicate this information to other villagers without revealing who they are to the werewolves (who would obviously kill them during the next night).

Good seers first ask the facilitator about the person next to them, then the person on their other side, and then slowly move outward (see Figure 1 below). If the person next to them is a villager, they can confide in them (e.g. round 1). In this way good seers can build a “chain” of verified villagers (rounds 2-3) who, as a voting block, can protect one another and eliminate suspected (or, better, identified) werewolves at the end of each “day.”

[Figure 1: the seer verifies neighbours one by one, spiralling outward to build a chain of trusted villagers]
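
The spiral in Figure 1 is simple enough to express as code. A small sketch (the seat numbering and function name are my own conventions):

```python
def seer_check_order(seer: int, n_players: int) -> list[int]:
    """Order in which the seer verifies seats around the table:
    nearest neighbours first, alternating sides, spiralling outward."""
    order: list[int] = []
    for distance in range(1, n_players):
        for side in (+1, -1):
            seat = (seer + side * distance) % n_players
            if seat != seer and seat not in order:
                order.append(seat)
    return order

print(seer_check_order(seer=0, n_players=8))  # [1, 7, 2, 6, 3, 5, 4]
```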

This strategy, however, is predicated on the seer being able to safely communicate with those on their left and right. Naturally, werewolves are on the lookout for this behaviour. A player who keeps discreetly talking to those on their left and right makes themselves a pretty obvious target for the werewolves. Thus it is essential during each round that everyone talk to the people on their left and right, regardless of whether they have anything relevant to say. Getting everyone to talk creates noise that anonymizes communication and interferes with the werewolves’ ability to target the seer.

This is a wonderful example of a simple counter-surveillance tactic. Everybody engages in a behaviour so that it is impossible to find the one person doing it who matters. It was doubly interesting for me as I’ve normally seen noise (e.g. unnecessary communication) as a problem – and rarely as a form of counter-power.

Moreover, in a hostile environment, this form of trust building needs to happen discreetly. The werewolves have the benefit of being both anonymous (hidden from the villagers) and highly connected (they know who the other werewolves are). The above strategy focuses on destroying the werewolves by creating a parallel network of villagers that is equally anonymous and highly connected but, over time, greater in number.

3. Structured and Random Stress Tests

The good news for villagers is that many people are terrible liars. Being a werewolf is hard, in part because it is fun. You have knowledge and power. Many people get giddy (literally!). They laugh or smirk or overly compensate by being silent. And some… are liable to say something stupid.

As a result, in the first round players will often insist that everyone introduce themselves and state their role, e.g. “Hi, my name is David Eaves and I’m a villager.” You’d be surprised how many people screw up. On rare occasions people will omit their role, or stumble over it, or pause to think about it. This is a surefire way of getting eliminated. It comes back to lesson 1: with poor information, any signal that might mean you are a werewolf is probably worth acting on. Werewolf: it’s a harsh, ruthless world.

This may be an interesting example of why ritual and consistency can become prized in a community. It is also a caution about the high transaction costs created by low-trust environments (i.e. ones where you worry the person you are talking to is lying). I’ve heard of (and have experienced first hand) border guards employing a form of the above strategy. This includes yelling at someone and intimidating them to the point where they confess to some error. If a small transgression is admitted to, it can be used as leverage to gain larger confessions or to simply remove the person from the network (or, say, deny them entry into the country).

However, I suspect this strategy has diminishing returns. People who haven’t screwed up in the first two rounds probably aren’t going to. Indeed, I suspect perpetuating this strategy is something werewolves love, because it is an approach devoid of fact. Any minor deviation from an undefined “right” answer becomes justification for eliminating people – thus the werewolves can convince villagers to eliminate people for trivial reasons, and not spend their time looking at who is eliminating whom, and who is coming to whose aid in debate: patterns that are likely more effective at revealing the werewolves.

A note on physical setup

Virtually every time I’ve played werewolf it has been in a room, with the players sitting around a large table. This has meant that a given player can only talk discreetly with the players to their left and right. I have, once, played in a living room where people were basically an unstructured heap.

What’s interesting is that I suspect unstructured groups aid the werewolves. The seer strategy outlined in section 2 would be much more difficult to execute in a room where people could roam: a group of people clustered around a single player would quickly become obvious. There are probably strategies that could be devised to overcome this, but they would be more complicated to execute, and so would create further challenges for the villagers.

So perhaps some rigidity to the structure of a community or network can go a long way to making it easier to build trust. This feels right to me, but I’m not sure what more to add on this.

All of this is a simple starting point (I’m sure I have few readers left at this point). But it would be fun to think of more ways that Werewolf could be used as a teaching tool around networks, trust and power. I’m definitely interested in hearing more thoughts.

Mozillians: Announcing Community Metrics DashboardCon – January 21, 2014

Please read the background below for more info. Here’s the skinny.

What

A one-day mini-conference for Mozillians about community metrics and dashboards, held in San Francisco on January 21st and 22nd, 2014 (remote participation possible).

Update: Apologies for the change of date and location (this was originally planned for Vancouver on January 14th) – the event has sparked a lot of interest, so we had to move it in order to manage the number of people.

Why?

It turns out that over the past 2-3 years a number of people across Mozilla have been tinkering with dashboards and metrics in order to assess community contributions, effectiveness, bottlenecks, performance, etc. For some people this is their job (looking at you, Mike Hoye), for others it is something they arrived at by necessity (looking at you, SUMO group), and for others still it was just a fun hobby or experiment.

Certainly I (and, I believe, co-collaborators Liz Henry and Mike Hoye) think metrics in general and dashboards in particular can be powerful tools, not just for understanding what is going on in the Mozilla community, but as a way to empower contributors and reduce the friction of participating at Mozilla.

And yet as a community of practice, I’m not sure those interested in converting community metrics into some form of measurable output have ever gathered together. We’ve not exchanged best practices, aligned around a common nomenclature or discussed the impact these dashboards could have on the community, management and other aspects of Mozilla.

Such an exercise, we think, could be productive.

Who

Who should come? Great question. Pretty much anyone who is playing around with metrics around community, participation, or something parallel at Mozilla. If you are interested in participating, please sign up here.

Who is behind this? I’ve outlined more in the background below, but this event is being hosted by myself, Mike Hoye (engineering community manager) and Liz Henry (bugmaster).

Goal

As you’ve probably gathered, the goals are to:

  • Get a better understanding of what community metrics and dashboards exist across Mozilla
  • Learn about how such dashboards and metrics are being used to engage, manage or organize communities and/or influence operations
  • Exchange best practices around both the development and the use/application of dashboards and metrics
  • Stretch goal: begin to define some common definitions for metrics that exist across Mozilla, to enable portability of metrics across dashboards (see the sketch after this list)
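
To make that stretch goal concrete, here is one crude sketch of what a portable metric definition might look like. Every field name here is hypothetical – a conversation starter, not an agreed Mozilla schema:

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class CommunityMetric:
    """Hypothetical shared shape for a community metric, so the same
    measurement could move between dashboards."""
    name: str        # e.g. "new_contributors"
    project: str     # e.g. "SUMO", "Firefox engineering"
    period: str      # the month being measured, e.g. "2014-01"
    value: float
    definition: str  # how the number was computed, in plain language

metric = CommunityMetric(
    name="new_contributors",
    project="SUMO",
    period="2014-01",
    value=42,
    definition="Accounts making their first contribution during the period",
)
print(json.dumps(asdict(metric), indent=2))
```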

Hope this sounds compelling. Please feel free to email or ping me if you have questions.

—–

Background

I know that my co-collaborators – Mike Hoye and Liz Henry – have their own reasons for ending up here. I, as many readers know, am deeply interested in understanding how open source communities can combine data and analytics with negotiation and management theory to better serve their members. This was the focus of my keynote at OSCON in 2012 (posted below).

For several years I tried, with minimal success, to create some dashboards that might provide an overview of the community’s health as well as diagnose problems that were harming growth. Despite my own limited success, it has been fascinating to see how more and more individuals across Mozilla – some developers, some managers, others just curious observers – have been scraping data they control or can access to create dashboards to better understand what is going on in their part of the community. The fact is, there are probably at least 15 different people running community-oriented dashboards across Mozilla – and almost none of us are talking to one another about it.
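
As a flavour of what many of these home-grown dashboards boil down to, here is a minimal sketch – the contribution log is invented – of one of the most common community metrics, active contributors per month:

```python
from collections import defaultdict

# Invented contribution log scraped from a tool you control:
# (contributor, ISO date) pairs.
log = [
    ("alice", "2013-11-03"),
    ("bob", "2013-11-19"),
    ("alice", "2013-12-02"),
    ("carol", "2013-12-21"),
]

# Dashboard staple: distinct active contributors per month.
active = defaultdict(set)
for who, date in log:
    active[date[:7]].add(who)  # "YYYY-MM"

for month in sorted(active):
    print(month, len(active[month]))
```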

At the Mozilla Summit in Toronto, after speaking with Mike Hoye (engineering community manager) and Liz Henry (bugmaster), I proposed that we do a low-key mini-conference to bring together the various Mozilla stakeholders in this space. Each of us would love to know what others at Mozilla are doing with dashboards and to understand how they are being used. We figured that if we wanted to learn from others who were creating and using dashboards and community metrics data, they probably do too. So here we are!

In addition to Mozillians, I’d also love to invite an old colleague, Diederik van Liere, who looks at community metrics for the Wikimedia Foundation, as his insights might also be valuable to us.

http://www.youtube.com/watch?v=TvteDoRSRr8