As many readers are likely aware, two weeks ago The Journal News, a newspaper just outside of New York City, published a map showing the names and addresses of handgun owners in Westchester and Rockland counties. The map, which was part of a story responding to the tragic shooting in Newtown, Connecticut, was constructed with data the paper acquired through Freedom of Information requests. Since its publication the story has generated enormous public interest, including a tremendous amount of anger from gun owners and supporters. The newspaper and its staff have received death threats, and have had their home addresses and details of where their kids attend school published. Today the newspaper’s headquarters are guarded by… armed guards.
While there is a temptation to talk about this event in terms of open data, I don’t think this is a debate about open data. This is a debate about privacy and policy.
Let me clarify.
There is lots of information governments collect about people – the vast majority of which is not, and should not be, available. As both an open data advocate and a gov 2.0 advocate I’m strongly interested in ensuring that – around any given data set – people’s sense of privacy is preserved. There are of course interests that benefit from information being made inaccessible, just as there are interests that benefit from it being made accessible, but when it comes to individually identifying pieces of information, I prefer to be cautious.
So, from my perspective, it is critical that this debate not get sloppy. This is not about open data. It is about personally identifiable data – and what governments should and should not do with it. Obviously “open” and “personally identifiable” data can overlap, but they are not the same. A great deal of open data has nothing to do with individuals. However, if we allow the two to become synonymous… well… expect a backlash against open data. No one ever gave anyone a blank check to make anything and everything open. I don’t expect my personal healthcare or student record to be downloadable by anyone – I suspect you don’t either.
This is why – when I advise governments – I try to focus on data that is the least contentious (e.g. not even at risk of being personally identifying), since this gives public servants, politicians and the public some time to build knowledge and capacity around understanding the issues.
This is not to say that no personally identifiable data should be made available – the question is, to what end? And the question matters. I suspect privacy played a big part in the outcry and reaction to The Journal News’ gun map. But I suspect that for many – particularly strong pro-gun advocates – there was a recognition that this data was being used as a device (of VERY unclear efficacy) to accelerate public support for stricter gun laws. So they object not just to the issue of privacy, but to the usage.
In the case of guns, I don’t know what the right answer is. But here is an example I feel more confident about. Personally I (and many others) believe business license data should be open, including personally identifiable data. But again, these are issues that need to be hammered out and debated, with the public given choices. This is not where the open data discussion needs to start, and this is certainly not how it should be defined in the public, as it is much, much more than that and touches many issues that are far, far less contentious. But we need to be building the capacity – in the public, among politicians and among public servants – to have these conversations, because disclosure, or the lack thereof, will increasingly be a political and policy choice.
And many of these questions will be tricky. I also believe data should be made available in aggregate. While I understand there are risks, I believe researchers should be allowed to use large data sets to try to find out how age or other factors might affect a terrible medical condition, or to gain insights into how graduation rates of at-risk groups might be improved. These are big benefits that are, again, for me, worth the risk. But they will of course need to constantly be weighed and debated. What personal data should also be allowed to become open data, under what circumstances and to whose benefit… these are big questions.
So, if you are an open data advocate out there – please don’t let people confuse Open Data with Personal Data. The two can and almost certainly will overlap at times. But that does not make them the same thing. If these two terms become synonymous in the public’s mind in ANY way, it could take years to recover. So educate yourself on privacy issues, and be sure to educate the people you work with. But above all, help them get ready for these debates. More are coming.
Some Additional Thoughts
Of course, when it comes to data, if you are really worried about personally identifiable stuff, there is a lot more to fear that isn’t maintained by governments. The world of free and purchasable data contains a lot of goodies (think maps, stock prices, etc…) but there is also plenty of overlap with personal data.
Indeed, much of the retaliatory data about the employees of The Journal News was data that was personally identifiable and readily available. A simple look at who I follow on Twitter would likely reveal a fair bit about my social graph to anyone. And this isn’t even the juicy stuff. One wonders how many people realize just how much about them can be purchased. Indeed, accessing some information has become so commonplace that people don’t even think about it anymore: my understanding is that almost anyone can get a copy of your credit score, right?
That said, I recognize the difference between data the state forces you to disclose (gun ownership) and that which you “voluntarily” submit and cede control over so as to take part in a service (Facebook friends). I don’t always like the latter, but I recognize it is different from the government – it is one thing to have a monopoly on violence, it is another to have a terrible EULA. Even so, I suspect that many people would be disturbed if they saw exactly how many people were tracking all of the things they do online. Mozilla’s Collusion project is a fun – if ultimately fruitless – tool for getting a sense of this. It is worth doing for a day just to see who is watching what you watch and do online.
I share all this not because I want to scare anyone – indeed I suspect these additional notes are old hat to anyone still reading – but to recognize how complex the public’s relationship with data is. And as much as it will upset my privacy advocacy friends to hear me say this: my sense is that the public is actually still quite comfortable with vast amounts of data about them being collected (Facebook seems to be able to do whatever it wants with almost no impact on usage). Where people get finicky is around how that data gets used. Apparently, being sold to more effectively doesn’t bother them all that much (although I wish some algorithm could figure out that I’ve already bought a Fitbit so their ads need no longer follow me all over the web). However, try to use it to take away their guns… and some of them will get very angry. Somewhere in there a line has been drawn. It has all the makings of an epic public policy and corporate policy nightmare.