At the 2010 GTEC conference I did a panel with David Strigel, the Program Manager of the Citywide Data Warehouse (CityDW) at the District of Columbia Government. During the introductory remarks David recounted the history of Washington DC’s journey to open data.
Interestingly, that journey began not with open data, but with an internal problem. Back around 2003 the city had a hypothesis that towing away abandoned cars would reduce crime rates in the immediate vicinity, thereby saving more money in the long term than the cost of towing. In order to assess the program’s effectiveness, city staff needed to “mash up” longitudinal crime data against service request data – specifically, requests to remove abandoned cars. Alas, the data sets were managed by different departments, so this was a tricky task. As a result the city’s IT department negotiated bilateral agreements with both departments to host their datasets in a single location. Thus the DC Data Warehouse was born.
Happily, the data demonstrated the program was cost effective. Building on this success the IT department began negotiating more bilateral agreements with different departments to host their data centrally. In return for giving up stewardship of the data, the departments retained governance rights, reduced their costs, and received additional, more advanced analytics from the IT group. Over time the city’s data warehouse became vast. As a result, when DC decided to open up its data it was, relatively speaking, easy to do. The data was centrally located and was already being shared and used as a platform internally. Extending this platform externally (while not trivial) was a natural step.
In short, the deep problem that needed to be solved wasn’t open data. It was information management. Getting the information management and governance policies right was essential for DC to move quickly. Moreover, this problem strikes at the heart of what it means to be a government. Knowing what data you have and where it is, and having a governance structure that allows it to be shared internally (as well as externally), is a problem every government is going to face if it wants to be efficient, relevant and innovative in the 21st century. In other words, information management is the cake. Open data – which I believe is essential – is the sweet icing you smother on top of that dense cake you’ve put in place.
Okay, with that said, two points flow from this.
First: Sometimes, governments that “do” open data start off by focusing on the icing. The emphasis is on getting data out there and then, after the fact, figuring out a governance model that will make sense. This is a viable strategy, but it does have real risks. When sharing data isn’t a core function but rather a feature tacked on at the end, the policy and technical infrastructure may be pretty creaky. In addition, developers may not want to innovate on top of your data platform because they may (rightly) question the level of commitment. One reason DC’s data catalog works is that it has internal users. This gives the data stability and a sense of permanence. On the upside, the icing is politically sexier, so it may help marshal resources to drive a broader rethink of data governance. Either way, at some point you’ve got to tackle the cake; otherwise, things are going to get messy. Remember, it took DC 7 years to develop its cake before it put icing on it. But that was making it from scratch. Today, thanks to new services (armies of consultants work on this), tools (e.g. Socrata) and models (e.g. Washington, DC), you can make that cake following a recipe and even use cake mix. As David Strigel pointed out, today he could do it in a fraction of the time.
Second: More darkly, one lesson to draw from DC is that the capacity of a government to do open data may be a pretty good proxy for its ability to share information and coordinate across different departments. If your government can’t do open data in a relatively short time period, it may mean it simply doesn’t have the infrastructure in place to share data internally all that effectively either. In a world where government productivity needs to rise in order to deal with budget deficits, that could be worrying.