Category Archives: datadotgc.ca

Let's Hack data.gc.ca

In just under two weeks, data.gc.ca will celebrate its first anniversary. This will also mark the point at which the pilot project is officially supposed to end.

Looking at data.gc.ca three things stand out. First, the license has improved a great deal since its launch. Second, a LOT of data has been added to the site over the last year. And finally, the website is remarkably bad at searching for data and enabling a community of users.

Indeed, I believe that a lot of people have stopped visiting the site and don’t even know what data is available. My suspicion is that almost none of us know what is actually available since a) there is a lot, b) much of it is not sexy and c) it is very hard to search.

Let’s do something about that.

I have managed to create, and upload to buzzdata, a list of all the data sets in data.gc.ca – both geographic and non-geographic data sets.
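For anyone who wants to comb through that list programmatically, a minimal sketch like the following might help. It assumes the inventory is a CSV with `title`, `department` and `format` columns – the actual column names in the BuzzData upload may differ – and uses only Python's standard library:

```python
import csv
import io

# Hypothetical sample of the inventory; the real file's columns may differ.
SAMPLE_CSV = """title,department,format
Permanent Resident Applications,Citizenship and Immigration,CSV
Facility Pollution Data,Environment Canada,XML
Borehole Geophysical Logs,Natural Resources,SHP
"""

def find_datasets(csv_text, keyword):
    """Return rows whose title or department contains the keyword."""
    reader = csv.DictReader(io.StringIO(csv_text))
    keyword = keyword.lower()
    return [row for row in reader
            if keyword in row["title"].lower()
            or keyword in row["department"].lower()]

for row in find_datasets(SAMPLE_CSV, "environment"):
    print(row["title"], "-", row["format"])
```

Even a crude keyword filter like this beats the portal's own search for quickly shortlisting candidate data sets.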

I’m proposing that we go through the data.gc.ca data sets and find what is interesting to each of us, and on March 15th find a way to highlight or talk about it so that other people find out about it. Maybe you tweet about it (use the hashtag #gcdata) or blog about it.

Even more interesting would be if we could find a way to do it collaboratively – to have a way of collectively marking which data sets are interesting (in, say, a piratepad somewhere). If someone has a clever proposal about how to go through all the datasets, I’d love for us to collectively highlight the high value datasets (if there are any) available in data.gc.ca.

Speaking with the great community of open data activists in Ottawa, we brainstormed about organizing an event after work on the 15th where people might get together and do this. We could call it “The Big Search” – an effort in any city where people are interested to gather and comb through the data. All with the goal of signaling to developers, non-profits, journalists and others, what, if any, data in data.gc.ca might be of interest for analysis, applications, or other uses. In addition, this exercise would also help us write supportive and critical comments about the government’s open data trial.

Finally, and most ambitiously, I’ve heard some people say they’d like to design an alternative data portal – I’m definitely game for that and am happy to offer up the datadotgc.ca url for that too.

So, I’m throwing this out there. If there is interest, please comment below. I would love to hear your thoughts and hope we can maybe organize some events on March 15th, or at least post the data sets people find interesting on blogs, on Facebook and on Twitter.

Canada launches data.gc.ca – what works and what is broken

Those on twitter will already know that this morning I had the privilege of conducting a press conference with Minister Day about the launch of data.gc.ca – the Federal Government’s Open Data portal. For those wanting to learn more about open data and why it matters, I suggest this and this blog post, and this article – they outline some of the reasons why open data matters.

In this post I want to review what works, and doesn’t work, about data.gc.ca.

What works

Probably the most important thing about data.gc.ca is that it exists. It means that public servants across the Government of Canada who have data they would like to share can now point to a website that is part of government policy. It is an enormous signal of permission from a central agency – one that gives people who want to share data the permission, process and vehicle by which to do so. That, in and of itself, is significant.

Indeed, I was informed that already a number of ministries and individuals are starting to approach those operating the portal asking to share their data. This is exactly the type of outcome we as citizens should want.

Moreover, I’ve been told that the government wants to double the number of data sets, and the number of ministries, involved in the site. So the other part that “works” on this site is the commitment to make it bigger. This is also important, as some open data portals have launched with great fanfare only to languish as no new data sets are added and the existing ones are never updated and so fall out of date.

What’s a work in progress

The number of “high value” datasets is, relatively speaking, fairly limited. I’m always cautious about saying this as, I feel, what constitutes high value varies from user to user. That said, there are clearly data sets that will have greater impact on Canadians: budget data, line-item spending data by department (as the UK publishes), food inspection data, product recall data, pretty much everything on the statscan website, Service Canada locations, postal code data, mailbox location data, business license data, and Canada Revenue data on charities and publicly traded companies are a few that quickly come to mind – clearly I can imagine many, many more…

I think the transparency, tech, innovation, mobile and online services communities will be watching data.gc.ca closely to see what data sets get added. What is great is that the government is asking people what data sets they’d like to see added. I strongly encourage people to let the government know what they’d like to see, especially when it involves data the government is already sharing, but in unhelpful formats.

What doesn’t work

In a word: the license.

The license on data.gc.ca is deeply, deeply flawed. Some might go so far as to say that the license does not make the data open at all – a critique that I think is fair. I would say this: presently the open data license on data.gc.ca effectively kills any possible business innovation, and severely limits use in non-profit realms.

The first, and most problematic, is this line:

“You shall not use the data made available through the GC Open Data Portal in any way which, in the opinion of Canada, may bring disrepute to or prejudice the reputation of Canada.”

What does this mean? Does it mean that any journalist who writes a story, using data from the portal, that is critical of the government, is in violation of the terms of use? It would appear to be the case. From an accountability and transparency perspective, this is a fatal problem.

But it is also problematic from a business perspective. If you wanted to use a data set to help guide citizens around where they are well, and poorly, served by their government, would you be in violation? The problem here is that the clause is both sufficiently stifling and sufficiently vague that many businesses will see the risk of using this data as simply too great.

UPDATE: Thursday March 17th, 3:30pm, the minister called me to inform me that they would be striking this clause from the contract. This is excellent news and Treasury Board deserves credit for moving quickly. It’s also great recognition that this is a pilot (e.g. beta) project and so hopefully, the other problems mentioned here and in the comments below will also be addressed.

It is worth noting that no other open data portal in the world has this clause.

The second challenging line is:

“you shall not disassemble, decompile except for the specific purpose of recompiling for software compatibility, or in any way attempt to reverse engineer the data made available through the GC Open Data Portal or any part thereof, and you shall not merge or link the data made available through the GC Open Data Portal with any product or database for the purpose of identifying an individual, family or household or in such a fashion that gives the appearance that you may have received or had access to, information held by Canada about any identifiable individual, family or household or about an  organization or business.”

While I understand the intent of this line, it is deeply problematic for several reasons. First, many business models rely on identifying individuals; indeed, individuals frequently ask businesses to do this. Google, for example, knows who I am and offers custom services to me based on the data it has about me. It would appear that these terms of use would prevent Google from using Government of Canada data to improve its service even if I have given it permission. Moreover, the future of the digital economy is around providing customized services. While this data has been digitized, it effectively cannot be used as part of the digital economy.

More disconcerting is that these terms apply not only to individuals, but also to organizations and businesses. This means that you cannot use the data to “identify” a business. Well, over at Emitter.ca we use data from Environment Canada to show citizens facilities that pollute near them. Since we identify both the facilities and the companies that use them (not to mention the politicians whose ridings these facilities sit in), are we not in violation of the terms of use? In a similar vein, I’ve talked about how government data could have prevented $3B of tax fraud. Sadly, data from this portal would not have changed that since, in order to have found the fraud, you’d have to have identified the charitable organizations involved. Consequently, this requirement manifestly destroys any accountability the data might create.

It is again worth noting that no other open data portal in the world has this clause.

And finally:

4.1 You shall include and maintain on all reproductions of the data made available through the GC Open Data Portal, produced pursuant to section 3 above, the following notice:

Reproduced and distributed with the permission of the Government of Canada.

4.2 Where any of the data made available through the GC Open Data Portal is contained within a Value-Added Product, you shall include in a prominent location on such Value-Added Product the following notice:

This product has been produced by or for (your name – or corporate name, if applicable) and includes data provided by the Government of Canada.

The incorporation of data sourced from the Government of Canada within this product shall not be construed as constituting an endorsement by the Government of Canada of our product.

or any other notice approved in writing by Canada.

The problem here is that this creates what we call the “Nascar effect.” As you use more and more government data, these “prominent” displays of attribution begin to pile up. If I’m using data from three different governments, each of which requires attribution, pretty soon all you’re going to see are the attribution statements, and not the map or other information you are looking for! I outlined this problem in more detail here. The UK Government has handled this issue much, much more gracefully.

Indeed, speaking of the UK Open Government License, I really wish our government had just copied it wholesale. We have similar government and legal systems, so I see no reason why it would not easily translate to Canada. It is radically better than what is offered on data.gc.ca and, by adopting it, we might begin to move towards a single government license within Commonwealth countries, which would be a real win. Of course, I’d love it if we adopted the PDDL, but the UK Open Government License would be okay too.

In Summary

The launch of data.gc.ca is an important first step. It gives those of us interested in open data and open government a vehicle by which to get more data opened up and to improve accountability and transparency, as well as business and social innovation. That said, there is still much work to be done: getting more data up and, more importantly, addressing the significant concerns around the license. I have spoken to Treasury Board President Stockwell Day about these concerns and he is very interested and engaged by them. My hope is that with more Canadians expressing their concerns, and with better understanding by ministerial and political staff, we can land on the right license and help find ways to improve the website and program. That’s why we do beta launches in the tech world – hopefully it is something the government will be able to do here too.

 

Apologies for any typos, trying to get this out quickly, please let me know if you find any.

Canada's Secret Open Data Strategy?

Be prepared for the most boring sentence to an intriguing blog post.

The other night, I was, as one is wont to do, reading through a random Organisation for Economic Co-operation and Development report entitled Towards Recovery and Partnership with Citizens: The Call for Innovative and Open Government. The report was, in fact, a summary of the recent Ministerial Meeting of the OECD’s Public Governance Committee.

Naturally, I flipped to the section authored by Canada and, imagine the interest with which I read the following:

The Government of Canada currently makes a significant amount of open data available through various departmental websites. Fall 2010 will see the launch of a new portal to provide one-stop access to federal data sets by providing a “single-window” to government data. In addition to providing a common “front door” to government data, a searchable catalogue of available data, and one-touch data downloading, it will also encourage users to develop applications that re-use and combine government data to make it useful in new and unanticipated ways, creating new value for Canadians. Canada is also exploring the development of open data policies to regularise the publication of open data across government. The Government of Canada is also working on a strategy, with engagement and input from across the public service, developing short and longer-term strategies to fully incorporate Web 2.0 across the government.

In addition, Canada’s proactive disclosure initiatives represent an ongoing contribution to open and transparent government. These initiatives include the posting of travel and hospitality expenses, government contracts, and grants and contribution funding exceeding pre-set thresholds. Subsequent phases will involve the alignment of proactive disclosure activities with those of the Access to Information Act, which gives citizens the right to access information in federal government records.

Lots of interesting things are packed into these two paragraphs – something I’m sure readers concerned with open data, open government and proactive disclosure would agree with. So let’s look at the good, the bad and the ugly of all of this, in that order.

The Good

So naturally the first sentence is debatable. I don’t think Canada makes a significant amount of its data available at all. Indeed, across every government website there are probably no more than 400 data sets available in machine-readable format. That’s less than the city of Washington, DC. It’s about (less than) 1% of what Britain or the United States disclose. But, okay, let’s put that unfortunate fact aside.

The good and really interesting thing here is that the Government is stating that it was going to launch an open data portal. This means the government is thinking seriously about open data. This means – in all likelihood – policies are being written, people are being consulted (internally), processes are being thought through. This is good news.

It is equally good news that the government is developing a strategy for deploying Web 2.0 technologies across the government. I hope this will be happening quickly as I’m hearing that in many departments this is still not embraced and, quite often, is banned outright. Of course, using social media tools to talk with the public is actually the wrong focus (Since the communications groups will own it all and likely not get it right for quite a while), the real hope is being allowed to use the tools internally.

The Bad

On the open data front, the bad is that the portal has not launched. We are now definitely past the fall of 2010 and, for whatever reason, there is no Canadian federal open data portal. This may mean that the policy (despite being announced publicly in the above document) is in peril, or that it is simply delayed. Innumerable things can delay a project like this (especially on the open data front). Hopefully, whatever the problem is, it can be overcome. More importantly, let us hope the government does something sensible around licensing and uses the PDDL and not some other license.

The Ugly

Possibly the heart-stopping moment in this brief comes in the last paragraph, where the government talks about posting travel and hospitality expenses. While these are often posted (such as here), they are almost never published in machine-readable format and so have to be scraped in order to be organized, mashed up or compared across departments. Worse still, these files are scattered across literally hundreds of government websites and so are virtually impossible to track down. This guy has done just that, but of course now that he has the data, it is more easily navigable but no more open than before. In addition, it takes him weeks (if not months) to do it – something the government could fix rather simply.
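To give a sense of the kind of scraping this forces on citizens, here is a minimal sketch that pulls rows out of an HTML expense table using only Python's standard library. The page fragment is hypothetical, modeled loosely on a typical departmental disclosure page; real pages vary wildly in structure, which is exactly the problem:

```python
from html.parser import HTMLParser

# Hypothetical fragment; actual disclosure pages differ by department.
SAMPLE_PAGE = """
<table>
<tr><th>Name</th><th>Purpose</th><th>Total</th></tr>
<tr><td>J. Smith</td><td>Conference travel</td><td>$1,234.56</td></tr>
<tr><td>A. Jones</td><td>Stakeholder meeting</td><td>$842.10</td></tr>
</table>
"""

class TableParser(HTMLParser):
    """Collect the text of each <td> cell, grouped by table row."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_td = [], [], False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []        # start a fresh row
        elif tag == "td":
            self._in_td = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False
        elif tag == "tr" and self._row:
            self.rows.append(self._row)  # header rows (th only) stay empty

    def handle_data(self, data):
        if self._in_td:
            self._row.append(data.strip())

parser = TableParser()
parser.feed(SAMPLE_PAGE)
for name, purpose, total in parser.rows:
    print(name, purpose, total)
```

A scraper like this has to be adapted page by page across hundreds of sites – whereas one departmental CSV export would make the whole exercise unnecessary.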

The government should be lauded for trying to make this information public. But if this is their notion of proactive disclosure and open data, then we are in for a bumpy, ugly ride.

How Tories could do transparency – Globe and Mail

Today’s blog post appears in the Globe and Mail. You can read it there (please do, also give it a vote).

How Tories could do transparency

Britain’s new Conservative government did something on Friday that Canadians would find impossible to imagine. After a brief video announcement from Prime Minister David Cameron about the importance of the event, Francis Maude, Minister of the Cabinet Office, and Tim Berners-Lee, inventor of the World Wide Web, announced that henceforth the spending data for every British ministry on anything over £25,000 (about $40,000) would be available for anyone in the world to download. The initial release of information revealed thousands and thousands of lines of data and almost £80-billion (about $129.75-billion) in spending. And starting in January, every ministry must update the data once a month.

For the British Conservative Party, this is a strategic move. Faced with a massive deficit, the government is enlisting the help of all Britons to identify any waste. More importantly, however, they see releasing data as a means by which to control government spending. Indeed, Mr. Maude argues: “When you are forced to account for the money you spend, you spend it more wisely. We believe that publishing this data will lead to better decision-making in government and will ultimately help us save money.” And they might be right. Already, organizations like Timetric, the Guardian newspaper and the Open Knowledge Foundation have visualized, organized and indexed the data so it is easier for ordinary citizens to understand and explore how their government spends their money.

These external sites are often more powerful than what the government has. After observing the way these sites handle the data, the minister noted how he wished he’d had access to them while negotiating with some of the government’s largest contractors.

For Canadians, the Conservative-Liberal Democrat coalition government is but a distant example of a world that a truly transparent government could – and should – create. In contrast, Stephen Harper’s Conservatives seem stuck in a trap described by Mr. Maude in his opening sentences: “Opposition parties are always remarkably keen on greater government transparency, but this enthusiasm mysteriously tends to diminish once they actually gain power.” Canada’s Conservatives have been shy about sharing any information with anyone. Afghan detainee files aren’t shared with Parliament; stimulus package accounts were not emailed to the Parliamentary Budget Office, but uselessly handed over in 4,476 printed pages. Even the Auditor-General is denied MP expense data. All this as access-to-information wait times exceed critical levels and Canada, unlike the United States, Britain, Australia and New Zealand, languishes with no open-data policy. Only once has the government proactively shared real “data,” when it shared some stimulus data that could be downloaded.

The irony is not only that the Tories ran on an agenda of accountability and transparency, but that – as their British counterparts understand – actually implementing a transparency and open-data policy may be one of the best ways to stamp a conservative legacy on the government’s future. Moreover, it could be a very popular move.

During the digital economy strategy consultations, open data was the second-most popular suggestion. Interestingly, it would appear the Liberals are prepared to explore the opportunity. They are the only party with a formal policy on open data that matches the standards recently set by Britain and, increasingly, in the United States.

Open data will eventually come to Canada. When, however, is unclear. In the meantime it is our colleagues elsewhere that will reap the benefits of savings, improved analysis and better civic engagement. So until Mr. Harper’s team changes its mind, Canadians must look abroad to see what a Conservative government that actually believes in transparency could look like.

David Eaves is a public-policy entrepreneur, open government activist and negotiation expert based in Vancouver

Launching datadotgc.ca 2.0 – bigger, better and in the clouds

Back in April of this year we launched datadotgc.ca – an unofficial open data portal for federal government data.

At a time when only a handful of cities had open data portals and the words “open data” were not being even talked about in Ottawa, we saw the site as a way to change the conversation and demonstrate the opportunity in front of us. Our goal was to:

  • Be an innovative platform that demonstrates how government should share data.
  • Create an incentive for government to share more data by showing ministers, public servants and the public which ministries are sharing data, and which are not.
  • Provide a useful service to citizens interested in open data by bringing all the government data together in one place, making it easier to find.

In every way we have achieved this goal. Today the conversation about open data in Ottawa is very different. I’ve demoed datadotgc.ca to the CIOs of the federal government’s ministries and numerous other stakeholders, and an increasing number of people understand that, in many important ways, the policy infrastructure for doing open data already exists, since datadotgc.ca shows the government is already doing open data. More importantly, a growing number of people recognize it is the right thing to do.

Today, I’m pleased to share that thanks to our friends at Microsoft & Raised Eyebrow Web Studio and some key volunteers, we are taking our project to the next level and launching Datadotgc.ca 2.0.

So what is new?

In short, rather than just pointing to the 300 or so data sets that exist on federal government websites, members may now upload datasets to datadotgc.ca, where we can both host them and offer custom APIs. This is made possible because we have integrated Microsoft’s Azure cloud-based Open Government Data Initiative into the website.

So what does this mean? It means people can add government data sets, or even mash up government data sets with their own data, to create interesting visualizations, apps or websites. Already some of our core users have started to experiment with this feature. London, Ontario’s transit data can be found on Datadotgc.ca, making it easier to build mobile apps, and a group of us have taken Environment Canada’s facility pollution data, uploaded it and are using the API to create an interesting app we’ll be launching shortly.
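For developers curious what working against such a hosted API might look like, here is a hedged sketch. The Open Government Data Initiative exposes OData-style query endpoints, but the base URL, dataset path and field names below are invented for illustration, and the parsing step runs against an inline sample response rather than a live call:

```python
import json
from urllib.parse import urlencode

# Hypothetical service path; the actual datadotgc.ca/OGDI endpoints may differ.
BASE_URL = "http://api.datadotgc.ca/v1/EnvironmentCanada/FacilityPollution"

def build_query(filter_expr, top=10):
    """Build an OData-style query URL for the hosted data service."""
    params = {"$filter": filter_expr, "$top": str(top), "format": "json"}
    return BASE_URL + "?" + urlencode(params)

url = build_query("Province eq 'ON'")
print(url)

# A response shaped like OGDI's JSON output (sample only, not live data).
sample_response = json.loads(
    '{"d": [{"FacilityName": "Plant A", "Province": "ON"}]}'
)
for record in sample_response["d"]:
    print(record["FacilityName"])
```

The point is that a hosted API lets an app filter server-side for just the records it needs, instead of downloading and re-parsing an entire file each time.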

So we are excited. We still have work to do around documentation and tracking down some more federal data sets we know are out there, but we’ve gone live because nothing helps us develop like having users and people telling us what is, and isn’t, working.

But more importantly, we want to go live to show Canadians and our governments, what is possible. Again, our goal remains the same – to push the government’s thinking about what is possible around open data by modeling what should be done. I believe we’ve already shifted the conversation – with luck, datadotgc.ca v2 will help shift it further and faster.

Finally, I can never thank our partners and volunteers enough for helping make this happen.

Your Government Just Got Dumber: how it happened and why it matters to you

This piece was published in the Globe and Mail today so always nice when you read it there and let them know it matters to you.

Last week the Conservative Government decided that it would kill the mandatory long census form it normally sends out to thousands of Canadians every five years. On the surface such a move may seem unimportant and, to many, uninteresting, but it has significant implications for every Canadian and every small community in Canada.

Here are 3 reasons why this matters to you:

1. The Death of Smart Government

Want to know who the biggest user of census data is? The government. Understanding what services are needed, where problems or opportunities may arise, or how a region is changing depends on having accurate data. The federal government, but also the provincial and, most importantly, local governments, use Statistics Canada’s data every day to find ways to save taxpayers money, improve services and make plans. Now, at the very moment when – thanks to computers – governments are finding new ways to use this information more effectively than ever before, it is to be cut off.

To be clear, this is a direct attack on the ability of government to make smart decisions. In fact, it is an attack on evidence-based public policy. Moreover, it was a political decision – it came from the Minister’s office and does not appear to reflect what Statistics Canada either wants or recommends. Of course, some governments prefer not to have information; all that data and evidence gets in the way of legislation and policies that are ineffective, costly and that reward vested interests (I’m looking at you, Crime Bill).

2. The Economy is Less Competitive

But it isn’t just government that will suffer. In a 21st-century economy, data and information are at the heart of economic activity; they are what drive innovation, efficiencies and productivity. Starve our governments, NGOs, businesses and citizens of data and you limit the wealth a 21st-century economy will generate.

Like roads were to the 20th-century economy, data is the core infrastructure of a 21st-century economy. While it may seem just a boring public asset, it can nonetheless foster big companies, jobs and efficiencies. Roads spawned GM. Today, people often fail to recognize that the largest company already created by the new economy – Google – is a data company. Google is effective and profitable not because it sells ads, but because it generates and leverages petabytes of data every day from billions of search queries. This allows it to provide all sorts of useful services, such as pointing us, with uncanny accuracy, to merchandise and services we want or, better yet, spam we’d like to avoid. It can even predict when communities will experience flu epidemics four months in advance.

And yet, it is astounding that the Minister in charge of Canada’s digital economy, the minister who should understand the role of information in a 21st century economy, is the minister who authorized killing the creation of this data. In doing so he will deprive Canadians and their businesses of information that would make them, and thus our economy, more efficient, productive and profitable. Of course, the big international companies will probably be able to find the money to do their own augmented census, so those that will really suffer will be small and medium size Canadian businesses.

3. Democracy Just got Weaker

Of course, the most important people who could use the data created by the census aren’t governments or businesses. They are ordinary Canadians. In theory, the census creates a level playing field in public policy debates. Were Statistics Canada’s website usable and its data accessible (data, may I remind you, we’ve already paid for), then citizens could use this information to fight ineffective legislation, unjust policies, or wasteful practices. In a world where this information won’t exist, those who are able to pay for the creation of this information – read large companies – will have an advantage not only over citizens, but over our governments (which, of course, won’t have this data anymore either). Today, the ability of ordinary citizens to defend themselves against government and businesses just got weaker.

So who’s to blame? Tony Clement, the Minister of Industry, who oversees Statistics Canada, is to blame. His office authorized this decision. But Statistics Canada also shares in the blame. In an era where the internet has flattened the cost of distributing information, Statistics Canada: continues to charge citizens for data their tax dollars already paid for; has an unnavigable website where it is impossible to find anything; and often distributes data in formats that are hard to use. In short, for years the department has made its data inaccessible to ordinary Canadians. As a result, it isn’t hard to see why most Canadians don’t know about or understand this issue. Sadly, once they do wake up to the cost of this terrible decision, I fear it will be too late.

Learning from Libraries: The Literacy Challenge of Open Data

We didn’t build libraries for a literate citizenry. We built libraries to help citizens become literate. Today we build open data portals not because we have public policy literate citizens, we build them so that citizens may become literate in public policy.

Yesterday, in a brilliant article on The Guardian website, Charles Arthur argued that a global flood of government data is being opened up to the public (sadly, not in Canada) and that we are going to need an army of people to make it understandable.

I agree. We need a data-literate citizenry, not just a small elite of hackers and policy wonks. And the best way to cultivate that broad-based literacy is not to release data in small or measured quantities, but to flood us with it – to provide thousands of niches that will interest people in learning, playing and working with open data. But more than this, we also need to think about cultivating communities where citizens can exchange ideas, as well as involving educators to help provide support and increase people’s ability to move up the learning curve.

Interestingly, this is not new territory.  We have a model for how to make this happen – one from which we can draw lessons or foresee problems. What model? Consider a process similar in scale and scope that happened just over a century ago: the library revolution.

In the late 19th and early 20th century, governments and philanthropists across the western world suddenly became obsessed with building libraries – lots of them. Everything from large ones like the New York Main Library to small ones like the thousands of tiny, one-room county libraries that dot the countryside. Big or small, these institutions quickly became treasured and important parts of any city or town. At the core of this project was that literate citizens would be both more productive and more effective citizens.

But like open data, this project was not without controversy. It is worth noting that at the time some people argued libraries were dangerous. Libraries could spread subversive ideas – especially about sexuality and politics – and that giving citizens access to knowledge out of context would render them dangerous to themselves and society at large.  Remember, ideas are a dangerous thing. And libraries are full of them.

Cora McAndrews Moellendick, a Masters of Library Studies student who draws on the work of Geller sums up the challenge beautifully:

…for a period of time, censorship was a key responsibility of the librarian, along with trying to persuade the public that reading was not frivolous or harmful… many were concerned that this money could have been used elsewhere to better serve people. Lord Rodenberry claimed that “reading would destroy independent thinking.” Librarians were also coming under attack because they could not prove that libraries were having any impact on reducing crime, improving happiness, or assisting economic growth, areas of keen importance during this period… (Geller, 1984)

Today when I talk to public servants, think tank leaders and others, most grasp the benefit of “open data” – of having the government share the data it collects. A few, however, talk about the problem of just handing data over to the public. Some question whether the activity is “frivolous or harmful.” They ask “what will people do with the data?” “They might misunderstand it” or “They might misuse it.” Ultimately they argue we can only release this data “in context”. Data, after all, is a dangerous thing. And governments produce a lot of it.

As in the 19th century, these arguments must not prevail. Indeed, we must do the exact opposite. Charges of “frivolousness” or a desire to ensure data is only released “in context” are code to obstruct or shape data portals to ensure that they only support what public institutions or politicians deem “acceptable”. Again, we need a flood of data, not only because it is good for democracy and government, but because it increases the likelihood of more people taking interest and becoming literate.

It is worth remembering: We didn’t build libraries for an already literate citizenry. We built libraries to help citizens become literate. Today we build open data portals not because we have a data or public policy literate citizenry, we build them so that citizens may become literate in data, visualization, coding and public policy.

This is why coders in cities like Vancouver and Ottawa come together for open data hackathons, to share ideas and skills on how to use and engage with open data.

But smart governments should not rely only on small groups of developers to make use of open data. Forward-looking governments – those that want an engaged citizenry, a 21st-century workforce and a creative, knowledge-based economy in their jurisdiction – will reach out to universities, colleges and schools and encourage them to get their students using, visualizing, writing about and generally engaging with open data. Not only to help others understand its significance, but to foster a sense of empowerment and opportunity among a generation that could create the public policy hacks that will save lives, make public resources more efficient and effective, and make communities more livable and fun. The recent paper published by the University of British Columbia students who used open data to analyze graffiti trends in Vancouver is a perfect early example of this phenomenon.

When we think of libraries, we often just think of a building with books. But 19th century libraries mattered not only because they had books, but because they offered literacy programs, book clubs, and other resources to help citizens become literate and thus more engaged and productive. Open data catalogs need to learn the same lesson. While they won’t require the same centralized and costly approach as the 19th century, governments that help foster communities around open data, that encourage their school system to use it as a basis for teaching, and that support their citizens’ efforts to write and suggest their own public policy ideas will, I suspect, benefit from happier and more engaged citizens, along with better services and stronger economies.

So what is your government/university/community doing to create its citizen army of open data analysts?

CIO Summit recap and links

Yesterday I was part of a panel at the CIO Summit, a conference for CIOs of the various ministries of the Canadian Government. There was lots more I would have liked to share with the group, so I’ve attached some links here as a follow-up for those in (and not in) attendance, to help flesh out some of my thoughts:

1. Doing mini-GCPEDIAcamps or WikiCamps

So what is a “camp”? Check out Wikipedia! “A term commonly used in the titles of technology-related unconferences, such as Foo Camp and BarCamp.” In short, it is an informal gathering of people who share a common interest and come together to share best practices or talk about that interest.

There is interest in GCPEDIA across the public service but many people aren’t sure how to use it (in both the technical and social sense). So let’s start holding small mini-conferences to help socialize how people can use GCPEDIA and help get them online. Find a champion, organize informally, do it at lunch, and ensure there are connected laptops or computers on hand. And do it more than once! Above all, a networked, peer-based platform requires a networked learning structure.

2. Send me an Excel spreadsheet of the structured data sets on your ministry’s website

As I mentioned, a community of people have launched datadotgc.ca. If you are the CIO of a ministry that has structured data sets (e.g. CSV files, Excel spreadsheets, KML, shapefiles – things that users can download and play with, so not PDFs!) drop the URLs of their locations into an email or spreadsheet and send it to me! I would love to have your ministry well represented on the front page graph on datadotgc.ca.

3. Some links to ideas and examples I shared

– Read about how open data helped push the CRA to locate $3.2B in lost tax revenue.

– Read about how open data needs to be part of the stimulus package.

– Why GCPEDIA could save the public service here.

– Check out Vantrash; openparliament is another great site too.

– The open data portals I referenced: the United States, the United Kingdom, The World Bank, & Vancouver’s

4. Let’s get more people involved in helping Government websites work (for citizens)

During the conference I offered to help organize some Government DesignCamps to help ensure that CLF 3 (or whatever the next iteration will be called) helps Canadians navigate government websites. There are people out there who would offer up some free advice – sometimes out of love, sometimes out of frustration – that regardless of their motivation could be deeply, deeply helpful. Canada has a rich and talented design community including people like this – why not tap into it? More importantly, it is a model that has worked when done right. This situation is very similar to the genesis of the original TransitCamp in Toronto.

5. Push your department to develop an Open Source procurement strategy

The fact is, if you aren’t even looking at open source solutions you are screening out part of your vendor ecosystem and failing in your fiduciary duty to explore all options to deliver value to taxpayers. Right now governments only seem to know how to pay LOTS of money for IT. You can’t afford to do that anymore. GCPEDIA is available to every government employee, has 15,000 users today and could easily scale to 300,000 (we know it can scale because Wikipedia is way, way bigger). All this for the cost of $60K in consulting fees and $1.5M in staff time. That is cheap. Disruptively cheap. Any alternative would have cost you $20M+ and, if scaled, I suspect $60M+.

Not every piece of software should necessarily be open source, but you need to consider the option. Already, on the web, more and more governments are looking at open source solutions.

Help with datadotgc.ca

For regular readers of my blog, I promise not to talk too much about datadotgc.ca here at eaves.ca. I am going to today since I’ve received a number of requests from people asking if and how they could help, so I wanted to lay out what is on my mind at the moment and, for those with time and capacity, how they could help.

The Context

Next Wednesday I’ll be doing a small presentation to all the CIOs of the federal public service. During that presentation I’d like to either go live to datadotgc.ca or at least show an up-to-date screenshot (if there is no internet). It would be great to have more data sets in the site by then so I can show this group a) how little machine readable data there is in Canada versus other countries (especially the UK and US) and b) what an effective open data portal should look like.

So what are the datadotgc.ca priorities at this moment?

1. Get more data sets listed in datadotgc.ca

There is a list of machine readable data sets known to exist in the federal government that has been posted here. For coders – the CKAN API is relatively straightforward to use. There is also an import script that can allow one to bulk import data lists into datadotgc.ca, as well as instructions posted here in the datadotgc.ca google group.
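For those curious what talking to the CKAN API involves, here is a minimal sketch in Python. Note the endpoint URL, field names and tags here are my assumptions for illustration – check the CKAN documentation and the instructions in the google group for the real details before running anything:

```python
import json
import urllib.request

# Assumed endpoint for the datadotgc.ca CKAN instance (illustrative only)
CKAN_PACKAGE_URL = "http://ca.ckan.net/api/rest/package"


def build_package(name, title, source_url, resources):
    """Assemble a package dict describing one liberated data set.

    `resources` is a list of (url, format) tuples, e.g.
    [("http://example.gc.ca/data.csv", "CSV")].
    """
    return {
        "name": name,            # lowercase, url-safe slug
        "title": title,
        "url": source_url,       # the government page hosting the data
        "resources": [{"url": u, "format": f} for (u, f) in resources],
        "tags": ["canada", "gc"],
    }


def register_package(pkg, api_key):
    """POST the package to the CKAN REST API (sketch, untested)."""
    req = urllib.request.Request(
        CKAN_PACKAGE_URL,
        data=json.dumps(pkg).encode("utf-8"),
        headers={"Authorization": api_key,
                 "Content-Type": "application/json"},
    )
    return urllib.request.urlopen(req)
```

A bulk import is then just a loop over a spreadsheet of URLs, calling `build_package` and `register_package` for each row.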

2. Better document how to bulk add data sets.

While the above documentation is good, I’d love to have some documentation and scripts that are specific to datadotgc.ca/ca.ckan.net. I’m hoping to recruit some help with this tonight at the Open Data hackathon, but if you are interested, please let me know.

3. Build better tools

One idea I had, which I have shared with Steve T., is to develop a Jetpack add-on for Firefox that, when you are on a government page, scans for links to certain file types (shapefiles, XLS, etc.) and then lets you know if they are already in datadotgc.ca. If not, it would provide a form to “liberate the dataset” without forcing the user to leave the government website. This would make it easier for non-developers to add datasets to datadotgc.ca.
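The core of such an add-on – scanning a page for links to data file types and checking them against what is already catalogued – can be sketched in a few lines of Python. The extension list and the idea of comparing against a set of known links are my assumptions, not a spec for the add-on:

```python
from html.parser import HTMLParser

# File extensions that usually signal a machine readable data set (assumed list)
DATA_EXTENSIONS = (".csv", ".xls", ".kml", ".shp", ".zip", ".xml")


class DataLinkFinder(HTMLParser):
    """Collect href values on a page that point at likely data files."""

    def __init__(self):
        super().__init__()
        self.data_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.lower().endswith(DATA_EXTENSIONS):
                self.data_links.append(value)


def find_data_links(html, known_links=()):
    """Return (new, already_catalogued) data links found in `html`."""
    finder = DataLinkFinder()
    finder.feed(html)
    known = set(known_links)
    new = [link for link in finder.data_links if link not in known]
    old = [link for link in finder.data_links if link in known]
    return new, old
```

In the add-on, `known_links` would come from a query to datadotgc.ca, and any links in `new` would trigger the “liberate the dataset” form.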

4. Locate machine readable data sets

Of course, we can only add to datadotgc.ca data sets that we know about, so if you know about a machine readable dataset that could be liberated, please add it! If there are many, or you don’t know how, ping me or add it directly to the list in the datadotgc.ca google group.