The dangerous mystique of the “open data” business

People frequently ask me how they can start an “open data business.” Let me first say that I love that the question gets asked. I love that people are interested in open data. I love that people want to learn more, that they want to play, and that they want to think of ways of creating a company. These are, in part, signs of how far the open data discussion has come: people see it as a resource they would like to leverage.

It is, also, the wrong question.

This is not to say there are no businesses that use open data. Indeed, a vast number of companies use open data (any company using census data for even a tiny part of its business qualifies). Nor am I denying that there are businesses built primarily on open data – the Open Data 500 list demonstrates there are. Plus, I get introduced almost daily to businesses that are: both Ajah and OpenCorporates come to mind (I have donated advice to, but have no financial connection with, either).

The Trap

But from a founder (or, I suppose, investor) perspective there are dangers to thinking about “open data” as a unique business space.

The danger is in failing to understand that there is virtually nothing that distinguishes an open data business from any other business. Any business needs to solve a real (or, sadly, at times imagined) problem, it needs to find clients (i.e. people willing to pay for that solution), and it needs to execute at least competently on a number of other things (HR, marketing, management, cash flow, etc.).

The danger with putting the words “open data” before the word “business” is that it risks making people think open data businesses are somehow unique. They are not. If there is a gaping chasm between the question of “what can I do with software?” and “how can I create a viable software company?”, there is an equally large gap between “what can I do with open data?” and “how can I create a viable company using open data?” And the questions you need to ask yourself to answer that latter question (many of which are nicely laid out in this book) are the same whether it is a software, hardware, crafts or open data business.

Indeed, the open source software space gives us a nice analogy. I suspect few people decide to create an open source software company; they decide to create a company, and the software license is a reflection of their strategic options. I think it is the same with open data. You don’t start a company saying “let’s use open data.” You start a company to solve a problem, and using or publishing open data may be the only, or the most strategic, way of doing so.

The Opportunity 

Some readers may be surprised to see me write this. I am, and continue to be, an advocate of open data. But open data is not some magic pixie dust that causes normal business logic to disappear. It is not that I think people are saying that per se; I just want them to understand that 99% of the problems that need to be solved in an “open data business” lie in the third word of that string, and that while the first two do confer some unique advantages and disadvantages, these are relatively trivial.

The real opportunity of open data lies not in creating a new, unique type of business, but in offering a new set of cheap building blocks with which to try to solve problems. In other words, it increases the diversity, and lowers the cost, of inputs.

Here again the world of software is instructive. The Economist’s recent survey on tech start-ups talks of a Cambrian explosion made possible by the availability of “Cheap and ubiquitous building blocks for digital products…”, many of which are (and many of which are not) open source. The cheap availability of these building blocks is allowing for a range of experimentation that was previously impossible, or at least prohibitively expensive.

Open data – whether as an input for software products and services, for analysis (journalistic or corporate) or for scientific research – is cheap (in theory free) and increasingly plentiful. It thus has the potential to be the equivalent of the cheap code that is powering a great deal of experimentation in the world of software. As an open data advocate, I find the possibility of this increased experimentation exciting.


So if you are thinking about starting an open data business – that is great! I’m excited to hear that and I am keen to help and be supportive. But focus on that third word – business. That’s the one that really matters.

From a data perspective, the first question to ask yourself is: what real, tangible pain does this data set help me solve that previously could only be solved with a more expensive input (e.g. proprietary data), or could not be solved at all? The second is to think about the impact of using open data on your strategy. Where does it leave you more vulnerable (to copycats or to the whims of the data publisher), and where does it leave you stronger (if the data is commoditized, then the axis of competition will lie in other parts of the business)?

I hope this adds a helpful nuance to the issue of open data businesses, and offers some useful input for those looking at open data and thinking about how to find business opportunities in it.

Data Wars: A mini-case study of Southwest Airlines vs. TripIt and Orbitz

As a regular flyer, I’m an enormous fan of TripIt. It’s a simple service: you forward almost any reservation (airline, hotel, car rental, etc.) to plans@tripit.com, and their service will scan it, grab the relevant data, and create a calendar of events for you. While it’s a blessing not to have to manually enter my travel plans into my calendar, what’s particularly fantastic is that I can give my partner access to the calendar, so she knows when I’m flying out and when I return. With 135,000 miles of travel last year alone, there was a lot of that.

TripIt Pro users, however, have added benefits: they can use TripIt to track how many loyalty points they are gathering. That is, unless you travel on Southwest Airlines. Apparently Southwest sent a legal warning to any company that tracks their members’ loyalty benefits and ordered them to stop doing so. (Award Wallet is another example of an app I use that was affected). In a similar vein, veteran travelers know that Southwest does not appear on many travel search sites like Orbitz.

These are great examples of data wars – places where companies are fighting over who gets access to customers’ data. In this case Southwest is using its user license to forbid other companies from displaying data that Southwest generates, but that its customers might wish to share with others because it is helpful to them. It’s not just that Southwest wants to control its relationship with its customers when it comes to loyalty points, or that it wants to sell them hotels and rental cars through its site. It’s that it wants the data about how you behave, about what choices you make and how you make them. Use another site to access your loyalty points and they can’t track you or sell to you. Ditto if you use another site to buy airfare for their flights.

Southwest isn’t nuts. But it’s a strategy that won’t work for all companies (and may not work for them) and it has real consequences.

To begin with, they are making it hard for their customers to engage with their service. When traveling in the US, I regularly use Kayak and other airline aggregators, which means I never see Southwest as an option. Nor do I go to their website. The bigger irony, of course, is that while I frequently find fares on aggregator sites, I often book them on the airline’s site. But again, I don’t go to Southwest because they never appear in any searches I do. Maybe they don’t care about business travelers, but they are making a big trade-off: they get more data about their users and have unique opportunities to sell to them, but I suspect they get far fewer users.

In addition, they may be alienating their customers. I’m not so sure customers will feel like loyalty point data belongs to Southwest. After all, it was their dollars and flying that paid to create the data… why shouldn’t they be able to access a copy of it via an application they find useful?

This was all confirmed by an email from my friend and colleague Gary R., who recently wrote to say:

While we love Southwest Airlines for its low prices, generous affinity programs and flexibility in changing business trips at the last moment with little consequence, their closed data sharing policy drives up our overall cost of managing travel. Entering flight information manually into TripIt is a pain, yet the service is incredible at keeping one informed during a trip, presenting a palette of options seemingly the instant things go wrong. We have chosen other carriers over Southwest on occasion because they play nicely with Orbitz and TripIt.

I can’t tell if Southwest’s trade-offs are worth it or not. But any business person must at least recognize there is a trade-off. That’s the real lesson. You need to find a way to value the data you collect and to compare it against the opportunity of a) happier clients and b) potentially accessing more clients. This is particularly true since many customers will probably (and rightly) feel that this data is as much theirs as it is yours. They did co-create it.

Ultimately, if you increase the transaction costs of the experience because you want to shut other actors out, you will lose customers. Southwest already has.

Definitely expect more of these types of legal battles in the future. Your data is now as important as the service you use. This makes it both powerful and dangerous in the hands of the wrong people.

The next Open Data battle: Advancing Policy & Innovation through Standards

With the possible exception of weather data, the most successful open data set out there at the moment is transit data. It remains the data with which developers have experimented and innovated the most. Why is this? Because it’s been standardized. Ever since Google and the City of Portland created the General Transit Feed Specification (GTFS), any developer who creates an application using GTFS transit data can port it to over 100 cities around the world with tens and even hundreds of millions of potential users. Now that’s scale!
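To make the portability point concrete, here is a minimal Python sketch. It assumes a feed’s stops.txt file has already been read into a string and relies only on four standard GTFS column names (stop_id, stop_name, stop_lat, stop_lon). The Stop class, the load_stops function and the two sample feeds are hypothetical, and the stop data is made up. Because every conforming agency uses the same column names, the same parser runs unchanged on both cities’ data.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Stop:
    stop_id: str
    name: str
    lat: float
    lon: float


def load_stops(stops_txt: str) -> list[Stop]:
    """Parse the text of a GTFS stops.txt file into Stop records.

    Only the standard GTFS column names are used, so the same function
    works for any agency that publishes a conforming feed. Extra optional
    columns (e.g. zone_id) are simply ignored.
    """
    reader = csv.DictReader(io.StringIO(stops_txt))
    return [
        Stop(
            stop_id=row["stop_id"],
            name=row["stop_name"],
            lat=float(row["stop_lat"]),
            lon=float(row["stop_lon"]),
        )
        for row in reader
    ]


# Hypothetical sample feeds from two different (made-up) agencies.
CITY_A_STOPS = """stop_id,stop_name,stop_lat,stop_lon
1001,Main St & 1st Ave,45.5200,-122.6800
1002,Main St & 5th Ave,45.5210,-122.6750
"""

# City B publishes an extra optional column; the parser doesn't care.
CITY_B_STOPS = """stop_id,stop_name,stop_lat,stop_lon,zone_id
A1,Central Station,49.2730,-123.1000,1
"""

if __name__ == "__main__":
    # The same code runs unchanged against both cities' data.
    for city, feed in [("A", CITY_A_STOPS), ("B", CITY_B_STOPS)]:
        stops = load_stops(feed)
        print(city, len(stops), stops[0].name)
```

Run as a script, this reports two stops for city A and one for city B. Nothing in the parser is specific to either city. That is the property that lets one transit app reach riders in every jurisdiction that publishes GTFS.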

All in all, the benefits of a standard data structure are clear. A public good is more effectively used, citizens enjoy better service and companies (both Google and the numerous smaller companies that sell transit-related applications) generate revenue, pay salaries, etc.

This is why, with a number of jurisdictions now committed to open data, I believe it is time for advocates to start focusing on the next big issue: how do we get different jurisdictions to align around standard structures so as to increase the number of people to whom an application or analysis will be relevant? Having cities publish open data sets is a great start and has led to real innovation, but next-generation open data, and the next leaps in innovation, will require more standards.

The key, I think, is to find areas that meet three criteria:

  • Government Data: Is relevant government data about the service or issue available?
  • Demand: Is this a service for which there is regular demand? (This is why transit is so good: millions of people touch the service on a daily basis.)
  • Business Model: Is there a business that believes it can use this data to generate revenue (either directly or indirectly)?




Two comments on this.

First, I think we should look at this model because we want to find places where the incentives are right for all the key stakeholders. The wrong way to create a data structure is to get a bunch of governments together to talk about it. That process will take 5 years… if we are lucky. Remember, the GTFS emerged because Google and Portland got together; after that, everybody else jumped on the bandwagon because the value proposition was so high. This remains, in my mind, not the perfect model, but the fastest and most efficient one for getting more common data structures. I also recognize it won’t work for everything, but it can give us more successes to point to.

Which leads me to point two. Yes, at the moment, I think the target in the middle of this model is relatively small. But I think we can make it bigger. The GTFS shows cities, citizens and companies that there is value in open data. What we need are more examples so that a) more business models emerge and b) more government data is shared in a structured way across multiple jurisdictions. The bottom and right-hand circles in this diagram can, and if we are successful will, move. In short, I think we can create this dynamic:


So, what does this look like in practice?

I’ve been trying to think of services that fall in various parts of the diagram. A while back I wrote a post about using open restaurant inspection data to drive down health costs, specifically about finding a government to work with Yelp!, Bing or Google Maps, Urban Spoon or another company to integrate the inspection data into their applications. That, for me, is an example of something that fits in the middle. Governments have the data, it’s a service citizens could touch on a regular basis if the data appeared in their workflow (e.g. Yelp! or Bing Maps), and for those businesses it either helps drive search revenue or gives their product a competitive advantage. The Open311 standard (sadly missing from my diagram) and the emergence of SeeClickFix strike me as another excellent example, right on the inside edge of the sweet spot.

Here’s a list of what else I’ve come up with at the moment:


You can also now see why I’ve been working on Recollect.net – our garbage pick up reminder service – and helping develop a standard around garbage scheduling data – the Trash & Recycling Object Notation. I think it is a service around which we can help explain the value of common standards to cities.

You’ll notice that I’ve put “democracy data” (e.g. agendas, minutes, legislation, Hansards, budgets, etc.) in the area where I don’t think there is a business model. I’m not fully convinced of this (I could see a business model in the media space for it), but I’m trying to be conservative in my estimate. In either case, that is the type of data the good people at the Sunlight Foundation are trying to get liberated, so in America there are at least non-profit efforts concentrated there.

I also put real estate in a category where I don’t think there is real consumer demand. What I mean by this isn’t that people don’t want it (they do), but that they are only really interested in it maybe 2-4 times in their life. It doesn’t have the high touch point of transit or garbage schedules, or of traffic and parking. I understand that there are businesses to be built around this data (I love Viewpoint.ca, a site that mashes open data up with real estate data to create a compelling real estate website), but I don’t think it is a service people will get attached to because they will only use it infrequently.

Ultimately, I’d love to hear people’s ideas on what might fit in this sweet spot (if you are comfortable sharing them, of course). Part of this is because I’d love to test the model more. The other reason is that I’m engaged with some governments interested in getting more strategic about their open data use, and so these types of opportunities could become reality.

Finally, I just hope you find this model compelling and helpful.

What Canada’s Realtors could learn from Canada’s Lawyers

Lawyers aren’t generally known to be the most technologically forward-looking group, but here in Canada they have done one thing really, really well: they have radically reduced the transaction costs of sharing critical information about their industry.

CanLII – the non-profit managed by the Federation of Law Societies of Canada – has the goal “to make Canadian law accessible for free on the Internet.” In essence, CanLII copies all of the materials produced by the courts, organizes them and makes them searchable and re-usable by anyone. For realtors wondering about their future, looking over this service might be a good place to start.

Consider MLS.ca (now rebranded as realtor.ca), the website run by the Canadian Real Estate Association (CREA) that shares information on what homes are for sale where. Some of you may also know that the Competition Bureau and CREA have recently been tangling over access to MLS. While it is now easier for people to list properties on MLS, the data within MLS is very restricted. Much of the data can be seen only by realtors, and re-use of the data appears to be strictly verboten. These restrictions cause Canadians to suffer from what I like to call the Hulu Syndrome: they can see what a more open system would look like by surfing the various property websites in the United States, but they are stuck using MLS when trying to browse for a home to buy.

For Canadian realtors wanting to know what the future looks like for a professional service in a world where data and information are widely available, CanLII offers both a window and a model. Unlike MLS, the great thing about CanLII is that it serves everyone, not just lawyers. It isn’t hard to imagine a world where lawyers insisted that only they could access the cataloguing system they pay for (lawyers pay a small annual fee to support CanLII), much like only realtors can access the full MLS database. In such a world, if you wanted to read a judgement or view court documents on a specific case, only a lawyer could access them for you, and then they would interpret them for you, and, to carry the analogy to its logical conclusion, you would rarely, if ever, see the original documents.

Thankfully for the legal system, the marketplace for legal services and our democracy, CanLII doesn’t work this way. As mentioned, anyone can search, find and download all the information. Indeed, look at CanLII’s Terms of Use:

Subject to the following paragraph and the below conditions pertaining to prohibited use, legal materials published on the CanLII website, such as legislation, regulations and decisions, including editorial enhancements inserted into the documents by CanLII, such as hyperlinks and information in headers and footers, can be copied, printed and distributed by Users free of charge and without any other authorization from CanLII, provided that CanLII is identified as the source of the document.

Compare this to MLS’s terms of use:

This database and all materials on this site are protected by copyright laws and are owned by The Canadian Real Estate Association (CREA) or by the member who has supplied the data. Property listings and other data available on this site are intended for the private, non-commercial use by individuals. Any commercial use of the listings or data in whole or in part, directly or indirectly, is specifically forbidden except with the prior written authority of the owner of the copyright.

(Side note: I’m pretty sure you can’t copyright data, so I’m not sure what legal rights are being exercised here.)

Of course, even though CanLII makes legal documents freely available, many people still want to use lawyers because they don’t have time or, just as often, realize they need expert advice in this complicated field.

The same would be true of MLS. Many, many buyers would still want to use a realtor, although buyers and sellers in the marketplace would be smarter and better informed, which would probably lead to a better marketplace and happier customers. There are, of course, a number of buyers and sellers who will simply freeload off MLS’s data to sell or buy their home on their own (much like some people probably “freeload” off CanLII to represent themselves or do research). But these are probably clients who would prefer to be doing it this way anyway. Giving them full access to the database may a) make them realize they do need professional help or b) filter out customers who don’t really want to use a realtor in the first place and are thus… terrible customers.

This isn’t to say that sharing MLS data won’t be disruptive. I suspect that some people will automate the buying/selling process, which a percentage of the marketplace will prefer to a hand-held process. But I suspect that, at some point, this will happen anyway (someone will figure out a model to make it work), at which point CREA and the realtors will have become firmly entrenched in the minds of Canadians as the obstacle to a better, more efficient marketplace, not the leaders who helped foster it.

Lawyers aren’t often known for clarity and simplicity, but clearly when they get it right, they get it right. I hope other professional services will look at what they are up to.

When Measuring the Digital Economy, Measure the (Creative) Destruction Too

Yesterday I had a great lunch with Justin Kozuch of the Pixels to Product research study which aims “to create a classification system for Canada’s digital media industry and shed light on the industry’s size and scope.”

I think measuring the size and scope of Canada’s digital media industry is a fantastic idea. Plenty of people, including many governments, are probably very curious about this.

But one thought I had was: if we really want to impress on governments the importance of the digital economy, don’t measure its size. Measure its creatively destructive/disruptive power.

In short, measure the amount of the “normal” economy it has destroyed.

Think of every newspaper subscription canceled, every print shop closed, every board game not played, every ad not filmed, whatever… and think of all the money saved by businesses and consumers because digital alternatives made their options dramatically cheaper.

I’m not sure what the methodology for such a measurement would look like, or even if it is possible. But it would be helpful.

I suspect the new digital businesses that replace them are smaller and more efficient. Indeed, they often have to be dramatically so to justify the switching cost. This is part of what makes them disruptive. Take, for example, Google. Did you know it only has 20,000 employees? I always find that an incredible figure. These 20,000 people are creating systems that are wiping out (and creating) whole industries.

I say all this because the digital replacement often won’t (initially) be as big as what it replaced; that’s the whole point. The risk is that governments and economic planning groups will look at the current size of the digital economy and be… unimpressed. Measuring destruction might be one way to change the nature of the conversation, to show them how big this part of the economy really is and why they need to give it serious consideration.

The Social Network and the real villains of the old/new economy

The other week I finally got around to watching The Social Network. It’s great fun and I recommend going out and watching it whether you’re a self-professed social media expert or don’t even have a Facebook account.

Here are some of my thoughts about the movie (don’t worry, no spoilers here).

1. Remember this is a Hollywood movie: Before (or after) you watch it, read Lawrence Lessig’s fantastic critique of the movie. This review is so soundly brilliant and devastating I’m basically willing to say, if you only have 5 minutes to read, leave my blog right now and go check it out. If you are a government employee working on innovation, copyright or the digital economy, I doubly urge you to read it. Treble that if you happen to be (or work for) the CIO of a major corporation or organization who (still) believes that social media is a passing phase and can’t see its disruptive implications.

2. It isn’t just the legal system that is broken: What struck me about the movie wasn’t just the problems with the legal system, it was how the venture capitalists come off even worse. Here is a group of people who are supposed to support and enable entrepreneurs, and yet they’re directing lawyers to draft up contracts that screw some of the original founders. If the story is even remotely true, it’s a damning and cautionary tale for anyone starting (or looking to expand) a company. Indeed, in the movie the whole success of Facebook and the ability of (some of) the owners to retain control over it rests on the fact that graduates of the first internet bubble who were screwed over by VCs are able to swoop in and protect this second generation of internet entrepreneurs. Of course they, personified by Sean Parker (co-founder of Napster), are parodied as irresponsible and paranoid.

One thought I walked away with was: if, say as a result of the rise of cloud computing, the costs of setting up an online business continue to drop, at a certain point the benefits of venture capital will significantly erode, or its value proposition will need to change significantly. More importantly, if you are looking to build a robust innovation cluster, building it on a model in which every company it generates ultimately aims to be acquired by a large (American) multinational doesn’t seem like a route to economic development.

Interesting questions for policy makers, especially those outside Silicon Valley, who obsess about how to get venture capital money into their economies.

3. Beyond lawyers and VCs, the final thing that struck me about the movie was the lack of women doing anything interesting. I tweeted this right away and, of course, a quick Google search reveals I’m not the only one who noticed it. Indeed, Aaron Sorkin (the film’s screenwriter) wrote a response to questions regarding this issue on Emmy winner Ken Levine’s blog. What I noticed in The Social Network is there isn’t a single innovating or particularly positive female character. Indeed, in both the new and old economy worlds shown in the film, women are largely objects to be enjoyed, whether it is in the elite house parties of Harvard or the makeshift start-up home offices in Palo Alto. Yes, I’m sure the situation is more complicated, but essentially women aren’t thinkers – or drivers – in the movie. It’s a type of sexism that is real, and in case you think it isn’t just consider a TechCrunch article from the summer titled “Too Few Women In Tech? Stop Blaming The Men” in which the author, Michael Arrington, makes the gobsmacking statement:

The problem isn’t that Silicon Valley is keeping women down, or not doing enough to encourage female entrepreneurs. The opposite is true. No, the problem is that not enough women want to become entrepreneurs.

Really? This is a country (the United States) where women start businesses at twice the rate of men and where 42% of all businesses are women owned? To say that women don’t want to be entrepreneurs is so profoundly stupid and incorrect it perfectly reflects the roles women are shoveled into in The Social Network. And that is something the new economy needs to grapple with.

When Canada makes the US border thicker

Canadians spend a lot of time worrying about the “thickening” border with the United States. This is for good reason. Given the importance of the US market and the sheer volume of exports between the two countries, issues that thicken the border (like the requirement to use a passport or stricter rules around shipping goods) have an enormous impact on Canada’s economy.

Usually, Canadian officials complain that it is hard to get Americans to engage on this issue. So it is exceedingly frustrating when the Canadian government takes actions that thicken the border, and simultaneously discouraging and encouraging when senior American officials have to intervene to make it thinner.

Last week, despite lobbying from the Mayor of Vancouver, the Premier of British Columbia, a number of business and tourism representatives and even Conservative Party caucus members, the federal government looked set on killing a program that saw border guards pre-clearing trains that run from Vancouver to Seattle. Without this pre-clearance the trains would run much, much slower, and so Amtrak, which runs the trains, said it would end the service.

It now appears that the border service was saved only after U.S. Homeland Security Secretary Janet Napolitano and U.S. ambassador to Canada David Jacobson personally intervened. Yes, you read that right. US officials were racing to persuade Canadian officials to keep the border more open. The problematic nature of such a headline cannot be overstated. Yes, it is great that senior officials in the US care about ensuring the Canada-US border remains as open as possible. But, as a country still dependent on an open and friction-free border with the United States, it is disturbing that their intervention was necessary.

Indeed, as the country with the most to lose when the border gets thicker (we feel the loss of exports and trade more than the Americans do), we need to model behaviour and be a leader in striving to make it as open and as accessible as possible. The intervention of Secretary Napolitano and Ambassador Jacobson means that two senior US officials may now believe that Canada’s commitment to a friction-free and accessible border is not as strong as we have claimed. If we aren’t concerned here, maybe we aren’t as concerned about other, even more important aspects of the increasing thickening of the Canada-US border.

And the damage has not been undone. Public Safety Minister Vic Toews, who is responsible for the decision, has only preserved the service for one year. Indeed, in his statement he added, “In this period of time, the residents of British Columbia and Washington State primarily will demonstrate whether, in fact, this is a necessary service.” Of course, the second train has already doubled the number of people traveling via rail between the two cities and, according to BC’s transportation minister, has injected $11.8 million into the BC economy.

Canadians should be thrilled that Public Safety Minister Vic Toews and the government ultimately made the right decision around keeping this service in place. But as a country still concerned about the weakened economy, the US border and our relationship with the United States, we should be concerned that the government took the most painful and costly route to arrive at this decision.