Tag Archives: bigdata

Public Policy: The Big Opportunity For Health Record Data

A few weeks ago Colin Hansen – a politician in the governing party in British Columbia (BC) – penned an op-ed in the Vancouver Sun entitled Unlocking our data to save lives. It’s a piece both the current government and the opposition should read, as it is filled with some very promising ideas.

In it, he notes that BC has one of the best collections of health data anywhere in the world, and that data mining these records could yield patterns – like the longitudinal adverse effects of drug combinations, or correlations between diseases – that could save billions as well as improve health care outcomes.

He recommends that the province find ways to share this data with researchers and academics in ways that ensure the privacy of individuals is preserved. While I agree with the idea, one thing we’ve learned in the last five years is that, as good as academics are, the wider public is often much better at identifying patterns in large data sets. So I think we should think bolder. Much, much bolder.

Two years ago the California-based Heritage Provider Network, a company that runs hospitals, launched a $3 million predictive health contest that will reward the team that, over three years, creates the algorithm that best predicts how many days a patient will spend in a hospital in the next year. Heritage believes that, armed with such an algorithm, it can create strategies to reach patients before emergencies occur and thus reduce the number of hospital stays. As they put it: “This will result in increasing the health of patients while decreasing the cost of care.”
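To make the contest’s goal concrete, here is a minimal sketch of the kind of predictor it asks for – the data and the model are entirely hypothetical (this is not Heritage’s actual approach), fitting a simple line from this year’s hospital days to next year’s:

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

# Toy claims history: (hospital days this year, hospital days next year).
history = [(0, 0), (1, 0), (2, 1), (5, 4), (10, 8), (14, 11)]
slope, intercept = fit_line([h[0] for h in history],
                            [h[1] for h in history])

def predict_days(days_this_year):
    """Predicted hospital days next year, floored at zero."""
    return max(0.0, slope * days_this_year + intercept)
```

A real entry would of course use far richer features than a single number, but the shape of the problem – past claims in, predicted hospital days out – is the same.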

Of course, the algorithm that Heritage acquires through this contest will be proprietary. They will own it, and they can choose whom to share it with. But a similar contest run by BC (or, say, the VA in the United States) could create a public asset. Why would we care if others made their healthcare systems more efficient, as long as we got to as well? We could create a public good, as opposed to Heritage’s private asset. More importantly, we need not offer a prize of $3 million. Several contests with prizes of $10,000 would likely yield a number of exciting results. Thus, for very little money, we might help revolutionize BC’s – and possibly Canada’s and even the world’s – healthcare systems. It is an exciting opportunity.

Of course, the big concern in all of this is privacy. The Globe and Mail featured an article in response to Hansen’s op-ed (shockingly but unsurprisingly, it failed to link back to it – why do newspapers behave that way?) that focused heavily on the privacy concerns but was pretty vague about the details. At no point was a specific concern by the privacy commissioner raised or cited. For example, the article could have talked about the real concern in this space, what is called de-anonymization. This is when an analyst can take records – like health records – that have been anonymized to protect individuals’ identities and use alternative sources to figure out whose records belong to whom. In the cases where this occurs it is usually only a handful of people whose records are identified, but even such limited de-anonymization is unacceptable. You can read more on this here.
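A minimal sketch of how such a linkage attack works (all names and records here are invented for illustration): stripping names is not enough if a few quasi-identifiers – say a postal-code prefix, birth year, and sex – uniquely match a record in some public list.

```python
# "Anonymized" health records: names removed, quasi-identifiers kept.
anonymized = [
    {"postal3": "V6K", "birth_year": 1972, "sex": "F", "diagnosis": "diabetes"},
    {"postal3": "V5T", "birth_year": 1985, "sex": "M", "diagnosis": "asthma"},
]

# A hypothetical public list (e.g. a voter roll) with names attached.
public_list = [
    {"name": "A. Singh", "postal3": "V6K", "birth_year": 1972, "sex": "F"},
    {"name": "B. Jones", "postal3": "V5T", "birth_year": 1985, "sex": "M"},
    {"name": "C. Wong",  "postal3": "V5T", "birth_year": 1985, "sex": "F"},
]

QUASI = ("postal3", "birth_year", "sex")

def reidentify(anon_records, public_records):
    """Return (name, diagnosis) pairs where the quasi-identifiers
    match exactly one person in the public list."""
    hits = []
    for anon in anon_records:
        key = tuple(anon[k] for k in QUASI)
        matches = [p for p in public_records
                   if tuple(p[k] for k in QUASI) == key]
        if len(matches) == 1:  # a unique match re-identifies the record
            hits.append((matches[0]["name"], anon["diagnosis"]))
    return hits
```

In this toy data every record is re-identified; real attacks typically recover only a handful – which is exactly the scenario the article could have described.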

As far as I can tell, no one has de-anonymized the Heritage Health Prize data. But we can take even more precautions. I recently connected with Rob James – a local epidemiologist who is excited about how opening up anonymized health care records could save lives and money. He shared with me an approach taken by the US Census Bureau that goes even further than anonymization. As outlined in this (highly technical) research paper by Jennifer C. Huckett and Michael D. Larsen, the approach involves creating a parallel data set that has none of the features of the original but maintains all the relationships between the data points. Since it is the relationships, not the data, that are often important, a great deal of research can take place with much lower risks. As Rob points out, there is a reasonably mature academic literature on these types of privacy-protecting strategies.
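A minimal sketch of the underlying idea (my own toy illustration, not Huckett and Larsen’s actual method): estimate the means, variances, and covariance of two variables, then draw entirely new records from that fitted distribution. No real individual’s values appear in the output, but the relationship researchers care about survives.

```python
import math
import random

def synthesize(data, n, seed=0):
    """Generate n synthetic (x, y) pairs that preserve the means,
    variances, and covariance of the original 2-column data set."""
    rng = random.Random(seed)
    xs = [d[0] for d in data]
    ys = [d[1] for d in data]
    m = len(data)
    mx, my = sum(xs) / m, sum(ys) / m
    vx = sum((x - mx) ** 2 for x in xs) / m
    vy = sum((y - my) ** 2 for y in ys) / m
    cxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / m
    # Cholesky factor of the 2x2 covariance matrix.
    l11 = math.sqrt(vx)
    l21 = cxy / l11
    l22 = math.sqrt(max(vy - l21 * l21, 0.0))
    out = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        out.append((mx + l11 * z1, my + l21 * z1 + l22 * z2))
    return out

# Toy original data with a strong positive relationship.
original = [(0, 0.5), (1, 2.9), (2, 5.2), (3, 6.8), (4, 9.1)]
synthetic = synthesize(original, 2000)
```

The synthetic records are all fabricated, yet their correlation closely tracks the original’s – the kind of property the Census Bureau approach formalizes far more carefully.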

The simple fact is, healthcare spending in Canada is on the rise. In many provinces it will eclipse 50% of all spending in the next few years. This path is unsustainable. Spending in the US is even worse. We need to get smarter and more efficient. Data mining is perhaps the most straightforward and accessible strategy at our disposal.

So the question is this: does BC want to be a leader in healthcare research and outcomes in an area the whole world is going to be interested in? The foundation – creating a high-value data set – is already in place. The unknown is whether we can foster a policy infrastructure and public mandate that allows us to think and act in big ways. It would be great if government officials, the privacy commissioner, and some civil liberties representatives started a dialogue to find some common ground. The benefits to British Columbians – and potentially to a much wider population – could be enormous, both in money and, more importantly, in lives saved.

How Dirty is Your Data? Greenpeace Wants the Cloud to be Greener

My friends over at Greenpeace recently published an interesting report entitled “How dirty is your data? A Look at the Energy Choices That Power Cloud Computing.”

For those who think that cloud computing is an environmentally friendly business, let’s just say… it’s not without its problems.

What’s most interesting is the huge opportunity the cloud presents for changing the energy sector – especially in developing economies. Consider the following factoids from the report:

  • Data centres to house the explosion of virtual information currently consume 1.5-2% of all global electricity; this is growing at a rate of 12% a year.
  • The IT industry points to cloud computing as the new, green model for our IT infrastructure needs, but few companies provide data that would allow us to objectively evaluate these claims.
  • The technologies of the 21st century are still largely powered by the dirty coal power of the past, with over half of the companies rated herein relying on coal for between 50% and 80% of their energy needs.

The 12% growth rate is astounding. It essentially makes data centres the fastest-growing segment of the energy business – so the choices these companies make about how they power their server farms will dictate what the energy industry invests in. If they are content with coal, we’ll burn more coal. If they demand renewables, we’ll end up investing in renewables, and that’s what will end up powering not just server farms, but lots of things. It’s a powerful position for big data and the cloud to hold in the energy marketplace.
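To see why 12% a year is astounding, a quick back-of-the-envelope calculation (using the report’s 1.5–2% share and 12% growth figures; the ten-year horizon is my own choice):

```python
import math

growth = 0.12        # annual growth in data-centre electricity use (report)
share_today = 0.015  # low end of the 1.5-2% share of global electricity (report)

# Years for data-centre consumption to double at 12% annual growth.
doubling_years = math.log(2) / math.log(1 + growth)

# Share of today's global electricity after ten years, if demand elsewhere stood still.
share_in_10_years = share_today * (1 + growth) ** 10

print(f"doubling time: {doubling_years:.1f} years")
print(f"share in 10 years: {share_in_10_years:.1%}")
```

At that pace consumption doubles roughly every six years, which is why the sector’s fuel choices matter so much.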

And of course, the report notes that many companies say many of the right things:

“Our main goal at Facebook is to help make the world more open and transparent. We believe that if we want to lead the world in this direction, then we must set an example by running our service in this way.”

– Mark Zuckerberg

But then Facebook is patently not transparent about where its energy comes from, so it is not easy to assess how good or bad it is, or how it is trending.

Indeed it is worth looking at Greenpeace’s Clean Cloud report card to see – just how dirty is your data?


I’d love to see a session at the upcoming (or next year’s) Strata Big Data Conference on, say, “How to Use Big Data to Make Big Data More Green.” Maybe even a competition to that effect, if there were some data that could be shared? Or maybe just a session where Greenpeace could present their research and engage the community.

Just a thought. Big data has got some big responsibilities on its shoulders when it comes to the environment. It would be great to see the community engage on it.