Category Archives: mozilla

Using Data to Make Firefox Better: A mini-case study for your organization

I love Mozilla. Any reader of this blog knows it. I believe in its mission, I find the organization totally fascinating and its processes engrossing. So much so I spend a lot of time thinking about it – and hopefully, finding ways to contribute.

I’m also a big believer in data. I believe in the power of evidence-based public policy (hence my passion about the long-form census) and in the ability of data to help organizations develop better products, and people make smarter decisions.

Happily, a few months ago I was able to merge these two passions: analyzing data in an effort to help Mozilla understand how to improve Firefox. It was fun. But more importantly, the process says a lot about the potential for innovation open to organizations that cultivate an engaged user community.

So what happened?

In November 2010, Mozilla launched a visualization competition that asked: How do People Use Firefox? As part of the competition, they shared anonymous data collected from Test Pilot users (people who agreed to share anonymous usage data with Mozilla). Working with my friend (and quant genius) Diederik Van Liere, we analyzed the impact of add-on memory consumption on browser performance to find out which add-ons use the most memory and thus are most likely slowing down the browser (and frustrating users!). (You can read about our submission here).

But doing the analysis wasn’t enough. We wanted Mozilla engineers to know we thought that users should be shown the results – so they could make more informed choices about which add-ons they download. Our hope was to put pressure on add-on developers to make sure they weren’t ruining Firefox for their users. To do that we visualized the data by making a mock up of their website – with our data inserted.

FF-memory-visualizations2.001

For our efforts, we won an honourable mention. But winning a prize is far, far less cool than actually changing behaviour or encouraging an actual change. So last week, during a trip to Mozilla’s offices in Mountain View, I was thrilled when one of the engineers pointed out that the add-on site now has a page where they list add-ons that most slow down Firefox’s start up time.

Slow-Performing-Add-ons-Add-ons-for-Firefox_1310962746129

(Sidebar: Anyone else find it ironic that “FastestFox: Browse Faster” is #5?)

This is awesome! Better still, in April, Mozilla launched an add-on performance improvement initiative to help reduce the negative impact add-ons can have on Firefox. I have no idea if our submission to the visualization competition helped kick-start this project; I’m sure there were many smart people at Mozilla already thinking about this. Maybe it was already underway? But I like to believe our ideas helped push their thinking – or, at least, validated some of their ideas. And of course, I hope it continues to. I still believe that the above-cited data shouldn’t be hidden on a webpage well off the beaten path, but should be located right next to every add-on. That’s the best way to create the right feedback loops, and is in line with Mozilla’s manifesto – empowering users.

Some lessons (for Mozilla, companies, non-profits and governments)

First lesson. Innovation comes from everywhere. So why aren’t you tapping into it? Diederik and I are all too happy to dedicate some cycles to thinking about ways to make Firefox better. If you run an organization that has a community of interested people larger than your employee base (I’m looking at you, governments), why aren’t you finding targeted ways to engage them, not in endless brainstorming exercises, but in innovation challenges?

Second, get strategic about using data. A lot of people (including myself) talk about open data. Open data is good. But it can’t hurt to be strategic about it as well. I tried to argue for this in the government and healthcare space with this blog post. Data-driven decisions can be made in lots of places; what you need to ask yourself is: What data are you collecting about your product and processes? What, of that data, could you share, to empower your employees, users, suppliers, customers, whoever, to make better decisions? My sense is that the companies (and governments) of the future are going to be those that react both quickly and intelligently to emerging challenges and opportunities. One key to being competitive will be to have better data to inform decisions. (Again, this is the same reason why, over the next two decades, you can expect my country to start making worse and worse decisions about social policy and the economy – they simply won’t know what is going on).

Third, if you are going to share, get a data portal. In fact, Mozilla needs an open data portal (there is a blog post that is coming). Mozilla has always relied on volunteer contributors to help write Firefox and submit patches to bugs. The same is true for analyzing its products and processes. An open data portal would enable more people to help find ways to keep Firefox competitive. Of course, this is also true for governments and non-profits (to help find efficiencies and new services) and for companies.

Finally, reward good behaviour. If contributors submit something you end up using… let them know! Maybe the idea Diederik and I submitted never informed anything the add-on group was doing; maybe it did. But if it did… why not let us know? We are so pumped about the work they are doing, we’d love to hear more about it. Finding out by accident seems like a lost opportunity to engage interested stakeholders. Moreover, back at the time, Diederik was thinking about his next steps – now he works for the Wikimedia Foundation. But it made me realize how an innovation challenge could be a great way to spot talent.

Mind. Prepare to be blown away. Big Data, Wikipedia and Government.

Okay, super psyched about this. Back at the Strata Conference in Feb (in San Diego) I introduced my long time uber-quant friend and now Wikimedia Foundation data scientist Diederik Van Liere to fellow Gov2.0 thinker Nicholas Gruen (Chairman) and Anthony Goldbloom (Founder and CEO) of an awesome new company called Kaggle.

As usually happens when awesome people get together… awesomeness ensued. Mind. Be prepared to be blown.

So first, what is Kaggle? They’re a company that helps companies and organizations post their data and run competitions with the goal of having it scrutinized by the world’s best data scientists towards some specific goal. Perhaps the most powerful example of a Kaggle competition to date was their HIV prediction competition, in which they asked contestants to use a data set to find markers in the HIV sequence which predict a change in the severity of the infection (as measured by viral load and CD4 counts).

Until Kaggle showed up the best science to date had a prediction rate of 70% – a feat that had taken years to achieve. In 90 days contributors to the contest were able to achieve a prediction rate of 77%. A 10% improvement. I’m told that achieving an similar increment had previously taken something close to a decade. (Data geeks can read how the winner did it here and here.)

Diederik and Anthony have created a similar competition, but this time using Wikipedia participation data. As the competition page outlines:

This competition challenges data-mining experts to build a predictive model that predicts the number of edits an editor will make in the five months after the end date of the training dataset. The dataset is randomly sampled from the English Wikipedia dataset from the period January 2001 – August 2010.

The objective of this competition is to quantitively understand what factors determine editing behavior. We hope to be able to answer questions, using these predictive models, why people stop editing or increase their pace of editing.

This is of course, a subject matter that is dear to me as I’m hoping that we can do similar analysis in open source communities – something Diederik and I have tried to theorize with Wikipedia and actually do Bugzilla data.

There is a grand prize of $5000 (along with a few others) and, amazingly, already 15 participants and 7 submissions.

Finally, I hope public policy geeks, government officials and politicians are paying attention. There is power in data and an opportunity to use it to find efficiencies and opportunities. Most governments probably don’t even know how to approach an organization like Kaggle or to run a competition like this, despite (or because?) it is so fast, efficient and effective.

It shouldn’t be this way.

If you are in government (or any org), check out Kaggle. Watch. Learn. There is huge opportunity here.

12:10pm PST – UPDATE: More Michael Bay sized awesomeness. Within 36 hours of the wikipedia challenge being launched the leading submission has improved on internal Wikimedia Foundation models by 32.4%

Developing Community Management Metrics and Tools for Mozilla

Background – how we got here

Over the past few years I’ve spent a great deal of time thinking about how we can improve both the efficiency of open source communities and contributors experience. Indeed, this was the focus, in part, of my talk at the Mozilla Summit last summer. For some years Diederik Van Liere – now with the Wikimedia foundation’s metrics team – and I have played with Bugzilla data a great deal to see if we could extract useful information from it. This led us to engaging closely with some members of the Mozilla Thunderbird team – in particular Dan Mosedale who immediately saw its potential and became a collaborator. Then, in November, we connected with Daniel Einspanjer of Mozilla Metrics and began to imagine ways to share data that could create opportunities to improve the participation experience.

Yesterday, thank’s to some amazing work on the part of the Mozilla Metrics team (listed at bottom of the post), we started sharing some of work at the Mozilla all hands. Specifically, Daniel demoed the first of a group of dashboards that describe what is going on in the Mozilla community, and that we hope, can help enable better community management. While these dashboards deal with the Mozilla community in particular I nonetheless hope they will be of interest to a number of open source communities more generally. (presently the link is only available to Mozilla staffers until the dashboard goes through security review – see more below, along with screen shots – you can see a screencast here).

Why – the contributor experience is a key driver for success of open source projects

My own feeling is that within the Mozilla community the products, like Firefox, evolve quickly, but the process by which people work together tends to evolve more slowly. This is a problem. If Mozilla cannot evolve and adopt new approaches with sufficient speed then potential and current contributors may go where the experience is better and, over time, the innovation and release cycle could itself cease to be competitive.

This task is made all the more complicated since Mozilla’s ability to fulfill its mission and compete against larger, better funded competitors depends on its capacity to tap into a large pool of social capital – a corps of paid and unpaid coders whose creativity can foster new features and ideas. Competing at this level requires Mozilla to provide processes and tools that can effectively harness and coordinate that energy at minimal cost to both contributors and the organization.

As I discussed in my Mozilla Summit talk on Community Management, processes that limit the size or potential of our community limit Mozilla. Conversely, making it easier for people to cooperate, collaborate, experiment and play enhances the community’s capacity. Consequently, open source projects should – in my opinion – constantly be looking to reduce or eliminate transactions costs and barriers to cooperation. A good example of this is how Github showed that forking can be a positive social contribution. Yes it made managing the code base easier, but what it really did was empower people. It took something everyone thought would kill open source projects – forking – and made it a powerful tool of experimentation and play.

How – Using data to enable better contributor experience

Unfortunately, it is often hard to quantitatively asses how effectively an open source community manages itself. Our goal is to change that. The hope is that these dashboards – and the data that underlies them – will provide contributors with an enhanced situational awareness of the community so they could improve not just the code base, but the community and its processes. If we can help instigate a faster pace of innovation of change in the processes of Mozilla, then I think this will both make it easier to improve the contributor experience and increase the pace of innovation and change in the software. That’s the hope.

That said, this first effort is a relatively conservative one. We wanted to create a dashboard that would allow us to identify some broader trends in the Mozilla Community, as well as provide tangible, useful data to Module Owners – particularly around identifying contributors who may be participating less frequently.


This dashboard is primarily designed to serve two purposes. First is to showcase what dashboards could be with the hope of inspiring the Mozilla community members to use it and, more importantly, to inspire them to build their own. The second reason was to provide module owners with a reliable tool with which to more effective manage their part of the community.  So what are some of the ways I hope this dashboard might be helpful? One important feature is the ability to sort contributors by staff or volunteer. An open source communities volunteer contributors should be a treasured resource. One nice things about this dashboard is that you can not only see just volunteers, but you can get a quick sense of those who haven’t submitted a patch in a while.

In the picture below I de-selected all Mozilla employees so that we are only looking at volunteer contributors. Using this view we can see who are volunteers who are starting to participate less – note the red circle marked “everything okay?” A good community manager might send these people an email asking if everything is okay. Maybe they are moving on, or maybe they just had a baby (and so are busy with a totally different type of patch – diapers), but maybe they had a bad experience and are frustrated, or a bunch of code is stuck in review. These are things we would want to know, and know quickly, as losing these contributors would be bad. In addition, we can also see who are the emerging power contributors – they might be people we want to mentor, or connect with mentors in order to solidify their positive association with our community and speed up their development. In my view, this should be core responsibilities of community managers and this dashboard makes it much easier to execute on these opportunities.

main-dasboard-notes
Below you can see how zooming in more closely allows you to see trends for contributors over time. Again, sometimes large changes or shifts are for reasons we know of (they were working on features for a big release and its shipped) but where we don’t know the reasons maybe we should pick up the phone or email this person to check to see if everything is okay.

user-dashboard-notes

Again, if this contributor had a negative experience and was drifting away from the community – wouldn’t we want to know before they silently disappeared and moved on? This is in part the goal.

Some of you may also like the fact that you can dive a little deeper by clicking on a user to see what specific patches that user has worked on (see below).

User-deep-dive1

Again, these are early days. My hope is that other dashboards will provide still more windows into the community and its processes so as to show us where there are bottlenecks and high transaction costs.

Some of the features we’d like to add to this or other dashboards include:

  • a code-review dashboard that would show how long contributors have been waiting for code-review, and how long before their patches get pushed. This could be a powerful way to identify how to streamline processes and make the experience of participating in open source communities better for users.
  • a semantic analysis of bugzilla discussion threads. This could allow us to flag threads that have become unwieldy or where people are behaving inappropriately so that module owners can better moderate or problem solve them
  • a dashboard that, based on your past bugs and some basic attributes (e.g. skillsets) informs newbies and experienced contributors which outstanding bugs could most use their expertise
  • Ultimately I’d like to see at least 3 core dashboards – one for contributors, one for module owners and one for overall projects – emerge and, as well as user generated dashboards developed using Mozilla metrics data.
  • Access to all the data in Bugzilla so the contributors, module owners, researchers and others can build their own dashboards – they know what they need better than we do

What’s Next – How Do I Get To Access it and how can I contribute

Great questions.

At the moment the dashboard is going through security review which it must complete before being accessible. Our hope is that this will be complete by the end of Q2 (June).

More importantly, we’d love to hear from contributors, developers and other interested users. We have a standing weekly call every other Friday at 9am PST where we discuss development issues with this and the forthcoming code-review dashboard, contributors needs and wanted features, as well as use cases. If you are interested in participating on these calls please either let me know, or join the Mozilla Community Metrics Google group.

Again, a huge shout out is deserved by Daniel Einspanjer and the Mozilla Metrics team. Here is a list of contributors both so people know who they are but also in case anyone has question about specific aspects of the dashboard:
Pedro Alves – Team Lead
Paula Clemente – Dashboard implementor
Nuno Moreira – UX designer
Maria Roldan – Data Extraction
Nelson Sousa – Dashboard implementor