Rethinking Wikipedia contributions rates

About a year ago, news stories began to surface that Wikipedia was losing more contributors than it was gaining. These stories were based on the research of Felipe Ortega, who had downloaded and analyzed the data of millions of contributors.

This is a question of importance to all of us. Crowdsourcing has been a powerful and disruptive force, socially and economically, in the short history of the web. Organizations like Wikipedia and Mozilla (at the large end of the scale) and millions of much smaller examples have destroyed old business models, spawned new industries and redefined our ideas about how we can work together. Understanding how these communities grow and evolve is of paramount importance.

In response to Ortega’s research, Wikipedia published a post on its blog that challenged the methodology and offered some clarity:

First, it’s important to note that Dr. Ortega’s study of editing patterns defines as an editor anyone who has made a single edit, however experimental. This results in a total count of three million editors across all languages.  In our own analytics, we choose to define editors as people who have made at least 5 edits. By our narrower definition, just under a million people can be counted as editors across all languages combined.  Both numbers include both active and inactive editors.  It’s not yet clear how the patterns observed in Dr. Ortega’s analysis could change if focused only on editors who have moved past initial experimentation.

This is actually quite fair. But the specifics are less interesting than the overall trend described by the Wikimedia Foundation. It’s worth noting that no open source or peer production project can grow infinitely. There is (a) a finite number of people in the world and (b) a finite amount of work that any system can absorb. At some point participation must stabilize. I’ve tried to illustrate this trend in the graphic below.

[Figure: Open source lifecycle]

As luck would have it, my friend Diederik Van Liere was recently hired by the Wikimedia Foundation to help them get a better understanding of editor patterns on Wikipedia – how many editors are joining and leaving the community at any given moment, and over time.

I’ve been thinking about Diederik’s research, and three things have come to mind when I look at the above chart:

1. The question isn’t how do you ensure continued growth, nor is it always how do you stop decline. It’s about ensuring the continuity of the project.

Rapid growth should probably be expected of an open source or peer production project in the early stage that has LOTS of buzz around it (like Wikipedia was back in 2005). There’s lots of work to be done (so many articles HAVEN’T been written).

Decline may also be reasonable after the initial burst. I suspect many open source projects lose developers after the product moves out of beta. Indeed, some research Diederik and I have done on the Firefox community suggests this is the case.

Consequently, it might be worth inverting his research question. In addition to figuring out participation rates, figure out the minimum critical mass of contributors needed to sustain the project. For example, how many editors does Wikipedia need to (a) at a minimum, prevent vandals from destroying the current article inventory and/or (b) at the maximum, sustain an article update and growth rate that supports the current rate of traffic (which notably continues to grow significantly)? The purpose of Wikipedia is not to have many or few editors; it is to maintain the world’s most comprehensive and accurate encyclopedia.

I’ve represented this minimum critical mass in the graphic above with a “Maintenance threshold” line. Figuring out the metric for that feels like it may be more important than participation rates on their own, as such a metric could form the basis for a dashboard that would tell you a lot about the health of the project.
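As a rough sketch of what such a dashboard metric could look like, the Python snippet below compares monthly active-editor counts against a maintenance threshold. The function name, the threshold value and the monthly figures are all invented for illustration; working out what the real threshold is remains the open question.

```python
from dataclasses import dataclass


@dataclass
class HealthReading:
    month: str
    active_editors: int
    margin: float  # fraction above (+) or below (-) the threshold
    healthy: bool


def health_readings(monthly_active, threshold):
    """Compare each month's active-editor count with a maintenance threshold.

    `monthly_active` is a list of (month, active_editors) pairs and
    `threshold` is the hypothetical minimum number of editors needed to
    sustain the project. Both are assumptions of this sketch.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    readings = []
    for month, editors in monthly_active:
        margin = (editors - threshold) / threshold
        readings.append(HealthReading(month, editors, margin, editors >= threshold))
    return readings


# Invented numbers, for illustration only.
sample = [("2010-01", 1200), ("2010-02", 1000), ("2010-03", 900)]
for r in health_readings(sample, threshold=1000):
    status = "ok" if r.healthy else "below threshold"
    print(f"{r.month}: {r.active_editors} editors ({r.margin:+.0%}) {status}")
```

The margin is reported as a fraction of the threshold rather than as a raw count, so the same reading could be compared across projects of very different sizes.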

2. There might be an interesting equation describing participation rates

Another thing that struck me was that each open source project may have a participation quotient: a number that describes the amount of participation required to sustain a given unit of work in the project. For example, in Wikipedia, it may be that every new page that is added needs 0.000001 new editors in order to be sustained. If page growth exceeds editor growth (or the community shrinks), at a certain point the project size outstrips the capacity of the community to sustain it. I can think of a few variables that might help ascertain this quotient, and I accept it wouldn’t be a fixed number. Change the technologies or rules around participation and you might increase the effectiveness of a given participant (lowering the quotient) or you might make it harder to sustain work (raising the quotient). Indeed, the trend of a participation quotient would itself be interesting to monitor… projects will have to continue to find innovative ways to keep it constant even as the project’s article archive or code base gets more complex.
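To make the idea a little more concrete, here is a minimal Python sketch that treats the quotient as editors per page and tracks it over time. Note the assumption: the observed editors-per-page ratio is only a proxy for the quotient actually required to sustain the work, and the function names and all figures below are invented for illustration.

```python
def participation_quotient(active_editors, pages):
    """Editors per unit of work (here: per page) at one point in time."""
    if pages <= 0:
        raise ValueError("pages must be positive")
    return active_editors / pages


def editors_needed(pages, quotient):
    """Editors required to sustain `pages` at a given quotient."""
    return pages * quotient


def quotient_trend(history):
    """Return (label, quotient) pairs for a list of (label, editors, pages)."""
    return [(label, participation_quotient(e, p)) for label, e, p in history]


# Invented figures: pages grow faster than editors, so the observed
# editors-per-page ratio falls from one year to the next.
history = [
    ("year 1", 500, 100_000),
    ("year 2", 600, 200_000),
    ("year 3", 600, 400_000),
]
for label, q in quotient_trend(history):
    print(f"{label}: {q:.4f} editors per page")

# If the project really needed the year-1 ratio to stay healthy, this is
# the number of editors that year 3's page count would call for.
baseline = participation_quotient(500, 100_000)
print(editors_needed(400_000, baseline))
```

A falling ratio is not automatically bad news: better tools or rules may have lowered the quotient the project needs. The sketch only shows where the ratio is heading, not whether that direction is sustainable.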

3. Finding a test case – study a wiki or open source project in the decline phase

One thing about open source projects is that they rarely die. Indeed, there are lots of open source projects out there that are walking zombies. A small, dedicated community struggles to keep a code base intact and functioning that is much too large for it to manage. My sense is that peer production/open source projects can collapse (would MySpace count as an example?) but they rarely collapse and die.

Diederik suggested that maybe one should study a wiki or open source project that has died. The fact that they rarely do is actually a good thing from a research perspective, as it means that the infrastructure (and thus the data about the history of participation) is often still intact, ready to be downloaded and analyzed. By finding such a community we might be able to (a) ascertain what the “maintenance threshold” of the project was at its peak, (b) see how its “participation quotient” evolved (or didn’t evolve) over time and, most importantly, (c) see if there are subtle clues or actions that could serve as predictors of decline or collapse. Obviously, in some cases these might be exogenous forces (e.g. new technologies or processes made the project obsolete) but these could probably be controlled for.
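A first pass at such an analysis might simply locate the peak in a project's participation history and the point at which it fell away. The Python sketch below does that. The function name, the 50% cutoff and the sample data are arbitrary assumptions made for illustration, not findings about any real project.

```python
def peak_and_decline(history, drop_fraction=0.5):
    """Find the participation peak and the first later period in which
    participation falls to `drop_fraction` of that peak or lower.

    `history` is a chronological list of (period, active_contributors).
    Returns (peak_period, peak_value, decline_period); decline_period is
    None if participation never drops that far after the peak.
    """
    if not history:
        raise ValueError("history must not be empty")
    # Index of the highest value; ties resolve to the earliest period.
    peak_index = max(range(len(history)), key=lambda i: history[i][1])
    peak_period, peak_value = history[peak_index]
    cutoff = peak_value * drop_fraction
    for period, value in history[peak_index + 1:]:
        if value <= cutoff:
            return peak_period, peak_value, period
    return peak_period, peak_value, None


# Invented contributor counts for a hypothetical project.
sample = [("2005", 10), ("2006", 40), ("2007", 30), ("2008", 18), ("2009", 12)]
peak_period, peak_value, decline_period = peak_and_decline(sample)
print(f"peak: {peak_value} contributors in {peak_period}")
print(f"fell to half of peak by: {decline_period}")
```

Once the decline point is known, the periods just before it are the natural place to look for the subtle clues mentioned above.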

Anyways, hopefully there is lots here for metric geeks and community managers to chew on. These are only some preliminary thoughts so I hope to flesh them out some more with friends.

13 thoughts on “Rethinking Wikipedia contributions rates”

  2. karl dubost

    I guess there are other things that might have an influence beyond the raw participation numbers. The social dynamic of the Wikipedia system has evolved. The rules (as in any community) have been strengthened and are more rigid than they were in the past. It is basically more and more painful to create content for Wikipedia. As an occasional editor, I’m less and less inclined to make the effort to contribute because the article might quickly go under the “Articles for Deletion” hammer.

    Interesting thing to look at:

    1. Timeline of Articles for Deletion (raw number and % of how many new articles)
    2. Deletion compared to new contributors (stable, going down, going up)
    3. Timeline should also plot the milestones of new editing rules.
    4. The rate of new articles creation compared to the number of contributors.

    These could lead to new processes. Maybe there is a need for a better drafting tool that will help an article reach a stage of maturity where it can be part of Wikipedia, and this will create another set of behaviours ;) Not a closed system.

    1. Anonymous

      You make a big assumption when you assume “create content for Wikipedia” means new articles. *IF* you think Wikipedia’s 3.5 million English-language articles have achieved decent coverage, editors should mostly be improving existing articles, so deletion issues aren’t relevant and new articles are not the right metric.

      Wikipedia has a hell of a hard time deciding what’s notable and as @mcepl points out it’s wildly inconsistent. On the one hand, what’s the harm in being encyclopedic and providing verified information about some new topic; on the other, in 20 years will we want disambiguation pages cluttered with pop songs and characters from TV episodes of today?

      1. karl dubost

        You misunderstood what I was saying, I guess. I didn’t say “create content” = “new article”. I said “new articles creation” as an indicator. Many articles have only a little information, and I think that is fine. It is part of a community of interests: when there are enough stakeholders in a topic, or someone who feels that an article needs to be completed, it will be done.

        I have a feeling that deletion is part of possible issues of creating new pages. I guess there might be issues also for people reverting edits.

        As for being encyclopedic, that is a difficult topic. What is relevant today will not be tomorrow (and that’s fine). What is relevant for one community is not necessarily relevant for another community. Mitigating editing wars is a difficult job and Wikipedia has quite talented people maintaining a healthy community.

        All of that said, we should not put our heads in the sand. There are issues in determining what is right or not. Some decisions are sometimes strange and not consistent. There are WP communities in other languages where it is easier to edit than the one in English, which is a bit more draconian. :)

  3. mcepl

    The thing which made me lose interest in Wikipedia was the deletionist craze run by people who have no idea what actually is notable. Would an article about ii (http://hg.suckless.org/ii/) be right for Wikipedia? Is this program notable? Yes, I think so (even though the project is sheer lunacy), but would any deletionist recognize it? “Notable”, if it means anything, is very different from popular. So we have a Wikipedia full of articles about Hannah Montana and similar trash, but articles which would require somebody to actually understand anything are deleted.

  4. Diederik van Liere

    David,
    I love the way you rephrase questions and how you are encouraging us to contextualize the number of active editors. I think that a healthy community will always see a steady (but small) outflow of volunteers, and that’s not a problem, especially since most open source communities have a well developed institutional memory in the form of mailing lists, user groups, wikis, source code repositories and IRC chat logs. As long as there is no exodus of volunteers, it will open spots for new volunteers with new ideas and new energy.

  6. Aaron

    “3. Finding a test case – study a wiki or open source project in the decline phase”. You don’t have to leave the Wikimedia Foundation to find such a case. Simply take a look at Wikibooks. “A small, dedicated community struggles to keep a code base intact and functioning that is much too large for it to manage” would perfectly describe it. One administrator has performed more sysop actions than there are content pages in the wiki (47500 to 35500) and made nearly three edits for every page in the wiki (102750 to 35500). Yet the majority of the books there are incomplete or stubs and recent posts on the foundation-l mailing list suggest that the Wikimedia Foundation will dedicate no efforts to revive the community because it’s “too risky”.

  7. Occasional WikiAddict

    This is a great article. I think reports of Wikipedia’s maturity are exaggerated, however. It doesn’t even have a WYSIWYG editor yet. Article rating schemes are just being put through Beta. College courses are just starting to make article editing routine parts of the curriculum. Academic experts have yet to lay hands on many of the core subjects. Thousands and thousands of articles aren’t up to GA or FA status. And there will always, always, always be another season of football scores, free agent trades, and Australia’s next top model results. So your analysis was pretty sophisticated with the exception of leaving out AUSNTM. Otherwise you’re good.

    There’s a small typo in 3. you said ‘the’ where you meant ‘they’. How’s that for crowdsourcing.
