Antoine Amarilli's blog

Forkability of community projects


A community project is an interaction between a community of users who create a resource and a host which stores and serves this resource. Extremely useful and valuable resources such as Wikipedia have been created in this way, and it is easy to feel compelled to contribute to such projects to "give back" to the community. However, in some cases, you could end up benefiting the host more than the community, because the terms of the relationship between community and host are unfair.

Here is an example of this. CDDB was an early collaborative effort to create a database of audio CDs. It started as a one-man effort to which anyone could contribute by email. As time passed, it was incorporated, then bought, then renamed, and access to the database became burdened with restrictions to serve the commercial interests of the host. The users who contributed to the project had actually helped someone to create their product, and that someone ended up using the product against the community's interests.

What went wrong here? Does this mean that the Wikimedia Foundation could start to act unethically towards the community? Fortunately not: there is a difference in forkability between Wikipedia and CDDB. I say that a community project is forkable if anyone in the community can take a copy of the content and host it somewhere else. Forkability ensures that the host cannot take the content away from the community. Furthermore, it is a strong guarantee of the quality of the hosting service, because it ensures that anyone can start to compete with the host.

There are two facets to forkability, which are:

Legal forkability.
Do you have the right to fork? This is satisfied if the resource is under a free license; it is not satisfied if users keep their copyright but grant the current hosting service a right to host the content, or (worse) if they assign their copyright to the host. To publish their content, users should waive the rights that stand in the way, not privilege the current host in any way.
Practical forkability.
Do you have the capacity to fork? This is satisfied if dumps of the resource are provided by the hosting service in an open format (i.e., one that does not require specific proprietary software); it can still be satisfied if the hosting service allows users to crawl the resource. It is not satisfied if the hosting service tries to prevent crawling or forbids it in its TOS.
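To make the two criteria concrete, here is a minimal sketch of them as a check in Python. Everything in it is an assumption made for illustration: the `Project` record, its field names, and the short `FREE_LICENSES` set are hypothetical, and the sample values only restate what this post says about each project.

```python
from dataclasses import dataclass

# Hypothetical, deliberately short set of licenses counted as "free" here.
FREE_LICENSES = {"CC-BY-SA", "CC0", "public domain"}


@dataclass
class Project:
    name: str
    license: str                    # terms under which contributions are published
    open_dumps: bool                # host provides dumps in an open format
    crawling_allowed: bool = False  # host neither blocks crawling nor forbids it in its TOS


def legally_forkable(p: Project) -> bool:
    # Legal forkability: the content is under a free license.
    return p.license in FREE_LICENSES


def practically_forkable(p: Project) -> bool:
    # Practical forkability: open dumps, or failing that, permitted crawling.
    return p.open_dumps or p.crawling_allowed


def forkable(p: Project) -> bool:
    # A project is forkable only if both facets hold.
    return legally_forkable(p) and practically_forkable(p)


# Illustrative values only; dumps alone are enough for the practical facet.
wikipedia = Project("Wikipedia", "CC-BY-SA", open_dumps=True)
yelp = Project("Yelp", "license granted to the host", open_dumps=False)

for project in (wikipedia, yelp):
    print(project.name, "forkable:", forkable(project))
```

The point of the sketch is that the two facets are independent: a free license without dumps or crawling fails the practical test, and dumps of content that only the host is licensed to use fail the legal one.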

Some community projects today are forkable. Wikimedia projects are under the free CC-BY-SA license and dumps are available. (Incidentally, these dumps aren't just an abstract guarantee against wrongdoing; they are extremely useful resources for researchers or for people who need a local mirror of Wikipedia.) Stack Exchange projects such as Stack Overflow are under CC-BY-SA too and dumps are available (though, sometimes, Stack Exchange thinks it can decide how its users' content should be attributed -- a host should specify a suggested mode of attribution but should not assume that users will not be more lenient). MusicBrainz (a modern alternative to CDDB) is available under a combination of public domain and CC-BY-NC-SA and provides dumps. OpenStreetMap provides dumps, and though its legal situation isn't clear (it seems that you have to assign copyright to the OpenStreetMap Foundation, which guarantees that the content will always be available under a free license), I'm pretty sure that this is fine in practice. Project Gutenberg has a convoluted license for its ebooks, but you can strip it from the public domain books and get public domain content, and you are welcome to mirror it.

Examples of non-forkable projects today are also numerous. Google Maps welcomes people to contribute, but it is not forkable. Most review websites are not forkable. For instance, the Yelp TOS requires you to grant them a license to use your content, and prohibits any practical attempt to crawl the content. Reviews on websites such as Amazon are also examples of collaboration to create non-forkable content. reCAPTCHA is not really a community project but is worth mentioning because you get the same sort of enthusiasm ("awesome! I can help digitize books by completing captchas") before you realize that reCAPTCHA is really Google, that Google never guarantees that you will be able to fork the content that you helped to digitize, and that they are using reCAPTCHA to improve Street View, which is definitely not forkable. Interestingly, I don't know of any forkable alternative to Yelp or reCAPTCHA, though I can't see any good reason why such alternatives couldn't exist and thrive (except that they are hard to bootstrap).

So, before you contribute to a community project, make sure that the resource doesn't just belong to the host, but really belongs to the community (and just happens to be hosted somewhere). Forkable community projects are, in my opinion, the only ethical alternatives to federated projects; they are a bit worse because they require one master host to exist and because there will always be some degree of inertia before people start to fork, but they are the best that we can do whenever centralization is a technical requirement.

comments welcome at a3nm<REMOVETHIS>@a3nm.net