Centrality and Content Creation in Networks – The Case of German Wikipedia

ZEW Discussion Paper No. 12-053 // 2012

User-generated content has proved to be a cheap and surprisingly accurate source of information. Still, little is known about how its producers select the content to which they contribute and how platform administrators may channel this choice. While Wikipedia has been the most successful prototype of a wiki, wikis in other contexts, e.g. private businesses, often struggle to encourage and manage activity. By and large, administrators who wish to start or maintain a wiki face three problems. First, they have to succeed in motivating potential users to give it a try. Second, users have to like what they find on the platform, so that they can connect, come back, and eventually contribute to it. Third, users have to contribute content that is useful to others, so that new users have something to connect and come back to. The third step in particular is critical, and at the same time it can be very challenging to achieve. Not only must the content be good and trolls be discouraged (cf. Jian and MacKie-Mason (2012)), but contributing must also be fun, and credible leadership is needed to prevent the project from forking (cf. Lerner and Tirole (2002)). Non-voluntary organizations might be able to overcome this problem by mildly forcing their members (e.g. employees or students). By doing so, they can directly influence where and in which ways users participate and contribute. However, in voluntary organizations and on the open web, users are as free as a flock of birds, and there does not seem to be a way of telling anybody what to do without the risk of scaring them away. This is even more true on big platforms, where the users are numerous, their contributions are often spontaneous and the content is vast.

In this paper, we study user-generated articles on German Wikipedia and the network that is formed by hyperlinks between them. We analyze where users decide to provide content on a platform on which many articles still need to be written or improved. In particular, we analyze how the position of an article in the network of articles is related to how much content is provided by users, and what role the network position of an article plays in attracting the contributions of new authors. This question is situated in the more general context of understanding how producers in peer production of information goods select their tasks.

Since generating content on large platforms is highly complex, readers as well as authors take advantage of organizing mechanisms when identifying articles of interest. There are three main ways to find articles on Wikipedia: categories, text search and hyperlinks. Frequent authors use additional devices such as lists of new articles, watchlists or lists of articles classified as needing improvement. Hyperlinks constitute an organizing principle that is indispensable to online peer production of a vast amount of information. They enable non-hierarchical access and a nonlinear reading experience that are characteristic of wikis (Greenstein and Devereux (2009)). Yet little research has been undertaken on how hyperlinks influence contributions in wikis. Wikipedia's rules require hyperlinks between articles to be semantic links, that is, links that are set according to important connections in meaning between the two subjects. The links need not be reciprocal. The main guidelines on German Wikipedia say that an article must be readable without information from the linked pages. Within Wikipedia, links should point only to pages on technical terms or to pages that contain further information on topics that might be of particular interest to readers of the article. It is not compatible with Wikipedia's rules to set links just to attract attention to an article without embedding its subject into the text pointing to it.

Hyperlinks on Wikipedia are generally regarded as a reliable source of information on semantic relations between words. They have been used extensively in linguistic research (see e.g. Medelyan et al. (2009)). Adafre and de Rijke (2005) propose a procedure that automatically detects missing links between pages that should be linked given their relevance to each other. Taken together, this research suggests that hyperlinks on Wikipedia are generally set in accordance with the guidelines (see also Priedhorsky et al. (2007) on rapid detection of vandalism), but that the topics of articles on Wikipedia do not completely predetermine their link structure. The actual links depend on the dynamic content of an article and on the accuracy of linking. This implies that variations in centrality occur regularly and affect the navigation of readers and potential authors on a given set of articles. Our main hypotheses are that higher centrality is positively related to (1) contributions to an article and (2) contributions by new authors.

In the context of economic research on production of information goods, we consider centrality in the network of articles as a possible channel of knowledge spillovers. Links may trigger the contribution of knowledge that might not have been contributed in their absence. In line with the vast literature on knowledge spillovers in different contexts, we investigate which dimensions of proximity affect the strength of the spillovers. We chose a sample of more than 7,000 articles belonging to a particular category ("Wirtschaft" - "Economics"). For this sample, we compute centrality measures within the category and on the entire German Wikipedia. Thus we can compare links from articles that are semantically close with links that are on average less close. Another dimension of proximity applied is the comparison of direct links, measured by the number of incoming links, with indirect links, measured by the closeness centrality.
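
To make the two measures concrete, the following is a minimal Python sketch on a toy link graph. It is not the authors' implementation. The article names, the helper names (`in_degree`, `closeness`, `restrict`) and the particular closeness formula (number of articles that can reach the page divided by the sum of their shortest path lengths to it) are assumptions chosen for illustration; the paper may use a different normalization.

```python
from collections import deque

def in_degree(links):
    """Count incoming links per article; `links` maps article -> set of link targets."""
    counts = {page: 0 for page in links}
    for targets in links.values():
        for target in targets:
            counts[target] = counts.get(target, 0) + 1
    return counts

def closeness(links, page):
    """Incoming closeness (assumed definition): articles that can reach `page`,
    divided by the sum of their shortest path lengths to `page`."""
    # Build the reversed graph so a breadth-first search follows links backwards.
    reverse = {}
    for source, targets in links.items():
        reverse.setdefault(source, set())
        for target in targets:
            reverse.setdefault(target, set()).add(source)
    dist = {page: 0}
    queue = deque([page])
    while queue:
        current = queue.popleft()
        for previous in reverse.get(current, ()):
            if previous not in dist:
                dist[previous] = dist[current] + 1
                queue.append(previous)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

def restrict(links, category):
    """Keep only articles and links that lie inside `category`."""
    return {page: {t for t in targets if t in category}
            for page, targets in links.items() if page in category}

# Hypothetical toy data: four "Economics" articles plus one outside article.
links = {
    "Markt": {"Preis"},
    "Preis": {"Geld"},
    "Bank": {"Geld"},
    "Physik": {"Geld"},
    "Geld": set(),
}
category = {"Markt", "Preis", "Bank", "Geld"}
within = restrict(links, category)

print(in_degree(links)["Geld"], in_degree(within)["Geld"])
print(closeness(links, "Geld"), closeness(within, "Geld"))
```

In this toy graph, the link from the outside article "Physik" counts toward the in-degree on the entire network but not toward the in-degree within the category, which is the distinction the comparison above relies on.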

Our main result is that an increase in the number of links from within the category is strongly associated with an increase in page length. In particular, we find that greater centrality of an article is associated with new authors contributing to it. However, evidence for a relation between links from outside the category and page length turns out to be rather weak. Social network analysis reveals that the category "Economics" is, like many networks, constituted by one large cluster and single articles or small network components that are disconnected from it. We find that getting connected to the large component raises the page length and its rate of change sizeably in the following weeks.
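
The notion of an article "getting connected to the large component" can be illustrated with a short Python sketch. It is a sketch under stated assumptions, not the procedure used in the paper: link direction is ignored (weak connectivity), and the article labels and the function name `weak_components` are hypothetical.

```python
def weak_components(links):
    """Return the components of a link graph, largest first, ignoring link direction.
    `links` maps article -> set of link targets."""
    # Build an undirected neighbour map from the directed links.
    neighbours = {}
    for source, targets in links.items():
        neighbours.setdefault(source, set())
        for target in targets:
            neighbours[source].add(target)
            neighbours.setdefault(target, set()).add(source)
    seen = set()
    components = []
    for start in neighbours:
        if start in seen:
            continue
        # Depth-first search collects every article reachable from `start`.
        component = set()
        stack = [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(component)
    return sorted(components, key=len, reverse=True)

# Hypothetical snapshot: A, B, C form the large component; D is isolated.
before = {"A": {"B"}, "B": {"C"}, "C": set(), "D": set(), "E": {"F"}, "F": set()}
# A later snapshot in which article C has gained a link to D.
after = {page: set(targets) for page, targets in before.items()}
after["C"].add("D")

largest_before = weak_components(before)[0]
largest_after = weak_components(after)[0]
print("D" in largest_before, "D" in largest_after)
```

Comparing membership in the largest component across two snapshots identifies the articles that became connected in between, which is the kind of event the result above refers to.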

Kummer, Michael, Marianne Saam, Iassen Halatchliyski and George Giorgidze (2012), Centrality and Content Creation in Networks – The Case of German Wikipedia, ZEW Discussion Paper No. 12-053, Mannheim.

Authors Michael Kummer // Marianne Saam // Iassen Halatchliyski // George Giorgidze