Maps
Mapping Wikipedia
In 2006 I began looking at the frequency of references to locations in Wikipedia. This involved automatically matching Wikipedia articles to locations on the Earth's surface, then counting the references to these location articles.
Below is the map of the English Language Wikipedia; the more intense the colour, the greater the number of references to locations:
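The counting step can be sketched in a few lines of Python. This is only an illustration, not the code used to produce the maps: the `links` list, the `locations` lookup and the function name `count_location_references` are all hypothetical, and the matching of articles to coordinates is assumed to have been done already.

```python
from collections import Counter

def count_location_references(links, locations):
    """Count how often each location article is referenced.

    links: iterable of (source_title, target_title) pairs (hypothetical input).
    locations: dict mapping an article title to (latitude, longitude).
    """
    counts = Counter()
    for _source, target in links:
        # Only links that point at an article matched to a location are counted.
        if target in locations:
            counts[target] += 1
    return counts

# Toy data, invented for illustration only.
links = [
    ("Isaac Newton", "Cambridge"),
    ("Charles Darwin", "Cambridge"),
    ("Charles Darwin", "London"),
    ("Isaac Newton", "Gravity"),
]
locations = {"Cambridge": (52.205, 0.119), "London": (51.507, -0.128)}
reference_counts = count_location_references(links, locations)
```

Plotting each count at its coordinates, with colour intensity scaled by the count, would give a map of the kind described here.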
Mapping Co-occurrence
One of the central tasks in Geographic Information Retrieval is Placename Disambiguation. This attempts to solve the problem: when "Cambridge" is mentioned, which Cambridge is it? This is one of the central themes of my Thesis. I use Wikipedia as a huge corpus to learn how locations co-occur.
An example can be seen in the map below showing which locations are referred to in the same documents as Cambridge, Cambridgeshire (red), Cambridge, Massachusetts (green) and Cambridge, New Zealand (blue).
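As a rough sketch of how co-occurrence can drive disambiguation, the snippet below scores each candidate Cambridge by how often it co-occurs with the other locations mentioned in a document, then picks the highest score. The function name `disambiguate`, the data layout and every count are assumptions made up for illustration; the model in the Thesis may differ.

```python
def disambiguate(candidates, context_locations, cooccurrence):
    """Return the candidate that co-occurs most with the context locations.

    cooccurrence: dict mapping candidate -> {location: co-occurrence count}.
    """
    def score(candidate):
        counts = cooccurrence.get(candidate, {})
        # Sum the co-occurrence counts over every location seen in the document.
        return sum(counts.get(loc, 0) for loc in context_locations)

    return max(candidates, key=score)

# Invented counts, not measured from Wikipedia.
cooccurrence = {
    "Cambridge, Cambridgeshire": {"London": 120, "Oxford": 90, "Boston": 5},
    "Cambridge, Massachusetts": {"Boston": 150, "New York": 60, "London": 10},
    "Cambridge, New Zealand": {"Hamilton": 40, "Auckland": 30},
}
best = disambiguate(list(cooccurrence), ["Boston", "New York"], cooccurrence)
```

With these toy counts, a document that also mentions Boston and New York resolves to Cambridge, Massachusetts.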
Mapping Languages
The start of my PhD concentrated entirely on the English Language Wikipedia. Towards the end of 2007 I started applying the techniques to other language versions of Wikipedia. Applying the same methods I had previously used with the English Language Wikipedia, in combination with the interlanguage links between different versions of Wikipedia, I generated the following maps:
Map of the German Language Wikipedia:
Map of the French Language Wikipedia:
Mapping Bias
As can be seen from the per-language maps of Wikipedia, the different language versions of Wikipedia contain a Systematic Bias: for example, the French Wikipedia is concerned with the affairs of the French-speaking world. The English Wikipedia can be seen as a special case, as English has become the lingua franca of the Internet.
This Systematic Bias can probably be seen most clearly in the two maps below. The first is a cartogram of references in the Spanish Language Wikipedia; the second is a cartogram of references in the Portuguese Language Wikipedia.
Note the shape of the Iberian Peninsula and South America.
Mapping Similarities
Despite these significant differences between the different language versions of Wikipedia, it turns out that people all view the world in a similar way. We call this view of the world the Steinberg Hypothesis, based on Saul Steinberg's famous cover of the New Yorker showing the view from 9th Avenue:
Essentially this is how we all see the world. We estimate that the relevance of a location to a person is proportional to the root of the location's population and inversely proportional to the root of the person's distance from the location.
Applying this Hypothesis generates my view of the world:
Returning to Placename Disambiguation
As described above, we can disambiguate which Cambridge is referenced in a document based on the co-occurring locations. By applying the Steinberg Hypothesis, we can now also disambiguate when there are no co-occurring locations, based on where the document was published.
The map below shows the most likely Cambridge to be referenced, depending on the location of the publisher (Cambridge, Cambridgeshire - red, Cambridge, Massachusetts - green and Cambridge, New Zealand - blue):
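A minimal sketch of publisher-based disambiguation follows. It assumes one reading of the Steinberg Hypothesis, in which relevance grows with the root of a place's population and falls with the root of its distance from the publisher. The function names, the 1 km distance floor, the coordinates and the population figures are rough illustrative assumptions, not values from the Thesis.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def relevance(publisher, place, population):
    """Assumed form: sqrt(population) / sqrt(distance)."""
    # Floor the distance at 1 km so a co-located publisher does not divide by zero.
    distance = max(haversine_km(publisher, place), 1.0)
    return math.sqrt(population) / math.sqrt(distance)

def most_likely(publisher, candidates):
    """Pick the candidate place with the highest relevance to the publisher."""
    return max(candidates, key=lambda name: relevance(publisher, *candidates[name]))

# (latitude, longitude) and population: approximate, for illustration only.
candidates = {
    "Cambridge, Cambridgeshire": ((52.205, 0.119), 125000),
    "Cambridge, Massachusetts": ((42.374, -71.106), 105000),
    "Cambridge, New Zealand": ((-37.889, 175.469), 20000),
}

from_new_york = most_likely((40.71, -74.01), candidates)
from_paris = most_likely((48.86, 2.35), candidates)
from_auckland = most_likely((-36.85, 174.76), candidates)
```

Under these assumptions, a publisher in New York resolves "Cambridge" to Massachusetts, one in Paris to Cambridgeshire, and one in Auckland to New Zealand.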