Simon Overell's Publications

Journal Articles

View of the world according to Wikipedia: Are we all little Steinbergs?
Journal of Computational Science, 2011

Saul Steinberg's most famous cartoon "View of the world from 9th Avenue" depicts the world as seen by self-absorbed New Yorkers. By analysing the Wikipedias of a range of languages, we find that this particular fish-eye world view is ubiquitous and inherently part of human nature. By measuring the skew in the distribution of locations in different languages, we can confirm the validity of plausible quantitative models. These models demonstrate convincingly that all people have similar world views: "We are all little Steinbergs." Our Steinberg hypothesis allows the world view of specific people to be modelled more accurately; this will allow greater understanding of a person’s discourse, either by someone else or automatically by a computer.

@article{Overell2011193,
title = "View of the world according to Wikipedia: Are we all little Steinbergs?",
journal = "Journal of Computational Science",
volume = "2",
number = "3",
pages = "193--197",
year = "2011",
note = "Social Computational Systems",
issn = "1877-7503",
doi = "10.1016/j.jocs.2011.05.006",
url = "http://www.sciencedirect.com/science/article/pii/S1877750311000494",
author = "S.E. Overell and S. R{\"u}ger",
keywords = "Social role identification, Modelling, Data Mining"
}

Using co-occurrence models for placename disambiguation.
International Journal of Geographical Information Science, 2008

This paper describes the generation of a model capturing information on how placenames co-occur. The advantages of the co-occurrence model over traditional gazetteers are discussed and the problem of placename disambiguation is presented as a case study. We begin by outlining the problem of ambiguous placenames. We demonstrate how analysis of Wikipedia can be used to generate a co-occurrence model. The accuracy of our model is compared to a handcrafted ground truth; then we evaluate alternative methods of applying this model to the disambiguation of placenames in free text (using the GeoCLEF evaluation forum). We conclude by showing that including placenames in both the text and geographic parts of a query gives the highest mean average precision, and we outline the benefits of a co-occurrence model as a data source for the wider field of geographic information retrieval (GIR).

@article{overell08a,
title={Using co-occurrence models for placename disambiguation},
author={Simon Overell and Stefan R{\"u}ger},
year={2008},
volume={22},
number={3},
journal={International Journal of Geographical Information Science},
pages={265--287}
}

My PhD topic was Geographic Information Retrieval. I've written papers on geographic disambiguation and modelling and patents on classification and accurate NLP at scale, and I've given talks on extracting data from Wikipedia and the Web.

Theses

PhD Thesis. Geographic Information Retrieval: Classification, Disambiguation and Modelling. (Imperial College London, 2009)

Master’s Thesis. TRIDE: Implementation of a Teleo-Reactive Integrated Development Environment. (Imperial College London, 2005)

Journal Articles

View of the world according to Wikipedia: Are we all little Steinbergs? (JOCS, 2011)

Using co-occurrence models for placename disambiguation. (IJGIS, 2008)

Conference & Workshop Papers

Classifying Tags using Open Content Resources. (WSDM, 2009, Barcelona)

Geographic Co-occurrence as a Tool for GIR. (GIR @ CIKM, 2007, Lisbon)
...

Invited Talks

I've given 9 invited talks covering my PhD, research at Yahoo! and work at True Knowledge.

Invited Articles

The Problem of Place Name Ambiguity (The SIGSPATIAL Special, 2011)

Are we getting it right? The results of the Student Survey (Informer, Spring 2008)

Patents

I've written various patents, all broadly related to classification. Four have been granted with previous employers and two are pending with Spider.io.

Evaluation Conference Papers

A key part of Information Retrieval is evaluation. Thanks to the efforts of the TREC and CLEF conferences, there is now a series of standardised data sets for these evaluations. I've taken part in three CLEF conferences and one TREC conference, publishing 10 papers.

Posters

Distribution of Location References in Wikipedia (The Future of Multimedia Knowledge Management 2008, Milton Keynes)

SIRIL: A multidimensional browsing framework (MMKM Workshop 2007, Milton Keynes)

Citations

Both Google Scholar and Microsoft Academic Search maintain co-author and citation lists.