Simon Overell's Publications

Conference & Workshop Papers

Classifying Tags using Open Content Resources.
WSDM 2009, Barcelona

Tagging has emerged as a popular means to annotate on-line objects such as bookmarks, photos and videos. Tags vary in semantic meaning and can describe different aspects of a media object. Tags describe the content of the media as well as locations, dates, people and other associated meta-data. Being able to automatically classify tags into semantic categories allows us to understand better the way users annotate media objects and to build tools for viewing and browsing the media objects. In this paper we present a generic method for classifying tags using third-party open content resources, such as Wikipedia and the Open Directory. Our method uses structural patterns that can be extracted from resource meta-data. We describe the implementation of our method on Wikipedia using WordNet categories as our classification schema and ground truth. Two structural patterns found in Wikipedia are used for training and classification: categories and templates. We apply our system to classifying Flickr tags. Compared to a WordNet baseline our method increases the coverage of the Flickr vocabulary by 115%. We can classify many important entities that are not covered by WordNet, such as London Eye, Big Island, Ronaldinho, geocaching and wii.

@inproceedings{overell09a,
title={Classifying Tags using Open Content Resources},
author={Simon Overell and B\"orkur Sigurbj\"ornsson and Roelof van Zwol},
year={2009},
month={February},
booktitle={Second ACM International Conference on Web Search and Data Mining},
location={Barcelona, Spain}
}

Geographic Co-occurrence as a Tool for GIR.
GIR @ CIKM 2007, Lisbon

In this paper we describe the development of a geographic co-occurrence model and how it can be applied to geographic information retrieval. The model consists of mining co-occurrences of placenames from Wikipedia, and then mapping these placenames to locations in the Getty Thesaurus of Geographical Names. We begin by quantifying the accuracy of our model and compute theoretical bounds for the accuracy achievable when applied to placename disambiguation in free text. We conclude with a discussion of the improvement such a model could provide for placename disambiguation and geographic relevance ranking over traditional methods.

@inproceedings{overell07f,
title={Geographic Co-occurrence as a Tool for GIR},
author={Simon Overell and Stefan R\"uger},
year={2007},
month={November},
editor={Chris Jones and Ross Purves},
booktitle={CIKM Workshop on Geographic Information Retrieval},
pages={71--76},
location={Lisbon, Portugal}
}

A Semantic Vector Space for Query by Image Example.
MMIR @ SIGIR 2007, Amsterdam

Content-based image retrieval enables the user to search a database for visually similar images. In these scenarios, the user submits an example that is compared to the images in the database by their low-level characteristics such as colour, texture and shape. While visual similarity is essential for a vast number of applications, there are cases where a user needs to search for semantically similar images. For example, the user might want to find all images depicting bears on a river. This might be quite difficult using only low-level features, but using concept detectors for “bear” and “river” will produce results that are semantically closer to what the user requested. Following this idea, this paper studies a novel paradigm: query by semantic multimedia example. In this setting the user’s query is processed at a semantic level: a vector of concept probabilities is inferred for each image and a similarity metric computes the distance between the concept vector of the query and the concept vectors of the images in the database. The system is evaluated with a COREL Stock Photo collection.

@inproceedings{overell07b,
title={A Semantic Vector Space for Query by Image Example},
author={Jo\~ao Magalh\~aes and Simon Overell and Stefan R\"uger},
year={2007},
month={July},
booktitle={SIGIR 2007 Workshop on Multimedia Information Retrieval},
pages={11--16},
location={Amsterdam, Netherlands}
}

Identifying and grounding descriptions of places.
GIR @ SIGIR 2006, Seattle

In this paper we test the following hypothesis: given a piece of text describing an object or concept, our combined disambiguation method can disambiguate whether it is a place and ground it to a Getty Thesaurus of Geographical Names unique identifier with significantly more accuracy than naïve methods. We demonstrate a carefully engineered rule-based place name disambiguation system and give Wikipedia as a worked example with hand-generated ground truth and benchmark tests. This paper outlines our plans to apply the co-occurrence models generated with Wikipedia to solve the problem of disambiguating place names in text using supervised learning techniques.

@inproceedings{overell06a,
title={Identifying and grounding descriptions of places},
author={Simon Overell and Stefan R\"uger},
year={2006},
month={August},
editor={Chris Jones and Ross Purves},
booktitle={SIGIR Workshop on Geographic Information Retrieval},
pages={14--16},
location={Seattle, USA}
}

My PhD topic was Geographic Information Retrieval. I've written papers on Geographic Disambiguation and Modelling, written patents on Classification and Accurate NLP at Scale, and given talks on Extracting Data from Wikipedia and the Web. For abstracts and citation details on all my publications, click the boxes below.

Theses

PhD Thesis. Geographic Information Retrieval: Classification, Disambiguation and Modelling. (Imperial College London, 2009)

Master’s Thesis. TRIDE: Implementation of a Teleo-Reactive Integrated Development Environment. (Imperial College London, 2005)

Journal Articles

View of the world according to Wikipedia: Are we all little Steinbergs? (JOCS, 2011)

Using co-occurrence models for placename disambiguation. (IJGIS, 2008)

Conference & Workshop Papers

Classifying Tags using Open Content Resources. (WSDM, 2009, Barcelona)

Geographic Co-occurrence as a Tool for GIR. (GIR @ CIKM, 2007, Lisbon)
...

Invited Talks

I've given 9 invited talks covering my PhD, research at Yahoo! and work at True Knowledge.

Invited Articles

The Problem of Place Name Ambiguity (The SIGSPATIAL Special, 2011)

Are we getting it right? The results of the Student Survey (Informer, Spring 2008)

Patents

I've written various patents, all broadly related to classification. Four have been granted with previous employers and two are pending with Spider.io.

Evaluation Conference Papers

A key part of Information Retrieval is evaluation. Thanks to the efforts of the TREC and CLEF conferences, there is now a series of standardised data sets for these evaluations. I've taken part in three CLEF conferences and one TREC conference, publishing 10 papers.

Posters

Distribution of Location References in Wikipedia (The Future of Multimedia Knowledge Management 2008, Milton Keynes)

SIRIL: A multidimensional browsing framework (MMKM Workshop 2007, Milton Keynes)

Citations

Both Google Scholar and Microsoft Academic Search maintain co-author and citation lists.