Theses
PhD Thesis. Geographic Information Retrieval: Classification, Disambiguation and Modelling
My thesis aims to augment the Geographic Information Retrieval process with information extracted
from world knowledge. This aim is approached from three directions: classifying world knowledge,
disambiguating placenames and modelling users. Geographic information is becoming ubiquitous across
the Internet, with a significant proportion of web documents and web searches containing geographic
entities, and the proliferation of Internet-enabled mobile devices. Traditional information retrieval treats
these geographic entities in the same way as any other textual data. In this thesis I augment the retrieval
process with geographic information, and show how methods built upon world knowledge outperform
methods based on heuristic rules.
The source of world knowledge used is Wikipedia. Wikipedia has become a phenomenon
of the Internet age and needs little introduction. As a linked corpus of semi-structured data, it is
unsurpassed. Two approaches to mining information from Wikipedia are rigorously explored: initially
I classify Wikipedia articles into broad categories; this is followed by much finer classification where
Wikipedia articles are disambiguated as specific locations. The thesis concludes with the proposal of the
Steinberg hypothesis: by analysing a range of Wikipedias in different languages I demonstrate that a
localised view of the world is ubiquitous and inherently part of human nature. All people perceive closer
places as larger and more important than distant ones.
The core contributions of my thesis are in the areas of extracting information from Wikipedia,
supervised placename disambiguation, and providing a quantitative model for how people view the world.
The findings clearly have a direct impact for applications such as geographically aware search engines,
but in a broader context documents can be automatically annotated with machine readable meta-data
and dialogue enhanced with a model of how people view the world. This will reduce ambiguity and
confusion in dialogue between people or computers.
@phdthesis{overellPhd,
author = {Simon Overell},
title = {{Geographic Information Retrieval: Classification, Disambiguation and Modelling}},
school = {Imperial College London},
year = {2009},
URL = {http://www.numenore.co.uk/wiki}
}
Master’s Thesis. TRIDE: Implementation of a Teleo-Reactive Integrated Development Environment
My Master's thesis developed TRIDE, a Teleo-Reactive Integrated Development and Debugging Environment. It allows Teleo-Reactive (TR) programs to be written and run on robots (currently implemented on the Lego Mindstorms RCX and the Acroname Garcia). Teleo-Reactive programs are simple hierarchical programs formalised by Nils Nilsson as a method of agent control.
@mastersthesis{overellMasters,
author = {Simon Overell},
title = {{T.R.I.D.E}},
school = {Imperial College London},
year = {2005},
URL = {http://www.doc.ic.ac.uk/~seo01/TRIDE/}
}
Journal Articles
Using co-occurrence models for placename disambiguation. This paper describes the generation of a model capturing information on how placenames co-occur together. The advantages of the co-occurrence model over traditional gazetteers are discussed and the problem of placename disambiguation is presented as a case study.
We begin by outlining the problem of ambiguous placenames. We demonstrate how analysis of Wikipedia can be used in the generation of a co-occurrence model. The accuracy of our model is compared to a handcrafted ground truth; then we evaluate alternative methods of applying this model to the disambiguation of placenames in free text (using the GeoCLEF evaluation forum). We conclude by showing how the inclusion of placenames in both the text and geographic parts of a query provides the maximum mean average precision and outline the benefits of a co-occurrence model as a data source for the wider field of geographic information retrieval (GIR).
@article{overell08a,
title={Using co-occurrence models for placename disambiguation.},
author={Simon Overell and Stefan R\"uger},
year={2008},
volume={22},
number={3},
journal={International Journal of Geographical Information Science},
pages={265--287}
}
Peer reviewed Conference & Workshop Papers
Classifying Tags using Open Content Resources. Tagging has emerged as a popular means to annotate on-line
objects such as bookmarks, photos and videos. Tags vary
in semantic meaning and can describe different aspects of a
media object. Tags describe the content of the media as well
as locations, dates, people and other associated meta-data.
Being able to automatically classify tags into semantic categories
allows us to understand better the way users annotate
media objects and to build tools for viewing and browsing
the media objects. In this paper we present a generic method
for classifying tags using third party open content resources,
such as Wikipedia and the Open Directory. Our method
uses structural patterns that can be extracted from resource
meta-data. We describe the implementation of our method
on Wikipedia using WordNet categories as our classification
schema and ground truth. Two structural patterns found
in Wikipedia are used for training and classification: categories
and templates. We apply our system to classifying
Flickr tags. Compared to a WordNet baseline our method
increases the coverage of the Flickr vocabulary by 115%. We
can classify many important entities that are not covered by
WordNet, such as, London Eye, Big Island, Ronaldinho, geocaching
and wii.
@inproceedings{overell09a,
title={Classifying Tags using Open Content Resources},
author={Simon Overell and B\"orkur Sigurbj\"ornsson and Roelof van Zwol},
year={2009},
month={February},
booktitle={Second ACM International Conference on Web Search and Data Mining},
location={Barcelona, Spain}
}
Geographic Co-occurrence as a Tool for GIR. In this paper we describe the development of a geographic
co-occurrence model and how it can be applied to geographic
information retrieval. The model consists of mining co-occurrences
of placenames from Wikipedia, and then mapping
these placenames to locations in the Getty Thesaurus
of Geographical Names. We begin by quantifying the accuracy
of our model and compute theoretical bounds for
the accuracy achievable when applied to placename disambiguation
in free text. We conclude with a discussion of
the improvement such a model could provide for placename
disambiguation and geographic relevance ranking over traditional
methods.
@inproceedings{overell07f,
title={Geographic Co-occurrence as a Tool for GIR.},
author={Simon Overell and Stefan R\"uger},
year={2007},
month={November},
editor={Chris Jones and Ross Purves},
booktitle={CIKM Workshop on Geographic Information Retrieval},
pages={71--76},
location={Lisbon, Portugal}
}
A Semantic Vector Space for Query by Image Example.
Content-based image retrieval enables the user to search a
database for visually similar images. In these scenarios, the user
submits an example that is compared to the images in the database
by their low-level characteristics such as colour, texture and
shape. While visual similarity is essential for a vast number of
applications, there are cases where a user needs to search for
semantically similar images. For example, the user might want to
find all images depicting bears on a river. This might be quite
difficult using only low-level features, but using concept detectors
for “bear” and “river” will produce results that are semantically
closer to what the user requested. Following this idea, this paper
studies a novel paradigm: query by semantic multimedia example.
In this setting the user’s query is processed at a semantic level: a
vector of concept probabilities is inferred for each image and a
similarity metric computes the distance between the concept
vector of the query and of the concept vectors of the images in
database. The system is evaluated with a COREL Stock Photo
collection.
@inproceedings{overell07b,
title={A Semantic Vector Space for Query by Image Example},
author={Jo\~ao Magalh\~aes and Simon Overell and Stefan R\"uger},
year={2007},
month={July},
booktitle = {SIGIR 2007 Workshop on Multimedia Information Retrieval},
pages = {11--16},
location = {Amsterdam, Netherlands}
}
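As a rough illustration of the query-by-semantic-example idea in the abstract above, the sketch below ranks images by the distance between concept-probability vectors. It is not the implementation from the paper: the Euclidean metric, the concept order (bear, river, city), the image identifiers and all probabilities are assumptions made up for the example.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length concept vectors."""
    if len(u) != len(v):
        raise ValueError("concept vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def rank_by_concepts(query_vec, database):
    """Return image ids sorted from semantically closest to furthest."""
    return sorted(database, key=lambda image_id: euclidean(query_vec, database[image_id]))


# Hypothetical concept probabilities in the order (bear, river, city).
database = {
    "img_bear_river": [0.9, 0.8, 0.1],
    "img_river_only": [0.1, 0.9, 0.1],
    "img_street": [0.0, 0.1, 0.9],
}

# Concept vector assumed to have been inferred for the query image.
query = [0.8, 0.9, 0.0]
ranking = rank_by_concepts(query, database)
print(ranking)
```

Any other distance or similarity measure could be swapped in for `euclidean` without changing the ranking function.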
Identifying and grounding descriptions of places.
In this paper we test the following hypothesis: given a piece of text describing an object or concept, our combined disambiguation method can determine whether it is a place and ground it to a Getty Thesaurus of Geographical Names unique identifier with significantly more accuracy than naïve methods. We demonstrate a carefully engineered rule-based place name disambiguation system and give Wikipedia as a worked example with hand-generated ground truth and benchmark tests. This paper outlines our plans to apply the co-occurrence models generated with Wikipedia to solve the problem of disambiguating place names in text using supervised learning techniques.
@inproceedings{overell06a,
title={Identifying and grounding descriptions of places.},
author={Simon Overell and Stefan R\"uger},
year={2006},
month={August},
editor={Chris Jones and Ross Purves},
booktitle={SIGIR Workshop on Geographic Information Retrieval},
pages={14--16},
location={Seattle, USA}
}
Invited Talks
Using AI to get Answers from the Internet
True Knowledge is a pioneer in a new class of Internet search technology that’s aimed at dramatically improving the experience of finding known facts on the Web. Their first service - the True Knowledge Answer Engine - is a major step toward fulfilling a longstanding Internet industry goal: providing consumers with instant answers to complex questions, with a single click.
Picking up where search engines leave off, True Knowledge’s path-breaking Answer Engine automates the laborious, time-consuming work that users generally must do to get final answers to their questions. True Knowledge does this by structuring data in a way that enables computers to work and think like humans do, drawing inferences and conclusions when needed to find the information that’s requested. Another key differentiator: True Knowledge is tapping subject matter experts around the globe to build its information repository - bringing together the benefits of machine-driven automation and people-driven intelligence.
Simon Overell of True Knowledge will lead us through the story of how they applied AI techniques to make the break from search engines that give links to search engines that give facts.
The World According to Wikipedia
Detecting Locations and Events in Wikipedia (Slides, Movie 1, Movie 2)
Wikipedia is the largest encyclopedia mankind has ever known. It contains over 10 million articles across 250 languages and is now the 9th most visited site on the Internet. Wikipedia has led the way for user-generated-content sites such as Flickr and YouTube. In this talk, Simon will present his work on mining location and temporal references from Wikipedia, and will show that despite its best efforts at neutrality, Wikipedia still reflects the cultural biases of its contributors. By analysing different language versions of Wikipedia we can show how different locations and events have significance to different peoples. The talk will conclude with a summary of the applications of the work to Information Retrieval, Computer Science and beyond.
Distribution of Location References in Wikipedia (Short talk)
Classifying Wikipedia pages at home and abroad
Proposing a geographic co-occurrence model as a tool for GIR
The motivation behind developing such a tool is to improve performance on Geographic Information Retrieval problems such as placename disambiguation (if “Sheffield” appears in text, which Sheffield is it?) and geographic relevance (if “Sheffield” appears in a query, are “Yorkshire”, “Manchester” or “Derby” relevant?). The talk will cover the development of a geographic co-occurrence model mined from Wikipedia and similar user-generated content. The co-occurrence model is similar to a language model but contains only geographic entities. The accuracy and clarity of the co-occurrence model are also quantified.
The talk will begin with a description of how Wikipedia can be mined for named-entity associations and an introduction to the area of Geographic Information Retrieval, followed by details of the co-occurrence model and its application. The talk will conclude with future directions and the application of the described techniques to the CLEF corpora.
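To make the “which Sheffield is it?” question concrete, here is a minimal sketch of disambiguation with a co-occurrence model. It is a toy under stated assumptions, not the model described in the talk: the documents, the location labels and the scoring rule (pick the candidate that co-occurs most often with the other places in the text) are all hypothetical.

```python
from collections import Counter


def build_cooccurrence(documents):
    """Count how often each unordered pair of locations shares a document."""
    counts = Counter()
    for locations in documents:
        unique = sorted(set(locations))
        for i, first in enumerate(unique):
            for second in unique[i + 1:]:
                counts[(first, second)] += 1
    return counts


def cooccurrence(counts, a, b):
    """Look up the count for a pair regardless of argument order."""
    return counts[tuple(sorted((a, b)))]


def disambiguate(candidates, context, counts):
    """Pick the candidate that co-occurs most with the context locations."""
    return max(
        candidates,
        key=lambda c: sum(cooccurrence(counts, c, other) for other in context),
    )


# Hypothetical training documents, already resolved to unambiguous locations.
documents = [
    ["Sheffield, UK", "Yorkshire, UK", "Manchester, UK"],
    ["Sheffield, UK", "Derby, UK"],
    ["Sheffield, Alabama", "Tennessee River"],
]
counts = build_cooccurrence(documents)

# "Sheffield" appears in a text that also mentions Yorkshire and Derby.
best = disambiguate(
    ["Sheffield, UK", "Sheffield, Alabama"],
    ["Yorkshire, UK", "Derby, UK"],
    counts,
)
print(best)
```

When no candidate co-occurs with the context, `max` simply returns the first candidate, so a real system would need a default such as the most frequently referenced location.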
Placename disambiguation with co-occurrence models
My talk will cover an introduction to Geographic Information Retrieval (GIR) and the advantages provided by indexing placenames as unambiguous locations. I will describe our GIR system, which generates a large-scale co-occurrence model and applies this model to the problem of placename disambiguation. The data for the model is mined from Wikipedia and applied to the GeoCLEF corpus. An example of placename disambiguation: when “London” is referred to in text, is it “London, UK” or “London, Ontario”? The motivation behind this problem is to make un-annotated data machine readable and allow users to query and browse data geographically. The talk will begin with a description of GIR, placename disambiguation techniques and the use of Wikipedia as a corpus. This is followed by a description of my probabilistic models, which use first and higher orders of co-occurrence. The talk will conclude with our findings on how Information Retrieval methods can be enhanced with Geographic Knowledge.
Evaluating co-occurrence models applied to disambiguation
My presentation will cover the evaluation of large-scale co-occurrence models for disambiguation. The data for the models is mined from Wikipedia and applied to the GeoCLEF corpus. The mining and application parts of the system are entirely independent to avoid bias.
The specific problem I am applying co-occurrence models to is place name disambiguation (for example when “London” is referred to in text, is it “London, UK” or “London, Ontario”?). The motivation behind this problem is to make un-annotated data machine readable and allow users to query and browse data geographically. With the recent introduction of the geographic track to the Cross Language Evaluation Forum there is now a standardised way to test Geographic Information Systems.
I have evaluated three approaches to applying co-occurrence to place name disambiguation: 1. Assign a co-occurrence index to place triplets. 2. Infer co-occurrence classifiers from the ground truth. 3. Represent the places occurring in the training data as vectors in a high dimensional space. The talk will begin with a description of place name disambiguation techniques and the use of Wikipedia as a corpus. This is followed by a description of my probabilistic models, which use first and higher orders of co-occurrence. The talk will conclude with my intended future work: expansion beyond place names to all named entities.
Patents
Media/Tag-Based Word Games
A method of creating a word game comprising receiving a seed value from a browser, obtaining from a media database a plurality of words associated with the seed value, creating a word game from at least a subset of the obtained plurality of words, integrating the word game into a browser interpretable document, and, returning the browser interpretable document to the browser. Some embodiments further comprise incorporating into the browser interpretable document an advertisement associated with the seed value and/or at least one of the obtained plurality of words. Also disclosed is a system comprising a gaming server which receives a game request; a media server and media tag database; the gaming server requesting from the media server a set of media tags associated with a game seed value, building a word game using at least a subset of the media tags, and transmitting the word game.
@misc{patent:20090298594,
title = "MEDIA/TAG-BASED WORD GAMES",
number = "20090298594",
author = "Pueyo, Lluis Garcia (Barcelona, ES), Sigurbjornsson, Borkur (Barcelona, ES), Overell, Simon E. (London, GB), Murdock, Vanessa (Barcelona, ES), Zwol, Roelof Van (Badalona, ES)",
year = "2009",
month = "December",
url = "http://www.freepatentsonline.com/y2009/0298594.html"
}
System and Method for Classifying Tags of Content Using a Hyperlinked Corpus of Classified Web Pages
An improved system and method for classifying tags of content using a hyperlinked corpus of classified web pages is provided. An anchor text index may be searched to find anchor texts that may match text of the tag, documents referenced by the matching anchor texts may be found, and the documents referenced by the matching anchor texts may be grouped to disambiguate multiple classifications that result from matching the anchor texts with the categories of the reference documents. To resolve ambiguity between multiple classifications, weighted classifications may be used where each document may be assigned a positive weight for a mapping to a category to indicate the confidence of the classification of the document to the category. The classification for the grouping of the documents referenced by the matching anchor texts with greatest frequency may be selected and output as the classification for the tag.
@misc{patent:20090265315,
title = "SYSTEM AND METHOD FOR CLASSIFYING TAGS OF CONTENT USING A HYPERLINKED CORPUS OF CLASSIFIED WEB PAGES",
number = "20090265315",
author = "Sigurbjornsson, Borkur (Barcelona, ES), Van Zwol, Roelof (Badalona, ES), Overell, Simon E. (London, GB)",
year = "2009",
month = "October",
url = "http://www.freepatentsonline.com/y2009/0265315.html"
}
Classifying Content Using Structured Patterns
Methods and apparatus are described for classifying content resources in a data set according to an external classification scheme using structural patterns associated with the data set.
@misc{patent:20090240729,
title = "CLASSIFYING CONTENT RESOURCES USING STRUCTURED PATTERNS",
number = "20090240729",
author = "Zwol, Roelof Van (Badalona, ES), Sigurbjornsson, Borkur (Badalona, ES), Overell, Simon Ernest (Fulham, GB)",
year = "2009",
month = "September",
url = "http://www.freepatentsonline.com/y2009/0240729.html"
}
Unreviewed Conference & Workshop Papers
Geographic and Textual Data Fusion in Forostar.
In this paper we provide some analysis of data fusion techniques employed at GeoCLEF 2008 to merge textual and geographic relevance. These methods are compared to our own experiments, where using our GIR system, Forostar, we show that an aggressive filter-based data fusion method can outperform a more sophisticated penalisation method.
Exploiting Term Co-occurrence for Enhancing Automated Image Annotation.
This paper describes an application of statistical co-occurrence techniques that, built on top of a probabilistic image annotation framework, is able to increase the precision of an image annotation system. We observe that probabilistic image analysis by itself is not enough to describe the rich semantics of an image. Our hypothesis is that more accurate annotations can be produced by introducing additional knowledge in the form of statistical co-occurrence of terms. This is provided by the context of images, which independent keyword generation would otherwise miss. We applied our algorithm to the dataset provided by ImageCLEF 2008 for the Visual Concept Detection Task (VCDT). Our algorithm not only obtained better results but also appeared in the top quartile of all methods submitted to ImageCLEF 2008.
MMIS at GeoCLEF 2008: Experiments in GIR.
In this paper we present our Geographic Information Retrieval System, Forostar, and the results of three experiments. We compare two data fusion methods, and show that a simple geographic filter outperforms a penalty based system. We compare context based disambiguation to a default gazetteer and show no significant difference. Finally we compare a unique geographic index to an ambiguous geographic index. The ambiguous index outperformed all other methods and was statistically significantly better than the baseline.
@inproceedings{overell08b,
title={MMIS at GeoCLEF 2008: Experiments in GIR.},
author={Simon Overell and Adam Rae and Stefan R\"uger},
year={2008},
month={September},
booktitle={CLEF 2008 Workshop, Working notes},
editor={Alessandro Nardi and Carol Peters},
location={Aarhus, Denmark}
}
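The two data fusion strategies compared in the abstract above can be sketched as follows. This is only an illustration of the general difference between filtering and penalising, not Forostar's code: the document identifiers, the text scores, the set of geographically relevant documents and the `penalty` factor of 0.5 are all assumed values.

```python
def filter_fusion(text_scores, geo_relevant):
    """Drop documents that fail the geographic filter; rank the rest by text score."""
    kept = {doc: score for doc, score in text_scores.items() if doc in geo_relevant}
    return sorted(kept, key=kept.get, reverse=True)


def penalty_fusion(text_scores, geo_relevant, penalty=0.5):
    """Keep every document but scale down the geographically irrelevant ones."""
    fused = {
        doc: (score if doc in geo_relevant else score * penalty)
        for doc, score in text_scores.items()
    }
    return sorted(fused, key=fused.get, reverse=True)


# Hypothetical text-retrieval scores and geographic relevance judgements.
text_scores = {"d1": 0.9, "d2": 0.7, "d3": 0.4}
geo_relevant = {"d2", "d3"}

filtered = filter_fusion(text_scores, geo_relevant)
penalised = penalty_fusion(text_scores, geo_relevant)
print(filtered)
print(penalised)
```

The filter removes `d1` entirely, whereas the penalty keeps it in the ranking with a reduced score, so a strong text match can still outrank a geographically relevant document.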
MMIS at ImageCLEF 2008: Experiments combining Different Evidence Sources.
This paper presents the work of the MMIS group at ImageCLEF 2008. The results for three tasks are presented: Visual Concept Detection Task (VCDT), ImageCLEFphoto and ImageCLEFwiki. We combine image annotations, CBIR, textual relevance and a geographic filter using our generic data fusion method. We also compare methods for BRF and clustering. Our top performing method in the VCDT enhances supervised learning by modifying probabilities based on a matrix that shows how terms appear together. Although it occurred in the top quartile of submitted runs, the enhancement did not provide a statistically significant improvement. In the ImageCLEFphoto task we demonstrate that evidence from image retrieval can contribute to retrieval; however, we have yet to find a way of combining text and image evidence that improves on the baseline. Given the relative performance of the different evidence sources in ImageCLEFwiki and our failure to improve over a baseline, we conclude that text is the dominant feature in this collection.
@inproceedings{overell08c,
title={MMIS at ImageCLEF 2008: Experiments combining Different Evidence Sources.},
author={Simon Overell and Ainhoa Llorente and Haiming Liu and Rui Hu and Adam Rae and Jianhan Zhu and Dawei Song and Stefan R\"uger},
year={2008},
month={September},
booktitle={CLEF 2008 Workshop, Working notes},
editor={Alessandro Nardi and Carol Peters},
location={Aarhus, Denmark}
}
GIR Experiments with Forostar.
In this paper we describe our Geographic Information Retrieval experiments with Forostar, our GIR application on the GeoCLEF 2007 corpus and query set. We compare the results from orthogonal text with no geographic entities and only geographic entities with standard text retrieval and combined text and geographic relevance methods. The text and named entity analysis and retrieval methods of Forostar are described in detail. We also detail our placename disambiguation and geographic relevance ranking methods.
The paper concludes with an analysis of our results including significance testing
where we show our baseline method, in fact, to be best. Finally we identify weaknesses
in our approach and ways in which the system could be optimised and improved.
@inproceedings{overell08d,
publisher={Springer Berlin / Heidelberg},
volume={5152/2008},
booktitle={Advances in Multilingual and Multimodal Information Retrieval},
year={2008},
isbn={978-3-540-85759-4},
pages={856--863},
title={GIR Experiments with Forostar},
author={Simon Overell and Jo\~ao Magalh\~aes and Stefan R\"uger}
}
GIR experiments with Forostar at GeoCLEF 2007. (Poster)
In this paper we describe our Geographic Information Retrieval experiments with Forostar, our GIR application on the GeoCLEF 2007 corpus and query set. We compare the results from orthogonal text with no geographic entities and only geographic entities with standard text retrieval and combined text and geographic relevance methods. The text and named entity analysis and retrieval methods of Forostar are described in detail. We also detail our placename disambiguation and geographic relevance ranking methods.
The paper concludes with an analysis of our results including significance testing
where we show our baseline method, in fact, to be best. Finally we identify weaknesses
in our approach and ways in which the system could be optimised and improved.
@inproceedings{overell07e2,
title={GIR experiments with Forostar at GeoCLEF 2007.},
author={Simon Overell and Jo\~ao Magalh\~aes and Stefan R\"uger},
year={2007},
month={September},
booktitle={CLEF 2007 Workshop, Abstracts},
editor={Alessandro Nardi and Carol Peters},
ISBN={2-912335-31-0},
ISSN={1818-8044},
pages={51},
location={Budapest, Hungary}
}
Exploring Image, Text and Geographic Evidences in ImageCLEF 2007. (Poster)
This year, ImageCLEF2007 data provided multiple evidences that can be explored in many different ways. In this paper we describe an information retrieval framework that combines image, text and non-geographic terms. Geographic analysis implements a placename disambiguation method and placenames are indexed by their Getty TGN Unique Id. Image analysis implements a query by semantic example model. The paper concludes with an analysis of our results. Finally we identify the weaknesses in our approach and ways in which the system could be optimised and improved.
@inproceedings{overell07d1,
title={Exploring Image, Text and Geographic Evidences in ImageCLEF 2007.},
author={Jo\~ao Magalh\~aes and Simon Overell and Stefan R\"uger},
year={2007},
month={September},
booktitle={CLEF 2007 Workshop, Working notes},
editor={Alessandro Nardi and Carol Peters},
ISBN={2-912335-32-9},
ISSN={1818-8044},
location={Budapest, Hungary}
}
@inproceedings{overell07d2,
title={Exploring Image, Text and Geographic Evidences in ImageCLEF 2007.},
author={Jo\~ao Magalh\~aes and Simon Overell and Stefan R\"uger},
year={2007},
month={September},
booktitle={CLEF 2007 Workshop, Abstracts},
editor={Alessandro Nardi and Carol Peters},
ISBN={2-912335-31-0},
ISSN={1818-8044},
pages={30},
location={Budapest, Hungary}
}
Forostar: A System for GIR.
We detail our methods for generating and applying co-occurrence models for the purpose of placename disambiguation. We explain in detail our use of co-occurrence models for placename disambiguation using a model generated from Wikipedia. The presented system is split into two stages: a batch text & geographic indexer and a real time query engine. Four alternative query constructions and six methods of generating a geographic index are compared. The paper concludes with a full description of future work and ways in which the system could be optimised.
@inbook{overell2007c,
series={Lecture Notes in Computer Science},
publisher={Springer},
title={Forostar: A System for GIR.},
author={Simon Overell and Jo\~ao Magalh\~aes and Stefan R\"uger},
year={2007},
month={September},
booktitle={Evaluation of Multilingual and Multi-modal Information Retrieval},
editor={Carol Peters and Paul Clough and Fredric C. Gey and Jussi Karlgren and Bernardo Magnini and Douglas W. Oard and Maarten de Rijke and Maximilian Stempfhuber},
volume={4730},
doi={10.1007/978-3-540-74999-8_119},
ISBN={978-3-540-74998-1},
ISSN={0302-9743},
location={Berlin / Heidelberg},
pages={930--937}
}
Imperial College and Johns Hopkins University at TRECVID. (Poster)
We describe our experiments for the high-level feature extraction and search tasks. For the search task, we tested the system we have used in previous years, which encapsulates content based image search, image browsing, automated image annotation and named entity extraction. For the feature task we apply the nonparametric density estimation model and the HMM-based concept specific image model.
@inproceedings{overell06c,
title={ {I}mperial {C}ollege and {J}ohns {H}opkins {U}niversity at {TRECVid}.},
author={Arnab Ghoshal and Sanjeev Khudanpur and Jo\~ao Magalh\~aes and Simon Overell and Stefan R\"uger and Alexei Yavlinsky},
year={2006},
month={November},
booktitle = {TRECVid 2006 -- Text REtrieval Conference TRECVid Workshop},
location = {Gaithersburg, MD}
}
Place disambiguation with co-occurrence models.
In this paper we describe the geographic information retrieval system developed by the Multimedia & Information Systems team for GeoCLEF 2006 and the results achieved. We detail our methods for generating and applying co-occurrence models for the purpose of place name disambiguation, our use of named entity recognition tools and text indexing applications. The presented system is split into two stages: a batch text & geographic indexer and a real time query engine. The query engine takes manually crafted queries where the text component is separated from the geographic component. Two monolingual runs were submitted for the GeoCLEF evaluation, the first constructed from the title and description, the second included the narrative also. We explain in detail our use of co-occurrence models for place name disambiguation using a model generated from Wikipedia. The paper concludes with a full description of future work and ways in which the system could be optimised.
@inproceedings{overell06b1,
title={Place disambiguation with co-occurrence models.},
author={Simon Overell and Jo\~ao Magalh\~aes and Stefan R\"uger},
year={2006},
month={September},
booktitle={CLEF 2006 Workshop, Working notes},
editor={Alessandro Nardi and Carol Peters and Jose Luis Vicedo},
ISBN={2-912335-23-x},
ISSN={1818-8044},
location={Alicante, Spain}
}
@inproceedings{overell06b2,
title={Place disambiguation with co-occurrence models.},
author={Simon Overell and Jo\~ao Magalh\~aes and Stefan R\"uger},
year={2006},
month={September},
booktitle={CLEF 2006 Workshop, Abstracts},
editor={Alessandro Nardi and Carol Peters and Jose Luis Vicedo},
ISBN={2-912335-23-3},
ISSN={1818-8044},
pages={59},
location={Alicante, Spain}
}
Conference & Workshop Posters
Distribution of Location References in Wikipedia
This poster will present our work mining location references from different language versions of Wikipedia. The extracted events will be visualised on a map. We will demonstrate that despite Wikipedia’s best efforts for neutrality, cultural biases are introduced. This analysis of Wikipedia will be of interest to the growing community of researchers using Wikipedia as a pool of “World Knowledge.”
By analysing different language versions of Wikipedia we can show how different events have significance to different cultures. Finally we calculate a “bias index” for each language version of Wikipedia we analyse: the ratio of references to locations within countries where the language is spoken natively to references to locations outside these countries.
@inproceedings{overell08a,
title={ Distribution of Location References in Wikipedia },
author={Simon Overell and Stefan R\"uger},
year={2008},
month={February},
booktitle = {MMKM Workshop: The Future of Multimedia Knowledge Management},
location = {Milton Keynes, UK},
pages = {23}
}
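The “bias index” defined in the abstract above is a simple ratio, which the following sketch computes. The reference counts and the choice of home countries are invented for illustration and are not results from the poster; how references are extracted and assigned to countries is outside the scope of the sketch.

```python
def bias_index(references, home_countries):
    """Ratio of references inside the home countries to references outside them."""
    inside = sum(n for country, n in references.items() if country in home_countries)
    outside = sum(n for country, n in references.items() if country not in home_countries)
    if outside == 0:
        raise ValueError("no references outside the home countries; ratio undefined")
    return inside / outside


# Hypothetical counts of location references in one language version.
references = {"Portugal": 300, "Brazil": 300, "Spain": 100, "France": 100}

# Countries assumed, for this example, to speak the language natively.
index = bias_index(references, {"Portugal", "Brazil"})
print(index)
```

Under this definition, an index above 1 means the language version refers to its home countries more often than to the rest of the world combined.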
SIRIL: A multidimensional browsing framework
SIRIL is part of the multidimensional browsing framework currently being developed by the MMIS team. It consists of an XML API, a dynamically generated user interface and a search engine supporting text, image & geographic IR. The framework will support multiple document types and browsing methods; any server should support any front-end. SIRIL will showcase the MMIS team’s work on image indexing & browsing, image annotation and GIR.
@inproceedings{overell07a,
title={ {SIRIL}: {A} multidimensional browsing framework },
author={Peter Howarth and Jo\~ao Magalh\~aes and Simon Overell and Stefan R\"uger and Alexei Yavlinsky},
year={2007},
month={January},
booktitle = {MMKM Workshop: Multimedia Knowledge Management: Industry meets academia},
location = {Milton Keynes, UK},
pages = {17}
}
Miscellaneous
Are we getting it right? The results of the Student Survey
Back in Spring 2006, Ali Azimi Bolourian and I
took over as student officers of the BCS IRSG.
Our remit was to provide a student voice and
perspective for the IRSG committee and
promote the IRSG to the IR student
community. After exchanging dozens of emails
and negotiating with the committee for a
prize to serve as a suitable incentive, we ran
the survey over spring and summer 2007.
Information retrieval has a huge research
community. Masters’ and PhD students keep
the research fresh and flowing. Our first
priority was to check the student community
knew who the IRSG was and what we were
already doing. Beyond that, we wanted to
know who was out there, what they wanted
and what we should offer.
