Using AI to get Answers from the Internet
ECIR Industry Day
The Open University, Milton Keynes
1 April 2010
True Knowledge is a pioneer in a new class of Internet search technology aimed at dramatically improving the experience of finding known facts on the Web. Their first service - the True Knowledge Answer Engine - is a major step toward fulfilling a longstanding Internet industry goal: providing consumers with instant answers to complex questions, with a single click. Picking up where search engines leave off, True Knowledge’s path-breaking Answer Engine automates the laborious, time-consuming work that users generally must do to get final answers to their questions. True Knowledge does this by structuring data in a way that enables computers to work and think like humans do, drawing inferences and conclusions when needed to find the information that’s requested. Another key differentiator: True Knowledge is tapping subject matter experts around the globe to build its information repository - bringing together the benefits of machine-driven automation and people-driven intelligence. Simon Overell of True Knowledge will lead us through the story of how they applied AI techniques to make the break from search engines that give links to search engines that give facts.
Using AI to get Answers from the Internet
Real AI
Peterhouse College, Cambridge, UK
17 December 2009
The World According to Wikipedia
Centre for Digital Video Processing, Dublin City University
26 June 2008
Detecting Locations and Events in Wikipedia
Imperial College Internet Centre, London
29 April 2008
Wikipedia is the largest encyclopaedia mankind has ever known. It contains over 10 million articles across 250 languages and is now the 9th most visited site on the Internet. Wikipedia has led the way for user-generated-content sites such as Flickr and YouTube. In this talk, Simon will present his work on mining location and temporal references from Wikipedia, and will show that despite its best efforts at neutrality, Wikipedia still reflects the cultural biases of its contributors. By analysing different language versions of Wikipedia we can show how different locations and events have significance to different peoples. The talk will conclude with a summary of the applications of the work to Information Retrieval, Computer Science and beyond.
Distribution of Location References in Wikipedia (Short talk)
Million Books Workshop
Imperial College Internet Centre, London
14 March 2008
Classifying Wikipedia pages at home and abroad
Yahoo! UK, London
20 November 2007
Proposing a geographic co-occurrence model as a tool for GIR
Natural Language Processing Group, University of Sheffield
10 July 2007
The motivation behind developing such a tool is to improve performance on Geographic Information Retrieval problems such as placename disambiguation (if "Sheffield" appears in text, which Sheffield is it?) and geographic relevance (if "Sheffield" appears in a query, are "Yorkshire", "Manchester" or "Derby" relevant?). The talk will cover the development of a geographic co-occurrence model mined from Wikipedia and similar user-generated content. The co-occurrence model is similar to a language model but contains only geographic entities. The accuracy and clarity of the co-occurrence model are also quantified. The talk will begin with a description of how Wikipedia can be mined for named-entity associations and an introduction to the area of Geographic Information Retrieval, followed by details of the co-occurrence model and its application. The talk will conclude with future directions, including applying the described techniques to the CLEF corpora.
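As a rough illustration of what a geographic co-occurrence model might look like, the sketch below counts how often pairs of already-disambiguated locations appear in the same document and turns those counts into a conditional estimate. It is not the model described in the talk: the function names, the document-level counting and the three toy documents are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(documents):
    """Count how often each unordered pair of locations shares a document.

    `documents` is a list of lists of unambiguous location labels
    (a hypothetical input format chosen for this sketch).
    """
    pair_counts = Counter()
    place_counts = Counter()
    for places in documents:
        unique = sorted(set(places))  # count each place once per document
        place_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return pair_counts, place_counts


def cooccurrence_probability(pair_counts, place_counts, a, b):
    """Estimate P(b appears in a document | a appears in it)."""
    if place_counts[a] == 0:
        return 0.0
    pair = tuple(sorted((a, b)))  # pairs are stored in sorted order
    return pair_counts[pair] / place_counts[a]


# Toy data, invented purely for illustration.
documents = [
    ["Sheffield, UK", "Yorkshire, UK", "Manchester, UK"],
    ["Sheffield, UK", "Yorkshire, UK"],
    ["Sheffield, Alabama", "Alabama, USA"],
]
pair_counts, place_counts = build_cooccurrence(documents)
```

With counts like these, a query mentioning "Sheffield, UK" could be expanded towards the places it most often co-occurs with, which is one way to approach the geographic relevance question raised above.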
Placename disambiguation with co-occurrence models
Knowledge Media Institute, Open University
6 December 2006
My talk will cover an introduction to Geographic Information Retrieval (GIR) and the advantages provided by indexing placenames as unambiguous locations. I will describe our GIR system, which generates a large-scale co-occurrence model and applies this model to the problem of placename disambiguation. The data for the model is mined from Wikipedia and applied to the GeoCLEF corpus. An example of placename disambiguation: when "London" is referred to in text, is it "London, UK" or "London, Ontario"? The motivation behind this problem is to make un-annotated data machine readable and allow users to query and browse data geographically. The talk will begin with a description of GIR, placename disambiguation techniques and the use of Wikipedia as a corpus. I will then describe my probabilistic models, which use first and higher orders of co-occurrence. The talk will conclude with our findings on how Information Retrieval methods can be enhanced with Geographic Knowledge.
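The "London" example can be made concrete with a minimal sketch of disambiguation by first-order co-occurrence. This is an assumed simplification, not the system from the talk: the scoring rule, the `disambiguate` function, and every count in the tables below are invented for illustration, and higher orders of co-occurrence are not modelled.

```python
def disambiguate(candidates, context, cooccurrence, prior):
    """Pick the candidate location best supported by the surrounding places.

    Each candidate is scored by its summed co-occurrence count with the
    context places; ties (including an empty context) fall back to a
    prior count for the candidate.
    """
    def score(candidate):
        evidence = sum(
            cooccurrence.get(frozenset((candidate, place)), 0)
            for place in context
        )
        return (evidence, prior.get(candidate, 0))

    return max(candidates, key=score)


# Hypothetical counts; a real model would be mined from a corpus.
cooccurrence = {
    frozenset(("London, UK", "Thames")): 40,
    frozenset(("London, UK", "Westminster")): 55,
    frozenset(("London, Ontario", "Toronto")): 12,
    frozenset(("London, Ontario", "Thames")): 3,
}
prior = {"London, UK": 900, "London, Ontario": 100}
candidates = ["London, UK", "London, Ontario"]

print(disambiguate(candidates, ["Toronto"], cooccurrence, prior))
print(disambiguate(candidates, ["Thames", "Westminster"], cooccurrence, prior))
```

Falling back to a prior when the context gives no evidence is a design choice of this sketch; it simply prefers the more frequently mentioned location.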
Evaluating co-occurrence models applied to disambiguation
University of Glasgow
13 November 2006
My presentation will cover the evaluation of large-scale co-occurrence models for disambiguation. The data for the models is mined from Wikipedia and applied to the GeoCLEF corpus. The mining and application parts of the system are entirely independent to avoid bias. The specific problem I am applying co-occurrence models to is place name disambiguation (for example when “London” is referred to in text, is it "London, UK" or "London, Ontario"?). The motivation behind this problem is to make un-annotated data machine readable and allow users to query and browse data geographically. With the recent introduction of the geographic track to the Cross Language Evaluation Forum there is now a standardised way to test Geographic Information Systems.
I have evaluated three approaches to applying co-occurrence to place name disambiguation:
1. Assign a co-occurrence index to place triplets.
2. Infer co-occurrence classifiers from the ground truth.
3. Represent the places occurring in the training data as vectors in a high dimensional space.
The talk will begin with a description of place name disambiguation techniques and the use of Wikipedia as a corpus. I will then describe my probabilistic models, which use first and higher orders of co-occurrence. The talk will conclude with my intended future work: expansion beyond just place names to looking at all named entities.