Simon Overell's Publications

Patents

Extracting Structured Knowledge from Unstructured Text
Patent Number: US2011/0307435
December 2011

Embodiments of the present invention relate to knowledge representation systems which include a knowledge base in which knowledge is represented in a structured, machine-readable format that encodes meaning. Techniques for extracting structured knowledge from unstructured text and for determining the reliability of such extracted knowledge are also described.

@{patent:20110307435,
title     = "EXTRACTING STRUCTURED KNOWLEDGE FROM UNSTRUCTURED TEXT",
number    = "20110307435",
author    = "Overell, Simon (Hitchin, GB), Tunstall-Pedoe, William (Cambridge, GB)",
year      = "2011",
month     = "December",
url       = "http://www.freepatentsonline.com/y2011/0307435.html"
}
Media/Tag-Based Word Games
Patent Number: US2009/0298594
December 2009

A method of creating a word game comprising receiving a seed value from a browser, obtaining from a media database a plurality of words associated with the seed value, creating a word game from at least a subset of the obtained plurality of words, integrating the word game into a browser interpretable document, and, returning the browser interpretable document to the browser. Some embodiments further comprise incorporating into the browser interpretable document an advertisement associated with the seed value and/or at least one of the obtained plurality of words. Also disclosed is a system comprising a gaming server which receives a game request; a media server and media tag database; the gaming server requesting from the media server a set of media tags associated with a game seed value, building a word game using at least a subset of the media tags, and transmitting the word game.

@{patent:20090298594,
title = "MEDIA/TAG-BASED WORD GAMES",
number = "20090298594",
author = "Pueyo, Lluis Garcia (Barcelona, ES), Sigurbjornsson, Borkur (Barcelona, ES), Overell, Simon E. (London, GB), Murdock, Vanessa (Barcelona, ES), Zwol, Roelof Van (Badalona, ES)",
year = "2009",
month = "December",
url = "http://www.freepatentsonline.com/y2009/0298594.html"
}
System and Method for Classifying Tags of Content Using a Hyperlinked Corpus of Classified Web Pages
Patent Number: US2009/0265315
October 2009

An improved system and method for classifying tags of content using a hyperlinked corpus of classified web pages is provided. An anchor text index may be searched to find anchor texts that may match text of the tag, documents referenced by the matching anchor texts may be found, and the documents referenced by the matching anchor texts may be grouped to disambiguate multiple classifications that result from matching the anchor texts with the categories of the reference documents. To resolve ambiguity between multiple classifications, weighted classifications may be used where each document may be assigned a positive weight for a mapping to a category to indicate the confidence of the classification of the document to the category. The classification for the grouping of the documents referenced by the matching anchor texts with greatest frequency may be selected and output as the classification for the tag.
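As a rough illustration of the steps in the abstract above (and not the patented implementation), the flow can be sketched in Python. The toy index contents, category names, weights and the function name `classify_tag` are all hypothetical, and summing the positive weights per category is only one assumed reading of "greatest frequency".

```python
from collections import defaultdict

# Hypothetical toy data: anchor text -> IDs of the documents it links to.
ANCHOR_INDEX = {
    "jaguar": ["doc_cat", "doc_car", "doc_cat_2"],
    "paris": ["doc_city"],
}

# Hypothetical classified corpus: document ID -> {category: positive weight}.
DOC_CATEGORIES = {
    "doc_cat": {"animal": 0.9},
    "doc_cat_2": {"animal": 0.6},
    "doc_car": {"artifact": 0.8},
    "doc_city": {"location": 1.0},
}


def classify_tag(tag, anchor_index=ANCHOR_INDEX, doc_categories=DOC_CATEGORIES):
    """Return the best-supported category for a tag, or None if nothing matches."""
    # 1. Find anchor texts that match the tag (exact match after normalisation;
    #    this matching rule is an assumption made for the sketch).
    docs = anchor_index.get(tag.strip().lower(), [])

    # 2. Group the referenced documents by category, summing their weights.
    scores = defaultdict(float)
    for doc in docs:
        for category, weight in doc_categories.get(doc, {}).items():
            scores[category] += weight

    # 3. Select the category with the greatest weighted support.
    if not scores:
        return None
    return max(scores, key=scores.get)


if __name__ == "__main__":
    for tag in ("Jaguar", "paris", "unknown-tag"):
        print(tag, "->", classify_tag(tag))
```

With this toy data the ambiguous tag "Jaguar" resolves to the category backed by the most weighted documents, while a tag with no matching anchor text is left unclassified.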

@{patent:20090265315,
title = "SYSTEM AND METHOD FOR CLASSIFYING TAGS OF CONTENT USING A HYPERLINKED CORPUS OF CLASSIFIED WEB PAGES",
number = "20090265315",
author = "Sigurbjornsson, Borkur (Barcelona, ES), Van Zwol, Roelof (Badalona, ES), Overell, Simon E. (London, GB)",
year = "2009",
month = "October",  
url = "http://www.freepatentsonline.com/y2009/0265315.html"
}
Classifying Content Resources Using Structured Patterns
Patent Number: US2009/0240729
September 2009

Methods and apparatus are described for classifying content resources in a data set according to an external classification scheme using structural patterns associated with the data set.

@{patent:20090240729,
title = "CLASSIFYING CONTENT RESOURCES USING STRUCTURED PATTERNS",
number = "20090240729",
author = "Zwol, Roelof Van (Badalona, ES), Sigurbjornsson, Borkur (Badalona, ES), Overell, Simon Ernest (Fulham, GB)",
year = "2009",
month = "September",
url = "http://www.freepatentsonline.com/y2009/0240729.html"
}

My PhD topic was Geographic Information Retrieval. I've written papers on geographic disambiguation and modelling, filed patents on classification and accurate NLP at scale, and given talks on extracting data from Wikipedia and the Web. For abstracts and citation details on all my publications, click the boxes below.

Theses

PhD Thesis. Geographic Information Retrieval: Classification, Disambiguation and Modelling. (Imperial College London, 2009)

Master’s Thesis. TRIDE: Implementation of a Teleo-Reactive Integrated Development Environment. (Imperial College London, 2005)

Journal Articles

View of the world according to Wikipedia: Are we all little Steinbergs? (JOCS, 2011)

Using co-occurrence models for placename disambiguation. (IJGIS, 2008)

Conference & Workshop Papers

Classifying Tags using Open Content Resources. (WSDM, 2009, Barcelona)

Geographic Co-occurrence as a Tool for GIR. (GIR @ CIKM, 2007, Lisbon)
...

Invited Talks

I've given 9 invited talks covering my PhD, research at Yahoo! and work at True Knowledge.

Invited Articles

The Problem of Place Name Ambiguity (The SIGSPATIAL Special, 2011)

Are we getting it right? The results of the Student Survey (Informer, Spring 2008)

Patents

I've written various patents, all broadly related to classification. Four have been granted with previous employers and two are pending with Spider.io.

Evaluation Conference Papers

A key part of Information Retrieval is evaluation. Thanks to the efforts of the TREC and CLEF conferences, there is now a series of standardised data sets for these evaluations. I've taken part in three CLEF conferences and one TREC conference, publishing 10 papers.

Posters

Distribution of Location References in Wikipedia (The Future of Multimedia Knowledge Management 2008, Milton Keynes)

SIRIL: A multidimensional browsing framework (MMKM Workshop 2007, Milton Keynes)

Citations

Both Google Scholar and Microsoft Academic Search maintain co-author and citation lists.