GeoCLEF 2007

Evaluation of multilingual Geographic Information Retrieval (GIR) systems

Task Description

The 2007 GeoCLEF track will consist of two parts:

  1. a modification of the existing GeoCLEF search task:
    • As in previous years, GeoCLEF will examine geographic search of a text corpus; this year the geographic area to be searched will be defined in both text and machine-readable format. Documents will be in English, German and Portuguese; topics will be available in a wider range of languages. How best to transform the imprecise description of a geographic area found in many user queries into a machine-readable format remains an open research problem.
  2. a brand new query parsing task:
    • This sub-task will be run by Microsoft Research Asia, which will supply a substantial set of Web queries to geo-parse. It is hoped that 800,000 queries will be provided to every GeoCLEF participant. Queries will initially be available in English; parsing of queries in other languages may follow. Further details of the task are provided.
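
As a rough illustration of what geo-parsing a Web query can involve, the sketch below splits a query into a topic ("what"), a spatial relation and a place name ("where"). The toy gazetteer, the relation list and the output field names are all assumptions made for this example; they are not the format defined by the query parsing task.

```python
from typing import NamedTuple, Optional, Tuple

# Hypothetical toy gazetteer: lower-cased place name -> (latitude, longitude).
GAZETTEER = {
    "dublin": (53.35, -6.26),
    "budapest": (47.50, 19.04),
    "sheffield": (53.38, -1.47),
}

# Hypothetical spatial relations; multi-word relations are listed first.
RELATIONS = ["north of", "south of", "near", "in"]


class ParsedQuery(NamedTuple):
    what: str
    relation: Optional[str]
    where: Optional[str]
    coords: Optional[Tuple[float, float]]


def parse_query(query: str) -> ParsedQuery:
    """Split a query into what / relation / where using the toy gazetteer."""
    tokens = query.lower().split()
    # Try the longest trailing token sequence first, so multi-word
    # place names would match before their shorter suffixes.
    for start in range(len(tokens)):
        candidate = " ".join(tokens[start:])
        if candidate in GAZETTEER:
            head = tokens[:start]
            relation = None
            # Check whether the words just before the place form a relation.
            for rel in RELATIONS:
                rel_tokens = rel.split()
                if head[-len(rel_tokens):] == rel_tokens:
                    relation = rel
                    head = head[:-len(rel_tokens)]
                    break
            return ParsedQuery(" ".join(head), relation, candidate,
                               GAZETTEER[candidate])
    # No known place name found: treat the whole query as non-geographic.
    return ParsedQuery(" ".join(tokens), None, None, None)


if __name__ == "__main__":
    print(parse_query("riots near Dublin"))
    print(parse_query("cheap flights"))
```

A real system would need a far larger gazetteer and some way of handling ambiguous place names; this sketch only matches a place name at the end of the query.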
Important Dates
  • [Query parsing track] Query collection release: April 15, 2007
  • Topic Release: May 4, 2007
  • Run submissions due: June 12, 2007
  • Relevance judgments released: July 17, 2007
  • GeoCLEF Notebook papers due: August 15, 2007
  • CLEF Workshop in Budapest: September 19-20, 2007
GeoCLEF 2007 Topics

TBA

Organisers of GeoCLEF

Thomas Mandl and Christa Womser-Hacker, University of Hildesheim, Germany (German language coordinators)

Fred Gey, University of California, Berkeley, USA (gey@berkeley.edu)

Ray Larson, University of California, Berkeley, USA (ray@sims.berkeley.edu)

Hideo Joho, University of Glasgow, UK (hideo@dcs.gla.ac.uk)

Mark Sanderson, Department of Information Studies, University of Sheffield, UK (m.sanderson@sheffield.ac.uk)

Diana Santos, Linguateca (Portuguese coordinators)

Julio Villena Román, Daedalus (Spanish translator)

Resources

To conduct geo-retrieval well, you may need resources such as gazetteers or ontologies. Here is a brief list of resources that we know about. Please contact Mark Sanderson if you have other resources you want added to this list.

Introduction and background
Geographical Information Retrieval (GIR) concerns the retrieval of information involving some kind of spatial awareness. Many documents contain some kind of spatial reference, and in some cases these geographical references (geo-references) are important for IR. For example, search results can be retrieved, re-ranked and visualised along a spatial dimension (e.g. “find me news stories about riots near Dublin City”). In addition, many documents contain geo-references expressed in multiple languages, which may or may not be the same as the query language. Successful retrieval in such cases requires an additional translation step.
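
To make the idea of re-ranking along a spatial dimension concrete, here is a minimal sketch that orders search results by great-circle distance from a query location. It assumes each result has already been assigned a single latitude/longitude pair, which is itself a hard problem in practice; the function names, the result tuple layout and the sample coordinates are illustrative assumptions only.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points (haversine formula)."""
    earth_radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def rerank_by_distance(results, query_point):
    """Sort results nearest-first; ties are broken by higher text score.

    results: list of (doc_id, text_score, (lat, lon)) tuples (assumed layout).
    query_point: (lat, lon) of the place named in the query.
    """
    return sorted(
        results,
        key=lambda r: (haversine_km(*query_point, *r[2]), -r[1]),
    )


if __name__ == "__main__":
    dublin = (53.35, -6.26)
    # Hypothetical results with approximate coordinates.
    results = [
        ("doc_cork", 0.9, (51.90, -8.47)),
        ("doc_belfast", 0.8, (54.60, -5.93)),
        ("doc_swords", 0.5, (53.46, -6.22)),
    ]
    for doc_id, score, _ in rerank_by_distance(results, dublin):
        print(doc_id, score)
```

Sorting purely by distance ignores textual relevance except as a tie-breaker; combining the two scores is a design choice that this sketch deliberately leaves open.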

Existing evaluation campaigns such as TREC and CLEF do not explicitly evaluate geographical relevance in IR. The aim of GeoCLEF is to provide the framework needed to evaluate GIR systems on search tasks involving both spatial and multilingual aspects. GeoCLEF is the cross-language geographic retrieval track run as part of the Cross Language Evaluation Forum (CLEF) campaign. There is a preliminary flyer for GeoCLEF 2006.

Mailing list
We have set up a mailing list for participants: geoclef@sheffield.ac.uk. Please contact m.sanderson@sheffield.ac.uk to be added to the list.
Past GeoCLEF

GeoCLEF 2006

Last Modified: March 2007