GeoCLEF 2006

Evaluation of cross-language Geographic Information Retrieval (GIR) systems

Task Description

GeoCLEF 2006 consists of document collections in English, German, Portuguese and Spanish. There are 25 search topics in these languages (below). The tasks for 2006 are:

  1. Monolingual retrieval – retrieval where the topic and document languages are the same
  2. Bilingual retrieval – cross-language retrieval where the topic language is different from the document language, i.e. X -> {DE, EN, ES, PT}.

Required runs: for each document language, participants may submit up to 10 runs (5 monolingual and 5 bilingual). Two of these runs are required:

  1. Title-Description – where the search queries are created using only the contents of the Title and Desc tags of the topic.
  2. Title-Description-Narrative – where the search queries are created using the contents of the Title, Desc and Narr tags from the topic.

The Narrative tag contains a more comprehensive description of the information request defined by the topic. This includes specifics about the geography of the topic, such as a list of desired cities, states, countries, rivers, or latitudes and longitudes.
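The two required run types differ only in which topic fields feed the query. The Python below is a minimal sketch of building Title-Description (TD) and Title-Description-Narrative (TDN) query strings from a topic file. The element names (`topics`, `top`, `num`, `title`, `desc`, `narr`) and the sample topic are hypothetical, chosen for illustration. The sample is not an actual GeoCLEF 2006 topic, and the released XML files should be checked for their exact tag names.

```python
import xml.etree.ElementTree as ET

# Hypothetical topic content for illustration only (NOT a real GeoCLEF topic).
# The element names are an assumption; check them against the released files.
SAMPLE_TOPICS = """<topics>
  <top>
    <num>GC-EXAMPLE</num>
    <title>Riots near Dublin</title>
    <desc>Find news stories about riots near Dublin City.</desc>
    <narr>Relevant documents report riots in or close to Dublin, Ireland.</narr>
  </top>
</topics>"""

# Field sets for the two required run types (tag names are assumptions).
TD_FIELDS = ("title", "desc")
TDN_FIELDS = ("title", "desc", "narr")


def _text(top, tag):
    """Return the stripped text of a child element, or '' if it is absent."""
    elem = top.find(tag)
    return (elem.text or "").strip() if elem is not None else ""


def build_queries(xml_string, fields):
    """Map each topic number to a query formed by joining the given fields."""
    root = ET.fromstring(xml_string)
    queries = {}
    for top in root.iter("top"):
        parts = [_text(top, field) for field in fields]
        queries[_text(top, "num")] = " ".join(p for p in parts if p)
    return queries


if __name__ == "__main__":
    for run_type, fields in (("TD", TD_FIELDS), ("TDN", TDN_FIELDS)):
        for num, query in build_queries(SAMPLE_TOPICS, fields).items():
            print(run_type, num, query)
```

Plain concatenation of field text is only one possible design. A real system would typically also remove stopwords or boilerplate phrasing from the description and narrative. Because the tag names are passed in as a parameter, the same function can be reused if the actual files use different (for example, language-prefixed) element names.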

Important Dates

Topic Release: May 2-4, 2006

Run submissions due: June 12, 2006, noon Central European Time (see the instructions on formatting your runs).

Relevance judgments released: July 17, 2006

GeoCLEF Notebook papers due: August 15, 2006

CLEF Workshop in Alicante, Spain: September 20-22, 2006

GeoCLEF 2006 Topics

English topics (XML); this replaces the old version.

German topics (XML); this replaces the old version.

Portuguese topics (XML)

Spanish topics (XML)

Japanese topics (XML)

Organisers of GeoCLEF

Fred Gey, University of California, Berkeley, USA (gey@berkeley.edu)

Ray Larson, University of California, Berkeley, USA (ray@sims.berkeley.edu)

Hideo Joho, University of Glasgow, UK (hideo@dcs.gla.ac.uk)

Mark Sanderson, Department of Information Studies, University of Sheffield, UK (m.sanderson@sheffield.ac.uk)

Thomas Mandl and Christa Womser-Hacker, University of Hildesheim, Germany (German language coordinators)

Diana Santos and Paulo Rocha, Linguateca (Portuguese coordinators)

Andrés Montoyo, University of Alicante, Spain (Spanish coordinator)

Resources

To perform geographic retrieval well, you may need resources such as gazetteers or ontologies. Here is a brief list of resources that we know about. Please contact Mark Sanderson if you have other resources you would like added to this list.

Introduction and background

Geographical Information Retrieval (GIR) concerns the retrieval of information involving some kind of spatial awareness. Because many documents contain some kind of spatial reference, geographical references (georeferences) can be important for IR: for example, they can be used to retrieve, re-rank and visualise search results along a spatial dimension (e.g. “find me news stories about riots near Dublin City”). In addition, many documents contain georeferences expressed in languages that may differ from the query language, which requires an additional translation step for successful retrieval.

Existing evaluation campaigns such as TREC and CLEF do not explicitly evaluate geographical IR relevance. The aim of GeoCLEF is to provide the necessary framework in which to evaluate GIR systems for search tasks involving both spatial and multilingual aspects. GeoCLEF is the cross-language geographic retrieval track which is run as part of the Cross Language Evaluation Forum (CLEF) campaign. There is a preliminary flyer for GeoCLEF 2006.

Mailing list

We have set up a mailing list for participants: geoclef@sheffield.ac.uk. Please contact m.sanderson@sheffield.ac.uk to be added to the list.

Past GeoCLEF

GeoCLEF 2005

Last Modified: May 2006

By: Fred Gey, Paul Clough & Mark Sanderson