GeoCLEF 2005

Evaluation of cross-language Geographic Information Retrieval (GIR) systems



Introduction

Geographical Information Retrieval (GIR) concerns the retrieval of information involving some kind of spatial awareness. Many documents contain spatial references, and these geographical references (georeferences) can be important for IR: for example, to retrieve, re-rank and visualise search results based on a spatial dimension (e.g. “find me news stories about riots near Dublin City”). In addition, many documents contain georeferences expressed in multiple languages, which may or may not be the same as the query language. Retrieving such documents requires an additional translation step.

Existing evaluation campaigns such as TREC and CLEF do not explicitly evaluate geographical IR relevance. The aim of GeoCLEF is to provide the necessary framework in which to evaluate GIR systems for search tasks involving both spatial and multilingual aspects. GeoCLEF is the cross-language geographic retrieval track of the Cross Language Evaluation Forum (CLEF) campaign, and it will be run as a pilot experiment this year.

A preliminary flyer for GeoCLEF 2005 can be downloaded here

What to do if interested in participating

You can register for GeoCLEF 2005 by contacting Carol Peters, the main coordinator for CLEF. For more specific information about any aspect of GeoCLEF or its tasks, please contact Fred Gey or Hideo Joho.

Once you have registered your interest with Carol Peters and filled in the appropriate copyright and declaration forms you will be able to download the data collections.


Data collections

Currently, we plan to investigate cross-language retrieval between English and German.
The texts come from existing CLEF collections of news stories from 1994 and 1995, covering a variety of topics and geographical regions. The following CLEF data collections will be used:
  • Glasgow Herald (British) 1995
  • LA Times (American) 1994
  • Der Spiegel (German) 1994/95
  • Frankfurter Rundschau (German) 1994
  • German SDA 1994/95

These collections will definitely be used, but additional collections may also be added at a later date.

Description of the track

Task description is now available here.

Goal: given a multilingual statement describing a spatial user need (topic), find as many relevant documents as possible from all target document collections.

Topics: a set of textual descriptions in a range of languages including English, Spanish, Italian and German. Topics will be structured in the form <concept, spatial relation, location> (e.g. “find stories about disasters in Geneva”). Spatial relations can include “near to”, “within X miles of”, “north of”, “south of”, etc. Like TREC and CLEF topics, each topic will include a short description (title, description) and a longer narrative describing relevance. In addition, we may provide a spatial footprint for the query.

Spatial analysis: not required to participate in this task, but can augment text-based retrieval methods.
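
As an illustration of how spatial analysis might augment text retrieval, the sketch below checks two of the spatial relations mentioned above, “within X miles of” and “north of”, against point coordinates. It assumes that locations in documents and topics have already been resolved to (latitude, longitude) pairs in degrees, which GeoCLEF neither requires nor supplies; the function names and the mean Earth radius constant are our own choices, and great-circle (haversine) distance is just one reasonable way to interpret “within”.

```python
import math
from typing import Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an approximation


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def within_miles(doc_point: Point, query_point: Point, max_miles: float) -> bool:
    """Hypothetical test for the relation 'within X miles of'."""
    return haversine_miles(*doc_point, *query_point) <= max_miles


def north_of(doc_point: Point, query_point: Point) -> bool:
    """Hypothetical test for 'north of': compares latitudes only."""
    return doc_point[0] > query_point[0]
```

For instance, `within_miles((0.0, 0.0), (0.0, 1.0), 70)` is true, because one degree of longitude at the equator is roughly 69 miles. Real footprints are often regions rather than points, so a point test like this is only a rough approximation.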

Challenges: translating locations; ambiguity of georeferences (e.g. “Jack London” the author rather than a place; “South Yorkshire” and “S. Yorks.” referring to the same place); spatial ambiguity (e.g. Sheffield in the UK or the USA); finding or creating suitable multilingual gazetteer lists; and combining text and spatial retrieval methods.
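
Several of these challenges meet in the gazetteer lookup step. The toy sketch below normalises abbreviations and other-language names to a canonical form, rejects a known non-place, and returns every candidate place so that ambiguity (Sheffield) stays visible to later disambiguation. All entries, IDs and names here are hypothetical illustrations, not a real gazetteer, and a fixed stoplist is a crude stand-in for context-based disambiguation.

```python
from typing import Dict, List, Tuple

# Hypothetical toy gazetteer: canonical name -> candidate (place_id, country) pairs.
GAZETTEER: Dict[str, List[Tuple[str, str]]] = {
    "sheffield": [("sheffield-gb", "UK"), ("sheffield-us", "USA")],
    "south yorkshire": [("south-yorkshire-gb", "UK")],
    "geneva": [("geneva-ch", "Switzerland")],
}

# Hypothetical alias table: abbreviations and translated names -> canonical name.
ALIASES: Dict[str, str] = {
    "s. yorks.": "south yorkshire",
    "genf": "geneva",     # German name
    "ginebra": "geneva",  # Spanish name
}

# Hypothetical stoplist of phrases that contain place names but are not places.
NOT_PLACES = {"jack london"}


def lookup(mention: str) -> List[Tuple[str, str]]:
    """Return all candidate places for a mention (empty if none or not a place)."""
    key = mention.strip().lower()
    if key in NOT_PLACES:
        return []
    key = ALIASES.get(key, key)  # map aliases to their canonical name
    return GAZETTEER.get(key, [])
```

Here `lookup("S. Yorks.")` and `lookup("South Yorkshire")` return the same entry, `lookup("Genf")` resolves to Geneva, and `lookup("Sheffield")` returns two candidates that a retrieval system would still need to choose between.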

Aims: to compare methods of query translation, query expansion, translation of geographical references, use of text and spatial retrieval methods separately or combined, retrieval models and indexing methods.
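
One simple way to combine text and spatial retrieval methods is a weighted linear combination of normalised scores. The sketch below assumes each method has already produced a score per document ID; min-max normalisation, the weight `alpha` and treating a missing score as 0 are illustrative assumptions, not a method prescribed by GeoCLEF.

```python
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; if all scores are equal, give each 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse(text: Dict[str, float], spatial: Dict[str, float],
         alpha: float = 0.7) -> List[Tuple[str, float]]:
    """Rank documents by alpha * text + (1 - alpha) * spatial (normalised)."""
    t, s = min_max(text), min_max(spatial)
    docs = set(t) | set(s)
    fused = {d: alpha * t.get(d, 0.0) + (1 - alpha) * s.get(d, 0.0) for d in docs}
    # Highest score first; ties broken by document ID for a stable ranking.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))
```

Setting `alpha` to 1.0 reduces to text-only ranking and 0.0 to spatial-only ranking, which makes it easy to compare the two methods separately or combined.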

Topics for GeoCLEF 2005 (released 6 May 2005)

English, German, Portuguese* and Spanish** versions of the topics can be found here. Topics follow a format similar to this example (topic 1):

<top>
<num> GC001 </num>
<orignum> C084 </orignum>
<EN-title>Shark Attacks off Australia and California</EN-title>
<EN-desc> Documents will report any information relating to shark attacks on humans. </EN-desc>
<EN-narr> Identify instances where a human was attacked by a shark, including where the attack took place and the circumstances surrounding the attack. Only documents concerning specific attacks are relevant; unconfirmed shark attacks or suspected bites are not relevant. </EN-narr>
<!-- NOTE: This topic has added tags for GeoCLEF -->
<EN-concept> Shark attacks </EN-concept>
<EN-spatialrelation>near</EN-spatialrelation>
<EN-location> Australia </EN-location>
<EN-location> California </EN-location>
</top>


Some topics are based on previous CLEF ad-hoc queries (hence <orignum>, which identifies the original CLEF topic number). GeoCLEF topics are essentially the same as CLEF ad-hoc topics, except for an additional set of tags that define <concept, spatial relation, location> triples. Note that multiple locations can be defined for a topic.
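
Because the GeoCLEF-specific tags are plain XML elements, the triples can be extracted with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree` on a shortened copy of topic 1. It assumes each `<top>` element is parsed on its own (a file holding several topics would need a single enclosing root element) and that other language versions use the same `XX-tag` naming pattern as the `EN-` tags; `GeoTopic` and `parse_topic` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple


class GeoTopic(NamedTuple):
    num: str
    title: str
    concept: str
    relation: str
    locations: List[str]


def parse_topic(xml_text: str, lang: str = "EN") -> GeoTopic:
    """Extract the <concept, spatial relation, location> data from one <top>."""
    top = ET.fromstring(xml_text)  # the default parser drops <!-- ... --> comments

    def text(tag: str) -> str:
        el = top.find(tag)
        return el.text.strip() if el is not None and el.text else ""

    return GeoTopic(
        num=text("num"),
        title=text(f"{lang}-title"),
        concept=text(f"{lang}-concept"),
        relation=text(f"{lang}-spatialrelation"),
        # A topic may define several locations, so collect all of them.
        locations=[el.text.strip() for el in top.findall(f"{lang}-location") if el.text],
    )


SAMPLE = """<top>
<num> GC001 </num>
<orignum> C084 </orignum>
<EN-title>Shark Attacks off Australia and California</EN-title>
<!-- NOTE: This topic has added tags for GeoCLEF -->
<EN-concept> Shark attacks </EN-concept>
<EN-spatialrelation>near</EN-spatialrelation>
<EN-location> Australia </EN-location>
<EN-location> California </EN-location>
</top>"""

if __name__ == "__main__":
    topic = parse_topic(SAMPLE)
    for loc in topic.locations:
        print((topic.concept, topic.relation, loc))
```

Run as a script, it prints one tuple per location: `('Shark attacks', 'near', 'Australia')` and `('Shark attacks', 'near', 'California')`.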

We would like to thank those who have contributed to translating the topics, in particular:

*Diana Santos of SINTEF Information and Communication Technology (for Portuguese)
**Andrés Montoyo of Universidad de Alicante (for Spanish)


Important dates
   
Registration opens 31 January 2005
Data release from 15 February 2005
Topics release from 1 May 2005
Submission of runs 13 June 2005
Release of results from 1 August 2005
Submission of paper for working notes 21 August 2005
Workshop (in conjunction with ECDL2005 in Vienna, Austria) 21-23 September 2005
   

Organisers of GeoCLEF


Fred Gey, University of California, Berkeley, USA (gey@berkeley.edu)

Ray Larson, University of California, Berkeley, USA (ray@sims.berkeley.edu)

Paul Clough, Department of Information Studies, University of Sheffield, UK (p.d.clough@sheffield.ac.uk)

Hideo Joho, Department of Information Studies, University of Sheffield, UK (h.joho@sheffield.ac.uk)

Mark Sanderson, Department of Information Studies, University of Sheffield, UK (m.sanderson@sheffield.ac.uk)


Mailing list

We have set up a mailing list: geoclef@sheffield.ac.uk for participants. Please contact Hideo Joho to be added to the list.

Last Modified: May 19, 2005

By: Fred Gey & Paul Clough