| Introduction |
Geographical Information Retrieval (GIR) concerns the retrieval of information
involving some kind of spatial awareness. Many documents contain spatial
references, and these geographical references (georeferences) can be important
for IR: for example, to retrieve, re-rank and visualise search results along a
spatial dimension (e.g. find me news stories about riots near Dublin City). In
addition, many documents contain georeferences expressed in languages that may
differ from the query language, which requires an additional translation step
to enable successful retrieval.
Existing evaluation campaigns such as TREC and CLEF do not
explicitly evaluate geographical IR relevance. The aim of GeoCLEF is to provide
the necessary framework in which to evaluate GIR systems for search tasks
involving both spatial and multilingual aspects. GeoCLEF is the cross-language
geographic retrieval track of the Cross Language Evaluation Forum (CLEF)
campaign and will be run as a pilot experiment this year.
A preliminary flyer for GeoCLEF 2005
can be downloaded here
|
| What to do if interested in participating |
You can register for
GeoCLEF 2005 by contacting
Carol Peters, the
main coordinator for CLEF. For more specific information about any aspect of
GeoCLEF or the tasks, please contact
Fred Gey or
Hideo Joho.
Once you have registered your interest with Carol Peters and filled in the
appropriate copyright and declaration forms, you will be able to download the
data collections.
|
| Data collections |
Currently, we plan to investigate
cross-language retrieval between English and German. Texts used are from existing CLEF collections that include a
variety of topics and geographical regions from news stories between 1994 and
1995. The following CLEF
data collections will be used:
- Glasgow Herald (British) 1995
- LA Times (American) 1994
- Der Spiegel (German) 1994/95
- Frankfurter Rundschau (German) 1994
- German SDA 1994/95
These collections will definitely be
used, but additional collections may also be added at a later date.
|
| Description of the track |
Task
description is now available here.
Goal: given a multilingual statement describing a spatial
user need (topic), find as many relevant documents as possible from all target
document collections.
Topics: A set of textual descriptions in a range of languages including
English, Spanish, Italian and German. Topics will be structured in the form of
concept, spatial relation, location (e.g. find stories about disasters in
Geneva). Spatial relations can include near to, within X miles of, north of,
south of, etc. Like TREC and CLEF topics, each topic will include a short
description (title, description) and a longer narrative describing relevance.
In addition, we may provide a spatial footprint for the query.
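As a rough illustration of how relations such as "within X miles of" or "north of" could be tested once a document and a query location have coordinates, here is a minimal Python sketch. The function names and the (latitude, longitude) tuple representation are assumptions made for this example; the track does not prescribe any particular spatial method.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius, in miles


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot.
    return 2 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(a)))


def within_miles(doc_point, query_point, x):
    """True if doc_point lies within x miles of query_point."""
    return haversine_miles(*doc_point, *query_point) <= x


def north_of(doc_point, query_point):
    """True if doc_point has a greater latitude than query_point."""
    return doc_point[0] > query_point[0]
```

A relation such as "near to" has no fixed threshold; one option under this sketch is to treat it as `within_miles` with a distance chosen per topic.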
Spatial analysis: not required to participate in this task, but
can augment text-based retrieval methods.
Challenges: translating locations, ambiguity of georeferences (e.g. Jack
London the author rather than the place; South Yorkshire and S. Yorks. refer
to the same place), spatial ambiguity (e.g. Sheffield in the UK or the USA),
finding or creating suitable multilingual gazetteer lists, and combining text
and spatial retrieval methods.
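To make two of these challenges concrete, the sketch below shows a toy gazetteer lookup in Python. The `GAZETTEER` entries, the `normalise` rule and the function names are all hypothetical, chosen only to mirror the examples above; a real gazetteer would be far larger and multilingual, and this sketch does nothing about the person-versus-place problem.

```python
import re

# Hypothetical gazetteer: normalised name variant -> candidate (place, country).
GAZETTEER = {
    "south yorkshire": [("South Yorkshire", "UK")],
    "s yorks": [("South Yorkshire", "UK")],
    "sheffield": [("Sheffield", "UK"), ("Sheffield", "USA")],
}


def normalise(name):
    """Lower-case, strip punctuation and collapse whitespace."""
    name = re.sub(r"[^\w\s]", "", name.lower())
    return " ".join(name.split())


def lookup(name):
    """Return all candidate places for a name; an empty list if unknown."""
    return GAZETTEER.get(normalise(name), [])


def is_ambiguous(name):
    """True if the name maps to more than one candidate place."""
    return len(lookup(name)) > 1
```

Here `lookup("S. Yorks.")` and `lookup("South Yorkshire")` return the same candidate, while `is_ambiguous("Sheffield")` is true, so a system would still need context to choose between the two candidates.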
Aims: to compare methods of query
translation, query expansion, translation of geographical references, use of
text and spatial retrieval methods separately or combined, retrieval models and
indexing methods. |
| Topics for GeoCLEF 2005 (released 6 May 2005) |
English, German, Portuguese* and Spanish** versions of the topics can be found
here. Topics follow a format similar to this (topic 1):
<top>
<num> GC001 </num>
<orignum> C084 </orignum>
<EN-title>Shark Attacks off Australia and California</EN-title>
<EN-desc> Documents will report any information relating to shark attacks on humans. </EN-desc>
<EN-narr> Identify instances where a human was attacked by a shark, including where the attack took place and the circumstances surrounding the attack. Only documents concerning specific attacks are relevant; unconfirmed shark attacks or suspected bites are not relevant. </EN-narr>
<!-- NOTE: This topic has added tags for GeoCLEF -->
<EN-concept> Shark attacks </EN-concept>
<EN-spatialrelation>near</EN-spatialrelation>
<EN-location> Australia </EN-location>
<EN-location> California </EN-location>
</top>
Some topics are based on previous CLEF ad-hoc queries (hence <orignum>, which
identifies the original CLEF topic number). GeoCLEF topics are essentially the
same as CLEF ad-hoc topics, except for an additional set of tags which define
<concept, spatial relation, location> triples. Note that multiple locations
can be defined for a topic.
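The triples can be read off a topic with a few lines of Python. The sketch below is only an illustration: it assumes each topic can be parsed as well-formed XML, uses an abbreviated copy of topic 1 (description and narrative omitted), and the names `TOPIC` and `topic_triples` are made up for this example.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of topic 1 (description and narrative omitted).
TOPIC = """<top>
<num> GC001 </num>
<orignum> C084 </orignum>
<EN-title>Shark Attacks off Australia and California</EN-title>
<EN-concept> Shark attacks </EN-concept>
<EN-spatialrelation>near</EN-spatialrelation>
<EN-location> Australia </EN-location>
<EN-location> California </EN-location>
</top>"""


def topic_triples(xml_text, lang="EN"):
    """Return one (concept, spatial relation, location) triple per location."""
    top = ET.fromstring(xml_text)
    # Collect the stripped text of each child element, grouped by tag name.
    fields = {}
    for child in top:
        fields.setdefault(child.tag, []).append((child.text or "").strip())
    concept = fields.get(f"{lang}-concept", [""])[0]
    relation = fields.get(f"{lang}-spatialrelation", [""])[0]
    locations = fields.get(f"{lang}-location", [])
    return [(concept, relation, loc) for loc in locations]


for triple in topic_triples(TOPIC):
    print(triple)
```

Because a topic may list several locations, the function returns a list, with the concept and spatial relation repeated for each location.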
We would like to thank those who have contributed by translating topics. In
particular:
*Diana Santos of SINTEF Information and Communication Technology (for Portuguese)
**Andrés Montoyo of Universidad de Alicante (for Spanish)
|
| Important dates |
| |
|
| Registration opens | 31 January 2005 |
| Data release | from 15 February 2005 |
| Topics release | from 1 May 2005 |
| Submission of runs | 13 June 2005 |
| Release of results | from 1 August 2005 |
| Submission of paper for working notes | 21 August 2005 |
| Workshop (in conjunction with ECDL2005 in Vienna, Austria) | 21-23 September 2005 |
| |
|
|
| Organisers of GeoCLEF |
Fred Gey, University of California, Berkeley, USA (gey@berkeley.edu)
Ray Larson, University of California, Berkeley, USA (ray@sims.berkeley.edu)
Paul Clough, Department of Information Studies, University of Sheffield, UK (p.d.clough@sheffield.ac.uk)
Hideo Joho, Department of Information Studies, University of Sheffield, UK (h.joho@sheffield.ac.uk)
Mark Sanderson, Department of Information Studies, University of Sheffield, UK (m.sanderson@sheffield.ac.uk)
|
|