The purpose of GeoCLEF is to experiment with and evaluate retrieval oriented toward the geographic places that characterize documents. The main idea is to see whether adding geographic operators and geographic locations improves the accuracy and specificity of retrieval of relevant documents.
GeoCLEF has four possible tasks, two monolingual and two bilingual:
Monolingual:
English topics against English documents
German topics against German documents
Bilingual:
Language X (German, Spanish, Portuguese) topics against English documents
Language X (English, Spanish, Portuguese) topics against German documents
Other languages will be added (translation of topics to Portuguese is underway) if we can get the topics translated soon.
Participants may undertake any of the four tasks, but are not required to do more than one.
Two runs for each task are mandatory:
One required run will use only the topic title and topic description, without using the topic concept tag, the topic geographic tags, or the topic narrative. The other required run will use both the topic title and topic description (but not the topic narrative) and all geographic tags (operator and location) as well as the concept tag. For comparison purposes, we desire that the mandatory runs be fully automatic (see below for description). However, if a group cannot produce fully automatic runs, we will accept manual runs (identified as such) for the mandatory runs.
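The difference between the two mandatory runs comes down to which topic fields feed the query. The following is a minimal sketch of that selection, not the official procedure: the field names (title, description, narrative, concept, operator, location) are hypothetical stand-ins for the actual topic tags, the example topic is invented, and simple concatenation is only one possible way to build a query.

```python
# Field names are hypothetical stand-ins for the topic tags, and the
# example topic is invented for illustration only.
BASELINE_FIELDS = ("title", "description")
GEO_FIELDS = ("title", "description", "concept", "operator", "location")


def build_query(topic, fields):
    """Concatenate the selected topic fields into one query string.

    Neither mandatory run may use the narrative, so asking for it is an error.
    """
    if "narrative" in fields:
        raise ValueError("mandatory runs must not use the topic narrative")
    # Skip fields that are missing or empty in this topic.
    return " ".join(topic[name] for name in fields if topic.get(name))


example_topic = {
    "title": "Floods in river towns",
    "description": "Find reports of flooding in towns along a river.",
    "narrative": "Relevant documents describe flood damage ...",
    "concept": "floods",
    "operator": "along",
    "location": "Rhine",
}

# Mandatory run 1: title and description only.
baseline_query = build_query(example_topic, BASELINE_FIELDS)
# Mandatory run 2: title, description, concept tag, and geographic tags.
geo_query = build_query(example_topic, GEO_FIELDS)
```

A real system would likely treat the operator and location as structured constraints rather than plain query terms; the sketch only shows which fields each run is allowed to see.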
Additional runs that experiment with leaving off the geographic operator are encouraged, as are experiments that compare the use of named entity tagging with the use of external resources such as gazetteers.
There are two types of runs: fully automatic and manual. Fully automatic runs are those in which no human intervention has occurred in any part of the experimental process. If there is any human intervention in the experimental process (such as interactive relevance feedback or human augmentation of query terms), the resulting runs are considered to be manual and should be described as such.
The results of each run will be the top 1000 ranked documents for each topic in standard TREC/CLEF format. Thus each run should have 25,000 document references (1000 for each of the 25 topics). At most 5 runs for each language pair will be accepted from each group. Thus a group which uses all 4 topic languages and submits the maximum for each language pair could have 10 monolingual and 30 bilingual (5 each for DE|PT|ES→EN, and 5 each for EN|PT|ES→DE) runs. The groups will be asked to prioritize their runs for each task (in case we have too many documents for our pool). A more detailed description of results submission will be forthcoming.
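As an illustration of the result format and the run limits described above, here is a small sketch. It assumes the commonly used six-column TREC run line (topic id, the literal Q0, document id, rank, score, run tag); the exact submission layout should be taken from the detailed submission description once it is available. The function name, the run tag, the document ids, and the choice to start ranks at 1 are all assumptions made for the example.

```python
MAX_DOCS_PER_TOPIC = 1000  # top 1000 ranked documents per topic
RUNS_PER_LANGUAGE_PAIR = 5  # at most 5 runs per language pair


def format_run(results, run_tag, limit=MAX_DOCS_PER_TOPIC):
    """Turn {topic_id: [(doc_id, score), ...]} into TREC-style run lines."""
    lines = []
    for topic_id in sorted(results):
        # Highest score first, truncated to the per-topic limit.
        ranked = sorted(results[topic_id], key=lambda pair: pair[1], reverse=True)
        for rank, (doc_id, score) in enumerate(ranked[:limit], start=1):
            lines.append(f"{topic_id} Q0 {doc_id} {rank} {score:.4f} {run_tag}")
    return lines


# Invented two-document result list for a single hypothetical topic.
demo_results = {"T1": [("DOC-B", 1.5), ("DOC-A", 2.25)]}
demo_lines = format_run(demo_results, "demoRun")

# Language pairs as (topic language, document language).
monolingual_pairs = [("EN", "EN"), ("DE", "DE")]
bilingual_pairs = [(src, "EN") for src in ("DE", "PT", "ES")] + [
    (src, "DE") for src in ("EN", "PT", "ES")
]
max_monolingual_runs = RUNS_PER_LANGUAGE_PAIR * len(monolingual_pairs)
max_bilingual_runs = RUNS_PER_LANGUAGE_PAIR * len(bilingual_pairs)
```

With these limits the counts work out to 10 monolingual and 30 bilingual runs, matching the figures in the text.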
Last update: June 2, 2005