--------------------------------------------------------
Notes on ImageCLEF topics for the ad hoc retrieval task
--------------------------------------------------------
Paul Clough, April 2003.

Introduction
-------------

This file describes the format used to encode topics for the ImageCLEF
2003 ad hoc retrieval task. The 50 topics consist of an English version
containing a title and narrative, and translations of the titles of these
topics into German, French, Italian, Spanish and Dutch, together with
possible linguistic variations.

Literal translations of each topic were performed by translators native to
the language into which the English titles were translated. In some cases,
more than one translator was available for a language and their
translations were merged. Multiple translations arise either from
differences between translators, or from there being more than one
possible literal translation of a topic title. Where a topic has multiple
translations in the same language, the first is the most suitable
translation as judged by the first translator; the rest are in no
particular order.

The English topic consists of a title (a short query of typically 2-3
words) and a narrative. The narrative is a longer description of what
constitutes a relevant image, offering a more specific description of
relevance than the title. Translations of the topic include only the title
because of limits on the time and effort available from our translators.

Topics were carefully chosen to represent "typical" searches that one
might expect against the target collection: the Eurovision St Andrews
photographic collection (or ESTA). Topic subject matter was derived from:

(1) St Andrews University library Web logs for this collection.
(2) Subject categories as used in the St Andrews photographic collection.
(3) An initial study done by an MSc student at Sheffield University.

The topics are as far as possible representative of:

(1) Real queries as found in Web logs.
(2) The length of typical queries.
(3) Queries which will cause problems in translation. For example:
    a. Proper names.
    b. Verb and noun phrases.
    c. Source and target word ambiguity.
    d. Compound nouns and verbs.
    e. Word inflections, e.g. plurals and gender.
(4) Types of request: pictures of specific objects, of more general
    concepts, or of an action.

Topic format
-------------

We have tried to be consistent with the existing CLEF topic encoding
scheme to enable the use of existing CLEF topic parsers. However, we have
had to adapt the format slightly to encode further information into the
existing scheme. The format of an example English topic is as follows:

<top>
<num> Number: 1 </num>
<EN-title> Men and women processing fish </EN-title>
<EN-narr> A relevant image will show men and/or women processing fish
after catching them. Processing may include gutting or curing and the
picture must show the fish processors at work; not just mention fish
processing, e.g. that fish processing takes place at this port. An example
relevant document is [stand03_2093/stand03_2382]. </EN-narr>
</top>

An additional attribute, "n", has been added to the title tag to enable
further variations of the topic to be included in the list of translations
(this is not in the existing CLEF topic encoding scheme). The <num> tag
encapsulates the topic number (1 to 50) and the <narr> tag the narrative
(which only exists in the English version of the topic).

At the end of each narrative is a reference to an example relevant image
(and caption), which we provide to enable the use of query-by-example with
content-based retrieval tools and to provide a seed relevant document for
relevance feedback. The reference to the relevant image is always
delimited by [ ] in the narrative.

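As an illustration of how the bracketed reference might be pulled out of a
narrative, here is a minimal Python sketch. The function name
extract_example_reference is hypothetical and not part of any ImageCLEF
tool, and the narrative string is an abridged copy of the example above.
The sketch assumes the reference is the last [ ] pair in the narrative and
does not interpret the text inside the brackets.

```python
import re

# Matches the text between one pair of square brackets.
BRACKET_RE = re.compile(r"\[([^\[\]]+)\]")


def extract_example_reference(narrative):
    """Return the text inside the last [ ] pair of a narrative, or None.

    Hypothetical helper: assumes the example relevant image is the final
    bracketed item, as the notes above say it appears at the end.
    """
    matches = BRACKET_RE.findall(narrative)
    return matches[-1].strip() if matches else None


# Abridged narrative of topic 1, for illustration only.
narrative = (
    "A relevant image will show men and/or women processing fish after "
    "catching them. An example relevant document is "
    "[stand03_2093/stand03_2382]."
)

reference = extract_example_reference(narrative)
print(reference)
```

A narrative with no bracketed reference yields None, so callers can fall
back to text-only retrieval in that case.
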
The format of an example non-English topic is:

<top>
<num> Number: 1 </num>
<DE-title n="1"> Männer und Frauen verarbeiten Fisch </DE-title>
</top>

At least one title translation exists for each topic, but we also encode
possible variants, distinguished by the "n" attribute. For example, in the
case of the Italian translation for topic 1:

<top>
<num> Number: 1 </num>
<IT-title n="1"> Uomini e donne che puliscono il pesce </IT-title>
<IT-title n="2"> Pulizia del pesce al porto </IT-title>
<IT-title n="3"> uomini e donne che lavorano il pesce </IT-title>
</top>

In summary, the following tags are used to encapsulate the translations:

<top></top>             The start and end of a topic
<num></num>             The topic number (1 to 50)
<EN-title></EN-title>   The English topic title
<EN-narr></EN-narr>     The English topic narrative
<DE-title></DE-title>   The German topic title
<FR-title></FR-title>   The French topic title
<IT-title></IT-title>   The Italian topic title
<ES-title></ES-title>   The Spanish topic title
<NL-title></NL-title>   The Dutch topic title

Summary
--------

If you have further questions regarding the topics or their encoding, or
you encounter problems during parsing, then please contact:

Paul Clough (p.d.clough@sheffield.ac.uk).
University of Sheffield
April 2003
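
For readers without an existing CLEF topic parser, the tag scheme
summarized above can be read with a few regular expressions. The Python
sketch below is illustrative only: the function parse_topics and the
returned dictionary layout are assumptions made for this example, not part
of the ImageCLEF distribution. It assumes each topic sits inside a
<top></top> pair, that the number follows the "Number:" prefix, and that a
title without an "n" attribute counts as variant 1.

```python
import re
from collections import defaultdict

TOP_RE = re.compile(r"<top>(.*?)</top>", re.DOTALL)
NUM_RE = re.compile(r"<num>\s*Number:\s*(\d+)\s*</num>")
# Two-letter language code, optional n="..." attribute, then the title text.
TITLE_RE = re.compile(
    r'<([A-Z]{2})-title(?:\s+n="(\d+)")?\s*>(.*?)</\1-title>', re.DOTALL
)


def parse_topics(text):
    """Map topic number -> {language code: [titles ordered by n]}.

    Hypothetical helper; a missing n attribute is treated as n=1.
    """
    topics = {}
    for block in TOP_RE.findall(text):
        num_match = NUM_RE.search(block)
        if num_match is None:
            continue  # skip blocks without a topic number
        variants = defaultdict(list)
        for lang, n, title in TITLE_RE.findall(block):
            variants[lang].append((int(n) if n else 1, title.strip()))
        topics[int(num_match.group(1))] = {
            lang: [title for _, title in sorted(items)]
            for lang, items in variants.items()
        }
    return topics


# The Italian example for topic 1, as given in these notes.
sample = """
<top>
<num> Number: 1 </num>
<IT-title n="1"> Uomini e donne che puliscono il pesce </IT-title>
<IT-title n="2"> Pulizia del pesce al porto </IT-title>
<IT-title n="3"> uomini e donne che lavorano il pesce </IT-title>
</top>
"""

topics = parse_topics(sample)
for title in topics[1]["IT"]:
    print(title)
```

Ordering the titles by "n" keeps the first entry as the preferred
translation, which matches the convention that the first variant is the
most suitable one.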