ImageCLEF 2003

University of Sheffield
 


Introduction

Welcome to a new CLEF track for 2003 called ImageCLEF. This track concerns cross language retrieval of images via their associated textual captions. As a pilot experiment in CLEF, the goal of this track is to explore and study the relationship between images and their captions during the retrieval process. It is likely that this track will appeal to members of more than one research community, including those from image retrieval, cross language retrieval and user interaction. Given queries in languages other than English, the goal is to use whatever method is appropriate to retrieve relevant images from a photographic collection built especially for this task (the Eurovision St Andrews photographic collection, or ESTA).

We propose the following two tasks. Participants of ImageCLEF can attempt either one of the tasks, or both:
(1) Automatic: similar to the classic TREC ad hoc retrieval task, we supply 50 short queries and the goal is to retrieve as many relevant images as possible.
   
(2) Interactive: similar to iCLEF, this task aims to explore user interface issues and issues surrounding user interaction with image retrieval systems.

As a pilot experiment, we know that there will be unexpected problems with the collection, the topics and evaluation method and we are already aware of limitations with the current resources, such as limited topic translations. We would therefore warmly welcome any recommendations, suggestions for improvements, or help from participants that would make this task of greater benefit to the information retrieval community. By running this track, we hope to stimulate ideas and interaction between ImageCLEF participants (via the ImageCLEF mailing list: imageclef@sheffield.ac.uk) in order to further research in cross language image retrieval.

Task 1: automatic ad hoc retrieval

The task is similar to the classic TREC ad hoc retrieval task, in that we simulate the situation in which a system knows the set of documents to be searched, but cannot anticipate the particular topic that will be investigated (i.e. topics are not known to the system in advance). For this task, we provide a list of topic statements and a collection of images with semi-structured captions in English (target language).

The English version of each topic consists of a title (a short sentence or phrase describing the search request in a few words) and a narrative (a description of what constitutes a relevant or non-relevant image for that search request). The narrative also contains an example image and caption, which we envisage could be used for relevance feedback and query-by-example searches.

The title of each topic has been translated by native speakers into five European languages (the source languages): Spanish, Italian, German, French and Dutch. Variations on titles are included as part of the topic statement. Due to limited translation resources, we have been unable to translate the rest of the topic statement (i.e. the narrative) into languages other than English. The topics are available here and more information about their format is available here. We expect that participants will focus on translating the titles and that the narrative will remain largely unused (the descriptions of relevance matter most during the relevance assessments). However, participants may use the example image and caption given in the English narrative during retrieval.

The goal of the automatic task is to retrieve as many relevant images from the collection as possible given the multilingual topic titles. Participants are free to use whatever methods they want to retrieve relevant images, including content-based retrieval and query expansion, and to experiment with whatever methods they wish for CLIR. The task is to be fully automatic, without any user interaction.

The document collection consists of 28,133 images and their captions. The captions consist of several fields containing semi-structured data, and participants are free to match on any part of the caption for textual retrieval. We encourage participants who have the resources to translate the English narratives or the image captions into other languages to do so, and to share these translations with other members of ImageCLEF.

For this task, participants are required to submit ranked lists of the top 1000 images, ordered so that the images with the highest similarity scores appear nearest the top of the list. Participants can submit (via email) as many system runs as they require, but should indicate their best run because we can only guarantee to evaluate that one. Ranked lists from all participants will be pooled to create relevance sets, and assessors from the University of Sheffield will make the relevance judgements.
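The pooling step described above can be sketched as follows. This is only an illustration: the representation of a run as a mapping from topic number to ranked image identifiers, the function name build_pool and the pool depth are all assumptions made for the example, since the track description does not state how deep the pool will be.

```python
from typing import Dict, Iterable, List, Set

# A run maps a topic number to a ranked list of image ids (best first).
# This in-memory layout is an assumption made for the sketch.
Run = Dict[int, List[str]]


def build_pool(runs: Iterable[Run], depth: int = 100) -> Dict[int, Set[str]]:
    """Union the top `depth` images of every run, topic by topic.

    `depth` is a hypothetical parameter; the track text does not give one.
    """
    pool: Dict[int, Set[str]] = {}
    for run in runs:
        for topic, ranking in run.items():
            pool.setdefault(topic, set()).update(ranking[:depth])
    return pool


# Two toy runs for topic 1, pooled to depth 2.
run_a = {1: ["img_3", "img_7", "img_1"]}
run_b = {1: ["img_7", "img_9", "img_2"]}
pool = build_pool([run_a, run_b], depth=2)
print(sorted(pool[1]))  # ['img_3', 'img_7', 'img_9']
```

Only the images in the pool are shown to assessors, so an image that no run ranks within the chosen depth is never judged.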

Ranked lists from participants for this ad hoc task will be evaluated using trec_eval, which reports recall and precision at various cut-off levels plus single-value summaries derived from precision and recall, i.e. mean average precision and R-precision. We will publish results in a manner similar to the way in which NIST publishes the results from TREC.
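For readers unfamiliar with these measures, the sketch below shows how precision at a cut-off, average precision and R-precision are conventionally defined for a single topic. It is not a substitute for trec_eval; the function names and the toy data are made up for illustration, and the ranking is assumed to contain no duplicate images.

```python
from typing import List, Set


def precision_at(ranking: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top k retrieved images that are relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Sum of the precision at each rank holding a relevant image,
    divided by the total number of relevant images for the topic."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def r_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Precision at rank R, where R is the number of relevant images."""
    if not relevant:
        return 0.0
    return precision_at(ranking, relevant, len(relevant))


# Toy topic: three relevant images, two of which are retrieved.
ranking = ["a", "b", "c", "d", "e"]
relevant = {"a", "c", "f"}
print(precision_at(ranking, relevant, 5))
print(average_precision(ranking, relevant))
print(r_precision(ranking, relevant))
```

Mean average precision is then the arithmetic mean of average_precision over all topics in a run.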

Task 2: interactive image retrieval

The goal of the interactive task is not to compare participants' systems in a competitive environment, but rather for participants to explore variations of their retrieval system in two scenarios. The scenarios are detailed here, and participants are free to complete one or both of them. The scenarios can be used to compare two systems (or any other two variables in the system, such as the type of translation method used) or to evaluate a single system. We recommend that at least four users are involved in testing the system. In both scenarios, user questionnaires are a recommended way of obtaining feedback from users about their level of satisfaction with the system. An example questionnaire can be obtained, if required, by contacting Paul Clough.
   
Scenario 1: Given an image (not a caption) from the St Andrews collection, the goal for the searcher is to find the same image again. This aims to allow researchers to study how users describe images and their methods of searching the collection for particular images, e.g. browsing or by conducting specific searches. This task can be used to determine whether the retrieval system is being used in the manner intended by the system designers and determine how the interface helps users formulate their search requests.

The following images have been selected for this task:
  • stand03_1590/stand03_28416 (image)
  • stand03_1974/stand03_11252 (image)
  • stand03_1853/stand03_1915 (image)
  • stand03_1955/stand03_26303 (image)
  • stand03_1825/stand03_26105 (image)

To measure performance on this task, the following metrics could be used: whether the user found the intended image, the time taken to find it, the number of steps or iterations required (e.g. the number of clicks or queries), and the proportion of time spent searching (issuing specific queries) versus browsing. These factors help to measure the efficiency with which a cross language image retrieval search can be performed, e.g. how quickly the target image was found or how many clicks were needed. Information about how useful the interface was to the user can be obtained from a questionnaire completed after the task.
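As a rough illustration of how these measures might be derived from a session log, consider the sketch below. The log format is entirely hypothetical: the Event class, the labels "query", "browse" and "found", and the rule that each event starts an activity lasting until the next event are assumptions made for the example, not something the track prescribes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Event:
    time: float  # seconds since the start of the session
    kind: str    # "query", "browse" or "found" (hypothetical labels)


def summarise(events: List[Event]) -> dict:
    """Summarise one known-item search session.

    Assumes events are in time order and that each event starts an
    activity which lasts until the next event in the log.
    """
    seconds = {"query": 0.0, "browse": 0.0}
    for current, following in zip(events, events[1:]):
        if current.kind in seconds:
            seconds[current.kind] += following.time - current.time

    # Time of the first "found" event, if the user located the image.
    found_time: Optional[float] = next(
        (e.time for e in events if e.kind == "found"), None
    )
    active = seconds["query"] + seconds["browse"]
    return {
        "found": found_time is not None,
        "time_to_find": found_time,
        "queries": sum(1 for e in events if e.kind == "query"),
        "search_share": seconds["query"] / active if active else 0.0,
    }


session = [Event(0, "query"), Event(20, "browse"),
           Event(50, "query"), Event(60, "found")]
print(summarise(session))
```

For this toy session the user issues two queries, finds the image after 60 seconds, and spends half of that time searching and half browsing.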

Participants are not required to submit anything for this task.
   
Scenario 2: The second scenario is aimed more at assessing the effectiveness of the search system, the goal being to get the user to find as many relevant images as possible on a particular topic. For example the following scenario is derived from topic number 41 of the ad hoc retrieval task:

"Imagine that you are interested in collecting pictures which contain a clearly visible coat of arms somewhere within the image. A coat of arms is a heraldic insignia which typically represent organisations such as families, countries, corporations or trading companies. It does not matter whether the insignia is part of a postcard, or mounted or carved on a building, but it must be clearly visible to be of use."

To measure performance on this task, i.e. how effectively the retrieval system supports the task, we will determine the proportion of relevant documents retrieved by the system with respect to the total number of relevant documents, as determined from relevance judgements made by assessors at the University of Sheffield. Given the pool of images for topic number 41 submitted by participants of the ad hoc retrieval task, together with the images classified as relevant in this interactive task, we will create a combined pool of images which will be judged for relevance by our assessors. This will enable participants to determine the proportion of relevant documents retrieved by their system during an interactive search.
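The proportion described above is recall. A minimal sketch of the calculation for one user session follows; the function name and the image identifiers are invented for the example, and the set of relevant images stands in for the assessors' judgements once they are released.

```python
from typing import Iterable, Set


def interactive_recall(selected: Iterable[str], relevant: Set[str]) -> float:
    """Share of the judged-relevant images that the user actually found."""
    if not relevant:
        return 0.0
    return len(set(selected) & relevant) / len(relevant)


# Hypothetical data: the user saved three images, two of them relevant,
# out of four images judged relevant for the topic.
saved_by_user = ["i1", "i2", "i5"]
judged_relevant = {"i1", "i5", "i8", "i9"}
print(interactive_recall(saved_by_user, judged_relevant))  # 0.5
```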

Other metrics that may also be useful include the proportion of time spent searching versus browsing and the time taken to find each relevant image. Participants will be able to calculate the number of relevant documents found after we have performed relevance assessments and released the results.

Participants are requested to submit the results of the interactive searches from the users involved in the evaluation.

In both scenarios, native speakers of languages other than English should be able to interact with your image retrieval system in a language of their choice (we suggest limiting it to one of the 5 European languages used in task 1: French, German, Italian, Spanish or Dutch). It is likely that users will want to browse through images in the collection and participants are encouraged to explore different interfaces, or ways of organising the collection to support these tasks. For example you may want to experiment with approaches for browsing versus searching, clustering images, caption similarity searches, relevance feedback and maybe even various methods of user input mechanisms, e.g. sketching the required image.

We suggest the scenarios can be used to address at least three aspects of cross language image retrieval which may affect the overall retrieval performance (but not necessarily be due to good or bad effectiveness of the retrieval system itself):
   
(1) How the CLIR system supports user query formulation for images with English captions, particularly when users query in their native language, which may not be English. This is also an opportunity to study how the images themselves could be used as part of the query formulation.
 
(2) Whether the CLIR system supports query re-formulation, e.g. the support of positive and negative feedback to improve the user's search experience, and how this affects retrieval.
   
(3) How well the CLIR system presents the retrieval results to the user to enable selection of relevant images. This might include how the system presents the caption to the user (particularly if they are not familiar with English or with some of the specific and colloquial language used in the captions). It is also an opportunity to investigate the relationship between the image and caption for retrieval purposes.

We encourage participants to look at the iCLEF track guidelines for further advice on how to perform interactive retrieval experiments. No formal evaluation of this task will take place (except for the relevance assessments for the second scenario); rather, participants are encouraged to discuss with others what they have learned from these scenarios.

Important dates
   
Registration open: 15 January 2003
Data release: February 2003
Topic release: 11 April 2003
Submission of runs by participants for task 1: 23 May 2003
Submission of runs by participants for task 2: 30 May 2003
Release of relevance assessments and results: 1 July 2003
Submission of paper for working notes: 20 July 2003
Workshop: 21-22 August 2003
   


The image collection: ESTA

The image collection for this task consists of 28,133 images crawled from the St Andrews University Library photographic collection and arranged onto 2 CDs which can be obtained from the contacts. Before receiving the data, you must fill in a CLEF agreement form available from Carol Peters. Once we are notified of your agreement, we will send you the CDs and include you on the ImageCLEF mailing list.

Submission

The submission guidelines can be found here. Submission is only required for task 1; the interactive task is not formally evaluated by us. Papers written by participants, together with our evaluation of their retrieval results, will be published in the CLEF proceedings. The deadline for submitting papers is 20 July 2003.


Contacts for this track

Paul Clough, University of Sheffield (p.d.clough@sheffield.ac.uk)
Mark Sanderson, University of Sheffield (m.sanderson@sheffield.ac.uk)

We have set up a mailing list: imageclef@sheffield.ac.uk for participants. Please email the above contacts to be added to the list.

Results from ImageCLEF 2003

Four groups participated in ImageCLEF 2003: the University of Surrey, Daedalus (Spain), NTU (China) and ourselves. A summary of the results can be found in our ImageCLEF overview paper (Clough and Sanderson, 2003). For more information about participants' entries, see the CLEF web site.

The relevance judgements (qrels) are now available to download for the ImageCLEF 2003 ad hoc task.

Publications

Clough, P.D. and Sanderson, M. (2003), The CLEF 2003 Cross Language Image Retrieval Track, In Submission, Cross Language Evaluation Forum (CLEF) 2003, Trondheim, Norway.