The Bilingual Ad Hoc Retrieval Task
Introduction

The task is similar to the classic TREC ad hoc retrieval task, in that we simulate the situation in which a system knows the set of documents to be searched, but cannot anticipate the particular topic that will be investigated (i.e. topics are not known to the system in advance).

The goal of the ad hoc task is to retrieve as many relevant images as possible from the St. Andrews image collection, given multilingual topics. Any method can be used to retrieve relevant documents, and we encourage the use of both text-based and content-based retrieval methods. We would like to determine how text and image attributes can be combined to enhance cross-language image retrieval in this kind of domain.



Topics

For this task, we provide a list of topic statements and a collection of images with semi-structured captions in English (the target language). The English version of each topic consists of a title (a short sentence or phrase describing the search request in a few words) and a narrative (a description of what constitutes a relevant or non-relevant image for that search request). We also include an example image, which we envisage could be used for relevance feedback (both manual and automatic) and for query-by-example searches. An example topic is shown below (showing only the topic title and example image):



Only the title of each topic has been translated, by native speakers, into 12 source languages: Spanish, Italian, German, French, Dutch, Danish, Swedish, Finnish, Chinese, Japanese, Russian and Arabic. Variations on the titles are included as part of the topic statement. Participants with access to their own translators can translate the English topic narratives into other languages (and, ideally, share them with other participants!). The topics are available individually below (only the English topics contain a title and narrative), and more information about their format is available here.

English
French
German
Spanish
Italian
Chinese
Dutch
Danish
Finnish
Swedish
Japanese
Russian
Arabic
Download all languages

Example images


Relevance assessments

Relevance assessments are performed by students and staff at the University of Sheffield. Submissions are used to create image pools, which are judged for relevance by assessors. The result of assessing the pools is a set of relevance assessments called qrels, which are then used to evaluate system performance and compare submissions. For more information about this procedure and the qrels sets, see this paper: "The CLEF 2003 Cross Language Image Retrieval Track".
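The pooling step can be pictured with a short Python sketch. It assumes each run is a mapping from topic ID to a ranked list of image IDs, and that the pool for a topic is the union of the top `depth` images from every submitted run. The pool depth actually used is not stated here, and `build_pools`, `depth` and the image IDs are hypothetical.

```python
from typing import Dict, List, Set


def build_pools(runs: List[Dict[str, List[str]]], depth: int) -> Dict[str, Set[str]]:
    """Return, per topic, the union of the top-`depth` images from every run."""
    pools: Dict[str, Set[str]] = {}
    for run in runs:  # each run maps topic ID -> ranked list of image IDs
        for topic, ranked_images in run.items():
            pools.setdefault(topic, set()).update(ranked_images[:depth])
    return pools


# Hypothetical runs and image IDs, for illustration only.
run_a = {"1": ["img1", "img2", "img3"], "2": ["img9"]}
run_b = {"1": ["img3", "img4", "img5"]}
pools = build_pools([run_a, run_b], depth=2)
print(sorted(pools["1"]))  # ['img1', 'img2', 'img3', 'img4']
```

Only the pooled images are judged, so an image that no run places within the pool depth is treated as non-relevant when the qrels are used for evaluation.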

Relevance assessment is primarily based on the image, but for certain topics the caption is also needed to make a decision (e.g. "pictures of North Street St Andrews"). What constitutes a relevant image is a subjective decision, but typically a relevant image will have the subject of the topic in the foreground, will not be too dark, and may have a caption that confirms the judge's decision. For example, consider the following images for the query "children playing on beaches":


Most assessors would probably judge the image on the top right as relevant, because one can clearly see children on a beach and they appear to be playing (i.e. building sand castles). On the other hand, the image on the left might not be considered relevant (or perhaps only partially relevant): although the caption says there are children playing on the beach, they appear in the background and are very difficult to see (even when enlarged). The image on the bottom right is also likely to be judged non-relevant, because it is too dark to see the children clearly.

The narratives provided with the English topics are intended to help specify what constitutes a relevant image and will, among other things, be given to assessors to help them judge each topic. Click here to see the instructions that we gave assessors last year.


Experiments

Experiments are performed as follows: participants are given the topics and use them to create queries, which are then used to perform retrieval on the image collection. This process can be iterated (e.g. using relevance feedback) until you are satisfied with your runs. You might try different methods to increase the number of relevant images in the top N rank positions (e.g. query expansion), and you can repeat these methods for each query language. You then submit your runs to ImageCLEF for evaluation. In case we are unable to analyse all of your runs, please indicate which one you would like us to evaluate for each query language. For each language, we will compare your highest-scoring submission with those of other systems.
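The iterative loop can include automatic (pseudo) relevance feedback. The following is a minimal sketch, assuming the captions of the top-ranked images from an initial text search are already available as plain strings: it appends the most frequent new caption terms to the query. The stopword list, `expand_query` and `n_terms` are hypothetical, and real systems usually weight expansion terms (e.g. Rocchio feedback) rather than simply counting them.

```python
from collections import Counter
from typing import List

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"a", "an", "and", "in", "of", "on", "the"}


def expand_query(query_terms: List[str], top_captions: List[str], n_terms: int = 2) -> List[str]:
    """Pseudo-relevance feedback: append the most frequent terms from the
    top-ranked captions, skipping stopwords and terms already in the query."""
    counts: Counter = Counter()
    for caption in top_captions:
        counts.update(w for w in caption.lower().split() if w not in STOPWORDS)
    existing = {t.lower() for t in query_terms}
    # most_common() lists equal counts in first-seen order (Python 3.7+).
    new_terms = [t for t, _ in counts.most_common() if t not in existing][:n_terms]
    return query_terms + new_terms


captions = [
    "children playing on sand beach",
    "children building sand castles",
    "sand dunes",
]
print(expand_query(["children", "beach"], captions))
# ['children', 'beach', 'sand', 'playing']
```

The expanded query would then be run again, and the loop repeated as often as is useful, before the final run is submitted.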

We distinguish between manual and automatic submissions. Automatic runs involve no user interaction, whereas manual runs are those in which a human has been involved in query construction and the iterative retrieval process (e.g. by performing manual relevance feedback). We encourage groups who want to investigate manual intervention further to participate in the interactive evaluation.

The initial search should be a text search, but thereafter content-based methods can also be used to enhance retrieval. We are willing to relax this constraint for participants who want to experiment with a purely visual approach, but this is not the preferred type of submission. One of the main interests of the ImageCLEF ad hoc task is investigating different methods of query translation and how features derived from the image captions and from the images themselves can be combined to enhance retrieval.
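One common way to combine caption-based and content-based evidence (not a method prescribed by the task) is a weighted linear combination of normalised scores, i.e. a weighted variant of CombSUM fusion. The sketch below assumes each system returns a dict mapping image ID to score. `alpha` (the weight on the text score) and all function names are hypothetical, and the weight would need tuning, for example on the training data.

```python
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so text and visual scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 1.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def fuse(text: Dict[str, float], visual: Dict[str, float],
         alpha: float = 0.7) -> List[Tuple[str, float]]:
    """Weighted linear combination; an image missing from one list scores 0 there."""
    t, v = min_max(text), min_max(visual)
    fused = {i: alpha * t.get(i, 0.0) + (1 - alpha) * v.get(i, 0.0) for i in set(t) | set(v)}
    # Sort by fused score (descending), breaking ties by image ID for determinism.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


text_scores = {"a": 10.0, "b": 5.0, "c": 0.0}   # e.g. caption retrieval scores
visual_scores = {"b": 0.9, "c": 0.1, "d": 0.5}  # e.g. visual similarity scores
print([i for i, _ in fuse(text_scores, visual_scores, alpha=0.5)])  # ['b', 'a', 'd', 'c']
```

Normalising first matters because raw text and visual scores are usually on unrelated scales; without it, one modality would dominate regardless of `alpha`.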


Participants are also free to experiment with whatever methods they wish for CLIR and image retrieval, e.g. query expansion based on thesaurus lookup or relevance feedback, indexing and retrieval on only part of the image caption, different retrieval models, different translation resources (e.g. dictionary-based vs. MT), and combining text and content-based methods for retrieval. Given the many possible approaches to ad hoc retrieval, rather than listing them all we ask you to indicate which of the following applies to each of your runs (we consider these the "main" dimensions that define the query for this ad hoc task):

    Dimension             Values
    Query language        english, non-english (state which)
    Initial query         title, narrative
    Query type            automatic, manual
    Feedback/expansion    without, with
    Modality              text, image

We ask that you submit a baseline run against which to compare your other submissions. According to the table above, this would be classed as english+title+automatic+without+text. It is extremely important that we receive a description of the techniques used for all runs; this should be as detailed as possible to ease the comparison and classification of techniques and results. The final proceedings will be published in Springer's Lecture Notes in Computer Science (LNCS) series, so it is probably easiest to use the LNCS templates when writing up your results.
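If you generate many runs, it may help to record these dimensions alongside each run. The following small sketch assumes you want a machine-readable label in the same english+title+automatic+without+text form as the baseline. `RunDescription`, `ALLOWED` and `label` are hypothetical names, not part of any ImageCLEF tooling.

```python
from dataclasses import dataclass

# Allowed values for each dimension, as listed in the table of run dimensions.
ALLOWED = {
    "initial_query": {"title", "narrative"},
    "query_type": {"automatic", "manual"},
    "feedback": {"without", "with"},
    "modality": {"text", "image"},
}


@dataclass(frozen=True)
class RunDescription:
    language: str  # "english", or the name of the non-English query language
    initial_query: str
    query_type: str
    feedback: str
    modality: str

    def __post_init__(self) -> None:
        # Reject values that do not appear in the table.
        for name, values in ALLOWED.items():
            if getattr(self, name) not in values:
                raise ValueError(f"{name} must be one of {sorted(values)}")

    def label(self) -> str:
        """Join the dimensions with '+', as in the baseline classification."""
        return "+".join([self.language, self.initial_query, self.query_type,
                         self.feedback, self.modality])


baseline = RunDescription("english", "title", "automatic", "without", "text")
print(baseline.label())  # english+title+automatic+without+text
```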

For this task, participants are required to submit ranked lists of (up to) the top 1000 images, ranked in descending order of similarity (i.e. the most similar image at the top of the list). Participants can submit (via email) as many system runs as they require, but should indicate their best run for each language, as we can only guarantee to evaluate those. The format of submissions for this ad hoc task can be found here, and the filenames should distinguish different types of submission (e.g. with/without feedback).
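The required submission format is given on the linked page and is not reproduced here. As a sketch only, assuming the whitespace-separated six-column run format that trec_eval reads (topic, the literal Q0, image ID, rank, score, run tag), a run could be written as follows. The image IDs, the run tag and `format_run` are hypothetical.

```python
from typing import Dict, List, Tuple

MAX_RANK = 1000  # at most the top 1000 images per topic


def format_run(results: Dict[str, List[Tuple[str, float]]], run_tag: str) -> List[str]:
    """Return lines of the form 'topic Q0 image_id rank score run_tag'."""
    lines = []
    for topic in sorted(results, key=int):  # assumes numeric topic IDs
        # Highest score first; sorted() is stable, so ties keep their input order.
        ranked = sorted(results[topic], key=lambda pair: -pair[1])[:MAX_RANK]
        for rank, (image_id, score) in enumerate(ranked, start=1):
            lines.append(f"{topic} Q0 {image_id} {rank} {score:.4f} {run_tag}")
    return lines


results = {"2": [("img7", 0.3), ("img8", 0.9)], "1": [("img1", 1.5)]}
print("\n".join(format_run(results, "baseline")))
# 1 Q0 img1 1 1.5000 baseline
# 2 Q0 img8 1 0.9000 baseline
# 2 Q0 img7 2 0.3000 baseline
```

Writing the returned lines to a file whose name encodes the run type (e.g. with or without feedback) keeps different submissions distinguishable.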

Ranked lists from participants will be evaluated using trec_eval, which reports recall and precision at various cut-off levels plus single-value summaries derived from precision and recall, i.e. mean average precision (MAP) and R-precision. We will publish results in a similar manner to the way NIST publishes the results from TREC.
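trec_eval is the tool that will produce the official figures. The sketch below only illustrates how the two single-value summaries are defined, assuming the qrels have been loaded as a set of relevant image IDs per topic. The function names are hypothetical, and trec_eval's handling of details (e.g. score ties, or which topics are included in the average) may differ.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Sum of precision at each rank where a relevant image is retrieved,
    divided by the total number of relevant images (missed ones add 0)."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, image_id in enumerate(ranked, start=1):
        if image_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def r_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Precision after R images, where R is the number of relevant images."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for image_id in ranked[:r] if image_id in relevant) / r


def mean_average_precision(run: Dict[str, List[str]], qrels: Dict[str, Set[str]]) -> float:
    """Average AP over all judged topics; a topic absent from the run scores 0."""
    total = sum(average_precision(run.get(t, []), rel) for t, rel in qrels.items())
    return total / len(qrels)


ranked = ["a", "x", "b", "y", "c"]
relevant = {"a", "b", "c", "d"}
print(average_precision(ranked, relevant))  # (1/1 + 2/3 + 3/5) / 4, about 0.567
print(r_precision(ranked, relevant))        # 2 relevant in the top 4 -> 0.5
```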


Training data

You can download a small training dataset based on topics from ImageCLEF 2003. Click here for a zip file containing 5 topic descriptions, including example images, and the results of a visual search with GIFT/Viper using the example images. The file also contains the relevance assessments for each topic, indicating the relevant images in the St. Andrews collection.


CBIR assistance

To enable participants without access to their own CBIR system to take part in the ad hoc task, we provide access to the GIFT/Viper image retrieval system via an HTTP link. The St. Andrews collection has been indexed and a test interface is provided here. In addition, for those who are interested in using CBIR techniques but do not want to use GIFT/Viper, a list of the top N images returned by GIFT/Viper for each test image can be downloaded here. This can be used to retrieve an initial set of images based on visual similarity, after which captions can be used to retrieve further images. For more information about using the GIFT/Viper system in ImageCLEF, please contact Henning Mueller (henning.mueller@sim.hcuge.ch).
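One simple reading of this visual-then-text strategy, offered only as a sketch: keep the downloaded GIFT/Viper top-N list for a topic's example image at the head of the ranking, then fill the remaining positions from a caption-based ranking, skipping duplicates. It assumes both lists are already available as image IDs. `visual_then_text` and `limit` are hypothetical. A more elaborate variant would use the captions of the visual hits to build an expanded text query.

```python
from typing import List


def visual_then_text(visual_top_n: List[str], caption_ranking: List[str],
                     limit: int = 1000) -> List[str]:
    """Keep the visual top-N first, then append caption-retrieved images not yet seen."""
    merged: List[str] = []
    seen = set()
    for image_id in visual_top_n + caption_ranking:
        if image_id in seen:
            continue
        seen.add(image_id)
        merged.append(image_id)
        if len(merged) == limit:  # submissions contain at most 1000 images
            break
    return merged


print(visual_then_text(["v1", "v2", "t1"], ["t1", "t2", "v2", "t3"], limit=4))
# ['v1', 'v2', 't1', 't2']
```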



Last Modified: May 2004 By: Paul Clough