ImageCLEF 2005

Evaluation of image retrieval systems for historic photographic and medical images



Bilingual ad-hoc retrieval task

Introduction

The task is similar to the classic TREC ad-hoc retrieval task, in that we simulate the situation in which a system knows the set of documents to be searched, but cannot anticipate the particular topic that will be investigated (i.e. topics are not known to the system in advance). The task is bilingual in that the collection is in English and queries are in different languages requiring translation from X to English. This simulates the scenario in which a library like St Andrews in Scotland wishes to provide multilingual access to their existing image archive.

The goal of the ImageCLEF ad-hoc task is to retrieve as many relevant images as possible from the St. Andrews image collection given multilingual topics. Any method can be used to retrieve relevant documents and we encourage the use of both text and content-based retrieval methods. We would like to determine how text and image attributes can be combined to enhance cross-language image retrieval in this kind of domain.


For more information see the main ImageCLEF website.

Topics

For this task, we provide a list of topic statements and a collection of images with semi-structured captions in English (the target language). The English version of each topic consists of a title (a short sentence or phrase describing the search request in a few words) and a narrative (a description of what constitutes a relevant or non-relevant image for that search request). For example:

<top>

<num> Number: 1 </num>

<title> aircraft on the ground </title>

<narr> Relevant images will show one or more airplanes positioned on the ground. Aircraft do not have to be the focus of the picture, although it should be possible to make out that the picture contains aircraft. Pictures of aircraft flying are not relevant and pictures of any other flying object (e.g. birds) are not relevant. </narr>

</top>


The topics are encapsulated by the <top> tags and the ImageCLEF 2005 topics are numbered from 1 to 28. The short title is between the <title> tags and the longer narrative description between the <narr> tags. Both title and narrative have been translated into the following languages: German, French, Italian, Spanish (European), Spanish (Latin American), Chinese (Simplified), Chinese (Traditional) and Japanese. Translations have also been produced for ImageCLEF for the titles only and these are available in 23 languages including: Russian, Croatian, Bulgarian, Hebrew, Arabic and Norwegian.
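
The topic structure described above can be read with a few regular expressions. The following is a minimal sketch, assuming the topic file is plain text using exactly the tags shown in the example; the function name `parse_topics` and the patterns are illustrative and are not part of any ImageCLEF tooling.

```python
import re

# Hypothetical patterns for the <top> blocks shown in the example above.
TOP_RE = re.compile(r"<top>(.*?)</top>", re.DOTALL)
NUM_RE = re.compile(r"<num>\s*Number:\s*(\d+)\s*</num>")
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.DOTALL)
NARR_RE = re.compile(r"<narr>(.*?)</narr>", re.DOTALL)


def parse_topics(text):
    """Return a list of dicts with keys 'num', 'title' and 'narr'."""
    topics = []
    for block in TOP_RE.findall(text):
        num = NUM_RE.search(block)
        title = TITLE_RE.search(block)
        narr = NARR_RE.search(block)
        if num is None or title is None:
            continue  # skip malformed blocks rather than guess their content
        topics.append({
            "num": int(num.group(1)),
            # collapse internal whitespace and strip the padding inside tags
            "title": " ".join(title.group(1).split()),
            "narr": " ".join(narr.group(1).split()) if narr else "",
        })
    return topics


sample = """<top>
<num> Number: 1 </num>
<title> aircraft on the ground </title>
<narr> Relevant images will show one or more airplanes positioned on the ground. </narr>
</top>"""

topics = parse_topics(sample)
print(topics[0]["num"], topics[0]["title"])
```

The title-only translations would need the narrative to be optional, which is why a missing `<narr>` element yields an empty string here instead of an error.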

The ImageCLEF 2005 ad-hoc topics can be downloaded here. With each topic we have also included two example images, which we envisage could be used for relevance feedback (both manual and automatic) and query-by-example searches. For example, topic 1 is also described by the following two images:


Relevance Assessments

Relevance assessments are performed by students and staff at the University of Sheffield. Submissions are used to create image pools which are judged for relevance by assessors. The pools are assessed and the end result is a set of relevance assessments called qrels. These are then used to evaluate system performance and compare submissions.
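
Pooling of this kind is commonly done by taking the union of the top-ranked images from every submitted run for each topic. The sketch below illustrates that general idea only; the pool depth, the data layout and the function name `build_pools` are assumptions and do not describe the exact procedure used for ImageCLEF.

```python
def build_pools(runs, depth):
    """Union of the top `depth` images per topic across all runs.

    runs: dict mapping run id -> dict mapping topic number -> ranked list of image ids.
    Returns a dict mapping topic number -> set of image ids to be judged.
    """
    pools = {}
    for ranking_by_topic in runs.values():
        for topic, ranked in ranking_by_topic.items():
            pools.setdefault(topic, set()).update(ranked[:depth])
    return pools


# Two made-up runs for topic 1 (image ids are placeholders).
runs = {
    "runA": {1: ["img3", "img1", "img7"]},
    "runB": {1: ["img1", "img9", "img2"]},
}
pools = build_pools(runs, depth=2)
print(sorted(pools[1]))
```

Images outside every run's top `depth` are never judged, so the choice of depth trades assessor effort against completeness of the qrels.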

For more information about this procedure and the qrels sets, see the papers "The CLEF 2003 Cross Language Image Retrieval Track" and "The CLEF 2004 Cross Language Image Retrieval Track".

Relevance assessment for the more general topics is based entirely on the visual content of images (e.g. aircraft on the ground). However, certain topics also require the use of the caption to make a confident decision (e.g. "pictures of North Street St Andrews"). What constitutes a relevant image is a subjective decision, but typically a relevant image will have the subject of the topic in the foreground, will not be too dark or too low in contrast, and may have a caption that confirms the judge's decision.

The assessment of images in ImageCLEF uses a ternary classification scheme: (1) relevant, (2) partially relevant and (3) not relevant. The aim of the ternary scheme is to help assessors make more accurate relevance judgements (e.g. an image is definitely relevant in some way, but the query object is not directly in the foreground: it is therefore considered partially relevant). Various combinations of assessor judgements are used to create the qrels sets, and more information can be found in the papers cited above.
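
To make the idea of combining assessor judgements concrete, here is a hedged sketch. It assumes several assessors judge each image, that a stricter set keeps only images judged relevant while a looser set also accepts partially relevant ones, and that assessors are combined by intersection (all agree) or union (any agrees). These rules and the name `combine` are illustrative assumptions; the actual qrels definitions are given in the papers cited above.

```python
# Codes follow the ternary scheme described above.
RELEVANT, PARTIAL, NOT_RELEVANT = 1, 2, 3


def combine(judgements, mode="intersection", include_partial=False):
    """Build one qrels set from several assessors' judgements.

    judgements: dict mapping image id -> list of codes, one per assessor.
    mode: "intersection" (every assessor must accept) or "union" (any assessor).
    include_partial: whether "partially relevant" counts as accepted.
    """
    if mode not in ("intersection", "union"):
        raise ValueError("mode must be 'intersection' or 'union'")
    accepted = {RELEVANT, PARTIAL} if include_partial else {RELEVANT}
    agree = all if mode == "intersection" else any
    return {
        image
        for image, codes in judgements.items()
        if codes and agree(code in accepted for code in codes)
    }


# Made-up judgements from two assessors.
judgements = {"a": [1, 1], "b": [1, 2], "c": [2, 3], "d": [3, 3]}
print(sorted(combine(judgements)))
print(sorted(combine(judgements, mode="union", include_partial=True)))
```

The strict intersection gives the smallest and most conservative set, while the union with partial relevance gives the largest.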

Experiments

Experiments are performed as follows. Participants are given topics, which are used to create queries that are run against the image collection. This process iterates (e.g. with relevance feedback) until you are satisfied with your runs. You might try different methods (e.g. query expansion) to increase the number of relevant images in the top N rank positions, and you can repeat these methods for each query language. You then submit your runs to ImageCLEF for evaluation. In case we are unable to analyse all of your runs, please indicate which one you would like us to evaluate for each query language. For each language, we will use your highest-scoring submission to compare your system with others.

We distinguish between manual and automatic submissions. Automatic runs involve no user interaction, whereas manual runs are those in which a human has been involved in query construction and the iterative retrieval process (e.g. manual relevance feedback is performed). We encourage groups who want to investigate manual intervention further to participate in the interactive evaluation.

The initial search should be a text search, but thereafter content-based systems can also be used to enhance retrieval. We are willing to relax this constraint for participants who want to experiment with a purely visual approach, but this is not the preferred submission. One of the main interests of the ImageCLEF ad hoc task is investigating various methods of query translation and how features derived from the image captions and the images themselves can be combined to enhance retrieval.


Participants are also free to experiment with whatever methods they wish for CLIR and image retrieval, e.g. query expansion based on thesaurus lookup or relevance feedback, indexing and retrieval on only part of the image caption, different models of retrieval, different translation resources (e.g. dictionary-based vs. MT), and combining text and content-based methods for retrieval. Given the many different possible approaches which could be used to perform the ad hoc retrieval, rather than list all of these we will ask you to indicate which of the following applies to each of your runs (we consider these the "main" dimensions which define the query for this ad hoc task):
     
Dimension             Options
Query language        English      non-English (state which)
Initial query         title        narrative
Query type            automatic    manual
Feedback/expansion    without      with
Modality              text         image

We would ask that you submit a baseline run with which to compare your other submissions. According to the previous table this would be classed as: english+title+automatic+without+text. It is extremely important that we receive a description of the techniques that you use for all runs. This should be as detailed as possible to ease the comparison or classification of techniques and results. The final proceedings will be published in Springer Lecture Notes in Computer Science (LNCS). It is probably easier if you use the LNCS templates for the submission of your results.
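
The run classification described above can be recorded alongside each run so that labels stay consistent. The class below is a hypothetical helper, not an official format: the field names, the lower-case spelling of the options and the single-valued modality are assumptions made for illustration (a run that mixes text and image features would need a richer modality field).

```python
from dataclasses import dataclass

# Allowed values per dimension, taken from the classification above.
ALLOWED = {
    "initial_query": {"title", "narrative"},
    "query_type": {"automatic", "manual"},
    "feedback": {"without", "with"},
    "modality": {"text", "image"},
}


@dataclass(frozen=True)
class RunDescription:
    """Defaults correspond to the requested baseline run."""
    language: str = "english"
    initial_query: str = "title"
    query_type: str = "automatic"
    feedback: str = "without"
    modality: str = "text"

    def __post_init__(self):
        # Reject values outside the listed options; language is free text.
        for name, allowed in ALLOWED.items():
            value = getattr(self, name)
            if value not in allowed:
                raise ValueError(f"{name}={value!r} not in {sorted(allowed)}")

    def label(self):
        return "+".join([self.language, self.initial_query,
                         self.query_type, self.feedback, self.modality])


baseline = RunDescription()
print(baseline.label())
print(RunDescription(language="german", feedback="with").label())
```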

For this task, participants are required to submit ranked lists of (up to) the top 1000 images ranked in descending order of similarity (i.e. the most similar nearest the top of the list). Participants can submit (via email) as many system runs as they require, but should indicate their best run for each language as we can only guarantee evaluation of these. The format of submissions for this ad-hoc task can be found here and the filenames should distinguish different types of submission (e.g. with/without feedback). Please note that there should be at least 1 document entry in your results for each topic (i.e. if your system returns no results for a query then insert a dummy entry, e.g. 25 1 stand03_118/stand03_20631 0 4238 xyzT10af5 ). The reason for this is to make sure that all systems are compared with the same number of topics and relevant documents.
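
The submission rules above (at most 1000 images per topic, descending similarity, and a dummy entry for topics with no results) can be sketched as follows. The column order is an assumption read off the dummy entry shown above (topic, a constant column, image id, rank starting at 0, score, run id); please check it against the linked format description. The function name `format_run` and the sample ids are hypothetical.

```python
def format_run(results, topics, run_id, dummy_image, max_images=1000):
    """Return submission lines for every topic.

    results: dict mapping topic number -> list of (image_id, score) pairs.
    topics: iterable of all topic numbers, so that empty topics get a dummy entry.
    """
    lines = []
    for topic in topics:
        # Highest similarity first, truncated to the permitted list length.
        ranked = sorted(results.get(topic, []), key=lambda pair: pair[1], reverse=True)
        ranked = ranked[:max_images]
        if not ranked:
            # Assumed placeholder so that every topic has at least one entry.
            ranked = [(dummy_image, 0.0)]
        for rank, (image_id, score) in enumerate(ranked):
            lines.append(f"{topic} 1 {image_id} {rank} {score} {run_id}")
    return lines


# Made-up results: topic 1 has two images, topic 2 has none.
results = {1: [("stand03_1/a", 2.5), ("stand03_1/b", 7.0)]}
lines = format_run(results, topics=[1, 2], run_id="demoRun",
                   dummy_image="stand03_118/stand03_20631")
print("\n".join(lines))
```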

Ranked lists from participants for this ad hoc task will be evaluated using trec_eval, reporting recall and precision at various cut-off levels plus single-value summaries derived from precision and recall, i.e. mean average precision and R-precision. We will publish results in a manner similar to the way in which NIST publishes the results from TREC.
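
For readers unfamiliar with these measures, the sketch below computes precision at a cut-off, average precision, R-precision and mean average precision from their standard definitions. It is an illustration only and not a replacement for trec_eval; it assumes binary relevance and ranked lists without duplicate image ids, and the function names are our own.

```python
def precision_at(ranked, relevant, k):
    """Fraction of the top k ranked images that are relevant."""
    return sum(1 for image in ranked[:k] if image in relevant) / k


def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant image occurs,
    divided by the total number of relevant images for the topic."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, image in enumerate(ranked, start=1):
        if image in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def r_precision(ranked, relevant):
    """Precision at rank R, where R is the number of relevant images."""
    if not relevant:
        return 0.0
    return precision_at(ranked, relevant, len(relevant))


def mean_average_precision(run, qrels):
    """run: topic -> ranked list; qrels: topic -> set of relevant image ids."""
    if not qrels:
        return 0.0
    return sum(average_precision(run.get(t, []), qrels[t]) for t in qrels) / len(qrels)


# Made-up example: three relevant images, found at ranks 1, 3 and 5.
ranked = ["a", "x", "b", "y", "c"]
relevant = {"a", "b", "c"}
print(average_precision(ranked, relevant))  # (1/1 + 2/3 + 3/5) / 3
print(r_precision(ranked, relevant))        # 2 of the top 3 are relevant
```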

Provided Data and Systems

Training data

Topics from ImageCLEF 2004 are available as training data from here:

(1) Textual versions of the topics (same format as this year).
(2) Example images (1 per topic).
(3) Relevance judgements (qrels) for the total-pisec qrels set (used in ImageCLEF 2004)


GIFT/Viper default CBIR system


To enable those without access to their own CBIR system to participate in the ad hoc task, we provide access to the GIFT/Viper image retrieval system via an http link. The St. Andrews collection has been indexed and a test interface is provided here. In addition, for those who are interested in using CBIR techniques but do not want to use GIFT/Viper, a list of the top N images returned by GIFT/Viper for each test image can be downloaded here. This can be used to retrieve an initial set of images based on visual similarity; captions can then be used to retrieve further images. For more information about using the GIFT/Viper system in ImageCLEF please contact Henning Mueller (henning.mueller@sim.hcuge.ch).

Organisers of ImageCLEF ad-hoc


Paul Clough, Department of Information Studies, University of Sheffield, UK (p.d.clough@sheffield.ac.uk)

Michael Grubinger, School of Computer Science and Mathematics, Victoria University, Australia (michael.grubinger@research.vu.edu.au)

Mailing list

We have set up a mailing list for participants: imageclef@sheffield.ac.uk. Please contact Paul Clough to be added to the list.

Last Modified: May 2005

By: Paul Clough