The task is similar to the
classic TREC ad hoc retrieval task, in that we simulate the situation
in which a system knows the set of documents to be searched, but cannot
anticipate the particular topic that will be investigated (i.e. topics
are not known to the system in advance).
The goal of the ad hoc task is to retrieve as many relevant images as
possible from the St. Andrews image collection
given multilingual topics. Any method can be used to retrieve relevant
documents and we encourage the use of both text and content-based
retrieval methods. We would like to determine how text and image
attributes can be combined to enhance cross-language image
retrieval in this kind of domain.
For this task, we provide a
list of topic statements and a collection of images with
semi-structured captions in English (the target language). The English
version of the topics consists of a title (a short sentence or phrase
describing the search request in a few words), and a narrative (a
description of what constitutes a relevant or non-relevant image for
that search request). We also include an example image which we
envisage could be used for relevance feedback (both manual and
automatic) and query-by-example searches. An example topic is shown
below (showing only the topic title and example image):
Only the titles of
each topic have been translated by native speakers into 12 source
languages: Spanish, Italian, German, French, Dutch, Danish, Swedish,
Finnish, Chinese, Japanese, Russian and Arabic, and
variations on titles are included as part of the topic statement. If
participants have access to their own translators they can translate
the English topic narrative into a different language (and ideally
share with other participants!). The topics are available individually
below (only the English topics contain a title and narrative) and more
information about their format is available here.
Relevance assessments
Relevance assessments are
performed by students and staff at the University of Sheffield.
Submissions are used to create image pools, which are judged for
relevance by assessors; the end result is a set of relevance
assessments called qrels. These are then used to
evaluate system performance and compare submissions. For more
information about this procedure and the qrels sets see this paper: "The CLEF
2003 Cross Language Image Retrieval Track".
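As a rough illustration of how pooling works (this is a sketch only; the pool depth and the Python data structures are assumptions, not the official procedure), the top-ranked images from every submitted run for a topic are merged into a single pool that the assessors then judge:

```python
# Sketch of pool construction for one topic. The pool depth of 50 is an
# assumption made for illustration, not the depth actually used by ImageCLEF.
def build_pool(runs, depth=50):
    """runs: a list of ranked lists of image ids, one list per submitted run."""
    pool = set()
    for ranked_images in runs:
        pool.update(ranked_images[:depth])  # take the top `depth` images from each run
    return pool  # assessors judge each pooled image once, producing the qrels

# Toy example with three runs for a single topic.
run_a = ["img_01", "img_07", "img_03"]
run_b = ["img_07", "img_02"]
run_c = ["img_05", "img_01"]
print(sorted(build_pool([run_a, run_b, run_c], depth=2)))
# -> ['img_01', 'img_02', 'img_05', 'img_07']
```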
Relevance assessment is primarily based on the image, but
for certain topics the caption is also required to make a decision
(e.g. "pictures of North Street St Andrews"). What constitutes a
relevant image is a subjective decision, but typically a relevant image
will have the subject of the topic in the foreground, the image will
not be too dark or low in contrast, and the caption may confirm the
judge's decision. Consider, for example, the following images for the query
"children playing on beaches":
It is likely that most would
judge the image on the top right as relevant because one can clearly
see children on a beach and they appear to be playing (e.g. building
sand castles). On the other hand, the image on the left might not be
considered relevant (or only partially relevant) because, although the
caption says there are children playing on the beach, they appear in
the background and are very difficult to see (even when enlarged). The
example on the bottom right is also likely to be irrelevant because the
image is too dark to see the children clearly.
The narratives provided with
the English topics are intended to help specify what constitutes
a relevant image and will, among other things, be given to assessors to
help them judge the topic. Click here
to see the instructions that we gave assessors last year.
Experiments are performed as
follows: participants are given topics, from which they create a
query that is used to perform retrieval on the image collection. This
process iterates (e.g. possibly involving relevance feedback) until you
are satisfied with your runs. You might try different methods to
increase the number of relevant images in the top N rank positions
(e.g. query expansion). You can repeat these different methods for each
query language. You then submit your
runs to ImageCLEF for evaluation. In case we are unable to analyse all
of your runs, please indicate which one you would like us to evaluate
for each query language. We will compare your runs for each language
with other systems using your highest-scoring submission.
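As an illustration of the kind of query expansion mentioned above, one simple option is pseudo-relevance feedback: assume the top-ranked captions are relevant and add their most frequent terms to the query. The sketch below is only a rough example; the tokenisation, stop list, and number of added terms are all assumptions:

```python
from collections import Counter

STOPWORDS = {"the", "a", "of", "and", "in", "on", "at"}  # toy stop list (assumption)

def expand_query(query_terms, top_captions, n_terms=5):
    """Add the most frequent non-stopword terms from the top-ranked captions."""
    counts = Counter()
    for caption in top_captions:
        for term in caption.lower().split():
            if term not in STOPWORDS and term not in query_terms:
                counts[term] += 1
    return list(query_terms) + [term for term, _ in counts.most_common(n_terms)]

# Toy example: expand a title-only query using two top-ranked captions.
print(expand_query(["north", "street"],
                   ["North Street St Andrews looking east",
                    "Shops on North Street St Andrews"]))
```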
We distinguish between manual and automatic submissions. Automatic runs
will involve no user interaction, whereas manual runs are those in
which a human has been involved in query construction and the iterative
retrieval process, e.g. manual relevance feedback is performed. We
encourage groups who want to investigate manual intervention further to
participate in the interactive evaluation.
The initial search should be a text search, but thereafter
content-based systems can also be used to enhance retrieval. We are
willing to relax this constraint for participants who want to
experiment with a purely visual approach, but this is not the preferred
submission. One of the main interests of the ImageCLEF ad hoc task is
investigating various methods of query translation and how features
derived from the image captions and images themselves can be combined
to enhance retrieval.
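One common way of combining the two sources of evidence is late fusion: normalise the scores of a text (caption) run and a visual run, then take a weighted sum. This is only one possible approach, not a prescribed method; the weight and the min-max normalisation below are assumptions:

```python
def min_max_normalise(scores):
    """Map raw scores into [0, 1] so text and visual scores become comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {doc: (s - lo) / span for doc, s in scores.items()}

def fuse(text_scores, image_scores, alpha=0.7):
    """Weighted sum of normalised text and visual scores (alpha=0.7 is an assumption)."""
    text_scores = min_max_normalise(text_scores)
    image_scores = min_max_normalise(image_scores)
    docs = set(text_scores) | set(image_scores)
    fused = {d: alpha * text_scores.get(d, 0.0) + (1 - alpha) * image_scores.get(d, 0.0)
             for d in docs}
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)

# Toy example: caption-based scores fused with visual (e.g. GIFT/Viper-style) scores.
print(fuse({"img_01": 12.3, "img_02": 8.1}, {"img_02": 0.9, "img_03": 0.4}))
```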
Participants are also free to
experiment with whatever methods they wish for CLIR and image
retrieval, e.g. query expansion based on thesaurus lookup or relevance
feedback, indexing and retrieval on only part of the image caption,
different models of retrieval, different translation resources (e.g.
dictionary-based vs. MT), and combining text and content-based methods
for retrieval. Given the many different possible approaches which could
be used to perform the ad hoc retrieval, rather than list all of these
we will ask you to indicate which of the following applies to each
of your runs (we consider these the "main" dimensions which define the
query for this ad hoc task):
query language: english | non-english (state which)
topic fields used: title | title + narrative
query construction: automatic | manual
query expansion / relevance feedback: with | without
retrieval modality: text | image | text + image
We would ask that you
submit a baseline run with which to compare your other submissions.
According to the dimensions above, this would be classed as: english+title+automatic+
without+text. It is extremely important that we get a
description of the techniques that you use for all runs. This should be
as detailed as possible to ease the comparison or classification of
techniques and results. The final proceedings will be published in
Springer Lecture Notes in Computer Science. It is probably easier if
you use the LNCS
templates for the submission of your results.
For this task, participants
are required to submit ranked lists of (up to) the top 1000 images,
ranked in descending order of similarity (i.e. the most similar images
at the top of the list). Participants can submit (via email) as many system
runs as they require, but should indicate their best run for each
language, as we can only guarantee to evaluate these. The format
of submissions for this ad hoc task can be found here and the filenames should
distinguish different types of submission (e.g. with/without feedback).
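As an illustration only (the authoritative submission format is the one linked above), the sketch below writes a run file in the standard TREC layout accepted by trec_eval, with one line per retrieved image: topic id, the literal Q0, image id, rank, score, and a run tag. The image ids and run tag are placeholders, and the exact field conventions are assumptions:

```python
def write_run(path, results, run_tag="mygroup_en_title_auto"):
    """results: dict mapping topic id -> list of (image_id, score), best first.
    Uses the standard TREC run layout; check the official format linked above,
    as the field conventions assumed here are for illustration only."""
    with open(path, "w") as out:
        for topic_id, ranked in results.items():
            for rank, (image_id, score) in enumerate(ranked[:1000], start=1):
                out.write(f"{topic_id} Q0 {image_id} {rank} {score:.4f} {run_tag}\n")

# Toy example with placeholder image ids.
write_run("run_en_title_auto.txt",
          {"1": [("img_0042", 0.91), ("img_0007", 0.74)]})
```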
Ranked lists from participants
for this ad hoc task will be evaluated using trec_eval, reporting
recall and precision at various cut-off levels plus single-value
summaries derived from precision and recall, i.e. mean average
precision and R-precision. We will publish results in a manner similar
to the way in which NIST publishes the results from TREC.
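For reference, the single-value measures follow their standard definitions; the sketch below computes uninterpolated average precision and R-precision for one topic (mean average precision is the mean of the per-topic values) and is only illustrative, not a substitute for trec_eval:

```python
def average_precision(ranked, relevant):
    """Uninterpolated average precision for a single topic."""
    hits, total = 0, 0.0
    for rank, image_id in enumerate(ranked, start=1):
        if image_id in relevant:
            hits += 1
            total += hits / rank  # precision at the rank of each relevant image
    return total / len(relevant) if relevant else 0.0

def r_precision(ranked, relevant):
    """Precision after R images retrieved, where R = number of relevant images."""
    r = len(relevant)
    return sum(1 for image_id in ranked[:r] if image_id in relevant) / r if r else 0.0

# Toy example with three relevant images; MAP averages this over all topics.
ranked = ["img_2", "img_9", "img_4", "img_7", "img_1"]
relevant = {"img_2", "img_4", "img_1"}
print(average_precision(ranked, relevant), r_precision(ranked, relevant))
```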
You can download a small
training dataset based on topics for ImageCLEF 2003. Click here for a zip file containing 5
topic descriptions including example images, and results from a visual
search with GIFT/Viper based on the example images. The file also contains
the relevance assessments for each topic indicating relevant images
from the St. Andrews collection.
To enable participation in the
ad hoc task by those without access to their own CBIR system, we
provide access to the GIFT/Viper image retrieval system
via an http link.
The St. Andrews collection has been indexed and a test interface is
available. In addition, for those who are interested in using CBIR techniques but do not
want to use GIFT/Viper, a list of the top N images returned by GIFT/Viper for
each test image can be downloaded here.
This can be used to retrieve an initial set of images based on visual
similarity, then captions can be used to retrieve further images. For
more information about using the GIFT/Viper system in ImageCLEF please contact Henning
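As a very rough sketch of this two-stage idea (the file layout of the downloadable top-N lists and the caption matching below are assumptions made purely for illustration): take the precomputed visually similar images as a seed set, then add images whose captions share terms with the topic title.

```python
def load_visual_seeds(path):
    """Parse a precomputed top-N result file. The whitespace-separated
    `topic_id image_id score` layout assumed here is an illustration only;
    the real layout is whatever the downloadable file actually uses."""
    seeds = {}
    with open(path) as f:
        for line in f:
            topic_id, image_id, score = line.split()
            seeds.setdefault(topic_id, []).append((image_id, float(score)))
    return seeds

def expand_with_captions(seed_images, captions, title_terms):
    """Keep the visual seed set and add any image whose caption shares a term
    with the topic title (a deliberately crude caption-matching step)."""
    title_terms = {t.lower() for t in title_terms}
    extra = [img for img, caption in captions.items()
             if img not in seed_images and title_terms & set(caption.lower().split())]
    return list(seed_images) + extra

# Toy example of the caption step (the visual seed set would come from the file).
print(expand_with_captions(["img_10"],
                           {"img_10": "Children on the beach",
                            "img_11": "Children playing on West Sands beach",
                            "img_12": "Fishing boats in the harbour"},
                           ["children", "playing", "beaches"]))
```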