| Introduction |
The task is
similar to the classic TREC ad-hoc retrieval task, in that we simulate the
situation in which a system knows the set of documents to be searched, but
cannot anticipate the particular topic that will be investigated (i.e. topics
are not known to the system in advance). The task is bilingual in that the
collection is in English and queries are in different languages, requiring
translation from the query language (X) into English. This simulates the scenario in which a library
like St. Andrews in Scotland wishes to provide multilingual access to its
existing image archive. The goal of the ImageCLEF ad-hoc task is to
retrieve as many relevant images as possible from the
St. Andrews image
collection given multilingual topics. Any method can be used to retrieve
relevant documents and we encourage the use of both text and content-based
retrieval methods. We would like to determine how text and image attributes can
be combined to enhance cross-language image retrieval in this kind of
domain.
For more
information see the main ImageCLEF
website.
|
| Topics |
For this task, we
provide a list of topic statements and a collection of images with
semi-structured captions in English (the target language). The English version
of the topics consists of a title (a short sentence or phrase describing the
search request in a few words), and a narrative (a description of what
constitutes a relevant or non-relevant image for that search request). For
example:
<top>
<num> Number: 1
</num>
<title> aircraft on the ground
</title>
<narr> Relevant images will show one or more airplanes
positioned on the ground. Aircraft do not have to be the focus of the picture,
although it should be possible to make out that the picture contains aircraft.
Pictures of aircraft flying are not relevant and pictures of any other flying
object (e.g. birds) are not relevant. </narr>
</top>
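A topic in this format can be read with a few lines of code. The following is a minimal sketch in Python, assuming the topic file contains one or more blocks shaped like the example above; the function names `parse_topics` and `_field` are illustrative and not part of any ImageCLEF tool.

```python
import re

# Matches each <top> ... </top> block, including line breaks inside it.
TOPIC_RE = re.compile(r"<top>(.*?)</top>", re.DOTALL)


def _field(block, tag):
    """Return the whitespace-normalised text between <tag> and </tag>, or ''."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", block, re.DOTALL)
    return " ".join(match.group(1).split()) if match else ""


def parse_topics(text):
    """Parse every <top> block into a dict with 'num', 'title' and 'narr'."""
    topics = []
    for block in TOPIC_RE.findall(text):
        num_text = _field(block, "num")  # e.g. "Number: 1"
        digits = re.search(r"\d+", num_text)
        topics.append({
            "num": int(digits.group()) if digits else None,
            "title": _field(block, "title"),
            "narr": _field(block, "narr"),
        })
    return topics


# Shortened copy of the example topic, used only to exercise the parser.
SAMPLE = """<top>
<num> Number: 1
</num>
<title> aircraft on the ground
</title>
<narr> Relevant images will show one or more airplanes
positioned on the ground. </narr>
</top>"""

topics = parse_topics(SAMPLE)
print(topics[0]["num"], topics[0]["title"])
```

The title field would typically become the initial query, while the narrative is available for runs that use the longer description.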
The
topics are encapsulated by the <top> tags and the ImageCLEF 2005 topics
are numbered from 1 to 28. The short title is between the <title> tags
and the longer narrative description between the <narr> tags. Both title
and narrative have been translated into the following languages: German,
French, Italian, Spanish (European), Spanish (Latin American), Chinese
(Simplified), Chinese (Traditional) and Japanese. Title-only translations have also been
produced for ImageCLEF and are available in 23
languages, including Russian, Croatian, Bulgarian, Hebrew, Arabic and
Norwegian. The ImageCLEF 2005
ad-hoc topics can be downloaded here:
With each topic we
have also included two example images which we envisage could be used for
relevance feedback (both manual and automatic) and query-by-example searches.
For example, topic 1 is also described by the following two images:
|
| Relevance Assessments |
Relevance
assessments are performed by students and staff at the University of Sheffield.
Submissions are used to create image pools which are judged for relevance by
assessors. The pools are assessed and the end result is a set of relevance
assessments called qrels. These are then used to evaluate system performance
and compare submissions.
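As a rough illustration of how qrels are used to score a ranked list, the sketch below computes precision at a cut-off, average precision and R-precision, which are among the measures reported by trec_eval (the tool named in the Experiments section). This is not trec_eval itself: the data layout (a set of relevant image ids per topic) and the image ids are assumptions made for the example.

```python
def precision_at(ranked, relevant, k):
    """Fraction of the first k ranked images that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant image occurs."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    # Relevant images that were never retrieved contribute zero.
    return total / len(relevant)


def r_precision(ranked, relevant):
    """Precision after R images, where R is the number of relevant images."""
    r = len(relevant)
    return precision_at(ranked, relevant, r) if r else 0.0


def mean_average_precision(runs, qrels):
    """Average of per-topic average precision over all topics in qrels."""
    return sum(
        average_precision(runs.get(topic, []), relevant)
        for topic, relevant in qrels.items()
    ) / len(qrels)


# Hypothetical qrels and ranked lists for two topics.
qrels = {1: {"a", "c"}, 2: {"x"}}
runs = {1: ["a", "b", "c"], 2: ["y", "x"]}

print(round(average_precision(runs[1], qrels[1]), 4))
print(round(mean_average_precision(runs, qrels), 4))
```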
For more information about this procedure and the qrels sets see
these papers: "The CLEF 2003
Cross Language Image Retrieval Track" and "The CLEF 2004
Cross Language Image Retrieval Track" . Relevance
assessment for the more general topics is based
entirely on the visual content of images (e.g. aircraft on the ground).
However, certain topics also require the use of the caption to make a confident
decision (e.g. "pictures of North Street St Andrews"). What constitutes a
relevant image is a subjective decision, but typically a relevant image has
the subject of the topic in the foreground, is not too dark or poor
in contrast, and may have a caption that confirms the judge's decision.
The assessment of
images in ImageCLEF uses a ternary classification scheme: (1)
relevant, (2) partially relevant and (3) not relevant. The aim of the ternary
scheme is to help assessors make more accurate relevance judgements
(e.g. an image is definitely relevant in some way, but maybe the query object
is not directly in the foreground: it is therefore considered partially
relevant). Various combinations of assessor judgements are used to create the
qrels sets and more information can be found from the links given above.
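To make the idea of combining assessor judgements concrete, the following sketch builds a set of relevant images from ternary labels given by several assessors. The combination rules shown here ("intersection" and "union", with or without partially relevant images) are hypothetical illustrations and are not claimed to be the rules behind the actual ImageCLEF qrels sets; the image ids are also invented.

```python
# Label values follow the numbering of the ternary scheme in the text.
RELEVANT, PARTIAL, NOT_RELEVANT = 1, 2, 3


def build_qrels(judgements, rule="intersection", accept_partial=False):
    """Return the set of images counted as relevant for one topic.

    judgements: dict mapping image id -> list of labels, one per assessor.
    rule: "intersection" (all assessors must accept) or "union" (any assessor).
    accept_partial: whether "partially relevant" counts as accepted.
    """
    if rule not in ("intersection", "union"):
        raise ValueError(f"unknown rule: {rule}")
    accepted = {RELEVANT, PARTIAL} if accept_partial else {RELEVANT}
    combine = all if rule == "intersection" else any
    return {
        image
        for image, labels in judgements.items()
        if labels and combine(label in accepted for label in labels)
    }


# Hypothetical judgements from two assessors for four images.
judgements = {
    "img_a": [RELEVANT, RELEVANT],
    "img_b": [RELEVANT, PARTIAL],
    "img_c": [PARTIAL, NOT_RELEVANT],
    "img_d": [NOT_RELEVANT, NOT_RELEVANT],
}

print(sorted(build_qrels(judgements, "intersection", accept_partial=False)))
print(sorted(build_qrels(judgements, "union", accept_partial=True)))
```

Stricter rules give smaller relevant sets, so the same run can score differently depending on which qrels set is used.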
|
| Experiments |
Experiments are performed as follows: participants are given topics,
which are used to create queries for retrieval on the
image collection. This process iterates (e.g. possibly involving relevance
feedback) until you are satisfied with your runs. You might try different
methods to increase the number of relevant images in the top N rank positions (e.g.
query expansion). You can repeat these different methods for each query
language. You then submit your runs to
ImageCLEF for evaluation. In case we are unable to analyse all of your runs,
please indicate which one you would like us to evaluate for each query
language. For each language, we will compare your highest-scoring submission
with other systems. We distinguish between manual and automatic submissions.
Automatic runs involve no user interaction, whereas manual runs are those
in which a human has been involved in query construction and the iterative
retrieval process, e.g. manual relevance feedback is performed. We encourage
groups who want to investigate manual intervention further to participate in
the interactive evaluation. The initial search should be a text
search, but thereafter content-based systems can also be used to enhance
retrieval. We are willing to relax this constraint for participants who want to
experiment with a purely visual approach, but this is not the preferred
submission. One of the main interests of the ImageCLEF ad hoc task is
investigating various methods of query translation and how features derived
from the image captions and images themselves can be combined to enhance
retrieval.
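One simple way to combine caption-based and content-based evidence is a weighted sum of normalised scores. The sketch below is only one possibility among many and is not a method prescribed by the task; the weight, the min-max normalisation and the image ids are assumptions chosen for illustration.

```python
def min_max(scores):
    """Rescale a dict of scores to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every image as equally strong evidence.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse(text_scores, image_scores, text_weight=0.7):
    """Rank images by a weighted sum of normalised text and visual scores.

    Images missing from one modality receive a score of 0 for that modality.
    Returns a list of (image id, fused score), best first.
    """
    text = min_max(text_scores)
    visual = min_max(image_scores)
    fused = {
        doc: text_weight * text.get(doc, 0.0)
        + (1.0 - text_weight) * visual.get(doc, 0.0)
        for doc in set(text) | set(visual)
    }
    # Sort by descending score, breaking ties by image id for stable output.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical scores from a text retrieval system and a CBIR system.
text_scores = {"a": 2.0, "b": 1.0, "c": 0.0}
image_scores = {"b": 0.9, "c": 0.1, "d": 0.5}

ranking = [doc for doc, _ in fuse(text_scores, image_scores, text_weight=0.5)]
print(ranking)
```

Setting `text_weight` to 1.0 reduces this to a text-only run, which makes it easy to compare fused runs against a text baseline.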
Participants are also free to experiment with
whatever methods they wish for CLIR and image retrieval, e.g. query expansion
based on thesaurus lookup or relevance feedback, indexing and retrieval on only
part of the image caption, different models of retrieval, different translation
resources (e.g. dictionary-based vs. MT), and combining text and content-based
methods for retrieval. Given the many different possible approaches which could
be used to perform the ad hoc retrieval, rather than list all of these we will
ask you to indicate which of the following applies to each of your runs
(we consider these the "main" dimensions which define the query for this ad hoc
task):
| Query language | English | non-English (state which) |
| Initial query | title | narrative |
| Query type | automatic | manual |
| Feedback/expansion | without | with |
| Modality | text | image |
We would ask that you submit
a baseline run with which to compare your other submissions. According to the
previous table this would be classed as: english+title+automatic+without+text.
It is extremely important that we receive a description of
the techniques that you use for all runs. This should be as detailed as
possible to ease the comparison or classification of techniques and results.
The final proceedings will be published in Springer Lecture Notes in Computer
Science. It is probably easier if you use the
LNCS
templates for the submission of your results.
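The run classification described above can be encoded mechanically. The sketch below joins the five dimensions with "+", following the baseline label english+title+automatic+without+text; the helper name `run_label` and the validation behaviour are assumptions, and the source does not say how a run combining text and image modalities should be labelled.

```python
# Allowed values per dimension, copied from the table in this section.
# The query language is left free-form because non-English runs must
# state which language they use.
ALLOWED = {
    "initial_query": {"title", "narrative"},
    "query_type": {"automatic", "manual"},
    "feedback": {"without", "with"},
    "modality": {"text", "image"},
}


def run_label(query_language, initial_query, query_type, feedback, modality):
    """Build a label such as 'english+title+automatic+without+text'."""
    values = {
        "initial_query": initial_query,
        "query_type": query_type,
        "feedback": feedback,
        "modality": modality,
    }
    parts = [query_language.strip().lower()]
    for dimension, value in values.items():
        value = value.strip().lower()
        if value not in ALLOWED[dimension]:
            raise ValueError(f"{dimension} must be one of {sorted(ALLOWED[dimension])}")
        parts.append(value)
    return "+".join(parts)


baseline = run_label("English", "title", "automatic", "without", "text")
print(baseline)
```

A label of this kind could also be embedded in submission filenames so that different types of run are easy to tell apart.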
For this task, participants
are required to submit ranked lists of (up to) the top 1000 images ranked in
descending order of similarity (i.e. the most similar at the top of the list).
Participants can submit (via email) as many system runs as they require, but
should indicate their best run for each language as we can only guarantee
evaluation of these. The format of submissions for this ad-hoc task can be
found here
and the filenames should distinguish different types of submission (e.g.
with/without feedback). Please note that there should be at least 1 document
entry in your results for each topic (i.e. if your system returns no
results for a query then insert a dummy entry, e.g. 25 1
stand03_118/stand03_20631 0 4238 xyzT10af5 ). The reason for this is to make
sure that all systems are compared with the same number of topics and relevant
documents. Ranked lists from
participants for this ad hoc task will be evaluated using trec_eval,
reporting recall and precision at various cut-off levels plus single-value
summaries derived from precision and recall, i.e. mean average precision and
R-precision. We will publish results in a manner similar to the way in which
NIST publishes the results from TREC.
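The submission rules earlier in this section (at most 1000 images per topic, descending order of similarity, and at least one entry per topic) can be enforced when writing a run file. The sketch below assumes the six-column layout suggested by the dummy entry in the text (topic, a constant 1, image id, rank, score, run id) and that ranks start at 0 as in that entry; the authoritative format is the one linked from this page. The function name, the run id `demoRun` and the short image ids are hypothetical.

```python
MAX_PER_TOPIC = 1000
# Image id taken from the dummy entry quoted in the text.
DUMMY_IMAGE = "stand03_118/stand03_20631"


def format_run(results, topic_ids, run_id):
    """Return run-file lines for every topic in topic_ids.

    results: dict mapping topic id -> list of (image id, score) pairs.
    Topics with no results receive a single dummy entry so that every
    system is evaluated over the same set of topics.
    """
    lines = []
    for topic in topic_ids:
        ranked = sorted(results.get(topic, []), key=lambda pair: pair[1], reverse=True)
        ranked = ranked[:MAX_PER_TOPIC]
        if not ranked:
            ranked = [(DUMMY_IMAGE, 0.0)]
        for rank, (image_id, score) in enumerate(ranked):
            lines.append(f"{topic} 1 {image_id} {rank} {score} {run_id}")
    return lines


# Hypothetical results: topic 1 has two hits, topic 2 has none.
results = {1: [("img_a", 0.2), ("img_b", 0.9)]}
lines = format_run(results, topic_ids=[1, 2], run_id="demoRun")
print("\n".join(lines))
```

For the 2005 topics, `topic_ids` would be `range(1, 29)`, since the topics are numbered from 1 to 28.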
|
| Provided Data and Systems |
Training data
Topics from ImageCLEF 2004 are available as training data from here:
(1) Textual versions of the topics (same format as this year).
(2) Example images (1 per topic).
(3) Relevance judgements (qrels) for the total-pisec qrels set (used in ImageCLEF 2004).
GIFT/Viper default CBIR system
To enable those without access to their own CBIR system to participate in the
ad hoc task, we provide access to the
GIFT/Viper image retrieval
system via an HTTP link. The St. Andrews collection has been indexed and a
test interface is provided
here.
In addition, for those who are interested in using CBIR techniques but do not want to
use GIFT/Viper, a list of the top N images returned by GIFT/Viper for each test
image can be downloaded here. This
can be used to retrieve an initial set of images based on visual similarity,
then captions can be used to retrieve further images. For more information
about using the GIFT/Viper system in ImageCLEF please contact Henning Mueller (henning.mueller@sim.hcuge.ch).
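The two-stage idea described above (a visually similar seed set first, then caption-based retrieval of further images) might be sketched as follows. The layout of the downloadable GIFT/Viper list is not described here, so the seed set is simply assumed to be a list of image ids; the captions, the stopword list and the term-overlap scoring are all invented for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list, not a standard resource.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "at", "with"}


def tokenize(text):
    """Lower-case the text and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def expansion_terms(seed_images, captions, n_terms=3):
    """Pick the terms that occur in the most seed captions."""
    counts = Counter()
    for image in seed_images:
        counts.update(set(tokenize(captions.get(image, ""))))
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ordered[:n_terms]]


def retrieve_by_caption(terms, captions, exclude=()):
    """Rank images outside the seed set by the number of matching terms."""
    scored = []
    for image, caption in captions.items():
        if image in exclude:
            continue
        overlap = len(set(terms) & set(tokenize(caption)))
        if overlap:
            scored.append((image, overlap))
    return [image for image, _ in sorted(scored, key=lambda p: (-p[1], p[0]))]


# Hypothetical captions; img1 and img2 play the role of the visual top N.
captions = {
    "img1": "Biplane on the ground at airfield",
    "img2": "Aircraft and biplane at airfield",
    "img3": "Aircraft parked near hangar",
    "img4": "Fishing boats in harbour",
}
seeds = ["img1", "img2"]

terms = expansion_terms(seeds, captions)
further = retrieve_by_caption(terms, captions, exclude=seeds)
print(terms, further)
```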
|
|