Evaluation of image retrieval systems for historic photographic
and medical images
The ImageCLEF interactive
search task provides user-centered evaluation of cross-language image retrieval
systems. For more background information about cross-language image retrieval
and evaluation then click here.
Each participant will compare two interactive cross language image
retrieval systems (one intended as a baseline) that differ in the
facilities provided for interactive retrieval. For
example, you might compare the use of visual versus textual features in query
formulation and refinement.
As a cross-language image retrieval task, the initial query should be
in a language different from the collection (i.e. not English) and translated
into English for retrieval. Any text displayed to the user must be translated into
the user's source language. This might include captions, summaries, pre-defined
image categories etc.
A minimum of 8 users (native speakers of the source
language) and 16 images (used as topics) are required for this task (we supply
the topic images below).
Here are a couple of suggestions on how you might
perform your experiments:
I want to test a different
method for displaying images to users (e.g. grouping images by categories or
visual features). My baseline system will be one in which images are displayed
to users using a ranked list and the source language will be Spanish (all text
displayed to the user - including image captions - will also be translated into
Spanish). To run my experiment, all users perform interactive searching with
each system using 8 topics (according to the experimental setup provided in the
following text).
I want to test a method for
query translation (e.g. dictionary-lookup). Two versions of the same system will be compared: a version in
English which acts as the monolingual baseline, and a version in a selected
source language. To reduce the variables which could affect the results, I
display all text in English (or display only images as results to the users).
Again, to run my experiment, all users
search with each system using 8 topics according to the experimental setup
provided in the following text.
Scenario and image tasks
Given an image (not including the caption) from the St Andrews
collection, the goal for the searcher is to find the same image again using
a cross-language image retrieval system. This models the situation in which a user searches with a specific
image in mind (perhaps they have seen it before) but without knowing key
information thereby requiring them to describe the image instead, e.g. searches
for a familiar painting whose title and painter are unknown. The 16 images
(used for each search task) are as follows:
[Grid of the 16 topic images, labelled TOPIC 1 to TOPIC 16]
(NB: the black box in the larger version of these images is to hide text
visible on the image.)
Background information about the experiment should be described to
users before starting the experiments. For example, you could use something
like the following:
In this task
we will show you 16 different images, one at a time, using two different
Cross-Language image retrieval systems. The pictures cover a variety of topics
and are taken from the St Andrews historic photographic collection. When we
show you each image, we will ask you to search the collection and try and find
that same image again. We will let you keep the image to refer to during your
search. This known-item search is aimed at modelling the scenario in which you
know the image you want from the collection, but don't have it to hand; you
know it exists in the collection but can't remember the exact person, location
or name of the object in the image. You can browse and search for the image any
way you want and you have a maximum of 5 minutes to find each image. You can
stop searching when you have found it. We want to observe how our system
supports this kind of task, what words/phrases you use to describe the images
and whether you are successful in finding the required images or not.
Please note that it is a good idea to let users search the collection prior to
starting the experiments to let them get a feel for its contents. More
information about the collection which you could give to people can be found
here. It is also a good idea to reiterate to users that
they can search using any part of the image, i.e. objects in the foreground and
background.
Experiment instructions for participants
The interactive ImageCLEF
task is run similarly to iCLEF,
using a similar experimental procedure. However, because of the type of
evaluation (i.e. whether known items are found or not), the experimental
procedure for iCLEF 2004
(Q&A) is also very relevant and we make use of both iCLEF designs.
Given the 16 topics shown above, participants should have each of the 8 users test each system with 8
topics. Users are given a maximum of 5 minutes to find each image.
Topics and systems will be presented to the
user in combinations following a Latin-square
design to ensure user/topic and system/topic interactions are minimised. The
procedure given in iCLEF 2004 is to be followed.
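The balanced assignment of topics and systems to users can be sketched as follows. This is only an illustrative sketch: `make_design` is a hypothetical helper, and where the organisers supply an official design matrix, that should be used instead.

```python
# Sketch of a balanced (Latin-square-style) assignment of 16 topics and
# two systems to 8 users. Hypothetical helper -- not the official
# design matrix, which the organisers may supply separately.

def make_design(n_users=8, topics=range(1, 17)):
    topics = list(topics)
    half = len(topics) // 2
    design = []
    for u in range(n_users):
        # Rotate the topic list so each topic appears in different
        # positions (and halves) across users.
        rot = topics[u % len(topics):] + topics[:u % len(topics)]
        first, second = rot[:half], rot[half:]
        # Alternate which system gets which half between users to
        # reduce ordering and learning effects.
        if u % 2 == 0:
            design.append({"user": u, "A": first, "B": second})
        else:
            design.append({"user": u, "A": second, "B": first})
    return design

for row in make_design():
    print(row["user"], "A:", row["A"], "B:", row["B"])
```

Each user thus searches 8 topics on each system, and every user covers all 16 topics exactly once.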
The session
duration is slightly different from that of iCLEF and participants should use the
following as a guideline:
Tutorials (2 systems):           30 minutes total
Searching (system A, 8 topics):  40 minutes (5 minutes per topic)
Searching (system B, 8 topics):  40 minutes (5 minutes per topic)
Questionnaires are a recommended way of obtaining feedback from the user about their level of
satisfaction with the system. There is no fixed questionnaire, but you can
consult those used in
iCLEF 2003 to give you some ideas for ImageCLEF. These correspond to the
surveys suggested in the above procedure, but may need some modification to
suit the image retrieval task.
To measure the performance of this task, the following metrics will
be used: whether the user could find the intended image or not, the time taken
to find the image, the number of steps/iterations required to reach the
solution (e.g. the number of clicks or the number of queries), and the number
of images displayed to the user. For each topic, we require that you summarise
your system and provide us with this information.
These factors help to measure the
efficiency with which a cross language image retrieval search could be
performed, e.g. how quickly or how many queries were necessary to find the
relevant image. Information about how the interface was useful for the user can
be obtained from a user questionnaire administered after the
experiment.
Please provide us with a basic description of your two systems
highlighting their main features and the language used for searching.
For each topic, please state information
for the measures given above: whether the user found the image or not
(mandatory), the time taken to find the image (mandatory), the number of
queries (optional), and the number of images displayed to the user (optional).
We will normalise some of these scores (e.g. the time taken) across all
submissions to compare systems.
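The per-topic measures above can be summarised as in the sketch below. Note that the exact normalisation applied across submissions is not specified in this document; min-max scaling of the search times is shown purely as one plausible option, and the records are hypothetical.

```python
# Sketch: summarising per-topic results for one system and applying a
# possible time normalisation. The record values are hypothetical, and
# min-max scaling is just one option -- the task does not fix a method.

results = [
    {"topic": 1, "found": True,  "seconds": 72,  "queries": 3},
    {"topic": 2, "found": False, "seconds": 300, "queries": 6},
    {"topic": 3, "found": True,  "seconds": 145, "queries": 4},
]

# Fraction of topics for which the known item was found (mandatory measure).
success_rate = sum(r["found"] for r in results) / len(results)

# Min-max normalise the times taken so systems can be compared on a
# common [0, 1] scale (lower is faster).
times = [r["seconds"] for r in results]
lo, hi = min(times), max(times)
normalised = [(t - lo) / (hi - lo) if hi > lo else 0.0 for t in times]

print(f"success rate: {success_rate:.2f}")
print("normalised times:", [round(n, 2) for n in normalised])
```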
Please submit user results
using XML and this DTD for the submission
format. An example submission can be found here.
Please submit your results by June 22nd.
We otherwise follow the main ImageCLEF timetable.
Paul Clough, Department of Information Studies, University of Sheffield, UK (email@example.com)
Julio Gonzalo, NLP Group, UNED, Spain (firstname.lastname@example.org)
Mark Sanderson, Department of Information Studies, University of Sheffield, UK (email@example.com)