ImageCLEF 2005

Evaluation of image retrieval systems for historic photographic and medical images



Interactive search task
 

Procedure

The ImageCLEF interactive search task provides user-centered evaluation of cross-language image retrieval systems. For more background information about cross-language image retrieval and evaluation, click here.

Each participant will compare two interactive cross-language image retrieval systems (one intended as a baseline) that differ in the facilities provided for interactive retrieval. For example, a participant might compare the use of visual versus textual features in query formulation and refinement.

As a cross-language image retrieval task, the initial query should be in a language different from the collection (i.e. not English) and translated into English for retrieval. Any text displayed to the user must be translated into the user's source language. This might include captions, summaries, pre-defined image categories etc.

A minimum of 8 users (native speakers of the source language) and 16 images (used as topics) are required for this task (we supply the topics).


Example experiments

Here are a couple of suggestions on how you might perform your experiments:

Example 1:

I want to test a different method for displaying images to users (e.g. grouping images by categories or visual features). My baseline system will be one in which images are displayed to users using a ranked list and the source language will be Spanish (all text displayed to the user - including image captions - will also be translated into Spanish). To run my experiment, all users perform interactive searching with each system using 8 topics (according to the experimental setup provided in the following text).


Example 2:

I want to test a method for query translation (e.g. dictionary-lookup). Two versions of the same system will be compared: a version in English which acts as the monolingual baseline, and a version in a selected source language. To reduce the variables which could affect the results, I display all text in English (or display only images as results to the users). Again, to run my experiment, all users search with each system using 8 topics according to the experimental setup provided in the following text.

Scenario and image tasks
Given an image (not including the caption) from the St Andrews collection, the goal for the searcher is to find the same image again using a cross-language image retrieval system. This models the situation in which a user searches with a specific image in mind (perhaps they have seen it before) but does not know key information about it and must therefore describe the image instead, e.g. searching for a familiar painting whose title and painter are unknown. The 16 images (one for each search task) are as follows:
[Thumbnail images of TOPIC 1 to TOPIC 16, shown in four rows of four; some thumbnails are marked with *]

*(NB: the black box in the larger version of these images hides text on the postcard)

Background information about the experiment should be given to users before the experiments start. For example, you could use something like this:
In this task we will show you 16 different images, one at a time, using two different Cross-Language image retrieval systems. The pictures cover a variety of topics and are taken from the St Andrews historic photographic collection. When we show you each image, we will ask you to search the collection and try and find that same image again. We will let you keep the image to refer to during your search. This known-item search is aimed at modelling the scenario in which you know the image you want from the collection, but don't have it to hand; you know it exists in the collection but can't remember the exact person, location or name of the object in the image. You can browse and search for the image any way you want and you have a maximum of 5 minutes to find each image. You can stop searching when you have found it. We want to observe how our system supports this kind of task, what words/phrases you use to describe the images and whether you are successful in finding the required images or not.

Please note that it is a good idea to let users search the collection before starting the experiments so that they get a feel for its contents. More information about the collection, which you could pass on to users, can be found here. It is also a good idea to reiterate to users that they can search using any part of the image, i.e. objects in the foreground and background.


Experiment instructions for participants

The interactive ImageCLEF task is run similarly to iCLEF 2003, using a similar experimental procedure. However, because of the type of evaluation (i.e. whether known items are found or not), the experimental procedure for iCLEF 2004 (Q&A) is also very relevant and we make use of both iCLEF procedures.

Given the 16 topics shown above, each of the 8 users tests each system with 8 topics, so every user searches for all 16 images (8 per system). Users are given a maximum of 5 minutes to find each image. Topics and systems will be presented to the user in combinations following a Latin-square design to ensure that user/topic and system/topic interactions are minimised. The experimental procedure given in iCLEF 2004 is to be followed.
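
To illustrate how such a counterbalanced presentation order might be generated, here is a minimal Python sketch. It is not the iCLEF 2004 matrix, which remains the procedure to follow. The split of the 16 topics into two blocks of 8, the system labels "A" and "B", and the function names are all assumptions made for illustration only.

```python
from collections import Counter

SYSTEMS = ("A", "B")                  # hypothetical labels for the two systems
TOPICS = list(range(1, 17))           # topic numbers 1..16
BLOCKS = (TOPICS[:8], TOPICS[8:])     # assumed split into two blocks of 8 topics


def rotate(items, shift):
    """Cyclically rotate a list; each row of a cyclic Latin square is a rotation."""
    shift %= len(items)
    return items[shift:] + items[:shift]


def schedule(user):
    """Return the two sessions [(system, topics), (system, topics)] for user 0..7."""
    system_first = user % 2           # which system the user starts with
    block_first = (user // 2) % 2     # which topic block the user starts with
    sessions = []
    for slot in (0, 1):
        system = SYSTEMS[(system_first + slot) % 2]
        block = BLOCKS[(block_first + slot) % 2]
        # Rotating by the user index varies the position of each topic across users.
        sessions.append((system, rotate(block, user)))
    return sessions


def pair_counts(n_users=8):
    """Count how many users search each topic with each system."""
    counts = Counter()
    for user in range(n_users):
        for system, topics in schedule(user):
            for topic in topics:
                counts[(system, topic)] += 1
    return counts


if __name__ == "__main__":
    for user in range(8):
        print(user, schedule(user))
```

With 8 users, this sketch has every user search all 16 topics, and every topic is searched with each system by the same number of users (4), which is the kind of balance the Latin-square design aims for.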

The experiment duration differs slightly from that of iCLEF, and participants should use the following as a guideline:

Introductory stuff: 10 minutes
Initial survey: 5 minutes
Tutorials (2 systems): 30 minutes total
Break: 10 minutes
Searching (system A, 8 topics): 40 minutes (5 mins/img)
Post-system survey: 5 minutes
Break: 10 minutes
Searching (system B, 8 topics): 40 minutes (5 mins/img)
Post-system survey: 5 minutes
Final survey: 10 minutes
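
When planning sessions, it may help to add up the guideline timings above. The short Python sketch below does only that arithmetic; the variable names are illustrative.

```python
# Guideline timings in minutes, copied from the schedule above.
SESSION_PLAN = [
    ("Introductory stuff", 10),
    ("Initial survey", 5),
    ("Tutorials (2 systems)", 30),
    ("Break", 10),
    ("Searching (system A, 8 topics)", 40),
    ("Post-system survey", 5),
    ("Break", 10),
    ("Searching (system B, 8 topics)", 40),
    ("Post-system survey", 5),
    ("Final survey", 10),
]

total_minutes = sum(minutes for _, minutes in SESSION_PLAN)
hours, remainder = divmod(total_minutes, 60)
print(f"Total per user: {total_minutes} minutes ({hours} h {remainder} min)")
```

Under these guideline figures, one user session takes 165 minutes (2 hours 45 minutes), of which 80 minutes are spent searching.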

User questionnaires are a recommended way of obtaining feedback from the user about their level of satisfaction with the system. There is no fixed questionnaire, but you can use the questionnaires from iCLEF 2003 to give you some ideas for ImageCLEF. These correspond to the surveys suggested in the above procedure, but may need some modification to suit the image retrieval task.

To measure performance on this task, the following metrics will be used: whether the user could find the intended image or not, the time taken to find the image, the number of steps/iterations required to reach the solution (e.g. the number of clicks or the number of queries), and the number of images displayed to the user. We require that you summarise your system and provide us with this information for each topic. These factors help to measure the efficiency with which a cross-language image retrieval search could be performed, e.g. how quickly the relevant image was found or how many queries were needed to find it. Information about how useful the interface was for the user can be obtained from a user questionnaire after the task.
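
As a rough illustration of how these measures could be logged and summarised per system, here is a minimal Python sketch. The record layout, the field names, and the choice to store the 5-minute limit as the time for unsuccessful searches are assumptions, not part of the task definition, and the sample values are invented purely to show the calculation.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional

TIME_LIMIT_S = 5 * 60  # 5-minute limit per image, from the task description


@dataclass
class SearchRecord:
    """One user searching for one topic with one system (hypothetical layout)."""
    user: int
    system: str
    topic: int
    found: bool                          # mandatory measure
    seconds: float                       # mandatory measure
    queries: Optional[int] = None        # optional measure
    images_shown: Optional[int] = None   # optional measure


def summarise(records, system):
    """Aggregate the measures for one system."""
    rows = [r for r in records if r.system == system]
    found = [r for r in rows if r.found]
    queries = [r.queries for r in rows if r.queries is not None]
    return {
        "searches": len(rows),
        "success_rate": len(found) / len(rows) if rows else None,
        "mean_seconds_when_found": mean(r.seconds for r in found) if found else None,
        "mean_queries": mean(queries) if queries else None,
    }


# Invented sample values, for illustration only.
records = [
    SearchRecord(user=1, system="A", topic=1, found=True, seconds=120.0, queries=3, images_shown=40),
    SearchRecord(user=1, system="A", topic=2, found=False, seconds=TIME_LIMIT_S, queries=6, images_shown=90),
    SearchRecord(user=2, system="B", topic=1, found=True, seconds=60.0, queries=2, images_shown=20),
]

summary_a = summarise(records, "A")
print(summary_a)
```

Reporting the mean time only over successful searches is one possible design choice; it keeps searches that hit the time limit from dominating the average.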

What to submit
Please provide us with a basic description of your two systems highlighting their main features and the language used for searching. For each topic, please state information for the measures given above: whether the user found the image or not (mandatory), the time taken to find the image (mandatory), the number of queries (optional), and the number of images displayed to the user (optional). We will normalise some of these scores (e.g. the time taken) across all submissions to compare systems.

Please submit user results in XML, using this DTD as the submission format. An example submission can be found here.

Please submit your results by June 22nd.

We follow the timetable given in iCLEF 2005.

Organisers
 
Paul Clough, Department of Information Studies, University of Sheffield, UK (p.d.clough@sheffield.ac.uk)


Julio Gonzalo, NLP Group, UNED, Spain (julio@lsi.uned.es)


Daniela Petrelli, Department of Information Studies, University of Sheffield, UK (d.petrelli@sheffield.ac.uk)



Last Modified: May 18th 2005

By: Paul Clough