Results for the medical retrieval task

Here are the results for the ImageCLEF 2004 medical retrieval task. If you just want the results click here.

Introduction

This page summarises the evaluation process and the format of the results produced with the trec_eval tool (we used the version supplied to us by UMASS and the ireval.pl Perl script that comes with the Lemur toolkit distribution). A comparison between entries and a discussion of the evaluation procedure will be given in the ImageCLEF 2004 overview paper that will appear in this year's CLEF proceedings. To assess your entries, we did the following:
Three "expert" assessors judged the image pools generated from pooling the submissions. We created 9 relevance sets (qrels) based on the overlap of relevant images between assessors and on whether partially relevant images were included. The partially relevant judgment was used for images that the judge thought were in some way relevant but could not be entirely confident about. The 9 relevance sets are listed here:
The qrels files list relevant images in numerically ascending order. Note that the files contain only the last numeric part of the image name (i.e. rather than F_10/12345, the file contains 12345). This does not affect evaluation, as this part of the image name is a unique identifier. Qrels for the partial_isec-total qrels set can be found in the TREC (4 column) format here, which will work with ireval.pl and trec_eval.

Evaluation

Given your submission, we identified and marked documents in the ranked list as relevant or not, based on the 9 sets of relevant documents. To enable comparison between participants, we used a method where relevant documents not found in the top 1000 results are assigned to a rank position starting from 1001. This ensures that for every topic and each participant we have the same number of relevant documents, which makes the scores comparable. We used the UMASS and Lemur versions of the standard trec_eval tool to compute the mean average precision scores for your submission. This provides the "standard" information retrieval evaluation measures, e.g. precision at a given rank cut-off, average precision across 11 recall points, and single-valued summaries for each measure. We have computed the scores for each topic so you can inspect performance for individual queries, as well as across all 26 topics. If you want to evaluate your own systems, follow these instructions. If you need further details of the evaluation process, or have any questions or problems with interpreting the results, then please don't hesitate to contact Paul Clough.

Results

Results are based on the partial_isec-total qrels set, that is, images are included in the relevance set if they are judged relevant or partially relevant by at least 2 assessors. We have ranked systems based on their uninterpolated mean average precision (MAP) score across all 26 topics.
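As a rough illustration of the padding rule and the MAP ranking described above, here is a minimal Python sketch. It is not the organisers' script and not the trec_eval implementation: the function names (`parse_qrels`, `average_precision`, `mean_average_precision`) and the data layout are assumptions made for this example, and scores from the real tools may differ slightly, for instance in how documents ranked beyond the cut-off are treated.

```python
from collections import defaultdict


def parse_qrels(lines):
    """Read TREC 4-column qrels lines: topic, iteration, image id, relevance."""
    relevant = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        topic, _iteration, image_id, judgment = parts
        if int(judgment) > 0:
            relevant[topic].add(image_id)
    return dict(relevant)


def average_precision(ranked, relevant, cutoff=1000):
    """Uninterpolated average precision for one topic.

    Assumption: relevant images missing from the top `cutoff` results
    are placed at ranks cutoff+1, cutoff+2, ... (our reading of the
    padding rule in the text).
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    seen = set()
    for rank, image_id in enumerate(ranked[:cutoff], start=1):
        if image_id in relevant and image_id not in seen:
            seen.add(image_id)
            hits += 1
            precision_sum += hits / rank
    # Pad the list with the relevant images that were not retrieved.
    missing = len(relevant) - hits
    for offset in range(1, missing + 1):
        hits += 1
        precision_sum += hits / (cutoff + offset)
    return precision_sum / len(relevant)


def mean_average_precision(run, qrels, cutoff=1000):
    """Mean of per-topic AP over every topic in the qrels.

    `run` maps topic -> ranked list of image ids (hypothetical layout).
    Returns (MAP, per-topic scores).
    """
    scores = {
        topic: average_precision(run.get(topic, []), relevant, cutoff)
        for topic, relevant in qrels.items()
    }
    if not scores:
        return 0.0, {}
    return sum(scores.values()) / len(scores), scores
```

With `qrels = parse_qrels(open(path))` and a `run` dictionary built from your own ranked lists, `mean_average_precision(run, qrels)` gives one summary score plus the per-topic values, which mirrors the per-topic and all-topic figures reported here.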
Submissions are listed by run identifier (a list of which run ids relate to which groups can be found here). The results listed include both manual and automatic submissions. For the final presentation of results we will separate results for manual and automatic runs, but this will only be possible when we have your workshop papers describing the results you submitted. A summary of the results can be found in the CLEF overview paper.

Automatic (top 5):
Manual (all 9):
Official results (partial_isec-total) [csv] [Excel]

You can get a summary of the trec_eval output (called <runid>.res_short) and an output for each topic (called <runid>.res_long) for the partial_isec-total qrels set from here [zip].

What to do next

You can continue experimenting with your systems using these results, but please be ready to submit a paper to Carol Peters at CLEF by 15th August. If you are familiar with trec_eval then you should be able to make use of the qrels files in further system evaluation; otherwise you can analyse and use results from the trec_eval output supplied by us. If you need help performing your own evaluation, please contact Paul Clough.

Thanks and acknowledgements

We thank everyone who participated in ImageCLEF 2004 to make this such an interesting and successful evaluation. In particular we thank University Hospitals Geneva for the CasImage Collection. The relevance assessments are what make this evaluation possible, and we want to thank Paul Fabry, Tristan Zand and Antoine Rosset.
Page Maintained by Paul Clough © University of Sheffield 2004