ImageCLEF evaluation
---------------------

Paul Clough (p.d.clough@sheffield.ac.uk)

To evaluate your own results you need:

(1) the relevance assessments (a qrels file), and
(2) trec_eval from NIST, or the attached Perl script ireval.pl from the
    Lemur IR toolkit. Both give the same results for average precision,
    but ireval.pl does not compute statistical measures.

The qrels files are in the standard TREC 4 column format.

You run ireval.pl as follows:

% perl ireval.pl -j -trec < your_input_file

The input file of your results (your_input_file) should be in the TREC
6 column format (topic, Q0, document id, rank, score, system name), e.g.

1 Q0 stand03_1590/stand03_28035.txt 1 -1.72264 system_name
1 Q0 stand03_1588/stand03_28472.txt 2 -2.10279 system_name
1 Q0 stand03_1588/stand03_28021.txt 3 -2.46973 system_name
1 Q0 stand03_1577/stand03_28488.txt 4 -2.49392 system_name
1 Q0 stand03_1589/stand03_28486.txt 5 -2.67261 system_name
1 Q0 stand03_1579/stand03_28043.txt 6 -2.72549 system_name
1 Q0 stand03_1586/stand03_28176.txt 7 -2.78748 system_name
1 Q0 stand03_1580/stand03_28023.txt 8 -2.97151 system_name
1 Q0 stand03_1579/stand03_10791.txt 9 -3.11026 system_name

or for the medical task:

1 Q0 f_11/10952 1 0.524393 system_name
1 Q0 f_9/8970 2 0.522450 system_name
1 Q0 f_10/10341 3 0.518187 system_name
1 Q0 f_10/10082 4 0.518032 system_name
1 Q0 f_12/12354 5 0.518029 system_name

Note that the -trec option is required for the script to run correctly
on input in this format.

Please note!
-------------

To obtain results comparable with ImageCLEF, you must provide at least
one retrieved document for EVERY topic (that is, your results must
contain something for all topics). This is very important: it ensures
that ireval.pl (and trec_eval) compute precision and recall based on ALL
the relevant documents. If this is not done, scores are not comparable
across runs, because each run may be evaluated against a different
number of relevant documents.
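
The "every topic" requirement above can be checked before running the
evaluation. The following is a minimal Python sketch, not part of
ireval.pl or trec_eval. It assumes the qrels file has 4
whitespace-separated columns with the topic first, that the results file
has the 6 columns shown in the examples, and that the set of topics to
cover is the set of topics listed in the qrels file. The function names
(topics_in_qrels, topics_in_run, missing_topics, check_files) are
hypothetical and chosen only for illustration.

```python
from typing import Iterable, List, Set


def topics_in_qrels(lines: Iterable[str]) -> Set[str]:
    """Collect topic ids from qrels lines (assumed 4 columns, topic first)."""
    topics = set()
    for line in lines:
        fields = line.split()
        if len(fields) == 4:  # skip blank or malformed lines
            topics.add(fields[0])
    return topics


def topics_in_run(lines: Iterable[str]) -> Set[str]:
    """Collect topic ids from results lines (assumed 6 columns, topic first)."""
    topics = set()
    for line in lines:
        fields = line.split()
        if len(fields) == 6:  # skip blank or malformed lines
            topics.add(fields[0])
    return topics


def missing_topics(qrels_lines: Iterable[str],
                   run_lines: Iterable[str]) -> List[str]:
    """Return topics present in the qrels but absent from the results."""
    missing = topics_in_qrels(qrels_lines) - topics_in_run(run_lines)
    # Sort by length, then text, so plain numeric ids come out in numeric order.
    return sorted(missing, key=lambda t: (len(t), t))


def check_files(qrels_path: str, run_path: str) -> List[str]:
    """Convenience wrapper: read both files and report the missing topics."""
    with open(qrels_path) as qrels_file, open(run_path) as run_file:
        return missing_topics(qrels_file, run_file)
```

For example, check_files("qrels_file", "your_input_file") returns an
empty list when every topic in the qrels file has at least one retrieved
document. Any topic ids it returns are topics for which the results file
contains nothing. Here "qrels_file" is a placeholder for the name of
your relevance assessments file.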