Introduction
The St Andrews dataset consists of 28,133 photographs from the photographic collection of St Andrews University Library, which holds one of the largest and most important collections of historic photography in Scotland. The collection numbers in excess of 300,000 images, 10% of which have been digitised and used for the ImageCLEF ad hoc retrieval task. Photos are primarily historic in nature and come from areas in and around Scotland, although pictures of other locations also exist.
Record ID: JV-A.000460
Short title: The Fountain, Alexandria.
Long title: Alexandria. The Fountain.
Location: Dunbartonshire, Scotland
Description: Street junction with large ornate fountain with columns, surrounded by rails and lamp posts at corners; houses and shops.
Date: Registered 17 July 1934
Photographer: J Valentine & Co
Categories: [ columns unclassified ][ street lamps - ornate ][ electric street lighting ][ shepherds & shepherdesses ][ streetscapes ][ shops ]
Notes: JV-A460 jf/mb
Fig. 1 Example image caption
All images have an accompanying textual description consisting of 8 distinct fields (see, e.g., Fig. 1). These fields can be used individually or collectively to facilitate image retrieval. The 28,133 captions consist of 44,085 terms and 1,348,474 word occurrences; the maximum caption length is 316 words and the average is 48 words. All captions are written in British English, although the language also contains colloquial expressions. Approximately 81% of captions contain text in all fields; the rest generally lack the description field. In most cases the image description is a grammatical sentence of around 15 words. The majority of images (82%) are in black and white, although colour images are also present in the collection.
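The difference between "terms" (distinct words) and "word occurrences" (total words) can be illustrated with a short sketch. The tokenisation rule below is an assumption made for illustration; the source does not say how the published figures were counted, so this code is not expected to reproduce them exactly.

```python
import re
from collections import Counter


def caption_stats(captions):
    """Return (distinct terms, word occurrences, max length, mean length)."""
    counts = Counter()
    lengths = []
    for text in captions:
        # Assumed tokenisation: lowercase runs of letters, digits and apostrophes.
        tokens = re.findall(r"[a-z0-9']+", text.lower())
        counts.update(tokens)
        lengths.append(len(tokens))
    total = sum(lengths)
    mean = total / len(lengths) if lengths else 0.0
    return len(counts), total, max(lengths, default=0), mean


# Two short strings taken from the Fig. 1 example, used only as sample input.
sample = [
    "The Fountain, Alexandria.",
    "Street junction with large ornate fountain with columns.",
]
terms, occurrences, longest, mean = caption_stats(sample)
```

Here "fountain" and "with" each occur twice but count as one term each, which is why the number of terms is smaller than the number of word occurrences.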
The types of information that people typically look for in this collection include the following:
- Social history, e.g. old towns and villages, children at play and work.
- Environmental concerns, e.g. landscapes and wild plants.
- History of photography, e.g. particular photographers.
- Architecture, e.g. specific or general places or buildings.
- Golf, e.g. individual golfers or tournaments.
- Events, e.g. historic, war related.
- Transport, e.g. general or specific roads, bridges etc.
- Ships and shipping, e.g. particular vessels or fishermen.
More information about the St Andrews collection as used in ImageCLEF can be found here.
Directory structure of the St Andrews collection
Download the St Andrews data, then unzip and untar the archive file. The archive contains the images and captions, which are stored under directories named in the format stand03_[0-9]+, e.g. stand03_1171. The image filenames are of the form stand03_[0-9]+.jpg, e.g. stand03_15312.jpg, and the captions use the same format except with the suffix .txt, e.g. stand03_1171.txt.
The "docs" directory contains the following files:
- stand03_bigimages.txt - a list of all large images including their pathnames.
- stand03_thumbnails.txt - a list of all thumbnails and their pathnames.
- stand03_captions.txt - a list of all image captions and their pathnames.
- stand03_captions.trec - a single file with all captions in TREC-style format.
- stand03_guide.pdf - an introductory document describing the collection.
All images have a
corresponding caption.
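The naming scheme described above can be used to pair each image with its caption. The following is a minimal sketch, assuming the archive has been unpacked into a local directory; the function name and the root path passed to it are hypothetical, and only the stand03_[0-9]+ patterns come from the description above.

```python
import re
from pathlib import Path

# Patterns taken from the naming scheme described in the text.
DIR_RE = re.compile(r"^stand03_[0-9]+$")
IMAGE_RE = re.compile(r"^stand03_[0-9]+\.jpg$")


def pair_images_with_captions(root):
    """Yield (image_path, caption_path) for each image under root that has a caption."""
    for directory in sorted(Path(root).iterdir()):
        # Skip anything that is not a stand03_<number> directory, e.g. "docs".
        if not (directory.is_dir() and DIR_RE.match(directory.name)):
            continue
        for image in sorted(directory.iterdir()):
            # Only stand03_<number>.jpg counts as an image here; caption files
            # and names with extra suffixes are ignored.
            if not IMAGE_RE.match(image.name):
                continue
            caption = image.with_suffix(".txt")
            if caption.exists():
                yield image, caption
```

Calling list(pair_images_with_captions("path/to/unpacked/archive")) would then return one pair per image; the path shown is a placeholder.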
Format of the captions
The captions are stored in the directories with the images as plain text files with no encoding. The captions are also stored in a single file (stand03_captions.trec), which holds them in a TREC-type encoding scheme that can be indexed by most TREC-compliant parsers. For example, the first caption is:
<DOC>
<DOCNO> - stand03_2096/stand03_10695.txt</DOCNO>
<HEADLINE>Departed glories - Falls of Cruachan Station above Loch Awe on the Oban line.</HEADLINE>
<TEXT>
<RECORD_ID>HMBR-.000273</RECORD_ID>(1)Falls of Cruachan Station.
(2)Sheltie dog by single track railway below embankment, with wooden ticket office, and signals; gnarled trees lining banks.
(3)ca.1990
(4)Hamish Macmillan Brown
(5)Argyllshire, Scotland
(6)HMBR-273 pc/ADD: The photographer's pet Shetland collie dog, 'Storm'.
<CATEGORIES>[tigers],[Fife all views],[gamekeepers],[identified male],[dress - national],[dogs]</CATEGORIES>
<SMALL_IMG>stand03_2096/stand03_10695.jpg</SMALL_IMG>
<LARGE_IMG>stand03_2096/stand03_10695_big.jpg</LARGE_IMG>
</TEXT>
</DOC>
The majority of the "useful" text for retrieval is contained between the <TEXT> tags. Each line represents a different caption field (labelled 1 to 6 for reference only) in the following order: short title, description, date of registration in the St Andrews collection, photographer, location, and additional notes as provided by the archive historian.
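For readers who want to process the TREC-style file without a dedicated parser, a record can be split into its parts with regular expressions. This is a minimal sketch under stated assumptions: the function names and dictionary keys are hypothetical, each free-text field is assumed to sit on its own line without the (1) to (6) labels, and the six field names are assigned only when exactly six non-empty lines are found, since the source does not say how a missing field is represented.

```python
import re

# Field order as given in the text; the key names themselves are hypothetical.
FIELD_NAMES = ["short_title", "description", "date",
               "photographer", "location", "notes"]


def tag(name, text):
    """Return the stripped content of the first <name>...</name> pair, or ""."""
    m = re.search(rf"<{name}>(.*?)</{name}>", text, re.DOTALL)
    return m.group(1).strip() if m else ""


def parse_captions(trec_text):
    """Parse TREC-style caption text into a list of dicts, one per <DOC>."""
    records = []
    for doc in re.findall(r"<DOC>(.*?)</DOC>", trec_text, re.DOTALL):
        body = tag("TEXT", doc)
        # The free-text fields sit between </RECORD_ID> and <CATEGORIES>.
        free = re.search(r"</RECORD_ID>(.*?)<CATEGORIES>", body, re.DOTALL)
        lines = [ln.strip() for ln in free.group(1).splitlines()] if free else []
        lines = [ln for ln in lines if ln]
        record = {
            # Tolerate a leading "-" as it appears in the example above.
            "docno": tag("DOCNO", doc).lstrip("- "),
            "headline": " ".join(tag("HEADLINE", doc).split()),
            "record_id": tag("RECORD_ID", body),
            "categories": [c.strip() for c in
                           re.findall(r"\[([^\]]*)\]", tag("CATEGORIES", body))],
            "small_img": tag("SMALL_IMG", body),
            "large_img": tag("LARGE_IMG", body),
            "lines": lines,
        }
        # Name the fields only when all six lines are present (assumption).
        if len(lines) == len(FIELD_NAMES):
            record.update(zip(FIELD_NAMES, lines))
        records.append(record)
    return records
```

Records with fewer than six free-text lines keep their raw text under the "lines" key, so nothing is silently assigned to the wrong field.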