The St Andrews historic photographic collection
Introduction


The St Andrews dataset consists of 28,133 photographs from the St Andrews University Library photographic collection, which holds one of the largest and most important collections of historic photography in Scotland. The collection numbers in excess of 300,000 images, about 10% of which have been digitised and used for the ImageCLEF ad hoc retrieval task. The photographs are primarily historic in nature and come from areas in and around Scotland, although pictures of other locations also exist.
 
  Record ID: JV-A.000460
  Short title: The Fountain, Alexandria.
  Long title: Alexandria. The Fountain.
  Location: Dunbartonshire, Scotland
  Description: Street junction with large ornate fountain with columns, surrounded by rails and lamp posts at corners; houses and shops.
  Date: Registered 17 July 1934
  Photographer: J Valentine & Co
  Categories: [ columns unclassified ][ street lamps - ornate ][ electric street lighting ][ shepherds & shepherdesses ][ streetscapes ][ shops ]
  Notes: JV-A460 jf/mb
 

Fig. 1 Example image caption  

All images have an accompanying textual description consisting of 8 distinct fields (see, e.g., Fig. 1). These fields can be used individually or collectively to facilitate image retrieval. The 28,133 captions consist of 44,085 terms and 1,348,474 word occurrences; the maximum caption length is 316 words, and the average is 48 words. All captions are written in British English, although the language also contains colloquial expressions. Approximately 81% of captions contain text in all fields; the rest generally lack the description field. In most cases the image description is a grammatical sentence of around 15 words. The majority of images (82%) are in black and white, although colour images are also present in the collection.
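As an illustration of using the fields individually, the following is a minimal Python sketch that splits a caption laid out as in Fig. 1 into named fields. It assumes one "Field: value" pair per line, which is how Fig. 1 is printed and may not match the layout of the caption files on disk; the function name parse_caption is hypothetical.

```python
import re


def parse_caption(text):
    """Split 'Field: value' lines into a dict; categories become a list.

    Assumes the layout shown in Fig. 1 (one field per line).
    """
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so colons inside values survive.
        name, sep, value = line.strip().partition(":")
        if not sep:
            continue  # blank line or line without a field name
        fields[name.strip()] = value.strip()
    # Categories are written as "[ a ][ b ]"; extract each bracketed item.
    if "Categories" in fields:
        fields["Categories"] = [
            c.strip() for c in re.findall(r"\[([^\]]*)\]", fields["Categories"])
        ]
    return fields


example = """\
Record ID: JV-A.000460
Short title: The Fountain, Alexandria.
Location: Dunbartonshire, Scotland
Categories: [ columns unclassified ][ streetscapes ][ shops ]
"""
caption = parse_caption(example)
```

Once parsed, a single field such as caption["Location"] can be indexed on its own, or several fields can be concatenated for collective retrieval.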

The types of information that people typically look for in this collection include the following:
  • Social history, e.g. old towns and villages, children at play and work.
  • Environmental concerns, e.g. landscapes and wild plants.
  • History of photography, e.g. particular photographers.
  • Architecture, e.g. specific or general places or buildings.
  • Golf, e.g. individual golfers or tournaments.
  • Events, e.g. historic, war related.
  • Transport, e.g. general or specific roads, bridges etc.
  • Ships and shipping, e.g. particular vessels or fishermen.
More information about the St Andrews collection as used in ImageCLEF can be found here.

Directory structure of the St Andrews collection


Download the St Andrews data, then unzip and untar the archive file. The archive contains the images and captions, which are stored under directories named in the format stand03_[0-9]+, e.g. stand03_1171. Image filenames are in the form stand03_[0-9]+.jpg, e.g. stand03_15312.jpg, and captions use the same format except with the suffix .txt, e.g. stand03_1171.txt.
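The naming convention can be sketched in Python as follows. This is only an illustration of the pattern described above: the helper name caption_path_for is hypothetical, and the code assumes that each caption sits next to its image and differs only in the suffix.

```python
import re
from pathlib import PurePosixPath

# Image filenames follow the pattern stand03_<digits>.jpg.
IMAGE_RE = re.compile(r"^stand03_[0-9]+\.jpg$")


def caption_path_for(image_path):
    """Return the caption path that corresponds to an image path.

    Raises ValueError if the filename does not follow the convention.
    """
    path = PurePosixPath(image_path)
    if not IMAGE_RE.match(path.name):
        raise ValueError(f"unexpected image filename: {path.name}")
    # Same directory and stem; only the suffix changes.
    return str(path.with_suffix(".txt"))


caption_path = caption_path_for("stand03_2096/stand03_10695.jpg")
```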


The "docs" directory contains the following files:
  • stand03_bigimages.txt - a list of all large images including their pathnames.
  • stand03_thumbnails.txt - a list of all thumbnails and their pathnames.
  • stand03_captions.txt - a list of all image captions and their pathnames.
  • stand03_captions.trec - a single file with all captions in TREC-style format.
  • stand03_guide.pdf - an introductory document describing the collection.
All images have a corresponding caption.

Format of the captions


The captions are stored in the directories with the images as plain text files with no encoding. The captions are also stored in one file (stand03_captions.trec), which contains the captions in a TREC-type encoding scheme that can be indexed by most TREC-compliant parsers. For example, the first caption is:
<DOC>
<DOCNO> - stand03_2096/stand03_10695.txt </DOCNO>
<HEADLINE>Departed glories - Falls of Cruachan Station above Loch Awe on the Oban line.</HEADLINE>
<TEXT>
<RECORD_ID>HMBR-.000273 </RECORD_ID>(1)Falls of Cruachan Station. (2)Sheltie dog by single track railway below embankment, with wooden ticket office, and signals; gnarled trees lining banks. (3)ca.1990 (4)Hamish Macmillan Brown (5)Argyllshire, Scotland (6)HMBR-273 pc/ADD: The photographer's pet Shetland collie dog, 'Storm'.
<CATEGORIES>[tigers],[Fife all views],[gamekeepers],[identified male],[dress - national],[dogs] </CATEGORIES>
<SMALL_IMG>stand03_2096/stand03_10695.jpg </SMALL_IMG>
<LARGE_IMG>stand03_2096/stand03_10695_big.jpg </LARGE_IMG>
</TEXT>
</DOC>

The majority of the "useful" text for retrieval is contained between the <TEXT> tags. Each line represents a different caption field (labelled 1 to 6 for reference only) in the following order: short title, description, date of registration in the St Andrews collection, the photographer, the location, and additional notes as provided by the archive historian.
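For readers without a TREC parser to hand, the following Python sketch shows one way to pull the fields out of a record in this format. It is based only on the single example above, so treat it as an assumption-laden illustration: the dictionary keys and the helper names tag and parse_doc are hypothetical, and the code assumes that the markers (1) to (6) do not occur inside the field text itself.

```python
import re

DOC_RE = re.compile(r"<DOC>(.*?)</DOC>", re.DOTALL)
# Field order as described above (labels 1 to 6); key names are our own.
FIELD_NAMES = ["short_title", "description", "date",
               "photographer", "location", "notes"]


def tag(name, doc):
    """Return the stripped content of <name>...</name>, or None if absent."""
    m = re.search(rf"<{name}>(.*?)</{name}>", doc, re.DOTALL)
    return m.group(1).strip() if m else None


def parse_doc(doc):
    # The numbered fields sit between </RECORD_ID> and <CATEGORIES>.
    m = re.search(r"</RECORD_ID>(.*?)<CATEGORIES>", doc, re.DOTALL)
    body = m.group(1) if m else ""
    # Splitting on "(n)" yields [prefix, "1", text1, "2", text2, ...].
    parts = re.split(r"\(([1-6])\)", body)
    numbered = {int(n): t.strip() for n, t in zip(parts[1::2], parts[2::2])}
    record = {name: numbered.get(i)
              for i, name in enumerate(FIELD_NAMES, start=1)}
    record["docno"] = tag("DOCNO", doc)  # kept exactly as written in the file
    record["headline"] = tag("HEADLINE", doc)
    record["record_id"] = tag("RECORD_ID", doc)
    record["categories"] = re.findall(r"\[([^\]]*)\]",
                                      tag("CATEGORIES", doc) or "")
    record["small_img"] = tag("SMALL_IMG", doc)
    record["large_img"] = tag("LARGE_IMG", doc)
    return record


sample = """<DOC>
<DOCNO> - stand03_2096/stand03_10695.txt </DOCNO>
<HEADLINE>Departed glories - Falls of Cruachan Station above Loch Awe on the Oban line.</HEADLINE>
<TEXT>
<RECORD_ID>HMBR-.000273 </RECORD_ID>(1)Falls of Cruachan Station. (2)Sheltie dog by single track railway below embankment, with wooden ticket office, and signals; gnarled trees lining banks. (3)ca.1990 (4)Hamish Macmillan Brown (5)Argyllshire, Scotland (6)HMBR-273 pc/ADD: The photographer's pet Shetland collie dog, 'Storm'.
<CATEGORIES>[tigers],[Fife all views],[gamekeepers],[identified male],[dress - national],[dogs] </CATEGORIES>
<SMALL_IMG>stand03_2096/stand03_10695.jpg </SMALL_IMG>
<LARGE_IMG>stand03_2096/stand03_10695_big.jpg </LARGE_IMG>
</TEXT>
</DOC>
"""
records = [parse_doc(d) for d in DOC_RE.findall(sample)]
```

To process the whole collection, the same DOC_RE.findall call could be applied to the contents of stand03_captions.trec instead of the sample string.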




Last Modified: January 2004 By: Paul Clough