Research Projects



Name        Role                    Status     Date         Funding
IFF@TNA     Principal Investigator  Active     2010 - 2010  UK National Archives
TrebleCLEF  Principal Investigator  Completed  2007 - 2009  EU CA (IST-FP7)
Memoir      Principal Investigator  Completed  2006 - 2008  EU (Marie Curie)
MultiMatch  Principal Investigator  Active     2006 - 2008  EU (IST-FP6)
SPIRIT      Research Assistant      Completed  2002 - 2005  EU (IST-FP5)
Eurovision  Research Assistant      Completed  2001 - 2003  EPSRC
METER       Research Assistant      Completed  1999 - 2002  EPSRC

I also have an EPSRC CASE-funded PhD studentship with the Ordnance Survey (the national mapping agency for the UK) looking at automatically generating imprecise regions using information gathered from online sources (2006 - 2009).


IFF@TNA

The IFF@TNA project (Improving Information Finding at The National Archives) aims to improve access to data managed by TNA. The project involves analysing TNA's main web server logs to establish the range of subjects being searched by online visitors to their archives. It also aims to analyse separate server logs of the UK Government Web Archive to establish the range of subjects of interest to online visitors and to determine any common patterns of user behaviour. A final aim is to create a methodology for TNA that will allow them to evaluate their existing and future search products and services.

Website: Not yet available

Demos: N/A

Key publication: N/A


TrebleCLEF

TrebleCLEF is an EU-funded Coordination Action (CA) designed to bring together investigators working in the field of evaluation for multilingual information access to consolidate and promote best practice. The project seeks to build upon and extend the results already achieved by the existing Cross-Language Evaluation Forum (CLEF) and to continue the development and dissemination of resources for the evaluation of multilingual information systems. The specific target for this project is the European digital library community. The project is due to begin in 2008.

Website: http://www.trebleclef.eu/index.php

Demos: N/A

Key publication: N/A


Memoir

The Memoir project is investigating the technology, ethics and psychology of storing and accessing a lifetime of personal information. The project aims to research new user-centric techniques to organise, store and retrieve personal information, investigating how technology can help people create and manage long-term personal memories. Memoir is funded by the EU under a Marie Curie Fellowship for the Transfer of Knowledge (ToK) Development Host Scheme and runs until 2008. My personal interests in the Memoir project relate to personal multimedia management, particularly photos (e.g. how do we collect multimedia data, what do we collect and why, and what is the role of audiovisual material within personal social structures such as the family?). A presentation outlining my initial work can be found here.

Website: http://dagda.shef.ac.uk/memoir/index.html

Demos: N/A

Key publication: Steve Whittaker: Why do we want memories for life? Memories For Life Workshop panel. The British Library, London. December 11th, 2006.



MultiMatch

The MultiMatch (Multilingual/Multimedia Access to Cultural Heritage) project aims to enable users to explore and interact with cultural heritage content that is accessible online, across media types and language boundaries. Users will be able to search across languages (with queries translated automatically), search for webpages, audio, video and images simultaneously, and explore connections and relationships between creators, creations, time and place. MultiMatch is funded by the EU (IST-FP6) and runs until 2008. I am leading a work package on the design of the user interface. A short presentation describing the project can be found here.

Website: http://www.multimatch.org

Demos: first prototype system [Flash demo]

Key publication: Carol Peters, MultiMatch – Multilingual/Multimedia Access to Cultural Heritage, paper presented at the 2nd Italian Research Conference on Digital Library Management Systems 2007.



SPIRIT

The SPIRIT (Spatially Aware Access to Information on the Internet) project was engaged in the design and implementation of a search engine to find documents and datasets on the web relating to places or regions referred to in a query. The project created software tools and techniques that can be used to produce search engines and websites that display intelligence in the recognition of geographical terminology. In order to demonstrate and evaluate the project outcomes, a prototype spatially-aware search engine has been built and is serving as the platform for testing and evaluation of new techniques in geographical information retrieval.

Website: http://www.geo-spirit.org/

Demos: prototype system [Flash demos]

Key publication: Purves, R.S., Clough, P., Jones, C.B., Arampatzis, A., Bucher, B., Finch, D., Fu, G., Joho, H., Khirini, A.S., Vaid, S., and Yang, B. (2007), The Design and Implementation of SPIRIT: a Spatially-Aware Search Engine for Information Retrieval on the Internet, International Journal of Geographical Information Science (IJGIS), Volume 21(7), January 2007, pp. 717 - 745.



Eurovision

The Eurovision project explored the cross-language retrieval of images via their captions. The aim of the project was to build and test an image Cross-Language Information Retrieval (CLIR) system, where users could search for images via their captions in languages they have no knowledge of. In a picture archive, images are described by their captions and users want to retrieve from the collection regardless of the language they speak. For any vendor of an image library, use of CLIR offers the opportunity of broadly expanding the range of potential searchers of their library.

Website: http://ir.shef.ac.uk/eurovision/

Demos: prototype system [Flash demo]

Key publication:  Clough, P. and Sanderson, M. (2006) User Experiments with the Eurovision Cross-Language Image Retrieval System, In Journal of the American Society for Information Science and Technology (JASIST) Special Topic Section on Multilingual Information Systems, Volume 57(5), pp. 697 - 708.



METER

The Measuring Text Reuse (METER) project, funded by the EPSRC (the Engineering and Physical Sciences Research Council) and sponsored by the PA (the British Press Association), investigated how to detect and measure text reuse automatically, focusing on the domain of journalism. The project explored various NLP/LE techniques, including n-gram approaches, a visual dot-metric approach (the dotplot), various methods of string matching, sentence alignment techniques and machine learning classifiers. We envisaged that dealing with the problem efficiently would require several of these approaches to be combined into a single system. If obtainable, such a system would be useful in areas such as text reuse and plagiarism detection, information extraction and retrieval, and multi-document summarisation.
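As a rough illustration of what an n-gram approach to measuring text reuse can look like, the sketch below computes the proportion of word n-grams in a possibly derived text that also occur in a candidate source text. This is a generic containment measure, not the METER project's own implementation; the function names, the whitespace tokenisation and the example sentences are all assumptions made for illustration.

```python
from typing import Set, Tuple


def word_ngrams(text: str, n: int) -> Set[Tuple[str, ...]]:
    """Return the set of word n-grams in a text (hypothetical helper).

    Tokenisation is a simple lowercase whitespace split, which is an
    assumption; a real system would need proper tokenisation.
    """
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_containment(derived: str, source: str, n: int = 3) -> float:
    """Fraction of the derived text's n-grams that also appear in the source.

    Returns 0.0 when the derived text is too short to contain any n-gram.
    """
    derived_ngrams = word_ngrams(derived, n)
    if not derived_ngrams:
        return 0.0
    source_ngrams = word_ngrams(source, n)
    return len(derived_ngrams & source_ngrams) / len(derived_ngrams)


# Invented example: a newswire sentence and a lightly rewritten version.
source = "the prime minister announced new measures on crime today"
derived = "the prime minister announced tough plans on crime today"
print(ngram_containment(derived, source, n=2))
```

A score close to 1 suggests that most of the derived text's word sequences were taken from the source, while a score close to 0 suggests little verbatim overlap. Longer n-grams make the measure stricter, since a single changed word breaks every n-gram that spans it.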

Website: http://www.dcs.shef.ac.uk/nlp/meter/

Demos: tools [website]

Key publication:  Clough, P., Gaizauskas, R., Piao, S.L. and Wilks, Y. (2002), METER: MEasuring TExt Reuse. In proceedings of the 40th Anniversary Meeting for the Association for Computational Linguistics (ACL-02), pp.152-159, 7-12 July, University of Pennsylvania, Philadelphia, USA.

Contact Details:

Information School
University of Sheffield
Room 226,
Regent Court,
211 Portobello Street,
Sheffield, S1 4DP UK.
Tel: +44 (0) 114 2222664
Fax: +44 (0) 114 2780300
Email: p.d.clough@sheffield.ac.uk