Research Projects

Name | Role | Status | Date | Funding
Search25 | Principal Investigator | Active | 2012 - 2012 | JISC
PATHS | Principal Investigator | Active | 2011 - 2013 | EU (IST-FP7)
User-Centered Design of a Recommender System for a 'Universal' Library Catalogue | Principal Investigator | Active | 2011 - 2013 | AHRC CDA / OCLC Inc.
IFF@TNA | Principal Investigator | Completed | 2010 - 2010 | UK National Archives
TrebleCLEF | Principal Investigator | Completed | 2007 - 2009 | EU CA (IST-FP7)
Memoir | Principal Investigator | Completed | 2006 - 2008 | EU (Marie Curie)
MultiMatch | Principal Investigator | Completed | 2006 - 2008 | EU (IST-FP6)
SPIRIT | Research Assistant | Completed | 2002 - 2005 | EU (IST-FP5)
Eurovision | Research Assistant | Completed | 2001 - 2003 | EPSRC
METER | Research Assistant | Completed | 1999 - 2002 | EPSRC

I also have an EPSRC CASE-funded PhD studentship with the Ordnance Survey (the national mapping agency for the UK) looking at automatically generating imprecise regions using information gathered from online sources (2006 - 2009).


We are acting as consultants in the Search25 JISC-funded project to evaluate the effectiveness of the existing InforM25 service and the new system under development: Search25. Read about our contributions to the project on the Search25 project blog.


The PATHS project aims to support information exploration and discovery through digital cultural heritage collections. It will implement various user models that give users a mechanism to create and share pathways through information spaces for learning and knowledge discovery. Personalised access to digital cultural heritage resources will be provided by adapting suggested routes to the requirements of individual users and groups, such as students and teachers, professional archivists, historians and scholars. A prototype system will provide search assistance and capture user- and expert-generated paths through digital information spaces to provide personalised and contextualised information access. I am Scientific Director of this project and leader of the Work Package on user interfaces.


Key publication: Clough, P., Ford, N. and Stevenson, M. (2011). Personalizing Access to Cultural Heritage Collections using Pathways. In Proceedings of PATCH 2011.

User-Centered Design of a Recommender System for a 'Universal' Library Catalogue

The goal of this project is to increase our understanding of how the recommender concept applies to the library catalogue, and to better understand the criteria, requirements, preferences and reactions of library catalogue users. Rather than focusing on a single institution, we will experiment with making recommendations for WorldCat, the world's largest and most comprehensive bibliographic database and 'universal' library catalogue. This resource is managed by OCLC Inc. and provides a unique source of evidence upon which to base recommendations. The project is funded by the AHRC under the Collaborative Doctoral Award (CDA) programme.

Key publication: Wakeling, S., Clough, P., Sen, B. and Connaway, L. (2012) "Readers who borrowed this also borrowed ...": Recommender Systems in UK Libraries, Library Hi Tech, Volume 30(1), pp. 134-150.


The IFF@TNA project (Improving Information Finding at The National Archives) aimed to improve access to data managed by TNA. The project analysed TNA's main web server logs to establish the range of subjects searched for by online visitors to the archives. It also analysed separate server logs of the UK Government Web Archive to establish the range of subjects of interest to online visitors and to identify any common patterns of user behaviour. In addition, a crowdsourcing-based evaluation methodology was developed that allows TNA to evaluate its existing and future search products and services.


TrebleCLEF is an EU-funded Coordination Action (CA) designed to bring together investigators working in the field of evaluation for multilingual information access to consolidate and promote best practice. The project seeks to build upon and extend the results already achieved by the existing Cross-Language Evaluation Forum (CLEF) and to continue the development and dissemination of resources for the evaluation of multilingual information systems. The specific target for this project is the European digital library community. The project is due to begin in 2008.


Demos: N/A

Key publication: N/A


The Memoir project is investigating the technology, ethics and psychology of storing and accessing a lifetime of personal information. The project aims to carry out research into new techniques to organise, store and retrieve personal information, focusing on user-centric concepts and methods and investigating how technology can help people create and manage long-term personal memories. Memoir is funded by the EU under a Marie Curie Fellowship for the Transfer of Knowledge (ToK) Development Host Scheme and runs until 2008. My personal interests in the Memoir project are related to personal multimedia management, particularly photos (e.g. how do we collect multimedia data, what do we collect and why, and what is the role of audiovisual material within personal social structures such as the family?). A presentation outlining my initial work can be found here.


Demos: N/A

Key publication: Steve Whittaker: Why do we want memories for life? Memories For Life Workshop panel. The British Library, London. December 11th, 2006.


The MultiMatch (Multilingual/Multimedia Access to Cultural Heritage) project aims to enable users to explore and interact with cultural heritage content that is accessible online, across media types and language boundaries. Users will be able to search across languages (with queries translated automatically), search for webpages, audio, video and images simultaneously, and explore connections and relationships between creators, creations, time and place. MultiMatch is funded by the EU (IST-FP6) and runs until 2008. I am leading a work package on the design of the user interface. A short presentation describing the project can be found here.


Demos: first prototype system [Flash demo]

Key publication: Carol Peters, MultiMatch – Multilingual/Multimedia Access to Cultural Heritage, paper presented at the 2nd Italian Research Conference on Digital Library Management Systems 2007.


The SPIRIT (Spatially Aware Access to Information on the Internet) project was engaged in the design and implementation of a search engine to find documents and datasets on the web relating to places or regions referred to in a query. The project created software tools and techniques that can be used to produce search engines and websites that display intelligence in the recognition of geographical terminology. In order to demonstrate and evaluate the project outcomes, a prototype spatially-aware search engine has been built and is serving as the platform for testing and evaluation of new techniques in geographical information retrieval.


Demos: prototype system [Flash demos]

Key publication: Purves, R.S., Clough, P., Jones, C.B., Arampatzis, A., Bucher, B., Finch, D., Fu, G., Joho, H., Khirini, A.S., Vaid, S., and Yang, B. (2007), The Design and Implementation of SPIRIT: a Spatially-Aware Search Engine for Information Retrieval on the Internet, International Journal of Geographical Information Science (IJGIS), Volume 21(7), January 2007, pp. 717 - 745.


The Eurovision project explored the cross-language retrieval of images via their captions. The aim of the project was to build and test an image Cross-Language Information Retrieval (CLIR) system, where users could search for images via their captions in languages they have no knowledge of. In a picture archive, images are described by their captions and users want to retrieve from the collection regardless of the language they speak. For any vendor of an image library, use of CLIR offers the opportunity of broadly expanding the range of potential searchers of their library.


Demos: prototype system [Flash demo]

Key publication: Clough, P. and Sanderson, M. (2006) User Experiments with the Eurovision Cross-Language Image Retrieval System, In Journal of the American Society for Information Science and Technology (JASIST) Special Topic Section on Multilingual Information Systems, Volume 57(5), pp. 697 - 708.


The Measuring Text Reuse (METER) project, funded by the EPSRC (the Engineering and Physical Sciences Research Council) and sponsored by the PA (the British Press Association), aimed to investigate the automatic detection and measurement of text reuse, focusing on the domain of journalism. In this project, various NLP/LE techniques were investigated, including n-gram approaches, a visual dot-metric approach (the dotplot), various methods of string matching, sentence alignment techniques and machine learning classifiers. We envisaged that, in order to deal efficiently with the METER issue, various approaches would need to be combined into a single system. If obtainable, such a system would be useful in areas such as text reuse and plagiarism detection, information extraction and retrieval, and multi-document summarisation.
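
As a rough illustration of what an n-gram approach to measuring text reuse can look like, the sketch below computes word n-gram containment: the fraction of a candidate text's n-grams that also occur in a possible source. This is a minimal sketch and not the METER system itself; the function names, the choice of word trigrams, the tokenisation and the two example sentences are all assumptions made for illustration.

```python
import re


def word_ngrams(text, n=3):
    """Return the set of word n-grams in text (lowercased, punctuation dropped)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def containment(derived, source, n=3):
    """Fraction of the derived text's n-grams that also occur in the source."""
    derived_grams = word_ngrams(derived, n)
    if not derived_grams:
        # Text shorter than n words: no evidence of reuse either way.
        return 0.0
    shared = derived_grams & word_ngrams(source, n)
    return len(shared) / len(derived_grams)


# Hypothetical example texts, invented for illustration only.
wire_copy = "The minister announced new funding for regional hospitals on Monday."
news_story = ("On Monday the minister announced new funding for regional "
              "hospitals, officials said.")

score = containment(news_story, wire_copy, n=3)
print(f"trigram containment: {score:.2f}")
```

A score near 1 suggests that most of the candidate text is copied verbatim from the source, while a score near 0 suggests little or no verbatim overlap. Containment is asymmetric by design, so swapping the two arguments answers a different question.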


Demos: tools [website]

Key publication: Clough, P., Gaizauskas, R., Piao, S.L. and Wilks, Y. (2002), METER: MEasuring TExt Reuse. In Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics (ACL-02), pp. 152-159, 7-12 July, University of Pennsylvania, Philadelphia, USA.

Contact Details:

Information School
University of Sheffield
Room 226,
Regent Court,
211 Portobello Street,
Sheffield, S1 4DP UK.
Tel: +44 (0) 114 2222664
Fax: +44 (0) 114 2780300