Research Projects
| Name | Role | Status | Date | Funding |
| IFF@TNA | Principal Investigator | Active | 2010 - 2010 | UK National Archives |
| TrebleCLEF | Principal Investigator | Completed | 2007 - 2009 | EU CA (IST-FP7) |
| Memoir | Principal Investigator | Completed | 2006 - 2008 | EU (Marie Curie) |
| MultiMatch | Principal Investigator | Active | 2006 - 2008 | EU (IST-FP6) |
| SPIRIT | Research Assistant | Completed | 2002 - 2005 | EU (IST-FP5) |
| Eurovision | Research Assistant | Completed | 2001 - 2003 | EPSRC |
| METER | Research Assistant | Completed | 1999 - 2002 | EPSRC |
I also have an EPSRC CASE-funded PhD studentship with the Ordnance Survey (the national mapping agency for the UK) looking at automatically generating imprecise regions using information gathered from online sources (2006 - 2009).
IFF@TNA
The IFF@TNA project (Improving Information Finding at The National Archives) aims to improve access to data managed by TNA. The project involves analysing TNA's main web server logs to establish the range of subjects being searched by online visitors to their archives. Additionally, the project aims to analyse the separate server logs of the UK Government Web Archive to establish the range of subjects of interest to online visitors and to determine any common patterns of user behaviour. A final aim is to create a methodology for TNA that will allow them to evaluate their existing and future search products and services.
Website: Not yet
Demos: N/A
Key publication: N/A
TrebleCLEF
TrebleCLEF is an EU-funded Coordination Action (CA) designed to bring together investigators working in the field of evaluation for multilingual information access to consolidate and promote best practice. The project seeks to build upon and extend the results already achieved by the existing Cross-Language Evaluation Forum (CLEF) and to continue the development and dissemination of resources for the evaluation of multilingual information systems. The specific target for this project is the European digital library community. The project is due to begin in 2008.
Website:
http://www.trebleclef.eu/index.php
Demos: N/A
Key publication: N/A
Memoir
The Memoir project is investigating the technology, ethics and psychology of storing and accessing a lifetime of personal information. The project aims to carry out research into new techniques to organise, store and retrieve personal information that focus on user-centric concepts and methods, investigating how technology can help people create and manage long-term personal memories. Memoir is funded by the EU under a Marie Curie Fellowship for the Transfer of Knowledge (ToK) Development Host Scheme and runs until 2008. My personal interests in the Memoir project are related to personal multimedia management, particularly photos (e.g. how do we collect multimedia data, what do we collect and why, and what is the role of audiovisual material within personal social structures such as the family?). A presentation outlining my initial work can be found here.
Website:
http://dagda.shef.ac.uk/memoir/index.html
Demos: N/A
Key publication: Steve Whittaker:
Why do we want memories for life? Memories For Life
Workshop panel. The British Library, London. December 11th, 2006.
MultiMatch
The MultiMatch (Multilingual/Multimedia Access to Cultural Heritage) project aims to enable users to explore and interact with online cultural heritage content across media types and language boundaries. Users will be able to search across languages (having queries automatically translated), search for webpages, audio, video, and images simultaneously, and explore connections and relationships between creators, creations, time, and place. MultiMatch is funded by the EU (IST-FP6) and runs until 2008. I am leading a work package on the design of the user interface. A short presentation describing the project can be found here.
Website:
http://www.multimatch.org
Demos:
first prototype system [Flash demo]
Key publication: Carol Peters, MultiMatch: Multilingual/Multimedia Access to Cultural Heritage, paper presented at the 2nd Italian Research Conference on Digital Library Management Systems, 2007.
SPIRIT
The SPIRIT (Spatially Aware Access to Information on the Internet) project was engaged in the design and implementation of a search engine to find documents and datasets on the web relating to places or regions referred to in a query. The project created software tools and techniques that can be used to produce search engines and websites that display intelligence in the recognition of geographical terminology. In order to demonstrate and evaluate the project outcomes, a prototype spatially-aware search engine has been built and is serving as the platform for testing and evaluation of new techniques in geographical information retrieval.
Website:
http://www.geo-spirit.org/
Demos:
prototype system [Flash demos]
Key publication: Purves, R.S., Clough, P., Jones, C.B., Arampatzis, A., Bucher, B., Finch, D., Fu, G., Joho, H., Khirini, A.S., Vaid, S., and Yang, B. (2007), The Design and Implementation of SPIRIT: a Spatially-Aware Search Engine for Information Retrieval on the Internet, International Journal of Geographical Information Science (IJGIS), Volume 21(7), January 2007, pp. 717 - 745.
Eurovision
The Eurovision project explored the cross-language retrieval of images via their captions. The aim of the project was to build and test an image Cross-Language Information Retrieval (CLIR) system, where users could search for images via their captions in languages they have no knowledge of. In a picture archive, images are described by their captions and users want to retrieve from the collection regardless of the language they speak. For any vendor of an image library, use of CLIR offers the opportunity of broadly expanding the range of potential searchers of their library.
Website:
http://ir.shef.ac.uk/eurovision/
Demos:
prototype system [Flash demo]
Key publication: Clough, P. and Sanderson, M. (2006), User Experiments with the Eurovision Cross-Language Image Retrieval System, In Journal of the American Society for Information Science and Technology (JASIST), Special Topic Section on Multilingual Information Systems, Volume 57(5), pp. 697 - 708.
METER
The Measuring Text Reuse (METER) project, funded by the EPSRC (the Engineering and Physical Sciences Research Council) and sponsored by the PA (the British Press Association), aimed to investigate the issue of automatically detecting and measuring text reuse, focusing on the domain of journalism. In this project, various NLP/LE techniques were investigated, including n-gram approaches, a visual dot-metric approach (the dotplot), various methods of string matching, sentence alignment techniques and machine learning classifiers. We envisaged that, in order to deal efficiently with the problem of measuring text reuse, various approaches would need to be combined to form a single system. If obtainable, such a system would be useful in various areas such as text reuse/plagiarism detection, information extraction/retrieval and multi-document summarisation.
Website:
http://www.dcs.shef.ac.uk/nlp/meter/
Demos:
tools [website]
Key publication: Clough, P., Gaizauskas, R., Piao, S.L. and Wilks, Y. (2002), METER: MEasuring TExt Reuse. In Proceedings of the 40th Anniversary Meeting for the Association for Computational Linguistics (ACL-02), pp. 152-159, 7-12 July, University of Pennsylvania, Philadelphia, USA.
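As a rough illustration of the kind of n-gram approach mentioned in the METER description, the sketch below computes a simple word n-gram containment score: the fraction of n-grams in a possibly derived text that also appear in a candidate source text. This is not the METER project's implementation. The function names `word_ngrams` and `containment`, the whitespace tokenisation and the default trigram size are all assumptions made for the example.

```python
from typing import Set, Tuple


def word_ngrams(text: str, n: int) -> Set[Tuple[str, ...]]:
    """Return the set of word n-grams in text (lowercased, whitespace-tokenised)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def containment(derived: str, source: str, n: int = 3) -> float:
    """Fraction of the derived text's n-grams that also occur in the source text.

    Returns 0.0 when the derived text is too short to contain any n-gram.
    """
    derived_ngrams = word_ngrams(derived, n)
    if not derived_ngrams:
        return 0.0
    source_ngrams = word_ngrams(source, n)
    return len(derived_ngrams & source_ngrams) / len(derived_ngrams)


if __name__ == "__main__":
    # Hypothetical newswire sentence and a newspaper rewrite of it.
    wire = "the minister announced the new policy on tuesday"
    paper = "the minister announced the new policy yesterday"
    print(containment(paper, wire))
```

A score near 1 suggests that most of the derived text is taken from the source, and a score near 0 suggests little shared wording. The measure is asymmetric by design, since a short article may be wholly derived from a much longer source.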
Contact Details:
Information School, University of Sheffield, Room 226, Regent Court, 211 Portobello Street, Sheffield, S1 4DP, UK.
Tel: +44 (0) 114 2222664
Fax: +44 (0) 114 2780300
Email: p.d.clough@sheffield.ac.uk