sClippy

Introduction

sClippy is an application that acts as a local information hub for different scientific publication repositories. Its goal is to support early-stage researchers in their initial efforts to familiarize themselves with a particular field. sClippy lifts the shallow and deep metadata captured in scientific publications, makes it accessible for export and embedding, and uses it for information expansion. The metadata extraction process is split between the extraction of:

  • shallow metadata - title, authors, abstract, references and the linear structure of the discourse, and
  • deep metadata - generic knowledge items such as claims, positions and arguments that are part of the rhetorical structure of the discourse

The shallow metadata extraction follows a low-level document engineering approach, which mines and analyzes the publications' text based on its formatting style and font information. The deep metadata extraction, on the other hand, follows a combined empirical and linguistic approach.
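To make the idea of font-based shallow extraction concrete, here is a minimal sketch of one possible cue: treating the first-page text set in the largest font as the title. This is an assumption made for illustration, not sClippy's actual heuristic. The `TextRun` type and the `guessTitle` method are hypothetical names, and the text runs are assumed to have already been read from the PDF by some other component.

```java
import java.util.Arrays;
import java.util.List;

public class ShallowTitleSketch {

    /** A run of text plus the font size it is rendered in (hypothetical type). */
    static final class TextRun {
        final String text;
        final double fontSize;

        TextRun(String text, double fontSize) {
            this.text = text;
            this.fontSize = fontSize;
        }
    }

    /** Assumed heuristic: the title is the first-page text set in the largest font. */
    static String guessTitle(List<TextRun> firstPageRuns) {
        // First pass: find the largest font size on the page.
        double largest = 0.0;
        for (TextRun run : firstPageRuns) {
            largest = Math.max(largest, run.fontSize);
        }
        // Second pass: join all runs in that size, in reading order.
        StringBuilder title = new StringBuilder();
        for (TextRun run : firstPageRuns) {
            if (Math.abs(run.fontSize - largest) < 0.01) {
                if (title.length() > 0) {
                    title.append(' ');
                }
                title.append(run.text.trim());
            }
        }
        return title.toString();
    }

    public static void main(String[] args) {
        // Placeholder runs standing in for text read from a PDF's first page.
        List<TextRun> runs = Arrays.asList(
            new TextRun("An Example", 14.0),
            new TextRun("Paper Title", 14.0),
            new TextRun("First Author, Second Author", 10.0),
            new TextRun("Abstract. Placeholder text.", 9.0));
        System.out.println(guessTitle(runs));
    }
}
```

A real extractor would likely combine several such cues (position on the page, font weight, and the conventions of the LNCS or ACM style in use) rather than rely on font size alone.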

sClippy currently does the following:

  • Performs automatic extraction of shallow metadata (listed above) from publications encoded as PDF documents and formatted with the LNCS or ACM styles.
  • Performs automatic extraction of knowledge items from the publications' content. The main idea of exposing the deep metadata is to give the user a quick glance at the main contributions of a paper, without having to read the entire paper.
  • Exports the extracted metadata in RDF (in a particular format).
  • Embeds the extracted metadata into the original document.
  • Performs information expansion based on the publication's title and authors, by using a particular publication repository.
  • Provides the means for exploring the co-author space, starting from a selected author and using the same repository.

sClippy is implemented as an Eclipse application, which makes it highly extensible. It provides the framework that connects three types of plugins:

  • Extractor plugins - they are connected to a particular document encoding and perform the actual metadata extraction. In addition, they can also support embedding the extracted metadata into the original document. The current implementation features a PDF extractor plugin.
  • Exporter plugins - they take the extracted metadata and export it as a separate file, in a particular format. Currently, sClippy contains a SALT RDF export plugin.
  • Expander plugins - they have the role of performing information expansion using a publication repository. We have currently implemented a DBLP expander.
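The division of labour between the three plugin types can be sketched as a set of Java interfaces wired together by a small framework method. This is only an illustration under stated assumptions: the interface names, method signatures, the key/value `Metadata` class and the stub implementations are all hypothetical, and the sketch uses neither sClippy's real API nor the Eclipse extension-point mechanism.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PluginSketch {

    /** Extracted metadata as ordered key/value pairs (hypothetical representation). */
    static final class Metadata {
        final Map<String, String> fields = new LinkedHashMap<>();
    }

    /** Tied to one document encoding; performs the actual extraction. */
    interface Extractor {
        boolean supports(String fileName);
        Metadata extract(String fileName);
    }

    /** Serializes the extracted metadata in one particular format. */
    interface Exporter {
        String export(Metadata metadata);
    }

    /** Enriches the metadata with information from a publication repository. */
    interface Expander {
        void expand(Metadata metadata);
    }

    /** Framework step: pick a matching extractor, expand, then export. */
    static String process(String fileName, List<Extractor> extractors,
                          List<Expander> expanders, Exporter exporter) {
        for (Extractor extractor : extractors) {
            if (extractor.supports(fileName)) {
                Metadata metadata = extractor.extract(fileName);
                for (Expander expander : expanders) {
                    expander.expand(metadata);
                }
                return exporter.export(metadata);
            }
        }
        throw new IllegalArgumentException("No extractor for " + fileName);
    }

    /** Stub extractor: accepts .pdf names and returns placeholder metadata. */
    static Extractor stubPdfExtractor() {
        return new Extractor() {
            public boolean supports(String fileName) {
                return fileName.toLowerCase().endsWith(".pdf");
            }
            public Metadata extract(String fileName) {
                Metadata metadata = new Metadata();
                metadata.fields.put("title", "Placeholder title");
                return metadata;
            }
        };
    }

    /** Stub expander: adds a placeholder entry instead of querying a repository. */
    static Expander stubExpander() {
        return metadata -> metadata.fields.put("related", "placeholder related publication");
    }

    /** Stub exporter: writes key=value lines (plain text, not RDF). */
    static Exporter plainTextExporter() {
        return metadata -> {
            StringBuilder out = new StringBuilder();
            for (Map.Entry<String, String> entry : metadata.fields.entrySet()) {
                if (out.length() > 0) {
                    out.append('\n');
                }
                out.append(entry.getKey()).append('=').append(entry.getValue());
            }
            return out.toString();
        };
    }

    public static void main(String[] args) {
        System.out.println(process("paper.pdf",
            Collections.singletonList(stubPdfExtractor()),
            Collections.singletonList(stubExpander()),
            plainTextExporter()));
    }
}
```

Keeping the three roles behind separate interfaces means a new document encoding, export format or repository can be supported by adding one plugin, without touching the others.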

Download: http://sclippy.semanticauthoring.org/download.html