Howto

The keyphrase extraction is available in several deployments:

  • as a Java API
  • as a DBus implementation
  • as a number of GATE plugins, which can be used directly in the GATE GUI/platform

The following sections document how to use the Java API and the DBus implementation.
For using the keyphrase extraction as a GATE plugin, please refer to the dedicated page.

Java API

This program implements a keyword/keyphrase extraction algorithm for the English language and can be embedded in Java applications.
It extracts keywords/keyphrases from textual sources (given as a string) and from documents in pdf, doc, or txt format (given as a URL to the respective document).
It returns a Map containing the extraction results as entries, ordered by significance.
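The description above states only that the result Map is ordered by significance; it does not specify the key and value types. The following sketch therefore handles entries generically and assumes that iterating the Map yields the entries in significance order. The helper name "topEntries" and the sample entries are hypothetical and only illustrate how the most significant results could be picked out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TopKeyphrases {

    // Hypothetical helper: returns the first n entries of the result map.
    // Assumes the map's iteration order reflects significance, as described above.
    static List<Map.Entry<?, ?>> topEntries(Map<?, ?> kwMap, int n) {
        List<Map.Entry<?, ?>> top = new ArrayList<Map.Entry<?, ?>>();
        for (Map.Entry<?, ?> entry : kwMap.entrySet()) {
            if (top.size() >= n) {
                break; // enough entries collected
            }
            top.add(entry);
        }
        return top;
    }

    public static void main(String[] args) {
        // Stand-in for an extraction result; keys and values are made up for illustration.
        Map<String, Double> kwMap = new LinkedHashMap<String, Double>();
        kwMap.put("keyphrase extraction", 0.9);
        kwMap.put("GATE plugin", 0.7);
        kwMap.put("document", 0.2);

        for (Map.Entry<?, ?> e : topEntries(kwMap, 2)) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

With a real result, pass the Map returned by getKeywordsFromText or getKeywordsFromUrl instead of the stand-in map.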

Requirements

  • java-5

Usage

  1. Download the keyphrase-extraction Java distribution and unzip the archive. This yields a binary (jar) and a folder called "gate" containing the necessary GATE plugins.
  2. Before doing anything else, set the system property "gate.home" to the path of this "gate" folder. You can do so in two ways:
    1. specifying it at runtime:

      $> java -Dgate.home=path/to/gate/dir -jar your-application.jar
    2. specifying it programmatically: in your code, *before* you use any of the
      keyword-extraction classes, insert a line like the following:

      System.setProperty("gate.home","path/to/gate/dir");
  3. The functionality is exposed as an object on which two different methods may be called, depending on the input type.
    First, instantiate a KeywordExtraction object, as in the following snippet:

    KeywordExtraction kex = new KeywordExtractionImpl();

    Now, you can call methods on the instantiated object:
    1. From a string representing a text:
      "getKeywordsFromText(String documentText)" may be called.

      KeywordExtraction kex = new KeywordExtractionImpl();
      Map kwMap = kex.getKeywordsFromText("This sentence is not sufficiently long to produce meaningful keyphrases, if at all.");
    2. From a string representing a document URL:
      "getKeywordsFromUrl(String documentUrl)" may be called.

      KeywordExtraction kex = new KeywordExtractionImpl();
      Map kwMap = kex.getKeywordsFromUrl(documentUrl);
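Since "gate.home" has to be set before any keyword-extraction class is used, it can help to fail fast with a clear message when the property is missing or wrong. The following is a minimal sketch of such a check, run before creating a KeywordExtractionImpl; the class and method names are hypothetical and not part of the keyphrase-extraction API.

```java
import java.io.File;

public class GateHomeCheck {

    // Hypothetical helper: throws if "gate.home" is unset or does not name an existing directory.
    static File requireGateHome() {
        String gateHome = System.getProperty("gate.home");
        if (gateHome == null || gateHome.trim().length() == 0) {
            throw new IllegalStateException(
                "System property gate.home is not set; use -Dgate.home=... or System.setProperty");
        }
        File dir = new File(gateHome);
        if (!dir.isDirectory()) {
            throw new IllegalStateException(
                "gate.home does not point to a directory: " + dir.getAbsolutePath());
        }
        return dir;
    }

    public static void main(String[] args) {
        // Fall back to a placeholder path only if -Dgate.home=... was not given on the command line.
        if (System.getProperty("gate.home") == null) {
            System.setProperty("gate.home", "path/to/gate/dir");
        }
        try {
            File gateDir = requireGateHome();
            System.out.println("Using GATE folder: " + gateDir.getAbsolutePath());
            // Now the extractor can be created, e.g.
            // KeywordExtraction kex = new KeywordExtractionImpl();
        } catch (IllegalStateException e) {
            System.err.println(e.getMessage());
        }
    }
}
```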

DBus Invocation

This program provides a keyword/keyphrase extraction algorithm for the English language, running as a service on the DBus.
It extracts keywords/keyphrases from textual sources and returns a Map containing the extraction results as entries, ordered by significance.

Requirements

  • KDE-4
  • java-6-sun

Usage

The functionality is exposed as an object on the DBus and can be called by any connecting client.

  1. Download the keyphrase-extraction DBus distribution and extract the zip archive.
  2. Ensure the DBus daemon is running on your system; start it if it is not.
  3. To expose the functionality, run the script "start-service.sh" from the command line:

    $> sh start-service.sh
  4. You can now access the object on the DBus with a client and call two methods:
    • extractKeywordsFromText(String documentText)
    • extractKeywordsFromUrl(String documentUrl)

When the jar is launched from another application (or directly), the JVM option java.library.path must be set to the directory containing the native unix-java library "libunix-java.so":

$> java -Djava.library.path=lib/ -Dgate.home=gate -jar keyword-extractor-dbus.jar
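For the case where another application launches the service, the sketch below shows one way a Java program could start it with the same options as the command above, after first checking that the native library is present in the directory passed as java.library.path. It assumes the working directory is the unpacked DBus distribution (containing lib/, gate and keyword-extractor-dbus.jar) and that "java" is on the PATH; the class and method names are hypothetical, and ProcessBuilder.inheritIO requires Java 7 or later.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class DBusServiceLauncher {

    // Builds the command shown above:
    //   java -Djava.library.path=lib/ -Dgate.home=gate -jar keyword-extractor-dbus.jar
    static List<String> buildCommand(String libDir, String gateHome, String jar) {
        List<String> cmd = new ArrayList<String>();
        cmd.add("java"); // assumes the java launcher is on the PATH
        cmd.add("-Djava.library.path=" + libDir);
        cmd.add("-Dgate.home=" + gateHome);
        cmd.add("-jar");
        cmd.add(jar);
        return cmd;
    }

    // True if the native unix-java library exists in libDir.
    // System.mapLibraryName("unix-java") yields "libunix-java.so" on Linux.
    static boolean nativeLibraryPresent(String libDir) {
        return new File(libDir, System.mapLibraryName("unix-java")).isFile();
    }

    public static void main(String[] args) throws IOException {
        String libDir = "lib/";
        if (!nativeLibraryPresent(libDir)) {
            System.err.println("Native library not found in " + libDir);
            return;
        }
        ProcessBuilder pb = new ProcessBuilder(buildCommand(libDir, "gate", "keyword-extractor-dbus.jar"));
        pb.inheritIO(); // show the service's output in this console
        pb.start();     // the service runs as a separate process
        System.out.println("Started: " + pb.command());
    }
}
```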

Calling from a client

The exposed service offers two methods, as specified in the interface
ie.deri.smile.nlp.KeywordExtraction.java:

  1. From a string representing a text:
    "getKeywordsFromText(String documentText)" may be called.

    KeywordExtraction kex = new KeywordExtractionImpl();
    Map kwMap = kex.getKeywordsFromText("My very short sample sentence that will not generate any keyphrases because it is too short.");
  2. From a string representing a document URL:
    "getKeywordsFromUrl(String documentUrl)" may be called.

    KeywordExtraction kex = new KeywordExtractionImpl();
    Map kwMap = kex.getKeywordsFromUrl("http://newsvote.bbc.co.uk/mpapps/pagetools/print/news.bbc.co.uk/sport2/hi/football/teams/n/newcastle_united/7636504.stm");

For a sample client implementation, see:
ie.deri.smile.nlp.KeywordExtractionDBusClient.java

Web Service

Please see the info page dedicated to the web service.