Tutorial
The standard way to use CoreNLP is to set up a pipeline of annotators with the functionality that you require. In corenlp-clj.core we find pipeline, which is a higher-order function that returns an annotating function. This returned function can annotate text using the specified pipeline setup.
Please note that prerequisites simply creates a string of annotator dependencies based on the annotators specified.
;; options for setting up a new pipeline
(def opts {"annotators" (prerequisites ["depparse" "lemma" "ner"])})
;; initialising the pipeline, creating a function for annotating
(def nlp (pipeline opts))
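To make the role of prerequisites more concrete, here is a minimal sketch of how such a helper could expand a list of annotators into a dependency-ordered string. It is not the library's implementation: the deps table and the name expand-annotators are hypothetical and exist only for illustration, and the comma-separated format is assumed because that is how CoreNLP's "annotators" property is conventionally written.

```clojure
(require '[clojure.string :as str])

;; Hypothetical dependency table, for illustration only.
;; The authoritative requirements are defined by CoreNLP itself.
(def deps
  {"tokenize" []
   "ssplit"   ["tokenize"]
   "pos"      ["tokenize" "ssplit"]
   "lemma"    ["tokenize" "ssplit" "pos"]
   "ner"      ["tokenize" "ssplit" "pos" "lemma"]
   "depparse" ["tokenize" "ssplit" "pos"]})

(defn expand-annotators
  "Returns a comma-separated string in which every requested annotator
  appears after the annotators it depends on, without duplicates."
  [annotators]
  (letfn [(visit [seen a]
            (if (some #{a} seen)
              seen ; already scheduled, nothing to add
              ;; schedule the dependencies first, then the annotator itself
              (conj (reduce visit seen (get deps a [])) a)))]
    (str/join "," (reduce visit [] annotators))))

(expand-annotators ["depparse" "lemma" "ner"])
;; => "tokenize,ssplit,pos,depparse,lemma,ner"
```

The depth-first visit keeps the order stable: dependencies always precede the annotators that need them, which is what a pipeline needs in order to run each step on already-annotated input.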
The annotating function (here named nlp) then forms the first step in a chain of functions. nlp is followed by a series of calls to annotation or to the convenience functions found in corenlp-clj.annotations. These functions extract information from the hierarchy of annotations created by the pipeline and lend themselves well to Clojure's threading macros.
(->> "This is an example sentence. That is another."
nlp
sentences
tokens
pos)
This example returns the part-of-speech (POS) tags for every token of both sentences: (("DT" "VBZ" "DT" "NN" "NN" ".") ("DT" "VBZ" "DT" ".")). The POS tags are grouped into separate seqs, one for each sentence. Omitting sentences from the chain would return a single seq containing all of the POS tags. Removing pos from the end returns the word tokens (represented as CoreLabels), likewise divided into separate sentences. Removing both results in a single seq of word tokens.
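The four variants described above can be written side by side. This is a sketch that reuses only the functions named in this tutorial. It assumes corenlp-clj and the CoreNLP models are on the classpath, and that prerequisites can be referred from corenlp-clj.core alongside pipeline (the namespace of prerequisites is an assumption).

```clojure
;; Assumption: prerequisites lives in corenlp-clj.core next to pipeline.
(require '[corenlp-clj.core :refer [pipeline prerequisites]]
         '[corenlp-clj.annotations :refer [sentences tokens pos]])

(def nlp (pipeline {"annotators" (prerequisites ["depparse" "lemma" "ner"])}))
(def text "This is an example sentence. That is another.")

;; POS tags, one seq per sentence
(->> text nlp sentences tokens pos)

;; without sentences: a single seq of POS tags
(->> text nlp tokens pos)

;; without pos: tokens (CoreLabels), one seq per sentence
(->> text nlp sentences tokens)

;; without both: a single seq of tokens
(->> text nlp tokens)
```

Defining nlp once and reusing it matters in practice, since building a pipeline loads the underlying models, whereas each threaded call only annotates the given text.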