The cl-nlp from vseloved

Installation issue

During installation of cl-nlp, the process fails at the step shown below. I have cut out the earlier status messages and kept only the section immediately before the error.

; Loading "cl-nlp"
[package nlp.util]................................
[package nlp.core]................................
[package nlp.corpora].............................
[package nlp.tagging].............................
[package nlp.parsing].............................
[package nlp.generation]..........................
[package nlp-user]......
debugger invoked on a SB-INT:SIMPLE-FILE-ERROR:
failed to find the TRUENAME of /Users/maheshcr/tools/cl-nlp/src/core/general.lisp:
No such file or directory

Could you please help?
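One way to narrow this down is to compare the files named in the system definition with what is actually on disk. The following is a minimal sketch, assuming ASDF 3 is available; the helper name missing-source-files is hypothetical and is not part of cl-nlp or ASDF.

```lisp
(require :asdf)

;; Hypothetical helper: walk an ASDF component tree and collect the
;; pathnames of source files that do not exist on disk.
(defun missing-source-files (component)
  (typecase component
    ;; A leaf: report its pathname if the file is absent.
    (asdf:source-file
     (let ((path (asdf:component-pathname component)))
       (unless (probe-file path)
         (list path))))
    ;; A module (systems are modules too): recurse into the children.
    (asdf:module
     (mapcan #'missing-source-files
             (asdf:component-children component)))))

;; Usage, assuming the cl-nlp system definition can be found:
;; (missing-source-files (asdf:find-system :cl-nlp))
```

If the returned list contains src/core/general.lisp, the system definition and the checkout are out of sync, which would explain the TRUENAME error.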

Add developer documentation, initial version

Cover the following in the initial developer documentation:

Introduction to CL-NLP, refer to README.md for introductory articles.
Set up.
Travis CI integration and running tests, with current state of tests.
Contributing instructions.
Overall design/architecture
Tokenizers with design details, source/test code pointers, quick REPL run details

Add documentation for Tokenization

Add a document for tokenization under NLP Tasks in docs.

Doesn’t build on LispWorks 7

When I try to load cl-nlp on LispWorks 7, I get the following:

CL-USER 1 > (ql:quickload :cl-nlp)
To load "cl-nlp":
  Load 1 ASDF system:
    cl-nlp
; Loading "cl-nlp"
;;; Checking for wide character support... yes, using code points.
;;; Checking for wide character support... yes, using code points.
;;; Building Closure with CHARACTER RUNES
..

**++++ Error in NLP.UTIL:UNIQ: 
  The variable #:|table1111365| is unbound.
; *** 1 error detected, no fasl file produced.

What could be wrong?

Lispworks issues with chars.lisp

I've found a couple of compatibility issues between LispWorks and the chars.lisp file in src/utils/.

The first was in the +WHITE-CHARS+ parameter: LispWorks uses #\NO-BREAK-SPACE, so I did:

(defparameter +white-chars+
  '(#\Space #\Tab #\Newline #\Return #\Linefeed
    ;; lispworks uses #\no-break-space
    #+(and lispworks unicode) #\no-break-space
    #+(or (and sbcl sb-unicode) (and allegro ics) (and clisp i18n)
          (and openmcl openmcl-unicode-strings))
    #\no-break_space)
  "Chars considered WHITESPACE.")

I expect there may be a better way to do this that fits with your project coding standards but I leave that integration to you. Example used for fix: link to CLSQL project
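One possible alternative to per-implementation character names is to build the character from its code point. This is only a sketch: the variable name +white-chars-sketch+ is hypothetical, and it assumes the implementation maps character codes to Unicode code points, so that code 160 (#xA0) is the no-break space.

```lisp
;; Hypothetical variant of +white-chars+ that avoids the reader
;; conditionals by constructing the no-break space from its code.
(defparameter +white-chars-sketch+
  (remove nil
          (list #\Space #\Tab #\Newline #\Return #\Linefeed
                ;; CODE-CHAR may return NIL when no such character
                ;; exists; REMOVE drops it in that case.
                (when (> char-code-limit #xA0)
                  (code-char #xA0))))
  "Chars considered WHITESPACE (sketch without implementation-specific names).")
```

The trade-off is that the list is now computed at load time rather than written as a quoted literal.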

Once I put that in, the compile got further into the file and I hit a character encoding issue. Some of the quotation characters are multi-byte characters that LispWorks can't read properly. Emacs appears to have no problem displaying them, but when the file is opened in LispWorks they are not displayed properly and the compiler can't read them. LispWorks uses UTF-16 internally, so if the characters you are using have char-codes that are the same across UTF-8 and UTF-16, that might work. There may also be a more elegant solution, but I don't know enough about how LispWorks treats the characters to figure anything else out.

I may just switch over to sbcl to test out this project. What lisp implementation are you developing in?

--Eric

Add CI support

Initially start off with Travis CI using SBCL 64 bit only.
Later on expand to more lisps.
Finally also use test-cl-grid.

dict-lemmatizer fails to build

The error is below. Kindly resolve most urgently.

;;; Error:
;;; in file dict-lemmatizer.lisp, position 709
;;; at (DEFMETHOD LEMMATIZE ...)
;;; * The macro form (DEFMETHOD LEMMATIZE ((LEMMATIZER MEM-DICT) WORD &OPTIONAL POS) (UNLESS (LOOKUP (SMART-SLOT-VALUE LEMMATIZER 'WORDS) WORD) (RETURN-FROM LEMMATIZE WORD)) (LET ((POSS (POS-TAGS LEMMATIZER WORD) :TEST 'EQUALP)) (IF-IT (OR (MEMBER POS POSS) (UNLESS POS (MEMBER-IF (RUTILS.READTABLE::TRIVIAL-POSITIONAL-LAMBDA (= 2 (LENGTH (PRINC-TO-STRING (? % 0))))) POSS))) (VALUES WORD IT) (WITH ((WORD-POS PRESENT? (IF POS (COND-IT ((? (SMART-SLOT-VALUE LEMMATIZER 'FORMS) (WORD/POS WORD POS)) (VALUES IT T)) ((? (SMART-SLOT-VALUE LEMMATIZER 'FORMS) (WORD/POS WORD (FIRST (MKLIST POS)))) (VALUES (ARGMAX 'IDENTITY IT :KEY (RUTILS.READTABLE::TRIVIAL-POSITIONAL-LAMBDA (PRECEDENCE LEMMATIZER (? % 0 0)))) T))) (|GET#| WORD (SMART-SLOT-VALUE LEMMATIZER 'FORMS))))) (:= WORD-POS (REMOVE-DUPLICATES WORD-POS :TEST 'EQUALP)) (IF PRESENT? (VALUES (? WORD-POS 0 0) (? WORD-POS 0 1) (REST WORD-POS)) (VALUES NIL NIL (MAPCAR (RUTILS.READTABLE::TRIVIAL-POSITIONAL-LAMBDA (PAIR WORD %)) POSS))))))) was not expanded successfully.
;;; Error detected:
;;; In form
;;; (LET ((POSS (POS-TAGS LEMMATIZER WORD) :TEST 'EQUALP)) (IF-IT (OR (MEMBER POS POSS) (UNLESS POS (MEMBER-IF (RUTILS.READTABLE::TRIVIAL-POSITIONAL-LAMBDA (= 2 (LENGTH (PRINC-TO-STRING (? % 0))))) POSS))) (VALUES WORD IT) (WITH ((WORD-POS PRESENT? (IF POS (COND-IT ((? (SMART-SLOT-VALUE LEMMATIZER 'FORMS) (WORD/POS WORD POS)) (VALUES IT T)) ((? (SMART-SLOT-VALUE LEMMATIZER 'FORMS) (WORD/POS WORD (FIRST (MKLIST POS)))) (VALUES (ARGMAX 'IDENTITY IT :KEY (RUTILS.READTABLE::TRIVIAL-POSITIONAL-LAMBDA (PRECEDENCE LEMMATIZER (? % 0 0)))) T))) (|GET#| WORD (SMART-SLOT-VALUE LEMMATIZER 'FORMS))))) (:= WORD-POS (REMOVE-DUPLICATES WORD-POS :TEST 'EQUALP)) (IF PRESENT? (VALUES (? WORD-POS 0 0) (? WORD-POS 0 1) (REST WORD-POS)) (VALUES NIL NIL (MAPCAR (RUTILS.READTABLE::TRIVIAL-POSITIONAL-LAMBDA (PAIR WORD %)) POSS))))))
;;; LET: Ill formed declaration.
Condition of type: COMPILE-FILE-ERROR
COMPILE-FILE-ERROR while compiling #<cl-source-file "cl-nlp" "src" "lexics" "dict-lemmatizer">

Available restarts:

(RETRY) Retry compiling #<cl-source-file "cl-nlp" "src" "lexics" "dict-lemmatizer">.
(ACCEPT) Continue, treating compiling #<cl-source-file "cl-nlp" "src" "lexics" "dict-lemmatizer"> as having been successful.
(RETRY) Retry ASDF operation.
(CLEAR-CONFIGURATION-AND-RETRY) Retry ASDF operation after resetting the configuration.
(ABORT) Give up on "cl-nlp"
(REGISTER-LOCAL-PROJECTS) Register local projects and try again.
(RESTART-TOPLEVEL) Go back to Top-Level REPL.
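The compiler message points at the binding (POSS (POS-TAGS LEMMATIZER WORD) :TEST 'EQUALP): a LET binding may hold only a variable and one init form, so the trailing :TEST 'EQUALP makes it ill formed. The sketch below shows a well-formed binding under the assumption that a call such as remove-duplicates was meant to wrap the tag list; that is a guess, and pos-tags-stub is a hypothetical stand-in for the real pos-tags.

```lisp
;; Hypothetical stand-in for POS-TAGS, for illustration only.
(defun pos-tags-stub (word)
  (declare (ignore word))
  (list "NN" "nn" "VB"))

;; Ill formed (extra forms after the init form):
;;   (let ((poss (pos-tags lemmatizer word) :test 'equalp)) ...)
;;
;; Well formed, ASSUMING the keyword belonged to a wrapping call:
(defun distinct-tags (word)
  (let ((poss (remove-duplicates (pos-tags-stub word) :test 'equalp)))
    poss))
```

Some implementations accept or silently ignore the malformed binding, which may be why the problem shows up only on stricter compilers.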

Add groups and back reference support in regex tokenizers

Since cl-ppcre supports groups and back references, add support for them to the regex tokenizers.
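As a rough illustration of what this could look like, the sketch below collects each match together with its register groups using cl-ppcre:do-scans. It assumes cl-ppcre is already loaded (for example via (ql:quickload :cl-ppcre)); the function name tokenize-with-groups is hypothetical and is not part of cl-nlp.

```lisp
;; Hypothetical helper: return a list of (match groups) entries,
;; where GROUPS is a list of the register strings (or NIL for a
;; register that did not participate in the match).
(defun tokenize-with-groups (regex string)
  (let ((tokens '()))
    (cl-ppcre:do-scans (match-start match-end reg-starts reg-ends
                        regex string (nreverse tokens))
      (push (list (subseq string match-start match-end)
                  (map 'list
                       (lambda (start end)
                         (when start (subseq string start end)))
                       reg-starts reg-ends))
            tokens))))

;; Groups:
;; (tokenize-with-groups "(\\w+)@(\\w+)" "a@b c@d")
;; => (("a@b" ("a" "b")) ("c@d" ("c" "d")))
;;
;; Back reference (\1 must repeat what group 1 matched):
;; (tokenize-with-groups "(\\w)\\1" "hello book")
;; => (("ll" ("l")) ("oo" ("o")))
```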

Travis Disk space error

Travis Build 2 https://travis-ci.org/vseloved/cl-nlp/builds/40830896 shows a no disk space error.

Fix existing tests

Fix the tests that do not pass.
Add them to the system's test operation.

ngrams

As far as I understood, the functions for computing ngrams are not in the core.nlp package yet, right? I found them only in the nltk files:

http://lisp-univ-etc.blogspot.com.br/2013/02/nltk-13-computing-with-language-simple.html

Do you already have a function for computing ngrams over a given set of text files (one sentence per line)?
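For reference, here is a minimal sketch of what is being asked for, written in plain Common Lisp. It does not use cl-nlp's API; all three function names are hypothetical, and tokens are obtained by naive splitting on single spaces.

```lisp
;; Split LINE on spaces, dropping empty tokens.
(defun split-on-space (line)
  (loop with start = 0
        for end = (position #\Space line :start start)
        for token = (subseq line start end)
        unless (string= token "") collect token
        while end
        do (setf start (1+ end))))

;; Return all contiguous N-element sublists of TOKENS.
(defun ngrams (n tokens)
  (loop for tail on tokens
        while (nthcdr (1- n) tail)
        collect (subseq tail 0 n)))

;; Count n-grams in a file with one sentence per line.
;; Returns an EQUAL hash table mapping n-gram lists to counts.
(defun count-ngrams-in-file (n path)
  (let ((counts (make-hash-table :test 'equal)))
    (with-open-file (in path)
      (loop for line = (read-line in nil)
            while line
            do (dolist (gram (ngrams n (split-on-space line)))
                 (incf (gethash gram counts 0)))))
    counts))

;; (ngrams 2 '("a" "b" "c")) => (("a" "b") ("b" "c"))
```

Because n-grams are built per line, they never span sentence boundaries.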

Compilation errors with latest Quicklisp

After updating my Quicklisp installation with:

(ql:update-dist "quicklisp")

I can no longer load cl-nlp with

 (ql:quickload "cl-nlp")

Drakma has moved

Drakma has moved from weitz.de to GitHub: https://github.com/edicl/drakma

Please update the README with the correct link.

Training/ML backed tokenizers

Add ML-backed tokenizers, starting with the perceptron model.

v.1.0 checklist

For v.1.0

Post-1.0 experiments

Implement NNSE word embeddings
Implement AMR parsing (using stack-buffer-parser)
Implement Anchor topic model
Maybe, use AROW instead of AvgPerceptron (see cl-online-learning)
Implement classifier calibration

Condition handler for multiple sentences for word tokenizer?

As per the current word tokenizer contract, it handles only a single sentence. Should we add a condition handler for the case where multiple sentences are submitted for tokenization?
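One shape this could take is a dedicated warning condition that callers may handle or ignore. The sketch below is only an illustration: every name in it is hypothetical, and the sentence count is a deliberately naive heuristic (it miscounts abbreviations such as "Mr. Smith").

```lisp
;; Hypothetical condition signaled when the input looks like
;; more than one sentence.
(define-condition multiple-sentences-warning (warning)
  ((text :initarg :text :reader warning-text))
  (:report (lambda (condition stream)
             (format stream "Input appears to contain several sentences: ~S"
                     (warning-text condition)))))

;; Naive heuristic: count ., ! or ? followed by a space, plus one.
(defun naive-sentence-count (text)
  (1+ (loop for i from 0 below (1- (length text))
            count (and (find (char text i) ".!?")
                       (char= (char text (1+ i)) #\Space)))))

;; Warn, then return TEXT unchanged so tokenization can proceed.
(defun check-single-sentence (text)
  (when (> (naive-sentence-count text) 1)
    (warn 'multiple-sentences-warning :text text))
  text)

;; A caller that wants to reject such input can write:
;; (handler-case (check-single-sentence "One. Two.")
;;   (multiple-sentences-warning () :rejected))
```

Using a warning rather than an error keeps the current behavior by default while still letting strict callers intercept the case.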

Following the instructions on writing a POS tagger results in error on text-tokens (CCL::UNDEFINED-FUNCTION-CALL).

The instructions in docs/user-guide/examples/eng-pos-tagger.md fail:

The following code:

NLP> (let ((words-dist #h(equal))
       (map-corpus :ptb-tagged (corpus-file "ptb/TAGGED/POS/WSJ")
                   #`(dolist (sent (text-tokens %))
                       (dolist (tok sent)
                         (unless (in# (token-word tok) words-dist)
                           (:= (get# (token-word tok) words-dist) #h()))
                         (:+ (get# (token-tag tok)
                                   (get# (token-word tok) words-dist)
                                   0))))
                   :ext "POS")
       words-dist)
#<HASH-TABLE :TEST EQUAL :COUNT 51457 {10467E6543}>
NLP> (reduce #'+ (mapcan #'ht-vals (ht-vals *)))
1289201

... appears to be two separate forms: the let form and the reduce form.

The let form appears to be unbalanced: it lacks one closing parenthesis.
If we add a closing parenthesis after (words-dist #h(equal)), two errors appear:

NLP> (let ((words-dist #h(equal)))
       (map-corpus :ptb-tagged (corpus-file "ptb/TAGGED/POS/WSJ")
                   #`(dolist (sent (text-tokens %))
                       (dolist (tok sent)
                         (unless (in# (token-word tok) words-dist)
                           (:= (get# (token-word tok) words-dist) #h()))
                         (:+ (get# (token-tag tok)
                                   (get# (token-word tok) words-dist)
                                   0))))
                   :ext "POS")
       words-dist)

The first error is that there is no file WSJ under corpora/ptb/TAGGED/POS/.

But if we change it to an existing corpus under corpora/, such as "onf-wsj":

NLP> (let ((words-dist #h(equal)))
       (map-corpus :ptb-tagged (corpus-file "onf-wsj")
                   #`(dolist (sent (text-tokens %))
                       (dolist (tok sent)
                         (unless (in# (token-word tok) words-dist)
                           (:= (get# (token-word tok) words-dist) #h()))
                         (:+ (get# (token-tag tok)
                                   (get# (token-word tok) words-dist)
                                   0))))
                   :ext "POS")
       words-dist)

Then CCL::UNDEFINED-FUNCTION-CALL is signaled: the function text-tokens is not defined.

Any clues?

I'm using Clozure Common Lisp 1.10 on Windows 8 64-bit. Under SBCL, just running the first let form produced a thread error.
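A quick way to check whether a function with that or a similar name exists in the running image is to combine apropos-list with fboundp. This is a generic sketch; the helper name fbound-candidates is hypothetical.

```lisp
;; Hypothetical helper: return every symbol whose name contains
;; NAME and that currently names a function or macro.
(defun fbound-candidates (name)
  (remove-if-not #'fboundp (apropos-list name)))

;; Usage at the REPL, after loading cl-nlp:
;; (fbound-candidates "TEXT-TOKENS")
;; (fbound-candidates "TOKENS")
```

An empty result for "TEXT-TOKENS" would suggest the function was renamed or removed since the guide was written, while a hit in another package would point to a missing package prefix or import.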

can't load using quicklisp

I had the following problem after cloning the repo and trying to load it using quicklisp:

failed to find the TRUENAME of /Users/arademaker/quicklisp/local-projects/cl-nlp/src/corpora/util.lisp:
No such file or directory
[Condition of type SB-INT:SIMPLE-FILE-ERROR]

In fact, the util.lisp is not in the corpora directory. Any idea?
