Giter VIP home page Giter VIP logo

cl-edit-distance's Introduction

edit-distance

Build Status Coverage Status

Using

Computes the edit distance (aka Levenshtein distance) between two sequences. This is a common distance measure.

For example, to compute the distance between two sequences:

(edit-distance:distance '(1 2 3 4 5) '(2 3 4 5 6))

To compute the sequence of edit operations needed to convert sequence 1 into sequence two, use the diff function

(edit-distance:diff '(1 2 3 4 5) '(2 3 4 5 6))

That will return two values a structure, as follows, and the distance.

((:DELETION 1 NIL) (:MATCH 2 2) (:MATCH 3 3) (:MATCH 4 4) (:MATCH 5 5) (:INSERTION NIL 6))
2

That structure can be printed more readibly with the FORMAT-DIFF function

(edit-distance:format-diff *path*)

Or, you can compute the diff and print it readably together by calling PRINT-DIFF:

(edit-distance:print-diff '(1 2 3 4 5) '(2 3 4 5 6))

which will print a result like this:

seq1: 1   2 3 4 5 *** []
seq2: *** 2 3 4 5 6   []

Several options are available to the FORMAT-DIFF and PRINT-DIFF to print a prefix and suffix for each line. Note that displaying substitutions relys on captialization and so substitutions are not visible for non-alphabetic sequence elements.

Additionally, other equality functions can be used, so this evaluates to a distance of zero:

 (edit-distance:distance '("ONE" "TWO" "THREE") '("one" "two"
 "three") :test 'string-equal)

Any type of sequence may be used, but for speed the input sequences are copied into new arrays. If speed is a major concern make sure to provide simple vectors as your input sequences.

Testing

To test with sbcl, run:

sbcl --eval "(asdf:load-system 'edit-distance-test)" --eval "(unless (lisp-unit:run-tests :all :edit-distance-tests) (uiop:quit 1))"

Creative Commons License
cl-edit-distance by Ben Lambert is licensed under a Creative Commons Attribution 4.0 International License.

cl-edit-distance's People

Contributors

belambert avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar

Watchers

 avatar

cl-edit-distance's Issues

PRINT-DIFFERENCES: #<TYPE-ERROR expected-type: SEQUENCE datum: #\p>

Hi,

first of all, thank you for this library!

Sadly it doesn't seem to work as expected, or at least as expected by me (sorry! ;).

PRINT-DIFFERENCES says Given a 'path' as produced by the above function LEVENSTEIN-DISTANCE function above, but trying to do that I get a type error:

(edit-distance:print-differences 
  (edit-distance:levenshtein-distance
    "foo1"
    "foo2"
    :test #'eql
    :return-path t)
  )
#<TYPE-ERROR expected-type: SEQUENCE datum: #\f>

I guess I'm misunderstanding how that should be used, so perhaps the solution is a better doc-string?!

control the weights

("a seat for one person, with a support for the back; “he put his coat over the back of the chair and sat down” ; "
  "a seat for one person, with a support for the back; \"he put his coat over the back of the chair and sat down\""
  ((:MATCH #\a #\a) (:MATCH #\  #\ ) (:MATCH #\s #\s) (:MATCH #\e #\e)
   (:MATCH #\a #\a) (:MATCH #\t #\t) (:MATCH #\  #\ ) (:MATCH #\f #\f)
   (:MATCH #\o #\o) (:MATCH #\r #\r) (:MATCH #\  #\ ) (:MATCH #\o #\o)
   (:MATCH #\n #\n) (:MATCH #\e #\e) (:MATCH #\  #\ ) (:MATCH #\p #\p)
   (:MATCH #\e #\e) (:MATCH #\r #\r) (:MATCH #\s #\s) (:MATCH #\o #\o)
   (:MATCH #\n #\n) (:MATCH #\, #\,) (:MATCH #\  #\ ) (:MATCH #\w #\w)
   (:MATCH #\i #\i) (:MATCH #\t #\t) (:MATCH #\h #\h) (:MATCH #\  #\ )
   (:MATCH #\a #\a) (:MATCH #\  #\ ) (:MATCH #\s #\s) (:MATCH #\u #\u)
   (:MATCH #\p #\p) (:MATCH #\p #\p) (:MATCH #\o #\o) (:MATCH #\r #\r)
   (:MATCH #\t #\t) (:MATCH #\  #\ ) (:MATCH #\f #\f) (:MATCH #\o #\o)
   (:MATCH #\r #\r) (:MATCH #\  #\ ) (:MATCH #\t #\t) (:MATCH #\h #\h)
   (:MATCH #\e #\e) (:MATCH #\  #\ ) (:MATCH #\b #\b) (:MATCH #\a #\a)
   (:MATCH #\c #\c) (:MATCH #\k #\k) (:MATCH #\; #\;) (:MATCH #\  #\ )
   (:SUBSTITUTION #\LEFT_DOUBLE_QUOTATION_MARK #\") (:MATCH #\h #\h)
   (:MATCH #\e #\e) (:MATCH #\  #\ ) (:MATCH #\p #\p) (:MATCH #\u #\u)
   (:MATCH #\t #\t) (:MATCH #\  #\ ) (:MATCH #\h #\h) (:MATCH #\i #\i)
   (:MATCH #\s #\s) (:MATCH #\  #\ ) (:MATCH #\c #\c) (:MATCH #\o #\o)
   (:MATCH #\a #\a) (:MATCH #\t #\t) (:MATCH #\  #\ ) (:MATCH #\o #\o)
   (:MATCH #\v #\v) (:MATCH #\e #\e) (:MATCH #\r #\r) (:MATCH #\  #\ )
   (:MATCH #\t #\t) (:MATCH #\h #\h) (:MATCH #\e #\e) (:MATCH #\  #\ )
   (:MATCH #\b #\b) (:MATCH #\a #\a) (:MATCH #\c #\c) (:MATCH #\k #\k)
   (:MATCH #\  #\ ) (:MATCH #\o #\o) (:MATCH #\f #\f) (:MATCH #\  #\ )
   (:MATCH #\t #\t) (:MATCH #\h #\h) (:MATCH #\e #\e) (:MATCH #\  #\ )
   (:MATCH #\c #\c) (:MATCH #\h #\h) (:MATCH #\a #\a) (:MATCH #\i #\i)
   (:MATCH #\r #\r) (:MATCH #\  #\ ) (:MATCH #\a #\a) (:MATCH #\n #\n)
   (:MATCH #\d #\d) (:MATCH #\  #\ ) (:MATCH #\s #\s) (:MATCH #\a #\a)
   (:MATCH #\t #\t) (:MATCH #\  #\ ) (:MATCH #\d #\d) (:MATCH #\o #\o)
   (:MATCH #\w #\w) (:MATCH #\n #\n)
   (:DELETION #\RIGHT_DOUBLE_QUOTATION_MARK NIL) (:DELETION #\  NIL)
   (:DELETION #\; NIL) (:SUBSTITUTION #\  #\")))

I would like to control the weights to have In the end of the list above:

   ...
   (:SUBSTITUTION #\RIGHT_DOUBLE_QUOTATION_MARK #\")
   (:DELETION #\  NIL) (:DELETION #\; NIL) (:DELETION #\  NIL))

Idea?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.