Giter VIP home page Giter VIP logo

liblevenshtein-coffeescript's People

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

liblevenshtein-coffeescript's Issues

How to preserve the order of transduced items?

First of all thanks for sharing your work with the world!

I'm using the Javascript version, and I have a question regarding the .dictionary(locations, false) bit here:

(code below is in es6)

import levenshtein from 'liblevenshtein';
const transducer = new levenshtein
  .Builder()
  .dictionary(locations, false) // What does false mean?
  .algorithm('transposition')
  .include_distance(false)
  .build();
...
const matches = transducer.transduce(query, 1);

When should I pass true and when should I pass false to .dictionary()?

How could I guarantee that matches are in the same order as locations?

For example, if query = 'elbourne', and locations is:

['melbourne', 'sydney', 'london', 'selbourne', 'adelaide', 'delbourne'];

I want to make sure that matches is:

['melbourne', 'selbourne', 'delbourne']

(the order of locations is preserved in matches)

How to make it seach a database?

This is good work. Wondering how to extend this to make it search against a database (instead of against in-memory dictionary)?

For example, if the original dictionary of strings were stored in, say, ElasticSearch or MongoDB etc. what is the best way to utilize this ?

Construction of Dawg Stomps on Prefixes

Please see that the following test will fail:

// https://github.com/universal-automata/liblevenshtein-coffeescript/blob/master/src/collection/dawg.coffee
// Test 
var dictionary = [ "about", "after", "all", "and", "about and", "afterall" ];
var dawg = new Dawg(dictionary);
console.log(dawg.accepts("after"));

This fails because the construction of the Dawg is stomping on the nodes of prefixes, thus causing the 'is_final' attribute to be set to false.

The important thing to realize is that the original article which this algorithm is patterned on requires that a list be sorted before it is added. The reason for this is because that is the only way to be certain that the previous word has the possibility of being a prefix to the next word being inserted.

This won't work because there is no guarantee that, given an unsorted list, the previous word will share a prefix with the current word. (That guarantee is ensured by having a presorted list):

# Find longest common prefix between word and previous word
i = 0; previous_word = @previous_word
upper_bound =
  if word.length < previous_word.length
    word.length
  else
    previous_word.length

i += 1 while i < upper_bound and word[i] is previous_word[i]

As far as I can tell, the only way to construct a Dawg from an unsorted list would be to construct the entire trie first, then compress it. The reason that the referenced algorithm is so efficient is because pre-sorting guarantees that a trie branch can be minimized directly after construction.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.