
lexpy's People

Contributors

aosingh, tomsonboylett


lexpy's Issues

adding word information

As an update, is there any way to attach some information to a word, for example its word count in a text or its POS tag, so that a search returns both the matched word and its value?

Example input:
arc:1200
art:1450
bar:2300

Example output for dawg.search_with_prefix("a"):
[arc:1200,art:1450]
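
To make the request concrete, here is a minimal sketch of a prefix search that returns each word together with a stored value. It uses a plain dictionary-based trie, not lexpy's DAWG, and the names ValueTrie, add and search_with_prefix are hypothetical and chosen only for illustration.

```python
class ValueTrie:
    """Toy trie that stores one value per word (not lexpy's implementation)."""

    _END = None  # sentinel key holding the value; never collides with a character

    def __init__(self):
        self._root = {}

    def add(self, word, value):
        node = self._root
        for ch in word:
            node = node.setdefault(ch, {})
        node[self._END] = value

    def search_with_prefix(self, prefix):
        # Walk down to the node that represents the prefix.
        node = self._root
        for ch in prefix:
            if ch not in node:
                return []
            node = node[ch]
        # Collect every (word, value) pair below that node.
        found = []
        stack = [(prefix, node)]
        while stack:
            word, current = stack.pop()
            for key, child in current.items():
                if key is self._END:
                    found.append((word, child))
                else:
                    stack.append((word + key, child))
        return sorted(found, key=lambda pair: pair[0])


trie = ValueTrie()
for entry in ["arc:1200", "art:1450", "bar:2300"]:
    word, count = entry.split(":")
    trie.add(word, int(count))

print(trie.search_with_prefix("a"))
```

Storing values in a minimized DAWG is harder than in a trie, because words that share a suffix share nodes, so a per-node value would be shared as well.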

The wildcard pattern `?*` should be treated the same way as `*?`

In Lexpy, ? means "zero or one character" and * means "zero or more characters". Based on this, why is the pattern ?* considered illegal while *? is allowed? Don't they both have the same semantics here:

*?: zero or more || zero or one -> zero || zero, zero || one, more || zero, more || one -> zero, one, more -> zero or more
?*: zero or one || zero or more -> zero || zero, zero || more, one || zero, one || more -> zero, more, one -> zero or more

The code at _utils.py#L15 already translates *? to *, why isn't this also done for ?*?

result = re.sub('(\*\?)+', '*', result) # Replace consecutive '*?' with a single group '*'
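
Under the semantics described above, any run of wildcards that contains at least one * is equivalent to a single *. A possible normalization, sketched here as a standalone function and not as a patch to _utils.py, could look like this. The name normalize_wildcards is hypothetical.

```python
import re


def normalize_wildcards(pattern):
    """Collapse every run of '*' and '?' that contains a '*' into one '*'.

    Assumes '?' means "zero or one character" and '*' means
    "zero or more characters", as stated in this issue.
    """
    return re.sub(r'[*?]*\*[*?]*', '*', pattern)


for raw in ["a*?b", "a?*b", "a?*?*b", "a??b"]:
    print(raw, "->", normalize_wildcards(raw))
```

Runs made only of ? are left untouched, since "zero to two characters" is not the same as "zero or more".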

Incorrect order of answers when using the wildcard '*' in DAWG

Hi,

I wonder if there is a small issue in the file automata.py, function __words_with_wildcard, between lines 128 and 147, when the case letter=='*' is processed.

If the dictionary contains, for example, "CHIAC" and "CHIC", and the query is "CHI*C", the results are returned in the wrong alphabetical order: "CHIC", then "CHIAC".

This is because the case words_at_current_level is processed before checking the children.

So, for "CHI*C",

  • words_at_current_level will first find "CHIC"
  • then the loop for child in node.children will find "CHIAC", resulting in an incorrect order of answers.

Any ideas? Or did I misunderstand the code?

Best,
Lionel
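
The ordering effect described above can be reproduced with a small standalone sketch. It uses a toy dictionary-based trie, not the code in automata.py, and build_trie and wildcard_search are hypothetical names. The zero-character branch of * is still tried first, but the matches are collected in a set and sorted before they are returned.

```python
END = None  # sentinel key marking the end of a word


def build_trie(words):
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def wildcard_search(root, pattern):
    """'?' matches zero or one character, '*' matches zero or more."""
    found = set()

    def walk(node, i, prefix):
        if i == len(pattern):
            if END in node:
                found.add(prefix)
            return
        symbol = pattern[i]
        if symbol in "*?":
            # Zero characters: this branch runs before the children,
            # which is why "CHIC" is discovered before "CHIAC".
            walk(node, i + 1, prefix)
            # '*' keeps its position in the pattern, '?' moves on.
            next_i = i if symbol == "*" else i + 1
            for ch, child in node.items():
                if ch is not END:
                    walk(child, next_i, prefix + ch)
        elif symbol in node:
            walk(node[symbol], i + 1, prefix + symbol)

    walk(root, 0, "")
    return sorted(found)  # sorting restores alphabetical order


print(wildcard_search(build_trie(["CHIAC", "CHIC"]), "CHI*C"))
```

Swapping the two branches would not be enough in general, because the matches from the zero-character branch and from the children can interleave alphabetically.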

search_with_suffix function

Do you think there's a way to implement a search_with_suffix function that looks for words in the DAWG that end with a given suffix? Also, is there a way to search the DAWG for words that contain a substring? For instance, if I wanted words that contain the substring "ST", the function would return "first", "star", and "sophisticated". Thanks!
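
One common way to get suffix search is to index the reversed words, so that a suffix query becomes a prefix query. The sketch below illustrates the idea with a sorted list and the standard bisect module in place of a DAWG. SuffixIndex and its methods are hypothetical names, and the substring search is a plain linear scan.

```python
import bisect


class SuffixIndex:
    """Toy index over reversed words (not part of lexpy)."""

    def __init__(self, words):
        self._words = sorted(set(words))
        self._reversed = sorted(w[::-1] for w in self._words)

    def search_with_suffix(self, suffix):
        key = suffix[::-1]
        # All reversed words that start with the key form one contiguous run.
        i = bisect.bisect_left(self._reversed, key)
        matches = []
        while i < len(self._reversed) and self._reversed[i].startswith(key):
            matches.append(self._reversed[i][::-1])
            i += 1
        return sorted(matches)

    def search_containing(self, substring):
        # Linear scan; a real index would need a suffix-based structure.
        return [w for w in self._words if substring in w]


index = SuffixIndex(["first", "star", "sophisticated", "tree"])
print(index.search_with_suffix("st"))
print(index.search_containing("st"))
```

The comparison is case-sensitive, so the query "ST" would need to be lowercased to match these words.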

Dawg nodeid issue

Hi,

There is an issue with your DAWG data structure: the node ID never goes above 2. I printed the node ID and value during insertion, and this is the output I got:

id and val at current is 1
id and val at current is 2 a
id and val at current is 2 n
id and val at current is 2 p
id and val at current is 2 e
id and val at current is 2 p
id and val at current is 2 l
id and val at current is 2 e
id and val at current is 2 b
id and val at current is 2 a
id and val at current is 2 n
id and val at current is 2 a
id and val at current is 2 n
id and val at current is 2 a
id and val at current is 2 t
id and val at current is 2 c
id and val at current is 2 a
id and val at current is 2 n
id and val at current is 2 a
id and val at current is 2 n
id and val at current is 2 a

It does not create a new child ID after the first child of the root.
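
For comparison, here is a minimal sketch of the behaviour the report expects, where every newly created node receives its own ID. It is a plain trie without DAWG minimization and says nothing about how lexpy assigns IDs internally. Node, Trie and all_ids are hypothetical names.

```python
import itertools


class Node:
    def __init__(self, node_id, val=""):
        self.id = node_id
        self.val = val
        self.children = {}
        self.eow = False  # end-of-word flag


class Trie:
    def __init__(self):
        self._ids = itertools.count(1)  # one counter shared by all nodes
        self.root = Node(next(self._ids))

    def insert(self, word):
        node = self.root
        for ch in word:
            if ch not in node.children:
                # Each new node takes the next value from the counter.
                node.children[ch] = Node(next(self._ids), ch)
            node = node.children[ch]
        node.eow = True

    def all_ids(self):
        ids, stack = [], [self.root]
        while stack:
            node = stack.pop()
            ids.append(node.id)
            stack.extend(node.children.values())
        return sorted(ids)


trie = Trie()
for word in ["an", "apple"]:
    trie.insert(word)
print(trie.all_ids())
```

In a minimized DAWG, equivalent nodes are merged after insertion, so the number of distinct IDs can be lower than in this trie, but it should still grow beyond 2 for the words listed above.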
