Giter VIP home page Giter VIP logo

Comments (12)

c0001 avatar c0001 commented on September 21, 2024 1

I've check the leeoniya/uFuzzy#14 issue which says ufuzzy may
need a more complex regexp specified for its option entry while used
for CJK terms based search, and below is my experimentation snippet:

const ufuzzy = require('@leeoniya/ufuzzy')

let opts = {
    unicode: false,
    interSplit: "(\p{Unified_Ideograph=yes})+"
};

let uf = new ufuzzy(opts)

const list = ['Javascript太棒了', 'Javascript素晴らしい', 'Javascript너무훌륭해훌륭해']

let strs = {cn: '太棒', jp: 'らし', kr: '훌륭'}

for (let i in strs) {
    console.log(i + ':' + uf.filter(list, strs[i]));
}

If I don't specify interSplit with CJK full stack range as above
(which I've taken from here which see Unified_Ideograph),
the search is failed as returning null even if I using the latest
version of uFuzzy. It seems uFuzzy can not use CJK chars for
calculating matches.

I just make a little bit try, hope that's useful.

Could you also send me some text examples which cause the issue?
Then I can use this to replicate.

The contents in above code example may be enough?

There we already have a newer version. I could try a dependency
update to that and if that works, I can create a new release

I've tested this lastest commit yet, still throwing same error.

If not, could you consider creating your report at the uFuzzy GitHub issues?

Yeah, by the investigation, I think that should be but may be
duplicated with leeoniya/uFuzzy#14, thus that may be later or
... any suggestion issued to?

Thanks!
Briant

from search-bookmarks-history-and-tabs.

c0001 avatar c0001 commented on September 21, 2024 1

Of course I will, I've made some changes to that but hadn't got the best way even if following the leeoniya's suggestion to use fully punctuations and whitespace binding to interSplit since the new unterminated chars classes problem is found out. Uh...relax :)

Shall we close this or keep it open? Your call.

I've preferred to halt this issue as Close as not planned since there's no exact fix found here. I halt this now, if any problem is presenting on, whenever reopening is welcom.

from search-bookmarks-history-and-tabs.

Fannon avatar Fannon commented on September 21, 2024

Hi @c0001,

thanks for the report and the investigation!
From what you write, I'm rather sure that the problem is actually coming from the https://github.com/leeoniya/uFuzzy library. There we already have a newer version. I could try a dependency update to that and if that works, I can create a new release. If not, could you consider creating your report at the uFuzzy GitHub issues?

There seems to be one issue already: leeoniya/uFuzzy#14

Could you also send me some text examples which cause the issue? Then I can use this to replicate.

Thanks!
Simon

from search-bookmarks-history-and-tabs.

Fannon avatar Fannon commented on September 21, 2024

Hi Briant,

ok, I got it. So if I can fix this issue by just configuring the interSplit regexp, then I'll do that. I'm just not sure on the consequences on this.

But since you obviously can check out and build this project, I'll create a PR and can merge it - before making a release out of it. Then you and I can test a bit.

from search-bookmarks-history-and-tabs.

Fannon avatar Fannon commented on September 21, 2024

Ok, merged it. For me it seemed to work fine - could you also try out and verify?

from search-bookmarks-history-and-tabs.

c0001 avatar c0001 commented on September 21, 2024

(Sorry for above duplicated commit references caused by my four times rebase, but I don't know how to remove them)

@Fannon

Thanks, that's seems worked for me as well, but I think that's not a usual way since interSplit is not a exposed fixed user option based on uFuzzy's README, and the commentary of interSplit (here) just say that's a term segmentation & punct/whitespace merging and defaults to [^A-Za-z0-9']+ which let me thought of that the p{Unified_Ideograph}=yes regexp is considerred as a divider filter rather than a mactch? If so, the fix of this may not be proper.

Thus, I make a new pull-request to provide a ufuzzyOptions user side customization which let us can do more test and tricks, is that OK?

from search-bookmarks-history-and-tabs.

leeoniya avatar leeoniya commented on September 21, 2024

yes, interSplit should only be used for white-space and punctuation. the default setting excludes apostrophes so that don't does not get matched as don t.

https://stackoverflow.com/questions/4328500/how-can-i-strip-all-punctuation-from-a-string-in-javascript-using-regex

from search-bookmarks-history-and-tabs.

c0001 avatar c0001 commented on September 21, 2024

Hi uFuzzy author @leeoniya :

Thanks for issuing the internal mechanism of option interSplit, and I've checked it with the regexp range covering whitespace and unicode punctuations, and worked for most of cases for example run bellow test.js:

const process = require('process')
const ufuzzy  = require('@leeoniya/ufuzzy')

let opts = {
    interSplit: "(^\\s|\\p{P})+",
};

let uf = new ufuzzy(opts)

const list = ['Javascript 太棒了', 'Javascript素晴らしい', 'Javascript너무훌륭해훌륭해']

let strs = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '太 棒', '素\t晴 ら', '훌”륭\t\t해 훌']

let c = 0
for (let i in strs) {
    let j = uf.filter(list, strs[i])
    if (!j) {
        console.log('\'' + strs[i] + '\'' + ': ' + j);
        c++;
    }
}
if (c!=0) {
    console.log("--> Failure of some letter search, Abort!")
    process.exit(1)
} else {
    console.log("--> All of searching available, Done!")
}

This given the result of

>$ node test.js 
--> All of searching available, Done!

But if I changed interSplit regexp to ([^a-zA-Z']|\\s|\\p{P})+ that plusses your defaults, the result is failure of all CJK patterns:

>$ node test_ufuzzy.js 
'太 棒': null
'素     晴 ら': null
'훌”륭          해 훌': null
--> Failure of some letter search, Abort!

That's why?

With checking uppon my previous post in this issue, if I change the interSplit regexp (\p{Unified_Ideograph=yes})+ to either [^a-zA-Z'\p{Unified_Ideograph=yes}]+ or ([^a-zA-Z']|\p{Unified_Ideograph=yes})+ for a compatibility test, the test is still failure with all of cn,jp and kr search result are null, even if I replace [^a-zA-Z] to purely simple "not" logical regexp char classes bracket (such as [^a]). Unless I remove the ^. It seems like the ^ keyword will also affect the result?

In the other hand, can you give more description of uFuzzy's designation of interSplit and either for its sibling intraSplit since I'm confused about what that is based on the understanding of interSplit which you've pointed out. Or return to the original issue, my question is that does uFuzzy can fully support fuzzy searching with extending Latin/Roman alphabet with CJK terms and any existent workaround can be released out here?

Thanks!
Briant

from search-bookmarks-history-and-tabs.

Fannon avatar Fannon commented on September 21, 2024

Thanks @c0001 for working on further clarifying this!

So I'll revert my (then premature) fix in the master branch and wait until we found a good solution that works well for both Latin/Roman and CJK search items.

I do like the idea of just exposing uFuzzyOptions directly for power users. But I hope that we also find a more general, user friendly fix :)

from search-bookmarks-history-and-tabs.

c0001 avatar c0001 commented on September 21, 2024

According to #91, We have flexible uFuzzyOptions user
customization place now, so the issue can be halt temporarily.

The problem is for those users are not familiar with uFuzzy
specification that how we make a guide for them to be involved in. Or
based on this discussion of #90 , a simple supportCJK may be
approachable for common user. To my wish, let user-end be with
non-feeling of this change (i.e. find a robust way to make both of
them workable internally) may be Best.

Further more for any one first time dating with this problem, this
previous post
may be useful.

Thanks!
Briant

from search-bookmarks-history-and-tabs.

Fannon avatar Fannon commented on September 21, 2024

@c0001 : If you have any further insights how to best configure uFuzzy for your case, feel free to propose a better README fix or mention. In case we find a solution which works general-purpose, we can either have it in by default or if it comes with drawbacks, then have it as a dedicated option (like enableCJK: true)

Shall we close this or keep it open? Your call.

from search-bookmarks-history-and-tabs.

Fannon avatar Fannon commented on September 21, 2024

Ok, great. Thanks again!
I've now created a new release and submitted it to the stores. Usually this takes a few days.

from search-bookmarks-history-and-tabs.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.