Comments (12)
I've checked issue leeoniya/uFuzzy#14, which says uFuzzy may
need a more complex regexp in its options when used
for CJK term searches. Below is my experimental snippet:
const ufuzzy = require('@leeoniya/ufuzzy')
let opts = {
  unicode: false,
  interSplit: "(\p{Unified_Ideograph=yes})+"
};
let uf = new ufuzzy(opts)
const list = ['Javascript太棒了', 'Javascript素晴らしい', 'Javascript너무훌륭해훌륭해']
let strs = {cn: '太棒', jp: 'らし', kr: '훌륭'}
for (let i in strs) {
  console.log(i + ':' + uf.filter(list, strs[i]));
}
If I don't specify interSplit with the full CJK range as above
(taken from here; see Unified_Ideograph), the search fails and
returns null, even with the latest version of uFuzzy. It seems
uFuzzy cannot use CJK characters when calculating matches.
This was just a quick experiment; I hope it's useful.
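One detail that may matter when reproducing this: a single backslash before p in a JavaScript string literal is dropped, and Unicode property escapes are only recognized when the regexp carries the u flag. The sketch below uses plain RegExp objects, not uFuzzy itself, and assumes the bare binary-property form \p{Unified_Ideograph}; whether uFuzzy's unicode option is what toggles the flag is also an assumption here.

```javascript
// "\p" is not a recognized string escape, so the backslash is dropped.
const single = "(\p{Unified_Ideograph})+";
// Doubling the backslash keeps it in the resulting pattern text.
const doubled = "(\\p{Unified_Ideograph})+";

console.log(single);  // (p{Unified_Ideograph})+
console.log(doubled); // (\p{Unified_Ideograph})+

// Property escapes only take effect with the "u" flag.
const withFlag = new RegExp(doubled, "u");
// Without the flag, the same text is matched as literal characters.
const withoutFlag = new RegExp(doubled);

console.log(withFlag.test("太"));    // true
console.log(withoutFlag.test("太")); // false
console.log(withFlag.test("ら"));    // false: hiragana is not a unified ideograph
```

Note that even the flagged pattern covers only Han ideographs, so kana and Hangul would need their own ranges.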
> Could you also send me some text examples which cause the issue?
> Then I can use this to replicate.
Would the contents of the code example above be enough?
> There we already have a newer version. I could try a dependency
> update to that and if that works, I can create a new release
I've already tested the latest commit, and it still throws the same error.
> If not, could you consider creating your report at the uFuzzy GitHub issues?
Yes. Based on my investigation I think I should, but it may
duplicate leeoniya/uFuzzy#14, so I may do that later, or
... any suggestion on where to file it?
Thanks!
Briant
from search-bookmarks-history-and-tabs.
Of course I will. I've made some changes to that, but I haven't found the best approach yet, even when following leeoniya's suggestion to bind only punctuation and whitespace to interSplit, since a new problem with unterminated character classes turned up. Uh... relax :)
> Shall we close this or keep it open? Your call.
I'd prefer to close this issue as "Close as not planned",
since no exact fix has been found here. I'm closing it now; if any problem comes up, reopening is welcome.
from search-bookmarks-history-and-tabs.
Hi @c0001,
thanks for the report and the investigation!
From what you write, I'm rather sure that the problem is actually coming from the https://github.com/leeoniya/uFuzzy library. There we already have a newer version. I could try a dependency update to that and if that works, I can create a new release. If not, could you consider creating your report at the uFuzzy GitHub issues?
There seems to be one issue already: leeoniya/uFuzzy#14
Could you also send me some text examples which cause the issue? Then I can use this to replicate.
Thanks!
Simon
from search-bookmarks-history-and-tabs.
Hi Briant,
ok, I got it. So if I can fix this issue by just configuring the interSplit regexp, then I'll do that. I'm just not sure about the consequences of this.
But since you obviously can check out and build this project, I'll create a PR and can merge it - before making a release out of it. Then you and I can test a bit.
from search-bookmarks-history-and-tabs.
Ok, merged it. For me it seemed to work fine. Could you also try it out and verify?
from search-bookmarks-history-and-tabs.
(Sorry for the duplicated commit references above, caused by my rebasing four times; I don't know how to remove them.)
Thanks, that seems to work for me as well, but I don't think it's the usual way, since interSplit is not an exposed, fixed user option according to uFuzzy's README, and the comment on interSplit (here) just says it is for "term segmentation & punct/whitespace merging" and defaults to [^A-Za-z0-9']+. That made me think the \p{Unified_Ideograph=yes} regexp is treated as a divider filter rather than a match. If so, this fix may not be proper.
Thus, I've opened a new pull request to provide a ufuzzyOptions user-side customization, which lets us do more tests and tricks. Is that OK?
from search-bookmarks-history-and-tabs.
yes, interSplit should only be used for whitespace and punctuation. the default setting excludes apostrophes so that "don't" does not get matched as "don t".
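The apostrophe point can be illustrated with a plain String.prototype.split, using the default class quoted earlier in the thread as the separator. This is only a sketch of the separator idea, not of how uFuzzy applies interSplit internally; the variant without the apostrophe is hypothetical and included just for comparison.

```javascript
// Separator class quoted in the thread as the default: apostrophes are NOT separators.
const withApostrophe = /[^A-Za-z0-9']+/;
// Hypothetical variant for comparison: apostrophes ARE separators.
const withoutApostrophe = /[^A-Za-z0-9]+/;

const keepsTerm = "don't stop".split(withApostrophe);
const splitsTerm = "don't stop".split(withoutApostrophe);

console.log(keepsTerm);  // [ "don't", 'stop' ]
console.log(splitsTerm); // [ 'don', 't', 'stop' ]
```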
from search-bookmarks-history-and-tabs.
Thanks for explaining the internal mechanism of the interSplit option. I've checked it with a regexp range covering whitespace and Unicode punctuation, and it worked in most cases, for example when running the test.js below:
const process = require('process')
const ufuzzy = require('@leeoniya/ufuzzy')
let opts = {
  interSplit: "(^\\s|\\p{P})+",
};
let uf = new ufuzzy(opts)
const list = ['Javascript 太棒了', 'Javascript素晴らしい', 'Javascript너무훌륭해훌륭해']
let strs = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '太 棒', '素\t晴 ら', '훌”륭\t\t해 훌']
let c = 0
for (let i in strs) {
  let j = uf.filter(list, strs[i])
  if (!j) {
    console.log('\'' + strs[i] + '\'' + ': ' + j);
    c++;
  }
}
if (c!=0) {
  console.log("--> Failure of some letter search, Abort!")
  process.exit(1)
} else {
  console.log("--> All of searching available, Done!")
}
This gives the result:
>$ node test.js
--> All of searching available, Done!
But if I change the interSplit regexp to ([^a-zA-Z']|\\s|\\p{P})+, which adds your defaults, all of the CJK patterns fail:
>$ node test_ufuzzy.js
'太 棒': null
'素 晴 ら': null
'훌”륭 해 훌': null
--> Failure of some letter search, Abort!
Why is that?
Going back to my previous post in this issue: if I change the interSplit regexp (\p{Unified_Ideograph=yes})+ to either [^a-zA-Z'\p{Unified_Ideograph=yes}]+ or ([^a-zA-Z']|\p{Unified_Ideograph=yes})+ for a compatibility test, the test still fails, with the cn, jp and kr search results all null, even if I replace [^a-zA-Z] with a plain negated character class (such as [^a]). It only works if I remove the ^. It seems the ^ also affects the result?
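A possible explanation for the effect of ^, sketched with plain regexps rather than uFuzzy: inside brackets, ^ negates the class, so [^a-zA-Z'] matches every character that is not a Latin letter or apostrophe, CJK characters included. If such a class is used as a separator, the CJK text itself is consumed as separator material. The variable names below are illustrative only.

```javascript
const cjk = ["太", "ら", "훌"];

// A negated class matches anything NOT listed, so CJK characters match it.
const negated = /[^a-zA-Z']/;
console.log(cjk.map((ch) => negated.test(ch))); // [ true, true, true ]

// As a separator, the negated class swallows the CJK part of the string.
const sepWithNegation = /(?:[^a-zA-Z']|\s|\p{P})+/u;
// Whitespace and punctuation alone leave the CJK part intact.
const sepWithoutNegation = /(?:\s|\p{P})+/u;

console.log("Javascript 太棒了".split(sepWithNegation));    // [ 'Javascript', '' ]
console.log("Javascript 太棒了".split(sepWithoutNegation)); // [ 'Javascript', '太棒了' ]
```

This would be consistent with the observation above that the searches only succeed once the ^ is removed.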
On the other hand, could you describe uFuzzy's design of interSplit and its sibling intraSplit in more detail? I'm confused about what intraSplit is, given the understanding of interSplit that you've pointed out. Or, to return to the original issue, my question is: can uFuzzy fully support fuzzy searching that extends the Latin/Roman alphabet with CJK terms, and is there any existing workaround that can be shared here?
Thanks!
Briant
from search-bookmarks-history-and-tabs.
Thanks @c0001 for working on further clarifying this!
So I'll revert my (then premature) fix in the master branch and wait until we have found a good solution that works well for both Latin/Roman and CJK search items.
I do like the idea of just exposing uFuzzyOptions directly for power users. But I hope that we also find a more general, user-friendly fix :)
from search-bookmarks-history-and-tabs.
According to #91, we now have a flexible uFuzzyOptions user
customization point, so this issue can be put on hold for now.
The problem is how to write a guide for users who are not familiar
with the uFuzzy specification. Or, based on the discussion in #90,
a simple supportCJK option may be more approachable for the common
user. My wish is that end users would not notice this change at all
(i.e. we find a robust way to make both work internally); that
would be best.
Furthermore, for anyone encountering this problem for the first
time, this previous post may be useful.
Thanks!
Briant
from search-bookmarks-history-and-tabs.
@c0001: If you have any further insights on how best to configure uFuzzy for your case, feel free to propose a better README fix or mention. If we find a solution that works for general purposes, we can either enable it by default or, if it comes with drawbacks, offer it as a dedicated option (like enableCJK: true).
Shall we close this or keep it open? Your call.
from search-bookmarks-history-and-tabs.
Ok, great. Thanks again!
I've now created a new release and submitted it to the stores. Usually this takes a few days.
from search-bookmarks-history-and-tabs.