Comments (4)
The problem can also be reproduced by changing the caseInsensitive test case to:
    @Test
    public void caseInsensitive() {
        Trie trie = new Trie().caseInsensitive().onlyWholeWords();
        trie.addKeyword("turning");
        trie.addKeyword("once");
        trie.addKeyword("again");
        trie.addKeyword("börkü");
        Collection<Emit> emits = trie.parseText("TurninG OnCe AgAiN BÖRKÜ");
        assertEquals(4, emits.size()); // Match must not be made
        Iterator<Emit> it = emits.iterator();
        checkEmit(it.next(), 0, 6, "turning");
        checkEmit(it.next(), 8, 11, "once");
        checkEmit(it.next(), 13, 17, "again");
        checkEmit(it.next(), 19, 23, "börkü");
    }
Tests in error:
caseInsensitive(org.ahocorasick.trie.TrieTest): String index out of range: 24
from aho-corasick.
diff --git a/src/main/java/org/ahocorasick/trie/Trie.java b/src/main/java/org/ahocorasick/trie/Trie.java
index 0d28c9b..f25490d 100644
--- a/src/main/java/org/ahocorasick/trie/Trie.java
+++ b/src/main/java/org/ahocorasick/trie/Trie.java
@@ -117,7 +117,7 @@ public class Trie {
         for (Emit emit : collectedEmits) {
             if ((emit.getStart() == 0 ||
                  !Character.isAlphabetic(searchText.charAt(emit.getStart() - 1))) &&
-                (emit.getEnd() == size ||
+                (emit.getEnd() == size - 1 ||
                  !Character.isAlphabetic(searchText.charAt(emit.getEnd() + 1)))) {
                 continue;
             }
(I think :))
Great, I've reproduced the error thanks to your finding. Looking into it right now. Thanks!
The error was in this boundary check:
    if ((emit.getStart() == 0 ||
         !Character.isAlphabetic(searchText.charAt(emit.getStart() - 1))) &&
        (emit.getEnd() == size ||
         !Character.isAlphabetic(searchText.charAt(emit.getEnd() + 1)))) {
        continue;
    }
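The condition emit.getEnd() == size is not a proper boundary check: getEnd() is the inclusive index of the last matched character, so for a match that reaches the end of the text it equals size - 1, the comparison fails, and charAt(emit.getEnd() + 1) reads past the end of the string. I've changed it to: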
(emit.getEnd() + 1 == size ||
Will commit and release a new version.
Thanks again for submitting your issue!
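For anyone who wants to check the boundary arithmetic outside the library, here is a minimal standalone sketch. The class WholeWordBoundary and the method isWholeWord are hypothetical names made up for illustration and are not part of aho-corasick; the sketch only mirrors the corrected condition quoted above, assuming the end index is inclusive as in the test case.

```java
// Hypothetical standalone sketch; not code from the aho-corasick library.
class WholeWordBoundary {

    // Returns true when the match covering [start, end] (end inclusive) has no
    // alphabetic character directly before or after it.
    static boolean isWholeWord(String text, int start, int end) {
        int size = text.length();
        boolean leftOk = start == 0
                || !Character.isAlphabetic(text.charAt(start - 1));
        // end is the index of the last matched character, so a match that
        // reaches the end of the text satisfies end + 1 == size. Comparing
        // end == size would never be true here and charAt(end + 1) would throw.
        boolean rightOk = end + 1 == size
                || !Character.isAlphabetic(text.charAt(end + 1));
        return leftOk && rightOk;
    }

    public static void main(String[] args) {
        // Same text as the test case; \u00D6 is Ö and \u00DC is Ü.
        String text = "TurninG OnCe AgAiN B\u00D6RK\u00DC";
        System.out.println(text.length());              // 24
        System.out.println(isWholeWord(text, 19, 23));  // true: match ends at the last character
        System.out.println(isWholeWord(text, 8, 11));   // true: spaces on both sides
        System.out.println(isWholeWord(text, 0, 5));    // false: followed by the letter 'G'
    }
}
```

The short-circuiting || is what keeps charAt(end + 1) from being evaluated when the match already touches the end of the text.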