Sorry, but I can't reproduce this:
"하...나는 아이유가 좋아요." --> "하/.../나/는/ /아이유/가/ /좋/아요/."
That is the output of KoreanTokenizer.tokenize("하...나는 아이유가 좋아요.").map(_.text).mkString("/"). Which platform are you running it on?
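The check above joins each token's text with "/". For callers using the library from Java, the same formatting can be done with streams. This is a minimal self-contained sketch: the nested Token class is a hypothetical stand-in for the library's KoreanToken (only the surface text is modeled), and the token list is hard-coded from the expected output quoted above rather than produced by the tokenizer.

```java
import java.util.List;
import java.util.stream.Collectors;

final class TokenJoin {
    // Hypothetical stand-in for the library's KoreanToken: only the surface text is modeled.
    static final class Token {
        final String text;

        Token(String text) {
            this.text = text;
        }
    }

    // Java counterpart of the Scala expression tokens.map(_.text).mkString("/").
    static String joinTexts(List<Token> tokens) {
        return tokens.stream().map(t -> t.text).collect(Collectors.joining("/"));
    }

    public static void main(String[] args) {
        // Hard-coded from the expected output quoted above; not produced by the tokenizer.
        List<Token> tokens = List.of(
                new Token("하"), new Token("..."), new Token("나"), new Token("는"),
                new Token(" "), new Token("아이유"), new Token("가"), new Token(" "),
                new Token("좋"), new Token("아요"), new Token("."));
        System.out.println(joinTexts(tokens)); // 하/.../나/는/ /아이유/가/ /좋/아요/.
    }
}
```

Comparing joined strings like this makes it easy to diff the output of two builds (for example master versus a published release) on the same input.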
from twitter-korean-text.
The platform is Mac OS X 10.10.4. When I check out the master branch and test it, tokenizing works correctly for me too, but the issue occurs when I use version 4.1.4 pulled from Maven. ^^
What Scala version are you using?
(In reply to Hosang Jeon, Tue, Nov 10, 2015 at 6:21 PM: https://github.com/twitter/twitter-korean-text/issues/81#issuecomment-155633494)
Will Hohyon Ryu
유호현
Senior Software Engineer at Twitter
My Scala version is 2.9. The same thing happens with 2.11 as well.
We don't support 2.9. A separate build for 2.10 has been published, and master targets 2.11.
(In reply to Hosang Jeon, Tue, Nov 10, 2015 at 9:06 PM: https://github.com/twitter/twitter-korean-text/issues/81#issuecomment-155667782)
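Scala 2.x releases such as 2.10 and 2.11 are not binary compatible with each other, so a library build compiled against one Scala version generally has to run with that same version's scala-library. Per the maintainer's comment, builds exist for 2.10 and 2.11 only. A minimal sketch of a startup guard, with hypothetical class and method names, assuming you obtain the full runtime version string yourself (in Scala, for example, from scala.util.Properties.versionNumberString):

```java
import java.util.Set;

final class ScalaVersionCheck {
    // Scala binary versions with published builds, per the maintainer's comment (2.10 and 2.11).
    static final Set<String> SUPPORTED = Set.of("2.10", "2.11");

    // Reduces a full version such as "2.11.7" to its binary version "2.11".
    static String binaryVersion(String full) {
        String[] parts = full.split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Unexpected Scala version: " + full);
        }
        return parts[0] + "." + parts[1];
    }

    static boolean isSupported(String fullVersion) {
        return SUPPORTED.contains(binaryVersion(fullVersion));
    }

    public static void main(String[] args) {
        for (String v : new String[] {"2.9.3", "2.10.6", "2.11.7"}) {
            System.out.println(v + " -> " + (isSupported(v) ? "supported" : "not supported"));
        }
    }
}
```

Checking the binary version rather than the full version string reflects that patch releases within one Scala 2.x line are meant to be binary compatible, while different 2.x lines are not.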
Ah, I see. So it's a Scala version problem. Thanks! ^^
I checked, and tokenizing seems to work correctly. However, stemming the resulting tokens throws the error below.
Steps to reproduce:
Seq<KoreanTokenizer.KoreanToken> tokens = TwitterKoreanProcessorJava.tokenize("하...나는 아이유가 좋아요.");
Seq<KoreanTokenizer.KoreanToken> stemmed = TwitterKoreanProcessorJava.stem(tokens);
java.util.NoSuchElementException: head of empty list
at scala.collection.immutable.Nil$.head(List.scala:420)
at scala.collection.immutable.Nil$.head(List.scala:417)
at com.twitter.penguin.korean.stemmer.KoreanStemmer$$anonfun$1.apply(KoreanStemmer.scala:29)
at com.twitter.penguin.korean.stemmer.KoreanStemmer$$anonfun$1.apply(KoreanStemmer.scala:27)
at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124)
at scala.collection.immutable.List.foldLeft(List.scala:84)
at com.twitter.penguin.korean.stemmer.KoreanStemmer$.stem(KoreanStemmer.scala:27)
at com.twitter.penguin.korean.TwitterKoreanProcessor$.stem(TwitterKoreanProcessor.scala:57)
at com.twitter.penguin.korean.TwitterKoreanProcessor.stem(TwitterKoreanProcessor.scala)
at com.twitter.penguin.korean.TwitterKoreanProcessorJava.stem(TwitterKoreanProcessorJava.java:127)
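The trace shows List.head being called on an empty list inside the function that KoreanStemmer.stem passes to foldLeft. One common way that happens, sketched here under stated assumptions (this is not the library's code, and the actual 4.1.6 fix is not shown in this thread): a fold that builds its output in an accumulator and peeks at the most recently emitted token will fail if the very first input token takes the branch that peeks, because nothing has been emitted yet. The hypothetical Java below reproduces that pattern with an ArrayDeque, whose pop() throws NoSuchElementException on an empty deque, and shows a guarded variant. The "+" merge rule is invented purely for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import java.util.NoSuchElementException;

final class FoldHeadSketch {
    // Hypothetical rule standing in for whatever makes a fold look at the
    // previous token: a token starting with "+" is merged into the one before it.
    static boolean mergesWithPrevious(String token) {
        return token.startsWith("+");
    }

    // With guarded == false this mirrors calling head on an empty list:
    // ArrayDeque.pop() throws NoSuchElementException when nothing was emitted yet.
    static List<String> fold(List<String> tokens, boolean guarded) {
        Deque<String> acc = new ArrayDeque<>(); // most recently emitted token at the front
        for (String t : tokens) {
            if (mergesWithPrevious(t) && (!guarded || !acc.isEmpty())) {
                String prev = acc.pop();
                acc.push(prev + t.substring(1));
            } else {
                acc.push(t); // no previous token to merge into: emit t unchanged
            }
        }
        List<String> out = new ArrayList<>(acc);
        Collections.reverse(out); // restore input order
        return out;
    }

    public static void main(String[] args) {
        System.out.println(fold(List.of("foo", "+bar"), false)); // [foobar]
        System.out.println(fold(List.of("+bar", "foo"), true));  // [+bar, foo]
        try {
            fold(List.of("+bar", "foo"), false);
        } catch (NoSuchElementException e) {
            System.out.println("unguarded fold failed: " + e);
        }
    }
}
```

The guard simply falls back to emitting the token unchanged when there is no previous token; a real fix might instead handle the leading token differently, depending on the stemmer's rules.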
@nlpenguin Since you seem to be busy, I've sent a patch PR. Please take a look when you get a chance. ^^
I've released 4.1.6, which includes the stemming bug fix.
(In reply to Hosang Jeon, Tue, Nov 10, 2015 at 10:18 PM: https://github.com/twitter/twitter-korean-text/issues/81#issuecomment-155679733)
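The fix shipped in 4.1.6, while the reporter had been using 4.1.4 from Maven. If a build needs to confirm that the resolved library version includes the fix, comparing versions component by component avoids the trap of plain string comparison, under which a hypothetical "4.1.10" would sort before "4.1.6". A minimal sketch with hypothetical class and method names, assuming purely numeric dotted version strings (no suffixes such as -SNAPSHOT):

```java
final class VersionCheck {
    // Component-wise numeric comparison of dotted versions such as "4.1.4";
    // missing components count as 0. Assumes every component is a plain integer.
    static int compare(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    // True if the given version is at least 4.1.6, the release that includes the stemming fix.
    static boolean hasStemmingFix(String version) {
        return compare(version, "4.1.6") >= 0;
    }

    public static void main(String[] args) {
        System.out.println(hasStemmingFix("4.1.4")); // false
        System.out.println(hasStemmingFix("4.1.6")); // true
    }
}
```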
Related Issues
- Update the contribution guide
- Scala 2.12 support
- OOM Issue
- java.util.regex.PatternSyntaxException: Look-behind pattern matches must have a bounded maximum length near index 9
- I want to add user-dictionary. is it possible??
- Trying to build using Scala IDE (Scala version: 2.12.3)
- 한국말 (Korean)
- % should not be detached from the preceding number in the phrase extractor
- Some characters are classified as a foreign, but I think these would be a punctuation.
- Hashtag classifying issue
- Changing required scala version to 2.11.5+
- slf4j-nop needs a non-default scope
- [Bug] tokenization solutions is sometimes empty.
- Add more comprehensive error messages with inputs
- Implement detokenization
- Detokenizer throws exception with certain inputs
- Allow user dictionary
- Tokenizer throws exception with certain input
- Wrong stemming : A rule does not obey the spelling system