hohyon-ryu commented on September 23, 2024

Sorry, but I can't reproduce this:

"하...나는 아이유가 좋아요." --> "하/.../나/는/ /아이유/가/ /좋/아요/."

That is the output of KoreanTokenizer.tokenize("하...나는 아이유가 좋아요.").map(_.text).mkString("/"). Which platform are you running it on?

from twitter-korean-text.

jhsbeat commented on September 23, 2024

The platform is Mac OS X 10.10.4. When I build from the master branch and test, tokenization works correctly for me too, but the issue occurs when I use version 4.1.4 from Maven. ^^

hohyon-ryu commented on September 23, 2024

Which Scala version are you using?

Reply to this email directly or view it on GitHub
https://github.com/twitter/twitter-korean-text/issues/81#issuecomment-155633494

Will Hohyon Ryu
유호현
Senior Software Engineer at Twitter

jhsbeat commented on September 23, 2024

The Scala version is 2.9. The same thing happens with 2.11 as well.

hohyon-ryu commented on September 23, 2024

Scala 2.9 isn't supported. We published a separate build for 2.10, and master targets 2.11.

jhsbeat commented on September 23, 2024

Ah, I see. So it's a Scala version problem. Thanks. ^^

jhsbeat commented on September 23, 2024

I checked again, and tokenization does seem to work correctly. However, stemming those tokens fails with the error below.

Steps to reproduce:

Seq<KoreanTokenizer.KoreanToken> tokens = TwitterKoreanProcessorJava.tokenize("하...나는 아이유가 좋아요.");
Seq<KoreanTokenizer.KoreanToken> stemmed = TwitterKoreanProcessorJava.stem(tokens);

java.util.NoSuchElementException: head of empty list
    at scala.collection.immutable.Nil$.head(List.scala:420)
    at scala.collection.immutable.Nil$.head(List.scala:417)
    at com.twitter.penguin.korean.stemmer.KoreanStemmer$$anonfun$1.apply(KoreanStemmer.scala:29)
    at com.twitter.penguin.korean.stemmer.KoreanStemmer$$anonfun$1.apply(KoreanStemmer.scala:27)
    at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124)
    at scala.collection.immutable.List.foldLeft(List.scala:84)
    at com.twitter.penguin.korean.stemmer.KoreanStemmer$.stem(KoreanStemmer.scala:27)
    at com.twitter.penguin.korean.TwitterKoreanProcessor$.stem(TwitterKoreanProcessor.scala:57)
    at com.twitter.penguin.korean.TwitterKoreanProcessor.stem(TwitterKoreanProcessor.scala)
    at com.twitter.penguin.korean.TwitterKoreanProcessorJava.stem(TwitterKoreanProcessorJava.java:127)
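
The trace shows Scala's Nil.head being called from the foldLeft in KoreanStemmer.stem (KoreanStemmer.scala:27-29), meaning the fold read the head of an empty accumulator list. The minimal Java sketch below reproduces that failure pattern and one way to guard against it. It is an illustration under stated assumptions, not the library's stemmer and not the actual patch: the Token record, the Pos enum, the rule "drop an ending that follows a verb", and the token sequence in main are all hypothetical. It needs Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical illustration of an empty-accumulator bug in a stemming fold.
// Token and Pos are simplified stand-ins, NOT the library's KoreanToken types.
class StemmerFoldSketch {
    enum Pos { NOUN, VERB, EOMI, PUNCTUATION }

    record Token(String text, Pos pos) {}

    // Unsafe: an ending (EOMI) inspects the previously kept token with
    // getFirst(), the Java analogue of Scala's `l.head`. If the ending is the
    // first token, the accumulator is empty and getFirst() throws
    // NoSuchElementException, the same exception type as in the trace.
    static List<Token> stemUnsafe(List<Token> tokens) {
        LinkedList<Token> acc = new LinkedList<>(); // newest kept token at the front
        for (Token t : tokens) {
            if (t.pos() == Pos.EOMI && acc.getFirst().pos() == Pos.VERB) {
                continue; // hypothetical rule: drop an ending that follows a verb
            }
            acc.addFirst(t);
        }
        return inOrder(acc);
    }

    // Guarded: only look at the previous token when one exists.
    static List<Token> stemSafe(List<Token> tokens) {
        LinkedList<Token> acc = new LinkedList<>();
        for (Token t : tokens) {
            if (t.pos() == Pos.EOMI && !acc.isEmpty() && acc.getFirst().pos() == Pos.VERB) {
                continue;
            }
            acc.addFirst(t);
        }
        return inOrder(acc);
    }

    // The accumulator is built newest-first (like prepending with ::), so reverse it.
    private static List<Token> inOrder(LinkedList<Token> acc) {
        List<Token> result = new ArrayList<>(acc);
        Collections.reverse(result);
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical token sequence that starts with an ending.
        List<Token> tokens = List.of(
                new Token("요", Pos.EOMI),
                new Token("좋", Pos.VERB),
                new Token("아요", Pos.EOMI),
                new Token(".", Pos.PUNCTUATION));
        try {
            stemUnsafe(tokens);
        } catch (NoSuchElementException e) {
            System.out.println("unsafe fold failed: " + e);
        }
        System.out.println("guarded fold: " + stemSafe(tokens));
    }
}
```

Running main prints the caught exception for the unsafe version and three kept tokens for the guarded one. The thread does not show which tokens the real stemmer produced for "하...나는 아이유가 좋아요.", so the sketch only mirrors the shape of the trace, not the exact code path.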

jhsbeat commented on September 23, 2024

@nlpenguin You seemed busy, so I've sent a patch PR. Could you take a look when you get a chance? ^^

hohyon-ryu commented on September 23, 2024

I've released 4.1.6 with the stemming bug fix.