
Comments (8)

stevewyl commented on June 8, 2024

Does the location of the dictionary file matter?
I ran the code below, but it still does not return the result I expect, even though the words I want are already in the dictionary file.
thu1 = thulac.thulac(user_dict="D:/python/text_preprocessing/dict.txt")
thu1.cut('我爱深度学习和机器学习', text=True)
Out[14]: '我_r 爱_v 深度_n 学习_v 和_c 机器_n 学习_v'
I can't tell what went wrong. = =

from thulac-python.

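gzp9595 commented on June 8, 2024

Hello, and thank you for supporting THULAC. How to define a user dictionary is already described in the ReadMe.

Pass the user dictionary as a parameter when you create the thulac object.

thulac(user_dict=None, model_path=None, T2S=False, seg_only=False, filt=False, deli='_') initializes the program and applies custom settings.

 user_dict    Sets the user dictionary. Words from the user dictionary are tagged uw. One word per line, UTF-8 encoded.
 T2S          Default False. Whether to convert the sentence from Traditional to Simplified Chinese.
 seg_only     Default False. Whether to perform word segmentation only, without part-of-speech tagging.
 filt         Default False. Whether to use a filter to remove less meaningful words, such as “可以”.
 model_path   Sets the folder containing the model files. Default is models/.
 deli         Default '_'. Sets the delimiter between a word and its part-of-speech tag.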
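
To make the parameter notes above concrete, here is a minimal sketch that writes a dictionary file with one word per line in UTF-8 and passes its path as user_dict. The helper names write_user_dict and build_segmenter are hypothetical and exist only for this illustration; the thulac(...) parameters are the ones listed above.

```python
def write_user_dict(path, words):
    """Write one word per line, UTF-8 encoded (hypothetical helper)."""
    # newline="\n" keeps line endings identical on every platform.
    with open(path, "w", encoding="utf-8", newline="\n") as fh:
        for word in words:
            fh.write(word + "\n")
    return path


def build_segmenter(dict_path):
    """Create a segmenter that loads the user dictionary (hypothetical helper).

    Needs the thulac package. It is imported here so that the file helper
    above can be used without thulac installed.
    """
    import thulac

    return thulac.thulac(user_dict=str(dict_path), seg_only=True)
```

A typical call would be build_segmenter(write_user_dict("mydict.txt", ["机器学习", "深度学习"])), followed by cut(..., text=True) on the returned object.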


MaJunhua commented on June 8, 2024

#coding:utf-8
import thulac

thu1 = thulac.thulac(seg_only=True, user_dict="mydict.txt")  # segmentation-only mode, with a user dictionary
a = thu1.cut("我爱北京***", text=True)

Contents of mydict.txt, one word per line:
机器学习
数据挖掘
...
我爱北京***


yangqinj commented on June 8, 2024

OK, thanks.


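MaJunhua commented on June 8, 2024

If the path were wrong, Python should raise a file-not-found error right away. Try
for line in open("D:/python/text_preprocessing/dict.txt"): print(line)
and check whether the contents are correct.
If you still can't find the problem, it may come from differences between the Windows and Linux/macOS environments.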
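
The check suggested above can be taken a step further. The sketch below reads the file as UTF-8 and reports a few things worth ruling out: a byte-order mark, lines with stray whitespace, and whether the expected word is present. The function name inspect_user_dict is hypothetical, and whether a BOM or padding actually affects THULAC is an assumption, not something confirmed in this thread.

```python
def inspect_user_dict(path, expected_word):
    """Read a dictionary file as UTF-8 and report simple problems (hypothetical helper)."""
    # Opening a missing path raises FileNotFoundError, as noted above.
    with open(path, "rb") as fh:
        raw = fh.read()

    has_bom = raw.startswith(b"\xef\xbb\xbf")
    text = raw.decode("utf-8")  # raises UnicodeDecodeError if the file is not UTF-8
    if has_bom:
        text = text[1:]  # drop the decoded BOM character U+FEFF

    lines = text.splitlines()
    words = [line.strip() for line in lines if line.strip()]
    return {
        "has_bom": has_bom,
        # Non-empty lines that carry leading or trailing whitespace.
        "padded_lines": [ln for ln in lines if ln.strip() and ln != ln.strip()],
        "contains_expected": expected_word in words,
        "word_count": len(words),
    }
```

For the case in this thread, the call would be inspect_user_dict("D:/python/text_preprocessing/dict.txt", "深度学习").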


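ashengtx commented on June 8, 2024

A question: if a word in the user dictionary contains a space, is there any way to have it segmented as one word? For example, can "justin bieber是一个歌手" be segmented so that "justin bieber" stays together as a single word?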


gzp9595 commented on June 8, 2024

Spaces appear mostly in English text, and the current processing cannot handle this case yet. We will try to address it in the next version.


cchwill commented on June 8, 2024

Hello, calling thulac.thulac(user_dict=myDictFile) raises the encoding error below. I also tried setting the T2S parameter to True (the dict file is already 'utf-8'). How can I handle this? Thanks!

UnicodeDecodeError: 'cp950' codec can't decode byte 0xe6 in position 0: illegal multibyte sequence
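
The message is what Python produces when UTF-8 bytes are decoded as cp950, the default code page on Traditional Chinese Windows. Whether THULAC opens the dictionary without an explicit encoding is an assumption here, not something confirmed in this thread. The sketch below only reproduces the decoding failure on sample bytes; the sample string and the helper try_decode are illustrative.

```python
sample = "我爱北京"               # illustrative text, not taken from the reporter's file
raw = sample.encode("utf-8")      # the bytes as they would be stored in a UTF-8 file


def try_decode(data, encoding):
    """Return (ok, decoded_text_or_error_message) for the given encoding."""
    try:
        return True, data.decode(encoding)
    except UnicodeDecodeError as exc:
        return False, str(exc)


ok_utf8, text = try_decode(raw, "utf-8")    # decoding with the file's real encoding
ok_cp950, error = try_decode(raw, "cp950")  # decoding with the Windows default

print(ok_utf8, text)
print(ok_cp950, error)
```

If the platform default encoding is indeed the cause, one option that needs no change to the library is Python's UTF-8 mode (python -X utf8, or the environment variable PYTHONUTF8=1, available from Python 3.7), which makes open() default to UTF-8.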
