genius's People

Contributors

duanhongyi

genius's Issues

Encoding issue with the README.md file

When I install the package with pip and Python 3.4 (32-bit) on Windows 7, the following error is reported. I assume it is an encoding issue with README.md. If you save the file with GB2312 encoding rather than UTF-8 without BOM, it should be OK. I did not dig into what actually happened.

C:\Python34>python.exe -m pip install genius
Downloading/unpacking genius
  Running setup.py (path:C:\Users\sduan\AppData\Local\Temp\pip_build_sduan\genius\setup.py) egg_info for package genius
    Traceback (most recent call last):
      File "<string>", line 17, in <module>
      File "C:\Users\sduan\AppData\Local\Temp\pip_build_sduan\genius\setup.py", line 7, in <module>
        README = open(os.path.join(here, 'README.md')).read()
    UnicodeDecodeError: 'gbk' codec can't decode byte 0xaf in position 24: illegal multibyte sequence
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 17, in <module>
      File "C:\Users\sduan\AppData\Local\Temp\pip_build_sduan\genius\setup.py", line 7, in <module>
        README = open(os.path.join(here, 'README.md')).read()
    UnicodeDecodeError: 'gbk' codec can't decode byte 0xaf in position 24: illegal multibyte sequence
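The traceback shows `open()` being called without an `encoding` argument, so Python 3 falls back to the locale's default codec (`gbk` on this Windows machine). A minimal sketch of one possible fix follows, assuming README.md is actually stored as UTF-8 (the report does not confirm this). The helper name `read_text_file` is hypothetical and not part of the package.

```python
import io


def read_text_file(path, encoding="utf-8"):
    """Read a text file with an explicit encoding.

    Assumption: the file is stored as UTF-8. Passing the encoding
    explicitly avoids depending on the platform's default codec.
    io.open accepts `encoding` on both Python 2 and Python 3.
    """
    with io.open(path, encoding=encoding) as f:
        return f.read()


# Possible use in setup.py, replacing the line shown in the traceback:
#     README = read_text_file(os.path.join(here, 'README.md'))
```

Decoding explicitly in setup.py would leave the file itself unchanged, so the install should not depend on which locale the user's system runs under.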

Error with comma

My Chinese is not good, so I am writing in English. Sorry.

I get this error:

>>> genius.seg_text('Kate, 坐吧。', use_tagging=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/Ale/Desktop/hsk-vocab/python_env/lib/python3.6/site-packages/genius/__init__.py", line 29, in seg_text
    pre_words = processes['tagging'](**kwargs).process(pre_words)
  File "/Users/Ale/Desktop/hsk-vocab/python_env/lib/python3.6/site-packages/genius/process.py", line 311, in process
    word.tagging = taggings[index]
IndexError: list index out of range

But if I change , for in the sentence, it works with no problem. It also works with use_tagging=False.

Thank you for this great NLP package. Thanks!
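Until the cause is fixed, a caller could work around the failure by retrying without tagging, since the report says `use_tagging=False` works. This is only a sketch: the wrapper name `seg_with_fallback` is hypothetical, and it takes the segmentation function as a parameter so it does not assume anything about the library beyond the `use_tagging` keyword shown in the report.

```python
def seg_with_fallback(seg_text, text):
    """Try tagged segmentation first; retry untagged on IndexError.

    `seg_text` is any callable that accepts (text, use_tagging=...),
    for example genius.seg_text as used in the report above.
    """
    try:
        return seg_text(text, use_tagging=True)
    except IndexError:
        # The tagging step raised; the report says untagged
        # segmentation still works for the same input.
        return seg_text(text, use_tagging=False)


# Possible use:
#     words = seg_with_fallback(genius.seg_text, 'Kate, 坐吧。')
```

The fallback result carries no part-of-speech tags, so downstream code that reads them needs to handle their absence.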

An error occurs on this type of text. Could you possibly explain why?

太湖世界文化论坛第五届年会
共识
2018

10

18
日至
19
日,来自世界五大洲近
40
个国家和地区以及国际组织的著名政治家、哲学家、社会科学家、文化学者、企业家、媒体领军人和社会各界朋友共
1000
余人,共聚北京,参加太湖世界文化论坛第五届年会。围绕本届年会主题“文化对话:构建人类命运共同体”,与会者经过平等交流、深入探讨,达成如下共识。
一,构建人类命运共同体,建设一个持久和平、普遍安全、共同繁荣、开放包容、清洁美丽的世界,这是世界各国人民的根本利益和共同利益所系,是人类社会普遍的价值追求所在,是当今时代和平、合作、发展的历史潮流大势所趋。
二,构建人类命运共同体是一项宏大而紧迫的时代任务,对文化建设提出了历史性新要求。构建人类命运共同体又是一个漫长而曲折的历史过程,始终需要文化给予坚强有力的支撑。忠实反映人民的意愿,加强文化创新,为合力构建人类命运共同体发挥长期的、基础的促进作用,这是当今世界文化繁荣发展的根本大计。
三,相互尊重、和而不同,建设一个远离恐惧、普遍安全、持久和平的世界。和平安宁,始终是世界各国人民的根本利益和世代梦想。要秉持和而不同的精神,坚持不同国家平等相待、和平相处;尊重各国人民自主选择社会制度和发展道路;相互尊重国家主权、独立和领土完整,互不干涉内政;加强国际合作,共同破解气候变化、恐怖主义、核扩散、难民潮等全球难题。不搞霸权主义、双重标准,不搞冷战思维、文化偏见。出现矛盾冲突要对话不要对抗,要协商不要霸凌,反对唯我独尊、强加于人,摒弃弱肉强食的丛林法则。发挥文化的力量,坚持不懈地传播和平理念、密切和平交往、凝聚和平力量,共谋、共护、共享世界和平与安宁。
四,平等互利、合作共赢,建设一个远离贫困、共同繁荣的世界。当今世界,开放型的国际合作、以规则为基础的多边贸易体制,推动经济全球化和自由贸易蓬勃发展,各国经济联系空前紧密,形成你中有我、我中有你的利益共同体。世界经济越来越远离孤立封闭的旧时代,越来越颠覆我赢你输、零和博弈的旧逻辑。倒退没有出路。经济霸凌、单边主义和保护主义,以损人开始、注定以害己告终。要顺应历史大势,走开放融通、互利共赢之路,协商管控分歧,以改革和发展化解矛盾冲突,推动经济全球化健康发展。深入推进“一带一路”国际合作,增添共同发展新动力,做世界和平的建设者、全球发展的贡献者、国际秩序的维护者。
五,求同存异、互学互鉴,建设一个远离封闭、开放包容的世界。世界文明丰富多彩,无优劣之分,各有自己的特色与优长。要尊重文明多样性。无论哪个国家,开放包容总是带来进步,封闭自大必然导致落后。要倡导不同文明相互尊重、交流对话。能不能以文明交流超越文明隔阂、文明互鉴超越文明冲突、文明共存超越文明优越,这关系全世界的安危兴衰,关系全人类的前途命运。加强不同国家的文化交流,发挥文化沟通心灵的特殊桥梁作用,让各国人民不断增进相互了解和相互信任,克服傲慢与偏见、减少误会与误判,为构建人类命运共同体提供愈益广大的民意基础和连绵不断的文化支撑。
六,尊崇自然、同舟共济,建设一个山清水秀、清洁美丽的世界。气候变化的严峻挑战,对人类的生存延续构成越来越严重的威胁。无论哪个国家,都无法独善其身,更不能罔顾事实、放弃历史责任。面对全球性共同挑战,惟有同舟共济、合力应对,才是世界各国的正确选择。要深刻吸取历史教训,牢固树立尊重自然、顺应自然、保护自然的理念,以人与自然和谐相处、共生共存为目标,转变生产生活方式,共同呵护地球家园,共同医治生态环境的累累伤痕,构筑尊崇自然、绿色发展的全球生态体系,造福于全人类,造福于子孙后代。
2018

10

19
日于北京


error: missing tokens, cannot apply pattern

About pinyin segmentation

What method is the pinyin segmentation based on? And how are ambiguous pinyin strings (for example: xianjintianxiaxuele) disambiguated?
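To make the ambiguity in the question concrete, the sketch below enumerates every way to split a pinyin string into syllables from a small, hand-picked syllable set. It is not the method used by genius, which this thread does not describe; the syllable set and the function name `segmentations` are assumptions chosen for illustration only.

```python
from functools import lru_cache

# Assumption: a tiny illustrative syllable inventory, not a full pinyin table.
SYLLABLES = frozenset({"xi", "an", "xian", "jin", "tian", "xia", "xue", "le"})


def segmentations(text, syllables=SYLLABLES):
    """Return every split of `text` into syllables from `syllables`."""
    max_len = max(len(s) for s in syllables)

    @lru_cache(maxsize=None)
    def split(start):
        # Reaching the end of the string yields one (empty) completion.
        if start == len(text):
            return [[]]
        results = []
        for end in range(start + 1, min(len(text), start + max_len) + 1):
            piece = text[start:end]
            if piece in syllables:
                for rest in split(end):
                    results.append([piece] + rest)
        return results

    return split(0)


for candidate in segmentations("xianjintianxiaxuele"):
    print(" ".join(candidate))
```

With this syllable set the example string has two readings, one starting with "xi an" and one starting with "xian". Choosing between such candidates needs extra information, such as a dictionary or a statistical model over the resulting words.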
