himkt / konoha

🌿 An easy-to-use Japanese text processing tool that makes it possible to switch tokenizers with small code changes.

Home Page: https://pypi.org/project/konoha

License: MIT License

Python 74.72% Dockerfile 2.34% Makefile 0.12% Jupyter Notebook 21.06% Jsonnet 1.76%
nlp text-processing mecab kytea sudachi sentencepiece natural-language-processing japanese janome

konoha's People

Contributors

altescy, chigichan24, dependabot[bot], garfieldnate, himkt, nzw0301, sobamchan, upura, vegai

konoha's Issues

Decide whether to install the required packages

Hi,
I like this sentence tokenizer.
However, I use this package only as a sentence tokenizer,
so I would rather not install the required packages ['natto-py', 'kytea', 'sentencepiece'].
I would like install_requires to be determined interactively.

For example, how about adding the following code to setup.py?

from setuptools import find_packages
from setuptools import setup

install_requires = ['natto-py', 'kytea', 'sentencepiece']
# "[Y/n]" makes "Y" the default, so an empty answer also counts as "yes"
answer = input('> Do you use this package only as a sentence tokenizer? [Y/n] ')
if answer.strip().upper() in ('', 'Y'):
    install_requires = []

setup(name='tiny_tokenizer',
      version='1.3.0',
      description='Tiny Word/Sentence Tokenizer',
      author='himkt',
      author_email='[email protected]',
      install_requires=install_requires,
      url='https://github.com/himkt/tiny_tokenizer',
      packages=find_packages())
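A non-interactive alternative, sketched under assumptions: instead of prompting, the backend-specific packages can be declared as setuptools "extras", the same mechanism behind the `pip install 'konoha[janome,allennlp]'` command in a later issue. The extra names (`mecab`, `kytea`, `sentencepiece`, `all`) and the helper `requirements_for` are hypothetical; in setup.py the dictionary would be passed as `extras_require=EXTRAS_REQUIRE` while `install_requires` stays empty.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical grouping: each optional backend becomes a setuptools "extra".
# In setup.py this dict would be passed as setup(..., extras_require=EXTRAS_REQUIRE)
# while install_requires stays empty for sentence-tokenizer-only users.
EXTRAS_REQUIRE: Dict[str, List[str]] = {
    "mecab": ["natto-py"],
    "kytea": ["kytea"],
    "sentencepiece": ["sentencepiece"],
}
# Convenience extra that pulls in every backend at once
EXTRAS_REQUIRE["all"] = sorted(
    {pkg for pkgs in EXTRAS_REQUIRE.values() for pkg in pkgs}
)


def requirements_for(extras: Iterable[str]) -> List[str]:
    """Return the packages that `pip install 'tiny_tokenizer[<extras>]'` would add."""
    selected: Set[str] = set()
    for extra in extras:
        selected.update(EXTRAS_REQUIRE[extra])  # this helper raises KeyError for unknown extras
    return sorted(selected)
```

With this layout, a plain `pip install tiny_tokenizer` installs no backend, and `pip install 'tiny_tokenizer[mecab,kytea]'` adds only those two, so no prompt is needed at install time.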

Raise an error if `model_path` is not specified for `sentencepiece`.

  File "./konoha/api/tokenizers.py", line 35, in tokenize
    tokenizer = WordTokenizer(tokenizer=params.tokenizer)
  File "./konoha/word_tokenizer.py", line 39, in __init__
    self.__setup_tokenizer()
  File "./konoha/word_tokenizer.py", line 56, in __setup_tokenizer
    model_path=self.model_path
  File "./konoha/word_tokenizers/sentencepiece_tokenizer.py", line 27, in __init__
    self.tokenizer.load(model_path)
  File "/usr/local/lib/python3.6/dist-packages/sentencepiece.py", line 214, in load
    return _sentencepiece.SentencePieceProcessor_load(self, filename)
TypeError: not a string
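The `TypeError: not a string` above surfaces deep inside `sentencepiece` because `model_path` is still `None` when it reaches `load()`. A minimal sketch of the requested up-front check follows; `check_model_path` is a hypothetical helper, not konoha's actual code, and would be called at the start of the tokenizer's `__init__`, before `self.tokenizer.load(model_path)`.

```python
import os
from typing import Optional


def check_model_path(model_path: Optional[str]) -> str:
    """Fail early with a readable message instead of sentencepiece's TypeError.

    Hypothetical helper: validates the argument before it is handed to
    SentencePieceProcessor.load().
    """
    if model_path is None:
        raise ValueError(
            "`model_path` must be specified when the tokenizer is `sentencepiece`"
        )
    if not os.path.isfile(model_path):
        raise FileNotFoundError(f"sentencepiece model not found: {model_path}")
    return model_path
```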

Support Janome

Janome is a morphological analyzer written purely in Python.
Sentences can be analyzed right after running pip install janome, which is very handy.

Relax requirements on Python forward compatibility packages

The constraint on the importlib-metadata package, the "official" backport that brings features of newer Python versions to older ones, is too strict. You pinned it to major version 4 or lower. On Python 3.8 this breaks transitive dependencies via markdown and tensorboard to torch.

Nothing is actually broken except dependency resolution. At the time of writing, the latest importlib-metadata release is version 6.
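Since importlib-metadata is a backport of the standard-library `importlib.metadata` module (added in Python 3.8), one way to relax the constraint is to require it only on older interpreters through a PEP 508 environment marker and leave the upper bound open. The sketch below assumes a setuptools-style requirement list and shows the matching runtime import fallback; `INSTALL_REQUIRES` and `installed_version` are hypothetical names.

```python
import sys
from typing import Optional

# Dependency declaration with an environment marker and no upper cap:
# the backport is only installed where the stdlib module is missing.
INSTALL_REQUIRES = [
    'importlib-metadata; python_version < "3.8"',
]

if sys.version_info >= (3, 8):
    from importlib.metadata import PackageNotFoundError, version
else:  # Python 3.6/3.7: fall back to the backport package
    from importlib_metadata import PackageNotFoundError, version


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of `distribution`, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None
```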

Word normalization support for the MeCab tokenizer

Hi, I have been using konoha a lot lately. Thanks for the great library.

I have a question about the MeCab pipeline.
I see that konoha's normalized_form is only supported for Sudachi.
Is there any particular reason why it is not supported for MeCab?

If not, I would like to contribute it.

Thanks.
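For context: with the IPADIC dictionary, each line of MeCab's default output is the surface form, a tab, and comma-separated features, the seventh of which is the base (dictionary) form. That is not the same as Sudachi's normalized form (Sudachi also unifies spelling variants), but it is the closest field MeCab exposes. A minimal sketch, assuming IPADIC-formatted output; `parse_ipadic_line` is a hypothetical helper, not part of konoha.

```python
from typing import Optional, Tuple


def parse_ipadic_line(line: str) -> Optional[Tuple[str, str]]:
    """Return (surface, base_form) for one line of MeCab/IPADIC output.

    Assumes the default IPADIC feature layout: POS, three POS subcategories,
    conjugation type, conjugation form, base form, reading, pronunciation.
    """
    if line == "EOS" or "\t" not in line:
        return None  # end-of-sentence marker or blank line
    surface, feature = line.split("\t", 1)
    fields = feature.split(",")
    # Unknown words may omit trailing fields or use "*" as the base form
    base_form = fields[6] if len(fields) > 6 and fields[6] != "*" else surface
    return surface, base_form
```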

Version 0.9 of AllenNLP is installed instead of version 1.0.0 when installing konoha

System:
OS: macOS Catalina | 10.15.6
Python version: 3.6.9
AllenNLP version: 1.0.0

Question
Why is version 0.9 of allennlp being installed instead of 1.0.0?

Log

pip install 'konoha[janome,allennlp]'
Requirement already satisfied: konoha[allennlp,janome] in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (4.3.0)
Collecting overrides<3.0.0,>=2.8.0 (from konoha[allennlp,janome])
Collecting allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations" (from konoha[allennlp,janome])
  Using cached https://files.pythonhosted.org/packages/bb/bb/041115d8bad1447080e5d1e30097c95e4b66e36074277afce8620a61cee3/allennlp-0.9.0-py3-none-any.whl
Requirement already satisfied: janome<0.4.0,>=0.3.10; extra == "janome" or extra == "all" or extra == "all_with_integrations" in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from konoha[allennlp,janome]) (0.3.10)
Requirement already satisfied: editdistance in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (0.5.3)
Requirement already satisfied: word2number>=1.1 in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (1.1)
Requirement already satisfied: scipy in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (1.4.1)
Requirement already satisfied: unidecode in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (1.1.1)
Requirement already satisfied: sqlparse>=0.2.4 in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (0.3.1)
Requirement already satisfied: flask>=1.0.2 in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (1.1.1)
Requirement already satisfied: boto3 in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (1.12.21)
Requirement already satisfied: parsimonious>=0.8.0 in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (0.8.1)
Requirement already satisfied: tqdm>=4.19 in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (4.43.0)
Requirement already satisfied: numpy in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (1.18.1)
Requirement already satisfied: flaky in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (3.6.1)
Requirement already satisfied: conllu==1.3.1 in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (1.3.1)
Requirement already satisfied: nltk in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (3.4.5)
Requirement already satisfied: requests>=2.18 in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (2.23.0)
Requirement already satisfied: pytorch-pretrained-bert>=0.6.0 in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (0.6.2)
Requirement already satisfied: scikit-learn in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (0.22.2.post1)
Requirement already satisfied: responses>=0.7 in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (0.10.12)
Requirement already satisfied: spacy<2.2,>=2.1.0 in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (2.1.9)
Requirement already satisfied: jsonpickle in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (1.3)
Requirement already satisfied: numpydoc>=0.8.0 in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (0.9.2)
Requirement already satisfied: ftfy in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (5.7)
Requirement already satisfied: pytz>=2017.3 in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (2019.3)
Requirement already satisfied: pytest in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (5.4.1)
Requirement already satisfied: flask-cors>=3.0.7 in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (3.0.8)
Requirement already satisfied: tensorboardX>=1.2 in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (2.0)
Requirement already satisfied: gevent>=1.3.6 in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (1.4.0)
Requirement already satisfied: jsonnet>=0.10.0; sys_platform != "win32" in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (0.15.0)
Requirement already satisfied: h5py in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (2.10.0)
Requirement already satisfied: matplotlib>=2.2.3 in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (3.2.1)
Requirement already satisfied: torch>=1.2.0 in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (1.5.1)
Requirement already satisfied: pytorch-transformers==1.1.0 in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (1.1.0)
Requirement already satisfied: click>=5.1 in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from flask>=1.0.2->allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (7.1.1)
Requirement already satisfied: Jinja2>=2.10.1 in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from flask>=1.0.2->allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (2.11.1)
Requirement already satisfied: Werkzeug>=0.15 in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from flask>=1.0.2->allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (1.0.0)
Requirement already satisfied: itsdangerous>=0.24 in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from flask>=1.0.2->allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (1.1.0)
Requirement already satisfied: botocore<1.16.0,>=1.15.21 in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from boto3->allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (1.15.21)
Requirement already satisfied: s3transfer<0.4.0,>=0.3.0 in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from boto3->allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (0.3.3)
Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from boto3->allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (0.9.5)
Requirement already satisfied: six>=1.9.0 in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from parsimonious>=0.8.0->allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (1.14.0)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from requests>=2.18->allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (1.25.8)
Requirement already satisfied: idna<3,>=2.5 in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from requests>=2.18->allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (2.9)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from requests>=2.18->allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (2019.11.28)
Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from requests>=2.18->allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (3.0.4)
Requirement already satisfied: regex in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from pytorch-pretrained-bert>=0.6.0->allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (2020.2.20)
Requirement already satisfied: joblib>=0.11 in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from scikit-learn->allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (0.14.1)
Requirement already satisfied: plac<1.0.0,>=0.9.6 in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from spacy<2.2,>=2.1.0->allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (0.9.6)
Requirement already satisfied: cymem<2.1.0,>=2.0.2 in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from spacy<2.2,>=2.1.0->allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (2.0.3)
Requirement already satisfied: murmurhash<1.1.0,>=0.28.0 in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from spacy<2.2,>=2.1.0->allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (1.0.2)
Requirement already satisfied: blis<0.3.0,>=0.2.2 in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from spacy<2.2,>=2.1.0->allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (0.2.4)
Requirement already satisfied: wasabi<1.1.0,>=0.2.0 in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from spacy<2.2,>=2.1.0->allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (0.6.0)
Requirement already satisfied: thinc<7.1.0,>=7.0.8 in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from spacy<2.2,>=2.1.0->allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (7.0.8)
Requirement already satisfied: srsly<1.1.0,>=0.0.6 in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from spacy<2.2,>=2.1.0->allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (1.0.2)
Requirement already satisfied: preshed<2.1.0,>=2.0.1 in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from spacy<2.2,>=2.1.0->allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (2.0.1)
Requirement already satisfied: sphinx>=1.6.5 in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from numpydoc>=0.8.0->allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (2.4.4)
Requirement already satisfied: wcwidth in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from ftfy->allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (0.1.8)
Requirement already satisfied: pluggy<1.0,>=0.12 in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from pytest->allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (0.13.1)
Requirement already satisfied: attrs>=17.4.0 in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from pytest->allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (19.3.0)
Requirement already satisfied: more-itertools>=4.0.0 in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from pytest->allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (8.2.0)
Requirement already satisfied: packaging in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from pytest->allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (20.3)
Requirement already satisfied: py>=1.5.0 in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from pytest->allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (1.8.1)
Requirement already satisfied: importlib-metadata>=0.12; python_version < "3.8" in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from pytest->allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (1.5.0)
Requirement already satisfied: protobuf>=3.8.0 in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from tensorboardX>=1.2->allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (3.11.3)
Requirement already satisfied: greenlet>=0.4.14; platform_python_implementation == "CPython" in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from gevent>=1.3.6->allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (0.4.15)
Requirement already satisfied: cycler>=0.10 in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from matplotlib>=2.2.3->allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (0.10.0)
Requirement already satisfied: python-dateutil>=2.1 in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from matplotlib>=2.2.3->allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (2.8.1)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from matplotlib>=2.2.3->allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (2.4.6)
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from matplotlib>=2.2.3->allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (1.1.0)
Requirement already satisfied: future in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from torch>=1.2.0->allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (0.18.2)
Requirement already satisfied: sentencepiece in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from pytorch-transformers==1.1.0->allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (0.1.85)
Requirement already satisfied: MarkupSafe>=0.23 in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from Jinja2>=2.10.1->flask>=1.0.2->allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (1.1.1)
Requirement already satisfied: docutils<0.16,>=0.10 in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from botocore<1.16.0,>=1.15.21->boto3->allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (0.15.2)
Requirement already satisfied: alabaster<0.8,>=0.7 in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from sphinx>=1.6.5->numpydoc>=0.8.0->allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (0.7.12)
Requirement already satisfied: imagesize in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from sphinx>=1.6.5->numpydoc>=0.8.0->allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (1.2.0)
Requirement already satisfied: Pygments>=2.0 in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from sphinx>=1.6.5->numpydoc>=0.8.0->allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (2.6.1)
Requirement already satisfied: sphinxcontrib-devhelp in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from sphinx>=1.6.5->numpydoc>=0.8.0->allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (1.0.2)
Requirement already satisfied: sphinxcontrib-jsmath in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from sphinx>=1.6.5->numpydoc>=0.8.0->allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (1.0.1)
Requirement already satisfied: snowballstemmer>=1.1 in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from sphinx>=1.6.5->numpydoc>=0.8.0->allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (2.0.0)
Requirement already satisfied: setuptools in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from sphinx>=1.6.5->numpydoc>=0.8.0->allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (46.0.0)
Requirement already satisfied: babel!=2.0,>=1.3 in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from sphinx>=1.6.5->numpydoc>=0.8.0->allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (2.8.0)
Requirement already satisfied: sphinxcontrib-qthelp in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from sphinx>=1.6.5->numpydoc>=0.8.0->allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (1.0.3)
Requirement already satisfied: sphinxcontrib-serializinghtml in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from sphinx>=1.6.5->numpydoc>=0.8.0->allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (1.1.4)
Requirement already satisfied: sphinxcontrib-applehelp in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from sphinx>=1.6.5->numpydoc>=0.8.0->allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (1.0.2)
Requirement already satisfied: sphinxcontrib-htmlhelp in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from sphinx>=1.6.5->numpydoc>=0.8.0->allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (1.0.3)
Requirement already satisfied: zipp>=0.5 in /usr/local/var/pyenv/versions/3.6.9/envs/sandbox369/lib/python3.6/site-packages (from importlib-metadata>=0.12; python_version < "3.8"->pytest->allennlp<0.10.0,>=0.9.0; extra == "allennlp" or extra == "all_with_integrations"->konoha[allennlp,janome]) (3.1.0)
allennlp-models 1.0.0 has requirement allennlp==1.0.0, but you'll have allennlp 0.9.0 which is incompatible.
allennlp-models 1.0.0 has requirement conllu==3.0, but you'll have conllu 1.3.1 which is incompatible.
Installing collected packages: overrides, allennlp
  Found existing installation: overrides 3.0.0
    Uninstalling overrides-3.0.0:
      Successfully uninstalled overrides-3.0.0
  Found existing installation: allennlp 1.0.0
    Uninstalling allennlp-1.0.0:
      Successfully uninstalled allennlp-1.0.0
Successfully installed allennlp-0.9.0 overrides-2.8.0
You are using pip version 18.1, however version 20.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
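As the log shows, konoha 4.3.0 declares its `allennlp` extra as `allennlp<0.10.0,>=0.9.0`, so pip replaces the installed 1.0.0 with 0.9.0. To check such pins before installing, the requirement strings a distribution declares can be read with `importlib.metadata.requires()` (Python 3.8+; the importlib_metadata backport offers the same function). The helpers below are hypothetical and match the `extra == "<name>"` marker text rather than evaluating full PEP 508 markers.

```python
from importlib.metadata import PackageNotFoundError, requires
from typing import List, Optional


def requirements_for_extra(declared: List[str], extra: str) -> List[str]:
    """Keep only the requirement strings gated on `extra`, without their markers.

    Simplified: looks for the marker text `extra == "<name>"` (either quote
    style) instead of evaluating the full PEP 508 marker.
    """
    needles = (f'extra == "{extra}"', f"extra == '{extra}'")
    return [
        req.split(";", 1)[0].strip()
        for req in declared
        if any(needle in req for needle in needles)
    ]


def installed_extra_pins(distribution: str, extra: str) -> Optional[List[str]]:
    """Pins that `pip install '<distribution>[<extra>]'` would add, if installed."""
    try:
        declared = requires(distribution) or []
    except PackageNotFoundError:
        return None
    return requirements_for_extra(declared, extra)
```

For example, `installed_extra_pins("konoha", "allennlp")` run in the environment above would be expected to list the `allennlp<0.10.0,>=0.9.0` pin, although the exact string formatting depends on how the package metadata was built.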

It is better to pre-compile the regular expression objects in the SentenceTokenizer class

Hi, @himkt
Currently, SentenceTokenizer looks like this.

import re

class SentenceTokenizer:
    PATTERNS = [
        r"（.*?）",
        r"「.*?」",
    ]

    def tokenize(self, document):
        for pattern in SentenceTokenizer.PATTERNS:
            pattern = re.compile(pattern)
            document = re.sub(pattern, self.conv_period, document)
        ...

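However, this hurts speed because re.compile() is called for every pattern each time tokenize() runs. (Python's re module caches compiled patterns, so repeated calls mostly hit that cache, but the lookup is still avoidable overhead.)
I suggest pre-compiling the regular expression objects instead, like below.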

class SentenceTokenizer:
    PATTERNS = [
        re.compile(r"（.*?）"),
        re.compile(r"「.*?」"),
    ]

Then we can use them with less overhead.

class SentenceTokenizer:
    ...

    def tokenize(self, document):
        for pattern in SentenceTokenizer.PATTERNS:
            document = pattern.sub(self.conv_period, document)
        ...

I created an evaluation repository.
hppRC/konoha-sentence-tokenizer-regex-compile
Please check it out for more detailed results.
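A self-contained sketch of the proposal, for trying it without konoha installed. Only `PATTERNS` and the `pattern.sub(self.conv_period, document)` loop come from the issue; the body of `conv_period`, the `__PERIOD__` placeholder and the sentence-splitting step are simplified assumptions, not konoha's actual implementation. `compare_timings` is a hypothetical helper for measuring the difference on your own machine.

```python
import re
import timeit
from typing import List, Tuple

RAW_PATTERNS = [r"（.*?）", r"「.*?」"]


class SentenceTokenizer:
    PERIOD = "。"
    PERIOD_SPECIAL = "__PERIOD__"  # assumed placeholder for protected periods
    # Compiled once, when the class body runs, instead of on every call
    PATTERNS = [re.compile(raw) for raw in RAW_PATTERNS]

    @staticmethod
    def conv_period(match: "re.Match[str]") -> str:
        # Hide periods inside brackets/quotes so they do not end a sentence
        return match.group(0).replace(
            SentenceTokenizer.PERIOD, SentenceTokenizer.PERIOD_SPECIAL
        )

    def tokenize(self, document: str) -> List[str]:
        for pattern in SentenceTokenizer.PATTERNS:
            document = pattern.sub(self.conv_period, document)
        sentences = []
        for line in document.splitlines():
            # Split after each remaining (unprotected) period
            for sentence in line.replace(self.PERIOD, self.PERIOD + "\n").split("\n"):
                if sentence:
                    sentences.append(sentence.replace(self.PERIOD_SPECIAL, self.PERIOD))
        return sentences


def compare_timings(document: str, number: int = 10_000) -> Tuple[float, float]:
    """Time compile-per-call against the precompiled PATTERNS (seconds)."""

    def recompiling() -> str:
        text = document
        for raw in RAW_PATTERNS:
            text = re.sub(re.compile(raw), SentenceTokenizer.conv_period, text)
        return text

    def precompiled() -> str:
        text = document
        for pattern in SentenceTokenizer.PATTERNS:
            text = pattern.sub(SentenceTokenizer.conv_period, text)
        return text

    return (
        timeit.timeit(recompiling, number=number),
        timeit.timeit(precompiled, number=number),
    )
```

For example, `SentenceTokenizer().tokenize("今日は晴れ。「明日も晴れ。」と言った。")` returns `['今日は晴れ。', '「明日も晴れ。」と言った。']`: the period inside 「」 does not split the sentence. Because `re` caches compiled patterns internally, the gap reported by `compare_timings` may be smaller than a full recompilation per call would suggest.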

Raise an error if `mode` is not specified when the tokenizer is `sudachi`.

  File "./konoha/api/tokenizers.py", line 35, in tokenize
    tokenizer = WordTokenizer(tokenizer=params.tokenizer)
  File "./konoha/word_tokenizer.py", line 39, in __init__
    self.__setup_tokenizer()
  File "./konoha/word_tokenizer.py", line 76, in __setup_tokenizer
    with_postag=self.with_postag
  File "./konoha/word_tokenizers/sudachi_tokenizer.py", line 42, in __init__
    _mode = mode.capitalize()
AttributeError: 'NoneType' object has no attribute 'capitalize'
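Here `mode` arrives as `None`, so the `mode.capitalize()` call fails with an `AttributeError`. A minimal sketch of the requested check, assuming the tokenizer accepts Sudachi's three split modes `A`, `B` and `C` case-insensitively (as the `capitalize()` call suggests); `normalize_sudachi_mode` is a hypothetical helper, not konoha's actual code.

```python
from typing import Optional

SUDACHI_MODES = ("A", "B", "C")  # Sudachi's short, middle and long split modes


def normalize_sudachi_mode(mode: Optional[str]) -> str:
    """Validate `mode` before it reaches `mode.capitalize()`."""
    if mode is None:
        raise ValueError(
            "`mode` must be specified when the tokenizer is `sudachi` "
            f"(one of {', '.join(SUDACHI_MODES)})"
        )
    normalized = mode.capitalize()  # "a" -> "A", as in the original code
    if normalized not in SUDACHI_MODES:
        raise ValueError(f"unknown Sudachi mode: {mode!r}")
    return normalized
```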
