A lightweight but powerful library to build token indices for NLP tasks, compatible with major Deep Learning frameworks like PyTorch and Tensorflow.
Home Page: http://token2index.readthedocs.io
License: GNU General Public License v3.0
token2index's Issues
Describe the solution you'd like
Allow the T2I.build method to accept List[List[str]].
Example:
tokenized_corpus = [
    ["This", "is", "a", "sentence"],
    ["This", "is", "another", "sentence"]
]
T2I.build(tokenized_corpus)
Currently this can be worked around as follows, but it would be nice to have nested lists supported directly:
# flatten List[List[str]] lazily with a generator to avoid building a flattened copy in memory
gen = (i for sent in tokenized_corpus for i in sent)
T2I.build(gen)
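For reference, a minimal sketch of how the flattening step could be generalized so that both flat and nested inputs are handled the same way. The helper name `flatten_corpus` is hypothetical and is not part of token2index; it only normalizes the input, and its result would then be handed to T2I.build in the same way as the generator in the workaround above.

```python
from typing import Iterable, Iterator, Union

# Hypothetical helper, not part of token2index: it only normalizes the input.
def flatten_corpus(
    corpus: Iterable[Union[str, Iterable[str]]]
) -> Iterator[str]:
    """Lazily yield tokens from a flat (List[str]) or nested (List[List[str]]) corpus."""
    for item in corpus:
        if isinstance(item, str):
            # Already a token: pass it through unchanged.
            yield item
        else:
            # A tokenized sentence: yield its tokens one by one.
            yield from item


tokenized_corpus = [
    ["This", "is", "a", "sentence"],
    ["This", "is", "another", "sentence"],
]

# Materialized here only to inspect the result; in practice the lazy
# iterator itself would be passed on.
tokens = list(flatten_corpus(tokenized_corpus))
flat_tokens = list(flatten_corpus(["This", "is", "flat"]))
```

Checking for `str` explicitly matters because strings are themselves iterable; without that check a flat list of tokens would be split into single characters.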