feralvam / easse Goto Github PK

Easier Automatic Sentence Simplification Evaluation

License: GNU General Public License v3.0

Python 4.21% Shell 0.02% Roff 95.76%

text-simplification sentence-simplification evaluation-metrics natural-language-generation natural-language-processing nlproc

easse's People

Contributors

Stargazers

Watchers

easse's Issues

How to calculate fkgl, Bleu in python?

Kindly provide an example of how to calculate fkgl and bleu in python.

regards

result dismatch with paper

the result of SARI and bleu to model is mismatch with the original paper
WHY?????

Hello, I am running EASSE on a PyCharm virtual environment with Python 3.7 and all metrics except for SAMSA are working. I already installed tupa and I fixed the following error message by using the pip install protobuf==3.20.* command:

TypeError: Descriptors cannot not be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
 1. Downgrade the protobuf package to 3.20.x or lower.
 2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).

Now I can execute SAMSA but it is still not working. This is my console output:

G:\My Drive\M5\Masterarbeit\implementation_metrics\easse\venv\Scripts\python.exe" "G:\My Drive\M5\Masterarbeit\implementation_metrics\easse\run_dennis.py" 
Warning: SAMSA metric is long to compute (120 sentences ~ 4min), disable it if you need fast evaluation.
Loading spaCy model 'en_core_web_md'... Done (33.254s).
Loading from 'G:\My Drive\M5\Masterarbeit\implementation_metrics\easse\easse\resources\tools\ucca-bilstm-1.3.10\models\ucca-bilstm.json'.
[dynet] random seed: 1
[dynet] allocating memory: 512MB
[dynet] memory allocation done.
[dynet] 2.1
Loading from 'G:\My Drive\M5\Masterarbeit\implementation_metrics\easse\easse\resources\tools\ucca-bilstm-1.3.10\models\ucca-bilstm.enum'... Done (0.121s).
Loading model from 'G:\My Drive\M5\Masterarbeit\implementation_metrics\easse\easse\resources\tools\ucca-bilstm-1.3.10\models\ucca-bilstm': 23param [02:14,  5.86s/param]
Loading model from 'G:\My Drive\M5\Masterarbeit\implementation_metrics\easse\easse\resources\tools\ucca-bilstm-1.3.10\models\ucca-bilstm': 100%|██████████| 23/23 [02:06<00:00,  5.51s/param]
Loading from 'G:\My Drive\M5\Masterarbeit\implementation_metrics\easse\easse\resources\tools\ucca-bilstm-1.3.10\models\ucca-bilstm.nlp.json'.
tupa --hyperparams "shared --lstm-layers 2" "amr --max-edge-labels 110 --node-label-dim 20 --max-node-labels 1000 --node-category-dim 5 --max-node-categories 25" "sdp --max-edge-labels 70" "conllu --max-edge-labels 60" --log parse.log --max-words 0 --max-words-external 249861 --vocab G:\My Drive\M5\Masterarbeit\implementation_metrics\easse\easse\resources\tools\ucca-bilstm-1.3.10\vocab\en_core_web_lg.csv --word-vectors ../word_vectors/wiki.en.vec
Loading 'G:\My Drive\M5\Masterarbeit\implementation_metrics\easse\easse\resources\tools\ucca-bilstm-1.3.10\vocab\en_core_web_lg.csv': 1340694 rows [00:06, 218144.04 rows/s]
2 passages [00:01,  1.05 passages/s, en ucca=1_0]
Starting server with command: java -Xmx5G -cp G:\My Drive\M5\Masterarbeit\implementation_metrics\easse\easse\resources\tools\stanford-corenlp-full-2018-10-05/* edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 60000 -threads 40 -maxCharLength 100000 -quiet True -serverProperties corenlp_server-21b4f872deb94b0d.props -preload tokenize,ssplit,pos,lemma,ner,depparse
Traceback (most recent call last):
  File "G:\My Drive\M5\Masterarbeit\implementation_metrics\easse\run_dennis.py", line 23, in <module>
    sys_sents=["About 95 you now get in.", "Cat on mat."])
  File "G:\My Drive\M5\Masterarbeit\implementation_metrics\easse\easse\samsa.py", line 305, in corpus_samsa
    return np.mean(get_samsa_sentence_scores(orig_sents, sys_sents, lowercase, tokenizer, verbose))
  File "G:\My Drive\M5\Masterarbeit\implementation_metrics\easse\easse\samsa.py", line 281, in get_samsa_sentence_scores
    verbose=verbose,
  File "G:\My Drive\M5\Masterarbeit\implementation_metrics\easse\easse\samsa.py", line 30, in syntactic_parse_ucca_scenes
    verbose=verbose,
  File "G:\My Drive\M5\Masterarbeit\implementation_metrics\easse\easse\aligner\corenlp_utils.py", line 144, in syntactic_parse_texts
    raw_parse_result = client.annotate(text)
  File "G:\My Drive\M5\Masterarbeit\implementation_metrics\easse\venv\lib\site-packages\stanfordnlp\server\client.py", line 398, in annotate
    r = self._request(text.encode('utf-8'), request_properties, **kwargs)
  File "G:\My Drive\M5\Masterarbeit\implementation_metrics\easse\venv\lib\site-packages\stanfordnlp\server\client.py", line 311, in _request
    self.ensure_alive()
  File "G:\My Drive\M5\Masterarbeit\implementation_metrics\easse\venv\lib\site-packages\stanfordnlp\server\client.py", line 137, in ensure_alive
    raise PermanentlyFailedException("Timed out waiting for service to come alive.")
stanfordnlp.server.client.PermanentlyFailedException: Timed out waiting for service to come alive.

Process finished with exit code 1

Is the problem maybe because the folder "My Drive" has a space? I haven't changed this folder name because changing it is quite a hassle.

Qualitative Outputs missing labels

Qualitative Outputs (e.g. Randomly sampled simplifications) in the HTML report don't have Source and Prediction labels. It is hard to judge if the text is the source or predictions. Labels/Headings for the text would be nice.

Truecase and detokenise test files

Add requirements for 'simalign' and 'bert_score'

Hello, it seems that the installation of 'simalign' and 'bert_score' packages is necessary to run an easse command, however they are not present in requirement.txt (or cited as requirements)

potential issue with SARI n-gram add-score

Hi, I have observed a particular situation with the SARI implementation where system outputs can receive a <100 score even when they are identical to the reference (where there is only a single reference).

Basically, if a reference does not introduce new tokens, it will receive a 0.00 unigram add-score, but 100 for all n>1-grams.

Take the following example:

sources=["Shu Abe (born June 7 1984) is a former Japanese football player."]
predictions=["Shu Abe (born June 7 1984) is a Japanese football player."]
references=[["Shu Abe (born June 7 1984) is a Japanese football player."]]
sari_score = corpus_sari(sources, predictions, references)
print(sari_score)

>>> 91.66666666666667

In this case, the add score will be 75.0 because there are no new unigrams (because of the if sys_total > 0: checks in compute_precision_recall_f1()) but there are technically new bigrams, trigrams, and 4-grams around the location of the deleted word (["a japanese", "a japanese football", "is a japanese"], etc.).

I am just curious of whether this is the expected behaviour or if a definitive 0.00 or 100.0 result for the add-score would be more desirable?

Thanks in advance for any insight.

Apply EASSE on custom dataset.

I run it successfully.. Thank you so much.. I am so glad that I reached this point..

On the other hand, when I am trying to apply it on other custom datasets (mine), It result in this error. As the files is present in the current directory.

looking forward to your reply.

Originally posted by @ykkhan in #69 (comment)

easse: command not found

File "/home/***/easse/quality_estimation.py", line 3, in
from tseval.feature_extraction import (get_compression_ratio, count_sentence_splits, ......
ModuleNotFoundError: No module named 'tseval'

EASSE has been installed successfully, but when I run the command-line interface with the easse command.
easse: command not found

resourceKilled

Dear Fernando,
Thank you for developing Easse tool. It helps me a lot. However, I’m trying to use SAMSA metric on my output but it fails to compute it. Could you help me to solve it? I tried to download SAMSA but the tool suffers from insufficient info about how to use it and I didn’t understand the code.
Here is the error message:

rita@rita-VirtualBox:~/easse$ easse evaluate -t turkcorpus_test -m 'samsa' -q < easse/resources/data/system_outputs/turkcorpus/test/R
Warning: SAMSA metric is long to compute (120 sentences ~ 4min), disable it if you need fast evaluation.
Loading spaCy model 'en_core_web_md'... Done (76.791s).
Loading from '/home/rita/.local/lib/python3.8/site-packages/easse/resources/tools/ucca-bilstm-1.3.10/models/ucca-bilstm.json'.
[dynet] random seed: 1
[dynet] allocating memory: 512MB
[dynet] memory allocation done.
[dynet] 2.1.2
Loading from '/home/rita/.local/lib/python3.8/site-packages/easse/resources/tools/ucca-bilstm-1.3.10/models/ucca-bilstm.enum'... Done (0.295s).
Loading model from '/home/rita/.local/lib/python3.8/site-packages/easse/resourceKilled

Thanks in advance!

strange score

Everyone has encountered that every time a reference is added, the bleu score will increase and the sari will decrease. Then when I add all the refs, the bleu score will be very high and the sari will be very low.

<Issue solved > AssertionError: Please make sure to use IPA dictionary for MeCab

When I run "pip install ."
it results in this error. I have python version 3.8

Referenceless Quality Estimation

Hi everyone,

I'm interested in using this feature: Referenceless Quality Estimation to compare the inputs and the system-generated output.

However, it always asks for the test set. Is this something I can compute with ease? This is my command line:

easse evaluate --orig_sents_path file1.txt --sys_sents_path file2.txt -m fkgl

If so, what features are available in this setting besides FKGL?

Thanks for your support,

Laura

Not Found for url: https://pypi.org/simple/tseval/

WHEN I "pip install ."

requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://pypi.org/simple/tseval/

Why? Thanks.

"Show more examples" option in the report

It would be nice to have a "show more examples" option in the report

corpus_level=True might be confusing

easse/easse/sari.py

Line 171 in f8bcb43

corpus_level=True):

The corpus_level=True argument might be confusing for users, I think we should rename it to something like use_deletion_recall.

System outputs for MUSST

Sir/Ma'am
Kindly provide system outputs for MUSST system for PWKP and Mturk corpus.

https://www.aclweb.org/anthology/I17-3007.pdf

regards

multiple system report error

when i want to get the report for mulriple system report i get this error,

when run easse report -t asset_test -i ./ACCESS -p /Users/man/Desktop/MonkAcademic/academic/monkSS/evaluate/result/report/SBMT-SARI-asset.html
and easse report -t asset_test -i ./SBMT-SARI -p /Users/man/Desktop/MonkAcademic/academic/monkSS/evaluate/result/report/SBMT-SARI-asset.html
it is fine
why this error?

Adding Quality Estimation to easse

Hi,

I am working in adding the QE features to easse, I have two questions:

I have a bunch of features that can be computed either on the prediction (i.e. length, complexity, lm proba ...) or on both the source and prediction (compression ratio, word embeddings comparison). They can all be found here: https://github.com/facebookresearch/text-simplification-evaluation/blob/master/tseval/feature_extraction.py
What do we want to do with those? Is there a subset of interesting features that we want to include in the evaluate script?
Two options on how to integrate them:
a. Install and import tseval as an external package (most straightforward)
b. Integrate tseval features to easse (might not be very useful)
I would suggest choosing a.

How to apply easse to custom data?

I am trying to evaluate a customized set of asset data. However, I am not very sure how to use the syntax. Currently I am using this syntax

!easse evaluate --refs_sents_paths ref_data --orig_sents_path orig_data --sys_sents_path test_pred_dir -t custom -m 'bleu,sari,fkgl' -q < easse/easse/resources/data/system_outputs/asset/test

which will report this error:

Fatal Python error: _PySys_BeginInit: is a directory, cannot continue

Current thread 0x00007f5625e9b780 (most recent call first):
`

I am using google collar, and all the path can correctly access the files.

Can you please let me know at which step is my syntax wrong ? Thank you so much!

-tok moses raises error ?

easse/easse/cli.py

Line 33 in f8bcb43

 '--tokenizer', '-tok', type=click.Choice(['13a', 'intl', 'moses', 'plain']), default='13a', 

-tok moses might raise an error, we need to double check.

Use custom test set

Add an option to use a custom dataset for evaluation and computation of the metrics.

sacrebleu version

Hi,
When I was doing evaluation the error module 'sacrebleu' has no attribute 'TOKENIZERS' occurs. I think it is because sacrebleu has a latest version of 2.0.0. install sacrebleu==1.5.1 fixes this error :)

(pip) ERROR: Failed building wheel for easse

Hello to all you wonderful people,
I've been using your framework for months and I must say, it's great!
However, recently the same package installation command which is pip install . has been running into errors and fails to build a wheel for easse.

I share the error message here for more information.
(It should be noted that this is a standard Google Colab notebook and pip is upgraded to the latest version.)

Processing /content/gdrive/My Drive/EASSE/easse
Preparing metadata (setup.py) ... done
.
.
.
Requirement already satisfied: setuptools in /usr/local/lib/python3.7/dist-packages (from python-Levenshtein->tseval@ git+https://github.com/facebookresearch/text-simplification-evaluation.git->easse==0.2.4) (57.4.0)
Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.7/dist-packages (from scikit-learn->simalign@ git+https://github.com/cisnlp/simalign.git->easse==0.2.4) (3.0.0)
Collecting smmap<6,>=3.0.1
Downloading smmap-5.0.0-py3-none-any.whl (24 kB)
Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.7/dist-packages (from importlib-metadata->transformers>=3.0.0numpy->bert_score->easse==0.2.4) (3.6.0)
Building wheels for collected packages: easse, nltk, simalign, tseval, yattag, python-Levenshtein
Building wheel for easse (setup.py) ... error

ERROR: Failed building wheel for easse

Running setup.py clean for easse
Building wheel for nltk (setup.py) ... done
Created wheel for nltk: filename=nltk-3.4.3-py3-none-any.whl size=1448609 sha256=46052f128d317e2f399f382e992a766ee4e58e8cbe73488125ed811cad6bd10d
Stored in directory: /root/.cache/pip/wheels/8f/12/6d/7d1ecf74380e441128c7895cafb1931c746b484237be23a229
Building wheel for simalign (pyproject.toml) ... done
Created wheel for simalign: filename=simalign-0.3-py3-none-any.whl size=8101 sha256=cefdc8c81226f8c6a1e8a237242dd88aaace6e06a6fa35e8e1adfb343af7cd22
Stored in directory: /tmp/pip-ephem-wheel-cache-aypivvq8/wheels/7c/fd/e8/feb79b708710c76e78b833a417552cadc858dd3d2ee5897585
.
.
.
Failed to build easse
Installing collected packages: smmap, pyyaml, tokenizers, sacremoses, huggingface-hub, gitdb, transformers, python-Levenshtein, portalocker, nltk, networkx, gitpython, colorama, yattag, tseval, stanfordnlp, simalign, sacrebleu, bert-score, easse
Attempting uninstall: pyyaml
Found existing installation: PyYAML 3.13
Uninstalling PyYAML-3.13:
Successfully uninstalled PyYAML-3.13
Attempting uninstall: nltk
Found existing installation: nltk 3.2.5
Uninstalling nltk-3.2.5:
Successfully uninstalled nltk-3.2.5
Attempting uninstall: networkx
Found existing installation: networkx 2.6.3
Uninstalling networkx-2.6.3:
Successfully uninstalled networkx-2.6.3
Running setup.py install for easse ... error

ERROR: Command errored out with exit status 1: /usr/bin/python3 -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/content/gdrive/My Drive/EASSE/easse/setup.py'"'"'; file='"'"'/content/gdrive/My Drive/EASSE/easse/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(file) if os.path.exists(file) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record /tmp/pip-record-qi6q_mqi/install-record.txt --single-version-externally-managed --compile --install-headers /usr/local/include/python3.7/easse Check the logs for full command output.

Expose -lowercase and -tokeniser options

Difference between easse and huggingface/datasets SARI computation

Hello, can someone help me understand the difference between EASSE and huggingface/datasets SARI computation? Using the defaults for both libraries, I see aa 3-6 point discrepancy on the ASSET benchmark for some models I fine-tuned.

I understand that EASSE has use_f1_for_deletion=True as its default, and datasets uses precision. But with use_f1_for_deletion set to False in EASSE, I still see a small difference in SARI score (~0.6) between the two libraries.

Thanks!

installation issue in windows 10

Log output
Please guide if possible

(torchenv) C:\Users\avish>cd easse

(torchenv) C:\Users\avish\easse>pip install .
Processing c:\users\avish\easse
Requirement already satisfied: click in c:\users\avish\anaconda3\envs\torchenv\lib\site-packages (from easse==0.2.4) (7.1.2)
Requirement already satisfied: matplotlib in c:\users\avish\anaconda3\envs\torchenv\lib\site-packages (from easse==0.2.4) (3.3.2)
Requirement already satisfied: numpy in c:\users\avish\anaconda3\envs\torchenv\lib\site-packages (from easse==0.2.4) (1.19.1)
Requirement already satisfied: pandas in c:\users\avish\anaconda3\envs\torchenv\lib\site-packages (from easse==0.2.4) (1.1.3)
Requirement already satisfied: requests>=2.21.0 in c:\users\avish\anaconda3\envs\torchenv\lib\site-packages (from easse==0.2.4) (2.24.0)
Requirement already satisfied: sacrebleu>=1.4.13 in c:\users\avish\anaconda3\envs\torchenv\lib\site-packages (from easse==0.2.4) (1.4.14)
Requirement already satisfied: sacremoses in c:\users\avish\anaconda3\envs\torchenv\lib\site-packages (from easse==0.2.4) (0.0.43)
Requirement already satisfied: seaborn in c:\users\avish\anaconda3\envs\torchenv\lib\site-packages (from easse==0.2.4) (0.11.1)
Requirement already satisfied: tqdm>=4.32.2 in c:\users\avish\anaconda3\envs\torchenv\lib\site-packages (from easse==0.2.4) (4.47.0)
Requirement already satisfied: plotly>=4.0.0 in c:\users\avish\anaconda3\envs\torchenv\lib\site-packages (from easse==0.2.4) (4.14.1)
Collecting ucca@ git+https://github.com/louismartin/ucca.git
Cloning https://github.com/louismartin/ucca.git to c:\users\avish\appdata\local\temp\pip-install-18t60lo8\ucca_f7611c81e0a345f6b4173600639dba69
Collecting tseval@ git+https://github.com/facebookresearch/text-simplification-evaluation.git
Cloning https://github.com/facebookresearch/text-simplification-evaluation.git to c:\users\avish\appdata\local\temp\pip-install-18t60lo8\tseval_b5d2a62677ad4f11a57de91eafd65538
Collecting simalign@ git+https://github.com/cisnlp/simalign.git
Cloning https://github.com/cisnlp/simalign.git to c:\users\avish\appdata\local\temp\pip-install-18t60lo8\simalign_4653354f4cb6415e9156e63e83d2cc24
Requirement already satisfied: torch in c:\users\avish\anaconda3\envs\torchenv\lib\site-packages (from simalign@ git+https://github.com/cisnlp/simalign.git->easse==0.2.4) (1.6.0)
Requirement already satisfied: scipy in c:\users\avish\anaconda3\envs\torchenv\lib\site-packages (from simalign@ git+https://github.com/cisnlp/simalign.git->easse==0.2.4) (1.5.4)
Requirement already satisfied: transformers>=3.1.0 in c:\users\avish\anaconda3\envs\torchenv\lib\site-packages (from simalign@ git+https://github.com/cisnlp/simalign.git->easse==0.2.4) (4.0.0)
Requirement already satisfied: regex in c:\users\avish\anaconda3\envs\torchenv\lib\site-packages (from simalign@ git+https://github.com/cisnlp/simalign.git->easse==0.2.4) (2020.10.28)
Requirement already satisfied: scikit_learn in c:\users\avish\anaconda3\envs\torchenv\lib\site-packages (from simalign@ git+https://github.com/cisnlp/simalign.git->easse==0.2.4) (0.23.2)
Requirement already satisfied: gitpython in c:\users\avish\anaconda3\envs\torchenv\lib\site-packages (from tseval@ git+https://github.com/facebookresearch/text-simplification-evaluation.git->easse==0.2.4) (3.1.11)
Collecting networkx==2.4
Using cached networkx-2.4-py3-none-any.whl (1.6 MB)
Requirement already satisfied: decorator>=4.3.0 in c:\users\avish\anaconda3\envs\torchenv\lib\site-packages (from networkx==2.4->simalign@ git+https://github.com/cisnlp/simalign.git->easse==0.2.4) (4.4.2)
Collecting nltk==3.4.3
Using cached nltk-3.4.3.zip (1.4 MB)
Requirement already satisfied: six in c:\users\avish\anaconda3\envs\torchenv\lib\site-packages (from nltk==3.4.3->easse==0.2.4) (1.15.0)
Collecting spacy==2.1.3
Using cached spacy-2.1.3.tar.gz (27.7 MB)
Installing build dependencies ... error
ERROR: Command errored out with exit status 1:
command: 'c:\users\avish\anaconda3\envs\torchenv\python.exe' 'c:\users\avish\anaconda3\envs\torchenv\lib\site-packages\pip' install --ignore-installed --no-user --prefix 'C:\Users\avish\AppData\Local\Temp\pip-build-env-v4tl30sz\overlay' --no-warn-script-location --no-binary :none: --only-binary :none: -i https://pypi.org/simple -- setuptools 'wheel>0.32.0.<0.33.0' Cython 'cymem>=2.0.2,<2.1.0' 'preshed>=2.0.1,<2.1.0' 'murmurhash>=0.28.0,<1.1.0' thinc==7.0.0.dev6
cwd: None
Complete output (119 lines):
Collecting thinc==7.0.0.dev6
Using cached thinc-7.0.0.dev6-cp38-cp38-win_amd64.whl
Collecting cymem<2.1.0,>=2.0.2
Using cached cymem-2.0.5-cp38-cp38-win_amd64.whl (36 kB)
Collecting murmurhash<1.1.0,>=0.28.0
Using cached murmurhash-1.0.5-cp38-cp38-win_amd64.whl (21 kB)
Collecting preshed<2.1.0,>=2.0.1
Using cached preshed-2.0.1.tar.gz (113 kB)
Collecting wheel>0.32.0.<0.33.0
Using cached wheel-0.36.2-py2.py3-none-any.whl (35 kB)
Collecting blis<0.3.0,>=0.2.1
Using cached blis-0.2.4.tar.gz (1.5 MB)
Collecting numpy>=1.7.0
Using cached numpy-1.19.4-cp38-cp38-win_amd64.whl (13.0 MB)
Collecting plac<1.0.0,>=0.9.6
Using cached plac-0.9.6-py2.py3-none-any.whl (20 kB)
Collecting six<2.0.0,>=1.10.0
Using cached six-1.15.0-py2.py3-none-any.whl (10 kB)
Collecting srsly<1.1.0,>=0.0.5
Using cached srsly-1.0.5-cp38-cp38-win_amd64.whl (178 kB)
Collecting thinc-gpu-ops<0.1.0,>=0.0.1
Using cached thinc_gpu_ops-0.0.4-py3-none-any.whl
Collecting tqdm<5.0.0,>=4.10.0
Using cached tqdm-4.55.0-py2.py3-none-any.whl (68 kB)
Collecting wasabi<1.1.0,>=0.0.9
Using cached wasabi-0.8.0-py3-none-any.whl (23 kB)
Collecting wrapt<1.11.0,>=1.10.0
Using cached wrapt-1.10.11-cp38-cp38-win_amd64.whl
Collecting Cython
Using cached Cython-0.29.21-cp38-cp38-win_amd64.whl (1.7 MB)
Collecting setuptools
Using cached setuptools-51.1.1-py3-none-any.whl (2.0 MB)
Building wheels for collected packages: preshed, blis
Building wheel for preshed (setup.py): started
Building wheel for preshed (setup.py): finished with status 'done'
Created wheel for preshed: filename=preshed-2.0.1-cp38-cp38m-win_amd64.whl size=79134 sha256=86512a35c98ce3fb730879436319707087bd6a2a7efc386945eb51e38ee60e01
Stored in directory: c:\users\avish\appdata\local\pip\cache\wheels\5a\d0\29\7f6993a759349eae3d0ecca7e2fbc88acdd8650b25e6c6ad8a
Building wheel for blis (setup.py): started
Building wheel for blis (setup.py): finished with status 'error'
ERROR: Command errored out with exit status 1:
command: 'c:\users\avish\anaconda3\envs\torchenv\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\Users\avish\AppData\Local\Temp\pip-install-o73hzbxv\blis_26e7809c4855422bbbd89fdb5d389c5c\setup.py'"'"'; file='"'"'C:\Users\avish\AppData\Local\Temp\pip-install-o73hzbxv\blis_26e7809c4855422bbbd89fdb5d389c5c\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' bdist_wheel -d 'C:\Users\avish\AppData\Local\Temp\pip-wheel-empit83q'
cwd: C:\Users\avish\AppData\Local\Temp\pip-install-o73hzbxv\blis_26e7809c4855422bbbd89fdb5d389c5c
Complete output (31 lines):
BLIS_COMPILER? None
running bdist_wheel
running build
running build_py
creating build
creating build\lib.win-amd64-3.8
creating build\lib.win-amd64-3.8\blis
copying blis\about.py -> build\lib.win-amd64-3.8\blis
copying blis\benchmark.py -> build\lib.win-amd64-3.8\blis
copying blis_init_.py -> build\lib.win-amd64-3.8\blis
creating build\lib.win-amd64-3.8\blis\tests
copying blis\tests\common.py -> build\lib.win-amd64-3.8\blis\tests
copying blis\tests\test_dotv.py -> build\lib.win-amd64-3.8\blis\tests
copying blis\tests\test_gemm.py -> build\lib.win-amd64-3.8\blis\tests
copying blis\tests_init_.py -> build\lib.win-amd64-3.8\blis\tests
copying blis\cy.pyx -> build\lib.win-amd64-3.8\blis
copying blis\py.pyx -> build\lib.win-amd64-3.8\blis
copying blis\cy.pxd -> build\lib.win-amd64-3.8\blis
copying blis_init_.pxd -> build\lib.win-amd64-3.8\blis
running build_ext
c:\users\avish\anaconda3\envs\torchenv\lib\site-packages\Cython\Compiler\Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: C:\Users\avish\AppData\Local\Temp\pip-install-o73hzbxv\blis_26e7809c4855422bbbd89fdb5d389c5c\blis\cy.pxd
tree = Parsing.p_module(s, pxd, full_module_name)
c:\users\avish\anaconda3\envs\torchenv\lib\site-packages\Cython\Compiler\Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: C:\Users\avish\AppData\Local\Temp\pip-install-o73hzbxv\blis_26e7809c4855422bbbd89fdb5d389c5c\blis\py.pyx
tree = Parsing.p_module(s, pxd, full_module_name)
Processing blis\cy.pyx
Processing blis\py.pyx
error: [WinError 2] The system cannot find the file specified
msvc
py_compiler msvc
{'LS_COLORS': 'rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:.tar=01;31:.tgz=01;31:.arc=01;31:.arj=01;31:.taz=01;31:.lha=01;31:.lz4=01;31:.lzh=01;31:.lzma=01;31:.tlz=01;31:.txz=01;31:.tzo=01;31:.t7z=01;31:.zip=01;31:.z=01;31:.Z=01;31:.dz=01;31:.gz=01;31:.lrz=01;31:.lz=01;31:.lzo=01;31:.xz=01;31:.zst=01;31:.tzst=01;31:.bz2=01;31:.bz=01;31:.tbz=01;31:.tbz2=01;31:.tz=01;31:.deb=01;31:.rpm=01;31:.jar=01;31:.war=01;31:.ear=01;31:.sar=01;31:.rar=01;31:.alz=01;31:.ace=01;31:.zoo=01;31:.cpio=01;31:.7z=01;31:.rz=01;31:.cab=01;31:.wim=01;31:.swm=01;31:.dwm=01;31:.esd=01;31:.jpg=01;35:.jpeg=01;35:.mjpg=01;35:.mjpeg=01;35:.gif=01;35:.bmp=01;35:.pbm=01;35:.pgm=01;35:.ppm=01;35:.tga=01;35:.xbm=01;35:.xpm=01;35:.tif=01;35:.tiff=01;35:.png=01;35:.svg=01;35:.svgz=01;35:.mng=01;35:.pcx=01;35:.mov=01;35:.mpg=01;35:.mpeg=01;35:.m2v=01;35:.mkv=01;35:.webm=01;35:.ogm=01;35:.mp4=01;35:.m4v=01;35:.mp4v=01;35:.vob=01;35:.qt=01;35:.nuv=01;35:.wmv=01;35:.asf=01;35:.rm=01;35:.rmvb=01;35:.flc=01;35:.avi=01;35:.fli=01;35:.flv=01;35:.gl=01;35:.dl=01;35:.xcf=01;35:.xwd=01;35:.yuv=01;35:.cgm=01;35:.emf=01;35:.ogv=01;35:.ogx=01;35:.aac=00;36:.au=00;36:.flac=00;36:.m4a=00;36:.mid=00;36:.midi=00;36:.mka=00;36:.mp3=00;36:.mpc=00;36:.ogg=00;36:.ra=00;36:.wav=00;36:.oga=00;36:.opus=00;36:.spx=00;36:.xspf=00;36:', 'HOSTTYPE': 'x86_64', 'LESSCLOSE': '/usr/bin/lesspipe %s %s', 'LANG': 'C.UTF-8', 'OLDPWD': '/home/matt/repos/flame-blis', 'VIRTUAL_ENV': '/home/matt/repos/cython-blis/env3.6', 'USER': 'matt', 'PWD': '/home/matt/repos/cython-blis', 'HOME': '/home/matt', 'NAME': 'LAPTOP-OMKOB3VM', 'XDG_DATA_DIRS': '/usr/local/share:/usr/share:/var/lib/snapd/desktop', 'SHELL': '/bin/bash', 'TERM': 'xterm-256color', 'SHLVL': '1', 'LOGNAME': 'matt', 'PATH': '/home/matt/repos/cython-blis/env3.6/bin:/tmp/google-cloud-sdk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/mnt/c/Users/matt/Documents/cmder/vendor/conemu-maximus5/ConEmu/Scripts:/mnt/c/Users/matt/Documents/cmder/vendor/conemu-maximus5:/mnt/c/Users/matt/Documents/cmder/vendor/conemu-maximus5/ConEmu:/mnt/c/Python37/Scripts:/mnt/c/Python37:/mnt/c/Program Files (x86)/Intel/Intel(R) Management Engine Components/iCLS:/mnt/c/Program Files/Intel/Intel(R) Management Engine Components/iCLS:/mnt/c/Windows/System32:/mnt/c/Windows:/mnt/c/Windows/System32/wbem:/mnt/c/Windows/System32/WindowsPowerShell/v1.0:/mnt/c/Program Files (x86)/Intel/Intel(R) Management Engine Components/DAL:/mnt/c/Program Files/Intel/Intel(R) Management Engine Components/DAL:/mnt/c/Program Files (x86)/Intel/Intel(R) Management Engine Components/IPT:/mnt/c/Program Files/Intel/Intel(R) Management Engine Components/IPT:/mnt/c/Program Files/Intel/WiFi/bin:/mnt/c/Program Files/Common Files/Intel/WirelessCommon:/mnt/c/Program Files (x86)/NVIDIA Corporation/PhysX/Common:/mnt/c/ProgramData/chocolatey/bin:/mnt/c/Program Files/Git/cmd:/mnt/c/Program Files/LLVM/bin:/mnt/c/Windows/System32:/mnt/c/Windows:/mnt/c/Windows/System32/wbem:/mnt/c/Windows/System32/WindowsPowerShell/v1.0:/mnt/c/Windows/System32/OpenSSH:/mnt/c/Program Files/nodejs:/mnt/c/Users/matt/AppData/Local/Microsoft/WindowsApps:/mnt/c/Users/matt/AppData/Local/Programs/Microsoft VS Code/bin:/mnt/c/Users/matt/AppData/Roaming/npm:/snap/bin:/mnt/c/Program Files/Oracle/VirtualBox', 'PS1': '(env3.6) \[\e]0;\u@\h: \w\a\]${debian_chroot:+($debian_chroot)}\[\033[01;32m\]\u@\h\[\033[00m\]:\[\033[01;34m\]\w\[\033[00m\]\$ ', 'VAGRANT_HOME': '/home/matt/.vagrant.d/', 'LESSOPEN': '| /usr/bin/lesspipe %s', '': '/home/matt/repos/cython-blis/env3.6/bin/python'}
clang -c C:\Users\avish\AppData\Local\Temp\pip-install-o73hzbxv\blis_26e7809c4855422bbbd89fdb5d389c5c\blis_src\config\bulldozer\bli_cntx_init_bulldozer.c -o C:\Users\avish\AppData\Local\Temp\tmp417dls1b\bli_cntx_init_bulldozer.o -O2 -funroll-all-loops -std=c99 -D_POSIX_C_SOURCE=200112L -DBLIS_VERSION_STRING="0.5.0-6" -DBLIS_IS_BUILDING_LIBRARY -Iinclude\windows-x86_64 -I.\frame\3\ -I.\frame\ind\ukernels\ -I.\frame\1m\ -I.\frame\1f\ -I.\frame\1\ -I.\frame\include -IC:\Users\avish\AppData\Local\Temp\pip-install-o73hzbxv\blis_26e7809c4855422bbbd89fdb5d389c5c\blis_src\include\windows-x86_64
----------------------------------------
ERROR: Failed building wheel for blis
Running setup.py clean for blis
Successfully built preshed
Failed to build blis
Installing collected packages: numpy, cymem, wrapt, wasabi, tqdm, thinc-gpu-ops, srsly, six, preshed, plac, murmurhash, blis, wheel, thinc, setuptools, Cython
Running setup.py install for blis: started
Running setup.py install for blis: finished with status 'error'
ERROR: Command errored out with exit status 1:
command: 'c:\users\avish\anaconda3\envs\torchenv\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\Users\avish\AppData\Local\Temp\pip-install-o73hzbxv\blis_26e7809c4855422bbbd89fdb5d389c5c\setup.py'"'"'; file='"'"'C:\Users\avish\AppData\Local\Temp\pip-install-o73hzbxv\blis_26e7809c4855422bbbd89fdb5d389c5c\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record 'C:\Users\avish\AppData\Local\Temp\pip-record-jbidoyjp\install-record.txt' --single-version-externally-managed --prefix 'C:\Users\avish\AppData\Local\Temp\pip-build-env-v4tl30sz\overlay' --compile --install-headers 'C:\Users\avish\AppData\Local\Temp\pip-build-env-v4tl30sz\overlay\Include\blis'
cwd: C:\Users\avish\AppData\Local\Temp\pip-install-o73hzbxv\blis_26e7809c4855422bbbd89fdb5d389c5c
Complete output (31 lines):
BLIS_COMPILER? None
running install
running build
running build_py
creating build
creating build\lib.win-amd64-3.8
creating build\lib.win-amd64-3.8\blis
copying blis\about.py -> build\lib.win-amd64-3.8\blis
copying blis\benchmark.py -> build\lib.win-amd64-3.8\blis
copying blis_init_.py -> build\lib.win-amd64-3.8\blis
creating build\lib.win-amd64-3.8\blis\tests
copying blis\tests\common.py -> build\lib.win-amd64-3.8\blis\tests
copying blis\tests\test_dotv.py -> build\lib.win-amd64-3.8\blis\tests
copying blis\tests\test_gemm.py -> build\lib.win-amd64-3.8\blis\tests
copying blis\tests_init_.py -> build\lib.win-amd64-3.8\blis\tests
copying blis\cy.pyx -> build\lib.win-amd64-3.8\blis
copying blis\py.pyx -> build\lib.win-amd64-3.8\blis
copying blis\cy.pxd -> build\lib.win-amd64-3.8\blis
copying blis_init_.pxd -> build\lib.win-amd64-3.8\blis
running build_ext
c:\users\avish\anaconda3\envs\torchenv\lib\site-packages\Cython\Compiler\Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: C:\Users\avish\AppData\Local\Temp\pip-install-o73hzbxv\blis_26e7809c4855422bbbd89fdb5d389c5c\blis\cy.pxd
tree = Parsing.p_module(s, pxd, full_module_name)
c:\users\avish\anaconda3\envs\torchenv\lib\site-packages\Cython\Compiler\Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: C:\Users\avish\AppData\Local\Temp\pip-install-o73hzbxv\blis_26e7809c4855422bbbd89fdb5d389c5c\blis\py.pyx
tree = Parsing.p_module(s, pxd, full_module_name)
Processing blis\cy.pyx
Processing blis\py.pyx
error: [WinError 2] The system cannot find the file specified
msvc
py_compiler msvc
{'LS_COLORS': 'rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:.tar=01;31:.tgz=01;31:.arc=01;31:.arj=01;31:.taz=01;31:.lha=01;31:.lz4=01;31:.lzh=01;31:.lzma=01;31:.tlz=01;31:.txz=01;31:.tzo=01;31:.t7z=01;31:.zip=01;31:.z=01;31:.Z=01;31:.dz=01;31:.gz=01;31:.lrz=01;31:.lz=01;31:.lzo=01;31:.xz=01;31:.zst=01;31:.tzst=01;31:.bz2=01;31:.bz=01;31:.tbz=01;31:.tbz2=01;31:.tz=01;31:.deb=01;31:.rpm=01;31:.jar=01;31:.war=01;31:.ear=01;31:.sar=01;31:.rar=01;31:.alz=01;31:.ace=01;31:.zoo=01;31:.cpio=01;31:.7z=01;31:.rz=01;31:.cab=01;31:.wim=01;31:.swm=01;31:.dwm=01;31:.esd=01;31:.jpg=01;35:.jpeg=01;35:.mjpg=01;35:.mjpeg=01;35:.gif=01;35:.bmp=01;35:.pbm=01;35:.pgm=01;35:.ppm=01;35:.tga=01;35:.xbm=01;35:.xpm=01;35:.tif=01;35:.tiff=01;35:.png=01;35:.svg=01;35:.svgz=01;35:.mng=01;35:.pcx=01;35:.mov=01;35:.mpg=01;35:.mpeg=01;35:.m2v=01;35:.mkv=01;35:.webm=01;35:.ogm=01;35:.mp4=01;35:.m4v=01;35:.mp4v=01;35:.vob=01;35:.qt=01;35:.nuv=01;35:.wmv=01;35:.asf=01;35:.rm=01;35:.rmvb=01;35:.flc=01;35:.avi=01;35:.fli=01;35:.flv=01;35:.gl=01;35:.dl=01;35:.xcf=01;35:.xwd=01;35:.yuv=01;35:.cgm=01;35:.emf=01;35:.ogv=01;35:.ogx=01;35:.aac=00;36:.au=00;36:.flac=00;36:.m4a=00;36:.mid=00;36:.midi=00;36:.mka=00;36:.mp3=00;36:.mpc=00;36:.ogg=00;36:.ra=00;36:.wav=00;36:.oga=00;36:.opus=00;36:.spx=00;36:.xspf=00;36:', 'HOSTTYPE': 'x86_64', 'LESSCLOSE': '/usr/bin/lesspipe %s %s', 'LANG': 'C.UTF-8', 'OLDPWD': '/home/matt/repos/flame-blis', 'VIRTUAL_ENV': '/home/matt/repos/cython-blis/env3.6', 'USER': 'matt', 'PWD': '/home/matt/repos/cython-blis', 'HOME': '/home/matt', 'NAME': 'LAPTOP-OMKOB3VM', 'XDG_DATA_DIRS': '/usr/local/share:/usr/share:/var/lib/snapd/desktop', 'SHELL': '/bin/bash', 'TERM': 'xterm-256color', 'SHLVL': '1', 'LOGNAME': 'matt', 'PATH': '/home/matt/repos/cython-blis/env3.6/bin:/tmp/google-cloud-sdk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/mnt/c/Users/matt/Documents/cmder/vendor/conemu-maximus5/ConEmu/Scripts:/mnt/c/Users/matt/Documents/cmder/vendor/conemu-maximus5:/mnt/c/Users/matt/Documents/cmder/vendor/conemu-maximus5/ConEmu:/mnt/c/Python37/Scripts:/mnt/c/Python37:/mnt/c/Program Files (x86)/Intel/Intel(R) Management Engine Components/iCLS:/mnt/c/Program Files/Intel/Intel(R) Management Engine Components/iCLS:/mnt/c/Windows/System32:/mnt/c/Windows:/mnt/c/Windows/System32/wbem:/mnt/c/Windows/System32/WindowsPowerShell/v1.0:/mnt/c/Program Files (x86)/Intel/Intel(R) Management Engine Components/DAL:/mnt/c/Program Files/Intel/Intel(R) Management Engine Components/DAL:/mnt/c/Program Files (x86)/Intel/Intel(R) Management Engine Components/IPT:/mnt/c/Program Files/Intel/Intel(R) Management Engine Components/IPT:/mnt/c/Program Files/Intel/WiFi/bin:/mnt/c/Program Files/Common Files/Intel/WirelessCommon:/mnt/c/Program Files (x86)/NVIDIA Corporation/PhysX/Common:/mnt/c/ProgramData/chocolatey/bin:/mnt/c/Program Files/Git/cmd:/mnt/c/Program Files/LLVM/bin:/mnt/c/Windows/System32:/mnt/c/Windows:/mnt/c/Windows/System32/wbem:/mnt/c/Windows/System32/WindowsPowerShell/v1.0:/mnt/c/Windows/System32/OpenSSH:/mnt/c/Program Files/nodejs:/mnt/c/Users/matt/AppData/Local/Microsoft/WindowsApps:/mnt/c/Users/matt/AppData/Local/Programs/Microsoft VS Code/bin:/mnt/c/Users/matt/AppData/Roaming/npm:/snap/bin:/mnt/c/Program Files/Oracle/VirtualBox', 'PS1': '(env3.6) \[\e]0;\u@\h: \w\a\]${debian_chroot:+($debian_chroot)}\[\033[01;32m\]\u@\h\[\033[00m\]:\[\033[01;34m\]\w\[\033[00m\]\$ ', 'VAGRANT_HOME': '/home/matt/.vagrant.d/', 'LESSOPEN': '| /usr/bin/lesspipe %s', '': '/home/matt/repos/cython-blis/env3.6/bin/python'}
clang -c C:\Users\avish\AppData\Local\Temp\pip-install-o73hzbxv\blis_26e7809c4855422bbbd89fdb5d389c5c\blis_src\config\bulldozer\bli_cntx_init_bulldozer.c -o C:\Users\avish\AppData\Local\Temp\tmp2fqmgjbi\bli_cntx_init_bulldozer.o -O2 -funroll-all-loops -std=c99 -D_POSIX_C_SOURCE=200112L -DBLIS_VERSION_STRING="0.5.0-6" -DBLIS_IS_BUILDING_LIBRARY -Iinclude\windows-x86_64 -I.\frame\3\ -I.\frame\ind\ukernels\ -I.\frame\1m\ -I.\frame\1f\ -I.\frame\1\ -I.\frame\include -IC:\Users\avish\AppData\Local\Temp\pip-install-o73hzbxv\blis_26e7809c4855422bbbd89fdb5d389c5c\blis_src\include\windows-x86_64
----------------------------------------
ERROR: Command errored out with exit status 1: 'c:\users\avish\anaconda3\envs\torchenv\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\Users\avish\AppData\Local\Temp\pip-install-o73hzbxv\blis_26e7809c4855422bbbd89fdb5d389c5c\setup.py'"'"'; file='"'"'C:\Users\avish\AppData\Local\Temp\pip-install-o73hzbxv\blis_26e7809c4855422bbbd89fdb5d389c5c\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record 'C:\Users\avish\AppData\Local\Temp\pip-record-jbidoyjp\install-record.txt' --single-version-externally-managed --prefix 'C:\Users\avish\AppData\Local\Temp\pip-build-env-v4tl30sz\overlay' --compile --install-headers 'C:\Users\avish\AppData\Local\Temp\pip-build-env-v4tl30sz\overlay\Include\blis' Check the logs for full command output.

ERROR: Command errored out with exit status 1: 'c:\users\avish\anaconda3\envs\torchenv\python.exe' 'c:\users\avish\anaconda3\envs\torchenv\lib\site-packages\pip' install --ignore-installed --no-user --prefix 'C:\Users\avish\AppData\Local\Temp\pip-build-env-v4tl30sz\overlay' --no-warn-script-location --no-binary :none: --only-binary :none: -i https://pypi.org/simple -- setuptools 'wheel>0.32.0.<0.33.0' Cython 'cymem>=2.0.2,<2.1.0' 'preshed>=2.0.1,<2.1.0' 'murmurhash>=0.28.0,<1.1.0' thinc==7.0.0.dev6 Check the logs for full command output.

(torchenv) C:\Users\avish\easse>_

EASSE does not work with latest version of scarebleu

From facebookresearch/access#10

File "/usr/local/lib/python3.6/dist-packages/easse/cli.py", line 130, in evaluate_system_output
lowercase=lowercase)
File "/usr/local/lib/python3.6/dist-packages/easse/bleu.py", line 22, in corpus_bleu
sys_sents = [utils_prep.normalize(sent, lowercase, tokenizer) for sent in sys_sents]
File "/usr/local/lib/python3.6/dist-packages/easse/bleu.py", line 22, in
sys_sents = [utils_prep.normalize(sent, lowercase, tokenizer) for sent in sys_sents]
File "/usr/local/lib/python3.6/dist-packages/easse/utils/preprocessing.py", line 12, in normalize
normalized_sent = sacrebleu.tokenize_13a(sentence)
AttributeError: module 'sacrebleu' has no attribute 'tokenize_13a'

bert score breaks with current latest version of matplotlib

If I try to do:

from bert_score import Bert_Scorer

I get

AttributeError: module 'matplotlib.cbook' has no attribute '_make_class_factory'

I can fix this by installing matpotlib 3.4.3, in which case the error goes.

I can put 3.4.3 in the matplotlib reqs for EASSE, but really this should be solved in the BertScore reqs. I'm not sure if putting it in bertscore's reqs will propagate through to EASSE on instalation though, or if easse's req for the latest matplotlib will override this. Any strong opinions on where to put this?

Issue with SARI scores

Hi, thanks for making this available! I'm running the corpus_sari() function according to your example and I'm seeing some oddness with the scores. For example, system sentences without any token overlap with the source or references are being scored higher than those with overlap. Here's an example of a sentence without overlap:

corpus_sari(orig_sents=["About 95 species are currently accepted ."],
            sys_sents=["This is my simplified sentence that has no token overlap with the source or reference sentences."], 
            refs_sents=[["About 95 species are currently known .", "About 95 species are now accepted .",  "95 species are now accepted ."]])                                                                            
Out[4]: 19.246031746031743

Whereas I get a lower score for this sentence that does have overlap:

corpus_sari(orig_sents=["About 95 species are currently accepted ."], 
            sys_sents=["species accepted ."], 
            refs_sents=[["About 95 species are currently known .",  "About 95 species are now accepted .", "95 species are now accepted ."]])                                                                            
Out[5]: 16.402116402116405

I get different results using the code in https://github.com/XingxingZhang/pysari that also implements SARI: ~16.078 for my first example system sentence above and ~24.05 for the second example, which is the score ordering I'd expect though I haven't verified the correctness of anything. Is there a parameter setting that needs to be specified when calling the corpus_sari() function that's affecting the results? Thanks!

Not loading original sentences for samsa with turk test set

Hi,

Shouldn't we load the original sentences with samsa as well on turk set?

easse/easse/cli/cli.py

Line 52 in 44744da

if 'sari' in metrics:

ERROR: local variable 'NISTTokenizer' referenced before assignment

I am trying to run this command
easse evaluate -t turkcorpus_test -m 'bleu,sari' -q < easse/resources/data/system_outputs/turkcorpus/test/ACCESS
and I got this error
UnboundLocalError: local variable 'NISTTokenizer' referenced before assignment

tseval installs sklearn instead of scikit-learn

It seems like, in the easse repository, the requirements file has been updated 9 months ago to replace 'sklearn' with 'scikit-learn'. However, another of your requirements, git+tseval, still includes 'sklearn' in there requirements. This causes some installation error.

Can't install EASSE!

I tried EASSE before and it was working with me. Now I am trying to run it and getting errors, is there any problem with it?

ERROR: Failed building wheel for tokenizers
Successfully built tseval
Failed to build tokenizers
ERROR: Could not build wheels for tokenizers which use PEP 517 and cannot be installed directly

Can EASSE compute SAMSA score. ?

I want to compute SAMSA score of corpus.. Can I use EASSE Technique to find out Perfect SAMSA score to Evaluate corpus?

Add information to the Report

Include the name of the test set that is being used.
Include a brief description of the test set: number of sentences, number of references per sentence, type of alignments.
Include what is being considered as Reference value.

Improvements for demo videos

Increase the font size for both videos. Right now, the one for report is smaller than the one for evaluate. Maybe make both of them bigger?
Make the videos run slower. It would be better if the user can visualise for a bit the command that is being run. Maybe pause after the command has been typed, or make the typing of the command part of the video (instead of copy-paste).
Make the videos smaller. Right now, the images are big, but most of them are covered by an empty command prompt.

Test for Statistical Significance

When comparing multiple systems, it would be useful to perform the most appropriate test to determine if the differences in scores for each metric are statistically significant. We could probably reuse some of the code from https://github.com/rtmdrr/testSignificanceNLP

Check that SAMSA works after installation with pip

Some requirements are not installed

Some requirements such as sklearn or tqdm are not automatically installed

easse in Google Colab

hello i want to install easse in Google Colab but i get this error can help me
Traceback (most recent call last):
File "/usr/local/bin/easse", line 33, in
sys.exit(load_entry_point('easse==0.2.1', 'console_scripts', 'easse')())
File "/usr/local/bin/easse", line 25, in importlib_load_entry_point
return next(matches).load()
File "/usr/local/lib/python3.6/dist-packages/importlib_metadata/init.py", line 96, in load
module = import_module(match.group('module'))
File "/usr/lib/python3.6/importlib/init.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "", line 994, in _gcd_import
File "", line 971, in _find_and_load
File "", line 941, in _find_and_load_unlocked
File "", line 219, in _call_with_frames_removed
File "", line 994, in _gcd_import
File "", line 971, in _find_and_load
File "", line 953, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'easse'

HSplit contains escaped quotes `&quot`

easse/easse/resources/data/test_sets/hsplit/hsplit.tok.2

Line 11 in 109d25c

 Alessandro ( &quot; Sandro &quot; ) Mazzola ( born 8 November 1942 ) is an Italian . he is a former football player . 

HSplit seems to contain some escaped quotes with &quot on the simplified side but not on the source side.

How to use easse with a varying number of references?

I have a dataset for simplification in which I have one complex sentence and a varying number of simplifications.
For example, the first complex sentence can have 5 human-written simplifications, while the second only 3.

Is there a way to set easse to work in this case?

package taking long time to run and not providing output

hello, i'm trying to run easse on my own custom .csv files using the following command -
easse report -t custom --orig_sents_path "turksimp.csv" --refs_sents_paths "turksimpbacktranslated.csv"

however, it's taking a really long time and nothing is being outputted, so i suspect there's some error but it's just not showing the error and stopping. this happens to me a lot when i try to run such commands, wondering if there is any fix?

thanks!

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 5599: ordinal not in range(128)

Hi!

I've just downloaded this repo for installation and I got the ascii-related encoding error:

$ pip install .
Processing /tmp/easse
    ERROR: Command errored out with exit status 1:
     command: (..)/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-5iyt237l/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-5iyt237l/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-9kwmmcl4
         cwd: /tmp/pip-req-build-5iyt237l/
    Complete output (7 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-req-build-5iyt237l/setup.py", line 5, in <module>
        long_description = f.read()
      File "(..)/lib/python3.6/encodings/ascii.py", line 26, in decode
        return codecs.ascii_decode(input, self.errors)[0]
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 5599: ordinal not in range(128)

This is fixed by adding "encoding='utf-8'" to the following lines in the setup.py script:

with open("README.md", "r", encoding='utf-8') as f:
    long_description = f.read()

with open("requirements.txt", "r", encoding='utf-8') as f:
    requirements = f.read().strip().split("\n")

In case someone else gets the same error, it would be great to update it :)

Best,

Laura

Add all HSplit sentences

We should add all the HSplit sentences to the repo and take the first 70 only if samsa is used.

`lowercase=True` with Turk test set?

easse/easse/cli.py

Line 71 in eda5623

lowercase = False

Shouldn't we use lowercase=True with turk valid and test sets?

sentence_sari issue

I can't find the defination of the function sentence_sari()
Has it been deprecated?

Failed to compute SAMSA

Hi all,

I got this error message when computing SAMSA.

Here is my code:

from easse.samsa import sentence_samsa
ori_sent = 'I read the book that John wrote.'
simp_sent = 'John wrote a book. I read that book.'
sentence_samsa(orig_sent=ori_sent, sys_sent=simp_sent)

Here is the error message:

Starting server with command: java -Xmx5G -cp /Applications/anaconda3/envs/OpenNLP/lib/python3.7/site-packages/easse/resources/tools/stanford-corenlp-full-2018-10-05/* edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 60000 -threads 40 -maxCharLength 100000 -quiet True -serverProperties corenlp_server-0b93463ac4344542.props -preload tokenize,ssplit,pos,lemma,ner,depparse
File "/Applications/anaconda3/envs/OpenNLP/lib/python3.7/site-packages/stanfordnlp/server/client.py", line 137, in ensure_alive
raise PermanentlyFailedException("Timed out waiting for service to come alive.")
PermanentlyFailedException: Timed out waiting for service to come alive.