thu-keg / omnievent

A comprehensive, unified and modular event extraction toolkit.

Home Page: https://omnievent.readthedocs.io/

License: MIT License

Python 96.72% Shell 2.99% Makefile 0.04% Batchfile 0.05% CSS 0.14% JavaScript 0.03% HTML 0.04%
event-detection event-extraction big-models bmtrain deep-learning huggingface-transformers information-extration natural-language-generation natural-language-processing pytorch

omnievent's Introduction

Overview

OmniEvent is a powerful open-source toolkit for event extraction, including event detection and event argument extraction. We comprehensively cover various paradigms and provide fair and unified evaluations on widely-used English and Chinese datasets. Modular implementations make OmniEvent highly extensible.

Highlights

  • Comprehensive Capability

    • Supports end-to-end Event Extraction as well as its two subtasks independently: Event Detection and Event Argument Extraction.
    • Covers various paradigms: Token Classification, Sequence Labeling, MRC (QA) and Seq2Seq.
    • Implement Transformer-based (BERT, T5, etc.) and classical (DMCNN, CRF, etc.) models.
    • Both Chinese and English are supported for all event extraction sub-tasks, paradigms and models.
  • Unified Benchmark & Evaluation

    • Various datasets are processed into a unified format.
    • Predictions of different paradigms are all converted into a unified candidate set for fair evaluations.
    • Four evaluation modes (gold, loose, default, strict) cover the evaluation settings used in previous work.
  • Modular Implementation

    • All models are decomposed into four modules:
      • Input Engineering: Prepare inputs and support various input engineering methods like prompting.
      • Backbone: Encode text into hidden states.
      • Aggregation: Fuse hidden states (e.g., select [CLS], pooling, GCN) into the final event representation.
      • Output Head: Map the event representation to the final outputs, such as Linear, CRF, MRC head, etc.
    • You can combine and reimplement different modules to design and implement your own new model.
  • Big Model Training & Inference

    • Efficient training and inference of big event extraction models are supported with BMTrain.
  • Easy to Use & Highly Extensible

    • Open datasets can be downloaded and processed with a single command.
    • Fully compatible with 🤗 Transformers and its Trainer.
    • Users can easily reproduce existing models and build customized models with OmniEvent.
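
The four-module decomposition listed above can be pictured as a chain of callables. The following is a toy sketch in plain Python, not OmniEvent's actual classes: every name in it (`input_engineering`, `backbone`, `aggregate_cls`, `aggregate_max_pool`, `linear_head`, `EventModel`) and the hand-written two-dimensional "hidden states" are hypothetical, chosen only to show that swapping one module leaves the others untouched.

```python
from dataclasses import dataclass
from typing import Callable, List

Vector = List[float]


def input_engineering(text: str) -> List[str]:
    """Prepare model inputs; here: prepend a [CLS] token and split on whitespace."""
    return ["[CLS]"] + text.split()


def backbone(tokens: List[str]) -> List[Vector]:
    """Encode tokens into toy 2-d 'hidden states' (not a real encoder)."""
    return [[float(len(tok)), float(tok.istitle())] for tok in tokens]


def aggregate_cls(hidden: List[Vector]) -> Vector:
    """Aggregation by selecting the first ([CLS]) position."""
    return hidden[0]


def aggregate_max_pool(hidden: List[Vector]) -> Vector:
    """Aggregation by max pooling over all positions."""
    return [max(column) for column in zip(*hidden)]


def linear_head(representation: Vector) -> str:
    """Output head: a fixed linear score followed by a threshold."""
    weights, bias = [0.5, 1.0], -3.0
    score = sum(w * x for w, x in zip(weights, representation)) + bias
    return "event" if score > 0 else "no-event"


@dataclass
class EventModel:
    """Hypothetical container that wires the four modules together."""
    prepare: Callable[[str], List[str]]
    encode: Callable[[List[str]], List[Vector]]
    aggregate: Callable[[List[Vector]], Vector]
    head: Callable[[Vector], str]

    def __call__(self, text: str) -> str:
        return self.head(self.aggregate(self.encode(self.prepare(text))))


# Two models that differ only in their aggregation module.
cls_model = EventModel(input_engineering, backbone, aggregate_cls, linear_head)
pooled_model = EventModel(input_engineering, backbone, aggregate_max_pool, linear_head)
```

Only the aggregation callable differs between `cls_model` and `pooled_model`; the input, backbone and head modules are reused as-is, which is the kind of recombination the modular design is meant to allow.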

Installation

With pip

This repository is tested on Python 3.9+ and PyTorch 1.12.1+. OmniEvent can be installed with pip as follows:

pip install OmniEvent

From source

To install the repository from a local copy of the source, run:

pip install .

If you want to edit the repository, install it in editable mode:

pip install -e .

Easy Start

OmniEvent provides several off-the-shelf models for the users. Examples are shown below.

Make sure you have installed OmniEvent as instructed above. Note that the first run may take a few minutes to download the checkpoints.

>>> from OmniEvent.infer import infer

>>> # Event Extraction (EE) Task
>>> text = "2022年北京市举办了冬奥会"
>>> results = infer(text=text, task="EE")
>>> print(results[0]["events"])
[
    {
        "type": "组织行为开幕", "trigger": "举办", "offset": [8, 10],
        "arguments": [
            {   "mention": "2022年", "offset": [0, 5], "role": "时间"},
            {   "mention": "北京市", "offset": [5, 8], "role": "地点"},
            {   "mention": "冬奥会", "offset": [11, 14], "role": "活动名称"},
        ]
    }
]

>>> text = "U.S. and British troops were moving on the strategic southern port city of Basra " \
           "Saturday after a massive aerial assault pounded Baghdad at dawn"

>>> # Event Detection (ED) Task
>>> results = infer(text=text, task="ED")
>>> print(results[0]["events"])
[
    { "type": "attack", "trigger": "assault", "offset": [113, 120]},
    { "type": "injure", "trigger": "pounded", "offset": [121, 128]}
]

>>> # Event Argument Extraction (EAE) Task
>>> results = infer(text=text, triggers=[("assault", 113, 120), ("pounded", 121, 128)], task="EAE")
>>> print(results[0]["events"])
[
    {
        "type": "attack", "trigger": "assault", "offset": [113, 120],
        "arguments": [
            {   "mention": "U.S.", "offset": [0, 4], "role": "attacker"},
            {   "mention": "British", "offset": [9, 16], "role": "attacker"},
            {   "mention": "Saturday", "offset": [81, 89], "role": "time"}
        ]
    },
    {
        "type": "injure", "trigger": "pounded", "offset": [121, 128],
        "arguments": [
            {   "mention": "U.S.", "offset": [0, 4], "role": "attacker"},
            {   "mention": "Saturday", "offset": [81, 89], "role": "time"},
            {   "mention": "British", "offset": [9, 16], "role": "attacker"}
        ]
    }
]
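
In the English outputs above, each `offset` reads as a half-open character span `[start, end)` into the input text. That reading is inferred from the example rather than from the library's documentation, and `check_offsets` below is a hypothetical helper, not part of OmniEvent. A minimal sketch that checks whether every trigger and argument mention slices back out of the text:

```python
text = ("U.S. and British troops were moving on the strategic southern port city of Basra "
        "Saturday after a massive aerial assault pounded Baghdad at dawn")

# One event copied from the EAE output shown above.
events = [
    {"type": "attack", "trigger": "assault", "offset": [113, 120],
     "arguments": [
         {"mention": "U.S.", "offset": [0, 4], "role": "attacker"},
         {"mention": "British", "offset": [9, 16], "role": "attacker"},
         {"mention": "Saturday", "offset": [81, 89], "role": "time"},
     ]},
]


def check_offsets(text, events):
    """Return (surface, offset) pairs whose offset does not slice back to the surface."""
    mismatches = []
    for event in events:
        spans = [(event["trigger"], event["offset"])]
        spans += [(arg["mention"], arg["offset"]) for arg in event.get("arguments", [])]
        for surface, (start, end) in spans:
            if text[start:end] != surface:
                mismatches.append((surface, [start, end]))
    return mismatches


mismatches = check_offsets(text, events)
```

An empty `mismatches` list means every reported span matches its surface string, which is a cheap sanity check before feeding ED triggers into the EAE call.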

Train your Own Model with OmniEvent

OmniEvent can help users easily train and evaluate their customized models on specific datasets.

We show a step-by-step example of using OmniEvent to train and evaluate an Event Detection model on ACE-EN dataset in the Seq2Seq paradigm. More examples are shown in examples.

Step 1: Process the dataset into the unified format

We provide standard data processing scripts for several commonly used datasets. Check out the details in scripts/data_processing.

dataset=ace2005-en  # the dataset name
cd scripts/data_processing/$dataset
bash run.sh

Step 2: Set up the customized configurations

We keep track of the configurations of dataset, model and training parameters via a single *.yaml file. See ./configs for details.

>>> from OmniEvent.arguments import DataArguments, ModelArguments, TrainingArguments, ArgumentParser
>>> from OmniEvent.input_engineering.seq2seq_processor import type_start, type_end

>>> parser = ArgumentParser((ModelArguments, DataArguments, TrainingArguments))
>>> model_args, data_args, training_args = parser.parse_yaml_file(yaml_file="config/all-datasets/ed/s2s/ace-en.yaml")

>>> training_args.output_dir = 'output/ACE2005-EN/ED/seq2seq/t5-base/'
>>> data_args.markers = ["<event>", "</event>", type_start, type_end]
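
To illustrate how one flat configuration file can populate three separate argument objects, here is a standard-library sketch. It assumes that OmniEvent's `ArgumentParser` routes keys by field name in the way `HfArgumentParser` from 🤗 Transformers does; that is an assumption, so check OmniEvent/arguments.py for the real behaviour. `ModelArgs`, `DataArgs`, `TrainArgs`, their fields and `split_config` are hypothetical stand-ins.

```python
from dataclasses import dataclass, fields


@dataclass
class ModelArgs:  # hypothetical stand-in for ModelArguments
    model_type: str = "t5"
    model_name_or_path: str = "t5-base"


@dataclass
class DataArgs:  # hypothetical stand-in for DataArguments
    train_file: str = ""
    max_seq_length: int = 128


@dataclass
class TrainArgs:  # hypothetical stand-in for TrainingArguments
    output_dir: str = "output"
    learning_rate: float = 5e-5


def split_config(flat_config, dataclass_types):
    """Route each key of one flat config to the dataclass that declares it."""
    remaining = dict(flat_config)
    parsed = []
    for cls in dataclass_types:
        names = {f.name for f in fields(cls)}
        kwargs = {k: remaining.pop(k) for k in list(remaining) if k in names}
        parsed.append(cls(**kwargs))
    if remaining:
        # Fail loudly on typos instead of silently ignoring them.
        raise ValueError(f"Unknown config keys: {sorted(remaining)}")
    return tuple(parsed)


# A dict standing in for the parsed contents of a *.yaml file.
flat = {"model_type": "t5", "train_file": "train.unified.jsonl", "learning_rate": 1e-4}
model_args, data_args, training_args = split_config(flat, (ModelArgs, DataArgs, TrainArgs))
```

Keys missing from the file fall back to the dataclass defaults, and values can still be overridden in code afterwards, as done with `output_dir` and `markers` above.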

Step 3: Initialize the model and tokenizer

OmniEvent supports various backbones. The users can specify the model and tokenizer in the config file and initialize them as follows.

>>> from OmniEvent.backbone.backbone import get_backbone
>>> from OmniEvent.model.model import get_model

>>> backbone, tokenizer, config = get_backbone(model_type=model_args.model_type,
                                               model_name_or_path=model_args.model_name_or_path,
                                               tokenizer_name=model_args.model_name_or_path,
                                               markers=data_args.markers,
                                               new_tokens=data_args.markers)
>>> model = get_model(model_args, backbone)

Step 4: Initialize the dataset and evaluation metric

OmniEvent prepares the DataProcessor and the corresponding evaluation metrics for different tasks and paradigms.

Note that the metrics here are paradigm-dependent and are not used for the final unified evaluation.

>>> from OmniEvent.input_engineering.seq2seq_processor import EDSeq2SeqProcessor
>>> from OmniEvent.evaluation.metric import compute_seq_F1

>>> train_dataset = EDSeq2SeqProcessor(data_args, tokenizer, data_args.train_file)
>>> eval_dataset = EDSeq2SeqProcessor(data_args, tokenizer, data_args.validation_file)
>>> metric_fn = compute_seq_F1

Step 5: Define Trainer and train

OmniEvent adopts Trainer from 🤗 Transformers for training and evaluation.

>>> from OmniEvent.trainer_seq2seq import Seq2SeqTrainer

>>> trainer = Seq2SeqTrainer(
        args=training_args,
        model=model,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        compute_metrics=metric_fn,
        data_collator=train_dataset.collate_fn,
        tokenizer=tokenizer,
    )
>>> trainer.train()

Step 6: Unified Evaluation

Since the metrics in Step 4 depend on the paradigm, it is not fair to directly compare the performance of models in different paradigms.

OmniEvent evaluates models of different paradigms in a unified manner, where the predictions of different models are converted to predictions on the same candidate sets and then evaluated.

>>> from OmniEvent.evaluation.utils import predict, get_pred_s2s
>>> from OmniEvent.evaluation.convert_format import get_trigger_detection_s2s

>>> logits, labels, metrics, test_dataset = predict(trainer=trainer, tokenizer=tokenizer, data_class=EDSeq2SeqProcessor,
                                                    data_args=data_args, data_file=data_args.test_file,
                                                    training_args=training_args)
>>> # paradigm-dependent metrics
>>> print("{} test performance before converting: {}".format(test_dataset.dataset_name, metrics["test_micro_f1"]))
ACE2005-EN test performance before converting: 66.4215686224377

>>> preds = get_pred_s2s(logits, tokenizer)
>>> # convert to the unified prediction and evaluate
>>> pred_labels = get_trigger_detection_s2s(preds, labels, data_args.test_file, data_args, None)
ACE2005-EN test performance after converting: 67.41016109045849
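
The idea of scoring on a shared candidate set can be sketched in a few lines. This is an illustration only, not the logic of `get_trigger_detection_s2s`: the matching rule (exact word match, each generated pair used once), the helper names and the toy gold labels are all assumptions made for the example.

```python
def to_candidate_labels(candidates, predicted_pairs):
    """Map (event_type, trigger_word) pairs generated by a Seq2Seq model onto a
    fixed candidate list; candidates with no matching prediction get "NA"."""
    remaining = list(predicted_pairs)
    labels = []
    for word in candidates:
        match = next((pair for pair in remaining if pair[1] == word), None)
        if match is None:
            labels.append("NA")
        else:
            labels.append(match[0])
            remaining.remove(match)  # each generated pair labels one candidate
    return labels


def micro_f1(gold, pred, negative="NA"):
    """Micro-averaged F1 over all non-negative labels."""
    true_pos = sum(g == p != negative for g, p in zip(gold, pred))
    n_pred = sum(p != negative for p in pred)
    n_gold = sum(g != negative for g in gold)
    if true_pos == 0:
        return 0.0
    precision, recall = true_pos / n_pred, true_pos / n_gold
    return 2 * precision * recall / (precision + recall)


# Toy data: four candidate trigger words with hypothetical gold labels.
candidates = ["troops", "moving", "assault", "pounded"]
gold = ["NA", "transport", "attack", "attack"]
generated = [("attack", "assault"), ("injure", "pounded")]

pred = to_candidate_labels(candidates, generated)
score = micro_f1(gold, pred)
```

Because every paradigm ends up with one label per candidate, a token classification model and a Seq2Seq model can be scored by the same `micro_f1` call on the same `gold` list.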

For datasets whose test set annotations are not public, such as MAVEN and LEVEN, OmniEvent provides scripts to generate submission files. See dump_result.py for details.

Supported Datasets & Models & Contests

Continually updated. Welcome to add more!

Datasets

Language  Domain     Task     Dataset
English   General    ED       MAVEN
English   General    ED, EAE  ACE-EN
English   General    ED, EAE  ACE-DYGIE
English   General    ED, EAE  RichERE (KBP+ERE)
Chinese   Legal      ED       LEVEN
Chinese   General    ED, EAE  DuEE
Chinese   General    ED, EAE  ACE-ZH
Chinese   Financial  ED, EAE  FewFC

Models

  • Paradigm
    • Token Classification (TC)
    • Sequence Labeling (SL)
    • Sequence to Sequence (Seq2Seq)
    • Machine Reading Comprehension (MRC)
  • Backbone
    • CNN / LSTM
    • Transformers (BERT, T5, etc.)
  • Aggregation
    • Select [CLS]
    • Dynamic/Max Pooling
    • Marker
    • GCN
  • Head
    • Linear / CRF / MRC heads

Consistent Evaluation

OmniEvent provides corresponding remedies for the three discrepancies in event extraction evaluation, as suggested in our ACL 2023 paper.

1. Consistent data preprocessing

We provide several preprocessing scripts in scripts/data_processing. For ACE 2005, we provide three mainstream scripts: ace2005-dygie, ace2005-oneie, and ace2005-en. Users can easily use the scripts to process the original data into a unified data format.

2. Output Standardization

We implement the output standardization in OmniEvent/evaluation/convert_format.py. Specifically, users can use corresponding functions to convert the output of different paradigms into the output space of the token classification method.
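
As a rough picture of what such a conversion involves for the sequence labeling paradigm, the sketch below decodes BIO tags into spans and then assigns one label to each candidate position. It is not the code in convert_format.py: the exact-boundary matching rule and the function names are assumptions made for illustration.

```python
def bio_to_spans(tags):
    """Decode BIO tags into (start, end, type) spans with an exclusive end."""
    spans, start, label = [], None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel flushes the last span
        inside = tag.startswith("I-") and label == tag[2:]
        if start is not None and not inside:
            spans.append((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def spans_to_candidate_labels(candidates, spans):
    """Label each candidate (start, end) with the type of an exactly matching span."""
    by_position = {(s, e): t for s, e, t in spans}
    return [by_position.get(tuple(c), "NA") for c in candidates]


tags = ["O", "B-attack", "I-attack", "O", "B-injure"]
spans = bio_to_spans(tags)

# Candidate token spans shared by all paradigms (toy values).
candidates = [(0, 1), (1, 3), (4, 5)]
labels = spans_to_candidate_labels(candidates, spans)
```

After this step the sequence labeling output has the same shape as a token classification output: one label per candidate, with "NA" for candidates that received no span.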

3. Pipeline Evaluation

As described in OmniEvent/evaluation/README.md, we provide several evaluation modes for event argument extraction, and we recommend the strict mode for comparable evaluation. We also provide a unified set of extracted triggers for pipeline evaluation of different event argument extraction methods. The triggers are extracted by an advanced ED model, CLEVE. The extracted triggers for different datasets (ACE 2005, RichERE, and TAC KBP 2014-2017) are available here.

Experiments

We implement and evaluate state-of-the-art methods on some popular benchmarks using OmniEvent, and the results are shown in our ACL 2023 paper "The Devil is in the Details: On the Pitfalls of Event Extraction Evaluation".

Citation

If our code helps you, please cite us:

@inproceedings{peng2023devil,
  title={The Devil is in the Details: On the Pitfalls of Event Extraction Evaluation},
  author={Peng, Hao and Wang, Xiaozhi and Yao, Feng and Zeng, Kaisheng and Hou, Lei and Li, Juanzi and Liu, Zhiyuan and Shen, Weixing},
  booktitle={Findings of ACL 2023},
  year={2023}
}

omnievent's People

Contributors

bakser avatar dependabot[bot] avatar devross avatar h-peng17 avatar yaof20 avatar zimuwangnlp avatar


omnievent's Issues

data_class is undefined in the example code

Hello, when running the README I ran into this snippet:
logits, labels, metrics, test_dataset = predict(trainer=trainer, tokenizer=tokenizer, data_class=data_class,data_args=data_args,data_file=data_args.test_file,training_args=training_args)
NameError: name 'data_class' is not defined
This variable is never defined. How should I obtain it?

Question / Potential Bug re: Seq2Seq Example

For the evaluation code provided in https://github.com/THU-KEG/OmniEvent/blob/main/examples/EAE/seq2seq.py

logits, labels, metrics, test_dataset = predict(trainer=trainer, tokenizer=tokenizer, data_class=data_class,
                                                        data_args=data_args, data_file=data_args.test_file,
                                                        training_args=training_args)
  preds = get_pred_s2s(logits, tokenizer, pred_types=training_args.data_for_evaluation["pred_types"])

  logging.info("\n")
  logging.info("{}-EAE Evaluate Mode : {}-{}".format("-" * 25, data_args.eae_eval_mode, "-" * 25))
  logging.info("{}-Use Golden Trigger: {}-{}".format("-" * 25, data_args.golden_trigger, "-" * 25))

  if data_args.test_exists_labels:
      logging.info("{} test performance before converting: {}".format(data_args.dataset_name, metrics))
      get_ace2005_argument_extraction_s2s(preds, labels, data_args.test_file, data_args, None)

It seems that the labels being passed to get_ace2005_argument_extraction_s2s are still token ids, but the function is expecting it to have been parsed and prepared similar to how preds is formatted. Is there missing code here?

Thanks!

Note I am adapting this code for RAMs and using t5-base config.

Information on models fine-tuning used in OmniEvent.infer

Hello,

Thank you for this great package!

I would like to know on which datasets, and how, the two models used by OmniEvent.infer were fine-tuned. That is, the two models whose links are accessible in the utils module.

In particular, I noticed that there is a "schema" option in OmniEvent.infer. I took it as suggesting that the models were fine-tuned on all of the available schemas. Yet, when digging a bit further, I noticed that none of these schemas have been passed as special_tokens to the tokenizer. So I'm wondering how the model knows that we are referring to a specific task, that is, the fine-tuning on a specific dataset, when each text is prepended with f"<txt_schema>". To be concrete, when given "<maven>The king married the queen", how does the model understand that I want it to focus on what it learned when being fine-tuned on the MAVEN dataset?

I ran a test only with the EDProcessor class using the schema "maven" and indeed it treated it as any other token.

Thank you

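Hello, I found that it cannot be installed on Windows

On top of the requirements you listed, I installed CUDA, but the installation still cannot complete. Your installation environment requires the lscpu command, which I cannot use. Is there a workaround?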

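Error when running the EAE task

The following error occurs at runtime:
[image]

The cause is that the call to do_event_argument_extraction() on line 134 of OmniEvent/infer.py is missing the 'device' argument.

The corresponding function can be found in seq2seq.py:
[image]
After adding the 'device' argument, it runs normally.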

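Cannot find train

Hello, I was following your documentation to train a model. When I reached the step shown in the image, I found that there is no train.sh file in your code. How should I train the model in this case?
[image]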

Is there a bug on line 136 of base_processor.py?

I found a small bug. Line 136 of /OmniEvent/OmniEvent/input_engineering/base_processor.py reads: input_template: Optional[str][str] = None, but the program raised an error saying that the Optional[str][str] syntax is wrong. So I changed it to Optional[str]=None and it runs.
Is this the right fix?
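
For reference, the behaviour described in this report can be reproduced with the standard `typing` module alone. In the sketch below, `build_prompt` is a hypothetical function written for the illustration (only the parameter name `input_template` comes from the report); it is not OmniEvent code.

```python
from typing import Optional


def build_prompt(text: str, input_template: Optional[str] = None) -> str:
    """Hypothetical helper: apply an optional template to the input text."""
    if input_template is None:
        return text
    return input_template.format(text=text)


# Optional[str] is already fully parameterized, so subscripting it again fails
# as soon as the annotation is evaluated.
try:
    Optional[str][str]  # the double subscript from the report
    double_subscript_error = None
except TypeError as err:
    double_subscript_error = err
```

The single subscript `Optional[str]` means "a `str` or `None`", which matches a parameter that defaults to `None`, so the change described in the report looks consistent with the intended annotation.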

ERE: Event Relation Extraction

Hello,

Thank you for this amazing toolkit. I have a question as I tried to run the Code 2 example on page 4 in the paper (https://arxiv.org/pdf/2309.14258.pdf)

# event extraction & relation extraction

all_results = infer(text=text, task="EE & ERE")

That part of the code throws the following exception:

Traceback (most recent call last):
File "/storage/home/grads/ehussein/OmniEvent/test.py", line 20, in <module>
all_results = infer(text=text, task="ERE")
File "/storage/home/grads/ehussein/OmniEvent/OmniEvent/infer.py", line 107, in infer
assert task in ['ED', 'EAE', 'EE']
AssertionError

The toolkit does not support the ERE part yet. Do you think I need to do something to infer the event relation extraction? or will this part be released soon?

Thank you

Event Ontology

Hi,
I have been using this library for a project. I am using it for event detection, but I have not found the exact event ontology used to train the model.

Does this ontology comprise event types from both ACE and MAVEN? Or is there any custom event ontology for the Event Detection model? Where can I access the ontology file?

Thank you.

Runtime environment

Hello, does this code need to run on Windows or on Linux?

Putting the README steps into a single .py file

from OmniEvent.arguments import DataArguments, ModelArguments, TrainingArguments, ArgumentParser
from OmniEvent.input_engineering.seq2seq_processor import EDSeq2SeqProcessor, type_start, type_end
from OmniEvent.backbone.backbone import get_backbone
from OmniEvent.model.model import get_model
from OmniEvent.evaluation.metric import compute_seq_F1
from OmniEvent.trainer_seq2seq import Seq2SeqTrainer
from OmniEvent.evaluation.utils import predict, get_pred_s2s
from OmniEvent.evaluation.convert_format import get_trigger_detection_s2s
from transformers import T5ForConditionalGeneration, T5TokenizerFast
from ipdb import set_trace

def main():
    # Step 2: Set up the customized configurations
    parser = ArgumentParser((ModelArguments, DataArguments, TrainingArguments))
    model_args, data_args, training_args = parser.parse_yaml_file(yaml_file="config/all-datasets/ed/s2s/duee.yaml")
    training_args.output_dir = 'output/duee/ED/seq2seq/t5-base/'
    data_args.markers = ["<event>", "</event>", type_start, type_end]
    print('==================================step2: dataset YAML config loaded==================================')

    # Step 3: Initialize the model and tokenizer
    model_args.model_name_or_path = '/pretrained_model/t5'
    model = T5ForConditionalGeneration.from_pretrained(model_args.model_name_or_path)
    backbone = model
    tokenizer = T5TokenizerFast.from_pretrained(model_args.model_name_or_path, never_split=data_args.markers)
    config = model.config

    model = get_model(model_args, backbone)
    print("======================step3: model initialization finished====================================")

    # Step 4: Initialize the dataset and evaluation metric
    data_args.train_file = '/data/processed/DuEE1.0/train.unified.jsonl'
    data_args.test_file = "/data/processed/DuEE1.0/test.unified.jsonl"
    data_args.validation_file = "/data/processed/DuEE1.0/valid.unified.jsonl"
    train_dataset = EDSeq2SeqProcessor(data_args, tokenizer, data_args.train_file)
    eval_dataset = EDSeq2SeqProcessor(data_args, tokenizer, data_args.validation_file)
    metric_fn = compute_seq_F1

    # Step 5: Define Trainer and train
    trainer = Seq2SeqTrainer(
        args=training_args,
        model=model,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        compute_metrics=metric_fn,
        data_collator=train_dataset.collate_fn,
        tokenizer=tokenizer,
    )
    resume_from_checkpoint = 'OmniEvent-main/output/duee/ED/seq2seq/t5-base/checkpoint-7440'
    if resume_from_checkpoint:
        trainer.train(resume_from_checkpoint)
    else:
        trainer.train()
    print('*****************************************training finished********************************************')

    # Step 6: Unified Evaluation
    logits, labels, metrics, test_dataset = predict(trainer=trainer, tokenizer=tokenizer, data_class=EDSeq2SeqProcessor,
                                                    data_args=data_args, data_file=data_args.test_file,
                                                    training_args=training_args)
    set_trace()
    # paradigm-dependent metrics
    print("{} test performance before converting: {}".format(test_dataset.dataset_name, metrics["test_micro_f1"]))

    preds = get_pred_s2s(logits, tokenizer)
    # convert to the unified prediction and evaluate
    pred_labels = get_trigger_detection_s2s(preds, labels, data_args.test_file, data_args, None)
    print("{} test performance after converting: {}".format(test_dataset.dataset_name, pred_labels["test_micro_f1"]))


if __name__ == "__main__":
    main()

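Hello, I tried to turn the example in your README into a .py script using the DuEE dataset, but I ran into some problems. For example, the key metrics["test_micro_f1"] is actually metrics["micro_f1"], and its value is 0. Do you have a .py file for this example, and could you share it?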

problems with installation

I wonder which version of transformers should be used.
I have problems like ModuleNotFoundError: No module named 'BartForConditionalGeneration'

There is a fatal bug, please fix it.

I installed the library with pip install OmniEvent and found a bug when running the code. Please fix it. The bug is on line 52 of OmniEvent/input_engineering/seq2seq_processor.py. Please handle it promptly.
[image]

No CUDA GPUs are available

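Hello, after cloning this repo with git clone and installing it with pip install -e ., running the code fails with "No CUDA GPUs are available". However, I am running on a server, and nvidia-smi works normally on the command line.
The problem occurs at model.cuda() on line 88 of OmniEvent/examples/ED/token_classification.py.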

Constrained Decoding

Is there example code on how to integrate Constrained Decoding for the Seq2Seq example model?

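Error on first run after a successful installation

After a successful installation, running the following raises an error:
`from OmniEvent.infer import infer

# Event Extraction (EE) Task
text = "2022年北京市举办了冬奥会"
results = infer(text=text, task="EE")
print(results[0]["events"])`
The following error is reported: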
Downloading: 0%| | 0.00/1.77G [00:00<?, ?B/s]1901858561
Downloading
Downloading: 100%|████████████████████████████████████████████████████████████████| 1.77G/1.77G [01:14<00:00, 25.4MB/s]
Archive: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed.zip
creating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/rng_state_5.pth
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/config.json
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/rng_state_3.pth
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/pytorch_model.bin
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/spiece.model
extracting: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/latest
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/rng_state_7.pth
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/rng_state_0.pth
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/special_tokens_map.json
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/trainer_state.json
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/tokenizer.json
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/rng_state_4.pth
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/args.yaml
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/zero_to_fp32.py
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/rng_state_1.pth
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/tokenizer_config.json
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/rng_state_2.pth
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/rng_state_6.pth
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/added_tokens.json
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/training_args.bin
load from local file: C:\Users\lenovo/.cache/OmniEvent_Model\s2s-mt5-ed tokenizer
download from web, cache will be save to: C:\Users\lenovo/.cache/OmniEvent_Model/s2s-mt5-eae.zip
Downloading: 0%| | 0.00/3.88G [00:00<?, ?B/s]4167695152
Downloading
Downloading: 100%|████████████████████████████████████████████████████████████████| 3.88G/3.88G [03:04<00:00, 22.6MB/s]
Archive: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-eae.zip
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-eae.zip,
and cannot find C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-eae.zip.zip, period.
Traceback (most recent call last):
File "D:\python3.10安装\lib\site-packages\transformers\configuration_utils.py", line 623, in _get_config_dict
resolved_config_file = cached_path(
File "D:\python3.10安装\lib\site-packages\transformers\utils\hub.py", line 284, in cached_path
output_path = get_from_cache(
File "D:\python3.10安装\lib\site-packages\transformers\utils\hub.py", line 562, in get_from_cache
raise ValueError(
ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "", line 1, in
File "D:\python3.10安装\lib\site-packages\OmniEvent\infer.py", line 135, in infer
eae_model, eae_tokenizer = get_pretrained("s2s-mt5-eae", device)
File "D:\python3.10安装\lib\site-packages\OmniEvent\infer.py", line 67, in get_pretrained
model = get_model(model_args, model_name_or_path)
File "D:\python3.10安装\lib\site-packages\OmniEvent\infer.py", line 57, in get_model
model = get_model_cls(model_args).from_pretrained(path)
File "D:\python3.10安装\lib\site-packages\transformers\modeling_utils.py", line 1840, in from_pretrained
config, model_kwargs = cls.config_class.from_pretrained(
File "D:\python3.10安装\lib\site-packages\transformers\configuration_utils.py", line 534, in from_pretrained
config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs)
File "D:\python3.10安装\lib\site-packages\transformers\configuration_utils.py", line 561, in get_config_dict
config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
File "D:\python3.10安装\lib\site-packages\transformers\configuration_utils.py", line 656, in _get_config_dict
raise EnvironmentError(
OSError: We couldn't connect to 'https://huggingface.co' to load this model, couldn't find it in the cached files and it looks like C:\Users\lenovo/.cache/OmniEvent_Model/s2s-mt5-eae is not the path to a directory containing a config.json file.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.
[Screenshot 2024-02-26 224947]
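
The log above shows `unzip` failing with "End-of-central-directory signature not found" on s2s-mt5-eae.zip, which usually points to an incomplete or corrupted download rather than to the model code. A standard-library sketch for checking a cached archive before extracting it follows; `is_usable_archive` is a hypothetical helper and not part of OmniEvent, and the temporary files only simulate a good and a truncated download.

```python
import os
import tempfile
import zipfile


def is_usable_archive(path):
    """Return True only if `path` exists, is a zip file, and passes its CRC checks."""
    if not os.path.isfile(path) or not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as archive:
        return archive.testzip() is None  # testzip() returns the first bad member


with tempfile.TemporaryDirectory() as tmp:
    # A small valid archive standing in for a complete download.
    good = os.path.join(tmp, "good.zip")
    with zipfile.ZipFile(good, "w") as archive:
        archive.writestr("config.json", "{}")

    # The same file cut short, standing in for an interrupted download.
    truncated = os.path.join(tmp, "truncated.zip")
    with open(good, "rb") as src, open(truncated, "wb") as dst:
        dst.write(src.read()[:10])

    good_ok = is_usable_archive(good)
    truncated_ok = is_usable_archive(truncated)
```

If the cached file under `.cache/OmniEvent_Model/` fails such a check, deleting it and letting the download run again is a reasonable first thing to try.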

ace2005-zh-novalue

Hello,

What's the difference between ace2005-zh-novalue.py and ace2005-zh.py in the data processing scripts?

Thanks.
