thu-keg / omnievent

A comprehensive, unified and modular event extraction toolkit.

Home Page: https://omnievent.readthedocs.io/

License: MIT License

Python 96.72% Shell 2.99% Makefile 0.04% Batchfile 0.05% CSS 0.14% JavaScript 0.03% HTML 0.04%

event-detection event-extraction big-models bmtrain deep-learning huggingface-transformers information-extration natural-language-generation natural-language-processing pytorch

omnievent's Introduction

A comprehensive, unified and modular event extraction toolkit.

Table of Contents
Overview
- Highlights
Installation
- With pip
- From source
Easy Start
Train your Own Model with OmniEvent
Supported Datasets & Models & Contests
- Datasets
- Models
Consistent Evaluation
Experiments
Citation

Overview

OmniEvent is a powerful open-source toolkit for event extraction, including event detection and event argument extraction. We comprehensively cover various paradigms and provide fair and unified evaluations on widely-used English and Chinese datasets. Modular implementations make OmniEvent highly extensible.

Highlights

Comprehensive Capability
- Supports end-to-end Event Extraction as well as its two subtasks independently: Event Detection and Event Argument Extraction.
- Covers various paradigms: Token Classification, Sequence Labeling, MRC (QA) and Seq2Seq.
- Implements Transformer-based (BERT, T5, etc.) and classical (DMCNN, CRF, etc.) models.
- Supports both Chinese and English for all event extraction subtasks, paradigms and models.
Unified Benchmark & Evaluation
- Various datasets are processed into a unified format.
- Predictions of different paradigms are all converted into a unified candidate set for fair evaluations.
- Four evaluation modes (gold, loose, default, strict) cover the evaluation settings used in previous work.
Modular Implementation
- All models are decomposed into four modules:
  - Input Engineering: Prepare inputs and support various input engineering methods like prompting.
  - Backbone: Encode text into hidden states.
  - Aggregation: Fuse hidden states (e.g., select [CLS], pooling, GCN) to the final event representation.
  - Output Head: Map the event representation to the final outputs, such as Linear, CRF, MRC head, etc.
- You can combine and reimplement different modules to design and implement your own new model.
Big Model Training & Inference
- Efficient training and inference of big event extraction models are supported with BMTrain.
Easy to Use & Highly Extensible
- Open datasets can be downloaded and processed with a single command.
- Fully compatible with 🤗 Transformers and its Trainer.
- Users can easily reproduce existing models and build customized models with OmniEvent.
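The four-module decomposition described under Modular Implementation can be pictured as a chain of functions. The following is a toy, dependency-free sketch: every function name and the character-code "encoder" are hypothetical stand-ins chosen for illustration and are not OmniEvent APIs.

```python
from typing import List

Vector = List[float]


def input_engineering(tokens: List[str], trigger_idx: int) -> List[str]:
    """Wrap the candidate trigger in marker tokens (a simple marker-style input)."""
    return (tokens[:trigger_idx]
            + ["<event>", tokens[trigger_idx], "</event>"]
            + tokens[trigger_idx + 1:])


def backbone(tokens: List[str]) -> List[Vector]:
    """Toy encoder: one 2-d 'hidden state' per token (length, first character code)."""
    return [[float(len(tok)), float(ord(tok[0]))] for tok in tokens]


def aggregation(hidden: List[Vector]) -> Vector:
    """Max pooling over the token dimension."""
    return [max(column) for column in zip(*hidden)]


def output_head(event_repr: Vector, weights: List[Vector]) -> int:
    """Linear head: score each class and return the index of the best one."""
    scores = [sum(w * x for w, x in zip(row, event_repr)) for row in weights]
    return scores.index(max(scores))


def classify(tokens: List[str], trigger_idx: int, weights: List[Vector]) -> int:
    """Compose the four modules in order."""
    marked = input_engineering(tokens, trigger_idx)
    return output_head(aggregation(backbone(marked)), weights)
```

Swapping any single function (for example, replacing max pooling with a different aggregation) leaves the other three untouched, which is the point of the decomposition.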

Installation

With pip

This repository is tested on Python 3.9+ and PyTorch 1.12.1+. OmniEvent can be installed with pip as follows:

pip install OmniEvent

From source

To install OmniEvent from a local copy of the source, run the following in the repository root:

pip install .

If you want to edit the repository, install it in editable mode instead:

pip install -e .

Easy Start

OmniEvent provides several off-the-shelf models for the users. Examples are shown below.

Make sure you have installed OmniEvent as instructed above. Note that downloading the checkpoints may take a few minutes on the first run.

>>> from OmniEvent.infer import infer

>>> # Event Extraction (EE) Task
>>> text = "2022年北京市举办了冬奥会"
>>> results = infer(text=text, task="EE")
>>> print(results[0]["events"])
[
    {
        "type": "组织行为开幕", "trigger": "举办", "offset": [8, 10],
        "arguments": [
            {   "mention": "2022年", "offset": [0, 5], "role": "时间"},
            {   "mention": "北京市", "offset": [5, 8], "role": "地点"},
            {   "mention": "冬奥会", "offset": [11, 14], "role": "活动名称"}
        ]
    }
]

>>> text = ("U.S. and British troops were moving on the strategic southern port city of Basra "
            "Saturday after a massive aerial assault pounded Baghdad at dawn")

>>> # Event Detection (ED) Task
>>> results = infer(text=text, task="ED")
>>> print(results[0]["events"])
[
    { "type": "attack", "trigger": "assault", "offset": [113, 120]},
    { "type": "injure", "trigger": "pounded", "offset": [121, 128]}
]

>>> # Event Argument Extraction (EAE) Task
>>> results = infer(text=text, triggers=[("assault", 113, 120), ("pounded", 121, 128)], task="EAE")
>>> print(results[0]["events"])
[
    {
        "type": "attack", "trigger": "assault", "offset": [113, 120],
        "arguments": [
            {   "mention": "U.S.", "offset": [0, 4], "role": "attacker"},
            {   "mention": "British", "offset": [9, 16], "role": "attacker"},
            {   "mention": "Saturday", "offset": [81, 89], "role": "time"}
        ]
    },
    {
        "type": "injure", "trigger": "pounded", "offset": [121, 128],
        "arguments": [
            {   "mention": "U.S.", "offset": [0, 4], "role": "attacker"},
            {   "mention": "Saturday", "offset": [81, 89], "role": "time"},
            {   "mention": "British", "offset": [9, 16], "role": "attacker"}
        ]
    }
]
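The offsets in these outputs read as character offsets with an exclusive end: in the English example, `text[113:120]` is `assault`. Under that assumption, a small helper can check that every trigger and argument offset actually points at its surface string. This is a minimal sketch; `find_offset_mismatches` is a hypothetical helper, not part of OmniEvent.

```python
from typing import Dict, List


def find_offset_mismatches(text: str, events: List[Dict]) -> List[str]:
    """Describe every span whose [start, end) offset does not match its surface string."""
    problems = []
    for event in events:
        spans = [("trigger", event["trigger"], event["offset"])]
        spans += [(arg["role"], arg["mention"], arg["offset"])
                  for arg in event.get("arguments", [])]
        for name, surface, (start, end) in spans:
            if text[start:end] != surface:
                problems.append(f"{name}: expected {surface!r}, found {text[start:end]!r}")
    return problems


text = ("U.S. and British troops were moving on the strategic southern port city of Basra "
        "Saturday after a massive aerial assault pounded Baghdad at dawn")
events = [{
    "type": "attack", "trigger": "assault", "offset": [113, 120],
    "arguments": [
        {"mention": "U.S.", "offset": [0, 4], "role": "attacker"},
        {"mention": "Saturday", "offset": [81, 89], "role": "time"},
    ],
}]
print(find_offset_mismatches(text, events))  # → []
```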

Train your Own Model with OmniEvent

OmniEvent can help users easily train and evaluate their customized models on specific datasets.

We show a step-by-step example of using OmniEvent to train and evaluate an Event Detection model on the ACE-EN dataset in the Seq2Seq paradigm. More examples are available in the examples directory.

Step 1: Process the dataset into the unified format

We provide standard data processing scripts for several commonly used datasets. Check out the details in scripts/data_processing.

dataset=ace2005-en  # the dataset name
cd scripts/data_processing/$dataset
bash run.sh

Step 2: Set up the customized configurations

We keep track of the configurations of dataset, model and training parameters via a single *.yaml file. See ./configs for details.

>>> from OmniEvent.arguments import DataArguments, ModelArguments, TrainingArguments, ArgumentParser
>>> from OmniEvent.input_engineering.seq2seq_processor import type_start, type_end

>>> parser = ArgumentParser((ModelArguments, DataArguments, TrainingArguments))
>>> model_args, data_args, training_args = parser.parse_yaml_file(yaml_file="config/all-datasets/ed/s2s/ace-en.yaml")

>>> training_args.output_dir = 'output/ACE2005-EN/ED/seq2seq/t5-base/'
>>> data_args.markers = ["<event>", "</event>", type_start, type_end]

Step 3: Initialize the model and tokenizer

OmniEvent supports various backbones. The users can specify the model and tokenizer in the config file and initialize them as follows.

>>> from OmniEvent.backbone.backbone import get_backbone
>>> from OmniEvent.model.model import get_model

>>> backbone, tokenizer, config = get_backbone(model_type=model_args.model_type,
                                               model_name_or_path=model_args.model_name_or_path,
                                               tokenizer_name=model_args.model_name_or_path,
                                               markers=data_args.markers,
                                               new_tokens=data_args.markers)
>>> model = get_model(model_args, backbone)

Step 4: Initialize the dataset and evaluation metric

OmniEvent provides the DataProcessor and the corresponding evaluation metrics for different tasks and paradigms.

Note that the metrics here are paradigm-dependent and are not used for the final unified evaluation.

>>> from OmniEvent.input_engineering.seq2seq_processor import EDSeq2SeqProcessor
>>> from OmniEvent.evaluation.metric import compute_seq_F1

>>> train_dataset = EDSeq2SeqProcessor(data_args, tokenizer, data_args.train_file)
>>> eval_dataset = EDSeq2SeqProcessor(data_args, tokenizer, data_args.validation_file)
>>> metric_fn = compute_seq_F1

Step 5: Define Trainer and train

OmniEvent adopts Trainer from 🤗 Transformers for training and evaluation.

>>> from OmniEvent.trainer_seq2seq import Seq2SeqTrainer

>>> trainer = Seq2SeqTrainer(
        args=training_args,
        model=model,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        compute_metrics=metric_fn,
        data_collator=train_dataset.collate_fn,
        tokenizer=tokenizer,
    )
>>> trainer.train()

Step 6: Unified Evaluation

Since the metrics in Step 4 depend on the paradigm, it is not fair to directly compare the performance of models in different paradigms.

OmniEvent evaluates models of different paradigms in a unified manner, where the predictions of different models are converted to predictions on the same candidate sets and then evaluated.

>>> from OmniEvent.evaluation.utils import predict, get_pred_s2s
>>> from OmniEvent.evaluation.convert_format import get_trigger_detection_s2s

>>> logits, labels, metrics, test_dataset = predict(trainer=trainer, tokenizer=tokenizer, data_class=EDSeq2SeqProcessor,
                                                    data_args=data_args, data_file=data_args.test_file,
                                                    training_args=training_args)
>>> # paradigm-dependent metrics
>>> print("{} test performance before converting: {}".format(test_dataset.dataset_name, metrics["test_micro_f1"]))
ACE2005-EN test performance before converting: 66.4215686224377

>>> preds = get_pred_s2s(logits, tokenizer)
>>> # convert to the unified prediction and evaluate
>>> pred_labels = get_trigger_detection_s2s(preds, labels, data_args.test_file, data_args, None)
ACE2005-EN test performance after converting: 67.41016109045849
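To make the idea of evaluating on a shared candidate set concrete: once each paradigm's output has been mapped to one label per candidate, the same micro-averaged F1 can be computed for all of them. The sketch below is illustrative only; the function name and the use of "NA" as the negative label are assumptions, not OmniEvent's implementation.

```python
from typing import List


def micro_f1(gold: List[str], pred: List[str], negative: str = "NA") -> float:
    """Micro-averaged F1 over candidates, ignoring the negative (non-event) label."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must label the same candidate set")
    true_pos = sum(1 for g, p in zip(gold, pred) if g == p and g != negative)
    n_pred = sum(1 for p in pred if p != negative)
    n_gold = sum(1 for g in gold if g != negative)
    if true_pos == 0:
        return 0.0
    precision = true_pos / n_pred
    recall = true_pos / n_gold
    return 2 * precision * recall / (precision + recall)


gold = ["attack", "NA", "injure", "NA"]
pred = ["attack", "injure", "NA", "NA"]
print(micro_f1(gold, pred))  # → 0.5
```

Because both lists are indexed by the same candidates, a Seq2Seq model and a token classification model can be scored by exactly the same function.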

For datasets whose test set annotations are not public, such as MAVEN and LEVEN, OmniEvent provides scripts to generate submission files. See dump_result.py for details.

Supported Datasets & Models & Contests

Continually updated. Contributions are welcome!

Datasets

Language	Domain	Task	Dataset
English	General	ED	MAVEN
English	General	ED, EAE	ACE-EN
English	General	ED, EAE	ACE-DYGIE
English	General	ED, EAE	RichERE (KBP+ERE)
Chinese	Legal	ED	LEVEN
Chinese	General	ED, EAE	DuEE
Chinese	General	ED, EAE	ACE-ZH
Chinese	Financial	ED, EAE	FewFC

Models

Paradigm
- Token Classification (TC)
- Sequence Labeling (SL)
- Sequence to Sequence (Seq2Seq)
- Machine Reading Comprehension (MRC)
Backbone
- CNN / LSTM
- Transformers (BERT, T5, etc.)
Aggregation
- Select [CLS]
- Dynamic/Max Pooling
- Marker
- GCN
Head
- Linear / CRF / MRC heads

Consistent Evaluation

OmniEvent provides corresponding remedies for the three discrepancies in event extraction evaluation, as suggested in our ACL 2023 paper.

1. Consistent data preprocessing

We provide several preprocessing scripts in scripts/data_processing. For ACE 2005, we provide three mainstream scripts: ace2005-dygie, ace2005-oneie, and ace2005-en. Users can easily use the scripts to process the original data into a unified data format.

2. Output Standardization

We implement the output standardization in OmniEvent/evaluation/convert_format.py. Specifically, users can use corresponding functions to convert the output of different paradigms into the output space of the token classification method.
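As an illustration of what converting into the token classification output space can mean, the sketch below maps sequence-labeling BIO tags onto a fixed list of candidate spans, yielding one label per candidate. The function name, the "NA" negative label, and the exact matching rule are assumptions for illustration and do not reproduce the functions in convert_format.py.

```python
from typing import List, Tuple


def bio_to_candidate_labels(tags: List[str], candidates: List[Tuple[int, int]],
                            negative: str = "NA") -> List[str]:
    """Label each candidate span [start, end) with the type its BIO tags spell out."""
    labels = []
    for start, end in candidates:
        first = tags[start]
        event_type = first[2:]
        # The span must open with a B- tag ...
        ok = first.startswith("B-")
        # ... every later token inside it must continue the same type ...
        ok = ok and all(tags[i] == f"I-{event_type}" for i in range(start + 1, end))
        # ... and the tagged span must not run past the candidate's end.
        ok = ok and (end >= len(tags) or tags[end] != f"I-{event_type}")
        labels.append(event_type if ok else negative)
    return labels


tags = ["O", "B-attack", "I-attack", "O", "B-injure"]
candidates = [(1, 3), (4, 5), (0, 1), (1, 2)]
print(bio_to_candidate_labels(tags, candidates))
# → ['attack', 'injure', 'NA', 'NA']
```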

3. Pipeline Evaluation

As suggested in OmniEvent/evaluation/README.md, we provide several evaluation modes for event argument extraction and recommend the strict mode for comparable evaluation. We also provide a unified set of extracted triggers for pipeline evaluation of different event argument extraction methods. The triggers are extracted by an advanced ED model, CLEVE. The extracted triggers for different datasets (ACE 2005, RichERE, and TAC KBP 2014-2017) are available here.

Experiments

We implement and evaluate state-of-the-art methods on some popular benchmarks using OmniEvent, and the results are shown in our ACL 2023 paper "The Devil is in the Details: On the Pitfalls of Event Extraction Evaluation".

Citation

If our code helps you, please cite us:

@inproceedings{peng2023devil,
  title={The Devil is in the Details: On the Pitfalls of Event Extraction Evaluation},
  author={Peng, Hao and Wang, Xiaozhi and Yao, Feng and Zeng, Kaisheng and Hou, Lei and Li, Juanzi and Liu, Zhiyuan and Shen, Weixing},
  booktitle={Findings of ACL 2023},
  year={2023}
}

omnievent's Issues

data_class is undefined in the example code

Hello, I ran into the following when running the README code:
logits, labels, metrics, test_dataset = predict(trainer=trainer, tokenizer=tokenizer, data_class=data_class,data_args=data_args,data_file=data_args.test_file,training_args=training_args)
NameError: name 'data_class' is not defined
This variable is not defined. How can I obtain it?

Question / Potential Bug re: Seq2Seq Example

For the evaluation code provided in https://github.com/THU-KEG/OmniEvent/blob/main/examples/EAE/seq2seq.py

logits, labels, metrics, test_dataset = predict(trainer=trainer, tokenizer=tokenizer, data_class=data_class,
                                                data_args=data_args, data_file=data_args.test_file,
                                                training_args=training_args)
preds = get_pred_s2s(logits, tokenizer, pred_types=training_args.data_for_evaluation["pred_types"])

logging.info("\n")
logging.info("{}-EAE Evaluate Mode : {}-{}".format("-" * 25, data_args.eae_eval_mode, "-" * 25))
logging.info("{}-Use Golden Trigger: {}-{}".format("-" * 25, data_args.golden_trigger, "-" * 25))

if data_args.test_exists_labels:
    logging.info("{} test performance before converting: {}".format(data_args.dataset_name, metrics))
    get_ace2005_argument_extraction_s2s(preds, labels, data_args.test_file, data_args, None)

It seems that the labels being passed to get_ace2005_argument_extraction_s2s are still token ids, but the function is expecting it to have been parsed and prepared similar to how preds is formatted. Is there missing code here?

Thanks!

Note: I am adapting this code for RAMS and using the t5-base config.

do_event_argument_extraction() missing 1 required positional argument: 'device'

Information on models fine-tuning used in OmniEvent.infer

Hello,

Thank you for this great package!

I would like to know on which datasets, and how, the two models used by OmniEvent.infer were fine-tuned. That is, the two models whose links are accessible in the utils module.

In particular, I noticed that OmniEvent.infer has a "schema" option. I took this to mean that the models were fine-tuned on all of the available schemas. Yet, when digging a bit further, I noticed that none of these schemas has been passed as special_tokens to the tokenizer. I am therefore wondering how the model knows which task we are referring to, that is, fine-tuning on a specific dataset, when each text is prepended with f"<txt_schema>". For instance, given "<maven>The king married the queen", how does the model understand that I want it to focus on what it learned when fine-tuned on the maven dataset?

I ran a test with only the EDProcessor class using the schema "maven", and it was indeed treated like any other token.

Thank you

FileNotFoundError: [Errno 2] No such file or directory: '~/OmniEvent_Model'
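The literal `~` in this error message suggests that the path reached the file system unexpanded; Python's file APIs do not expand `~` on their own. Whether this is the actual cause inside OmniEvent is an assumption. A minimal sketch of the usual remedy, with `resolve_cache_dir` as a hypothetical helper:

```python
import os


def resolve_cache_dir(path: str = "~/OmniEvent_Model") -> str:
    """Expand a leading '~' to the user's home directory and return an absolute path."""
    return os.path.abspath(os.path.expanduser(path))


cache_dir = resolve_cache_dir()
print(cache_dir)  # an absolute path under the home directory, without a literal '~'
```

Passing the resolved path (after creating it with `os.makedirs(cache_dir, exist_ok=True)` if needed) avoids lookups against a directory literally named `~`.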

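Hello, I found that installation fails on Windows

Following the requirements you listed, I installed CUDA, but the installation still does not complete. Your installation environment requires the lscpu command, which I cannot use. Is there a solution?

Hello, how much GPU memory is needed to train on the DuEE corpus?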

Can you provide a maven-ere OmniEvent pretrained model used for event relation extraction?

The OmniEvent demo includes a complete example of Event Extraction & Event Relation Extraction, which seems to be based on a model trained on maven-ere.
Could we obtain this model and use it for inference?

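Error when running the EAE task

The following error occurs at runtime.

The cause is that the call to do_event_argument_extraction() on line 134 of OmniEvent/infer.py is missing the 'device' argument.

The corresponding function can be found in seq2seq.py.

After adding the 'device' argument, the code runs normally.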

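Hello, where can the ERE data used in the scripts be downloaded?

Cannot find train

Hello, while following your documentation to train a model, I reached the step shown in the screenshot and found that the file train.sh does not exist in your code. How should I train the model in this case?

Hello, how can I load my own trained parameters into the model to perform event extraction?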

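Is there a bug on line 136 of base_processor.py?

I found a small bug. Line 136 of /OmniEvent/OmniEvent/input_engineering/base_processor.py reads: input_template: Optional[str][str] = None, but the program raised an error saying that the syntax Optional[str][str] is wrong. I changed it to Optional[str]=None and it now runs.
Is this change correct?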
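For context on this report: `Optional[str]` already means "a `str` or `None`", and it takes no further type parameters, so subscripting it a second time raises `TypeError` on recent CPython versions. The sketch below demonstrates both points; `describe` and `double_subscript_fails` are hypothetical names used only for illustration.

```python
from typing import Optional


def describe(text: str, input_template: Optional[str] = None) -> str:
    """Optional[str] accepts a str or None; None selects a pass-through template here."""
    template = input_template if input_template is not None else "{}"
    return template.format(text)


def double_subscript_fails() -> bool:
    """Return True if subscripting Optional[str] a second time raises TypeError."""
    try:
        Optional[str][str]
    except TypeError:
        return True
    return False
```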

ERE: Event Relation Extraction

Hello,

Thank you for this amazing toolkit. I have a question as I tried to run the Code 2 example on page 4 in the paper (https://arxiv.org/pdf/2309.14258.pdf)

# event extraction & relation extraction
all_results = infer(text=text, task="EE & ERE")

That part of the code throws the following exception:

Traceback (most recent call last):
  File "/storage/home/grads/ehussein/OmniEvent/test.py", line 20, in <module>
    all_results = infer(text=text, task="ERE")
  File "/storage/home/grads/ehussein/OmniEvent/OmniEvent/infer.py", line 107, in infer
    assert task in ['ED', 'EAE', 'EE']
AssertionError

It seems the toolkit does not support the ERE part yet. Is there something I need to do to run event relation extraction, or will this part be released soon?
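The assertion in the traceback shows that only 'ED', 'EAE' and 'EE' are accepted as task names. A caller-side guard, sketched below, turns the bare `AssertionError` into a readable message before `infer` is invoked; `SUPPORTED_TASKS` and `check_task` are hypothetical names, and the tuple simply mirrors the list in the assertion.

```python
SUPPORTED_TASKS = ("ED", "EAE", "EE")  # copied from the assertion in the traceback


def check_task(task: str) -> str:
    """Return the task unchanged if supported, else raise a descriptive ValueError."""
    if task not in SUPPORTED_TASKS:
        raise ValueError(
            f"Unsupported task {task!r}; expected one of {', '.join(SUPPORTED_TASKS)}"
        )
    return task
```

With this guard, `check_task("EE & ERE")` fails with a message listing the accepted values, while `check_task("EE")` passes through unchanged.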

Thank you

Event Ontology

Hi,
I have been using this library for a project. I am using it for event detection, but I have not found the exact event ontology used to train the model.

Does this ontology comprise event types from both ACE and MAVEN? Or is there any custom event ontology for the Event Detection model? Where can I access the ontology file?

Thank you.

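Runtime environment

Hello, does this code need to run on Windows or on Linux?

The server cannot connect to huggingface, so the model cannot be loaded. Is there a better solution? Thanks!

Writing the README steps into a single .py file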

from OmniEvent.arguments import DataArguments, ModelArguments, TrainingArguments, ArgumentParser
from OmniEvent.input_engineering.seq2seq_processor import EDSeq2SeqProcessor, type_start, type_end
from OmniEvent.backbone.backbone import get_backbone
from OmniEvent.model.model import get_model
from OmniEvent.evaluation.metric import compute_seq_F1
from OmniEvent.trainer_seq2seq import Seq2SeqTrainer
from OmniEvent.evaluation.utils import predict, get_pred_s2s
from OmniEvent.evaluation.convert_format import get_trigger_detection_s2s
from transformers import T5ForConditionalGeneration, T5TokenizerFast
from ipdb import set_trace

def main():
    # Step 2: Set up the customized configurations
    parser = ArgumentParser((ModelArguments, DataArguments, TrainingArguments))
    model_args, data_args, training_args = parser.parse_yaml_file(yaml_file="config/all-datasets/ed/s2s/duee.yaml")
    training_args.output_dir = 'output/duee/ED/seq2seq/t5-base/'
    data_args.markers = ["<event>", "</event>", type_start, type_end]
    print('==================================step2 dataset yaml configuration finished==================================')

    # Step 3: Initialize the model and tokenizer
    model_args.model_name_or_path = '/pretrained_model/t5'
    model = T5ForConditionalGeneration.from_pretrained(model_args.model_name_or_path)
    backbone = model
    tokenizer = T5TokenizerFast.from_pretrained(model_args.model_name_or_path, never_split=data_args.markers)
    config = model.config

    model = get_model(model_args, backbone)
    print("======================step3 model initialization finished====================================")

    # Step 4: Initialize the dataset and evaluation metric
    data_args.train_file = '/data/processed/DuEE1.0/train.unified.jsonl'
    data_args.test_file = "/data/processed/DuEE1.0/test.unified.jsonl"
    data_args.validation_file = "/data/processed/DuEE1.0/valid.unified.jsonl"
    train_dataset = EDSeq2SeqProcessor(data_args, tokenizer, data_args.train_file)
    eval_dataset = EDSeq2SeqProcessor(data_args, tokenizer, data_args.validation_file)
    metric_fn = compute_seq_F1

    # Step 5: Define Trainer and train
    trainer = Seq2SeqTrainer(
        args=training_args,
        model=model,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        compute_metrics=metric_fn,
        data_collator=train_dataset.collate_fn,
        tokenizer=tokenizer,
    )
    resume_from_checkpoint = 'OmniEvent-main/output/duee/ED/seq2seq/t5-base/checkpoint-7440'
    if resume_from_checkpoint:
        trainer.train(resume_from_checkpoint)
    else:
        trainer.train()
    print('*****************************************training finished********************************************')

    # Step 6: Unified Evaluation
    logits, labels, metrics, test_dataset = predict(trainer=trainer, tokenizer=tokenizer, data_class=EDSeq2SeqProcessor,
                                                    data_args=data_args, data_file=data_args.test_file,
                                                    training_args=training_args)
    set_trace()
    # paradigm-dependent metrics
    print("{} test performance before converting: {}".format(test_dataset.dataset_name, metrics["test_micro_f1"]))

    preds = get_pred_s2s(logits, tokenizer)
    # convert to the unified prediction and evaluate
    pred_labels = get_trigger_detection_s2s(preds, labels, data_args.test_file, data_args, None)
    print("{} test performance after converting: {}".format(test_dataset.dataset_name, pred_labels["test_micro_f1"]))


if __name__ == "__main__":
    main()

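Hello, I tried to rewrite the example from your README as a .py script using the DuEE dataset, but ran into some problems. For example, the key metrics["test_micro_f1"] is actually metrics["micro_f1"], and its value is 0. Do you have a .py file for this example, and could you share it?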

problems with installation

I wonder which version of transformers should be used.
I have problems like ModuleNotFoundError: No module named 'BartForConditionalGeneration'

There is a fatal bug, please fix it.

I installed the library with pip install OmniEvent and found a bug when running the code. Please fix it. The bug is on line 52 of OmniEvent/input_engineering/seq2seq_processor.py. Please handle it promptly.

No CUDA GPUs are available

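Hello, after I git cloned this repo and installed the library with pip install -e ., running the code fails with "No CUDA GPUs are available". However, I am running the code on a server, and nvidia-smi works normally from the command line.
The problem occurs at model.cuda() on line 88 of OmniEvent/examples/ED/token_classification.py.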

Cannot install deepspeed

I cannot install the deepspeed library on Windows. How can this be solved?

Constrained Decoding

Is there example code on how to integrate Constrained Decoding for the Seq2Seq example model?

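Error on first run after successful installation

Running the following after a successful installation raises an error:

from OmniEvent.infer import infer

# Event Extraction (EE) Task
text = "2022年北京市举办了冬奥会"
results = infer(text=text, task="EE")
print(results[0]["events"])

The following error is reported: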
Downloading: 0%| | 0.00/1.77G [00:00<?, ?B/s]1901858561
Downloading
Downloading: 100%|████████████████████████████████████████████████████████████████| 1.77G/1.77G [01:14<00:00, 25.4MB/s]
Archive: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed.zip
creating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/rng_state_5.pth
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/config.json
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/rng_state_3.pth
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/pytorch_model.bin
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/spiece.model
extracting: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/latest
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/rng_state_7.pth
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/rng_state_0.pth
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/special_tokens_map.json
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/trainer_state.json
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/tokenizer.json
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/rng_state_4.pth
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/args.yaml
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/zero_to_fp32.py
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/rng_state_1.pth
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/tokenizer_config.json
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/rng_state_2.pth
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/rng_state_6.pth
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/added_tokens.json
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/training_args.bin
load from local file: C:\Users\lenovo/.cache/OmniEvent_Model\s2s-mt5-ed tokenizer
download from web, cache will be save to: C:\Users\lenovo/.cache/OmniEvent_Model/s2s-mt5-eae.zip
Downloading: 0%| | 0.00/3.88G [00:00<?, ?B/s]4167695152
Downloading
Downloading: 100%|████████████████████████████████████████████████████████████████| 3.88G/3.88G [03:04<00:00, 22.6MB/s]
Archive: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-eae.zip
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-eae.zip,
and cannot find C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-eae.zip.zip, period.
Traceback (most recent call last):
File "D:\python3.10安装\lib\site-packages\transformers\configuration_utils.py", line 623, in _get_config_dict
resolved_config_file = cached_path(
File "D:\python3.10安装\lib\site-packages\transformers\utils\hub.py", line 284, in cached_path
output_path = get_from_cache(
File "D:\python3.10安装\lib\site-packages\transformers\utils\hub.py", line 562, in get_from_cache
raise ValueError(
ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "", line 1, in
File "D:\python3.10安装\lib\site-packages\OmniEvent\infer.py", line 135, in infer
eae_model, eae_tokenizer = get_pretrained("s2s-mt5-eae", device)
File "D:\python3.10安装\lib\site-packages\OmniEvent\infer.py", line 67, in get_pretrained
model = get_model(model_args, model_name_or_path)
File "D:\python3.10安装\lib\site-packages\OmniEvent\infer.py", line 57, in get_model
model = get_model_cls(model_args).from_pretrained(path)
File "D:\python3.10安装\lib\site-packages\transformers\modeling_utils.py", line 1840, in from_pretrained
config, model_kwargs = cls.config_class.from_pretrained(
File "D:\python3.10安装\lib\site-packages\transformers\configuration_utils.py", line 534, in from_pretrained
config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs)
File "D:\python3.10安装\lib\site-packages\transformers\configuration_utils.py", line 561, in get_config_dict
config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
File "D:\python3.10安装\lib\site-packages\transformers\configuration_utils.py", line 656, in _get_config_dict
raise EnvironmentError(
OSError: We couldn't connect to 'https://huggingface.co' to load this model, couldn't find it in the cached files and it looks like C:\Users\lenovo/.cache/OmniEvent_Model/s2s-mt5-eae is not the path to a directory containing a config.json file.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.
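In the log above, `unzip` reports "End-of-central-directory signature not found" for s2s-mt5-eae.zip, which usually indicates a truncated or corrupted download; the later config.json error then follows because nothing was extracted. A minimal sketch for detecting and removing such an archive is shown below. `remove_if_corrupt` is a hypothetical helper, and the expectation that OmniEvent downloads the file again once it is gone is an assumption.

```python
import os
import zipfile


def remove_if_corrupt(archive_path: str) -> bool:
    """Delete archive_path if it exists but is not a valid zip; return True if deleted."""
    if os.path.isfile(archive_path) and not zipfile.is_zipfile(archive_path):
        os.remove(archive_path)
        return True
    return False
```

For the case in the log, the function would be called with the path of s2s-mt5-eae.zip inside the .cache/OmniEvent_Model directory under the user's home, after which the inference call can be retried.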

ace2005-zh-novalue

Hello,

What's the difference between ace2005-zh-novalue.py and ace2005-zh.py in the data processing scripts?

Thanks.
