coopercoppers / pfn
EMNLP 2021 - A Partition Filter Network for Joint Entity and Relation Extraction
License: MIT License
Which dataset's format should Chinese data be converted to so that it can be used?
After training finished and I started testing, I hit an out-of-memory (OOM) error. The model does not appear to be set up for multi-GPU use; have you run into this problem before?
Here
Line 271 in 6173b3e
we see
re_tail_score = self.re_tail(h_share, h_re, mask)
I think it should be
re_tail_score = self.re_tail(h_re, h_share, mask)
From gradient-flow considerations alone, we have two almost identical modules whose inputs are swapped, while h_re and h_share receive gradients from upper layers for semantically different tasks and losses. Besides that, the corrected variant learns slightly better in my experiments.
Thanks for your excellent work!
I wonder whether there are any practical ways to compensate for the relation information that out-of-triple entities miss.
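Hello, thank you very much for your excellent work; it made me reconsider the possibility that relation recognition can help entity recognition. While reading the paper I had some questions that I hope you can answer.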
If args.do_eval is False, entity_best and triple_best are still None when saved_file.save is called.
The training command you list does not pass --do_eval, so do_eval defaults to False.
saved_file.save("best test result ner-p: {:.4f} \t ner-r: {:.4f} \t ner-f: {:.4f} \t re-p: {:.4f} \t re-r: {:.4f} \t re-f: {:.4f} ".format(entity_best["p"],
entity_best["r"], entity_best["f"], triple_best["p"], triple_best["r"], triple_best["f"]))
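One possible way to avoid subscripting None is to build the summary string in a small helper that checks both results first. The sketch below is only an illustration: `format_best_result` is a hypothetical helper, not a function from the repository, and the fallback message is made up.

```python
def format_best_result(entity_best, triple_best):
    """Build the summary line, tolerating missing evaluation results.

    Hypothetical helper: entity_best and triple_best are assumed to be
    either None or dicts with the keys "p", "r" and "f".
    """
    if entity_best is None or triple_best is None:
        # No evaluation ran (for example --do_eval was not passed).
        return "no evaluation results were recorded (was --do_eval set?)"
    template = ("best test result ner-p: {:.4f} \t ner-r: {:.4f} \t ner-f: {:.4f} \t "
                "re-p: {:.4f} \t re-r: {:.4f} \t re-f: {:.4f} ")
    return template.format(entity_best["p"], entity_best["r"], entity_best["f"],
                           triple_best["p"], triple_best["r"], triple_best["f"])


print(format_best_result(None, None))
print(format_best_result({"p": 0.5, "r": 0.5, "f": 0.5},
                         {"p": 0.25, "r": 0.25, "f": 0.25}))
```

The result could then be passed to saved_file.save in place of the inline format call.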
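Hello, thank you for additionally presenting the NER results for the encoding schemes.
Based on the results you show, I observe that for NER, Sequential >> Parallel > original.
If the original model is the PFN model proposed in your paper, does this mean that PFN's encoding scheme hurts NER performance? The Sequential scheme only passes entity information to the relation model and does not pass relation information to the entity model, yet its NER result is far better than the original model's. Even Parallel is slightly better than the original.
However, the core argument of your paper, contrary to the conclusions of previous related work, is that RE is beneficial to NER, which is also the point I found most appealing.
So I would like to ask: do the results of this Extension on Ablation Study contradict your conclusion? I hope you can answer this.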
What is the difference between micro and macro?
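As general background, micro-averaging pools the true-positive, false-positive and false-negative counts of all classes before computing F1, while macro-averaging computes F1 per class and then takes the unweighted mean. The sketch below illustrates the standard definitions with made-up counts; whether the repository's micro and macro classes compute exactly this is an assumption.

```python
def f1(tp, fp, fn):
    """F1 from raw counts, returning 0.0 when undefined."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0


def micro_f1(counts):
    """Pool the counts over all classes, then compute one F1."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return f1(tp, fp, fn)


def macro_f1(counts):
    """Compute F1 per class, then take the unweighted mean."""
    return sum(f1(*c) for c in counts.values()) / len(counts)


# Illustrative (tp, fp, fn) counts: one frequent class and one rare class.
counts = {"Live_In": (90, 10, 10), "Kill": (1, 4, 4)}
print(f"micro: {micro_f1(counts):.4f}")  # micro: 0.8667
print(f"macro: {macro_f1(counts):.4f}")  # macro: 0.5500
```

With these counts the rare class barely moves the micro score but pulls the macro score down sharply, which is why the two metrics can differ a lot on imbalanced label sets.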
Hello, I believe I am using a 30-series GPU, and I installed the environment according to the requirements. However, when I run
python main.py \
--data CONLL04 \
--do_train \
--do_eval \
--embed_mode albert \
--batch_size 10 \
--lr 0.00002 \
--output_file conll04 \
--eval_metric micro \
--clip 1.0 \
--epoch 200
I get the following error:
11/22/2021 14:59:58 - INFO - main - ['main.py', '--data', 'CONLL04', '--do_train', '--do_eval', '--embed_mode', 'albert', '--batch_size', '10', '--lr', '0.00002', '--output_file', 'conll04', '--eval_metric', 'micro', '--clip', '1.0', '--epoch', '200']
11/22/2021 14:59:58 - INFO - main - Namespace(batch_size=10, clip=1.0, data='CONLL04', do_eval=True, do_train=True, dropconnect=0.1, dropout=0.1, embed_mode='albert', epoch=200, eval_batch_size=10, eval_metric='micro', hidden_size=300, linear_warmup_rate=0.0, lr=2e-05, max_seq_len=128, output_file='conll04', seed=0, steps=50, weight_decay=0)
11/22/2021 15:00:19 - INFO - main - ------Training------
Some weights of the model checkpoint at albert-xxlarge-v1 were not used when initializing AlbertModel: ['predictions.decoder.bias', 'predictions.bias', 'predictions.LayerNorm.weight', 'predictions.dense.bias', 'predictions.decoder.weight', 'predictions.LayerNorm.bias', 'predictions.dense.weight']
This IS expected if you are initializing AlbertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
This IS NOT expected if you are initializing AlbertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
/home/ps/anaconda3/envs/lwc/lib/python3.7/site-packages/torch/cuda/init.py:106: UserWarning:
NVIDIA GeForce RTX 3090 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA GeForce RTX 3090 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
/home/ps/anaconda3/envs/lwc/lib/python3.7/site-packages/torch/cuda/init.py:106: UserWarning:
NVIDIA GeForce RTX 3080 Ti with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA GeForce RTX 3080 Ti GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
0%| | 0/93 [00:00<?, ?it/s]
Traceback (most recent call last):
File "main.py", line 196, in
ner_pred, re_pred = model(text, mask)
File "/home/ps/anaconda3/envs/lwc/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ps/lwc/PFN/model/pfn.py", line 260, in forward
x = self.bert(**x)[0]
File "/home/ps/anaconda3/envs/lwc/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ps/anaconda3/envs/lwc/lib/python3.7/site-packages/transformers/models/albert/modeling_albert.py", line 715, in forward
extended_attention_mask = extended_attention_mask.to(dtype=self.dtype) # fp16 compatibility
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
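I do not fully understand this error. The PyTorch community explains that it is caused by an incorrect PyTorch version, but I have not found a workable solution. Please help, thank you.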
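The warning in the log already names the mismatch: the installed PyTorch build ships kernels for sm_37, sm_50, sm_60 and sm_70 only, while the RTX 3090 and RTX 3080 Ti report sm_86. The sketch below reproduces that check in plain Python with the values copied from the log. `is_arch_supported` is a hypothetical helper, and matching on the major compute capability is a simplifying assumption. On a real machine the two inputs could come from `torch.cuda.get_device_capability()` and `torch.cuda.get_arch_list()`.

```python
def is_arch_supported(capability, arch_list):
    """Return True if a compiled arch shares the device's major capability.

    capability: (major, minor) tuple, e.g. (8, 6) for sm_86.
    arch_list:  names such as "sm_70" that the PyTorch build was compiled for.
    Matching on the major version only is a simplification.
    """
    major, _minor = capability
    compiled = [int(a.split("_")[1]) for a in arch_list if a.startswith("sm_")]
    return any(sm // 10 == major for sm in compiled)


# Values copied from the warning in the log above.
device_capability = (8, 6)
installed_archs = ["sm_37", "sm_50", "sm_60", "sm_70"]
print(is_arch_supported(device_capability, installed_archs))  # False

# Hypothetical arch list of a build that also targets Ampere GPUs.
print(is_arch_supported(device_capability, installed_archs + ["sm_80", "sm_86"]))  # True
```

If the check fails, the usual remedy is to install a PyTorch build compiled for the GPU's architecture (builds against CUDA 11.1 or later generally include sm_86), following the link given in the warning.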
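Hello, while reading the paper I noticed that it says Equation 6 applies a linear transformation to the memory to obtain the feature. In the code, however, I could not find an implementation of Equation 6; the memory seems to be used directly as the final feature. Did you implement an equivalent operation somewhere? Thanks!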
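Hello, I have a question:
In the entity features, why are the entity start and end features defined this way? Could you explain your thinking?
Why does repeat(1, length, 1, 1) represent the start feature and repeat(length, 1, 1, 1) the end feature? Is there a particular meaning behind this?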
`st = h_ner.unsqueeze(1).repeat(1, length, 1, 1)
en = h_ner.unsqueeze(0).repeat(length, 1, 1, 1)
ner = torch.cat((st, en, h_global), dim=-1)
`
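Also, in relation extraction, why is the relation feature split into two sub-features, r1 and r2? Do they represent the head entity's feature for relation r and the tail entity's feature for relation r?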
`r1 = h_re.unsqueeze(1).repeat(1, length, 1, 1)
r2 = h_re.unsqueeze(0).repeat(length, 1, 1, 1)
re = torch.cat((r1, r2, h_global), dim=-1)
`
Thank you very much for your guidance!
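One way to read the two repeat calls is as building a length x length grid of token pairs. The pure-Python sketch below mimics only the indexing, under the assumption that h_ner has shape (length, batch, hidden). Strings stand in for the per-token vectors, and the batch and hidden dimensions are left out.

```python
# Stand-ins for the per-token vectors in h_ner (illustrative sentence).
tokens = ["Paris", "is", "in", "France"]
length = len(tokens)

# Mimics h_ner.unsqueeze(1).repeat(1, length, 1, 1):
# each token is copied along a new second axis, so st[i][j] == tokens[i].
st = [[tokens[i] for _ in range(length)] for i in range(length)]

# Mimics h_ner.unsqueeze(0).repeat(length, 1, 1, 1):
# the whole sequence is copied along a new first axis, so en[i][j] == tokens[j].
en = [[tokens[j] for j in range(length)] for _ in range(length)]

# Concatenating st and en therefore pairs row token i with column token j.
pairs = [[(st[i][j], en[i][j]) for j in range(length)] for i in range(length)]
print(pairs[0][3])  # ('Paris', 'France')
```

Under this reading, cell (i, j) of the concatenated tensor describes the candidate pair "token i, token j", so the row index can act as the start position and the column index as the end position. The same indexing applies to r1 and r2.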
As stated in the title.
I have a dataset that is a translated version of SemEval. I adapted your code to evaluate your model on it, but the results were not good. Why did you not report your model's performance on SemEval? Did you not try it at all, or did it not work for some reason?
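As stated in the title, I would like to know how to modify the code and data to run the encoding-scheme ablation experiments, comparing sequential encoding and parallel encoding. Thanks.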
Hi,
I tried Evaluation on Pre-trained Model for NYT and WEBNLG, but the system reports an error about the ner2idx.json file. The files exist but are empty. I also tried to generate them with dataloader.py, but that fails with ModuleNotFoundError: No module named 'utils'.
Could you please fix this or suggest an alternative solution?
Thanks for sharing such a nice repo.
I appreciate the work you shared, but I am having a problem.
I have many samples to predict; how should I handle them?
python inference.py \
--model_file ${the path of your saved model} \
--sent ${sentence you want to evaluate, str type restricted}
This approach seems to load the model once per sample, which is very slow. Is there a way to load the model once and then process all samples? Do you have any suggestions?
Thanks.
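A common pattern for this is to separate the expensive load step from the per-sentence prediction step and loop over the sentences in between. The sketch below shows only that structure. `predict_many`, `fake_load_model` and `fake_predict_one` are hypothetical stand-ins and do not correspond to functions in inference.py; the real load and predict logic would have to be factored out of that script.

```python
def predict_many(load_model, predict_one, model_file, sentences):
    """Load the model once, then run prediction on every sentence.

    load_model and predict_one are caller-supplied callables (hypothetical
    wrappers around whatever the inference script does internally).
    """
    model = load_model(model_file)  # expensive step, done a single time
    return [predict_one(model, sentence) for sentence in sentences]


# Dummy stand-ins so the sketch runs without the real model.
load_calls = []

def fake_load_model(path):
    load_calls.append(path)
    return {"path": path}

def fake_predict_one(model, sentence):
    return len(sentence.split())  # placeholder "prediction"

results = predict_many(fake_load_model, fake_predict_one, "saved_model.pt",
                       ["John lives in Boston .", "Mary works for IBM ."])
print(results, len(load_calls))  # [5, 5] 1
```

Reading the sentences from a file, one per line, and batching them would be natural next steps, but both depend on how the repository's tokenization and model forward pass are written.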
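I have a question: when testing on OOT (out-of-triple) entities in your paper, were the training and validation sets also split into OOT-entity datasets? Was the model retrained on an OOT training set and then tested on OOT entities, or was the model trained on the original training set (containing both OOT and in-triple data) used to predict the OOT test set?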
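Hello, I applied your model to our lab's biomedical literature dataset, which I formatted in the CasRel format. However, I found that your entity processing assumes every entity is a single word, whereas most entities in my dataset are multi-word, so an error is raised during preprocessing.
What was the reasoning behind using the last word of many multi-word entities as the entity?
Also, for datasets with multi-word entities like this one, which launch parameters would you recommend?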
{ "text": "HES1 as an independent prognostic marker in esophageal squamous cell carcinoma .", "triple_list": [ [ "HES1", "/Gene/Cancer/prognostic_factor_orMarkers", "esophageal squamous cell carcinoma" ] ] }
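For CasRel-style records like the one above, a multi-word entity can be mapped to a token span before any label construction. The sketch below assumes whitespace tokenization and an end-exclusive span convention. `find_span` is a hypothetical helper and not part of the repository's preprocessing.

```python
def find_span(tokens, entity):
    """Return (start, end) of the first match of entity in tokens, end exclusive.

    Returns None when the entity does not occur as a contiguous token run.
    """
    ent = entity.split()
    n = len(ent)
    for start in range(len(tokens) - n + 1):
        if tokens[start:start + n] == ent:
            return start, start + n
    return None


sample = {
    "text": "HES1 as an independent prognostic marker in esophageal squamous cell carcinoma .",
    "triple_list": [["HES1", "/Gene/Cancer/prognostic_factor_orMarkers",
                     "esophageal squamous cell carcinoma"]],
}
tokens = sample["text"].split()
for subj, rel, obj in sample["triple_list"]:
    print(find_span(tokens, subj), rel, find_span(tokens, obj))
# (0, 1) /Gene/Cancer/prognostic_factor_orMarkers (7, 11)
```

With spans in hand, both the first and the last token of each entity are available, so a head index (start) and a tail index (end - 1) can be derived for multi-word entities.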
Hello,
I really admire your design and implementation! I have one small question:
Your description says: PFN-nested is an enhanced version of PFN. It is better in leveraging entity tail information and capable of handling nested triple prediction. In the PFN-nested network (PFN.py) there is the following code:
re_head_score = self.re_head(h_re, h_share, mask)
re_tail_score = self.re_tail(h_share, h_re, mask)
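These perform relation extraction using the entity head and tail information respectively, right? Here self.re_head and self.re_tail are both re_unit modules, and only the positions of h_share and h_re are swapped, so how is the tail information used? self.re_tail(h_share, h_re, mask) computes r1 and r2 from the information in h_share, so how does that reflect tail information?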
I ran the code with bert-base on the CoNLL04 dataset and got an F1 score of approximately 66, which is much lower than with albert-large. Is it really reasonable to compare this model using albert-large against previous work using bert-base?
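Hello, thank you for open-sourcing the code for your work; it is very valuable to me! I have a question about reproducing the paper's results. I trained and tested on the WebNLG dataset with the parameters you provide, but the results never reach those given in your link. I ran three experiments and none of the final results were satisfactory. Could you tell me whether there is a problem with my parameter settings? Many thanks! My training logs are below: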
WEBNLG_baseline_true.log
WEBNLG_baseline_true.txt
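I applied your data processing (nytAndWebnlg) to another dataset.
My guess is that in your dataloader,
` subj = entity[subj_idx]
obj = entity[obj_idx]
rc_head_labels+=[subj['start'], obj['start'], re['type']]
rc_tail_labels+=[subj['end']-1, obj['end']-1, re['type']]
means that the entities become
[1, 1, 'None', 16, 20, 'None'], where each pair of numbers is the start and end word index of an entity and None is its type. The relations are:
rc_head_labels = [1, 1, '/location/location/contains'], rc_tail_labels = [16, 20, '/location/location/contains']`, that is, the word-index pairs of the head and tail entities plus the relation type.
What I would like to ask about is: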
Traceback (most recent call last):
File "B:\work\pycharm\PYCHARM\PyCharm Community Edition 2020.3.4\plugins\python-ce\helpers\pydev\pydevd.py", line 1483, in _exec
pydev_imports.execfile(file, globals, locals) # execute the script
File "B:\work\pycharm\PYCHARM\PyCharm Community Edition 2020.3.4\plugins\python-ce\helpers\pydev_pydev_imps_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "B:/model/PFN-nested/main.py", line 197, in
ner_pred, re_head_pred, re_tail_pred = model(text, mask)
File "B:\work\anaconda\envs\pfn\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "B:\model\PFN-nested\model\pfn.py", line 260, in forward
x = self.bert(**x)[0]
File "B:\work\anaconda\envs\pfn\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "B:\work\anaconda\envs\pfn\lib\site-packages\transformers\models\bert\modeling_bert.py", line 989, in forward
past_key_values_length=past_key_values_length,
File "B:\work\anaconda\envs\pfn\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "B:\work\anaconda\envs\pfn\lib\site-packages\transformers\models\bert\modeling_bert.py", line 221, in forward
embeddings += position_embeddings
RuntimeError: The size of tensor a (588) must match the size of tensor b (512) at non-singleton dimension 1
Process finished with exit code 1
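The final error line says the input has 588 positions while the position embeddings cover only 512, which is the usual limit of standard BERT checkpoints. One possible workaround is to shorten (or skip) over-long inputs before they reach the encoder. The sketch below truncates a list of token ids while keeping the final separator id. `truncate_ids` is a hypothetical helper, and the ids 101 and 102 are used only as illustrative [CLS] and [SEP] values.

```python
def truncate_ids(input_ids, max_len=512):
    """Keep at most max_len ids, preserving the final (separator) id."""
    if len(input_ids) <= max_len:
        return list(input_ids)
    # Keep the first max_len - 1 ids and re-attach the last id.
    return list(input_ids[:max_len - 1]) + [input_ids[-1]]


# An illustrative 588-id sequence, matching the length in the error message.
ids = [101] + list(range(1000, 1586)) + [102]
short = truncate_ids(ids)
print(len(ids), len(short), short[0], short[-1])  # 588 512 101 102
```

Truncation discards tokens, so any entity or relation labels that point past the cut would have to be dropped or the sentence skipped; which of the two is appropriate depends on the dataset.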
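Hello author!
Are the entity extraction and relation extraction F1 scores in your paper taken jointly, or is each the best individual score? In your experiments, did the best individual scores ever fall in different epochs? Thanks!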
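Hello, I recently wanted to combine this method with semi-supervised learning for entity and relation extraction. I split the dataset into labeled and unlabeled parts, but training keeps raising errors, for example:
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
This happens when I run only the labeled training part, and also when I run the whole framework. The labeled training code is almost unchanged from the original source, so why does this problem occur? Part of the code is below: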
if args.do_train:
    logger.info("------Training------")
    if args.embed_mode == "albert":
        input_size = 4096
    else:
        input_size = 768
    model = PFN(args, input_size, ner2idx, rel2idx)
    model.to(device)
    optimizer = optim.Adam(model.parameters(), lr=args.lr, weight_decay=args.weight_decay)
    if args.eval_metric == "micro":
        metric = micro(rel2idx, ner2idx)
    else:
        metric = macro(rel2idx, ner2idx)
    BCEloss = loss()
    best_result = 0
    triple_best = None
    entity_best = None
    for epoch in range(args.epoch):
        steps, train_loss, loss_unlabeled, loss_labeled = 0, 0, 0, 0
        file_num = 1
        model.train()
        for labeled_data in tqdm(labeled_batch):
            steps += 1
            optimizer.zero_grad()
            # labeled data
            text = labeled_data[0]
            ner_label = labeled_data[1].to(device)
            re_label = labeled_data[2].to(device)
            mask = labeled_data[-1].to(device)
            ner_pred, re_pred = model(text, mask)
            labeled_loss = BCEloss(ner_pred, ner_label, re_pred, re_label)
            labeled_loss.backward()
            train_loss += labeled_loss.item()
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=args.clip)
            optimizer.step()
            if steps % args.steps == 0:
                logger.info("Epoch: {}, step: {} / {}, train_loss = {:.4f}".format
                            (epoch, steps, len(labeled_batch), (train_loss) / steps))
        logger.info("------ Training Set Results ------")
        logger.info("loss : {:.4f}".format((train_loss) / steps))
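Device-side asserts are often triggered by an index that falls outside a tensor's range (for example a token or label id that is too large) or by binary cross-entropy inputs or targets outside [0, 1]. Whether either applies here is only a guess; after re-splitting a dataset, one thing worth checking is that the label maps still cover every label in the split. The sketch below shows two plain-Python sanity checks that could be run on a batch before it goes to the GPU. Both helpers are hypothetical and operate on flat lists of numbers.

```python
def find_bad_targets(targets):
    """Return positions of target values outside the [0, 1] range BCE expects."""
    return [i for i, t in enumerate(targets) if not 0.0 <= t <= 1.0]


def find_bad_ids(ids, size):
    """Return positions of ids that would index past a table of the given size."""
    return [i for i, x in enumerate(ids) if not 0 <= x < size]


# Illustrative values: two invalid targets and one out-of-range id.
print(find_bad_targets([0.0, 1.0, 2.0, -1.0]))   # [2, 3]
print(find_bad_ids([0, 5, 30000], size=30000))   # [2]
```

Running the same batch on the CPU, or with CUDA_LAUNCH_BLOCKING=1 as the log suggests, usually gives a more precise error location than the asynchronous CUDA report.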