pfn's Issues

OOM for my own bigger datasets

After training finished and testing began, I ran into an OOM error. Since the model is not set up for multi-GPU, have you seen this problem before?

Small inconsistency, or not?


In the model code we see

re_tail_score = self.re_tail(h_share, h_re, mask)

I think it should be

re_tail_score = self.re_tail(h_re, h_share, mask)

This follows from gradient-flow considerations alone: we have two almost identical modules whose inputs are swapped, and h_re and h_share receive gradients from upper layers for semantically different tasks/losses. Besides that, the corrected variant learns slightly better in my experiments.

About the results of the ablation study


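  1. The paper says the ablation study was run on SciERC, but the results of the different ablations are all very close. Are your numbers from a single training run? I ran your code and found that results on SciERC are very unstable and vary considerably across random seeds. Could you report significance values? Or do ablation studies on other datasets lead to the same conclusions as on SciERC?
  2. The ablation results only show relation extraction. How do the different variants affect entity extraction? Are there cases where the relation extraction result is high but the entity extraction result is low?
  3. You use the partition operation but do not verify its effect. With the filter operation in use, how do the results of the partition operation differ from those of other approaches?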
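
One way to quantify the seed-to-seed variation raised in the first question is to repeat each ablation over several seeds and report the mean, the standard deviation, and a paired t statistic between two variants. The sketch below is only an illustration under that assumption: the helper names `summarize` and `paired_t` are hypothetical, and the score lists would have to come from your own runs.

```python
import math
import statistics


def summarize(scores):
    """Return (mean, sample standard deviation) of per-seed scores."""
    return statistics.mean(scores), statistics.stdev(scores)


def paired_t(scores_a, scores_b):
    """Paired t statistic for two variants evaluated on the same seeds.

    Compare |t| against a t table with len(scores_a) - 1 degrees of freedom.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both variants need one score per seed")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
```

Pairing by seed matters here because both variants share the same initialization and data order for a given seed, so the per-seed difference removes part of the noise.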

When training model, is it necessary to set args.do_eval=True?

If args.do_eval is False, entity_best and triple_best will remain None.
The training command line you listed does not pass do_eval, so it defaults to False, and the final log call then indexes None:
logger.info("best test result ner-p: {:.4f} \t ner-r: {:.4f} \t ner-f: {:.4f} \t re-p: {:.4f} \t re-r: {:.4f} \t re-f: {:.4f} ".format(entity_best["p"],
                        entity_best["r"], entity_best["f"], triple_best["p"], triple_best["r"], triple_best["f"]))
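
A minimal sketch of one possible guard, assuming entity_best and triple_best are dicts with the keys "p", "r" and "f" once evaluation has run and None otherwise, as described above. The helper name `format_best` is hypothetical and not part of the repository.

```python
def format_best(entity_best, triple_best):
    """Build the final summary line, tolerating runs without --do_eval."""
    if entity_best is None or triple_best is None:
        return "no evaluation was run (pass --do_eval to track the best result)"
    return (
        "best test result ner-p: {:.4f} \t ner-r: {:.4f} \t ner-f: {:.4f} \t "
        "re-p: {:.4f} \t re-r: {:.4f} \t re-f: {:.4f}"
    ).format(
        entity_best["p"], entity_best["r"], entity_best["f"],
        triple_best["p"], triple_best["r"], triple_best["f"],
    )
```

The returned string can be passed to the existing log call, so a training-only run ends with a readable message instead of a TypeError.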


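About the Extension on Ablation Study

Hello, and thank you for additionally showing the NER results for the encoding schemes.
From the results you show, I observe that on NER, Sequential >> Parallel > original.
If the original model is the PFN model proposed in your paper, does this mean that PFN's encoding scheme hurts NER performance? The Sequential scheme only passes entity information to the relation model and does not pass relation information to the entity model, yet its NER results are far better than those of the original model. Even Parallel is slightly better than the original.
However, the core argument of your paper is that, contrary to the conclusions of previous related work, RE is beneficial to NER, which is also the point that attracted me most.
So I would like to ask: do the results of this Extension on Ablation Study contradict your conclusion? I hope to get your answer.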


--data CONLL04
--embed_mode albert
--batch_size 10
--lr 0.00002
--output_file conll04
--eval_metric micro
--clip 1.0
--epoch 200

11/22/2021 14:59:58 - INFO - main - ['', '--data', 'CONLL04', '--do_train', '--do_eval', '--embed_mode', 'albert', '--batch_size', '10', '--lr', '0.00002', '--output_file', 'conll04', '--eval_metric', 'micro', '--clip', '1.0', '--epoch', '200']
11/22/2021 14:59:58 - INFO - main - Namespace(batch_size=10, clip=1.0, data='CONLL04', do_eval=True, do_train=True, dropconnect=0.1, dropout=0.1, embed_mode='albert', epoch=200, eval_batch_size=10, eval_metric='micro', hidden_size=300, linear_warmup_rate=0.0, lr=2e-05, max_seq_len=128, output_file='conll04', seed=0, steps=50, weight_decay=0)
11/22/2021 15:00:19 - INFO - main - ------Training------
Some weights of the model checkpoint at albert-xxlarge-v1 were not used when initializing AlbertModel: ['predictions.decoder.bias', 'predictions.bias', 'predictions.LayerNorm.weight', 'predictions.dense.bias', 'predictions.decoder.weight', 'predictions.LayerNorm.bias', 'predictions.dense.weight']

  • This IS expected if you are initializing AlbertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).

  • This IS NOT expected if you are initializing AlbertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
    /home/ps/anaconda3/envs/lwc/lib/python3.7/site-packages/torch/cuda/ UserWarning:
    NVIDIA GeForce RTX 3090 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
    The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
    If you want to use the NVIDIA GeForce RTX 3090 GPU with PyTorch, please check the instructions at

    warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
    /home/ps/anaconda3/envs/lwc/lib/python3.7/site-packages/torch/cuda/ UserWarning:
    NVIDIA GeForce RTX 3080 Ti with CUDA capability sm_86 is not compatible with the current PyTorch installation.
    The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
    If you want to use the NVIDIA GeForce RTX 3080 Ti GPU with PyTorch, please check the instructions at

    warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
    0%| | 0/93 [00:00<?, ?it/s]
    Traceback (most recent call last):
    File "", line 196, in
    ner_pred, re_pred = model(text, mask)
    File "/home/ps/anaconda3/envs/lwc/lib/python3.7/site-packages/torch/nn/modules/", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
    File "/home/ps/lwc/PFN/model/", line 260, in forward
    x = self.bert(**x)[0]
    File "/home/ps/anaconda3/envs/lwc/lib/python3.7/site-packages/torch/nn/modules/", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
    File "/home/ps/anaconda3/envs/lwc/lib/python3.7/site-packages/transformers/models/albert/", line 715, in forward
    extended_attention_mask = # fp16 compatibility
    RuntimeError: CUDA error: no kernel image is available for execution on the device
    CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
    For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
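
The warnings above say the installed PyTorch binary ships kernels for sm_37, sm_50, sm_60 and sm_70 only, while both GPUs report sm_86, so the usual remedy is to install a PyTorch build compiled for a CUDA version that includes sm_86. The sketch below shows how such a mismatch could be checked up front. The helper `is_arch_supported` is hypothetical; in a real environment its inputs would come from `torch.cuda.get_device_capability()` and `torch.cuda.get_arch_list()`. It deliberately simplifies the rule: PTX-only entries are ignored, and a binary built for X.y is treated as usable on X.z when z >= y.

```python
def is_arch_supported(capability, arch_list):
    """Rough check whether a build ships kernels usable on this GPU.

    capability: (major, minor), e.g. from torch.cuda.get_device_capability().
    arch_list:  e.g. from torch.cuda.get_arch_list(), such as ["sm_37", "sm_70"].
    """
    major, minor = capability
    for arch in arch_list:
        if not arch.startswith("sm_"):
            continue  # skip PTX-only entries such as "compute_80"
        digits = arch[len("sm_"):]
        if not digits.isdigit():
            continue  # skip suffixed variants this sketch does not model
        arch_major, arch_minor = int(digits[:-1]), int(digits[-1])
        if arch_major == major and arch_minor <= minor:
            return True
    return False


# Values taken from the warning in the log above.
print(is_arch_supported((8, 6), ["sm_37", "sm_50", "sm_60", "sm_70"]))  # False
```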



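Hello, while reading the paper I saw that Equation 6 applies a linear transformation to the memory to obtain the feature. In the code, however, I could not find an implementation of Equation 6; the memory seems to be used directly as the final feature. Is an equivalent operation implemented somewhere? Thanks!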


Why does repeat(1, length, 1, 1) represent the start feature and repeat(length, 1, 1, 1) the end feature? Is there a particular meaning behind this?
st = h_ner.unsqueeze(1).repeat(1, length, 1, 1)
en = h_ner.unsqueeze(0).repeat(length, 1, 1, 1)
ner = torch.cat((st, en, h_global), dim=-1)

r1 = h_re.unsqueeze(1).repeat(1, length, 1, 1)
r2 = h_re.unsqueeze(0).repeat(length, 1, 1, 1)
re = torch.cat((r1, r2, h_global), dim=-1)
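
A minimal pure-Python sketch of what the two repeat calls do, assuming h_ner is laid out as (length, batch, hidden) and dropping the batch and hidden axes for readability. After the repeats, cell (i, j) of st holds token i and cell (i, j) of en holds token j, so concatenating them yields one feature per (start, end) token pair. The toy token names are made up for illustration.

```python
# Toy "hidden states": one label per token instead of a vector.
h = ["tok0", "tok1", "tok2"]
length = len(h)

# unsqueeze(1).repeat(1, length, ...): copy each token along a new second axis,
# so row i is filled with token i (the candidate start).
st = [[h[i] for _ in range(length)] for i in range(length)]

# unsqueeze(0).repeat(length, ...): copy the whole sequence along a new first
# axis, so column j is filled with token j (the candidate end).
en = [[h[j] for j in range(length)] for _ in range(length)]

# Concatenating on the last axis pairs them up: cell (i, j) = (token i, token j).
pairs = [[(st[i][j], en[i][j]) for j in range(length)] for i in range(length)]

print(pairs[0][2])  # ('tok0', 'tok2')
```

Seen this way, "start" and "end" are just the names given to the row index and the column index of the length x length table that the model scores.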

SEMEVAL dataset

I have a dataset that is a translated version of SEMEVAL. I adapted your code to evaluate your model on it, but I did not get good results. So I would like to know why you did not report your model's performance on SEMEVAL. Did you not try it at all, or did it not work for some reason?

FileNotFoundError: [Errno 2] No such file or directory: 'data/data/NYT/ner2idx.json'

I tried "Evaluation on Pre-trained Model" for NYT and WEBNLG, but it fails with an error about the ner2idx.json file. The files are there but have no content. I also tried to regenerate them, but that fails with ModuleNotFoundError: No module named 'utils'.
Could you please fix this or suggest an alternative solution?
Thanks for sharing such a nice repo.

evaluate the model with customized input

I appreciate the work you shared, but I'm having some problems.

I have many samples to predict. What should I do with them?
--model_file ${the path of your saved model}
--sent ${sentence you want to evaluate, str type restricted}
This approach seems to load the model once to process only one sample, which is very slow. Is there any way to process all samples after loading the model once? Do you have any suggestions?






{ "text": "HES1 as an independent prognostic marker in esophageal squamous cell carcinoma .", "triple_list": [ [ "HES1", "/Gene/Cancer/prognostic_factor_orMarkers", "esophageal squamous cell carcinoma" ] ] }
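
One way to avoid reloading the model for every sentence is to separate loading from prediction and loop over the inputs. The sketch below only shows that structure: `load_model` and `predict_one` are hypothetical stand-ins for the repository's own loading and inference code, which is not reproduced here.

```python
def predict_all(sentences, load_model, predict_one, model_file):
    """Load the model once, then reuse it for every sentence.

    load_model and predict_one are caller-supplied callables (hypothetical
    stand-ins for the repository's own loading and inference code).
    """
    model = load_model(model_file)  # expensive step, done a single time
    return [predict_one(model, sent) for sent in sentences]
```

The sentences could be read from a file with one sample per line, and predict_one could be extended to take mini-batches if throughput is still too low.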

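Why can the PFN-nested model use entity tail information during relation training?

Your introduction says: PFN-nested is an enhanced version of PFN. It is better in leveraging entity tail information and capable of handling nested triple prediction. The PFN-nested network code contains these lines:
re_head_score = self.re_head(h_re, h_share, mask)
re_tail_score = self.re_tail(h_share, h_re, mask)
They use entity head and tail information, respectively, for relation extraction, right? Both self.re_head and self.re_tail are re_unit modules, and the only difference is that h_share and h_re swap positions. How does that make use of tail information? self.re_tail(h_share, h_re, mask) computes r1 and r2 from the information in h_share, so where does the tail information come in?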




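subj = entity[subj_idx]
obj = entity[obj_idx]
rc_head_labels += [subj['start'], obj['start'], re['type']]
rc_tail_labels += [subj['end']-1, obj['end']-1, re['type']]

As I understand this code, the entities become [1, 1, 'None', 16, 20, 'None'], where the two numbers are the start and end word indices of an entity and None is its type. The relations are rc_head_labels = [1, 1, '/location/location/contains'] and rc_tail_labels = [16, 20, '/location/location/contains'], i.e. the word-index pairs of the head entity and the tail entity together with the relation type.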


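  1. Is my understanding of the data processing correct?
  2. Is there any difference between an entity type of None and the actual type?
  3. During training, forward() raises a dimension mismatch. What causes it, and how should I fix it?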
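
To make the first question concrete, here is a small sketch that runs the quoted label-building lines on the two entities from the example. It assumes each entity dict stores an exclusive end index (which is what the `- 1` suggests) and that the relation dict holds subject and object indices plus a type. The input values are reconstructed from the spans [1, 1] and [16, 20] mentioned above, not taken from the dataset files.

```python
entity = [
    {"start": 1, "end": 2, "type": "None"},    # span [1, 1], exclusive end
    {"start": 16, "end": 21, "type": "None"},  # span [16, 20], exclusive end
]
re = {"subj": 0, "obj": 1, "type": "/location/location/contains"}

rc_head_labels, rc_tail_labels = [], []
subj = entity[re["subj"]]
obj = entity[re["obj"]]

# Head labels pair the first word of the subject with the first word of the object.
rc_head_labels += [subj["start"], obj["start"], re["type"]]
# Tail labels pair the last word of the subject with the last word of the object.
rc_tail_labels += [subj["end"] - 1, obj["end"] - 1, re["type"]]

print(rc_head_labels)  # [1, 16, '/location/location/contains']
print(rc_tail_labels)  # [1, 20, '/location/location/contains']
```

Under these assumptions each list mixes subject and object: the head labels hold the two start positions and the tail labels hold the two end positions, rather than one list per entity.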

Traceback (most recent call last):
File "B:\work\pycharm\PYCHARM\PyCharm Community Edition 2020.3.4\plugins\python-ce\helpers\pydev\", line 1483, in _exec
pydev_imports.execfile(file, globals, locals) # execute the script
File "B:\work\pycharm\PYCHARM\PyCharm Community Edition 2020.3.4\plugins\python-ce\helpers\", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "B:/model/PFN-nested/", line 197, in
ner_pred, re_head_pred, re_tail_pred = model(text, mask)
File "B:\work\anaconda\envs\pfn\lib\site-packages\torch\nn\modules\", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "B:\model\PFN-nested\model\", line 260, in forward
x = self.bert(**x)[0]
File "B:\work\anaconda\envs\pfn\lib\site-packages\torch\nn\modules\", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "B:\work\anaconda\envs\pfn\lib\site-packages\transformers\models\bert\", line 989, in forward
File "B:\work\anaconda\envs\pfn\lib\site-packages\torch\nn\modules\", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "B:\work\anaconda\envs\pfn\lib\site-packages\transformers\models\bert\", line 221, in forward
embeddings += position_embeddings
RuntimeError: The size of tensor a (588) must match the size of tensor b (512) at non-singleton dimension 1

Process finished with exit code 1
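
The traceback reports 588 positions against a 512-entry position embedding table, which suggests that at least one input is longer than the encoder's maximum sequence length. A minimal sketch of two common workarounds follows, assuming the inputs are lists of subword tokens and that two special tokens are added later. The helper names and the default of 512 are illustrative assumptions, and truncation can cut off annotated entities, so skipping or splitting long samples may be safer.

```python
def truncate_tokens(tokens, max_len=512, num_special=2):
    """Keep at most max_len - num_special tokens so that adding special
    tokens such as [CLS] and [SEP] stays within the position limit."""
    budget = max_len - num_special
    if budget <= 0:
        raise ValueError("max_len must exceed num_special")
    return tokens[:budget]


def split_by_length(samples, max_len=512, num_special=2):
    """Separate token lists that fit from those that would overflow."""
    budget = max_len - num_special
    fits = [s for s in samples if len(s) <= budget]
    too_long = [s for s in samples if len(s) > budget]
    return fits, too_long
```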



RuntimeError: CUDA error: device-side assert triggered

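Hello, I have recently been trying to combine this method with semi-supervised learning for entity and relation extraction. I split the dataset into labeled and unlabeled parts, but training keeps raising errors, for example:
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
This happens when I run only the labeled training part: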
if args.do_train:
    logger.info("------Training------")
    if args.embed_mode == "albert":
        input_size = 4096
    else:
        input_size = 768

    model = PFN(args, input_size, ner2idx, rel2idx)

    optimizer = optim.Adam(model.parameters(), lr=args.lr, weight_decay=args.weight_decay)

    if args.eval_metric == "micro":
        metric = micro(rel2idx, ner2idx)
    else:
        metric = macro(rel2idx, ner2idx)

    BCEloss = loss()
    best_result = 0
    triple_best = None
    entity_best = None

    for epoch in range(args.epoch):
        steps, train_loss, loss_unlabeled, loss_labeled = 0, 0, 0, 0
        file_num = 1
        for labeled_data in tqdm(labeled_batch):
            steps += 1
            # labeled data
            text = labeled_data[0]
            ner_label = labeled_data[1].to(device)
            re_label = labeled_data[2].to(device)
            mask = labeled_data[-1].to(device)

            ner_pred, re_pred = model(text, mask)
            labeled_loss = BCEloss(ner_pred, ner_label, re_pred, re_label)
            train_loss += labeled_loss.item()
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=args.clip)

            if steps % args.steps == 0:
                logger.info("Epoch: {}, step: {} / {}, train_loss = {:.4f}".format
                            (epoch, steps, len(labeled_batch), (train_loss) / steps))

        logger.info("------ Training Set Results ------")
        logger.info("loss : {:.4f}".format((train_loss) / steps))
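
A device-side assert is frequently triggered by an index outside the valid range (for example a label id that is missing from ner2idx or rel2idx after re-splitting the data) or by a binary cross-entropy input outside [0, 1]. Whether either applies here is an assumption. Running the same batch on CPU, or with CUDA_LAUNCH_BLOCKING=1 as the message suggests, usually gives a clearer error. The hypothetical helpers below sketch both checks on flat Python lists, for instance obtained with `tensor.flatten().tolist()`.

```python
import math


def out_of_range_ids(ids, size):
    """Return positions whose index falls outside [0, size)."""
    return [pos for pos, i in enumerate(ids) if not 0 <= i < size]


def invalid_probabilities(values):
    """Return positions whose value is NaN or outside [0, 1]."""
    return [
        pos for pos, v in enumerate(values)
        if math.isnan(v) or not 0.0 <= v <= 1.0
    ]
```

An empty list from both helpers for a failing batch would point away from these two causes.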



