king-yyf / cmekg_tools

979.0 979.0 364.0 6.37 MB

License: MIT License

Python 100.00%

cmekg_tools's Introduction

👋 Hi, I’m @king-yyf
👀 I’m interested in programming,
🌱 I’m currently learning CP
💞️ I’m looking to collaborate on ...
📫 How to reach me ...

cmekg_tools's People

kanandian yepgang lincolnfan christiaaaan 1egend cwlseu yuzhang112 jsusu dongdongdong04 bingzhen sech-io cancangit angus9077 tengben0905 chenmosha johnnywang92 vincentwei2021 ltyunique fengrk tianyudizhua zgdkik codeofrina benzite mhkmars liangsuoliver wengbenjue mayi140611 jia0511 uncarman2017 54huige jerrylxx gaoyb923 sixlife dimwalker jichengyuan leavingangle muguizi waterbroz seabeauty qijunl chenzepei anrerbo up2hcs lzyccc tututou lightyear416 lalalashenle sunny635 tangwest ronnie88597 gshan4056 noelcarlton xiexie1993 lwpnnx lemon5269 xbutterflyx laremn lbeing bluep0int chenjl121 cherish-zyq zhulongpeng0129 yccckid cztgit yuconggen d68321 harzva rfvqwas liaozhihui qiuchenpro sixawn pidada yuanxw0828 xutianhan zhaohengmaster existencein yangyang8599 jeromecn unusaulwu nn-123 luhggit m-gao shuifuture shadow-linux ljt1469 liul21cn saga518 maowase little2000 ume-technology godflyfly lpffernando williamgjn zhaochangyou forestsha newcorder xuanchenguang 976339067 magicpwn chenzhih03

cmekg_tools's Issues

Test set for the NER task

Hi, we have no test set for evaluating the NER task. Could you release one so that we can run tests? Thank you very much.

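After downloading the model files from Baidu Cloud and updating the corresponding model path in medical_cws.py, running medical_cws.py raises an error. How can this be resolved? The log follows: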
(base) ubuntu@ubuntu-test3:~/knowledgegraph/CMeKG_tools/CMeKG_tools-main$ python medical_cws.py
Some weights of the model checkpoint at /home/ubuntu/knowledgegraph/CMeKG_tools/CMeKG_tools-main/models/medical_cws were not used when initializing BertModel: ['cls.predictions.transform.dense.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.bias']

This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Traceback (most recent call last):
  File "medical_cws.py", line 157, in <module>
    res = meg.predict_sentence("肾上腺由皮质和髓质两个功能不同的内分泌器官组成，皮质分泌肾上腺皮质激素，髓质分泌儿茶酚胺激素。")
  File "medical_cws.py", line 105, in predict_sentence
    self.model.load_state_dict(torch.load(self.NEWPATH,map_location=self.device))
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 847, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for BERT_LSTM_CRF:
        Missing key(s) in state_dict: "word_embeds.embeddings.position_ids".

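How do I train on my own dataset?

Hello. For medical_re.py, we currently load your pretrained model and test it on the train_example.json data. If we want to use our own dataset, how should we train our own model? Could you describe the training workflow? Thank you very much, and I look forward to your reply.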

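BUG: Is there a problem in the code?

Hello, I read your code carefully and would like to discuss two points about the RE part:
1. In extract_spoes(), L280-L291: as I understand it, when one input text contains several subject spans, you intend to iterate over each of them and use it as a mask inside the model4po model, adding it to hidden_state, so that extraction of the object and the relation predicate attends only to that subject's start position, which removes the need for dependency parsing. However, this loop only ever picks up the first subject. It merely appears to behave correctly because get_triples splits the text on "。", and a single sentence usually has only one subject.
2. Related to the above: in the definition of model4po, s appears to be written directly into the positions of all valid tokens (all_s[b, :cue_len, :] = s), so it cannot act as a mask over a long text. Adding this step therefore contributes nothing to training the second-stage po extraction.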
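The positional pattern described in the second point can be visualized without the model. The sketch below uses plain Python lists as a stand-in for tensors. `fill_all_valid` mirrors the quoted assignment `all_s[b, :cue_len, :] = s` for a single example; `fill_span_only` is a hypothetical contrast case that does not exist in the repository. It only illustrates the pattern and does not settle whether that pattern is a bug, since `s` itself may already encode which subject was selected.

```python
from typing import List, Tuple

Vector = List[float]


def fill_all_valid(seq_len: int, cue_len: int, s: Vector) -> List[Vector]:
    """Stand-in for `all_s[b, :cue_len, :] = s`: every valid token gets the same vector."""
    zero = [0.0] * len(s)
    return [list(s) if i < cue_len else list(zero) for i in range(seq_len)]


def fill_span_only(seq_len: int, span: Tuple[int, int], s: Vector) -> List[Vector]:
    """Hypothetical contrast: only tokens inside the subject span (inclusive) get the vector."""
    start, end = span
    zero = [0.0] * len(s)
    return [list(s) if start <= i <= end else list(zero) for i in range(seq_len)]


s = [0.5, -1.0]  # toy subject vector
everywhere = fill_all_valid(seq_len=6, cue_len=4, s=s)
span_only = fill_span_only(seq_len=6, span=(1, 2), s=s)

# With the broadcast fill, all four valid positions look identical.
print([row == s for row in everywhere])  # [True, True, True, True, False, False]
# With the span fill, the subject location is visible in the pattern.
print([row == s for row in span_only])   # [False, True, True, False, False, False]
```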

How can the NER length limit be changed? It appears to be capped at 512.
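The 512 cap most likely comes from the BERT position embedding table, which another issue on this page lists with shape [512, 768], so raising it would require a different pretrained encoder. A common alternative is to split long text into overlapping windows and run NER on each window. The sketch below is one way to do that, under stated assumptions: one token per character, two positions reserved for [CLS] and [SEP], and a helper name `split_into_windows` that is hypothetical and not part of the repository.

```python
from typing import List, Tuple


def split_into_windows(text: str, max_len: int = 510, overlap: int = 32) -> List[Tuple[int, str]]:
    """Split `text` into overlapping windows of at most `max_len` characters.

    Returns (offset, chunk) pairs so that spans predicted inside a chunk
    can be mapped back to positions in the original text.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")

    windows: List[Tuple[int, str]] = []
    step = max_len - overlap
    start = 0
    while start < len(text):
        windows.append((start, text[start:start + max_len]))
        if start + max_len >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return windows


windows = split_into_windows("肾" * 1200)
print([(offset, len(chunk)) for offset, chunk in windows])
# [(0, 510), (478, 510), (956, 244)]
```

Entities found in the overlapping regions appear twice and need deduplicating by their absolute offsets (chunk offset plus position inside the chunk).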

About the version of the "transformers" package

Hello,

Could you please provide the specific version of transformers?
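The required version is not stated anywhere on this page, so the sketch below does not answer the question. It only reports which versions are installed locally, which is useful to include when filing a report like the ones in this thread. It assumes Python 3.8 or later (for `importlib.metadata`); the function name `installed_versions` is made up for illustration.

```python
from importlib import metadata
from typing import Dict, Optional, Sequence


def installed_versions(packages: Sequence[str] = ("transformers", "torch")) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if it is not installed."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}=={version}" if version else f"{name}: not installed")
```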

Could you share a demo of the NER dataset?

Pretty please.

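Hello, could I pay for a consultation, billed by the hour?

I do not fully understand the model-loading part. I would also like to use your code to train on my own non-medical data. Is that possible? Would you be willing to provide paid guidance?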

TypeError: dropout(): argument 'input' (position 1) must be Tensor, not str

Is this caused by the version of the transformers library?
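One plausible cause, not confirmed in this thread: recent transformers releases return a dict-like output object from BertModel by default, and tuple-unpacking a dict-like object yields its keys, which are strings. If the model code unpacks the BERT output and passes the result to dropout, the error above would follow. The sketch below reproduces only the unpacking behavior. `FakeModelOutput` and `fake_bert_forward` are hypothetical stand-ins, not transformers classes, and plain lists stand in for tensors.

```python
from collections import OrderedDict


class FakeModelOutput(OrderedDict):
    """Hypothetical stand-in for a dict-like model output; not a transformers class."""


def fake_bert_forward() -> FakeModelOutput:
    # Plain lists stand in for tensors.
    return FakeModelOutput(
        last_hidden_state=[[0.1, 0.2]],
        pooler_output=[0.3, 0.4],
    )


# Tuple-style unpacking iterates over the keys, so both names end up as strings.
sequence_output, pooled_output = fake_bert_forward()
print(type(sequence_output).__name__, sequence_output)  # str last_hidden_state

# Accessing by key returns the actual value instead.
outputs = fake_bert_forward()
hidden = outputs["last_hidden_state"]
print(type(hidden).__name__)  # list
```

If this is indeed the cause, calling the real model with `return_dict=False`, or reading the first element of the output by index instead of unpacking it, are the usual ways to recover the old tuple behavior.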

train_data.json

Excuse me, where can I get the train_data.json file mentioned in medical_re.py?

The website cannot be opened

It fails whether I open it directly, through a VPN, or over mobile data.

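The downloaded NER model raises an error on loading, apparently because some parameters are missing. How can this be fixed?

Hello, I ran into the following error: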

Traceback (most recent call last):
  File "medical_ner.py", line 184, in <module>
    res = my_pred.predict_sentence(sentence)
  File "medical_ner.py", line 103, in predict_sentence
    self.model.load_state_dict(torch.load(self.NEWPATH, map_location=device))
  File "/home/amax/.conda/envs/torch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1407, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for BERT_LSTM_CRF:
        Missing key(s) in state_dict: "word_embeds.embeddings.position_ids".

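Here is what I did:

git clone this repository
Set up the environment and dependencies
Download the NER model (link: https://pan.baidu.com/s/16TPSMtHean3u9dJSXF9mTw) and extract the archive
Edit the paths in the following lines of medical_ner.py so that they point to the correct locations on my server: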

        self.NEWPATH = '/Users/yangyf/workplace/model/medical_ner/model.pkl'
        self.vocab = load_vocab('/Users/yangyf/workplace/model/medical_ner/vocab.txt')
        self.vocab_reverse = {v: k for k, v in self.vocab.items()}

        self.model = BERT_LSTM_CRF('/Users/yangyf/workplace/model/medical_ner', tagset_size, 768, 200, 2,
                                   dropout_ratio=0.5, dropout1=0.5, use_cuda=use_cuda)

Run medical_ner.py
The error shown above then appears

I checked, and the current model requires the following parameters:

word_embeds.embeddings.position_ids      torch.Size([1, 512])
word_embeds.embeddings.word_embeddings.weight    torch.Size([21128, 768])
word_embeds.embeddings.position_embeddings.weight        torch.Size([512, 768])
word_embeds.embeddings.token_type_embeddings.weight      torch.Size([2, 768])
word_embeds.embeddings.LayerNorm.weight          torch.Size([768])
word_embeds.embeddings.LayerNorm.bias    torch.Size([768])
word_embeds.encoder.layer.0.attention.self.query.weight          torch.Size([768, 768])
word_embeds.encoder.layer.0.attention.self.query.bias    torch.Size([768])
word_embeds.encoder.layer.0.attention.self.key.weight    torch.Size([768, 768])
word_embeds.encoder.layer.0.attention.self.key.bias      torch.Size([768])
word_embeds.encoder.layer.0.attention.self.value.weight          torch.Size([768, 768])
word_embeds.encoder.layer.0.attention.self.value.bias    torch.Size([768])
word_embeds.encoder.layer.0.attention.output.dense.weight        torch.Size([768, 768])
... (omitted) ...

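My guess is that the loaded checkpoint (that is, model.pkl) may not contain the word_embeds.embeddings.position_ids entry. Could you please take a look when you have time? Is there a mistake in my steps, or is there a problem with the trained model checkpoint? Thank you!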
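The guess above can be checked directly by comparing the keys the model expects with the keys stored in the checkpoint. The following is a minimal sketch, assuming the checkpoint is a plain state-dict mapping. The helper name `diff_state_dict_keys` is hypothetical, and the key lists are toy data taken from the parameter names quoted above.

```python
from typing import Iterable, List, Tuple


def diff_state_dict_keys(
    model_keys: Iterable[str], checkpoint_keys: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Return (missing, unexpected) keys, analogous to what load_state_dict reports."""
    model_set, ckpt_set = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_set - ckpt_set)      # expected by the model, absent from the checkpoint
    unexpected = sorted(ckpt_set - model_set)   # present in the checkpoint, unknown to the model
    return missing, unexpected


# Toy key lists. With real objects these would be
#   model.state_dict().keys()  and  torch.load(path, map_location="cpu").keys()
model_keys = [
    "word_embeds.embeddings.position_ids",
    "word_embeds.embeddings.word_embeddings.weight",
    "word_embeds.embeddings.position_embeddings.weight",
]
checkpoint_keys = [
    "word_embeds.embeddings.word_embeddings.weight",
    "word_embeds.embeddings.position_embeddings.weight",
]

missing, unexpected = diff_state_dict_keys(model_keys, checkpoint_keys)
print("missing:", missing)        # missing: ['word_embeds.embeddings.position_ids']
print("unexpected:", unexpected)  # unexpected: []
```

If `position_ids` turns out to be the only missing key, a commonly reported workaround is to pass `strict=False` to `load_state_dict`, so that PyTorch skips keys absent from the checkpoint. Whether that is safe here is an assumption: it relies on `position_ids` being a fixed index buffer created by the installed transformers version rather than a learned weight, which would also suggest the checkpoint was saved with a different transformers version.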