An NLP algorithm engineer 🤓 Practice more, discuss more, think more
zhoujx4 / duee
A torch-version baseline for the event extraction part of the multi-format information extraction track of Baidu's 2021 Language and Intelligence Technology Competition
Hello, the file that needs to be run for this step should be duee_1_postprocess.py, right? Your README says duee_1_data_prepare.py. Is that a mistake?
A question about data processing: the code calls from data.data_utils import schema_process, data_process
but the data folder does not exist. Could you share it?
Running the code returns a 403 from the Hugging Face website, and I could not find your model files.
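Some prediction results have an empty event_list, and the F1 score computed after submission is 0. Are these two things related, or is the format of the submission file wrong?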
Command:
CUDA_VISIBLE_DEVICES=2 python predict_cls.py --dataset=DuEE-Fin --event_type=enum -- max_len=256 --per_gpu_eval_batch_size=32 --model_name_or_path=/home/user/pretrained-model/chinese-roberta-wwm-ext-large --fine_tunning_model_path=./output/DuEE-Fin/enum/best_model.pkl --test_json=./data/DuEE-Fin/sentence/test.json
Error:
predict_cls.py: error: the following arguments are required: --fine_tunning_model_path, --test_json
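A likely cause, assuming predict_cls.py builds its command line with argparse (the error format suggests it does): the command contains `-- max_len=256`, with a space after `--`. argparse treats a bare `--` as the end of options, so everything after it is read as positional arguments and the two required flags that follow are never seen. The sketch below reproduces this with a hypothetical stand-in parser; only the flag names are taken from the command above, and the defaults are made up.

```python
import argparse


def build_parser():
    # Hypothetical stand-in for the parser in predict_cls.py.
    # Flag names come from the reported command; defaults are assumptions.
    parser = argparse.ArgumentParser(prog="predict_cls.py")
    parser.add_argument("--dataset", type=str, default="DuEE-Fin")
    parser.add_argument("--max_len", type=int, default=128)
    parser.add_argument("--fine_tunning_model_path", type=str, required=True)
    parser.add_argument("--test_json", type=str, required=True)
    return parser


def try_parse(argv):
    """Return the parsed namespace, or None if argparse rejects the arguments."""
    try:
        return build_parser().parse_args(argv)
    except SystemExit:
        # argparse prints its error message to stderr and then exits.
        return None


# The stray space turns "--" into a separate token that ends option parsing.
broken = ["--dataset=DuEE-Fin", "--", "max_len=256",
          "--fine_tunning_model_path=best_model.pkl", "--test_json=test.json"]
# Same arguments with "--max_len=256" written as one token.
fixed = ["--dataset=DuEE-Fin", "--max_len=256",
         "--fine_tunning_model_path=best_model.pkl", "--test_json=test.json"]

broken_result = try_parse(broken)
fixed_result = try_parse(fixed)
```

If this is the cause, writing `--max_len=256` without the space should let the remaining flags be parsed.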
How did you handle overlapping labels when reproducing the results? Did you discard them?
Hello! I ran into the following problem when reproducing the results:
  File "run_ner.py", line 179, in <module>
    main()
  File "run_ner.py", line 161, in main
    eval_p, eval_r, eval_f1, eval_loss = evaluate(args, eval_iter, model, metric)
  File "run_ner.py", line 46, in evaluate
    n_infer, n_label, n_correct = metric.compute(batch["all_seq_lens"], preds, batch['all_labels'])
  File "/home//DuEE/metric/metric.py", line 74, in compute
    ] for sent_index in range(len(lengths))]
  File "/home//DuEE/metric/metric.py", line 74, in <listcomp>
    ] for sent_index in range(len(lengths))]
  File "/home/***/DuEE/metric/metric.py", line 73, in <listcomp>
    for index in labels[sent_index][:lengths[sent_index]]
KeyError: -1
Printing the labels gives:
[[-1 26 26 ... -1 -1 -1]
[-1 26 26 ... -1 -1 -1]
[-1 26 26 ... 26 26 26]
...
[-1 26 26 ... -1 -1 -1]
[-1 26 26 ... -1 -1 -1]
[-1 26 26 ... -1 -1 -1]]
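Comparing the two implementations, apart from the relevant indices differing from those in paddle's ChunkEvaluator class,
there seems to be no other difference.
Is there a way to fix this?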
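One possible reading of the traceback: the label tensor uses -1 for positions that should be ignored (the first position of every row is -1, and so is the padding), while the metric looks every id up in a mapping that has no entry for -1. A minimal sketch of a workaround, assuming the metric converts ids to tag strings through a dict, is to skip ignored positions in both the gold labels and the predictions before the lookup. The function name, the `id2label` contents, and the `ignore_index` parameter here are hypothetical and not taken from the repository.

```python
def ids_to_tags(label_ids, pred_ids, lengths, id2label, ignore_index=-1):
    """Map label ids to tag strings, dropping positions marked with ignore_index.

    Positions are dropped from the gold and predicted sequences together so
    that the two stay aligned token by token.
    """
    gold_tags, pred_tags = [], []
    for gold_row, pred_row, length in zip(label_ids, pred_ids, lengths):
        gold_sent, pred_sent = [], []
        for gold_id, pred_id in zip(gold_row[:length], pred_row[:length]):
            if gold_id == ignore_index:
                continue  # e.g. [CLS] or padding: no tag to look up
            gold_sent.append(id2label[int(gold_id)])
            pred_sent.append(id2label[int(pred_id)])
        gold_tags.append(gold_sent)
        pred_tags.append(pred_sent)
    return gold_tags, pred_tags


# Toy example with a made-up tag set; 26 plays the role of "O" as in the printout.
id2label = {0: "B-X", 1: "I-X", 26: "O"}
labels = [[-1, 26, 0, 1, -1, -1]]
preds = [[26, 26, 0, 26, 26, 26]]
gold_tags, pred_tags = ids_to_tags(labels, preds, [5], id2label)
```

An alternative under the same assumption is to make the sequence lengths and label offsets consistent, so that the slice passed to the metric never contains -1 in the first place.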
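One more question: is the predict_sequence_labeling.py file used for DuEE-Fin prediction the same as predict_ner.py?
On the validation set the trigger F1 is in the 80s and the argument F1 is in the 60s. Is that a normal range? The competition has ended, so I cannot submit to the official site to see the overall test-set result.
Hello, doc["event_list"] is a dict with 'trigger', 'event_type' and 'arguments'. What method is used to identify the corresponding values?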
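Hello~
Following your steps, the source code ran without any trouble. It is more clearly structured than the paddle version, thank you very much~
Later I added an lstm_crf layer on top of this version and ran into more problems: it reported out of memory, so I considered switching to multi-GPU training. My server has 2 GPUs, and the multi-GPU part of the code is as follows: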
model = bert_lstm_crf(args.model_name_or_path, args.id2label,num_classes=args.num_classes,
rnn_hidden_size=args.rnn_hidden_size,rnn_layers=args.rnn_layers)
model.cuda()
net = nn.DataParallel(model)
model=net.module
# model.to(args.device)
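Still only one GPU is running. Its memory is almost full but its utilization is low, and the other GPU stays idle. Your code has a parameter do_distri_train; is it related to this?
Please help, thanks~~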
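A possible explanation for the snippet above: `model=net.module` unwraps the `nn.DataParallel` wrapper again, so the training loop ends up calling the plain single-GPU model. `nn.DataParallel` only splits a batch across GPUs when the wrapper itself is called. The sketch below shows the usual pattern with a tiny hypothetical module standing in for `bert_lstm_crf`; it also runs on a machine without GPUs, where no wrapping happens.

```python
import torch
from torch import nn


class TinyTagger(nn.Module):
    """Hypothetical stand-in for the bert_lstm_crf model in the question."""

    def __init__(self, hidden=8, num_classes=3):
        super().__init__()
        self.proj = nn.Linear(hidden, num_classes)

    def forward(self, x):
        return self.proj(x)


def wrap_for_multi_gpu(model):
    """Move the model to the available device and wrap it if several GPUs exist."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = model.to(device)
    if torch.cuda.device_count() > 1:
        # Keep the wrapper: calling it is what scatters the batch across GPUs.
        model = nn.DataParallel(model)
    return model, device


model, device = wrap_for_multi_gpu(TinyTagger())
batch = torch.randn(4, 8, device=device)
logits = model(batch)  # call the (possibly wrapped) model, not model.module

# Unwrap only for things like saving weights or reading custom attributes.
to_save = model.module if isinstance(model, nn.DataParallel) else model
```

Note that `nn.DataParallel` parallelizes only `forward`, so a loss computed through a separate method on the unwrapped module (as some CRF implementations do) would still run on one GPU.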