visualjoyce / chengyubert
[COLING 2020] BERT-based Models for Chengyu
License: MIT License
I have trained for 19552 steps, but the only output is the log.
There are no checkpoints.
Is that expected?
I couldn't find 'idioms_pretrain.json' in ChID-Dataset; the competition directory only contains .csv files.
Thanks for your work!
I cannot find the file train-embeddings-base-1gpu.json mentioned in ReadMe.md, but I did find a bert-wwm-ext_literature file. Does bert-wwm-ext_literature replace the former?
Thanks a lot!
For certain reasons, I can't use Docker or Horovod.
Can the code be run without them?
Hi, how can I use this Hugging Face pretrained model to produce chengyu embeddings? https://huggingface.co/visualjoyce/chengyubert_2stage_stage1_wwm_ext
I ask because chinese-BERT-wwm only produces token-level embeddings.
Traceback (most recent call last):
  File "train_official.py", line 470, in <module>
    main(args)
  File "train_official.py", line 317, in main
    best_ckpt = train(model, dataloaders, opts)
  File "train_official.py", line 145, in train
    opts, global_step)
  File "train_official.py", line 235, in evaluation
    log.update(validate(opts, model, loader, split, global_step))
  File "/opt/conda/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
    return func(*args, **kwargs)
  File "train_official.py", line 177, in validate
    loss = F.cross_entropy(logits, targets, reduction='sum')
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/functional.py", line 2422, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/functional.py", line 2228, in nll_loss
    out_size, target.size()))
ValueError: Expected target size (72, 1), got torch.Size([72])
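The message is consistent with how F.cross_entropy checks shapes: for logits of shape (N, C, d1, ...) it expects class-index targets of shape (N, d1, ...). An expected size of (72, 1) therefore suggests that the logits reaching line 177 had a trailing dimension of size 1, for example (72, C, 1), rather than (72, C). The plain-Python sketch below only illustrates that rule; `expected_target_shape` is a hypothetical helper, not part of PyTorch or this repository, and the class count 7 is made up.

```python
def expected_target_shape(logits_shape):
    """Shape of class-index targets that cross_entropy accepts for these logits.

    Mirrors the documented rule: logits (N, C, d1, ...) -> targets (N, d1, ...).
    """
    if len(logits_shape) < 2:
        raise ValueError("logits need at least a batch and a class dimension")
    batch, _num_classes, *extra = logits_shape
    return (batch, *extra)


# 2-D logits: targets of shape (72,) are accepted.
ok = expected_target_shape((72, 7))
# 3-D logits with a trailing singleton: targets must be (72, 1), as in the error.
bad = expected_target_shape((72, 7, 1))
print(ok, bad)
```

If that is what happened here, removing the extra dimension from the logits (for example with `logits.squeeze(-1)`) or adding one to the targets would make the shapes agree. Which of the two is appropriate depends on what the selected model is meant to return.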
docker: Error response from daemon: OCI runtime create failed: container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: Running hook #1:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: requirement error: unsatisfied condition: cuda>=11.0, please update your driver to a newer version, or use an earlier cuda container: unknown. docker: Error response from daemon: requirement error: unsatisfied condition: cuda>=11.0, please update your driver to a newer version, or use an earlier cuda container: unknown.
I'm sorry to bother you again.
I'd like to know whether the code for the paper 'A BERT-based two-stage model for Chinese Chengyu recommendation' (the two-stage model) consists only of 'train_pretrain.py' and 'train_official.py'.
What is the difference between stage-1 pretraining and running 'train_pretrain.py'?
Also, what are the differences among w/o Pre-Training, w/o Fine-Tuning, w/o L_V, and w/o L_A? (I don't fully understand what these show in your paper.)
Could you describe them in more detail? Thanks very much.
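Hello, I used the 2stage_stage1_wwm_ext model you released to train the second-stage official model, 'chengyubert-2stage-stage2', but the result is only 77. According to Table 4 of your paper, A BERT-based Two-Stage Model for Chinese Chengyu Recommendation, it should be 85.43. Why is the gap so large? Is there something I overlooked?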
I used the parameters given in your paper:
pre-trained BERT: Chinese with Whole Word Masking (WWM)
maximum length: 128
batch size: 40 (4X10 GPU cards)
initial learning rate: 0.00005
warm-up steps: 1000
optimizer: AdamW
scheduler: WarmupLinearSchedule
epochs: 5 (num_train_steps about 80800)
Because I only have one GPU (1 × RTX 2080 Ti), I set train_batch_size = 6000 and num_train_steps to about 80800, so the experiment still runs for 5 epochs with a batch size of 40.
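As a rough consistency check on these numbers, the sketch below relates steps, batch size, and epochs. It assumes that one optimizer step consumes exactly one batch of 40 examples (gradient_accumulation_steps = 1) and that batch size counts examples; whether train_batch_size in this codebase counts examples or something else, such as tokens, is not stated in this thread. The function names are illustrative.

```python
def implied_dataset_size(num_train_steps, batch_size, epochs):
    """Number of training examples implied by a step budget."""
    return num_train_steps * batch_size // epochs


def steps_for(dataset_size, batch_size, epochs):
    """Optimizer steps needed to see the dataset `epochs` times (rounded up)."""
    steps_per_epoch = -(-dataset_size // batch_size)  # ceiling division
    return steps_per_epoch * epochs


n_examples = implied_dataset_size(80800, 40, 5)
print(n_examples)                    # 646400
print(steps_for(n_examples, 40, 5))  # 80800
```

Under those assumptions, 80800 steps at batch size 40 for 5 epochs corresponds to about 646,400 training examples. If the effective batch on a single GPU differs from 40, the same step count covers a different number of epochs.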
However, I cannot get close to your accuracy; the following picture shows the accuracy from my experiment.
The gap is roughly 3% to 6%.
This is my training config JSON:
{
  "train_txt_db": "official_train.db",
  "val_txt_db": "official_dev.db",
  "test_txt_db": "official_test.db",
  "out_txt_db": "official_out.db",
  "sim_txt_db": "official_sim.db",
  "ran_txt_db": "official_ran.db",
  "pretrained_model_name_or_path": "hfl/chinese-bert-wwm-ext",
  "model": "chengyubert-dual",
  "dataset_cls": "chengyu-masked",
  "eval_dataset_cls": "chengyu-masked-eval",
  "output_dir": "storage",
  "candidates": "combined",
  "len_idiom_vocab": 3848,
  "max_txt_len": 128,
  "train_batch_size": 6000,
  "val_batch_size": 20000,
  "gradient_accumulation_steps": 1,
  "learning_rate": 0.00005,
  "valid_steps": 100,
  "num_train_steps": 80800,
  "optim": "adamw",
  "betas": [0.9, 0.98],
  "adam_epsilon": 1e-08,
  "dropout": 0.1,
  "weight_decay": 0.01,
  "grad_norm": 1.0,
  "warmup_steps": 1000,
  "seed": 77,
  "fp16": true,
  "n_workers": 0,
  "pin_mem": true,
  "location_only": false
}
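One way to compare a config like this with the settings listed from the paper is to diff the two programmatically. This is only a sketch: the mapping from the paper's names to config keys (for example "batch size" to train_batch_size) is an assumption, and only a subset of the keys is reproduced.

```python
import json

# Subset of the config posted above.
config = json.loads("""
{
  "max_txt_len": 128,
  "train_batch_size": 6000,
  "learning_rate": 0.00005,
  "num_train_steps": 80800,
  "optim": "adamw",
  "warmup_steps": 1000
}
""")

# Values listed from the paper; the key names are an assumed mapping.
paper = {
    "max_txt_len": 128,
    "train_batch_size": 40,
    "learning_rate": 5e-5,
    "num_train_steps": 80800,
    "optim": "adamw",
    "warmup_steps": 1000,
}

# Keep only the keys whose config value differs from the paper's value.
mismatches = {
    key: (config.get(key), expected)
    for key, expected in paper.items()
    if config.get(key) != expected
}
print(mismatches)
```

Under that mapping the only difference is train_batch_size (6000 versus 40), which may simply mean the two numbers are expressed in different units.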
What's wrong with the parameters?
What is competition_train.db used for?
While reading through your code carefully, I came up with a few questions:
1. In Preprocessing:
What are these official_*.db files for? Can they be replaced?
└── txt_db
    ├── hfl
    │   └── chinese-bert-wwm-ext
    │       ├── external_pretrain.db
    │       ├── official_dev.db
    │       ├── official_out.db
    │       ├── official_ran.db
    │       ├── official_sim.db
    │       ├── official_test.db
    │       └── official_train.db
    └── visualjoyce
        └── chengyubert_2stage_stage1_wwm_ext -> ../hfl/chinese-bert-wwm-ext
There is no download link for these db files. Could you please help? Thanks.
Hello! I want to load the model at https://huggingface.co/visualjoyce/chengyubert_2stage_stage1_wwm_ext/tree/main
However, the config says that len_idiom_vocab is 33237, while the vocab.txt provided at that link is not the idiom vocabulary and its size is not 33237. I looked in the Google Drive linked from the README and found a file, "idioms_pretrain.json", but its size is 33238. Which vocabulary should I use to load the model from Hugging Face?
Sorry, I have one more question about the code.
What is the difference between, and the purpose of, the following model classes?
@register_model('chengyubert-2stage-stage2-mask')
@register_model('chengyubert-2stage-stage2-cls')
@register_model('chengyubert-2stage-stage2-window')
@register_model('chengyubert-2stage-stage2-mask-window')
Hi, may I ask what the "scope" and "num" columns stand for?
| idiom | num | explanation |
| --- | --- | --- |
| 偃武崇文 | 0 | 停息武备,崇尚文教。 |
| 洪乔捎书 | 0 | 指言而无信的人。 |
| 南郭先生 | 103 | 比喻无才而占据其位的人。 |
| scope | idiom | id |
| --- | --- | --- |
| Scope I | 见义勇为 | 0 |
| Scope II | 偃武崇文 | 3848 |
| Scope III | 亏于一篑 | 33237 |
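One possible reading of this table, offered as an assumption rather than something the repository confirms, is that each row shows the first idiom of a scope, so the id column marks where that scope begins. The values 3848 and 33237 also appear as len_idiom_vocab settings elsewhere on this page. Under that reading, an id could be mapped to its scope as sketched below; `scope_of` and the boundary list are illustrative names.

```python
from bisect import bisect_right

# First id shown for each scope in the table (assumed to be scope boundaries).
SCOPE_STARTS = [0, 3848, 33237]
SCOPE_NAMES = ["Scope I", "Scope II", "Scope III"]


def scope_of(idiom_id):
    """Return the scope whose starting id is the largest one <= idiom_id."""
    if idiom_id < 0:
        raise ValueError("idiom ids are assumed to be non-negative")
    return SCOPE_NAMES[bisect_right(SCOPE_STARTS, idiom_id) - 1]


for idiom_id in (0, 3847, 3848, 33236, 33237):
    print(idiom_id, scope_of(idiom_id))
```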
| query | synonym | query_id | synonym_id | overlapping |
| --- | --- | --- | --- | --- |
| 黯然销魂 | 六神无主 | 14726 | 1333 | 0 |
| 黯然销魂 | 丧魂失魄 | 14726 | 2704 | 1 |
| 塞翁失马,焉知非福 | 塞翁失马,安知非福 | 24524 | 32175 | 8 |
I thought "overlapping" was the number of Chinese characters the two idioms share, but the last row shows 8, where I would expect 7.
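One hypothesis that fits all three rows is that the column counts distinct shared characters including punctuation, so the comma in the last pair accounts for the extra 1. The sketch below checks that hypothesis against the rows quoted here; it is a guess at the counting rule, not the dataset's documented definition, and `shared_characters` is an illustrative name.

```python
PUNCTUATION = set(",,")  # fullwidth and ASCII commas


def shared_characters(a, b, ignore_punctuation=False):
    """Count distinct characters that occur in both strings."""
    common = set(a) & set(b)
    if ignore_punctuation:
        common -= PUNCTUATION
    return len(common)


rows = [
    ("黯然销魂", "六神无主"),
    ("黯然销魂", "丧魂失魄"),
    ("塞翁失马,焉知非福", "塞翁失马,安知非福"),
]
for query, synonym in rows:
    with_punct = shared_characters(query, synonym)
    without_punct = shared_characters(query, synonym, ignore_punctuation=True)
    print(query, synonym, with_punct, without_punct)
```

With punctuation included the counts are 0, 1, and 8, matching the table; with the comma excluded the last pair gives 7.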
Thanks!
Hi, where can I find the dataset for embedding evaluation?
Thanks!
The files above are all included in your directory structure, but I can't find them on either Google Drive or Hugging Face.
In modeling_bert.py, the forward pass of the ChengyuBertForClozeChid class contains pooled_output = self.dropout(multiply_result).
Why is dropout applied to the result of the multiplication here? Could you share your reasoning?
`
[1,0]:
[1,0]:Traceback (most recent call last):
[1,0]:  File "train_official.py", line 468, in <module>
[1,0]:    main(args)
[1,0]:  File "train_official.py", line 317, in main
[1,0]:    best_ckpt = train(model, dataloaders, opts)
[1,0]:  File "train_official.py", line 145, in train
[1,0]:    opts, global_step)
[1,0]:  File "train_official.py", line 235, in evaluation
[1,0]:    log.update(validate(opts, model, loader, split, global_step))
[1,0]:  File "/opt/conda/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
[1,0]:    return func(*args, **kwargs)
[1,0]:  File "train_official.py", line 176, in validate
[1,0]:    logits, over_logits, cond_logits = model(**batch, targets=None, compute_loss=False)
[1,0]:ValueError: not enough values to unpack (expected 3, got 2)
`
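This is the error Python raises when a call returns two values but the caller unpacks three. A plausible cause, stated here as an assumption, is that the selected model's forward pass returns only two outputs when compute_loss=False, while validate expects three. The stand-in functions below are hypothetical; they only reproduce the failure mode and show one defensive unpacking pattern.

```python
def forward_two_outputs():
    """Hypothetical stand-in for a model that returns only two values."""
    return "logits", "over_logits"


def unpack_strict():
    # Same pattern as line 176 in the traceback: three names, two values.
    logits, over_logits, cond_logits = forward_two_outputs()
    return logits, over_logits, cond_logits


def unpack_tolerant():
    # Take the first output and keep whatever else the model returns.
    logits, *rest = forward_two_outputs()
    return logits, rest


try:
    unpack_strict()
    error = None
except ValueError as exc:
    error = exc

print(type(error).__name__)
print(unpack_tolerant())
```

If this is the cause, checking that the "model" name in the config matches the model class the evaluation code was written for is probably the first thing to verify.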
I would like to use the prediction-layer weights of ChengyuBERT as the initial idiom embeddings in my own work, but I am struggling to do so. Could you give some instructions? Thank you.