chengyubert's People

Contributors

vimos

chengyubert's Issues

How often (in steps) is the model saved?

I have trained for 19552 steps and only have log outputs.
There are no checkpoints.
Is that expected?

embedding training config file

Thanks for your work!

I cannot find the file train-embeddings-base-1gpu.json mentioned in ReadMe.md, but I found a bert-wwm-ext_literature file. Does the bert-wwm-ext_literature file replace the former?

Thanks a lot!

Error when training Bert-chid

Traceback (most recent call last):
  File "train_official.py", line 470, in <module>
    main(args)
  File "train_official.py", line 317, in main
    best_ckpt = train(model, dataloaders, opts)
  File "train_official.py", line 145, in train
    opts, global_step)
  File "train_official.py", line 235, in evaluation
    log.update(validate(opts, model, loader, split, global_step))
  File "/opt/conda/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
    return func(*args, **kwargs)
  File "train_official.py", line 177, in validate
    loss = F.cross_entropy(logits, targets, reduction='sum')
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/functional.py", line 2422, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/functional.py", line 2228, in nll_loss
    out_size, target.size()))
ValueError: Expected target size (72, 1), got torch.Size([72])
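
The message above is consistent with the logits carrying an extra trailing dimension. For class-index targets, `F.cross_entropy` treats an input of shape `(N, C, d1, ...)` as requiring a target of shape `(N, d1, ...)`, so an expected size of `(72, 1)` suggests logits of shape `(72, C, 1)` paired with targets of shape `(72,)`. The sketch below mirrors that shape rule in plain Python so it can be checked without PyTorch. The helper names are hypothetical and the class count of 7 is a placeholder, since the traceback does not show the real value.

```python
def expected_target_shape(input_shape):
    """Target shape required for class-index targets, given logits (N, C, d1, ...)."""
    if len(input_shape) < 2:
        raise ValueError("logits need at least a batch and a class dimension")
    # Drop the class dimension (index 1); keep the batch and any extra dimensions.
    return (input_shape[0],) + tuple(input_shape[2:])


def shapes_match(input_shape, target_shape):
    """True when a target of `target_shape` fits logits of `input_shape`."""
    return tuple(target_shape) == expected_target_shape(input_shape)


logits_shape = (72, 7, 1)   # 7 classes is an assumed placeholder
targets_shape = (72,)

print(expected_target_shape(logits_shape))        # (72, 1)
print(shapes_match(logits_shape, targets_shape))  # False: the reported mismatch

# Dropping the trailing singleton dimension of the logits removes the mismatch.
squeezed_logits_shape = logits_shape[:2]
print(shapes_match(squeezed_logits_shape, targets_shape))  # True
```

If this diagnosis applies, squeezing the last dimension of the logits (or adding a matching dimension to the targets) before the loss call would be one way to make the shapes agree. Which of the two is correct depends on what the model is meant to return.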

Problems met when trying the code

docker: Error response from daemon: OCI runtime create failed: container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: Running hook #1:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: requirement error: unsatisfied condition: cuda>=11.0, please update your driver to a newer version, or use an earlier cuda container: unknown. docker: Error response from daemon: requirement error: unsatisfied condition: cuda>=11.0, please update your driver to a newer version, or use an earlier cuda container: unknown. 

About Two-stage

I'm sorry to bother you again.

I would like to know whether the code for the paper 'A BERT-based two-stage model for Chinese Chengyu recommendation' uses only 'train_pretrain.py' and 'train_official.py' for the two stages.
What is the difference between stage-1 pretraining and using 'train_pretrain.py'?

Also, what are the differences among w/o Pre-Training, w/o Fine-Tuning, w/o 𝐿V and w/o 𝐿A? (I don't quite understand what these show in your paper.)

Could you describe them in more detail? Thank you very much.

accuracy

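Hello, I used the 2stage_stage1_wwm_ext model you released to train the second-stage official model 'chengyubert-2stage-stage2', but the result is only 77. According to Table 4 of your paper, _A BERT-based Two-Stage Model for Chinese Chengyu Recommendation_, it should be 85.43. Why is the gap so large? Is there something I overlooked?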

About parameters

I used the parameters shown in your paper.
image

pre-trained BERT: Chinese with Whole Word Masking (WWM)
maximum length: 128
batch size: 40 (4X10 GPU cards)
initial learning rate: 0.00005
warm-up steps: 1000
optimizer: AdamW
scheduler: WarmupLinearSchedule
epochs: 5 (num_train_steps about 80800)

Because I have only one GPU (1 * GTX2080Ti), I set train_batch_size = 6000 and num_train_steps to about 80800. With these settings the experiment runs for 5 epochs and the batch size is 40.

However, I cannot reach your accuracy. The following picture shows the accuracy of my experiment.
image
That is a difference of roughly 3 to 6%.
image

This is my training config JSON:
{
  "train_txt_db": "official_train.db",
  "val_txt_db": "official_dev.db",
  "test_txt_db": "official_test.db",
  "out_txt_db": "official_out.db",
  "sim_txt_db": "official_sim.db",
  "ran_txt_db": "official_ran.db",
  "pretrained_model_name_or_path": "hfl/chinese-bert-wwm-ext",
  "model": "chengyubert-dual",
  "dataset_cls": "chengyu-masked",
  "eval_dataset_cls": "chengyu-masked-eval",
  "output_dir": "storage",
  "candidates": "combined",
  "len_idiom_vocab": 3848,
  "max_txt_len": 128,
  "train_batch_size": 6000,
  "val_batch_size": 20000,
  "gradient_accumulation_steps": 1,
  "learning_rate": 0.00005,
  "valid_steps": 100,
  "num_train_steps": 80800,
  "optim": "adamw",
  "betas": [0.9, 0.98],
  "adam_epsilon": 1e-08,
  "dropout": 0.1,
  "weight_decay": 0.01,
  "grad_norm": 1.0,
  "warmup_steps": 1000,
  "seed": 77,
  "fp16": true,
  "n_workers": 0,
  "pin_mem": true,
  "location_only": false
}

What's wrong with the parameters?
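
The figures quoted in this issue (batch size 40, 5 epochs, about 80800 steps) can be checked against each other with a little arithmetic. The sketch below assumes that one step consumes one batch and that the batch size counts examples. The post does not say whether `train_batch_size = 6000` is measured in examples or in some other unit such as tokens, so that value is deliberately left out. The function names are hypothetical.

```python
import math


def steps_for_epochs(num_examples, batch_size, epochs):
    """Optimizer steps needed to pass over `num_examples` `epochs` times."""
    return math.ceil(num_examples / batch_size) * epochs


def implied_examples(num_train_steps, batch_size, epochs):
    """Approximate training-set size implied by a step budget."""
    return num_train_steps * batch_size // epochs


# Values quoted in the issue: 80800 steps, batch size 40, 5 epochs.
examples = implied_examples(80800, 40, 5)
print(examples)                           # 646400
print(steps_for_epochs(examples, 40, 5))  # 80800
```

If the batch actually seen by the model at each step differs from 40 examples, the same step budget corresponds to a different number of epochs, which is one thing worth checking when a reproduction falls short.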

Error during training: what is competition_train.db for?

What is competition_train.db used for?
While reading through your code, I had a few questions:
1. In Preprocessing:
image
What are these official_*.db files for? Can they be replaced?
└── txt_db
    ├── hfl
    │   └── chinese-bert-wwm-ext
    │       ├── external_pretrain.db
    │       ├── official_dev.db
    │       ├── official_out.db
    │       ├── official_ran.db
    │       ├── official_sim.db
    │       ├── official_test.db
    │       └── official_train.db
    └── visualjoyce
        └── chengyubert_2stage_stage1_wwm_ext -> ../hfl/chinese-bert-wwm-ext
There is no download link for these db files. Could you please help? Thank you.

The config of chengyubert_2stage_stage1

Hello! I want to load the model at https://huggingface.co/visualjoyce/chengyubert_2stage_stage1_wwm_ext/tree/main
However, the config says that len_idiom_vocab is 33237, while the vocab.txt provided at that link is not the idiom vocabulary and its size is not 33237. I looked in the Google Drive linked in the README and found a file "idioms_pretrain.json", but the size of this file is 33238. Could you tell me which vocabulary I should use to load the model from Hugging Face?

About modeling.

Sorry, I have one more question about the code.

What is the purpose of each of the following model classes, and how do they differ?

@register_model('chengyubert-2stage-stage2-mask')
@register_model('chengyubert-2stage-stage2-cls')
@register_model('chengyubert-2stage-stage2-window')
@register_model('chengyubert-2stage-stage2-mask-window')

What are the "scope" and "num" columns in the corpus?

Hi, may I ask what the "scope" and "num" columns stand for?

In "idioms_pretrain.json" ,

| idiom | num | explanation |
| --- | --- | --- |
| 偃武崇文 | 0 | 停息武备,崇尚文教。 |
| 洪乔捎书 | 0 | 指言而无信的人。 |
| 南郭先生 | 103 | 比喻无才而占据其位的人。 |

In "idioms_scopes.tsv",

| scope | idiom | id |
| --- | --- | --- |
| Scope I | 见义勇为 | 0 |
| Scope II | 偃武崇文 | 3848 |
| Scope III | 亏于一篑 | 33237 |

In "idiom_synonyms.tsv",

| query | synonym | query_id | synonym_id | overlapping |
| --- | --- | --- | --- | --- |
| 黯然销魂 | 六神无主 | 14726 | 1333 | 0 |
| 黯然销魂 | 丧魂失魄 | 14726 | 2704 | 1 |
| 塞翁失马,焉知非福 | 塞翁失马,安知非福 | 24524 | 32175 | 8 |

I assumed "overlapping" is the number of Chinese characters the two idioms share, but the last row shows 8, where I would expect 7.

Thanks!
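
One possible reading of the "overlapping" column, offered only as a guess and not confirmed by the repository, is the number of distinct characters the two strings share, with punctuation counted as well. Under that assumption the comma in the last pair counts as a shared character, which would account for 8 rather than 7. The function name and the `ignore` parameter below are hypothetical.

```python
def shared_characters(a, b, ignore=""):
    """Count distinct characters present in both strings, skipping any in `ignore`."""
    return len((set(a) & set(b)) - set(ignore))


pairs = [
    ("黯然销魂", "六神无主"),
    ("黯然销魂", "丧魂失魄"),
    ("塞翁失马,焉知非福", "塞翁失马,安知非福"),
]

for query, synonym in pairs:
    with_punct = shared_characters(query, synonym)
    without_punct = shared_characters(query, synonym, ignore=",,")
    print(query, synonym, with_punct, without_punct)
```

Counting with punctuation gives 0, 1 and 8 for the three rows shown, matching the column. Excluding the comma gives 7 for the last row, which is the value the question expected.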

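Questions about the network structure

Hello, and thank you in advance for your kind reply.

I have a few questions about the details of the framework.

First, are the two embeddings in the figure below randomly initialized? Are they initialized only once, or are they re-initialized every time a batch is drawn?

image

Second, the embedding on the right cannot correspond to an embedding for every idiom, because each example has only 7 candidates, and those 7 candidates change every time. (I can understand that the left one corresponds to a per-idiom embedding being updated, since its range is 3848, but the right one seems to cover only the local range of the 7 candidates.) I am not sure whether this understanding is correct. Could you advise?

image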

dropout

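In modeling_bert.py, in the forward pass of the ChengyuBertForClozeChid class, there is the line pooled_output = self.dropout(multiply_result). Why is dropout applied to the result of the multiplication? Could you share your thinking?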

error when training chengyubert-twostage

[1,0]:Traceback (most recent call last):
[1,0]:  File "train_official.py", line 468, in <module>
[1,0]:    main(args)
[1,0]:  File "train_official.py", line 317, in main
[1,0]:    best_ckpt = train(model, dataloaders, opts)
[1,0]:  File "train_official.py", line 145, in train
[1,0]:    opts, global_step)
[1,0]:  File "train_official.py", line 235, in evaluation
[1,0]:    log.update(validate(opts, model, loader, split, global_step))
[1,0]:  File "/opt/conda/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
[1,0]:    return func(*args, **kwargs)
[1,0]:  File "train_official.py", line 176, in validate
[1,0]:    logits, over_logits, cond_logits = model(**batch, targets=None, compute_loss=False)
[1,0]:ValueError: not enough values to unpack (expected 3, got 2)
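
The error above means the call returned two values where the script unpacks three, which may indicate that the selected model class does not match what the validation code expects. The sketch below reproduces the failure with a stand-in function and shows one defensive pattern: padding the result with `None`. Both `fake_model` and `unpack_outputs` are hypothetical names for illustration and are not part of the repository.

```python
def fake_model(**kwargs):
    """Stand-in for a model whose forward pass returns only two values."""
    return ("logits", "over_logits")


def unpack_outputs(outputs, expected=3):
    """Pad a tuple of outputs with None so it always has `expected` items."""
    outputs = tuple(outputs)
    if len(outputs) > expected:
        raise ValueError(f"expected at most {expected} outputs, got {len(outputs)}")
    return outputs + (None,) * (expected - len(outputs))


# Direct unpacking fails in the same way as the traceback.
try:
    logits, over_logits, cond_logits = fake_model(compute_loss=False)
except ValueError as err:
    print(err)  # not enough values to unpack (expected 3, got 2)

# Padded unpacking succeeds; the missing third value is None.
logits, over_logits, cond_logits = unpack_outputs(fake_model(compute_loss=False))
print(cond_logits)  # None
```

Padding only hides the mismatch. If the third output is needed downstream, the more likely fix is to pair the training script with a model class that returns all three values.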

About the prediction layer weight

I would like to use the prediction layer weights in chengyuBERT as the initial idiom embeddings in my own work, but I am struggling to do so. Could you give some instructions? Thank you.
