knover's People

Contributors

christineaa, l3str4nge, luhua-rain, portia1026, raindrops2sea, shanetian, sserdoubleh, vonderland

knover's Issues

Can interact be run on CPU?

Running plato-2/scripts/24L_plato_interact.sh on a GPU works, but on a machine without a GPU, with the CPU version of PaddlePaddle installed, it fails with:
E0811 21:28:37.228886 20545 pybind.cc:1277] Cannot use GPU because you have installed CPU version PaddlePaddle.
If you want to use GPU, please try to install GPU version PaddlePaddle by: pip install paddlepaddle-gpu
If you only have CPU, please change CUDAPlace(0) to be CPUPlace().

I have already tried replacing every CUDAPlace(0) with CPUPlace(). Is there anything else I have missed?
Here is the git diff:

https://gist.github.com/fancyerii/fa04cea4e94cf9408c5d6091697fd9fa

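Running convert_data_to_numerical requires "sentence_piece_model"

Do I need to prepare this sentence_piece_model myself? Also, could the documentation on how to use this model be more detailed, or have I simply not found where it is explained? The work is great, but it is a bit hard for a newcomer to get started. Thanks in advance for your help.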

AttributeError: 'Plato' object has no attribute 'args'

Training the Plato model fails with:
File "/home/zzg/workspace/pycharm/Knover/knover/core/model.py", line 225, in load
self.args.start_step = start_step[0]
AttributeError: 'Plato' object has no attribute 'args'
The relevant code in model.py is:
if is_checkpoint:
    print(f"Load model from checkpoint: {model_path}")
    start_step = get_tensor("@LR_DECAY_COUNTER@")
    if start_step is not None:
        self.args.start_step = start_step[0]
Cause: __init__ indeed never sets args.
Question: should self.args = args be added to __init__? It does not look like self.args is used anywhere else.
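A minimal sketch of the fix proposed in the question. `ModelSketch` is a hypothetical stand-in, not Knover's actual model class; it only shows that keeping a reference to `args` in `__init__` is what makes the `self.args.start_step` assignment in `load` valid.

```python
from types import SimpleNamespace


class ModelSketch:
    """Hypothetical stand-in for the model class; not Knover's real code."""

    def __init__(self, args):
        # Keep a reference so that load() can update args.start_step later.
        self.args = args

    def load(self, start_step=None):
        # Mirrors the snippet above: only touch args when a step was restored.
        if start_step is not None:
            self.args.start_step = start_step[0]


args = SimpleNamespace(start_step=0)
model = ModelSketch(args)
model.load(start_step=[1200])
print(args.start_step)
```

Because `args` is shared by reference, code that reads `args.start_step` after `load` sees the restored value, which may be why the attribute is written even though nothing else in the class reads `self.args`.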

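Suggestion: document how to resume training from a checkpoint

Training usually runs for a long time and is likely to be interrupted and then resumed, so resuming is a real need. Please document how to do it.

Also, resuming currently requires manually passing the checkpoint path and the current start_step as arguments, which is cumbersome. I suggest saving this information when the checkpoint is written, so that on resume the trainer checks for it first and automatically continues from the last step.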
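The suggestion above could be sketched roughly as follows. This is not Knover's implementation: the file name `train_state.json` and both helper functions are hypothetical, chosen only to illustrate storing the step next to the checkpoint and reading it back on resume.

```python
import json
import os
import tempfile

STATE_FILE = "train_state.json"  # hypothetical file name


def save_train_state(checkpoint_dir, step):
    """Write the current step next to the checkpoint files."""
    os.makedirs(checkpoint_dir, exist_ok=True)
    with open(os.path.join(checkpoint_dir, STATE_FILE), "w") as f:
        json.dump({"step": step}, f)


def load_start_step(checkpoint_dir):
    """Return the saved step, or 0 when no state file exists (fresh run)."""
    path = os.path.join(checkpoint_dir, STATE_FILE)
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["step"]


ckpt_dir = os.path.join(tempfile.mkdtemp(), "step_5000")
fresh_step = load_start_step(ckpt_dir)    # nothing saved yet
save_train_state(ckpt_dir, 5000)
resumed_step = load_start_step(ckpt_dir)  # picked up automatically
print(fresh_step, resumed_step)
```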

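Why does the vocab contain so many words with an underscore?

Thanks for open-sourcing such a good dialogue training toolkit. While studying it, I noticed that the example English vocabulary vocab.txt contains many words with a leading underscore, alongside the same words without it. I do not understand what the underscored words are for: they lengthen the vocabulary and also make the model harder to converge, so why are they there? Here are a few examples from the vocabulary: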
▁of 50
▁be 51
be 408
of 2530
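A possible reading, assuming this vocabulary was produced by SentencePiece (the configs logged elsewhere on this page reference an spm.model file): the leading character is `▁` (U+2581), not an ASCII underscore, and it marks a piece that starts a new word, while the bare form continues the previous piece. The `detokenize` helper below is a hypothetical illustration, not Knover's tokenizer.

```python
WORD_START = "\u2581"  # "▁", the SentencePiece whitespace marker


def detokenize(pieces):
    """Join pieces and turn each word-start marker back into a space."""
    return "".join(pieces).replace(WORD_START, " ").strip()


# "▁be" starts a new word; "be" continues the previous piece.
text_a = detokenize(["\u2581to", "\u2581be"])
text_b = detokenize(["\u2581may", "be"])
print(text_a)  # to be
print(text_b)  # maybe
```

Under this reading the two forms are not duplicates: dropping either one would lose the information about where word boundaries fall.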

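Relationship between num_samples and topk at inference

The inference arguments given in the documentation default to --num_samples 20 --topk 5. I do not quite understand how num_samples and topk relate.
As I understand it, topk 5 means sorting tokens by probability and sampling one of the top 5 as the output token.
So what is num_samples 20 for?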
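One way to picture the two options, under the assumption (not confirmed in this thread) that num_samples is the number of candidate responses drawn per context, which are then ranked by a score such as the ranking_score option visible in the logged configs on this page. The functions and the toy distribution below are hypothetical.

```python
import random


def topk_sample(probs, k, rng):
    """Sample one token id from the k most probable tokens."""
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    weights = [probs[i] for i in top]
    return rng.choices(top, weights=weights, k=1)[0]


def generate_candidates(next_token_probs, num_samples, topk, length, seed=0):
    """Draw num_samples independent sequences, each token via top-k sampling."""
    rng = random.Random(seed)
    return [
        [topk_sample(next_token_probs, topk, rng) for _ in range(length)]
        for _ in range(num_samples)
    ]


# Toy, context-independent distribution over an 8-token vocabulary.
probs = [0.30, 0.25, 0.15, 0.10, 0.08, 0.06, 0.04, 0.02]
candidates = generate_candidates(probs, num_samples=20, topk=5, length=4)
print(len(candidates))
```

In this picture topk controls how each token is chosen, while num_samples controls how many whole responses are produced before one is selected.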

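How does the training code handle persona information?

Baidu's pretrained model is powerful: at inference time it can respond according to a persona placed in the context. I want to train on my own data with personas that I define, following the "your persona:" entries in the example data example/train.tsv. However, dialog_reader.py contains no special handling for the "your persona:" field. How did Baidu handle the "your persona:" information during training?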

Error when running inference with NSP

Running inference as described in the documentation produced the following error:
UnavailableError: Load operator fail to open file output/NSP/infer_model/encoder_layer_0_multi_head_att_key_fc.b_0, please check whether the model file is complete or damaged.
[Hint: Expected static_cast(fin) == true, but received static_cast(fin):0 != true:1.] (at /paddle/paddle/fluid/operators/load_op.h:41)
[operator < load > error]
Is something wrong with the NSP model?

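Have you tried data augmentation?

For sentiment classification I have used random masking and n-gram data augmentation. Would this kind of augmentation work well for dialogue tasks? Are there other effective augmentation methods?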

sh ./scripts/24L_plato_training.sh fails; please help find the cause

I have paddle 1.8.2.post107 and paddlehub 1.5.3 installed. The error message is:
ERROR 2020-07-21 14:44:20,106 utils.py:422] ABORT!!! Out of all 2 trainers, the trainer process with rank=[0] was aborted. Please check its log.
Traceback (most recent call last):
File "/home/li.ma/anaconda3/lib/python3.7/site-packages/paddle/distributed/utils.py", line 406, in watch_local_trainers
terminate_local_procs(procs)
File "/home/li.ma/anaconda3/lib/python3.7/site-packages/paddle/distributed/utils.py", line 257, in terminate_local_procs
p.proc.join(timeout=1)
AttributeError: 'Popen' object has no attribute 'join'
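The last line of the traceback can be reproduced in isolation. The sketch below only illustrates the error, it is not a patch for Paddle: a `subprocess.Popen` object has no `join` method (that name belongs to `multiprocessing.Process` and threads), and `wait(timeout=...)` is the corresponding call. The child command here is an arbitrary placeholder.

```python
import subprocess
import sys

# Placeholder child process that exits immediately.
proc = subprocess.Popen(
    [sys.executable, "-c", "print('done')"],
    stdout=subprocess.DEVNULL,
)

has_join = hasattr(proc, "join")      # Popen does not define join()
return_code = proc.wait(timeout=30)   # wait() is the Popen equivalent
print(has_join, return_code)
```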

Context to a conversation in PLATO-2

Is it possible to give a context to the conversation, so that the setting description/persona can be given to the pretrained models beforehand?

If not, is there a possibility of incorporating something like this with the pretrained models?

Low download speeds for the pretrained models

Hi, first of all, thanks for the really nice work!
I'm facing very low download speeds for the 24L model -- close to 20-30 KB/s using wget. Could you please help with an alternative mirror link? Thanks!

The NSP task code appears to be broken; should it be fixed?

metrics = {}

74 | fc_out = self._calc_logits(outputs["enc_out"], inputs["tgt_pos"])
75 | lm_loss = layers.softmax_with_cross_entropy(logits=fc_out, label=inputs["tgt_pos"])

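Line 74 of models/nsp_model.py looks wrong.
The arguments are wrong: there should be a checkpoints argument in the middle.
In addition, the NSP forward function should store the checkpoints data in the same way UnifiedTransformer does.

Also, could you provide a more concrete training demo for PLATO-2?
For example, how to first train only UnifiedTransformer, and how to then train NSP and the latent (hidden-state) model.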

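Why is train.py so much faster than infer.py?

When calling train.py, batch_size can be set to around 8000 and one step takes about 200 s. When calling infer.py, batch_size has to be very small (4, 12, or less), and anything above 32 may run out of GPU memory. This contradicts the usual intuition that eval mode should be faster than train mode and use less memory. What is the reason?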

Undefined name do_test

Hello,

In dialog_reader I found the following error:
/Knover/readers/dialog_reader.py:374:73: F821 undefined name 'do_test'

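A question about the PLATO-2 paper

I have been reading the PLATO-2 paper recently and have a question.

In stage 1.1 (coarse-grained training), a one-to-one mapping is generated using the NLL loss, but what does the E at the front of the formula refer to?
image

In stage 2.1 it denotes sampling a z from the distribution of z, which I understand, but I do not understand the E in stage 1.1.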
image

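About NSPModel training

1) The paper describes the NSPModel as follows: “To select the most appropriate responses generated by the fine-grained generation model, the evaluation model is trained to estimate the coherence of the responses.”
I read this as training a classification model on the candidates generated in stage 2.1 plus labels.
However, mix_negative_sample in the code builds negative examples by randomly replacing tgt, which seems inconsistent.
2) Does the final deployed model first generate candidates with 2.1 and then rank them with 2.2?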
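For reference, the random-replacement scheme described in point 1 can be sketched as below. This is a hypothetical illustration of the general technique, not the actual `mix_negative_sample` code; the function name, the `neg_ratio` parameter, and the toy data are all assumptions.

```python
import random


def mix_negative_samples_sketch(pairs, neg_ratio=1, seed=0):
    """Return (src, tgt, label): 1 for original pairs, 0 for a random other tgt."""
    rng = random.Random(seed)
    samples = []
    for i, (src, tgt) in enumerate(pairs):
        samples.append((src, tgt, 1))
        # Candidate replacements: responses taken from the other pairs.
        others = [t for j, (_, t) in enumerate(pairs) if j != i]
        for _ in range(neg_ratio):
            samples.append((src, rng.choice(others), 0))
    return samples


pairs = [
    ("how are you?", "fine, thanks."),
    ("what time is it?", "almost noon."),
    ("do you like tea?", "yes, green tea."),
]
samples = mix_negative_samples_sketch(pairs)
print(len(samples))
```

Negatives built this way come from other dialogues rather than from the generator's own candidates, which is the mismatch with the paper's description that the question points out.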

Exception: 'feed_targets' does not have label_pos variable

Traceback (most recent call last):
File "/home/perfectworld/anaconda3/envs/dialogue/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "/home/perfectworld/anaconda3/envs/dialogue/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/perfectworld/gx/Knover/knover/scripts/interact.py", line 83, in
interact(args)
File "/home/perfectworld/gx/Knover/knover/scripts/interact.py", line 70, in interact
pred = task.infer_step(model, data)[0]
File "/home/perfectworld/gx/Knover/knover/core/task.py", line 46, in infer_step
outputs = self._post_process_infer_output(predictions)
File "/home/perfectworld/gx/Knover/knover/tasks/dialog_generation.py", line 162, in _post_process_infer_output
return self._post_process_generation_output(predictions)
File "/home/perfectworld/gx/Knover/knover/tasks/dialog_generation.py", line 91, in _post_process_generation_output
get_nsp_score_batch(self.nsp_predictor, predictions)
File "/home/perfectworld/gx/Knover/knover/tasks/dialog_generation.py", line 404, in get_nsp_score_batch
outputs = nsp_predictor(data)
File "/home/perfectworld/gx/Knover/knover/utils/inference_utils.py", line 44, in predict
return_numpy=True)
File "/home/perfectworld/anaconda3/envs/dialogue/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1110, in run
six.reraise(*sys.exc_info())
File "/home/perfectworld/anaconda3/envs/dialogue/lib/python3.7/site-packages/six.py", line 703, in reraise
raise value
File "/home/perfectworld/anaconda3/envs/dialogue/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1108, in run
return_merged=return_merged)
File "/home/perfectworld/anaconda3/envs/dialogue/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1238, in _run_impl
use_program_cache=use_program_cache)
File "/home/perfectworld/anaconda3/envs/dialogue/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1313, in _run_program
fetch_var_name=fetch_var_name)
File "/home/perfectworld/anaconda3/envs/dialogue/lib/python3.7/site-packages/paddle/fluid/executor.py", line 624, in _add_feed_fetch_ops
if not has_feed_operators(global_block, feed, feed_var_name):
File "/home/perfectworld/anaconda3/envs/dialogue/lib/python3.7/site-packages/paddle/fluid/executor.py", line 280, in has_feed_operators
format(feed_target_name))
Exception: 'feed_targets' does not have label_pos variable

Have you encountered this problem? How can it be fixed?

Plato model inference error: the same config works for training but fails for inference

aistudio@jupyter-208728-1765888:~/Knover$ git branch -av
  develop                      dcf05a0 Support PaddlePaddle 2.0.
* master                       4bad22c Fix checkpoints and add document for continuous training (#31)
  remotes/origin/HEAD          -> origin/develop
  remotes/origin/develop       dcf05a0 Support PaddlePaddle 2.0.
  remotes/origin/dygraph       5a2fbec Support dygraph in PaddlePaddle 2.0 and add lic2021 baseline
  remotes/origin/luge-dialogue 1b03ac1 update score
  remotes/origin/master        4bad22c Fix checkpoints and add document for continuous training (#31)
  remotes/origin/plato-2       4bad22c Fix checkpoints and add document for continuous training (#31)
aistudio@jupyter-208728-1765888:~/Knover$ python infer.py --model Plato --task DialogGeneration --vocab_path ./projects/lic2021/conf/vocab.txt --spm_model_file ./projects/lic2021/conf/spm.model --infer_file ./data/lic2021/test.txt --data_format numerical --file_format file --config_path ./projects/lic2021/conf/12L_P.json --init_pretraining_params Plato --batch_size 2 --max_src_len 384 --max_tgt_len 128 --max_seq_len 512 --output_name response --decoding_strategy topk_sampling --do_generation True --num_samples 4 --topk 5 --is_cn True --do_generation true --save_path ./projects/lic2021/infer/output --log_step 10 
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
{
  "is_distributed": false,
  "save_path": "./projects/lic2021/infer/output",
  "infer_file": "./data/lic2021/test.txt",
  "output_name": "response",
  "log_steps": 10,
  "Model": {
    "model": "Plato",
    "config_path": "./projects/lic2021/conf/12L_P.json",
    "init_checkpoint": "",
    "init_pretraining_params": "Plato",
    "learning_rate": 1e-05,
    "warmup_steps": 0,
    "weight_decay": 0.0,
    "max_grad_norm": 0.1,
    "use_recompute": false,
    "use_amp": false,
    "amp_loss_scaling": 12800,
    "max_seq_len": 512,
    "weight_sharing": true,
    "mem_efficient": false,
    "use_bow": true,
    "use_entropy": false,
    "pre_encoder_cmd": "d",
    "preprocess_cmd": "n",
    "postprocess_cmd": "da",
    "post_cls_cmd": "n",
    "cls_bias": true,
    "attention_probs_dropout_prob": 0.1,
    "hidden_act": "gelu",
    "hidden_dropout_prob": 0.1,
    "hidden_size": 768,
    "initializer_range": 0.02,
    "max_position_embeddings": 512,
    "latent_type_size": 20,
    "num_attention_heads": 12,
    "num_hidden_layers": 12,
    "type_vocab_size": 2,
    "role_type_size": 32,
    "vocab_size": 30004
  },
  "Generator": {
    "min_dec_len": 1,
    "max_dec_len": 64,
    "decoding_strategy": "topk_sampling",
    "temperature": 1.0,
    "ignore_unk": true,
    "num_samples": 4,
    "topk": 5,
    "topp": 0.9,
    "beam_size": 10,
    "length_average": true,
    "length_penalty": 0.0
  },
  "Task": {
    "task": "DialogGeneration",
    "do_generation": true,
    "is_cn": true,
    "nsp_inference_model_path": null,
    "nsp_attention_style": "bidirectional",
    "ranking_score": "decode_score"
  },
  "Reader": {
    "max_src_len": 384,
    "max_tgt_len": 128,
    "truncate_first_turn": false,
    "file_format": "file",
    "data_format": "numerical",
    "in_tokens": false,
    "batch_size": 2,
    "continuous_position": true,
    "random_seed": 11,
    "sort_pool_size": 65536
  },
  "Tokenizer": {
    "tokenizer": "SentencePieceTokenizer",
    "vocab_path": "./projects/lic2021/conf/vocab.txt",
    "do_lower_case": false,
    "spm_model_file": "./projects/lic2021/conf/spm.model"
  },
  "run_infer": true
}
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/math_op_patch.py:298: UserWarning: /home/aistudio/Knover/models/unified_transformer.py:119
The behavior of expression A + B has been unified with elementwise_add(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_add(X, Y, axis=0) instead of A + B. This transitional warning will be dropped in the future.
  op_type, op_type, EXPRESSION_MAP[method_name]))
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/math_op_patch.py:298: UserWarning: /home/aistudio/Knover/models/transformer_block.py:116
The behavior of expression A + B has been unified with elementwise_add(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_add(X, Y, axis=0) instead of A + B. This transitional warning will be dropped in the future.
  op_type, op_type, EXPRESSION_MAP[method_name]))
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/math_op_patch.py:298: UserWarning: /home/aistudio/Knover/models/transformer_block.py:217
The behavior of expression A + B has been unified with elementwise_add(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_add(X, Y, axis=0) instead of A + B. This transitional warning will be dropped in the future.
  op_type, op_type, EXPRESSION_MAP[method_name]))
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/math_op_patch.py:298: UserWarning: /home/aistudio/Knover/models/generator.py:161
The behavior of expression A + B has been unified with elementwise_add(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_add(X, Y, axis=0) instead of A + B. This transitional warning will be dropped in the future.
  op_type, op_type, EXPRESSION_MAP[method_name]))
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/utils.py:77: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  return (isinstance(seq, collections.Sequence) and
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/math_op_patch.py:298: UserWarning: /home/aistudio/Knover/models/generator.py:209
The behavior of expression A * B has been unified with elementwise_mul(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_mul(X, Y, axis=0) instead of A * B. This transitional warning will be dropped in the future.
  op_type, op_type, EXPRESSION_MAP[method_name]))
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/math_op_patch.py:298: UserWarning: /home/aistudio/Knover/models/generator.py:209
The behavior of expression A / B has been unified with elementwise_div(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_div(X, Y, axis=0) instead of A / B. This transitional warning will be dropped in the future.
  op_type, op_type, EXPRESSION_MAP[method_name]))
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/math_op_patch.py:298: UserWarning: /home/aistudio/Knover/models/generator.py:239
The behavior of expression A * B has been unified with elementwise_mul(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_mul(X, Y, axis=0) instead of A * B. This transitional warning will be dropped in the future.
  op_type, op_type, EXPRESSION_MAP[method_name]))
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/math_op_patch.py:298: UserWarning: /home/aistudio/Knover/models/generator.py:239
The behavior of expression A - B has been unified with elementwise_sub(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_sub(X, Y, axis=0) instead of A - B. This transitional warning will be dropped in the future.
  op_type, op_type, EXPRESSION_MAP[method_name]))
W0412 19:20:59.318835  4704 device_context.cc:362] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.0, Runtime API Version: 10.1
W0412 19:20:59.322726  4704 device_context.cc:372] device: 0, cuDNN Version: 7.6.
Load pretraining parameters from Plato.
Traceback (most recent call last):
  File "infer.py", line 139, in <module>
    infer(args)
  File "infer.py", line 86, in infer
    predictions = task.infer_step(model, data)
  File "/home/aistudio/Knover/tasks/task_base.py", line 43, in infer_step
    predictions = model.infer_step(inputs)
  File "/home/aistudio/Knover/models/plato.py", line 280, in infer_step
    return super(Plato, self).infer_step(inputs)
  File "/home/aistudio/Knover/models/unified_transformer.py", line 439, in infer_step
    predictions = self._run_generation(inputs)
  File "/home/aistudio/Knover/models/unified_transformer.py", line 394, in _run_generation
    return_numpy=False)
  File "/home/aistudio/Knover/models/model_base.py", line 266, in _execute
    fetch_vars = self.exe.run(program, feed, fetch_list, **kwargs)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1110, in run
    six.reraise(*sys.exc_info())
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/six.py", line 703, in reraise
    raise value
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1108, in run
    return_merged=return_merged)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1238, in _run_impl
    use_program_cache=use_program_cache)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1328, in _run_program
    [fetch_var_name])
ValueError: In user code:

    File "infer.py", line 139, in <module>
      infer(args)
    File "infer.py", line 72, in infer
      model = models.create_model(args, place)
    File "/home/aistudio/Knover/models/__init__.py", line 49, in create_model
      return MODEL_REGISTRY[args.model](args, place)
    File "/home/aistudio/Knover/models/plato.py", line 49, in __init__
      super(Plato, self).__init__(args, place)
    File "/home/aistudio/Knover/models/unified_transformer.py", line 93, in __init__
      super(UnifiedTransformer, self).__init__(args, place)
    File "/home/aistudio/Knover/models/model_base.py", line 74, in __init__
      self._build_programs()
    File "/home/aistudio/Knover/models/model_base.py", line 91, in _build_programs
      predictions = self.infer(inputs, outputs)
    File "/home/aistudio/Knover/models/unified_transformer.py", line 380, in infer
      return self.generator.inference(self, inputs, outputs)
    File "/home/aistudio/Knover/models/generator.py", line 175, in inference
      gather_idx=parent_idx)
    File "/home/aistudio/Knover/models/unified_transformer.py", line 178, in _generation_network
      gather_idx=gather_idx)
    File "/home/aistudio/Knover/models/unified_transformer.py", line 202, in _encode
      store=caches is not None
    File "/home/aistudio/Knover/models/transformer_block.py", line 376, in encoder
      store=store)
    File "/home/aistudio/Knover/models/transformer_block.py", line 288, in encoder_layer
      store=store)
    File "/home/aistudio/Knover/models/transformer_block.py", line 158, in multi_head_attention
      dropout_rate)
    File "/home/aistudio/Knover/models/transformer_block.py", line 116, in scaled_dot_product_attention
      product += attn_bias
    File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/math_op_patch.py", line 304, in __impl__
      attrs={'axis': axis})
    File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/framework.py", line 3023, in append_op
      attrs=kwargs.get("attrs", None))
    File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/framework.py", line 2107, in __init__
      for frame in traceback.extract_stack():

    InvalidArgumentError: Broadcast dimension mismatch. Operands could not be broadcast together with the shape of X = [160, 12, 160, 427] and the shape of Y = [160, 12, 1, 268]. Received [427] in X is not equal to [268] in Y at i:3.
      [Hint: Expected x_dims_array[i] == y_dims_array[i] || x_dims_array[i] <= 1 || y_dims_array[i] <= 1 == true, but received x_dims_array[i] == y_dims_array[i] || x_dims_array[i] <= 1 || y_dims_array[i] <= 1:0 != true:1.] (at /paddle/paddle/fluid/operators/elementwise/elementwise_op_function.h:160)
      [operator < elementwise_add > error]
aistudio@jupyter-208728-1765888:~/Knover$ 
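The hint in the error message spells out the broadcasting rule being enforced: on every axis the two sizes must be equal, or one of them must be at most 1. A small sketch of that check (the helper name is hypothetical, and it assumes shapes of equal rank, as in the log) applied to the two shapes reported above:

```python
def first_broadcast_mismatch(x_shape, y_shape):
    """Return the first axis where two equal-rank shapes cannot broadcast, else None."""
    for i, (x, y) in enumerate(zip(x_shape, y_shape)):
        if not (x == y or x <= 1 or y <= 1):
            return i
    return None


# Shapes of X (attention product) and Y (attn_bias) from the error message.
axis = first_broadcast_mismatch([160, 12, 160, 427], [160, 12, 1, 268])
print(axis)  # 3
```

Axis 3 is the only failing one, matching "at i:3" in the log: the attention product has last dimension 427 while attn_bias has 268, so the two tensors disagree about the sequence length being attended over.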

Cublas Error - CUBLAS_STATUS_EXECUTION_FAILED with interact script

Hi, while running the interactive script for both the 24L and 32L models, I faced the following CUBLAS error.
I'm running the script on Ubuntu 18.04 with 4 Tesla T4 16GB GPUs on GCP.

W0715 13:57:42.359799 16069 device_context.cc:252] Please NOTE: device: 0, CUDA Capability: 75, Driver API Version: 11.0, Runtime API Version: 10.0
W0715 13:57:42.362675 16069 device_context.cc:260] device: 0, cuDNN Version: 8.0.
Load pretraining parameters from ./24L/Plato.
Enter [EXIT] to quit the interaction, [NEXT] to start a new conversation.
[Human]: hey
/home/bakht/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/executor.py:1070: UserWarning: The following exception is not an EOF exception.
  "The following exception is not an EOF exception.")
Traceback (most recent call last):
  File "./interaction.py", line 83, in <module>
    interact(args)
  File "./interaction.py", line 72, in interact
    pred = task.infer_step(model, data)[0]
  File "/mnt/disks/disk-huge/bakht/Knover/tasks/task_base.py", line 46, in infer_step
    predictions = model.infer_step(inputs)
  File "/mnt/disks/disk-huge/bakht/Knover/models/plato.py", line 243, in infer_step
    return super(Plato, self).infer_step(inputs)
  File "/mnt/disks/disk-huge/bakht/Knover/models/unified_transformer.py", line 506, in infer_step
    return self._run_generation(inputs)
  File "/mnt/disks/disk-huge/bakht/Knover/models/unified_transformer.py", line 462, in _run_generation
    return_numpy=False)
  File "/mnt/disks/disk-huge/bakht/Knover/models/model_base.py", line 258, in _execute
    fetch_vars = self.exe.run(program, feed, fetch_list, return_numpy=return_numpy)
  File "/home/bakht/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1071, in run
    six.reraise(*sys.exc_info())
  File "/home/bakht/anaconda3/envs/paddle/lib/python3.7/site-packages/six.py", line 703, in reraise
    raise value
  File "/home/bakht/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1066, in run
    return_merged=return_merged)
  File "/home/bakht/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1154, in _run_impl
    use_program_cache=use_program_cache)
  File "/home/bakht/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1229, in _run_program
    fetch_var_name)
paddle.fluid.core_avx.EnforceNotMet:

--------------------------------------------
C++ Call Stacks (More useful to developers):
--------------------------------------------
0   std::string paddle::platform::GetTraceBackString<char const*>(char const*&&, char const*, int)
1   paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int)
2   void paddle::operators::math::Blas<paddle::platform::CUDADeviceContext>::GEMM<float>(CBLAS_TRANSPOSE, CBLAS_TRANSPOSE, int, int, int, float, float const*, float const*, float, float*) const
3   void paddle::operators::math::Blas<paddle::platform::CUDADeviceContext>::MatMul<float>(paddle::framework::Tensor const&, paddle::operators::math::MatDescriptor const&, paddle::framework::Tensor const&, paddle::operators::math::MatDescriptor const&, float, paddle::framework::Tensor*, float) const
4   paddle::operators::MatMulKernel<paddle::platform::CUDADeviceContext, float>::Compute(paddle::framework::ExecutionContext const&) const
5   std::_Function_handler<void (paddle::framework::ExecutionContext const&), paddle::framework::OpKernelRegistrarFunctor<paddle::platform::CUDAPlace, false, 0ul, paddle::operators::MatMulKernel<paddle::platform::CUDADeviceContext, float>, paddle::operators::MatMulKernel<paddle::platform::CUDADeviceContext, double>, paddle::operators::MatMulKernel<paddle::platform::CUDADeviceContext, paddle::platform::float16> >::operator()(char const*, char const*, int) const::{lambda(paddle::framework::ExecutionContext const&)#1}>::_M_invoke(std::_Any_data const&, paddle::framework::ExecutionContext const&)
6   paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&, paddle::framework::RuntimeContext*) const
7   paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&) const
8   paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, paddle::platform::Place const&)
9   paddle::framework::Executor::RunPartialPreparedContext(paddle::framework::ExecutorPrepareContext*, paddle::framework::Scope*, long, long, bool, bool, bool)
10  paddle::framework::Executor::RunPreparedContext(paddle::framework::ExecutorPrepareContext*, paddle::framework::Scope*, bool, bool, bool)
11  paddle::framework::Executor::Run(paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool, std::vector<std::string, std::allocator<std::string> > const&, bool, bool)

------------------------------------------
Python Call Stacks (More useful to users):
------------------------------------------
  File "/home/bakht/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/framework.py", line 2610, in append_op
    attrs=kwargs.get("attrs", None))
  File "/home/bakht/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/layer_helper.py", line 43, in append_op
    return self.main_program.current_block().append_op(*args, **kwargs)
  File "/home/bakht/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/layers/nn.py", line 6414, in matmul
    attrs=attrs)
  File "/mnt/disks/disk-huge/bakht/Knover/models/plato.py", line 194, in forward
    latent_emb = layers.matmul(x=weights, y=latent_embeddings, transpose_y=True)
  File "/mnt/disks/disk-huge/bakht/Knover/models/model_base.py", line 90, in _build_programs
    outputs = self.forward(inputs, is_infer=True)
  File "/mnt/disks/disk-huge/bakht/Knover/models/model_base.py", line 74, in __init__
    self._build_programs()
  File "/mnt/disks/disk-huge/bakht/Knover/models/unified_transformer.py", line 98, in __init__
    super(UnifiedTransformer, self).__init__(args, place)
  File "/mnt/disks/disk-huge/bakht/Knover/models/plato.py", line 50, in __init__
    super(Plato, self).__init__(args, place)
  File "/mnt/disks/disk-huge/bakht/Knover/models/__init__.py", line 49, in create_model
    return MODEL_REGISTRY[args.model](args, place)
  File "./interaction.py", line 54, in interact
    model = models.create_model(args, place)
  File "./interaction.py", line 83, in <module>
    interact(args)

----------------------
Error Message Summary:
----------------------
ExternalError:  Cublas error, CUBLAS_STATUS_EXECUTION_FAILED  at (/paddle/paddle/fluid/operators/math/blas_impl.cu.h:34)
  [operator < matmul > error]

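A problem with the Paddle Docker image

After pulling the Paddle 2.1.0 image, creating a container, and installing the required packages following the steps, running the PLATO-2 interact script reports this error: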
1
2

The only change to the source code was adding these two lines to interaction.py:
"import paddle
paddle.enable_static()"

What could be the cause?

A PLATO-2 inference error during data preprocessing

I tried to run PLATO inference on example data following the instructions at https://github.com/PaddlePaddle/Knover/tree/develop/projects/PLATO-2.

I encountered the error below. My code branch is develop and Paddle is 2.0.1.

Could you give me some help with this issue?

aistudio@jupyter-208728-1765888:~/develop/Knover$ git branch
* develop
aistudio@jupyter-208728-1765888:~/develop/Knover$ pip list | grep paddle
paddlehub 2.0.4
paddlenlp 2.0.0rc7
paddlepaddle-gpu 2.0.1.post101
tb-paddle 0.3.6
aistudio@jupyter-208728-1765888:~/develop/Knover$ ./scripts/local/job.sh ./projects/PLATO-2/pretrain/24L_infer.conf
+ [[ 1 == 1 ]]
+ job_conf=./projects/PLATO-2/pretrain/24L_infer.conf
+ source ./projects/PLATO-2/pretrain/24L_infer.conf
++ job_script=./scripts/distributed/infer.sh
++ model=Plato
++ task=DialogGeneration
++ vocab_path=./package/dialog_en/vocab.txt
++ spm_model_file=./package/dialog_en/spm.model
++ infer_file=./data/dailydialog_test_60.tsv
++ data_format=raw
++ file_format=file
++ config_path=./projects/PLATO-2/24L.json
++ init_params=./24L/Plato
++ nsp_init_params=./24L/NSP
++ in_tokens=false
++ batch_size=5
++ log_steps=1
++ log_dir=./log
++ save_path=./output
++ output_name=response
++ infer_args='--ranking_score nsp_score'
+ export FLAGS_sync_nccl_allreduce=1
+ FLAGS_sync_nccl_allreduce=1
+ export FLAGS_fuse_parameter_memory_size=64
+ FLAGS_fuse_parameter_memory_size=64
+ mkdir -p ./output
+ [[ ./log != '' ]]
+ distributed_args=' --log_dir ./log'
+ [[ ./24L/NSP != '' ]]
+ [[ ! -e ./24L/NSP/model ]]
+ infer_args='--ranking_score nsp_score --nsp_inference_model_path ./24L/NSP'
+ python -m paddle.distributed.launch --log_dir ./log ./knover/scripts/infer.py --is_distributed true --model Plato --task DialogGeneration --vocab_path ./package/dialog_en/vocab.txt --do_lower_case false --spm_model_file ./package/dialog_en/spm.model --init_pretraining_params ./24L/Plato --infer_file ./data/dailydialog_test_60.tsv --data_format raw --file_format file --config_path ./projects/PLATO-2/24L.json --output_name response --ranking_score nsp_score --nsp_inference_model_path ./24L/NSP --in_tokens false --batch_size 5 --save_path ./output
    /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
    import imp
    ----------- Configuration Arguments -----------
    gpus: None
    heter_worker_num: None
    heter_workers:
    http_port: None
    ips: 127.0.0.1
    log_dir: ./log
    nproc_per_node: None
    server_num: None
    servers:
    training_script: ./knover/scripts/infer.py
    training_script_args: ['--is_distributed', 'true', '--model', 'Plato', '--task', 'DialogGeneration', '--vocab_path', './package/dialog_en/vocab.txt', '--do_lower_case', 'false', '--spm_model_file', './package/dialog_en/spm.model', '--init_pretraining_params', './24L/Plato', '--infer_file', './data/dailydialog_test_60.tsv', '--data_format', 'raw', '--file_format', 'file', '--config_path', './projects/PLATO-2/24L.json', '--output_name', 'response', '--ranking_score', 'nsp_score', '--nsp_inference_model_path', './24L/NSP', '--in_tokens', 'false', '--batch_size', '5', '--save_path', './output']
    worker_num: None
    workers:

WARNING 2021-04-14 12:08:40,192 launch.py:316] Not found distinct arguments and compiled with cuda. Default use collective mode
launch train in GPU mode
INFO 2021-04-14 12:08:40,193 launch_utils.py:471] Local start 1 processes. First process distributed environment info (Only For Debug):
+=======================================================================================+
| Distributed Envs Value |
+---------------------------------------------------------------------------------------+
| PADDLE_TRAINER_ID 0 |
| PADDLE_CURRENT_ENDPOINT 127.0.0.1:56451 |
| PADDLE_TRAINERS_NUM 1 |
| PADDLE_TRAINER_ENDPOINTS 127.0.0.1:56451 |
| FLAGS_selected_gpus 0 |
+=======================================================================================+

INFO 2021-04-14 12:08:40,193 launch_utils.py:475] details abouts PADDLE_TRAINER_ENDPOINTS can be found in ./log/endpoints.log, and detail running logs maybe found in ./log/workerlog.0
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
import imp
{
"is_distributed": true,
"save_path": "./output",
"infer_file": "./data/dailydialog_test_60.tsv",
"output_name": "response",
"log_steps": 1,
"Model": {
"model": "Plato",
"config_path": "./projects/PLATO-2/24L.json",
"init_checkpoint": "",
"init_pretraining_params": "./24L/Plato",
"optimizer": "AdamW",
"learning_rate": 1e-05,
"warmup_steps": 0,
"lr_scheduler": "noam",
"max_training_steps": 2000,
"min_learning_rate": 0,
"weight_decay": 0.0,
"max_grad_norm": 0.1,
"use_recompute": false,
"use_amp": false,
"amp_loss_scaling": 32768.0,
"weight_sharing": true,
"mem_efficient": false,
"use_role": false,
"use_bow": true,
"use_entropy": false,
"pre_encoder_cmd": "d",
"preprocess_cmd": "n",
"postprocess_cmd": "da",
"post_cls_cmd": "n",
"cls_bias": true,
"attention_probs_dropout_prob": 0.1,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 1024,
"initializer_range": 0.02,
"max_position_embeddings": 256,
"latent_type_size": 20,
"num_attention_heads": 16,
"num_hidden_layers": 24,
"type_vocab_size": 2,
"vocab_size": 8001
},
"Generator": {
"min_dec_len": 1,
"max_dec_len": 64,
"decoding_strategy": "topk_sampling",
"temperature": 1.0,
"ignore_unk": true,
"num_samples": null,
"topk": 10,
"topp": 0.9,
"beam_size": 10,
"length_average": true,
"length_penalty": 0.0
},
"Task": {
"task": "DialogGeneration",
"do_generation": true,
"is_cn": false,
"filter_cross_repetition": true,
"nsp_inference_model_path": "./24L/NSP",
"ranking_score": "nsp_score"
},
"Reader": {
"max_src_len": 128,
"max_tgt_len": 128,
"max_seq_len": 256,
"max_knowledge_len": 0,
"knowledge_position": "post_src",
"knowledge_style": "original",
"truncate_first_turn": false,
"file_format": "file",
"data_format": "raw",
"in_tokens": false,
"batch_size": 5,
"position_style": "continuous",
"random_seed": 11,
"shuffle_pool_size": 0,
"sort_pool_size": 65536
},
"Tokenizer": {
"tokenizer": "SentencePieceTokenizer",
"vocab_path": "./package/dialog_en/vocab.txt",
"specials_path": "",
"do_lower_case": false,
"spm_model_file": "./package/dialog_en/spm.model"
},
"run_infer": true
}
W0414 12:08:41.338814 1234 device_context.cc:362] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.0, Runtime API Version: 10.1
W0414 12:08:41.343097 1234 device_context.cc:372] device: 0, cuDNN Version: 7.6.
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/math_op_patch.py:298: UserWarning: /home/aistudio/develop/Knover/knover/models/unified_transformer.py:140
The behavior of expression A + B has been unified with elementwise_add(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_add(X, Y, axis=0) instead of A + B. This transitional warning will be dropped in the future.
op_type, op_type, EXPRESSION_MAP[method_name]))
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/math_op_patch.py:298: UserWarning: /home/aistudio/develop/Knover/knover/modules/transformer_block.py:113
The behavior of expression A + B has been unified with elementwise_add(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_add(X, Y, axis=0) instead of A + B. This transitional warning will be dropped in the future.
op_type, op_type, EXPRESSION_MAP[method_name]))
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/math_op_patch.py:298: UserWarning: /home/aistudio/develop/Knover/knover/modules/transformer_block.py:213
The behavior of expression A + B has been unified with elementwise_add(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_add(X, Y, axis=0) instead of A + B. This transitional warning will be dropped in the future.
op_type, op_type, EXPRESSION_MAP[method_name]))
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/utils.py:77: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
return (isinstance(seq, collections.Sequence) and
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/math_op_patch.py:298: UserWarning: /home/aistudio/develop/Knover/knover/modules/generator.py:225
The behavior of expression A * B has been unified with elementwise_mul(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_mul(X, Y, axis=0) instead of A * B. This transitional warning will be dropped in the future.
op_type, op_type, EXPRESSION_MAP[method_name]))
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/math_op_patch.py:298: UserWarning: /home/aistudio/develop/Knover/knover/modules/generator.py:225
The behavior of expression A / B has been unified with elementwise_div(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_div(X, Y, axis=0) instead of A / B. This transitional warning will be dropped in the future.
op_type, op_type, EXPRESSION_MAP[method_name]))
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/math_op_patch.py:298: UserWarning: /home/aistudio/develop/Knover/knover/modules/generator.py:255
The behavior of expression A * B has been unified with elementwise_mul(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_mul(X, Y, axis=0) instead of A * B. This transitional warning will be dropped in the future.
op_type, op_type, EXPRESSION_MAP[method_name]))
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/math_op_patch.py:298: UserWarning: /home/aistudio/develop/Knover/knover/modules/generator.py:255
The behavior of expression A - B has been unified with elementwise_sub(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_sub(X, Y, axis=0) instead of A - B. This transitional warning will be dropped in the future.
op_type, op_type, EXPRESSION_MAP[method_name]))
Loading model from ./24L/Plato.
Load pretraining parameters from ./24L/Plato
Traceback (most recent call last):
File "./knover/scripts/infer.py", line 140, in
infer(args)
File "./knover/scripts/infer.py", line 81, in infer
predictions = task.infer_step(model, data)
File "/home/aistudio/develop/Knover/knover/core/task.py", line 46, in infer_step
outputs = self._post_process_infer_output(predictions)
File "/home/aistudio/develop/Knover/knover/tasks/dialog_generation.py", line 162, in _post_process_infer_output
return self._post_process_generation_output(predictions)
File "/home/aistudio/develop/Knover/knover/tasks/dialog_generation.py", line 91, in _post_process_generation_output
get_nsp_score_batch(self.nsp_predictor, predictions)
File "/home/aistudio/develop/Knover/knover/tasks/dialog_generation.py", line 404, in get_nsp_score_batch
outputs = nsp_predictor(data)
File "/home/aistudio/develop/Knover/knover/utils/inference_utils.py", line 44, in predict
return_numpy=True)
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1110, in run
six.reraise(*sys.exc_info())
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/six.py", line 703, in reraise
raise value
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1108, in run
return_merged=return_merged)
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1238, in _run_impl
use_program_cache=use_program_cache)
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1313, in _run_program
fetch_var_name=fetch_var_name)
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/executor.py", line 624, in _add_feed_fetch_ops
if not has_feed_operators(global_block, feed, feed_var_name):
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/executor.py", line 280, in has_feed_operators
format(feed_target_name))
Exception: 'feed_targets' does not have label_pos variable
INFO 2021-04-14 12:08:55,230 launch_utils.py:307] terminate all the procs
ERROR 2021-04-14 12:08:55,230 launch_utils.py:545] ABORT!!! Out of all 1 trainers, the trainer process with rank=[0] was aborted. Please check its log.
INFO 2021-04-14 12:08:58,233 launch_utils.py:307] terminate all the procs

  • exit_code=1
  • [[ 1 != 0 ]]
  • rm './output/.finish'
    rm: cannot remove './output/.finish': No such file or directory
  • exit 1
    aistudio@jupyter-208728-1765888:~/develop/Knover$
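
One possible workaround for the `label_pos` exception above, offered as a sketch and not as the project's fix: drop any feed entries that the exported NSP inference model does not declare. The helper name `filter_feed` and every field name other than `label_pos` are hypothetical. The list of accepted names is assumed to come from the feed target names that `fluid.io.load_inference_model` returns alongside the program.

```python
def filter_feed(data, feed_target_names):
    """Keep only the entries that the inference program declares as feed targets."""
    return {name: data[name] for name in feed_target_names if name in data}


# Hypothetical batch produced by the reader; only "label_pos" is taken from the log.
batch = {
    "token_ids": [[1, 5, 6, 2]],
    "type_ids": [[0, 0, 1, 1]],
    "pos_ids": [[0, 1, 2, 3]],
    "label_pos": [[0]],
}

# Hypothetical feed target names of an exported model that lacks "label_pos".
feed_target_names = ["token_ids", "type_ids", "pos_ids"]

filtered = filter_feed(batch, feed_target_names)
print(sorted(filtered))
```

If the model was exported by a different version of the code than the one running inference, re-exporting it may be the cleaner route; filtering only hides the mismatch.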

Context to a conversation

Is it possible to give a context to the conversation, so that the setting description/persona can be given to the pretrained models beforehand?

If not, is there a possibility of incorporating something like this with the pretrained models?

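Vocab dictionary format error

I generated a vocab dictionary with sentencepiece, using model type unigram. The second column of the dictionary is not an integer index but a float probability, so training fails with: ValueError: invalid literal for int() with base 10:
The failing code is vocab[token] = int(index)
Because index is a floating-point number, the conversion fails.
Should I change the dictionary or the code here?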
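
A minimal sketch of the "change the dictionary" option, assuming the sentencepiece `.vocab` file is tab-separated with one piece per line in id order (piece, then score) and that the loader expects `token<TAB>integer id`. The function name `spm_vocab_to_indexed` and the sample pieces are made up for illustration.

```python
def spm_vocab_to_indexed(lines):
    """Replace the float score in column 2 with the 0-based line number."""
    indexed = []
    for idx, line in enumerate(lines):
        token = line.rstrip("\n").split("\t")[0]
        indexed.append(f"{token}\t{idx}")
    return indexed


# Sample lines in the sentencepiece .vocab layout (scores are illustrative).
spm_lines = ["<unk>\t0\n", "<s>\t0\n", "</s>\t0\n", "\u2581the\t-3.21\n"]
converted = spm_vocab_to_indexed(spm_lines)
print("\n".join(converted))
```

This only rewrites the second column; any extra special tokens the code expects would still have to be added separately.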

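About training results

With 5.3 million samples, I trained stage1 and stage2.1 from scratch. Safe responses and repetition are still clearly present. Is this because I have not trained enough?
stage1 was trained for 320000 steps with batch_size=16; stage2.1 was trained for 18000 steps with batch_size=1024.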

image

Switching to picking a random candidate seems somewhat better.
image

the missing full source code of plato-2

Hi, thanks for your great work! I explored the plato-2 directory and found only .sh files. May I ask where the .py files are, so I can try the chatbot interaction? Thanks for your help!

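Problem running interact.py

When running interact for multi-turn Q&A, I found that the replies in rounds 2, 3, ... still respond to the first-round question and do not answer the later ones.
Why is this?
The source code concatenates all the history and the current question into the input token_ids, with type_ids all 0. I don't know whether training is done the same way.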

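The mask strategy in the NSP reader sometimes samples an empty tgt_label, causing an error (Paddle 1.8)

Hello, I am retraining the NSP model on some data and found that the mask strategy can sample an empty tgt_label.
Specifically, in the _pad_batch_records function of nsp_reader.py:
batch_mask_token_ids, tgt_label, tgt_pos, label_pos = mask(
    batch_tokens=batch_token_ids,
    vocab_size=self.vocab_size,
    bos_id=self.bos_id,
    eos_id=self.eos_id,
    mask_id=self.mask_id,
    sent_b_starts=batch_tgt_start_idx,
    labels=batch_label,
    is_unidirectional=False)

With this mask strategy, it sometimes happens that every sampled prob is > 0.15, which leaves both mask_label and mask_pos empty.

I worked around it for now by resampling at this point until the result is non-empty.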
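
The resample-until-non-empty workaround described above can be sketched as follows. This is a simplified stand-in, not the `mask` function from the repository: `sample_mask_positions`, `max_tries`, and the forced fallback are assumptions added for illustration, and only the 0.15 threshold comes from the report.

```python
import random

MASK_PROB = 0.15  # threshold mentioned in the report


def sample_mask_positions(num_tokens, rng, max_tries=100):
    """Return a non-empty list of positions to mask.

    Each position is kept when its sampled prob is <= MASK_PROB. Short inputs
    can yield no position at all, so sampling is retried; after max_tries one
    position is forced so the loop always terminates.
    """
    if num_tokens <= 0:
        raise ValueError("need at least one maskable token")
    for _ in range(max_tries):
        positions = [i for i in range(num_tokens) if rng.random() <= MASK_PROB]
        if positions:
            return positions
    return [rng.randrange(num_tokens)]


rng = random.Random(11)
positions = sample_mask_positions(3, rng)
print(positions)
```

Capping the retries keeps a pathological batch from stalling the reader, at the cost of a slightly different masking distribution for very short sequences.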

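Organizing training data for multi-turn dialogue

I have a question about how to organize training data for multi-turn dialogue.
Suppose a, b, c, d is one conversation.
Should it produce a single pair, a, b, c -> d, or should every context be enumerated to generate the next utterance: a -> b; a, b -> c; a, b, c -> d?

The first approach seems to lose some targets, since only the last utterance is learned;
the second approach seems somewhat redundant.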
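
The two options in the question can be written out as a small sketch. The function names `last_turn_pair` and `enumerate_pairs` are hypothetical, and this only illustrates the data layouts being compared, not what the project's preprocessing does.

```python
def last_turn_pair(turns):
    """Option 1: one (context, response) pair, with the last turn as the response."""
    return [(turns[:-1], turns[-1])]


def enumerate_pairs(turns):
    """Option 2: every prefix of the dialogue predicts the turn that follows it."""
    return [(turns[:i], turns[i]) for i in range(1, len(turns))]


dialogue = ["a", "b", "c", "d"]
single = last_turn_pair(dialogue)
expanded = enumerate_pairs(dialogue)

for context, response in expanded:
    print(" , ".join(context), "->", response)
```

For a dialogue of n turns, option 1 yields one pair and option 2 yields n - 1 pairs, which is the redundancy the question refers to.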

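Should max_src_len be set longer for multi-turn dialogue?

My understanding is that max_src_len is the maximum combined length of three parts: the persona, the dialogue history, and the previous utterance of the current turn; max_tgt_len is the maximum length of the next utterance of the current turn; and max_seq_len is the maximum of max_src_len + max_tgt_len. Is this understanding correct? If so, should max_src_len be set longer? In Chinese dialogue training one character is 2 bytes, so I set max_src_len=1600. How does this affect training?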
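
A rough consistency check for these settings, written under two assumptions that the thread does not confirm: the limits are counted in tokens, not bytes, and a model with learned position embeddings cannot index past `max_position_embeddings`. The first set of values (128, 128, 256, 256) is taken from the configuration dump in the inference log earlier on this page; the second set uses the 1600 from the question plus a hypothetical `max_seq_len` of 1728. The function name `check_lengths` is made up.

```python
def check_lengths(max_src_len, max_tgt_len, max_seq_len, max_position_embeddings):
    """Return human-readable problems; an empty list means the values are consistent."""
    problems = []
    if max_src_len + max_tgt_len > max_seq_len:
        problems.append("max_src_len + max_tgt_len exceeds max_seq_len")
    if max_seq_len > max_position_embeddings:
        problems.append("max_seq_len exceeds max_position_embeddings")
    return problems


# Values printed in the configuration dump of the inference log.
from_log = check_lengths(128, 128, 256, 256)

# 1600 comes from the question; 1728 is a hypothetical matching max_seq_len.
from_question = check_lengths(1600, 128, 1728, 256)

print(from_log)
print(from_question)
```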

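Could you share where the data comes from?

I can understand not open-sourcing the Chinese data, but surely it is fine to disclose where it came from?

The paper only briefly mentions that the Chinese data comes from Chinese social media. Could you be more specific?
Weibo, Douban groups, or Baidu Tieba?

Conversation style and topics differ quite a lot across sources, so I hope you can share this.
Thanks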

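Question about L_{BOW} in the paper

According to the paper:
image

My understanding is: the vector h_z corresponding to the topic latent variable z is multiplied by W_2, where W_2 \in R^{V \times D} holds one vector per word, so the inner product of z with every word is computed in one step and then turned into probabilities by a softmax. The optimization objective is then to make the logits of the words that appear in the target large, so that the loss is small. If that is the case, why is a softmax computed once more over f_{r_t}?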
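
Since the image was not preserved, here are the two steps the question refers to, written out from the description in the question alone and not copied from the paper. The target length T is a symbol introduced here, and any bias term or expectation over z is left out because the question does not mention it.

```latex
% Step 1: project h_z onto the vocabulary and normalize (first softmax)
f = \operatorname{softmax}(W_2 h_z) \in \mathbb{R}^{V},
\qquad W_2 \in \mathbb{R}^{V \times D}

% Step 2: the per-word term, which applies a softmax to f again (the step being asked about)
L_{BOW} \propto -\sum_{t=1}^{T} \log \frac{e^{f_{r_t}}}{\sum_{v=1}^{V} e^{f_v}}
```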

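Why must the vocab contain both [UNK] and <unk>?

Judging from the rules in the code, the vocab must contain both [UNK] and <unk>, otherwise an error is raised. Both tokens stand for unknown words, so is there any difference?
Also, in the example English vocab some tokens have duplicate ids, as shown below. I don't understand why; won't duplicate ids overwrite each other? When building my own vocab, do I also need to make them duplicated?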
<unk> 0
<s> 1
</s> 2
[UNK] 0
[PAD] 0
[CLS] 1
[SEP] 2
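
To make the "won't duplicate ids overwrite each other" part concrete, here is a sketch of what happens with plain dictionaries when several tokens share an id. It is not the project's loader; `load_vocab` is a hypothetical helper, and the lines are the seven entries quoted above.

```python
def load_vocab(lines):
    """Build token->id and id->token maps from 'token id' lines."""
    token_to_id = {}
    id_to_token = {}
    for line in lines:
        token, index = line.split()
        token_to_id[token] = int(index)  # many tokens may map to one id
        id_to_token[int(index)] = token  # a later token replaces an earlier one
    return token_to_id, id_to_token


vocab_lines = ["<unk> 0", "<s> 1", "</s> 2", "[UNK] 0", "[PAD] 0", "[CLS] 1", "[SEP] 2"]
token_to_id, id_to_token = load_vocab(vocab_lines)

print(token_to_id["<unk>"], token_to_id["[UNK]"], token_to_id["[PAD]"])
print(id_to_token)
```

In this sketch the token-to-id direction keeps all seven names, so both `[UNK]` and `<unk>` resolve to the same id; only the reverse map collapses to three entries.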

Incorrect arguments in bash scripts

This line in projects/Plato-2/README.md:

bash ./scripts/local/job.sh ./project/PLATO-2/pretrain/24L_inference.conf

should be:

bash ./scripts/local/job.sh ./projects/PLATO-2/pretrain/24L_infer.conf

Same for all similar lines...
