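According to the paper, the latent variable is introduced to better handle one-to-many generation. In fact, though, a seq2seq model can already generate by sampling (instead of generating deterministically with beam search), so in principle the seq2seq model itself already …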


About the PLATO model: why is a latent variable needed?

paddlepaddle commented on June 1, 2024

About the PLATO model: why is a latent variable needed?

from research.

Comments (7)

WorldEditors commented on June 1, 2024

A good question, since Meena (Google) has also claimed that top-K sampling is enough to obtain diversity. Our view is that the diversity from top-K sampling is different from the diversity that comes from a latent space. We believe a neural network (NN) is, after all, a mapping function, and a function can only map each input to a single output.

As an intuitive explanation, suppose A, B and C are all correct responses. The NN may end up producing a response D that averages out A, B and C. However, D may carry none of the information in A, B or C and instead have a totally different meaning. If we apply top-K sampling, we may resample three responses E, F and G around D, which does not guarantee that A, B and C can be recovered.
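The averaging intuition can be illustrated with a toy sketch. It assumes, purely for illustration, that each response is a hypothetical 2-D "meaning" vector and that a deterministic model minimizes squared error. Neither assumption comes from PLATO itself, and token-level sampling in a real seq2seq model behaves differently.

```python
import random

# Hypothetical 2-D "meaning" vectors for three equally valid responses.
responses = {"A": (1.0, 0.0), "B": (-0.5, 0.866), "C": (-0.5, -0.866)}

def mean(points):
    """Component-wise mean: the single output that minimizes squared error."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def dist(p, q):
    """Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

# One deterministic output for the context: the average response D.
D = mean(list(responses.values()))

# Stand-ins for E, F, G: small random perturbations around D.
rng = random.Random(0)
samples = [tuple(d + rng.gauss(0.0, 0.1) for d in D) for _ in range(3)]
nearest = [min(dist(s, r) for r in responses.values()) for s in samples]

# With a discrete latent z, the same context can map to a different output per z.
outputs_by_z = dict(enumerate(responses.values()))
recovered = {z: dist(out, responses[name])
             for (z, out), name in zip(outputs_by_z.items(), responses)}

print("D =", D)
print("distance from each sample to its nearest correct response:", nearest)
print("distance from each z-conditioned output to its target:", recovered)
```

In this toy setting the samples stay close to D and far from A, B and C, while conditioning on z lets each of the three responses be reproduced exactly.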


WorldEditors commented on June 1, 2024

For the second question, the BOW loss and the generative loss actually "push" the latent variable to "leak" as much information about the response as possible. Collapsing the latent distribution to a single pattern would therefore go against the training objective. And indeed, we have never observed that phenomenon.


songyouwei commented on June 1, 2024

It seems the only thing directly tied to the latent variable is the non-autoregressive BoW loss?


WorldEditors commented on June 1, 2024

The main contribution of the latent variable z is to improve the generative model p(r|c,z); the BOW loss is regarded as an auxiliary loss. Theoretically, we do not need the BOW loss for learning. In practice, however, the BOW loss is important for accelerating the convergence of the recognition network p(z|c,r), so that the generative model p(r|c,z) receives the correct input z.
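To make the difference between the two losses concrete, here is a minimal sketch. It assumes the generative loss is a per-position negative log-likelihood and the BOW loss scores every response token against one shared vocabulary distribution predicted from z (and the context), ignoring word order. The function names and toy logits are hypothetical and not taken from the PLATO code.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def nll_loss(step_logits, target_ids):
    """Generative loss: one predicted distribution per response position."""
    return -sum(math.log(softmax(logits)[t])
                for logits, t in zip(step_logits, target_ids))

def bow_loss(bow_logits, target_ids):
    """BOW loss: a single distribution shared by all positions (order-free)."""
    probs = softmax(bow_logits)
    return -sum(math.log(probs[t]) for t in target_ids)

# Toy vocabulary of 4 tokens and a 3-token response.
target = [2, 0, 1]
step_logits = [[0.1, 0.2, 2.0, 0.0], [1.5, 0.0, 0.3, 0.1], [0.0, 1.0, 0.2, 0.4]]
bow_logits = [1.0, 0.8, 1.2, -1.0]

print("NLL:", nll_loss(step_logits, target))
print("BOW:", bow_loss(bow_logits, target))
```

Because the BOW term sees only the latent-conditioned distribution and not the previous tokens, it can be reduced only by putting response information into z, which matches the "auxiliary loss" role described above.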


eyuansu62 commented on June 1, 2024

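@WorldEditors Hello, how are the latent variables determined? Do different datasets use different latent variables? Do the values of the latent variable also need to be specified in advance, or are they just vectors?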


WorldEditors commented on June 1, 2024

We only need to specify the number of classes (K) of the latent variable manually. The value of each latent vector is optimized during training. How to choose K remains an open question, and we'd like to see more future work on this problem.
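A minimal sketch of what "specify K, learn the vectors" could look like. It assumes the latent variable is a categorical index with one trainable embedding per class. The values of `K` and `HIDDEN`, the uniform distribution and the helper name are illustrative assumptions, not taken from the PLATO implementation.

```python
import random

K = 20       # number of latent classes, set by hand (illustrative value)
HIDDEN = 4   # toy embedding width (hypothetical)

rng = random.Random(0)

# One vector per latent class. Randomly initialized here; in a real model these
# would be trainable parameters updated by gradient descent.
latent_embeddings = [[rng.gauss(0.0, 0.02) for _ in range(HIDDEN)]
                     for _ in range(K)]

def sample_latent(probs, rng):
    """Draw a latent index z from a categorical distribution over the classes."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Stand-in for the recognition network output p(z|c,r): uniform over K classes.
probs = [1.0 / K] * K
z = sample_latent(probs, rng)
z_vector = latent_embeddings[z]   # the vector the generator would condition on

print("sampled z:", z)
print("latent vector:", z_vector)
```

Nothing about the meaning of each class is fixed in advance here: only the table size K is chosen, and the contents of each row are left to training.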


eyuansu62 commented on June 1, 2024

Thanks a lot!

