jxhe / vae-lagging-encoder
PyTorch implementation of "Lagging Inference Networks and Posterior Collapse in Variational Autoencoders" (ICLR 2019)
License: MIT License
I was wondering if you have a script for sampling or generating sentences. I can see there is a function sample_sentences in text.py, but I am not able to run it properly. Can you help me use that function to generate sentences so that I can see some samples?
Thanks
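For reference, prior sampling from a trained text VAE usually looks like the sketch below; the names model.nz, model.decoder.greedy_decode, and vocab.id2word are assumptions for illustration, not the repo's actual API.

import torch

def sample_from_prior(model, vocab, device, num_samples=10):
    # Draw z ~ p(z) = N(0, I) and decode each latent code to text.
    # A sketch only: the decoder/vocab method names are hypothetical.
    model.eval()
    with torch.no_grad():
        z = torch.randn(num_samples, model.nz, device=device)
        for zi in z.split(1):
            ids = model.decoder.greedy_decode(zi)  # hypothetical method
            print(" ".join(vocab.id2word(i) for i in ids[0]))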
I trained the model with my own data (dataset size: 5M). After training, I ran the reconstruction code:
python text.py --dataset [dataset] --decode_from [pretrained model path] --decode_input [a text file for reconstruction]
I end up getting the same sentences; a diagnostic sketch follows the config below.
My config:
params = {
    'enc_type': 'lstm',
    'dec_type': 'lstm',
    'nz': 32,
    'ni': 512,
    'enc_nh': 1024,
    'dec_nh': 1024,
    'dec_dropout_in': 0.5,
    'dec_dropout_out': 0.5,
    'batch_size': 32,
    'epochs': 100,
    'test_nepoch': 5,
    'train_data': 'datasets/bj/train.txt',
    'val_data': 'datasets/bj/valid.txt',
    'test_data': 'datasets/bj/test.txt'
}
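Identical reconstructions across different inputs are a classic symptom of posterior collapse. A quick check, sketched below, is to look at how much the posterior means vary across the data; the encoder call signature here is an assumption:

import torch

def posterior_mean_spread(model, batches, device):
    # Collect posterior means over the data. A near-zero spread in every
    # dimension means the encoder maps all inputs to (almost) the same z,
    # so the decoder emits the same high-probability sentence each time.
    mus = []
    with torch.no_grad():
        for x in batches:
            mu, logvar = model.encoder(x.to(device))  # hypothetical signature
            mus.append(mu)
    return torch.cat(mus, dim=0).std(dim=0)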
Hello,
Thanks for the code. I was wondering what value of beta should be used at test time for the baseline beta-VAE models in your paper: is it always one, or the fixed beta?
Also, in the case of VAE + annealing, how is model selection done during training: by evaluating NLL at the current value of beta, or at beta = 1?
Thanks
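For context, the quantity being asked about is the generic beta-VAE objective; a minimal sketch (not necessarily the repo's exact code):

def beta_vae_loss(rec_loss, kl_loss, beta):
    # beta scales only the KL term. The standard ELBO corresponds to
    # beta = 1, and only at beta = 1 is rec + KL a valid upper bound on
    # the true NLL, which is what test-time NLL reporting relies on.
    return rec_loss + beta * kl_loss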
Hello,
Thanks for putting up this code! I was curious which part of the code you used to generate the mutual information plots shown in Figure 5 of the paper. Can you point me to it?
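For reference, the usual Monte Carlo estimator in this literature is I(x; z) = E_x[KL(q(z|x) || q(z))] = E[log q(z|x)] - E[log q(z)]. A minimal sketch for a diagonal-Gaussian encoder follows; the repo's own implementation may differ in its details:

import math
import torch

def estimate_mi(mu, logvar, z):
    # mu, logvar: [batch, nz] Gaussian parameters from the encoder;
    # z: [batch, nz] samples drawn from q(z|x).
    batch, nz = mu.size()
    # E[log q(z|x)] = negative encoder entropy, averaged over the batch
    neg_entropy = (-0.5 * nz * math.log(2 * math.pi)
                   - 0.5 * (1 + logvar).sum(-1)).mean()
    # log q(z_i) ~= log (1/batch) * sum_j q(z_i | x_j)
    var = logvar.exp()
    dev = z.unsqueeze(1) - mu.unsqueeze(0)  # [batch, batch, nz]
    log_density = (-0.5 * (dev ** 2 / var.unsqueeze(0)).sum(-1)
                   - 0.5 * (nz * math.log(2 * math.pi)
                            + logvar.sum(-1)).unsqueeze(0))
    log_qz = torch.logsumexp(log_density, dim=1) - math.log(batch)
    return (neg_entropy - log_qz.mean()).item()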
Hi,
Thanks for the code and the very nice work! I was trying to run your code on the Yahoo dataset on a GPU, and it seems to be taking longer than reported in the paper. The paper reports 11 hours, but I have been running for 2 days and it is still on epoch 3. Am I doing something wrong? Any information on this would be great.
Hi, thanks for your code, it helps me a lot.
I have been studying latent variable models for a few weeks and feel puzzled about this field. Since the reconstruction loss (i.e., the decoder) is expensive to compute and prone to collapse, I am wondering whether the reconstruction procedure is indispensable for a latent model to capture useful information.
For example, in a supervised multi-task setting, if I want the latent space to capture a domain-specific signal, how can I achieve this using only classification labels and domain labels, without a reconstruction loss? Is there any relevant literature?
I am stuck on this question and hope you can point me in the right direction.
Hi,
From your experiments, does a vanilla VAE show posterior collapse on, say, the Yahoo dataset? I ran
python text.py --dataset yahoo --kl_start 1.0
and found that the KL term remains non-zero while MI and AU go to zero (in fact, MI is negative). Is this the expected behaviour?
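For reference, AU is typically computed as in Burda et al. (2016): a latent dimension counts as active if the variance of its posterior mean across the data exceeds a small threshold. A minimal sketch:

import torch

def count_active_units(all_mu, threshold=0.01):
    # all_mu: [num_examples, nz] posterior means E_q[z|x] over the dataset.
    var = all_mu.var(dim=0)  # per-dimension variance across examples
    return (var > threshold).sum().item()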
Hi, how are you?
Thanks for a great paper and git repository.
Regarding line 433 in d78230a, which divides by report_num_sents: why do it this way, compared to returning the mean from both loss_rc and loss_kl? Thanks.
Matan.
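For context, summing losses over batches and dividing once by report_num_sents gives the exact per-sentence average; averaging per-batch means is only equivalent when every batch has the same size. A toy illustration:

# Two batches of different sizes, with toy loss sums:
total_loss, total_sents = 0.0, 0
for batch_loss_sum, batch_size in [(12.0, 4), (5.0, 2)]:
    total_loss += batch_loss_sum
    total_sents += batch_size
print(total_loss / total_sents)   # 17/6 ~= 2.833 (exact per-sentence average)
print((12.0 / 4 + 5.0 / 2) / 2)   # 2.75 (mean of per-batch means differs)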
Hi Junxian,
Thank you so much for your paper and code, it helps me a lot.
But I am a little confused by the loss calculation here in the test method in test.py:
test_loss = (report_rec_loss + report_kl_loss) / report_num_sents
nll = (report_kl_loss + report_rec_loss) / report_num_sents
kl = report_kl_loss / report_num_sents
ppl = np.exp(nll * report_num_sents / report_num_words)
What I have seen and done before is to treat rec_loss as the NLL (it is computed with nll_loss) and to compute PPL purely from rec_loss. But I see you include the KL term here, so I wonder what the intuition is; also, it seems test_loss is the same as nll?
Any explanation would be much appreciated.
Thank you in advance.
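For context, a common reading of those lines (a sketch of the standard reasoning, not the author's reply): the negative ELBO, rec + KL, upper-bounds the true NLL -log p(x); rec alone does not bound it, which is why the KL term enters the perplexity. And from the snippet itself, test_loss and nll are indeed the same expression.

import numpy as np

def eval_bounds(report_rec_loss, report_kl_loss, report_num_sents, report_num_words):
    # rec + KL >= -log p(x) (the ELBO inequality), so nll is an upper
    # bound on the true NLL and ppl an upper bound on true perplexity.
    nll = (report_rec_loss + report_kl_loss) / report_num_sents
    ppl = np.exp(nll * report_num_sents / report_num_words)
    return nll, ppl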
Hi,
In the paper (Appendix A), it says that the true posterior mean can be computed as sum(z * p(z|x)). However, in the code it seems that you are calculating the joint likelihood p(x, z). Why?
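For context, on a discretized grid the two quantities are linked by Bayes' rule: p(z|x) = p(x, z) / sum_z' p(x, z'), so computing the joint is the natural first step toward the posterior and its mean. A minimal sketch, assuming a one-dimensional grid:

import torch

def true_posterior_mean(log_joint, z_grid):
    # log_joint[k] = log p(x, z_k) evaluated on the grid z_grid.
    # Normalizing the joint over the grid yields p(z_k | x), whose mean
    # is sum_k z_k * p(z_k | x).
    log_post = log_joint - torch.logsumexp(log_joint, dim=0)
    return (z_grid * log_post.exp()).sum()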
Hello,
I just want to ask whether you have ever run your aggressive VAE training on a large-scale dataset such as ImageNet 32x32 or CIFAR-10?