
vmf_vae_nlp's Introduction

Spherical Latent Spaces for Stable Variational Autoencoders (vMF-VAE)

In this repo, we provide the experimental setups and implementation for the algorithms described in:

Spherical Latent Spaces for Stable Variational Autoencoders.
Jiacheng Xu and Greg Durrett. EMNLP 2018.

Authors: Jiacheng Xu and Greg Durrett.

The arXiv version has been posted: arXiv.

About

Keywords: PyTorch, VAE, NLP, vMF

What to get from this repo:

  • Original Gaussian VAE;
  • Novel von Mises-Fisher VAE (vMF-VAE) with tuned hyper-parameters.

Illustration

Optimization

Figure: how q changes over the course of learning for a single example.

  • Gaussian: the KL term tends to pull the model towards the prior (moving from μ,σ to μ′,σ′);
  • vMF: there is no such pressure towards a single distribution. If we fix the dimension and the kappa, the KL term is a constant in the optimization objective.
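
To make the constant-KL point concrete, here is a minimal sketch (not code from this repo) that evaluates KL(vMF(μ, κ) || uniform on the sphere) from the standard closed form. The function names are hypothetical, and the Bessel function is computed from its power series with the standard library only, which is an illustrative choice rather than how the repo does it. Note that μ does not appear anywhere in the computation: once the dimension and κ are fixed, the value is a constant.

```python
import math


def log_bessel_i(v, x):
    """log I_v(x) for x > 0, from the power series summed in log space."""
    terms = int(2 * x) + 200  # well past the largest term of the series
    logs = [
        (2 * k + v) * math.log(x / 2.0) - math.lgamma(k + 1) - math.lgamma(k + v + 1)
        for k in range(terms)
    ]
    top = max(logs)
    return top + math.log(sum(math.exp(t - top) for t in logs))


def vmf_kl_to_uniform(dim, kappa):
    """KL(vMF(mu, kappa) || uniform on S^{dim-1}); independent of mu."""
    v = dim / 2.0 - 1.0
    log_iv = log_bessel_i(v, kappa)
    # E[mu^T z] under the vMF is the Bessel ratio I_{d/2}(kappa) / I_{d/2-1}(kappa).
    mean_cos = math.exp(log_bessel_i(v + 1.0, kappa) - log_iv)
    # Log normalizer of the vMF density.
    log_norm = v * math.log(kappa) - (dim / 2.0) * math.log(2.0 * math.pi) - log_iv
    # Log density of the uniform distribution: Gamma(d/2) / (2 * pi^(d/2)).
    log_uniform = math.lgamma(dim / 2.0) - math.log(2.0) - (dim / 2.0) * math.log(math.pi)
    return kappa * mean_cos + log_norm - log_uniform


if __name__ == "__main__":
    # (lat_dim, kappa) pairs taken from the training commands in this README.
    for dim, kappa in [(25, 100), (50, 150), (50, 80)]:
        print(dim, kappa, round(vmf_kl_to_uniform(dim, kappa), 3))
```

The value grows with κ, so choosing κ amounts to choosing a fixed KL budget for the model.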

Comparison of the Model

Figure: the model architecture. The Gaussian VAE is on the left and the vMF VAE is on the right. See the paper for more details.

Setup

The base environment is Python 3.6 with Anaconda.

The code was originally developed in PyTorch 0.3.1 and later upgraded to PyTorch 0.4.1.

conda install pytorch=0.4.1 torchvision -c pytorch
pip install tensorboardX

Data

Data for Document Model

In this paper, we use the exact same pre-processed datasets, 20NG and RC, that Miao et al. used in Neural Variational Inference for Text Processing. Here is the link to Miao's repo.

  • Download RC (email me or submit an issue if it doesn't work).
  • 20 Newsgroups (20NG) is located at data/20ng.

Data for Language Model

We use the standard PTB and Yelp. Datasets are included in data.

Running

Set up Device: CUDA or CPU

The choice of CPU or GPU can be changed in NVLL/util/gpu_flag.py.

Explanation of Options & Arguments

If you want to play around with the {Gaussian, vMF} VAE model for {document model, language model}, there are many possible settings. You can pass command-line arguments, and NVLL/argparser.py will handle them. Some of the less obvious arguments are explained below.

| Option | Usage | Value (Range) |
|---|---|---|
| dist | Which distribution to use. | {nor, vmf} |
| lat_dim | Dimension of the latent variable. | Int |
| kl_weight | The weight applied to the KL term in the objective function. | 1 (default), any |
| exp_path | The location for experiment logs and saved files. | YOUR/EXP |
| root_path | The location of the git repo. data is a sub-dir. | YOUR/vmf_vae_nlp |
| ---- Args below are only available for NVRNN ---- | | |
| input_z | Input the latent code z at every time step during decoding. | True (default) or False |
| mix_unk | How much of the input is mixed with the UNK token. 0 = use ground truth as input (standard setting); 1 = use UNK as input (inputless setting). | [0.0, 1.0] |
| cd_bow | Condition on bag-of-words. 0 = not conditioned on BoW. | 0 or int (like 200) |
| cd_bit | Condition on the sentiment bit for the Yelp dataset. Not available for PTB. 0 = not conditioned on the sentiment bit. | 0 or int (like 50) |
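
As a rough illustration of how the options in this table could be declared, here is a hypothetical argparse sketch. It is not the contents of NVLL/argparser.py. The option names and documented defaults come from the table; the helper `unit_interval`, the help strings, the default for `mix_unk`, and the choice to make `--input_z` a bare flag (as it is passed in the example commands) are assumptions made for the example.

```python
import argparse


def unit_interval(text):
    """Parse a float and require it to lie in [0.0, 1.0] (range documented for mix_unk)."""
    value = float(text)
    if not 0.0 <= value <= 1.0:
        raise argparse.ArgumentTypeError("expected a value in [0.0, 1.0]")
    return value


def build_parser():
    parser = argparse.ArgumentParser(description="Hypothetical sketch of the documented options")
    parser.add_argument("--dist", choices=["nor", "vmf"], required=True,
                        help="which distribution to use")
    parser.add_argument("--lat_dim", type=int, required=True,
                        help="dimension of the latent variable")
    parser.add_argument("--kl_weight", type=float, default=1.0,
                        help="weight applied to the KL term")
    parser.add_argument("--exp_path", default="YOUR/EXP",
                        help="location for experiment logs and saved files")
    parser.add_argument("--root_path", default="YOUR/vmf_vae_nlp",
                        help="location of the git repo")
    # NVRNN-only options.
    # Assumption: a bare flag, because the example commands pass --input_z without a value.
    parser.add_argument("--input_z", action="store_true",
                        help="feed the latent code z at every decoding step")
    # Assumption: 0.0 (the standard setting) is used as the default here.
    parser.add_argument("--mix_unk", type=unit_interval, default=0.0,
                        help="fraction of decoder input replaced by UNK")
    parser.add_argument("--cd_bow", type=int, default=0,
                        help="0 = not conditioned on bag-of-words")
    parser.add_argument("--cd_bit", type=int, default=0,
                        help="0 = not conditioned on the sentiment bit")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args(["--dist", "vmf", "--lat_dim", "50", "--input_z"]))
```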

Train and Test

See here for hyper-parameter configuration.

Training Neural Variational Document Model (NVDM)

cd NVLL
# Training vMF VAE on 20 News group
PYTHONPATH=../ python nvll.py --lr 1 --batch_size 50 --eval_batch_size 50 --log_interval 75 --model nvdm --epochs 100  --optim sgd  --clip 1 --data_path data/20ng --data_name 20ng  --dist vmf --exp_path YOUR/EXP --root_path YOUR/vmf_vae_nlp   --dropout 0.1 --emsize 100 --nhid 400 --aux_weight 0.0001 --dist vmf --kappa 100 --lat_dim 25

# Training vMF VAE on RC
PYTHONPATH=../ python nvll.py --lr 1 --batch_size 50 --eval_batch_size 50 --log_interval 1000 --model nvdm --epochs 100  --optim sgd  --clip 1 --data_path data/rcv --data_name rcv  --dist vmf --exp_path YOUR/EXP --root_path YOUR/vmf_vae_nlp   --dropout 0.1 --emsize 400 --nhid 800 --aux_weight 0.0001 --dist vmf --kappa 150 --lat_dim 50

Training Neural Variational Recurrent Language Model (NVRNN)

 cd NVLL
 
 # --cd_bow 200
 # Condition on BoW.
 # --cd_bow 0
 # Do not condition on BoW. Default setting.
 
 
 # --cd_bit 50
 # Condition on the sentiment bit. Only for Yelp.
 # --cd_bit 0
 # Do not condition on the sentiment bit. Only for Yelp. In this case, the sentiment bit is skipped. Default setting.
 
 # --swap 0.0 --replace 0.0
 # Default setting. No random swapping of the encoding sequence.
 
 
 # Training vMF VAE on PTB
 # Example
 PYTHONPATH=../ python nvll.py --lr 10 --batch_size 20 --eval_batch_size 20 --log_interval 500 --model nvrnn --epochs 100  --optim sgd --data_name ptb --data_path data/ptb --clip 0.25 --input_z --dropout 0.5 --emsize 100 --nhid 400 --aux_weight 0  --nlayers 1 --swap 0.0 --replace 0.0   --exp_path YOUR/EXP --root_path YOUR/vmf_vae_nlp --cd_bit 0 --cd_bow 0 --dist vmf --kappa 80 --mix_unk 1 --lat_dim 50 --norm_max 1
 
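 # Training vMF VAE on Yelp
 # Note: as written, the command below still passes --data_name ptb --data_path data/ptb; point both at the Yelp data (included under data) to actually train on Yelp.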
 PYTHONPATH=../ python nvll.py --lr 10 --batch_size 20 --eval_batch_size 20 --log_interval 500 --model nvrnn --epochs 100  --optim sgd --data_name ptb --data_path data/ptb --clip 0.25 --input_z --dropout 0.5 --emsize 100 --nhid 400 --aux_weight 0  --nlayers 1 --swap 0.0 --replace 0.0   --exp_path YOUR/EXP --root_path YOUR/vmf_vae_nlp --cd_bit 50 --cd_bow 0 --dist vmf --kappa 80 --mix_unk 0 --lat_dim 50 --norm_max 1
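
If you want to try several values of one hyper-parameter, a small helper can generate the command lines for you. The following is a hypothetical sketch, not a script shipped with the repo: the flags are copied from the PTB example above, while `PTB_VMF_ARGS`, `build_command`, and the kappa values in the loop are arbitrary names and values chosen for illustration. It only prints commands and does not launch anything.

```python
import shlex

# Flags copied from the PTB example command; True marks a bare flag with no value.
PTB_VMF_ARGS = {
    "lr": 10, "batch_size": 20, "eval_batch_size": 20, "log_interval": 500,
    "model": "nvrnn", "epochs": 100, "optim": "sgd",
    "data_name": "ptb", "data_path": "data/ptb", "clip": 0.25,
    "input_z": True, "dropout": 0.5, "emsize": 100, "nhid": 400,
    "aux_weight": 0, "nlayers": 1, "swap": 0.0, "replace": 0.0,
    "exp_path": "YOUR/EXP", "root_path": "YOUR/vmf_vae_nlp",
    "cd_bit": 0, "cd_bow": 0, "dist": "vmf", "kappa": 80,
    "mix_unk": 1, "lat_dim": 50, "norm_max": 1,
}


def build_command(overrides=None):
    """Return a shell command string for nvll.py with some flags overridden."""
    args = dict(PTB_VMF_ARGS)
    args.update(overrides or {})
    parts = ["python", "nvll.py"]
    for name, value in args.items():
        parts.append("--" + name)
        if value is not True:  # bare flags carry no value
            parts.append(str(value))
    return "PYTHONPATH=../ " + " ".join(shlex.quote(p) for p in parts)


if __name__ == "__main__":
    # Arbitrary illustrative kappa values; run the printed commands from NVLL/.
    for kappa in (40, 80, 120):
        print(build_command({"kappa": kappa}))
```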

Reference

Please cite:

@InProceedings{xu2018,
  author =      "Xu, Jiacheng and Durrett, Greg",
  title =       "Spherical Latent Spaces for Stable Variational Autoencoders",
  booktitle =   "Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing",
  year =        "2018",
}

For other implementations of vMF VAE, please refer to:

  1. Kelvin Guu's repo and paper.
  2. Tim R. Davidson's repo (TensorFlow), repo (PyTorch), paper and Supplementary material

Contact

Submit an issue here or find more information on my homepage.

vmf_vae_nlp's People

Contributors

gregdurrett, jiacheng-xu


vmf_vae_nlp's Issues

when I use yelp (with condition bit 50), I got this error message.

Hi,
Thanks for sharing your code :)
I got the error if I use the dataset 'yelp'. Do you know how to fix it?
I am currently using torch 0.4.1 and cuda 9.0.

Traceback (most recent call last):
  File "D:/Generation/vMF-VAE/nvll.py", line 175, in <module>
    main()
  File "D:/Generation/vMF-VAE/nvll.py", line 170, in main
    runner.start()
  File "D:\Generation\vMF-VAE\NVLL\framework\train_eval_nvrnn.py", line 45, in start
    epoch_start_time, self.glob_iter)
  File "D:\Generation\vMF-VAE\NVLL\framework\train_eval_nvrnn.py", line 225, in train_epo
    acc_avg_cos += tup['avg_cos'].data
RuntimeError: expand(torch.cuda.FloatTensor{[1]}, size=[]): the number of sizes provided (0) must be greater or equal to the number of dimensions in the tensor (1)

How to Test your model

Hi, I have two questions !!
(I finished training of your nvrnn (yelp-normal, yelp-vmf, ptb-normal, ptb-vmf).)

1. How to see the generated samples?
After the end of training, I could see the [Recon Loss, KL Loss, Test Loss and Test PPL]
But I don't know how to see the generated samples given specific token.
Do you know how to see generated samples using Yelp dataset with label?

2. How to Test your model?
I couldn't find explanation for how to test your model. Is it eval_nvrnn.py?
I changed the directory path and run eval_nvrnn.py, but I got this error.

C:\Users\hsko0\Anaconda3\lib\site-packages\torch\nn\functional.py:52: UserWarning: size_average and reduce args will be deprecated, please use reduction='none' instead.
  warnings.warn(warning.format(ret))
Traceback (most recent call last):
  File "D:/Generation/vMF-VAE/eval_nvrnn.py", line 336, in <module>
    cur_loss, cur_kl, test_loss = player.eva()
  File "D:/Generation/vMF-VAE/eval_nvrnn.py", line 60, in eva
    self.data.test)
  File "D:/Generation/vMF-VAE/eval_nvrnn.py", line 91, in evaluate
    recon_loss, kld, aux_loss, tup, vecs = model(feed, target)
ValueError: too many values to unpack (expected 5)

Thanks for reading this issue and sorry for bothering your time.
Thank you so much. Sincerely,
Hyeseon

Reconstruct Results / Implementation Details

Hi,

I have some questions regarding the implementation, and I can't reproduce the perplexities reported in the paper.

  1. I'd be interested in #5, as well.
  2. I can't reproduce the results mentioned in the paper, even with the configurations suggested in #3 and #7. My best model on PTB produces a PPL of 110.
  3. Why don't you fix #4 in your code? I don't think the majority of users still use PyTorch 0.3.1.
  4. Why are you computing the perplexity as exp(recon_loss + kl)? As far as I understand (Wikipedia, here and here), the perplexity is a measure of how well the model output matches the given data. It should treat the model as a black box, like exp(entropy) or exp(cross-entropy). Models with a high value for kappa are especially penalized, due to the constant KL term. E.g., kappa -> infinity, which is equivalent to a (non-variational) autoencoder on the hypersphere, always produces a PPL of infinity.
  5. Why do you sample in the latent space multiple times (nsample here, set to 3 by default) and then compute the mean? As far as I understand, the sample-and-average process produces samples that no longer follow the vMF, and the resulting samples have a reduced variance compared to the original samples. The reduced variance could just as well be achieved by increasing kappa, which would result in an increased PPL under the definition in point 4. The other linked implementation does not do this.

Thanks!
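
A small illustration of point 5, as a sketch rather than the repo's code: averaging a few unit vectors gives a vector with norm below 1, and the renormalized average sits closer to the mean direction than the individual samples do on average. The samples here are normalized noisy copies of mu, which is only a stand-in for true vMF samples; `average_unit_samples`, the noise level, and the dimension are assumptions made for the example, while nsample = 3 follows the default mentioned above.

```python
import math
import random


def normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def average_unit_samples(mu, nsample, noise, rng):
    """Return (norm of the mean, average cosine of the samples to mu,
    cosine of the renormalized mean to mu)."""
    # Stand-in for vMF samples: unit vectors scattered around mu.
    samples = [
        normalize([m + noise * rng.gauss(0.0, 1.0) for m in mu])
        for _ in range(nsample)
    ]
    mean = [sum(col) / nsample for col in zip(*samples)]
    mean_norm = math.sqrt(dot(mean, mean))
    avg_cos = sum(dot(s, mu) for s in samples) / nsample
    cos_of_mean = dot(mean, mu) / mean_norm
    return mean_norm, avg_cos, cos_of_mean


if __name__ == "__main__":
    rng = random.Random(0)
    mu = [1.0] + [0.0] * 49  # unit mean direction in 50 dimensions
    print(average_unit_samples(mu, nsample=3, noise=0.1, rng=rng))
```

Because the mean of distinct unit vectors always has norm below 1, the third value is never smaller than the second whenever the average cosine is positive.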

About Implementation

Hi Jiacheng

Thanks for providing your source codes.
Following the questions in the Issues: does this parameter setting (Dataptb_Distvmf_Modelnvrnn_EnclstmBiFalse_Emb100_Hid400_lat50_lr10.0_drop0.5_kappa35.0_auxw0.0001_normfFalse_nlay1_mixunk0.0_inpzTrue_cdbit0_cdbow0) achieve the best performance on the PTB dataset? I will use your proposed model as a baseline.

This is the result.
| End of training | Recon Loss 4.62 | KL Loss 0.16 | Test Loss 4.78 | Test PPL 119.00

And for the results (NLL and PPL), you mentioned in the paper that reported values are actually a lower bound on the true NLL, computed from the ELBO by sampling z.

Does this mean that you first draw a sample from the vMF or Gaussian distribution, then feed it into the decoder (standard and inputless mode), and finally compute the sum of its reconstruction loss and KL loss?
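
For reference, the arithmetic behind the log line above can be checked directly. This is a sketch that assumes the logged losses are per-token averages in nats and that PPL is exp(recon_loss + kl), the formula quoted in the earlier issue; it is a reading of the log output, not a trace of the repo's code.

```python
import math

# Values copied from the "End of training" log line above.
recon_loss = 4.62
kl_loss = 0.16

test_loss = recon_loss + kl_loss     # negative ELBO per token
ppl_from_elbo = math.exp(test_loss)  # perplexity bound that includes the KL term
ppl_recon_only = math.exp(recon_loss)  # perplexity from the reconstruction term alone

print("test loss:", round(test_loss, 2))
print("exp(recon + kl):", round(ppl_from_elbo, 2))
print("exp(recon):", round(ppl_recon_only, 2))
```

Under these assumptions the first two printed values agree with the logged Test Loss and Test PPL up to the rounding of the two-decimal losses.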

Sampling from prior with kappa = 0

Hi, I am not sure how you sample from the prior at test time. For sampling from the prior, which is a vMF with kappa = 0, do you use vmf_batch.py? If I understood correctly, the sample should not depend on mu when kappa = 0. According to the formulation z = w*mu+v*(1-w^2)^(1/2), w must be zero, while I think it is not necessarily equal to zero in vmf_batch.py. Do you use another function?
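
One mu-free way to draw from a vMF with kappa = 0, which is the uniform distribution on the unit sphere, is to normalize a standard Gaussian vector. The sketch below is a generic illustration and says nothing about what vmf_batch.py does; `sample_uniform_sphere` and the chosen dimension and sample count are assumptions. It also computes w = mu^T z for an arbitrary fixed mu: for uniform samples w averages to about zero but individual values are not zero.

```python
import math
import random


def sample_uniform_sphere(dim, rng):
    """Draw one point uniformly from the unit sphere S^{dim-1}."""
    vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def projection_stats(dim, count, rng):
    """Mean and variance of w = mu^T z for uniform z and a fixed unit mu."""
    mu = [1.0] + [0.0] * (dim - 1)  # any unit vector works, by symmetry
    ws = []
    for _ in range(count):
        z = sample_uniform_sphere(dim, rng)
        ws.append(sum(m * x for m, x in zip(mu, z)))
    mean_w = sum(ws) / count
    var_w = sum((w - mean_w) ** 2 for w in ws) / count
    return mean_w, var_w


if __name__ == "__main__":
    mean_w, var_w = projection_stats(dim=50, count=5000, rng=random.Random(0))
    print("mean of w:", mean_w)
    print("variance of w:", var_w, "compare with 1/dim =", 1.0 / 50)
```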

speed of vmf sampling on GPU?

Hi, I would like to use your vmf_batch.py code, and I am wondering whether it slows things down on GPU, since it uses scipy for _vmf_kld_davidson, which I think runs on the CPU.

Which file writes 'label.test.txt' for a given dataset (e.g. yelp)?

Hi, sorry for asking too much....
When I run label_matching code, I got this error.
I couldn't find which code should be run before label_matching. (I am stuck here.)

I run the code in this order.

  1. nvll.py
  2. analyze_nvrnn.py
  3. analyze_samples.py
  4. label_matching.py (i think i am stuck here)
  5. train_classifier.py

Thank you so much for reading this issue.
Hyeseon.
