roxot / aevnmt.pt
PyTorch implementation of Auto-Encoding Variational Neural Machine Translation
License: MIT License
Transformer training still needs some improvements:
When prior.sizes is defined, prior.size is not needed, since sum(prior.sizes) == prior.size.
Make prior.size behave the same way as prior.params: multiple semicolon-separated values, which the argument parser converts to a list of ints. Changes need to be applied in the argument parser and wherever hparams.prior.size / hparams.prior.sizes is used in the code base.
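The semicolon-separated parsing could look like the following minimal sketch; the argument name mirrors hparams.prior.sizes, but the converter function name is an assumption:

```python
import argparse

def int_list(value: str):
    """Convert a semicolon-separated string such as "64;32" to [64, 32]."""
    return [int(v) for v in value.split(";")]

parser = argparse.ArgumentParser()
# argparse keeps the dotted name as the attribute name on the namespace.
parser.add_argument("--prior.sizes", type=int_list, default=[32])

hparams = parser.parse_args(["--prior.sizes", "64;32"])
print(getattr(hparams, "prior.sizes"))  # [64, 32]
```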
log_prob should not be part of any model component; rather, a component should return an object with a log_prob method. Because of padding, this is currently not possible for Categorical distributions.
Possible solution: add a class PaddedCategorical(Categorical) that takes a padding argument in log_prob (and in other required methods). This should be straightforward to extend to other likelihoods as well.
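A minimal sketch of that proposal, assuming the padding argument is a boolean mask that is True at padded positions (the exact convention is an assumption):

```python
import torch
from torch.distributions import Categorical

class PaddedCategorical(Categorical):
    """Categorical whose log_prob can zero out padded positions.

    Sketch of the proposed solution above; the mask convention
    (True = padding) is an assumption, not the library's API.
    """

    def log_prob(self, value, pad_mask=None):
        lp = super().log_prob(value)
        if pad_mask is not None:
            # Padded positions contribute zero log-probability.
            lp = lp.masked_fill(pad_mask, 0.0)
        return lp

logits = torch.randn(2, 5, 10)          # [batch, time, classes]
tokens = torch.randint(0, 10, (2, 5))   # [batch, time]
mask = torch.zeros(2, 5, dtype=torch.bool)
mask[:, 3:] = True                      # last two positions are padding
dist = PaddedCategorical(logits=logits)
lp = dist.log_prob(tokens, pad_mask=mask)
```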
For the library to grow, it really needs some testing functionality.
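As an illustration of the kind of unit test the library could start with (a hypothetical pytest-style sketch, not an existing test file):

```python
import torch

def test_categorical_log_prob_shape():
    # log_prob of per-position tokens should have shape [batch, time].
    logits = torch.randn(2, 5, 10)
    dist = torch.distributions.Categorical(logits=logits)
    tokens = torch.randint(0, 10, (2, 5))
    assert dist.log_prob(tokens).shape == (2, 5)
```

Run with `pytest`; similar shape and masking checks for each model component would catch most regressions cheaply.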
feed_z arguments (for example, gen.tm.dec.feed_z) are currently boolean, which mirrors the RNN implementations, but the Transformer architectures support multiple options.
Additionally, we could support feed_z methods for TM encoders: RNN architectures always initialize the hidden state with z, but for Transformers there are multiple options.
While the new hparams are much better than before, they still get very verbose. One reason is that we have a single argparser for every model and evaluation mode, which results in many unused and confusing arguments.
I've added support for adding/removing arguments with arg groups (see the bottom of args.py). These could be used to construct different argparsers based on which model is used in which context.
Possible solution: each model in the library gets its own arg groups with model-specific arguments, which are combined with train/eval-specific arg groups and (if needed) other general arguments. These can already be combined by the existing argparser in hparams/hparams.py.
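The composition could look like this minimal sketch; the group contents and the helper name are assumptions, not the actual args.py API:

```python
import argparse

def make_parser(model_group, mode_group):
    """Combine a model-specific arg group with a train/eval arg group."""
    parser = argparse.ArgumentParser()
    for name, kwargs in {**model_group, **mode_group}.items():
        parser.add_argument(name, **kwargs)
    return parser

# Hypothetical groups; real groups would live next to each model.
transformer_group = {"--num_heads": dict(type=int, default=8)}
train_group = {"--learning_rate": dict(type=float, default=3e-4)}

parser = make_parser(transformer_group, train_group)
hparams = parser.parse_args([])
```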
When translating, Vocabulary imports the vocab file with from_file(), which opens the file with ISO-8859-2 encoding. When this encoding is removed, the vocab file is read correctly.
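A sketch of the fix, assuming the vocab file stores one token per line (the class and method names mirror the ones above, but the file layout is an assumption):

```python
class Vocabulary:
    def __init__(self, tokens):
        self.tokens = tokens

    @classmethod
    def from_file(cls, path):
        # Open with explicit UTF-8 rather than the hard-coded ISO-8859-2.
        with open(path, encoding="utf-8") as f:
            return cls([line.rstrip("\n") for line in f])
```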
The current implementation of distributed training uses DataParallel, which has some drawbacks (a single process drives all GPUs, and the model is replicated on every forward pass); DistributedDataParallel is generally the preferred alternative.
Jsonargparse changes its behaviour depending on the position of the --hparams_file argument: any command line arguments that appear before --hparams_file are overridden by the contents of hparams_file.
This can cause issues that are hard to track down; the preferred behavior is that command line arguments always take precedence over config file arguments.
I've added a temporary workaround that checks for the first argument in hparams.py, but a better solution would be to somehow disable this feature in jsonargparse.
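One workaround in this spirit is to reorder argv so --hparams_file is always parsed first, letting later flags keep precedence; this sketch is an assumption about the approach, not a copy of what hparams.py does:

```python
def reorder_hparams_file_first(argv):
    """Move --hparams_file (and its value) to the front of argv."""
    if "--hparams_file" not in argv:
        return argv
    i = argv.index("--hparams_file")
    pair, rest = argv[i:i + 2], argv[:i] + argv[i + 2:]
    return pair + rest

argv = ["--batch_size", "64", "--hparams_file", "config.json"]
print(reorder_hparams_file_first(argv))
# ['--hparams_file', 'config.json', '--batch_size', '64']
```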
I've left a TODO in the AEVNMTTrainer code for this issue for now.
To fix it, the loss functions need arguments that accept these aux likelihoods, and a method that adds the additional likelihoods to the loss (mixture, auxiliary, dir prior, etc.).
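A hypothetical sketch of such a loss function; the signature, names, and uniform weighting are assumptions, not the library's API:

```python
import torch

def total_loss(nll, kl, aux_log_likelihoods=None, aux_weight=1.0):
    """ELBO-style loss with optional auxiliary log-likelihood terms."""
    loss = nll + kl
    for ll in (aux_log_likelihoods or {}).values():
        # Maximizing an auxiliary likelihood lowers the loss.
        loss = loss - aux_weight * ll
    return loss
```

In practice each auxiliary term (mixture, auxiliary, dir prior, etc.) would likely need its own weight hyperparameter rather than a shared one.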
The RNN initialization in initialize_models uses different parameters for cell_type=lstm and initializes all params with "rnn." in the parameter name. The new hparam format supports different cell types for the inference, TM, and LM components, which breaks this method.
Possible solution: split initialize_models into initialize_tm, initialize_lm, and initialize_inf.
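The split could look like this sketch, with one initializer per component so each can read its own cell type; the hparam paths and init schemes are assumptions:

```python
import torch.nn as nn

def _init_rnn_params(module, cell_type):
    # Initialize every weight matrix whose name marks it as part of an RNN.
    for name, param in module.named_parameters():
        if "rnn." in name and param.dim() > 1:
            if cell_type == "lstm":
                nn.init.orthogonal_(param)      # e.g. orthogonal for LSTMs
            else:
                nn.init.xavier_uniform_(param)  # e.g. Xavier otherwise

def initialize_tm(tm, hparams):
    _init_rnn_params(tm, hparams.gen.tm.cell_type)

def initialize_lm(lm, hparams):
    _init_rnn_params(lm, hparams.gen.lm.cell_type)

def initialize_inf(inf_model, hparams):
    _init_rnn_params(inf_model, hparams.inf.cell_type)
```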