when I train a multi-source based transformer model using neuralmonkey, I encounter so

Some questions about multi-source based transformer model (neuralmonkey, CLOSED)

wyjllm commented on June 11, 2024

some questions about multi-source based transformer model

Comments (2)

jindrahelcl commented on June 11, 2024

You have two options here. First, you can use the preprocess and postprocess objects to apply BPE at runtime. This is not ideal, however. The second and preferable option is to apply BPE beforehand and store the preprocessed data on disk, which speeds up training a bit. You can use, e.g., fastBPE to extract a vocabulary and apply BPE to your data. You will then need to prepare a vocabulary file that is compatible with Neural Monkey's from_wordlist function from the vocabulary module. It takes a TSV file that looks like this:

<pad>
<s>
</s>
<unk>
First
word
and
so
on
[...]

With this file format, you also need to set contains_frequencies and contains_header to False in the from_wordlist function. Note that the order of the four special tokens matters.
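As a rough illustration of the file layout described above, here is a minimal Python sketch that writes such a wordlist. The helper names build_wordlist and write_wordlist are hypothetical and not part of Neural Monkey; the sketch only assumes that the input is an iterable of BPE tokens (for example, read from the vocabulary that your BPE tool produced) and that the four special tokens must come first, in the order shown above.

```python
from pathlib import Path
from typing import Iterable, List

# The four special tokens, in the order given in the example file above.
SPECIAL_TOKENS = ["<pad>", "<s>", "</s>", "<unk>"]


def build_wordlist(tokens: Iterable[str]) -> List[str]:
    """Return the special tokens followed by each distinct token once.

    Hypothetical helper: keeps first-seen order and skips empty strings
    and anything that duplicates a special token.
    """
    seen = set(SPECIAL_TOKENS)
    lines = list(SPECIAL_TOKENS)
    for token in tokens:
        if token and token not in seen:
            seen.add(token)
            lines.append(token)
    return lines


def write_wordlist(tokens: Iterable[str], path: str) -> List[str]:
    """Write one token per line, with no header row and no frequency column."""
    lines = build_wordlist(tokens)
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines
```

A file written this way has no header and no frequency column, which is why both flags of from_wordlist need to be set to False when loading it.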

If you mean the model_dimension argument of the noam_decay function, it corresponds to the $d_{model}$ variable in the Attention Is All You Need paper. I can't really help you find the right learning rate schedule parameters; you need to try what works best for your data and the rest of your hyperparameters.
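To make the role of $d_{model}$ concrete, here is a small Python sketch of the learning rate schedule as it is written in the paper: $d_{model}^{-0.5} \cdot \min(step^{-0.5},\ step \cdot warmup\_steps^{-1.5})$. The function name noam_learning_rate is hypothetical, and the warmup_steps default of 4000 is the value used in the paper; Neural Monkey's own noam_decay may take different arguments or apply additional scaling, so treat this only as a way to see how the parameters interact.

```python
def noam_learning_rate(step: int, model_dimension: int,
                       warmup_steps: int = 4000) -> float:
    """Learning rate from the schedule in Attention Is All You Need.

    Illustrative only: this follows the formula in the paper, not
    necessarily the exact behavior of Neural Monkey's noam_decay.
    """
    if step < 1:
        raise ValueError("step must be a positive integer")
    # Linear warm-up for the first warmup_steps steps, then decay
    # proportional to the inverse square root of the step number.
    warmup_term = step * warmup_steps ** -1.5
    decay_term = step ** -0.5
    return model_dimension ** -0.5 * min(decay_term, warmup_term)
```

With this formula, a larger model_dimension lowers the learning rate at every step, and the peak is reached at step == warmup_steps, so the two values are worth tuning together.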

wyjllm commented on June 11, 2024

Thank you very much.
