Comments (5)
Thank you very much!
from xlm.
No, we don't cut anything. Instead, we add padding tokens so that the total number of tokens is a multiple of n_batches * bptt.
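To make that concrete, here is a minimal sketch (not the actual XLM code; `pad_stream` is an illustrative name) of padding a flat token stream up to the next multiple of n_batches * bptt instead of cutting the tail:

```python
# Minimal sketch, not the actual XLM code: pad a flat token stream so its
# length is a multiple of n_batches * bptt instead of cutting the tail.
# pad_index is whatever index the dictionary uses for the padding token.
import math

def pad_stream(tokens, n_batches, bptt, pad_index):
    chunk = n_batches * bptt
    target = math.ceil(len(tokens) / chunk) * chunk
    return tokens + [pad_index] * (target - len(tokens))

stream = pad_stream(list(range(1000)), n_batches=4, bptt=128, pad_index=0)
assert len(stream) % (4 * 128) == 0  # 1000 tokens padded up to 1024
```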
bptt is basically the sequence length (it stands for truncated backpropagation through time; maybe the term is not well chosen here). The right value depends on what you want to do with the model. If you train on language modeling for a downstream task where sentences are shorter than 200 words, then 200 is a good value. In practice we usually use 256 or 512 (make sure it's a multiple of 8 if you use fp16).
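For example, a hypothetical helper (`choose_bptt` is not part of XLM) that picks a value covering your longest downstream sentence while staying fp16-friendly might look like:

```python
# Hypothetical helper (not part of XLM): round bptt up so it covers the
# longest downstream sentence and stays a multiple of 8 for fp16 kernels.
def choose_bptt(max_downstream_len, fp16=True):
    bptt = max(max_downstream_len, 256)  # 256/512 are common in practice
    if fp16 and bptt % 8 != 0:
        bptt += 8 - bptt % 8  # round up to the next multiple of 8
    return bptt

print(choose_bptt(200))  # -> 256
```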
from xlm.
I have limited the maximum length of sentences to 100 before training. Could I set bptt to 128? Does it have a big impact on performance? Also, what are the differences between the StreamDataset class and the Dataset class? Thank you very much.
from xlm.
I noticed that params.split_data is False when we use multiple GPUs for training. Why did you do it that way?
from xlm.
StreamDataset returns continuous streams of sentences of shape (bptt, batch_size), so you can have an arbitrary number of sentences in a batch, while Dataset returns one and only one sentence per sequence. Dataset uses padding to batch sentences of different lengths; StreamDataset does not use any padding. bptt = 128 is fine, but if you have sentences longer than that it won't work well, because the position embeddings for those sentences won't be properly trained. If you set max_len <= 128 then you should be fine.
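As an illustration of the two batching styles described above (the shapes follow the description; these are not the real StreamDataset/Dataset implementations):

```python
# Illustrative contrast of the two batching styles (shapes follow the
# description above; these are not the real StreamDataset/Dataset classes).
import torch

def stream_batches(tokens, bptt, batch_size):
    # Continuous stream sliced into (bptt, batch_size) blocks, no padding.
    n = (len(tokens) // (bptt * batch_size)) * bptt * batch_size
    return tokens[:n].view(batch_size, -1).t().split(bptt, dim=0)

def sentence_batch(sentences, pad_index):
    # One sentence per column, shorter sentences padded with pad_index.
    max_len = max(len(s) for s in sentences)
    batch = torch.full((max_len, len(sentences)), pad_index, dtype=torch.long)
    for i, s in enumerate(sentences):
        batch[: len(s), i] = torch.tensor(s)
    return batch

chunks = stream_batches(torch.arange(4096), bptt=128, batch_size=8)
print(chunks[0].shape)  # torch.Size([128, 8])
```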
Setting params.split_data = True is useful when your dataset is so big that you cannot load it 8 times in memory (once per GPU).
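Conceptually, split_data = True means each GPU process loads only its own shard of the sentences; here is a hedged sketch of the idea, assuming a contiguous slice per rank (names are not taken from the XLM source):

```python
# Hedged sketch of the idea behind split_data = True: each GPU process
# loads only its own contiguous shard rather than the full dataset.
# (Illustrative only; names are not taken from the XLM source.)
def shard_for_rank(sentences, rank, world_size):
    n = len(sentences)
    start = n * rank // world_size
    end = n * (rank + 1) // world_size
    return sentences[start:end]

data = list(range(10))
print([shard_for_rank(data, r, 4) for r in range(4)])
# [[0, 1], [2, 3, 4], [5, 6], [7, 8, 9]]
```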
from xlm.
Related Issues (20)
- Add memory to transformer
- XLM LICENSE
- Error when using the uploaded en-fr model for NMT (translate from English to French)
- Error in Training
- Generate multiple optimal results(beam search)
- Training data details for XLM-15 model
- Question about parameters for further training of a preexisiting model?
- default params for PKM
- supervised machine translation
- How is sentence piece model trained in XLM-R?
- [Question] Does XLM-R follows RoBERTa or XLM for MLM?
- ./get-data-para.sh
- Checkpoint for TLM objective
- confusion about `lm_head`'s size?
- How can I expand it to a new language which is Romanised? For example, Marathi Romanized?
- e
- get-data-glue.sh 400 Bad Request
- bt_steps meaning
- how to save the entire model instead of just the model parameters
- Predict a masked word