Comments (5)
Thank you very much!
from xlm.
No, we don't cut anything. Instead, we add padding tokens so that the total number of tokens is a multiple of n_batches * bptt.
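To make that concrete, here is a minimal sketch (not the actual XLM code; `pad_stream` is an illustrative name) of padding a flat token stream up to the next multiple of n_batches * bptt instead of cutting the tail:

```python
# Minimal sketch, not the actual XLM code: pad a flat token stream so its
# length is a multiple of n_batches * bptt instead of cutting the tail.
# pad_index is whatever index the dictionary uses for the padding token.
import math

def pad_stream(tokens, n_batches, bptt, pad_index):
    chunk = n_batches * bptt
    target = math.ceil(len(tokens) / chunk) * chunk
    return tokens + [pad_index] * (target - len(tokens))

stream = pad_stream(list(range(1000)), n_batches=4, bptt=128, pad_index=0)
assert len(stream) % (4 * 128) == 0  # 1000 tokens padded up to 1024
```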
bptt is basically the sequence length (it stands for truncated backpropagation through time; maybe the term is not well chosen here). The right value depends on what you want to do with the model. If you train on language modeling for a downstream task where sentences are shorter than 200 words, then 200 is a good value. In practice we usually use 256 or 512 (make sure it's a multiple of 8 if you use fp16).
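For example, a hypothetical helper (`choose_bptt` is not part of XLM) that picks a value covering your longest downstream sentence while staying fp16-friendly might look like:

```python
# Hypothetical helper (not part of XLM): round bptt up so it covers the
# longest downstream sentence and stays a multiple of 8 for fp16 kernels.
def choose_bptt(max_downstream_len, fp16=True):
    bptt = max(max_downstream_len, 256)  # 256/512 are common in practice
    if fp16 and bptt % 8 != 0:
        bptt += 8 - bptt % 8  # round up to the next multiple of 8
    return bptt

print(choose_bptt(200))  # -> 256
```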
from xlm.
I have limited the maximum length of sentences to 100 before training. Could I set bptt to 128? Does it have a big impact on performance? Also, what are the differences between the StreamDataset class and the Dataset class? Thank you very much.
from xlm.
I noticed that params.split_data is False when we use multiple GPUs for training. Why did you do it that way?
from xlm.
StreamDataset returns continuous streams of sentences of shape (bptt, batch_size), so you can have an arbitrary number of sentences in a batch, while Dataset returns one and only one sentence per sequence. Dataset uses padding to batch sentences of different lengths; StreamDataset does not use any padding. bptt = 128 is fine, but if you have sentences longer than that it won't work well, because the position embeddings for those sentences won't be properly trained. If you set max_len <= 128 then you should be fine.
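As an illustration of the two batching styles described above (the shapes follow the description; these are not the real StreamDataset/Dataset implementations):

```python
# Illustrative contrast of the two batching styles (shapes follow the
# description above; these are not the real StreamDataset/Dataset classes).
import torch

def stream_batches(tokens, bptt, batch_size):
    # Continuous stream sliced into (bptt, batch_size) blocks, no padding.
    n = (len(tokens) // (bptt * batch_size)) * bptt * batch_size
    return tokens[:n].view(batch_size, -1).t().split(bptt, dim=0)

def sentence_batch(sentences, pad_index):
    # One sentence per column, shorter sentences padded with pad_index.
    max_len = max(len(s) for s in sentences)
    batch = torch.full((max_len, len(sentences)), pad_index, dtype=torch.long)
    for i, s in enumerate(sentences):
        batch[: len(s), i] = torch.tensor(s)
    return batch

chunks = stream_batches(torch.arange(4096), bptt=128, batch_size=8)
print(chunks[0].shape)  # torch.Size([128, 8])
```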
Setting params.split_data = True is useful when your dataset is so big that you cannot load it 8 times in memory (once per GPU).
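Conceptually, split_data = True means each GPU process loads only its own shard of the sentences; here is a hedged sketch of the idea, assuming a contiguous slice per rank (names are not taken from the XLM source):

```python
# Hedged sketch of the idea behind split_data = True: each GPU process
# loads only its own contiguous shard rather than the full dataset.
# (Illustrative only; names are not taken from the XLM source.)
def shard_for_rank(sentences, rank, world_size):
    n = len(sentences)
    start = n * rank // world_size
    end = n * (rank + 1) // world_size
    return sentences[start:end]

data = list(range(10))
print([shard_for_rank(data, r, 4) for r in range(4)])
# [[0, 1], [2, 3, 4], [5, 6], [7, 8, 9]]
```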
from xlm.
Related Issues (20)
- Add memory to transformer
- XLM LICENSE
- Error when using the uploaded en-fr model for NMT (translate from English to French)
- Error in Training
- Generate multiple optimal results(beam search)
- Training data details for XLM-15 model
- Question about parameters for further training of a preexisiting model?
- default params for PKM
- supervised machine translation
- How is sentence piece model trained in XLM-R?
- [Question] Does XLM-R follows RoBERTa or XLM for MLM?
- ./get-data-para.sh
- Checkpoint for TLM objective
- confusion about `lm_head`'s size?
- How can I expand it to a new language which is Romanised? For example, Marathi Romanized?
- e
- get-data-glue.sh 400 Bad Request
- bt_steps meaning
- how to save the entire model instead of just the model parameters
- Predict a masked word