Comments (6)
It depends on what your goal is in using diffusion models for MT tasks. Follow-up works are not exactly the same as DiffuSeq: SeqDiffuSeq is based on an encoder-decoder architecture, while RDM is based on discrete text diffusion. This work also involves pre-trained MLMs. If you're aiming for performance, you could refer to the SOTA models.
Hi,
Maybe you can try our updated version 2, which is 4x faster in training and 800x faster in sampling on the QQP dataset. (We have updated the README.md with information about v2.)
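For context on where sampling speedups of that magnitude typically come from: a common trick for accelerating diffusion sampling is to respace the timestep schedule, running the denoiser on a small, evenly spaced subset of the training timesteps. A minimal sketch of that general idea (the `denoise_step` function here is a hypothetical stand-in, not DiffuSeq's actual API):

```python
import torch

def respaced_sampling(denoise_step, x_T, num_train_steps=2000, num_sample_steps=10):
    """Sample using a coarse subset of the training timesteps.

    `denoise_step(x, t) -> x` is a hypothetical one-step denoiser;
    real implementations also carry noise-schedule terms. Cutting
    2000 training steps down to 10 sampling steps is the kind of
    change that yields orders-of-magnitude sampling speedups.
    """
    # Evenly spaced timesteps from T-1 down to 0.
    timesteps = torch.linspace(num_train_steps - 1, 0, num_sample_steps).long()
    x = x_T
    for t in timesteps:
        x = denoise_step(x, t)
    return x
```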
Hi,
You can give it a try, but different hyper-parameters may lead to different results, including the batch size (bsz), diffusion steps, hidden dimension (dim), sequence length (seq_len), and the tokenizer. Many follow-up works now achieve better MT performance, and you can refer to their codebases as well.
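To make those knobs concrete, here is an illustrative configuration dictionary. The keys mirror the ones named in the comment above (bsz, steps, dim, seq_len, tokenizer), but the specific values are assumptions for illustration, not the repository's official defaults:

```python
# Illustrative hyper-parameters (assumed values, not official defaults).
# Each of these knobs can noticeably change the final results.
config = {
    "bsz": 128,                     # batch size
    "diffusion_steps": 2000,        # number of diffusion timesteps
    "dim": 128,                     # embedding / hidden dimension
    "seq_len": 128,                 # maximum (source + target) sequence length
    "tokenizer": "bert-base-uncased",  # tokenizer choice matters too
    "lr": 1e-4,                     # learning rate
}
```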
Yeah, that makes sense, thanks! Are you referring to works like SeqDiffuSeq, which builds on DiffuSeq directly?
@summmeer thanks, this is very helpful! In the DiNoiSer paper, the authors claim to have surpassed DiffuSeq's performance on the WMT14 EN->DE task, so I wanted to run a similar comparison between DiffuSeq and DiNoiSer on the IWSLT14 task, but DiffuSeq takes a long time to train.
Even for the QQP task reported in the paper, when I tried to replicate the results, training took 6.5 days on 4 A100 GPUs (WandB overview). Do you think additional distributed training code is required to train DiffuSeq more efficiently?
Sorry for the trivial question; your replies are really helpful, thanks!
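For reference, the standard PyTorch recipe for multi-GPU training is to wrap the model in DistributedDataParallel and launch one process per GPU with torchrun. A minimal sketch of that general pattern (my own illustration of stock PyTorch DDP, not DiffuSeq's actual training script; the file name `train_ddp.py` is hypothetical):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(128, 128).cuda(local_rank)  # stand-in for the real model
    model = DDP(model, device_ids=[local_rank])

    # ... build a DataLoader with a DistributedSampler and train as usual ...

    dist.destroy_process_group()

if __name__ == "__main__":
    main()

# Launch across 4 GPUs with:
#   torchrun --nproc_per_node=4 train_ddp.py
```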
I will, thanks a lot!
Related Issues (20)
- BERT parameter
- Trying to train the model with another dataset, but getting many [UNK] tokens. HOT 1
- A few questions about the 'MBR' decoding strategy. HOT 2
- Versions of many packages
- Incorrect self-BLEU Computation
- A question about --local_rank
- Could not find a version that satisfies the requirement torch==1.9.0+cu111
- I face some problems with Dataset(2) in "text_datasets.py" HOT 1
- Is there any rule for modifying the parameters? HOT 1
- A question about the loss in V2
- Implementation of using soft absorbing state in the forward process in training. HOT 1
- ddim sampling HOT 2
- DDPM HOT 3
- train
- Where is CommonsenseConversation/test.jsonl? When I run train.sh and then run run_decode_solver.sh or run_decode.sh, I can never find test.jsonl HOT 2
- 'grad_norm' is NaN HOT 3
- Understanding tT_loss HOT 2
- questions on source data HOT 1
- Text simplification dataset HOT 1