
Comments (6)

summmeer commented on September 27, 2024

It depends on what your goal is in using a diffusion model for MT tasks. The follow-up works are not exactly the same as DiffuSeq: SeqDiffuSeq is based on an encoder-decoder architecture, while RDM is based on discrete text diffusion. This work also involves pre-trained MLMs. If you're aiming for performance, you could refer to the SOTA model.

from diffuseq.

summmeer commented on September 27, 2024

Hi,
You could try our updated version 2, which is 4x faster in training and 800x faster in sampling on the QQP dataset. (We have added the v2 information to README.md.)


summmeer commented on September 27, 2024

Hi,
You can give it a try, but different hyper-parameters, including bsz, steps, dim, seq_len, and the tokenizer, may lead to different results. Many follow-up works currently achieve better MT performance, and you can refer to their codebases too.
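
To make the point about hyper-parameters concrete, here is a minimal sketch of recording the settings named above and listing which ones differ between two runs. The `RunConfig` class, the `config_diff` helper, and all values are hypothetical and for illustration only; they are not part of the DiffuSeq codebase and do not reflect any real run.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical container for the hyper-parameters named in the reply."""
    bsz: int        # batch size
    steps: int      # number of steps (the reply does not say which kind)
    dim: int        # model dimension (assumed meaning)
    seq_len: int    # maximum sequence length
    tokenizer: str  # tokenizer identifier


def config_diff(a: RunConfig, b: RunConfig) -> dict:
    """Return {field: (value_in_a, value_in_b)} for every field that differs."""
    da, db = asdict(a), asdict(b)
    return {name: (da[name], db[name]) for name in da if da[name] != db[name]}


# Placeholder values for illustration only; not the settings of any real run.
reference_run = RunConfig(bsz=1024, steps=1000, dim=64, seq_len=64, tokenizer="tokenizer-a")
my_run = RunConfig(bsz=256, steps=1000, dim=64, seq_len=128, tokenizer="tokenizer-b")

for name, (ref_value, my_value) in config_diff(reference_run, my_run).items():
    print(f"{name}: reference={ref_value} mine={my_value}")
```

Checking such a diff before comparing scores helps separate differences caused by the model from differences caused by the setup.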


chiral-carbon commented on September 27, 2024

Yeah, makes sense, thanks! Are you referring to works like SeqDiffuSeq, which builds directly on DiffuSeq?


chiral-carbon commented on September 27, 2024

@summmeer thanks, this is very helpful! In the DiNoiSer paper, the authors claim to have surpassed DiffuSeq's performance on the WMT14 EN->DE task, so I wanted to do a similar comparison between DiffuSeq and DiNoiSer on the IWSLT14 task, but DiffuSeq takes a long time to train.
Even for the QQP task reported in the paper, when I tried to replicate the results, training took 6.5 days on 4 A100 GPUs (WandB overview). Do you think additional distributed training code is required to train DiffuSeq more efficiently?
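
For scale, the training cost above can be put in GPU-hours with a short sketch. The function names are hypothetical, and the projection simply divides by the 4x training speedup quoted elsewhere in this thread for version 2 on QQP; it is a naive estimate that assumes the speedup carries over to this hardware and setup, not a measurement.

```python
def gpu_hours(num_gpus: int, days: float) -> float:
    """Total GPU-hours for a run that occupies num_gpus GPUs for the given days."""
    return num_gpus * days * 24.0


def projected_days(days: float, speedup: float) -> float:
    """Naive wall-clock projection: divide the observed time by a speedup factor."""
    return days / speedup


# Figures from the comment above: 4 A100 GPUs for 6.5 days.
observed = gpu_hours(num_gpus=4, days=6.5)
print(f"observed cost: {observed} GPU-hours")

# Assumption: the quoted 4x training speedup applies unchanged to this run.
print(f"projected wall-clock time: {projected_days(6.5, 4.0)} days")
```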

Sorry for the trivial question, your replies are really helpful, thanks!


chiral-carbon commented on September 27, 2024

I will, thanks a lot!

