
diffusion-bert's People

Contributors

hzfinfdu, txsun1997


diffusion-bert's Issues

self is not defined for discrete_diffusion_predict_fn()

In discrete_diffusion_predict_fn(), self.device is referenced, but self is not defined in that function. In the snippet below, self.device raises the error:

    if predict_x0:
        init_state = SamplingState(x, x, torch.tensor([num_steps], device=self.device))
    else:
        init_state = SamplingState(x, None, torch.tensor([num_steps], device=self.device))

I tried passing device in as a function argument and adjusting the devices of the variables here, but couldn't get it to work.

Please provide an updated discrete_diffusion_predict_fn() that addresses this device inconsistency if possible.
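One workaround that may sidestep the problem (a sketch, not the authors' fix): take the device from the input tensor x itself, so the function no longer needs self:

    # Minimal sketch, assuming x is the input torch.Tensor and SamplingState
    # has the signature shown above: derive the device from x instead of self.
    device = x.device
    if predict_x0:
        init_state = SamplingState(x, x, torch.tensor([num_steps], device=device))
    else:
        init_state = SamplingState(x, None, torch.tensor([num_steps], device=device))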

Inquiry on some details of the method.

As said in the second paragraph of Section 4.3, "We attribute the superior performance of DiffusionBERT to its onetime sampling of all tokens". I wonder the meaning of "onetime sampling of all tokens", does it mean generating all the tokens in a sentence at a time? If it does, it seems to conflict with the demonstration in Table 1. Thank you!

why TypeError?

When I run word_freq.py, the following error occurs:
Traceback (most recent call last):
File "C:\GithubProjects\Diffusion-BERT-main\word_freq.py", line 18, in
for iid in data['input_ids']:
TypeError: string indices must be integers
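For what it's worth, this error usually means data is a plain string rather than a dict-like dataset; a hypothetical illustration (not the repo's actual loading code):

    # Hypothetical illustration: indexing a str with a string key raises
    # exactly this TypeError, which suggests the dataset was loaded as raw
    # text instead of being parsed/tokenized first.
    import json

    data = '{"input_ids": [[101, 102]]}'
    # data['input_ids']        # TypeError: string indices must be integers
    data = json.loads(data)    # parse first, then index
    for iid in data['input_ids']:
        print(iid)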

checkpoint

Thanks for your great work!

Could you also release the trained checkpoints, to make it more convenient to reproduce the experimental results?

How did you calculate the perplexity of DiffusionLM

Greetings,

I am currently working on diffusion for text generation as well. In your paper you include the PPL of DiffusionLM in your results for comparison. I assume you derived this from the model's ELBO loss, right? Could you share more details of the computation? For example, which loss you used, and whether you estimated token-level or sequence-level PPL. It would be great if you could share the code for this part as well.
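For reference, the usual token-level convention (my assumption, not necessarily what the paper did) exponentiates the per-token bound:

    # Sketch of the standard token-level convention: treat the summed ELBO
    # (in nats) as an upper bound on the corpus NLL, then exponentiate the
    # per-token average. Names here are illustrative assumptions.
    import math

    def token_level_ppl(total_elbo_nats, total_tokens):
        return math.exp(total_elbo_nats / total_tokens)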

Thank you very much. Your help is appreciated as we would like to cite this method.

unfinished codebase?

Hello, first, thank you for your work. I find it fascinating!
I was wondering whether the codebase is complete yet, since predict.py and predict_downstream_conditionals.py still have missing imports, etc.

I was hoping to see how the model actually functions after training it for one epoch.
Any plans on updating soon?

When will the code be released?

Hello, authors!
Do you have any plans to release the code soon? Roughly when will that be?

Missing key(s) in state_dict for unconditional

Hi,

When I was trying to load the checkpoint, it gives the following error:

Missing key(s) in state_dict: "bert.embeddings.position_ids", "bert.embeddings.word_embeddings.weight", "bert.embeddings.position_embeddings.weight", "bert.embeddings.token_type_embeddings.weight", "bert.embeddings.LayerNorm.weight", "bert.embeddings.LayerNorm.bias", "bert.encoder.layer.0.attention.self.query.weight", "bert.encoder.layer.0.attention.self.query.bias", "bert.encoder.layer.0.attention.self.key.weight",......

and a lot of other layer infos.

It looks like the state_dict has keys "module.bert..." rather than "bert..." as expected. This seems similar to issue #17, so please kindly help. How would I fix this issue? Thanks in advance.
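A common fix for this situation (assuming the checkpoint was saved from a DistributedDataParallel-wrapped model; this is not an official patch) is to strip the "module." prefix before loading:

    # Sketch: checkpoints saved from a torch DistributedDataParallel model
    # prefix every parameter key with "module."; stripping the prefix lets
    # a plain model load them. 'best.th' and the 'model' key are assumptions.
    import torch

    ckpt = torch.load('best.th', map_location='cpu')
    state_dict = ckpt.get('model', ckpt)
    state_dict = {
        (k[len('module.'):] if k.startswith('module.') else k): v
        for k, v in state_dict.items()
    }
    model.load_state_dict(state_dict)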

P.S. I got the model checkpoints by running DDP_main.py. I saved earlier-stage checkpoints and stopped training because it took too long in eval mode, with warnings like "NAN encountered ... times". Does your training look the same?

No module named 'perplexity'

When I run predict.py, I get the error "No module named 'perplexity'". After checking, I found that, in addition to perplexity, the compute_metric module is also missing from the environment. How can I obtain these libraries?

code?

I don't see any code here. Is there somewhere else to look?

Resuming training via `--load_step`

Thanks for the code release!

Heads up for other users who want to resume training from a checkpoint: you will want to

  1. de-indent DDP_main.py:80 so that all devices can load the checkpoint
  2. load the optimizer and scheduler states at DDP_main.py:146 (see the sketch below)
  3. set the dataloader index to the correct example before actually training

I'm not totally sure this covers everything (logging, for instance), but it should mostly work.
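A rough sketch of step 2 (variable and key names are my assumptions, not the repo's exact code):

    # Assumed restore logic: load optimizer/scheduler state from the same
    # checkpoint dict as the model. The key names here are guesses.
    import torch

    ckpt = torch.load(ckpt_path, map_location='cpu')
    model.load_state_dict(ckpt['model'])
    if 'optimizer' in ckpt:
        optimizer.load_state_dict(ckpt['optimizer'])
    if 'scheduler' in ckpt:
        scheduler.load_state_dict(ckpt['scheduler'])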

Note: there's also a separate issue where your checkpoints may get overwritten between epochs, so be sure you're loading the right file and saving where you intend.

Lower-case in LM1B

Hello!

In the paper you write

All text data are lower-cased to align with the settings of Austin et al. (2021)

But the D3PM paper never states that the LM1B data was lower-cased (and the samples from their model in the appendix contain upper-case characters). So the perplexity comparison seems unfair, because all-lowercased text is easier to model. Am I missing something?

How to evaluate BLEU score on LM1B?

Dear authors,

I understand that you plan to release your code in January. But could you share more details about how you evaluated the BLEU score and PPL on the LM1B dataset? I am also working on diffusion models for text and may cite your paper. Thanks!
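For context, a common way to compute corpus BLEU on generated text (my assumption about the setup, not the authors' script) is via NLTK:

    # Illustrative only: corpus BLEU with NLTK. Whether the paper used this,
    # sacrebleu, or a self-BLEU variant is exactly what this issue asks.
    from nltk.translate.bleu_score import corpus_bleu

    references = [[ref.split()] for ref in reference_sentences]  # hypothetical data
    hypotheses = [hyp.split() for hyp in generated_sentences]
    print(corpus_bleu(references, hypotheses))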

Function missed

This work is fascinating, but I ran into an issue while reading the code.
Line 511 in diffusion_word_freq.py, "schedule_fn = utils.create_learning_rate_scheduler(", calls create_learning_rate_scheduler from utils.py, but I can't find its definition in utils.py. Maybe the code is incomplete?
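As a stopgap, one could substitute a minimal factory of the same name (purely an assumption about its intent, inferred from the name; not the missing function itself):

    # Hypothetical stand-in, NOT the repo's missing utility: linear warmup
    # followed by a constant learning rate, wrapped as a torch scheduler.
    from torch.optim.lr_scheduler import LambdaLR

    def create_learning_rate_scheduler(optimizer, warmup_steps=1000):
        return LambdaLR(optimizer, lambda step: min(1.0, (step + 1) / warmup_steps))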
Thanks.

No module named 'perplexity'

When I run predict.py, I get the error "No module named 'perplexity'". How can I obtain this library?

About the Spindle schedule

Hello, I read in your paper about the new Spindle noising approach you propose. As I understand it, one first runs word_freq.py to compute word frequencies, which produces a .pt file. But I found that I can proceed with the later training steps without ever running the word-frequency script, and I don't see the training code loading the .pt frequency file anywhere. My question is: how do I use your Spindle approach to add noise?
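If it helps, the expected wiring presumably looks something like the following (a guess from the description above; file and variable names are assumptions, not repo code):

    # Speculative sketch of how the word-frequency file would feed the
    # Spindle schedule; everything here is an assumption.
    import torch

    word_freq = torch.load('word_freq.pt')    # assumed output of word_freq.py
    word_freq = word_freq / word_freq.sum()   # normalize to a distribution
    # ...then pass word_freq into the noising/diffusion routine.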

Here are some of my problems, please advise

Hi, dear authors:
  1. For the q(x_t | x_0) part of the paper, did you choose not to compute the denominator?
  2. Why is the predicted x_0 left as floating-point values rather than mapped to one-hot?
  3. step = t - 1 and step = t + 1 appear frequently in your code. Do they have a specific meaning? Thank you.

GPT mentioned in Figure 3

Dear authors,

Thanks for open-sourcing your wonderful work.

You mention GPT in Figure 3 when comparing the Pareto fronts of different models ("AR models of the same size"). May I ask whether this is a pre-trained GPT (e.g., GPT2-small) fine-tuned on the LM1B dataset, or a model with a GPT architecture trained from scratch on the LM1B training set?

How to fine-tune it

It seems to be a pre-trained language model. Could I train on a large amount of unconditional text to get a checkpoint, and then fine-tune the model on seq2seq tasks?

Reimplemented Diffusion-LM

Hi,

Thank you for the amazing work.

While reading your paper I noticed that you reimplemented Diffusion-LM. Do you have any plans to open-source that implementation as well?

Thanks.

Missing key(s) in state_dict when testing using predict_downstream_condition.py

python predict_downstream_condition.py --ckpt_path model_name_roberta-base_taskname_qqp_lr_3e-05_seed_42_numsteps_2000_sample_Categorical_schedule_mutual_hybridlambda_0.0003_wordfreqlambda_0.0_fromscratch_False_timestep_none_ckpts/best(38899).th
using standard schedule with num_steps: 2000.
Traceback (most recent call last):
File "predict_downstream_condition.py", line 101, in
model.load_state_dict(ckpt['model'])
File "/opt/conda/envs/diff/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1672, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for RobertaForMaskedLM:
Missing key(s) in state_dict: "roberta.embeddings.position_ids", "roberta.embeddings.word_embeddings.weight", "roberta.embeddings.position_embeddings.weight", "roberta.embeddings.token_type_embeddings.weight", "roberta.embeddings.LayerNorm.weight", "roberta.embeddings.LayerNorm.bias", "roberta.encoder.layer.0.attention.self.query.weight", "roberta.encoder.layer.0.attention.self.query.bias".........................

Do you plan to release the checkpoints?

Hi,

I tried to train a model on a different dataset, but the loss doesn't change much. I wonder if you could release the checkpoints, so I could first load the model and then fine-tune it on my own dataset?

Thanks

How to calculate entropy?

Dear authors,
Thank you for your paper. It was quite illuminating.

Your proposed noise schedule requires the entropy value of each word/token before noising, but I couldn't find how you calculated it. Is it per sentence, per n-gram, per corpus, etc.? Did you use any libraries to calculate it, or was it done manually?
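For what it's worth, one plausible reading (an assumption on my part, not the paper's stated method) is the information content -log p(w) from corpus-level unigram frequencies:

    # Speculative sketch: per-token information content from corpus unigram
    # frequencies. 'corpus' (a list of token lists) is hypothetical.
    import math
    from collections import Counter

    counts = Counter(tok for sent in corpus for tok in sent)
    total = sum(counts.values())
    info = {tok: -math.log(c / total) for tok, c in counts.items()}  # nats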

Thank you for your time.
