hzfinfdu / diffusion-bert
ACL'2023: DiffusionBERT: Improving Generative Masked Language Models with Diffusion Models
License: Apache License 2.0
In the function discrete_diffusion_predict_fn(), self.device is accessed, but self is not defined in this function. In the code snippet here, self.device is what raises the error:
if predict_x0:
init_state = SamplingState(x, x, torch.tensor([num_steps], device=self.device))
else:
init_state = SamplingState(x, None, torch.tensor([num_steps], device=self.device))
I tried passing the device in as a function argument and adjusting the devices of the variables here, but it didn't work. Could you provide an updated discrete_diffusion_predict_fn() that addresses this device inconsistency, if possible?
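One way to avoid the undefined `self` is to derive the device from the input tensor itself. The sketch below is a minimal illustration of that pattern, not the repo's actual function; the `SamplingState` fields are a guess based on the snippet above.

```python
import torch
from collections import namedtuple

# Hypothetical stand-in for the repo's SamplingState (field names are a guess).
SamplingState = namedtuple("SamplingState", ["x", "x0", "t"])

def discrete_diffusion_predict_fn(x, num_steps, predict_x0=True):
    # Derive the device from the input tensor rather than an undefined `self`,
    # so every tensor created here lives where `x` lives (CPU or GPU).
    device = x.device
    t = torch.tensor([num_steps], device=device)
    if predict_x0:
        init_state = SamplingState(x, x, t)
    else:
        init_state = SamplingState(x, None, t)
    return init_state

# Usage: works unchanged on CPU or CUDA inputs.
state = discrete_diffusion_predict_fn(torch.zeros(2, 16, dtype=torch.long), num_steps=512)
```

Because the device is inferred per call, no extra argument needs to be threaded through the sampling loop.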
When running predict.py, I encounter the error "FileNotFoundError: [Errno 2] No such file or directory: '/remote-home/zfhe/projects/diffusion_torch/D3PM_new_timestep_ckpts/best(1799999).th'". What should I do to make it run properly?
I encountered this error after the model was trained and ready for testing. It appears to be the same as #17; you mentioned under that issue that certain parameters need to be changed. What exactly should I do?
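The FileNotFoundError comes from an absolute checkpoint path hard-coded for the authors' machine. A hedged sketch of one fix (the function and flag names here are my own, not the repo's): take the checkpoint path from the command line and point it at a checkpoint from your own training run.

```python
import argparse
import os

def parse_ckpt_args(argv=None):
    # Hypothetical sketch: instead of the hard-coded absolute path in
    # predict.py, take the checkpoint location from the command line.
    parser = argparse.ArgumentParser()
    parser.add_argument("--ckpt_path", type=str, required=True,
                        help="path to a .th checkpoint from your own training run")
    args = parser.parse_args(argv)
    # Fail early with a clear message if the file does not exist.
    if not os.path.isfile(args.ckpt_path):
        raise FileNotFoundError(f"checkpoint not found: {args.ckpt_path}")
    return args
```

Invoked as, e.g., `python predict.py --ckpt_path ./my_ckpts/best.th`.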
As stated in the second paragraph of Section 4.3, "We attribute the superior performance of DiffusionBERT to its onetime sampling of all tokens". I wonder what "onetime sampling of all tokens" means. Does it mean generating all the tokens in a sentence at once? If so, it seems to conflict with the demonstration in Table 1. Thank you!
When I run word_freq.py, the following error occurred:
Traceback (most recent call last):
File "C:\GithubProjects\Diffusion-BERT-main\word_freq.py", line 18, in
for iid in data['input_ids']:
TypeError: string indices must be integers
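That TypeError usually means `data` was a plain string (e.g. a file path or a raw JSON line) rather than a dict-like record with an `'input_ids'` key. A minimal sketch of what a frequency count over tokenized examples looks like when the input has the right shape; the function name and structure here are illustrative, not the repo's exact code.

```python
import torch

def count_word_freq(examples, vocab_size):
    # Illustrative sketch of a word-frequency count like the one word_freq.py
    # produces (later saved as a .pt file). `examples` must be an iterable of
    # dicts with an 'input_ids' key -- the "string indices must be integers"
    # error means `data` was a plain string, not such a dict.
    freq = torch.zeros(vocab_size, dtype=torch.long)
    for example in examples:
        for iid in example["input_ids"]:
            freq[iid] += 1
    return freq

# Usage with an in-memory toy corpus:
toy = [{"input_ids": [1, 2, 2, 5]}, {"input_ids": [2, 5, 5]}]
freq = count_word_freq(toy, vocab_size=8)
```

If `data` comes from a dataset loader, check that you are iterating over examples of a split (e.g. the train split), not over the names or file paths of the splits themselves.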
Thanks for your great work!
Can you also release trained checkpoints to make it more convenient to reproduce experiment result?
Greetings,
I am currently working on diffusion for text generation as well. In your paper you include the PPL of Diffusion-LM in your results for comparison. I assume you derived this from the model's ELBO, right? Would you please share more details of the computation? For example, what loss are you using, and did you estimate token-level or sequence-level PPL? It would be great if you could share the code for this part as well.
Thank you very much. Your help is appreciated as we would like to cite this method.
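For context on the token-level vs. sequence-level distinction: one common convention for diffusion language models (an assumption on my part, not necessarily what the authors did) is to treat the ELBO in nats as an upper bound on the sequence NLL, divide by the total token count, and exponentiate.

```python
import math

def token_level_ppl(elbo_nats_per_sequence, tokens_per_sequence):
    # One common convention (an assumption, not confirmed by the paper):
    # sum the per-sequence ELBOs (nats), divide by the total number of
    # tokens, and exponentiate to get token-level perplexity.
    total_nats = sum(elbo_nats_per_sequence)
    total_tokens = sum(tokens_per_sequence)
    return math.exp(total_nats / total_tokens)

# Usage: two sequences with ELBOs of 80 and 120 nats over 20 and 30 tokens.
ppl = token_level_ppl([80.0, 120.0], [20, 30])  # exp(200/50) = exp(4)
```

Sequence-level PPL would instead exponentiate the per-sequence average, which generally gives a different number, hence the question's relevance.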
Hello, first, thank you for your work. I find it fascinating!
I was wondering if the codebase isn't complete yet, since predict.py and predict_downstream_conditionals.py still have missing imports, etc.
I was hoping to see how the model actually functions after training it for one epoch.
Any plans on updating soon?
Hello, authors!
Are there plans to release the code soon? Roughly when?
Hi,
When I was trying to load the checkpoint, it gives the following error:
Missing key(s) in state_dict: "bert.embeddings.position_ids", "bert.embeddings.word_embeddings.weight", "bert.embeddings.position_embeddings.weight", "bert.embeddings.token_type_embeddings.weight", "bert.embeddings.LayerNorm.weight", "bert.embeddings.LayerNorm.bias", "bert.encoder.layer.0.attention.self.query.weight", "bert.encoder.layer.0.attention.self.query.bias", "bert.encoder.layer.0.attention.self.key.weight",......
and a lot of other layer infos.
It looks like the state_dict has keys "module.bert..." rather than "bert..." as expected. This seems similar to issue #17, so please kindly help. How would I fix this issue? Thanks in advance.
P.S. I got the model checkpoints by running DDP_main.py. I saved earlier-stage checkpoints and stopped training, as it took too long in eval mode with warnings like "NAN encountered ... times". Does your training look the same?
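The `"module."` prefix is what `torch.nn.parallel.DistributedDataParallel` adds to every parameter key when the wrapped model is saved. A small sketch of the usual workaround, stripping the prefix before loading (the checkpoint key layout `ckpt["model"]` is an assumption based on the error above):

```python
def strip_ddp_prefix(state_dict):
    # Checkpoints saved from a DistributedDataParallel-wrapped model carry a
    # "module." prefix on every key; remove it before loading into a bare model.
    return {k[len("module."):] if k.startswith("module.") else k: v
            for k, v in state_dict.items()}

# Usage sketch (paths and key names are assumptions):
# ckpt = torch.load("best(1799999).th", map_location="cpu")
# model.load_state_dict(strip_ddp_prefix(ckpt["model"]))
```

The alternative is to load the state_dict into a model that is itself wrapped in DistributedDataParallel, so the prefixes match.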
When I run predict.py, an error message 'No module named 'perplexity'' appears. After checking, I found that in addition to perplexity, the compute_metric package is also missing from the environment. How can I download these libraries?
I don't see any code here, is there somewhere else to look?
Thanks for the code release!
Heads up for other users who want to resume training from a checkpoint: you will want to
I'm not totally sure this solves everything (e.g. logging), but it might work OK.
Note: there's also a separate issue where your checkpoints may get overwritten between epochs, so make sure you're loading the right file and saving where you intend.
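For the overwriting problem, one common pattern (a sketch under my own naming, not the repo's code) is to include the epoch in the checkpoint filename and to save the optimizer state alongside the weights so resuming actually works:

```python
import os
import torch

def save_checkpoint(model, optimizer, epoch, out_dir):
    # Hypothetical sketch: include the epoch in the file name so later epochs
    # don't overwrite earlier checkpoints, and save the optimizer state so
    # training can genuinely resume, not just the model weights.
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, f"epoch_{epoch}.th")
    torch.save({"model": model.state_dict(),
                "optimizer": optimizer.state_dict(),
                "epoch": epoch}, path)
    return path
```

On resume, load the dict and restore both `model` and `optimizer` states (and the step/epoch counter) before continuing the loop.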
Hi,
Thanks for the interesting work.
May I ask when the code will be made public, and whether the parameters of BERT change during training?
Thanks.
Hello!
In the paper you write
All text data are lower-cased to align with the settings of Austin et al. (2021)
But in D3PM paper it is never stated that LM1B data was lower-cased (and you can see samples from their model in the appendix where the sentences contain upper-case characters). So the perplexity comparison seems incorrect, because it is easier to model all-lowercased text. Am I missing something?
Hi @Hzfinfdu,
May I ask where can I find more details about conditional generation? Thank you!
Dear authors,
I understand that you plan to release your code in January. But could you share more details on how you evaluate the BLEU score and PPL on the LM1B dataset? I am also working on diffusion models for text and may potentially cite your paper. Thanks!
This work is fascinating and attractive, however, I have an issue when I read the code.
Line 511 in diffusion_word_freq.py, "schedule_fn = utils.create_learning_rate_scheduler(", calls the function create_learning_rate_scheduler from utils.py, but I can't find its definition in utils.py. Maybe the code is incomplete?
Thanks.
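While waiting for the missing definition, a stand-in is possible: warmup followed by linear decay is the most common schedule for BERT-style training. To be clear, this is entirely an assumption; the authors' create_learning_rate_scheduler may compute something different.

```python
def create_learning_rate_scheduler(base_lr=1e-4, warmup_steps=1000, total_steps=100000):
    # Stand-in sketch only: the repo's create_learning_rate_scheduler is
    # missing from utils.py, so this reproduces a generic warmup + linear
    # decay schedule; the real function may differ.
    def schedule_fn(step):
        if step < warmup_steps:
            # Linear warmup from 0 to base_lr.
            return base_lr * step / max(1, warmup_steps)
        # Linear decay from base_lr down to 0 at total_steps.
        remaining = max(0, total_steps - step)
        return base_lr * remaining / max(1, total_steps - warmup_steps)
    return schedule_fn
```

The returned `schedule_fn(step)` yields a learning rate per optimizer step, matching how the call site assigns it to `schedule_fn`.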
Very interesting work! Curious whether there is a plan to release the code soon?
When I run predict.py, the error 'No module named 'perplexity'' appears. How can I install this library?
Hello, I saw in your paper the new Spindle noise schedule you proposed. I understand that it first requires computing word frequencies with word_freq.py, which generates a .pt file. But I found that even without running the word-frequency script, I can still carry out the later training steps, and I don't see where the training code loads the .pt frequency file. My question is: how do I add noise using your Spindle schedule?
Hi, dear authors:
1. In the q(x_t | x_0) part of the paper, do you avoid computing the denominator?
2. Why is the predicted x0 a floating-point vector rather than mapped to one-hot?
3. step = t - 1 and step = t + 1 appear frequently in your code. Do they have a specific meaning? Thank you.
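If question 1 refers to the posterior used in discrete diffusion (my reading of the original phrasing, so treat this as an assumption), the denominator in question would be the q(x_t | x_0) term of the D3PM-style posterior:

```latex
q(x_{t-1} \mid x_t, x_0)
  = \frac{q(x_t \mid x_{t-1})\, q(x_{t-1} \mid x_0)}{q(x_t \mid x_0)}
```

Since this posterior is a distribution over x_{t-1}, implementations often compute only the numerator and renormalize, which makes the denominator unnecessary to evaluate explicitly.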
Dear authors,
Thanks for open-sourcing your wonderful work.
You mention GPT in Figure 3 when comparing the Pareto front across different models ("AR models of the same size"). May I ask whether this is a pre-trained GPT (e.g. GPT2-small) fine-tuned on the LM1B dataset, or a model with the GPT architecture trained from scratch on the LM1B training set?
It seems to be a pre-trained language model. Could I train on a large amount of unconditional text to get a checkpoint, then fine-tune the model on seq2seq tasks?
Thank you!
Thanks
Hi,
Thank you for the amazing work.
As I read your paper and noticed that you had reimplemented Diffusion-LM, do you have any plan to open source your implementation as well?
Thanks.
python predict_downstream_condition.py --ckpt_path model_name_roberta-base_taskname_qqp_lr_3e-05_seed_42_numsteps_2000_sample_Categorical_schedule_mutual_hybridlambda_0.0003_wordfreqlambda_0.0_fromscratch_False_timestep_none_ckpts/best(38899).th
using standard schedule with num_steps: 2000.
Traceback (most recent call last):
File "predict_downstream_condition.py", line 101, in
model.load_state_dict(ckpt['model'])
File "/opt/conda/envs/diff/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1672, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for RobertaForMaskedLM:
Missing key(s) in state_dict: "roberta.embeddings.position_ids", "roberta.embeddings.word_embeddings.weight", "roberta.embeddings.position_embeddings.weight", "roberta.embeddings.token_type_embeddings.weight", "roberta.embeddings.LayerNorm.weight", "roberta.embeddings.LayerNorm.bias", "roberta.encoder.layer.0.attention.self.query.weight", "roberta.encoder.layer.0.attention.self.query.bias".........................
Hi,
I tried to train a model on a different dataset, but the loss doesn't change much. I wonder if you could release the checkpoints, so I could first load the model and then fine-tune it on my own dataset?
Thanks
Dear authors,
Thank you for your paper. It was quite illuminating.
Your proposed noise schedule requires the entropy value of each word/token before noising, but I couldn't find how you calculated it. Is it per sentence, per n-gram, per corpus, etc.? Did you use any libraries to calculate it, or was it done manually?
Thank you for your time.
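One plausible reading, given that the repo's word_freq.py produces corpus-level token counts, is that the "entropy" is each token's information content under the corpus unigram distribution. This is a guess, not something confirmed by the paper, but it can be sketched in a few lines:

```python
import math
from collections import Counter

def token_surprisal(corpus_token_ids):
    # A guess at one plausible procedure (not confirmed by the paper):
    # estimate each token's information content, -log p(token), from
    # corpus-level unigram frequencies -- exactly the counts that a
    # word_freq.py-style pass over the corpus would provide.
    counts = Counter(corpus_token_ids)
    total = sum(counts.values())
    return {tok: -math.log(c / total) for tok, c in counts.items()}

# Usage: rarer tokens get higher surprisal (more "information").
s = token_surprisal([1, 1, 1, 2])
```

Under this reading the value is per-corpus, not per-sentence, which would answer the granularity part of the question.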