herobd / dessurt
Official implementation for Dessurt
License: MIT License
/content/dessurt/train.py in main(rank, config, resume, world_size)
136 print("Begin training")
137 #warnings.filterwarnings("error")
--> 138 trainer.train()
139
140
/content/dessurt/base/base_trainer.py in train(self)
352
353 if result is None:
--> 354 result = self._train_iteration(self.iteration)
355 #if self.retry_count>1:
356 # print('Failed all {} times!'.format(self.retry_count))
/content/dessurt/trainer/qa_trainer.py in _train_iteration(self, iteration)
130 batch_idx = (iteration-1) % len(self.data_loader)
131 try:
--> 132 thisInstance = self.data_loader_iter.next()
133 except StopIteration:
134 self.data_loader_iter = iter(self.data_loader)
AttributeError: '_MultiProcessingDataLoaderIter' object has no attribute 'next'
I am facing the above error when fine-tuning the Dessurt model.
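This error comes from newer PyTorch releases removing the Python-2-style `.next()` method from DataLoader iterators; the builtin `next()` works on all versions. A minimal sketch of the fix for the `try`/`except` in `qa_trainer.py`, using a plain list as a stand-in for the real DataLoader:

```python
# Stand-in for the real DataLoader so the sketch is runnable on its own.
data_loader = [{"img": 0}, {"img": 1}]
data_loader_iter = iter(data_loader)

def get_next_instance():
    """Fetch the next batch, restarting the iterator when an epoch ends."""
    global data_loader_iter
    try:
        return next(data_loader_iter)          # was: data_loader_iter.next()
    except StopIteration:
        data_loader_iter = iter(data_loader)   # epoch finished: restart
        return next(data_loader_iter)
```

The same one-token change (`self.data_loader_iter.next()` to `next(self.data_loader_iter)`) applies at line 132 of `trainer/qa_trainer.py` shown in the traceback.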
Hi,
Thank you for sharing this interesting paper. I was wondering if there is an expected date for when you will release your code and datasets for pre-training Dessurt.
Hi,
Thanks for releasing the great code! I'm trying to pre-train a Dessurt-like model on your provided pre-training datasets. However, I couldn't find how to render the synthetic forms generated from GPT-2.
Could you explain it or provide the code?
Hi, thanks for releasing your great work!
By the way, I'm currently having trouble training and evaluating on the official DocVQA dataset.
Here's the error message I got.
start valid loop
Traceback (most recent call last):
File "train.py", line 236, in <module>
main(None,config, args.resume)
File "train.py", line 138, in main
trainer.train()
File "/root/dessurt/base/base_trainer.py", line 428, in train
val_result = self._valid_epoch()
File "/root/dessurt/trainer/qa_trainer.py", line 259, in _valid_epoch
losses,log_run, out = self.run(instance,valid=True)
File "/root/dessurt/trainer/qa_trainer.py", line 693, in run
assert len(b_metadata['all_answers'])==1
TypeError: 'NoneType' object is not subscriptable
From debugging, I found that the metadata is missing for the DocVQA dataset.
After I fixed the following code line, I could finally evaluate the model on the official DocVQA dataset:
Line 57 in 662fbcb
return None,None,None,form_metadata,qa
Could you confirm that this modification is what the code originally intended?
Thank you! :)
Hi~ Thanks for your great work! Does your pre-trained model support Chinese or other languages?
Thank you for publishing this work!
The effort you put into creating the synthetic dataset is really impressive!
I tried to run the colab demo -
"Running Dessurt fine-tuned on DocVQA interactively: https://colab.research.google.com/drive/1rvjBv70Cguigp5Egay6VnuO-ZYgu24Ax?usp=sharing"
There is an error downloading the file with gdown. I tried
!gdown --fuzzy "https://drive.google.com/file/d/1Lj6xMvQcF9dSCxVQS2nia4SiEoPXbtCv/view?usp=sharing" -O dessurt_docvqa_best.pth
but I get the following error:
Access denied with the following error:
Cannot retrieve the public link of the file. You may need to change
the permission to 'Anyone with the link', or have had many accesses.
You may still be able to access the file from the browser:
Could you possibly change the permission on the weights? Downloading and re-uploading them to Colab takes an excruciatingly long time...
thank you!
Hi, Thanks for sharing your valuable work.
I'm trying to fine-tune the Dessurt model on my own dataset, which contains receipt-like documents. What is the exact supervision that I need to provide? Do I have to give bounding boxes for the answers, etc.? I would be very happy if you could share the dataset you used for the DocVQA task, or at least its format.
Thanks in advance
Hey, thanks for the great work and for open-sourcing it!
I was trying to use your IIT-CDIP annotations for pre-training my own model for research purposes, but the annotations you provided don't cover the entire dataset. Is this intended?
Thanks!
Hello,
Is there any explanation for why the results reported for Donut in the paper are much lower than those found in Donut's original paper on DocVQA?
Did you use a different resolution? Or was the version of Donut used a previous one?
Thanks!
Hello,
I just finished comparing Dessurt to Donut on various datasets.
Dessurt shows better performance, but it is much slower than Donut.
Is this normal, or did I miss something?
For inference I use this code: output = net(image, [['mytasktoken']], RUN=True), where net is an instance of Dessurt.
Thanks in advance!
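To make such speed comparisons reproducible, a minimal timing sketch in plain Python can help (generic code, not tied to Dessurt's API; the commented usage line assumes the inference call shown above):

```python
import time

def time_inference(fn, n_warmup=2, n_runs=10):
    """Return the average wall-clock seconds per call of fn()."""
    for _ in range(n_warmup):            # warm-up: caches, lazy initialization
        fn()
    start = time.perf_counter()
    for _ in range(n_runs):
        fn()
    return (time.perf_counter() - start) / n_runs

# Hypothetical usage with a Dessurt instance `net` and a prepared `image`:
# latency = time_inference(lambda: net(image, [['mytasktoken']], RUN=True))
```

Averaging over several runs after a warm-up avoids attributing one-time setup cost (e.g. CUDA kernel compilation) to the model itself.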
Hi,
Thank you for making this valuable project publicly accessible. I am trying to fine-tune Dessurt on receipt-like documents with the natural_q~ task. I would like to feed bounding boxes for each question and answer, but I could not understand the bounding-box format. Looking at crop_transform.py, each bbox seems to have 16 values. I understand the first 8 values represent the coordinates of the 4 corners. Can you explain what the next 8 are used for? Is it one bbox of 8 values for the question and another 8 for the answer? If not, can you also explain how I am supposed to feed bboxes for the question and answer separately?
Thanks for your time and effort.
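For reference, the first 8 values of such a bbox (the part the question above already identifies as the 4 corners) can be built from an axis-aligned rectangle as below. This is a hypothetical helper, not code from the repository, and the top-left/top-right/bottom-right/bottom-left ordering is an assumption:

```python
def corners_from_rect(x, y, w, h):
    """Hypothetical helper: expand an axis-aligned box into the 8 corner
    values (x,y pairs, assumed order tl, tr, br, bl) that appear to form
    the first half of a 16-value bbox in crop_transform.py."""
    return [x, y,            # top-left
            x + w, y,        # top-right
            x + w, y + h,    # bottom-right
            x, y + h]        # bottom-left
```

The meaning of the remaining 8 values is exactly what the question asks the author to clarify, so no guess is made here.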
Hi @herobd,
I'm trying to fine-tune a Dessurt model on my own VQA task (predicting a few fields on proof-of-address documents, such as the person's name, address, city, zip code, ...).
I've set "print_pred_every": 100 to control how the model behaves during training. While not perfect, the model seems to give answers near the ground truth in the training phase, e.g.
iter 498800
0 [Q]:natural_q~Quelle est la ville du consommateur ? [A]:MONTIGNE LE BRILLANT [P]:STTRENY LE BRILLANT
Train iteration: 498800, mean abs grad: 0.000,
loss: 0.466, answerLoss: 7.456,
score_ed: 0.308, natural_q~_ED: 0.308,
sec_per_iter: 0.593, avg_mean abs grad: 0.000,
avg_loss: 0.257, avg_answerLoss: 4.117,
avg_score_ed: 0.347, avg_natural_q~_ED: 0.347,
(...)
iter 499400
0 [Q]:natural_q~Quelle est la ville du consommateur ? [A]:AIGREFEUILLE [P]:BEIGREFEUILLE
Train iteration: 499400, mean abs grad: 0.000,
loss: 0.205, answerLoss: 3.288,
score_ed: 0.160, natural_q~_ED: 0.160,
sec_per_iter: 0.589, avg_mean abs grad: 0.000,
avg_loss: 0.226, avg_answerLoss: 3.621,
avg_score_ed: 0.310, avg_natural_q~_ED: 0.310,
However, when using the latest weights in prediction mode with the run.py script, I get very different results. For example, on the training sample whose correct answer to the city question is "MONTIGNE LE BRILLANT", here's the result:
>>> main("/home/qsuser/src/dessurt/saved/dessurt_qs_dom_qa_fra_finetune/checkpoint-iteration500000.pth", "/home/qsuser/src/dessurt/configs/cf_dessurt_qs_dom_qa_finetune.json", "/home/qsuser/Work/ProofOfAddress/Data/JDD_2023_05_09/images_dessurt_questions_fra/train/875fe8f4c8e3ad9e8559956a3fcbf058/image.jpg", [], True, default_task_token="natural_q~", dont_output_mask=False)
loaded dessurt_qs_dom_qa_fra_finetune iteration 500000
Using default task token: natural_q~
(if another token is entered with the query, the default is overridden)
Query: Quelle est la ville du consommateur ?
Answer: ST PIERRE DES CORPS
The model clearly hallucinates an answer, and the output mask seems to be completely random (the correct answer is located in the upper-right corner, inside the address block).
I have no clue why the model seems to produce better answers during the training phase. Do you have any idea?
I'm also sharing my configuration file for reference: cf_dessurt_qs_dom_qa_finetune.json
I would be thankful for any help on training dessurt :)
2023-02-27 16:48:56.505009: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-02-27 16:48:56.505147: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-02-27 16:48:56.505167: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
loaded iteration 0
unspecified dataset: /content/drive/MyDrive/dataset/MDdata/valid/SKMBT_75122072616550_Page_45_Image_0001.png
getting data ready
could not import datasets
Traceback (most recent call last):
File "qa_eval.py", line 595, in <module>
main(args.checkpoint, args.data_set_name, gpu=args.gpu, config=args.config, addToConfig=addtoconfig,test=args.test,verbose=args.verbosity,run=run,smaller_set=args.smaller_set,eval_full=args.eval_full,ner_do_before=args.ner_do_before)
File "qa_eval.py", line 419, in main
data_loader, valid_data_loader = getDataLoader(data_config,'train' if not test else 'test')
File "/content/dessurt/data_loader/data_loaders.py", line 36, in getDataLoader
from data_sets import multiple_dataset
File "/content/dessurt/data_sets/multiple_dataset.py", line 16, in <module>
from .synth_form_dataset import SynthFormDataset
File "/content/dessurt/data_sets/synth_form_dataset.py", line 11, in <module>
from .gen_daemon import GenDaemon
File "/content/dessurt/data_sets/gen_daemon.py", line 4, in <module>
from synthetic_text_gen import SyntheticWord
ModuleNotFoundError: No module named 'synthetic_text_gen'
I am facing the above error when I run qa_eval.py to evaluate the dataset.
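The missing module appears to be the author's separate text-generation package rather than anything on PyPI. Assuming it lives in a `synthetic_text_gen` repository under the same GitHub account (an assumption, not confirmed here), installing it directly from GitHub should resolve the import:

```shell
# Assumed location of the package (same author); adjust the URL if the repo differs.
pip install git+https://github.com/herobd/synthetic_text_gen.git
```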