herobd / dessurt
Official implementation for Dessurt
License: MIT License
/content/dessurt/train.py in main(rank, config, resume, world_size)
136 print("Begin training")
137 #warnings.filterwarnings("error")
--> 138 trainer.train()
139
140
/content/dessurt/base/base_trainer.py in train(self)
352
353 if result is None:
--> 354 result = self._train_iteration(self.iteration)
355 #if self.retry_count>1:
356 # print('Failed all {} times!'.format(self.retry_count))
/content/dessurt/trainer/qa_trainer.py in _train_iteration(self, iteration)
130 batch_idx = (iteration-1) % len(self.data_loader)
131 try:
--> 132 thisInstance = self.data_loader_iter.next()
133 except StopIteration:
134 self.data_loader_iter = iter(self.data_loader)
AttributeError: '_MultiProcessingDataLoaderIter' object has no attribute 'next'
I am facing the above error when fine-tuning the Dessurt model.
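This error comes from newer PyTorch releases removing the Python-2-style `.next()` method from DataLoader iterators; the builtin `next()` works on all versions. A minimal sketch of the fix for the `try`/`except` in `qa_trainer.py`, using a plain list as a stand-in for the real DataLoader:

```python
# Stand-in for the real DataLoader so the sketch is runnable on its own.
data_loader = [{"img": 0}, {"img": 1}]
data_loader_iter = iter(data_loader)

def get_next_instance():
    """Fetch the next batch, restarting the iterator when an epoch ends."""
    global data_loader_iter
    try:
        return next(data_loader_iter)          # was: data_loader_iter.next()
    except StopIteration:
        data_loader_iter = iter(data_loader)   # epoch finished: restart
        return next(data_loader_iter)
```

The same one-token change (`self.data_loader_iter.next()` to `next(self.data_loader_iter)`) applies at line 132 of `trainer/qa_trainer.py` shown in the traceback.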
Hi,
Thank you for sharing this interesting paper. I was wondering if there is an expected date for when you will release your code and datasets for pre-training Dessurt.
Hi,
Thanks for releasing the great code! I'm trying to pre-train a Dessurt-like model on your provided pre-training datasets. However, I couldn't find how to render the synthetic forms generated from GPT-2.
Could you explain it or provide the code?
Hi, thanks for releasing your great work!
By the way, I'm currently having trouble training and evaluating on the official DocVQA dataset.
Here's the error message I got.
start valid loop
Traceback (most recent call last):
File "train.py", line 236, in <module>
main(None,config, args.resume)
File "train.py", line 138, in main
trainer.train()
File "/root/dessurt/base/base_trainer.py", line 428, in train
val_result = self._valid_epoch()
File "/root/dessurt/trainer/qa_trainer.py", line 259, in _valid_epoch
losses,log_run, out = self.run(instance,valid=True)
File "/root/dessurt/trainer/qa_trainer.py", line 693, in run
assert len(b_metadata['all_answers'])==1
TypeError: 'NoneType' object is not subscriptable
From debugging, I found that the metadata is missing for the DocVQA dataset.
After I fixed the following code line, I could finally evaluate the model on the official DocVQA dataset:
Line 57 in 662fbcb
return None,None,None,form_metadata,qa
Could you confirm that this modification is what the code originally intended?
Thank you! :)
Hi~ Thanks for your great work! Does your pre-trained model support Chinese or other languages?
Thank you for publishing this work!
The effort you put into creating the synthetic dataset is really impressive!
I tried to run the colab demo -
"Running Dessurt fine-tuned on DocVQA interactively: https://colab.research.google.com/drive/1rvjBv70Cguigp5Egay6VnuO-ZYgu24Ax?usp=sharing"
There is an error downloading the file with gdown. I tried
!gdown --fuzzy "https://drive.google.com/file/d/1Lj6xMvQcF9dSCxVQS2nia4SiEoPXbtCv/view?usp=sharing" -O dessurt_docvqa_best.pth
but I get the following error:
Access denied with the following error:
Cannot retrieve the public link of the file. You may need to change
the permission to 'Anyone with the link', or have had many accesses.
You may still be able to access the file from the browser:
Could you possibly change the permission on the weights? Downloading and re-uploading them to Colab takes an excruciatingly long time...
thank you!
Hi, Thanks for sharing your valuable work.
I'm trying to fine-tune the Dessurt model on my own dataset, which contains receipt-like documents. What is the exact supervision that I need to provide? Do I have to give bounding boxes for the answers, etc.? I would be very happy if you could share the dataset you used for the DocVQA task, or at least its format.
Thanks in advance
Hey, thanks for the great work and for open-sourcing it!
I was trying to use your IIT-CDIP annotations for pre-training my own model for research purposes, but the annotations you provided don't cover the entire dataset. Is this intended?
Thanks!
Hello,
Is there any explanation for why the results reported for Donut in the paper are much lower than those found in Donut's original paper on DocVQA?
Did you use a different resolution? Or was the version of Donut used a previous one?
Thanks!
Hello,
I just finished comparing Dessurt to Donut on various datasets.
Dessurt shows better performance, but it is much slower than Donut.
Is this normal, or did I miss something?
For inference I use this code: output = net(image, [['mytasktoken']], RUN=True), where net is an instance of Dessurt.
Thanks in advance!
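To make such speed comparisons reproducible, a minimal timing sketch in plain Python can help (generic code, not tied to Dessurt's API; the commented usage line assumes the inference call shown above):

```python
import time

def time_inference(fn, n_warmup=2, n_runs=10):
    """Return the average wall-clock seconds per call of fn()."""
    for _ in range(n_warmup):            # warm-up: caches, lazy initialization
        fn()
    start = time.perf_counter()
    for _ in range(n_runs):
        fn()
    return (time.perf_counter() - start) / n_runs

# Hypothetical usage with a Dessurt instance `net` and a prepared `image`:
# latency = time_inference(lambda: net(image, [['mytasktoken']], RUN=True))
```

Averaging over several runs after a warm-up avoids attributing one-time setup cost (e.g. CUDA kernel compilation) to the model itself.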
Hi,
Thank you for making this valuable project publicly accessible. I am trying to fine-tune Dessurt on receipt-like documents with the natural_q~ task. I would like to feed bounding boxes for each question and answer, but I could not understand the bounding-box format. Looking at crop_transform.py, each bbox seems to have 16 values. I understand the first 8 values represent the coordinates of the 4 corners. Can you explain what the next 8 are used for? Is it one bbox of 8 values for the question and another 8 for the answer? If not, can you also explain how I am supposed to feed bboxes for the question and answer separately?
Thanks for your time and effort.
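For reference, the first 8 values of such a bbox (the part the question above already identifies as the 4 corners) can be built from an axis-aligned rectangle as below. This is a hypothetical helper, not code from the repository, and the top-left/top-right/bottom-right/bottom-left ordering is an assumption:

```python
def corners_from_rect(x, y, w, h):
    """Hypothetical helper: expand an axis-aligned box into the 8 corner
    values (x,y pairs, assumed order tl, tr, br, bl) that appear to form
    the first half of a 16-value bbox in crop_transform.py."""
    return [x, y,            # top-left
            x + w, y,        # top-right
            x + w, y + h,    # bottom-right
            x, y + h]        # bottom-left
```

The meaning of the remaining 8 values is exactly what the question asks the author to clarify, so no guess is made here.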
Hi @herobd,
I'm trying to fine-tune a Dessurt model on my own VQA task (predicting a few fields on proof-of-address documents, such as the person's name, address, city, zip code, ...).
I've set "print_pred_every": 100 to control how the model behaves during training. While not perfect, the model seems to give answers near the ground truth in the training phase, e.g.
iter 498800
0 [Q]:natural_q~Quelle est la ville du consommateur ? [A]:MONTIGNE LE BRILLANT [P]:STTRENY LE BRILLANT
Train iteration: 498800, mean abs grad: 0.000,
loss: 0.466, answerLoss: 7.456,
score_ed: 0.308, natural_q~_ED: 0.308,
sec_per_iter: 0.593, avg_mean abs grad: 0.000,
avg_loss: 0.257, avg_answerLoss: 4.117,
avg_score_ed: 0.347, avg_natural_q~_ED: 0.347,
(...)
iter 499400
0 [Q]:natural_q~Quelle est la ville du consommateur ? [A]:AIGREFEUILLE [P]:BEIGREFEUILLE
Train iteration: 499400, mean abs grad: 0.000,
loss: 0.205, answerLoss: 3.288,
score_ed: 0.160, natural_q~_ED: 0.160,
sec_per_iter: 0.589, avg_mean abs grad: 0.000,
avg_loss: 0.226, avg_answerLoss: 3.621,
avg_score_ed: 0.310, avg_natural_q~_ED: 0.310,
However, when using the latest weights in prediction mode with the run.py script, I get very different results. For example, on the training sample whose correct answer to the city question is "MONTIGNE LE BRILLANT", here's the result:
>>> main("/home/qsuser/src/dessurt/saved/dessurt_qs_dom_qa_fra_finetune/checkpoint-iteration500000.pth", "/home/qsuser/src/dessurt/configs/cf_dessurt_qs_dom_qa_finetune.json", "/home/qsuser/Work/ProofOfAddress/Data/JDD_2023_05_09/images_dessurt_questions_fra/train/875fe8f4c8e3ad9e8559956a3fcbf058/image.jpg", [], True, default_task_token="natural_q~", dont_output_mask=False)
loaded dessurt_qs_dom_qa_fra_finetune iteration 500000
Using default task token: natural_q~
(if another token is entered with the query, the default is overridden)
Query: Quelle est la ville du consommateur ?
Answer: ST PIERRE DES CORPS
The model clearly hallucinates an answer, and the output mask seems to be completely random (the correct answer is located in the upper-right corner, inside the address block).
I have no clue why the model seems to produce better answers during the training phase. Do you have any idea?
I'm also sharing my configuration file for reference: cf_dessurt_qs_dom_qa_finetune.json
I would be thankful for any help on training dessurt :)
2023-02-27 16:48:56.505009: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-02-27 16:48:56.505147: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-02-27 16:48:56.505167: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
loaded iteration 0
unspecified dataset: /content/drive/MyDrive/dataset/MDdata/valid/SKMBT_75122072616550_Page_45_Image_0001.png
getting data ready
could not import datasets
Traceback (most recent call last):
File "qa_eval.py", line 595, in <module>
main(args.checkpoint, args.data_set_name, gpu=args.gpu, config=args.config, addToConfig=addtoconfig,test=args.test,verbose=args.verbosity,run=run,smaller_set=args.smaller_set,eval_full=args.eval_full,ner_do_before=args.ner_do_before)
File "qa_eval.py", line 419, in main
data_loader, valid_data_loader = getDataLoader(data_config,'train' if not test else 'test')
File "/content/dessurt/data_loader/data_loaders.py", line 36, in getDataLoader
from data_sets import multiple_dataset
File "/content/dessurt/data_sets/multiple_dataset.py", line 16, in <module>
from .synth_form_dataset import SynthFormDataset
File "/content/dessurt/data_sets/synth_form_dataset.py", line 11, in <module>
from .gen_daemon import GenDaemon
File "/content/dessurt/data_sets/gen_daemon.py", line 4, in <module>
from synthetic_text_gen import SyntheticWord
ModuleNotFoundError: No module named 'synthetic_text_gen'
I am facing the above error when I run qa_eval.py to evaluate the dataset.
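The missing module appears to be the author's separate text-generation package rather than anything on PyPI. Assuming it lives in a `synthetic_text_gen` repository under the same GitHub account (an assumption, not confirmed here), installing it directly from GitHub should resolve the import:

```shell
# Assumed location of the package (same author); adjust the URL if the repo differs.
pip install git+https://github.com/herobd/synthetic_text_gen.git
```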