dessurt's People

Contributors

herobd · kexiii · stewartsetha · victoresque

dessurt's Issues

trainer.py file

/content/dessurt/train.py in main(rank, config, resume, world_size)
136 print("Begin training")
137 #warnings.filterwarnings("error")
--> 138 trainer.train()
139
140

/content/dessurt/base/base_trainer.py in train(self)
352
353 if result is None:
--> 354 result = self._train_iteration(self.iteration)
355 #if self.retry_count>1:
356 # print('Failed all {} times!'.format(self.retry_count))

/content/dessurt/trainer/qa_trainer.py in _train_iteration(self, iteration)
130 batch_idx = (iteration-1) % len(self.data_loader)
131 try:
--> 132 thisInstance = self.data_loader_iter.next()
133 except StopIteration:
134 self.data_loader_iter = iter(self.data_loader)

AttributeError: '_MultiProcessingDataLoaderIter' object has no attribute 'next'

I am facing the above error when fine-tuning the Dessurt model.
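For reference, a likely cause: PyTorch 1.13 removed the `.next()` method from DataLoader iterators in favor of the built-in `next()`. A minimal sketch of the pattern from `qa_trainer.py`, shown with a plain iterator so it runs standalone:

```python
# PyTorch >= 1.13 removed Iterator.next(); the builtin next() works on all
# versions. This mirrors the try/except pattern in qa_trainer.py.
def fetch_instance(loader_iter, loader):
    try:
        return next(loader_iter), loader_iter   # was: loader_iter.next()
    except StopIteration:
        loader_iter = iter(loader)              # restart the epoch
        return next(loader_iter), loader_iter

data = ["batch0", "batch1"]
it = iter(data)
instance, it = fetch_instance(it, data)
print(instance)  # batch0
```

The same one-line change (`self.data_loader_iter.next()` → `next(self.data_loader_iter)`) should apply at qa_trainer.py line 132.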

Release code and datasets

Hi,

Thank you for sharing this interesting paper. I was wondering if there is an expected date for releasing the code and the datasets for pre-training Dessurt.

Details of rendering synthetic forms

Hi,

Thanks for releasing the great code! I'm trying to pre-train a Dessurt-like model on your provided pre-training datasets. However, I couldn't find how to render the synthetic forms generated from GPT-2.

Could you explain it or provide the code?

Missing metadata in DocVQA validation/evaluation process

Hi, thanks for releasing your great work!
By the way, I'm currently having trouble with training and evaluating on the official DocVQA dataset.
Here's the error message I got.

start valid loop
Traceback (most recent call last):
  File "train.py", line 236, in <module>
    main(None,config, args.resume)
  File "train.py", line 138, in main
    trainer.train()
  File "/root/dessurt/base/base_trainer.py", line 428, in train
    val_result = self._valid_epoch()
  File "/root/dessurt/trainer/qa_trainer.py", line 259, in _valid_epoch
    losses,log_run, out = self.run(instance,valid=True)
  File "/root/dessurt/trainer/qa_trainer.py", line 693, in run
    assert len(b_metadata['all_answers'])==1
TypeError: 'NoneType' object is not subscriptable

From debugging, I found that the metadata is missing for the DocVQA dataset.
After I changed the following code line, I could finally evaluate the model on the official DocVQA dataset:

return None,None,None,None,qa

=>

return None,None,None,form_metadata,qa

Could you confirm that this modification is what the code originally intended?
Thank you! :)
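The TypeError above follows directly from the dataset returning `None` in the metadata slot, which the trainer then subscripts. A minimal sketch of the mechanism and the fix (function names are illustrative, not the actual dataset code):

```python
def get_item_broken():
    qa = [("question", "answer")]
    return None, None, None, None, qa           # metadata slot is None

def get_item_fixed(form_metadata):
    qa = [("question", "answer")]
    return None, None, None, form_metadata, qa  # pass metadata through

# The trainer does b_metadata['all_answers'], so None raises TypeError:
*_, b_metadata, qa = get_item_broken()
try:
    b_metadata['all_answers']
except TypeError as e:
    print(e)  # 'NoneType' object is not subscriptable

*_, b_metadata, qa = get_item_fixed({'all_answers': ['answer']})
assert len(b_metadata['all_answers']) == 1      # the trainer's assertion now holds
```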

Multi-Language support

Hi~ Thanks for your great work! Does your pre-trained model support Chinese or other languages?

Colab Demo

Thank you for publishing this work!
The effort you did for the creation of the synthetic dataset is really great!

I tried to run the colab demo -
"Running Dessurt fine-tuned on DocVQA interactively: https://colab.research.google.com/drive/1rvjBv70Cguigp5Egay6VnuO-ZYgu24Ax?usp=sharing"

There is an error downloading the file with gdown, tried with
!gdown --fuzzy "https://drive.google.com/file/d/1Lj6xMvQcF9dSCxVQS2nia4SiEoPXbtCv/view?usp=sharing" -O dessurt_docvqa_best.pth

but i get following error:

Access denied with the following error:
Cannot retrieve the public link of the file. You may need to change
the permission to 'Anyone with the link', or have had many accesses.
You may still be able to access the file from the browser:

Could you possibly change the permissions on the weights? Downloading the file and re-uploading it to Colab takes an excruciatingly long time...

thank you!

DocVQA data and its format

Hi, Thanks for sharing your valuable work.
I am trying to fine-tune the Dessurt model on my own dataset, which contains receipt-like documents. What is the exact supervision I need to provide? Do I have to give bounding boxes of the answers, etc.? I would be very happy if you could share the dataset you used for the DocVQA task, or at least its format.

Thanks in advance
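For anyone with the same question: the public DocVQA annotations are JSON records pairing each question with an image and a list of acceptable answer strings, with no bounding boxes. A hedged sketch of one entry (field names follow the official DocVQA release; Dessurt's loader may expect a different layout):

```python
import json

# Hedged sketch of a DocVQA-style annotation entry. Field names follow the
# public DocVQA release; they are NOT guaranteed to match Dessurt's loader.
entry = {
    "questionId": 1,
    "question": "What is the total amount?",
    "image": "documents/receipt_001.png",
    "answers": ["$12.50"],        # several acceptable strings are allowed
}
print(json.dumps(entry, indent=2))
```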

IIT-CDIP annotation

Hey, Thanks for the great work and for open sourcing it !
I was trying to use your IIT-CDIP annotations for pre-training my own model for research purposes, but the annotations you provided don't cover the whole dataset. Is this intended?
Thanks !

Inference and training are really slow

Hello,
I just finished comparing Dessurt to Donut on various datasets.
Dessurt shows better performance, but it is much slower than Donut.
Is this normal, or did I miss something?
For inference I use this code: output = net(image, [['mytasktoken']], RUN=True), where net is an instance of Dessurt.
Thanks in advance !
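One way to make such comparisons concrete is to time the forward pass directly. A minimal sketch, where `fake_net` is an illustrative stub so the pattern runs standalone; substitute the real call `net(image, [['mytasktoken']], RUN=True)`:

```python
import time

def time_inference(model, args, kwargs, n_runs=3):
    """Average wall-clock seconds per forward pass."""
    start = time.perf_counter()
    for _ in range(n_runs):
        model(*args, **kwargs)
    return (time.perf_counter() - start) / n_runs

# Stub standing in for a loaded Dessurt model, so the sketch is runnable.
def fake_net(image, queries, RUN=False):
    return "answer"

avg = time_inference(fake_net, ("image_tensor", [["mytasktoken"]]), {"RUN": True})
print(f"{avg:.6f} s/iter")
```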

Fine-Tuning on QA with Bounding Boxes

Hi,

Thank you for making this valuable project publicly accessible. I am trying to fine-tune Dessurt on receipt-like documents with the natural_q~ task. I would like to feed bounding boxes for each question and answer, but I could not understand the bounding-box format. Judging from crop_transform.py, each bbox has 16 values. I understand the first 8 values represent the coordinates of the 4 corners. Can you explain what the next 8 are used for? Is it one bbox of 8 values for the question and one bbox of the next 8 values for the answer? If not, can you also explain how I am supposed to feed bboxes for the question and answer separately?

Thanks for your time and effort.

Inference vs training phase discrepancy

Hi @herobd,

I'm trying to fine-tune a Dessurt model on my own VQA task (predicting a few fields on proof-of-address documents, like the person's name, address, city, zip code, ...).

I've set "print_pred_every": 100 to monitor how the model behaves during the training phase. While not perfect, the model seems to give answers near the ground truth in training, e.g.

iter 498800
0 [Q]:natural_q~Quelle est la ville du consommateur ?	[A]:MONTIGNE LE BRILLANT	[P]:STTRENY LE BRILLANT
Train iteration: 498800,           mean abs grad: 0.000,
      loss: 0.466,            answerLoss: 7.456,
      score_ed: 0.308,            natural_q~_ED: 0.308,
      sec_per_iter: 0.593,            avg_mean abs grad: 0.000,
      avg_loss: 0.257,            avg_answerLoss: 4.117,
      avg_score_ed: 0.347,            avg_natural_q~_ED: 0.347,

(...)

iter 499400
0 [Q]:natural_q~Quelle est la ville du consommateur ?	[A]:AIGREFEUILLE	[P]:BEIGREFEUILLE
Train iteration: 499400,           mean abs grad: 0.000,
      loss: 0.205,            answerLoss: 3.288,
      score_ed: 0.160,            natural_q~_ED: 0.160,
      sec_per_iter: 0.589,            avg_mean abs grad: 0.000,
      avg_loss: 0.226,            avg_answerLoss: 3.621,
      avg_score_ed: 0.310,            avg_natural_q~_ED: 0.310,

However, when using the latest weights in prediction mode via the run.py script, I get very different results. For example, on the training sample whose right answer to the city question is "MONTIGNE LE BRILLANT", here is the result.

>>> main("/home/qsuser/src/dessurt/saved/dessurt_qs_dom_qa_fra_finetune/checkpoint-iteration500000.pth", "/home/qsuser/src/dessurt/configs/cf_dessurt_qs_dom_qa_finetune.json", "/home/qsuser/Work/ProofOfAddress/Data/JDD_2023_05_09/images_dessurt_questions_fra/train/875fe8f4c8e3ad9e8559956a3fcbf058/image.jpg", [], True, default_task_token="natural_q~", dont_output_mask=False)
loaded dessurt_qs_dom_qa_fra_finetune iteration 500000
Using default task token: natural_q~
 (if another token is entered with the query, the default is overridden)
Query: Quelle est la ville du consommateur ?
Answer: ST PIERRE DES CORPS

The model clearly hallucinates an answer, and the output mask seems completely random (the right answer is located in the upper-right corner, inside the address block).
(attached: output-mask image)

I have no clue why it produces better answers during the training phase. Do you have any idea?

I'm also sharing my configuration file for reference: cf_dessurt_qs_dom_qa_finetune.json

I would be thankful for any help on training dessurt :)

synthetic_text_gen

2023-02-27 16:48:56.505009: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-02-27 16:48:56.505147: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-02-27 16:48:56.505167: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
loaded iteration 0
unspecified dataset: /content/drive/MyDrive/dataset/MDdata/valid/SKMBT_75122072616550_Page_45_Image_0001.png
getting data ready
could not import datasets
Traceback (most recent call last):
  File "qa_eval.py", line 595, in <module>
    main(args.checkpoint, args.data_set_name, gpu=args.gpu, config=args.config, addToConfig=addtoconfig,test=args.test,verbose=args.verbosity,run=run,smaller_set=args.smaller_set,eval_full=args.eval_full,ner_do_before=args.ner_do_before)
  File "qa_eval.py", line 419, in main
    data_loader, valid_data_loader = getDataLoader(data_config,'train' if not test else 'test')
  File "/content/dessurt/data_loader/data_loaders.py", line 36, in getDataLoader
    from data_sets import multiple_dataset
  File "/content/dessurt/data_sets/multiple_dataset.py", line 16, in <module>
    from .synth_form_dataset import SynthFormDataset
  File "/content/dessurt/data_sets/synth_form_dataset.py", line 11, in <module>
    from .gen_daemon import GenDaemon
  File "/content/dessurt/data_sets/gen_daemon.py", line 4, in <module>
    from synthetic_text_gen import SyntheticWord
ModuleNotFoundError: No module named 'synthetic_text_gen'

I am facing the above error when I run qa_eval.py to evaluate on my dataset.
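The missing module is the author's separate text-rendering package; it appears to live at github.com/herobd/synthetic_text_gen rather than on PyPI, so installing it from that repository (treat the URL as an assumption) is the likely fix. A small check to confirm the dependency is importable before launching qa_eval.py:

```python
import importlib.util

def has_module(name):
    """True if `name` can be imported in the current environment."""
    return importlib.util.find_spec(name) is not None

# qa_eval.py imports synthetic_text_gen indirectly via data_sets/gen_daemon.py
for mod in ("synthetic_text_gen",):
    if not has_module(mod):
        print(f"missing dependency: {mod}")
```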
