
TabIQA: Table Questions Answering on Business Document Images

These are the instructions for reproducing the TabIQA experiments on VQAonBD 2023.


Install

Install itabqa

git clone https://github.com/phucty/itabqa.git
cd itabqa

conda create -n itabqa python=3.8
conda activate itabqa
pip install poetry
poetry shell
poetry install

Install MTL-TabNet

git clone https://github.com/phucty/MTL-TabNet.git

Please follow the MTL-TabNet instructions to install the module.

Install OmniTab

git clone https://github.com/phucty/OmniTab.git

Please follow the OmniTab instructions to install the tool. You might need to install OmniTab in a separate conda environment with a different PyTorch version.


Configure itabqa

Please set the working directories to match your environment in the itabqa/config.py file:

  • HOME_ROOT: the itabqa project directory

    e.g., /home/phuc/itabqa

  • DATA_ROOT: stores models and datasets

    e.g., /disks/strg16-176/VQAonBD2023/data

  • DATA_VQA: the VQAonBD 2023 dataset

    e.g., {DATA_ROOT}/vqaondb2023

  • DATA_PUBTAB: HTML tables inferred by table structure extraction

    e.g., {DATA_ROOT}/TR_output
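
For reference, here is a minimal sketch of what these settings in itabqa/config.py could look like, using the example paths above (the actual config.py in the repository may define them differently):

# itabqa/config.py -- example values only; adjust to your environment
HOME_ROOT = "/home/phuc/itabqa"                    # itabqa project directory
DATA_ROOT = "/disks/strg16-176/VQAonBD2023/data"   # models and datasets
DATA_VQA = f"{DATA_ROOT}/vqaondb2023"              # VQAonBD 2023 dataset
DATA_PUBTAB = f"{DATA_ROOT}/TR_output"             # inferred HTML tables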


Table structure extraction

Please install MTL-TabNet (the table structure extraction module) on your GPU server and download the checkpoint file here. Run the following command to generate HTML tables from document images (you can change the paths to the input images, the outputs, and the checkpoint file):

CUDA_VISIBLE_DEVICES=0 python3 -u ./table_recognition/table_inference_VQAonBD2023_inference.py 1 0

You can run the script on multiple GPUs (e.g., 4 GPUs) with the following commands:

CUDA_VISIBLE_DEVICES=0 python3 -u ./table_recognition/table_inference_VQAonBD2023_inference.py 4 0
CUDA_VISIBLE_DEVICES=1 python3 -u ./table_recognition/table_inference_VQAonBD2023_inference.py 4 1
CUDA_VISIBLE_DEVICES=2 python3 -u ./table_recognition/table_inference_VQAonBD2023_inference.py 4 2
CUDA_VISIBLE_DEVICES=3 python3 -u ./table_recognition/table_inference_VQAonBD2023_inference.py 4 3
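
The two positional arguments are not documented here; a plausible reading (an assumption, not confirmed by the script) is that they give the total number of shards and the zero-based shard index, with each process handling every N-th image:

# Hypothetical sketch of how the two CLI arguments might partition the input
# images across processes; the actual inference script may work differently.
import sys
from pathlib import Path

num_shards, shard_index = int(sys.argv[1]), int(sys.argv[2])
images = sorted(Path("input_images").glob("*.png"))  # illustrative input directory
for image in images[shard_index::num_shards]:        # every num_shards-th image
    ...  # run MTL-TabNet structure inference on this image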

Generate training samples for QA model

python run_gen_training_samples.py
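
As a rough sketch of the output (the exact schema is an assumption; inspect run_gen_training_samples.py for the real fields), each generated sample pairs a question with its extracted table and answer in a TAPEX/OmniTab-style JSON record:

# Hypothetical training record; the schema actually emitted by
# run_gen_training_samples.py may differ.
import json

sample = {
    "question": "What is the total revenue in 2020?",
    "table": {
        "header": ["Year", "Revenue"],
        "rows": [["2019", "1,200"], ["2020", "1,500"]],
    },
    "answers": ["1,500"],
}

with open("train_all_raw.json", "a") as f:  # one JSON record per line
    f.write(json.dumps(sample) + "\n")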

Fine-tune with OmniTab

Note: We fine-tune OmniTab on 4× A100 40GB GPUs. If you have V100s, please change per_device_train_batch_size and per_device_eval_batch_size to 6.

cd OmniTab
conda activate omnitab

python -m torch.distributed.launch --nproc_per_node=4 run.py \
    --do_train \
    --train_file /disks/strg16-176/VQAonBD2023/data/train_all_raw.json \
    --validation_file /disks/strg16-176/VQAonBD2023/data/train_100_raw.json \
    --model_name_or_path neulab/omnitab-large \
    --output_dir /disks/strg16-176/VQAonBD2023/models/omnitab-large-finetuned-qa-all-raw \
    --max_source_length 1024 \
    --max_target_length 128 \
    --val_max_target_length 128 \
    --per_device_train_batch_size 12 \
    --gradient_accumulation_steps 2 \
    --per_device_eval_batch_size 12 \
    --num_train_epochs 50.0 \
    --warmup_ratio 0.1 \
    --learning_rate 2e-5 \
    --fp16 \
    --logging_steps 100 \
    --eval_steps 1000000 \
    --save_steps 50000 \
    --evaluation_strategy steps \
    --predict_with_generate \
    --num_beams 5 \
    --generation_max_length 128 \
    --overwrite_output_dir
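
For reference, with 4 GPUs, per_device_train_batch_size 12, and gradient_accumulation_steps 2, the effective training batch size is 4 × 12 × 2 = 96; on V100s with a per-device batch size of 6, raising gradient_accumulation_steps to 4 would keep the same effective batch size.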

Run QA

The QA models are implemented in itabqa/qa.py. After fine-tuning as in the previous example, the model is saved in /disks/strg16-176/VQAonBD2023/models/omnitab-large-finetuned-qa-all-raw. The pretrained model is available here. We can run QA inference as follows:

cd ..
python run_qa_inference.py

The answers will be saved in answers/raw_3_all.
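
For a quick sanity check outside run_qa_inference.py, here is a minimal sketch of querying the fine-tuned checkpoint directly with Hugging Face Transformers (the table and question below are made up, and this assumes the checkpoint directory contains the tokenizer files saved by the training run):

# Minimal sketch: query the fine-tuned OmniTab model directly.
# The table and question are illustrative only.
import pandas as pd
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_path = "/disks/strg16-176/VQAonBD2023/models/omnitab-large-finetuned-qa-all-raw"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForSeq2SeqLM.from_pretrained(model_path)

table = pd.DataFrame({"Year": ["2019", "2020"], "Revenue": ["1,200", "1,500"]})
question = "What is the revenue in 2020?"

# OmniTab uses a TAPEX-style tokenizer that accepts a pandas table plus a query.
encoding = tokenizer(table=table, query=question, return_tensors="pt")
outputs = model.generate(**encoding, num_beams=5, max_length=128)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))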


Cite

If you find the TabIQA tool useful in your work and would like to cite it, please use the following reference:

@article{nguyen2023tabiqa,
  title={TabIQA: Table Questions Answering on Business Document Images},
  author={Nguyen, Phuc and Ly, Nam Tuan and Takeda, Hideaki and Takasu, Atsuhiro},
  journal={arXiv preprint arXiv:2303.14935},
  year={2023}
}


