
SPADE

Introduction

  • This repository contains the source code of our publication Spatial Dependency Parsing for Semi-Structured Document Information Extraction, accepted at Findings of ACL 2021.

  • SPADE♠️ (SPAtial DEpendency parsing) accepts 2D text (text segments and their xy-coordinates).

  • SPADE♠ generates a graph that represents semi-structured documents (such as receipts, name cards, and invoices).

Task

(Task illustration: img_1.png)

Setup

  • The code is tested on an NVIDIA P40 GPU running Ubuntu 16.04.6 LTS (Xenial Xerus).
  1. conda create --name spade python==3.7.10
  2. conda activate spade
  3. git clone [this-repo]
  4. pip install -r requirements.txt
  5. Download data.tar.gz from here. The file also includes the small model trained on the CORD dataset (a quick sanity-check sketch for the extracted files follows this list).

     mv data.tar.gz [project-dir]
     tar xvfz data.tar.gz

  6. Download the pretrained multilingual BERT.

     cd scripts
     python download_pretrained_models.py

  7. Test the code with the sample data (input: ./data/samples/cord_predict.json).

     bash scripts/predict_cord.sh

  8. (Optional) Download the FUNSD dataset.

     bash scripts/preprocess_funsd.sh
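
As a quick sanity check after extracting data.tar.gz, a minimal Python sketch along these lines can confirm the sample files are in place (the paths are the ones mentioned in this README; adjust them if your checkout differs):

import os

# Paths mentioned in this README; adjust if your checkout differs.
expected = [
    "data/samples/cord_predict.json",  # sample prediction input
    "data/sample/cord_dev.jsonl",      # CORD dev example (type1)
]

for path in expected:
    print(("found:   " if os.path.exists(path) else "MISSING: ") + path)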

Data

Input (type1)

  • Example from CORD-dev (data/sample/cord_dev.jsonl)
    {
      "data_id": 0, 
      "fields": ["menu.cnt", "menu.discountprice", "menu.itemsubtotal", "menu.nm", "menu.price", "menu.sub_cnt", "menu.sub_nm", "menu.sub_price", "menu.unitprice", "menu.sub_num", "menu.discountprice", "menu.num", "menu.sub_discountprice", "menu.sub_etc", "menu.etc", "menu.vatyn", "menu.itemsubtotal", "menu.sub_unitprice", "sub_total.discount_price", "sub_total.service_price", "sub_total.subtotal_price", "sub_total.tax_price", "sub_total.tax_and_service", "sub_total.etc", "sub_total.othersvc_price", "total.total_price", "total.menuqty_cnt", "total.total_etc", "total.emoneyprice", "total.menutype_cnt", "total.cashprice", "total.changeprice", "total.creditcardprice", "void_menu.nm", "void_menu.cnt", "void_menu.price", "void_menu.unitprice", "void_total.total_price", "void_total.subtotal_price", "void_total.tax_price", "void_total.etc"],
      "field_rs": ["menu.nm", "sub_total.subtotal_price", "total.total_price", "void_menu.nm", "void_total.total_price"], 
      "text": ["1", "REAL", "GANACHE", "16,500", ...]  
      "label": [[[1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]], 
      "coord": [[[176, 556], [194, 556], [194, 586], [176, 586]], [[202, 554], [266, 554], [266, 586], [202, 586]], [[272, 554], [372, 554], [372, 586], [272, 586]], [[580, 552], [664, 552], [664, 584], [580, 584]], [[176, 590], [194, 590], [194, 620], [176, 620]], [[204, 588], [252, 588], [252, 620], [204, 620]], [[258, 588], [320, 588], [320, 618], [258, 618]], [[580, 586], [664, 586], [664, 618], [580, 618]], [[176, 624], [194, 624], [194, 654], [176, 654]], [[202, 622], [280, 622], [280, 654], [202, 654]], [[286, 620], [360, 620], [360, 652], [286, 652]], [[580, 620], [666, 620], [666, 650], [580, 650]], [[200, 686], [348, 686], [348, 748], [200, 748]], [[498, 684], [670, 684], [670, 746], [498, 746]], [[202, 746], [266, 746], [266, 778], [202, 778]], [[580, 740], [668, 740], [668, 770], [580, 770]], [[195, 779], [375, 770], [378, 833], [198, 841]], [[524, 772], [672, 772], [672, 834], [524, 834]]], 
      "vertical": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 
      "img_sz": {"width": 864, "height": 1296}, 
      "img_feature": null, 
      "img_url": null
    }
    
    • fields: a list of field types to be parsed.
    • field_rs: a list of representative field types that are used for inter-field grouping.
    • text: a list of text segments.
    • label: [label-s, label-g]
    • label-s: (n_field + n_text) x n_text adjacency matrix expressing rel-s (serialization). null when predicting.
    • label-g: (n_field + n_text) x n_text adjacency matrix expressing rel-g (grouping). null when predicting.
    • coord: a list of xy-coords of the text boxes.
    • xy-coord: [xy-top-left, xy-top-right, xy-bottom-right, xy-bottom-left]
    • img_sz: the image size.
    • img_feature: an image feature. Currently not used.
    • img_url: an image url.
  • In the uploaded data.tar.gz, you can also find type0 data, where the data is organized to reflect its original format. In this case, raw_data_input_type should be set to type0 and label is generated while loading the data. (A minimal loading sketch for the type1 format is shown below.)
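
As mentioned above, here is a minimal loading sketch (assuming the data/sample/cord_dev.jsonl path from the example; not part of the repository) that reads one type1 record and checks that the label matrices have the documented (n_field + n_text) x n_text shape:

import json

# Read the first type1 record from the CORD dev sample.
with open("data/sample/cord_dev.jsonl") as f:
    record = json.loads(f.readline())

n_field = len(record["fields"])
n_text = len(record["text"])
label_s, label_g = record["label"]  # rel-s and rel-g adjacency matrices

# Each matrix should be (n_field + n_text) x n_text, as described above.
assert len(label_s) == n_field + n_text and len(label_g) == n_field + n_text
assert all(len(row) == n_text for row in label_s + label_g)

print(f"{n_field} fields, {n_text} text segments, "
      f"label-s: {len(label_s)} x {len(label_s[0])}")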

Test output

CORD

{
  "test__avg_loss": 0.08372728526592255,
  "test__f1": 0.9101991060544494,
  "test__precision_edge_avg": 0.932888658103816,
  "test__recall_edge_avg": 0.9192414351541544,
  "test__f1_edge_avg": 0.9259259429039002,
  "test__precision_edge_of_type_0": 0.9672624647224836,
  "test__recall_edge_of_type_0": 0.9710993577635059,
  "test__f1_edge_of_type_0": 0.9691771137713262,
  "test__precision_edge_of_type_1": 0.8985148514851485,
  "test__recall_edge_of_type_1": 0.8673835125448028,
  "test__f1_edge_of_type_1": 0.882674772036474
}
  • avg_loss: Average cross-entropy loss.
  • f1: $F_1$ of the predicted parses.
  • [precision|recall|f1]_edge_avg: Average precision, recall, and $F_1$ of dependency parsing.
  • [precision|recall|f1]_edge_of_type_[0|1]: Precision, recall, and $F_1$ of dependency parsing for the individual edge types: type 0 for rel-s and type 1 for rel-g (see the consistency check below).
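
As a quick consistency check on these numbers (not part of the repository's output), each f1_edge value is the harmonic mean of the corresponding precision and recall, e.g. for the type 0 edges above:

# f1 is the harmonic mean of precision and recall.
p = 0.9672624647224836  # test__precision_edge_of_type_0
r = 0.9710993577635059  # test__recall_edge_of_type_0
print(2 * p * r / (p + r))  # ~0.9691771, matches test__f1_edge_of_type_0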

FUNSD

In addition to the scores shown in the CORD example, the FUNSD test output includes

{
  "p_r_f1_entity": [
    [
      0.59375,
      0.3114754098360656,
      0.40860215053763443
    ],
    [
      0.8152524167561761,
      0.7047353760445683,
      0.7559760956175299
    ],
    [
      0.8589341692789969,
      0.6674786845310596,
      0.7511994516792323
    ],
    [
      0.6359447004608295,
      0.4423076923076923,
      0.5217391304347826
    ]
  ],
  "p_r_f1_all_entity_ELB": [
    0.8016216216216216,
    0.635934819897084,
    0.7092300334768054
  ],
  "p_r_f1_link_ELK": [
    0.6720977596741344,
    0.3101503759398496,
    0.42443729903536975
  ]
}
  • p_r_f1_entity: [[p_r_f1_question], [p_r_f1_answer], [p_r_f1_header], [p_r_f1_others]] for the entity labeling task.
  • p_r_f1_all_entity_ELB: Precision, recall, and $F_1$ for the FUNSD entity labeling (ELB) task over all fields.
  • p_r_f1_link_ELK: Precision, recall, and $F_1$ for the FUNSD entity linking (ELK) task over all fields.

Prediction output

{
    "data_id": "00081",
    "text_unit": ["1", "SU", "##RI", "##MI","29", ... ],
    "pr_parse": [
      [{"menu.nm": "SURIMI"}, {"menu.cnt": "1"}, {"menu.price": "29,091"}], 
      [{"menu.nm": "CREAMY CHK CLS FTC"}, {"menu.cnt": "1"}, {"menu.price": "42,727"}],
      [{"menu.nm": "MIX 4FUN CHOCOLATE"}, {"menu.cnt": "1"}], 
      [{"menu.nm": "GREEN ITSODA PITCHER"}, {"menu.price": "19,091"}, {"menu.cnt": "1"}], 
      [{"menu.nm": "SC/R GRILLED STEAK"}, {"menu.cnt": "1"}, {"menu.price": "99,091"}], 
      [{"sub_total.subtotal_price": "250,909"}, {"sub_total.tax_price": "25,091"}], 
      [{"total.total_price": "276,000"}]],
    "pr_label": [
      [
        [
          1,
          0,
          0,
          ...
         ],
         ... 
        ]
      ],
    "pr_text_unit_field_label": ["menu.cnt","menu.nm","menu.nm","menu.nm", "menu.price",...]
}
  • data_id: A data id.

  • text_unit: A list of tokens or text segments.

  • pr_parse: A predicted parse.

  • pr_label: Predicted adjacency matrices representing the dependency graph. Similar to label in the input, but each column and row represents a text unit, which is either a token or a text segment.

  • pr_text_unit_field_label: A list of field-type labels, one for each token in text_unit (see the post-processing sketch below).
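
For post-processing, the groups in pr_parse can be flattened into one dictionary per group; the sketch below uses a shortened, hypothetical prediction in the format documented above:

# Shortened, hypothetical prediction in the documented output format.
prediction = {
    "pr_parse": [
        [{"menu.nm": "SURIMI"}, {"menu.cnt": "1"}, {"menu.price": "29,091"}],
        [{"total.total_price": "276,000"}],
    ],
}

# Merge the field-value pairs of each predicted group into a single dict.
groups = [
    {field: value for item in group for field, value in item.items()}
    for group in prediction["pr_parse"]
]
print(groups)
# [{'menu.nm': 'SURIMI', 'menu.cnt': '1', 'menu.price': '29,091'},
#  {'total.total_price': '276,000'}]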

Preprocessing

CORD

  • The preprocessed data and the trained model are already included for the CORD dataset (type1).
  • To generate them from the (almost) raw data (type0), run bash scripts/preprocess_cord.sh

FUNSD

  • bash scripts/preprocess_funsd.sh

Model

Training

  • bash scripts/train_[task].sh
  • Training takes around 4 days on 6 P40 GPUs with DDP.
  • For CORD, the best model is picked using the dev set.
  • For the FUNSD task, use early stopping for model validation. When training with the uploaded config file on 6 P40 GPUs, 2000-4000 epochs are recommended (some fluctuation in the final score is expected depending on the random seed, due to the small size of the dataset). Do not use the validation score for model selection; it is a dummy value. (A purely illustrative checkpoint-selection sketch follows.)
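
A purely illustrative sketch of that fixed-epoch selection rule follows; the checkpoint directory and file naming are hypothetical assumptions, not the repository's actual layout:

import glob
import re

# Hypothetical checkpoint files named like "checkpoints/model_epoch_2500.pt";
# adapt the glob and pattern to however your run actually saves checkpoints.
candidates = []
for path in glob.glob("checkpoints/model_epoch_*.pt"):
    match = re.search(r"epoch_(\d+)", path)
    if match and 2000 <= int(match.group(1)) <= 4000:
        candidates.append((int(match.group(1)), path))

# Pick the latest checkpoint inside the recommended 2000-4000 epoch window
# instead of the (dummy) validation-score-based "best" checkpoint.
if candidates:
    epoch, best_path = max(candidates)
    print(f"selected epoch {epoch}: {best_path}")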

Evaluation

bash scripts/test_[task].sh

Prediction

bash scripts/predict_cord.sh

Citation

@inproceedings{hwang2021spade,
    title = "Spatial Dependency Parsing for Semi-Structured Document Information Extraction",
      author = {Wonseok Hwang and
               Jinyeung Yim and
               Seunghyun Park and
               Sohee Yang and
               Minjoon Seo},
    booktitle = "ACL",
    year = {2021}
}

License

Copyright 2021-present NAVER Corp.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.


spade's Issues

Error in pred_cord.sh

Thank you for sharing this nice work. I want to check the pred_cord script.
I followed all the scripts in your README file, but I'm using an RTX 3090, so I couldn't install the versions from the requirements.txt file.
It returns errors like the ones below. Could you please check this problem?
(Below is the pip list in my anaconda env.)

Package                 Version
----------------------- ---------------
absl-py                 0.13.0
addict                  2.4.0
aiohttp                 3.8.1
aiosignal               1.2.0
antlr4-python3-runtime  4.8
async-timeout           4.0.2
attrs                   21.4.0
cachetools              4.2.2
certifi                 2021.5.30
cffi                    1.14.3
charset-normalizer      2.0.4
click                   8.0.1
colorama                0.4.4
colorlog                6.4.1
cosine-annealing-warmup 2.0
cycler                  0.10.0
Cython                  0.29.24
editdistance            0.5.3
filelock                3.6.0
frozenlist              1.3.0
fsspec                  2022.1.0
future                  0.18.2
google-auth             1.35.0
google-auth-oauthlib    0.4.6
grpcio                  1.40.0
huggingface-hub         0.4.0
idna                    3.2
iniconfig               1.1.1
joblib                  1.0.1
kiwisolver              1.3.2
lmdb                    1.2.1
Markdown                3.3.4
matplotlib              3.4.3
mkl-fft                 1.3.0
mkl-random              1.2.2
mkl-service             2.4.0
mmcv                    0.2.12
multidict               6.0.2
munch                   2.5.0
nltk                    3.6.2
numpy                   1.21.2
oauthlib                3.1.1
olefile                 0.46
omegaconf               2.1.1
opencv-python           4.5.3.56
opencv-python-headless  4.5.2.54
packaging               21.3
pandas                  1.3.3
Pillow                  8.3.2
pip                     21.2.4
pluggy                  0.13.1
Polygon3                3.0.9.1
protobuf                3.18.0
py                      1.11.0
pyasn1                  0.4.8
pyasn1-modules          0.2.8
pyclipper               1.3.0
pycparser               2.20
pyDeprecate             0.3.1
pyparsing               2.4.7
pytest                  6.2.3
python-dateutil         2.8.2
pytorch-lightning       1.5.10
pytz                    2021.1
PyYAML                  5.4.1
regex                   2021.8.28
requests                2.26.0
requests-oauthlib       1.3.0
rsa                     4.7.2
sacremoses              0.0.47
scipy                   1.7.1
seaborn                 0.11.2
setuptools              59.5.0
six                     1.16.0
tensorboard             2.6.0
tensorboard-data-server 0.6.1
tensorboard-plugin-wit  1.8.0
tensorboardX            2.4
tokenizers              0.10.3
toml                    0.10.2
torch                   1.10.2+cu113
torchaudio              0.9.0a0+33b2469
torchmetrics            0.7.2
torchvision             0.11.3+cu113
tqdm                    4.62.2
transformers            4.16.2
typing-extensions       3.10.0.2
urllib3                 1.26.6
Werkzeug                2.0.1
wheel                   0.37.0
yarl                    1.7.2

[attached image: error output]

Share FUNSD trained model weights

Hi,

It is a great help that you have shared a SPADE model trained on the CORD dataset - it makes setting up and sanity checks with this codebase much easier.

Any chance you could also share the weights of a trained FUNSD model?

thanks,
Aruni

Custom Data Training

Thank you for the amazing work. The approach described in the published paper accurately addresses the problem of relationships between tokens (text detected by OCR). I want to train SPADE on a custom dataset. Could you please explain how to annotate data for training?

  • Most importantly, how do I define the field nodes (the filled blue circles)?

CORD_Process

Hello, I would really like to have the processing code that converts the original CORD data into the type0 format. Could you share it?

Updating weight name mismatch

Thank you for sharing the code !

I am experiencing some issues with the model weight update:

The weight of the pretraiend model bert-base-multilingual-cased
The # of weights 1.78e+08
pretrained bert-base-multilingual-cased is used
!!!!embeddings.position_ids model param. is not presented in child model!!!!
embeddings.word_embeddings.weight updated
!!!!embeddings.position_embeddings.weight model param. is not presented in child model!!!!
embeddings.token_type_embeddings.weight updated
embeddings.LayerNorm.weight updated
embeddings.LayerNorm.bias updated
encoder.layer.0.attention.self.query.weight updated
encoder.layer.0.attention.self.query.bias updated
encoder.layer.0.attention.self.key.weight updated
encoder.layer.0.attention.self.key.bias updated
encoder.layer.0.attention.self.value.weight updated
encoder.layer.0.attention.self.value.bias updated
encoder.layer.0.attention.output.dense.weight updated
encoder.layer.0.attention.output.dense.bias updated
encoder.layer.0.attention.output.LayerNorm.weight updated
encoder.layer.0.attention.output.LayerNorm.bias updated
encoder.layer.0.intermediate.dense.weight updated
encoder.layer.0.intermediate.dense.bias updated
encoder.layer.0.output.dense.weight updated
encoder.layer.0.output.dense.bias updated
encoder.layer.0.output.LayerNorm.weight updated
encoder.layer.0.output.LayerNorm.bias updated
encoder.layer.1.attention.self.query.weight updated
encoder.layer.1.attention.self.query.bias updated
encoder.layer.1.attention.self.key.weight updated
encoder.layer.1.attention.self.key.bias updated
encoder.layer.1.attention.self.value.weight updated
encoder.layer.1.attention.self.value.bias updated
encoder.layer.1.attention.output.dense.weight updated
encoder.layer.1.attention.output.dense.bias updated
encoder.layer.1.attention.output.LayerNorm.weight updated
encoder.layer.1.attention.output.LayerNorm.bias updated
encoder.layer.1.intermediate.dense.weight updated
encoder.layer.1.intermediate.dense.bias updated
encoder.layer.1.output.dense.weight updated
encoder.layer.1.output.dense.bias updated
encoder.layer.1.output.LayerNorm.weight updated
encoder.layer.1.output.LayerNorm.bias updated
encoder.layer.2.attention.self.query.weight updated
encoder.layer.2.attention.self.query.bias updated
encoder.layer.2.attention.self.key.weight updated
encoder.layer.2.attention.self.key.bias updated
encoder.layer.2.attention.self.value.weight updated
encoder.layer.2.attention.self.value.bias updated
encoder.layer.2.attention.output.dense.weight updated
encoder.layer.2.attention.output.dense.bias updated
encoder.layer.2.attention.output.LayerNorm.weight updated
encoder.layer.2.attention.output.LayerNorm.bias updated
encoder.layer.2.intermediate.dense.weight updated
encoder.layer.2.intermediate.dense.bias updated
encoder.layer.2.output.dense.weight updated
encoder.layer.2.output.dense.bias updated
encoder.layer.2.output.LayerNorm.weight updated
encoder.layer.2.output.LayerNorm.bias updated
encoder.layer.3.attention.self.query.weight updated
encoder.layer.3.attention.self.query.bias updated
encoder.layer.3.attention.self.key.weight updated
encoder.layer.3.attention.self.key.bias updated
encoder.layer.3.attention.self.value.weight updated
encoder.layer.3.attention.self.value.bias updated
encoder.layer.3.attention.output.dense.weight updated
encoder.layer.3.attention.output.dense.bias updated
encoder.layer.3.attention.output.LayerNorm.weight updated
encoder.layer.3.attention.output.LayerNorm.bias updated
encoder.layer.3.intermediate.dense.weight updated
encoder.layer.3.intermediate.dense.bias updated
encoder.layer.3.output.dense.weight updated
encoder.layer.3.output.dense.bias updated
encoder.layer.3.output.LayerNorm.weight updated
encoder.layer.3.output.LayerNorm.bias updated
encoder.layer.4.attention.self.query.weight updated
encoder.layer.4.attention.self.query.bias updated
encoder.layer.4.attention.self.key.weight updated
encoder.layer.4.attention.self.key.bias updated
encoder.layer.4.attention.self.value.weight updated
encoder.layer.4.attention.self.value.bias updated
encoder.layer.4.attention.output.dense.weight updated
encoder.layer.4.attention.output.dense.bias updated
encoder.layer.4.attention.output.LayerNorm.weight updated
encoder.layer.4.attention.output.LayerNorm.bias updated
encoder.layer.4.intermediate.dense.weight updated
encoder.layer.4.intermediate.dense.bias updated
encoder.layer.4.output.dense.weight updated
encoder.layer.4.output.dense.bias updated
encoder.layer.4.output.LayerNorm.weight updated
encoder.layer.4.output.LayerNorm.bias updated
!!!!encoder.layer.5.attention.self.query.weight model param. is not presented in child model!!!!
!!!!encoder.layer.5.attention.self.query.bias model param. is not presented in child model!!!!
!!!!encoder.layer.5.attention.self.key.weight model param. is not presented in child model!!!!
!!!!encoder.layer.5.attention.self.key.bias model param. is not presented in child model!!!!
!!!!encoder.layer.5.attention.self.value.weight model param. is not presented in child model!!!!
!!!!encoder.layer.5.attention.self.value.bias model param. is not presented in child model!!!!
!!!!encoder.layer.5.attention.output.dense.weight model param. is not presented in child model!!!!
!!!!encoder.layer.5.attention.output.dense.bias model param. is not presented in child model!!!!
!!!!encoder.layer.5.attention.output.LayerNorm.weight model param. is not presented in child model!!!!
!!!!encoder.layer.5.attention.output.LayerNorm.bias model param. is not presented in child model!!!!
!!!!encoder.layer.5.intermediate.dense.weight model param. is not presented in child model!!!!
!!!!encoder.layer.5.intermediate.dense.bias model param. is not presented in child model!!!!
!!!!encoder.layer.5.output.dense.weight model param. is not presented in child model!!!!
!!!!encoder.layer.5.output.dense.bias model param. is not presented in child model!!!!
!!!!encoder.layer.5.output.LayerNorm.weight model param. is not presented in child model!!!!
!!!!encoder.layer.5.output.LayerNorm.bias model param. is not presented in child model!!!!
!!!!encoder.layer.6.attention.self.query.weight model param. is not presented in child model!!!!
!!!!encoder.layer.6.attention.self.query.bias model param. is not presented in child model!!!!
!!!!encoder.layer.6.attention.self.key.weight model param. is not presented in child model!!!!
!!!!encoder.layer.6.attention.self.key.bias model param. is not presented in child model!!!!
!!!!encoder.layer.6.attention.self.value.weight model param. is not presented in child model!!!!
!!!!encoder.layer.6.attention.self.value.bias model param. is not presented in child model!!!!
!!!!encoder.layer.6.attention.output.dense.weight model param. is not presented in child model!!!!
!!!!encoder.layer.6.attention.output.dense.bias model param. is not presented in child model!!!!
!!!!encoder.layer.6.attention.output.LayerNorm.weight model param. is not presented in child model!!!!
!!!!encoder.layer.6.attention.output.LayerNorm.bias model param. is not presented in child model!!!!
!!!!encoder.layer.6.intermediate.dense.weight model param. is not presented in child model!!!!
!!!!encoder.layer.6.intermediate.dense.bias model param. is not presented in child model!!!!
!!!!encoder.layer.6.output.dense.weight model param. is not presented in child model!!!!
!!!!encoder.layer.6.output.dense.bias model param. is not presented in child model!!!!
!!!!encoder.layer.6.output.LayerNorm.weight model param. is not presented in child model!!!!
!!!!encoder.layer.6.output.LayerNorm.bias model param. is not presented in child model!!!!
!!!!encoder.layer.7.attention.self.query.weight model param. is not presented in child model!!!!
!!!!encoder.layer.7.attention.self.query.bias model param. is not presented in child model!!!!
!!!!encoder.layer.7.attention.self.key.weight model param. is not presented in child model!!!!
!!!!encoder.layer.7.attention.self.key.bias model param. is not presented in child model!!!!
!!!!encoder.layer.7.attention.self.value.weight model param. is not presented in child model!!!!
!!!!encoder.layer.7.attention.self.value.bias model param. is not presented in child model!!!!
!!!!encoder.layer.7.attention.output.dense.weight model param. is not presented in child model!!!!
!!!!encoder.layer.7.attention.output.dense.bias model param. is not presented in child model!!!!
!!!!encoder.layer.7.attention.output.LayerNorm.weight model param. is not presented in child model!!!!
!!!!encoder.layer.7.attention.output.LayerNorm.bias model param. is not presented in child model!!!!
!!!!encoder.layer.7.intermediate.dense.weight model param. is not presented in child model!!!!
!!!!encoder.layer.7.intermediate.dense.bias model param. is not presented in child model!!!!
!!!!encoder.layer.7.output.dense.weight model param. is not presented in child model!!!!
!!!!encoder.layer.7.output.dense.bias model param. is not presented in child model!!!!
!!!!encoder.layer.7.output.LayerNorm.weight model param. is not presented in child model!!!!
!!!!encoder.layer.7.output.LayerNorm.bias model param. is not presented in child model!!!!
!!!!encoder.layer.8.attention.self.query.weight model param. is not presented in child model!!!!
!!!!encoder.layer.8.attention.self.query.bias model param. is not presented in child model!!!!
!!!!encoder.layer.8.attention.self.key.weight model param. is not presented in child model!!!!
!!!!encoder.layer.8.attention.self.key.bias model param. is not presented in child model!!!!
!!!!encoder.layer.8.attention.self.value.weight model param. is not presented in child model!!!!
!!!!encoder.layer.8.attention.self.value.bias model param. is not presented in child model!!!!
!!!!encoder.layer.8.attention.output.dense.weight model param. is not presented in child model!!!!
!!!!encoder.layer.8.attention.output.dense.bias model param. is not presented in child model!!!!
!!!!encoder.layer.8.attention.output.LayerNorm.weight model param. is not presented in child model!!!!
!!!!encoder.layer.8.attention.output.LayerNorm.bias model param. is not presented in child model!!!!
!!!!encoder.layer.8.intermediate.dense.weight model param. is not presented in child model!!!!
!!!!encoder.layer.8.intermediate.dense.bias model param. is not presented in child model!!!!
!!!!encoder.layer.8.output.dense.weight model param. is not presented in child model!!!!
!!!!encoder.layer.8.output.dense.bias model param. is not presented in child model!!!!
!!!!encoder.layer.8.output.LayerNorm.weight model param. is not presented in child model!!!!
!!!!encoder.layer.8.output.LayerNorm.bias model param. is not presented in child model!!!!
!!!!encoder.layer.9.attention.self.query.weight model param. is not presented in child model!!!!
!!!!encoder.layer.9.attention.self.query.bias model param. is not presented in child model!!!!
!!!!encoder.layer.9.attention.self.key.weight model param. is not presented in child model!!!!
!!!!encoder.layer.9.attention.self.key.bias model param. is not presented in child model!!!!
!!!!encoder.layer.9.attention.self.value.weight model param. is not presented in child model!!!!
!!!!encoder.layer.9.attention.self.value.bias model param. is not presented in child model!!!!
!!!!encoder.layer.9.attention.output.dense.weight model param. is not presented in child model!!!!
!!!!encoder.layer.9.attention.output.dense.bias model param. is not presented in child model!!!!
!!!!encoder.layer.9.attention.output.LayerNorm.weight model param. is not presented in child model!!!!
!!!!encoder.layer.9.attention.output.LayerNorm.bias model param. is not presented in child model!!!!
!!!!encoder.layer.9.intermediate.dense.weight model param. is not presented in child model!!!!
!!!!encoder.layer.9.intermediate.dense.bias model param. is not presented in child model!!!!
!!!!encoder.layer.9.output.dense.weight model param. is not presented in child model!!!!
!!!!encoder.layer.9.output.dense.bias model param. is not presented in child model!!!!
!!!!encoder.layer.9.output.LayerNorm.weight model param. is not presented in child model!!!!
!!!!encoder.layer.9.output.LayerNorm.bias model param. is not presented in child model!!!!
!!!!encoder.layer.10.attention.self.query.weight model param. is not presented in child model!!!!
!!!!encoder.layer.10.attention.self.query.bias model param. is not presented in child model!!!!
!!!!encoder.layer.10.attention.self.key.weight model param. is not presented in child model!!!!
!!!!encoder.layer.10.attention.self.key.bias model param. is not presented in child model!!!!
!!!!encoder.layer.10.attention.self.value.weight model param. is not presented in child model!!!!
!!!!encoder.layer.10.attention.self.value.bias model param. is not presented in child model!!!!
!!!!encoder.layer.10.attention.output.dense.weight model param. is not presented in child model!!!!
!!!!encoder.layer.10.attention.output.dense.bias model param. is not presented in child model!!!!
!!!!encoder.layer.10.attention.output.LayerNorm.weight model param. is not presented in child model!!!!
!!!!encoder.layer.10.attention.output.LayerNorm.bias model param. is not presented in child model!!!!
!!!!encoder.layer.10.intermediate.dense.weight model param. is not presented in child model!!!!
!!!!encoder.layer.10.intermediate.dense.bias model param. is not presented in child model!!!!
!!!!encoder.layer.10.output.dense.weight model param. is not presented in child model!!!!
!!!!encoder.layer.10.output.dense.bias model param. is not presented in child model!!!!
!!!!encoder.layer.10.output.LayerNorm.weight model param. is not presented in child model!!!!
!!!!encoder.layer.10.output.LayerNorm.bias model param. is not presented in child model!!!!
!!!!encoder.layer.11.attention.self.query.weight model param. is not presented in child model!!!!
!!!!encoder.layer.11.attention.self.query.bias model param. is not presented in child model!!!!
!!!!encoder.layer.11.attention.self.key.weight model param. is not presented in child model!!!!
!!!!encoder.layer.11.attention.self.key.bias model param. is not presented in child model!!!!
!!!!encoder.layer.11.attention.self.value.weight model param. is not presented in child model!!!!
!!!!encoder.layer.11.attention.self.value.bias model param. is not presented in child model!!!!
!!!!encoder.layer.11.attention.output.dense.weight model param. is not presented in child model!!!!
!!!!encoder.layer.11.attention.output.dense.bias model param. is not presented in child model!!!!
!!!!encoder.layer.11.attention.output.LayerNorm.weight model param. is not presented in child model!!!!
!!!!encoder.layer.11.attention.output.LayerNorm.bias model param. is not presented in child model!!!!
!!!!encoder.layer.11.intermediate.dense.weight model param. is not presented in child model!!!!
!!!!encoder.layer.11.intermediate.dense.bias model param. is not presented in child model!!!!
!!!!encoder.layer.11.output.dense.weight model param. is not presented in child model!!!!
!!!!encoder.layer.11.output.dense.bias model param. is not presented in child model!!!!
!!!!encoder.layer.11.output.LayerNorm.weight model param. is not presented in child model!!!!
!!!!encoder.layer.11.output.LayerNorm.bias model param. is not presented in child model!!!!
!!!!pooler.dense.weight model param. is not presented in child model!!!!
!!!!pooler.dense.bias model param. is not presented in child model!!!!

Would you have any advice on how to resolve this issue?

I am using the following conda env:

munch                     2.5.0                      py_0    conda-forge
nltk                      3.6.7              pyhd8ed1ab_0    conda-forge
numpy                     1.21.5           py37hf2998dd_0    conda-forge
opencv-python-headless    4.5.2.54                 pypi_0    pypi
pytest                    6.2.5            py37h89c1867_1    conda-forge
python                    3.7.12          hf930737_100_cpython    conda-forge
pytorch                   1.8.1           py3.7_cuda11.1_cudnn8.0.5_0    pytorch
pytorch-lightning         1.3.8              pyhd8ed1ab_0    conda-forge
tensorboard               2.4.1              pyhd8ed1ab_1    conda-forge
tensorboard-plugin-wit    1.8.1              pyhd8ed1ab_0    conda-forge
tokenizers                0.10.3           py37hcb7a40c_1    conda-forge
torchmetrics              0.3.2              pyhd8ed1ab_0    conda-forge
torchvision               0.9.1                py37_cu111    pytorch
transformers              4.5.1              pyhd8ed1ab_1    conda-forge

Accuracy running codebase does not match paper

Hi @whwang299 ,

Running evaluation with this codebase on CORD matches the numbers in the github README: "test__f1": 0.9101991060544494

However, this does not match any of the numbers for parse prediction on CORD in the paper: checking the numbers for CORD with oracle, SPADE gives 0.915, and SPADE with tail collision gives 0.925.

[attached image]

Can you please help resolve this mismatch?

thank you,
Aruni

Question about FUNSD performance

Hello, first of all, thank you for releasing this great research :)

I have a few questions.

1) Regarding the memory error

  • Looking at your earlier answers, it seems a GPU with 24 GB of memory is required to also train relative_attention.
  • Is this hard to work around even when using several 16 GB GPUs (multi-GPU)?

2) Regarding FUNSD performance

  • According to the paper, FUNSD performance is in the low 70s for ELB and 41.3 for ELK.
  • After training with no_relative_attention set to true for 2500 epochs, the test performance of the best model looks low.
  • Could this be the effect of the no_relative_attention option?
{"test__avg_loss": 0.2842002809047699,
"test__f1": -1,
"test__precision_edge_avg": 0.20648076446339075,
"test__recall_edge_avg": 0.09938241102015245,
"test__f1_edge_avg": 0.133308610191091,
"test__precision_edge_of_type_0": 0.29918907383696114,
"test__recall_edge_of_type_0": 0.1588488556537503,
"test__f1_edge_of_type_0": 0.20751924215512135,
"test__precision_edge_of_type_1": 0.11377245508982035,
"test__recall_edge_of_type_1": 0.03991596638655462,
"test__f1_edge_of_type_1": 0.05909797822706065,
"p_r_f1_entity": [
[0.47058823529411764, 0.13333333333333333, 0.2077922077922078],
[0.7303921568627451, 0.5995975855130785, 0.6585635359116022],
[0.75, 0.556135770234987, 0.638680659670165],
[0.5909090909090909, 0.16049382716049382, 0.2524271844660194]],
"p_r_f1_all_entity_ELB": [0.7237715803452855, 0.4945553539019964, 0.5876010781671159],
"p_r_f1_link_ELK": [0.27941176470588236, 0.03991596638655462, 0.06985294117647059]}

3) Regarding the FUNSD prediction code

  • predict_funsd.sh does not seem to be released. Should I take predict_cord.sh and modify it for FUNSD?

4) Regarding field_rs

  • One more thing I am curious about: why is field_rs used?
  • Its meaning also seems to differ between the FUNSD and CORD data.
  • In CORD, is it used because the representative field has to come first when grouping different fields together?
  • In FUNSD, is question designated as field_rs to group key-value pairs (question - answer style)?

Thank you!!

how to hook up spade decoder to other encoder

First of all, thank you for the amazing work! I am thinking of hooking up the SPADE decoder to another encoder (like BERT or LayoutLM) that can be pretrained. What would be the best procedure to do this? Any instructions would be very helpful! Thanks!

Custom model

I found that LayoutLMv2 takes both the image and the text context into pretraining. Why did you use BERT and not LayoutLMv2? I want to customize the model to use LayoutLMv2 as the encoder; is that possible? Thank you so much.

Possible Memory Leak - CUDA out of memory after several epochs

I'm training the SPADE model on a custom data set. For that I implemented some data mangling that converts my data to type0. This is working and I can train for several steps/epochs. However, there seems to be a memory leak. I can observe a trend of GPU memory consumption increasing and not being released again.

I am using the default configuration as for FUNSD (funsd.1.5layers.train.yaml) only changing the labels to suit my use case.

I added logging of allocated cuda memory to tensorboard where the trend of increased memory usage is observable, see the attached image.

[attached image: GPU memory usage logged to TensorBoard]

I already checked for potential memory leaks due to repeatedly adding tensors still connected to the graph to a collection by adding several .detach() calls to no avail.

I'm currently running this on a Quadro RTX 8000 with 48 GB memory, so the requirement of 24 GB as mentioned here is met.

Has this behaviour been observed before? Any suggestions why this may happen?

[Question about performance]

Hello, first of all, thank you for sharing the nice examples and the paper.
While reading the paper, I noticed the following:
[attached image]

There is an Invoice - Japan dataset of roughly 1,000 pages, and its number of text nodes also appears larger than in the FUNSD data.

[attached image]

However, the reported performance is higher than on FUNSD (INV: 85, FUNSD: 72).

  1. The invoice format has more text nodes than FUNSD and its document layout seems more complex, so I am curious why the performance is nevertheless higher.
  • Is it because the training dataset is larger?
  2. Is there any way to access the Japanese receipt data?

No output in test results

Hi team, it's great that you opened the source code for your excellent work. I'm trying to run the test script scripts/test_cord.sh. I have only 1 GPU, so I changed the definition of the trainer object in spade/model/run_model.py to be like:

trainer = pl.Trainer(
    logger=tb_logger,
    log_every_n_steps=cfg.train_param.get("log_every_n_steps", 50),
    gpus=1,
    max_epochs=cfg.train_param.max_epochs,
    val_check_interval=cfg.train_param.val_check_interval,
    limit_train_batches=cfg.train_param.limit_train_batches,
    limit_val_batches=cfg.train_param.limit_val_batches,
    num_sanity_val_steps=1,
    progress_bar_refresh_rate=100,
    accumulate_grad_batches=cfg.train_param.accumulate_grad_batches,
    precision=cfg.model_param.precision,
    gradient_clip_val=cfg.train_param.gradient_clip_val,
    gradient_clip_algorithm=cfg.train_param.gradient_clip_algorithm,
)
however, I got empty results after testing iterations finished, something like
DATALOADER:0 TEST RESULTS {}

I am wondering whether this script is suitable for single-GPU training and testing.
PS: I have already downloaded the data and pretrained model and the predict script got normal outputs.

Inferencing Time

Hi team,

What is the inference/prediction time for the model on GPU? How can we measure the inference time for each component (tokenizer, graph generation, etc.)?

Data Augmentation

Hi Team,

Thanks for open-sourcing the project. In the paper, one of the ablation studies with data augmentation leads to a performance drop. Can that be checked with the current code? If yes, please let me know how to do it. Thanks in advance.

Data directory additional information

The data.tar.gz from https://drive.google.com/file/d/1863IJuyxFh82wTfxrUQ-Jk5NTTbHYsfK/view contains a few files and different level of distorted images.

[attached image: contents of data.tar.gz]

Are the image ids from test_type0.jsonl related to the images in imgs/distorted/receipt_ind_cord_lvl1.config.json, and the image ids from test_type1.jsonl related to the images in imgs/distorted/receipt_ind_cord_lvl2.config.json?

Could you provide more information on this dataset and its structure?

I am trying to overlay the prediction of the model with the associated image from this dataset.

Also, What is the difference between test_type1.jsonl, op_test_type1.jsonl, op_test_plus_type1.jsonl and op_test_plus_plus_type1.jsonl ?

Training on FUNSD: Cuda out of Memory on GPU with 12Gb memory.

First of all, congratulations to the entire team on the amazing work.

I was trying to train SPADE on the FUNSD dataset on a GPU with 12 GB of memory (GeForce RTX 2080 Ti).
But getting
RuntimeError: CUDA out of memory. Tried to allocate 384.00 MiB (GPU 0; 10.75 GiB total capacity; 9.14 GiB already allocated; 24.25 MiB free; 9.42 GiB reserved in total by PyTorch)

Is it at all possible to train SPADE on a GPU with 12 GB of memory?
Comments in another issue say that it needs a GPU with at least 24 GB of memory. #2 (comment)

Help will be appreciated.
Thanks

Dataset

Hi
I find that the class distribution in CORD is unbalanced; the menu classes take up a large share. Does this affect the result? Do you have any way to deal with this imbalance? I am very interested in this project. Thank you for sharing your knowledge.
[attached image]

Minor typo

In readme,
"Findins of ACL" should be "Findings of ACL"

Thank you

poor performance on FUNSD

Hi author, I tried to run the code on the FUNSD dataset using the given config
funsd.1.5layers.train.zip,
but I got a poor f1 score...
The test score dict is as follows:
{
"test__avg_loss": 0.7166563868522644,
"test__f1": -1,
"test__precision_edge_avg": 0.14041860174087156,
"test__recall_edge_avg": 0.05718078420725836,
"test__f1_edge_avg": 0.07420206034220918,
"test__precision_edge_of_type_0": 0.19155148919602882,
"test__recall_edge_of_type_0": 0.10966232029421598,
"test__f1_edge_of_type_0": 0.13947554925584693,
"test__precision_edge_of_type_1": 0.08928571428571429,
"test__recall_edge_of_type_1": 0.004699248120300752,
"test__f1_edge_of_type_1": 0.008928571428571428,
"p_r_f1_entity": [
[
0.10144927536231885,
0.05737704918032787,
0.07329842931937174
],
[
0.6223404255319149,
0.43454038997214484,
0.5117550574084199
],
[
0.5880149812734082,
0.38246041412911086,
0.46346863468634686
],
[
0.46938775510204084,
0.22115384615384615,
0.30065359477124187
]
],
"p_r_f1_all_entity_ELB": [
0.5712383488681758,
0.36792452830188677,
0.4475743348982785
],
"p_r_f1_link_ELK": [
0.25,
0.004699248120300752,
0.00922509225092251
]
};

The main difference may be that I trained the model on a single-GPU machine for 2500 epochs. I also ran the preprocessing code for FUNSD offered in this repo. However, the result on the CORD dataset is close to what has been reported in the paper; the test score dict is:
{
"test__avg_loss": 0.08372728526592255,
"test__f1": 0.9101991060544494,
"test__precision_edge_avg": 0.932888658103816,
"test__recall_edge_avg": 0.9192414351541544,
"test__f1_edge_avg": 0.9259259429039002,
"test__precision_edge_of_type_0": 0.9672624647224836,
"test__recall_edge_of_type_0": 0.9710993577635059,
"test__f1_edge_of_type_0": 0.9691771137713262,
"test__precision_edge_of_type_1": 0.8985148514851485,
"test__recall_edge_of_type_1": 0.8673835125448028,
"test__f1_edge_of_type_1": 0.882674772036474
}.

Would you please give me some advice on how I can improve the score on the FUNSD IE task? The attached zip file is my training config.

Data

How do I make data to train the model?

Standalone evaluation

Hi,

thanks for sharing the codebase!

Request: do you have standalone evaluation code that is not part of the PyTorch Lightning way of implementing the evals? I.e., if I have predictions dumped to a file, right now it is quite tedious to get evaluation numbers (f1 etc.) for them - the eval code and the modeling code are very tightly coupled in your codebase. Do you have any standalone evaluation script that can take in dumped predicted parses, match them against the GT, and give the CORD test evaluation numbers? This would make comparisons with your method much easier.

thank you,
Aruni

Making custom "Type1" dataset for training

Thank you for sharing this amazing work. After following the readme file and seeing that the SPADE seems to work well (despite some hiccups to which I managed to find temporary workarounds), I decided to try to train and test SPADE on a custom dataset.

I am trying to replicate the format of Type 1 data. I edited the field representers on the config files, but I am stuck on how I should define and set the "labels". The readme file says that it's a pair of "(n_field + n_text) x n_text adjacency matrix expressing serialization/grouping", but I don't think I'm following well on how I should set the values.

Could you please explain how they're defined, or how they are made?

ValueError at `f_parse_head_id.index`

Hello, I'm trying to train with custom data and got this error:

File "/home/phung/.local/lib/python3.9/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 174, in evaluation_step
    output = self.trainer.accelerator.validation_step(args)
  File "/home/phung/.local/lib/python3.9/site-packages/pytorch_lightning/accelerators/accelerator.py", line 226, in validation_step
    return self.training_type_plugin.validation_step(*args)
  File "/home/phung/.local/lib/python3.9/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 161, in validation_step
    return self.lightning_module.validation_step(*args, **kwargs)
  File "/home/phung/phung/Anh_Hung/OCR/OCR-invoice/Vietnamese/spade/spade-2/spade/model/model.py", line 382, in validation_step
    results = self._run("test", batch)
  File "/home/phung/phung/Anh_Hung/OCR/OCR-invoice/Vietnamese/spade/spade-2/spade/model/model.py", line 281, in _run
    parses, f_parses, text_unit_field_labels, f_parse_box_ids = gen_parses(
  File "/home/phung/phung/Anh_Hung/OCR/OCR-invoice/Vietnamese/spade/spade-2/spade/model/model_spade_graph_decoder.py", line 305, in gen_parses
    parses, grouped_col_ids = gen_fg_parses(
  File "/home/phung/phung/Anh_Hung/OCR/OCR-invoice/Vietnamese/spade/spade-2/spade/model/model_spade_graph_decoder.py", line 544, in gen_fg_parses
    parses, remained_f_parse_head_ids = gen_grouped_parses(
  File "/home/phung/phung/Anh_Hung/OCR/OCR-invoice/Vietnamese/spade/spade-2/spade/model/model_spade_graph_decoder.py", line 616, in gen_grouped_parses
    parse = gen_grouped_parse1(
  File "/home/phung/phung/Anh_Hung/OCR/OCR-invoice/Vietnamese/spade/spade-2/spade/model/model_spade_graph_decoder.py", line 644, in gen_grouped_parse1
    f_parse1 = imp_get_f_parse_from_id_member(
  File "/home/phung/phung/Anh_Hung/OCR/OCR-invoice/Vietnamese/spade/spade-2/spade/model/model_spade_graph_decoder.py", line 670, in imp_get_f_parse_from_id_member
    idx = f_parse_head_id.index(id_member)
ValueError: 16 is not in list

The config file:

verbose: true
raw_data_input_type: type1
data_paths:
  dev: vietnamese_invoice2/dev.jsonl
  op_dev: vietnamese_invoice2/devop.jsonl
  op_test: vietnamese_invoice2/devtest.jsonl
  test: vietnamese_invoice2/test.jsonl
  train: vietnamese_invoice2/train.jsonl
dist_norm: img_diagonal
infer_param:
  allow_small_edit_distance: true
  refine_parse: false
  unwanted_fields:
    -
method_for_token_xy_generation: equal_division
model_param:
  input_embedding_components:
    - base
  #    - seqPos
  # - absPos
  #    - charSize/n_
  #    - vertical
  bert_info_comb_type: base
  decoder_hidden_size: 100
  decoder_type: spade
  encoder_backbone_is_pretrained: true
  encoder_backbone_name: bert-base-multilingual-cased
  encoder_backbone_tweak_tag: org
  encoder_config_name: bert-base-multilingual-cased-5layers
  # encoder_backbone_name: vinai/phobert-base
  # encoder_backbone_tweak_tag:   
  # encoder_config_name: vinai/phobert-base
  encoder_type_name: spade
  encoder_layer_ids_used_in_decoder:
    - -1
  #  examples_of_inffering_method: force_single_tail_node, force_single_tail_node_but_allow_multiple_seeds,  no_constraint
  #  examples_of_parse_gen_method: multiple_beam, single_beam
  field_representers:
    - store.name
    - menu.name
    - subtotal.price
    - total.price
    - info.time
  fields:
    - store.name
    - store.address
    - menu.name
    - menu.id
    - menu.count
    - menu.unit
    - menu.unitprice
    - menu.price
    - menu.discount
    - subtotal.tax
    - subtotal.count
    - subtotal.discount
    - subtotal.price
    - total.price
    - total.cash
    - total.credit
    - total.change
    - info.transaction
    - info.time
    - info.staff
    

  gt_parse_gen_method: single_beam
  include_second_order_relations: false
  inferring_method:
    - force_single_tail_node_but_allow_multiple_seeds
    - no_constraint
  input_split_overlap_len: 0
  l_max_gen_of_each_parse: 10
  max_input_len: 32
  max_info_depth: 1
  model_name: RelationTagging
  n_angle_unit: 60
  n_char_unit: 5
  n_dist_unit: 120
  n_relation_type: 2
  no_rel_attention: false
  omit_angle_cal: false
  parse_gen_method: single_beam
  precision: 16
  pre_layer_norm: true
  task: receipt_v1
  task_lan: ind
  token_lv_boxing: false
  trainable_rel_emb: false
  use_cos_emb: false
  vi_params:
    do_gp:
      - true
      - true
    do_sb:
      - true
      - true
    n_vi_iter: 3
  weights:
    trained: false
    path: model/saved/spade.vi.train.yaml/best/model.pt
toy_data: false
toy_size: 10
train_param:
  accelerator: 
  accumulate_grad_batches: 1
  augment_coord: false
  augment_data: false
  batch_size: 1
  batch_size_for_test: 1
  coord_aug_params_keys: '[n_min, n_max, amp_min, amp_max, angle_min, angle_max],  [0, 2, -15, 15, -10, 10]'
  data_augmentation_refresh_interval: 10
  gradient_clip_val: 0
  gradient_clip_algorithm: value
  initial_coord_aug_params:
    - - 0
      - 4
      - 0
      - 35
    - - 0
      - 1.5
      - 0
      - 25
    - - -10
      - 10
  initial_token_aug_params:
    - 0.033
    - 0.033
    - 0
    - 0.033
    - 2
  cross_entropy_loss_weight:
    - 0.1
    - 1.0
  limit_train_batches: 1.0
  limit_val_batches: 1.0
  lr_scheduler_type: warmup_constant
  lr_scheduler_param:
    warmup_constant:
      lr_default: 0.00007
      lr_enc: 0.00007
      lr_dec: 0.0007
      lr_max: 0.0007
      num_warmup_steps: 30
  max_epochs: 10000
  multi_gpu: false
  n_cpus: 12
  optimizer_type: adam
  save_epoch_interval: 25
  skip_long: true
  token_aug_params_keys: '[p_del, p_subs, p_insert, p_tail_insert, n_max_insert],  [0.033,
    0.033, 0, 0.033, 2]'
  unique_token_pool: false
  val_check_interval: 1.0
  validation_metric: f1

The data file we used is this one (all the dev, devop, train, test.. are the same, this file is only for test purpose):
data.zip

It would be awesome if you could give some pointers about what is wrong with the data. Thank you!

Data processing error

In configs, cord.preprocess.yaml names the input type as type0:
raw_data_input_type: type0

The data_paths also specify type0:
data_paths:
dev: receipt_ind_cord_v0.1/dev/dev_type0.jsonl
op_dev: receipt_ind_cord_v0.1/dev/op_dev_type0.jsonl
op_test: receipt_ind_cord_v0.1/test/op_test_type0.jsonl
test: receipt_ind_cord_v0.1/test/test_type0.jsonl
train: receipt_ind_cord_v0.1/train/train_type0.jsonl
But during data processing the code tries to read label, and label only exists after type0 is converted to type1.

custom model

Hello, my goal is to extract entity information without using the entity association task. Can you suggest how to do it? I find that the association is unnecessary for my problem (and possibly incorrect); I just want to stop at the entity extraction level.
Thank you

Question about extracted fields

Hello. Thank you for releasing this great research and source code.

I have a question about defining the fields to be extracted.

How should OCR result texts that I do not want to be extracted be handled?
Is a separate tag such as "others" needed?

I am curious whether you encountered this problem during your research and how you solved it.
