wenwenyu / pick-pytorch

Code for the paper "PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks" (ICPR 2020)

Home Page: https://arxiv.org/abs/2004.07464

License: MIT License

Python 99.80% Shell 0.20%
key-information-extraction document-analysis graph-neural-networks graph-convolutional-network graph-learning document-understanding

pick-pytorch's Introduction

PICK-PyTorch

***** Updated on Feb 6th, 2021: The Train Ticket dataset is now available for academic research. You can download it from Google Drive or OneDrive. It contains 1,530 synthetic images and 320 real images for training, and 80 real images for testing. Please refer to our paper for details on how the training/testing sets were sampled from EATEN and how the corresponding annotations were generated. *****

***** Updated on Sep 17th, 2020: A training example on the large-scale document understanding dataset DocBank is now available. Please refer to examples/DocBank/README.md for more details. Thanks to TengQi Ye for this contribution. *****

PyTorch reimplementation of "PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks" (ICPR 2020). This project is different from our original implementation.

Introduction

PICK is a framework that is effective and robust at handling complex document layouts for Key Information Extraction (KIE). It combines graph learning with graph convolution operations to yield a richer semantic representation that contains textual and visual features and the global layout, without ambiguity. The overall architecture is shown below.

(Figure: overall PICK architecture.)

Requirements

  • python = 3.6
  • torchvision = 0.6.1
  • tabulate = 0.8.7
  • overrides = 3.0.0
  • opencv_python = 4.3.0.36
  • numpy = 1.16.4
  • pandas = 1.0.5
  • allennlp = 1.0.0
  • torchtext = 0.6.0
  • tqdm = 4.47.0
  • torch = 1.5.1
pip install -r requirements.txt

Usage

Distributed training with config files

Modify the configurations in config.json and dist_train.sh files, then run:

bash dist_train.sh

The application will be launched via launch.py on a 4-GPU node with one process per GPU (recommended).

This is equivalent to

python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=4 \
--master_addr=127.0.0.1 --master_port=5555 \
train.py -c config.json -d 1,2,3,4 --local_world_size 4

and is also equivalent to specifying the indices of available GPUs via CUDA_VISIBLE_DEVICES instead of the -d argument:

CUDA_VISIBLE_DEVICES=1,2,3,4 python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=4 \
--master_addr=127.0.0.1 --master_port=5555 \
train.py -c config.json --local_world_size 4

Similarly, training can be launched with a single process that spans all 4 GPUs (if the node has 4 available GPUs), using (not recommended):

CUDA_VISIBLE_DEVICES=1,2,3,4 python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=1 \
--master_addr=127.0.0.1 --master_port=5555 \
train.py -c config.json --local_world_size 1

Using Multiple Nodes

You can enable multi-node multi-GPU training by setting the nnodes and node_rank arguments on the command line of every node, e.g., to run on 2 nodes with 4 GPUs each:

On node 1 (IP: 192.168.0.10), run:

CUDA_VISIBLE_DEVICES=1,2,3,4 python -m torch.distributed.launch --nnodes=2 --node_rank=0 --nproc_per_node=4 \
--master_addr=192.168.0.10 --master_port=5555 \
train.py -c config.json --local_world_size 4  

On node 2 (IP: 192.168.0.15), run:

CUDA_VISIBLE_DEVICES=2,4,6,7 python -m torch.distributed.launch --nnodes=2 --node_rank=1 --nproc_per_node=4 \
--master_addr=192.168.0.10 --master_port=5555 \
train.py -c config.json --local_world_size 4  

Resuming from checkpoints

You can resume from a previously saved checkpoint by:

python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=4 \
--master_addr=127.0.0.1 --master_port=5555 \
train.py -d 1,2,3,4 --local_world_size 4 --resume path/to/checkpoint

Debug mode on one GPU/CPU training with config files

This training mode lets you debug the code without the distributed setup. -dist must be set to false to turn off distributed mode; -d specifies which single GPU to use.

python train.py -c config.json -d 1 -dist false

Testing from checkpoints

You can test from a previously saved checkpoint by:

python test.py --checkpoint path/to/checkpoint --boxes_transcripts path/to/boxes_transcripts \
               --images_path path/to/images_path --output_folder path/to/output_folder \
               --gpu 0 --batch_size 2

Customization

Training custom datasets

You can train your own datasets following the steps outlined below.

  1. Prepare files in the correct format, as provided in the data folder.
    • Please see data/README.md for instructions on how to prepare the data in the required format for PICK.
  2. Modify the train_dataset and validation_dataset args in the config.json file, including files_name, images_folder, boxes_and_transcripts_folder, entities_folder, iob_tagging_type and resized_image_size.
  3. Modify Entities_list in the utils/entities_list.py file according to the entity types of your dataset (see the sketch after this list).
  4. Modify keys.txt in the utils/keys.txt file if needed, according to the vocabulary of your dataset.
  5. Modify MAX_BOXES_NUM and MAX_TRANSCRIPT_LEN in the data_utils/documents.py file if needed.
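For illustration, a dataset with the four entity types used in the provided data examples would define the list as follows (a sketch; substitute your own entity names):

Entities_list = [
    "company",
    "date",
    "address",
    "total"
]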

Note: The self-built datasets used in our paper cannot be shared due to patient privacy and proprietary issues.

Checkpoints

You can specify the name of the training session in the config.json file:

"name": "PICK_Default",
"run_id": "test"

The checkpoints will be saved in save_dir/name/run_id_timestamp/checkpoint_epoch_n, with timestamp in mmdd_HHMMSS format.

A copy of config.json file will be saved in the same folder.

Note: checkpoints contain:

{
  'arch': arch,
  'epoch': epoch,
  'state_dict': self.model.state_dict(),
  'optimizer': self.optimizer.state_dict(),
  'monitor_best': self.monitor_best,
  'config': self.config
}
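
A minimal sketch of inspecting and restoring such a checkpoint (the path is a placeholder, and model and optimizer must already be constructed to match the saved architecture and entity list):

import torch

checkpoint = torch.load("path/to/checkpoint", map_location="cpu")
print(checkpoint["arch"], checkpoint["epoch"], checkpoint["monitor_best"])

# restore model weights and optimizer state
model.load_state_dict(checkpoint["state_dict"])
optimizer.load_state_dict(checkpoint["optimizer"])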

Tensorboard Visualization

This project supports Tensorboard visualization by using either torch.utils.tensorboard or TensorboardX.

  1. Install

    If you are using PyTorch 1.1 or higher, install tensorboard with 'pip install tensorboard>=1.14.0'.

    Otherwise, you should install TensorboardX; follow the installation guide in TensorboardX.

  2. Run training

    Make sure that the tensorboard option in the config file is turned on:

     "tensorboard" : true
    
  3. Open Tensorboard server

    Run tensorboard --logdir saved/log/ at the project root; the server will then be available at http://localhost:6006

By default, loss values will be logged. If you need more visualizations, use add_scalar('tag', data), add_image('tag', image), etc., in the trainer._train_epoch method. The add_something() methods in this project are basically wrappers around those of the tensorboardX.SummaryWriter and torch.utils.tensorboard.SummaryWriter modules.

Note: You don't have to specify the current step, since the WriterTensorboard class defined in logger/visualization.py tracks the current step.
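
For example, an extra scalar and an image could be logged from trainer._train_epoch like this (self.writer, gl_loss, and image_tensor are assumed names for illustration; adapt them to the actual trainer attributes):

# inside Trainer._train_epoch; the WriterTensorboard wrapper supplies the step
self.writer.add_scalar("gl_loss", gl_loss.item())
self.writer.add_image("input_sample", image_tensor)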

Results on Train Ticket

(Figure: example extraction results on the Train Ticket dataset.)

TODOs

  • Dataset cache mechanism to speed up training loop
  • Multi-node multi-gpu setup (DistributedDataParallel)

Citations

If you find this code useful, please cite our paper:

@inproceedings{Yu2020PICKPK,
  title={{PICK}: Processing Key Information Extraction from Documents using 
  Improved Graph Learning-Convolutional Networks},
  author={Wenwen Yu and Ning Lu and Xianbiao Qi and Ping Gong and Rong Xiao},
  booktitle={2020 25th International Conference on Pattern Recognition (ICPR)},
  year={2020}
}

License

This project is licensed under the MIT License. See LICENSE for more details.

Acknowledgements

The project structure is based on the PyTorch Template Project.

pick-pytorch's People

Contributors

dbobrenko, dependabot[bot], sanster, tengerye, wenwenyu


pick-pytorch's Issues

Prediction

I really like your PICK project and want to thank you for sharing it. I also have a question about the code.

I am currently trying to export the prediction results from a trained PICK model using the test.py script, but it only outputs predictions for the transcripts defined in the .tsv file of the provided image; it doesn't export any other predictions. So my question is:
Where are the predictions stored? I tried looking through the code but didn't manage to find the variable that contains them. I would be very grateful if you could give some insight on this.

A funny thing is that test.py just prints whatever I put in the .tsv: I replaced the transcripts of a .tsv with "asda sdas" and got the same "asda sdas" in the .txt in the output folder.

Regards.

Transfer learning

Hi Team,

Amazing work. Thank you.
Is it possible to train on the DocBank dataset and then fine-tune or do transfer learning on a custom dataset?
We have a limited custom dataset of around 200 to 300 PDF documents. Is this enough data to train from scratch, or is it possible to use a DocBank-trained model as the base model?

Thank you

Testing with out-of-sample

While testing with out-of-sample images, I tried creating boxes_and_transcripts for the new images using Tesseract and preparing the .tsv files, but the predictions are not as good as for the in-sample testing images.

Could you please let us know the right way to create the bounding-box (.tsv) files for prediction with this model?

Also, can we predict without .tsv files?

Note: With Tesseract I tried both WORD and TEXTLINE bounding boxes.
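
Lacking an official recipe, here is a rough sketch for turning Tesseract word boxes into .tsv input (it assumes pytesseract is installed, and that test-time lines follow the index,x1,y1,...,x4,y4,transcript layout of the training examples; both are assumptions, not repo-confirmed):

from PIL import Image
import pytesseract
from pytesseract import Output

data = pytesseract.image_to_data(Image.open("receipt.jpg"), output_type=Output.DICT)
rows = []
for i, text in enumerate(data["text"]):
    if not text.strip():
        continue
    x, y, w, h = data["left"][i], data["top"][i], data["width"][i], data["height"][i]
    # four corner points, clockwise starting from the top-left corner
    coords = [x, y, x + w, y, x + w, y + h, x, y + h]
    rows.append(",".join(map(str, [1] + coords + [text])))

with open("receipt.tsv", "w", encoding="utf-8") as f:
    f.write("\n".join(rows))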

Error while training from checkpoint.

Hi there. I trained the model using ICDAR data, and while using the checkpoint to train it on my own data with a different number of entities I have been getting the following error:

RuntimeError: Error(s) in loading state_dict for PICKModel:
size mismatch for decoder.bilstm_layer.mlp.mlp.0.weight: copying a param with shape torch.Size([11, 1024]) from checkpoint, the shape in current model is torch.Size([15, 1024]).
size mismatch for decoder.bilstm_layer.mlp.mlp.0.bias: copying a param with shape torch.Size([11]) from checkpoint, the shape in current model is torch.Size([15]).
size mismatch for decoder.crf_layer.transitions: copying a param with shape torch.Size([11, 11]) from checkpoint, the shape in current model is torch.Size([15, 15]).
size mismatch for decoder.crf_layer._constraint_mask: copying a param with shape torch.Size([13, 13]) from checkpoint, the shape in current model is torch.Size([17, 17]).
size mismatch for decoder.crf_layer.start_transitions: copying a param with shape torch.Size([11]) from checkpoint, the shape in current model is torch.Size([15]).
size mismatch for decoder.crf_layer.end_transitions: copying a param with shape torch.Size([11]) from checkpoint, the shape in current model is torch.Size([15]).

Does this mean we can't train from a checkpoint? Is there any other way to train from checkpoints when using data with a different number of entities?
Thanks.
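
A common workaround (a sketch of a general PyTorch technique, not something the repo provides) is to copy over only the parameters whose shapes match, leaving the entity-dependent decoder layers freshly initialized:

import torch

checkpoint = torch.load("path/to/checkpoint", map_location="cpu")
pretrained = checkpoint["state_dict"]
model_state = model.state_dict()  # model: a PICKModel built for the new entity list

# keep only tensors that exist in the new model with identical shapes
compatible = {k: v for k, v in pretrained.items()
              if k in model_state and v.shape == model_state[k].shape}
model_state.update(compatible)
model.load_state_dict(model_state)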

boxes_and_transcripts label format

Each image in boxes_and_transcripts corresponds to one .tsv label file.
According to the documentation, every label line has the following format:
index, box_coordinates, transcripts, box_entity_types
In the two provided examples, the index field is inconsistent:
in the X00016469623.tsv labels, every index is 1,
while in the asdf.tsv labels, the index is incrementing and corresponds to the line number.
So does this index refer to the index of the label's image in the train_sample_list.csv file, or to the line number?

Config for large entities list

Hi all,

I have created a training set with around 64 tags (entities).
Which key configurations in config.json need to be changed to get better results?

box_level argument and model evaluation

Hello,

In data/README.md, about the entities folder, you say: "if iob_tagging_type is set to box_level, this folder will not be used; then box_entity_types in the file_name.tsv file of the boxes_and_transcripts folder will be used as the entity label. Otherwise, it must be provided." So do you mean the entities folder is not used during evaluation? But how can you evaluate the model without knowing the ground truth for the entities? Or am I misunderstanding something?

Thanks

subprocess.CalledProcessError

Hi,

Has anyone met this problem during distributed training?

subprocess.CalledProcessError: Command '['/opt/conda/bin/python', '-u', 'train.py', '--local_rank=3', '-c', 'config.json', '--local_world_size', '4']' returned non-zero exit status 1.

Please give me some advice, thanks.

Byte Pair Encoding

Hey, I saw you are using a keys.txt file for encoding the data; if I am wrong, please correct me.

  1. If you are using keys.txt, how do you build it, and can words that are not in the training data still be handled?
  2. If I have to train on another language, how can I do that?
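
In the absence of an official answer, one plausible way to rebuild keys.txt for another language is to collect every character that appears in your transcripts (this sketch guesses at the file's role as a character vocabulary, based on step 4 of the customization guide):

# all_transcripts.txt: one transcript per line, gathered from your .tsv files
chars = set()
with open("all_transcripts.txt", encoding="utf-8") as f:
    for line in f:
        chars.update(line.rstrip("\n"))

with open("keys.txt", "w", encoding="utf-8") as f:
    f.write("".join(sorted(chars)))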

A question about test

When testing, do we have to provide the boxes and transcripts of the test images?

pretrained weights of docbank

@wenwenyu @tengerye
Hey, can you please upload pretrained weights for the DocBank dataset? The dataset is huge and I am using Colab, so I am not able to retrain it myself. It would be very helpful if you could.

Values printed during training

[2020-08-11 15:12:01,468 - trainer - INFO] - Train Epoch:[4/100] Step:[560/666] Total Loss: 0.163740 GL_Loss: 0.163414 CRF_Loss: 0.000326
[2020-08-11 15:13:22,293 - trainer - INFO] - Train Epoch:[4/100] Step:[570/666] Total Loss: 0.177532 GL_Loss: 0.177369 CRF_Loss: 0.000163
[2020-08-11 15:14:42,824 - trainer - INFO] - Train Epoch:[4/100] Step:[580/666] Total Loss: 0.171964 GL_Loss: 0.171639 CRF_Loss: 0.000326
[2020-08-11 15:16:05,332 - trainer - INFO] - Train Epoch:[4/100] Step:[590/666] Total Loss: 0.168039 GL_Loss: 0.167388 CRF_Loss: 0.000651
[2020-08-11 15:17:26,160 - trainer - INFO] - Train Epoch:[4/100] Step:[600/666] Total Loss: 0.184481 GL_Loss: 0.184319 CRF_Loss: 0.000163
[2020-08-11 15:24:51,474 - trainer - INFO] - [Step Validation] Epoch:[4/100] Step:[600/666]
+---------+-------+-------+-------+-------+
| name | mEP | mER | mEF | mEA |
+=========+=======+=======+=======+=======+
| overall | 0 | 0 | 0 | 0 |
+---------+-------+-------+-------+-------+
Why are all the metrics 0? Is something configured incorrectly?

Failed to converge with an increased number of BiLSTM layers

It seems that PICK fails to converge with an increased number of BiLSTM layers.

Changed: bilstm_kwargs "num_layers" from 2 to 4.

With num_layers = 2 it converges in 6 epochs; once I tried num_layers = 4, it failed to converge within 42 epochs.
Is the problem in the hyper-parameters?

num_layers = 4, epoch = 42: (training-curve screenshot)

num_layers = 2, epoch = 6 (converged): (training-curve screenshot)

About SROIE Dataset Preparation

Hi, that's good work on IE, thanks.

Currently, I've tested your code on SROIE with the "document_level" setting. I used the OCR results downloaded from the SROIE official website, extracted from the folders "task1&2_train(626p)" and "taks1&2_text_test(361p)", to train and test the model. The performance does not look as good as yours. May I ask how you prepared your dataset? Did you use other OCR tools to preprocess the dataset?

Thank you very much.

Is it possible to extract multiple entities with the same label?

Hi, in one of the provided annotation files,

{
    "company": "BOOK TA .K (TAMAN DAYA) SDN BHD",
    "date": "25/12/2018",
    "address": "NO.53 55,57 & 59, JALAN SAGU 18, TAMAN DAYA, 81100 JOHOR BAHRU, JOHOR.",
    "total": "9.00"
}

I am wondering: if a document has multiple entities of the same type, e.g., several company names, is the model able to find them all, such as ["company": "company1", "company": "company2"]?

If the model supports such a case, how shall I prepare the data in the entities folder?

Issue with Training data box_coordinates direction

Greetings, I wanted to point out a conflict in the data format of the training and test data:

The readme says that the direction of the coordinates of the bounding boxes must be clockwise, but looking at the examples given in the X00016469623.tsv file of this repo, the coordinates are counterclockwise.

1,83,41,331,41,331,78,83,78,TAN WOON YANN,other
1,109,171,330,171,330,191,109,191,MR D.I.Y. (M) SDN BHD,company


Explanation of parameters

Could you explain the configuration parameters in config.json?

Questions of data example provided

Hi, thank you for your great work. I have some questions about the data files in your repository. In the asdf.tsv file,
in the 5th line, TAMAN DAYA, is marked as other; shouldn't it be address? And there is no total tag in the label file.

Looking forward to your reply.

Training on colab

Is it possible to train this model on Colab? I have a small dataset.

How to evaluate on SROIE2019?

Hi, thank you for your great work.

May I ask two questions please?

  1. Did you use external data for the SROIE 2019 competition?
  2. How did you evaluate on the SROIE dataset?

checkpoint

Is a trained PyTorch model already available?

Issue while Loading trained model

I trained the model for a while and then cancelled training.
Now I'm trying to test it and I'm getting the issue below. Kindly help; thanks in advance.

Command:
!python test.py --checkpoint saved/models/PICK_Default/test_1012_064804/model_best.pth \
    --boxes_transcripts {out_box_path} \
    --images_path {out_img_path} --output_folder /content/output/ \
    --gpu 0 --batch_size 2

Error:
Loading checkpoint: saved/models/PICK_Default/test_1012_064804/model_best.pth
with saved mEF 0.0000 ...

RuntimeError: Error(s) in loading state_dict for PICKModel:
size mismatch for decoder.bilstm_layer.mlp.mlp.0.weight: copying a param with shape torch.Size([49, 1024]) from checkpoint, the shape in current model is torch.Size([11, 1024]).
size mismatch for decoder.bilstm_layer.mlp.mlp.0.bias: copying a param with shape torch.Size([49]) from checkpoint, the shape in current model is torch.Size([11]).
size mismatch for decoder.crf_layer.transitions: copying a param with shape torch.Size([49, 49]) from checkpoint, the shape in current model is torch.Size([11, 11]).
size mismatch for decoder.crf_layer._constraint_mask: copying a param with shape torch.Size([51, 51]) from checkpoint, the shape in current model is torch.Size([13, 13]).
size mismatch for decoder.crf_layer.start_transitions: copying a param with shape torch.Size([49]) from checkpoint, the shape in current model is torch.Size([11]).
size mismatch for decoder.crf_layer.end_transitions: copying a param with shape torch.Size([49]) from checkpoint, the shape in current model is torch.Size([11]).

Code error in `sort_box_with_list`

In your documents.py file, there is this snippet:

from typing import List, Tuple

import cv2
import numpy as np

def sort_box_with_list(data: List[Tuple], left_right_first=True):
    def compare_key(x):
        #  x is (index, points, transcription, type) or (index, points, transcription)
        points = x[1]
        box = np.array([[points[0], points[1]], [points[2], points[3]], [points[4], points[5]], [points[6], points[7]]],
                       dtype=np.float32)
        rect = cv2.minAreaRect(box)  # Smallest rectangle around the box.
        # rect: center(x,y), (width, height), angle of rotation
        center = rect[0]
        if left_right_first:
            return center[1], center[0]
        else:
            return center[0], center[1]

    data = sorted(data, key=compare_key)
    return data

I think if you want left_right_first, you should return center[0], center[1] instead of center[1], center[0], since center = (x, y).
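
The suggested change as a sketch (this follows the issue author's reading of center = (x, y); untested against the repo):

        if left_right_first:
            # compare x first, then y, for a genuine left-to-right ordering
            return center[0], center[1]
        else:
            return center[1], center[0]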

DistributedDataParallel device_ids and output_device arguments only work with single-device CUDA modules

Hi, I encountered a problem with your code:

AssertionError: DistributedDataParallel device_ids and output_device arguments only work
with single-device CUDA modules, but got device_ids [0], output_device 0, and
module parameters {device(type='cuda', index=0), device(type='cpu')}.

But if I comment out

if self.config['trainer']['sync_batch_norm']:
    self.model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(self.model)

in your trainer.py, the code runs without errors. I used the recommended settings.
May I ask why?

Regarding post-processing

Output for file 004.txt:

address	LOT 1851-A & 1851-B\, JALAN KPB 6,addressKAWASAN PERINDUSTRIAN BALAKONG,address43300 SERI KEMBANGAN\, SELANGOR,address
address	WA45 /2A - 12,other1,other
address	WA44-A - 12,other1
address	9555916500126,othe
address	43-A - 24,other1,other11.23,other079567600084,otherX,other11.23,other
address	090822,othe
address	.91,other
date	18-11-18

Can you provide some intuition or resources on how to post-process this output so as to store it as key-value pairs in JSON format?
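
As a starting point, a rough post-processing sketch (not part of the repo) could collapse the entity_type<TAB>text lines that test.py writes into a dict, keeping repeated types as lists:

import json
from collections import defaultdict

def to_json(txt_path):
    result = defaultdict(list)
    with open(txt_path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            entity_type, _, value = line.partition("\t")
            result[entity_type].append(value)
    return json.dumps(result, ensure_ascii=False, indent=2)

print(to_json("004.txt"))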

Does re-labeling the full text as one sequence lose the phrase grouping that OCR already provided?

Looking at the final step that outputs keywords and their categories, it seems the entire text is concatenated into one long sequence, a label is extracted for each character, and consecutive characters with the same label are then grouped into an entity with that label. During this process, the head and tail of a phrase that OCR recognized as a single box may each be assigned to other groups.
For example, if OCR produces three boxes "XXX", "YYY", "ZZZ", the sequence-labeling split may come out as "XX", "XYY", "YZZ", "Z".
Assuming the OCR box grouping is highly accurate, how can this information be better utilized?
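
The regrouping described above can be illustrated with a small sketch (illustration only, not repo code): given per-character labels over the concatenated text, consecutive characters with the same label are merged, which reproduces the "XX", "XYY", "YZZ", "Z" split from the example:

from itertools import groupby

def group_by_label(chars, labels):
    spans, idx = [], 0
    for label, run in groupby(labels):
        n = len(list(run))
        spans.append((label, "".join(chars[idx:idx + n])))
        idx += n
    return spans

# "XXX", "YYY", "ZZZ" concatenated, with labels straddling the box boundaries
print(group_by_label(list("XXXYYYZZZ"), ["a", "a", "b", "b", "b", "c", "c", "c", "d"]))
# -> [('a', 'XX'), ('b', 'XYY'), ('c', 'YZZ'), ('d', 'Z')]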

Output files are empty when testing

Hello,
thank you so much for providing the code for a great research paper. I am just wondering: after I run python train.py --config config.json, the values of mEP, mER, mEF, and mEA are all 0 for every training epoch. Is this normal?
2020-07-26 20:45:35.357166: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
[2020-07-26 20:45:40,891 - train - INFO] - Model trainable parameters: 68575842
[2020-07-26 20:45:40,892 - train - INFO] - Train datasets: 8 samples Validation datasets: 8 samples Max_epochs: 100 Log_per_step: 10 Validation_per_step: 50
[2020-07-26 20:45:40,892 - train - INFO] - Training start...
[2020-07-26 20:46:07,680 - trainer - INFO] - Train Epoch:[1/100] Step:[2/2] Total Loss: 1121.992798 GL_Loss: 8.390446 CRF_Loss: 1113.602295
[2020-07-26 20:46:09,988 - trainer - INFO] - [Epoch Validation] Epoch:[1/100] Total Loss: 1430.095032 GL_Loss: 0.091890 CRF_Loss: 1420.906006
+---------+-------+-------+-------+-------+
| name | mEP | mER | mEF | mEA |
+=========+=======+=======+=======+=======+
| date | 0 | 0 | 0 | 0 |
+---------+-------+-------+-------+-------+
| name | 0 | 0 | 0 | 0 |
+---------+-------+-------+-------+-------+
| overall | 0 | 0 | 0 | 0 |
+---------+-------+-------+-------+-------+
Also,
I ran

python test.py --checkpoint /content/PICK-pytorch/saved/models/PICK_Default/test2_0726_194721/model_best.pth --boxes_transcripts /content/PICK-pytorch/data/test_data_example/boxes_and_transcripts \
               --images_path /content/PICK-pytorch/data/test_data_example/images/ --output_folder /content/PICK-pytorch/output/test2 \
               --gpu 0 --batch_size 2

Output files are generated, but they are actually empty. Can you please point me in the right direction for testing the model? Thank you!

During training, entities are disappearing in the training metrics

Hi, I have the following entities I want to extract. During training, some entities are not shown in the table and keep disappearing. When I restart the training process, the number of entities shown in the metrics table is also different, which gives me inconsistent results. Do you know what I can do to fix this?

Entities_list = [
    "invoice_date",
    "tax_rate",
    "subtotal",
    "total"
]
For example,

+--------------+-------+-------+-------+-------+
| name | mEP | mER | mEF | mEA |
+==============+=======+=======+=======+=======+
| total | 0 | 0 | 0 | 0 |
+--------------+-------+-------+-------+-------+
| invoice_date | 0 | 0 | 0 | 0 |
+--------------+-------+-------+-------+-------+
| overall | 0 | 0 | 0 | 0 |
+--------------+-------+-------+-------+-------+

+--------------+-------+-------+-------+-------+
| name | mEP | mER | mEF | mEA |
+==============+=======+=======+=======+=======+
| invoice_date | 0 | 0 | 0 | 0 |
+--------------+-------+-------+-------+-------+
| overall      | 0     | 0     | 0     | 0     |
+--------------+-------+-------+-------+-------+

Did you try documents with variable layout?

Hi wenwenyu,

First of all, thanks for this wonderful repo.

One quick question: have you tried documents/forms with variable layouts?
I just wonder if the GCN can still help and perform well when the (relative) positional features are not consistent.

Thanks.

Merging nearby bounding boxes using Tesseract

Hi Wenwen,
In the training dataset, I noticed that a bounding box can include more than one word. Which program did you use to merge nearby bounding boxes based on distance? Currently, I have a bounding box around each word. Will the accuracy be affected if the bounding boxes are merged? Thank you!
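
For what it's worth, one simple distance-based heuristic (a sketch, not the authors' method) merges word boxes that sit on the same text line and are separated by a small horizontal gap:

def same_line(a, b, min_overlap=0.5):
    # a, b: (x_min, y_min, x_max, y_max, text); require enough vertical overlap
    inter = min(a[3], b[3]) - max(a[1], b[1])
    return inter > 0 and inter / min(a[3] - a[1], b[3] - b[1]) >= min_overlap

def merge_words(words, max_gap=15):
    """Merge word boxes into phrase boxes, scanning in rough reading order."""
    merged = []
    for w in sorted(words, key=lambda w: (w[1], w[0])):
        if merged and same_line(merged[-1], w) and 0 <= w[0] - merged[-1][2] <= max_gap:
            m = merged[-1]
            merged[-1] = (m[0], min(m[1], w[1]), max(m[2], w[2]),
                          max(m[3], w[3]), m[4] + " " + w[4])
        else:
            merged.append(w)
    return merged

Whether merging helps likely depends on how closely the resulting phrases match the annotation granularity the model was trained on.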
