layout-parser / layout-parser

A Unified Toolkit for Deep Learning Based Document Image Analysis

Home Page: https://layout-parser.github.io/

License: Apache License 2.0

Python 100.00%
layout-analysis deep-learning object-detection ocr layout-parser detectron2 document-layout-analysis computer-vision document-image-processing layout-detection

layout-parser's Introduction

A unified toolkit for Deep Learning Based Document Image Analysis

What is LayoutParser

LayoutParser aims to provide a wide range of tools that streamline Document Image Analysis (DIA) tasks. Please check the LayoutParser demo video (1 min) or full talk (15 min) for details. Here are some key features:

  • LayoutParser provides a rich repository of deep learning models for layout detection as well as a set of unified APIs for using them. For example,

    Perform DL layout detection in 4 lines of code
    import layoutparser as lp
    model = lp.AutoLayoutModel('lp://EfficientDete/PubLayNet')
    # image = Image.open("path/to/image")
    layout = model.detect(image) 
  • LayoutParser comes with a set of layout data structures with carefully designed APIs that are optimized for document image analysis tasks. For example,

    Selecting layout/textual elements in the left column of a page
    image_width = image.size[0]
    left_column = lp.Interval(0, image_width/2, axis='x')
    layout.filter_by(left_column, center=True) # select objects in the left column 
    Performing OCR for each detected Layout Region
    ocr_agent = lp.TesseractAgent()
    for layout_region in layout: 
        image_segment = layout_region.crop(image)
        text = ocr_agent.detect(image_segment)
    Flexible APIs for visualizing the detected layouts
    lp.draw_box(image, layout, box_width=1, show_element_id=True, box_alpha=0.25)
    Loading layout data stored in json, csv, and even PDFs
    layout = lp.load_json("path/to/json")
    layout = lp.load_csv("path/to/csv")
    pdf_layout = lp.load_pdf("path/to/pdf")
  • LayoutParser is also an open platform that enables the sharing of layout detection models and DIA pipelines among the community.

    Check the LayoutParser open platform
    Submit your models/pipelines to LayoutParser

Installation

After several major updates, layoutparser provides various functionalities and deep learning models from different backends. It is still easy to install, though: the installation is designed so that you can choose to install only the dependencies your project needs:

pip install layoutparser # Install the base layoutparser library
pip install "layoutparser[layoutmodels]" # Install DL layout model toolkit 
pip install "layoutparser[ocr]" # Install OCR toolkit

Extra steps are needed if you want to use Detectron2-based models. Please check installation.md for additional details on layoutparser installation.

Examples

We provide a series of examples to help you start using the layoutparser library:

  1. Table OCR and Results Parsing: layoutparser can be used to conveniently OCR documents and convert the output into structured data.

  2. Deep Layout Parsing Example: With the help of deep learning, layoutparser supports the analysis of very complex documents and the processing of the hierarchical structure in their layouts.

Contributing

We encourage you to contribute to Layout Parser! Please check out the Contributing guidelines for details on how to proceed. Join us!

Citing layoutparser

If you find layoutparser helpful to your work, please consider citing our tool and paper using the following BibTeX entry.

@article{shen2021layoutparser,
  title={LayoutParser: A Unified Toolkit for Deep Learning Based Document Image Analysis},
  author={Shen, Zejiang and Zhang, Ruochen and Dell, Melissa and Lee, Benjamin Charles Germain and Carlson, Jacob and Li, Weining},
  journal={arXiv preprint arXiv:2103.15348},
  year={2021}
}

layout-parser's People

Contributors

an1018, dumbpy, edisongustavo, jim-salmons, kforcodeai, lolipopshock, rosenzhang, yusanshi

layout-parser's Issues

Gives wrong results when the code is run for some images in a loop

The code works when it is run for a single image. But when I run the same code in a loop over a few images from the PubLayNet dataset, cached results seem to be applied (i.e., the bounding boxes overlap, and the boxes for the previous images are also drawn on the current image).

Question: Do you have plans to train a more lightweight model?

Motivation

(Apologies if this is not the right place to ask questions)
The faster_rcnn_R_50_FPN_3x (PubLayNet) seems to be quite slow on a CPU. Locally it's around 3 seconds per image. In Google Colab it's more than 6 seconds. (It's around 350 ms with a GPU though).
Something that would make this work on a CPU at a more reasonable speed could make it more "accessible".
It would also make the download of the model, and PyTorch itself smaller.
I was wondering whether you have any plans to train a smaller model on one of the related datasets?

Related resources
Something like YOLOv5s perhaps?

Additional context
n/a

"lp.draw_box()" is not able to displays the result in .py file

Hi everyone! Thanks for developing this package. It was quite impressive to see the work you guys have achieved. I had a lot of fun messing around with it.

However, would it be possible for the draw_box function to work in a normal .py file? I was originally writing all the code in a normal Python file, but the draw_box function just wouldn't display the result. I am on Ubuntu 20.04 and using Python 3.8. Then I switched to the Jupyter Notebook extension in Visual Studio Code, and everything was up and running fine. I noticed that your examples also use .ipynb files. I don't know whether it's a bug on my end or whether it just can't display the result in a normal Python file. I don't have a strong reason why we should use a .py file instead of an .ipynb file. I just noticed that a lot of other packages support both, so I guess it would be good if this one also supported them.

Thanks!
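
A note on the behaviour described above: lp.draw_box returns an image object rather than opening a window, and a notebook renders the returned value of the last expression in a cell automatically, which a plain script does not. A minimal sketch, assuming the returned object is a Pillow image with the usual show() and save() methods; the helper names show_or_save and visualize are made up for illustration:

```python
def show_or_save(viz, out_path=None):
    """Display a Pillow-style image from a plain script, or save it if out_path is given."""
    if out_path is not None:
        viz.save(out_path)   # write the visualization to disk
        return out_path
    viz.show()               # open the image in the system's default viewer
    return None


def visualize(image, layout, out_path=None):
    """Draw the layout on the image and explicitly show or save the result."""
    import layoutparser as lp  # imported lazily so show_or_save has no hard dependency

    viz = lp.draw_box(image, layout, box_width=3)  # returns an image, does not display it
    return show_or_save(viz, out_path)
```

Saving to a file is usually the more reliable option on a headless machine, where no image viewer is available.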

Layout Parser text boxes not properly aligned causing incorrect sorting of text boxes

Hi,

I'm using layout parser to perform OCR on a research paper, but on almost every page of the pdf the text boxes are not properly aligned. For example I input this page:

image

perform detection using:

model = lp.Detectron2LayoutModel('lp://PubLayNet/mask_rcnn_X_101_32x8d_FPN_3x/config', 
                                 extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.8],
                                 label_map={0: "Text", 1: "Title", 2: "List", 3:"Table", 4:"Figure"})
layout = model.detect(image)

# Show the detected layout of the input image
lp.draw_box(image, layout, box_width=3)

The detected image is shown below:

detect

As can be seen, the bottom left box is not properly aligned, which causes a problem with the sort script given in the tutorial:

# sort the left and right blocks and assign id to each
h, w = image.size

left_interval = lp.Interval(0, w/2*1.05, axis='x').put_on_canvas(image)

left_blocks = text_blocks.filter_by(left_interval, center=True)
left_blocks.sort(key = lambda b:b.coordinates[1])

right_blocks = [b for b in text_blocks if b not in left_blocks]
right_blocks.sort(key = lambda b:b.coordinates[1])

# And finally combine the two list and add the index
# according to the order
text_blocks = lp.Layout([b.set(id = idx) for idx, b in enumerate(left_blocks + right_blocks)])

# visualize the cleaned text blocks
lp.draw_box(image, text_blocks,
            box_width=3, 
            show_element_id=True)

detect_sort

The misaligned box is given an index of 0, which is not correct.

Is there any way to avoid this problem?

Thank you
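
The column-then-vertical ordering used by the sort script can be sketched without layoutparser, which makes it easier to test in isolation. This is only an illustration on plain (x1, y1, x2, y2) tuples with made-up coordinates, not the library's implementation. One thing worth checking in the script above: for a Pillow image, image.size is (width, height), so unpacking it as h, w swaps the two values and moves the column boundary.

```python
def reading_order(blocks, page_width):
    """Order (x1, y1, x2, y2) boxes: left column top to bottom, then right column."""
    mid = page_width / 2

    def center_x(box):
        return (box[0] + box[2]) / 2

    # Assign each box to a column by the horizontal position of its center.
    left = [b for b in blocks if center_x(b) <= mid]
    right = [b for b in blocks if center_x(b) > mid]

    # Within a column, sort by the top edge (y1).
    return sorted(left, key=lambda b: b[1]) + sorted(right, key=lambda b: b[1])


# Hypothetical boxes on a page that is 1000 pixels wide.
blocks = [
    (520, 50, 960, 300),
    (40, 400, 480, 700),
    (40, 50, 480, 380),
    (520, 320, 960, 700),
]
ordered = reading_order(blocks, page_width=1000)
for index, box in enumerate(ordered):
    print(index, box)
```

A box that straddles the column boundary still lands in whichever column contains its center, so a badly misaligned detection can end up in the wrong column under this rule.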

Proposal of integration with the Hugging Face Hub

Hi there!

Layout Parser is very cool! At Hugging Face we are collaborating with open source libraries in the ecosystem such as spaCy, Sentence Transformers, Timm, ESPNet, and more in order to implement integration in the models hub.

The idea is to make it as easy as possible for your users to try out and share models. I think it would be very great to have some integration with Layout Parser. Users would get the following benefits:

  • Free hosting of models
  • Built-in file versioning
  • Hub features: Code snippets, filters to find models, and other features to help with discoverability
  • Potentially hosted Inference API and widgets to try out the models (you can find examples of all our widgets here, and here is an example in a model card).

For Layout Parser I think having downstream support would be a nice feature and would match nicely with the existing workflow of using pretrained models. Instead of using Dropbox to share model links, you could have an organization in the Hub in which users would be able to find all your models and even try them out directly in the browser!

cc @LysandreJik @NielsRogge

How can I use the model and config from local path

Hi

I am trying to use layout-parser to parse data from a PDF. Because of a network issue, I can't download the config from 'lp://PubLayNet/faster_rcnn_R_50_FPN_3x/config', but I can download model_final.pth and config.yaml through my browser. I then use the following code to set the config path and model path. However, nothing gets parsed. Is there anything wrong with how I use the local paths?
model = lp.Detectron2LayoutModel('config.yaml', 'model_final.pth',
                                 extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.8],
                                 label_map={0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"})

AttributeError: module layoutparser has no attribute Detectron2LayoutModel

Hi,

Thank you for this awesome program! I successfully installed layout-parser and Detectron2 on my Windows 10 laptop. When I run the following code:

import layoutparser as lp
import cv2
import numpy as np
from pdf2image import convert_from_bytes

# Raw string so that backslashes in the Windows path are not treated as escapes
pdf_path = r'C:\temp\ConsigneeList\Doc 4 Distribution List.pdf'
with open(pdf_path, 'rb') as f:
    images = convert_from_bytes(f.read())

model = lp.Detectron2LayoutModel(
    config_path='lp://PubLayNet/mask_rcnn_X_101_32x8d_FPN_3x/config',  # In model catalog
    label_map={0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"},  # In model label_map
    extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.8],  # Optional
)
ocr_agent = lp.ocr.TesseractAgent()

# Loop through each page
for image in images:
    image = np.array(image)
    layout = model.detect(image)

    # Keep only the text blocks on the page
    text_blocks = lp.Layout([b for b in layout if b.type == 'Text'])

    # Run OCR on each text block
    for block in text_blocks:
        segment_image = (block
                         .pad(left=5, right=5, top=5, bottom=5)
                         .crop_image(image))
        text = ocr_agent.detect(segment_image)
        block.set(text=text, inplace=True)

    # Append the recognized text to the output file
    with open("OUTPUT FILE PATH/FILENAME.TXT", "a+") as my_file:
        for txt in text_blocks.get_texts():
            my_file.write(txt)

I get the following errors:


AttributeError Traceback (most recent call last)
in
----> 1 model = lp.Detectron2LayoutModel(
2 config_path ='lp://PubLayNet/mask_rcnn_X_101_32x8d_FPN_3x/config', # In model catalog
3 label_map = {0: "Text", 1: "Title", 2: "List", 3:"Table", 4:"Figure"}, # In modellabel_map
4 extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.8] # Optional
5 )

C:\ProgramData\Anaconda3\lib\site-packages\layoutparser\file_utils.py in __getattr__(self, name)
224 value = getattr(module, name)
225 else:
--> 226 raise AttributeError(f"module {self.__name__} has no attribute {name}")
227
228 setattr(self, name, value)

AttributeError: module layoutparser has no attribute Detectron2LayoutModel

Any ideas on what is wrong? Thank you!!

Sincerely,

tom
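
The traceback shows layoutparser resolving attributes through a lazy module wrapper in file_utils.py. One plausible cause (an assumption, not something confirmed in this thread) is that Detectron2LayoutModel is only exposed when the detectron2 package can be imported from the same Python environment that runs the script. A quick check using only the standard library; the helper name is_importable is made up for illustration:

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module can be found in the current environment."""
    return importlib.util.find_spec(module_name) is not None


# Report which of the relevant packages this interpreter can see.
for name in ("layoutparser", "detectron2", "torch"):
    status = "found" if is_importable(name) else "MISSING"
    print(f"{name}: {status}")
```

If detectron2 is reported as missing here but was installed successfully, the installation most likely went into a different environment than the one the notebook or script is using.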


Save the model.pth

Hello, how can I download the model.pth for further use? I am able to download the config from the model_zoo, but where do I find the final_model.pth file?

Upd: I have found the files in src/layoutparser/models/detectron2/catalog.py
Sorry for asking this

Paragraph and titles with bigger line spacing

I have a lot of documents where the line spacing is bigger than normal; many construction documents are like this. As a result, paragraphs and titles are recognised very badly. Is there a workaround for this other than labeling new data and retraining the model(s)?

Cannot run the model on Windows.

Hi, Thank you very much for your brilliant work.
I've just passed the first step to install the package on my Windows 10. However, I am currently stuck with the second step to run the model. Would you have any hints to solve this error?
Please check out the attached files below if you have a moment.
Thank you very much.

Screenshot (8)
Screenshot (9)_LI

NameError: name 'IMAGENET_DEFAULT_MEAN' without effdet extra

Describe the bug

Following the installation instructions to use Detectron2, the command is:

pip install layoutparser torch && pip install "git+https://github.com/facebookresearch/[email protected]#egg=detectron2"

When trying to load the model (as per the example):

model = lp.Detectron2LayoutModel(
    'lp://PubLayNet/faster_rcnn_R_50_FPN_3x/config', 
     extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.8],
     label_map={0: "Text", 1: "Title", 2: "List", 3:"Table", 4:"Figure"}
)

That leads to the exception NameError: name 'IMAGENET_DEFAULT_MEAN' is not defined (see logs below)

Following the stacktrace, it seems to be related to EfficientDet.

The error goes away after installing that option via:

pip install layoutparser[effdet]

Environment
Linux.

LayoutParser 0.3.1

Error traceback

logs
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-5-8818ba54be18> in <module>
----> 1 model = lp.Detectron2LayoutModel(
      2     'lp://PubLayNet/faster_rcnn_R_50_FPN_3x/config',
      3      extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.8],
      4      label_map={0: "Text", 1: "Title", 2: "List", 3:"Table", 4:"Figure"}
      5 )

/opt/conda/lib/python3.8/site-packages/layoutparser/file_utils.py in __getattr__(self, name)
    221             value = self._get_module(name)
    222         elif name in self._class_to_module.keys():
--> 223             module = self._get_module(self._class_to_module[name])
    224             value = getattr(module, name)
    225         else:

/opt/conda/lib/python3.8/site-packages/layoutparser/file_utils.py in _get_module(self, module_name)
    230 
    231     def _get_module(self, module_name: str):
--> 232         return importlib.import_module("." + module_name, self.__name__)
    233 
    234     def __reduce__(self):

/opt/conda/lib/python3.8/importlib/__init__.py in import_module(name, package)
    125                 break
    126             level += 1
--> 127     return _bootstrap._gcd_import(name[level:], package, level)
    128 
    129 

/opt/conda/lib/python3.8/importlib/_bootstrap.py in _gcd_import(name, package, level)

/opt/conda/lib/python3.8/importlib/_bootstrap.py in _find_and_load(name, import_)

/opt/conda/lib/python3.8/importlib/_bootstrap.py in _find_and_load_unlocked(name, import_)

/opt/conda/lib/python3.8/importlib/_bootstrap.py in _call_with_frames_removed(f, *args, **kwds)

/opt/conda/lib/python3.8/importlib/_bootstrap.py in _gcd_import(name, package, level)

/opt/conda/lib/python3.8/importlib/_bootstrap.py in _find_and_load(name, import_)

/opt/conda/lib/python3.8/importlib/_bootstrap.py in _find_and_load_unlocked(name, import_)

/opt/conda/lib/python3.8/importlib/_bootstrap.py in _load_unlocked(spec)

/opt/conda/lib/python3.8/importlib/_bootstrap_external.py in exec_module(self, module)

/opt/conda/lib/python3.8/importlib/_bootstrap.py in _call_with_frames_removed(f, *args, **kwds)

/opt/conda/lib/python3.8/site-packages/layoutparser/models/__init__.py in <module>
     15 from .detectron2.layoutmodel import Detectron2LayoutModel
     16 from .paddledetection.layoutmodel import PaddleDetectionLayoutModel
---> 17 from .effdet.layoutmodel import EfficientDetLayoutModel
     18 from .auto_layoutmodel import AutoLayoutModel

/opt/conda/lib/python3.8/site-packages/layoutparser/models/effdet/__init__.py in <module>
     14 
     15 from . import catalog as _UNUSED
---> 16 from .layoutmodel import EfficientDetLayoutModel

/opt/conda/lib/python3.8/site-packages/layoutparser/models/effdet/layoutmodel.py in <module>
     34 
     35 
---> 36 class InputTransform:
     37     def __init__(
     38         self,

/opt/conda/lib/python3.8/site-packages/layoutparser/models/effdet/layoutmodel.py in InputTransform()
     38         self,
     39         image_size,
---> 40         mean=IMAGENET_DEFAULT_MEAN,
     41         std=IMAGENET_DEFAULT_STD,
     42     ):

NameError: name 'IMAGENET_DEFAULT_MEAN' is not defined
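
The traceback shows the failure happening while the InputTransform class body is being defined. Python evaluates default argument values at definition time, so a constant that was never imported raises NameError as soon as the module is imported, even if the class is never used. A minimal sketch of one defensive pattern, not the library's actual fix: the module name optional_backend_constants is hypothetical, and the fallback values are the commonly used ImageNet mean and standard deviation.

```python
try:
    # Hypothetical optional dependency; stands in for whichever package provides the constants.
    from optional_backend_constants import IMAGENET_DEFAULT_MEAN, IMAGENET_DEFAULT_STD
except ImportError:
    # Fall back to the commonly used ImageNet statistics so the names always exist.
    IMAGENET_DEFAULT_MEAN = (0.485, 0.456, 0.406)
    IMAGENET_DEFAULT_STD = (0.229, 0.224, 0.225)


class InputTransformSketch:
    """Illustrative stand-in for a transform whose defaults depend on optional constants."""

    def __init__(self, image_size, mean=IMAGENET_DEFAULT_MEAN, std=IMAGENET_DEFAULT_STD):
        self.image_size = image_size
        self.mean = mean
        self.std = std


transform = InputTransformSketch(512)
print(transform.mean, transform.std)
```

An alternative with the same effect is to default the arguments to None and resolve the constants inside __init__, which defers the lookup until the class is actually instantiated.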

Error element indices when setting `show_element_id` in the visualization

Describe the bug
When the input sequence is ordered differently from the element ids, lp.draw_box creates inconsistent id annotations in the visualization.

To Reproduce
Example:

from PIL import Image
import layoutparser as lp

background = Image.new('RGB', (1000,1000), color='white')
layout = lp.Layout(
    [
        lp.TextBlock(block=lp.Rectangle(x_1=80, y_1=79.0, x_2=490, y_2=92.0), text=None, id=1, type=None, parent=0, next=None),
        lp.TextBlock(block=lp.Rectangle(x_1=80, y_1=65.0, x_2=488.0, y_2=77.0), text=None, id=0, type=None, parent=0, next=None),
        lp.TextBlock(block=lp.Rectangle(x_1=80.0, y_1=95.0, x_2=490, y_2=107.0), text=None, id=2, type=None, parent=0, next=None),
        lp.TextBlock(block=lp.Rectangle(x_1=80, y_1=110.0, x_2=490, y_2=122.0), text=None, id=3, type=None, parent=0, next=None),
        lp.TextBlock(block=lp.Rectangle(x_1=80.0, y_1=125.0, x_2=490.0, y_2=138.0), text=None, id=4, type=None, parent=0, next=None)
    ]
).scale((1,2))
lp.draw_box(background, layout, show_element_id=True)

Expected output:
image

Actual output:
image

Temporary fix:

lp.draw_box(background, [b.set(id=str(b.id)) for b in layout], show_element_id=True)

Can we use Yolov5 or other model to do DLA?

LayoutParser uses Faster R-CNN and Mask R-CNN for document layout analysis (DLA), but the models are over 300M in size and hard to run for inference on Android. DLA is similar to object detection, so could we use YOLOv5 or Tiny YOLO in place of Faster R-CNN and Mask R-CNN?

instantiating lp.GCVAgent.with_credential returns module 'google.cloud.vision' has no attribute 'types'

Thanks for putting this amazing-looking library together. I tried to work through the Table example in the documentation and get a no attribute 'types' error when I run this line

ocr_agent = lp.GCVAgent.with_credential("/home/alal/keys/sandbox-7f8884d01b79.json", 
                                        languages = ['en'])
>module 'google.cloud.vision' has no attribute 'types'

I am running lp in a conda environment on Ubuntu 18.04, have installed google.cloud.vision and enabled it on console.cloud.google.com, and have GOOGLE_APPLICATION_CREDENTIALS in my path.

I've also verified that I can run the document-text tutorial in my conda environment https://cloud.google.com/vision/docs/fulltext-annotations.
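
One possible cause, stated here as an assumption rather than a confirmed diagnosis: the 2.x releases of the google-cloud-vision package reorganized the client API and, as far as I know, no longer expose vision.types, so code written against the 1.x API fails with this error. A small standard-library check of the installed major version; the helper name installed_major_version is made up for illustration:

```python
from importlib import metadata
from itertools import takewhile


def installed_major_version(dist_name):
    """Return the major version of an installed distribution, or None if unavailable."""
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None
    # Take the leading digits of the version string, e.g. "2" from "2.0.0".
    digits = "".join(takewhile(str.isdigit, version))
    return int(digits) if digits else None


major = installed_major_version("google-cloud-vision")
if major is None:
    print("google-cloud-vision is not installed in this environment")
elif major >= 2:
    print("google-cloud-vision 2.x or later: vision.types may be unavailable")
else:
    print("google-cloud-vision 1.x: vision.types is expected to exist")
```

If the check reports 2.x or later, trying a 1.x release in a separate environment is a cheap way to confirm or rule out this explanation.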

table parse using tesseract

Hi, first I would like to thank you for this amazing project. Could you provide some details on how to run the OCR table notebook using Tesseract OCR? Your example uses the Google Vision API. Thank you.

SSL Certificate error when downloading models in Python 3.9

Describe the bug
When using Python 3.9, downloading the model files may fail with the following error:

ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1129)

A possible solution might be found here https://stackoverflow.com/questions/50236117/scraping-ssl-certificate-verify-failed-error-for-http-en-wikipedia-org/53310545#53310545, though we may need a more systematic way of solving this issue in layoutparser.

group_blocks_by_distance example addition

Describe the bug
In the example given here the function group_blocks_by_distance doesn't sort within row in the x direction. I came up with a simple fix to this as implemented below, and thought I would flag it for anyone else who runs into this issue.

To Reproduce
Steps to reproduce the behavior:

  1. What command or script did you run?
def group_blocks_by_distance(blocks, distance_th):

    blocks = sorted(blocks, key = lambda x: x.coordinates[1])
    distances = np.array([b2.coordinates[1] - b1.coordinates[3] for (b1, b2) in zip(blocks, blocks[1:])])

    distances = np.append([0], distances)
    block_group = (distances>distance_th).cumsum()

    grouped_blocks = [lp.Layout([]) for i in range(max(block_group)+1)]
    for i, block in zip(block_group, blocks):
        grouped_blocks[i].append(block)

    return grouped_blocks

The changes that I implement here allow for within row sorting on the x axis, either from left to right or from right to left depending on the parameter passed.

# left to right if x_direction = 0, right to left if x_direction = 1 
def group_blocks_by_distance(blocks, distance_th, x_direction):
    blocks = sorted(blocks, key = lambda x: (x.coordinates[1]))
    distances = np.array([b2.coordinates[1] - b1.coordinates[3] for (b1, b2) in zip(blocks, blocks[1:])])

    distances = np.append([0], distances)
    block_group = (distances>distance_th).cumsum()
    grouped_blocks = [[] for i in range(max(block_group)+1)]
    for i, block in zip(block_group, blocks):
        grouped_blocks[i].append(block)
    for i in range(len(grouped_blocks)):
        grouped_blocks[i] = sorted(grouped_blocks[i], 
                                   key = lambda x: (x.coordinates[0]), 
                                   reverse= x_direction)
        
    grouped_sorted_blocks = [lp.Layout(grouped_blocks[i]) for i in range(max(block_group)+1)]

    return grouped_sorted_blocks

Environment
I'm on Mac, using layoutparser version 0.2.0, working in a conda virtual environment.

Screenshots
Let's say I have the following row.
image
The original function reads it as
image
My modification reads it as
image

lp.Detectron2LayoutModel often hangs on download step (config/model)

Describe the bug
When downloading the config.yaml or model_final.pth of a new (to user) model, the program often hangs.

To Reproduce
Steps to reproduce the behavior:

  1. What command or script did you run?
model_PLN_a = lp.Detectron2LayoutModel(
            config_path ='lp://PubLayNet/mask_rcnn_X_101_32x8d_FPN_3x/config', # In model catalog
            label_map   ={0: "Text", 1: "Title", 2: "List", 3:"Table", 4:"Figure"}, # In model`label_map`
            extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.8] # Optional
        )

Environment

  1. Please describe your Platform [Windows/MacOS/Linux]: Windows 10, WSL2
  2. Please show the Layout Parser version: 0.2.0

Error traceback
When I manually end the script, this is the traceback

>>> model_PLN_b = lp.Detectron2LayoutModel(
...             config_path ='lp://PubLayNet/mask_rcnn_X_101_32x8d_FPN_3x/config', # In model catalog
...             label_map   ={0: "Text", 1: "Title", 2: "List", 3:"Table", 4:"Figure"}, # In model`label_map`
...             extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.4] # Optional
...         )
config.yaml?dl=1: 8.19kB [00:01, 5.99kB/s]
model_final.pth?dl=1: 0.00B [04:57, ?B/s]^C
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/bin/venv-wsl/lib/python3.8/site-packages/layoutparser/models/layoutmodel.py", line 124, in __init__
    self._create_model()
  File "/usr/bin/venv-wsl/lib/python3.8/site-packages/layoutparser/models/layoutmodel.py", line 149, in _create_model
    self.model = self._engine.DefaultPredictor(self.cfg)
  File "/usr/bin/venv-wsl/lib/python3.8/site-packages/detectron2/engine/defaults.py", line 222, in __init__
    checkpointer.load(cfg.MODEL.WEIGHTS)
  File "/usr/bin/venv-wsl/lib/python3.8/site-packages/fvcore/common/checkpoint.py", line 140, in load
    path = self.path_manager.get_local_path(path)
  File "/usr/bin/venv-wsl/lib/python3.8/site-packages/iopath/common/file_io.py", line 1107, in get_local_path
    return handler._get_local_path(
  File "/usr/bin/venv-wsl/lib/python3.8/site-packages/iopath/common/file_io.py", line 766, in _get_local_path
    cached = download(path, dirname, filename=filename)
  File "/usr/bin/venv-wsl/lib/python3.8/site-packages/iopath/common/download.py", line 58, in download
    tmp, _ = request.urlretrieve(url, filename=tmp, reporthook=hook(t))
  File "/usr/lib/python3.8/urllib/request.py", line 247, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "/usr/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.8/urllib/request.py", line 525, in open
    response = self._open(req, data)
  File "/usr/lib/python3.8/urllib/request.py", line 542, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "/usr/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.8/urllib/request.py", line 1360, in https_open
    return self.do_open(http.client.HTTPSConnection, req,
  File "/usr/lib/python3.8/urllib/request.py", line 1317, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "/usr/lib/python3.8/http/client.py", line 1230, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1276, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1225, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1004, in _send_output
    self.send(msg)
  File "/usr/lib/python3.8/http/client.py", line 944, in send
    self.connect()
  File "/usr/lib/python3.8/http/client.py", line 1392, in connect
    super().connect()
  File "/usr/lib/python3.8/http/client.py", line 915, in connect
    self.sock = self._create_connection(
  File "/usr/lib/python3.8/socket.py", line 796, in create_connection
    sock.connect(sa)

Extra comments
Cancelling the script and re-running it eventually leads to a successful download.

I assume the issue is in how WSL2 handles network connections. Would it be possible to include a timeout and/or retry element to the download instruction? The first would at least indicate to the user that there is a network issue, the latter might fix the issue.
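
The timeout-and-retry idea requested above can be sketched with the standard library alone. The traceback shows the download going through urllib.request.urlretrieve, which is called without an explicit timeout and therefore falls back to the process-wide socket default. This is a workaround sketch under that assumption, not a change to layoutparser or iopath; the retry helper and the load_model name are made up for illustration.

```python
import socket
import time


def retry(func, attempts=3, base_delay=1.0, exceptions=(OSError,)):
    """Call func() until it succeeds or attempts (at least 1) run out, with exponential backoff."""
    last_error = None
    for attempt in range(attempts):
        try:
            return func()
        except exceptions as err:  # urllib errors and socket timeouts subclass OSError
            last_error = err
            if attempt < attempts - 1:
                time.sleep(base_delay * 2 ** attempt)
    raise last_error


# Make blocking socket operations fail after 30 seconds instead of hanging indefinitely.
socket.setdefaulttimeout(30)

# Hypothetical usage; load_model stands in for a function wrapping the
# lp.Detectron2LayoutModel(...) call shown above:
# model = retry(load_model, attempts=3)
```

The timeout turns a silent hang into an error the user can see, and the retry covers the case reported above where a second attempt succeeds.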

conflicting dependencies

If I follow the installation steps currently in the README, I get the following from the last pip command (i.e. when installing layout-parser a second time, after installing detectron2):

Installing collected packages: pycocotools, fvcore
  Attempting uninstall: pycocotools
    Found existing installation: pycocotools 2.0.2
    Uninstalling pycocotools-2.0.2:
      Successfully uninstalled pycocotools-2.0.2
  Attempting uninstall: fvcore
    Found existing installation: fvcore 0.1.2.post20210128
    Uninstalling fvcore-0.1.2.post20210128:
      Successfully uninstalled fvcore-0.1.2.post20210128
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
ocrd-segment 0.1.3 requires pycocotools>=2.0.2, but you have pycocotools 2.0.1 which is incompatible.
detectron2 0.3 requires fvcore<0.1.3,>=0.1.2, but you have fvcore 0.1.1.post20200623 which is incompatible.
detectron2 0.3 requires pycocotools>=2.0.2, but you have pycocotools 2.0.1 which is incompatible.
Successfully installed fvcore-0.1.1.post20200623 pycocotools-2.0.1

So it seems we are in conflict with current master of detectron2 here.

Could you point me to the right version of detectron2 to fetch, or update layout-parser accordingly?

Also, why not make the detectron2 dependency explicit in setup.py?

Intersect operation

The intersect operation always returns a result, even when the two blocks do not overlap, which is incorrect.
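
For reference, the expected behaviour can be sketched on plain (x1, y1, x2, y2) tuples. This is an illustration of the geometry only and does not reproduce layoutparser's implementation; returning None for non-overlapping boxes is one possible convention, chosen here as an assumption.

```python
def intersect(a, b):
    """Return the overlapping region of two (x1, y1, x2, y2) boxes, or None if they do not overlap."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    # An empty or inverted region means the boxes do not overlap.
    if x1 >= x2 or y1 >= y2:
        return None
    return (x1, y1, x2, y2)


print(intersect((0, 0, 10, 10), (5, 5, 20, 20)))
print(intersect((0, 0, 10, 10), (20, 20, 30, 30)))
```

Boxes that only touch along an edge are treated as non-overlapping here, because the shared region has zero area.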

Add option to show coordinate guidelines when calling draw_text

Motivation

Hi everyone, thanks for this library! I've been testing it out and it's quite good. I'd like to request a feature: displaying coordinate guidelines in draw_text, to help with filtering the relevant information I want.

My workflow

I've been using LayoutParser in tandem with Google Cloud Vision. After I get the image, I usually call draw_text.
As an example (this is public information of an LGU spend from the Philippines):

image

If I want to get the Current Assets for 2017, I'd still need to do some trial-and-error to filter the exact coordinates for my rectangle. Maybe I'd try 100 first, then 120, etc.

# I still need to trial-and-error the coordinates here. 
# I hope there's a way to better guesstimate this
filtered_assets = layout.filter_by(
    lp.Rectangle(x_1=132, y_1=300, x_2=264, y_2=840)
)

Request: I'd appreciate it if this function also has an option to display coordinate guidelines, so that I can easily "guess-timate" parts I want to filter.

I'm interested to contribute so please let me know which part of the code I can inspect. Thank you!
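
Until such an option exists, the guidelines can be drawn on top of any visualization with Pillow. A minimal sketch, assuming the visualization is a Pillow image; the function names guideline_positions and draw_guidelines are made up for illustration and are not part of layoutparser.

```python
def guideline_positions(length, step=100):
    """Pixel offsets at which to draw a guideline along an axis of the given length."""
    return list(range(0, length + 1, step))


def draw_guidelines(image, step=100, color="red"):
    """Return a copy of a Pillow image with labelled vertical and horizontal guidelines."""
    from PIL import ImageDraw  # imported lazily so guideline_positions has no dependency

    canvas = image.copy()
    draw = ImageDraw.Draw(canvas)
    width, height = canvas.size  # Pillow reports size as (width, height)

    for x in guideline_positions(width, step):
        draw.line([(x, 0), (x, height)], fill=color, width=1)
        draw.text((x + 2, 2), str(x), fill=color)   # label the x coordinate at the top
    for y in guideline_positions(height, step):
        draw.line([(0, y), (width, y)], fill=color, width=1)
        draw.text((2, y + 2), str(y), fill=color)   # label the y coordinate at the left
    return canvas
```

With the labelled grid in place, the rectangle passed to filter_by can be read off the image instead of being found by trial and error.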

Installation on Windows

Hello,

There are some tricky problems when trying to install this on Windows 10. When running the pip install command:

Collecting layoutparser
Using cached layoutparser-0.1.3-py3-none-any.whl (19.1 MB)
Requirement already satisfied: pyyaml>=5.1 in c:\programdata\anaconda3\lib\site-packages (from layoutparser) (5.4.1)
Requirement already satisfied: torch in c:\programdata\anaconda3\lib\site-packages (from layoutparser) (1.8.0)
Collecting fvcore==0.1.1.post20200623
Using cached fvcore-0.1.1.post20200623-py3-none-any.whl
Requirement already satisfied: pandas in c:\programdata\anaconda3\lib\site-packages (from layoutparser) (1.2.3)
Collecting opencv-python
Using cached opencv_python-4.5.1.48-cp38-cp38-win_amd64.whl (34.9 MB)
Requirement already satisfied: numpy in c:\programdata\anaconda3\lib\site-packages (from layoutparser) (1.19.2)
Requirement already satisfied: torchvision in c:\programdata\anaconda3\lib\site-packages (from layoutparser) (0.9.0)
Collecting pycocotools==2.0.1
Using cached pycocotools-2.0.1.tar.gz (23 kB)
Requirement already satisfied: pillow in c:\programdata\anaconda3\lib\site-packages (from layoutparser) (8.1.2)
Requirement already satisfied: tabulate in c:\programdata\anaconda3\lib\site-packages (from fvcore==0.1.1.post20200623->layoutparser) (0.8.9)
Requirement already satisfied: portalocker in c:\programdata\anaconda3\lib\site-packages (from fvcore==0.1.1.post20200623->layoutparser) (2.2.1)
Requirement already satisfied: yacs>=0.1.6 in c:\programdata\anaconda3\lib\site-packages (from fvcore==0.1.1.post20200623->layoutparser) (0.1.8)
Requirement already satisfied: tqdm in c:\programdata\anaconda3\lib\site-packages (from fvcore==0.1.1.post20200623->layoutparser) (4.59.0)
Requirement already satisfied: termcolor>=1.1 in c:\programdata\anaconda3\lib\site-packages (from fvcore==0.1.1.post20200623->layoutparser) (1.1.0)
Requirement already satisfied: setuptools>=18.0 in c:\programdata\anaconda3\lib\site-packages (from pycocotools==2.0.1->layoutparser) (52.0.0.post20210125)
Requirement already satisfied: cython>=0.27.3 in c:\programdata\anaconda3\lib\site-packages (from pycocotools==2.0.1->layoutparser) (0.29.22)
Requirement already satisfied: matplotlib>=2.1.0 in c:\programdata\anaconda3\lib\site-packages (from pycocotools==2.0.1->layoutparser) (3.3.4)
Requirement already satisfied: cycler>=0.10 in c:\programdata\anaconda3\lib\site-packages (from matplotlib>=2.1.0->pycocotools==2.0.1->layoutparser) (0.10.0)
Requirement already satisfied: python-dateutil>=2.1 in c:\programdata\anaconda3\lib\site-packages (from matplotlib>=2.1.0->pycocotools==2.0.1->layoutparser) (2.8.1)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.3 in c:\programdata\anaconda3\lib\site-packages (from matplotlib>=2.1.0->pycocotools==2.0.1->layoutparser) (2.4.7)
Requirement already satisfied: kiwisolver>=1.0.1 in c:\programdata\anaconda3\lib\site-packages (from matplotlib>=2.1.0->pycocotools==2.0.1->layoutparser) (1.3.1)
Requirement already satisfied: six in c:\programdata\anaconda3\lib\site-packages (from cycler>=0.10->matplotlib>=2.1.0->pycocotools==2.0.1->layoutparser) (1.15.0)
Requirement already satisfied: pytz>=2017.3 in c:\programdata\anaconda3\lib\site-packages (from pandas->layoutparser) (2021.1)
Requirement already satisfied: pywin32!=226 in c:\programdata\anaconda3\lib\site-packages (from portalocker->fvcore==0.1.1.post20200623->layoutparser) (227)
Requirement already satisfied: typing-extensions in c:\programdata\anaconda3\lib\site-packages (from torch->layoutparser) (3.7.4.3)
Building wheels for collected packages: pycocotools
Building wheel for pycocotools (setup.py) ... error
ERROR: Command errored out with exit status 1:
command: 'C:\ProgramData\Anaconda3\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\Users\simon\AppData\Local\Temp\pip-install-89f1380y\pycocotools_f23f85d993154668b0597a2f67daedc9\setup.py'"'"'; file='"'"'C:\Users\simon\AppData\Local\Temp\pip-install-89f1380y\pycocotools_f23f85d993154668b0597a2f67daedc9\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' bdist_wheel -d 'C:\Users\simon\AppData\Local\Temp\pip-wheel-uyclbppl'
cwd: C:\Users\simon\AppData\Local\Temp\pip-install-89f1380y\pycocotools_f23f85d993154668b0597a2f67daedc9
Complete output (22 lines):
running bdist_wheel
running build
running build_py
creating build
creating build\lib.win-amd64-3.8
creating build\lib.win-amd64-3.8\pycocotools
copying pycocotools\coco.py -> build\lib.win-amd64-3.8\pycocotools
copying pycocotools\cocoeval.py -> build\lib.win-amd64-3.8\pycocotools
copying pycocotools\mask.py -> build\lib.win-amd64-3.8\pycocotools
copying pycocotools\__init__.py -> build\lib.win-amd64-3.8\pycocotools
running build_ext
cythoning pycocotools/_mask.pyx to pycocotools\_mask.c
C:\ProgramData\Anaconda3\lib\site-packages\Cython\Compiler\Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: C:\Users\simon\AppData\Local\Temp\pip-install-89f1380y\pycocotools_f23f85d993154668b0597a2f67daedc9\pycocotools_mask.pyx
tree = Parsing.p_module(s, pxd, full_module_name)
building 'pycocotools._mask' extension
creating build\temp.win-amd64-3.8
creating build\temp.win-amd64-3.8\Release
creating build\temp.win-amd64-3.8\Release\common
creating build\temp.win-amd64-3.8\Release\pycocotools
C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.28.29910\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -IC:\ProgramData\Anaconda3\lib\site-packages\numpy\core\include -I./common -IC:\ProgramData\Anaconda3\include -IC:\ProgramData\Anaconda3\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.28.29910\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\cppwinrt" /Tc./common/maskApi.c /Fobuild\temp.win-amd64-3.8\Release./common/maskApi.obj -Wno-cpp -Wno-unused-function -std=c99
cl : Command line error D8021 : invalid numeric argument '/Wno-cpp'
error: command 'C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.28.29910\bin\HostX86\x64\cl.exe' failed with exit status 2

ERROR: Failed building wheel for pycocotools
Running setup.py clean for pycocotools
Failed to build pycocotools
Installing collected packages: pycocotools, opencv-python, fvcore, layoutparser
Attempting uninstall: pycocotools
Found existing installation: pycocotools 2.0
Uninstalling pycocotools-2.0:
Successfully uninstalled pycocotools-2.0
Running setup.py install for pycocotools ... error
ERROR: Command errored out with exit status 1:
command: 'C:\ProgramData\Anaconda3\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\Users\simon\AppData\Local\Temp\pip-install-89f1380y\pycocotools_f23f85d993154668b0597a2f67daedc9\setup.py'"'"'; file='"'"'C:\Users\simon\AppData\Local\Temp\pip-install-89f1380y\pycocotools_f23f85d993154668b0597a2f67daedc9\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record 'C:\Users\simon\AppData\Local\Temp\pip-record-9im9zem_\install-record.txt' --single-version-externally-managed --compile --install-headers 'C:\ProgramData\Anaconda3\Include\pycocotools'
cwd: C:\Users\simon\AppData\Local\Temp\pip-install-89f1380y\pycocotools_f23f85d993154668b0597a2f67daedc9
Complete output (20 lines):
running install
running build
running build_py
creating build
creating build\lib.win-amd64-3.8
creating build\lib.win-amd64-3.8\pycocotools
copying pycocotools\coco.py -> build\lib.win-amd64-3.8\pycocotools
copying pycocotools\cocoeval.py -> build\lib.win-amd64-3.8\pycocotools
copying pycocotools\mask.py -> build\lib.win-amd64-3.8\pycocotools
copying pycocotools\__init__.py -> build\lib.win-amd64-3.8\pycocotools
running build_ext
skipping 'pycocotools\_mask.c' Cython extension (up-to-date)
building 'pycocotools._mask' extension
creating build\temp.win-amd64-3.8
creating build\temp.win-amd64-3.8\Release
creating build\temp.win-amd64-3.8\Release\common
creating build\temp.win-amd64-3.8\Release\pycocotools
C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.28.29910\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -IC:\ProgramData\Anaconda3\lib\site-packages\numpy\core\include -I./common -IC:\ProgramData\Anaconda3\include -IC:\ProgramData\Anaconda3\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.28.29910\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\cppwinrt" /Tc./common/maskApi.c /Fobuild\temp.win-amd64-3.8\Release./common/maskApi.obj -Wno-cpp -Wno-unused-function -std=c99
cl : Command line error D8021 : invalid numeric argument '/Wno-cpp'
error: command 'C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.28.29910\bin\HostX86\x64\cl.exe' failed with exit status 2
----------------------------------------
Rolling back uninstall of pycocotools
Moving to c:\programdata\anaconda3\lib\site-packages\pycocotools-2.0.dist-info
from C:\ProgramData\Anaconda3\Lib\site-packages~ycocotools-2.0.dist-info
Moving to c:\programdata\anaconda3\lib\site-packages\pycocotools
from C:\ProgramData\Anaconda3\Lib\site-packages~ycocotools
ERROR: Command errored out with exit status 1: 'C:\ProgramData\Anaconda3\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\Users\simon\AppData\Local\Temp\pip-install-89f1380y\pycocotools_f23f85d993154668b0597a2f67daedc9\setup.py'"'"'; file='"'"'C:\Users\simon\AppData\Local\Temp\pip-install-89f1380y\pycocotools_f23f85d993154668b0597a2f67daedc9\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record 'C:\Users\simon\AppData\Local\Temp\pip-record-9im9zem
\install-record.txt' --single-version-externally-managed --compile --install-headers 'C:\ProgramData\Anaconda3\Include\pycocotools' Check the logs for full command output.

So apparently this is a known issue with installing pycocotools on Windows. There is a workaround for it here:
https://github.com/cocodataset/cocoapi/issues/169
which does let me install pycocotools.
But when I then run the pip install layoutparser command again, it fails with the same error. It seems to do that because it wants to reinstall pycocotools?

Thanks for your help on this.

Apply detect() on readable PDF files

Hi there,
from the docs I infer that detect() operates on, for example, PIL.Image objects. Is there a way to operate directly on already readable PDF files (which would also obviate the need to apply OCR)?
Greetings

Error installing dependencies

Hi Team,
Thank you for all the great work. It looks amazing.
I tried running pip install layoutparser, but it threw the error below.
Can you please let me know how to fix this?

ERROR: Command errored out with exit status 1:
command: 'C:\Program Files\Anaconda\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\Users\Public\Documents\Wondershare\CreatorTemp\pip-install-s13j7o41\pycocotools_6c1fc2cce84542a8be1c0cbeacfda632\setup.py'"'"'; file='"'"'C:\Users\Public\Documents\Wondershare\CreatorTemp\pip-install-s13j7o41\pycocotools_6c1fc2cce84542a8be1c0cbeacfda632\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' bdist_wheel -d 'C:\Users\Public\Documents\Wondershare\CreatorTemp\pip-wheel-awmfv0cr'
cwd: C:\Users\Public\Documents\Wondershare\CreatorTemp\pip-install-s13j7o41\pycocotools_6c1fc2cce84542a8be1c0cbeacfda632
Complete output (22 lines):
running bdist_wheel
running build
running build_py
creating build
creating build\lib.win-amd64-3.8
creating build\lib.win-amd64-3.8\pycocotools
copying pycocotools\coco.py -> build\lib.win-amd64-3.8\pycocotools
copying pycocotools\cocoeval.py -> build\lib.win-amd64-3.8\pycocotools
copying pycocotools\mask.py -> build\lib.win-amd64-3.8\pycocotools
copying pycocotools\__init__.py -> build\lib.win-amd64-3.8\pycocotools
running build_ext
cythoning pycocotools/_mask.pyx to pycocotools\_mask.c
C:\Users\pss.ch\AppData\Roaming\Python\Python38\site-packages\Cython\Compiler\Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: C:\Users\Public\Documents\Wondershare\CreatorTemp\pip-install-s13j7o41\pycocotools_6c1fc2cce84542a8be1c0cbeacfda632\pycocotools_mask.pyx
tree = Parsing.p_module(s, pxd, full_module_name)
building 'pycocotools._mask' extension
creating build\temp.win-amd64-3.8
creating build\temp.win-amd64-3.8\Release
creating build\temp.win-amd64-3.8\Release\common
creating build\temp.win-amd64-3.8\Release\pycocotools
C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\x86_amd64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -IC:\Users\pss.ch\AppData\Roaming\Python\Python38\site-packages\numpy\core\include -I./common "-IC:\Program Files\Anaconda\include" "-IC:\Program Files\Anaconda\include" "-IC:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE" "-IC:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\ATLMFC\INCLUDE" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.10240.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\8.1\include\shared" "-IC:\Program Files (x86)\Windows Kits\8.1\include\um" "-IC:\Program Files (x86)\Windows Kits\8.1\include\winrt" /Tc./common/maskApi.c /Fobuild\temp.win-amd64-3.8\Release./common/maskApi.obj -Wno-cpp -Wno-unused-function -std=c99
cl : Command line error D8021 : invalid numeric argument '/Wno-cpp'
error: command 'C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\x86_amd64\cl.exe' failed with exit status 2

ERROR: Failed building wheel for pycocotools
ERROR: Command errored out with exit status 1:
command: 'C:\Program Files\Anaconda\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\Users\Public\Documents\Wondershare\CreatorTemp\pip-install-s13j7o41\pycocotools_6c1fc2cce84542a8be1c0cbeacfda632\setup.py'"'"'; file='"'"'C:\Users\Public\Documents\Wondershare\CreatorTemp\pip-install-s13j7o41\pycocotools_6c1fc2cce84542a8be1c0cbeacfda632\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record 'C:\Users\Public\Documents\Wondershare\CreatorTemp\pip-record-w4euj5sb\install-record.txt' --single-version-externally-managed --user --prefix= --compile --install-headers 'C:\Users\pss.ch\AppData\Roaming\Python\Python38\Include\pycocotools'
cwd: C:\Users\Public\Documents\Wondershare\CreatorTemp\pip-install-s13j7o41\pycocotools_6c1fc2cce84542a8be1c0cbeacfda632
Complete output (20 lines):
running install
running build
running build_py
creating build
creating build\lib.win-amd64-3.8
creating build\lib.win-amd64-3.8\pycocotools
copying pycocotools\coco.py -> build\lib.win-amd64-3.8\pycocotools
copying pycocotools\cocoeval.py -> build\lib.win-amd64-3.8\pycocotools
copying pycocotools\mask.py -> build\lib.win-amd64-3.8\pycocotools
copying pycocotools\__init__.py -> build\lib.win-amd64-3.8\pycocotools
running build_ext
skipping 'pycocotools\_mask.c' Cython extension (up-to-date)
building 'pycocotools._mask' extension
creating build\temp.win-amd64-3.8
creating build\temp.win-amd64-3.8\Release
creating build\temp.win-amd64-3.8\Release\common
creating build\temp.win-amd64-3.8\Release\pycocotools
C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\x86_amd64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -IC:\AppData\Roaming\Python\Python38\site-packages\numpy\core\include -I./common "-IC:\Program Files\Anaconda\include" "-IC:\Program Files\Anaconda\include" "-IC:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE" "-IC:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\ATLMFC\INCLUDE" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.10240.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\8.1\include\shared" "-IC:\Program Files (x86)\Windows Kits\8.1\include\um" "-IC:\Program Files (x86)\Windows Kits\8.1\include\winrt" /Tc./common/maskApi.c /Fobuild\temp.win-amd64-3.8\Release./common/maskApi.obj -Wno-cpp -Wno-unused-function -std=c99
cl : Command line error D8021 : invalid numeric argument '/Wno-cpp'
error: command 'C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\x86_amd64\cl.exe' failed with exit status 2
----------------------------------------
ERROR: Command errored out with exit status 1: 'C:\Program Files\Anaconda\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\Users\Public\Documents\Wondershare\CreatorTemp\pip-install-s13j7o41\pycocotools_6c1fc2cce84542a8be1c0cbeacfda632\setup.py'"'"'; file='"'"'C:\Users\Public\Documents\Wondershare\CreatorTemp\pip-install-s13j7o41\pycocotools_6c1fc2cce84542a8be1c0cbeacfda632\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record 'C:\Users\Public\Documents\Wondershare\CreatorTemp\pip-record-w4euj5sb\install-record.txt' --single-version-externally-managed --user --prefix= --compile --install-headers 'C:\AppData\Roaming\Python\Python38\Include\pycocotools' Check the logs for full command output.

Model download hangs

When I try to download a layout model, at some point during the download (usually around 20%), the download speed drops from 5 mb/s to close to zero. The download then eventually stops with a "connection reset by peer" error. What could be the reason for this?

To Reproduce
Steps to reproduce the behavior:

  1. What command or script did you run?
model = lp.Detectron2LayoutModel('lp://PubLayNet/faster_rcnn_R_50_FPN_3x/config',
                                 extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.8],
                                 label_map={0: "Text", 1: "Title", 2: "List", 3:"Table", 4:"Figure"})

Environment

  1. Please describe your Platform [Windows/MacOS/Linux]: Mac
  2. Please show the Layout Parser version: 0.2.0

Retrieving text inside layouts.

Hi, thank you very much for your brilliant work.
I have successfully installed and run the layoutparser package on Windows 10.
However, as I don't come from a computing or data science background, I'm currently stuck on how to retrieve the text inside the layouts and store it in a dataframe for further analysis.
Would you be able to provide any keywords or links on how to do this? Any pointers would be very much appreciated.
Thank you a lot.
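
One possible way to approach this, sketched under assumptions: the loop below follows the README's OCR example (`layout_region.crop(image)` and `ocr_agent.detect(...)`), while `collect_region_text` and `save_rows` are hypothetical helper names, and reading a `type` attribute from each region is an assumption about the layout objects.

```python
import csv
from typing import Any, Dict, Iterable, List


def collect_region_text(layout: Iterable[Any], image: Any, ocr_agent: Any) -> List[Dict[str, Any]]:
    """Run OCR on every detected region and return one row (dict) per region."""
    rows = []
    for index, region in enumerate(layout):
        segment = region.crop(image)     # crop the page image to this region
        text = ocr_agent.detect(segment) # OCR the cropped segment
        rows.append({
            "region_id": index,
            # Assumption: regions may expose a `type` label such as "Text" or "Title".
            "type": getattr(region, "type", None),
            "text": str(text).strip(),
        })
    return rows


def save_rows(rows: List[Dict[str, Any]], path: str) -> None:
    """Write the collected rows to a CSV file that spreadsheet tools can open."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["region_id", "type", "text"])
        writer.writeheader()
        writer.writerows(rows)
```

If pandas is installed, the same list of rows can be turned into a dataframe with `pandas.DataFrame(rows)`.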

Extracting information from Invoices

Hi Team,

Is it possible to extract information from invoices in different formats and put it into a structured format in Excel?
Please let me know. I am keen to use this library. Thanks.

Use ONNX models to avoid installing Detectron2

Motivation
In order to ease installation for Windows users (i.e. avoid installing Detectron2 just to use pre-trained models), why not convert the Detectron2 models to ONNX? This would also allow using your trained models from other languages, e.g. C#/.NET. The converted model was also smaller, about half the size (from 816MB for the .pth to 408MB for the .onnx).

Related resources
I've created a repo here with a simple PoC notebook explaining how to convert the Detectron2 model into ONNX and use the ONNX model (model used was mask_rcnn_X_101_32x8d_FPN_3x).

It uses the export_model.py tool available in the detectron2 repo here

I managed to convert the model using the following command:

python export_model.py --sample-image ...\layout-parser\data\foo.0_raw.png --config-file .../layout-parser/models/PubLayNet/mask_rcnn_X_101_32x8d_FPN_3x/config.yaml --output ./output --export-method caffe2_tracing --format onnx MODEL.WEIGHTS .../layout-parser/models/PubLayNet/mask_rcnn_X_101_32x8d_FPN_3x/model_final.pth MODEL.DEVICE cpu

Additional context
The differences between the original model and the exported model would need to be understood, as the conversion might not implement every post-processing step.

Handling skewed text

Hi, and thanks for the nice work!

I'm using the PrimaLayout model to detect layout in scanned documents. Most of the documents have been scanned at a slight angle, so the text is a bit skewed. The effectiveness of the model seems to vary a lot between images. When I test the model with rotated samples of a single document, it seems that a single degree of rotation can change the result a lot around a certain threshold. So I was curious whether the PrimaLayout model was trained with image rotations as part of the augmentation pipeline. If not, could such augmentations make the model more robust to skewed text? Maybe the simplest hack in my current project is to deskew the images upfront?

Multi modal approach to LP's Deep Layout Parsing capability

Motivation
So basically, when it comes to layout parsing of forms and other such structured data, I have noticed that having access only to the image features of a region of interest can lead to quite a few false positives. If we had a multimodal approach that also takes into account the text present within these regions to form a richer representation, we could considerably improve performance over the existing pure object detection methodology.

Of course, this is relevant only for structured documents like forms and invoices. But I'm guessing a vast majority of your users, much like myself, would be interested in such a feature.

PS: Would love to work on developing such a feature with you all.

For reference: a form like this.

@lolipopshock
