microsoft / ocr-form-tools

A set of tools to use in Microsoft Azure Form Recognizer and OCR services.

License: MIT License

Dockerfile 0.04% Shell 0.18% HTML 0.10% JavaScript 0.27% CSS 0.01% TypeScript 89.58% Python 4.44% SCSS 5.36% Procfile 0.01%
ocr-form-labeling rpa machine-learning machine-learning-algorithms form-recognizer labeling-tool typescript

ocr-form-tools's People

Contributors

alegarro, alex-krasn, beachside-project, buddhawang, cennis-endpoint, chessyhsu, cschenio, ctstone, dependabot[bot], imicknl, kunzms, laujan, luzhang06, microsoft-github-policy-service[bot], sam961124, simotw, starain-pactera, stew-ro, tianxiangs, v-yuhang, xinase, xinxingliu, yongbing-chen

ocr-form-tools's Issues

Easily create sample code based on user's config

scenario
• After trying the whole end-to-end scenario, a user has already provided all the parameters and information needed to complete an end-to-end operation in code. I wish there were a “turn this into Python code” button. It would produce Python code which:
○ Loads the labeled files
○ Trains
○ Gets the train result
○ Predicts using the file path (one file or multiple files)
○ Displays the prediction result
○ Reference of Python code: https://docs.microsoft.com/en-us/azure/cognitive-services/form-recognizer/quickstarts/python-labeled-data
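
One way such a button could work is to fill a script template with the values the user has already entered. The sketch below is only an illustration: the template text, the config keys (`project_name`, `endpoint`, `sas_url`, `files`) and the function name are all hypothetical, and the generated script only lists the steps rather than calling the service.

```python
from string import Template

# Hypothetical template. The variable names in the generated script are
# placeholders, not the output of the actual tool.
SCRIPT_TEMPLATE = Template('''\
# Generated from project $project_name
ENDPOINT = $endpoint
TRAINING_DATA_SAS_URL = $sas_url
FILES_TO_ANALYZE = $files

# Steps the generated script would carry out:
# 1. load the labeled files
# 2. train
# 3. get the train result
# 4. predict using each file path
# 5. display the prediction result
''')


def generate_sample_code(config: dict) -> str:
    """Fill the template with values the user already entered in the UI.

    repr() is used so that quotes inside a value cannot break the
    generated Python source.
    """
    return SCRIPT_TEMPLATE.substitute(
        project_name=repr(config["project_name"]),
        endpoint=repr(config["endpoint"]),
        sas_url=repr(config["sas_url"]),
        files=repr(list(config["files"])),
    )
```

Calling `generate_sample_code` with the project settings returns the script as a string that the UI could offer for download.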

UI improvement for "Predict" page

UI change for "Predict" page:
change the "Predict" to analyze
• Rename heading text to “Upload file and analyze” so it’s clearer what the overall user action is.
• Separate and change “Result” heading to “Results”
• Add a label for Page # / Field name / Value plus Confidence %
• Remove “P.” from page number. Add tooltip if necessary.
• Add horizontal line and move the “Download result (JSON)” button down to the bottom since it’s a secondary action.
• Add space (4px) between Analyze icon and Analyze text at the top.

also, have a button for "Generate sample code"

In our docs, we should make sure the word Predict is changed to Analyze over time. We could keep Predict in some places for backward compatibility.

Suggestion: Process Folders in blob storage

Is your feature request related to a problem? Please describe.
The number of files in a model can be quite large when handling many document types, such as invoices. Maintaining all the files in one big list with unique file names is difficult.
For each company we deal with - they send around 7 different types of documents. For each type of document we need 5 samples. So with only 10 companies we are already managing 350 files!

Describe the solution you'd like
Use files in folders for labeling and model building.
At a minimum, that would mean displaying them just like any other root folder item.

  • Optionally, the UI could represent folders on the thumbnail view and only show files from that one folder (or show all).
  • Optionally, the setup could have a comma-separated list of folders to include or exclude, and/or take multiple glob expressions.
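
The include/exclude idea could be sketched as a filter over blob names. This is a minimal illustration using Python's standard `fnmatch` module; the function name and the sample blob names are made up, and note that `*` in `fnmatch` also matches `/`.

```python
from fnmatch import fnmatchcase


def filter_blobs(names, include=("*",), exclude=()):
    """Keep names that match any include pattern and no exclude pattern."""
    return [
        name
        for name in names
        if any(fnmatchcase(name, pattern) for pattern in include)
        and not any(fnmatchcase(name, pattern) for pattern in exclude)
    ]


# Hypothetical blob names: company/document-type/file
blobs = [
    "contoso/invoice/001.pdf",
    "contoso/receipt/001.pdf",
    "fabrikam/invoice/001.pdf",
    "readme.txt",
]
selected = filter_blobs(blobs, include=["*/invoice/*"], exclude=["fabrikam/*"])
```

Here `selected` contains only the Contoso invoice, since the Fabrikam folder is excluded and the other blobs do not match the include pattern.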

Describe alternatives you've considered
Without this, we end up with thousands of files to manage in one big folder.

show a visualized analyze result page and field raw value vs. post-processed value

Is your feature request related to a problem? Please describe.
After getting the result, I couldn't see it visualized in the UI the way the labeling UI shows it.

Describe the solution you'd like
After I get the analyzed result, I'd like to see the fields highlighted and the extracted values shown. An extracted value might differ from the value in the image/PDF. For example, if I specified the field type as "no-whitespace", I'd like to compare the raw data with the analyzed result.

Describe alternatives you've considered
I have to write my own code to show this, which is reinventing the wheel, as FOTT already has such code in the labeling process.

Suggestion: Add document management

Is your feature request related to a problem? Please describe.
Add ability to add or remove a document from the list. As it is now, you have to go directly to blob storage and upload/delete from there instead of this nice UI.

Describe the solution you'd like
A button to add a document and a way to delete an existing doc. (This would delete the ocr/label data that goes with it)

Describe alternatives you've considered
Uploading and deleting separately through another UI seems like the only alternative.

simple improvement of the analyze sample code

code: https://github.com/microsoft/OCR-Form-Tools/blob/master/public/analyze.py

improvements:

  1. add more comments, e.g. project info, copyright info, and a comment for each function.
  2. add a comment about the command-line parameters

optional:
3) add code to analyze all files in one local directory and save the results in another directory (with command-line parameters)

---- this could be done later ----
4) improve the algorithm in 3) with multi-threaded processing so that the code can process thousands of files.
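
Items 3) and 4) could be sketched together with a thread pool. This is not the code in analyze.py: `analyze_file` is a placeholder that only returns the file size, standing in for the real analyze request, and all names here are illustrative.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def analyze_file(path: Path) -> dict:
    """Placeholder for the real analyze call; returns a stub result."""
    return {"file": path.name, "size_bytes": path.stat().st_size}


def analyze_directory(input_dir, output_dir, max_workers=8, analyze=analyze_file):
    """Analyze every file in input_dir and write one JSON result per file."""
    src, dst = Path(input_dir), Path(output_dir)
    dst.mkdir(parents=True, exist_ok=True)
    files = sorted(p for p in src.iterdir() if p.is_file())
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map runs the calls concurrently and yields results in input order.
        for path, result in zip(files, pool.map(analyze, files)):
            (dst / f"{path.name}.json").write_text(json.dumps(result, indent=2))
    return len(files)
```

Threads suit this job because each analyze call mostly waits on the network; `max_workers` would need tuning against the service's rate limits.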

support Checkbox labeling

layer
checkbox should have its own layer, only show checkbox UI when the layer is activated.

UI
have a simple UI to support selecting the checkbox

field type support
checkbox should have its own field type.

pass such value to backend for training

show result
in this UI, show training accuracy info and analyze result, and analyze result confidence info.

Predict multiple files in one directory, or Azure blob storage

The current tool can only run the prediction on one file. Could it run prediction for multiple files?

For example, the user could specify a blob storage location or a local file path, and the tool would then load all files and run them through the prediction call one by one.
At the end, the tool could share the overall accuracy of processing multiple files, and also the individual results if needed.

FOTT bug report - Reordering a tag quickly does not work

Describe the bug
In the Tags editor page, I have around 20+ tags and want to reorder one that is at the bottom of the tag list for labelling efficiency.
When I click on the arrow to reorder it several times in a row, the tag does not end up where expected.
This makes reordering labels practically impossible for projects with a large number of labels.

To Reproduce
Steps to reproduce the behavior:

  1. Go to Tags Editor page
  2. Create around 10 tags
  3. Select a tag at the bottom of the list and try to move it up to first place
  4. See that the tag moves up and down on its own for a few seconds after last clicking on the Move Tag up arrow

Expected behavior
Reordering of tags working smoothly

Desktop (please complete the following information):

  • OS: macOS Catalina 10.15.4
  • Docker version: Docker Desktop Community 2.2.0.5
  • Browser: firefox
  • Version : latest docker image (ImageId: 23ba43da7425)

Additional context
Azure Storage is in France Central, Form Recognizer in West Europe

Prototype: receipt analyze

Receipt is a pre-built model; it's a good demo scenario towards building a comprehensive demo for FR.

  1. good UI: the user should know how to get to this demo and how to return to do other things (home UI)
  2. good demo quality: could we have recommended demo files or URLs so that users can see the expected successful result? We don't want to see users struggling to find a demo image on their PC and ending up with a less-than-ideal result.
  3. consistent: do we also provide a "project"? What's in the project? Do we need a SAS token for the blob storage account?
  4. partner with the receipt team: after we finish the technical work, we could sync up with the receipt team and try to deliver a demo they really want to show to customers.

FOTT bug report - Deselect tag content doesn't work

Describe the bug
After you have set the tag for a highlighted part of the text, it cannot be removed easily.

To Reproduce
Steps to reproduce the behavior:

  1. Highlight a part of the text
  2. Apply the text to a tag
  3. Try to remove the selected words from this tag, without reapplying it to another tag.

Expected behavior
Have a way to clear a tag without deleting the tag. Currently I assign the selected text to another new tag, which I then remove. This should be much easier.

Desktop (please complete the following information):

  • OS: Mac
  • Docker version: latest 1.0.0 and 2.0.0-202fb2f
  • Browser: Edge Chromium Canary (Mac)
  • Version: 84.0.488.0

support Model Compose feature

This feature needs to align with model compose API release date.

basic feature:

  1. list models in a subscription
  2. view model basic info
  3. could select several models and compose a new model
  4. given a doc (PDF or image), could run prediction on a selected model. (this is the same as today)

regularly check - "what's new"

we have weekly update of stable builds, and sometimes daily update to fix issues.
we want users to know we have new features and important fixes.

it seems most people would just re-use the same webpage instance, it would be nice for the web page to check and show "what's new" in the appropriate places.

this also enables a simple heartbeat count so that the server could know how many instances are out there without any user info.

auto suggest field type when labeling

right now, I need to label a field and then pick the field type from the dropdown list, for every single field.

expect:
since I have clicked on the field value, it's relatively easy to infer the field type from that value. Could the tool automatically set the field type for me?

for ambiguous field types, the tool could pick one of them

once user makes a manual selection, respect that selection.
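
A rough sketch of such inference, using only regular expressions and a few date formats. The type names (`integer`, `number`, `date`, `time`, `string`), the list of date formats, and the tie-breaking order are assumptions for illustration, not the tool's actual behavior.

```python
import re
from datetime import datetime

# Assumed date formats; a real tool would need a locale-aware list.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")


def suggest_field_type(value: str) -> str:
    """Guess a field type from the clicked value; falls back to 'string'."""
    text = value.strip()
    if re.fullmatch(r"[+-]?\d+", text):
        return "integer"
    # Ambiguous values such as "01.02" are resolved by order: number wins.
    if re.fullmatch(r"[+-]?\d{1,3}(,\d{3})*(\.\d+)?|[+-]?\d*\.\d+", text):
        return "number"
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(text, fmt)
            return "date"
        except ValueError:
            pass
    if re.fullmatch(r"\d{1,2}:\d{2}(:\d{2})?", text):
        return "time"
    return "string"


def resolve_field_type(value: str, manual_choice=None) -> str:
    """A manual selection by the user always takes precedence."""
    return manual_choice or suggest_field_type(value)
```

The suggestion is only a default: `resolve_field_type` returns the user's own choice whenever one exists.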

easy re-use of field names when labeling for similar models.

user wish: I am creating 10 invoice models for 10 providers. One needs to input the labels per project 10 times and make sure they are labeled with the same names.

“Labels are ‘common’. I want the same labels available for every model. We could re-type them in for each one – but they all have the same types of labels – and if it does not have that label on that document type we just don’t label it. Just want to make sure one person does not label it as ‘ExpirationDate’ and then on another model it is called ‘Expiration’. I need those consistent when I get the data back from the models to be able to process it!”

wish: in near future:
same user could re-use a set of commonly used field names.
could import a file which contains a list of field names.

wish: future: (need more analysis)
share such common field names between different users.
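
The near-term wish (importing a file that contains a list of field names) could look roughly like this. The one-name-per-line file format and both function names are assumptions made for the sake of the example.

```python
def load_field_names(text: str) -> list:
    """Parse one field name per line, skipping blank lines and duplicates."""
    seen, names = set(), []
    for line in text.splitlines():
        name = line.strip()
        if name and name not in seen:
            seen.add(name)
            names.append(name)
    return names


def unknown_labels(project_labels, common_names):
    """Return labels used in a project that are not in the common list."""
    allowed = set(common_names)
    return [label for label in project_labels if label not in allowed]


common = load_field_names("InvoiceNumber\nExpirationDate\n\nInvoiceNumber\n")
mismatches = unknown_labels(["Expiration", "InvoiceNumber"], common)
```

In this example `mismatches` flags `Expiration`, which is the kind of inconsistency described in the quote above.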

Suggestion: Add lasso selection functionality to Form Labeling tool

copied from https://cognitive.uservoice.com/forums/921556-form-recognizer/suggestions/39850153-add-lasso-selection-functionality-to-form-labeling

It would be really nice to have a "Lasso" selection functionality in the form labeling tool. I know I can hold the left mouse button down to highlight multiple words, but there are some areas of forms (e.g. remarks) where several hundred words have to be selected. These areas are usually rectangular, so a lasso selection would work perfectly.
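
For rectangular areas, the selection reduces to a bounding-box containment test over the OCR words. The sketch below assumes each word comes with an axis-aligned box in page coordinates; the `Word` type and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float       # left edge
    y: float       # top edge
    width: float
    height: float


def words_in_rectangle(words, left, top, right, bottom):
    """Return the words whose bounding box lies entirely inside the rectangle."""
    return [
        w
        for w in words
        if w.x >= left
        and w.y >= top
        and w.x + w.width <= right
        and w.y + w.height <= bottom
    ]
```

A real implementation might instead accept words that merely overlap the rectangle, and would need to handle rotated boxes.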

FOTT bug report: need better error message for CORS

We need to configure cross-origin resource sharing (CORS) before we can use FOTT, but several users are stuck because CORS is not enabled on the blob storage. The error message is not clear. It says:

Failed to send request to http://...

expect:
Could we make the error more informative and tell the user that CORS is not enabled, so that they can resolve it on their own without needing support?

External Project Management Support.

Is your feature request related to a problem? Please describe.
I need a way to set up a project programmatically so an end user only has to tag images. I also need this solution to scale for thousands of projects.

Describe the solution you'd like
A page that could take a project security token, connection, and project as parameters and drop the user into the tagging UI would be best. A way to create the projects through an api in azure would also be helpful.

Describe alternatives you've considered
My current plan is to give my end users the keys to create their own projects and have them upload images through a custom UI I've built. After they've created a model, I'll have them paste the model ID back into my application.

Additional context
I need my end users to manage the creation of around 11,000+ models to extract data from different formats.

Suggestion: Copy fields to another project

Is your feature request related to a problem? Please describe.
When working on multiple models with the same fields, it is a hassle to create them over and over again. I have tried copying the fields.json to another project; however, that file is immediately overwritten with an empty one.

Describe the solution you'd like
A method to copy fields to another model project, or at least the possibility to upload the file to blob storage.
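
As an illustration of the copy step only, the sketch below merges field definitions from one project folder into another. It uses local directories as stand-ins for blob containers and assumes fields.json holds a `fields` array whose entries have a `fieldKey`; both are assumptions. It does not address the overwrite behavior described above.

```python
import json
from pathlib import Path


def copy_fields(source_dir, target_dir, filename="fields.json"):
    """Merge field definitions from the source project into the target project.

    Fields already present in the target (matched by 'fieldKey') are kept as-is.
    """
    src = Path(source_dir) / filename
    dst = Path(target_dir) / filename
    source_fields = json.loads(src.read_text())["fields"]
    target = json.loads(dst.read_text()) if dst.exists() else {}
    target_fields = target.setdefault("fields", [])
    existing = {field["fieldKey"] for field in target_fields}
    target_fields.extend(f for f in source_fields if f["fieldKey"] not in existing)
    dst.write_text(json.dumps(target, indent=2))
    return [field["fieldKey"] for field in target_fields]
```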

FOTT bug report - still has non-numeric text/space after label as numeric

When labeling, the customer specified one field as numeric.
They were expecting to receive data in numeric format, meaning no additional strings and no whitespace.
When testing on their sample, they found that in some cases the Form Recognizer service returns extra strings as well, and the figures have whitespace between them (as if they were an array of numbers).
Expectation:
such noise should be ruled out by the type specification.
For example, they receive “UMH 00181” or “ 0 02 81”. They consider this as not honoring the type specification given at labeling time.
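
Until the service honors the type, a client-side cleanup could strip the noise. This is a workaround sketch, not the service's behavior; which characters count as numeric (digits, `.`, `+`, `-`) is an assumption.

```python
import re


def normalize_numeric(raw: str) -> str:
    """Drop every character that is not a digit, a sign, or a decimal point."""
    return re.sub(r"[^0-9.+-]", "", raw)
```

With the two values from the report, `normalize_numeric("UMH 00181")` gives `"00181"` and `normalize_numeric(" 0 02 81")` gives `"00281"`.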

Suggestion: Store thumbnail image in blob storage for speed

Is your feature request related to a problem? Please describe.
When dealing with lots of documents the time to see thumbnails can take a bit as you scroll in the list of documents. The document (PDF in my case) has to be loaded client side and a thumbnail generated.

Describe the solution you'd like
Store the generated thumbnail and just load it instead. This would be much faster as just a straight image.
Optionally - flag in the setup to 'Store thumbnails in blob storage'.

Suggestion: Perform OCR on every document in list ahead of time

Is your feature request related to a problem? Please describe.
As I click on each document that has had labels applied - it must perform OCR on it and I must wait for it to finish. Instead of that - a button (or just do it) to go ahead and perform OCR on all documents that need it in the background would be nice.

Describe the solution you'd like
An option in the setup to 'Perform OCR on all documents without OCR data automatically in background'. If that is on, it does it - if it is off - it works like it does today.

Optionally - you could have a button/option that needs to be pressed that would do the same action.
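
The first step of such a background job is finding the documents that still need OCR. The sketch below assumes OCR results are stored next to each document as `<name>.ocr.json`; that naming, the extension list, and the function name are assumptions.

```python
def documents_missing_ocr(
    blob_names,
    ocr_suffix=".ocr.json",
    document_extensions=(".pdf", ".jpg", ".jpeg", ".png", ".tif", ".tiff"),
):
    """Return documents that have no matching OCR result file yet."""
    names = set(blob_names)
    return sorted(
        name
        for name in names
        if name.lower().endswith(document_extensions)
        and name + ocr_suffix not in names
    )
```

The returned list could then be fed to a background queue while the user keeps labeling.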

Describe alternatives you've considered
Waiting for OCR as you click each document is a bit painful when it could easily have been done while I was labeling. The only alternative I know of would be to write some scripts to do this myself, which seems a bit much considering this UI already does most of it.

bug: break in sidebar

Describe the bug
On the 'create new project' page, when viewed on non-hi-res screens (1980p and below) or when the user has zoomed in or scaled up in browser settings on a hi-res screen, the sidebar does not extend all the way to the bottom of the screen. There’s a gap between the sidebar and the status bar (see screenshots).

To Reproduce
Steps to reproduce the behavior:

  1. Go to 'initial screen'
  2. Click on 'New Project'
  3. Increase the scale or zoom-in little bit, then scroll down
  4. See the error on the left.

Expected behavior
No breaks in sidebar.

Screenshots
Screen Shot 2020-04-19 at 8 41 47 AM

Desktop (please complete the following information):

  • OS: any
  • Docker version: not tested
  • Browser: any
  • Version: 2.0.0-bce554e

FOTT bug report: Prediction page does throw TypeError `getResolutionForZoom`

Describe the bug
Predict page throws TypeError: Cannot read property 'getResolutionForZoom' of null.

To Reproduce
I am having a hard time reproducing it now when I create this issue. However, it did happen a few times today already.

Steps to reproduce the behavior:

  1. Go to predict page
  2. Upload file and press 'predict'
  3. ???

It has something to do with zooming in and resizing the screen. I will update this issue when I have more details.

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots
Screenshot 2020-04-02 at 20 00 20

Desktop (please complete the following information):

  • OS: Linux, Azure WebApps for Linux
  • Docker version: unknown
  • Browser: Edge Chromium Canary
  • Version: 82.0.459.1
