Comments (5)
About dataset re-labeling:
- We used an in-house labeling tool to re-label the data. First we generated char-level OCR results (in practice, line-level results with the coordinates of each char) using an in-house OCR engine, then we marked the key-type category of each char by comparing it against the official SROIE gt labels.
About data label input format:
- Labels are in csv format. Each row should contain the left, top, right, and bottom coordinates, the text string, and the key-type category of a word/text-line; see the example below.
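The row layout described above can be sketched as follows. This is a minimal illustration, not the repository's code: the column names, the header row, and the sample values are assumptions; only the field order (left, top, right, bottom, text, key-type category) comes from the comment.

```python
import csv
import io

# Hypothetical column names; the thread only specifies which fields a row holds.
FIELDS = ["left", "top", "right", "bottom", "text", "key_type"]


def write_labels(rows, handle):
    """Write label rows to an open text handle in csv format."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


def read_labels(handle):
    """Read label rows back, converting the four coordinates to int."""
    rows = []
    for row in csv.DictReader(handle):
        for name in ("left", "top", "right", "bottom"):
            row[name] = int(row[name])
        rows.append(row)
    return rows


# Illustrative values only; the category names are assumed, not taken from the repo.
sample = [
    {"left": 72, "top": 25, "right": 326, "bottom": 64,
     "text": "TOTAL, INCL. GST", "key_type": "other"},
    {"left": 400, "top": 25, "right": 470, "bottom": 64,
     "text": "12.50", "key_type": "total"},
]

buffer = io.StringIO(newline="")
write_labels(sample, buffer)
buffer.seek(0)
parsed = read_labels(buffer)
```

The csv module quotes the first text field because it contains a comma, so the row still round-trips with six fields.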
About args in pipeline/sroie_data_preprocessing.py:
- dir_test_root refers to the root folder of the official raw SROIE test data, which should contain three folders: img with the images, box with the SROIE official OCR results, and key with the SROIE official key-type string labels.
- dir_test_processed is the dir to the processed label files.
Here's my advice on arranging the dataset, if you want to use sroie_data_preprocessing.py:
1. Download the official SROIE dataset here (https://rrc.cvc.uab.es/?ch=13&com=downloads).
2. 0325updated.task1train(626p) contains the images and OCR results of the training set; put the images in the img folder and the txt files in the box folder. The txt files in 0325updated.task2train(626p) are the key-type labels; put them in the key folder. The arg dir_train_root is the dir to the root of the three folders mentioned above. The arg dir_train_processed is the dir to the processed csv labels generated by sroie_data_preprocessing.py, named label. Put the img, key and label folders in the same root, named train.
3. Download the raw data of the test set from the links provided at the bottom of the official page (https://rrc.cvc.uab.es/?com=downloads&action=download&ch=13&f=aHR0cHM6Ly9ycmMuY3ZjLnVhYi5lcy9kb3dubG9hZHMvU1JPSUVfdGVzdF9ndF90YXNrXzMuemlw and https://rrc.cvc.uab.es/?com=downloads&action=download&ch=13&f=aHR0cHM6Ly9ycmMuY3ZjLnVhYi5lcy9kb3dubG9hZHMvU1JPSUVfdGVzdF9pbWFnZXNfdGFza18zLnppcA==). Follow the instructions in step 2, but name the folder test. Put it in the same root as the train folder and name that root SROIE.
4. Change the arg data_root in the config yaml file to the SROIE folder; the dataloader can then recognize the data and load it automatically when the train scripts are called.
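The folder arrangement described above can be checked before training with a short script. This is a sketch under assumptions: the helper name missing_dirs and the exact set of expected subfolders (img, key, label under both train and test) are my reading of the instructions, not something taken from the repository.

```python
from pathlib import Path
import tempfile

# Assumed layout, read off the instructions: SROIE/{train,test}/{img,key,label}.
EXPECTED = {
    "train": ("img", "key", "label"),
    "test": ("img", "key", "label"),
}


def missing_dirs(data_root):
    """Return the expected sub-directories that do not exist under data_root."""
    root = Path(data_root)
    missing = []
    for split, subdirs in EXPECTED.items():
        for sub in subdirs:
            if not (root / split / sub).is_dir():
                missing.append(f"{split}/{sub}")
    return missing


# Demo on a temporary tree in which test/label has not been generated yet.
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp) / "SROIE"
    for rel in ("train/img", "train/key", "train/label", "test/img", "test/key"):
        (demo_root / rel).mkdir(parents=True)
    demo_missing = missing_dirs(demo_root)

print(demo_missing)
```

Pointing missing_dirs at the folder used for data_root gives a quick sanity check before the train scripts are started.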
from vibertgrid-pytorch.
Thanks for the earlier reply. I have just a few more concerns:
- Should the txt files in the box folder be converted to csv, in the format you showed in the snippet?
Thanks for the reply and help.
No, just keep it in txt format. (See line 147 in sroie_data_preprocessing.py: I used readline() rather than pandas.read_csv().) The csv format shown in the snippet is the required format of the processed labels.
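To illustrate why reading the box files line by line is convenient, here is a minimal sketch. It assumes each line of an official box txt file holds eight comma-separated corner coordinates followed by the transcript, and that the transcript itself may contain commas. The function name parse_box_line is hypothetical and this is not the code at line 147.

```python
def parse_box_line(line):
    """Parse one assumed 'x1,y1,x2,y2,x3,y3,x4,y4,text' line.

    Returns (left, top, right, bottom, text), or None for blank or short lines.
    """
    # Split on the first 8 commas only, so commas inside the text survive.
    parts = line.rstrip("\r\n").split(",", 8)
    if len(parts) < 9:
        return None
    coords = [int(value) for value in parts[:8]]
    xs, ys = coords[0::2], coords[1::2]
    return min(xs), min(ys), max(xs), max(ys), parts[8]


example = parse_box_line("72,25,326,25,326,64,72,64,TOTAL, INCL. GST\n")
print(example)
```

A plain comma-separated parse would see ten fields in this example line, which is one reason a manual split can be simpler for this kind of input.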
Thanks for the reply. Since you mentioned that we can use .txt files, I think there is an error:
- Line 109, sroie_data_preprocessing.py: you mention a directory of bbox csv files; I think that should be a directory of bbox txt files.
- After defining the directories we call the data_parser function, which calls the data_preprocessing_pipeline function, which expects the bounding boxes in a csv file and the keys in json format (lines 316 and 318 respectively).
That is why, when I run it with .txt files in the bbox folder, I get the error "FileNotFoundError: [Errno 2] No such file or directory".
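One way to narrow down a FileNotFoundError like this is to list which box and key files are missing for each image before running the pipeline. The sketch below is hypothetical: find_unpaired is not part of the repository, and since the thread does not settle which extensions the script expects, they are left as parameters.

```python
from pathlib import Path
import tempfile


def find_unpaired(root, box_ext=".txt", key_ext=".txt"):
    """List box/key files that are missing for the images under root/img."""
    root = Path(root)
    problems = []
    for image in sorted((root / "img").iterdir()):
        for folder, ext in (("box", box_ext), ("key", key_ext)):
            expected = root / folder / (image.stem + ext)
            if not expected.is_file():
                problems.append(f"{folder}/{expected.name}")
    return problems


# Demo: two images, both with box txt files, one without a key file.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp)
    for sub in ("img", "box", "key"):
        (demo / sub).mkdir()
    for name in ("img/a.jpg", "img/b.jpg", "box/a.txt", "box/b.txt", "key/a.txt"):
        (demo / name).touch()
    default_report = find_unpaired(demo)
    csv_report = find_unpaired(demo, box_ext=".csv")

print(default_report)
print(csv_report)
```

Running it once with each candidate extension shows whether the files are truly absent or only named differently from what the script looks for.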