Comments (6)
Yes, it is possible. The main modification lies in the number of categories and the corresponding mappings. Change SROIE_CLASS_LIST, TAG_TO_IDX, and TAG_TO_IDX_BIO in train_SROIE.py and eval_SROIE.py to your custom entity types, then change the num_classes term in the config YAML file. You may also need to modify the postprocessing rules in eval_SROIE.py accordingly.
from vibertgrid-pytorch.
Thank you very much for your very fast answer. But I did not understand how to modify the B- or I- tags. Can you modify them for me, according to my expanded sample?
SROIE_CLASS_LIST = ["others", "company", "date", "address", "total"]
TAG_TO_IDX = {
    "O": 0,
    "B-company": 1,
    "B-date": 2,
    "B-address": 3,
    "B-total": 4,
}
TAG_TO_IDX_BIO = {
    "O": 0,
    "B-company": 1,
    "I-company": 2,
    "B-date": 3,
    "I-date": 4,
    "B-address": 5,
    "I-address": 6,
    "B-total": 7,
    "I-total": 8,
}
from vibertgrid-pytorch.
And one more question.
Do I have to use an entities file for training, like SROIE's entities, as follows
{
    "company": "BOOK TA .K (TAMAN DAYA) SDN BHD",
    "date": "25/12/2018",
    "address": "NO.53 55,57 & 59, JALAN SAGU 18, TAMAN DAYA, 81100 JOHOR BAHRU, JOHOR.",
    "total": "9.00"
}
**or can I use only the box and transcript file, without entities?**
1,83,41,331,41,331,78,83,78,TAN WOON YANN,other
1,109,171,330,171,330,191,109,191,MR D.I.Y. (M) SDN BHD,company
1,122,190,325,190,325,213,122,213,(CO. RFG : 860671-D),other
1,47,208,391,208,391,233,47,233,LOT 1851-A & 1851-B, JALAN KPB 6,,address
1,62,235,381,235,381,254,62,254,KAWASAN PERINDUSTRIAN BALAKONG,,address
1,70,256,384,256,384,275,70,275,43300 SERI KEMBANGAN, SELANGOR,address
1,125,275,318,275,318,297,125,297,(TESCO PUTRA NILAI),other
1,177,295,266,295,266,317,177,317,-INVOICE-,other
1,12,337,402,337,402,362,12,362,KILAT AUTO ECO WASH & SHINE ES1000 1L,other
1,20,360,160,360,160,383,20,383,WA45 /2A - 12,other
1,16,382,156,382,156,402,16,402,9555916500133,other
from vibertgrid-pytorch.
But I did not understand how to modify the B- or I- tags. Can you modify them for me, according to my expanded sample?
For example, if your entity types are [others, type1, type2, type3], the corresponding IDX maps should be:
TAG_TO_IDX = {
    "O": 0,  # Remember to keep the background type (others, or O tag) as the first term
    "B-type1": 1,
    "B-type2": 2,
    "B-type3": 3,
}
TAG_TO_IDX_BIO = {
    "O": 0,  # Remember to keep the background type (others, or O tag) as the first term
    "B-type1": 1,
    "I-type1": 2,
    "B-type2": 3,
    "I-type2": 4,
    "B-type3": 5,
    "I-type3": 6,
}
You may also use the following code to generate the corresponding mappings:
SROIE_CLASS_LIST = ["others", "company", "date", "time", "address", "total", "tax", "sub_total"]
TAG_TO_IDX_ = ["O"]
TAG_TO_IDX_BIO_ = ["O"]
# Skip the background class at index 0; it is already covered by the "O" tag
for cls_type in SROIE_CLASS_LIST[1:]:
    TAG_TO_IDX_.append(f"B-{cls_type}")
    TAG_TO_IDX_BIO_.append(f"B-{cls_type}")
    TAG_TO_IDX_BIO_.append(f"I-{cls_type}")
TAG_TO_IDX = {s: i for i, s in enumerate(TAG_TO_IDX_)}
TAG_TO_IDX_BIO = {s: i for i, s in enumerate(TAG_TO_IDX_BIO_)}
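As a rough sketch, the same generation logic can be wrapped in a small helper and applied to the five-class list from the question. The names `build_tag_maps`, `CLASS_LIST`, and `IDX_TO_TAG_BIO` are hypothetical and not part of the repository; the reverse map is only there as a convenience for decoding predictions.

```python
def build_tag_maps(class_list):
    """Build (TAG_TO_IDX, TAG_TO_IDX_BIO) from a class list.

    Assumes the first entry of class_list is the background class,
    which is represented by the single "O" tag at index 0.
    """
    tags = ["O"]
    bio_tags = ["O"]
    for cls_type in class_list[1:]:
        tags.append(f"B-{cls_type}")
        bio_tags.append(f"B-{cls_type}")
        bio_tags.append(f"I-{cls_type}")
    tag_to_idx = {tag: i for i, tag in enumerate(tags)}
    tag_to_idx_bio = {tag: i for i, tag in enumerate(bio_tags)}
    return tag_to_idx, tag_to_idx_bio


# Hypothetical usage with the class list from the question above
CLASS_LIST = ["others", "company", "date", "address", "total"]
TAG_TO_IDX, TAG_TO_IDX_BIO = build_tag_maps(CLASS_LIST)

# Reverse map (index -> tag), an illustrative extra for decoding predictions
IDX_TO_TAG_BIO = {i: tag for tag, i in TAG_TO_IDX_BIO.items()}
```

With this class list, the result matches the hand-written mappings posted in the question, so those can be used as-is.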
from vibertgrid-pytorch.
And one more question. Do I have to use an entities file for training, like SROIE's entities, or can I use only the box and transcript file, without entities?
For the training phase, only the latter is required. The code directly parses the annotations and generates the corresponding BIO tags.
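To illustrate the idea, here is a minimal sketch of how one annotation line in the format shown in the question (index, eight box coordinates, text, label) could be turned into BIO tags. It is not the repository's actual parser: the function names are hypothetical, tokens are split on whitespace as a simplifying assumption, and each annotated line starts a new B- tag, which may differ from how the real code handles entities spanning several lines.

```python
def parse_annotation_line(line):
    """Split 'idx,x1,y1,x2,y2,x3,y3,x4,y4,text,label' into its parts.

    The text itself may contain commas, so the nine numeric fields are
    split off from the left and the label is split off from the right.
    """
    parts = line.strip().split(",", 9)
    idx = int(parts[0])
    coords = [int(v) for v in parts[1:9]]
    text, label = parts[9].rsplit(",", 1)
    return idx, coords, text, label


def bio_tags_for_segment(text, label, background=("other", "others")):
    """Assign a BIO tag to each whitespace token of one annotated segment."""
    tokens = text.split()
    if label in background:
        return [(tok, "O") for tok in tokens]
    return [
        (tok, f"B-{label}" if i == 0 else f"I-{label}")
        for i, tok in enumerate(tokens)
    ]


# Example line taken from the question above
line = "1,47,208,391,208,391,233,47,233,LOT 1851-A & 1851-B, JALAN KPB 6,,address"
idx, coords, text, label = parse_annotation_line(line)
tagged = bio_tags_for_segment(text, label)
```

Here `text` keeps its embedded commas, and `tagged` starts with a `B-address` token followed by `I-address` tokens.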
from vibertgrid-pytorch.
I will try it. Thank you very much for your support and effort. Have a nice day.
from vibertgrid-pytorch.