Comments (6)

ZeningLin commented on June 20, 2024

Yes, it is possible. The main changes are the number of categories and the corresponding mappings. Change SROIE_CLASS_LIST, TAG_TO_IDX, and TAG_TO_IDX_BIO in train_SROIE.py and eval_SROIE.py to match your custom entity types, then update the num_classes term in the config YAML file. You may also need to adjust the postprocessing rules in eval_SROIE.py accordingly.

from vibertgrid-pytorch.

kerberosargos commented on June 20, 2024

Thank you very much for the very fast answer. However, I did not understand how to modify the B- and I- tags. Could you modify them for me, based on my expanded sample?

SROIE_CLASS_LIST = ["others", "company", "date", "address", "total"]

TAG_TO_IDX = {
    "O": 0,
    "B-company": 1,
    "B-date": 2,
    "B-address": 3,
    "B-total": 4,
}

TAG_TO_IDX_BIO = {
    "O": 0,
    "B-company": 1,
    "I-company": 2,
    "B-date": 3,
    "I-date": 4,
    "B-address": 5,
    "I-address": 6,
    "B-total": 7,
    "I-total": 8,
}

kerberosargos commented on June 20, 2024

And one more question.

Do I have to use entity files for training, like SROIE's entities below?

{
    "company": "BOOK TA .K (TAMAN DAYA) SDN BHD",
    "date": "25/12/2018",
    "address": "NO.53 55,57 & 59, JALAN SAGU 18, TAMAN DAYA, 81100 JOHOR BAHRU, JOHOR.",
    "total": "9.00"
} 

**Or can I use only the box and script file, without the entities?**

1,83,41,331,41,331,78,83,78,TAN WOON YANN,other
1,109,171,330,171,330,191,109,191,MR D.I.Y. (M) SDN BHD,company
1,122,190,325,190,325,213,122,213,(CO. RFG : 860671-D),other
1,47,208,391,208,391,233,47,233,LOT 1851-A & 1851-B, JALAN KPB 6,,address
1,62,235,381,235,381,254,62,254,KAWASAN PERINDUSTRIAN BALAKONG,,address
1,70,256,384,256,384,275,70,275,43300 SERI KEMBANGAN, SELANGOR,address
1,125,275,318,275,318,297,125,297,(TESCO PUTRA NILAI),other
1,177,295,266,295,266,317,177,317,-INVOICE-,other
1,12,337,402,337,402,362,12,362,KILAT AUTO ECO WASH & SHINE ES1000 1L,other
1,20,360,160,360,160,383,20,383,WA45 /2A - 12,other
1,16,382,156,382,156,402,16,402,9555916500133,other

ZeningLin commented on June 20, 2024

For example, if your entity types are [others, type1, type2, type3], the corresponding IDX maps should be:

TAG_TO_IDX = {
    "O": 0,    # Remember to keep the background type (others, or O tag) as the first term
    "B-type1": 1,
    "B-type2": 2,
    "B-type3": 3,
}

TAG_TO_IDX_BIO = {
    "O": 0,   # Remember to keep the background type (others, or O tag) as the first term
    "B-type1": 1,
    "I-type1": 2,
    "B-type2": 3,
    "I-type2": 4,
    "B-type3": 5,
    "I-type3": 6,
}

You may also use the following code to generate the corresponding mappings from the class list:

SROIE_CLASS_LIST = ["others", "company", "date", "time", "address", "total", "tax", "sub_total"]

TAG_TO_IDX_ = ["O"]
TAG_TO_IDX_BIO_ = ["O"]
for cls_type in SROIE_CLASS_LIST[1:]:
    TAG_TO_IDX_.append(f"B-{cls_type}")
    TAG_TO_IDX_BIO_.append(f"B-{cls_type}")
    TAG_TO_IDX_BIO_.append(f"I-{cls_type}")

TAG_TO_IDX = {s: i for i, s in enumerate(TAG_TO_IDX_)}
TAG_TO_IDX_BIO = {s: i for i, s in enumerate(TAG_TO_IDX_BIO_)}
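
As a quick check, the same loop can be wrapped in a small function and run on the [others, type1, type2, type3] example to confirm it reproduces the hand-written maps. This is only a sketch: the name build_tag_maps is hypothetical and is not part of train_SROIE.py or eval_SROIE.py.

```python
from typing import Dict, List, Tuple


def build_tag_maps(class_list: List[str]) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Build (TAG_TO_IDX, TAG_TO_IDX_BIO) from a class list.

    Assumes the first entry of class_list is the background class
    ("others"), which is always mapped to the "O" tag at index 0.
    build_tag_maps is a hypothetical helper, not a function from the repo.
    """
    tags = ["O"]
    bio_tags = ["O"]
    for cls_type in class_list[1:]:
        tags.append(f"B-{cls_type}")
        bio_tags.append(f"B-{cls_type}")
        bio_tags.append(f"I-{cls_type}")
    tag_to_idx = {tag: idx for idx, tag in enumerate(tags)}
    tag_to_idx_bio = {tag: idx for idx, tag in enumerate(bio_tags)}
    return tag_to_idx, tag_to_idx_bio


tag_to_idx, tag_to_idx_bio = build_tag_maps(["others", "type1", "type2", "type3"])
print(tag_to_idx)
print(tag_to_idx_bio)
```

With N entity types besides the background class, TAG_TO_IDX has N + 1 entries and TAG_TO_IDX_BIO has 2N + 1 entries, so adding a type only requires appending it to the class list.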

ZeningLin commented on June 20, 2024

Regarding your second question: for the training phase, only the latter format is required, that is, the box-and-text file in which each line ends with a class label. The entity JSON is not needed. The code parses these annotations directly and generates the corresponding BIO tags.
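
To illustrate the idea, here is a minimal sketch of how one box-and-text annotation line from this thread could be turned into BIO tags. It is not the repository's actual parser: the function names are hypothetical, the line layout (index, eight coordinates, text, label) is inferred from the sample lines, tagging is done per whitespace-separated word rather than per tokenizer token, and both "other" and "others" are assumed to mean the background class.

```python
from typing import List, Tuple

# Assumption: either spelling marks the background class.
BACKGROUND_LABELS = {"other", "others"}


def parse_annotation_line(line: str) -> Tuple[List[int], str, str]:
    """Split one line into (coords, text, label).

    The text itself may contain commas, so the label is taken from the
    right and only the first nine commas on the left are treated as
    field separators.
    """
    head, label = line.rstrip("\n").rsplit(",", 1)
    fields = head.split(",", 9)  # index, 8 coordinates, then the raw text
    coords = [int(value) for value in fields[1:9]]
    text = fields[9]
    return coords, text, label.strip()


def to_bio_tags(text: str, label: str) -> List[Tuple[str, str]]:
    """Tag the first word B-<label> and the remaining words I-<label>."""
    words = text.split()
    if label in BACKGROUND_LABELS:
        return [(word, "O") for word in words]
    return [
        (word, f"B-{label}" if i == 0 else f"I-{label}")
        for i, word in enumerate(words)
    ]


line = "1,47,208,391,208,391,233,47,233,LOT 1851-A & 1851-B, JALAN KPB 6,,address"
coords, text, label = parse_annotation_line(line)
print(coords)
print(to_bio_tags(text, label))
```

Taking the label from the right with rsplit matters here because several address lines end with a comma inside the transcribed text.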

kerberosargos commented on June 20, 2024

I will try it. Thank you very much for your support and effort. Have a nice day.
