A curated list of resources on Document Understanding (DU)

awesome-list machine-learning information-extraction key-information-extraction document-understanding robotic-process-automation document-analysis document-layout-analysis ocr natural-language-processing

awesome-document-understanding's Issues

Recommended Information Extraction Method for PDF-Resumes?

Hello,

I know this isn't an issue, but I couldn't find a better place to ask this question.

I guess extracting data from resumes falls under key information extraction.
As a start, I thought about using a plain BERT model and marking only the entities I want to extract in my training data. But does it also make sense to create a label for the key "english" (see the example below) to get better results, or to use relation extraction on this semi-structured data? Or does BERT learn by itself that a grade follows the string "english: "?

For a simple example of extracting grades from a resume:
input:
"english: 2"
-> do I only need to annotate "2", or is it recommended to do something else?

output:
grade_english: "2"

wanted labels:

firstname
lastname
last_job_title
graduation
grade_english
grade_math
grade_economy
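
One common way to frame the annotation question above is token classification with BIO tags, where only the value ("2") carries a label and the key ("english:") stays unlabeled as context. The following is a minimal sketch of that labeling scheme, assuming whitespace tokenization and character-offset annotations; the helper `bio_tags` and the example spans are hypothetical and not taken from any particular library.

```python
import re

def bio_tags(text, spans):
    """Convert character-level spans (start, end, label) into per-token BIO tags.

    Tokens are whitespace-separated; a token gets a tag only if it lies
    entirely inside an annotated span, otherwise it is tagged "O".
    """
    tagged = []
    for match in re.finditer(r"\S+", text):
        tag = "O"
        for start, end, label in spans:
            if match.start() >= start and match.end() <= end:
                # First token of a span gets B-, later tokens get I-.
                prefix = "B-" if match.start() == start else "I-"
                tag = prefix + label
                break
        tagged.append((match.group(), tag))
    return tagged

# Hypothetical annotation: only the grade values are marked, not the keys.
text = "english: 2 math: 1"
spans = [(9, 10, "grade_english"), (17, 18, "grade_math")]

for token, tag in bio_tags(text, spans):
    print(f"{token}\t{tag}")
```

With this scheme the key tokens stay in the input as "O", so a contextual model can still use them to tell an English grade from a math grade. Whether that is enough in practice depends on the data.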

Grouping corresponding entities

I am sorry, I know this is not an issue, but I don't know where else to ask it.

I am parsing PDF documents and now have the task of grouping entities together. Each chemical has a set of characteristics. I extract both with NER (Hugging Face transformers) and the quality is OK, but I don't know how to group each chemical with its corresponding characteristics (I don't even know what this task is called). I could write rules saying that characteristics appearing after a chemical name belong to that chemical, but sometimes the order is different and some characteristics appear before the name of the chemical.

So I want to use a model to link chemicals with their corresponding characteristics.

Could you please help me and give me some advice on this problem?
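
Linking extracted entities to each other is often framed as relation extraction. Before training a model for it, an order-independent baseline is to attach each characteristic to the nearest chemical mention by character distance, which covers characteristics that appear before the name as well as after it. The sketch below is only an illustration under stated assumptions: the `Entity` class, the labels `CHEMICAL` and `PROPERTY`, and the example sentence are all hypothetical, and the entities are assumed to come from an NER step that provides character offsets.

```python
from dataclasses import dataclass

@dataclass
class Entity:
    text: str
    label: str
    start: int  # character offset where the mention begins
    end: int    # character offset just past the mention

def gap(a, b):
    """Number of characters between two mentions (0 if they overlap)."""
    if a.end <= b.start:
        return b.start - a.end
    if b.end <= a.start:
        return a.start - b.end
    return 0

def group_by_nearest(entities, anchor_label="CHEMICAL"):
    """Attach every non-anchor entity to the closest anchor mention."""
    anchors = [e for e in entities if e.label == anchor_label]
    grouped = [[] for _ in anchors]
    for e in entities:
        if e.label == anchor_label or not anchors:
            continue
        nearest = min(range(len(anchors)), key=lambda k: gap(anchors[k], e))
        grouped[nearest].append(e.text)
    return [(a.text, g) for a, g in zip(anchors, grouped)]

def ent(text, surface, label):
    """Build an Entity from the first occurrence of `surface` in `text`."""
    start = text.index(surface)
    return Entity(surface, label, start, start + len(surface))

# Hypothetical example: one property precedes its chemical, one follows it.
text = "78 C ethanol; acetone 56 C"
entities = [
    ent(text, "78 C", "PROPERTY"),
    ent(text, "ethanol", "CHEMICAL"),
    ent(text, "acetone", "CHEMICAL"),
    ent(text, "56 C", "PROPERTY"),
]
groups = group_by_nearest(entities)
print(groups)  # [('ethanol', ['78 C']), ('acetone', ['56 C'])]
```

Character distance is a crude signal and will fail on tables or long sentences. In those cases a learned relation classifier over entity pairs, or layout-aware features, is the usual next step.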

tstanislawek / awesome-document-understanding
