Comments (3)
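I actually have the same problem as you: annotation is simply too expensive. So I have tried a number of approaches:
1. The WPS idea is a good one. It amounts to using another tool's strengths to generate some of the labels for you. There is also a company in China that is very good at table restoration and even outputs line coordinates, but for some reason they shut it down and encrypted the code.
2. As for what I do now: I started with manual annotation and found it too slow, so I modified and optimized PPOCRLabel, mainly around cell-order adjustment and text annotation, which are very time-consuming. A single table has at least 400-odd cells, and my hands were worn out from labeling.
3. Generate sample tables that look similar to the real ones and mix them with the real samples for training. Accuracy will certainly not match training on the original tables, though.
This gives some improvement, but it is definitely not as good as annotating the original tables. The advantage is obvious: it is fast. It is a trade-off; if you have enough samples and enough time, annotation is the best option.
This is just my personal experience, for reference.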
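A minimal sketch of the idea in point 3, assuming plain grid tables with no merged cells. The function names (`make_synthetic_table`, `mix_samples`), the cell size, and the output layout are hypothetical illustrations, not PaddleOCR's label format or API; the output would still need converting to whatever format your training pipeline expects, and rendering to an image is left out.

```python
import random


def make_synthetic_table(rows, cols, cell_w=80, cell_h=24, seed=None):
    """Return (html, cells) for a plain grid table filled with random numbers.

    cells is a list of dicts with row, col, text and bbox=(x0, y0, x1, y1),
    where the boxes come from a uniform grid (an assumption for illustration).
    """
    rng = random.Random(seed)
    cells = []
    html_rows = []
    for r in range(rows):
        tds = []
        for c in range(cols):
            text = str(rng.randint(0, 9999))
            bbox = (c * cell_w, r * cell_h, (c + 1) * cell_w, (r + 1) * cell_h)
            cells.append({"row": r, "col": c, "text": text, "bbox": bbox})
            tds.append(f"<td>{text}</td>")
        html_rows.append("<tr>" + "".join(tds) + "</tr>")
    html = "<table>" + "".join(html_rows) + "</table>"
    return html, cells


def mix_samples(real, synthetic, synthetic_ratio=0.5, seed=None):
    """Return a shuffled list in which roughly synthetic_ratio of items are synthetic."""
    if not 0 <= synthetic_ratio < 1:
        raise ValueError("synthetic_ratio must be in [0, 1)")
    rng = random.Random(seed)
    # Number of synthetic items needed so they make up synthetic_ratio of the mix,
    # capped by how many synthetic samples are available.
    n_syn = round(len(real) * synthetic_ratio / (1 - synthetic_ratio))
    n_syn = min(n_syn, len(synthetic))
    mixed = list(real) + rng.sample(list(synthetic), n_syn)
    rng.shuffle(mixed)
    return mixed


if __name__ == "__main__":
    html, cells = make_synthetic_table(rows=20, cols=20, seed=0)
    print(len(cells), "cells generated")
```

Keeping the synthetic share as an explicit ratio makes the speed versus accuracy trade-off described above easy to tune: lower it as more hand-annotated tables become available.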
from paddleocr.
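The current table recognition model performs poorly on even slightly complex tables. We are working on improving it and will release the next version later.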
from paddleocr.
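The current table recognition model performs poorly on even slightly complex tables. We are working on improving it and will release the next version later.
Hoping there will be something to look forward to soon...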
from paddleocr.