einvoice's Introduction

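Electronic invoice recognition

Introduction

Recognizes electronic invoices, covering ordinary and special VAT e-invoices from most regions. Supported file types: PDF and OFD.

In a single-threaded test on a laptop, 100 PDF e-invoices were parsed in 6 seconds.

Online recognition page: http://www.heycore.com/invoice.html

Preview

The PDF recognition part is based on, and improves, https://github.com/fantasyxxj/einvoice. Before the improvements, parsing 100 invoices on a single thread took 16 seconds.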

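PDF recognition does not use OCR. It parses the text on the invoice with two combined rules: locating fields by their position in the conventional invoice layout, and matching text.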
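The two rules can be illustrated with a minimal Java sketch. This is not the project's actual code: the class LayoutAndTextRule, the Chunk type, and the region coordinates are hypothetical, and the sketch assumes the text has already been extracted from the PDF together with page coordinates. The label is passed in as a parameter (for example 发票号码, the invoice number label).

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: combine a layout region with a text pattern. */
final class LayoutAndTextRule {

    /** A piece of extracted text with the page coordinates of its origin. */
    static final class Chunk {
        final String text;
        final double x;
        final double y;

        Chunk(String text, double x, double y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /** Rule 1 (layout): join the chunks that lie inside the rectangle, in input order. */
    static String textInRegion(List<Chunk> chunks, double left, double top,
                               double right, double bottom) {
        StringBuilder sb = new StringBuilder();
        for (Chunk c : chunks) {
            if (c.x >= left && c.x <= right && c.y >= top && c.y <= bottom) {
                sb.append(c.text);
            }
        }
        return sb.toString();
    }

    /** Rule 2 (text): return the digits that follow the label, or null if absent. */
    static String valueAfterLabel(String regionText, String label) {
        // Accept an optional ASCII or full-width colon between label and value.
        Pattern p = Pattern.compile(Pattern.quote(label) + "[:\\uFF1A]?\\s*(\\d+)");
        Matcher m = p.matcher(regionText);
        return m.find() ? m.group(1) : null;
    }
}
```

Restricting the text to a region first keeps a similar-looking number elsewhere on the page from being picked up by the pattern.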

OFD recognition reads the description files inside the OFD file.
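An OFD file is a ZIP container of XML documents, so reading a description file amounts to reading one archive entry. The sketch below uses only the JDK. The class name OfdEntryReader is hypothetical, and which entry holds the invoice data is an assumption left to the caller, who passes the entry name suffix.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical sketch: read one XML entry out of an OFD (ZIP) container. */
final class OfdEntryReader {

    /** Returns the UTF-8 text of the first entry whose name ends with suffix, or null. */
    static String readEntry(InputStream ofd, String suffix) {
        try (ZipInputStream zip = new ZipInputStream(ofd)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory() || !entry.getName().endsWith(suffix)) {
                    continue;
                }
                // Copy the current entry's bytes; read() returns -1 at the end of the entry.
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[8192];
                int n;
                while ((n = zip.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                return new String(out.toByteArray(), StandardCharsets.UTF_8);
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The returned XML string can then be handed to any XML parser to pull out the invoice fields.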

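To do

  1. Recognize with layered, per-region and per-invoice-type templates to improve accuracy
  2. Recognition of fully digitalized e-invoices (全电发票)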

einvoice's People

Contributors

gitoschina, myfengstyle, sanluan, zhao2018mr

einvoice's Issues

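Parsing fails for invoices whose item name contains %

Invoice file that triggers the problem: 011002000911-12095199.pdf
Item name on this invoice: 日用杂品润本(RUNBEN)驱蚊 液婴儿 110ml 防蚊 驱蚊水 驱防蚊喷 雾 防蚊液 蚊虫叮咬花露水7%驱
Lines 318-321 of com.sanluan.einvoice.service.InvoiceExtractor use the "%" of the tax rate to decide whether the next item name has started, but this item name itself contains "%", so parsing blows up.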
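One possible way to avoid this, sketched here rather than taken from the project, is to treat a token as a tax rate only when the whole token is a percentage, instead of checking whether it merely contains "%". The class TaxRateToken is hypothetical, and the sketch assumes the detail row has already been split into whitespace-separated tokens.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch: a stricter test for the tax-rate column. */
final class TaxRateToken {

    // One or two digits, an optional decimal part, then a percent sign, and nothing else.
    private static final Pattern TAX_RATE = Pattern.compile("\\d{1,2}(\\.\\d+)?%");

    /** True only when the entire token is a percentage such as "13%" or "1.5%". */
    static boolean isTaxRate(String token) {
        return token != null && TAX_RATE.matcher(token.trim()).matches();
    }
}
```

With this check, "13%" counts as a tax rate, while a name fragment such as 花露水7%驱 does not, because characters surround the percentage.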

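Bug: total amount recognized incorrectly

com.sanluan.einvoice.service.InvoiceExtractor
Line 79 should be changed:
Before: Matcher matcher = pattern.matcher(fullText);
After: Matcher matcher = pattern.matcher(allText);

Matching against the unprocessed text makes the recognized total amount inaccurate.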
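Why the choice of input text matters can be shown with a small sketch. It does not reproduce the project's own pre-processing or pattern: the class AmountMatcher, the normalize step (drop whitespace, unify the full-width yen sign), and the amount pattern are all assumptions made for illustration. The label, for example 合计 (total), is passed in as a parameter.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: match an amount on normalized rather than raw text. */
final class AmountMatcher {

    /** Assumed clean-up step: remove whitespace and map the full-width yen sign to U+00A5. */
    static String normalize(String raw) {
        return raw.replaceAll("\\s+", "").replace('\uFFE5', '\u00A5');
    }

    /** Returns the first two-decimal amount directly after the label, or null. */
    static String amountAfter(String text, String label) {
        Pattern p = Pattern.compile(Pattern.quote(label) + "\\u00A5?(\\d+\\.\\d{2})");
        Matcher m = p.matcher(text);
        return m.find() ? m.group(1) : null;
    }
}
```

When the raw text has a space or line break inside the label, or between the label and the amount, the pattern finds nothing in the raw text but matches once the text has been normalized.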

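Some cases cannot be recognized

First, my apologies that I cannot provide the original invoices:
1. For some (not all) toll invoices, the date cannot be recognized.
2. For invoices with an attached sales list, the goods on the invoice face cannot be recognized.
3. For some invoices, the goods names cannot be recognized for unknown reasons, even though nothing about them looks unusual.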

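Suggestion: support the "Shenzhen electronic ordinary invoice" (深圳电子普通发票)

Thanks to the author for the hard work. This library supports VAT invoices perfectly. In actual use, I found that the buyer cannot be parsed for the "Shenzhen electronic ordinary invoice". Judging from the code, the likely cause is that the buyer section of this invoice type has an "electronic payment identifier" (电子支付标识) field instead of the "bank and account number" (开户行及账号) field found on VAT invoices, which throws off the positioning.

For the differences, see: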
http://bpm.exceam.com/AttachUpload/File/2021/11/19/%E8%9E%8D%E6%82%A6%E5%A4%A7%E5%8E%A6%E9%A1%B9%E7%9B%AE2021%E5%B9%B410-11%E6%9C%88%E5%8A%A0%E7%8F%AD%E9%A4%90%E8%B4%B9%E5%8F%91%E7%A5%A8.pdf
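A possible direction, sketched under the assumption that the buyer block is located through a label that closes the block, is to try several candidate labels instead of one fixed label. The class AnchorLabels is hypothetical and is not part of the library. The caller would pass the candidates in priority order, for example 开户行及账号 followed by 电子支付标识.

```java
import java.util.List;

/** Hypothetical sketch: pick whichever anchor label the invoice actually contains. */
final class AnchorLabels {

    /** Returns the first candidate label that occurs in the text, or null if none does. */
    static String findAnchor(String text, List<String> candidates) {
        for (String label : candidates) {
            if (text.contains(label)) {
                return label;
            }
        }
        return null;
    }
}
```

The label that is found can then be used to position the buyer block in the same way the fixed label is used today.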
