Eventually, glyph positions might also be recorded, but it is not yet clear if or when.
Both FineReader-XML and hOCR already offer character-level encoding, so it should be possible to transform either format to ALTO without loss of information.
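To make the character-level mapping more concrete, here is a minimal Python sketch for the hOCR side only (FineReader-XML is not covered). It assumes each `ocrx_word` contains one `ocrx_cinfo` span per character, with the box given as `bbox` or `x_bboxes` in the `title` attribute, and maps each box to an ALTO `Glyph` inside a `String`. `HPOS`/`VPOS` come from the top-left corner and `WIDTH`/`HEIGHT` are assumed to be `x1 - x0` / `y1 - y0`. The function names, the sample word and the choice of the ALTO v4 namespace are illustrative assumptions, not how ocr-fileformat's transformations actually work.

```python
from __future__ import annotations

import re
import xml.etree.ElementTree as ET

# Assumption: target ALTO v4; swap in another ALTO namespace if needed.
ALTO_NS = "http://www.loc.gov/standards/alto/ns-v4#"

# Hypothetical sample: one hOCR word with one ocrx_cinfo span per character.
SAMPLE_WORD = (
    "<span class='ocrx_word' title='bbox 10 20 40 35; x_wconf 96'>"
    "<span class='ocrx_cinfo' title='x_bboxes 10 20 19 35'>O</span>"
    "<span class='ocrx_cinfo' title='x_bboxes 20 20 29 35'>C</span>"
    "<span class='ocrx_cinfo' title='x_bboxes 30 20 40 35'>R</span>"
    "</span>"
)


def has_class(elem: ET.Element, name: str) -> bool:
    """True if the hOCR element carries the given class (classes are space-separated)."""
    return name in (elem.get("class") or "").split()


def parse_box(title: str) -> tuple[int, int, int, int]:
    """Return the first 'bbox' or 'x_bboxes' quadruple (x0, y0, x1, y1) in a title."""
    m = re.search(r"\b(?:bbox|x_bboxes)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)", title)
    if m is None:
        raise ValueError(f"no box in title {title!r}")
    x0, y0, x1, y1 = (int(v) for v in m.groups())
    return x0, y0, x1, y1


def box_attrs(title: str) -> dict[str, str]:
    """Convert an hOCR box to ALTO position attributes (WIDTH = x1 - x0 assumed)."""
    x0, y0, x1, y1 = parse_box(title)
    return {"HPOS": str(x0), "VPOS": str(y0),
            "WIDTH": str(x1 - x0), "HEIGHT": str(y1 - y0)}


def word_to_alto_string(word: ET.Element) -> ET.Element:
    """Map an hOCR ocrx_word to an ALTO String with one Glyph per ocrx_cinfo."""
    chars = [c for c in word.iter() if has_class(c, "ocrx_cinfo")]
    if chars:
        content = "".join(c.text or "" for c in chars)
    else:
        content = "".join(word.itertext()).strip()
    string = ET.Element(f"{{{ALTO_NS}}}String",
                        {"CONTENT": content, **box_attrs(word.get("title", ""))})
    for c in chars:
        ET.SubElement(string, f"{{{ALTO_NS}}}Glyph",
                      {"CONTENT": c.text or "", **box_attrs(c.get("title", ""))})
    return string


if __name__ == "__main__":
    ET.register_namespace("", ALTO_NS)
    alto = word_to_alto_string(ET.fromstring(SAMPLE_WORD))
    print(ET.tostring(alto, encoding="unicode"))
```

Whether `WIDTH` should be `x1 - x0` or `x1 - x0 + 1` depends on how the hOCR producer defines its box corners, so that choice is worth checking against real engine output.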
hOCR can also express skewed text and polygons, so the transformation should carry these over as well.
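For skewed text, one possible mapping (an assumption, not the project's implementation) is to read the hOCR `poly` property, keep its axis-aligned envelope in the block's `HPOS`/`VPOS`/`WIDTH`/`HEIGHT`, and store the exact outline in an ALTO `Shape`/`Polygon`. The sketch writes `POINTS` as space-separated `x,y` pairs; the exact `POINTS` syntax should be checked against the ALTO schema version being targeted. `block_from_poly`, the `tb_1` ID and the sample coordinates are hypothetical.

```python
from __future__ import annotations

import re
import xml.etree.ElementTree as ET

# Assumption: ALTO v4 namespace, used here for illustration only.
ALTO_NS = "http://www.loc.gov/standards/alto/ns-v4#"


def parse_poly(title: str) -> list[tuple[int, int]]:
    """Read the hOCR 'poly x0 y0 x1 y1 ...' property into (x, y) pairs."""
    m = re.search(r"\bpoly\s+([\d\s]+)", title)
    if m is None:
        raise ValueError(f"no poly in title {title!r}")
    nums = [int(n) for n in m.group(1).split()]
    if len(nums) < 6 or len(nums) % 2:
        raise ValueError("poly needs at least three (x, y) points")
    return list(zip(nums[0::2], nums[1::2]))


def block_from_poly(title: str, block_id: str) -> ET.Element:
    """Build an ALTO TextBlock whose box is the polygon's envelope and whose
    Shape/Polygon keeps the exact (possibly skewed) outline."""
    points = parse_poly(title)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    block = ET.Element(f"{{{ALTO_NS}}}TextBlock", {
        "ID": block_id,
        "HPOS": str(min(xs)), "VPOS": str(min(ys)),
        "WIDTH": str(max(xs) - min(xs)), "HEIGHT": str(max(ys) - min(ys)),
    })
    shape = ET.SubElement(block, f"{{{ALTO_NS}}}Shape")
    # Assumption: 'x,y' pairs separated by spaces; verify against your ALTO schema.
    ET.SubElement(shape, f"{{{ALTO_NS}}}Polygon",
                  {"POINTS": " ".join(f"{x},{y}" for x, y in points)})
    return block


if __name__ == "__main__":
    ET.register_namespace("", ALTO_NS)
    # Hypothetical skewed paragraph outline taken from an hOCR title attribute.
    tb = block_from_poly("poly 10 40 110 20 115 45 15 65", "tb_1")
    print(ET.tostring(tb, encoding="unicode"))
```

Keeping both the envelope and the polygon means consumers that only understand rectangular boxes still get usable coordinates.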
ALTO 3.0 did not seem to bring many notable changes, so implementing hocr-alto3.1 is probably straightforward.
from ocr-fileformat.
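If the differences between ALTO versions really are small, existing ALTO 2.x output could serve as a starting point by moving its elements into the newer namespace. The sketch below does only that: it renames elements from one ALTO namespace to another. It does not add, rename or drop any elements, does not update `xsi:schemaLocation`, and does not validate anything, so it is a way to explore the differences rather than a converter. `retarget_alto` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

# ALTO namespaces by major version (minor versions such as 3.0 and 3.1 share one).
ALTO_NS = {
    "2": "http://www.loc.gov/standards/alto/ns-v2#",
    "3": "http://www.loc.gov/standards/alto/ns-v3#",
    "4": "http://www.loc.gov/standards/alto/ns-v4#",
}


def retarget_alto(root: ET.Element, src: str, dst: str) -> ET.Element:
    """Move every element from the src ALTO namespace into the dst one, in place.

    Hypothetical helper: it does not rename, add or drop elements, does not
    touch xsi:schemaLocation and does not validate the result.
    """
    old_prefix = "{" + ALTO_NS[src] + "}"
    new_prefix = "{" + ALTO_NS[dst] + "}"
    for elem in root.iter():
        if elem.tag.startswith(old_prefix):
            elem.tag = new_prefix + elem.tag[len(old_prefix):]
    return root


if __name__ == "__main__":
    doc = ET.fromstring('<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">'
                        '<Layout><Page ID="p1"/></Layout></alto>')
    ET.register_namespace("", ALTO_NS["3"])  # serialize v3 as the default namespace
    print(ET.tostring(retarget_alto(doc, "2", "3"), encoding="unicode"))
```

Validating the retargeted output against the target ALTO schema would then show which elements or attributes actually changed between versions.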
Related Issues
- Release version 0.3.0 and 1.0.0
- GCV to HOCR or PAGE conversion not working
- Support conversion from and to Textract JSON
- "ocr-transform page alto ... ...": loosing text
- New Saxon version 10.2 is out
- Google Cloud Vision to PAGE-XML
- alto to text: too many spaces
- Proxy support
- Support conversion to MiniOCR
- Web interface in Docker container/ Error when uploading document: "Must be either POST with the field 'file'...."
- page__text.xsl is not honoring the reading order
- Transformation for ImageWare MyBib
- page__alto transformation mixes XML with logging in the output
- page page2019: does not work
- Conversion from ABBYY to ALTO
- [feature request] Support MacOS
- regression: page-to-alto is missing
- Feature request: Page concatenation during conversion
- Add example files
- Table extraction