Comments (7)
This is indeed related to #3470 and will thus be solved with the next PyMuPDF version containing this MuPDF fix.
from pymupdf.
I can also reproduce this with 1.24.2 but NOT with 1.24.1.
from pymupdf.
Downstream link: https://gitlab.com/plom/plom/-/issues/3374
from pymupdf.
In this image, the spurious letters can take fonts that come from a different page of the document.
from pymupdf.
Just in case you are not aware:
Creating font subsets has been implemented in MuPDF. While it is still officially an experimental feature, we at PyMuPDF are very interested in replacing the current solution, which is pure Python and creates an external dependency on another package (fontTools).
So we have a vital interest in deprecating this solution in the short to medium term.
There are also some secondary advantages:
- The MuPDF solution is at least 15 times faster and it covers a larger set of font types compared to fontTools (which is restricted to TTF and OTF formats).
- Being a MuPDF solution, not only MuPDF itself but all its language bindings will immediately benefit from it. These are currently Java, JavaScript and the new C# bindings, MuPDF.net, which are about to be published this week or early next week.
Given this background, we will not continue fixing any issues around the fontTools-based solution.
from pymupdf.
Thanks, that sounds very promising! Is there a timeline or issue I can follow?
In the meantime, perhaps I'll try to scale back our use of subset_fonts() to only those cases where we used PyMuPDF to add non-ASCII text.
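A rough sketch of that workaround, in case it helps anyone else. The helper name `subset_if_needed` and the idea of tracking the inserted strings in a list are my own assumptions, not anything PyMuPDF provides; the only PyMuPDF call used is `Document.subset_fonts()`.

```python
def subset_if_needed(doc, inserted_texts) -> bool:
    """Call doc.subset_fonts() only if some inserted text is non-ASCII.

    `doc` is expected to be a PyMuPDF Document (anything with a
    subset_fonts() method works). `inserted_texts` is a hypothetical
    list of the strings the caller added to the document.
    Returns True if subsetting was performed.
    """
    if any(not text.isascii() for text in inserted_texts):
        doc.subset_fonts()
        return True
    return False
```

The caller still has to remember which strings it inserted; the check itself says nothing about fonts that were already embedded in the source PDF.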
from pymupdf.
We have been testing this feature for a considerable time now. That new fix should actually be it.
Enough to remove the "experimental" label, I mean.
We will nonetheless keep providing the fallback option for some more time, of course.
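For readers who want to try the native subsetting first and only drop back to the fontTools-based code on failure, something like the following might do. This is a sketch under assumptions: it assumes the fallback is selected with a `fallback=True` keyword on `Document.subset_fonts()`, as I understand the current PyMuPDF documentation, and the wrapper name `subset_fonts_safely` is hypothetical. It only helps when the failure surfaces as a Python exception.

```python
def subset_fonts_safely(doc) -> str:
    """Try MuPDF's native font subsetting, then the fontTools fallback.

    Returns "native" or "fallback" to tell the caller which path ran.
    Assumption: subset_fonts() accepts a `fallback` keyword that selects
    the older pure-Python implementation (requires fontTools).
    """
    try:
        doc.subset_fonts()
        return "native"
    except Exception:
        # Native subsetting raised; retry with the older implementation.
        doc.subset_fonts(fallback=True)
        return "fallback"
```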
For my own purposes, I am using the new version all the time.
Especially if you use Page.insert_htmlbox or other Story-based code, MuPDF is likely to pull in needed fonts all over the place. Here, subset fonts can be a life saver. Have a look at this (certainly extreme) example: without font subsets, the saved file is 2 MB; with subsets, it shrinks to 80 KB.
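To check the effect on your own files, a comparison along these lines may be useful. It is a minimal sketch, not the example referred to above: the function names, the rectangle and the HTML string are made up for illustration, and the sizes you get will depend entirely on the fonts involved. Older PyMuPDF releases need `import fitz` instead of `import pymupdf`, hence the guarded import.

```python
def subset_savings(doc) -> tuple:
    """Return (size_before, size_after) in bytes around subset_fonts().

    Serializes the document in memory before and after subsetting,
    using garbage collection and stream compression both times so the
    two numbers are comparable.
    """
    before = len(doc.tobytes(garbage=3, deflate=True))
    doc.subset_fonts()
    after = len(doc.tobytes(garbage=3, deflate=True))
    return before, after


def demo():
    """Build a one-page PDF via insert_htmlbox and report the sizes."""
    try:
        import pymupdf
    except ImportError:  # older releases only expose the fitz name
        import fitz as pymupdf

    doc = pymupdf.open()  # new, empty PDF
    page = doc.new_page()
    # Illustrative content; mixed scripts tend to pull in several fonts.
    html = "<p>Hello, <b>wörld</b>! 你好</p>"
    page.insert_htmlbox(pymupdf.Rect(50, 50, 400, 300), html)
    before, after = subset_savings(doc)
    print(f"without subsets: {before} bytes, with subsets: {after} bytes")
```

Call `demo()` with PyMuPDF installed to see the numbers for this tiny document.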
MuPDF has just recently introduced rich text support for FreeText annotations (not yet supported in PyMuPDF). And the technique used is ... again the Story class!
from pymupdf.