Comments (7)
Update: fix developed.
from pymupdf.
This post cannot be accepted without a reproducing file.
As a workaround for urgent situations, please use the argument fallback=True.
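A minimal sketch of that workaround, assuming a PyMuPDF build whose subset_fonts accepts the fallback argument mentioned above. The helper name and the file names are hypothetical and only serve as illustration.

```python
def subset_with_fallback(doc, out_path):
    """Subset fonts with fallback=True, then save the result.

    `doc` is assumed to be an already opened PyMuPDF document.
    """
    # fallback=True is the argument suggested in this thread.
    doc.subset_fonts(fallback=True)
    doc.save(out_path)


# Hypothetical usage (requires PyMuPDF to be installed):
#   import pymupdf
#   doc = pymupdf.open("input.pdf")
#   subset_with_fallback(doc, "output.pdf")
```

Whether the fallback helps depends on the file, so it is worth checking the saved output visually.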
Running doc.subset_fonts on the attached file raises an error in an earlier version.
Attachment: 1 - Copy.pdf
With fallback=True, doc.subset_fonts raises the same error.
Under the new version (without fallback), no error is raised, but the file written by doc.save after doc.subset_fonts has scrambled words.
I can reproduce the previous comment:
In [2]: fitz.version
Out[2]: ('1.23.3', '1.23.2', '20230831000001')
In [3]: d = fitz.open("1.-.Copy.pdf")
In [4]: d.subset_fonts()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[4], line 1
----> 1 d.subset_fonts()
File /usr/lib64/python3.12/site-packages/fitz/utils.py:5448, in subset_fonts(doc, verbose)
5445 # walk through the original font xrefs and replace each by the subset def
5446 for font_xref in xref_set:
5447 # we need the original '/W' and '/DW' width values
-> 5448 width_table, def_width = get_old_widths(font_xref)
5449 # ... and replace original font definition at xref with it
5450 doc.update_object(font_xref, font_str)
File /usr/lib64/python3.12/site-packages/fitz/utils.py:5175, in subset_fonts.<locals>.get_old_widths(xref)
5173 if df[0] != "array": # only handle xref specifications
5174 return None, None
-> 5175 df_xref = int(df[1][1:-1].replace("0 R", ""))
5176 widths = doc.xref_get_key(df_xref, "W")
5177 if widths[0] != "array": # no widths key found
ValueError: invalid literal for int() with base 10: '<</BaseFont/CIDFont+F1/CIDSystemInfo<</Ordering 13 /Registry 14 /Supplement 0>>/CIDToGIDMap/Identity/FontDescriptor<</Ascent 952/CapHeight 631/Descent -268/Flags 6/FontBBox 15 /FontFile2 16 /FontName
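The traceback suggests that the value being parsed is an array holding an inline font dictionary rather than a single indirect reference such as "[12 0 R]", so the int() conversion fails. The sketch below reproduces that failure in plain Python and shows one defensive way to tell the two cases apart. The function names are hypothetical, the sample strings are made up, and this is not the fix that went into PyMuPDF.

```python
import re

# Matches an array that holds exactly one indirect reference, e.g. "[12 0 R]".
# Like the expression in the traceback, it assumes generation number 0.
_SINGLE_REF = re.compile(r"^\[\s*(\d+)\s+0\s+R\s*\]$")


def old_style_parse(value):
    """The expression from the traceback, applied to an array string."""
    return int(value[1:-1].replace("0 R", ""))


def single_ref_xref(value):
    """Return the xref number if `value` is a single-reference array.

    Returns None when the array holds anything else, for example an
    inline dictionary, so the caller can skip or handle that font.
    """
    match = _SINGLE_REF.match(value.strip())
    return int(match.group(1)) if match else None


by_reference = "[12 0 R]"
inline_dict = "[<</BaseFont/CIDFont+F1/Subtype/CIDFontType2>>]"

print(old_style_parse(by_reference))   # works for an indirect reference
print(single_ref_xref(by_reference))   # same xref number
print(single_ref_xref(inline_dict))    # None instead of an exception

try:
    old_style_parse(inline_dict)
except ValueError as exc:
    print("old-style parse failed:", exc)
```

Returning None for the inline case leaves the decision to the caller, for instance to skip subsetting that font.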
But with 1.24.3, I get no error, and upon save I see scrambled words.
The MuPDF team has developed a fix which I am currently testing.
I have a possibly-related issue where 1.24.3 leaves some misc chars on the page, which go away if I stop using subset_fonts. Haven't narrowed it down to a MWE yet, but one difference is I DO NOT get an error with older pymupdf: so it might not be quite the same issue... More to follow.
Downstream issue: https://gitlab.com/plom/plom/-/issues/3374
Fixed in 1.24.6.