Comments (7)
Update: fix developed.
from pymupdf.
This post cannot be accepted without a reproducing file.
To work around an urgent situation, please use the argument fallback=True.
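A minimal sketch of the suggested workaround, assuming `doc` is an open PyMuPDF document whose `subset_fonts` method accepts the `fallback` keyword. The helper name `subset_fonts_with_fallback` is hypothetical and used only for illustration.

```python
def subset_fonts_with_fallback(doc):
    """Subset the fonts of `doc` using the fallback=True argument.

    `doc` is assumed to be an open PyMuPDF Document; this helper is a
    hypothetical illustration, not part of the library.
    """
    # fallback=True requests the older subsetting implementation.
    doc.subset_fonts(fallback=True)
    return doc


# Usage sketch (file names are placeholders):
#   import pymupdf
#   doc = pymupdf.open("input.pdf")
#   subset_fonts_with_fallback(doc)
#   doc.save("output.pdf")
```

Note that a later comment in this thread reports the same error with the fallback for one particular file, so this may not help in every case.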
Running doc.subset_fonts on the attached file raises an error in an earlier version.
1 - Copy.pdf
With fallback=True, doc.subset_fonts raises the same error.
With the new version (without fallback), no error is raised, but the file written by doc.save after doc.subset_fonts has scrambled words.
I can reproduce the previous comment:
In [2]: fitz.version
Out[2]: ('1.23.3', '1.23.2', '20230831000001')
In [3]: d = fitz.open("1.-.Copy.pdf")
In [4]: d.subset_fonts()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[4], line 1
----> 1 d.subset_fonts()
File /usr/lib64/python3.12/site-packages/fitz/utils.py:5448, in subset_fonts(doc, verbose)
5445 # walk through the original font xrefs and replace each by the subset def
5446 for font_xref in xref_set:
5447 # we need the original '/W' and '/DW' width values
-> 5448 width_table, def_width = get_old_widths(font_xref)
5449 # ... and replace original font definition at xref with it
5450 doc.update_object(font_xref, font_str)
File /usr/lib64/python3.12/site-packages/fitz/utils.py:5175, in subset_fonts.<locals>.get_old_widths(xref)
5173 if df[0] != "array": # only handle xref specifications
5174 return None, None
-> 5175 df_xref = int(df[1][1:-1].replace("0 R", ""))
5176 widths = doc.xref_get_key(df_xref, "W")
5177 if widths[0] != "array": # no widths key found
ValueError: invalid literal for int() with base 10: '<</BaseFont/CIDFont+F1/CIDSystemInfo<</Ordering 13 /Registry 14 /Supplement 0>>/CIDToGIDMap/Identity/FontDescriptor<</Ascent 952/CapHeight 631/Descent -268/Flags 6/FontBBox 15 /FontFile2 16 /FontName
But with 1.24.3, I get no error, and upon save I see scrambled words.
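The ValueError in the traceback above appears to arise because the value handed to `int()` is an inline font dictionary rather than an indirect reference such as `12 0 R`. The sketch below mimics only the string expression from the traceback in plain Python. The object number 12 and the abbreviated dictionary string are made-up illustrations, not values from the attached file.

```python
def parse_descendant_xref(value):
    """Mimic the expression shown in the traceback:
    int(df[1][1:-1].replace("0 R", ""))
    """
    return int(value[1:-1].replace("0 R", ""))


# An array holding an indirect reference parses cleanly.
indirect = "[12 0 R]"  # hypothetical xref number
print(parse_descendant_xref(indirect))

# An array holding an inline dictionary cannot be converted to int.
inline = "[<</BaseFont/CIDFont+F1/CIDToGIDMap/Identity>>]"  # abbreviated
try:
    parse_descendant_xref(inline)
except ValueError as exc:
    print("ValueError:", exc)
```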
The MuPDF team has developed a fix which I am currently testing.
I have a possibly-related issue where 1.24.3 leaves some misc chars on the page, which go away if I stop using subset_fonts. Haven't narrowed it down to a MWE yet, but one difference is I DO NOT get an error with older pymupdf: so it might not be quite the same issue... More to follow.
Downstream issue: https://gitlab.com/plom/plom/-/issues/3374
Fixed in 1.24.6.
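To check whether an installed version already includes the fix, a version string can be compared against 1.24.6. This is a minimal sketch in plain Python. The helper names are hypothetical, and it assumes purely numeric dotted version strings like those shown in this thread.

```python
def version_tuple(version_string):
    """Turn a dotted version such as '1.24.6' into a tuple of ints."""
    return tuple(int(part) for part in version_string.split("."))


def has_subset_fonts_fix(version_string, fixed_in="1.24.6"):
    """Return True if version_string is at or above the fixed release."""
    return version_tuple(version_string) >= version_tuple(fixed_in)


print(has_subset_fonts_fix("1.23.3"))  # False
print(has_subset_fonts_fix("1.24.3"))  # False
print(has_subset_fonts_fix("1.24.6"))  # True
```

In a session like the one above, the first element of `fitz.version` would be the string to pass in.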