Comments (6)
Please provide a reproducing example.
So far your post leads to nothing actionable.
from pymupdf.
Hard to do, since it's the resume of a real person and contains personal data... Is there a way to work around this so I can provide the example?
The example PDF shared with me violates the specification for links / annotations: instead of giving indirect references, as it should, it provides all the links directly in the /Annots array.
According to the specification, it should look like /Annots [4711 0 R 4712 0 R ...]. Instead we find:
/Annots [ <<
/Type /Annot
/Subtype /Link
/Rect [ 248.31678 596.1143 279.57893 605.7143 ]
/Border [ 0 0 0 ]
/A <<
/Type /Action
/S /URI
/URI (https://alexialabbe.fr/#projects)
>>
>> <<
/Type /Annot
/Subtype /Link
/Rect [ 238.40349 71.17554 260.10389 80.775539 ]
/Border [ 0 0 0 ]
/A <<
/Type /Action
/S /URI
/URI (https://blog.codein.fr/guide-rgpd-les-pratiques-essentielles-pour-assurer-la-conformite-de-votre-site-web)
>>
>>
... ]
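A minimal sketch for spotting this situation in other files, assuming a PyMuPDF version that provides Document.xref_get_key() and Document.xref_object(). The helper names annots_are_inline() and page_has_inline_annots() are hypothetical, not part of PyMuPDF. The check simply strips every "N G R" indirect reference from the raw /Annots array and treats anything left over (typically "<< ... >>" dictionaries) as an inline object.

```python
import re

# Matches one indirect reference such as "4711 0 R".
_INDIRECT_REF = re.compile(r"\d+\s+\d+\s+R")


def annots_are_inline(annots_source: str) -> bool:
    """Return True if a raw /Annots array source contains direct (inline) objects.

    A conforming array holds only indirect references, e.g. "[4711 0 R 4712 0 R]".
    Whatever remains after removing those references is an inline object.
    """
    body = annots_source.strip()
    if body.startswith("[") and body.endswith("]"):
        body = body[1:-1]
    leftover = _INDIRECT_REF.sub("", body).strip()
    return bool(leftover)


def page_has_inline_annots(doc, page) -> bool:
    """Check one page of an open PyMuPDF document (hypothetical helper).

    Document.xref_get_key() returns a (type, value) tuple, e.g. ("array", "[...]").
    """
    kind, value = doc.xref_get_key(page.xref, "Annots")
    if kind == "xref":
        # /Annots may itself point to a separate array object: read its source.
        value = doc.xref_object(int(value.split()[0]), compressed=True)
    elif kind != "array":
        return False  # no /Annots entry ("null") or an unexpected type
    return annots_are_inline(value)
```

Usage, with a placeholder file name: open the file with pymupdf.open("resume.pdf") and call page_has_inline_annots(doc, page) for each page in the document.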
So PyMuPDF does recognize the links, but cannot assign an xref to them (consequently, xref=0).
In such a situation you cannot update or delete the links with the normal API (delete_link() etc.); there is no way.
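To see which links are affected, one could filter the output of Page.get_links() for entries without a usable xref. This is a sketch assuming the link dicts carry an "xref" key, as recent PyMuPDF versions do; links_without_xref() is a hypothetical helper name. Since real PDF object numbers start at 1, any non-positive value is treated as "no PDF object".

```python
def links_without_xref(links: list[dict]) -> list[dict]:
    """Return the link dicts that PyMuPDF could not tie to a PDF object.

    For inline (direct) link dictionaries there is no object number, so the
    "xref" entry is 0; any value <= 0 is treated the same way here.
    """
    return [link for link in links if link.get("xref", 0) <= 0]
```

For example, links_without_xref(page.get_links()) lists the links on a page that delete_link() cannot handle.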
But you can edit the page's object definition source with the low-level API and remove everything: for this, you could delete the whole /Annots array.
This will remove everything (!!!): links, annotations and form fields that may be on the page.
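import pymupdf
doc = pymupdf.open("resume.pdf")  # hypothetical file name
doc.xref_set_key(5, "Annots", "null")  # 5 = xref of the page object in this file
print(doc.xref_object(5))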
<<
/Type /Page
/Parent 1 0 R
/MediaBox [ 0 0 540 780 ]
/Contents 134 0 R
/Resources <<
/ExtGState <<
/Alpha0 10 0 R
/Alpha1 11 0 R
>>
/Font <<
/Font4 14 0 R
/Font11 21 0 R
/Font12 22 0 R
/Font5 15 0 R
>>
>>
/Annots null
/Group <<
/S /Transparency
/CS /DeviceRGB
>>
>>
All links are gone!
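To apply the same fix to a whole document rather than only to the page with xref 5, a loop over the pages could look like this. It is a sketch, assuming Document.xref_get_key() reports ("null", "null") for an absent key and that Page.xref and Page.number are available; remove_all_annots() is a hypothetical name. It carries the same caveat as the one-liner: links, annotations and form fields all disappear.

```python
def remove_all_annots(doc) -> list[int]:
    """Set /Annots to null on every page that has one; return the changed page numbers.

    WARNING: this drops everything in /Annots (links, annotations, form widgets).
    """
    changed = []
    for page in doc:
        # xref_get_key() returns a (type, value) tuple; type is "null" if the key is absent.
        kind, _ = doc.xref_get_key(page.xref, "Annots")
        if kind != "null":
            doc.xref_set_key(page.xref, "Annots", "null")
            changed.append(page.number)
    return changed
```

Usage, with placeholder file names: doc = pymupdf.open("resume.pdf"), then remove_all_annots(doc), then doc.ez_save("resume-nolinks.pdf").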
BTW, the example page looks exactly the same, but all hot areas are gone.
Also, the file size (when saving via ez_save()) goes down to 44 KB (it was 1 MB before).
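A small sketch for checking the size reduction yourself, assuming the document was opened from a file (so Document.name holds its path). It uses Document.ez_save(), which is Document.save() with garbage collection and compression switched on by default. ez_save_and_compare() is a hypothetical helper name, and the actual savings depend on the file.

```python
import os


def ez_save_and_compare(doc, out_path: str) -> tuple[int, int]:
    """Save `doc` via Document.ez_save() and return (old_size, new_size) in bytes."""
    old_size = os.path.getsize(doc.name)  # path the document was opened from
    doc.ez_save(out_path)  # save() with garbage collection and deflate enabled
    new_size = os.path.getsize(out_path)
    return old_size, new_size
```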
Thanks Jorj!!