Comments (8)
I understand what you are aiming at. Some clarification is necessary, though.
I think the following is clear, but please confirm:
- a specified clip means a rectangle which will be transformed together with the rest of the page
- the clip should be intersected with page.bound() before being processed
To decide:
- Either: the pixmap has the size of the (transformed) clip and only contains clip content
- Or: the pixmap has the original (transformed) page size, but is empty (white) except for the clip area
from pymupdf.
The clip rect passed in should be pre-transformed, as you described.
As for whether the clip should be intersected with page.bound(), I guess that depends on the choice made in the 2nd part of the question: should the pixmap have the size of the clip area or the size of the page?
I assume the clip passed in is always a subset of page.bound().
Actually, I wonder how fz_new_draw_device_with_bbox of mupdf treats a pixmap together with the clip parameter. Does it require the pixmap to have the page bound size?
If the pixmap has to be the page size, then the saving is mostly in the mupdf render/run part. A saving on pixmap size will not be possible.
I'd prefer the 1st option if possible, i.e. the returned pixmap has the size of the clipping rectangle. That will save some memory in the end.
And since the returned pixmap is not created by the user, the user would otherwise always have to crop it to the clip rect before making use of it.
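To put a rough number on the memory argument, here is a minimal sketch in plain Python (it makes no PyMuPDF calls). It assumes a page of 595 x 842 points, a zoom factor of 2, an RGB pixmap without alpha (3 bytes per pixel) and a clip covering the upper half of the page. The helper name pixmap_bytes is made up for this illustration.

```python
def pixmap_bytes(width, height, components=3):
    """Approximate size in bytes of the samples of a width x height pixmap."""
    return width * height * components

zoom = 2
page_w, page_h = 595, 842   # assumed page size in points
clip_w, clip_h = 595, 421   # assumed clip: upper half of the page

full_page = pixmap_bytes(page_w * zoom, page_h * zoom)  # 2nd option: page-sized pixmap
clip_only = pixmap_bytes(clip_w * zoom, clip_h * zoom)  # 1st option: clip-sized pixmap

print(full_page, clip_only, clip_only / full_page)
```

Under these assumptions the clip-sized pixmap needs half the bytes of the page-sized one; the ratio simply follows the ratio of the two areas.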
from pymupdf.
Thank you for the rapid answer.
I already have a working version.
What I am doing right now is what you also prefer:
- intersect clip with page.bound() - just to correct any nonsense input
- take the fitz.Rect version of clip as basis r for what follows
- transform r and clip with the matrix
- create pixmap with transformed clip as IRect
- render the page
Below is the result of clip = fitz.IRect(0, 0, width, height/2) and transforming it with fitz.Matrix(2, 2).preRotate(90). Please send me a note if you are fine with this.
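The geometric part of the steps listed in this comment (intersect, transform, round) can be sketched in plain Python with tuples, without calling PyMuPDF. Everything here is an assumption made for illustration: the A4 page bound, the slightly oversized clip, the matrix written out by hand as (a, b, c, d, e, f) for a zoom of 2 combined with a 90 degree rotation, and the outward rounding rule. The helper names are hypothetical and this is not the library's implementation.

```python
import math

def intersect(a, b):
    """Intersection of two rectangles given as (x0, y0, x1, y1)."""
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))

def transform(rect, m):
    """Bounding box of rect after applying the matrix m = (a, b, c, d, e, f)."""
    a, b, c, d, e, f = m
    x0, y0, x1, y1 = rect
    corners = [(x0, y0), (x1, y0), (x0, y1), (x1, y1)]
    xs = [x * a + y * c + e for x, y in corners]
    ys = [x * b + y * d + f for x, y in corners]
    return (min(xs), min(ys), max(xs), max(ys))

def round_out(rect):
    """Smallest integer rectangle containing rect (assumed rounding rule)."""
    return (math.floor(rect[0]), math.floor(rect[1]),
            math.ceil(rect[2]), math.ceil(rect[3]))

page = (0, 0, 595.28, 841.89)   # assumed A4 page bound
clip = (0, 0, 596, 421)         # upper half, deliberately a little too wide
m = (0, 2, -2, 0, 0, 0)         # assumed: zoom 2 combined with a 90 degree rotation

r = intersect(page, clip)       # cut the clip back to the page
r = transform(r, m)             # apply the matrix
irect = round_out(r)            # integer rectangle for the pixmap
print(irect)
```

The rotation moves the rectangle into negative x territory, which is why the integer rectangle no longer starts at (0, 0).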
Because of the transformation, the IRect of the resulting pixmap in this case (correctly) is fitz.IRect(-842, 0, 0, 1191). One may or may not like it this way.
My question is whether we should translate the IRect coordinates such that no negative values occur. This has no effect on the produced image.
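A minimal sketch of such a translation, in plain Python with the rectangle as an (x0, y0, x1, y1) tuple. The helper name to_origin is hypothetical; the point is only that moving the top-left corner to (0, 0) leaves width and height untouched.

```python
def to_origin(irect):
    """Shift an integer rectangle so that its top-left corner is (0, 0)."""
    x0, y0, x1, y1 = irect
    return (0, 0, x1 - x0, y1 - y0)

irect = (-842, 0, 0, 1191)   # the rectangle quoted in this comment
moved = to_origin(irect)
print(moved)
```

Since only the origin changes, the pixel data of the pixmap would be the same either way.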
from pymupdf.
A normal IRect for the pixmap, with its origin at (0, 0), would be the least surprising for users.
And can you confirm that the 2nd half of the page can be clipped properly? clip = fitz.IRect(0, height/2, width, height/2).
from pymupdf.
Yes, that works, too ... of course with clip = fitz.IRect(0, height/2, width, height) - otherwise the rectangle area would be zero.
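The zero-area remark is easy to check with a few lines of plain Python. The helper irect_area is hypothetical and the page size of 596 x 842 is an assumption.

```python
def irect_area(x0, y0, x1, y1):
    """Area of an integer rectangle; empty or inverted rectangles count as 0."""
    return max(0, x1 - x0) * max(0, y1 - y0)

width, height = 596, 842   # assumed page size

degenerate = irect_area(0, height // 2, width, height // 2)  # y0 == y1
lower_half = irect_area(0, height // 2, width, height)

print(degenerate, lower_half)
```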
Here is the complete code I used - fairly raw of course, needs to be packaged into getPixmap, etc.
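import fitz
doc = fitz.open("pymupdf.pdf")
m = fitz.Matrix(2, 2).preRotate(90)       # zoom by 2, rotate by 90 degrees
p = doc.loadPage(22)
r = p.bound()                             # page rectangle
clip = fitz.IRect(0, 842 // 2, 596, 842)  # lower half of the page
r.intersect(clip.getRect())               # correct any nonsense input
r.transform(m)                            # transformed clip as Rect ...
clip = r.round()                          # ... and as IRect
dl = fitz.DisplayList()
p.run(fitz.Device(dl), fitz.Identity)     # record the page in a display list
pix = fitz.Pixmap(fitz.csRGB, clip)       # pixmap has the size of the transformed clip
pix.clearWith(255)                        # white background
dv = fitz.Device(pix, clip)
dl.run(dv, m, r)                          # render only the clip area
pix.writePNG("lower.png")
pix.x = 0  # just to prevent programmer's surprise
pix.y = 0  # ...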
from pymupdf.
Great work!
from pymupdf.
Just uploaded the updates responding to this issue. The documentation has also been updated accordingly. The PyPI-hosted documentation will follow in a minute.
from pymupdf.
@mozbugbox - thank you very much for your input, your attentive reading of the documentation and the many ideas for improvements!
from pymupdf.