Comments (36)
We now handle parallel flows for floats, absolutes, relatives, and table-cells.
This bug is now closed. It required 9 years of hard work 🚀.
We’ll release a beta soon; tests and feedback are welcome!
from weasyprint.
Hello,
A beta has been released.
Don’t hesitate to try it 😉
from weasyprint.
pls fix that problem.
Please, be kind. Everybody would like this issue to be closed, but there's no simple solution.
from weasyprint.
Let's go! I'll skip some details and lie a little bit to avoid useless complexity.
Web pages have mainly been created to be displayed on rectangles whose width is fixed and whose height is automatically calculated according to the content. That's what a "normal" browser does. But the problem is a bit different when you want to print these web pages: the height is fixed too and you'll need to cut the content between different pages.
CSS defines how the layout must be done, how blocks and texts are displayed. "Normal" blocks are put one below the other and "normal" texts are broken between multiple lines put one below the other. The way the "normal" content is displayed is called the normal flow.
CSS gives the possibility to remove blocks from the normal flow of the page and make them behave in a different way. These blocks sometimes create their own flow, creating nested or parallel flows in the page. That's where it's becoming a bit hard.
When CSS 2 was written, floats and absolute/relative blocks (and somehow tables) were (almost) the only blocks creating parallel flows, and no-one really defined how these parallel flows had to be broken between pages. That's why WeasyPrint's layout has only one flow that can be correctly broken, and the blocks that are outside this flow are seen as atomic blocks going below the bottom of the page if needed.
But now, many CSS specifications have added many ways to create strange flows, such as columns, regions, flexbox and grid. It was time to define how parallel and nested flows had to be broken between pages. It's now done in the fragmentation module. It's not clearly defined but it's much better than what we had in CSS 2.
Bad news: it was not written when we started WeasyPrint.
Really bad news: it's really different from what we have in WeasyPrint.
It's probably not that difficult to implement the parts of the fragmentation module that are needed to fix this issue (well, for really simple cases). But it will need to slightly change many functions and modules in a single atomic commit that will be huge. We can imagine that the work needed is something like #291: long, tiring and painful. But not impossible.
from weasyprint.
I am using version 57.1, but still can't break a <tr> of long text over to the next page. Is there any parameter I need to pass for this to work?
It should work out of the box.
If your table row is not split, then there may be another CSS rule avoiding breaks somewhere (td { break-inside: avoid } for example). Or the content of the table cell may be using a layout that WeasyPrint is not able to split yet, like a flex box.
from weasyprint.
We solved the table split problem by placing <div style="clear:both;"></div> before the table.
from weasyprint.
WeasyPrint's layout model and algorithm
You'll find all the code you need in the layout folder. The layout.pages module has got a make_all_pages function, calling the make_page function, calling the block_level_layout function, etc.
Where to make fixes for this issue
What would have to change architecturally to address the fragmentation module spec
Nested flows (as defined by the fragmentation CSS module) are pretty well supported for block-level and inline-level boxes, using a variable called resume_at that keeps a kind of pointer to where the rendering is (the "current" position). resume_at contains nested tuples representing the nested boxes; you'll find how it works, for example, in the block_container_layout function (in layout.blocks).
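As a rough illustration of the nested-tuple idea, here is a minimal sketch on a toy box tree. The tree, the resolve helper and the (index, sub_pointer) shape are assumptions made for this example only; they are not WeasyPrint's actual data structures or API.

```python
# Toy model, NOT WeasyPrint's real data structures (assumption for illustration).
# A "box" is either a string (a line of text) or a list of child boxes.
# A pointer is None, or a tuple (child_index, pointer_inside_that_child).


def resolve(box, resume_at):
    """Follow a nested pointer down the tree and return the box it designates."""
    while resume_at is not None:
        index, resume_at = resume_at
        box = box[index]
    return box


tree = [
    "line 0",
    ["line 1.0", ["line 1.1.0", "line 1.1.1"]],
    "line 2",
]

# Second child, then its second child, then that child's second line.
pointer = (1, (1, (1, None)))
print(resolve(tree, pointer))
```

Each level of nesting in the pointer matches one level of nesting in the boxes, which is why a single pointer is enough as long as there is only one flow to resume.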
We need to add the support of parallel flows. Instead of one pointer pointing to one position in the flow, we need multiple pointers pointing to the "current" positions in the parallel flows. I imagine that resume_at can be changed into resume_at_list, containing one or more resume_at pointers.
To fix this issue, we basically need:
- to change resume_at into resume_at_list almost everywhere, as rendering a box in the flow can return parallel positions where the parallel flows have reached the end of the page (one flow for itself and for each child creating parallel flows such as floats, table cells, etc.; the list is in the fragmentation module),
- to make floats, table cells, etc. take care of the bottom of the page and return their resume_at_list, instead of assuming that they have no limit for their vertical position.
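The two points above can be sketched with a toy model, assuming each parallel flow is already broken into lines of equal height and that a pointer is simply a line index (or None when the flow is finished). The layout_page and layout_document functions are hypothetical names for this example, not functions from WeasyPrint.

```python
# Toy model of parallel flows (assumption: every line has the same height,
# and each flow is a plain list of already-broken lines).


def layout_page(flows, resume_at_list, lines_per_page):
    """Lay out one page and return (page, next_resume_at_list).

    resume_at_list holds one pointer per flow: a line index, or None when
    that flow has nothing left to render.
    """
    page, next_resume_at_list = [], []
    for lines, resume_at in zip(flows, resume_at_list):
        if resume_at is None:
            page.append([])
            next_resume_at_list.append(None)
            continue
        end = resume_at + lines_per_page
        page.append(lines[resume_at:end])
        # Each flow stops at the bottom of the page and reports where to resume.
        next_resume_at_list.append(end if end < len(lines) else None)
    return page, next_resume_at_list


def layout_document(flows, lines_per_page):
    """Create pages until every parallel flow is exhausted (lines_per_page >= 1)."""
    resume_at_list = [0 if lines else None for lines in flows]
    pages = []
    while any(resume_at is not None for resume_at in resume_at_list):
        page, resume_at_list = layout_page(flows, resume_at_list, lines_per_page)
        pages.append(page)
    return pages


float_lines = ["float float", "float float", "float"]
flow_lines = ["flow flow", "flow flow", "flow", "flow", "flow"]
pages = layout_document([float_lines, flow_lines], lines_per_page=2)
for number, page in enumerate(pages, start=1):
    print(number, page)
```

The point of the sketch is only that the page loop ends when all pointers are None, instead of when a single pointer is; real boxes would need nested pointers per flow rather than a flat index.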
That's all 😄! I think that not everything is correctly defined in the spec, so we'll have to make some stupid choices for stupid cases (how do you render floats whose top border is taller than the page?), but the "normal" use cases should be quite well described and easy (and long, and painful) to implement.
If you need anything, I'll be really happy to help!
from weasyprint.
Yes, this is a known limitation: no page breaks are supported inside floats, absolute positioning, or table cells. Unfortunately right now I don’t have a better answer than “avoid using floats that way”.
I’d be happy to help anyone who wants to fix this, but this is a non-trivial change in the layout code. Otherwise this is something to be fixed eventually, but I don’t know when I’ll get to it.
from weasyprint.
Thanks. From your description I'm not sure whether this is the known limitation or not.
In this case I'm not trying to have page breaks inside a floated element, but between floated elements. Each li is floated separately. My apologies for not making that clearer in the initial report.
from weasyprint.
As reported in #375, we have the same problem with consecutive absolute/relative blocks.
from weasyprint.
I have just started using WeasyPrint, and I'm already a big fan. However, I have also quickly run into the float/break issue -- my users want Bootstrap and floated columns, and don't like what happens in the PDF document!
Can @SimonSapin or anyone else comment on the refactoring that would be necessary to fix this wartish problem? I haven't perused your codebase yet, but I know Python very well; so, I'm looking for high-level overview of the current layout model/algorithm and why it gets tripped up trying to put breaks in floats, and what would have to be changed.
Thanks,
-Hugh
from weasyprint.
301 @liZe
from weasyprint.
OK. Where should I be looking in the code to learn about the following (beyond the peephole insight of #291):
- WeasyPrint's layout model and algorithm
- Where to make fixes for this issue
- What would have to change architecturally to address the fragmentation module spec
Thanks!
from weasyprint.
Thanks. That's just the kind of overview I was looking for.
One last thing, about test-driven development (okay, two last things):
- What's the quickest way to run tests during development work?
- Do you have instances of HTML/CSS tests for parallel flows, that, when passing, will indicate that the work is finished? I of course have the instance that got me here, but do you know of a reference test set?
from weasyprint.
FYI, my habit is to do minor refactoring while I'm working to understand existing logic. So you can expect some PRs along those lines.
Also, I'm completely new to CSS implementation work ! ;-) However, my career has largely been developing scientific software, so I'm at home with specs and deep algorithms.
The code appears to have a good number of pointers to key CSS specs. However, if there are some spec documents that are so basic that you wouldn't mention them in code comments, they might actually be useful for me! So, I would appreciate pointers to key algorithmic starting points for CSS.
Thanks.
from weasyprint.
What's the quickest way to run tests during development work?
./setup.py test (launch tests and check coding style).
Do you have instances of HTML/CSS tests for parallel flows.
<style>
@page {
font-family: monospace;
height: 2.5em;
line-height: 1em;
margin: 0;
width: 10em;
}
body {
margin: 0;
}
div {
background: red;
float: left;
width: 50%;
}
</style>
<body>
<div>
float float float float float
</div>
flow flow flow flow flow
</body>
You need to get something like:
Page 1
+-------------------------+
| float float | flow flow |
| float float | flow flow |
+-------------------------+
Page 2
+-------------------------+
| float       | flow      |
|-------------+           |
+-------------------------+
However, my career has largely been developing scientific software, so I'm at home with specs and deep algorithms.
You'll need these skills!
So, I would appreciate pointers to key algorithmic starting points for CSS.
There's a very useful chapter in the documentation. In the CSS spec, the best starting point is probably the presentation of the normal flow and the implementation of 9.4.1 and 9.4.2 in layout.blocks and layout.inlines.
Good luck!
from weasyprint.
OK, you threw me in the deep end of CSS spec, and I'm floundering, but progressing, through prose like this:
Except for table boxes, which are described in a later chapter, and replaced elements, a block-level box is also a block container box. A block container box either contains only block-level boxes or establishes an inline formatting context and thus contains only inline-level boxes. Not all block container boxes are block-level boxes: non-replaced inline blocks and non-replaced table cells are block containers but not block-level boxes. Block-level boxes that are also block containers are called block boxes.
I don't yet have a solid mental model of what it takes to do all the layout given multiple flows and page breaks, but I'm working on it, and the WeasyPrint code-base is very approachable and the focus on resume_at helps. To keep getting my hands dirty I intend to add a terse detail string to each assert, so at least I'll know e.g. box sizes when I'm breaking things...
from weasyprint.
This problem is really annoying, but I found a way to work around it.
The key point is to split one <tr> into several <tr>, e.g.:
<tr>
<td>col1</td>
<td>long lines1
long lines2
long lines3
long lines4
</td>
</tr>
will be changed to
<tr>
<td class="top_border"></td>
<td class="top_border">long lines1</td>
</tr>
<tr>
<td class="no_border">col1</td>
<td class="no_border">long lines2</td>
</tr>
<tr>
<td class="no_border"></td>
<td class="no_border">long lines3</td>
</tr>
<tr>
<td class="no_border"></td>
<td class="no_border">long lines4</td>
</tr>
css
table tr .no_border {
border-left: 1px solid #000000;
border-right: 1px solid #000000;
border-top: 0;
border-bottom: 0;
}
table tr .top_border {
border-left: 1px solid #000000;
border-right: 1px solid #000000;
border-top: 1px solid #000000;
border-bottom: 0;
}
This is just some sample code to explain the main idea; you'll need to adapt it to your situation. I hope this helps someone.
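The row splitting described above could be automated along these lines. This is only a sketch: split_row and its label_row parameter are hypothetical names, the class names come from the sample CSS above, and deciding where each line ends is left to the caller.

```python
from html import escape


def split_row(label, lines, label_row=0):
    """Return HTML for one logical row rendered as one <tr> per line.

    The first <tr> gets the "top_border" class and the others "no_border",
    as in the sample above. label_row chooses which <tr> shows the label.
    """
    rows = []
    for index, line in enumerate(lines):
        css_class = "top_border" if index == 0 else "no_border"
        first_cell = escape(label) if index == label_row else ""
        rows.append(
            f'<tr>\n<td class="{css_class}">{first_cell}</td>\n'
            f'<td class="{css_class}">{escape(line)}</td>\n</tr>'
        )
    return "\n".join(rows)


html_rows = split_row(
    "col1",
    ["long lines1", "long lines2", "long lines3", "long lines4"],
    label_row=1,
)
print(html_rows)
```

With label_row=1 the output matches the hand-written sample, where the label sits in the second row. Because every generated <tr> is a normal table row, page breaks can fall between them.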
from weasyprint.
This is just some sample code to explain the main idea; you'll need to adapt it to your situation. I hope this helps someone.
wd, thank you for your idea, but (at least in our case) it is not feasible.
Any news about this bug? It's critical if we intend to put this in production ... :(
from weasyprint.
Unfortunately, after getting beautiful results with WeasyPrint, we had a deadline for our project upgrade, where we need to create PDF files, and we had to choose a solution that doesn't lose any text when generating the PDF files. This bug creates a big problem in our case, so we finally had to adopt the mPDF library instead. It has other collateral issues, but the final output contains the same text as the original HTML page.
Anyway, thank you all for your help and understanding. I will visit this thread from time to time to see if it's closed, and then we will at last use WeasyPrint as our solution.
You are doing a great job!!!!! ;)
from weasyprint.
I have the same problem, but with a list:
<ol>
<li>a</li>
...
<li>z</li>
</ol>
If some <li> would land on page 2, it is not shown in the PDF, but the content after that code does show on page 2.
Any help will be appreciated.
from weasyprint.
If some <li> would land on page 2, it is not shown in the PDF
Page breaks are allowed in lists. You probably get this problem because your list has a position, display or float property different from the default values.
from weasyprint.
Yes, I solved it a moment ago, sorry for the late reply.
I had class="input-group" in my Bootstrap layout. I removed it, and my ol and table now render well.
Thanks for the reply 👍
from weasyprint.
So we ran into this issue as well.
@liZe I see your description above about how to solve this issue and you suggest turning resume_at into an array. This seems complicated and a bit error prone. I'm wondering if there isn't a simpler way to do this, using Python generators/coroutines. My idea is that each element is only responsible for rendering itself, yielding (unsplittable) layout blocks and having containers combine them. This way you don't have to track any kind of resume_at value. The algorithm for rendering a table over multiple pages would look something like (pseudocode!)
# remaining = remaining height in page
for row in table:
    # Init cells
    for cell in row:
        iters[cell] = cell.start_render(width=cell.width)
    # While any cell still has content
    while any(iters[c] is not None for c in row):
        this_row = LayoutContainer()
        for cell in row:
            if not iters[cell]: continue
            # Collect blocks from this cell until space is full
            remain_cell = remaining
            while remain_cell > 0:
                # Calls generator to return a layout block
                block = iters[cell].send(remain_cell)
                if not block:  # This cell done
                    iters[cell] = None
                    break
                this_row.add(block)  # Result block
                remain_cell -= block.height
        # Note: if we filled a page, then remaining will become the height of the new page
        remaining = (yield this_row)
As you can see, this can split a table cell over multiple pages, without any of the contents of the cells actually being aware that they are being split over multiple pages. There are of course details to work out: if remaining is too small for your widget, who is responsible for adding the spacer? And floats need to be rendered first, with other things then rendered around them (possibly by passing a "current page" object around that widgets can inspect to see what they need to wrap around). This rendering method would allow you to render the float until the end of this page so you know what it fills, then render the rest of the page. Then you can continue rendering the rest of the float on the next page.
I've not really looked at the code so I'm not sure if there is some reason why this couldn't work, but this does seem much simpler than tracking resume locations yourself, by letting the Python coroutine stacks hold the state for you implicitly.
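To make the coroutine idea a bit more concrete, here is a small runnable sketch. It is a toy under stated assumptions: every line is one unit tall, a cell is just a list of lines, and cell_lines and render_row are hypothetical names that exist nowhere in WeasyPrint.

```python
def cell_lines(lines):
    """Coroutine for one cell: receives the remaining height through send()
    and yields the lines that fit (each line is 1 unit tall in this toy)."""
    position = 0
    remaining = yield  # Reached by the priming next() call.
    while position < len(lines):
        # Always emit at least one line so the caller makes progress.
        count = max(1, min(remaining, len(lines) - position))
        chunk = lines[position:position + count]
        position += count
        remaining = yield chunk


def render_row(cells, page_height):
    """Split one table row into per-page fragments (one list of lines per cell)."""
    generators = []
    for lines in cells:
        generator = cell_lines(lines)
        next(generator)  # Prime the coroutine up to its first yield.
        generators.append(generator)
    fragments = []
    while any(generator is not None for generator in generators):
        fragment = []
        for index, generator in enumerate(generators):
            if generator is None:
                fragment.append([])
                continue
            try:
                fragment.append(generator.send(page_height))
            except StopIteration:  # This cell has no content left.
                generators[index] = None
                fragment.append([])
        if any(fragment):
            fragments.append(fragment)
    return fragments


fragments = render_row([["col1"], ["long 1", "long 2", "long 3"]], page_height=2)
for fragment in fragments:
    print(fragment)
```

The position inside each cell lives in the suspended generator frame, so the row code never handles an explicit resume pointer. The sketch ignores the open questions raised above (spacers, floats, and content wrapping around them).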
from weasyprint.
Is anyone currently working on a PR for this? I see that multiple people have begun looking into it. As I'm looking at it now, I don't want to duplicate someone else's effort.
from weasyprint.
For expedience's sake I have moved on. I'm using Puppeteer in a headless Chrome browser to turn HTML into PDF. And I'm using Mozilla's pdf.js library to analyze PDFs. The heavyweight browser folks have solved all the common problems, and they keep up with the evolving specs... Yes, this means I'm using TypeScript/JavaScript a lot these days, and I'm enjoying functional programming far more than I ever expected.
from weasyprint.
I moved to another tool too, precisely because of this bug. Browser-based HTML-to-PDF tools are far from having the functionality of WeasyPrint. I'll use it again when this bug is solved.
from weasyprint.
Is anyone currently working on a PR for this? I see that multiple people have begun looking into it. As I'm looking at it now, I don't want to duplicate someone else's effort.
I don't think anyone is working on this right now. If you need help, please ask, I'll be happy to answer!
from weasyprint.
pls fix that problem. The workaround with clear: both is not working for me.
I have a table with a td that's taller than one page.
from weasyprint.
Any news on this thread? What about support for tables?
I'm currently using wkHtmlToPdf and also have the issue with tables. Its current behavior is to cut anywhere (which is fine for me), but it also allows cuts in the middle of a line of text, which makes the library unusable for me.
Do we have a patch for this lib for my desired behavior?
from weasyprint.
Do we have a patch for this lib for my desired behavior?
No, there’s currently no patch. As said earlier, there’s no easy fix, and closing this issue requires a lot of work.
from weasyprint.
We won’t break inline-block boxes, because according to the spec:
Since line boxes contain no possible break points, inline-block and inline-table boxes (and other inline-level display types that establish an independent formatting context) may also be considered monolithic: that is, in the cases where a single line box is too large to fit within its fragmentainer even by itself and the UA chooses to split the line box, it may fragment such boxes or it may treat them as monolithic.
from weasyprint.
Please let us know here when it is available for testing.
Thank you
from weasyprint.
So where can I find this new feature? Is this integrated in the newest release?
from weasyprint.
So where can I find this new feature? Is this integrated in the newest release?
Hi!
As you can see in the metadata of these issues, it’s available since version 54.
from weasyprint.
Thanks liZe, sorry for this stupid question:
I am using version 57.1, but still can't break a <tr> of long text over to the next page. Is there any parameter I need to pass for this to work?
from weasyprint.