liZe avatar liZe commented on May 31, 2024 25

We now handle parallel flows for floats, absolutes, relatives, and table-cells.

This bug is now closed. It required 9 years of hard work 🚀.

We’ll release a beta soon, tests and feedback are welcome!

from weasyprint.

grewn0uille avatar grewn0uille commented on May 31, 2024 8

Hello,

A beta has been released.

Don’t hesitate to try it 😉

liZe avatar liZe commented on May 31, 2024 7

> pls fix that problem.

Please, be kind. Everybody would like this issue to be closed, but there's no simple solution.

liZe avatar liZe commented on May 31, 2024 5

Let's go! I'll skip some details and lie a little bit to avoid useless complexity.

Web pages have mainly been created to be displayed on rectangles whose width is fixed and whose height is automatically calculated according to the content. That's what a "normal" browser does. But the problem is a bit different when you want to print these web pages: the height is fixed too and you'll need to cut the content between different pages.

CSS defines how the layout must be done, how blocks and texts are displayed. "Normal" blocks are put one below the other and "normal" texts are broken between multiple lines put one below the other. The way the "normal" content is displayed is called the normal flow.

CSS gives the possibility to remove blocks from the normal flow of the page and make them behave in a different way. These blocks sometimes create their own flow, creating nested or parallel flows in the page. That's where it's becoming a bit hard.

When CSS 2 was written, floats and absolute/relative blocks (and somehow tables) were (almost) the only blocks creating parallel flows, and no one really defined how these parallel flows had to be broken between pages. That's why WeasyPrint's layout has only one flow that can be correctly broken, and the blocks that are outside this flow are seen as atomic blocks going below the bottom of the page if needed.
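To make that limitation concrete, here is a toy model of it. It is only a sketch and not WeasyPrint's code: the Block class, the paginate function and the integer heights are all invented for illustration. In-flow blocks are cut at the page bottom, while out-of-flow blocks are placed whole and may run past it.

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    height: int
    in_flow: bool = True  # False stands for a float/absolute block in this toy


def paginate(blocks, page_height):
    """Return pages as lists of (name, top, bottom) fragments."""
    pages = [[]]
    y = 0
    for block in blocks:
        if not block.in_flow:
            # Atomic: placed whole at the current position, even if it
            # overflows the page; it does not move the in-flow position.
            pages[-1].append((block.name, y, y + block.height))
            continue
        remaining = block.height
        while remaining > 0:
            if y == page_height:  # page is full, start a new one
                pages.append([])
                y = 0
            part = min(remaining, page_height - y)
            pages[-1].append((block.name, y, y + part))
            y += part
            remaining -= part
    return pages


pages = paginate(
    [Block("p1", 3), Block("float", 4, in_flow=False), Block("p2", 3)],
    page_height=5)
```

With a page height of 5, the paragraph p2 is split across two pages, but the float occupies 3 to 7 on the first page, so its last two units fall below the bottom of the page.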

But now, many CSS specifications have added many ways to create strange flows, such as columns, regions, flexbox and grid. It was time to define how parallel and nested flows had to be broken between pages. It's now done in the fragmentation module. It's not clearly defined but it's much better than what we had in CSS 2.

Bad news: it was not written when we started WeasyPrint.

Really bad news: it's really different from what we have in WeasyPrint.

It's probably not that difficult to implement the parts of the fragmentation module that are needed to fix this issue (well, for really simple cases). But it will need to slightly change many functions and modules in a single atomic commit that will be huge. We can imagine that the work needed is something like #291: long, tiring and painful. But not impossible.

liZe avatar liZe commented on May 31, 2024 3

> I am using version 57.1, but still can't break a <tr> of long text over to the next page. Is there any parameter I need to pass for this to work?

It should work out of the box.

If your table row is not split, then there may be another CSS rule avoiding breaks somewhere (td { break-inside: avoid } for example). Or the content of the table cell may be using a layout that WeasyPrint is not able to split yet, like a flex box.
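One way to check this is to render a minimal one-row table on its own. The sketch below assumes that a later rule of equal specificity is enough to override a stray break-inside: avoid (that depends on your own cascade). build_document and render_pdf are hypothetical helper names; the only WeasyPrint call used is HTML(string=...).write_pdf(...).

```python
import html

# Assumption: this rule comes after any "break-inside: avoid" rule of equal
# specificity in your stylesheet, so it wins the cascade.
ALLOW_ROW_BREAKS = "tr, td { break-inside: auto; }"


def build_document(cell_text, extra_css=ALLOW_ROW_BREAKS):
    """Return a one-row table whose single cell holds cell_text."""
    return (
        f"<style>{extra_css}</style>"
        f"<table><tr><td>{html.escape(cell_text)}</td></tr></table>"
    )


def render_pdf(document, path):
    """Write the document to a PDF file (requires WeasyPrint to be installed)."""
    from weasyprint import HTML  # imported lazily so build_document runs without it
    HTML(string=document).write_pdf(path)
```

If the row splits in this reduced document but not in yours, the cause is probably another rule or a layout (such as a flex box) inside the cell.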

polonat avatar polonat commented on May 31, 2024 2

We solved the table split problem by placing <div style="clear:both;"></div> before the table.

liZe avatar liZe commented on May 31, 2024 1

> WeasyPrint's layout model and algorithm

You'll find all the code you need in the layout folder. The layout.pages module has got a make_all_pages function, calling the make_page function, calling the block_level_layout function, etc.

> Where to make fixes for this issue
> What would have to change architecturally to address the fragmentation module spec

Nested flows (as defined by the fragmentation CSS module) are pretty well supported for block-level and inline-level boxes, using a variable called resume_at that keeps a kind of pointer to where the rendering is (the "current" position). resume_at contains nested tuples representing the nested boxes; you'll find how it works, for example, in the block_container_layout function (in layout.blocks).
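As a rough illustration of the idea of a nested resume pointer, here is a toy version. It is a sketch under assumptions, not the real block_container_layout: boxes are plain nested lists, leaves are strings that each take one line, and the pointer is a nested (child_index, child_resume_at) tuple.

```python
def layout(box, max_lines, resume_at=None):
    """Lay out at most max_lines lines of a nested list of strings.

    Returns (lines, resume_at). resume_at is None when the box is finished,
    otherwise a nested tuple (child_index, child_resume_at).
    """
    start, child_resume = resume_at if resume_at else (0, None)
    lines = []
    for index in range(start, len(box)):
        child = box[index]
        if len(lines) == max_lines:
            # Page is full: the next page starts at this child
            return lines, (index, None)
        if isinstance(child, str):
            lines.append(child)
        else:
            # Only the child we stopped in inherits the inner pointer
            inner = child_resume if index == start else None
            child_lines, inner = layout(child, max_lines - len(lines), inner)
            lines.extend(child_lines)
            if inner is not None:
                return lines, (index, inner)
    return lines, None


def paginate(box, lines_per_page):
    """Call layout page after page, feeding back the resume pointer."""
    pages, resume_at = [], None
    while True:
        lines, resume_at = layout(box, lines_per_page, resume_at)
        pages.append(lines)
        if resume_at is None:
            return pages


document = ["a", ["b", "c", ["d", "e"]], "f"]
first_page, pointer = layout(document, 2)
```

Here the first page holds "a" and "b", and the pointer (1, (1, None)) means "resume in child 1 of the root, at its child 1".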

We need to add the support of parallel flows. Instead of one pointer pointing to one position in the flow, we need multiple pointers pointing to the "current" positions in the parallel flows. I imagine that resume_at can be changed into resume_at_list, containing one or more resume_at pointers.

To fix this issue, we basically need:

  • to change resume_at into resume_at_list almost everywhere, as rendering a box in the flow can return parallel positions where the parallel flows have reached the end of the page (one flow for itself and for each child creating parallel flows such as floats, table cells, etc., the list is in the fragmentation module),
  • to make floats, table cells, etc. take care of the bottom of the page and return their resume_at_list, instead of assuming that they have no limit for their vertical position.
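The two points above can be sketched with a toy model. This is an assumption-laden illustration, not a proposed implementation: each parallel flow is a flat list of one-line strings, and resume_at_list holds one index per flow (None once that flow is finished). layout_page and paginate are hypothetical names.

```python
def layout_page(flows, lines_per_page, resume_at_list=None):
    """Lay out one page of several parallel flows.

    Returns (fragments, resume_at_list) with one entry per flow.
    """
    if resume_at_list is None:
        resume_at_list = [0] * len(flows)
    fragments, next_resume = [], []
    for flow, start in zip(flows, resume_at_list):
        if start is None:  # this flow ended on an earlier page
            fragments.append([])
            next_resume.append(None)
            continue
        # Every flow stops at the bottom of the page on its own
        end = min(start + lines_per_page, len(flow))
        fragments.append(flow[start:end])
        next_resume.append(end if end < len(flow) else None)
    return fragments, next_resume


def paginate(flows, lines_per_page):
    """Produce pages until every parallel flow has reached its end."""
    pages, resume_at_list = [], None
    while True:
        fragments, resume_at_list = layout_page(
            flows, lines_per_page, resume_at_list)
        pages.append(fragments)
        if all(pointer is None for pointer in resume_at_list):
            return pages


float_lines = ["float float", "float float", "float"]
flow_lines = ["flow flow", "flow flow", "flow"]
pages = paginate([float_lines, flow_lines], lines_per_page=2)
```

With two lines per page, both the float and the normal flow continue on the second page instead of the float overflowing the first one.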

That's all 😄! I think that not everything is correctly defined in the spec, so we'll have to make some stupid choices for stupid cases (how do you render floats whose top border is taller than the page?), but the "normal" use cases should be quite well described and easy (and long, and painful) to implement.

If you need anything, I'll be really happy to help!

SimonSapin avatar SimonSapin commented on May 31, 2024

Yes, this is a known limitation: no page breaks are supported inside floats, absolute positioning, or table cells. Unfortunately right now I don’t have a better answer than “avoid using floats that way”.

I’d be happy to help anyone who wants to fix this, but this is a non-trivial change in the layout code. Otherwise this is something to be fixed eventually, but I don’t know when I’ll get to it.

Smylers avatar Smylers commented on May 31, 2024

Thanks. From your description I'm not sure whether this is the known limitation or not.

In this case I'm not trying to have page breaks inside a floated element, but between floated elements. Each li is floated separately. My apologies for not making that clearer in the initial report.

liZe avatar liZe commented on May 31, 2024

As reported in #375, we have the same problem with consecutive absolute/relative blocks.

hughsw avatar hughsw commented on May 31, 2024

I have just started using WeasyPrint, and I'm already a big fan. However, I have also quickly run into the float/break issue -- my users want Bootstrap and floated columns, and don't like what happens in the PDF document!

Can @SimonSapin or anyone else comment on the refactoring that would be necessary to fix this wartish problem? I haven't perused your codebase yet, but I know Python very well, so I'm looking for a high-level overview of the current layout model/algorithm, why it gets tripped up trying to put breaks in floats, and what would have to be changed.

Thanks,
-Hugh

SimonSapin avatar SimonSapin commented on May 31, 2024

301 @liZe

hughsw avatar hughsw commented on May 31, 2024

OK. Where should I be looking in the code to learn about the following (beyond the peephole insight of #291):

  • WeasyPrint's layout model and algorithm
  • Where to make fixes for this issue
  • What would have to change architecturally to address the fragmentation module spec

Thanks!

hughsw avatar hughsw commented on May 31, 2024

Thanks. That's just the kind of overview I was looking for.

One last thing: test-driven development (okay, two last things):

  • What's the quickest way to run tests during development work?
  • Do you have instances of HTML/CSS tests for parallel flows, that, when passing, will indicate that the work is finished? I of course have the instance that got me here, but do you know of a reference test set?

hughsw avatar hughsw commented on May 31, 2024

FYI, my habit is to do minor refactoring while I'm working to understand existing logic. So you can expect some PRs along those lines.

Also, I'm completely new to CSS implementation work ! ;-) However, my career has largely been developing scientific software, so I'm at home with specs and deep algorithms.

The code appears to have a good number of pointers to key CSS specs. However, if there are some spec documents that are so basic that you wouldn't mention them in code comments, they might actually be useful for me! So, I would appreciate pointers to key algorithmic starting points for CSS.

Thanks.

liZe avatar liZe commented on May 31, 2024

> What's the quickest way to run tests during development work?

./setup.py test (launch tests and check coding style).

> Do you have instances of HTML/CSS tests for parallel flows?

<style>
  @page {
    font-family: monospace;
    height: 2.5em;
    line-height: 1em;
    margin: 0;
    width: 10em;
  }
  body {
    margin: 0;
  }
  div {
    background: red;
    float: left; 
    width: 50%; 
  }
</style>

<body>
  <div>
    float float float float float
  </div>
  flow flow flow flow flow
</body>

You need to get something like:

Page 1
+-------------------------+
| float float | flow flow |
| float float | flow flow |
+-------------------------+
Page 2
+-------------------------+
| float       | flow      |
|-------------+           |
+-------------------------+

> However, my career has largely been developing scientific software, so I'm at home with specs and deep algorithms.

You'll need these skills!

> So, I would appreciate pointers to key algorithmic starting points for CSS.

There's a very useful chapter in the documentation. In the CSS spec, the best starting point is probably the presentation of the normal flow and the implementation of 9.4.1 and 9.4.2 in layout.blocks and layout.inlines.

Good luck!

hughsw avatar hughsw commented on May 31, 2024

OK, you threw me in the deep end of CSS spec, and I'm floundering, but progressing, through prose like this:

> Except for table boxes, which are described in a later chapter, and replaced elements, a block-level box is also a block container box. A block container box either contains only block-level boxes or establishes an inline formatting context and thus contains only inline-level boxes. Not all block container boxes are block-level boxes: non-replaced inline blocks and non-replaced table cells are block containers but not block-level boxes. Block-level boxes that are also block containers are called block boxes.

I don't yet have a solid mental-model of what it takes to do all the layout given multiple flows and page breaks, but I'm working on it, and the WeasyPrint code-base is very approachable and the focus on resume_at helps. To keep getting my hands dirty I intend to add a terse detail string to each assert, so at least I'll know e.g. box sizes when I'm breaking things...

wd avatar wd commented on May 31, 2024

This problem is really annoying, but I found a way to work around it.

The key point is to split one <tr> into several <tr> elements, e.g.:

<tr>
  <td>col1</td>
  <td>long lines1
long lines2
long lines3
long lines4
  </td>
 </tr>

will be changed to

<tr>
  <td class="top_border"></td>
  <td class="top_border">long lines1</td>
 </tr>
<tr>
  <td class="no_border">col1</td>
  <td class="no_border">long lines2</td>
 </tr>
<tr>
  <td class="no_border"></td>
  <td class="no_border">long lines3</td>
 </tr>
<tr>
  <td class="no_border"></td>
  <td class="no_border">long lines4</td>
 </tr>

css

        table tr .no_border {
            border-left: 1px solid #000000;
            border-right: 1px solid #000000;
            border-top: 0;
            border-bottom: 0;
        }

        table tr .top_border {
            border-left: 1px solid #000000;
            border-right: 1px solid #000000;
            border-top: 1px solid #000000;
            border-bottom: 0;
        }

This is just sample code to explain the main idea; you need to adapt it to your situation. I hope this helps someone.
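If the rows are generated from data, the split can be automated. The helper below is only a sketch of the workaround: split_row is a hypothetical name, the class names are the ones from the CSS above, and, unlike the hand-written example, it puts the label in the first generated row for simplicity.

```python
import html


def split_row(label, long_lines):
    """Return one <tr> per line of the long cell, as an HTML string."""
    rows = []
    for index, line in enumerate(long_lines):
        # Only the first generated row draws the top border
        css_class = "top_border" if index == 0 else "no_border"
        first_cell = html.escape(label) if index == 0 else ""
        rows.append(
            f'<tr><td class="{css_class}">{first_cell}</td>'
            f'<td class="{css_class}">{html.escape(line)}</td></tr>')
    return "\n".join(rows)


table_rows = split_row("col1", ["long lines1", "long lines2"])
```

Each generated row is short, so page breaks can fall between rows even where a break inside a cell is not available.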

RafaelLinux avatar RafaelLinux commented on May 31, 2024

> This is just sample code to explain the main idea; you need to adapt it to your situation. I hope this helps someone.

wd, thank you for your idea, but (at least in our case) it is not feasible.

Any news about this bug? It's critical if we intend to put this in production ... :(

RafaelLinux avatar RafaelLinux commented on May 31, 2024

Unfortunately, after getting beautiful results with WeasyPrint, we had a deadline for our project upgrade, where we need to create PDF files, and we had to choose a solution that didn't lose any text when generating the PDF files. This bug creates a big problem in our case, so we finally had to adopt the mPDF library instead. It has other side issues, but the final output contains the same text as the original HTML page.

Anyway, thank you all for your help and understanding. I will visit this thread from time to time to see whether it has been closed, and then we will at last use WeasyPrint as our solution.

You are doing a great job!!!!! ;)

budimm avatar budimm commented on May 31, 2024

I've got the same problem, but with:

<ol>
  <li>a</li>
...
  <li>z</li>
</ol>

If some <li> should appear on page 2, it is not shown in the PDF, but whatever comes after that code is shown on page 2.

Any help will be appreciated.

liZe avatar liZe commented on May 31, 2024

> If some <li> should appear on page 2, it is not shown in the PDF

Page breaks are allowed in lists. You probably get this problem because your list has a position, display or float property different from the default values.

budimm avatar budimm commented on May 31, 2024

Yeah, it was solved a moment ago, sorry for the late reply.
I had class="input-group" in my Bootstrap layout. I removed it,
and my ol and table now run well.
Thanks for the reply. 👍

kleptog avatar kleptog commented on May 31, 2024

So we ran into this issue as well.

@liZe I see your description above about how to solve this issue and you suggest turning resume_at into an array. This seems complicated and a bit error-prone. I'm wondering if there isn't a simpler way to do this, using Python generators/coroutines. My idea is that each element is only responsible for rendering itself, yielding (unsplittable) layout blocks and having containers combine them. This way you don't have to track any kind of resume_at value. The algorithm for rendering a table over multiple pages would look something like this (pseudocode!):

# Still pseudocode: LayoutContainer and cell.start_render are hypothetical,
# and each cell generator is assumed to be primed already.
def render_table(table, remaining):
    # remaining = remaining height in page
    for row in table:
        # Init cells: one layout generator per cell
        iters = {cell: cell.start_render(width=cell.width) for cell in row}
        # While any cell still has content
        while any(iters[c] is not None for c in row):
            this_row = LayoutContainer()
            for cell in row:
                if iters[cell] is None:
                    continue
                # Collect blocks from this cell until space is full
                remain_cell = remaining
                while remain_cell > 0:
                    # Calls generator to return a layout block
                    block = iters[cell].send(remain_cell)
                    if not block:  # This cell done
                        iters[cell] = None
                        break
                    this_row.add(block)  # Result block
                    remain_cell -= block.height
            # Note: if we filled a page, then remaining will become the height of the new page
            remaining = yield this_row

As you can see, this can split a table cell over multiple pages, without any of the contents of the cells actually being aware that they are being split over multiple pages. There are of course details to work out: if remaining is too small for your widget, who is responsible for adding the spacer? Floats also need to be rendered first, with other things then rendered around them (possibly by passing a "current page" object around that widgets can inspect to see what they need to wrap around). This rendering method would allow you to render the float until the end of this page so you know what it fills, then render the rest of the page. Then you can continue rendering the rest of the float on the next page.

I've not really looked at the code so I'm not sure if there is some reason why this couldn't work, but this does seem much simpler than tracking resume locations yourself, by letting the Python coroutine stacks hold the state for you implicitly.
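A runnable toy version of this coroutine idea might look as follows. It is a sketch under simplifying assumptions: blocks are (text, height) pairs, the page height is constant, and cell_renderer and render_row are hypothetical names with no relation to WeasyPrint's code.

```python
def cell_renderer(blocks):
    """Coroutine for one cell: send(remaining) returns the next block if it
    fits in the remaining height, or None if it must wait for more room."""
    remaining = yield  # priming step: wait for the first remaining height
    for text, height in blocks:
        while height > remaining:
            remaining = yield None  # does not fit here, retry later
        remaining = yield (text, height)


def render_row(cells, page_height):
    """Yield one page at a time; a page has one list of texts per cell."""
    renderers = []
    for blocks in cells:
        renderer = cell_renderer(blocks)
        next(renderer)  # prime the coroutine
        renderers.append(renderer)
    while any(r is not None for r in renderers):
        page = []
        for index, renderer in enumerate(renderers):
            column, remaining = [], page_height
            while renderer is not None and remaining > 0:
                try:
                    block = renderer.send(remaining)
                except StopIteration:  # this cell has no content left
                    renderers[index] = renderer = None
                    break
                if block is None:  # next block needs a new page
                    break
                text, height = block
                column.append(text)
                remaining -= height
            page.append(column)
        if any(page):
            yield page
        elif any(r is not None for r in renderers):
            raise ValueError("a block is taller than the page")


cells = [[("a1", 1), ("a2", 1), ("a3", 1)], [("b1", 2), ("b2", 2)]]
pages = list(render_row(cells, page_height=3))
```

The position inside each cell lives in the suspended coroutine, so the container never stores an explicit resume pointer. The guard at the end covers one of the open details mentioned above: a block that can never fit on a page.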

pytrumpeter avatar pytrumpeter commented on May 31, 2024

Is anyone currently working on a PR for this? I see that multiple people have begun looking into it. As I'm looking at it now, I don't want to duplicate someone else's effort.

hughsw avatar hughsw commented on May 31, 2024

For expediency's sake I have moved on. I'm using Puppeteer in a headless Chrome browser to turn HTML into PDF. And I'm using Mozilla's pdf.js library to analyze PDFs. The heavyweight browser folks have solved all the common problems, and they keep up with the evolving specs... Yes, this means I'm using TypeScript/JavaScript a lot these days, and I'm enjoying functional programming far more than I ever expected.

RafaelLinux avatar RafaelLinux commented on May 31, 2024

I moved to another tool too, precisely because of this bug. Any browser-based HTML-to-PDF tool is far from having the functionality of WeasyPrint. I'll use it again when this bug is solved.

liZe avatar liZe commented on May 31, 2024

> Is anyone currently working on a PR for this? I see that multiple people have begun looking into it. As I'm looking at it now, I don't want to duplicate someone else's effort.

I don't think anyone is working on this right now. If you need help, please ask, I'll be happy to answer!

cymn avatar cymn commented on May 31, 2024

pls fix that problem. The workaround with clear: both is not working for me.
I have a table with a td that's larger than one page.

Hideman85 avatar Hideman85 commented on May 31, 2024

Any news on this thread, what about the support of tables?

I'm currently using wkHtmlToPdf and also have the issue with tables: the current behavior is to cut anywhere (that is fine for me), but it also allows cutting in the middle of a line of text, which makes the lib unusable for me.

Do we have a patch for this lib for my desired behavior?

liZe avatar liZe commented on May 31, 2024

> Do we have a patch for this lib for my desired behavior?

No, there’s currently no patch. As said earlier, there’s no easy fix, and closing this issue requires a lot of work.

liZe avatar liZe commented on May 31, 2024

We won’t break inline-block boxes, because according to the spec:

> Since line boxes contain no possible break points, inline-block and inline-table boxes (and other inline-level display types that establish an independent formatting context) may also be considered monolithic: that is, in the cases where a single line box is too large to fit within its fragmentainer even by itself and the UA chooses to split the line box, it may fragment such boxes or it may treat them as monolithic.

RafaelLinux avatar RafaelLinux commented on May 31, 2024

Please let us know here when it is available for testing.

Thank you

pzdkn avatar pzdkn commented on May 31, 2024

So where can I find this new feature? Is this integrated in the newest release?

liZe avatar liZe commented on May 31, 2024

> So where can I find this new feature? Is this integrated in the newest release?

Hi!

As you can see in the metadata of these issues, it’s available since version 54.

pzdkn avatar pzdkn commented on May 31, 2024

Thanks liZe, sorry for this stupid question:
I am using version 57.1, but still can't break a <tr> of long text over to the next page. Is there any parameter I need to pass for this to work?
