Comments (22)

hejsan avatar hejsan commented on May 30, 2024

Hi.
This would solve a lot of use cases.
If I were to take this on, could you give me pointers on where to start please?

from weasyprint.

SimonSapin avatar SimonSapin commented on May 30, 2024

Awesome!
First, read this to get a general idea of how the code is organized: http://weasyprint.org/docs/hacking/
Then read the spec: http://dev.w3.org/csswg/css-page/#using-named-pages
Then read the spec again :)

The cascade happens in weasyprint/css/__init__.py. Its result is the computed value of every property for every element, pseudo-element, page, and page-margin box. The parser already supports the syntax for named pages, but they are explicitly rejected by the cascade.

At the moment we compute styles for every type of page in advance, but with named pages I think it should be more lazy: the layout code (as pages are generated) would call back into the cascade to ask for the style of a given page. (The cascade would cache that result.)
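
A minimal sketch of that lazy lookup, not WeasyPrint's actual code. The class name PageStyleCache, the compute_style callback, and the (name, side, is_first) key are all hypothetical; the only ideas taken from the comment above are that layout asks for a page's style on demand and that the result is cached.

```python
class PageStyleCache:
    """Compute page styles on demand and remember them (hypothetical helper)."""

    def __init__(self, compute_style):
        # compute_style stands in for the real cascade; it is an assumption.
        self._compute_style = compute_style
        self._cache = {}

    def style_for(self, page_type):
        # page_type must be hashable, e.g. ("chapter", "right", False).
        if page_type not in self._cache:
            self._cache[page_type] = self._compute_style(page_type)
        return self._cache[page_type]


calls = []


def fake_cascade(page_type):
    # Placeholder cascade: records each call and returns a dummy style dict.
    calls.append(page_type)
    name, side, is_first = page_type
    return {"page": name, "side": side, "first": is_first}


cache = PageStyleCache(fake_cascade)
first = cache.style_for(("chapter", "right", False))
again = cache.style_for(("chapter", "right", False))
```

Here the second style_for() call returns the cached dict without calling fake_cascade again, which is the behavior the layout code would rely on when many pages share a type.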

During layout, wherever we look at page-break-before and page-break-after you should also look for page. We only look for page breaks between sibling block-level boxes, but the way the page value is propagated from first/last children is similar to how page-break-before/after is propagated. This all happens in the block_level_page_break() function of weasyprint/layout/blocks.py. The next_page return value that is passed all over the place would have to change to encode not just left/right/any page, but also the page "name".
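
One possible shape for such a value, as a sketch only. NextPage, VALID_SIDES, and make_next_page are hypothetical names, and using None for "no named page" is an assumption; the real next_page value in WeasyPrint may be represented differently.

```python
from collections import namedtuple

# Hypothetical replacement for a bare "left" / "right" / "any" value.
NextPage = namedtuple("NextPage", ["side", "name"])

VALID_SIDES = ("any", "left", "right")


def make_next_page(side="any", name=None):
    """Bundle the requested side with an optional page name."""
    if side not in VALID_SIDES:
        raise ValueError("unknown page side: %r" % (side,))
    return NextPage(side, name)


# A forced break onto a right-hand page of the named type "chapter".
chapter_start = make_next_page("right", "chapter")
# No constraint at all: any side, default (unnamed) page type.
plain = make_next_page()
```

A tuple keeps the value hashable, so it could also serve as part of a cache key when requesting page styles.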

Then, in weasyprint/layout/pages.py, look at that next_page return value and request the corresponding styles when creating a page box. Look for style_for(). Oh, and you’ll also need specific code for the first page; the spec should cover this.

Good luck, and do ask questions or ask for help as much as you need.

psmolenski avatar psmolenski commented on May 30, 2024

Are there any plans to implement this feature?

SimonSapin avatar SimonSapin commented on May 30, 2024

Hi @psmolenski . I haven’t heard from @hejsan since the message above, so I’m not aware of anyone working on this at the moment. I will gladly provide guidance to anyone interested in working on this.

hejsan avatar hejsan commented on May 30, 2024

Hi again.
I'm sorry I haven't had time to look at this. I ended up circumventing this by making a script to combine individually generated HTML files like Simon suggested:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
#--------------------------------
# Name:           weasybatch.py
# Purpose:        Combines multiple html files into single pdf
# Author:         Bjarni Þórisson
# Created:        24.05.2013
# Copyright:      Copyright 2013, RHÍ
# Python Version: 2.6
#--------------------------------
from __future__ import print_function

import argparse
import re
import sys

from weasyprint import HTML

__version_info__ = (0, 1, 0, 'final', 0)
# Collapses the tuple above into a short version string ("0.1.0" here).
__version__ = re.sub(r"-(\w+)\.0$", r"-\g<1>", re.sub(r"-final.*", "", re.sub(r"(^\d+\.\d\.\d)\.", r"\g<1>-", ".".join(map(str, __version_info__)))))


def weasybatch(args):
    """Combines HTML files into single PDF"""
    try:
        # Render each input on its own, then merge all pages into one document.
        documents = [HTML(file_obj=f).render() for f in args.files]
        pages = [page for doc in documents for page in doc.pages]
        documents[0].copy(pages).write_pdf(args.output)
    except Exception as e:
        print(e, file=sys.stderr)
        sys.exit(1)


def main(argv):
    parser = argparse.ArgumentParser(description="Combines multiple html files into single pdf")
    # Binary mode: the inputs are parsed by WeasyPrint and the PDF is written as bytes.
    parser.add_argument('files', metavar='file', type=argparse.FileType('rb'), nargs='+',
            help="paths to input HTML files")
    parser.add_argument('--output', type=argparse.FileType('wb'), required=True,
            help="target filename")

    args = parser.parse_args(argv)
    weasybatch(args)


if __name__ == "__main__":
    main(sys.argv[1:])

Hope this helps, although I'd love to see named pages happen.

psmolenski avatar psmolenski commented on May 30, 2024

@hejsan, your script works perfectly. However, I couldn't find a way to add page numbers, which would be consistent throughout the whole final document.

Until now, I have been using CSS counters (counter(page)) and the @bottom-center margin box to add page numbers at the bottom of every page. Unfortunately, rendering documents separately and then combining them into a single file causes the counter to reset for each document I merge.

I've tried using the counter-reset property to set the page counter to a specified value to keep the numbers consistent, but it seems that the page counter is not affected by this property (custom counters work as expected).

I was wondering whether you have any ideas on how to add page numbering?

SimonSapin avatar SimonSapin commented on May 30, 2024

@psmolenski , that is bug #93. It will also probably require clarification in the spec: https://www.w3.org/Style/CSS/Tracker/issues/334

dmoonfire avatar dmoonfire commented on May 30, 2024

I'm trying to format books for publishing including all the related complexities with initial chapters and the like. Using named pages seemed like the right way to implement it. I was trying to grok the code to understand the problem so I figured I'd ask questions and make sure my understanding is correct.

Since make_page is what figures out the page format to use, it seems like that should have the ability to identify the page based on the first block element that would be on the page. That means we need to keep track of the desired page name through the HTML AST so we can query it from the first element.

Next, a start page value and end page value is determined for each box as the value (if any) propagated from its first or last child box (respectively), else the used value on the box itself. A child propagates its own start or end page value if and only if the page property applies to it. (css3-page)

Also from the specification, the scope would be important because the "next" page is based on the last item on the page. That way if you have an item that sets a page followed by a second item with a different page, the second page is covered.

<style>
  .p1 { page: page-1; }
  .p2 { page: page-2; }
</style>
<p class='p1'>On first page because no break</p>
<p class='p2'>
  On first page because still no break, if this is the last
  on the page, the next page will be page-2.
</p>

So, I would guess that the way to implement this is to:

  1. Create two properties on the blocks:
    • start_page_name: The name of the page to use for the start of this block.
    • end_page_name: The name of the page for the last item in the block.
  2. Go through the element tree and calculate the start_page_name and end_page_name.
    • If an element's selector defines page, the start_page_name will be set but the end won't.
    • If the selector doesn't define it, the start_page_name would be defined by the previous element's end_page_name or blank/undefined for the first page.
    • As the system recurses into the element tree, it keeps track of the last page name of the last item (or parent item). The end_page_name would be the end_page_name of the last child or start_page_name if there is no child element.
  3. Change make_page to use the start_page_name to figure out which page to look up. It won't need end_page_name because of previous step.
    • Instead of building up name first_right_page for the first page, it would be first_right_pagename_page.
    • The :first selector applies only to the first page of the document. However, this means that every named page will have to have a first, left, and right style tree to pick up whichever one will actually be used.
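
Step 2 can be sketched roughly as follows. This follows the sentence quoted from the specification (start and end values propagate from the first and last child, else come from the box itself) rather than the "previous element" idea in the list. The Box class and assign_page_names are hypothetical, every box is assumed to be block-level so that page applies to it, and a box with no page of its own is assumed to use its parent's value.

```python
class Box:
    """Tiny stand-in for a layout box (hypothetical, not WeasyPrint's class)."""

    def __init__(self, page=None, children=()):
        self.page = page                  # value of `page`, None meaning auto
        self.children = list(children)
        self.start_page_name = None
        self.end_page_name = None


def assign_page_names(box, inherited=None):
    """Fill start_page_name and end_page_name for box and its descendants."""
    # An unset page falls back to the value used by the parent (assumption).
    own = box.page if box.page is not None else inherited
    for child in box.children:
        assign_page_names(child, own)
    if box.children:
        # Propagate from the first and last child respectively.
        box.start_page_name = box.children[0].start_page_name
        box.end_page_name = box.children[-1].end_page_name
    else:
        box.start_page_name = box.end_page_name = own
    return box


heading = Box(page="chapter-first")
paragraph = Box(page="chapter")
chapter = assign_page_names(Box(children=[heading, paragraph]))
```

After this runs, chapter.start_page_name is "chapter-first" and chapter.end_page_name is "chapter", so make_page could look at the start value of the first box placed on a page.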

Using the above rules, from my understanding of the specification, I could do initial chapter pages like this.

<style>
  @page chapter-first {
    /* No headers, high top margin */
  }
  @page chapter {
    /* Headers, low top margin */
  }
  div.chapter h1 { page: chapter-first; }
  div.chapter p { page: chapter; }
</style>
<div class='chapter'>
  <h1>Chapter 1</h1>
  <p>It was a dark and stormy night...</p>
</div>
<div class='chapter'>
  <h1>Chapter 2</h1>
  <p>It was the best of times, it was the worst...</p>
</div>

Does this make sense or seem probable for the code?

SimonSapin avatar SimonSapin commented on May 30, 2024
<style>
  .p1 { page: page-1; }
  .p2 { page: page-2; }
</style>
<p class='p1'>On first page because no break</p>
<p class='p2'>
  On first page because still no break, if this is the last
  on the page, the next page will be page-2.
</p>

This seems wrong. A page break is introduced between elements with a different value for the page property, so the second paragraph will be on the second page. (And the first page will have blank space at the bottom.)

https://drafts.csswg.org/css-page/#using-named-pages

If […] then a page break is forced between the two boxes, and content after the break resumes on a page box of the named type.
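
A toy model of that rule, for illustration only. It assumes a flat list of sibling blocks that each carry a single page name, and it ignores ordinary breaks caused by content overflowing a page; split_on_page_change is a hypothetical helper, not part of WeasyPrint.

```python
def split_on_page_change(blocks):
    """Group consecutive blocks, forcing a break when the page name changes.

    blocks is a list of (text, page_name) pairs in document order.
    Returns a list of (page_name, [texts]) groups, one per forced run.
    """
    groups = []
    for text, page_name in blocks:
        if groups and groups[-1][0] == page_name:
            # Same page name as the previous block: no forced break.
            groups[-1][1].append(text)
        else:
            # Different page name: content resumes on a new page box.
            groups.append((page_name, [text]))
    return groups


example = [
    ("first paragraph", "page-1"),
    ("second paragraph", "page-2"),
]
runs = split_on_page_change(example)
```

For the two paragraphs quoted above this yields two separate runs, one on a page-1 page and one on a page-2 page.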

dmoonfire avatar dmoonfire commented on May 30, 2024

You are correct, I missed that. I would then add a step that automatically applies page-break-before: always to any element that sets page and doesn't already have a break. I think that would resolve that case. Well, while also handling page-break-after.

I do believe the scope on page is still correct.

Now, my chapter example is wrong but going over the specification, I suspect it can't be done without a vendor flag since there is no pseudo selector for "first page after a forced page break".

SimonSapin avatar SimonSapin commented on May 30, 2024

The “high top margin” can be on the chapter title rather than on the page. But yes, I believe that css-page level 3 does not provide something flexible enough to inhibit a page-margin box on the first page in a chapter. The only thing I can think of is a huge hack like this to mask it with a white rectangle:

.chapter > h1::before {
    content: "";
    position: absolute;
    /* bottom of this pseudo-element is placed at:
       100% of the page’s height from the page’s bottom
       which is the page’s top. */
    bottom: 100%;
    height: 3cm; /* same as margin-top in @page */
    width: 100%;
    background: white;
}

dmoonfire avatar dmoonfire commented on May 30, 2024

Or just create a vendor pseudo-class (:x-first-named). I'll see if that hack works for me. I have to get a book formatted by the end of the month, but I can do a little drudge work to get around the page number problem (mainly because #93 hasn't been resolved either; I looked at that one also).

Is the overall suggestion reasonable?

dmoonfire avatar dmoonfire commented on May 30, 2024

@SimonSapin: That hack almost worked, but it puts the box underneath the text and I can't use z-index to get over it. I suspect that's because the blocks are laid out before the page elements are inserted.

SimonSapin avatar SimonSapin commented on May 30, 2024

GitHub parsed "hope on our long way to close #57." in #485 as "This PR closes #57", which I believe is not the case. @liZe, can you confirm?

liZe avatar liZe commented on May 30, 2024

@liZe, can you confirm?

Of course, it's not closed (yet).

liZe avatar liZe commented on May 30, 2024

It's closed now!

If anyone is interested (@dmoonfire @hejsan @psmolenski), I'd like to know if it works for your use cases. I've added some unit tests and tried with real-life documents, it fits my needs (and the spec I hope).

There is room for improvements, including tests with page names, pseudo classes and specificity (easy), less stupid code (easy) and lazy style loading (hard). Please ask for help if you want to fix that!

I'll release 0.40 soon if nothing's bad for anyone here.

andul avatar andul commented on May 30, 2024

@liZe thanks for your great work! We tested the named pages feature and it runs just wonderful! No problems discovered on our test cases.

dmoonfire avatar dmoonfire commented on May 30, 2024

@liZe: I did a quick ad-hoc test for my purposes and the named pages work out very nicely. I was happy to see that the page: chapter only applied to the next page and then it went back to the :right and :left which handled my need for leading chapter pages having a different style from the rest of the chapter.

My hack was to render each chapter twice (leading chapter page and then the rest), then use pdfcat to combine the two together. Since I can't reset page numbers, I had to add blank pages to get the page numbers right. It was kind of messy but it worked out good enough for a good-looking book.

With this, it looks like I only need to render stuff twice: once for the front matter which doesn't have page numbers, and once for the main matter using named pages. That should significantly reduce the generation time, thank you!

liZe avatar liZe commented on May 30, 2024

Happy to see that this feature is helpful! @dmoonfire I'd love to have HTML and PDF samples of your books if possible, and find what's missing to generate them without pdfcat.

dmoonfire avatar dmoonfire commented on May 30, 2024

@liZe: Let me update my mfgames-writing-weasyprint package to use named pages and I'll be glad to give you the HTML source. Right now, the system generates 1-2 PDFs per chapter/section and stitches them together, so it's a mess with 37 sections and 74 PDFs. :)

The final result can be seen here, https://fedran.com/sand-and-bone/dmoonfire-100-02-sand-and-bone-1.0.1.pdf (my novel is CC-BY-NC-SA, so no trouble putting it online).

If you want to D/L a version (and have Node), there is a full demo with only a few chapters at https://gitlab.com/mfgames-writing-js/example-frankenstein. Running ./node_modules/.bin/mfgames-writing build pdf once everything is populated should generate the output with loads of debugging that shows how I stitch everything together today.

For the most part, I just need counter-reset: page 1; in a named page to make it be a single generated PDF. This is because page numbers should start at 1 for the first page of the first chapter, not the 10 pages of front matter (legal statements, title pages, dedications, etc.) You see that on page 11 of the PDF.

I haven't checked to see if content: counter(my-awesome-counter, lower-roman); works yet. I don't have page numbers in the front matter but those aren't that important. :)

SimonSapin avatar SimonSapin commented on May 30, 2024

For the most part, I just need counter-reset: page 1; in a named page

I think that, if supported, this would reset the counter to 1 on every single page of the group of pages that share that "name". You’d need either:

  • Some way to select the first page of a group. :first is the first of the entire document.
  • Or make counter-reset: page 1; work on elements (e.g. h1, used together with page-break-before). This may be tricky to implement.
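
The difference between the two behaviors can be seen in a toy simulation. This is a simplified model, not how WeasyPrint or the CSS counter rules actually work: number_pages is a hypothetical function, it treats the reset value as the number shown on that page, and it does not model how a reset interacts with the automatic page increment.

```python
def number_pages(page_names, reset_name, every_page):
    """Return the page number shown on each page of a document.

    every_page=True models a reset applied to every page of the named type.
    every_page=False models a reset only on the first page of each run of
    consecutive pages with that name.
    """
    numbers = []
    counter = 0
    previous = None
    for name in page_names:
        starts_run = name == reset_name and previous != reset_name
        if name == reset_name and (every_page or starts_run):
            counter = 1
        else:
            counter += 1
        numbers.append(counter)
        previous = name
    return numbers


pages = ["front", "front", "body", "body", "body"]
reset_everywhere = number_pages(pages, "body", every_page=True)
reset_at_start = number_pages(pages, "body", every_page=False)
```

With the reset applied on every "body" page, each of those pages shows 1; with the reset applied only where the run starts, the body pages count 1, 2, 3 as a book would need.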

dmoonfire avatar dmoonfire commented on May 30, 2024

Since the page: chapter; only applies to the next page and then it goes back to the normal :right and :left, I can easily have a page: first-chapter; that is only applied to the first chapter to reset the number and then just have a sequence of :right, :left, and chapter pages for the rest of the document.

The problem with :first is that the chapter is page 10 and :first only applies to page 1.
