Comments (22)
Hi.
This would solve a lot of use cases.
If I were to take this on, could you give me pointers on where to start please?
from weasyprint.
Awesome!
First, read this to get a general idea of how the code is organized: http://weasyprint.org/docs/hacking/
Then read the spec: http://dev.w3.org/csswg/css-page/#using-named-pages
Then read the spec again :)
The cascade happens in `weasyprint/css/__init__.py`. Its result is the computed value of every property for every element, pseudo-element, page, and page-margin box. The parser already supports the syntax for named pages, but they are explicitly rejected by the cascade.
At the moment we compute styles for every type of page in advance, but with named pages I think it should be more lazy: the layout code (as pages are generated) would call back into the cascade to ask for the style of a given page. (The cascade would cache that result.)
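A minimal sketch of that lazy, cached lookup, assuming a hypothetical `compute_style` callable that stands in for the real cascade. The class name, method name, and the `(side, first, name)` key are illustrative only and are not WeasyPrint's API:

```python
class PageStyleCache:
    """Hypothetical lazy cache in front of the cascade (not WeasyPrint code)."""

    def __init__(self, compute_style):
        # compute_style(side, first, name) is an assumed callback into the cascade.
        self._compute_style = compute_style
        self._cache = {}

    def get(self, side, first, name):
        """Return the style for one page type, computing it at most once."""
        key = (side, first, name)
        if key not in self._cache:
            self._cache[key] = self._compute_style(side, first, name)
        return self._cache[key]
```

The layout code would call `get()` each time it creates a page box, so only the page types that actually occur in the document are ever computed.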
During layout, wherever we look at `page-break-before` and `page-break-after`, you should also look for `page`. We only look for page breaks between sibling block-level boxes, but the way the `page` value is propagated from first/last children is similar to how `page-break-before/after` is propagated. This all happens in the `block_level_page_break()` function of `weasyprint/layout/blocks.py`. The `next_page` return value that is passed all over the place would have to change to encode not just left/right/any page, but also the page "name".
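One possible shape for such a value, as a sketch only. The `NextPage` tuple, its field names, and the `make_next_page` helper are hypothetical and not taken from the WeasyPrint code base:

```python
from collections import namedtuple

# Hypothetical replacement for a bare 'left' / 'right' / 'any' string.
NextPage = namedtuple('NextPage', ['side', 'name'])


def make_next_page(break_value, page_name):
    """Combine a page-break value and a `page` name into one NextPage."""
    # Only 'left' and 'right' constrain the side; anything else means 'any'.
    side = break_value if break_value in ('left', 'right') else 'any'
    return NextPage(side=side, name=page_name)
```

For example, `make_next_page('always', 'chapter')` gives `NextPage(side='any', name='chapter')`, which the page-creation code could then use to request the matching style.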
Then, in `weasyprint/layout/pages.py`, look at that `next_page` return value and request the corresponding styles when creating a page box. Look for `style_for()`. Oh, and you’ll also need specific code for the first page; the spec should cover this.
Good luck, and do ask questions or ask for help as much as you need.
from weasyprint.
Are there any plans to implement this feature?
from weasyprint.
Hi @psmolenski . I haven’t heard from @hejsan since the message above, so I’m not aware of anyone working on this at the moment. I will gladly provide guidance to anyone interested in working on this.
from weasyprint.
Hi again.
I'm sorry I haven't had time to look at this. I ended up circumventing this by making a script to combine individually generated HTML files like Simon suggested:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#--------------------------------
# Name: weasybatch.py
# Purpose: Combines multiple html files into single pdf
# Author: Bjarni Þórisson
# Created: 24.05.2013
# Copyright: Copyright 2013, RHÍ
# Python Version: 2.6
#--------------------------------
import os
import sys
import argparse
import re
from weasyprint import HTML

__version_info__ = (0, 1, 0, 'final', 0)
__version__ = re.sub(r"-(\w+)\.0$", r"-\g<1>", re.sub(r"-final.*", "", re.sub(r"(^\d+\.\d\.\d)\.", r"\g<1>-", ".".join(map(str, __version_info__)))))


def weasybatch(argv):
    """Combines HTML files into single PDF"""
    try:
        # Render every input on its own, then merge all pages into one document.
        documents = [HTML(file_obj=f).render() for f in argv.files]
        documents[0].copy([page for doc in documents for page in doc.pages]).write_pdf(argv.output)
    except Exception as e:
        print(e.args[0])


def main(argv):
    parser = argparse.ArgumentParser(description=u"Combines multiple html files into single pdf")
    # Binary modes: the PDF output is bytes, and HTML() detects the input encoding itself.
    parser.add_argument('files', metavar='file', type=argparse.FileType('rb'), nargs='+',
                        help="paths to input HTML files")
    parser.add_argument('--output', type=argparse.FileType('wb'), help="target filename")
    args = parser.parse_args(argv)
    weasybatch(args)


if __name__ == "__main__":
    main(sys.argv[1:])
Hope this helps, although I'd love to see named pages happen.
from weasyprint.
@hejsan, your script works perfectly. However, I couldn't find a way to add page numbers that stay consistent throughout the whole final document.
Until now, I have been using CSS counters (`counter(page)`) and the `@bottom-center` margin box to add page numbers at the bottom of every page. Unfortunately, rendering documents and then combining them into a single file causes the counter to reset for each document I merge.
I've tried using the `counter-reset` property to set the `page` counter to a specified value to keep the numbers consistent, but it seems that the `page` counter is not affected by this property (custom counters work as expected).
I was wondering whether you have any ideas on how to add page numbering?
from weasyprint.
@psmolenski , that is bug #93. It will also probably require clarification in the spec: https://www.w3.org/Style/CSS/Tracker/issues/334
from weasyprint.
I'm trying to format books for publishing including all the related complexities with initial chapters and the like. Using named pages seemed like the right way to implement it. I was trying to grok the code to understand the problem so I figured I'd ask questions and make sure my understanding is correct.
Since `make_page` is what figures out the page format to use, it seems like that should have the ability to identify the page based on the first block element that would be on the page. That means we need to keep track of the desired page name through the HTML AST so we can query it from the first element.
Next, a start page value and end page value is determined for each box as the value (if any) propagated from its first or last child box (respectively), else the used value on the box itself. A child propagates its own start or end page value if and only if the page property applies to it. (css3-page)
Also from the specification, the scope would be important because the "next" page is based on the last item on the page. That way if you have an item that sets a page followed by a second item with a different page, the second page is covered.
<style>
.p1 { page: page-1; }
.p2 { page: page-2; }
</style>
<p class='p1'>On first page because no break</p>
<p class='p2'>
On first page because still no break, if this is the last
on the page, the next page will be page-2.
</p>
So, I would guess that the way to implement this is to:
- Create two properties on the blocks:
  - `start_page_name`: The name of the page to use for the start of this block.
  - `end_page_name`: The name of the page for the last item in the block.
- Go through the element tree and calculate the `start_page_name` and `end_page_name`.
  - If an element's selector defines `page`, the `start_page_name` will be set but the end won't.
  - If the selector doesn't define it, the `start_page_name` would be defined by the previous element's `end_page_name` or blank/undefined for the first page.
  - As the system recurses into the element tree, it keeps track of the last page name of the last item (or parent item). The `end_page_name` would be the `end_page_name` of the last child or `start_page_name` if there is no child element.
- Change `make_page` to use the `start_page_name` to figure out which page to look up. It won't need `end_page_name` because of the previous step.
  - Instead of building up the name `first_right_page` for the first page, it would be `first_right_pagename_page`.
  - The `:first` selector applies only to the first page of the document. However, this means that every named page will have to have a first, left, and right style tree to pick up whichever one will actually be used.
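The tree walk proposed in the steps above could be sketched roughly like this. `Box` is a hypothetical stand-in for a layout box and `assign_page_names` is an illustrative helper, neither is WeasyPrint code, and the sketch follows the proposal in this comment rather than the exact propagation rule in the spec:

```python
class Box:
    """Hypothetical stand-in for a layout box (not WeasyPrint's class)."""

    def __init__(self, page=None, children=()):
        self.page = page  # value set by the CSS `page` property, or None
        self.children = list(children)
        self.start_page_name = None
        self.end_page_name = None


def assign_page_names(box, previous_end=None):
    """Fill in start/end page names depth-first and return the end name."""
    # An explicit `page` value wins; otherwise continue with the previous name.
    box.start_page_name = box.page if box.page is not None else previous_end
    last = box.start_page_name
    for child in box.children:
        last = assign_page_names(child, last)
    # With no children this is simply the start name.
    box.end_page_name = last
    return last
```

Calling `assign_page_names(root)` once on the root box would annotate the whole tree, after which a function like `make_page` would only need to read `start_page_name` from the first box placed on the page.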
Using the above rules, from my understanding of the specification, I could do initial chapter pages like this.
<style>
@page chapter-first {
/* No headers, high top margin */
}
@page chapter {
/* Headers, low chapter */
}
div.chapter h1 { page: chapter-first; }
div.chapter p { page: chapter; }
</style>
<div class='chapter'>
<h1>Chapter 1</h1>
<p>It was a dark and stormy night...</p>
</div>
<div class='chapter'>
<h1>Chapter 2</h1>
<p>It was the best of times, it was the worst...</p>
</div>
Does this make sense or seem probable for the code?
from weasyprint.
<style>
.p1 { page: page-1; }
.p2 { page: page-2; }
</style>
<p class='p1'>On first page because no break</p>
<p class='p2'>
On first page because still no break, if this is the last
on the page, the next page will be page-2.
</p>
This seems wrong. A page break is introduced between elements with a different value for the `page` property, so the second paragraph will be on the second page. (And the first page will have blank space at the bottom.)
https://drafts.csswg.org/css-page/#using-named-pages
If […] then a page break is forced between the two boxes, and content after the break resumes on a page box of the named type.
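As a rough illustration of that rule, the sketch below groups consecutive blocks into pages and starts a new page whenever the page name changes. It is a deliberately simplified model: `paginate_by_name` is a hypothetical helper, and it ignores overflow, left/right sides, and propagation from child boxes:

```python
def paginate_by_name(blocks):
    """Group (text, page_name) pairs into pages, breaking when the name changes."""
    pages = []
    for text, page_name in blocks:
        # A different page name forces a break, even if the page has room left.
        if not pages or pages[-1]['name'] != page_name:
            pages.append({'name': page_name, 'content': []})
        pages[-1]['content'].append(text)
    return pages
```

With the two paragraphs from the example, `paginate_by_name([('first', 'page-1'), ('second', 'page-2')])` yields two pages, one per paragraph.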
from weasyprint.
You are correct, I missed that. I would then amend a step that automatically puts a `page-break-before: always` on any selected element that sets `page` and doesn't already have a break. I think that would resolve that case. Well, while handling `page-break-after`.
I do believe the scope on `page` is still correct.
Now, my chapter example is wrong but going over the specification, I suspect it can't be done without a vendor flag since there is no pseudo selector for "first page after a forced page break".
from weasyprint.
The “high top margin” can be on the chapter title rather than on the page. But yes, I believe that css-page level 3 does not provide something flexible enough to inhibit a page-margin box on the first page in a chapter. The only thing I can think of is a huge hack like this to mask it with a white rectangle:
.chapter > h1::before {
  content: "";
  position: absolute;
  /* bottom of this pseudo-element is placed at:
     100% of the page’s height from the page’s bottom
     which is the page’s top. */
  bottom: 100%;
  height: 3cm; /* same as margin-top in @page */
  width: 100%;
  background: white;
}
from weasyprint.
Or just create a vendor pseudo class (`:x-first-named`). I'll see if that hack works for me, I have to get a book formatted by the end of the month but I can do a little drudge work to get around the page number problem (mainly because #93 hasn't been resolved either, I looked at that one also).
Is the overall suggestion reasonable?
from weasyprint.
@SimonSapin: That hack almost worked but it puts the box underneath the text and I can't use `z-index` to get over it. I suspect that is because the blocks are laid out before the page elements are inserted.
from weasyprint.
GitHub parsed "hope on our long way to close #57." in #485 as "This PR closes #57", which I believe is not the case. @liZe, can you confirm?
from weasyprint.
@liZe, can you confirm?
Of course, it's not closed (yet).
from weasyprint.
It's closed now!
If anyone is interested (@dmoonfire @hejsan @psmolenski), I'd like to know if it works for your use cases. I've added some unit tests and tried with real-life documents, it fits my needs (and the spec I hope).
There is room for improvements, including tests with page names, pseudo classes and specificity (easy), less stupid code (easy) and lazy style loading (hard). Please ask for help if you want to fix that!
I'll release 0.40 soon if nothing's bad for anyone here.
from weasyprint.
@liZe thanks for your great work! We tested the named pages feature and it runs just wonderful! No problems discovered on our test cases.
from weasyprint.
@liZe: I did a quick ad-hoc test for my purposes and the named pages work out very nicely. I was happy to see that the `page: chapter` only applied to the next page and then it went back to the `:right` and `:left` which handled my need for leading chapter pages having a different style from the rest of the chapter.
My hack was to render each chapter twice (leading chapter page and then the rest), then use `pdfcat` to combine the two together. Since I can't reset page numbers, I had to add blank pages to get the page numbers right. It was kind of messy but it worked out well enough for a good-looking book.
With this, it looks like I only need to render stuff twice: once for the front matter which doesn't have page numbers, and once for the main matter using named pages. That should significantly reduce the generation time, thank you!
from weasyprint.
Happy to see that this feature is helpful! @dmoonfire I'd love to have HTML and PDF samples of your books if possible, and find what's missing to generate them without `pdfcat`.
from weasyprint.
@liZe: Let me update my `mfgames-writing-weasyprint` package to use named pages and I'll be glad to give you the HTML source. Right now, the system generates 1-2 PDFs per chapter/section and stitches them together, so it's a mess with 37 sections and 74 PDFs. :)
The final result can be seen here, https://fedran.com/sand-and-bone/dmoonfire-100-02-sand-and-bone-1.0.1.pdf (my novel is CC-BY-NC-SA, so no trouble putting it online).
If you want to D/L a version (and have Node), there is a full demo with only a few chapters at https://gitlab.com/mfgames-writing-js/example-frankenstein. Running `./node_modules/.bin/mfgames-writing build pdf` once everything is populated should generate the output with loads of debugging that shows how I stitch everything together today.
For the most part, I just need `counter-reset: page 1;` in a named page to make it be a single generated PDF. This is because page numbers should start at 1 for the first page of the first chapter, not the 10 pages of front matter (legal statements, title pages, dedications, etc.) You see that on page 11 of the PDF.
I haven't checked to see if `content: counter(my-awesome-counter, lower-roman);` works yet. I don't have page numbers in the front matter but those aren't that important. :)
from weasyprint.
For the most part, I just need `counter-reset: page 1;` in a named page
I think that, if supported, this would reset the counter to 1 on every single page of the group of pages that share that "name". You’d need either:
- Some way to select the first page of a group. `:first` is the first of the entire document.
- Or make `counter-reset: page 1;` work on elements (e.g. `h1`, used together with `page-break-before`). This may be tricky to implement.
from weasyprint.
Since the `page: chapter;` only applies to the next page and then it goes back to the normal `:right` and `:left`, I can easily have a `page: first-chapter;` that is only applied to the first chapter to reset the number and then just have a sequence of `:right`, `:left`, and `chapter` pages for the rest of the document.
The problem with `:first` is that the chapter is page 10 and `:first` only applies to page 1.
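To make the difference concrete, here is a toy model of page numbering. It assumes a reset on a named page were supported at all, which this thread says is not the case yet (bug #93). `number_pages` is a hypothetical helper, not anything in WeasyPrint, and the page-name sequences are made up:

```python
def number_pages(page_names, reset_on):
    """Return the number shown on each page.

    Simplified model: a page whose name equals `reset_on` shows 1,
    every other page shows the previous number plus one.
    """
    numbers = []
    counter = 0
    for name in page_names:
        counter = 1 if name == reset_on else counter + 1
        numbers.append(counter)
    return numbers


# Reset tied to a name shared by every chapter page: numbering stays at 1.
shared = number_pages(['front', 'front', 'chapter', 'chapter', 'chapter'], 'chapter')
# shared == [1, 2, 1, 1, 1]

# Reset tied to a name that is used once, on the first chapter page only.
once = number_pages(['front', 'front', 'first-chapter', 'chapter', 'chapter'], 'first-chapter')
# once == [1, 2, 1, 2, 3]
```

The second call matches the idea of a dedicated `first-chapter` page name: the reset fires exactly once and numbering continues normally afterwards.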
from weasyprint.