Feature request for proper figure formatting during conversion with Calibre about safaribooks HOT 4 OPEN

lorenzodifuccia commented on July 3, 2024

Feature request for proper figure formatting during conversion with Calibre


Comments (4)

piovac commented on July 3, 2024

In reality, the problem can easily be fixed with a setting in Calibre's conversion tool. In the "Search & replace" section, add \ width=\"(.*?)\" height=\"(.*?)\" to the Search field and leave the Replace field empty.
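A quick way to see what that Search pattern does, assuming Calibre applies it with Python-style regular expression semantics, is to run it through Python's re module on a few made-up <img> tags (the tags and the helper name calibre_like_replace are illustrative only):

```python
import re

# The Search pattern from the comment: "\ " is an escaped space, \" an escaped quote.
# An empty Replace field means every match is simply deleted.
PATTERN = r'\ width=\"(.*?)\" height=\"(.*?)\"'


def calibre_like_replace(html: str) -> str:
    """Apply the pattern the way an empty Replace field would (a sketch)."""
    return re.sub(PATTERN, "", html)


# Made-up sample tags, not taken from a real book
adjacent = '<img src="fig1.png" width="600" height="400" alt="Figure 1"/>'
swapped = '<img src="fig2.png" height="400" width="600"/>'
separated = '<img width="600" src="fig3.png" height="400"/>'

print(calibre_like_replace(adjacent))   # <img src="fig1.png" alt="Figure 1"/>
print(calibre_like_replace(swapped))    # unchanged: height comes before width
print(calibre_like_replace(separated))  # <img/>  (the lazy .*? also removed src="fig3.png")
```

So the setting handles the common width-then-height layout, but it is worth checking that the book's markup never puts another attribute between the two, since the lazy .*? will happily swallow it.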

Another trick for better converted files is to reduce the font size of the code text so that it does not wrap. That requires editing the Style01.css file: search for .courprogramlisting and change font-size:80%! to font-size:40%!
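The same CSS edit can be scripted instead of done by hand, for example on the text of Style01.css after extracting it from the book. This is a sketch: the helper name shrink_code_font and the sample stylesheet are made up, and the real .courprogramlisting rule may be written differently, so compare the output with your file.

```python
import re


def shrink_code_font(css: str, old: str = "80%", new: str = "40%") -> str:
    """Change font-size inside .courprogramlisting rules only (hypothetical helper)."""

    def fix_rule(match):
        # Rewrite the font-size value only within this one rule body.
        return re.sub(r"font-size\s*:\s*" + re.escape(old), "font-size:" + new, match.group(0))

    # [^}]* keeps each match inside a single { ... } block.
    return re.sub(r"\.courprogramlisting\b[^{]*\{[^}]*\}", fix_rule, css)


# Made-up stylesheet for illustration
sample_css = ".courprogramlisting { font-family: monospace; font-size:80%; }\np { font-size:80%; }"
print(shrink_code_font(sample_css))
# .courprogramlisting { font-family: monospace; font-size:40%; }
# p { font-size:80%; }
```

Only the rule for .courprogramlisting changes; other rules that happen to use the same size are left alone.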

Maybe you could add some of these suggestions to an .md file.


lorenzodifuccia commented on July 3, 2024

I'll work on that.
Thanks!


klezm commented on July 3, 2024

Script to fix epubs

I wrote a script that updates EPUB books according to the suggestion by @piovac:

import pathlib
import shutil
import io
import xml.dom
import xml.dom.minidom
import xml.etree.ElementTree as ET
import zipfile
import re
import difflib
from pprint import pprint

# https://medium.com/dev-bits/ultimate-guide-for-working-with-i-o-streams-and-zip-archives-in-python-3-6f3cf96dca50

# #############################  Change this  #############################

library_dir = r"path/to/your/calibre/library"

USE_XML = False  # True: edit <img> tags with ElementTree; False: use a regex
DEBUG = False    # True: print the changed <img> lines of every file

# #########################################################################

# Files that can contain <img> tags; everything else is copied unchanged
F_TYPES = {".xml", ".html", ".xhtml"}

# width="..." directly followed by height="..." (same idea as the Calibre pattern);
# [^"]* stops at the closing quote, so a match cannot swallow other attributes
SIZE_ATTRS = re.compile(r' width="[^"]*" height="[^"]*"')

paths = sorted(pathlib.Path(library_dir).rglob("*.epub"))
pprint(paths)

# Books are overwritten in place. To restore one from a backup copy:
# path_backup = pathlib.Path(r"")
# shutil.copyfile(path_backup, path)

for path in paths:
    print(f"---------------    Processing book: {path.name}    ---------------")

    zip_updated = io.BytesIO()

    # "r" is enough: the original archive is only read here
    with zipfile.ZipFile(path, "r") as zin:
        with zipfile.ZipFile(zip_updated, "w", compression=zipfile.ZIP_DEFLATED) as zout:
            for f in zin.infolist():
                # Copy files that are not modified. Reusing the ZipInfo keeps each entry's
                # original compression, so the EPUB "mimetype" entry stays uncompressed.
                if pathlib.Path(f.filename).suffix not in F_TYPES:
                    zout.writestr(f, zin.read(f.filename))
                    continue

                # Modify text files in the archive
                data = zin.read(f.filename).decode("utf-8")
                data_orig = data

                if USE_XML:
                    # Edit all img tags via ElementTree. Note: tostring() drops the
                    # XML declaration and DOCTYPE of the original file.
                    ET.register_namespace("", xml.dom.XHTML_NAMESPACE)  # avoid "ns0:" prefixes
                    data_xml = ET.fromstring(data)
                    for img in data_xml.findall(".//{*}img"):  # {*}: any namespace (Python 3.8+)
                        img.attrib.pop("width", None)   # pop() avoids KeyError if absent
                        img.attrib.pop("height", None)
                    data = ET.tostring(data_xml, encoding="unicode")
                else:
                    # Simple regex that removes the attribute pair
                    data = SIZE_ATTRS.sub("", data)

                zout.writestr(f, data.encode("utf-8"))

                if DEBUG:  # for debugging purposes
                    new_pretty = xml.dom.minidom.parseString(data).toprettyxml()
                    old_pretty = xml.dom.minidom.parseString(data_orig).toprettyxml()
                    print(*[x for x in difflib.Differ().compare(old_pretty.splitlines(), new_pretty.splitlines())
                            if "<img " in x and not x.startswith("-")], sep="\n")
                    print(*difflib.context_diff(old_pretty.splitlines(keepends=True),
                                                new_pretty.splitlines(keepends=True), n=0), sep="")

    # Overwrite the original book with the updated archive
    with open(path, "wb") as out_file:
        out_file.write(zip_updated.getvalue())

    zip_updated.close()
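Because the script rewrites every EPUB in place, it may be worth checking the result against the EPUB container (OCF) rule that the first archive entry is an uncompressed file named mimetype containing application/epub+zip. A minimal sketch; check_epub_container and build_epub are hypothetical helper names, and the archive built here is test data only:

```python
import io
import zipfile


def check_epub_container(epub_bytes: bytes) -> list:
    """Return a list of problems with the OCF mimetype entry (empty list: looks fine)."""
    problems = []
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        infos = zf.infolist()
        if not infos or infos[0].filename != "mimetype":
            problems.append("mimetype is not the first entry")
            return problems
        first = infos[0]
        if first.compress_type != zipfile.ZIP_STORED:
            problems.append("mimetype is compressed")
        if zf.read(first) != b"application/epub+zip":
            problems.append("mimetype has unexpected content")
    return problems


def build_epub(mimetype_compression: int) -> bytes:
    """Build a tiny in-memory archive shaped like an EPUB (test data only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("mimetype", "application/epub+zip", compress_type=mimetype_compression)
        zf.writestr("OEBPS/ch01.xhtml", "<html/>")
    return buf.getvalue()


print(check_epub_container(build_epub(zipfile.ZIP_STORED)))    # []
print(check_epub_container(build_epub(zipfile.ZIP_DEFLATED)))  # ['mimetype is compressed']
```

To check a processed book, pass pathlib.Path(...).read_bytes() of the file to check_epub_container.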

Fix for safaribooks?

Inserting the following code at this line might solve the problem (not tested):

safaribooks/safaribooks.py

Line 722 in c94d6b4

# remove width / height attributes from img tags
imgs = book_content.findall(".//{*}img")  # use XPath with a wildcard for any namespace
for img in imgs:
    # pop() with a default does not raise KeyError when an attribute is missing
    img.attrib.pop("width", None)
    img.attrib.pop("height", None)
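The suggested change is untested inside safaribooks, and book_content there may not be an xml.etree.ElementTree element. As an isolated sketch of the same idea using only the standard library (strip_img_dimensions is a hypothetical name and the markup is made up):

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"


def strip_img_dimensions(root: ET.Element) -> int:
    """Remove width/height from every <img> in any namespace; return how many imgs were seen."""
    imgs = root.findall(".//{*}img")  # the {*} wildcard needs Python 3.8+
    for img in imgs:
        # pop(..., None) avoids the KeyError that `del` raises when an attribute is absent
        img.attrib.pop("width", None)
        img.attrib.pop("height", None)
    return len(imgs)


doc = ET.fromstring(
    f'<div xmlns="{XHTML}"><img src="a.png" width="600" height="400"/><img src="b.png"/></div>'
)
count = strip_img_dimensions(doc)
print(count)  # 2
print([img.attrib for img in doc.iter(f"{{{XHTML}}}img")])
# [{'src': 'a.png'}, {'src': 'b.png'}]
```

The second <img> has no size attributes at all, which is exactly the case where the original del statements would fail.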


SylvainMartel commented on July 3, 2024

Using search & replace with \ width=\"(.*?)\" height=\"(.*?)\" solved my problem, thanks! Will a fix make it into the main Python code?
