Comments (4)
In practice, the problem can be fixed easily with a setting in Calibre's conversion tool. Under "Search & replace", add \ width=\"(.*?)\" height=\"(.*?)\"
to the Search field and leave the Replace field empty.
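To check what that pattern removes before running a conversion, here is a minimal Python sketch. The sample markup and the `strip_img_size` helper are made up for illustration; only the pattern itself comes from the comment above.

```python
import re

# The pattern from the comment above, verbatim. It expects width to come
# directly before height, separated by a single space.
PATTERN = re.compile(r'\ width=\"(.*?)\" height=\"(.*?)\"')


def strip_img_size(markup: str) -> str:
    """Remove adjacent width/height attribute pairs (hypothetical helper)."""
    return PATTERN.sub("", markup)


sample = '<img src="fig1.png" width="600" height="400" alt="Figure 1"/>'
print(strip_img_size(sample))  # <img src="fig1.png" alt="Figure 1"/>
```

Note that the pattern depends on attribute order: an image written as `height="..." width="..."` is left unchanged, so it is worth spot-checking a converted chapter.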
Another trick for better conversions is to shrink the font size of code listings so that long lines do not wrap. To do that, edit the Style01.css file, search for .courprogramlisting, and change font-size:80%! to font-size:40%!
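The stylesheet edit can also be scripted. This is only a sketch, assuming the rule is written as `.courprogramlisting { ... }` with its own pair of braces; `shrink_code_font` and the sample CSS are hypothetical and not part of safaribooks or Calibre.

```python
import re


def shrink_code_font(css: str,
                     selector: str = ".courprogramlisting",
                     new_size: str = "40%") -> str:
    """Replace the font-size value inside the first rule for `selector`."""
    # Group 1 captures everything from the selector up to the font-size value;
    # the value itself ends at ';', '!' (as in "!important") or '}'.
    rule = re.compile(
        r"(" + re.escape(selector) + r"\s*\{[^}]*?font-size\s*:\s*)[^;!}]+"
    )
    return rule.sub(lambda m: m.group(1) + new_size, css, count=1)


sample_css = (
    ".courprogramlisting { font-family: monospace; font-size:80%!important; }\n"
    ".other { font-size:80%; }"
)
print(shrink_code_font(sample_css))
```

Only the matching rule is touched; other rules that happen to use the same font size keep their value.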
Maybe you could add some of these suggestions in an .md file.
from safaribooks.
I'll work on that.
Thanks!
Script to fix epubs
I wrote a script that updates existing EPUB books using the suggestion by @piovac:
import pathlib
import shutil
import io
import xml.dom.minidom
import xml.etree.ElementTree as ET
import zipfile
import re
import difflib
from pprint import pprint

# https://medium.com/dev-bits/ultimate-guide-for-working-with-i-o-streams-and-zip-archives-in-python-3-6f3cf96dca50

# ############################# Change this #############################
paths = r"path/to/your/calibre/library"
# #########################################################################

paths = list(pathlib.Path(paths).rglob("*.epub"))
pprint(paths)

# Optional: restore a book from a backup copy before processing
# path_backup = pathlib.Path(r"")
# shutil.copyfile(path_backup, path)

for path in paths:
    print(f'--------------- Processing book: {path.name} ---------------')
    zip_updated = io.BytesIO()  # the rewritten archive is built in memory
    # open the original read-only; it is only overwritten once the copy is complete
    with zipfile.ZipFile(path, "r") as zip_in:
        with zipfile.ZipFile(zip_updated, "w", compression = zipfile.ZIP_DEFLATED) as zip_u:
            files = zip_in.infolist()
            f_types = ['.xml', '.html', '.xhtml']
            for f in files:
                # print("." * 50, f' {f.filename} ', "." * 50)
                # copy not modified files
                if pathlib.Path(f.filename).suffix not in f_types:
                    zip_u.writestr(f, zip_in.read(f.filename))
                # modify data in archive
                else:
                    data = zip_in.read(f.filename).decode()
                    data_temp = data
                    # use xml to modify all img tags
                    if False:
                        # data = ET.canonicalize(data, rewrite_prefixes = True)
                        # data_xml = ET.fromstring(data)
                        data_xml = ET.ElementTree(ET.fromstring(data)).getroot()
                        # namespace = data_xml.tag[1:data_xml.tag.index("}")]
                        imgs = data_xml.findall(".//{*}img")  # use XPATH and use wildcard for any namespace
                        for img in imgs:
                            # pop with a default so images without these attributes do not raise KeyError
                            img.attrib.pop("width", None)
                            img.attrib.pop("height", None)
                            # img.tag = img.tag[img.tag.index("}") + 1:]  # remove namespace from tag
                        ET.register_namespace("", xml.dom.XHTML_NAMESPACE)  # prevent namespace before tags
                        # ET.register_namespace("", namespace)
                        data = ET.tostring(data_xml, encoding = "unicode")
                    # use simple regex to remove the attributes
                    else:
                        data = re.sub(r" width=\"(.*?)\" height=\"(.*?)\"", "", data)
                    zip_u.writestr(f, bytearray(data, "utf-8"))
                    if False:  # for debugging purposes
                        TMP_data = xml.dom.minidom.parseString(data).toprettyxml()
                        TMP_data_temp = xml.dom.minidom.parseString(data_temp).toprettyxml()
                        print(*[x for x in difflib.Differ().compare(TMP_data_temp.splitlines(), TMP_data.splitlines())
                                if x.find("<img ") >= 0 and not x.startswith("-")], sep = "\n")
                        print(*list(difflib.context_diff(TMP_data_temp.splitlines(keepends = True),
                                                         TMP_data.splitlines(keepends = True), n = 0)), sep = "")
    # both archives are closed here, so the in-memory copy is complete
    with open(path, "wb") as f_out:
        f_out.write(zip_updated.getbuffer())
    zip_updated.close()
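For trying the idea on a single book without touching files on disk, the core of the script can be reduced to one function that works on bytes. This is a sketch rather than a replacement for the script above: `strip_sizes_in_epub` is a hypothetical name, and it assumes all markup files are UTF-8 encoded.

```python
import io
import re
import zipfile

IMG_SIZE = re.compile(r' width="(.*?)" height="(.*?)"')
TEXT_SUFFIXES = (".xml", ".html", ".xhtml")


def strip_sizes_in_epub(epub_bytes: bytes) -> bytes:
    """Return a copy of the archive with width/height pairs removed from markup files."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(epub_bytes), "r") as src, \
         zipfile.ZipFile(out, "w", compression=zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            payload = src.read(info.filename)
            if info.filename.lower().endswith(TEXT_SUFFIXES):
                text = payload.decode("utf-8")  # assumption: UTF-8 markup
                payload = IMG_SIZE.sub("", text).encode("utf-8")
            # Passing the original ZipInfo keeps each entry's name, position and
            # compression type, so an uncompressed "mimetype" entry stays that way.
            dst.writestr(info, payload)
    return out.getvalue()
```

A possible use is `pathlib.Path("fixed.epub").write_bytes(strip_sizes_in_epub(pathlib.Path("book.epub").read_bytes()))`, which writes the result to a new file and leaves the original untouched.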
Fix for safaribooks?
Inserting the following code at this line might solve the problem (not tested):
Line 722 in c94d6b4
# remove width / height attributes from img tags
imgs = book_content.findall(".//{*}img")  # use XPATH and use wildcard for any namespace
for img in imgs:
    # pop with a default so images without these attributes do not raise KeyError
    img.attrib.pop("width", None)
    img.attrib.pop("height", None)
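The same attribute removal can be tried in isolation with the standard library. The sketch below uses `xml.etree.ElementTree` rather than whatever parser safaribooks uses internally (that part is an assumption), and `drop_img_dimensions` is a hypothetical helper. It removes each attribute only if present, so images without a width or height do not raise `KeyError`.

```python
import xml.etree.ElementTree as ET


def drop_img_dimensions(root: ET.Element) -> int:
    """Remove width/height from every img element; return how many images changed."""
    changed = 0
    for el in root.iter():
        # Tags look like "{namespace}img" when namespaced, or plain "img" otherwise.
        if not isinstance(el.tag, str) or el.tag.rsplit("}", 1)[-1] != "img":
            continue
        had_width = el.attrib.pop("width", None) is not None
        had_height = el.attrib.pop("height", None) is not None
        if had_width or had_height:
            changed += 1
    return changed


doc = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<img src="a.png" width="10" height="20"/>'
    '<img src="b.png"/>'
    '</body></html>'
)
print(drop_img_dimensions(doc))  # 1
```

Unlike the regex, this does not depend on the order of the attributes, at the cost of requiring well-formed XHTML.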
Using the search and replace with \ width=\"(.*?)\" height=\"(.*?)\"
solved my problem, thanks! Will a fix make it into the main Python code?