weread-exporter's People

Contributors

drunkdream, fengshenx

weread-exporter's Issues

Error

90b32020813ab7914g019ab7
This one got stuck and stopped moving when it reached 第三篇 (Part Three).
[2023-05-19 14:31:37,263][ERROR][Exporter] Go to chapter 第三篇 **** failed
Traceback (most recent call last):
File "C:\Users\2020\Desktop\1*-exporter-main*_exporter\export.py", line 303, in export_markdown
await self._page.goto_chapter(
File "C:\Users\2020\Desktop\1*-exporter-main**_exporter\webpage.py", line 390, in goto_chapter
await self._page.goto(self._url, timeout=1000 * timeout)
File "C:\Users\2020\AppData\Local\Programs\Python\Python310\lib\site-packages\pyppeteer\page.py", line 837, in goto
raise error
pyppeteer.errors.TimeoutError: Navigation Timeout Exceeded: 30000 ms exceeded.
[2023-05-19 14:31:37,266][INFO][***WebPage] Go to chapter 13
[2023-05-19 14:31:37,267][ERROR]connection unexpectedly closed
[2023-05-19 14:31:37,268][ERROR]Task exception was never retrieved
future: <Task finished name='Task-2264' coro=<Connection._async_send() done, defined at C:\Users\2020\AppData\Local\Programs\Python\Python310\lib\site-packages\pyppeteer\connection.py:69> exception=InvalidStateError('invalid state')>
Traceback (most recent call last):
File "C:\Users\2020\AppData\Local\Programs\Python\Python310\lib\site-packages\websockets\legacy\protocol.py", line 968, in transfer_data
message = await self.read_message()
File "C:\Users\2020\AppData\Local\Programs\Python\Python310\lib\site-packages\websockets\legacy\protocol.py", line 1038, in read_message
frame = await self.read_data_frame(max_size=self.max_size)
File "C:\Users\2020\AppData\Local\Programs\Python\Python310\lib\site-packages\websockets\legacy\protocol.py", line 1113, in read_data_frame
frame = await self.read_frame(max_size)
File "C:\Users\2020\AppData\Local\Programs\Python\Python310\lib\site-packages\websockets\legacy\protocol.py", line 1170, in read_frame
frame = await Frame.read(
File "C:\Users\2020\AppData\Local\Programs\Python\Python310\lib\site-packages\websockets\legacy\framing.py", line 69, in read
data = await reader(2)
File "C:\Users\2020\AppData\Local\Programs\Python\Python310\lib\asyncio\streams.py", line 706, in readexactly
raise exceptions.IncompleteReadError(incomplete, n)
asyncio.exceptions.IncompleteReadError: 0 bytes read on a total of 2 expected bytes

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "C:\Users\2020\AppData\Local\Programs\Python\Python310\lib\site-packages\pyppeteer\connection.py", line 73, in _async_send
await self.connection.send(msg)
File "C:\Users\2020\AppData\Local\Programs\Python\Python310\lib\site-packages\websockets\legacy\protocol.py", line 635, in send
await self.ensure_open()
File "C:\Users\2020\AppData\Local\Programs\Python\Python310\lib\site-packages\websockets\legacy\protocol.py", line 944, in ensure_open
raise self.connection_closed_exc()
websockets.exceptions.ConnectionClosedError: sent 1000 (OK); no close frame received

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Users\2020\AppData\Local\Programs\Python\Python310\lib\site-packages\pyppeteer\connection.py", line 79, in _async_send
await self.dispose()
File "C:\Users\2020\AppData\Local\Programs\Python\Python310\lib\site-packages\pyppeteer\connection.py", line 170, in dispose
await self._on_close()
File "C:\Users\2020\AppData\Local\Programs\Python\Python310\lib\site-packages\pyppeteer\connection.py", line 151, in _on_close
cb.set_exception(_rewriteError(
asyncio.exceptions.InvalidStateError: invalid state

Stuck

status is invalid,

(base) bin@bindeMacBook-Pro  ~/VsCodeProjects/weread-exporter   main ●  python -m weread_exporter -b 7023271071a3505370215dc -o epub
/Users/bin/VsCodeProjects/weread-exporter/weread_exporter/__main__.py:129: DeprecationWarning: There is no current event loop
loop = asyncio.get_event_loop()
[2023-05-20 14:53:12,487][INFO]Exporting book 7023271071a3505370215dc
[2023-05-20 14:53:12,705][WARNING]Book 7023271071a3505370215dc status is invalid, stop exporting

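Always times out; the home page cannot be opened

It always reports a timeout and the home page cannot be opened. The browser does get launched, but the content is never opened automatically.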

I:\Program Files (x86)\weread-exporter-main>python -m weread_exporter -b ebd327f0718c8443ebdf735 -o epub
Fontconfig error: Cannot load default config file: No such file: (null)
[2024-03-15 22:38:08,977][INFO]Exporting book ebd327f0718c8443ebdf735
[2024-03-15 22:38:09,196][INFO][WeReadWebPage] Launch url https://weread.qq.com/web/bookDetail/ebd327f0718c8443ebdf735
[2024-03-15 22:38:09,728][INFO]Browser listening on: ws://127.0.0.1:54490/devtools/browser/b69386b8-79fb-4c0b-bc11-a4ffdf297a77

(process:4720): GLib-GIO-WARNING **: 22:38:09.956: Unexpectedly, UWP app Microsoft.OutlookForWindows_1.2023.1114.100_x64__8wekyb3d8bbwe' (AUMId Microsoft.OutlookForWindows_8wekyb3d8bbwe!Microsoft.OutlookforWindows') supports 1 extensions but has no verbs

(process:4720): GLib-GIO-WARNING **: 22:38:10.025: Unexpectedly, UWP app Microsoft.ScreenSketch_11.2312.33.0_x64__8wekyb3d8bbwe' (AUMId Microsoft.ScreenSketch_8wekyb3d8bbwe!App') supports 29 extensions but has no verbs

(process:4720): GLib-GIO-WARNING **: 22:38:10.332: Unexpectedly, UWP app Clipchamp.Clipchamp_2.9.3.0_neutral__yxz26nhyzhsrt' (AUMId Clipchamp.Clipchamp_yxz26nhyzhsrt!App') supports 41 extensions but has no verbs
[2024-03-15 22:38:11,429][INFO][WeReadWebPage] Current login user is 微信用户
[2024-03-15 22:38:11,429][INFO][WeReadWebPage] Inject cookie wr_name=%E5%BE%AE%E4%BF%A1%E7%94%A8%E6%88%B7
[2024-03-15 22:38:11,430][INFO][WeReadWebPage] Inject cookie wr_localvid=ff032c4081ef580b9ff02b8
[2024-03-15 22:38:11,432][INFO][WeReadWebPage] Inject cookie wr_rt=web%40UOkhaelbwjfqfmAGQ3r_AL
[2024-03-15 22:38:11,433][INFO][WeReadWebPage] Inject cookie wr_gender=0
[2024-03-15 22:38:11,435][INFO][WeReadWebPage] Inject cookie wr_pf=0
[2024-03-15 22:38:11,436][INFO][WeReadWebPage] Inject cookie wr_avatar=
[2024-03-15 22:38:11,437][INFO][WeReadWebPage] Inject cookie wr_skey=js_pEZ8M
[2024-03-15 22:38:11,439][INFO][WeReadWebPage] Inject cookie wr_vid=519405753
[2024-03-15 22:38:11,440][INFO][WeReadWebPage] Inject cookie wr_fp=2600317417
[2024-03-15 22:38:11,441][INFO][WeReadWebPage] Inject cookie wr_gid=231410492
[2024-03-15 22:38:42,126][ERROR]Launch book ebd327f0718c8443ebdf735 home page failed
Traceback (most recent call last):
File "I:\Program Files (x86)\weread-exporter-main\weread_exporter\__main__.py", line 85, in async_main
await page.launch(headless=args.headless, force_login=args.force_login)
File "I:\Program Files (x86)\weread-exporter-main\weread_exporter\webpage.py", line 233, in launch
await self.wait_for_avatar()
File "I:\Program Files (x86)\weread-exporter-main\weread_exporter\webpage.py", line 280, in wait_for_avatar
raise RuntimeError("Wait for avatar timeout")
RuntimeError: Wait for avatar timeout

Error: RuntimeError: Wait for creating markdown timeout

4de32960813ab8721g011b12.log

C:\Portable\EdgeGPT\weread-exporter>python -m weread_exporter -b 4de32960813ab8721g011b12 -o epub --load-timeout=300
Fontconfig error: Cannot load default config file: No such file: (null)
[2024-01-25 21:45:27,245][INFO]Exporting book 4de32960813ab8721g011b12
[2024-01-25 21:45:27,544][INFO][WeReadWebPage] Launch url https://weread.qq.com/web/bookDetail/4de32960813ab8721g011b12
[2024-01-25 21:45:28,066][INFO]Browser listening on: ws://127.0.0.1:49973/devtools/browser/b93df7c1-5120-45a9-813a-8dcec06f9c9b
[2024-01-25 21:45:28,285][INFO][WeReadWebPage] Current login user is ***
[2024-01-25 21:45:28,286][INFO][WeReadWebPage] Inject cookie wr_rt=***
[2024-01-25 21:45:28,288][INFO][WeReadWebPage] Inject cookie wr_vid=***
[2024-01-25 21:45:28,289][INFO][WeReadWebPage] Inject cookie wr_fp=***
[2024-01-25 21:45:28,291][INFO][WeReadWebPage] Inject cookie wr_pf=***
[2024-01-25 21:45:28,292][INFO][WeReadWebPage] Inject cookie wr_gender=
[2024-01-25 21:45:28,293][INFO][WeReadWebPage] Inject cookie wr_gid=***
[2024-01-25 21:45:28,294][INFO][WeReadWebPage] Inject cookie wr_skey=***
[2024-01-25 21:45:28,296][INFO][WeReadWebPage] Inject cookie wr_avatar=
[2024-01-25 21:45:28,298][INFO][WeReadWebPage] Inject cookie wr_name=
[2024-01-25 21:45:28,299][INFO][WeReadWebPage] Inject cookie wr_localvid=
[2024-01-25 21:45:37,943][INFO][WeReadExporter] Check chapter 2/版权信息
[2024-01-25 21:45:37,943][INFO][WeReadExporter] Check chapter 3/太平洋战争(一):山雨欲来
[2024-01-25 21:45:37,944][INFO][WeReadExporter] File cache\4de32960813ab8721g011b12\chapters\2-3.md not exist
[2024-01-25 21:45:37,944][INFO][WeReadWebPage] Go to chapter 3
[2024-01-25 21:45:37,951][INFO][WeReadWebPage] Fetch url https://weread.qq.com/web/reader/4de32960813ab8721g011b12kecc32f3013eccbc87e4b62e
[2024-01-25 21:45:38,385][INFO][WeReadWebPage] Fetch url https://midas.gtimg.cn/midas/minipay_v2/jsapi/cashier.js
[2024-01-25 21:45:38,387][INFO][WeReadWebPage] Fetch url https://cdn.weread.qq.com/web/wpa.js
[2024-01-25 21:45:38,387][INFO][WeReadWebPage] Fetch url https://weread-1258476243.file.myqcloud.com/web/wrwebnjlogic/css/app.3e110853.css
[2024-01-25 21:45:38,388][INFO][WeReadWebPage] Fetch url https://weread-1258476243.file.myqcloud.com/web/wrwebnjlogic/js/app.da544679.js
Traceback (most recent call last):
File "D:\Researching\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "D:\Researching\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\Portable\EdgeGPT\weread-exporter\weread_exporter\__main__.py", line 158, in <module>
main()
File "C:\Portable\EdgeGPT\weread-exporter\weread_exporter\__main__.py", line 154, in main
loop.run_until_complete(async_main())
File "D:\Researching\Python310\lib\asyncio\base_events.py", line 649, in run_until_complete
return future.result()
File "C:\Portable\EdgeGPT\weread-exporter\weread_exporter\__main__.py", line 92, in async_main
await exporter.export_markdown(args.load_timeout, args.load_interval)
File "C:\Portable\EdgeGPT\weread-exporter\weread_exporter\export.py", line 353, in export_markdown
markdown = await self._page.get_markdown()
File "C:\Portable\EdgeGPT\weread-exporter\weread_exporter\webpage.py", line 374, in get_markdown
raise RuntimeError("Wait for creating markdown timeout")
RuntimeError: Wait for creating markdown timeout

Downloading this book raises an error

hash: b5832a8072169a18b58c570
I tried many times with the same result; other books do not have this problem.

@macbook-pro weread-exporter-main % python -m weread_exporter -b b5832a8072169a18b58c570 -o epub
[2023-08-09 08:51:12,793][INFO]Exporting book b5832a8072169a18b58c570
[2023-08-09 08:51:12,920][INFO][WeReadWebPage] Launch url https://weread.qq.com/web/bookDetail/b5832a8072169a18b58c570
[2023-08-09 08:51:14,039][INFO]Browser listening on: ws://127.0.0.1:53070/devtools/browser/1fd9307a-20fb-43f1-ade3-ad8899ffeab5
[2023-08-09 08:51:14,815][INFO][WeReadWebPage] Update cookie wr_vid=
*
[2023-08-09 08:51:14,815][INFO][WeReadWebPage] Update cookie wr_skey=*********
[2023-08-09 08:51:14,816][INFO][WeReadWebPage] Update cookie wr_pf=0
[2023-08-09 08:51:14,816][INFO][WeReadWebPage] Update cookie wr_rt=*******
[2023-08-09 08:51:14,910][INFO][WeReadWebPage] Current login user is ********
[2023-08-09 08:51:14,910][INFO][WeReadWebPage] Inject cookie wr_gender=1
[2023-08-09 08:51:14,914][INFO][WeReadWebPage] Inject cookie ***********************
[2023-08-09 08:51:14,915][INFO][WeReadWebPage] Inject cookie wr_name=****
[2023-08-09 08:51:14,917][INFO][WeReadWebPage] Inject cookie wr_rt=*********
[2023-08-09 08:51:14,918][INFO][WeReadWebPage] Inject cookie wr_skey=********
[2023-08-09 08:51:14,920][INFO][WeReadWebPage] Inject cookie wr_vid=******
[2023-08-09 08:51:14,921][INFO][WeReadWebPage] Inject cookie wr_fp=******
[2023-08-09 08:51:14,922][INFO][WeReadWebPage] Inject cookie wr_pf=0
[2023-08-09 08:51:14,924][INFO][WeReadWebPage] Inject cookie wr_localvid=******
[2023-08-09 08:51:14,925][INFO][WeReadWebPage] Inject cookie wr_gid=******
[2023-08-09 08:51:21,067][INFO][WeReadExporter] Check chapter 2/说明
[2023-08-09 08:51:21,067][INFO][WeReadExporter] File cache/b5832a8072169a18b58c570/chapters/1-2.md not exist
[2023-08-09 08:51:21,068][INFO][WeReadWebPage] Go to chapter 2
[2023-08-09 08:51:21,086][INFO][WeReadWebPage] Fetch url https://weread.qq.com/web/reader/b5832a8072169a18b58c570kc81322c012c81e728d9d180
[2023-08-09 08:51:21,354][INFO][WeReadWebPage] Fetch url https://midas.gtimg.cn/midas/minipay_v2/jsapi/cashier.js
[2023-08-09 08:51:21,354][INFO][WeReadWebPage] Fetch url https://weread-1258476243.file.myqcloud.com/web/wrwebnjlogic/css/app.4605d864.css
[2023-08-09 08:51:21,355][INFO][WeReadWebPage] Fetch url https://weread-1258476243.file.myqcloud.com/web/wrwebnjlogic/js/app.c15c0d84.js
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/Users/****/Downloads/weread-exporter-main/weread_exporter/__main__.py", line 147, in <module>
main()
File "/Users/****/Downloads/weread-exporter-main/weread_exporter/__main__.py", line 143, in main
loop.run_until_complete(async_main())
File "/opt/homebrew/Cellar/python@3.11/3.11.4/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "/Users/****/Downloads/weread-exporter-main/weread_exporter/__main__.py", line 83, in async_main
await exporter.export_markdown(args.load_timeout, args.load_interval)
File "/Users/****/Downloads/weread-exporter-main/weread_exporter/export.py", line 353, in export_markdown
markdown = await self._page.get_markdown()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/****/Downloads/weread-exporter-main/weread_exporter/webpage.py", line 363, in get_markdown
raise RuntimeError("Wait for creating markdown timeout")
RuntimeError: Wait for creating markdown timeout

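bug: the exported PDF contains no images

Environment: cloned the latest code.
Problem: fetching the chapters and downloading the md files and images all succeed, but the exported PDF contains no images (every image is missing).

Error messages when exporting the PDF: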
image

[2023-07-09 20:32:38,667][ERROR]Failed to load image at "file:///F:/git_repository/weread-exporter/cache/e1d326f0813ab7e2fg012a70/images/21b77ed54397af0db45f170a428c4abc.jpg" (Pixbuf error: Unrecognized image file format)
[2023-07-09 20:32:38,670][ERROR]Failed to load image at "file:///F:/git_repository/weread-exporter/cache/e1d326f0813ab7e2fg012a70/images/9bbc30619a54a5b1a33f22bdd3e2bd07.jpg" (Pixbuf error: Unrecognized image file format)
[2023-07-09 20:32:42,460][INFO]Save file output\软件单元测试.pdf complete

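However, the images at those locations preview normally in a browser or an image viewer, and the paths are correct.

From my analysis: the md files reference images as ![](images/xxx.jpg), whereas images is actually at the same level as chapter, the parent directory of the md files.
For the images to preview correctly in the md files, either:
1. Copy the images directory into the chapter directory.

2. Change the image links to ![](../images/xxx.jpg)

But whichever approach I take, the exported PDF still has no images.

I don't know whether the author has not implemented image export for PDF, or whether there is a bug somewhere.

For now the only option is to merge all the md files into a single document and then export the PDF with pandoc.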
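
The merge step of that workaround could be sketched roughly as follows. This is only an illustration, not the project's code: the helper name merge_chapters and the output name merged.md are hypothetical, and it assumes the cache layout seen in the logs on this page (cache/<book id>/chapters/1-2.md, 2-3.md, ... next to an images directory).

```python
import re
from pathlib import Path


def chapter_sort_key(path):
    # "10-11.md" -> (10, 11), so chapters sort numerically, not alphabetically.
    return tuple(int(number) for number in re.findall(r"\d+", path.stem))


def merge_chapters(book_dir, output_name="merged.md"):
    """Concatenate chapters/*.md into one file placed next to images/.

    Hypothetical helper: assumes book_dir contains a "chapters" directory
    whose files are named like "1-2.md".
    """
    book_dir = Path(book_dir)
    chapters = sorted((book_dir / "chapters").glob("*.md"), key=chapter_sort_key)
    merged = "\n\n".join(p.read_text(encoding="utf-8") for p in chapters)
    # Writing into book_dir keeps existing links such as images/xxx.jpg valid,
    # because the merged file is a sibling of the images directory.
    output_path = book_dir / output_name
    output_path.write_text(merged, encoding="utf-8")
    return output_path
```

Because the merged file sits beside the images directory, the image links do not need rewriting. The result can then be handed to pandoc as described above, for example `pandoc merged.md -o book.pdf`; which PDF engine and fonts are needed for Chinese text depends on the local pandoc setup.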

Runtime error

After installing everything, I ran

 python -m weread_exporter -b 6a032b60813ab71f0g01944f -o pdf --force-login

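and got the error below. I searched through many answers about cannot load library 'libcairo.so.2' and tried what they suggested, but none of it worked. I'm not sure whether my local environment is at fault. I'm new to Python and there is a lot I don't understand, so I'm asking here for some direction.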

cat@catdeMac weread-exporter-main2 % python3 -m weread_exporter -b 6a032b60813ab71f0g01944f -o pdf --force-login
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/Users/cat/Downloads/weread-exporter-main2/weread_exporter/__main__.py", line 147, in <module>
main()
File "/Users/cat/Downloads/weread-exporter-main2/weread_exporter/__main__.py", line 143, in main
loop.run_until_complete(async_main())
File "/usr/local/Cellar/python@3.11/3.11.4_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "/Users/cat/Downloads/weread-exporter-main2/weread_exporter/__main__.py", line 16, in async_main
from . import export, utils, webpage
File "/Users/cat/Downloads/weread-exporter-main2/weread_exporter/export.py", line 12, in <module>
from weasyprint import HTML, CSS
File "/usr/local/lib/python3.11/site-packages/weasyprint/__init__.py", line 469, in <module>
from .css import preprocess_stylesheet # noqa isort:skip
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/weasyprint/css/__init__.py", line 27, in <module>
from . import computed_values, counters, media_queries
File "/usr/local/lib/python3.11/site-packages/weasyprint/css/computed_values.py", line 15, in <module>
from .. import text
File "/usr/local/lib/python3.11/site-packages/weasyprint/text.py", line 11, in <module>
import cairocffi as cairo
File "/usr/local/lib/python3.11/site-packages/cairocffi/__init__.py", line 47, in <module>
cairo = dlopen(
^^^^^^^
File "/usr/local/lib/python3.11/site-packages/cairocffi/__init__.py", line 44, in dlopen
raise OSError(error_message) # pragma: no cover
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: no library called "cairo-2" was found
no library called "cairo" was found
no library called "libcairo-2" was found
cannot load library 'libcairo.so.2': dlopen(libcairo.so.2, 0x0002): tried: 'libcairo.so.2' (no such file), '/System/Volumes/Preboot/Cryptexes/OSlibcairo.so.2' (no such file), '/usr/local/lib/libcairo.so.2' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/usr/local/lib/libcairo.so.2' (no such file), '/usr/lib/libcairo.so.2' (no such file, not in dyld cache), 'libcairo.so.2' (no such file), '/usr/local/lib/libcairo.so.2' (no such file), '/usr/lib/libcairo.so.2' (no such file, not in dyld cache). Additionally, ctypes.util.find_library() did not manage to locate a library called 'libcairo.so.2'
cannot load library 'libcairo.2.dylib': dlopen(libcairo.2.dylib, 0x0002): tried: 'libcairo.2.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OSlibcairo.2.dylib' (no such file), '/usr/local/lib/libcairo.2.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/usr/local/lib/libcairo.2.dylib' (no such file), '/usr/lib/libcairo.2.dylib' (no such file, not in dyld cache), 'libcairo.2.dylib' (no such file), '/usr/local/lib/libcairo.2.dylib' (no such file), '/usr/lib/libcairo.2.dylib' (no such file, not in dyld cache). Additionally, ctypes.util.find_library() did not manage to locate a library called 'libcairo.2.dylib'
cannot load library 'libcairo-2.dll': dlopen(libcairo-2.dll, 0x0002): tried: 'libcairo-2.dll' (no such file), '/System/Volumes/Preboot/Cryptexes/OSlibcairo-2.dll' (no such file), '/usr/local/lib/libcairo-2.dll' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/usr/local/lib/libcairo-2.dll' (no such file), '/usr/lib/libcairo-2.dll' (no such file, not in dyld cache), 'libcairo-2.dll' (no such file), '/usr/local/lib/libcairo-2.dll' (no such file), '/usr/lib/libcairo-2.dll' (no such file, not in dyld cache). Additionally, ctypes.util.find_library() did not manage to locate a library called 'libcairo-2.dll'

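Later I tried removing and reinstalling every local Python environment, but that still did not fix it. After reinstalling I am using Python3; could that have any effect?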
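
The traceback above ends with ctypes.util.find_library() failing to locate cairo, which suggests the missing piece is a system library rather than the Python installation itself. A small diagnostic sketch, using only the standard library, can repeat that lookup for the three names listed in the error message. The helper name locate_cairo is hypothetical and this is not part of weread-exporter or cairocffi.

```python
import ctypes.util

# The three library names reported as "not found" in the OSError above.
CANDIDATE_NAMES = ("cairo-2", "cairo", "libcairo-2")


def locate_cairo(names=CANDIDATE_NAMES):
    """Map each candidate name to the library found for it, or None."""
    return {name: ctypes.util.find_library(name) for name in names}


if __name__ == "__main__":
    for name, location in locate_cairo().items():
        print(f"{name}: {location or 'not found'}")
```

If every name prints "not found", the cairo shared library is probably not installed anywhere the loader searches, and reinstalling Python alone would not be expected to change that.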

[WIN] Fontconfig error: Cannot load default config file: No such file: (null)

The error occurs on Windows, with no further details. The full output is below:

Fontconfig error: Cannot load default config file: No such file: (null)
[2024-03-31 23:33:53,119][INFO]Exporting book 83132780716754a783196a7
[2024-03-31 23:33:54,637][WARNING]Book 83132780716754a783196a7 status is invalid, stop exporting

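In-text annotation styles in WeRead (微信读书)

So far I have found two footnote styles in WeRead:

  1. Displayed in WeRead as an image superscript that pops up the note when clicked, for example the book https://weread.qq.com/web/reader/52f320a05cf65852f08359cka87322c014a87ff679a21ea
    image
    Corresponding code: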

    <span class="reader_footer_note js_readerFooterNote" data-wr-footernote="《美国研究》曾刊登一篇论文,使用杰维斯的认知理论分析中美关系中的知觉与错误知觉。这是我看到的国内仅有的一篇用杰维斯国际政治心理学理论对中美关系所做实证性研究的文章。作者是杰维斯曾经执教过的美国加州大学洛杉矶分校的博士生。参见王栋:《超越国家利益:对20世纪90年代中美关系的知觉性解释》,载《美国研究》2001年第3期,第27—46页。另外,在一些关于西方国际关系理论的书中,对杰维斯和国际政治认知学派有简单的介绍。参见王逸舟:《西方国际政治学:历史与理论》,上海人民出版社1998年版,第三章第二节;倪世雄等:《当代西方国际关系理论》,复旦大学出版社2001年版,第四章第四节。"></span>
    
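  2. Displayed in WeRead as a link superscript that jumps to the note when clicked, for example the book https://weread.qq.com/web/reader/f8932f4072305432f89f7aa

    image

    Corresponding code for the in-text superscript: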

    <a id="w15"></a><a href=""><span class="super"><span>[15]</span></span></a>
    

    Corresponding code for the footnote content:

    <p class="note"><a id="m15"></a><a href=""><span>[15]</span></a><span> [美]S·E·佛罗斯特著,吴元训等译:《西方教育的历史和哲学基础》,华夏出版社1987年版,第170页。</span></p>
    

Currently, the Markdown in the cache contains only the body text, without the notes:

1966年12月,大名鼎鼎的哲学家、**史家以赛亚·伯林到友人、著名美国学者埃
德蒙·威尔逊处做客。威尔逊在一则日记里提到,两人此间有过一次争论。伯林“变
得很激动,有时对人充满非理性的偏见”,威尔逊写道,“比如[对]汉娜·阿伦特,尽
管他从未读过她那本关于艾希曼的书”。在1987年发表在《耶鲁评论》上的一篇回
忆录里,伯林以同样的罪名讨伐威尔逊,并在1991年同威尔逊日记编辑的一次采访
中细述此事。我们不知道这次争执的最终结果,不过有一点我们是知道的:尽管
距离汉娜·阿伦特的《艾希曼在耶路撒冷:一份关于平庸的恶的报告》出版已经过

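The first footnote style should be fairly easy to handle; the second may be harder to deal with. Perhaps saving the original format as HTML rather than Markdown would be more suitable? Markdown still loses a lot of information.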
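
As a rough illustration of why the first style is the easier one: the note text sits in the data-wr-footernote attribute of the span, so it can be collected with the standard library HTML parser. This is a minimal sketch under that assumption, not the exporter's implementation; the names FootnoteCollector and extract_footnotes are hypothetical.

```python
from html.parser import HTMLParser


class FootnoteCollector(HTMLParser):
    """Collect data-wr-footernote values from js_readerFooterNote spans."""

    def __init__(self):
        super().__init__()
        self.notes = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if tag == "span" and "js_readerFooterNote" in classes:
            note = attributes.get("data-wr-footernote")
            if note:
                self.notes.append(note)


def extract_footnotes(html):
    """Return the footnote texts in document order."""
    collector = FootnoteCollector()
    collector.feed(html)
    collector.close()
    return collector.notes
```

The second style would need more work, since the superscript anchor (id w15) and the note paragraph (id m15) are separate elements that have to be paired up.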

Related discussion: #48

Still times out after changing it to 300

pyppeteer.errors.TimeoutError: Navigation Timeout Exceeded: 30000 ms exceeded.
[2023-06-12 17:15:10,147][INFO][WeReadWebPage] Go to chapter 5
[2023-06-12 17:15:10,148][ERROR]connection unexpectedly closed
[2023-06-12 17:15:10,148][ERROR]Task exception was never retrieved
future: <Task finished name='Task-618' coro=<Connection._async_send() done, defined at F:\python_install\Lib\site-packages\pyppeteer\connection.py:69> exception=InvalidStateError('invalid state')>
Traceback (most recent call last):
File "F:\python_install\Lib\site-packages\websockets\legacy\protocol.py", line 1314, in close_connection
await self.transfer_data_task
File "F:\python_install\Lib\site-packages\websockets\legacy\protocol.py", line 979, in transfer_data
await asyncio.shield(self._put_message_waiter)
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "F:\python_install\Lib\site-packages\pyppeteer\connection.py", line 73, in _async_send
await self.connection.send(msg)
File "F:\python_install\Lib\site-packages\websockets\legacy\protocol.py", line 635, in send
await self.ensure_open()
File "F:\python_install\Lib\site-packages\websockets\legacy\protocol.py", line 944, in ensure_open
raise self.connection_closed_exc()
websockets.exceptions.ConnectionClosedError: sent 1000 (OK); no close frame received

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "F:\python_install\Lib\site-packages\pyppeteer\connection.py", line 79, in _async_send
await self.dispose()
File "F:\python_install\Lib\site-packages\pyppeteer\connection.py", line 170, in dispose
await self._on_close()
File "F:\python_install\Lib\site-packages\pyppeteer\connection.py", line 151, in _on_close
cb.set_exception(_rewriteError(
asyncio.exceptions.InvalidStateError: invalid state

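Some images are missing

For example: f4f32d30813ab7430g014017
Several images in the final appendix are missing, so it is incomplete.
I don't know whether images in the rest of the body text are missing.
The table of contents for this book is also a bit messy.
In addition, rare characters that are rendered as images do not seem to match the layout of the normal text (this book contains such image-form characters).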

This one got stuck

c2332c70813ab7157g011108
This one got stuck and stopped moving:
[2023-06-17 21:44:20,357][INFO][Exporter] Check chapter 7/第1
[2023-06-17 21:44:20,357][INFO][***Exporter] File cache\c2332c70813ab7157g011108\chapters\6-7.md not exist
[2023-06-17 21:44:20,357][INFO][WebPage] Go to chapter 7
[2023-06-17 21:44:20,367][INFO][WebPage] Fetch url https://.
.com/web/reader/c2332c70813ab7157g011108k8f132430178f14e45fce0f7
[2023-06-17 21:44:20,730][INFO][***WebPage] Fetch url https://midas.gtimg.cn/midas/minipay_v2/jsapi/cashier.js
[2023-06-17 21:44:20,731][INFO][WebPage] Fetch url https://-1258476243.file.myqcloud.com/web/wrwebnjlogic/css/app.4605d864.css
[2023-06-17 21:44:20,733][INFO][WebPage] Fetch url https://-1258476243.file.myqcloud.com/web/wrwebnjlogic/js/app.d42bbcf6.js
[2023-06-17 21:44:24,030][INFO][***WebPage] Go to next page [stuck here]

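Some books hang and raise errors while being fetched

For example, when fetching the book c8832370813ab7fdbg016f39, it hangs on the copyright page. The program then retries, hangs again, and keeps looping like this. After terminating it with Ctrl+C, the errors are as follows: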
D:\Software\weread-exporter-main>python -m weread_exporter -b c8832370813ab7fdbg016f39 -o epub
[2023-08-01 20:07:58,420][INFO]Exporting book c8832370813ab7fdbg016f39
[2023-08-01 20:07:58,577][INFO][WeReadWebPage] Launch url https://weread.qq.com/web/bookDetail/c8832370813ab7fdbg016f39
[2023-08-01 20:07:59,110][INFO]Browser listening on: ws://127.0.0.1:50099/devtools/browser/8328d9f7-f693-45bd-bba7-3f599a42261e

[2023-08-01 20:08:05,902][INFO][WeReadExporter] Check chapter 2/版权信息
[2023-08-01 20:08:05,902][INFO][WeReadExporter] File cache\c8832370813ab7fdbg016f39\chapters\1-2.md not exist
[2023-08-01 20:08:05,902][INFO][WeReadWebPage] Go to chapter 2
[2023-08-01 20:08:05,922][INFO][WeReadWebPage] Fetch url https://weread.qq.com/web/reader/c8832370813ab7fdbg016f39kc81322c012c81e728d9d180
[2023-08-01 20:08:06,209][INFO][WeReadWebPage] Fetch url https://midas.gtimg.cn/midas/minipay_v2/jsapi/cashier.js
[2023-08-01 20:08:06,211][INFO][WeReadWebPage] Fetch url https://weread-1258476243.file.myqcloud.com/web/wrwebnjlogic/css/app.4605d864.css
[2023-08-01 20:08:06,212][INFO][WeReadWebPage] Fetch url https://weread-1258476243.file.myqcloud.com/web/wrwebnjlogic/js/app.27ff86e3.js
[2023-08-01 20:08:35,910][WARNING]Load chapter failed, close browser and retry
[2023-08-01 20:08:35,911][INFO]terminate chrome process...
[2023-08-01 20:08:35,911][ERROR]connection unexpectedly closed
[2023-08-01 20:08:35,911][ERROR]Task exception was never retrieved
future: <Task finished name='Task-275' coro=<Connection._async_send() done, defined at C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\site-packages\pyppeteer\connection.py:69> exception=InvalidStateError('invalid state')>
Traceback (most recent call last):
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\site-packages\websockets\legacy\protocol.py", line 979, in transfer_data
await asyncio.shield(self._put_message_waiter)
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\site-packages\pyppeteer\connection.py", line 73, in _async_send
await self.connection.send(msg)
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\site-packages\websockets\legacy\protocol.py", line 635, in send
await self.ensure_open()
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\site-packages\websockets\legacy\protocol.py", line 944, in ensure_open
raise self.connection_closed_exc()
websockets.exceptions.ConnectionClosedError: sent 1000 (OK); no close frame received

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\site-packages\pyppeteer\connection.py", line 79, in _async_send
await self.dispose()
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\site-packages\pyppeteer\connection.py", line 170, in dispose
await self._on_close()
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\site-packages\pyppeteer\connection.py", line 151, in _on_close
cb.set_exception(_rewriteError(
asyncio.exceptions.InvalidStateError: invalid state
[2023-08-01 20:08:36,039][INFO][WeReadWebPage] Launch url https://weread.qq.com/web/bookDetail/c8832370813ab7fdbg016f39
[2023-08-01 20:08:36,575][INFO]Browser listening on: ws://127.0.0.1:50160/devtools/browser/5153350a-c749-4f25-9074-d84de4c8869a
[2023-08-01 20:08:42,724][INFO][WeReadExporter] Check chapter 2/版权信息
[2023-08-01 20:08:42,724][INFO][WeReadExporter] File cache\c8832370813ab7fdbg016f39\chapters\1-2.md not exist
[2023-08-01 20:08:42,724][INFO][WeReadWebPage] Go to chapter 2
[2023-08-01 20:08:42,735][INFO][WeReadWebPage] Fetch url https://weread.qq.com/web/reader/c8832370813ab7fdbg016f39kc81322c012c81e728d9d180
[2023-08-01 20:08:42,961][INFO][WeReadWebPage] Fetch url https://midas.gtimg.cn/midas/minipay_v2/jsapi/cashier.js
[2023-08-01 20:08:42,962][INFO][WeReadWebPage] Fetch url https://weread-1258476243.file.myqcloud.com/web/wrwebnjlogic/css/app.4605d864.css
[2023-08-01 20:08:42,963][INFO][WeReadWebPage] Fetch url https://weread-1258476243.file.myqcloud.com/web/wrwebnjlogic/js/app.27ff86e3.js
Traceback (most recent call last):
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "D:\Software\weread-exporter-main\weread_exporter\__main__.py", line 147, in <module>
main()
File "D:\Software\weread-exporter-main\weread_exporter\__main__.py", line 143, in main
loop.run_until_complete(async_main())
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\asyncio\base_events.py", line 629, in run_until_complete
self.run_forever()
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\asyncio\windows_events.py", line 321, in run_forever
super().run_forever()
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\asyncio\base_events.py", line 596, in run_forever
self._run_once()
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\asyncio\base_events.py", line 1854, in _run_once
event_list = self._selector.select(timeout)
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\asyncio\windows_events.py", line 439, in select
self._poll(timeout)
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\asyncio\windows_events.py", line 788, in _poll
status = _overlapped.GetQueuedCompletionStatus(self._iocp, ms)
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\site-packages\pyppeteer\launcher.py", line 153, in _close_process
self._loop.run_until_complete(self.killChrome())
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\asyncio\base_events.py", line 618, in run_until_complete
self._check_running()
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\asyncio\base_events.py", line 578, in _check_running
raise RuntimeError('This event loop is already running')
RuntimeError: This event loop is already running
[2023-08-01 20:08:54,035][INFO]terminate chrome process...
[2023-08-01 20:08:54,035][ERROR]connection unexpectedly closed
[2023-08-01 20:08:54,035][ERROR]Task exception was never retrieved
future: <Task finished name='Task-544' coro=<Connection._async_send() done, defined at C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\site-packages\pyppeteer\connection.py:69> exception=InvalidStateError('invalid state')>
Traceback (most recent call last):
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\site-packages\pyppeteer\connection.py", line 73, in _async_send
await self.connection.send(msg)
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\site-packages\websockets\legacy\protocol.py", line 635, in send
await self.ensure_open()
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\site-packages\websockets\legacy\protocol.py", line 944, in ensure_open
raise self.connection_closed_exc()
websockets.exceptions.ConnectionClosedError: sent 1000 (OK); no close frame received

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\site-packages\pyppeteer\connection.py", line 79, in _async_send
await self.dispose()
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\site-packages\pyppeteer\connection.py", line 170, in dispose
await self._on_close()
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\site-packages\pyppeteer\connection.py", line 151, in _on_close
cb.set_exception(_rewriteError(
asyncio.exceptions.InvalidStateError: invalid state
[2023-08-01 20:08:54,136][ERROR]Task exception was never retrieved
future: <Task finished name='Task-4' coro=<Connection._recv_loop() done, defined at C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\site-packages\pyppeteer\connection.py:53> exception=UnicodeEncodeError('gbk', '[https://weread.qq.com/web/reader/c8832370813ab7fdbg016f39kc81322c012c81e728d9d180] fillText © 0 881.3333339691162 JSHandle@array\r\n', 93, 94, 'illegal multibyte sequence')>
Traceback (most recent call last):
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\site-packages\pyppeteer\connection.py", line 61, in _recv_loop
await self._on_message(resp)
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\site-packages\pyppeteer\connection.py", line 143, in _on_message
self._on_query(msg)
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\site-packages\pyppeteer\connection.py", line 123, in _on_query
session._on_message(params.get('message'))
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\site-packages\pyppeteer\connection.py", line 276, in _on_message
self.emit(obj.get('method'), obj.get('params'))
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\site-packages\pyee\_base.py", line 115, in emit
handled = self._call_handlers(event, args, kwargs)
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\site-packages\pyee\_base.py", line 98, in _call_handlers
self._emit_run(f, args, kwargs)
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\site-packages\pyee\_base.py", line 83, in _emit_run
f(*args, **kwargs)
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\site-packages\pyppeteer\page.py", line 184, in <lambda>
client.on('Runtime.consoleAPICalled', lambda event: self._onConsoleAPI(event))
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\site-packages\pyppeteer\page.py", line 692, in _onConsoleAPI
self._addConsoleMessage(event['type'], values)
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\site-packages\pyppeteer\page.py", line 729, in _addConsoleMessage
self.emit(Page.Events.Console, message)
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\site-packages\pyee\_base.py", line 115, in emit
handled = self._call_handlers(event, args, kwargs)
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\site-packages\pyee\_base.py", line 98, in _call_handlers
self._emit_run(f, args, kwargs)
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\site-packages\pyee\_base.py", line 83, in _emit_run
f(*args, **kwargs)
File "D:\Software\weread-exporter-main\weread_exporter\webpage.py", line 234, in handle_log
fp.write("[%s] %s\n" % (self._url, message.text))
UnicodeEncodeError: 'gbk' codec can't encode character '\xa9' in position 93: illegal multibyte sequence
[2023-08-01 20:08:54,182][ERROR]Task was destroyed but it is pending!
task: <Task pending name='Task-179' coro=<WeReadWebPage._handle_request() running at D:\Software\weread-exporter-main\weread_exporter\webpage.py:337> wait_for=<Future pending cb=[<TaskWakeupMethWrapper object at 0x0000025681125D30>()]>>
sys:1: RuntimeWarning: coroutine 'Launcher.killChrome' was never awaited

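Line breaks in the body text

For some books, the extraction carries over the line breaks shown on the web page. Is there a way to remove these line breaks?

Error

ok
Some images are missing, and I don't know why.

Runtime error, please take a look, thanks!

I get an error when running it. Could you please check what went wrong? Thanks!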

[2023-12-09 00:23:02,096][INFO]Exporting book b4a32760813ab8187g015f3f
[2023-12-09 00:23:02,377][ERROR]Fetch url https://weread.qq.com/web/bookDetail/b4a32760813ab8187g015f3f failed
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/aiohttp/connector.py", line 992, in _wrap_create_connection
return await self._loop.create_connection(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/base_events.py", line 1112, in create_connection
transport, protocol = await self._create_connection_transport(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/base_events.py", line 1145, in _create_connection_transport
await waiter
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/sslproto.py", line 574, in _on_handshake_complete
raise handshake_exc
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/sslproto.py", line 556, in _do_handshake
self._sslobj.do_handshake()
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/ssl.py", line 979, in do_handshake
self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:992)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/Users/oilo/DEV/Projects/Proj-VSCode/drunkdream-weread-exporter/weread_exporter/utils.py", line 28, in fetch
async with session.get(url, headers=headers) as response:
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/aiohttp/client.py", line 1187, in __aenter__
self._resp = await self._coro
^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/aiohttp/client.py", line 574, in _request
conn = await self._connector.connect(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/aiohttp/connector.py", line 544, in connect
proto = await self._create_connection(req, traces, timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/aiohttp/connector.py", line 911, in _create_connection
_, proto = await self._create_direct_connection(req, traces, timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/aiohttp/connector.py", line 1235, in _create_direct_connection
raise last_exc
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/aiohttp/connector.py", line 1204, in _create_direct_connection
transp, proto = await self._wrap_create_connection(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/aiohttp/connector.py", line 994, in _wrap_create_connection
raise ClientConnectorCertificateError(req.connection_key, exc) from exc
aiohttp.client_exceptions.ClientConnectorCertificateError: Cannot connect to host weread.qq.com:443 ssl:True [SSLCertVerificationError: (1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:992)')]
[2023-12-09 00:23:02,537][ERROR]Fetch url https://weread.qq.com/web/bookDetail/b4a32760813ab8187g015f3f failed
[2023-12-09 00:23:02,694][ERROR]Fetch url https://weread.qq.com/web/bookDetail/b4a32760813ab8187g015f3f failed
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/Users/oilo/DEV/Projects/Proj-VSCode/drunkdream-weread-exporter/weread_exporter/__main__.py", line 147, in <module>
main()
File "/Users/oilo/DEV/Projects/Proj-VSCode/drunkdream-weread-exporter/weread_exporter/main.py", line 143, in main
loop.run_until_complete(async_main())
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "/Users/oilo/DEV/Projects/Proj-VSCode/drunkdream-weread-exporter/weread_exporter/main.py", line 67, in async_main
if not await page.check_valid():
^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/oilo/DEV/Projects/Proj-VSCode/drunkdream-weread-exporter/weread_exporter/webpage.py", line 141, in check_valid
html = await utils.fetch(self._home_url)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/oilo/DEV/Projects/Proj-VSCode/drunkdream-weread-exporter/weread_exporter/utils.py", line 38, in fetch
raise RuntimeError("Fetch url %s failed" % url)
RuntimeError: Fetch url https://weread.qq.com/web/bookDetail/b4a32760813ab8187g015f3f failed
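
The message `unable to get local issuer certificate` generally means that Python's `ssl` module could not find a CA bundle against which to verify `weread.qq.com`. A minimal diagnostic sketch, using only the standard library, lists where this interpreter looks for CA certificates and whether those locations exist. The helper name `describe_ca_paths` is made up for illustration and is not part of weread-exporter.

```python
import os
import ssl


def describe_ca_paths():
    """Report the CA certificate locations the ssl module uses by default.

    Hypothetical helper for diagnosis only; not part of weread-exporter.
    """
    paths = ssl.get_default_verify_paths()
    candidates = (
        ("cafile", paths.cafile),
        ("capath", paths.capath),
        ("openssl_cafile", paths.openssl_cafile),
        ("openssl_capath", paths.openssl_capath),
    )
    report = {}
    for label, value in candidates:
        # value is None when the interpreter found nothing for that slot
        report[label] = {
            "path": value,
            "exists": bool(value) and os.path.exists(value),
        }
    return report


if __name__ == "__main__":
    for label, info in describe_ca_paths().items():
        status = "exists" if info["exists"] else "missing"
        print(label, info["path"], status)
```

If every entry is reported as missing, the interpreter has no CA bundle to use. The paths in the traceback suggest a python.org installer build on macOS, which normally ships an `Install Certificates.command` script in its folder under `/Applications`; whether running it resolves this particular report is an assumption.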

Why does the "Browser closed unexpectedly" error occur?

[2023-10-28 13:58:07,233][INFO]Exporting book f8a32350813ab71e0g015d0c
[2023-10-28 13:58:07,418][INFO][WeReadWebPage] Launch url https://weread.qq.com/web/bookDetail/f8a32350813ab71e0g015d0c
Traceback (most recent call last):
  File "/usr/local/Cellar/python@3.9/3.9.18/Frameworks/Python.framework/Versions/3.9/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/Cellar/python@3.9/3.9.18/Frameworks/Python.framework/Versions/3.9/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/maq/PyProject/weread-exporter/weread_exporter/__main__.py", line 147, in <module>
    main()
  File "/Users/maq/PyProject/weread-exporter/weread_exporter/__main__.py", line 143, in main
    loop.run_until_complete(async_main())
  File "/usr/local/Cellar/python@3.9/3.9.18/Frameworks/Python.framework/Versions/3.9/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()
  File "/Users/maq/PyProject/weread-exporter/weread_exporter/__main__.py", line 77, in async_main
    await page.launch(headless=args.headless, force_login=args.force_login)
  File "/Users/maq/PyProject/weread-exporter/weread_exporter/webpage.py", line 153, in launch
    self._browser = await pyppeteer.launch(
  File "/Users/maq/PyProject/weread-exporter/venv/lib/python3.9/site-packages/pyppeteer/launcher.py", line 307, in launch
    return await Launcher(options, **kwargs).launch()
  File "/Users/maq/PyProject/weread-exporter/venv/lib/python3.9/site-packages/pyppeteer/launcher.py", line 168, in launch
    self.browserWSEndpoint = get_ws_endpoint(self.url)
  File "/Users/maq/PyProject/weread-exporter/venv/lib/python3.9/site-packages/pyppeteer/launcher.py", line 227, in get_ws_endpoint
    raise BrowserError('Browser closed unexpectedly:\n')
pyppeteer.errors.BrowserError: Browser closed unexpectedly:

markdown creating timeout

The environment is configured correctly.
Environment:
Kernel version: #1 SMP PREEMPT_DYNAMIC Debian 6.3.7-1kali1 (2023-06-29) x86_64 GNU/Linux
Python version: Python 3.11.4
Chrome version: Google Chrome 116.0.5845.96 unknown

The error output is as follows:
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/home/kali/python/weread-exporter/weread_exporter/__main__.py", line 147, in <module>
main()
File "/home/kali/python/weread-exporter/weread_exporter/__main__.py", line 143, in main
loop.run_until_complete(async_main())
File "/usr/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "/home/kali/python/weread-exporter/weread_exporter/__main__.py", line 83, in async_main
await exporter.export_markdown(args.load_timeout, args.load_interval)
File "/home/kali/python/weread-exporter/weread_exporter/export.py", line 353, in export_markdown
markdown = await self._page.get_markdown()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/kali/python/weread-exporter/weread_exporter/webpage.py", line 363, in get_markdown
raise RuntimeError("Wait for creating markdown timeout")
RuntimeError: Wait for creating markdown timeout
[2023-08-16 22:18:33,692][INFO]terminate chrome process...
Observed behavior: Chrome starts successfully and begins capturing the first page; the error appears a few seconds later.

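Do books have to be purchased before they can be exported?

As a member, can I no longer export once the paid content is reached? I found that books I have purchased export without problems, but for books I have not purchased, the export gets stuck at the paid content and restarts endlessly without making progress.

The error output is as follows: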
[2023-12-21 10:51:15,028][WARNING]Load chapter failed, close browser and retry
[2023-12-21 10:51:15,028][INFO]terminate chrome process...
[2023-12-21 10:51:15,029][ERROR]connection unexpectedly closed
[2023-12-21 10:51:15,029][ERROR]Task exception was never retrieved
future: <Task finished name='Task-551' coro=<Connection._async_send() done, defined at C:\Python312\Lib\site-packages\pyppeteer\connection.py:69> exception=InvalidStateError('invalid state')>
Traceback (most recent call last):
File "C:\Python312\Lib\site-packages\websockets\legacy\protocol.py", line 1314, in close_connection
await self.transfer_data_task
File "C:\Python312\Lib\site-packages\websockets\legacy\protocol.py", line 979, in transfer_data
await asyncio.shield(self._put_message_waiter)
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "C:\Python312\Lib\site-packages\pyppeteer\connection.py", line 73, in _async_send
await self.connection.send(msg)
File "C:\Python312\Lib\site-packages\websockets\legacy\protocol.py", line 635, in send
await self.ensure_open()
File "C:\Python312\Lib\site-packages\websockets\legacy\protocol.py", line 944, in ensure_open
raise self.connection_closed_exc()
websockets.exceptions.ConnectionClosedError: sent 1000 (OK); no close frame received

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Python312\Lib\site-packages\pyppeteer\connection.py", line 79, in _async_send
await self.dispose()
File "C:\Python312\Lib\site-packages\pyppeteer\connection.py", line 170, in dispose
await self._on_close()
File "C:\Python312\Lib\site-packages\pyppeteer\connection.py", line 151, in _on_close
cb.set_exception(_rewriteError(
asyncio.exceptions.InvalidStateError: invalid state

windows10 python 312 "fontconfig error"

C:\temp\Kindle\weread-exporter-main>python -m weread_exporter -b 4e132bc07263ff664e11075 -o epub -o pdf --force-login
Fontconfig error: Cannot load default config file: No such file: (null)
[2024-03-12 22:05:12,303][INFO]Exporting book 4e132bc07263ff664e11075
[2024-03-12 22:05:12,500][INFO][WeReadWebPage] Launch url https://weread.qq.com/web/bookDetail/4e132bc07263ff664e11075
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "C:\temp\Kindle\weread-exporter-main\weread_exporter\__main__.py", line 158, in <module>
main()
File "C:\temp\Kindle\weread-exporter-main\weread_exporter\__main__.py", line 154, in main
loop.run_until_complete(async_main())
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\asyncio\base_events.py", line 685, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "C:\temp\Kindle\weread-exporter-main\weread_exporter\__main__.py", line 85, in async_main
await page.launch(headless=args.headless, force_login=args.force_login)
File "C:\temp\Kindle\weread-exporter-main\weread_exporter\webpage.py", line 174, in launch
chrome = self._check_chrome()
^^^^^^^^^^^^^^^^^^^^
File "C:\temp\Kindle\weread-exporter-main\weread_exporter\webpage.py", line 167, in _check_chrome
raise utils.ChromeNotInstalledError(
weread_exporter.utils.ChromeNotInstalledError: Please make sure chrome is installed, and the install path is added to PATH environment.
You can test that with where chrome command.

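I looked into it, and this font error seems to be common, but most of the fixes are for Linux and I have no idea what to do on Windows. Do I need to add the font file path to PATH?

The last page of some books cannot be downloaded

Cause: on that page, readerFooter contains readerFooter_ending_title instead of readerFooter_button.

Changing the function to the following fixes it.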

async def _check_next_page(self):
    while True:
        result = ''
        try:
            # Wait for the footer container instead of the button, because the
            # last page of a book has no "readerFooter_button" element.
            await self.wait_for_selector(
                # "button.readerFooter_button", timeout=59000
                "div.readerFooter", timeout=59000
            )
            try:
                result = await self._page.evaluate(
                    "document.getElementsByClassName('readerFooter_button')[0].innerText;"
                )
            except pyppeteer.errors.ElementHandleError:
                # No button on this page: fall back to the end-of-book title.
                logging.info("[%s] load selector ElementHandleError " % self.__class__.__name__)
                result = await self._page.evaluate(
                    "document.getElementsByClassName('readerFooter_ending_title')[0].innerText;"
                )

        except pyppeteer.errors.TimeoutError:
            logging.info("[%s] load selector timeout " % self.__class__.__name__)
            break

        # The footer text decides what happens next.
        if result == "下一页":  # "next page": more pages in this chapter
            logging.info("[%s] Go to next page" % self.__class__.__name__)
            await self._page.evaluate(
                r"canvasContextHandler.data.markdown += '\n\n';"
            )
            await self.pre_load_page()
            await self._page.click("button.readerFooter_button")
            await asyncio.sleep(1)
        elif result == "下一章":  # "next chapter": this chapter is done
            break
        elif result.startswith("登录"):  # "log in": login is required
            raise utils.LoginRequiredError()
        elif result == "全 书 完":  # "end of book": final page reached
            break
        else:
            raise NotImplementedError(result)

UnicodeEncodeError when scraping a book

When scraping chapter 2 of the book 63f32a40813ab7cd5g011236, this error is raised: UnicodeEncodeError: 'gbk' codec can't encode character '\xa0' in position 7: illegal multibyte sequence.

Could you take a look at this? Thanks.
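
On a Chinese-locale Windows system, `open()` without an `encoding` argument typically defaults to GBK, which has no mapping for characters such as `'\xa0'` (a non-breaking space). A minimal sketch of the usual workaround is to open the file with an explicit UTF-8 encoding. The names `can_encode` and `append_log` and their parameters are illustrative assumptions, not functions from weread-exporter.

```python
def can_encode(text, encoding):
    """Return True if `text` can be encoded with `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def append_log(path, url, text):
    """Append one log line, always as UTF-8 (hypothetical helper).

    Passing encoding="utf-8" makes the write independent of the
    system locale, so non-GBK characters no longer raise.
    """
    with open(path, "a", encoding="utf-8") as fp:
        fp.write("[%s] %s\n" % (url, text))


if __name__ == "__main__":
    print("gbk can encode \\xa0:", can_encode("\xa0", "gbk"))
    print("utf-8 can encode \\xa0:", can_encode("\xa0", "utf-8"))
```

Any code that later reads the file back would need to pass the same `encoding="utf-8"`.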

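After running the code and logging in by scanning the QR code, the window closes by itself shortly afterwards. What should I do? (Windows 10 + Python 3.9; tried Chrome 113 and 114)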

D:\weread-exporter>python -m weread_exporter -b 08232ac0720befa90825d88 -o epub -o pdf
[2023-05-11 13:56:23,565][INFO]Exporting book 08232ac0720befa90825d88
[2023-05-11 13:56:23,941][INFO][WeReadWebPage] Launch url https://weread.qq.com/web/bookDetail/08232ac0720befa90825d88
[2023-05-11 13:56:24,505][INFO]Browser listening on: ws://127.0.0.1:52744/devtools/browser/1cbe3df2-9107-4d13-9587-b14fd21ceebf
Traceback (most recent call last):
File "D:\Program Files\Python39\lib\runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "D:\Program Files\Python39\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "D:\weread-exporter\weread_exporter\__main__.py", line 126, in <module>
main()
File "D:\weread-exporter\weread_exporter\__main__.py", line 122, in main
loop.run_until_complete(async_main())
File "D:\Program Files\Python39\lib\asyncio\base_events.py", line 647, in run_until_complete
return future.result()
File "D:\weread-exporter\weread_exporter\__main__.py", line 63, in async_main
await page.launch(args.force_login)
File "D:\weread-exporter\weread_exporter\webpage.py", line 170, in launch
await self._page.waitForSelector("div.readerFooter a")
File "D:\Program Files\Python39\Lib\site-packages\pyppeteer\frame_manager.py", line 855, in __await__
raise result
pyppeteer.errors.TimeoutError: Waiting for selector "div.readerFooter a" failed: timeout 30000ms exceeds.
[2023-05-11 13:56:57,102][INFO]terminate chrome process...

I really can't figure this out and need help: the "No such file: (null)" problem

Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "C:\Users\daile\Desktop\tt\weread-exporter\weread_exporter\__main__.py", line 158, in <module>
main()
File "C:\Users\daile\Desktop\tt\weread-exporter\weread_exporter\__main__.py", line 154, in main
loop.run_until_complete(async_main())
File "C:\Python312\Lib\asyncio\base_events.py", line 684, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "C:\Users\daile\Desktop\tt\weread-exporter\weread_exporter\__main__.py", line 17, in async_main
from . import export, utils, webpage
File "C:\Users\daile\Desktop\tt\weread-exporter\weread_exporter\export.py", line 14, in <module>
from . import utils
File "C:\Users\daile\Desktop\tt\weread-exporter\weread_exporter\utils.py", line 4, in <module>
import aiohttp
ModuleNotFoundError: No module named 'aiohttp'
Fontconfig error: Cannot load default config file: No such file: (null)

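Thumbs up 👍

🌹 A genius project, the best on the whole web 🌹

  • chrome cannot be found on Mac
    The Chrome browser installs its executable under the name Google Chrome, so chrome cannot be found. You can create a link named chrome that points to Google Chrome inside Google Chrome.app.

  • The font in the exported PDF looks odd
    Set the font in style.css, for example the LXGW WenKai font.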
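
The link from the first item can be created from Python, as in the sketch below. The directory `/Applications/Google Chrome.app/Contents/MacOS` is the usual install location and is an assumption here, and `link_chrome` is an illustrative helper, not part of the project.

```python
import os

# Assumed default location of the Chrome executable on macOS.
MACOS_DIR = "/Applications/Google Chrome.app/Contents/MacOS"


def link_chrome(bin_dir=MACOS_DIR, target="Google Chrome", link_name="chrome"):
    """Create a symlink `link_name` next to `target` inside `bin_dir`.

    Hypothetical helper. The link is relative, so it keeps working if the
    app bundle is moved. Returns the path of the link.
    """
    link_path = os.path.join(bin_dir, link_name)
    if not os.path.lexists(link_path):  # do nothing if the link already exists
        os.symlink(target, link_path)
    return link_path


if __name__ == "__main__":
    print("created or found:", link_chrome())
```

The directory still has to be on `PATH` for `chrome` to resolve from a shell, and writing inside `/Applications` may require elevated permissions.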

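Two problems encountered when running this project on Windows 10, and how to fix them

To start with an encouraging conclusion: this project can scrape books you have purchased on WeRead; paid books must be purchased before they can be scraped in full.

Two problems I ran into while using the project:

1. Running python -m weread_exporter -b $book_id -o epub -o pdf fails with the following error:
Error:
OSError: no library called "cairo-2" was found
no library called "cairo" was found
no library called "libcairo-2" was found
cannot load library 'libcairo.so.2': error 0x7e
cannot load library 'libcairo.2.dylib': error 0x7e
cannot load library 'libcairo-2.dll': error 0x7e

Cause: the native cairo library cannot be found; the loader tries it under the names cairo-2, cairo, and libcairo-2. Installing the Python packages directly with the following commands still produces the error:
pip install pycairo
pip install cairocffi
pip install WeasyPrint
This is because installing WeasyPrint on Windows requires additional steps. Also, only WeasyPrint needs to be installed explicitly; in my case the other two packages came along during the WeasyPrint installation.

Fix for error 1:
On Windows 10, WeasyPrint can be installed as follows:

① Install Python 3.7 or later: go to the Python website (https://www.python.org/downloads/windows/), then download and install Python 3.7 or later. (Installation tutorials are easy to find online.)

② Install the GTK+ runtime: WeasyPrint uses the GTK+ and cairo libraries for rendering and layout, so the GTK+ runtime must be installed first. Go to the GTK+ website (https://www.gtk.org/docs/installations/windows/) or the GitHub project (https://github.com/tschoonj/GTK-for-Windows-Runtime-Environment-Installer/releases), then download and install the latest version of the GTK+ runtime.

③ Install WeasyPrint: with Python and the GTK+ runtime installed, install WeasyPrint with the following command (skip this if it is already installed):

python -m pip install weasyprint

④ Verify the installation: after installing WeasyPrint, run the following command to check that it works:

python -m weasyprint --version
If the installation succeeded, this prints the WeasyPrint version number.

Notes:

If installing WeasyPrint fails because libffi-7.dll cannot be found, download libffi-7.dll and place it in the DLLs folder of the Python installation directory.

WeasyPrint needs GTK+ 3.x, not GTK+ 2.x, so make sure the runtime you install is GTK+ 3.x.
I am not sure whether these steps also apply to Windows 11. To install WeasyPrint on Windows 11, see the Windows section of the official documentation (https://doc.courtbouillon.org/weasyprint/stable/first_steps.html#cairo).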
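
Before rerunning the exporter, it can help to check whether the native cairo library is discoverable at all, using the same three names that appear in the error message. This sketch relies only on `ctypes.util.find_library` from the standard library; `find_cairo` is an illustrative name, not part of WeasyPrint or weread-exporter.

```python
from ctypes.util import find_library

# The three names listed in the "no library called ..." error message.
CAIRO_NAMES = ("cairo-2", "cairo", "libcairo-2")


def find_cairo(names=CAIRO_NAMES):
    """Map each candidate name to what find_library() reports, or None."""
    return {name: find_library(name) for name in names}


if __name__ == "__main__":
    found = find_cairo()
    for name, location in found.items():
        print(name, "->", location)
    if not any(found.values()):
        print("cairo was not found under any of these names")
```

If every name maps to `None`, a likely explanation is that the GTK+ runtime is not installed or that its `bin` directory is not on `PATH`; this is a hint, not a guaranteed diagnosis.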


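2. After installing WeasyPrint, running python -m weread_exporter -b $book_id -o epub -o pdf fails:
On Windows 10 it reports that a file cannot be found (FileNotFound). On macOS it reports that chrome cannot be found.

Cause: the Chrome browser's directory is not in the system PATH environment variable.

Fix: add the directory that contains chrome.exe to the system PATH. For example, my Chrome is at C:\Program Files\Google\Chrome\Application\chrome.exe, so I added C:\Program Files\Google\Chrome\Application to PATH. (If you don't know how to edit system environment variables, a web search will turn up instructions.)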
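
Whether the PATH change has taken effect can be checked from Python with `shutil.which`, which searches `PATH` in much the same way as the `where` command. This is a minimal sketch; `find_chrome` is an illustrative helper, and nothing here claims that the exporter itself locates Chrome this way.

```python
import shutil


def find_chrome(path=None):
    """Return the full path of a `chrome` executable on PATH, or None.

    `path` optionally overrides the search path (useful for testing);
    by default the current PATH environment variable is used.
    """
    return shutil.which("chrome", path=path)


if __name__ == "__main__":
    location = find_chrome()
    print(location or "chrome is not on PATH")
```

Environment variable changes only apply to newly started terminals, so the check should be run in a fresh command prompt.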


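3. This one is not an error, but it is worth mentioning:
With the command python -m weread_exporter -b $book_id -o epub -o pdf, the script stops partway through the book (when the free chapters end) and asks you to scan a QR code to log in, which is inconvenient. I therefore recommend python -m weread_exporter -b $book_id -o epub -o pdf --force-login, which asks you to log in at the very start and avoids the pause in the middle.

Note: I ran into one problem here. If you have previously logged in to WeRead through this project's script (that is, the script opened the WeRead page in Chrome and you logged in on that page), then python -m weread_exporter -b $book_id -o epub -o pdf --force-login can no longer be used and fails with an error. In that case, just use python -m weread_exporter -b $book_id -o epub -o pdf.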

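Some books are not downloaded completely

For example: aaa322a0813ab7c0eg011f87
The main text appears to be complete, but only a small part of the references section was downloaded.

Cannot run on macOS

Following the documentation, running it produces the following error: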
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/Users/acookie/Desktop/weread-exporter/weread_exporter/__main__.py", line 158, in <module>
main()
File "/Users/acookie/Desktop/weread-exporter/weread_exporter/__main__.py", line 154, in main
loop.run_until_complete(async_main())
File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/base_events.py", line 684, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "/Users/acookie/Desktop/weread-exporter/weread_exporter/__main__.py", line 17, in async_main
from . import export, utils, webpage
File "/Users/acookie/Desktop/weread-exporter/weread_exporter/export.py", line 12, in <module>
from weasyprint import HTML, CSS
File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/weasyprint/__init__.py", line 469, in <module>
from .css import preprocess_stylesheet # noqa isort:skip
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/weasyprint/css/__init__.py", line 27, in <module>
from . import computed_values, counters, media_queries
File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/weasyprint/css/computed_values.py", line 15, in <module>
from .. import text
File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/weasyprint/text.py", line 11, in <module>
import cairocffi as cairo
File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/cairocffi/__init__.py", line 47, in <module>
cairo = dlopen(
^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/cairocffi/__init__.py", line 44, in dlopen
raise OSError(error_message) # pragma: no cover
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: no library called "cairo-2" was found
no library called "cairo" was found
no library called "libcairo-2" was found
cannot load library 'libcairo.so.2': dlopen(libcairo.so.2, 0x0002): tried: 'libcairo.so.2' (no such file), '/System/Volumes/Preboot/Cryptexes/OSlibcairo.so.2' (no such file), '/usr/lib/libcairo.so.2' (no such file, not in dyld cache), 'libcairo.so.2' (no such file), '/usr/lib/libcairo.so.2' (no such file, not in dyld cache). Additionally, ctypes.util.find_library() did not manage to locate a library called 'libcairo.so.2'
cannot load library 'libcairo.2.dylib': dlopen(libcairo.2.dylib, 0x0002): tried: 'libcairo.2.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OSlibcairo.2.dylib' (no such file), '/usr/lib/libcairo.2.dylib' (no such file, not in dyld cache), 'libcairo.2.dylib' (no such file), '/usr/lib/libcairo.2.dylib' (no such file, not in dyld cache). Additionally, ctypes.util.find_library() did not manage to locate a library called 'libcairo.2.dylib'
cannot load library 'libcairo-2.dll': dlopen(libcairo-2.dll, 0x0002): tried: 'libcairo-2.dll' (no such file), '/System/Volumes/Preboot/Cryptexes/OSlibcairo-2.dll' (no such file), '/usr/lib/libcairo-2.dll' (no such file, not in dyld cache), 'libcairo-2.dll' (no such file), '/usr/lib/libcairo-2.dll' (no such file, not in dyld cache). Additionally, ctypes.util.find_library() did not manage to locate a library called 'libcairo-2.dll'

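Chapter download stalls (resolved)

The book has been purchased.
The command I used was
python -m weread_exporter -b 339321e0813ab6d0fg019228 -o epub -o pdf --load-timeout=300
After pressing Ctrl+C, the result was this:
Screenshot 2023-12-22 232025

The book page opened and login succeeded, but in the end it shows "Load chapter failed, close browser and retry"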

(.venv) C:\Users\Town>python -m weread_exporter -b 54c32520715e229954c8b8a -o epub -o epub --force-login
Fontconfig error: Cannot load default config file: No such file: (null)
[2024-01-15 22:07:21,033][INFO]Exporting book 54c32520715e229954c8b8a
[2024-01-15 22:07:21,479][INFO][WeReadWebPage] Launch url https://weread.qq.com/web/bookDetail/54c32520715e229954c8b8a
[2024-01-15 22:07:22,099][INFO]Browser listening on: ws://127.0.0.1:1258/devtools/browser/26aee2a6-c51b-4b45-b8b8-0793e1bb6f06
[2024-01-15 22:07:24,684][INFO][WeReadWebPage] Waiting for login
[2024-01-15 22:07:34,693][INFO][WeReadWebPage] Login success
[2024-01-15 22:07:35,289][INFO][WeReadExporter] Check chapter 19/版权信息
[2024-01-15 22:07:35,290][INFO][WeReadExporter] File cache\54c32520715e229954c8b8a\chapters\1-19.md not exist
[2024-01-15 22:07:35,291][INFO][WeReadWebPage] Go to chapter 19
[2024-01-15 22:07:35,303][INFO][WeReadWebPage] Fetch url https://weread.qq.com/web/reader/54c32520715e229954c8b8ak1f032c402131f0e3dad99f3
[2024-01-15 22:07:35,856][INFO][WeReadWebPage] Fetch url https://midas.gtimg.cn/midas/minipay_v2/jsapi/cashier.js
[2024-01-15 22:07:35,859][INFO][WeReadWebPage] Fetch url https://cdn.weread.qq.com/web/wpa.js
[2024-01-15 22:07:35,860][INFO][WeReadWebPage] Fetch url https://weread-1258476243.file.myqcloud.com/web/wrwebnjlogic/css/app.3e110853.css
[2024-01-15 22:07:35,862][INFO][WeReadWebPage] Fetch url https://weread-1258476243.file.myqcloud.com/web/wrwebnjlogic/js/app.e7373bc5.js
[2024-01-15 22:08:05,313][WARNING]Load chapter failed, close browser and retry
[2024-01-15 22:08:05,313][INFO]terminate chrome process...
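
The log above shows the tool giving up on a chapter and retrying after closing the browser. As a generic illustration of that retry-with-timeout pattern (not weread-exporter's actual code; `retry_with_timeout` and its parameters are hypothetical names), a sketch using only `asyncio`:

```python
import asyncio


async def retry_with_timeout(make_coro, timeout, attempts=3, delay=0.0):
    """Await make_coro() up to `attempts` times, each bounded by `timeout` seconds.

    `make_coro` is a zero-argument callable returning a fresh coroutine,
    because a coroutine object cannot be awaited twice.
    """
    last_error = None
    for _ in range(attempts):
        try:
            return await asyncio.wait_for(make_coro(), timeout=timeout)
        except asyncio.TimeoutError as exc:
            last_error = exc
            await asyncio.sleep(delay)  # optional pause before the next attempt
    raise last_error


if __name__ == "__main__":
    async def slow_then_fast(state={"calls": 0}):
        # Stand-in for a page load: the first call hangs, later calls finish.
        state["calls"] += 1
        if state["calls"] == 1:
            await asyncio.sleep(1)
        return "loaded"

    print(asyncio.run(retry_with_timeout(slow_then_fast, timeout=0.1)))
```

A retry only helps when the slow load is transient. If every attempt hits the same limit, a longer per-attempt timeout is the more likely fix; another report on this page passes `--load-timeout=300` on the command line.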

Hi, I'm a beginner on a Mac and ran into the following No module named 'bs4' error. I'm stuck here; thanks for any help!

jaimezhang@192 weread-exporter-main % python -m weread_exporter -b 08232ac0720befa90825d88 -o epub -o pdf
Traceback (most recent call last):
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/Users/jaimezhang/weread-exporter-main/weread_exporter/__main__.py", line 158, in <module>
main()
File "/Users/jaimezhang/weread-exporter-main/weread_exporter/__main__.py", line 154, in main
loop.run_until_complete(async_main())
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
return future.result()
File "/Users/jaimezhang/weread-exporter-main/weread_exporter/__main__.py", line 17, in async_main
from . import export, utils, webpage
File "/Users/jaimezhang/weread-exporter-main/weread_exporter/export.py", line 8, in <module>
import bs4
ModuleNotFoundError: No module named 'bs4'
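
`bs4` is the import name of the `beautifulsoup4` package on PyPI, so this error usually means the dependency is not installed for the interpreter that `python` resolves to (here the Command Line Tools Python 3.9). A small sketch for checking which modules that interpreter can actually find; `missing_modules` is a hypothetical helper, and the list contains only the module named in the traceback:

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    wanted = ["bs4"]  # taken from the traceback; extend as needed
    missing = missing_modules(wanted)
    print("interpreter:", sys.executable)
    print("missing:", missing or "none")
```

Running it with the same `python` used to launch the exporter shows whether the package is visible there. If `bs4` is listed as missing, `python -m pip install beautifulsoup4` installs it for that specific interpreter, which avoids the common mismatch between `pip` and `python` pointing at different installations.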

Feedback: incomplete download and errors, thanks!

For example, https://weread.qq.com/web/bookDetail/ddc3252071dbe8a8ddc8170. Books with many chapters sometimes get stuck and cannot be downloaded, such as this one:
[2023-11-27 13:52:18,250][INFO][WeReadExporter] File cache\ddc3252071dbe8a8ddc8170\chapters\7-8.md not exist
[2023-11-27 13:52:18,251][INFO][WeReadWebPage] Go to chapter 8
[2023-11-27 13:52:18,276][INFO][WeReadWebPage] Fetch url https://weread.qq.com/web/reader/ddc3252071dbe8a8ddc8170kc9f326d018c9f0f895fb5e4
[2023-11-27 13:52:18,648][INFO][WeReadWebPage] Fetch url https://midas.gtimg.cn/midas/minipay_v2/jsapi/cashier.js
[2023-11-27 13:52:18,653][INFO][WeReadWebPage] Fetch url https://cdn.weread.qq.com/web/wpa.js
[2023-11-27 13:52:18,656][INFO][WeReadWebPage] Fetch url https://weread-1258476243.file.myqcloud.com/web/wrwebnjlogic/css/app.02ecef75.css
[2023-11-27 13:52:18,708][INFO][WeReadWebPage] Fetch url https://weread-1258476243.file.myqcloud.com/web/wrwebnjlogic/css/8.a2448854.css
[2023-11-27 13:52:18,728][INFO][WeReadWebPage] Fetch url https://weread-1258476243.file.myqcloud.com/web/wrwebnjlogic/js/app.e2263c63.js
[2023-11-27 13:52:48,269][WARNING]Load chapter failed, close browser and retry
[2023-11-27 13:52:48,270][INFO]terminate chrome process...
[2023-11-27 13:52:48,272][ERROR]connection unexpectedly closed
[2023-11-27 13:52:48,273][ERROR]Task exception was never retrieved
future: <Task finished name='Task-2448' coro=<Connection._async_send() done, defined at C:\Users\uesr\AppData\Local\Programs\Python\Python311\Lib\site-packages\pyppeteer\connection.py:69> exception=InvalidStateError('invalid state')>
Traceback (most recent call last):
File "C:\Users\uesr\AppData\Local\Programs\Python\Python311\Lib\site-packages\websockets\legacy\protocol.py", line 968, in transfer_data
message = await self.read_message()
^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\uesr\AppData\Local\Programs\Python\Python311\Lib\site-packages\websockets\legacy\protocol.py", line 1038, in read_message
frame = await self.read_data_frame(max_size=self.max_size)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\uesr\AppData\Local\Programs\Python\Python311\Lib\site-packages\websockets\legacy\protocol.py", line 1113, in read_data_frame
frame = await self.read_frame(max_size)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\uesr\AppData\Local\Programs\Python\Python311\Lib\site-packages\websockets\legacy\protocol.py", line 1170, in read_frame
frame = await Frame.read(
^^^^^^^^^^^^^^^^^
File "C:\Users\uesr\AppData\Local\Programs\Python\Python311\Lib\site-packages\websockets\legacy\framing.py", line 69, in read
data = await reader(2)
^^^^^^^^^^^^^^^
File "C:\Users\uesr\AppData\Local\Programs\Python\Python311\Lib\asyncio\streams.py", line 727, in readexactly
raise exceptions.IncompleteReadError(incomplete, n)
asyncio.exceptions.IncompleteReadError: 0 bytes read on a total of 2 expected bytes

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "C:\Users\uesr\AppData\Local\Programs\Python\Python311\Lib\site-packages\pyppeteer\connection.py", line 73, in _async_send
await self.connection.send(msg)
File "C:\Users\uesr\AppData\Local\Programs\Python\Python311\Lib\site-packages\websockets\legacy\protocol.py", line 635, in send
await self.ensure_open()
File "C:\Users\uesr\AppData\Local\Programs\Python\Python311\Lib\site-packages\websockets\legacy\protocol.py", line 944, in ensure_open
raise self.connection_closed_exc()
websockets.exceptions.ConnectionClosedError: sent 1000 (OK); no close frame received

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Users\uesr\AppData\Local\Programs\Python\Python311\Lib\site-packages\pyppeteer\connection.py", line 79, in _async_send
await self.dispose()
File "C:\Users\uesr\AppData\Local\Programs\Python\Python311\Lib\site-packages\pyppeteer\connection.py", line 170, in dispose
await self._on_close()
File "C:\Users\uesr\AppData\Local\Programs\Python\Python311\Lib\site-packages\pyppeteer\connection.py", line 151, in _on_close
cb.set_exception(_rewriteError(
asyncio.exceptions.InvalidStateError: invalid state

Any pointers would be appreciated, thanks!
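
One detail worth pulling out of logs like the one above is how long the tool waited before giving up. A short sketch that computes the gap between two log lines, assuming the `[YYYY-MM-DD HH:MM:SS,mmm]` prefix seen in these reports; the function names are illustrative only:

```python
import re
from datetime import datetime

# Matches the leading "[2023-11-27 13:52:18,251]" timestamp of a log line.
STAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\]")


def parse_stamp(line):
    """Return the line's timestamp as a datetime, or None if it has none."""
    match = STAMP.match(line)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")


def elapsed_seconds(start_line, end_line):
    """Seconds between the timestamps of two log lines."""
    return (parse_stamp(end_line) - parse_stamp(start_line)).total_seconds()


if __name__ == "__main__":
    start = "[2023-11-27 13:52:18,251][INFO][WeReadWebPage] Go to chapter 8"
    end = "[2023-11-27 13:52:48,269][WARNING]Load chapter failed, close browser and retry"
    print(f"{elapsed_seconds(start, end):.3f} s")
```

For the two lines quoted in the example, the gap is about 30 seconds, which suggests a fixed load timeout is being hit rather than an immediate failure. Whether raising `--load-timeout` (used in another report on this page) changes this particular wait is an assumption that would need checking against the tool's source.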
