drunkdream / weread-exporter
Export books from WeRead (微信读书) to epub, pdf, mobi, and other formats
bookid:c3032820813ab8038g014ada
Chapters 9-10 hang whenever they contain images; after the timeout the exporter retries, and the same cycle repeats.
90b32020813ab7914g019ab7
This one ran until it reached Part Three (第三篇), then hung.
[2023-05-19 14:31:37,263][ERROR][Exporter] Go to chapter 第三篇 **** failed
Traceback (most recent call last):
File "C:\Users\2020\Desktop\1*-exporter-main*_exporter\export.py", line 303, in export_markdown
await self._page.goto_chapter(
File "C:\Users\2020\Desktop\1*-exporter-main**_exporter\webpage.py", line 390, in goto_chapter
await self._page.goto(self._url, timeout=1000 * timeout)
File "C:\Users\2020\AppData\Local\Programs\Python\Python310\lib\site-packages\pyppeteer\page.py", line 837, in goto
raise error
pyppeteer.errors.TimeoutError: Navigation Timeout Exceeded: 30000 ms exceeded.
[2023-05-19 14:31:37,266][INFO][***WebPage] Go to chapter 13
[2023-05-19 14:31:37,267][ERROR]connection unexpectedly closed
[2023-05-19 14:31:37,268][ERROR]Task exception was never retrieved
future: <Task finished name='Task-2264' coro=<Connection._async_send() done, defined at C:\Users\2020\AppData\Local\Programs\Python\Python310\lib\site-packages\pyppeteer\connection.py:69> exception=InvalidStateError('invalid state')>
Traceback (most recent call last):
File "C:\Users\2020\AppData\Local\Programs\Python\Python310\lib\site-packages\websockets\legacy\protocol.py", line 968, in transfer_data
message = await self.read_message()
File "C:\Users\2020\AppData\Local\Programs\Python\Python310\lib\site-packages\websockets\legacy\protocol.py", line 1038, in read_message
frame = await self.read_data_frame(max_size=self.max_size)
File "C:\Users\2020\AppData\Local\Programs\Python\Python310\lib\site-packages\websockets\legacy\protocol.py", line 1113, in read_data_frame
frame = await self.read_frame(max_size)
File "C:\Users\2020\AppData\Local\Programs\Python\Python310\lib\site-packages\websockets\legacy\protocol.py", line 1170, in read_frame
frame = await Frame.read(
File "C:\Users\2020\AppData\Local\Programs\Python\Python310\lib\site-packages\websockets\legacy\framing.py", line 69, in read
data = await reader(2)
File "C:\Users\2020\AppData\Local\Programs\Python\Python310\lib\asyncio\streams.py", line 706, in readexactly
raise exceptions.IncompleteReadError(incomplete, n)
asyncio.exceptions.IncompleteReadError: 0 bytes read on a total of 2 expected bytes
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\2020\AppData\Local\Programs\Python\Python310\lib\site-packages\pyppeteer\connection.py", line 73, in _async_send
await self.connection.send(msg)
File "C:\Users\2020\AppData\Local\Programs\Python\Python310\lib\site-packages\websockets\legacy\protocol.py", line 635, in send
await self.ensure_open()
File "C:\Users\2020\AppData\Local\Programs\Python\Python310\lib\site-packages\websockets\legacy\protocol.py", line 944, in ensure_open
raise self.connection_closed_exc()
websockets.exceptions.ConnectionClosedError: sent 1000 (OK); no close frame received
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\2020\AppData\Local\Programs\Python\Python310\lib\site-packages\pyppeteer\connection.py", line 79, in _async_send
await self.dispose()
File "C:\Users\2020\AppData\Local\Programs\Python\Python310\lib\site-packages\pyppeteer\connection.py", line 170, in dispose
await self._on_close()
File "C:\Users\2020\AppData\Local\Programs\Python\Python310\lib\site-packages\pyppeteer\connection.py", line 151, in _on_close
cb.set_exception(_rewriteError(
asyncio.exceptions.InvalidStateError: invalid state
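The loop described in these reports (timeout, retry, timeout again) can at least be bounded. The sketch below is a minimal illustration, not the exporter's actual code: `retry_with_timeout` and its parameters are hypothetical names, and it relies only on the standard `asyncio.wait_for`, not on pyppeteer's own timeout handling.

```python
import asyncio


async def retry_with_timeout(make_coro, timeout, max_attempts=3, delay=0.0):
    """Await make_coro() up to max_attempts times, each try limited to timeout seconds.

    make_coro is a zero-argument callable returning a fresh coroutine,
    because a coroutine object cannot be awaited twice.
    """
    last_error = None
    for _attempt in range(max_attempts):
        try:
            return await asyncio.wait_for(make_coro(), timeout=timeout)
        except asyncio.TimeoutError as exc:
            last_error = exc
            # Pause before the next try instead of retrying immediately.
            await asyncio.sleep(delay)
    # Stop with a clear error instead of looping forever.
    raise RuntimeError("gave up after %d attempts" % max_attempts) from last_error
```

With a cap like this, a chapter that never loads ends the run with one clear error rather than an endless retry cycle.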
It hangs
(base) bin@bindeMacBook-Pro ~/VsCodeProjects/weread-exporter main ● python -m weread_exporter -b 7023271071a3505370215dc -o epub
/Users/bin/VsCodeProjects/weread-exporter/weread_exporter/main.py:129: DeprecationWarning: There is no current event loop
loop = asyncio.get_event_loop()
[2023-05-20 14:53:12,487][INFO]Exporting book 7023271071a3505370215dc
[2023-05-20 14:53:12,705][WARNING]Book 7023271071a3505370215dc status is invalid, stop exporting
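The DeprecationWarning in this log comes from calling `asyncio.get_event_loop()` when no event loop is running. It is probably unrelated to the "status is invalid" failure, but it can be silenced. A minimal sketch of the usual replacement, `asyncio.run`, follows; the body of `async_main` here is a placeholder, not the project's code.

```python
import asyncio


async def async_main():
    # Placeholder body; the real coroutine would drive the export.
    await asyncio.sleep(0)
    return 0


def main():
    # asyncio.run creates a new event loop, runs the coroutine to
    # completion, and closes the loop afterwards.
    return asyncio.run(async_main())
```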
weread_exporter/webpage.py", line 140, in check_valid
html = await utils.fetch(self._home_url)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/test/Downloads/weread-exporter-main/weread_exporter/utils.py", line 34, in fetch
raise RuntimeError("Fetch url %s failed" % url)
RuntimeError: Fetch url https://weread.qq.com/web/bookDetail/b2832e50811e73169g017ca5 failed
It always reports a timeout and the home page never opens. The browser itself does launch, but the content is not opened automatically.
I:\Program Files (x86)\weread-exporter-main>python -m weread_exporter -b ebd327f0718c8443ebdf735 -o epub
Fontconfig error: Cannot load default config file: No such file: (null)
[2024-03-15 22:38:08,977][INFO]Exporting book ebd327f0718c8443ebdf735
[2024-03-15 22:38:09,196][INFO][WeReadWebPage] Launch url https://weread.qq.com/web/bookDetail/ebd327f0718c8443ebdf735
[2024-03-15 22:38:09,728][INFO]Browser listening on: ws://127.0.0.1:54490/devtools/browser/b69386b8-79fb-4c0b-bc11-a4ffdf297a77
(process:4720): GLib-GIO-WARNING **: 22:38:09.956: Unexpectedly, UWP app Microsoft.OutlookForWindows_1.2023.1114.100_x64__8wekyb3d8bbwe' (AUMId
Microsoft.OutlookForWindows_8wekyb3d8bbwe!Microsoft.OutlookforWindows') supports 1 extensions but has no verbs
(process:4720): GLib-GIO-WARNING **: 22:38:10.025: Unexpectedly, UWP app Microsoft.ScreenSketch_11.2312.33.0_x64__8wekyb3d8bbwe' (AUMId
Microsoft.ScreenSketch_8wekyb3d8bbwe!App') supports 29 extensions but has no verbs
(process:4720): GLib-GIO-WARNING **: 22:38:10.332: Unexpectedly, UWP app Clipchamp.Clipchamp_2.9.3.0_neutral__yxz26nhyzhsrt' (AUMId
Clipchamp.Clipchamp_yxz26nhyzhsrt!App') supports 41 extensions but has no verbs
[2024-03-15 22:38:11,429][INFO][WeReadWebPage] Current login user is 微信用户
[2024-03-15 22:38:11,429][INFO][WeReadWebPage] Inject cookie wr_name=%E5%BE%AE%E4%BF%A1%E7%94%A8%E6%88%B7
[2024-03-15 22:38:11,430][INFO][WeReadWebPage] Inject cookie wr_localvid=ff032c4081ef580b9ff02b8
[2024-03-15 22:38:11,432][INFO][WeReadWebPage] Inject cookie wr_rt=web%40UOkhaelbwjfqfmAGQ3r_AL
[2024-03-15 22:38:11,433][INFO][WeReadWebPage] Inject cookie wr_gender=0
[2024-03-15 22:38:11,435][INFO][WeReadWebPage] Inject cookie wr_pf=0
[2024-03-15 22:38:11,436][INFO][WeReadWebPage] Inject cookie wr_avatar=
[2024-03-15 22:38:11,437][INFO][WeReadWebPage] Inject cookie wr_skey=js_pEZ8M
[2024-03-15 22:38:11,439][INFO][WeReadWebPage] Inject cookie wr_vid=519405753
[2024-03-15 22:38:11,440][INFO][WeReadWebPage] Inject cookie wr_fp=2600317417
[2024-03-15 22:38:11,441][INFO][WeReadWebPage] Inject cookie wr_gid=231410492
[2024-03-15 22:38:42,126][ERROR]Launch book ebd327f0718c8443ebdf735 home page failed
Traceback (most recent call last):
File "I:\Program Files (x86)\weread-exporter-main\weread_exporter_main_.py", line 85, in async_main
await page.launch(headless=args.headless, force_login=args.force_login)
File "I:\Program Files (x86)\weread-exporter-main\weread_exporter\webpage.py", line 233, in launch
await self.wait_for_avatar()
File "I:\Program Files (x86)\weread-exporter-main\weread_exporter\webpage.py", line 280, in wait_for_avatar
raise RuntimeError("Wait for avatar timeout")
RuntimeError: Wait for avatar timeout
C:\Portable\EdgeGPT\weread-exporter>python -m weread_exporter -b 4de32960813ab8721g011b12 -o epub --load-timeout=300
Fontconfig error: Cannot load default config file: No such file: (null)
[2024-01-25 21:45:27,245][INFO]Exporting book 4de32960813ab8721g011b12
[2024-01-25 21:45:27,544][INFO][WeReadWebPage] Launch url https://weread.qq.com/web/bookDetail/4de32960813ab8721g011b12
[2024-01-25 21:45:28,066][INFO]Browser listening on: ws://127.0.0.1:49973/devtools/browser/b93df7c1-5120-45a9-813a-8dcec06f9c9b
[2024-01-25 21:45:28,285][INFO][WeReadWebPage] Current login user is ***
[2024-01-25 21:45:28,286][INFO][WeReadWebPage] Inject cookie wr_rt=***
[2024-01-25 21:45:28,288][INFO][WeReadWebPage] Inject cookie wr_vid=***
[2024-01-25 21:45:28,289][INFO][WeReadWebPage] Inject cookie wr_fp=***
[2024-01-25 21:45:28,291][INFO][WeReadWebPage] Inject cookie wr_pf=***
[2024-01-25 21:45:28,292][INFO][WeReadWebPage] Inject cookie wr_gender=
[2024-01-25 21:45:28,293][INFO][WeReadWebPage] Inject cookie wr_gid=***
[2024-01-25 21:45:28,294][INFO][WeReadWebPage] Inject cookie wr_skey=***
[2024-01-25 21:45:28,296][INFO][WeReadWebPage] Inject cookie wr_avatar=
[2024-01-25 21:45:28,298][INFO][WeReadWebPage] Inject cookie wr_name=
[2024-01-25 21:45:28,299][INFO][WeReadWebPage] Inject cookie wr_localvid=
[2024-01-25 21:45:37,943][INFO][WeReadExporter] Check chapter 2/版权信息
[2024-01-25 21:45:37,943][INFO][WeReadExporter] Check chapter 3/太平洋战争(一):山雨欲来
[2024-01-25 21:45:37,944][INFO][WeReadExporter] File cache\4de32960813ab8721g011b12\chapters\2-3.md not exist
[2024-01-25 21:45:37,944][INFO][WeReadWebPage] Go to chapter 3
[2024-01-25 21:45:37,951][INFO][WeReadWebPage] Fetch url https://weread.qq.com/web/reader/4de32960813ab8721g011b12kecc32f3013eccbc87e4b62e
[2024-01-25 21:45:38,385][INFO][WeReadWebPage] Fetch url https://midas.gtimg.cn/midas/minipay_v2/jsapi/cashier.js
[2024-01-25 21:45:38,387][INFO][WeReadWebPage] Fetch url https://cdn.weread.qq.com/web/wpa.js
[2024-01-25 21:45:38,387][INFO][WeReadWebPage] Fetch url https://weread-1258476243.file.myqcloud.com/web/wrwebnjlogic/css/app.3e110853.css
[2024-01-25 21:45:38,388][INFO][WeReadWebPage] Fetch url https://weread-1258476243.file.myqcloud.com/web/wrwebnjlogic/js/app.da544679.js
Traceback (most recent call last):
File "D:\Researching\Python310\lib\runpy.py", line 196, in run_module_as_main
return run_code(code, main_globals, None,
File "D:\Researching\Python310\lib\runpy.py", line 86, in run_code
exec(code, run_globals)
File "C:\Portable\EdgeGPT\weread-exporter\weread_exporter_main.py", line 158, in
main()
File "C:\Portable\EdgeGPT\weread-exporter\weread_exporter_main.py", line 154, in main
loop.run_until_complete(async_main())
File "D:\Researching\Python310\lib\asyncio\base_events.py", line 649, in run_until_complete
return future.result()
File "C:\Portable\EdgeGPT\weread-exporter\weread_exporter_main.py", line 92, in async_main
await exporter.export_markdown(args.load_timeout, args.load_interval)
File "C:\Portable\EdgeGPT\weread-exporter\weread_exporter\export.py", line 353, in export_markdown
markdown = await self._page.get_markdown()
File "C:\Portable\EdgeGPT\weread-exporter\weread_exporter\webpage.py", line 374, in get_markdown
raise RuntimeError("Wait for creating markdown timeout")
RuntimeError: Wait for creating markdown timeout
python -m weread_exporter -b a4d32d90813ab787eg012068 -o pdf
It then reports "no module name weread exporter", even though I ran it from this folder.
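With `python -m`, Python looks for the package in the current directory and in the installed packages, so the command has to be run from the folder that contains the `weread_exporter` package directory, not from inside that directory. A small diagnostic sketch using only the standard library; `diagnose_module` is a hypothetical helper name.

```python
import importlib.util
import os


def diagnose_module(name, cwd=None):
    """Report whether a top-level module can be found from the given directory."""
    cwd = cwd or os.getcwd()
    return {
        "cwd": cwd,
        # True when a directory with the package name sits directly in cwd.
        "package_dir_in_cwd": os.path.isdir(os.path.join(cwd, name)),
        # True when the import system can locate the module at all.
        "importable": importlib.util.find_spec(name) is not None,
    }
```

Calling `diagnose_module("weread_exporter")` from the folder where the command was run shows at a glance whether the package directory is where Python expects it.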
hash: b5832a8072169a18b58c570
I tried many times with the same result; other books do not have this problem.
@macbook-pro weread-exporter-main % python -m weread_exporter -b b5832a8072169a18b58c570 -o epub
[2023-08-09 08:51:12,793][INFO]Exporting book b5832a8072169a18b58c570
[2023-08-09 08:51:12,920][INFO][WeReadWebPage] Launch url https://weread.qq.com/web/bookDetail/b5832a8072169a18b58c570
[2023-08-09 08:51:14,039][INFO]Browser listening on: ws://127.0.0.1:53070/devtools/browser/1fd9307a-20fb-43f1-ade3-ad8899ffeab5
[2023-08-09 08:51:14,815][INFO][WeReadWebPage] Update cookie wr_vid=*
[2023-08-09 08:51:14,815][INFO][WeReadWebPage] Update cookie wr_skey=*********
[2023-08-09 08:51:14,816][INFO][WeReadWebPage] Update cookie wr_pf=0
[2023-08-09 08:51:14,816][INFO][WeReadWebPage] Update cookie wr_rt=*******
[2023-08-09 08:51:14,910][INFO][WeReadWebPage] Current login user is ********
[2023-08-09 08:51:14,910][INFO][WeReadWebPage] Inject cookie wr_gender=1
[2023-08-09 08:51:14,914][INFO][WeReadWebPage] Inject cookie ***********************
[2023-08-09 08:51:14,915][INFO][WeReadWebPage] Inject cookie wr_name=****
[2023-08-09 08:51:14,917][INFO][WeReadWebPage] Inject cookie wr_rt=*********
[2023-08-09 08:51:14,918][INFO][WeReadWebPage] Inject cookie wr_skey=********
[2023-08-09 08:51:14,920][INFO][WeReadWebPage] Inject cookie wr_vid=******
[2023-08-09 08:51:14,921][INFO][WeReadWebPage] Inject cookie wr_fp=******
[2023-08-09 08:51:14,922][INFO][WeReadWebPage] Inject cookie wr_pf=0
[2023-08-09 08:51:14,924][INFO][WeReadWebPage] Inject cookie wr_localvid=******
[2023-08-09 08:51:14,925][INFO][WeReadWebPage] Inject cookie wr_gid=******
[2023-08-09 08:51:21,067][INFO][WeReadExporter] Check chapter 2/说明
[2023-08-09 08:51:21,067][INFO][WeReadExporter] File cache/b5832a8072169a18b58c570/chapters/1-2.md not exist
[2023-08-09 08:51:21,068][INFO][WeReadWebPage] Go to chapter 2
[2023-08-09 08:51:21,086][INFO][WeReadWebPage] Fetch url https://weread.qq.com/web/reader/b5832a8072169a18b58c570kc81322c012c81e728d9d180
[2023-08-09 08:51:21,354][INFO][WeReadWebPage] Fetch url https://midas.gtimg.cn/midas/minipay_v2/jsapi/cashier.js
[2023-08-09 08:51:21,354][INFO][WeReadWebPage] Fetch url https://weread-1258476243.file.myqcloud.com/web/wrwebnjlogic/css/app.4605d864.css
[2023-08-09 08:51:21,355][INFO][WeReadWebPage] Fetch url https://weread-1258476243.file.myqcloud.com/web/wrwebnjlogic/js/app.c15c0d84.js
Traceback (most recent call last):
File "", line 198, in _run_module_as_main
File "", line 88, in _run_code
File "/Users//Downloads/weread-exporter-main/weread_exporter/main.py", line 147, in
main()
File "/Users//Downloads/weread-exporter-main/weread_exporter/main.py", line 143, in main
loop.run_until_complete(async_main())
File "/opt/homebrew/Cellar/[email protected]/3.11.4/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "/Users//Downloads/weread-exporter-main/weread_exporter/main.py", line 83, in async_main
await exporter.export_markdown(args.load_timeout, args.load_interval)
File "/Users//Downloads/weread-exporter-main/weread_exporter/export.py", line 353, in export_markdown
markdown = await self._page.get_markdown()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/****/Downloads/weread-exporter-main/weread_exporter/webpage.py", line 363, in get_markdown
raise RuntimeError("Wait for creating markdown timeout")
RuntimeError: Wait for creating markdown timeout
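Environment: cloned the latest code.
Problem: fetching the chapters and downloading the md files and images all succeed, but the exported pdf contains no images at all.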
[2023-07-09 20:32:38,667][ERROR]Failed to load image at "file:///F:/git_repository/weread-exporter/cache/e1d326f0813ab7e2fg012a70/images/21b77ed54397af0db45f170a428c4abc.jpg" (Pixbuf error: Unrecognized image file format)
[2023-07-09 20:32:38,670][ERROR]Failed to load image at "file:///F:/git_repository/weread-exporter/cache/e1d326f0813ab7e2fg012a70/images/9bbc30619a54a5b1a33f22bdd3e2bd07.jpg" (Pixbuf error: Unrecognized image file format)
[2023-07-09 20:32:42,460][INFO]Save file output\软件单元测试.pdf complete
However, the images at those paths open normally in a browser or an image viewer, and the paths themselves are correct.
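One possible explanation, offered only as an assumption, is that a file saved with a .jpg extension actually holds another format; browsers detect the format from the content, while the pdf renderer's image loader may not support it. The sketch below checks the well-known leading bytes of common formats; `sniff_image_format` is a hypothetical helper.

```python
def sniff_image_format(data):
    """Guess an image format from the first bytes of the file content."""
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    # WebP is a RIFF container with "WEBP" at byte offset 8.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return "unknown"
```

Reading the first 16 bytes of one of the files named in the error log and passing them to this function would confirm or rule out a format mismatch.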
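Analysis: the md files reference images as ![](images/xxx.jpg)
but images is actually a sibling of chapter, the parent directory of the md files.
For the images to preview correctly in the md files, either:
1. Copy the images directory into the chapter directory.
or
2. Change the image references to ![](../images/xxx.jpg)
Either way, the exported pdf still has no images.
I cannot tell whether the author has not implemented image export for pdf, or whether there is a bug somewhere.
For now my only workaround is to merge all the md files into one document and export the pdf with pandoc.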
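Option 2 above can be applied in bulk rather than by hand. A minimal sketch, assuming every image reference has the exact form ![...](images/...); `rewrite_image_links` is a hypothetical helper, not part of the exporter.

```python
import re

# Matches the opening of a Markdown image whose target starts with "images/".
IMAGE_LINK = re.compile(r"(!\[[^\]]*\]\()images/")


def rewrite_image_links(markdown, prefix="../images/"):
    """Point image references at a different directory, leaving other text alone."""
    return IMAGE_LINK.sub(lambda m: m.group(1) + prefix, markdown)
```

When merging the chapters into one document for pandoc, the same function can be called with a prefix that is correct relative to the merged file.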
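After installing everything, running
python -m weread_exporter -b 6a032b60813ab71f0g01944f -o pdf --force-login
fails with the error below. I searched for many answers about cannot load library 'libcairo.so.2' and tried what they suggested, but nothing worked. I am not sure whether my local environment is at fault. I am new to Python and unfamiliar with much of this, so I am asking here for some direction.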
cat@catdeMac weread-exporter-main2 % python3 -m weread_exporter -b 6a032b60813ab71f0g01944f -o pdf --force-login
Traceback (most recent call last):
File "", line 198, in _run_module_as_main
File "", line 88, in _run_code
File "/Users/cat/Downloads/weread-exporter-main2/weread_exporter/main.py", line 147, in
main()
File "/Users/cat/Downloads/weread-exporter-main2/weread_exporter/main.py", line 143, in main
loop.run_until_complete(async_main())
File "/usr/local/Cellar/[email protected]/3.11.4_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "/Users/cat/Downloads/weread-exporter-main2/weread_exporter/main.py", line 16, in async_main
from . import export, utils, webpage
File "/Users/cat/Downloads/weread-exporter-main2/weread_exporter/export.py", line 12, in
from weasyprint import HTML, CSS
File "/usr/local/lib/python3.11/site-packages/weasyprint/init.py", line 469, in
from .css import preprocess_stylesheet # noqa isort:skip
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/weasyprint/css/init.py", line 27, in
from . import computed_values, counters, media_queries
File "/usr/local/lib/python3.11/site-packages/weasyprint/css/computed_values.py", line 15, in
from .. import text
File "/usr/local/lib/python3.11/site-packages/weasyprint/text.py", line 11, in
import cairocffi as cairo
File "/usr/local/lib/python3.11/site-packages/cairocffi/init.py", line 47, in
cairo = dlopen(
^^^^^^^
File "/usr/local/lib/python3.11/site-packages/cairocffi/init.py", line 44, in dlopen
raise OSError(error_message) # pragma: no cover
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: no library called "cairo-2" was found
no library called "cairo" was found
no library called "libcairo-2" was found
cannot load library 'libcairo.so.2': dlopen(libcairo.so.2, 0x0002): tried: 'libcairo.so.2' (no such file), '/System/Volumes/Preboot/Cryptexes/OSlibcairo.so.2' (no such file), '/usr/local/lib/libcairo.so.2' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/usr/local/lib/libcairo.so.2' (no such file), '/usr/lib/libcairo.so.2' (no such file, not in dyld cache), 'libcairo.so.2' (no such file), '/usr/local/lib/libcairo.so.2' (no such file), '/usr/lib/libcairo.so.2' (no such file, not in dyld cache). Additionally, ctypes.util.find_library() did not manage to locate a library called 'libcairo.so.2'
cannot load library 'libcairo.2.dylib': dlopen(libcairo.2.dylib, 0x0002): tried: 'libcairo.2.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OSlibcairo.2.dylib' (no such file), '/usr/local/lib/libcairo.2.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/usr/local/lib/libcairo.2.dylib' (no such file), '/usr/lib/libcairo.2.dylib' (no such file, not in dyld cache), 'libcairo.2.dylib' (no such file), '/usr/local/lib/libcairo.2.dylib' (no such file), '/usr/lib/libcairo.2.dylib' (no such file, not in dyld cache). Additionally, ctypes.util.find_library() did not manage to locate a library called 'libcairo.2.dylib'
cannot load library 'libcairo-2.dll': dlopen(libcairo-2.dll, 0x0002): tried: 'libcairo-2.dll' (no such file), '/System/Volumes/Preboot/Cryptexes/OSlibcairo-2.dll' (no such file), '/usr/local/lib/libcairo-2.dll' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/usr/local/lib/libcairo-2.dll' (no such file), '/usr/lib/libcairo-2.dll' (no such file, not in dyld cache), 'libcairo-2.dll' (no such file), '/usr/local/lib/libcairo-2.dll' (no such file), '/usr/lib/libcairo-2.dll' (no such file, not in dyld cache). Additionally, ctypes.util.find_library() did not manage to locate a library called 'libcairo-2.dll'
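I later removed every local Python environment and reinstalled, but the problem persists. After the reinstall I am using Python3; could that make a difference?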
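The traceback indicates that cairocffi could not locate the native cairo library, which is a system library rather than a Python package, so reinstalling Python alone would not provide it. The sketch below repeats the lookup that the error message mentions, `ctypes.util.find_library`; `find_native_libraries` is a hypothetical helper.

```python
import ctypes.util


def find_native_libraries(names=("cairo",)):
    """Map each library name to the path or name ctypes resolves, or None if not found."""
    return {name: ctypes.util.find_library(name) for name in names}
```

If the value for "cairo" is None, the library is missing or not on the loader's search path, and installing it through the operating system's package manager is the usual direction to look in.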
Error on Windows, with no further details. The full output follows:
Fontconfig error: Cannot load default config file: No such file: (null)
[2024-03-31 23:33:53,119][INFO]Exporting book 83132780716754a783196a7
[2024-03-31 23:33:54,637][WARNING]Book 83132780716754a783196a7 status is invalid, stop exporting
For an uploaded book, the bookinfo appears to be empty when fetched; is there a problem in the parsing?
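So far I have found two footnote styles in WeRead:
Shown in WeRead as an image superscript that opens a popup with the note when clicked, for example in the book https://weread.qq.com/web/reader/52f320a05cf65852f08359cka87322c014a87ff679a21ea
Corresponding code: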
<span class="reader_footer_note js_readerFooterNote" data-wr-footernote="《美国研究》曾刊登一篇论文,使用杰维斯的认知理论分析中美关系中的知觉与错误知觉。这是我看到的国内仅有的一篇用杰维斯国际政治心理学理论对中美关系所做实证性研究的文章。作者是杰维斯曾经执教过的美国加州大学洛杉矶分校的博士生。参见王栋:《超越国家利益:对20世纪90年代中美关系的知觉性解释》,载《美国研究》2001年第3期,第27—46页。另外,在一些关于西方国际关系理论的书中,对杰维斯和国际政治认知学派有简单的介绍。参见王逸舟:《西方国际政治学:历史与理论》,上海人民出版社1998年版,第三章第二节;倪世雄等:《当代西方国际关系理论》,复旦大学出版社2001年版,第四章第四节。"></span>
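Shown in WeRead as a link superscript that jumps to the note when clicked, for example in the book https://weread.qq.com/web/reader/f8932f4072305432f89f7aa
Corresponding code for the in-text superscript: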
<a id="w15"></a><a href=""><span class="super"><span>[15]</span></span></a>
Corresponding code for the footnote content:
<p class="note"><a id="m15"></a><a href=""><span>[15]</span></a><span> [美]S·E·佛罗斯特著,吴元训等译:《西方教育的历史和哲学基础》,华夏出版社1987年版,第170页。</span></p>
Currently the cached Markdown contains only the body text, without the notes:
一
1966年12月,大名鼎鼎的哲学家、**史家以赛亚·伯林到友人、著名美国学者埃
德蒙·威尔逊处做客。威尔逊在一则日记里提到,两人此间有过一次争论。伯林“变
得很激动,有时对人充满非理性的偏见”,威尔逊写道,“比如[对]汉娜·阿伦特,尽
管他从未读过她那本关于艾希曼的书”。在1987年发表在《耶鲁评论》上的一篇回
忆录里,伯林以同样的罪名讨伐威尔逊,并在1991年同威尔逊日记编辑的一次采访
中细述此事。我们不知道这次争执的最终结果,不过有一点我们是知道的:尽管
距离汉娜·阿伦特的《艾希曼在耶路撒冷:一份关于平庸的恶的报告》出版已经过
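The first footnote style should be fairly easy to handle; the second may be harder. Perhaps saving the raw format as html rather than Markdown would be more suitable? Markdown still loses a lot of information.
Related discussion: #48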
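For the first footnote style, a conversion could look roughly like the sketch below. It assumes the span always has the exact shape quoted above and targets the [^n] footnote syntax, which is a Markdown extension (supported by pandoc, among others) rather than core Markdown. `convert_footnotes` is a hypothetical helper, not part of the exporter.

```python
import html
import re

# Matches the empty footnote span and captures the note text from its attribute.
FOOTNOTE_SPAN = re.compile(
    r'<span class="reader_footer_note[^"]*" data-wr-footernote="([^"]*)"></span>'
)


def convert_footnotes(fragment):
    """Replace footnote spans with [^n] markers and return the note definitions."""
    notes = []

    def replace(match):
        # Attribute values may contain HTML entities such as &amp;.
        notes.append(html.unescape(match.group(1)))
        return "[^%d]" % len(notes)

    body = FOOTNOTE_SPAN.sub(replace, fragment)
    definitions = ["[^%d]: %s" % (i, text) for i, text in enumerate(notes, 1)]
    return body, definitions
```

The returned definitions would be appended at the end of the chapter's Markdown. The second style pairs anchors by id (w15 and m15 in the example), so it would need a separate matching step.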
pyppeteer.errors.TimeoutError: Navigation Timeout Exceeded: 30000 ms exceeded.
[2023-06-12 17:15:10,147][INFO][WeReadWebPage] Go to chapter 5
[2023-06-12 17:15:10,148][ERROR]connection unexpectedly closed
[2023-06-12 17:15:10,148][ERROR]Task exception was never retrieved
future: <Task finished name='Task-618' coro=<Connection._async_send() done, defined at F:\python_install\Lib\site-packages\pyppeteer\connection.py:69> exception=InvalidStateError('invalid state')>
Traceback (most recent call last):
File "F:\python_install\Lib\site-packages\websockets\legacy\protocol.py", line 1314, in close_connection
await self.transfer_data_task
File "F:\python_install\Lib\site-packages\websockets\legacy\protocol.py", line 979, in transfer_data
await asyncio.shield(self._put_message_waiter)
asyncio.exceptions.CancelledError
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "F:\python_install\Lib\site-packages\pyppeteer\connection.py", line 73, in _async_send
await self.connection.send(msg)
File "F:\python_install\Lib\site-packages\websockets\legacy\protocol.py", line 635, in send
await self.ensure_open()
File "F:\python_install\Lib\site-packages\websockets\legacy\protocol.py", line 944, in ensure_open
raise self.connection_closed_exc()
websockets.exceptions.ConnectionClosedError: sent 1000 (OK); no close frame received
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "F:\python_install\Lib\site-packages\pyppeteer\connection.py", line 79, in _async_send
await self.dispose()
File "F:\python_install\Lib\site-packages\pyppeteer\connection.py", line 170, in dispose
await self._on_close()
File "F:\python_install\Lib\site-packages\pyppeteer\connection.py", line 151, in _on_close
cb.set_exception(_rewriteError(
asyncio.exceptions.InvalidStateError: invalid state
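Could anyone advise how to solve this problem?
Book id: 71f323e0813ab70b3g01690b
For example: f4f32d30813ab7430g014017
Several images are missing from the appendix at the end.
I do not know whether images are also missing from the main text.
The table of contents of this one is also somewhat messy.
Also, rare characters that are rendered as images do not seem to fit the layout of the normal text around them; this book contains such image-form characters.
Could the author take a look? For example, the book 绍兴十二年.
The cache was fairly complete, but the final conversion to pdf failed, and as a result all files had to be downloaded again.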
c2332c70813ab7157g011108
This one hangs:
[2023-06-17 21:44:20,357][INFO][Exporter] Check chapter 7/第1
[2023-06-17 21:44:20,357][INFO][***Exporter] File cache\c2332c70813ab7157g011108\chapters\6-7.md not exist
[2023-06-17 21:44:20,357][INFO][WebPage] Go to chapter 7
[2023-06-17 21:44:20,367][INFO][WebPage] Fetch url https://..com/web/reader/c2332c70813ab7157g011108k8f132430178f14e45fce0f7
[2023-06-17 21:44:20,730][INFO][***WebPage] Fetch url https://midas.gtimg.cn/midas/minipay_v2/jsapi/cashier.js
[2023-06-17 21:44:20,731][INFO][WebPage] Fetch url https://-1258476243.file.myqcloud.com/web/wrwebnjlogic/css/app.4605d864.css
[2023-06-17 21:44:20,733][INFO][WebPage] Fetch url https://-1258476243.file.myqcloud.com/web/wrwebnjlogic/js/app.d42bbcf6.js
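[2023-06-17 21:44:24,030][INFO][***WebPage] Go to next page [stuck here]
For example, when fetching the book c8832370813ab7fdbg016f39, the copyright page hangs, the program then fetches it again and hangs again, and this repeats in a loop. Pressing ctrl+c to terminate produces the following errors: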
D:\Software\weread-exporter-main>python -m weread_exporter -b c8832370813ab7fdbg016f39 -o epub
[2023-08-01 20:07:58,420][INFO]Exporting book c8832370813ab7fdbg016f39
[2023-08-01 20:07:58,577][INFO][WeReadWebPage] Launch url https://weread.qq.com/web/bookDetail/c8832370813ab7fdbg016f39
[2023-08-01 20:07:59,110][INFO]Browser listening on: ws://127.0.0.1:50099/devtools/browser/8328d9f7-f693-45bd-bba7-3f599a42261e
[2023-08-01 20:08:05,902][INFO][WeReadExporter] Check chapter 2/版权信息
[2023-08-01 20:08:05,902][INFO][WeReadExporter] File cache\c8832370813ab7fdbg016f39\chapters\1-2.md not exist
[2023-08-01 20:08:05,902][INFO][WeReadWebPage] Go to chapter 2
[2023-08-01 20:08:05,922][INFO][WeReadWebPage] Fetch url https://weread.qq.com/web/reader/c8832370813ab7fdbg016f39kc81322c012c81e728d9d180
[2023-08-01 20:08:06,209][INFO][WeReadWebPage] Fetch url https://midas.gtimg.cn/midas/minipay_v2/jsapi/cashier.js
[2023-08-01 20:08:06,211][INFO][WeReadWebPage] Fetch url https://weread-1258476243.file.myqcloud.com/web/wrwebnjlogic/css/app.4605d864.css
[2023-08-01 20:08:06,212][INFO][WeReadWebPage] Fetch url https://weread-1258476243.file.myqcloud.com/web/wrwebnjlogic/js/app.27ff86e3.js
[2023-08-01 20:08:35,910][WARNING]Load chapter failed, close browser and retry
[2023-08-01 20:08:35,911][INFO]terminate chrome process...
[2023-08-01 20:08:35,911][ERROR]connection unexpectedly closed
[2023-08-01 20:08:35,911][ERROR]Task exception was never retrieved
future: <Task finished name='Task-275' coro=<Connection._async_send() done, defined at C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\site-packages\pyppeteer\connection.py:69> exception=InvalidStateError('invalid state')>
Traceback (most recent call last):
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\site-packages\websockets\legacy\protocol.py", line 979, in transfer_data
await asyncio.shield(self._put_message_waiter)
asyncio.exceptions.CancelledError
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\site-packages\pyppeteer\connection.py", line 73, in _async_send
await self.connection.send(msg)
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\site-packages\websockets\legacy\protocol.py", line 635, in send
await self.ensure_open()
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\site-packages\websockets\legacy\protocol.py", line 944, in ensure_open
raise self.connection_closed_exc()
websockets.exceptions.ConnectionClosedError: sent 1000 (OK); no close frame received
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\site-packages\pyppeteer\connection.py", line 79, in _async_send
await self.dispose()
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\site-packages\pyppeteer\connection.py", line 170, in dispose
await self._on_close()
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\site-packages\pyppeteer\connection.py", line 151, in _on_close
cb.set_exception(_rewriteError(
asyncio.exceptions.InvalidStateError: invalid state
[2023-08-01 20:08:36,039][INFO][WeReadWebPage] Launch url https://weread.qq.com/web/bookDetail/c8832370813ab7fdbg016f39
[2023-08-01 20:08:36,575][INFO]Browser listening on: ws://127.0.0.1:50160/devtools/browser/5153350a-c749-4f25-9074-d84de4c8869a
[2023-08-01 20:08:42,724][INFO][WeReadExporter] Check chapter 2/版权信息
[2023-08-01 20:08:42,724][INFO][WeReadExporter] File cache\c8832370813ab7fdbg016f39\chapters\1-2.md not exist
[2023-08-01 20:08:42,724][INFO][WeReadWebPage] Go to chapter 2
[2023-08-01 20:08:42,735][INFO][WeReadWebPage] Fetch url https://weread.qq.com/web/reader/c8832370813ab7fdbg016f39kc81322c012c81e728d9d180
[2023-08-01 20:08:42,961][INFO][WeReadWebPage] Fetch url https://midas.gtimg.cn/midas/minipay_v2/jsapi/cashier.js
[2023-08-01 20:08:42,962][INFO][WeReadWebPage] Fetch url https://weread-1258476243.file.myqcloud.com/web/wrwebnjlogic/css/app.4605d864.css
[2023-08-01 20:08:42,963][INFO][WeReadWebPage] Fetch url https://weread-1258476243.file.myqcloud.com/web/wrwebnjlogic/js/app.27ff86e3.js
Traceback (most recent call last):
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 197, in _run_module_as_main
return run_code(code, main_globals, None,
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 87, in run_code
exec(code, run_globals)
File "D:\Software\weread-exporter-main\weread_exporter_main.py", line 147, in
main()
File "D:\Software\weread-exporter-main\weread_exporter_main.py", line 143, in main
loop.run_until_complete(async_main())
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\asyncio\base_events.py", line 629, in run_until_complete
self.run_forever()
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\asyncio\windows_events.py", line 321, in run_forever
super().run_forever()
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\asyncio\base_events.py", line 596, in run_forever
self._run_once()
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\asyncio\base_events.py", line 1854, in _run_once
event_list = self._selector.select(timeout)
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\asyncio\windows_events.py", line 439, in select
self._poll(timeout)
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\asyncio\windows_events.py", line 788, in _poll
status = _overlapped.GetQueuedCompletionStatus(self._iocp, ms)
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\site-packages\pyppeteer\launcher.py", line 153, in _close_process
self._loop.run_until_complete(self.killChrome())
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\asyncio\base_events.py", line 618, in run_until_complete
self._check_running()
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\asyncio\base_events.py", line 578, in _check_running
raise RuntimeError('This event loop is already running')
RuntimeError: This event loop is already running
[2023-08-01 20:08:54,035][INFO]terminate chrome process...
[2023-08-01 20:08:54,035][ERROR]connection unexpectedly closed
[2023-08-01 20:08:54,035][ERROR]Task exception was never retrieved
future: <Task finished name='Task-544' coro=<Connection._async_send() done, defined at C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\site-packages\pyppeteer\connection.py:69> exception=InvalidStateError('invalid state')>
Traceback (most recent call last):
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\site-packages\pyppeteer\connection.py", line 73, in _async_send
await self.connection.send(msg)
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\site-packages\websockets\legacy\protocol.py", line 635, in send
await self.ensure_open()
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\site-packages\websockets\legacy\protocol.py", line 944, in ensure_open
raise self.connection_closed_exc()
websockets.exceptions.ConnectionClosedError: sent 1000 (OK); no close frame received
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\site-packages\pyppeteer\connection.py", line 79, in _async_send
await self.dispose()
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\site-packages\pyppeteer\connection.py", line 170, in dispose
await self._on_close()
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\site-packages\pyppeteer\connection.py", line 151, in _on_close
cb.set_exception(_rewriteError(
asyncio.exceptions.InvalidStateError: invalid state
[2023-08-01 20:08:54,136][ERROR]Task exception was never retrieved
future: <Task finished name='Task-4' coro=<Connection._recv_loop() done, defined at C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\site-packages\pyppeteer\connection.py:53> exception=UnicodeEncodeError('gbk', '[https://weread.qq.com/web/reader/c8832370813ab7fdbg016f39kc81322c012c81e728d9d180] fillText © 0 881.3333339691162 JSHandle@array\r\n', 93, 94, 'illegal multibyte sequence')>
Traceback (most recent call last):
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\site-packages\pyppeteer\connection.py", line 61, in _recv_loop
await self._on_message(resp)
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\site-packages\pyppeteer\connection.py", line 143, in _on_message
self._on_query(msg)
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\site-packages\pyppeteer\connection.py", line 123, in _on_query
session._on_message(params.get('message'))
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\site-packages\pyppeteer\connection.py", line 276, in _on_message
self.emit(obj.get('method'), obj.get('params'))
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\site-packages\pyee\_base.py", line 115, in emit
handled = self._call_handlers(event, args, kwargs)
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\site-packages\pyee\_base.py", line 98, in _call_handlers
self._emit_run(f, args, kwargs)
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\site-packages\pyee\_base.py", line 83, in _emit_run
f(*args, **kwargs)
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\site-packages\pyppeteer\page.py", line 184, in <lambda>
client.on('Runtime.consoleAPICalled', lambda event: self._onConsoleAPI(event))
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\site-packages\pyppeteer\page.py", line 692, in _onConsoleAPI
self._addConsoleMessage(event['type'], values)
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\site-packages\pyppeteer\page.py", line 729, in _addConsoleMessage
self.emit(Page.Events.Console, message)
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\site-packages\pyee\_base.py", line 115, in emit
handled = self._call_handlers(event, args, kwargs)
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\site-packages\pyee\_base.py", line 98, in _call_handlers
self._emit_run(f, args, kwargs)
File "C:\Users\huyuj\AppData\Local\Programs\Python\Python39\lib\site-packages\pyee\_base.py", line 83, in _emit_run
f(*args, **kwargs)
File "D:\Software\weread-exporter-main\weread_exporter\webpage.py", line 234, in handle_log
fp.write("[%s] %s\n" % (self._url, message.text))
UnicodeEncodeError: 'gbk' codec can't encode character '\xa9' in position 93: illegal multibyte sequence
[2023-08-01 20:08:54,182][ERROR]Task was destroyed but it is pending!
task: <Task pending name='Task-179' coro=<WeReadWebPage._handle_request() running at D:\Software\weread-exporter-main\weread_exporter\webpage.py:337> wait_for=<Future pending cb=[<TaskWakeupMethWrapper object at 0x0000025681125D30>()]>>
sys:1: RuntimeWarning: coroutine 'Launcher.killChrome' was never awaited
For some books, the extraction also carries over the line breaks shown on the web page. Is there a way to remove these line breaks?
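One way to approach this outside the tool is to post-process the exported Markdown. The following is a minimal sketch, not a feature of weread-exporter: `unwrap_paragraphs` is a hypothetical helper that joins single line breaks inside a paragraph and keeps blank lines as paragraph separators. It assumes Chinese text, so lines are joined without a space, and it does not try to protect headings, lists, or code blocks.

```python
import re


def unwrap_paragraphs(text):
    """Join single line breaks inside paragraphs; keep blank-line breaks.

    Hypothetical helper. Lines are joined with no separator, which suits
    Chinese text; Latin-script text would need a space instead.
    """
    # Paragraphs are separated by one or more blank lines.
    paragraphs = re.split(r"\n\s*\n", text)
    joined = [
        "".join(line.strip() for line in paragraph.split("\n"))
        for paragraph in paragraphs
    ]
    return "\n\n".join(joined)


if __name__ == "__main__":
    sample = "第一行\n第二行\n\n新段落"
    print(unwrap_paragraphs(sample))
```

Running it over a copy of the chapter files first would be prudent, since the joining is lossy.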
ok
Some images are missing, and I don't know why.
pyppeteer.errors.TimeoutError: Navigation Timeout Exceeded: 30000 ms exceeded.
[2023-06-13 20:33:17,510][INFO]terminate chrome process...
I get an error when running it. Could you take a look at what went wrong? Thanks!
[2023-12-09 00:23:02,096][INFO]Exporting book b4a32760813ab8187g015f3f
[2023-12-09 00:23:02,377][ERROR]Fetch url https://weread.qq.com/web/bookDetail/b4a32760813ab8187g015f3f failed
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/aiohttp/connector.py", line 992, in _wrap_create_connection
return await self._loop.create_connection(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/base_events.py", line 1112, in create_connection
transport, protocol = await self._create_connection_transport(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/base_events.py", line 1145, in _create_connection_transport
await waiter
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/sslproto.py", line 574, in _on_handshake_complete
raise handshake_exc
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/sslproto.py", line 556, in _do_handshake
self._sslobj.do_handshake()
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/ssl.py", line 979, in do_handshake
self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:992)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/oilo/DEV/Projects/Proj-VSCode/drunkdream-weread-exporter/weread_exporter/utils.py", line 28, in fetch
async with session.get(url, headers=headers) as response:
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/aiohttp/client.py", line 1187, in __aenter__
self._resp = await self._coro
^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/aiohttp/client.py", line 574, in _request
conn = await self._connector.connect(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/aiohttp/connector.py", line 544, in connect
proto = await self._create_connection(req, traces, timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/aiohttp/connector.py", line 911, in _create_connection
_, proto = await self._create_direct_connection(req, traces, timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/aiohttp/connector.py", line 1235, in _create_direct_connection
raise last_exc
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/aiohttp/connector.py", line 1204, in _create_direct_connection
transp, proto = await self._wrap_create_connection(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/aiohttp/connector.py", line 994, in _wrap_create_connection
raise ClientConnectorCertificateError(req.connection_key, exc) from exc
aiohttp.client_exceptions.ClientConnectorCertificateError: Cannot connect to host weread.qq.com:443 ssl:True [SSLCertVerificationError: (1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:992)')]
[2023-12-09 00:23:02,537][ERROR]Fetch url https://weread.qq.com/web/bookDetail/b4a32760813ab8187g015f3f failed
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/aiohttp/connector.py", line 992, in _wrap_create_connection
return await self._loop.create_connection(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/base_events.py", line 1112, in create_connection
transport, protocol = await self._create_connection_transport(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/base_events.py", line 1145, in _create_connection_transport
await waiter
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/sslproto.py", line 574, in _on_handshake_complete
raise handshake_exc
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/sslproto.py", line 556, in _do_handshake
self._sslobj.do_handshake()
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/ssl.py", line 979, in do_handshake
self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:992)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/oilo/DEV/Projects/Proj-VSCode/drunkdream-weread-exporter/weread_exporter/utils.py", line 28, in fetch
async with session.get(url, headers=headers) as response:
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/aiohttp/client.py", line 1187, in __aenter__
self._resp = await self._coro
^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/aiohttp/client.py", line 574, in _request
conn = await self._connector.connect(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/aiohttp/connector.py", line 544, in connect
proto = await self._create_connection(req, traces, timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/aiohttp/connector.py", line 911, in _create_connection
_, proto = await self._create_direct_connection(req, traces, timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/aiohttp/connector.py", line 1235, in _create_direct_connection
raise last_exc
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/aiohttp/connector.py", line 1204, in _create_direct_connection
transp, proto = await self._wrap_create_connection(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/aiohttp/connector.py", line 994, in _wrap_create_connection
raise ClientConnectorCertificateError(req.connection_key, exc) from exc
aiohttp.client_exceptions.ClientConnectorCertificateError: Cannot connect to host weread.qq.com:443 ssl:True [SSLCertVerificationError: (1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:992)')]
[2023-12-09 00:23:02,694][ERROR]Fetch url https://weread.qq.com/web/bookDetail/b4a32760813ab8187g015f3f failed
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/aiohttp/connector.py", line 992, in _wrap_create_connection
return await self._loop.create_connection(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/base_events.py", line 1112, in create_connection
transport, protocol = await self._create_connection_transport(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/base_events.py", line 1145, in _create_connection_transport
await waiter
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/sslproto.py", line 574, in _on_handshake_complete
raise handshake_exc
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/sslproto.py", line 556, in _do_handshake
self._sslobj.do_handshake()
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/ssl.py", line 979, in do_handshake
self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:992)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/oilo/DEV/Projects/Proj-VSCode/drunkdream-weread-exporter/weread_exporter/utils.py", line 28, in fetch
async with session.get(url, headers=headers) as response:
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/aiohttp/client.py", line 1187, in __aenter__
self._resp = await self._coro
^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/aiohttp/client.py", line 574, in _request
conn = await self._connector.connect(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/aiohttp/connector.py", line 544, in connect
proto = await self._create_connection(req, traces, timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/aiohttp/connector.py", line 911, in _create_connection
_, proto = await self._create_direct_connection(req, traces, timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/aiohttp/connector.py", line 1235, in _create_direct_connection
raise last_exc
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/aiohttp/connector.py", line 1204, in _create_direct_connection
transp, proto = await self._wrap_create_connection(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/aiohttp/connector.py", line 994, in _wrap_create_connection
raise ClientConnectorCertificateError(req.connection_key, exc) from exc
aiohttp.client_exceptions.ClientConnectorCertificateError: Cannot connect to host weread.qq.com:443 ssl:True [SSLCertVerificationError: (1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:992)')]
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/Users/oilo/DEV/Projects/Proj-VSCode/drunkdream-weread-exporter/weread_exporter/__main__.py", line 147, in <module>
main()
File "/Users/oilo/DEV/Projects/Proj-VSCode/drunkdream-weread-exporter/weread_exporter/__main__.py", line 143, in main
loop.run_until_complete(async_main())
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "/Users/oilo/DEV/Projects/Proj-VSCode/drunkdream-weread-exporter/weread_exporter/__main__.py", line 67, in async_main
if not await page.check_valid():
^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/oilo/DEV/Projects/Proj-VSCode/drunkdream-weread-exporter/weread_exporter/webpage.py", line 141, in check_valid
html = await utils.fetch(self._home_url)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/oilo/DEV/Projects/Proj-VSCode/drunkdream-weread-exporter/weread_exporter/utils.py", line 38, in fetch
raise RuntimeError("Fetch url %s failed" % url)
RuntimeError: Fetch url https://weread.qq.com/web/bookDetail/b4a32760813ab8187g015f3f failed
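The traceback above fails during certificate verification, before any WeRead page is fetched. On python.org builds for macOS this message commonly means the interpreter has no CA bundle installed, and running the `Install Certificates.command` script that ships in the Python application folder is the usual remedy. As a hedged diagnostic sketch (the function name `describe_ca_config` is hypothetical), the standard library can report where this interpreter looks for CA certificates:

```python
import os
import ssl


def describe_ca_config():
    """Report where this Python's OpenSSL looks for CA certificates.

    Hypothetical diagnostic helper; it only reads configuration and does
    not change how certificates are verified.
    """
    paths = ssl.get_default_verify_paths()
    return {
        # Resolved locations (None when the file or directory is absent).
        "cafile": paths.cafile,
        "capath": paths.capath,
        # The location OpenSSL was built to look in.
        "openssl_cafile": paths.openssl_cafile,
        "cafile_exists": bool(paths.cafile) and os.path.exists(paths.cafile),
        "capath_exists": bool(paths.capath) and os.path.isdir(paths.capath),
    }


if __name__ == "__main__":
    for key, value in describe_ca_config().items():
        print(key, "=", value)
```

If both `cafile_exists` and `capath_exists` come back false, a missing CA bundle is a likely explanation for `unable to get local issuer certificate`.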
Suggestion: take a look at similar projects. It would be perfect if annotations could be exported along with the text.
[2023-10-28 13:58:07,233][INFO]Exporting book f8a32350813ab71e0g015d0c
[2023-10-28 13:58:07,418][INFO][WeReadWebPage] Launch url https://weread.qq.com/web/bookDetail/f8a32350813ab71e0g015d0c
Traceback (most recent call last):
File "/usr/local/Cellar/[email protected]/3.9.18/Frameworks/Python.framework/Versions/3.9/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/local/Cellar/[email protected]/3.9.18/Frameworks/Python.framework/Versions/3.9/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/Users/maq/PyProject/weread-exporter/weread_exporter/__main__.py", line 147, in <module>
main()
File "/Users/maq/PyProject/weread-exporter/weread_exporter/__main__.py", line 143, in main
loop.run_until_complete(async_main())
File "/usr/local/Cellar/[email protected]/3.9.18/Frameworks/Python.framework/Versions/3.9/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
return future.result()
File "/Users/maq/PyProject/weread-exporter/weread_exporter/__main__.py", line 77, in async_main
await page.launch(headless=args.headless, force_login=args.force_login)
File "/Users/maq/PyProject/weread-exporter/weread_exporter/webpage.py", line 153, in launch
self._browser = await pyppeteer.launch(
File "/Users/maq/PyProject/weread-exporter/venv/lib/python3.9/site-packages/pyppeteer/launcher.py", line 307, in launch
return await Launcher(options, **kwargs).launch()
File "/Users/maq/PyProject/weread-exporter/venv/lib/python3.9/site-packages/pyppeteer/launcher.py", line 168, in launch
self.browserWSEndpoint = get_ws_endpoint(self.url)
File "/Users/maq/PyProject/weread-exporter/venv/lib/python3.9/site-packages/pyppeteer/launcher.py", line 227, in get_ws_endpoint
raise BrowserError('Browser closed unexpectedly:\n')
pyppeteer.errors.BrowserError: Browser closed unexpectedly:
The environment is configured correctly.
Runtime environment
Kernel version: #1 SMP PREEMPT_DYNAMIC Debian 6.3.7-1kali1 (2023-06-29) x86_64 GNU/Linux
Python version: Python 3.11.4
Chrome version: Google Chrome 116.0.5845.96 unknown
Error output:
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/home/kali/python/weread-exporter/weread_exporter/__main__.py", line 147, in <module>
main()
File "/home/kali/python/weread-exporter/weread_exporter/__main__.py", line 143, in main
loop.run_until_complete(async_main())
File "/usr/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "/home/kali/python/weread-exporter/weread_exporter/__main__.py", line 83, in async_main
await exporter.export_markdown(args.load_timeout, args.load_interval)
File "/home/kali/python/weread-exporter/weread_exporter/export.py", line 353, in export_markdown
markdown = await self._page.get_markdown()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/kali/python/weread-exporter/weread_exporter/webpage.py", line 363, in get_markdown
raise RuntimeError("Wait for creating markdown timeout")
RuntimeError: Wait for creating markdown timeout
[2023-08-16 22:18:33,692][INFO]terminate chrome process...
Behavior: Chrome launches successfully and starts capturing the first page; the error appears a few seconds later.
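As a member, is it impossible to export once the paid content is reached? Books I purchased myself export without problems, but for books I have not purchased, the export gets stuck at the paid content, restarts endlessly, and cannot continue.
The error output is as follows: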
[2023-12-21 10:51:15,028][WARNING]Load chapter failed, close browser and retry
[2023-12-21 10:51:15,028][INFO]terminate chrome process...
[2023-12-21 10:51:15,029][ERROR]connection unexpectedly closed
[2023-12-21 10:51:15,029][ERROR]Task exception was never retrieved
future: <Task finished name='Task-551' coro=<Connection._async_send() done, defined at C:\Python312\Lib\site-packages\pyppeteer\connection.py:69> exception=InvalidStateError('invalid state')>
Traceback (most recent call last):
File "C:\Python312\Lib\site-packages\websockets\legacy\protocol.py", line 1314, in close_connection
await self.transfer_data_task
File "C:\Python312\Lib\site-packages\websockets\legacy\protocol.py", line 979, in transfer_data
await asyncio.shield(self._put_message_waiter)
asyncio.exceptions.CancelledError
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Python312\Lib\site-packages\pyppeteer\connection.py", line 73, in _async_send
await self.connection.send(msg)
File "C:\Python312\Lib\site-packages\websockets\legacy\protocol.py", line 635, in send
await self.ensure_open()
File "C:\Python312\Lib\site-packages\websockets\legacy\protocol.py", line 944, in ensure_open
raise self.connection_closed_exc()
websockets.exceptions.ConnectionClosedError: sent 1000 (OK); no close frame received
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Python312\Lib\site-packages\pyppeteer\connection.py", line 79, in _async_send
await self.dispose()
File "C:\Python312\Lib\site-packages\pyppeteer\connection.py", line 170, in dispose
await self._on_close()
File "C:\Python312\Lib\site-packages\pyppeteer\connection.py", line 151, in _on_close
cb.set_exception(_rewriteError(
asyncio.exceptions.InvalidStateError: invalid state
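The log above shows the "close browser and retry" path repeating without end. As an illustration only, and not a description of how weread-exporter handles retries, a retry loop can be given an upper bound so that an unrecoverable chapter ends the run with a clear error. The helper name `retry_bounded` and its parameters are hypothetical.

```python
import asyncio


async def retry_bounded(make_attempt, max_attempts=3, delay=0.0):
    """Await make_attempt() until it succeeds or max_attempts is reached.

    Hypothetical helper: make_attempt is a zero-argument callable that
    returns a fresh coroutine on every call.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return await make_attempt()
        except Exception as exc:  # deliberately broad for this sketch
            last_error = exc
            if attempt < max_attempts:
                await asyncio.sleep(delay)
    raise RuntimeError("gave up after %d attempts" % max_attempts) from last_error


if __name__ == "__main__":
    async def always_fails():
        raise ValueError("chapter not available")

    try:
        asyncio.run(retry_bounded(always_fails, max_attempts=2))
    except RuntimeError as exc:
        print(exc)
```

Passing a callable rather than a coroutine object matters here, because a coroutine can only be awaited once.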
weread-exporter/weread_exporter/webpage.py
Line 149 in cb78efe
Please keep this project maintained.
C:\temp\Kindle\weread-exporter-main>python -m weread_exporter -b 4e132bc07263ff664e11075 -o epub -o pdf --force-login
Fontconfig error: Cannot load default config file: No such file: (null)
[2024-03-12 22:05:12,303][INFO]Exporting book 4e132bc07263ff664e11075
[2024-03-12 22:05:12,500][INFO][WeReadWebPage] Launch url https://weread.qq.com/web/bookDetail/4e132bc07263ff664e11075
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "C:\temp\Kindle\weread-exporter-main\weread_exporter\__main__.py", line 158, in <module>
main()
File "C:\temp\Kindle\weread-exporter-main\weread_exporter\__main__.py", line 154, in main
loop.run_until_complete(async_main())
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\asyncio\base_events.py", line 685, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "C:\temp\Kindle\weread-exporter-main\weread_exporter\__main__.py", line 85, in async_main
await page.launch(headless=args.headless, force_login=args.force_login)
File "C:\temp\Kindle\weread-exporter-main\weread_exporter\webpage.py", line 174, in launch
chrome = self._check_chrome()
^^^^^^^^^^^^^^^^^^^^
File "C:\temp\Kindle\weread-exporter-main\weread_exporter\webpage.py", line 167, in _check_chrome
raise utils.ChromeNotInstalledError(
weread_exporter.utils.ChromeNotInstalledError: Please make sure chrome is installed, and the install path is added to PATH environment.
You can test that with where chrome command.
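I looked into it. The Fontconfig error seems to be common, but most of the fixes are for Linux and I have no leads for Windows. Do I need to add the font file path to PATH?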
Cause: readerFooter contains no readerFooter_button element; it contains readerFooter_ending_title instead.
Changing the function to the following fixes it.
# Method of the page class in webpage.py; it relies on that module's
# existing asyncio, logging, pyppeteer and utils imports.
async def _check_next_page(self):
    while True:
        result = ''
        try:
            # Wait for the footer container instead of the button, because
            # the last page of a book has no button.
            await self.wait_for_selector(
                # "button.readerFooter_button", timeout=59000
                "div.readerFooter", timeout=59000
            )
            try:
                result = await self._page.evaluate(
                    "document.getElementsByClassName('readerFooter_button')[0].innerText;"
                )
            except pyppeteer.errors.ElementHandleError:
                # No button found: fall back to the end-of-book title.
                logging.info("[%s] load selector ElementHandleError " % self.__class__.__name__)
                result = await self._page.evaluate(
                    "document.getElementsByClassName('readerFooter_ending_title')[0].innerText;"
                )
        except pyppeteer.errors.TimeoutError:
            logging.info("[%s] load selector timeout " % self.__class__.__name__)
            break
        if result == "下一页":  # "Next page"
            logging.info("[%s] Go to next page" % self.__class__.__name__)
            await self._page.evaluate(
                r"canvasContextHandler.data.markdown += '\n\n';"
            )
            await self.pre_load_page()
            await self._page.click("button.readerFooter_button")
            await asyncio.sleep(1)
        elif result == "下一章":  # "Next chapter"
            break
        elif result.startswith("登录"):  # "Log in"
            raise utils.LoginRequiredError()
        elif result == "全 书 完":  # "End of book"
            break
        else:
            raise NotImplementedError(result)
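The branch at the end of that function maps the footer text to one of four outcomes. As a sketch only, the same mapping can be pulled into a pure function that is easy to test without a browser. The name `classify_footer_text` and the returned action names are hypothetical and do not exist in weread-exporter; the Chinese strings are the ones matched in the function above.

```python
def classify_footer_text(text):
    """Map the reader footer's text to an action name (hypothetical helper)."""
    if text == "下一页":  # "Next page"
        return "next_page"
    if text == "下一章":  # "Next chapter": the current chapter is finished
        return "chapter_done"
    if text.startswith("登录"):  # "Log in ...": the rest requires login
        return "login_required"
    if text == "全 书 完":  # "End of book", from readerFooter_ending_title
        return "book_done"
    # Unknown footer text: fail loudly, as the original function does.
    raise NotImplementedError(text)


if __name__ == "__main__":
    for sample in ("下一页", "下一章", "登录后阅读", "全 书 完"):
        print(sample, "->", classify_footer_text(sample))
```

Keeping the unknown-text case as an exception preserves the original behavior of surfacing new footer variants instead of silently skipping them.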
When exporting chapter 2 of the book 63f32a40813ab7cd5g011236, the following error occurs: UnicodeEncodeError: 'gbk' codec can't encode character '\xa0' in position 7: illegal multibyte sequence.
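This kind of error typically appears when a file is opened without an explicit encoding on a Windows system whose locale encoding is GBK, which has no mapping for characters such as U+00A0 or U+00A9. A minimal sketch of the usual workaround, assuming the failing write targets a text file, is to pass `encoding="utf-8"` to `open()`. The helper name `append_log_line` and the file name are hypothetical, not part of weread-exporter.

```python
import os
import tempfile


def append_log_line(path, url, text):
    """Append one log line, always encoding as UTF-8.

    Hypothetical helper: the explicit encoding keeps open() from falling
    back to the locale encoding (for example GBK on Chinese Windows).
    """
    with open(path, "a", encoding="utf-8") as fp:
        fp.write("[%s] %s\n" % (url, text))


if __name__ == "__main__":
    log_path = os.path.join(tempfile.mkdtemp(), "console.log")
    # U+00A9 and U+00A0 are the characters from the reported errors.
    append_log_line(log_path, "https://weread.qq.com/", "fillText \xa9 0 \xa0")
    with open(log_path, encoding="utf-8") as fp:
        print(repr(fp.read()))
```

On Python 3.7 and later, setting the environment variable `PYTHONUTF8=1` before running is another option that may help, since it makes UTF-8 the default text encoding.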
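After I run the code and log in by scanning the QR code, the window closes by itself after a short while. What should I do? (Tried on Windows 10 + Python 3.9 with Chrome 113 and 114.)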
D:\weread-exporter>python -m weread_exporter -b 08232ac0720befa90825d88 -o epub -o pdf
[2023-05-11 13:56:23,565][INFO]Exporting book 08232ac0720befa90825d88
[2023-05-11 13:56:23,941][INFO][WeReadWebPage] Launch url https://weread.qq.com/web/bookDetail/08232ac0720befa90825d88
[2023-05-11 13:56:24,505][INFO]Browser listening on: ws://127.0.0.1:52744/devtools/browser/1cbe3df2-9107-4d13-9587-b14fd21ceebf
Traceback (most recent call last):
File "D:\Program Files\Python39\lib\runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "D:\Program Files\Python39\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "D:\weread-exporter\weread_exporter\__main__.py", line 126, in <module>
main()
File "D:\weread-exporter\weread_exporter\__main__.py", line 122, in main
loop.run_until_complete(async_main())
File "D:\Program Files\Python39\lib\asyncio\base_events.py", line 647, in run_until_complete
return future.result()
File "D:\weread-exporter\weread_exporter\__main__.py", line 63, in async_main
await page.launch(args.force_login)
File "D:\weread-exporter\weread_exporter\webpage.py", line 170, in launch
await self._page.waitForSelector("div.readerFooter a")
File "D:\Program Files\Python39\Lib\site-packages\pyppeteer\frame_manager.py", line 855, in __await__
raise result
pyppeteer.errors.TimeoutError: Waiting for selector "div.readerFooter a" failed: timeout 30000ms exceeds.
[2023-05-11 13:56:57,102][INFO]terminate chrome process...
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "C:\Users\daile\Desktop\tt\weread-exporter\weread_exporter\__main__.py", line 158, in <module>
main()
File "C:\Users\daile\Desktop\tt\weread-exporter\weread_exporter\__main__.py", line 154, in main
loop.run_until_complete(async_main())
File "C:\Python312\Lib\asyncio\base_events.py", line 684, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "C:\Users\daile\Desktop\tt\weread-exporter\weread_exporter\__main__.py", line 17, in async_main
from . import export, utils, webpage
File "C:\Users\daile\Desktop\tt\weread-exporter\weread_exporter\export.py", line 14, in <module>
from . import utils
File "C:\Users\daile\Desktop\tt\weread-exporter\weread_exporter\utils.py", line 4, in <module>
import aiohttp
ModuleNotFoundError: No module named 'aiohttp'
Fontconfig error: Cannot load default config file: No such file: (null)
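🌹 A genius project, the best one out there 🌹
Chrome cannot be found on Mac
Google Chrome installs its executable under the name Google Chrome by default, so chrome cannot be found. You can create a link named chrome that points to Google Chrome inside Google Chrome.app.
Fonts in the exported PDF look odd
Set the font in style.css, for example to the LXGW WenKai font.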
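The conclusion first, to encourage everyone: this project can export books you have already purchased on WeRead; paid books must be purchased before they can be exported in full.
1. Running python -m weread_exporter -b $book_id -o epub -o pdf fails with the following error:
Error:
OSError: no library called "cairo-2" was found
no library called "cairo" was found
no library called "libcairo-2" was found
cannot load library 'libcairo.so.2': error 0x7e
cannot load library 'libcairo.2.dylib': error 0x7e
cannot load library 'libcairo-2.dll': error 0x7e
Cause: the native cairo library cannot be found; cairo-2, cairo, and libcairo-2 are the names the loader tried. Installing the following packages directly with pip still produces the error:
pip install pycairo
pip install cairocffi
pip install WeasyPrint
This is because installing WeasyPrint on Windows requires extra steps. Also, only WeasyPrint needs to be installed explicitly; its Python dependencies are pulled in automatically during installation.
Fix for error 1:
On Windows 10, WeasyPrint can be installed with the following steps:
① Install Python 3.7 or later: visit the Python website (https://www.python.org/downloads/windows/) and download and install Python 3.7 or later. (Installation tutorials are easy to find online.)
② Install the GTK+ runtime: WeasyPrint uses GTK+ and the cairo library for rendering and layout, so the GTK+ runtime must be installed first. Visit the GTK+ website (https://www.gtk.org/docs/installations/windows/) or the GitHub project (https://github.com/tschoonj/GTK-for-Windows-Runtime-Environment-Installer/releases) and download and install the latest GTK+ runtime.
③ Install WeasyPrint: once Python and the GTK+ runtime are installed, install WeasyPrint with the following command (skip this if it is already installed):
python -m pip install weasyprint
④ Verify the installation: after installing WeasyPrint, run the following command to check that it works:
python -m weasyprint --version
If the installation succeeded, the WeasyPrint version number is printed.
Notes:
If installing WeasyPrint fails because libffi-7.dll cannot be found, download libffi-7.dll and place it in the DLLs folder of the Python installation directory.
WeasyPrint needs GTK+ 3.x, not GTK+ 2.x, so make sure the runtime you install is GTK+ 3.x.
I am not sure whether these steps also apply to Windows 11. To install WeasyPrint on Windows 11, see the Windows section of the official documentation (https://doc.courtbouillon.org/weasyprint/stable/first_steps.html#cairo)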
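Before rerunning the exporter, it can help to check whether Python can locate a cairo library at all. The following is a minimal sketch using only the standard library; `find_cairo` is a hypothetical helper, and the candidate names are the ones listed in the error message above. A `None` value means `ctypes.util.find_library` did not locate that name on this machine, which is not exactly the same lookup the failing import performs but is a reasonable first check.

```python
from ctypes.util import find_library


def find_cairo(names=("cairo-2", "cairo", "libcairo-2")):
    """Return {candidate name: resolved library or None}.

    Hypothetical helper; the default names are copied from the OSError
    text quoted above.
    """
    return {name: find_library(name) for name in names}


if __name__ == "__main__":
    for name, location in find_cairo().items():
        print(name, "->", location)
```

If every value is `None`, the GTK+ runtime is probably not installed or its directory is not on the library search path.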
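2. After installing WeasyPrint, running python -m weread_exporter -b $book_id -o epub -o pdf fails:
On Windows 10 the error says a file cannot be found (FileNotFound). On macOS the error says chrome cannot be found.
Cause: the Chrome browser has not been added to the system environment variables.
Fix: add the directory that contains chrome.exe to the system PATH environment variable. For example, my Chrome path is C:\Program Files\Google\Chrome\Application\chrome.exe, so I added C:\Program Files\Google\Chrome\Application to PATH. (If you don't know how to edit environment variables, search online.)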
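Whether the PATH change took effect can be checked from Python as well as with the where chrome command. This is a small sketch using `shutil.which` from the standard library; `find_chrome` is a hypothetical helper, and apart from `chrome` the candidate names are assumptions about how the browser might be named on other systems.

```python
import shutil


def find_chrome(candidates=("chrome", "google-chrome", "chromium")):
    """Return the first candidate executable found on PATH, or None.

    Hypothetical helper; only "chrome" is the name mentioned in the
    exporter's error message, the other candidates are assumptions.
    """
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    location = find_chrome()
    print(location or "chrome was not found on PATH")
```

A new terminal window is usually needed after editing environment variables, because already open shells keep the old PATH.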
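3. This one is not an error, but it is worth mentioning:
With the command python -m weread_exporter -b $book_id -o epub -o pdf, the script pauses halfway through the book (when the free chapters end) and asks you to log in by scanning a QR code, which is inconvenient. I therefore suggest using python -m weread_exporter -b $book_id -o epub -o pdf --force-login, which asks you to log in at the very start and avoids the pause in the middle of the export.
Note: I ran into one problem. If you have already logged in to WeRead through this project's script (that is, the script opened the WeRead page in Chrome and you logged in on that page), you can no longer use python -m weread_exporter -b $book_id -o epub -o pdf --force-login, or it fails with an error. In that case, just run python -m weread_exporter -b $book_id -o epub -o pdf.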
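For example: aaa322a0813ab7c0eg011f87
The main text appears to be complete, but only a small part of the references section was downloaded.
After following the documentation, running it produces this error: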
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/Users/acookie/Desktop/weread-exporter/weread_exporter/__main__.py", line 158, in <module>
main()
File "/Users/acookie/Desktop/weread-exporter/weread_exporter/__main__.py", line 154, in main
loop.run_until_complete(async_main())
File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/base_events.py", line 684, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "/Users/acookie/Desktop/weread-exporter/weread_exporter/__main__.py", line 17, in async_main
from . import export, utils, webpage
File "/Users/acookie/Desktop/weread-exporter/weread_exporter/export.py", line 12, in <module>
from weasyprint import HTML, CSS
File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/weasyprint/__init__.py", line 469, in <module>
from .css import preprocess_stylesheet # noqa isort:skip
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/weasyprint/css/__init__.py", line 27, in <module>
from . import computed_values, counters, media_queries
File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/weasyprint/css/computed_values.py", line 15, in <module>
from .. import text
File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/weasyprint/text.py", line 11, in <module>
import cairocffi as cairo
File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/cairocffi/__init__.py", line 47, in <module>
cairo = dlopen(
^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/cairocffi/__init__.py", line 44, in dlopen
raise OSError(error_message) # pragma: no cover
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: no library called "cairo-2" was found
no library called "cairo" was found
no library called "libcairo-2" was found
cannot load library 'libcairo.so.2': dlopen(libcairo.so.2, 0x0002): tried: 'libcairo.so.2' (no such file), '/System/Volumes/Preboot/Cryptexes/OSlibcairo.so.2' (no such file), '/usr/lib/libcairo.so.2' (no such file, not in dyld cache), 'libcairo.so.2' (no such file), '/usr/lib/libcairo.so.2' (no such file, not in dyld cache). Additionally, ctypes.util.find_library() did not manage to locate a library called 'libcairo.so.2'
cannot load library 'libcairo.2.dylib': dlopen(libcairo.2.dylib, 0x0002): tried: 'libcairo.2.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OSlibcairo.2.dylib' (no such file), '/usr/lib/libcairo.2.dylib' (no such file, not in dyld cache), 'libcairo.2.dylib' (no such file), '/usr/lib/libcairo.2.dylib' (no such file, not in dyld cache). Additionally, ctypes.util.find_library() did not manage to locate a library called 'libcairo.2.dylib'
cannot load library 'libcairo-2.dll': dlopen(libcairo-2.dll, 0x0002): tried: 'libcairo-2.dll' (no such file), '/System/Volumes/Preboot/Cryptexes/OSlibcairo-2.dll' (no such file), '/usr/lib/libcairo-2.dll' (no such file, not in dyld cache), 'libcairo-2.dll' (no such file), '/usr/lib/libcairo-2.dll' (no such file, not in dyld cache). Additionally, ctypes.util.find_library() did not manage to locate a library called 'libcairo-2.dll'
(.venv) C:\Users\Town>python -m weread_exporter -b 54c32520715e229954c8b8a -o epub -o epub --force-login
Fontconfig error: Cannot load default config file: No such file: (null)
[2024-01-15 22:07:21,033][INFO]Exporting book 54c32520715e229954c8b8a
[2024-01-15 22:07:21,479][INFO][WeReadWebPage] Launch url https://weread.qq.com/web/bookDetail/54c32520715e229954c8b8a
[2024-01-15 22:07:22,099][INFO]Browser listening on: ws://127.0.0.1:1258/devtools/browser/26aee2a6-c51b-4b45-b8b8-0793e1bb6f06
[2024-01-15 22:07:24,684][INFO][WeReadWebPage] Waiting for login
[2024-01-15 22:07:34,693][INFO][WeReadWebPage] Login success
[2024-01-15 22:07:35,289][INFO][WeReadExporter] Check chapter 19/版权信息
[2024-01-15 22:07:35,290][INFO][WeReadExporter] File cache\54c32520715e229954c8b8a\chapters\1-19.md not exist
[2024-01-15 22:07:35,291][INFO][WeReadWebPage] Go to chapter 19
[2024-01-15 22:07:35,303][INFO][WeReadWebPage] Fetch url https://weread.qq.com/web/reader/54c32520715e229954c8b8ak1f032c402131f0e3dad99f3
[2024-01-15 22:07:35,856][INFO][WeReadWebPage] Fetch url https://midas.gtimg.cn/midas/minipay_v2/jsapi/cashier.js
[2024-01-15 22:07:35,859][INFO][WeReadWebPage] Fetch url https://cdn.weread.qq.com/web/wpa.js
[2024-01-15 22:07:35,860][INFO][WeReadWebPage] Fetch url https://weread-1258476243.file.myqcloud.com/web/wrwebnjlogic/css/app.3e110853.css
[2024-01-15 22:07:35,862][INFO][WeReadWebPage] Fetch url https://weread-1258476243.file.myqcloud.com/web/wrwebnjlogic/js/app.e7373bc5.js
[2024-01-15 22:08:05,313][WARNING]Load chapter failed, close browser and retry
[2024-01-15 22:08:05,313][INFO]terminate chrome process...
jaimezhang@192 weread-exporter-main % python -m weread_exporter -b 08232ac0720befa90825d88 -o epub -o pdf
Traceback (most recent call last):
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/Users/jaimezhang/weread-exporter-main/weread_exporter/main.py", line 158, in <module>
main()
File "/Users/jaimezhang/weread-exporter-main/weread_exporter/main.py", line 154, in main
loop.run_until_complete(async_main())
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
return future.result()
File "/Users/jaimezhang/weread-exporter-main/weread_exporter/main.py", line 17, in async_main
from . import export, utils, webpage
File "/Users/jaimezhang/weread-exporter-main/weread_exporter/export.py", line 8, in <module>
import bs4
ModuleNotFoundError: No module named 'bs4'
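The traceback above ends because the `bs4` module is not importable in the interpreter that ran the command. The sketch below checks which modules are missing before launching the tool. The helper `missing_modules` is hypothetical and not part of weread-exporter. `bs4` is the import name of the PyPI package `beautifulsoup4`.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "bs4" is the import name from the traceback; on PyPI it ships as "beautifulsoup4".
    missing = missing_modules(["bs4"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All checked modules are importable.")
```

Run it with the same `python` executable used for `python -m weread_exporter`, since packages installed for a different interpreter will not be visible.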
Some books with many chapters may fail to download and hang, for example this one: https://weread.qq.com/web/bookDetail/ddc3252071dbe8a8ddc8170
[2023-11-27 13:52:18,250][INFO][WeReadExporter] File cache\ddc3252071dbe8a8ddc8170\chapters\7-8.md not exist
[2023-11-27 13:52:18,251][INFO][WeReadWebPage] Go to chapter 8
[2023-11-27 13:52:18,276][INFO][WeReadWebPage] Fetch url https://weread.qq.com/web/reader/ddc3252071dbe8a8ddc8170kc9f326d018c9f0f895fb5e4
[2023-11-27 13:52:18,648][INFO][WeReadWebPage] Fetch url https://midas.gtimg.cn/midas/minipay_v2/jsapi/cashier.js
[2023-11-27 13:52:18,653][INFO][WeReadWebPage] Fetch url https://cdn.weread.qq.com/web/wpa.js
[2023-11-27 13:52:18,656][INFO][WeReadWebPage] Fetch url https://weread-1258476243.file.myqcloud.com/web/wrwebnjlogic/css/app.02ecef75.css
[2023-11-27 13:52:18,708][INFO][WeReadWebPage] Fetch url https://weread-1258476243.file.myqcloud.com/web/wrwebnjlogic/css/8.a2448854.css
[2023-11-27 13:52:18,728][INFO][WeReadWebPage] Fetch url https://weread-1258476243.file.myqcloud.com/web/wrwebnjlogic/js/app.e2263c63.js
[2023-11-27 13:52:48,269][WARNING]Load chapter failed, close browser and retry
[2023-11-27 13:52:48,270][INFO]terminate chrome process...
[2023-11-27 13:52:48,272][ERROR]connection unexpectedly closed
[2023-11-27 13:52:48,273][ERROR]Task exception was never retrieved
future: <Task finished name='Task-2448' coro=<Connection._async_send() done, defined at C:\Users\uesr\AppData\Local\Programs\Python\Python311\Lib\site-packages\pyppeteer\connection.py:69> exception=InvalidStateError('invalid state')>
Traceback (most recent call last):
File "C:\Users\uesr\AppData\Local\Programs\Python\Python311\Lib\site-packages\websockets\legacy\protocol.py", line 968, in transfer_data
message = await self.read_message()
^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\uesr\AppData\Local\Programs\Python\Python311\Lib\site-packages\websockets\legacy\protocol.py", line 1038, in read_message
frame = await self.read_data_frame(max_size=self.max_size)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\uesr\AppData\Local\Programs\Python\Python311\Lib\site-packages\websockets\legacy\protocol.py", line 1113, in read_data_frame
frame = await self.read_frame(max_size)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\uesr\AppData\Local\Programs\Python\Python311\Lib\site-packages\websockets\legacy\protocol.py", line 1170, in read_frame
frame = await Frame.read(
^^^^^^^^^^^^^^^^^
File "C:\Users\uesr\AppData\Local\Programs\Python\Python311\Lib\site-packages\websockets\legacy\framing.py", line 69, in read
data = await reader(2)
^^^^^^^^^^^^^^^
File "C:\Users\uesr\AppData\Local\Programs\Python\Python311\Lib\asyncio\streams.py", line 727, in readexactly
raise exceptions.IncompleteReadError(incomplete, n)
asyncio.exceptions.IncompleteReadError: 0 bytes read on a total of 2 expected bytes
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\uesr\AppData\Local\Programs\Python\Python311\Lib\site-packages\pyppeteer\connection.py", line 73, in _async_send
await self.connection.send(msg)
File "C:\Users\uesr\AppData\Local\Programs\Python\Python311\Lib\site-packages\websockets\legacy\protocol.py", line 635, in send
await self.ensure_open()
File "C:\Users\uesr\AppData\Local\Programs\Python\Python311\Lib\site-packages\websockets\legacy\protocol.py", line 944, in ensure_open
raise self.connection_closed_exc()
websockets.exceptions.ConnectionClosedError: sent 1000 (OK); no close frame received
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\uesr\AppData\Local\Programs\Python\Python311\Lib\site-packages\pyppeteer\connection.py", line 79, in _async_send
await self.dispose()
File "C:\Users\uesr\AppData\Local\Programs\Python\Python311\Lib\site-packages\pyppeteer\connection.py", line 170, in dispose
await self._on_close()
File "C:\Users\uesr\AppData\Local\Programs\Python\Python311\Lib\site-packages\pyppeteer\connection.py", line 151, in _on_close
cb.set_exception(_rewriteError(
asyncio.exceptions.InvalidStateError: invalid state
Any guidance would be appreciated. Thanks!
Thanks a lot. I have run into a few new problems.