
geek_crawler's Issues

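Downloading only specified courses

I made a small change to the original code so that it downloads only the courses I specify.
Change 1: reuse the existing exclude variable to hold the courses you want to download (around line 539).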

    # Set exclude to the courses you want to crawl
    exclude = ['快速上手C++数据结构与算法']

Change 2: around line 297, change

if product.get('title', '')  in self.exclude:
to
if product.get('title', '')  not in self.exclude:
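
Inverting the test on exclude works, but it leaves the variable name meaning the opposite of what it does. A minimal sketch of the same idea with a separate allow-list is shown here; select_products and the include parameter are hypothetical names for illustration and are not part of geek_crawler.

```python
def select_products(products, include=None, exclude=None):
    """Return the products that should be crawled.

    include: titles to download; when non-empty, only these are kept.
    exclude: titles to skip; only consulted when include is empty.
    """
    include = set(include or [])
    exclude = set(exclude or [])
    selected = []
    for product in products:
        title = product.get('title', '')
        if include:
            # Allow-list mode: keep only the requested titles.
            if title in include:
                selected.append(product)
        elif title not in exclude:
            # Original behaviour: keep everything not excluded.
            selected.append(product)
    return selected


products = [{'title': '快速上手C++数据结构与算法'}, {'title': 'A/B测试从0到1'}]
wanted = select_products(products, include=['快速上手C++数据结构与算法'])
```

With include left empty the function behaves like the original exclude check, so existing callers would not need to change.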

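The returned article list is capped at 100

When a column contains more than 100 articles, the script saves at most 100 of them.
Looking at the code, I found that in the _articles method, the list inside the value returned by 'data = res.json().get('data', {})' never holds more than 100 entries. See the screenshot:
[image]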
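
One common way around a per-request cap is to request the list in pages until a short page comes back. The sketch below shows only that loop. fetch_page is a hypothetical callable standing in for the real request, and whether the actual endpoint accepts an offset or cursor parameter is an assumption that the issue does not confirm.

```python
def fetch_all_articles(fetch_page, page_size=100):
    """Collect every article by requesting successive pages.

    fetch_page(offset, size) is a hypothetical callable that returns a
    list of at most `size` articles starting at position `offset`.
    """
    articles = []
    offset = 0
    while True:
        batch = fetch_page(offset, page_size)
        articles.extend(batch)
        # A page shorter than page_size means there is nothing left.
        if len(batch) < page_size:
            break
        offset += len(batch)
    return articles


# Stand-in backend holding 250 fake article ids.
fake_articles = list(range(250))


def fake_fetch(offset, size):
    return fake_articles[offset:offset + size]


all_articles = fetch_all_articles(fake_fetch)
```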

The file extension is always .md

In the main function:
Original: run(cellphone, pwd, exclude=exclude, get_comments=get_comments)
It should be changed to:
run(cellphone, pwd, exclude=exclude, get_comments=get_comments, file_type=file_type)

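Some courses cannot be downloaded

I have more than 50 courses (only 3 are video courses; the rest are text), but only about 20 of them can be downloaded. What is preventing the rest from downloading?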

Fails on Python 3.8

$ python geek_crawler.py
Traceback (most recent call last):
File "geek_crawler.py", line 12, in <module>
import requests
ModuleNotFoundError: No module named 'requests'

Crawl error

Could someone experienced take a look?

/Users/bo/PycharmProjects/pythonProject/main.py[line:550] - ERROR: 请求过程中出错了,出错信息为:Traceback (most recent call last):
File "/Users/bo/PycharmProjects/pythonProject/main.py", line 547, in <module>
run(cellphone, pwd, exclude=exclude, get_comments=get_comments)
File "/Users/bo/PycharmProjects/pythonProject/main.py", line 513, in run
geek._article(aid, pro, file_type=file_type, get_comments=get_comments) # 获取单个文章的信息
File "/Users/bo/PycharmProjects/pythonProject/main.py", line 341, in _article
self.save_to_file(
File "/Users/bo/PycharmProjects/pythonProject/main.py", line 449, in save_to_file
os.mkdir(dir_path)
FileNotFoundError: [Errno 2] No such file or directory: 'A/B测试从0到1'
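
The course title 'A/B测试从0到1' contains a '/', so os.mkdir treats 'A' as a parent directory that does not exist. A minimal sketch of one possible workaround is to replace path separators in the title before creating the directory. safe_dir_name and ensure_course_dir are hypothetical helpers, not functions from geek_crawler.

```python
import os
import tempfile


def safe_dir_name(title, replacement='_'):
    """Replace path separators so the title maps to a single directory."""
    for sep in (os.sep, os.altsep, '/'):
        if sep:
            title = title.replace(sep, replacement)
    return title


def ensure_course_dir(base_dir, title):
    """Create (if needed) and return the directory for a course title."""
    dir_path = os.path.join(base_dir, safe_dir_name(title))
    # makedirs with exist_ok=True also tolerates an existing directory.
    os.makedirs(dir_path, exist_ok=True)
    return dir_path


with tempfile.TemporaryDirectory() as base:
    created = ensure_course_dir(base, 'A/B测试从0到1')
    created_name = os.path.basename(created)
    created_ok = os.path.isdir(created)
```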

Error when crawling certain columns

Error message:
File "geek_crawler.py", line 483, in save_to_file
with open(file_path, 'w', encoding='utf-8') as f:
OSError: [Errno 22] Invalid argument: 'D:\0-git-time\geek_crawler-master\JavaScript核心原理解析\20 _ (0, eval)("x = 100") :一行让严格模式形同虚设的破坏性设计(上).md'
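
The article title in this path contains double quotes, which Windows does not allow in file names, and that is the likely cause of the Errno 22. A hedged sketch of a sanitizing step applied to the title before building file_path follows. sanitize_filename is a hypothetical helper, and the character set comes from the Windows file-naming rules rather than from geek_crawler itself.

```python
import re

# Characters Windows rejects in file names, plus ASCII control characters.
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def sanitize_filename(name, replacement='_'):
    """Return `name` with characters that are invalid on Windows replaced."""
    cleaned = _INVALID_CHARS.sub(replacement, name)
    # Windows also rejects names that end with a space or a dot.
    return cleaned.rstrip(' .') or replacement


title = '20 _ (0, eval)("x = 100")'
file_name = sanitize_filename(title) + '.md'
```

The same function could be applied to course titles as well, since they become directory names.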

Could the maintainer take a look?

Hello, how can I resolve the error below?

请求登录接口:
接口请求参数:{'country': 86, 'cellphone': '*******', 'password': '********', 'captcha': '', 'remember': 1, 'platform': 3, 'appid': 1, 'source': ''}
请求过程中出错了,出错信息为:Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/urllib3-1.26.0.dev0-py3.8.egg/urllib3/connectionpool.py", line 686, in urlopen
self._prepare_proxy(conn)
File "/usr/local/lib/python3.8/site-packages/urllib3-1.26.0.dev0-py3.8.egg/urllib3/connectionpool.py", line 952, in _prepare_proxy
conn.connect()
File "/usr/local/lib/python3.8/site-packages/urllib3-1.26.0.dev0-py3.8.egg/urllib3/connection.py", line 389, in connect
self.sock = ssl_wrap_socket(
File "/usr/local/lib/python3.8/site-packages/urllib3-1.26.0.dev0-py3.8.egg/urllib3/util/ssl_.py", line 397, in ssl_wrap_socket
ssl_sock = context.wrap_socket(sock, server_hostname=server_hostname)
File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/ssl.py", line 500, in wrap_socket
return self.sslsocket_class._create(
File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/ssl.py", line 1040, in _create
self.do_handshake()
File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/ssl.py", line 1309, in do_handshake
self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1123)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/requests-2.24.0-py3.8.egg/requests/adapters.py", line 439, in send
resp = conn.urlopen(
File "/usr/local/lib/python3.8/site-packages/urllib3-1.26.0.dev0-py3.8.egg/urllib3/connectionpool.py", line 745, in urlopen
retries = retries.increment(
File "/usr/local/lib/python3.8/site-packages/urllib3-1.26.0.dev0-py3.8.egg/urllib3/util/retry.py", line 474, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='account.geekbang.org', port=443): Max retries exceeded with url: /account/ticket/login (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1123)')))

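Downloading the video content

Is there any handling for downloading the videos? Could you share it? Is the last item in your feature list actually implemented?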
