Comments (3)
Working on the change... it's fairly simple.
from wechatsogou.
In what specific scenario would this be needed?
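Answer: As I see it, the biggest benefit of a framework is that when the code needs changing later, we can quickly tell which module to modify, since the framework effectively prescribes what each module is responsible for. The framework also covers many features we no longer have to implement ourselves, such as multithreading and download rate limiting (just set a config parameter).
What I actually want now is to build on the framework and fetch the latest articles under each category on a schedule, every 5 hours. (It seems deduplication can only be done by checking whether the title repeats!)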
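import re

import pymongo
import readability
from lxml import etree

# `wechats` is assumed to be a wechatsogou API instance created earlier.
client = pymongo.MongoClient()
db = client.sougou

article_urls = wechats.get_recent_article_url_by_index_all()
for url in article_urls:
    html_text = wechats.get_gzh_message(url=url)

    # Extract the main content; fall back to None if readability fails.
    try:
        article_content = readability.Readability(html_text, url).content
    except RuntimeError:
        article_content = None

    # group(1) is the text inside <title>; group(0) would include the tags.
    title_match = re.search(r'<title>(.*?)</title>', html_text, re.S)
    article_title = title_match.group(1).strip() if title_match else None

    # The publish date may be missing from the page.
    html_tree = etree.HTML(html_text)
    try:
        article_date = html_tree.xpath('//em[@id="post-date"]/text()')[0]
    except IndexError:
        article_date = None

    # insert_one replaces Collection.insert, which PyMongo 4 removed.
    db.wechats.insert_one({
        'title': article_title,
        'content': article_content,
        'date': article_date,
    })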
But I found that fitting even the simple program above into the framework is quite painful; I couldn't get it to work.
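For reference, the scheduled fetch with title-based deduplication mentioned above could be sketched roughly as follows. This is only a minimal sketch using the standard library, not the framework's or wechatsogou's own mechanism. The names `title_key`, `filter_new`, and `run_periodically` are hypothetical, and `fetch` and `store` are assumed to be callables you supply yourself (for example, one that wraps the crawl loop above and one that writes to MongoDB).

```python
import hashlib
import time


def title_key(title):
    """Normalize a title and hash it so it can serve as a dedup key."""
    normalized = " ".join(title.split()).lower()
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def filter_new(articles, seen_keys):
    """Return articles whose title has not been seen yet; update seen_keys."""
    fresh = []
    for article in articles:
        key = title_key(article["title"])
        if key not in seen_keys:
            seen_keys.add(key)
            fresh.append(article)
    return fresh


def run_periodically(fetch, store, interval_seconds=5 * 60 * 60,
                     max_runs=None, sleep=time.sleep):
    """Call fetch() every interval_seconds and pass unseen articles to store().

    fetch and store are hypothetical callables supplied by the caller.
    max_runs=None means run forever; sleep is injectable for testing.
    """
    seen_keys = set()
    runs = 0
    while max_runs is None or runs < max_runs:
        store(filter_new(fetch(), seen_keys))
        runs += 1
        if max_runs is None or runs < max_runs:
            sleep(interval_seconds)
```

Note that `seen_keys` lives in memory here, so it is lost on restart. If the titles are already stored in MongoDB, checking against the stored titles instead would probably be the more durable choice.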