qdk_scrapy

Download Qingdouke (青豆客) images with Scrapy.

Installation

python3
scrapy

Usage

First create a folder named img in the project root. To store images somewhere else, edit ./tutorial/settings.py and set IMAGES_STORE to the location you want.
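As a rough illustration of the IMAGES_STORE change described above, a settings fragment might look like the following. The target directory (a qdk_images folder under the home directory) is a hypothetical choice, not something the project prescribes; any writable path should work the same way.

```python
# Fragment for ./tutorial/settings.py (sketch only).
from pathlib import Path

# Hypothetical custom location. The project default described above
# is a folder named "img" in the project root.
IMAGES_STORE = str(Path.home() / "qdk_images")
```

Other settings in the file are left untouched; only the value of IMAGES_STORE changes.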

The starting link is http://www.aisinei.com/forum-qingdouke-1.html

Content links look like http://www.aisinei.com/thread-12853-1-1.html
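To make the two link shapes concrete, here is a small sketch that tells list pages and content pages apart. The regular expressions are inferred only from the two example URLs above, and the function name classify is hypothetical; the real site may use other variants that these patterns do not cover.

```python
import re

# Patterns inferred from the two example URLs; treat them as assumptions.
LIST_PAGE = re.compile(
    r"^https?://www\.aisinei\.com/forum-(?P<board>[a-z0-9]+)-(?P<page>\d+)\.html$"
)
CONTENT_PAGE = re.compile(
    r"^https?://www\.aisinei\.com/thread-(?P<thread_id>\d+)-\d+-\d+\.html$"
)


def classify(url):
    """Return 'list', 'content', or 'other' for a given URL."""
    if LIST_PAGE.match(url):
        return "list"
    if CONTENT_PAGE.match(url):
        return "content"
    return "other"
```

For example, classify("http://www.aisinei.com/thread-12853-1-1.html") falls into the content branch, while the starting link falls into the list branch.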

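There are three spiders:

qdk : downloads a single content link
qdklist : starts from the starting link and crawls all content links
q : combines the two above and downloads every content link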

Running

cd to the project root, then run any one of the following three commands:

scrapy crawl qdk -o qdk.jl
scrapy crawl qdklist -o qdklist.jl
scrapy crawl q -o q.jl

When using the first command, be sure to set start_urls in quotes_spider.py to the content link you want to download. Alternatively, run the command like this:

scrapy crawl qdk -o qdk.jl -a start_url="example.com"

Replace example.com with the content link you want to download.

Logging

The default is LOG_LEVEL = 'ERROR', so don't worry if a command appears to do nothing after you run it. To see a stream of output, change it to INFO.

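Errors

A run is not guaranteed to download every image, especially with scrapy crawl q -o q.jl. When that happens, the only practical fix is to download the specific content links with scrapy crawl qdk -o qdk.jl.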

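Limited extensibility

In theory the spiders also work for the other boards on this page.

Qingdouke is only one of them. A quick test suggests 推女郎 should work as well. Remember to update the corresponding links.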
