epoch's Introduction

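Background

A web crawler built on the Scrapy framework, running on Python 2.7.13.

Deployment steps

.Windows deployment

Install the dependencies, update the directory settings in settings.py, and start the crawler locally. The required dependencies are listed under Linux deployment.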
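
The directory settings mentioned above differ between Windows and Linux. The following is a minimal sketch of how such paths could be derived per platform. The install roots (C:\epoch and /opt/epoch) and the PHANTOMJS_PATH name are assumptions made for illustration, not values taken from this project; LOG_FILE is a standard Scrapy setting.

```python
import ntpath
import posixpath
import sys


def directory_settings(platform=sys.platform):
    """Return example directory settings for the given platform string."""
    windows = platform.startswith("win")
    # Use the path rules of the target system, not of the machine running this code.
    join = ntpath.join if windows else posixpath.join
    base = "C:\\epoch" if windows else "/opt/epoch"   # hypothetical install roots
    binary = "phantomjs.exe" if windows else "phantomjs"
    return {
        # LOG_FILE is a standard Scrapy setting.
        "LOG_FILE": join(base, "logs", "scrapy.log"),
        # PHANTOMJS_PATH is a hypothetical custom setting name.
        "PHANTOMJS_PATH": join(base, "phantomjs-2.1.1", "bin", binary),
    }
```

In settings.py the returned values could be assigned to module-level names, for example LOG_FILE = directory_settings()["LOG_FILE"].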

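.Linux deployment

1. Python setup:

Download: https://www.python.org/ftp/python/2.7.13/Python-2.7.13.tgz

Extract the archive on the server: tar -zxf Python-2.7.13.tgz

Enter the Python-2.7.13 directory: cd Python-2.7.13

Build and install:
./configure --enable-shared --enable-loadable-sqlite-extensions --with-zlib
The --enable-loadable-sqlite-extensions option enables support for loadable SQLite extensions in the sqlite3 module.

vi ./Modules/Setup
Find the line #zlib zlibmodule.c -I$(prefix)/include -L$(exec_prefix)/lib -lz, remove the leading # to uncomment it, and save the file.
make && make install

Verify:
Run python -V to check the installed Python version.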
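
Beyond python -V, it can help to confirm that the modules the build steps above were meant to enable are importable. This is a minimal sketch, assuming the goal is Python 2.7.13 with working zlib and sqlite3 modules; the function name check_build is hypothetical.

```python
import importlib
import sys


def check_build(expected=(2, 7, 13), modules=("zlib", "sqlite3")):
    """Return a list of problems; an empty list means the build looks fine."""
    problems = []
    actual = tuple(sys.version_info[:3])
    if actual != tuple(expected):
        problems.append("unexpected version: %d.%d.%d" % actual)
    for name in modules:
        try:
            # Importing fails if the extension module was not compiled in.
            importlib.import_module(name)
        except ImportError:
            problems.append("missing module: " + name)
    return problems


if __name__ == "__main__":
    for problem in check_build():
        print(problem)
```

Running the script with the freshly installed interpreter prints nothing when the version matches and both modules import.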

2. Install the scrapyd service

Steps: http://blog.csdn.net/xxwang6276/article/details/45745181
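
Once scrapyd is running, crawls are scheduled through its schedule.json endpoint. The sketch below only builds the request and does not send it. It assumes scrapyd listens on its default address http://localhost:6800; the project name "epoch" and spider name "example" are placeholders, not names confirmed by this project.

```python
try:
    from urllib.parse import urlencode      # Python 3
except ImportError:
    from urllib import urlencode            # Python 2.7


def schedule_request(project, spider, base_url="http://localhost:6800"):
    """Return the URL and form-encoded body for scrapyd's schedule.json endpoint."""
    url = base_url.rstrip("/") + "/schedule.json"
    # A list of pairs keeps the parameter order stable.
    body = urlencode([("project", project), ("spider", spider)])
    return url, body


if __name__ == "__main__":
    url, body = schedule_request("epoch", "example")   # placeholder names
    print("POST %s" % url)
    print(body)
```

The resulting URL and body can be sent as an HTTP POST with any client, for example curl with -d for each parameter.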

3. phantomjs setup

Download: https://bitbucket.org/ariya/phantomjs/downloads/phantomjs-2.1.1-linux-x86_64.tar.bz2
Extract the archive on the server: tar jxvf phantomjs-2.1.1-linux-x86_64.tar.bz2
Rename the directory: mv phantomjs-2.1.1-linux-x86_64 phantomjs-2.1.1
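
After the rename, a quick check can confirm that the binary is where the crawler expects it. This is a minimal sketch, assuming the Linux archive layout with the executable at bin/phantomjs inside the renamed directory; the helper names are hypothetical.

```python
import os
import subprocess


def phantomjs_executable(install_dir):
    """Return the path to the phantomjs binary in install_dir, or None if absent."""
    candidate = os.path.join(install_dir, "bin", "phantomjs")
    if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
        return candidate
    return None


def phantomjs_version(install_dir):
    """Run `phantomjs --version`; return its output, or None if the binary is missing."""
    executable = phantomjs_executable(install_dir)
    if executable is None:
        return None
    return subprocess.check_output([executable, "--version"]).decode().strip()
```

Calling phantomjs_version("phantomjs-2.1.1") from the directory that contains the renamed folder should report the installed version if the setup succeeded.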

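4. Install other dependencies

pip install scrapyd-client: deploys Scrapy projects to scrapyd
pip install pymysql: MySQL driver for Python
pip install sqlalchemy: SQL toolkit and ORM for Python, used here with MySQL
pip install Twisted: asynchronous networking library that Scrapy depends on
pip install Scrapy: the crawler framework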
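
pymysql and sqlalchemy work together: SQLAlchemy selects the PyMySQL driver through the mysql+pymysql:// prefix of its database URL. The sketch below builds such a URL. The credentials, host, and database name in the example are placeholders, and how this project actually configures its connection is not stated in this document.

```python
try:
    from urllib.parse import quote_plus     # Python 3
except ImportError:
    from urllib import quote_plus           # Python 2.7


def mysql_url(user, password, host, database, port=3306):
    """Build a SQLAlchemy database URL that selects the PyMySQL driver."""
    # quote_plus escapes characters such as '@' that would break the URL.
    return "mysql+pymysql://%s:%s@%s:%d/%s" % (
        quote_plus(user), quote_plus(password), host, port, database)


if __name__ == "__main__":
    # Placeholder credentials for illustration only.
    print(mysql_url("crawler", "p@ss", "localhost", "epoch"))
```

The returned string can be passed to sqlalchemy.create_engine to open the connection.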

5. Run

To run locally: scrapy crawl <spider name>
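
The local run can also be wrapped in a small launcher script. This is a sketch only: the spider name "example" is a placeholder, the helper names are hypothetical, and it assumes the scrapy command is on PATH and that the script is started from the Scrapy project directory.

```python
import subprocess


def crawl_command(spider_name, log_level="INFO"):
    """Build the argument list for `scrapy crawl <spider name>`."""
    return ["scrapy", "crawl", spider_name, "--loglevel", log_level]


def run_spider(spider_name, project_dir="."):
    """Run the spider from the project directory and return the exit code."""
    return subprocess.call(crawl_command(spider_name), cwd=project_dir)


if __name__ == "__main__":
    raise SystemExit(run_spider("example"))   # placeholder spider name
```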

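PS: Companies keep revising and strengthening their anti-crawling measures, so crawler and anti-crawler engineers are locked in a constant contest of attack and defense. This program is not actively maintained, so it may stop retrieving data over time; when that happens, the crawling strategy will need minor adjustments.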
