ProxyFetch

Introduction

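ProxyFetch uses Python's requests library to scrape free IP proxies from three proxy-listing websites. The three crawlers run as coroutines to speed up the overall crawl. Ubuntu's cron service periodically refreshes the data in a MySQL database, and every fetched IP is checked for availability so that only working proxies are kept. A separate project, ProxyPool, is a proxy crawler in which support for a new proxy site can be added simply by supplying XPath page-parsing rules and item-cleaning logic. All the proxy sites mentioned here are also covered by ProxyPool; see that project for details.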
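The availability check described above could be sketched roughly as follows. This is an illustration only, not the project's code: the names `is_proxy_alive` and `filter_usable`, the test URL, and the timeout are assumptions, and the standard library's urllib stands in for requests so that the sketch has no third-party dependencies.

```python
import http.client
import urllib.request


def is_proxy_alive(proxy, test_url="http://httpbin.org/ip", timeout=5):
    """Return True if an HTTP request routed through `proxy` succeeds.

    `proxy` is an "ip:port" string. The test URL and timeout are
    assumptions chosen for illustration.
    """
    handler = urllib.request.ProxyHandler({"http": f"http://{proxy}"})
    opener = urllib.request.build_opener(handler)
    try:
        with opener.open(test_url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # URLError and socket timeouts are OSError subclasses; malformed
        # replies from a bad proxy raise HTTPException.
        return False


def filter_usable(proxies, check=is_proxy_alive):
    """Keep only the proxies for which `check` returns True."""
    return [proxy for proxy in proxies if check(proxy)]
```

Passing a different `check` callable makes the filter easy to exercise without network access.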

The proxies are scraped from three sites: site 1, www.goubanjia.com; site 2, proxy.mimvp.com; and site 3, www.xicidaili.com.
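Running one crawler per site as a coroutine might look like the sketch below. The project lists gevent in its requirements; this sketch substitutes the standard library's asyncio so that it runs without extra packages. The `crawl` function is a hypothetical placeholder that sleeps instead of performing network I/O and returns an empty list rather than real proxies.

```python
import asyncio

SITES = ["www.goubanjia.com", "proxy.mimvp.com", "www.xicidaili.com"]


async def crawl(site, delay=0.1):
    """Placeholder crawler: waits `delay` seconds to simulate network I/O."""
    await asyncio.sleep(delay)
    return []  # a real crawler would return the proxies parsed from `site`


async def crawl_all(sites):
    """Run one crawler coroutine per site concurrently and collect the results."""
    results = await asyncio.gather(*(crawl(site) for site in sites))
    return dict(zip(sites, results))


if __name__ == "__main__":
    print(asyncio.run(crawl_all(SITES)))
```

Because the three coroutines wait concurrently, the total time is close to that of the slowest crawler rather than the sum of all three.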

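Site 1 encrypts its page data, so selenium is used directly to obtain the page after JavaScript rendering. The site also injects noise into the displayed IP addresses, so the IP address is finally extracted with a regular expression.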
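A regular-expression cleanup of this kind could look like the sketch below. The noise format is an assumption made for illustration (junk digits placed in elements hidden with `display:none`), and the function name `extract_ip` is hypothetical. The input is assumed to be rendered HTML that has already been obtained, for example from selenium's `driver.page_source`.

```python
import re

# Assumed noise format: hidden elements that carry junk characters.
HIDDEN_TAG = re.compile(
    r'<(\w+)[^>]*style="[^"]*display\s*:\s*none[^"]*"[^>]*>.*?</\1>',
    re.IGNORECASE | re.DOTALL,
)
ANY_TAG = re.compile(r"<[^>]+>")
OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"
IPV4 = re.compile(rf"\b(?:{OCTET}\.){{3}}{OCTET}\b")


def extract_ip(cell_html):
    """Drop hidden noise elements and markup, then match an IPv4 address."""
    visible = HIDDEN_TAG.sub("", cell_html)   # remove hidden noise elements
    text = ANY_TAG.sub("", visible)           # strip the remaining tags
    text = re.sub(r"\s+", "", text)           # join the visible fragments
    match = IPV4.search(text)
    return match.group(0) if match else None
```

The hidden elements are removed before the remaining tags are stripped, since otherwise the junk digits would end up inside the address.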

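Site 2 renders the value of the port field as an image; a simple vector space search algorithm is used to recognize the image and recover the port number.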
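The idea behind a vector space search can be sketched as follows: each glyph becomes a vector of pixel values, and an unknown glyph takes the label of the template with the highest cosine similarity. The 3x5 templates below are made-up toy data and all function names are hypothetical. In practice the pixel lists would come from the port image (for example via Pillow's `Image.getdata()`) after it has been split into single digits.

```python
import math


def build_vector(pixels):
    """Map pixel index -> intensity for a flattened glyph."""
    return {index: value for index, value in enumerate(pixels)}


def magnitude(vector):
    return math.sqrt(sum(value * value for value in vector.values()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors (1.0 means identical direction)."""
    dot = sum(value * b.get(key, 0) for key, value in a.items())
    norm = magnitude(a) * magnitude(b)
    return dot / norm if norm else 0.0


# Toy 3x5 templates (1 = ink), flattened row by row. Illustrative data only.
TEMPLATES = {
    "0": [1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1],
    "1": [0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1],
    "8": [1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1],
}


def recognize(pixels):
    """Return the label of the template most similar to `pixels`."""
    target = build_vector(pixels)
    scores = {
        label: cosine_similarity(target, build_vector(template))
        for label, template in TEMPLATES.items()
    }
    return max(scores, key=scores.get)


def recognize_port(glyphs):
    """Recognize each digit glyph and join the labels into a port string."""
    return "".join(recognize(glyph) for glyph in glyphs)
```

Because the score is a similarity rather than an exact match, a glyph with a few corrupted pixels still maps to the nearest template.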

Site 3 is very simple...

Requirements

Ubuntu 17.10

Python 3 (Anaconda3)

requests

gevent

DBUtils

Pillow

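The third-party libraries above can be installed directly from the requirements file with pip install -r requirements, which sets up the Python environment. A MySQL database is also required.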

Usage

Once the environment is set up, simply run run_fetch.py.

Notes

For learning and exchange purposes only!

Contributors

barnettxxf
