
fp-server's Introduction

fp-server

This code is quite rough and for reference only; I'll rewrite it when I have free time.



A free proxy server based on Tornado and Scrapy.

Build your own proxy pool!

Features:

  • continuously crawls and provides free proxies
  • asynchronous and high-performance
  • periodically re-checks stored proxies and discards unavailable ones
  • easy-to-use HTTP API


Read the Chinese documentation _(:ι」∠)_

This project has been tested on:

  • Arch Linux; Python 3.6.5
  • Debian (WSL, Raspbian); Python 3.5.3

It cannot run directly on Windows; Windows users can try Docker or WSL instead.


Get started

Choose one of the options below. After a successful deployment, use the APIs to fetch proxies.

Using Docker

The easiest way to run this repo is using Docker. Install Docker and then run:

# download the image
docker pull karmenzind/fp-server:stable
# run the container
# don't forget to modify `-p` if you prefer another port
docker run -itd --name fpserver -p 12345:12345 karmenzind/fp-server:stable
# check the output inside the container
docker logs -f fpserver

For custom configuration, see this section.

Manual installation

  1. Install Redis and Python >= 3.5 (I use Python 3.6.5).
  2. Clone this repo.
  3. Install the Python packages:
pip install -r requirements.txt
  4. Read the config and modify it according to your needs.
  5. Start the server:
python ./src/main.py

web APIs

typical response:

{
    "code": 0,
    "msg": "ok",
    "data": {}
}
  • code: result of the event (not the HTTP status code); 0 means success
  • msg: message describing the event
  • data: details of a successful event
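
For instance, a minimal Python check of this envelope (a sketch assuming a local deployment on the default port, using the requests library):

import requests

# Assumed: the server runs locally on the default port 12345.
resp = requests.get("http://127.0.0.1:12345/api/proxy/", params={"count": 1})
body = resp.json()

# `code` is the event result, not the HTTP status code: 0 means success.
if body["code"] == 0:
    print("ok:", body["data"])
else:
    print("error:", body["msg"])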

get proxies

GET /api/proxy/

param          M/O  detail                                default
count          O    the number of proxies you need        1
scheme         O    choices: HTTP, HTTPS                  both*
anonymity      O    choices: transparent, anonymous       both
sort_by_speed  O    (TODO) choices: 1: descending order,  0
                    0: no order, -1: ascending order

  • both*: includes all types, not grouped

example

  • To acquire 10 anonymous proxies in the HTTP scheme:
    GET /api/proxy/?count=10&scheme=HTTP&anonymity=anonymous
    
    The response:
    {
        "code": 0,
        "msg": "ok",
        "data": {
            "count": 9,
            "items": [
            {
                "port": 2000,
                "ip": "xxx.xxx.xx.xxx",
                "scheme": "HTTP",
                "url": "http://xxx.xxx.xxx.xx:xxxx",
                "anonymity": "transparent"
            }
            ]
        }
    }

screenshot

create new proxy manually

POST /api/proxy/

param      M/O  detail                            default
ip         M    e.g. 111.111.111.111
port       M    e.g. 12345
scheme     M    choices: HTTP, HTTPS
anonymity  O    choices: transparent, anonymous   transparent
need_auth  O    choices: 0, 1
user       O
password   O
url        O                                      generated from the given scheme+ip+port

screenshot
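
A hedged sketch of calling this endpoint from Python; whether the server expects form-encoded or JSON parameters is an assumption here, so adjust to match your deployment:

import requests

# Assumption: form-encoded parameters (switch to `json=payload`
# if your deployment expects a JSON body instead).
payload = {
    "ip": "111.111.111.111",
    "port": 12345,
    "scheme": "HTTP",
    "anonymity": "anonymous",
}
resp = requests.post("http://127.0.0.1:12345/api/proxy/", data=payload)
print(resp.json())  # expect {"code": 0, "msg": "ok", ...} on success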

check status

Check the server status, including:

  • Running spiders
  • Stored proxies
GET /api/status/

No params.

screenshot
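
A small sketch of polling this endpoint from Python; the field names follow the sample /api/status/ output quoted in the issues section below:

import requests

# No parameters are needed.
status = requests.get("http://127.0.0.1:12345/api/status/").json()
for spider in status["data"].get("spiders", []):
    print(spider["name"], spider["status"])
print("stored proxies:", status["data"].get("proxies", {}))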

Config

Introduction

The configuration file is written in YAML. The definitions and default values of the supported items are:

# server's http port
HTTP_PORT: 12345

# redirect output to the console instead of a log file
CONSOLE_OUTPUT: 1

# Log
# `dir` and `filename` take effect only when `CONSOLE_OUTPUT: 0`
LOG:
  level: 'debug'
  dir: './logs'
  filename: 'fp-server.log'

# redis database
REDIS:
  host: '127.0.0.1'
  port: 6379
  db: 0
  password:

# stop crawling new proxies
# after this many proxies are stored
PROXY_STORE_NUM: 500

# interval (seconds) for re-checking availability
# applies to each single proxy, not to the checker as a whole
PROXY_STORE_CHECK_SEC: 3600

Customization

  • If you use Docker:
    • Create a directory such as /x/config_dir and put your config.yml in it. Then modify the docker-run command like this:
      docker run -itd --name fpserver -p 12345:12345 -v "/x/config_dir":"/fps-config" karmenzind/fp-server:stable
      
    • External config.yml doesn't need to contain all config items. For example, it can be:
      PROXY_STORE_NUM: 100
      LOG:
          level: 'info'
      PROXY_STORE_CHECK_SEC: 7200
      
      Items you omit keep their default values.
    • If you need a log file, don't modify `LOG: dir` in config.yml. Instead, create a directory for the log file, such as /x/log_dir, and change the docker-run command like:
      docker run -itd --name fpserver -p 12345:12345 -v "/x/config_dir":"/fps_config" -v "/x/log_dir":"/fp_server/logs" karmenzind/fp-server:stable
      
    • There's no need to modify the exposed port of the container. If you prefer publishing it to another port (say, 9999) on the host, change the -p option in the docker-run command to -p 9999:12345.
    • If you need to access Redis from the host, add another publish option such as -p 6379:6379 to the docker-run command.
  • If you manually deploy the project:
    • Modify the internal config file: src/config/common.py

Source websites

Growing…

If you know of good free-proxy websites, please tell me and I will add them to this project.

Supporting:

Thanks to: Golmic, Eric_Chan

FAQ

  • How about the availability and quality of the proxies?

    Before storing a new proxy, fp-server checks its availability, anonymity, and speed from your local network, so feel free to use the crawled proxies.

  • How large should PROXY_STORE_NUM be? Is there any limit?

    Set it according to your actual needs. For a typical spider project, 300-500 is plenty. There is no hard limit for now; I stopped testing after storing 10000 available proxies. The practical upper bound depends on the source websites, and I will add more of them if more people use this project.

  • How to use it in my project?

    See the next section.

Examples

This code can be copied directly into your project. Remember to adjust the configuration and settings first.

I will write more snippets when I have time, or you can tell me which examples you want.

Use fp-server with Python requests module

Here.
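
The linked snippet is not reproduced here. As a minimal sketch of the idea (the helper name is illustrative), fetch one checked proxy from fp-server and shape it for requests' proxies= argument:

import requests

FP_SERVER = "http://127.0.0.1:12345"  # assumed local deployment

def get_requests_proxies(scheme="HTTP"):
    """Fetch one checked proxy from fp-server, shaped for requests."""
    body = requests.get(FP_SERVER + "/api/proxy/",
                        params={"count": 1, "scheme": scheme}).json()
    items = body["data"].get("items", [])
    if not items:
        return None
    url = items[0]["url"]  # e.g. "http://x.x.x.x:xxxx"
    return {"http": url, "https": url}

proxies = get_requests_proxies()
if proxies:
    print(requests.get("http://httpbin.org/ip", proxies=proxies, timeout=10).text)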

Use fp-server in Scrapy Project

Here is a Scrapy middleware that fetches and applies a proxy for each request. Copy it into your middlewares.py and add its name to DOWNLOADER_MIDDLEWARES in your settings.py.
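
The actual middleware is behind the link above; the following is only a rough sketch of the approach, with an illustrative class name and a synchronous fetch that a real implementation would replace with caching:

import json
import random
from urllib.request import urlopen

class FPServerProxyMiddleware:
    """Downloader middleware that applies a proxy fetched from fp-server."""

    fp_server = "http://127.0.0.1:12345"  # assumed local deployment

    def process_request(self, request, spider):
        # A real implementation would cache a batch of proxies instead of
        # hitting the fp-server API synchronously on every request.
        raw = urlopen(self.fp_server + "/api/proxy/?count=5&scheme=HTTP").read()
        items = json.loads(raw.decode("utf-8")).get("data", {}).get("items", [])
        if items:
            # Scrapy's built-in HttpProxyMiddleware honors request.meta["proxy"].
            request.meta["proxy"] = random.choice(items)["url"]

Enable it in settings.py with, for example, DOWNLOADER_MIDDLEWARES = {"myproject.middlewares.FPServerProxyMiddleware": 543} (the module path is hypothetical).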

If you want to keep a cookie pool for your proxies (an independent cookiejar for each IP), this middleware may help you.

Bugs and feature requests

I need your feedback to make it better.
Please create an issue for any problems or advice.

Known bugs:

  • Blocks when using Tornado 4.5.3
  • After a check, the Redis key might change

TODOs and ideas

  • Use ZSET
  • Add supervisor
  • Split out the log module
  • More detailed API
  • Web frontend via Bootstrap
  • Add a user-agent pool
  • The checker's scheduler:
    • Periodically calculate the average speed of check requests, then reschedule the checker based on that average and the number of stored proxies.
  • Provide region information.
  • Use Redis HSET for calculations.

fp-server's People

Contributors

dependabot[bot], karmenzind


fp-server's Issues

A round of thanks

Your operating system and Python version?
CentOS 7, upgraded from Python 3.4 to Python 3.6.

Successfully deployed on a server with nginx + this code.
One pitfall I hit: python36-devel must be installed, because the Python redis package needs it.
Thanks a lot.

Also, could the response latency be added to the returned fields?
And there is a proxy site called 31 Proxy at http://31f.cn/

Shows: internal server error

No proxies are ever crawled.
/api/status/
{"code": 0, "msg": "success", "data": {"spiders": [{"status": "stopped", "name": "coderbusy", "last_start_time": "1533775630"}, {"status": "stopped", "name": "kuaidaili", "last_start_time": "1533775630"}, {"status": "stopped", "name": "mix", "last_start_time": "1533775630"}, {"status": "stopped", "name": "data5u", "last_start_time": "1533775630"}, {"status": "stopped", "name": "xicidaili", "last_start_time": "1533775079"}, {"status": "stopped", "name": "checker", "last_start_time": "1533776171"}, {"status": "stopped", "name": "coolproxy", "last_start_time": "1533775079"}, {"status": "stopped", "name": "3464", "last_start_time": "1533775630"}, {"status": "stopped", "name": "yundaili", "last_start_time": "1533775630"}, {"status": "stopped", "name": "ip66", "last_start_time": "1533775630"}], "proxies": {"total": 0, "detail": {"http": 0, "https": 0, "transparent": 0, "anonymous": 0}}}}
/api/spider/run_all/
{"code": 500, "msg": "\u670d\u52a1\u5668\u5185\u90e8\u9519\u8bef", "data": {}}
It shows: internal server error (the escaped msg decodes to "internal server error").
Environment: Debian 9 x64 (stretch), Python 3.6.5

Is there a problem in utils/tools.py?

In the function recuresive_update: when old_value is a string and value is a list, it raises an error. And when value is a tuple, converting the old_value string into a list defeats its original purpose.

Your program is very stable

 #  Spider     Status   Last start time      Uptime
 1  xicidaili  running  2018-08-17 23:42:02  2d 0h 48m 58s
 2  coolproxy  running  2018-08-17 23:42:02  2d 0h 48m 58s
 3  checker    running  2018-08-17 23:38:00  2d 0h 53m 0s
 4  data5u     stopped  2018-08-17 23:32:00  0s
 5  yundaili   running  2018-08-17 23:42:02  2d 0h 48m 58s
 6  ip66       running  2018-08-17 14:50:27  2d 9h 40m 33s
 7  3464       running  2018-08-17 23:42:02  2d 0h 48m 58s
 8  coderbusy  running  2018-08-17 23:42:02  2d 0h 48m 58s
 9  kuaidaili  running  2018-08-17 13:00:11  2d 11h 30m 49s
10  mix        stopped  2018-08-17 23:32:00  0s

Total proxies: 265681
  of which HTTP: 130667
  of which HTTPS: 135014
  of which transparent: 26667
  of which anonymous: 239014

[Page execution time: 7.3968110084534 s]

[Suggestion] Could the API be trimmed down? It feels overly verbose, and many of the fields go unused.

/api/proxy/

{"code": 0, "msg": "success", "data": {"count": 1, "detail": [{"ip": "91.196.39.196", "scheme": "https", "port": "32585", "need_auth": "0", "url": "https://91.196.39.196:32585", "anonymity": "anonymous"}]}}

Could you change it, or add a simple version such as /api/proxy/simple/, returning:

{"code": 0, "msg": "success", "data": {"count": 1, "detail": [{"scheme": "https", "ip": "91.196.39.196:32585"}]}}

Or, even more simply, just…

Useless website!!!

Testing shows that IP Hai has stopped service (whatever page you visit returns 404/500).
Please update the site list.

Error on CentOS + Python 3.7.1

     File "/usr/local/lib/python3.7/site-packages/scrapy/middleware.py", line 34, in from_settings
        mwcls = load_object(clspath)
      File "/usr/local/lib/python3.7/site-packages/scrapy/utils/misc.py", line 44, in load_object
        mod = import_module(module)
      File "/usr/local/lib/python3.7/importlib/__init__.py", line 127, in import_module
        return _bootstrap._gcd_import(name[level:], package, level)
      File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
      File "<frozen importlib._bootstrap>", line 983, in _find_and_load
      File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
      File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
      File "<frozen importlib._bootstrap_external>", line 728, in exec_module
      File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
      File "/usr/local/lib/python3.7/site-packages/scrapy/extensions/telnet.py", line 12, in <module>
        from twisted.conch import manhole, telnet
      File "/usr/local/lib/python3.7/site-packages/twisted/conch/manhole.py", line 154
        def write(self, data, async=False):
                                  ^
    SyntaxError: invalid syntax

I hope the author could write an article about the overall design of this project, or about how Tornado and Scrapy interact.

I searched online for ways to embed Scrapy into a web service, but none of what I found was satisfying; it was mostly Django with Scrapy, or controlling Scrapy through scrapyd.

I have also read some of this project's source code, but I really don't understand the asynchronous parts, especially the get/post methods in Tornado. I couldn't figure out what exactly is being done there, or where those attributes and methods are defined.

There is indeed a lot I don't understand; my knowledge is still too shallow.

Just can't get any IPs

main.py seems to run, and the API is reachable, but the returned content always has count 0. Could the reason be that main.py fails to store data into Redis?
