django-proxylist's Introduction

Django-ProxyList-For-Grab

This application helps you keep an up-to-date list of proxy servers. It contains everything you need to run periodic checks that verify the properties of each proxy. It can also periodically collect proxy servers from the Internet and remove broken or slow proxies.

Installing the package

django-proxylist-for-grab can be easily installed using pip:

$ pip install django-proxylist-for-grab

Configuration

After that, add django-proxylist-for-grab to the INSTALLED_APPS list in your Django settings file.

INSTALLED_APPS = (
   ...
   'proxylist',
   ...
)

Add django-proxylist-for-grab to urls.py:

urlpatterns = patterns(
   ...
   url(r'', include('proxylist.urls')),
   ...
)

django-proxylist-for-grab has a list of variables that you can configure through Django's settings file. You can see the entire list at Advanced Configuration.

Database creation

You have two choices here:

Using south

We recommend using South for your database migrations. If you already use it, you can migrate django-proxylist-for-grab:

$ python manage.py migrate proxylist

Using syncdb

If you don't want to use South, you can run a plain syncdb:

$ python manage.py syncdb

Basic setup

First, add a mirror. For a working mirror, you need to install the app on a server with an external IP address. The mirror makes it possible to verify that data passes through a proxy server correctly. After adding a mirror, you can add and test your proxies.
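To illustrate why the mirror needs an external IP, here is a minimal sketch of how a check could classify a proxy from what the mirror receives. It is not django-proxylist's actual implementation: the function name, the header list, and the three labels are all assumptions made for illustration.

```python
# Hypothetical helper, not part of django-proxylist's API.
# These header names are commonly added by proxies; the list is illustrative.
PROXY_HEADERS = (
    "HTTP_VIA",
    "HTTP_X_FORWARDED_FOR",
    "HTTP_FORWARDED",
    "HTTP_PROXY_CONNECTION",
)


def classify_anonymity(real_ip, remote_addr, headers):
    """Classify a proxy from the request as seen by the mirror.

    real_ip     -- the checker's own external IP address
    remote_addr -- the address the mirror saw the request come from
    headers     -- dict of request headers as received by the mirror
    """
    # Split header values into tokens so that IPs are compared whole.
    tokens = set()
    for value in headers.values():
        tokens.update(str(value).replace(",", " ").split())

    if remote_addr == real_ip or real_ip in tokens:
        return "transparent"  # the real IP reached the mirror
    if any(name in headers for name in PROXY_HEADERS):
        return "anonymous"    # IP hidden, but proxy use is visible
    return "elite"            # no trace of a proxy in the request
```

The mirror has to sit outside your network so that it observes the request exactly as any third-party site would.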

Asynchronous checking

By default, django-proxylist-for-grab is configured for non-asynchronous checks. You can change this behavior by adding PROXY_LIST_USE_CALLERY to your Django settings and setting it to True.
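As a settings fragment, this would look roughly as follows. The setting name is copied exactly as spelled in this README; the file name is an assumption.

```python
# settings.py (assumed file name)
# Enable asynchronous proxy checks; name copied verbatim from the README.
PROXY_LIST_USE_CALLERY = True
```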

After that, install and configure django-celery and RabbitMQ.

For example on OS X

Packages installation

$ sudo pip install django-celery
$ sudo port install rabbitmq-server

Add the 'djcelery' application to 'INSTALLED_APPS' in settings

INSTALLED_APPS = (
   ...
   'djcelery',
   ...
)

Sync database

$ ./manage.py syncdb

Run rabbitmq and celery

$ sudo rabbitmq-server -detached
$ nohup python manage.py celery worker >& /dev/null &

Command line reference

update_proxies

Add new proxies from a file.

$ python manage.py update_proxies [file1] <file2> <...>
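The file format is not described here. If you need to prepare or validate input files yourself, the sketch below assumes one `host:port` entry per line, with blank lines and `#` comments ignored. Both the format and the helper name `parse_proxy_lines` are assumptions, not the command's documented behavior.

```python
def parse_proxy_lines(lines):
    """Return (host, port) tuples from 'host:port' lines.

    Blank lines, lines starting with '#', and malformed lines are skipped.
    The 'host:port' format is an assumption made for this example.
    """
    proxies = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the last colon so the port is always the final part.
        host, sep, port = line.rpartition(":")
        if not sep or not host or not port.isdigit():
            continue
        proxies.append((host, int(port)))
    return proxies
```

Any iterable of strings works as input, including an open file object.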

check_proxies

Check proxies availability and anonymity.

$ python manage.py check_proxies

grab_proxies

Search for proxy lists on the Internet.

$ python manage.py grab_proxies

clean_proxies

Remove broken proxies

$ python manage.py clean_proxies
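For periodic maintenance, the commands above can also be run from Python code. The sketch below uses Django's `call_command`. The function name `refresh_proxy_pool`, the order of the commands, and the injectable `run_command` parameter are assumptions made for illustration.

```python
def refresh_proxy_pool(run_command=None):
    """Run the maintenance commands in one pass.

    run_command -- callable taking a command name; defaults to Django's
                   call_command (imported lazily so the function can be
                   exercised without a configured Django project).
    """
    if run_command is None:
        from django.core.management import call_command
        run_command = call_command

    # Assumed order: collect new proxies, check them, then drop broken ones.
    for name in ("grab_proxies", "check_proxies", "clean_proxies"):
        run_command(name)
```

Inside a configured Django project, calling `refresh_proxy_pool()` with no arguments is equivalent to running the three `manage.py` commands one after another.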

GrabLib usage example:

from proxylist import grabber

grab = grabber.Grab()

# Get your IP (repeat this a few times to see how the proxy changes)
grab.go('http://ifconfig.me/ip')
if grab.response.code == 200:
    print(grab.response.body.strip())

# Count the script elements on the Yandex home page
grab.go('http://www.ya.ru/')
if grab.response.code == 200:
    print(grab.doc.select('//script').count())

GrabLib Spider example:

# filename: apps/app/management/commands/spider.py
# usage: python manage.py spider
from django.core.management.base import BaseCommand
from grab.spider.base import Task
from proxylist.grabber import Spider


class SimpleSpider(Spider):
    initial_urls = ['http://www.lib.ru/']

    def task_initial(self, grab, task):
        grab.set_input('Search', 'linux')
        grab.submit(make_request=False)
        yield Task('search', grab=grab)

    def task_search(self, grab, task):
        if grab.doc.select('//b/a/font/b').exists():
            for elem in grab.doc.select('//b/a/font/b/text()'):
                print(elem.text())


class Command(BaseCommand):
    help = 'Simple Spider'

    def handle(self, *args, **options):
        bot = SimpleSpider()
        bot.run()
        print(bot.render_stats())