

Universal Proxy Scraper BETA v 0.1.6

Need some proxies but don't want to scrape them manually? Just give this script the domains!
Give the project a star!

Report Bug · Request Feature

About The Project

Example of the code imported as a module (screenshots of the module code and its output).

Hi there! The purpose of this script is to demonstrate the features that will be part of the future Universal Proxy Scraper module. For now it will be developed in this form until v 1.0.0 comes out, which will be the first version deployed as a module :)

Modules used

Just built-in modules! (Python >= 3.0)

Getting Started

Let's get to it!

Script usage

Setting up the list

To set up the websites you want to get proxies from, place every URL you want to scrape on its own line in a plain text file, like this:

http://free-proxy.cz/es/
https://free-proxy-list.net/
http://www.freeproxylists.net/
https://hidemy.name/es/proxy-list/#list

(For reference, see the test_urls.txt file in this repository.)
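
For illustration, here is a minimal sketch of how such a file could be loaded into a Python list before handing it to the scraper. It is an assumption for illustration only, not the project's actual loading code:

# Minimal sketch: read the URL list from a plain text file.
# The file name and the blank-line filtering are illustrative assumptions.
with open('test_urls.txt') as f:
    urls = [line.strip() for line in f if line.strip()]

print(urls)  # ['http://free-proxy.cz/es/', 'https://free-proxy-list.net/', ...]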

Using via command-line

Command-line usage is pretty simple :D For example:

path/to/the/script: python main.py -h


██╗   ██╗   ██████╗ ███████╗    █████╗  ██████╗
██║   ██║   ██╔══██╗██╔════╝██╗██╔══██╗██╔═████╗
██║   ██║   ██████╔╝███████╗╚═╝╚█████╔╝██║██╔██║
██║   ██║   ██╔═══╝ ╚════██║██╗██╔══██╗████╔╝██║
╚██████╔╝██╗██║██╗  ███████║╚═╝╚█████╔╝╚██████╔╝
 ╚═════╝ ╚═╝╚═╝╚═╝  ╚══════╝    ╚════╝  ╚═════╝

            Proxy
Universal           Scraper | Your ideal proxy scraper ;)
       by: @freshSauce
           0.1.6

usage: main.py [-h] -f FILE [-o] [-q QUANTITY] [-v] [-p]

Command-line option for the Universal Scraper

optional arguments:
  -h, --help            show this help message and exit
  -f FILE, --file FILE  name of the file with the sites
  -o, --output          if used, stores the scraped proxies
  -q QUANTITY, --quantity QUANTITY
                        quantity of proxies to scrape (default: 10)
  -v, --verify          if used, verify every single proxy and returns the live ones
  -p, --print           if used, prints out the obtained list of proxies


As you can see, there are plenty of options you can use :)

  • file (required, value needed) : path or name of the file that contains all the websites you want to scrape.
  • output (optional, no value needed) : if used, writes a file named "output.txt" containing every scraped proxy.
  • quantity (optional, value needed, 10 by default) : sets the number of proxies to scrape.
  • verify (optional, no value needed) : if used, verifies every scraped proxy and returns only the ones that are alive (see the sketch after this list).
  • print (optional, no value needed) : if used, prints out the list of scraped proxies.
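
To give an idea of what the verify step does, here is a minimal sketch of one way to check a proxy's liveness using only built-in modules. It is an illustrative assumption, not the script's actual verification code; the check URL and the timeout are placeholders:

# Illustrative liveness check using only built-in modules -- an assumption,
# not the script's actual verification code.
import urllib.request

def is_alive(proxy, timeout=5):
    """Return True if an HTTP request routed through `proxy` (ip:port) succeeds."""
    handler = urllib.request.ProxyHandler({'http': 'http://' + proxy})
    opener = urllib.request.build_opener(handler)
    try:
        opener.open('http://httpbin.org/ip', timeout=timeout)
        return True
    except Exception:
        return False

live_proxies = [p for p in ['172.67.181.214:80', '45.82.139.34:4443'] if is_alive(p)]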
Example with every single argument:
path/to/the/script: python main.py -f test_urls.txt -p -o -v -q 5

██╗   ██╗   ██████╗ ███████╗    █████╗  ██████╗
██║   ██║   ██╔══██╗██╔════╝██╗██╔══██╗██╔═████╗
██║   ██║   ██████╔╝███████╗╚═╝╚█████╔╝██║██╔██║
██║   ██║   ██╔═══╝ ╚════██║██╗██╔══██╗████╔╝██║
╚██████╔╝██╗██║██╗  ███████║╚═╝╚█████╔╝╚██████╔╝
 ╚═════╝ ╚═╝╚═╝╚═╝  ╚══════╝    ╚════╝  ╚═════╝

            Proxy
Universal           Scraper | Your ideal proxy scraper ;)
       by: @freshSauce
           0.1.6

Connection to http://free-proxy.cz/es/ timed out
Proxies obtained !!!
['172.67.181.214:80', '172.67.80.190:80', '45.82.139.34:4443', '188.168.56.82:55443', '150.129.54.111:6667']
Everything is done !!! Wanna get more proxies? (Y[es]/N[o]): n
Have a nice day !!!
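
The flags above map directly onto Python's built-in argparse module. As a rough sketch (illustrative only, not the actual contents of main.py), the parser could be declared like this:

# Rough sketch of the documented flags using argparse (illustrative only,
# not the actual main.py).
import argparse

parser = argparse.ArgumentParser(description='Command-line option for the Universal Scraper')
parser.add_argument('-f', '--file', required=True, help='name of the file with the sites')
parser.add_argument('-o', '--output', action='store_true', help='store the scraped proxies in a file')
parser.add_argument('-q', '--quantity', type=int, default=10, help='quantity of proxies to scrape')
parser.add_argument('-v', '--verify', action='store_true', help='verify every proxy and keep only the live ones')
parser.add_argument('-p', '--print', action='store_true', help='print the obtained list of proxies')
args = vars(parser.parse_args())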

Setting up our code

To use it from your own code, import it as a module, like this:

from main import ProxyScraper

There's no need to import it from 'main'; you can rename the script and import it under whatever name you gave it. Once you've done that, you can use it as you please.

# Storing the proxies in a variable
proxy_scraper = ProxyScraper('test_urls.txt')

proxy_list = proxy_scraper.Proxies()

# Iterating through each proxy
for proxy in ProxyScraper('test_urls.txt').Proxies():
    ...

# Saving the proxies to a file

proxy_scraper = ProxyScraper('test_urls.txt', output=True)

proxy_list = proxy_scraper.Proxies() # This will give you the scraped proxies and save them into a file.

Usage

It's pretty easy to use! Just make sure to pass the URLs correctly and you're ready to go!

from main import ProxyScraper

proxy_list = ProxyScraper('test_urls.txt').Proxies() # Stores the scraped proxy list in a variable

ProxyScraper('test_urls.txt', output=True).Proxies() # Saves the output into an output file

proxy_list = ProxyScraper('test_urls.txt').Proxies(quantity=15) # Stores 15 of the scraped proxies in a variable (10 by default)

proxy_list = ProxyScraper('test_urls.txt', check=True).Proxies(quantity=15) # Stores 15 of the scraped proxies and checks each one of them
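
Once you have the list, you can plug any of the proxies into whatever HTTP client you use. A minimal sketch using only built-in modules (the target URL is just a placeholder):

# Sketch: routing a request through one of the scraped proxies.
# The target URL is a placeholder; only built-in modules are used.
import urllib.request

from main import ProxyScraper

proxy_list = ProxyScraper('test_urls.txt').Proxies()
handler = urllib.request.ProxyHandler({'http': 'http://' + proxy_list[0]})
opener = urllib.request.build_opener(handler)
print(opener.open('http://example.com', timeout=10).status)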

Hope it is useful for you!

Contributing

Wanna contribute to the project? Great! Please follow these steps to submit any feature or bug fix :) You can also send me your ideas on Telegram; any contribution is greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

Distributed under the AGPL-3.0 License. See LICENSE for more information.

Contact

Telegram: - @freshSauce

Project Link: https://github.com/freshSauce/UniversalProxyScraper

Changelog

0.1.6

  • Added custom exceptions plus minor changes.

0.1.5

  • Added command-line support (yeah, no 0.1.3 nor 0.1.4, heh)

0.1.2

  • Added support for the first site-specific scraper: spys.one.

If needed, I will create site-specific scripts for sites like this one. That doesn't mean I'll stop looking for a 'universal' solution; it's just that sites like this one work quite differently from the others.

Module created for that site.

0.1.1

  • Added support for sites that render their tables with JavaScript writes such as 'document.write'.
  • Added handlers for some exceptions.
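
For context on the document.write handling: those sites embed the IP:port pairs inside JavaScript strings rather than plain HTML tables, so one way to recover them is a regular expression over the raw page source. The snippet below is an illustrative sketch; the sample HTML and the pattern are assumptions, not the script's actual parser:

# Illustrative extraction of ip:port pairs from document.write payloads.
# The sample string and the regex are assumptions, not the script's parser.
import re

html = "document.write('<td>172.67.181.214</td><td>80</td>')"
pairs = re.findall(r"(\d{1,3}(?:\.\d{1,3}){3})\D+?(\d{2,5})", html)
proxies = [ip + ':' + port for ip, port in pairs]
print(proxies)  # ['172.67.181.214:80']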

0.1.0

  • Added proxy checker function
  • Fixed some typos on the script documentation.


