

Universal Proxy Scraper BETA v 0.1.6

Need some proxies but don't want to scrape them manually? Just give this script the domains!
Give the project a star!

Report Bug · Request Feature

About The Project

Example of the code imported as a module (screenshots of the module code and its output).

Hi there! The purpose of this script is to demonstrate the features that will be part of the future Universal Proxy Scraper module. For now it will be developed in this form until v 1.0.0 comes out, which will be the first version deployed as a module :)

Modules used

Just built-in modules! (Python >= 3.0)

Getting Started

Let's get to it!

Script usage

Setting up the list

To set up the websites you want to get proxies from, place every URL you want to scrape on its own line in a plain text file, like this:

http://free-proxy.cz/es/
https://free-proxy-list.net/
http://www.freeproxylists.net/
https://hidemy.name/es/proxy-list/#list

(For reference, see the test_urls.txt file in this repository.)
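
For illustration, here is a minimal sketch of how such a file could be loaded into a Python list before handing it to the scraper. It is an assumption for illustration only, not the project's actual loading code:

# Minimal sketch: read the URL list from a plain text file.
# The file name and the blank-line filtering are illustrative assumptions.
with open('test_urls.txt') as f:
    urls = [line.strip() for line in f if line.strip()]

print(urls)  # ['http://free-proxy.cz/es/', 'https://free-proxy-list.net/', ...]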

Using via command-line

Command-line usage is pretty simple :D For example:

path/to/the/script: python main.py -h


██╗   ██╗   ██████╗ ███████╗    █████╗  ██████╗
██║   ██║   ██╔══██╗██╔════╝██╗██╔══██╗██╔═████╗
██║   ██║   ██████╔╝███████╗╚═╝╚█████╔╝██║██╔██║
██║   ██║   ██╔═══╝ ╚════██║██╗██╔══██╗████╔╝██║
╚██████╔╝██╗██║██╗  ███████║╚═╝╚█████╔╝╚██████╔╝
 ╚═════╝ ╚═╝╚═╝╚═╝  ╚══════╝    ╚════╝  ╚═════╝

            Proxy
Universal           Scraper | Your ideal proxy scraper ;)
       by: @freshSauce
           0.1.6

usage: main.py [-h] -f FILE [-o] [-q QUANTITY] [-v] [-p]

Command-line option for the Universal Scraper

optional arguments:
  -h, --help            show this help message and exit
  -f FILE, --file FILE  name of the file with the sites
  -o, --output          if used, stores the scraped proxies
  -q QUANTITY, --quantity QUANTITY
                        quantity of proxies to scrape (default: 10)
  -v, --verify          if used, verify every single proxy and returns the live ones
  -p, --print           if used, prints out the obtained list of proxies


As you can see, there are plenty of options you can use :)

  • file (required, value needed) : path or name of the file that contains all the websites you want to scrape.
  • output (optional, no value needed) : if used, writes a file named "output.txt" containing every scraped proxy.
  • quantity (optional, value needed, 10 by default) : sets the number of proxies to scrape.
  • verify (optional, no value needed) : if used, verifies every scraped proxy and returns only the ones that are alive (see the sketch after this list).
  • print (optional, no value needed) : if used, prints out the list of scraped proxies.
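
To give an idea of what the verify step does, here is a minimal sketch of one way to check a proxy's liveness using only built-in modules. It is an illustrative assumption, not the script's actual verification code; the check URL and the timeout are placeholders:

# Illustrative liveness check using only built-in modules -- an assumption,
# not the script's actual verification code.
import urllib.request

def is_alive(proxy, timeout=5):
    """Return True if an HTTP request routed through `proxy` (ip:port) succeeds."""
    handler = urllib.request.ProxyHandler({'http': 'http://' + proxy})
    opener = urllib.request.build_opener(handler)
    try:
        opener.open('http://httpbin.org/ip', timeout=timeout)
        return True
    except Exception:
        return False

live_proxies = [p for p in ['172.67.181.214:80', '45.82.139.34:4443'] if is_alive(p)]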
Example with every single argument:
path/to/the/script: python main.py -f test_urls.txt -p -o -v -q 5

██╗   ██╗   ██████╗ ███████╗    █████╗  ██████╗
██║   ██║   ██╔══██╗██╔════╝██╗██╔══██╗██╔═████╗
██║   ██║   ██████╔╝███████╗╚═╝╚█████╔╝██║██╔██║
██║   ██║   ██╔═══╝ ╚════██║██╗██╔══██╗████╔╝██║
╚██████╔╝██╗██║██╗  ███████║╚═╝╚█████╔╝╚██████╔╝
 ╚═════╝ ╚═╝╚═╝╚═╝  ╚══════╝    ╚════╝  ╚═════╝

            Proxy
Universal           Scraper | Your ideal proxy scraper ;)
       by: @freshSauce
           0.1.6

Connection to http://free-proxy.cz/es/ timed out
Proxies obtained !!!
['172.67.181.214:80', '172.67.80.190:80', '45.82.139.34:4443', '188.168.56.82:55443', '150.129.54.111:6667']
Everything is done !!! Wanna get more proxies? (Y[es]/N[o]): n
Have a nice day !!!
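
The flags above map directly onto Python's built-in argparse module. As a rough sketch (illustrative only, not the actual contents of main.py), the parser could be declared like this:

# Rough sketch of the documented flags using argparse (illustrative only,
# not the actual main.py).
import argparse

parser = argparse.ArgumentParser(description='Command-line option for the Universal Scraper')
parser.add_argument('-f', '--file', required=True, help='name of the file with the sites')
parser.add_argument('-o', '--output', action='store_true', help='store the scraped proxies in a file')
parser.add_argument('-q', '--quantity', type=int, default=10, help='quantity of proxies to scrape')
parser.add_argument('-v', '--verify', action='store_true', help='verify every proxy and keep only the live ones')
parser.add_argument('-p', '--print', action='store_true', help='print the obtained list of proxies')
args = vars(parser.parse_args())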

Setting up our code

To use it from your own code, import it as a module, like this:

from main import ProxyScraper

There's no need to import it from 'main'; you can rename the script and import it under whatever name you gave it. Once you've done that, you can use it as you please.

# Storing the proxies in a variable
proxy_scraper = ProxyScraper('test_urls.txt')

proxy_list = proxy_scraper.Proxies()

# Iterating through each proxy
for proxy in ProxyScraper('test_urls.txt').Proxies():
    ...

# Saving the proxies to a file

proxy_scraper = ProxyScraper('test_urls.txt', output=True)

proxy_list = proxy_scraper.Proxies() # This will give you the scraped proxies and save them into a file.

Usage

It's pretty easy to use! Just make sure to pass the URLs correctly and you're ready to go!

from main import ProxyScraper

proxy_list = ProxyScraper('test_urls.txt').Proxies() # Stores the scraped proxy list in a variable

ProxyScraper('test_urls.txt', output=True).Proxies() # Saves the output into an output file

proxy_list = ProxyScraper('test_urls.txt').Proxies(quantity=15) # Stores 15 of the scraped proxies in a variable (10 by default)

proxy_list = ProxyScraper('test_urls.txt', check=True).Proxies(quantity=15) # Stores 15 of the scraped proxies and checks each one of them
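
Once you have the list, you can plug any of the proxies into whatever HTTP client you use. A minimal sketch using only built-in modules (the target URL is just a placeholder):

# Sketch: routing a request through one of the scraped proxies.
# The target URL is a placeholder; only built-in modules are used.
import urllib.request

from main import ProxyScraper

proxy_list = ProxyScraper('test_urls.txt').Proxies()
handler = urllib.request.ProxyHandler({'http': 'http://' + proxy_list[0]})
opener = urllib.request.build_opener(handler)
print(opener.open('http://example.com', timeout=10).status)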

Hope it is useful for you!

Contributing

Wanna contribute to the project? Great! Please follow these steps to submit any feature or bug fix :) You can also send me your ideas on Telegram; any contribution is greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

Distributed under the AGPL-3.0 License. See LICENSE for more information.

Contact

Telegram: - @freshSauce

Project Link: https://github.com/freshSauce/UniversalProxyScraper

Changelog

0.1.6

  • Added custom exceptions plus minor changes.

0.1.5

  • Added command-line support (yeah, no 0.1.3 nor 0.1.4, heh)

0.1.2

  • Added support for the first site-specific scraper: spys.one.

If needed, I will create site-specific scripts for sites like this one. That doesn't mean I'll stop looking for a 'universal' solution; it's just that sites like this one work quite differently from the others.

Module created for that site.

0.1.1

  • Added support for sites that render their tables with JavaScript writes such as 'document.write'.
  • Added handlers for some exceptions.
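
For context on the document.write handling: those sites embed the IP:port pairs inside JavaScript strings rather than plain HTML tables, so one way to recover them is a regular expression over the raw page source. The snippet below is an illustrative sketch; the sample HTML and the pattern are assumptions, not the script's actual parser:

# Illustrative extraction of ip:port pairs from document.write payloads.
# The sample string and the regex are assumptions, not the script's parser.
import re

html = "document.write('<td>172.67.181.214</td><td>80</td>')"
pairs = re.findall(r"(\d{1,3}(?:\.\d{1,3}){3})\D+?(\d{2,5})", html)
proxies = [ip + ':' + port for ip, port in pairs]
print(proxies)  # ['172.67.181.214:80']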

0.1.0

  • Added proxy checker function
  • Fixed some typos on the script documentation.


