
http_request_randomizer's Introduction

HTTP Request Randomizer

Vietnamese version

A convenient way to implement HTTP requests is using Python's requests library. One of requests' most popular features is simple proxying support. HTTP as a protocol has very well-defined semantics for dealing with proxies, and this has contributed to the widespread deployment of HTTP proxies.

Proxying is very useful when conducting intensive web crawling/scraping, or when you simply want to hide your identity (anonymization).

This project uses public proxies to randomize HTTP requests over a number of IP addresses, combined with a variety of known user-agent headers, so that these requests look as if they were produced by different applications and operating systems.

Proxies

Proxies provide a way to use server P (the middleman) to contact server A and then route the response back to you. In more nefarious circles, it's a prime way to make your presence unknown and to pose as many clients to a website instead of just one. Websites will often block IPs that make too many requests, and proxies are a way to get around this. But even when simulating an attack, you should know how it's done.
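
For illustration, here is a minimal proxied request with the requests library (the proxy address below is a placeholder, not a working proxy):

import requests

# Route the request through an HTTP proxy (placeholder address)
proxies = {
    "http": "http://127.0.0.1:8080",
    "https": "http://127.0.0.1:8080",
}
response = requests.get("http://ipv4.icanhazip.com", proxies=proxies, timeout=30)
print(response.text)  # the IP address the target server saw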

User Agent

Surprisingly, the only thing that tells a server which application triggered the request (e.g., a particular browser type or a script) is a header called the "user agent", which is included in the HTTP request.
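
For example, with requests you can override this header per request (the agent string below is just an example value):

import requests

# Present the request as coming from a desktop browser
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}
response = requests.get("http://example.com", headers=headers, timeout=30)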

The source code

The project code in this repository crawls five different public proxy websites.

After collecting the proxy data and filtering out the slowest proxies, the code randomly selects one of them to query the target URL. The request timeout is configured at 30 seconds, and if the proxy fails to return a response it is deleted from the application's proxy list. Note that a different user-agent header is used for each request. The headers are stored in the /data/user_agents.txt file, which contains around 900 different agents.
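
A rough sketch of that flow (proxied_get and its arguments are illustrative helpers, not the library's actual API):

import random
import requests

def proxied_get(url, proxies, user_agents):
    # Pick a random proxy and user agent per attempt; retire proxies
    # that fail to respond within the 30-second timeout
    while proxies:
        proxy = random.choice(proxies)
        headers = {"User-Agent": random.choice(user_agents)}
        try:
            return requests.get(url, headers=headers, timeout=30,
                                proxies={"http": proxy, "https": proxy})
        except requests.exceptions.RequestException:
            proxies.remove(proxy)  # drop the failing proxy from the list
    raise RuntimeError("proxy list exhausted")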

Installation

If you wish to use this module as a CLI tool, install it globally via pip:

  pip install http-request-randomizer

Otherwise, you can clone the repository and use setuptools:

python setup.py install

Dev testing

Clone repo, install requirements, develop and run tests:

pip install -r requirements.txt
tox -e pyDevVerbose

How to use

Command-line interface

Assuming that you have http-request-randomizer installed, you can use the commands below:

show help message:

proxyList   -h, --help

specify proxy provider(s) (required):

  -s {proxyforeu,rebro,samair,freeproxy,all} 

specify output stream (default: sys.stdout); could also be a file:

  -o, --outfile

specify provider timeout threshold in seconds:

  -t, --timeout

specify proxy bandwidth threshold in KBs:

  -bw, --bandwidth

show program's version number:

  -v, --version
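
Putting these together, an illustrative invocation (the flag values are examples) could look like:

  proxyList -s all -o proxies.txt -t 1.0 -bw 100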

API

To use http-request-randomizer as a library, include it in your requirements.txt file. Then you can simply generate a proxied request using a method call:

import logging
import time
from http_request_randomizer.requests.proxy.requestProxy import RequestProxy

if __name__ == '__main__':

    start = time.time()
    # Initialization crawls all proxy providers, so it can take a while
    req_proxy = RequestProxy(log_level=logging.ERROR)
    print("Initialization took: {0} sec".format((time.time() - start)))
    print("Size: {0}".format(len(req_proxy.get_proxy_list())))
    print("ALL = {0} ".format(list(map(lambda x: x.get_address(), req_proxy.get_proxy_list()))))

    test_url = 'http://ipv4.icanhazip.com'

    while True:
        start = time.time()
        # Each call picks a random proxy and a random user-agent header
        request = req_proxy.generate_proxied_request(test_url)
        print("Proxied Request Took: {0} sec => Status: {1}".format((time.time() - start), request.__str__()))
        if request is not None:
            print("\t Response: ip={0}".format(u''.join(request.text).encode('utf-8')))
        print("Proxy List Size: {0}".format(len(req_proxy.get_proxy_list())))

        print("-> Going to sleep..")
        time.sleep(10)

Changing log levels

The RequestProxy constructor accepts an optional log_level parameter that can be used to change the logging level. By default it is 0, i.e. NOTSET. The Python logging levels are documented here. You can use either integers or their equivalent constants from the logging module (e.g., logging.DEBUG, logging.ERROR).
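
For example, both of the following request DEBUG-level logging (10 is the integer value of logging.DEBUG):

import logging
from http_request_randomizer.requests.proxy.requestProxy import RequestProxy

req_proxy = RequestProxy(log_level=logging.DEBUG)  # using the constant
req_proxy = RequestProxy(log_level=10)             # equivalent integer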

Documentation

http-request-randomizer documentation

Contributing

Many thanks to the open-source community for contributing to this project!

Faced an issue?

Open an issue here, and be as detailed as possible :)

Feels like a feature is missing?

Feel free to open a ticket! PRs are always welcome!

License

This project is licensed under the terms of the MIT license.

http_request_randomizer's People

Contributors

angeloudi, asmaier, broglea, christinabo, gabrielgradinaru, glen-mac, hessu1337, hmphu, ieguiguren, inexxt, jasonbristol, la55u, pgaref, yhancsx


http_request_randomizer's Issues

Error Handling

Since we are using HTML parsers, and each proxy-list web page can change radically overnight, it would really make sense to have some custom exception handling that translates these kinds of errors.

question

Is there any way to speed up, or set a timeout for, finding proxies so it doesn't hang for ~5+ minutes trying to find a suitable proxy?
I.e., if no suitable proxy is found in x amount of time, skip it, etc.?

Test cases

Implement test cases that examine whether a provider's layout has changed.

How To Make Post Requests

Can someone help me with how to use http-request-randomizer with random proxies for POST requests? The docs only show an example of a GET request.
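
A possible sketch, assuming your installed version's generate_proxied_request accepts method and data keyword arguments (check its signature before relying on this):

from http_request_randomizer.requests.proxy.requestProxy import RequestProxy

req_proxy = RequestProxy()
# Assumption: method/data kwargs are supported by the installed version
response = req_proxy.generate_proxied_request(
    'http://httpbin.org/post', method='POST', data={'key': 'value'})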

BS4 error from samair proxy parser

File "/usr/lib/python3.6/site-packages/http_request_randomizer/requests/proxy/requestProxy.py", line 55, in __init__ self.proxy_list += parsers[i].parse_proxyList() File "/usr/lib/python3.6/site-packages/http_request_randomizer/requests/parsers/SamairProxyParser.py", line 42, in parse_proxyList headings = [th.get_text() for th in table.find("tr").find_all("th")] AttributeError: 'NoneType' object has no attribute 'find'

How to import the library to create the RequestProxy() object

Hi, as above:
I have tried importing RequestProxy but with no success.

from project.http.requests.proxy.requestProxy import RequestProxy

Following the style of:

from project.http.requests.parsers.UrlParser import UrlParser

The error returned is:

Traceback (most recent call last):
  File "crawler_copy.py", line 18, in <module>
    requests_proxy = RequestProxy()
  File "/usr/local/lib/python2.7/dist-packages/http_request_randomizer-0.0.3-py2.7.egg/project/http/requests/proxy/requestProxy.py", line 21, in __init__
    self.userAgent = UserAgentManager()
  File "/usr/local/lib/python2.7/dist-packages/http_request_randomizer-0.0.3-py2.7.egg/project/http/requests/useragent/userAgent.py", line 8, in __init__
    self.useragents = self.load_user_agents(self.agent_file)
  File "/usr/local/lib/python2.7/dist-packages/http_request_randomizer-0.0.3-py2.7.egg/project/http/requests/useragent/userAgent.py", line 16, in load_user_agents
    with open(useragentsfile, 'rb') as uaf:
IOError: [Errno 2] No such file or directory: '/usr/local/lib/python2.7/dist-packages/http_request_randomizer-0.0.3-py2.7.egg/project/http/requests/useragent/../data/user_agents.txt'

How can I import the library correctly? Please advise. Thank you

Update documentation

  • Document ProxyObject attributes, including Country, Anonymity, etc.
  • Document #5 and usage examples

Info should be included both in README and docs

HTML parser + JS evaluation

Some providers, like Samair, have JavaScript code that must be evaluated in order to retrieve proxy addresses.
This cannot be done solely with BeautifulSoup, which is just a parser.

Library not found error

I launch the script from the 'proxy' subdirectory in GNU/Linux. This causes the following error:
[~/dev/HTTP_Request_Randomizer/project/http/requests/proxy]>python requestProxy.py
Traceback (most recent call last):
  File "requestProxy.py", line 1, in <module>
    from project.http.requests.parsers.freeproxyParser import freeproxyParser
ImportError: No module named project.http.requests.parsers.freeproxyParser

To solve it, I had to add the project's directory to the path:
+import sys
+import os
+sys.path.insert(0, os.path.abspath('../../../../'))

I didn't want to push this change, but since the lines remained when I pushed the commit, I'd rather explain why they are there. How do you invoke the script so as to avoid having to add these lines?

File not found no such file or directory

The code runs well when I run the .py file, but as soon as I convert it into an .exe file via PyInstaller by typing:
pyinstaller --onefile "Filename.py"
and then run the .exe file, I receive an error.

I installed the package as: pip install http-request-randomizer
pyinstaller: pyinstaller --onefile Filename.py


The error occurs in the .exe file only.

Adding some sort of callback?

These free proxies we parse from the web are often unreliable and therefore result in unsuccessful requests. I think it would be nice to have some kind of callback to handle this situation, like on_success() and on_failure() or something. It would also make the request asynchronous, which would eliminate the (sometimes very) long wait before we get a response from the proxy. I found a possible way to do this without any additional library; I'm thinking of something like this: https://stackoverflow.com/a/44917020
So, to be clear, I'm suggesting two things: 1) make it async, and 2) add an optional callback for when the request returns or fails.
What do you think? How should this be implemented?
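
A minimal sketch of the thread-based approach from that Stack Overflow answer, with hypothetical on_success/on_failure callbacks (not part of the current API):

import threading

def proxied_request_async(req_proxy, url, on_success, on_failure):
    # Run the blocking proxied request in a background thread and
    # invoke the matching callback when it finishes
    def worker():
        try:
            response = req_proxy.generate_proxied_request(url)
            if response is not None:
                on_success(response)
            else:
                on_failure(None)
        except Exception as exc:
            on_failure(exc)
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread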

Why does this site not work with the library?

I'm using a free API that has a daily download limit.
The endpoint is, for example: https://www.srrdb.com/download/file/Far.Cry.DVDRip-MYTH/myth.nfo
It should download a simple text file, but the daily limit is around 600-700 requests.
I've tested reaching the limit using this code:

import requests
from time import sleep

url = 'https://www.srrdb.com/download/file/Far.Cry.DVDRip-MYTH/myth.nfo'

i = 1
while True:
    r = requests.get(url)
    print('#{} {}'.format(i, r.status_code))
    if r.status_code != 200:
        print(r.text)
        exit()
    i += 1
    sleep(1)

After that I tried to use this library, but although the logs say it uses a different proxy for each request, it always returns a 429 status code.

import time
from http_request_randomizer.requests.proxy.requestProxy import RequestProxy

if __name__ == '__main__':
    req_proxy = RequestProxy()
    
    test_url = 'https://www.srrdb.com/download/file/Far.Cry.DVDRip-MYTH/myth.nfo'
    status = 0
    while status != 200:
        request = req_proxy.generate_proxied_request(test_url)
        status = request.status_code
        if status == 200:
            print('OK')
        time.sleep(2)

What am I missing here? Thank you

Empty proxy list

Proof https://replit.com/@d3banjan/KhakiStarchyCustomer#main.py

The following code fails:

from random import choice
from http_request_randomizer.requests.proxy.requestProxy import RequestProxy

proxies = RequestProxy().get_proxy_list()

PROXY = choice(proxies).get_address()

print(PROXY, type(PROXY))

... with the following error message:

2021-08-18 10:14:57,091 http_request_randomizer.requests.useragent.userAgent INFO     Using local file for user agents: /opt/virtualenvs/python3/lib/python3.8/site-packages/http_request_randomizer/requests/proxy/../data/user_agents.txt
2021-08-18 10:14:57,093 root   DEBUG    === Initialized Proxy Parsers ===
2021-08-18 10:14:57,093 root   DEBUG         FreeProxy parser of 'http://free-proxy-list.net' with required bandwidth: '150' KBs
2021-08-18 10:14:57,093 root   DEBUG         PremProxy parser of 'https://premproxy.com/list/' with required bandwidth: '150' KBs
2021-08-18 10:14:57,093 root   DEBUG         SslProxy parser of 'https://www.sslproxies.org' with required bandwidth: '150' KBs
2021-08-18 10:14:57,093 root   DEBUG    =================================
2021-08-18 10:14:57,525 http_request_randomizer.requests.parsers.FreeProxyParser ERROR    Provider FreeProxy failed with Attribute error: 'NoneType' object has no attribute 'find'
2021-08-18 10:14:57,526 root   DEBUG    Added 0 proxies from FreeProxy
2021-08-18 10:14:58,051 http_request_randomizer.requests.parsers.PremProxyParser WARNING  Proxy Provider url failed: https://premproxy.com/list/
2021-08-18 10:14:58,051 http_request_randomizer.requests.parsers.PremProxyParser DEBUG    Pages: set()
2021-08-18 10:14:58,465 http_request_randomizer.requests.parsers.PremProxyParser WARNING  Proxy Provider url failed: https://premproxy.com/list/
2021-08-18 10:14:58,466 root   DEBUG    Added 0 proxies from PremProxy
2021-08-18 10:14:58,792 http_request_randomizer.requests.parsers.SslProxyParser ERROR    Provider SslProxy failed with Attribute error: 'NoneType' object has no attribute 'find'
2021-08-18 10:14:58,792 root   DEBUG    Added 0 proxies from SslProxy
2021-08-18 10:14:58,792 root   DEBUG    Total proxies = 0
2021-08-18 10:14:58,792 root   DEBUG    Filtered proxies = 0
Traceback (most recent call last):
  File "main.py", line 4, in <module>
    proxies = RequestProxy().get_proxy_list()
  File "/opt/virtualenvs/python3/lib/python3.8/site-packages/http_request_randomizer/requests/proxy/requestProxy.py", line 69, in __init__
    self.current_proxy = self.randomize_proxy()
  File "/opt/virtualenvs/python3/lib/python3.8/site-packages/http_request_randomizer/requests/proxy/requestProxy.py", line 86, in randomize_proxy
    raise ProxyListException("list is empty")
http_request_randomizer.requests.errors.ProxyListException.ProxyListException: list is empty

Categorise by anonymity level

root   DEBUG    Using proxy: http://182.23.28.180:3128
RR Status 200
Proxied Request Took: 2.21629214287 sec => Status: <Response [200]>
	 Response: ip=86.19.251.144,182.23.28.180

Querying icanhazip lately, we get either one or two comma-separated values.
In the example above, the first one is our real IP and the second one is the proxy's.

In general, HTTP proxy servers, upon receiving a request from a client, append a new field (X-Forwarded-For) to the HTTP header, and this is how icanhazip knows whether we are using a proxy. The X-Forwarded-For field contains the client's IP address, and by analyzing it a website can figure out the real IP address.

Of course, different proxy servers provide different levels of anonymity; that's why in some cases we see only the proxy IP in the response. In those cases, the proxies do not include such headers in their requests. More examples of such headers: https://github.com/major/icanhaz/blob/master/icanhaz.py

An interesting feature would be to categorize the proxy providers by the level of anonymity they provide!
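
A rough sketch of how a single proxy could be classified from an icanhazip-style response (approximate: 'anonymous' vs 'elite' cannot always be told apart from the IPs alone):

def classify_anonymity(real_ip, icanhazip_response):
    # The response holds one or two comma-separated IPs (see example above)
    ips = [ip.strip() for ip in icanhazip_response.split(',')]
    if real_ip in ips:
        return 'transparent'  # our real IP leaked via X-Forwarded-For
    if len(ips) > 1:
        return 'anonymous'    # proxy advertised itself, but hid our IP
    return 'elite'            # only the proxy IP is visible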

Empty Proxy List

I am running http_request_randomizer via your Python API, on Python 3.6 on OSX 10.13.1. Running the method call recommended here results in the following log and error. The list of proxies is apparently empty, due to various errors in the proxy-list parser requests. Is this something I'm doing wrong, or a bug?


2018-02-11 19:38:45,974 root   DEBUG    === Initialized Proxy Parsers ===
2018-02-11 19:38:45,977 root   DEBUG             FreeProxy Parser of 'http://free-proxy-list.net' with required bandwidth: '150' KBs
2018-02-11 19:38:45,979 root   DEBUG             ProxyForEU Parser of 'http://proxyfor.eu/geo.php' with required bandwidth: '1.0' KBs
2018-02-11 19:38:45,985 root   DEBUG             RebroWeebly Parser of 'http://rebro.weebly.com' with required bandwidth: '150' KBs
2018-02-11 19:38:45,991 root   DEBUG             SemairProxy Parser of 'https://premproxy.com/list/' with required bandwidth: '150' KBs
2018-02-11 19:38:45,994 root   DEBUG    =================================
/anaconda3/lib/python3.6/site-packages/urllib3/connection.py:344: SubjectAltNameWarning: Certificate for free-proxy-list.net has no `subjectAltName`, falling back to check for a `commonName` for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See https://github.com/shazow/urllib3/issues/497 for details.)
  SubjectAltNameWarning
2018-02-11 19:38:46,140 urllib3.connection ERROR    Certificate did not match expected hostname: free-proxy-list.net. Certificate: {'subject': ((('commonName', 'sni108365.cloudflaressl.com'),),), 'subjectAltName': []}
2018-02-11 19:38:46,143 http_request_randomizer.requests.parsers.FreeProxyParser ERROR    Provider FreeProxy failed with Unknown error: HTTPSConnectionPool(host='free-proxy-list.net', port=443): Max retries exceeded with url: / (Caused by SSLError(CertificateError("hostname 'free-proxy-list.net' doesn't match 'sni108365.cloudflaressl.com'",),))
2018-02-11 19:38:46,222 http_request_randomizer.requests.parsers.ProxyForEuParser ERROR    Provider ProxyForEU failed with Attribute error: 'NoneType' object has no attribute 'find'
2018-02-11 19:38:46,590 http_request_randomizer.requests.parsers.RebroWeeblyParser WARNING  Proxy Provider url failed: http://rebro.weebly.com
/anaconda3/lib/python3.6/site-packages/urllib3/connection.py:344: SubjectAltNameWarning: Certificate for premproxy.com has no `subjectAltName`, falling back to check for a `commonName` for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See https://github.com/shazow/urllib3/issues/497 for details.)
  SubjectAltNameWarning
2018-02-11 19:38:46,741 http_request_randomizer.requests.parsers.SamairProxyParser DEBUG    Pages: set()
Traceback (most recent call last):

  File "<ipython-input-35-fafe89d53dac>", line 1, in <module>
    runfile('/Users/cole/Desktop/webScraping/requestProxy.py', wdir='/Users/cole/Desktop/webScraping')

  File "/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 705, in runfile
    execfile(filename, namespace)

  File "/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "/Users/cole/Desktop/webScraping/requestProxy.py", line 18, in <module>
    req_proxy = RequestProxy()

  File "/anaconda3/lib/python3.6/site-packages/http_request_randomizer/requests/proxy/requestProxy.py", line 60, in __init__
    self.current_proxy = self.randomize_proxy()

  File "/anaconda3/lib/python3.6/site-packages/http_request_randomizer/requests/proxy/requestProxy.py", line 77, in randomize_proxy
    raise ProxyListException("list is empty")

ProxyListException: list is empty

[Error] req_proxy = RequestProxy() error

Hi @pgaref,
I installed your package with sudo pip install http-request-randomizer and wanted to test it with the same code as in the project README, but it returns the error below:

Traceback (most recent call last):
  File "myTest.py", line 7, in <module>
    req_proxy = RequestProxy()
  File "/usr/local/lib/python2.7/dist-packages/http_request_randomizer/requests/proxy/requestProxy.py", line 55, in __init__
    self.proxy_list += parsers[i].parse_proxyList()
  File "/usr/local/lib/python2.7/dist-packages/http_request_randomizer/requests/parsers/FreeProxyParser.py", line 29, in parse_proxyList
    headings = [th.get_text() for th in table.find("tr").find_all("th")]
AttributeError: 'NoneType' object has no attribute 'find'

Can you help me?

Create Agent Provider

Different agents are currently loaded from a file inside the main RequestProxy class. Logically, the agent provider should be a separate class.

trace time so proxies may be ordered by response time

When a proxy is checked, if the request time is traced and kept, it can later be used to order the list of proxies by response speed, making it possible to return the fastest proxy in the list instead of a random one. Since checking the whole proxy list takes a while, the fastest proxy found so far could be returned while checks are still running.

(I can't tag this as 'enhancement'.)
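
A sketch of the idea, timing each proxy against a test URL and sorting by measured latency (measure_proxies is a hypothetical helper, not part of the library):

import time
import requests

def measure_proxies(proxies, test_url='http://ipv4.icanhazip.com'):
    # Record (response_time, proxy) for each proxy that answers in time
    timed = []
    for proxy in proxies:
        start = time.time()
        try:
            requests.get(test_url, timeout=10,
                         proxies={'http': proxy, 'https': proxy})
            timed.append((time.time() - start, proxy))
        except requests.exceptions.RequestException:
            continue  # skip proxies that fail the check
    return [proxy for _, proxy in sorted(timed, key=lambda t: t[0])]  # fastest first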

Error ProxySchemeUnknown: Not supported proxy scheme None

Hey pgaref! Your work looks great! I am very interested in trying this package as an API library. I started by installing it via pip and trying the snippet you wrote in the README, but I ran into problems; below is the code I use:

import time
from http_request_randomizer.requests.proxy.requestProxy import RequestProxy

if __name__ == '__main__':

    start = time.time()
    req_proxy = RequestProxy()
    print("Initialization took: {0} sec".format((time.time() - start)))
    print("Size: {0}".format(len(req_proxy.get_proxy_list())))
    print("ALL = {0} ".format(list(map(lambda x: x.get_address(), req_proxy.get_proxy_list()))))

    test_url = 'http://ipv4.icanhazip.com'

    while True:
        start = time.time()
        request = req_proxy.generate_proxied_request(test_url)
        print("Proxied Request Took: {0} sec => Status: {1}".format((time.time() - start), request.__str__()))
        if request is not None:
            print("\t Response: ip={0}".format(u''.join(request.text).encode('utf-8')))
        print("Proxy List Size: {0}".format(len(req_proxy.get_proxy_list())))

        print("-> Going to sleep..")
        time.sleep(10)

And I got this error message:

Traceback (most recent call last):
  File "coba.py", line 16, in <module>
    request = req_proxy.generate_proxied_request(test_url)
  File "/Users/imamdigmi/.pyenv/versions/3.7.6/envs/data-scraping/lib/python3.7/site-packages/http_request_randomizer/requests/proxy/requestProxy.py", line 112, in generate_proxied_request
    proxies={"http": self.current_proxy.get_address(), "https": self.current_proxy.get_address()})
  File "/Users/imamdigmi/.pyenv/versions/3.7.6/envs/data-scraping/lib/python3.7/site-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/Users/imamdigmi/.pyenv/versions/3.7.6/envs/data-scraping/lib/python3.7/site-packages/requests/sessions.py", line 530, in request
    resp = self.send(prep, **send_kwargs)
  File "/Users/imamdigmi/.pyenv/versions/3.7.6/envs/data-scraping/lib/python3.7/site-packages/requests/sessions.py", line 643, in send
    r = adapter.send(request, **kwargs)
  File "/Users/imamdigmi/.pyenv/versions/3.7.6/envs/data-scraping/lib/python3.7/site-packages/requests/adapters.py", line 412, in send
    conn = self.get_connection(request.url, proxies)
  File "/Users/imamdigmi/.pyenv/versions/3.7.6/envs/data-scraping/lib/python3.7/site-packages/requests/adapters.py", line 309, in get_connection
    proxy_manager = self.proxy_manager_for(proxy)
  File "/Users/imamdigmi/.pyenv/versions/3.7.6/envs/data-scraping/lib/python3.7/site-packages/requests/adapters.py", line 199, in proxy_manager_for
    **proxy_kwargs)
  File "/Users/imamdigmi/.pyenv/versions/3.7.6/envs/data-scraping/lib/python3.7/site-packages/urllib3/poolmanager.py", line 492, in proxy_from_url
    return ProxyManager(proxy_url=url, **kw)
  File "/Users/imamdigmi/.pyenv/versions/3.7.6/envs/data-scraping/lib/python3.7/site-packages/urllib3/poolmanager.py", line 429, in __init__
    raise ProxySchemeUnknown(proxy.scheme)
urllib3.exceptions.ProxySchemeUnknown: Not supported proxy scheme None

How can I solve this problem? Thank you!

Initial Update

Hi 👊

This is my first visit to this fine repo, but it seems you have been working hard to keep all dependencies updated so far.

Once you have closed this issue, I'll create separate pull requests for every update as soon as I find one.

That's it for now!

Happy merging! 🤖

Installation Issue

Stacktrace:
Windows 10 in PyCharm

(venv) C:\Users\matth\Documents\DeltaAI\InstagramMetadata>pip install http-request-randomizer
Collecting http-request-randomizer
  Using cached https://files.pythonhosted.org/packages/7b/84/ea11a2ccbe215ac200c0e6342245f2db0747ca963f38339219e6df46b546/http_request_randomizer-1.3.2.tar.gz
Requirement already satisfied: beautifulsoup4>=4.9.3 in c:\users\matth\pycharmprojects\math1061\venv\lib\site-packages (from http-request-randomizer) (4.9.3)
Collecting httmock>=1.3.0 (from http-request-randomizer)
  Using cached https://files.pythonhosted.org/packages/ce/99/f950e23335affb58ae116aaf32565258a732b2b570aa961764df2ac0540d/httmock-1.4.0-py3-none-any.whl
Collecting psutil>=5.7.2 (from http-request-randomizer)
  Using cached https://files.pythonhosted.org/packages/e1/b0/7276de53321c12981717490516b7e612364f2cb372ee8901bd4a66a000d7/psutil-5.8.0.tar.gz
Collecting python-dateutil>=2.8.1 (from http-request-randomizer)
  Using cached https://files.pythonhosted.org/packages/d4/70/d60450c3dd48ef87586924207ae8907090de0b306af2bce5d134d78615cb/python_dateutil-2.8.1-py2.py3-none-any.whl
Requirement already satisfied: requests>=2.24.0 in c:\users\matth\pycharmprojects\math1061\venv\lib\site-packages (from http-request-randomizer) (2.25.1)
Collecting pyOpenSSL>=19.1.0 (from http-request-randomizer)
  Using cached https://files.pythonhosted.org/packages/b2/5e/06351ede29fd4899782ad335c2e02f1f862a887c20a3541f17c3fa1a3525/pyOpenSSL-20.0.1-py2.py3-none-any.whl
Collecting fake-useragent>=0.1.11 (from http-request-randomizer)
  Using cached https://files.pythonhosted.org/packages/d1/79/af647635d6968e2deb57a208d309f6069d31cb138066d7e821e575112a80/fake-useragent-0.1.11.tar.gz
Requirement already satisfied: soupsieve>1.2; python_version >= "3.0" in c:\users\matth\pycharmprojects\math1061\venv\lib\site-packages (from beautifulsoup4>=4.9.3->http-request-randomizer) (2.1)
Requirement already satisfied: six>=1.5 in c:\users\matth\pycharmprojects\math1061\venv\lib\site-packages (from python-dateutil>=2.8.1->http-request-randomizer) (1.15.0)
Requirement already satisfied: certifi>=2017.4.17 in c:\users\matth\pycharmprojects\math1061\venv\lib\site-packages (from requests>=2.24.0->http-request-randomizer) (2020.12.5)
Requirement already satisfied: chardet<5,>=3.0.2 in c:\users\matth\pycharmprojects\math1061\venv\lib\site-packages (from requests>=2.24.0->http-request-randomizer) (4.0.0)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in c:\users\matth\pycharmprojects\math1061\venv\lib\site-packages (from requests>=2.24.0->http-request-randomizer) (1.26.3)
Requirement already satisfied: idna<3,>=2.5 in c:\users\matth\pycharmprojects\math1061\venv\lib\site-packages (from requests>=2.24.0->http-request-randomizer) (2.10)
Collecting cryptography>=3.2 (from pyOpenSSL>=19.1.0->http-request-randomizer)
  Using cached https://files.pythonhosted.org/packages/fa/2d/2154d8cb773064570f48ec0b60258a4522490fcb115a6c7c9423482ca993/cryptography-3.4.6.tar.gz
  Installing build dependencies ... error
  Complete output from command C:\Users\matth\PycharmProjects\MATH1061\venv\Scripts\python.exe C:\Users\matth\PycharmProjects\MATH1061\venv\lib\site-packages\pip-19.0.3-py3.8.egg\pip install --ignore-installed --no-user --prefix C:\Users\matth\AppData\Local\Temp\pi
p-build-env-kk4l3ksy\overlay --no-warn-script-location --no-binary :none: --only-binary :none: -i https://pypi.org/simple -- setuptools>=40.6.0 wheel "cffi>=1.12; platform_python_implementation != 'PyPy'" setuptools-rust>=0.11.4:
  Collecting setuptools>=40.6.0
    Using cached https://files.pythonhosted.org/packages/15/0e/255e3d57965f318973e417d5b7034223f1223de500d91b945ddfaef42a37/setuptools-53.0.0-py3-none-any.whl
  Collecting wheel
    Using cached https://files.pythonhosted.org/packages/65/63/39d04c74222770ed1589c0eaba06c05891801219272420b40311cd60c880/wheel-0.36.2-py2.py3-none-any.whl
  Collecting cffi>=1.12
    Using cached https://files.pythonhosted.org/packages/a8/20/025f59f929bbcaa579704f443a438135918484fffaacfaddba776b374563/cffi-1.14.5.tar.gz
      Complete output from command python setup.py egg_info:
      Traceback (most recent call last):
        File "<string>", line 1, in <module>
        File "C:\Users\matth\AppData\Local\Temp\pip-install-mm38gupb\cffi\setup.py", line 127, in <module>
          if sys.platform == 'win32' and uses_msvc():
        File "C:\Users\matth\AppData\Local\Temp\pip-install-mm38gupb\cffi\setup.py", line 105, in uses_msvc
          return config.try_compile('#ifndef _MSC_VER\n#error "not MSVC"\n#endif')
        File "C:\Program Files (x86)\Python38-32\lib\distutils\command\config.py", line 225, in try_compile
          self._compile(body, headers, include_dirs, lang)
        File "C:\Program Files (x86)\Python38-32\lib\distutils\command\config.py", line 132, in _compile
          self.compiler.compile([src], include_dirs=include_dirs)
        File "C:\Program Files (x86)\Python38-32\lib\distutils\_msvccompiler.py", line 360, in compile
          self.initialize()
        File "C:\Program Files (x86)\Python38-32\lib\distutils\_msvccompiler.py", line 253, in initialize
          vc_env = _get_vc_env(plat_spec)
        File "C:\Users\matth\PycharmProjects\MATH1061\venv\lib\site-packages\setuptools-40.8.0-py3.8.egg\setuptools\msvc.py", line 185, in msvc14_get_vc_env
        File "C:\Users\matth\PycharmProjects\MATH1061\venv\lib\site-packages\setuptools-40.8.0-py3.8.egg\setuptools\msvc.py", line 1227, in return_env
        File "C:\Users\matth\PycharmProjects\MATH1061\venv\lib\site-packages\setuptools-40.8.0-py3.8.egg\setuptools\msvc.py", line 876, in VCIncludes
        File "C:\Users\matth\PycharmProjects\MATH1061\venv\lib\site-packages\setuptools-40.8.0-py3.8.egg\setuptools\msvc.py", line 555, in VCInstallDir
      distutils.errors.DistutilsPlatformError: Microsoft Visual C++ 14.0 is required. Get it with "Microsoft Visual C++ Build Tools": https://visualstudio.microsoft.com/downloads/

      ----------------------------------------
  Command "python setup.py egg_info" failed with error code 1 in C:\Users\matth\AppData\Local\Temp\pip-install-mm38gupb\cffi\

  ----------------------------------------
Command "C:\Users\matth\PycharmProjects\MATH1061\venv\Scripts\python.exe C:\Users\matth\PycharmProjects\MATH1061\venv\lib\site-packages\pip-19.0.3-py3.8.egg\pip install --ignore-installed --no-user --prefix C:\Users\matth\AppData\Local\Temp\pip-build-env-kk4l3ksy\o
verlay --no-warn-script-location --no-binary :none: --only-binary :none: -i https://pypi.org/simple -- setuptools>=40.6.0 wheel "cffi>=1.12; platform_python_implementation != 'PyPy'" setuptools-rust>=0.11.4" failed with error code 1 in None

requests module work slow

I pip-installed http-request-randomizer and tried to use it, but it was too slow, so I decided not to use it. But then my regular requests via the requests module became extremely slow, as if they were still going through a proxy. I pip-uninstalled http-request-randomizer, but that didn't help; I also tried reinstalling requests.
Requests still work very slowly. I'm a newbie in Python, please tell me what to do.

Parser Abstraction

Moving a step further, if we want to support numerous proxy providers, it would be nice to have a Parser Abstraction class that each specific parser extends.

How to use with Pandas Datareader?

Would love some input on how to make this work, specifically when using DataReader and the Yahoo Finance API to get stock data. I can make requests for stock data using DataReader once, and then after that I get an error until the next day. My code looks simple:

import pandas as pd
import pandas_datareader.data as web
from datetime import datetime
from time import sleep
from http_request_randomizer.requests.proxy.requestProxy import RequestProxy

# list of stock's data to get
stocks = [

'USO',
'FXI',
'EEM'

]

start = datetime(2018, 9, 19)
end = datetime.now()

for i in stocks:
    sleep(5)
    
    try:
        data = web.DataReader(i, 'yahoo', start, end)
        df = data[['Date','Open','High','Low','Close','Adj Close','Volume']]
        # Round some of the results to 2 decimal places, then save .csv file
        df = df.round({"Open":2, "High":2, "Low":2, "Close":2, "Adj Close":2})
        df.to_csv(str(i) + '.csv')
        print('Successfully downloaded ' + str(i))
        continue
    
    except:
        print('Failed to download ' + str(i))
        continue

So how could one integrate the HTTP randomizer into that? I tried playing around with it a bit but couldn't figure it out. Something like replacing the URL used in the request with the DataReader call somehow, if that makes sense?

Extend parsers to support Anonymity level

A first step towards supporting anonymity levels (#26) would be to extend the parsers to use that kind of information where possible.

For example, free-proxy-list.net does expose anonymity levels. We need to:

  1. extend the existing parser to read the extra information
  2. introduce a new proxy object, instead of just a plain string, with: address, anonymity level (could be an enum), country, etc. (see the sketch below)
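
A sketch of what such a proxy object could look like (the names AnonymityLevel and ProxyObject are illustrative, not the library's actual classes):

from dataclasses import dataclass
from enum import Enum
from typing import Optional

class AnonymityLevel(Enum):
    TRANSPARENT = 1  # forwards the client's real IP
    ANONYMOUS = 2    # identifies as a proxy, hides the real IP
    ELITE = 3        # sends no proxy-identifying headers

@dataclass
class ProxyObject:
    address: str  # e.g. "182.23.28.180:3128"
    anonymity: AnonymityLevel
    country: Optional[str] = None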

Readme in different languages

The repository can be translated into multiple languages (probably as forks); the main README file can link to those translations.
@syunhan, I would be more than happy to add a link to your fork, please specify the language :) or just send me a pull request!

NameError: global name 'test_url' is not defined

When I do

    >>> import requestProxy
    >>> proxy = requestProxy.RequestProxy()
    >>> proxy.generate_proxied_request("http://www.google.com")

I get the error

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "requestProxy.py", line 64, in generate_proxied_request
    request = requests.get(test_url, proxies={"http": rand_proxy},
NameError: global name 'test_url' is not defined

I believe in line 64 of requestProxy.py it should say "url" instead of "test_url":

https://github.com/pgaref/HTTP_Request_Randomizer/blob/master/project/http/requests/proxy/requestProxy.py#L64

Error whilst checking proxy breaks connection with RST

Next proxy: http://41.231.120.118:8888
Traceback (most recent call last):
  File "requestProxy.py", line 192, in <module>
    request = req_proxy.generate_proxied_request(test_url)
  File "requestProxy.py", line 169, in generate_proxied_request
    headers=req_headers, timeout=req_timeout)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 69, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 50, in request
    response = session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 465, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 605, in send
    r.content
  File "/usr/local/lib/python2.7/dist-packages/requests/models.py", line 750, in content
    self._content = bytes().join(self.iter_content(CONTENT_CHUNK_SIZE)) or bytes()
  File "/usr/local/lib/python2.7/dist-packages/requests/models.py", line 673, in generate
    for chunk in self.raw.stream(chunk_size, decode_content=True):
  File "/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/response.py", line 307, in stream
    data = self.read(amt=amt, decode_content=decode_content)
  File "/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/response.py", line 243, in read
    data = self._fp.read(amt)
  File "/usr/lib/python2.7/httplib.py", line 573, in read
    s = self.fp.read(amt)
  File "/usr/lib/python2.7/socket.py", line 380, in read
    data = self._sock.recv(left)
socket.error: [Errno 104] Connection reset by peer

proxy_reset.zip

Command Line Arguments

Allow users to run the request randomizer from the command line with configurable:

  • proxy providers (whitelist, blacklist)
  • output-format (json, txt)
  • timeout (where supported)

We could use argparse for that purpose (see the sketch after this list). It might also be useful to include:

  • anonymity-levels=(all, transparent, anonymous, elite)
  • protocols=(http, https, sock4, sock5)
  • countries=(us,ca, etc.)
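
A minimal argparse sketch along these lines (long option names and defaults are illustrative):

import argparse
import sys

parser = argparse.ArgumentParser(description='Proxy list tool')
parser.add_argument('-s', '--source', required=True,
                    choices=['proxyforeu', 'rebro', 'samair', 'freeproxy', 'all'])
parser.add_argument('-o', '--outfile', type=argparse.FileType('w'),
                    default=sys.stdout)
parser.add_argument('-t', '--timeout', type=float, help='timeout in seconds')
parser.add_argument('-bw', '--bandwidth', type=float, help='bandwidth in KBs')
args = parser.parse_args()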

Can't install to python 3.6

When I try with pip, the error is:
Could not find a version that satisfies the requirement HTTP_Request_Randomizer-master (from versions: )
No matching distribution found for HTTP_Request_Randomizer-master

When I try 'python setup.py install', the error is:
zip_safe flag not set; analyzing archive contents...

Installed c:\users\anike\http_request_randomizer-master\.eggs\pytest_runner-3.0-py3.6.egg
Searching for setuptools-scm
Reading https://pypi.python.org/simple/setuptools-scm/
Downloading https://pypi.python.org/packages/40/fa/d4fff7846c36909846c2148990cdb01e77e62d59ee4d19ca872a60844475/setuptools_scm-1.15.7-py3.6.egg#md5=77134e6f8cfebd07c7e4fd2b7c775b7a
Best match: setuptools-scm 1.15.7
Processing setuptools_scm-1.15.7-py3.6.egg
Moving setuptools_scm-1.15.7-py3.6.egg to c:\users\anike\http_request_randomizer-master\.eggs

Installed c:\users\anike\http_request_randomizer-master\.eggs\setuptools_scm-1.15.7-py3.6.egg
Traceback (most recent call last):
  File "setup.py", line 98, in <module>
    'Topic :: Software Development :: Libraries :: Python Modules',
  File "C:\Users\anike\Anaconda3\lib\distutils\core.py", line 108, in setup
    _setup_distribution = dist = klass(attrs)
  File "C:\Users\anike\Anaconda3\lib\site-packages\setuptools\dist.py", line 338, in __init__
    _Distribution.__init__(self, attrs)
  File "C:\Users\anike\Anaconda3\lib\distutils\dist.py", line 281, in __init__
    self.finalize_options()
  File "C:\Users\anike\Anaconda3\lib\site-packages\setuptools\dist.py", line 471, in finalize_options
    ep.load()(self, ep.name, value)
  File "c:\users\anike\http_request_randomizer-master\.eggs\setuptools_scm-1.15.7-py3.6.egg\setuptools_scm\integration.py", line 22, in version_keyword
  File "c:\users\anike\http_request_randomizer-master\.eggs\setuptools_scm-1.15.7-py3.6.egg\setuptools_scm\__init__.py", line 119, in get_version
  File "c:\users\anike\http_request_randomizer-master\.eggs\setuptools_scm-1.15.7-py3.6.egg\setuptools_scm\__init__.py", line 97, in _do_parse
LookupError: setuptools-scm was unable to detect version for 'C:\Users\anike\HTTP_Request_Randomizer-master'.

Make sure you're either building from a fully intact git repository or PyPI tarballs. Most other sources (such as GitHub's tarballs, a git checkout without the .git folder) don't contain the necessary metadata and will not work.

For example, if you're using pip, instead of https://github.com/user/proj/archive/master.zip use git+https://github.com/user/proj.git#egg=proj

Help needed,
Thanks!

How to use the same proxy for multiple URL requests?

Not really an issue, but would love some input as I can't seem to figure out how to make it work.

My code (pseudo) looks something like this:

    req_proxy = RequestProxy()

    url_list = [www.example1.com, www.example2.com, www.example3.com, www.example4.com]

    for url in url_list:

        while True:

            request = req_proxy.generate_proxied_request(url)

            if request is not None:

                (THE REST OF MY CODE IS HERE ONCE WE GET A GOOD RESPONDING PROXY)

                break # Break out of While loop if we got a good response
            continue # Move on to the next url in the for loop

I'm wondering if there's a way to use the same proxy for, say, 2 items in my url_list before requesting another one. In my main application I have a long list of URLs, and I would like to re-use some good-responding proxies for multiple URLs before requesting a new one. How could I go about structuring this? Thanks a lot! (Or, if there's documentation that I missed, point me in the right direction. Thanks!)
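
One possible structure, sketched below, is to take an address from get_proxy_list() and reuse it directly with requests, rotating only on failure (the scheme prefix and rotation policy are assumptions to adapt):

import requests
from http_request_randomizer.requests.proxy.requestProxy import RequestProxy

req_proxy = RequestProxy()
proxy_pool = req_proxy.get_proxy_list()

def next_proxy():
    # Prefix the scheme explicitly; newer urllib3 rejects scheme-less proxies
    return "http://" + proxy_pool.pop().get_address()

url_list = ['http://www.example1.com', 'http://www.example2.com']
current = next_proxy()
for url in url_list:
    while True:
        try:
            # Reuse the same proxy until it stops responding
            response = requests.get(url, timeout=30,
                                    proxies={"http": current, "https": current})
            break
        except requests.exceptions.RequestException:
            current = next_proxy()  # rotate to a fresh proxy on failure
    # ... the rest of the per-URL processing goes here ...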
