carsonyl / pypac



Find and use proxy auto-config (PAC) files with Python and Requests.

Home Page: https://pypac.readthedocs.io

License: Apache License 2.0

Python 100.00%

proxy-pac python-library python-requests

pypac's People

sdementen schlomo lkleinnux maximinus invgate aslafy-z seyfsv alexrohvarger pombredanne yogeshlele zanachka actlaboratory liuxhit mpkuth duydo vargenau karelchanivecky

pypac's Issues

When get_pac returns None, there is no way to tell whether there is no PAC or an error occurred

In download_pac, when an error occurs during an HTTP request, the error is ignored and the next URL is attempted. At the end, if all URLs failed with an error, the function returns None. From this function alone, one can infer that an error occurred: if the list of URLs contained at least one URL, a response should have been available. However, this cannot be inferred from get_pac, since perhaps no PAC is configured at all. Hence, one of the following:

get_pac raises an error if no PAC is configured (this way the user can infer whether to expect a PAC)
get_pac evaluates whether a PAC script was expected and raises an error otherwise
download_pac collects all the errors that occurred and chooses one to raise
download_pac collects all the errors and raises a new error type that contains the list of errors that occurred (leaving it up to the user to decide which errors they care about)
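A minimal sketch of the last option, assuming a hypothetical `PacDownloadError` type and a hypothetical `download_first_pac()` helper that takes the fetch function as a parameter (neither is part of pypac's API). It keeps `None` for the "no PAC URLs at all" case and raises only when every candidate URL failed:

```python
class PacDownloadError(Exception):
    """Raised when every candidate PAC URL failed (hypothetical type)."""

    def __init__(self, errors):
        # errors: list of (url, exception) pairs, in the order tried
        self.errors = errors
        summary = "; ".join("%s: %r" % (url, exc) for url, exc in errors)
        super().__init__("all %d PAC URL(s) failed: %s" % (len(errors), summary))


def download_first_pac(urls, fetch):
    """Return the first PAC body that `fetch(url)` produces.

    Returns None only when `urls` is empty (no PAC configured); raises
    PacDownloadError when URLs were given but all of them failed.
    """
    errors = []
    for url in urls:
        try:
            return fetch(url)
        except Exception as exc:  # collect, then try the next URL
            errors.append((url, exc))
    if errors:
        raise PacDownloadError(errors)
    return None
```

With Requests, `fetch` could wrap something like `requests.get(url, timeout=2)` plus a status and Content-Type check; that wiring is left out so the sketch stays dependency-free.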

pypac + js2py uses a lot of memory

I am working on Px, a proxy server that does NTLM/Kerberos authentication and I recently added pypac to support PAC files.

Before pypac, Px was a 6 MB executable (built with PyInstaller) using around 10 MB of RAM per process. With pypac, it's 14 MB and uses 140 MB of RAM per process. This is all on Windows.

I know pypac itself only uses around 10-20 MB of RAM, but the js2py dependency accounts for about 120 MB. This adds up quickly, since Px can run multiple processes to improve scalability.

Long story short, at least on Windows, there's JScript, which is free and available to execute the JS functions, so depending on js2py seems relatively expensive. Using JScript along with the PAC JS code from here, you'd have a simpler mechanism for processing the PAC file.

No doubt it might be slower (I plan on testing it) to do the check for each URL, since you need to spawn a cscript.exe process for each call, but in terms of RAM and size, it might be worth it.

I'm curious what your thoughts are on this matter.

Is my PACSession setting the wrong proxy string?

I created a PACSession with a PAC URL, but it seems to incorrectly set the proxy as "PROXY proxyIP:port" instead of just "proxyIP:port".

I've created two test cases to illustrate my problem:

from urllib.parse import urlparse

import requests
from pypac import PACSession, get_pac

url = "https://www.turbotax.intuit.com"
pac_url = "https://www.somepacurl.com/pac"
pac = get_pac(url=pac_url, allowed_content_types=['text/plain'])

"""
This works
"""
# FindProxyForURL() takes the full URL and the bare host name.
proxy = pac.find_proxy_for_url(url, urlparse(url).hostname)
# Strip the keyword, e.g. "PROXY 10.0.0.1:8080" -> "10.0.0.1:8080".
proxy = proxy.split(" ")[1]

proxies = {'https': proxy, 'http': proxy}

s = requests.Session()
s.proxies = proxies

r = s.get(url)
print(r.text)

"""
This doesn't work
"""
s = PACSession(pac)
r = s.get(url)
print(r.text)
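The first test case strips the keyword with `proxy.split(" ")[1]`, which breaks on multi-entry results such as `"PROXY 10.0.0.1:8080; DIRECT"` (it yields `"10.0.0.1:8080;"`) and on a bare `"DIRECT"` (IndexError). A slightly more defensive sketch, assuming only the PROXY and DIRECT keywords matter; `requests_proxies_from_pac()` is a hypothetical helper, not a pypac function:

```python
def requests_proxies_from_pac(pac_value):
    """Turn a FindProxyForURL() result into a requests-style proxies dict.

    Only the first entry of an "A; B; C" list is used. DIRECT yields {}.
    Keywords other than PROXY raise ValueError in this sketch.
    """
    first = pac_value.split(";")[0].strip()
    if not first or first.upper() == "DIRECT":
        # Note: with an empty dict, requests may still pick up
        # HTTP_PROXY/HTTPS_PROXY from the environment if trust_env is True.
        return {}
    keyword, _, hostport = first.partition(" ")
    hostport = hostport.strip()
    if keyword.upper() != "PROXY" or not hostport:
        raise ValueError("unsupported PAC entry: %r" % first)
    proxy = "http://" + hostport
    return {"http": proxy, "https": proxy}
```

Assigning the result to `s.proxies` on a plain `requests.Session` reproduces the working test case without relying on the position of the space.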

Make the script validity check optional

I just realized the check I uncommented in #68 breaks my application in more ways. For some reason, the same PAC script takes an exceedingly long time to complete when subjected to the check, crippling performance. I suspect the problem is in the script rather than in pypac, but since it appears to work OK on real URLs, I'd like to make the validation optional.

Loading PAC from file

My VPN adds an AutoConfigURL to the registry with the file:/// format. Since os.path.isfile() doesn't recognize this as a file, pypac fails in api.py and doesn't load the PAC file.

It would be great if pypac could identify such file:/// URLs and still load them.

Furthermore, my VPN's PAC path is file:///Users\username..., so it even omits the drive letter on Windows; file:/// should therefore be replaced with \.
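A sketch of the requested handling, assuming a hypothetical `file_url_to_windows_path()` helper (it is not pypac's code). It maps `file:///C:/...` to a drive path, turns a drive-less `file:///Users\...` into a root-relative `\Users\...` path as suggested, and also copes with the `file://e:\proxy.pac` form, where the path ends up in the URL's netloc:

```python
import re
from urllib.parse import unquote, urlparse


def file_url_to_windows_path(url):
    """Map a file: AutoConfigURL to a Windows path, or None for other schemes.

    file://localhost/... and UNC hosts are not handled in this sketch.
    """
    parsed = urlparse(url)
    if parsed.scheme.lower() != "file":
        return None
    # Malformed URLs such as file://e:\proxy.pac land in netloc, so join both.
    raw = unquote(parsed.netloc + parsed.path)
    path = raw.replace("/", "\\")
    # file:///C:/dir/proxy.pac -> \C:\dir\proxy.pac -> C:\dir\proxy.pac
    if re.match(r"\\[A-Za-z]:", path):
        path = path[1:]
    return path
```

The result could then be checked with `os.path.isfile()` before reading the script.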

Drop Python 3.4 support

Assuming there are no objections. Travis is currently failing with Python 3.4 due to some issue with requirements / pipenv.

Connect to web URL

Hi,

I need to use the Google API to get a JSON info file, and I must use a proxy to connect to the internet. On my computer, I use a .pac file to connect to the internet, as described here.

The script I used it is :

from urllib.request import urlopen  # Python 2: from urllib2 import urlopen
import json

def get_jsonparsed_data(url):
    response = urlopen(url)
    data = response.read().decode("utf-8")
    return json.loads(data)

url = ("http://maps.googleapis.com/maps/api/geocode/json?"
       "address=googleplex&sensor=false")
print(get_jsonparsed_data(url))

When we want to use a proxy with the urllib.urlopen method, the syntax is:

urllib.urlopen(url[, data[, proxies[, context]]])

as described in the documentation.

How can I insert pypac into my script so that it uses my proxy (PAC file)?

Thanks a lot for your help.
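pypac's `PACSession` is built on Requests, so it doesn't plug into `urlopen` directly. One option, sketched here with the standard library only, is to resolve the proxy for the URL first (for example from the PAC file's `find_proxy_for_url()` result) and hand it to a `ProxyHandler`. `opener_for_proxy()` is a hypothetical helper, and it assumes the proxy has already been turned into a URL such as `http://10.0.0.1:8080`, or `None` for DIRECT:

```python
from urllib.request import ProxyHandler, build_opener


def opener_for_proxy(proxy_url):
    """Return a urllib opener that routes http and https through proxy_url
    (e.g. "http://10.0.0.1:8080"), or connects directly when it is None."""
    if proxy_url is None:
        proxies = {}
    else:
        proxies = {"http": proxy_url, "https": proxy_url}
    # Passing our own ProxyHandler (even with an empty dict) replaces
    # build_opener's default one, which would read HTTP_PROXY/HTTPS_PROXY
    # from the environment.
    return build_opener(ProxyHandler(proxies))
```

Then `opener_for_proxy(proxy).open(url).read().decode("utf-8")` could replace the `urlopen(url)` call in `get_jsonparsed_data()`. Alternatively, rewriting that function as `PACSession().get(url).json()` lets pypac do the PAC lookup itself.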

PyPAC does not work on macOS without running Install Certificates.command and Update Shell Profile.command

Hi Carsonyl,

from pypac import PACSession
s = PACSession()
r = s.request('GET', url, None, stream=True, verify=self.get_verification_condition(event_data),
              cookies=cookies)

Above is the piece of code I am running in my .pkg file while building an app with py2app.
I am getting errors such as "SSL verification failed" on macOS, even when verify is False.
If I run Install Certificates.command and Update Shell Profile.command, it works fine, but those scripts require a Python installation on every system.

Thanks in advance,
Shashanka

None of the PAC functions in pypac worked for me; I hope they will work one day

First failure:
I let pypac find the system's proxy settings by itself. Although I really have a proxy configured and can visit the target site, it didn't work.
Second failure:
I set the PAC URL in the code with get_pac(). I don't know why, but the function returned None, haha.
Third failure:
I downloaded the PAC file and passed its contents to PACSession, and got the exception shown in the following picture. A really bad user experience; I quit, haha.

Multiple Proxies in Requests with Various Redirect Chains

If there is a complicated redirect chain (e.g., HTTP -> HTTPS -> DIRECT), PyPAC will not set the appropriate proxy server for each URL.

The following PoC provides an example fix:
https://gist.github.com/brad-anton/4e27b15df76e6eb2390d2b4c4e7e930d

Allow setting the client ip address

In order to test our PAC file, we would like to be able to set the IP address that myIpAddress returns.

Would it be possible to have a config key that overwrites this value?

Thanks!
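One way such an override could look, sketched with a hypothetical `make_my_ip_address()` factory (pypac's actual `myIpAddress` implementation and configuration options may differ). When an override is given it is returned verbatim; otherwise the sketch falls back to resolving the local host name, and to `127.0.0.1` if that fails:

```python
import socket


def make_my_ip_address(override=None):
    """Build a myIpAddress() implementation for PAC evaluation.

    `override` stands in for a hypothetical config key: when set, it is
    returned verbatim so PAC logic can be tested as if the client had
    that address.
    """
    def myIpAddress():
        if override:
            return override
        try:
            return socket.gethostbyname(socket.gethostname())
        except OSError:  # includes socket.gaierror
            return "127.0.0.1"

    return myIpAddress
```

A test suite could then evaluate the same PAC script once per simulated client address.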

How to get past the PAC at my workplace in Anaconda (Spyder)

Hi:

I added the pypac module and ran the following in the IPython console of Anaconda (Spyder):

from pypac import PACSession

session = PACSession()
session.get('http://example.org')
Out[554]: <Response [407]>

What do I do next if I need to run "pip install implicit"?

It's still not clear to me how to download modules in Anaconda by getting past the PAC. Normally, when I'm browsing the web, I always have to enter a username and password for the proxy.

Is pypac the right way?

Thank you so much for your help. This is very frustrating! :-)

Best regards,
Sagar

tld 0.11+ is not python 2.7 compatible

https://pypi.org/project/tld/ dropped support for Python 2.7 without bumping its major version; I think pypac should act accordingly.

I would gladly submit a pull request to fix that - but I'm not sure what's the preferred way to fix this:

lock tld version to below 0.11
drop support for python 2.7

dnsDomainIs behavior

Hello, after extensive testing, I found that dnsDomainIs does not follow the common behavior.

For now it works as follows:

def dnsDomainIs(host, domain):
    if domain.startswith('.'):
        domain = '*' + domain
    return shExpMatch(host, domain)

Therefore, in a case like this one:
host = "subdomain.domain.com"
domain = "domain.com" (note the missing '.' in front of the domain)

dnsDomainIs doesn't match. However, when trying this case in multiple browsers (Chrome, Firefox, IE), it does match.

This makes me think that the following implementation would be a lot more accurate:

def dnsDomainIs(host, domain):
    return host.endswith(domain)

Let me know what you think about it.
Best regards,
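The difference between the two implementations can be checked in plain Python. This sketch reproduces the current logic as quoted, with `shExpMatch` written as a case-insensitive `fnmatch`, next to a plain suffix comparison, which as far as I know is how the reference `dnsDomainIs` in browser PAC libraries is written. The `_current`/`_suffix` names are hypothetical, and lowercasing in the suffix version is an assumption:

```python
from fnmatch import fnmatch


def shExpMatch(host, pattern):
    # Case-insensitive shell-style match.
    return fnmatch(host.lower(), pattern.lower())


def dnsDomainIs_current(host, domain):
    # Logic as quoted in the issue.
    if domain.startswith('.'):
        domain = '*' + domain
    return shExpMatch(host, domain)


def dnsDomainIs_suffix(host, domain):
    # Plain suffix comparison; lowercasing is an assumption of this sketch.
    return host.lower().endswith(domain.lower())
```

One trade-off worth noting: a pure suffix check also reports `notdomain.com` as being in `domain.com`, so PAC authors usually pass the domain with a leading dot when that distinction matters.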

Is it possible to reduce the number of dependencies?

Hi Carson,

Thanks for this library! This is great. It does mostly what we need it to do.

There's only one request here: would it be possible to make some of the dependencies optional? We're trying to use it in an async application, and I'm trying to limit the number of dependencies (because of supply chain attack surface, follow-up on security incidents, etc.). Since ours is an async application, there is zero need for things like requests or requests-file in our dependency tree. I think tldextract is also not needed.

We can fetch the PAC file from a URL using httpx. The only thing we'd like to use pypac for is parsing the PAC file, which involves no I/O at all. Having to add 5 additional dependencies to our dependency tree feels like overkill for resolving a proxy URL.

Is that something you would consider? If you'd like, maybe I can find somebody to prepare a PR.

Pypac module is not working in Mac OS with cx_freeze

I am creating a PACSession from the pypac module and calling the request method on the PACSession object, but it does not return any response when I build an executable with cx_Freeze. The same code works fine from the Python console.

While debugging the library files, I found that the problem is in the autoconfig_url_from_preferences() method, on this line: config = SystemConfiguration.SCDynamicStoreCopyProxies(None)
which is in /pypac/os_settings.py. Running it as a subprocess works fine, but a subprocess returns its output as bytes or a string, whereas I want a response object. Below is my code snippet:

import requests

try:
    from pypac import PACSession
    s = PACSession()
    get_logger().debug("Getting Internet details from PACSession")
    r = s.request('GET', url, None, stream=True, verify=self.get_verification_condition(event_data),
                  cookies=cookies)
except requests.exceptions.ProxyError as e:
    get_logger().debug(e)
    get_logger().debug("proxy failed trying direct connection")

I expect a response object, but nothing is returned; i.e., it shows neither an error nor an exception.

Import error observed when the "let" keyword is used to create a global variable

The PAC file contains a "let" keyword declaring an integer variable:

let test_pac = 0;

Traceback (most recent call last):
  File "/Users/shaahidahams/Desktop/automation/gvenv/lib/python3.8/site-packages/pypac/parser.py", line 60, in __init__
    self._context.evaljs(pac_js)
  File "/Users/shaahidahams/Desktop/automation/gvenv/lib/python3.8/site-packages/dukpy/evaljs.py", line 57, in evaljs
    res = _dukpy.eval_string(self, jscode, jsvars)
_dukpy.JSRuntimeError: SyntaxError: unterminated statement (line 1)
    at [anon] (eval:1) internal
    at [anon] (duk_js_compiler.c:6826) internal
During handling of the above exception, another exception occurred:
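The traceback suggests that the Duktape engine behind dukpy rejects `let`; Duktape targets ES5.1, which predates block-scoped declarations. Until the engine changes, a crude preprocessing workaround (an assumption, not something pypac does) is to rewrite line-leading `let` declarations to `var` before the script is parsed. `downgrade_let()` is a hypothetical helper:

```python
import re

# `let` at the start of a line (after optional indentation) followed by
# whitespace, i.e. used as a declaration keyword.
_LET_DECL = re.compile(r"^([ \t]*)let(\s+)", re.MULTILINE)


def downgrade_let(pac_js):
    """Rewrite line-leading `let` declarations to `var` (crude workaround)."""
    return _LET_DECL.sub(r"\1var\2", pac_js)
```

Something like `PACFile(downgrade_let(js))` would then parse the example above. Because `var` is function-scoped, this can change behavior for scripts that rely on block scoping, and `let` declarations that don't start a line are left untouched, so it is only a stopgap.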

Stack overflow when parsing (big) pac file

Hi,

I opened an issue on pyjsparser here: PiotrDabkowski/pyjsparser#16
I originally stumbled on this when using pypac. I just want to notify you of this.

If there is no progress in pyjsparser on this problem, I would probably reconsider using that package.

Rgds
Nils

Question about Windows PAC file

Hello,

I was testing the PAC handler on Windows. When the file retrieved from the registry is a file:// URL such as file://e:\\proxy.pac, it does not work. To make it work, it must be converted to file://e:/proxy.pac.

My question is: is it always in the correct format in the registry? Your tests don't cover this form. Note that this isn't a bug report; I'm just asking in order to prevent any regression in our product, which uses your module :)

Version 0.10.1 has a memory leak

This is due to the new dukpy library.

An alternative dukpy fixes this; the merge request is here: #31

pypac with google analytics?

Thank you for pypac.
I have used it successfully to make API requests through my PAC.
This is the code I used:

def proxyconnect():
    from pypac import PACSession, get_pac
    return PACSession(get_pac(url='the url of my pac'))

then

def myfunction(variable):
    session = proxyconnect()
    api = 'the api url'
    return session.get(api + variable).json()

This was basically from your usage examples, and I found it very helpful for the particular API I needed to get data from.

But I'm also trying to query Google Analytics. My code works fine when it doesn't need to go through my PAC.

I'm wondering whether you are familiar with the Python code provided for the Google Analytics API v4, and whether pypac can be made to work with it. I'm not skilled enough to know what to do.

This is the code I have working (when it doesn't need to go through the proxy):

from oauth2client.service_account import ServiceAccountCredentials
from googleapiclient.discovery import build

def initialize_analyticsreporting():
    credentials = ServiceAccountCredentials.from_json_keyfile_name(
        KEY_FILE_LOCATION, SCOPES)
    return build('analyticsreporting', 'v4', credentials=credentials)

def get_report(analytics, query):
    return analytics.reports().batchGet(body=query).execute()

I'm really stuck. Is it possible to set up Python (I'm using Spyder through Anaconda) to always go through a PAC when accessing the internet? (This PAC has no username or password.)
Sorry if these questions are too ignorant.

Support PAC served without content type header

My PAC file is served without a Content-Type header. pypac checks for this header and doesn't accept the PAC file if the header is missing.

Support keywords: HTTP, HTTPS, SOCKS4, SOCKS5

Currently, the following keywords are recognized in the PAC: DIRECT, PROXY, and SOCKS.

Mozilla's PAC file doc says Firefox supports more keywords: HTTP, HTTPS, SOCKS4, SOCKS5.

Proposed interpretation:

HTTP host:port -> http://host:port (synonym for PROXY)
HTTPS host:port -> https://host:port
SOCKS4 host:port -> socks4://host:port
SOCKS5 host:port -> socks5://host:port

The SOCKS keyword is already assumed by default to be socks5://.
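The proposed interpretation can be expressed as a small lookup table. A sketch, assuming a hypothetical `pac_entry_to_url()` function that handles one PAC entry (for example one item of `"HTTPS h:443; DIRECT"`) and keeps bare `SOCKS` configurable, with socks5 as the default:

```python
# Keyword -> URL scheme, per the proposed interpretation.
KEYWORD_SCHEMES = {
    "PROXY": "http",
    "HTTP": "http",  # synonym for PROXY
    "HTTPS": "https",
    "SOCKS4": "socks4",
    "SOCKS5": "socks5",
}


def pac_entry_to_url(entry, socks_scheme="socks5"):
    """Map one PAC entry such as "HTTPS host:port" to a proxy URL.

    Returns "DIRECT" unchanged; raises ValueError for unknown keywords.
    """
    entry = entry.strip()
    if entry.upper() == "DIRECT":
        return "DIRECT"
    keyword, _, hostport = entry.partition(" ")
    keyword = keyword.upper()
    hostport = hostport.strip()
    if keyword == "SOCKS":
        scheme = socks_scheme  # bare SOCKS stays configurable
    elif keyword in KEYWORD_SCHEMES:
        scheme = KEYWORD_SCHEMES[keyword]
    else:
        raise ValueError("unknown PAC keyword: %r" % keyword)
    if not hostport:
        raise ValueError("missing host:port in %r" % entry)
    return "%s://%s" % (scheme, hostport)
```

Keeping the table separate from the parsing logic means adding a keyword is a one-line change.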

can't get the pac

Hi, could you please help me understand what I am doing wrong in my code?
I can't understand why the PAC isn't retrieved.
I'm new to Python and have some trouble understanding how exceptions work.
Trying to guess how to check for an exception, I used these variants:

from pypac import get_pac
from pypac.parser import MalformedPacError

_pac = None  # defined up front so the finally block can't raise NameError
try:
    _pac = get_pac(url='https://antizapret.prostovpn.org/proxy.pac')
except MalformedPacError as e:
    print(e)
except Exception as e:
    print(e.__class__)
finally:
    if not _pac:
        print('pac is null')
    else:
        print('ok!')

None of the exceptions occurs, but I get the "pac is null" message as a result.
Is the problem in the PAC file's code (it's pretty complicated)? If so, PyPAC should tell me about it.
I also wonder whether I need to check for exceptions at all: aren't they shown automatically by the Python interpreter with the default settings? I used to get a lot of exceptions in my other Python exercises.
I'm confused and feel stupid.

Sorry if it's a lame question, and sorry for my bad English.

PS: Using python 3.7.3 in Windows if it matters.

Consider that the TLDExtract cache will be used by default when evaluating WPAD

TLDExtract performs an HTTP query to fetch valid top level domains. This is fine, except that this library will be mostly run within the context of a domain where proxy is enforced.

Enterprises that enforce proxying are also likely to block requests that are not dispatched according to policy. For this reason, it doesn't make sense to dispatch an HTTP request for the purpose of evaluating the proxy, since the proxy URL is more likely than not needed to dispatch that very request.

For this reason, it should be considered that the base case for the library is that TLDExtract will not be able to dispatch this request and that it will fallback to the file with the TLDs.

Hence this library should:

not use TLD extract
already include a copy of the TLD file that TLDExtract is fetching and direct the cache_dir to the containing directory
Provide a function for setting the cache_dir. One could simply set an environment variable, but a function is more convenient and provides a point at which the value is guaranteed to be set before use (essentially, initializing the library).
Make it explicit in the documentation that the TLDEXTRACT_CACHE, XDG_CACHE_HOME, HOME, or python-tldextract environment variables will be used to determine the cache dir, falling back to the directory containing the TLDExtract package itself.

Some of the options used by TLDExtract are not bad at all; however, they cannot accommodate every case. For example, within a PyInstaller executable, the package directory itself is the location of the executable. In such cases, where the application is being distributed at scale, the application may choose to provide a specific directory for such uses. Thus, applying one of these recommendations would be meaningful and would spare implementers a deep dive into foreign code.
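The "function for setting the cache_dir" idea could be as small as the sketch below. `configure_tld_cache()` is hypothetical; it relies only on the `TLDEXTRACT_CACHE` variable named in the list above and, as a conservative assumption, should run before `tldextract` is imported, since the cache location may be resolved at import time:

```python
import os


def configure_tld_cache(cache_dir):
    """Point tldextract at an application-controlled cache directory.

    Creates the directory if needed and exports TLDEXTRACT_CACHE, unless
    the user has already set it (an explicit environment setting wins).
    Returns the directory that will be used.
    """
    cache_dir = os.path.abspath(cache_dir)
    os.makedirs(cache_dir, exist_ok=True)
    return os.environ.setdefault("TLDEXTRACT_CACHE", cache_dir)
```

Using `setdefault` rather than plain assignment keeps an administrator's explicit setting authoritative, which matters when the same build is deployed across many machines.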

ProxyResolver can't find the correct proxy for URL with port number

I think there is a bug in https://github.com/carsonyl/pypac/blob/master/pypac/resolver.py#L51 and https://github.com/carsonyl/pypac/blob/master/pypac/resolver.py#L55:

https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urlparse is used to parse the URL. Then the .netloc part is used as host when calling FindProxyForURL to determine the proxy.

However, the .netloc part also contains the colon and the port number, not just the bare host name.

In contrast to that http://findproxyforurl.com/netscape-documentation/ states:

host
the hostname extracted from the URL. This is only for convenience, it is the exact same string as between :// and the first : or / after that. The port number is not included in this parameter. It can be extracted from the URL when necessary.

So splitting before the : would help to resolve the correct host if there is a port number in the URL.
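In the standard library, `urlparse(url).hostname` already gives the host without the port (it also drops any `user:pass@` prefix, lowercases the name, and strips IPv6 brackets), so it may be a closer fit than `.netloc` for the `host` argument. A sketch with a hypothetical `pac_host()` helper:

```python
from urllib.parse import urlparse


def pac_host(url):
    """Host argument for FindProxyForURL(url, host): no port, no userinfo."""
    return urlparse(url).hostname or ""


url = "http://ip-100-74-44-105.ec2.internal:20888/proxy/"
print(urlparse(url).netloc)  # ip-100-74-44-105.ec2.internal:20888
print(pac_host(url))         # ip-100-74-44-105.ec2.internal
```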

Replace the `tld` requirement, which is under the GPL2 license

Hi folks.

The library requires tld for the sole purpose of parsing the tld out of the host in one place in the code.
This makes pypac somewhat harder to use commercially. There is a much more popular library called tldextract, distributed under the BSD 3-Clause license, which is much more permissive. I'd be happy to open a PR replacing tld with tldextract.

https://github.com/john-kurkowski/tldextract

AttributeError: 'str' object has no attribute 'find_proxy_for_url'

Running v0.1.0 downloaded from PyPI:

In [6]: session = PACSession('http://proxy.dataeng.mycompany.net/user/dataeng/proxy.pac')
In [7]: session.get("http://ip-100-74-44-105.ec2.internal:20888/proxy/application_1478638756790_1206426/")
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-7-586997e4bdf7> in <module>()
----> 1 session.get("http://ip-100-74-44-105.ec2.internal:20888/proxy/application_1478638756790_1206426/")

/usr/local/lib/python2.7/site-packages/requests/sessions.pyc in get(self, url, **kwargs)
    485
    486         kwargs.setdefault('allow_redirects', True)
--> 487         return self.request('GET', url, **kwargs)
    488
    489     def options(self, url, **kwargs):

/usr/local/lib/python2.7/site-packages/pypac/api.pyc in request(self, method, url, proxies, **kwargs)
    155
    156         if using_pac:
--> 157             proxies = self._proxy_resolver.get_proxy_for_requests(url)
    158
    159         while True:

/usr/local/lib/python2.7/site-packages/pypac/resolver.pyc in get_proxy_for_requests(self, url)
     72             and 'DIRECT' is not configured as a fallback.
     73         """
---> 74         proxy = self.get_proxy(url)
     75         if not proxy:
     76             raise ProxyConfigExhaustedError(url)

/usr/local/lib/python2.7/site-packages/pypac/resolver.pyc in get_proxy(self, url)
     57         :rtype: str|None
     58         """
---> 59         proxies = self.get_proxies(url)
     60         for proxy in proxies:
     61             if proxy == 'DIRECT' or proxy not in self._offline_proxies:

/usr/local/lib/python2.7/site-packages/pypac/resolver.pyc in get_proxies(self, url)
     35         :rtype: list[str]
     36         """
---> 37         value_from_js_func = self.pac.find_proxy_for_url(url, urlparse(url).netloc)
     38         if value_from_js_func in self._cache:
     39             return self._cache[value_from_js_func]

AttributeError: 'str' object has no attribute 'find_proxy_for_url'

For reference, the PAC file contains:

function regExpMatch(url, pattern) {
    try { return new RegExp(pattern,"i").test(url); } catch(ex) { return false; }
}

function FindProxyForURL(url, host) {
  if (regExpMatch(host, "^h2td\.master\.dataeng\.mycompany\.net$") ||
      regExpMatch(host, "^bdp_h2td_[^\.]+\.master\.dataeng\.mycompany\.net$") ||
      regExpMatch(host, "^100\.74\.([0-9]|[0-9][0-9]|1[01][0-9]|12[0-7])\.[0-9]+$") ||
      regExpMatch(host, "^ip-100-74-([0-9]|[0-9][0-9]|1[01][0-9]|12[0-7])-[0-9]+(\.ec2\.internal)?$")
     ) {
    return "SOCKS5 proxy.dataeng.mycompany.net:7778";
  }

  if (regExpMatch(host, "^10\.20[0-2]\.[0-9]+\.[0-9]+$") ||
      regExpMatch(host, "^ip-10-20[0-2]-[0-9]+-[0-9]+(\.ec2\.internal)?$") ||
      regExpMatch(host, "^100\.(6[4-9]|[7-9][0-9]|1[01][0-9]|12[0-7])\.[0-9]+\.[0-9]+$") ||
      regExpMatch(host, "^ip-100-(6[4-9]|[7-9][0-9]|1[01][0-9]|12[0-7])-[0-9]+-[0-9]+(\.ec2\.internal)?$")
     ) {
    return "DIRECT";
  }

  if (regExpMatch(host, "^.*\.master\.dataeng\.mycompany\.net$") ||
      regExpMatch(host, "^.*\.internal$") ||
      regExpMatch(host, "^10\.[0-9]+\.[0-9]+\.[0-9]+$") ||
      regExpMatch(host, "^ip-10-[0-9]+-[0-9]+-[0-9]+$")
     ) {
    return "SOCKS5 proxy.dataeng.mycompany.net:7777";
  }

  return "DIRECT";
}

Doesn't handle redirects

pypac doesn't handle redirects in the PAC server's response. Additionally, it should consider both the original URL's Content-Type and the target URL's Content-Type for validation. I've discovered that certain servers serve them with different content types, so if at least one response's Content-Type satisfies the check, it should be considered valid.
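A sketch of the suggested rule, assuming a hypothetical `pac_content_type_ok()` that receives the Content-Type of every response in the chain (original first, final last). It ignores parameters such as `; charset=utf-8`, and an `allow_missing` switch also covers servers that send no Content-Type header at all. The default allowed set is only an example, not necessarily pypac's list:

```python
# Example allowed set; not necessarily pypac's defaults.
ALLOWED_PAC_TYPES = {
    "application/x-ns-proxy-autoconfig",
    "application/x-javascript-config",
}


def _media_type(header):
    # "Application/X-NS-Proxy-Autoconfig; charset=utf-8"
    #   -> "application/x-ns-proxy-autoconfig"
    return header.split(";", 1)[0].strip().lower()


def pac_content_type_ok(headers, allowed=ALLOWED_PAC_TYPES, allow_missing=False):
    """Return True if any response in the chain has an acceptable type.

    `headers` holds the Content-Type of each response, original first and
    final last; None or "" means the header was absent.
    """
    allowed = {a.lower() for a in allowed}
    for header in headers:
        if not header or not header.strip():
            if allow_missing:
                return True
            continue
        if _media_type(header) in allowed:
            return True
    return False
```

With Requests, the list could be built as `[r.headers.get("Content-Type") for r in response.history + [response]]`, since `response.history` holds the intermediate redirect responses.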

Using the same proxy file for parser_functions

Hi, thank you for creating pypac. I'm not sure whether the use case below has been addressed before; I couldn't find it. Please help.

After loading the file, I was able to use find_proxy_for_url to test hosts, and I called individual parser_functions for testing. But how can I use the same file with parser_functions, so that I can pass a new URL/host to the existing file "f" for testing? Something like the following:

And yes, I tried find_proxy_for_url; it returns the proxy to redirect to, which doesn't help in my use case.

proxy.pac
if (shExpMatch(host, "*.abc")) return proxy_general;

from pypac import PACSession
from pypac import parser_functions
from pypac.parser import PACFile

with open('proxy.pac') as f:
    pac = PACFile(f.read())
session = PACSession(pac)

shExp = parser_functions.shExpMatch(bc, pac)
# or
shExp = parser_functions.shExpMatch(bc, f)

This returns the error below. How can I achieve this?

shExpMatch
return fnmatch(host.lower(), pattern.lower())
AttributeError: 'PACFile' object has no attribute 'lower'

Possible duktape error when dnsResolve fails

The current implementation of dnsResolve() returns None if the host cannot be resolved.

Because we now use dukpy, that value may be propagated back to the JS engine if the script depends on the result. For example, code such as:

isInNet(dnsResolve('bad-host'), "10.1.1.0", "255.255.255.0");

will result in a Duktape crash.
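A sketch of a more forgiving `isInNet()` on the Python side, using the standard `ipaddress` module. It assumes its first argument is an IPv4 literal (real PAC engines also accept a host name and resolve it) and returns `False` for a non-string value such as the `None` from a failed `dnsResolve()`, so the JS engine gets a boolean back instead of an error. This is an illustration, not pypac's implementation:

```python
import ipaddress


def isInNet(ip, pattern, mask):
    """isInNet() variant that tolerates a failed dnsResolve()."""
    if not isinstance(ip, str):
        return False  # e.g. None or False from a failed lookup
    try:
        address = ipaddress.IPv4Address(ip)
        # strict=False accepts a pattern with host bits set.
        network = ipaddress.IPv4Network("%s/%s" % (pattern, mask), strict=False)
    except ValueError:  # AddressValueError/NetmaskValueError subclass it
        return False
    return address in network
```

The same guard would also avoid passing a `bool` into `socket.inet_aton()`, which is the other way this class of failure has surfaced.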

dnsDomainIs matching not correct

It seems that the pattern matching done by dnsDomainIs (in file parser_functions.py) is not correct.

For example, dnsDomainIs('www.example.com', 'example.com') returns False, while it should return True.

"TypeError: putenv() argument 2 must be string, not None" Error when using pac_context_for_url

pypac version: 0.9.0 (retrieved from pip)
Python version: Python 2.7.15 (v2.7.15:ca079a3ea3, Apr 30 2018, 16:30:26) [MSC v.1500 64 bit (AMD64)] on win32

When the configured pac file returns "DIRECT" for a URL then the function will fail with the following error:

  File "file.py", line 27, in get_job_info
    with pac_context_for_url('http://' + self.host):
  File "C:\Python27\lib\contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "C:\Python27\lib\site-packages\pypac\api.py", line 305, in pac_context_for_url
    os.environ['HTTP_PROXY'] = proxies.get('http')
  File "C:\Python27\lib\os.py", line 422, in __setitem__
    putenv(key, item)
TypeError: putenv() argument 2 must be string, not None

In this case the pac file will return "DIRECT" for the url - the code gets to line 303
proxies = resolver.get_proxy_for_requests(url)
which will result in the code going to pypac/resolver.py:133, where proxy_parameter_for_requests("DIRECT") is called. This sets proxy_url_or_direct to None, which is used as the value for both 'http' and 'https' in the returned dictionary. These None values are then used in

os.environ['HTTP_PROXY'] = proxies.get('http')
os.environ['HTTPS_PROXY'] = proxies.get('https')

Resulting in the error putenv() argument 2 must be string, not None
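One way to avoid passing `None` to `os.environ`, sketched as a hypothetical `proxy_env()` context manager (not pypac's `pac_context_for_url`): a value of `None` (DIRECT) unsets the variable instead of setting it, and the previous values are restored on exit. It only touches the upper-case `HTTP_PROXY`/`HTTPS_PROXY` names used in the traceback:

```python
import os
from contextlib import contextmanager


@contextmanager
def proxy_env(proxies):
    """Temporarily export HTTP_PROXY/HTTPS_PROXY from a proxies dict.

    A value of None (DIRECT) removes the variable rather than passing
    None to os.environ, which raises TypeError. Previous values are
    restored when the block exits.
    """
    wanted = {"HTTP_PROXY": proxies.get("http"),
              "HTTPS_PROXY": proxies.get("https")}
    saved = {name: os.environ.get(name) for name in wanted}
    try:
        for name, value in wanted.items():
            if value is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = value
        yield
    finally:
        for name, old in saved.items():
            if old is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = old
```

Unsetting rather than setting an empty string keeps libraries that read these variables from treating `""` as a proxy URL.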

AttributeError: module 'js2py.translators' has no attribute 'pyjsparser'

I installed pypac using pip on a fresh Python 3.6.0 install and I get the following error when trying to run the three lines sample:

Python 3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 07:18:10) [MSC v.1900 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from pypac import PACSession
>>> session = PACSession()
>>> session.get('http://www.google.com')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\jfroy3\AppData\Local\Programs\Python\Python36-32\lib\site-packages\requests\sessions.py", line 501, in get
    return self.request('GET', url, **kwargs)
  File "C:\Users\jfroy3\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pypac\api.py", line 150, in request
    self.get_pac()
  File "C:\Users\jfroy3\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pypac\api.py", line 215, in get_pac
    pac = get_pac()
  File "C:\Users\jfroy3\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pypac\api.py", line 41, in get_pac
    return PACFile(downloaded_pac)
  File "C:\Users\jfroy3\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pypac\parser.py", line 20, in __init__
    orig_pyimport_meth = js2py.translators.pyjsparser.PyJsParser.parsePyimportStatement
AttributeError: module 'js2py.translators' has no attribute 'pyjsparser'

I'm not sure why and what the correct fix is, but I managed to patch it by removing 'pyjsparser.' from parser.py on lines 20, 21 and 33, like this:

20:        orig_pyimport_meth = js2py.translators.PyJsParser.parsePyimportStatement
21:        js2py.translators.PyJsParser.parsePyimportStatement = _raise_pyimport_error
...
33:            js2py.translators.PyJsParser.parsePyimportStatement = orig_pyimport_meth

Cannot build on Linux (tested on Ubuntu 16.04)

Cannot install a built wheel on Ubuntu (tested with Ubuntu 16.04/ pypac 0.10.1 / Python 3.5.4)

git clone https://github.com/carsonyl/pypac.git
cd pypac
python setup.py bdist_wheel
cd dist
pip install pypac-0.10.1-py2-py3-none-any.whl

FileNotFoundError: [Errno 2] No such file or directory: '/System/Library/CoreServices/SystemVersion.plist'

This is probably related to the line

pyobjc-framework-SystemConfiguration >= 3.2.1; sys.platform=="darwin"

in setup.py; but changing to 'linux' and rebuilding gives the same error.

Use a cache for domain lookup

Right now, only the proxy parsing gets cached:

There should be a mechanism to avoid evaluating the JS (which is costly) multiple times for the same domain name. I've been profiling my app, which makes a lot of requests, and this is a bottleneck.
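A sketch of a per-domain cache, assuming a hypothetical `make_cached_resolver()` wrapper around any `(url, host) -> str` PAC function, such as a `PACFile`'s `find_proxy_for_url`. The key assumption is that the script's answer depends only on the scheme and host; a script that inspects the URL path or port would need the full URL as the cache key:

```python
from functools import lru_cache
from urllib.parse import urlparse


def make_cached_resolver(find_proxy_for_url, maxsize=1024):
    """Wrap a (url, host) -> str PAC function with a per-(scheme, host) cache.

    Assumes the PAC result depends only on scheme and host; the URL passed
    to the script is reduced to "scheme://host/".
    """
    @lru_cache(maxsize=maxsize)
    def _resolve(scheme, host):
        return find_proxy_for_url("%s://%s/" % (scheme, host), host)

    def resolve(url):
        parts = urlparse(url)
        return _resolve(parts.scheme, parts.hostname or "")

    resolve.cache_info = _resolve.cache_info  # expose hit/miss counters
    return resolve
```

`lru_cache` bounds memory for long-running processes; `cache_info()` makes it easy to confirm in profiling that repeated requests to the same host no longer reach the JS engine.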

missing self.trust_env = False in PACSession

When the environment variable HTTPS_PROXY or HTTP_PROXY is defined and the PACSession determines that no proxy should be used, self.proxies is None; because self.trust_env is not set to False, the proxies defined in the environment variables are used instead of no proxy at all.
Adding self.trust_env = False in PACSession.__init__ fixes the issue.

PyPAC does not work with file url that uses IP and does not use domain

Hello, I work at a company where the security team uses an IP address instead of a domain in the URL of the proxy file; their justification is that an IP address is more difficult to track than a well-known domain.

This is where I ran into a problem: pypac searches for the domain, does not find it, and raises an error. So I prepared a workaround for when the PAC URL contains an IP address. I would like to build this into my own library (I am creating some libraries for working safely in Python), and I would also like to contribute it to pypac here; I will be able to use this function in other projects of mine on GitHub as well.

This causes an error:
pac = get_pac(url='http://10.1.1.1/fileName.pac')

This works:
pac = get_pac(url='http://domain.com/fileName.pac')

Workaround for the error:

url_domain_pac = exchange_ip_by_domain('http://10.1.1.1/fileName.pac')
pac = get_pac(url=url_domain_pac)

The function implementing the solution is below:

import re
import socket


def exchange_ip_by_domain(proxy_url):
    """
    Receive a WPAD/PAC URL that contains an IP address and, after a
    reverse DNS lookup, replace the IP with the host name (with domain).
    """
    proxy_ip = None
    # Scheme and authority, e.g. "http://10.1.1.1/"
    pattern1 = re.compile(r"(http|https)://(.*?)/")
    match = pattern1.match(proxy_url)
    url_ip = match.group() if match else ""
    pattern2 = re.compile(r"(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})")
    try:
        proxy_ip = pattern2.search(url_ip)[0]
    except (TypeError, AttributeError):
        proxy_ip = None
    proxy_name = None
    if proxy_ip is not None:
        try:
            data = socket.gethostbyaddr(proxy_ip)
            proxy_name = str(data[0])
        except (socket.gaierror, socket.herror):
            proxy_name = None
    # print("host_name", proxy_name)
    if proxy_ip is not None and proxy_name is not None:
        proxy_url = proxy_url.replace(proxy_ip, proxy_name)
    return proxy_url

multiple source ip addresss test

Hi,
Can you tell me how I can test it with multiple source IP addresses? I tried something like this:

from pypac import PACSession
from requests_toolbelt.adapters import source

session = PACSession(pac)
list_source_ip_and_url = [("10.129.xx.yy", "https://google.com"), ("10.189.yy.xx", "https://random.com/Surveyor.Web/")]
responses = []
try:
    for source_ip, url in list_source_ip_and_url:
        # Bind outgoing connections to this source address
        new_source = source.SourceAddressAdapter(source_ip)
        session.mount('http://', new_source)
        session.mount('https://', new_source)
        responses.append(session.get(url))
    print(responses)
except Exception as e:
    print(e)

I am getting an error like:
HTTPSConnectionPool(host='google.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x033D5547>: Failed to establish a new connection: [WinError 10049] The requested address is not valid in its context'))
It looks like I'm hitting an error from the requests module.

IndentationError: too many levels of indentation

Hi,

I just installed pypac with pip and replaced my Session by a PACSession, and I get :

Traceback (most recent call last):
  File "D:\TimecardAutomation\absReaderWitPAC.py", line 41, in <module>
    login_page = s.get(login_url)
  File "C:\Python27\lib\site-packages\requests\sessions.py", line 521, in get
    return self.request('GET', url, **kwargs)
  File "C:\Python27\lib\site-packages\pypac\api.py", line 184, in request
    self.get_pac()
  File "C:\Python27\lib\site-packages\pypac\api.py", line 249, in get_pac
    pac = get_pac(recursion_limit=self._recursion_limit)
  File "C:\Python27\lib\site-packages\pypac\api.py", line 57, in get_pac
    return PACFile(downloaded_pac, **kwargs)
  File "C:\Python27\lib\site-packages\pypac\parser.py", line 60, in __init__
    context.execute(pac_js)
  File "C:\Python27\lib\site-packages\js2py\evaljs.py", line 176, in execute
    compiled = cache[hashkey] = compile(code, '<EvalJS snippet>', 'exec')
  File "<EvalJS snippet>", line 221
    def PyJs_LONG_87_(var=var):
                              ^
IndentationError: too many levels of indentation

The code is the following:

from pypac import PACSession
from bs4 import BeautifulSoup
import os

username_field_name = "ctl00$ContentPlaceHolder1$Login1$UserName"
password_field_name = "ctl00$ContentPlaceHolder1$Login1$Password"
form_action_name = "***"
username = "***"
password = "***"

login_url = "***"
abscence_history_url = "***"

payload = {
    username_field_name: username,
    password_field_name: password,
    "ctl00$ContentPlaceHolder1$HiddenUrlPage": "***",
    "ctl00$ContentPlaceHolder1$Login1$LoginButton": "***",
    "__EVENTTARGET": "",
    "__EVENTARGUMENT": ""
}

with PACSession() as s:
    print("getting page 1")
    login_page = s.get(login_url)
    login_soup = BeautifulSoup(login_page.content, "html.parser")
    payload["__VIEWSTATE"] = login_soup.select_one("#__VIEWSTATE")["value"]
    payload["__VIEWSTATEGENERATOR"] = login_soup.select_one("#__VIEWSTATEGENERATOR")["value"]
    first_rep = s.post(login_url, data=payload)
    response = s.get(abscence_history_url)

Doesn't accept `https://antizapret.prostovpn.org/proxy.pac` as valid script

It raises _dukpy.JSRuntimeError: EvalError: Error while calling Python Function: TypeError('inet_aton() argument 1 must be str, not bool') when testing self.find_proxy_for_url("/", "0.0.0.0"). Removing the check locally allows my code to consume the file correctly. Note that my other issue needs to be resolved before you can use the URL as is; otherwise, manually follow the redirect and use the final URL when working with the PAC file.