This URL parses correctly; however, the redirect leads to a bogus URL (nice work, Patch.com :-)
Can we make this raise a more useful exception, one that says specifically that the redirect target is bogus?
Here is the URL parsing correctly:
>>> import urlparse
>>> urlparse.urlparse('http://stclairshores.patch.com/articles/shores-veteran-to-receive-complimentary-wedding-on-veterans-day')
ParseResult(scheme='http', netloc='stclairshores.patch.com', path='/articles/shores-veteran-to-receive-complimentary-wedding-on-veterans-day', params='', query='', fragment='')
>>> urlparse.urlparse('http://stclairshores.patch.com/articles/shores-veteran-to-receive-complimentary-wedding-on-veterans-day').port
>>>
Here is the redirect it generates. Notice the doubled http://http:// at the beginning of the new location!
>>> r = requests.get('http://stclairshores.patch.com/articles/shores-veteran-to-receive-complimentary-wedding-on-veterans-day', allow_redirects=False)
>>> r.headers
{'status': '302', 'content-length': '160', 'content-encoding': 'gzip', 'set-cookie': 'p13n=%5B%5D; path=/, _patch_session=BAh7BzoPc2Vzc2lvbl9pZCIlNmEzYzM4YzZjMWIxMTdiNDkxMmEwNmEwM2JmZDQzYTU6FnByb21wdF9mb3Jfc3VydmV5aQA%3D--547af227538bb1d039809a3d01eaadb320a9a42b; domain=patch.com; path=/', 'x-powered-by': 'Phusion Passenger (mod_rails/mod_rack) 3.0.11', 'vary': 'Accept-Encoding', 'server': 'Apache/2.2.15 (Unix) mod_ssl/2.2.15 OpenSSL/0.9.8l Phusion_Passenger/3.0.11', 'x-runtime': '15', 'location': 'http://http://www.dailytribune.com/articles/2011/11/09/news/doc4ebb336cad1c7378471368.txt?viewmode=fullstory', 'cache-control': 'no-cache', 'date': 'Mon, 23 Jan 2012 13:32:36 GMT', 'content-type': 'text/html; charset=utf-8', 'x-rack-cache': 'miss'}
>>>
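To see where the empty port in the tracebacks below comes from, here is a short sketch that splits the bogus Location header with Python 3's urllib.parse (the successor of the urlparse module used above). The second "http" is taken as the host, the ":" after it starts an empty port, and the real hostname ends up at the front of the path.

```python
from urllib.parse import urlsplit

# Location header from the 302 response shown above.
BOGUS_LOCATION = ('http://http://www.dailytribune.com/articles/2011/11/09/news/'
                  'doc4ebb336cad1c7378471368.txt?viewmode=fullstory')

parts = urlsplit(BOGUS_LOCATION)

# The netloc stops at the first "/" after the leading "http://", so it is just "http:".
print(parts.netloc)   # http:
# The real host is pushed into the path.
print(parts.path)     # //www.dailytribune.com/articles/2011/11/09/news/doc4ebb336cad1c7378471368.txt

# Splitting "host:port" on the colon leaves an empty port string...
host, _, port = parts.netloc.partition(':')
print(repr(host), repr(port))   # 'http' ''

# ...and converting that to an integer fails with the same message urllib3 shows.
try:
    int(port)
except ValueError as exc:
    print(exc)   # invalid literal for int() with base 10: ''
```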
So, of course, urllib cannot open it:
>>> import urllib
>>> g = urllib.urlopen('http://stclairshores.patch.com/articles/shores-veteran-to-receive-complimentary-wedding-on-veterans-day').read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/urllib.py", line 84, in urlopen
    return opener.open(url)
  File "/usr/lib/python2.7/urllib.py", line 205, in open
    return getattr(self, name)(url)
  File "/usr/lib/python2.7/urllib.py", line 356, in open_http
    return self.http_error(url, fp, errcode, errmsg, headers)
  File "/usr/lib/python2.7/urllib.py", line 369, in http_error
    result = method(url, fp, errcode, errmsg, headers)
  File "/usr/lib/python2.7/urllib.py", line 632, in http_error_302
    data)
  File "/usr/lib/python2.7/urllib.py", line 659, in redirect_internal
    return self.open(newurl)
  File "/usr/lib/python2.7/urllib.py", line 205, in open
    return getattr(self, name)(url)
  File "/usr/lib/python2.7/urllib.py", line 331, in open_http
    h = httplib.HTTP(host)
  File "/usr/lib/python2.7/httplib.py", line 1061, in __init__
    self._setup(self._connection_class(host, port, strict))
  File "/usr/lib/python2.7/httplib.py", line 693, in __init__
    self._set_hostport(host, port)
  File "/usr/lib/python2.7/httplib.py", line 718, in _set_hostport
    raise InvalidURL("nonnumeric port: '%s'" % host[i+1:])
httplib.InvalidURL: nonnumeric port: ''
urllib3 hits the same problem, though it surfaces as a ValueError:
>>> http_pool = urllib3.connection_from_url('http://stclairshores.patch.com/articles/shores-veteran-to-receive-complimentary-wedding-on-veterans-day')
>>> r = http_pool.get_url('http://stclairshores.patch.com/articles/shores-veteran-to-receive-complimentary-wedding-on-veterans-day')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "urllib3/request.py", line 136, in get_url
    **urlopen_kw)
  File "urllib3/request.py", line 78, in request_encode_url
    return self.urlopen(method, url, **urlopen_kw)
  File "urllib3/connectionpool.py", line 410, in urlopen
    retries - 1, redirect, assert_same_host)
  File "urllib3/connectionpool.py", line 336, in urlopen
    if assert_same_host and not self.is_same_host(url):
  File "urllib3/connectionpool.py", line 246, in is_same_host
    scheme, host, port = get_host(url)
  File "urllib3/connectionpool.py", line 538, in get_host
    port = int(port)
ValueError: invalid literal for int() with base 10: ''
>>>
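The urllib3 traceback ends in a bare int(port), so the caller gets a ValueError that never mentions the URL. As a rough sketch of what a friendlier host/port split could look like, assuming Python 3.7+ (for str.isascii): split_host_port and InvalidURLError are hypothetical names, not urllib3's get_host or any real urllib3 exception, and the sketch deliberately ignores IPv6 literals.

```python
from urllib.parse import urlsplit


class InvalidURLError(ValueError):
    """Hypothetical error type that names the offending URL (not part of urllib3)."""


def split_host_port(url, default_port=80):
    """Return (host, port) for an http URL, rejecting empty or non-numeric ports.

    Simplified stand-in for a get_host()-style helper: it strips "user:pass@"
    but does not handle bracketed IPv6 literals.
    """
    netloc = urlsplit(url).netloc
    hostport = netloc.rpartition('@')[2]          # drop any "user:pass@" prefix
    host, sep, port = hostport.partition(':')
    if not host:
        raise InvalidURLError(f'no host in URL {url!r}')
    if not sep:                                   # no ":" at all, so use the default port
        return host, default_port
    if not (port.isascii() and port.isdigit()):   # rejects '' as well as e.g. 'abc'
        raise InvalidURLError(f'empty or non-numeric port {port!r} in URL {url!r}')
    return host, int(port)


print(split_host_port('http://stclairshores.patch.com/articles/x'))
# ('stclairshores.patch.com', 80)
try:
    split_host_port('http://http://www.dailytribune.com/x')
except InvalidURLError as exc:
    print(exc)
    # empty or non-numeric port '' in URL 'http://http://www.dailytribune.com/x'
```

Subclassing ValueError keeps existing `except ValueError:` handlers working while putting the full URL into the message.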
And it propagates through to requests, too:
>>>
>>> r = requests.get('http://stclairshores.patch.com/articles/shores-veteran-to-receive-complimentary-wedding-on-veterans-day')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 50, in get
    return request('get', url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 38, in request
    return s.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 200, in request
    r.send(prefetch=prefetch)
  File "/usr/local/lib/python2.7/dist-packages/requests/models.py", line 514, in send
    self._build_response(r)
  File "/usr/local/lib/python2.7/dist-packages/requests/models.py", line 253, in _build_response
    request.send()
  File "/usr/local/lib/python2.7/dist-packages/requests/models.py", line 430, in send
    conn = self._poolmanager.connection_from_url(url)
  File "/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/poolmanager.py", line 94, in connection_from_url
    scheme, host, port = get_host(url)
  File "/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/connectionpool.py", line 524, in get_host
    port = int(port)
ValueError: invalid literal for int() with base 10: ''
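Along the lines asked for at the top, here is a minimal sketch of a check that could run on the Location header before a redirect is followed, so the failure names the redirect instead of surfacing as a bare ValueError from port parsing. InvalidRedirect and check_redirect_location are hypothetical names, not part of requests or urllib3. Treating an empty port as an error is a policy choice of this sketch (RFC 3986 technically allows "host:" with an empty port), made because the libraries above cannot follow such a URL anyway.

```python
import re
from urllib.parse import urljoin, urlsplit


class InvalidRedirect(Exception):
    """Hypothetical exception for a Location header that cannot be followed."""

    def __init__(self, source_url, location, reason):
        super().__init__(f'{source_url} redirected to a malformed URL {location!r}: {reason}')
        self.source_url = source_url
        self.location = location
        self.reason = reason


# A Location that repeats the scheme, e.g. "http://http://example.com/".
_DOUBLED_SCHEME = re.compile(r'^[a-z][a-z0-9+.-]*://[a-z][a-z0-9+.-]*://', re.IGNORECASE)


def check_redirect_location(source_url, location):
    """Return the absolute URL to follow, or raise InvalidRedirect."""
    if _DOUBLED_SCHEME.match(location):
        raise InvalidRedirect(source_url, location, 'the scheme appears twice')
    target = urljoin(source_url, location)      # Location may be a relative reference
    parts = urlsplit(target)
    if parts.scheme not in ('http', 'https'):
        raise InvalidRedirect(source_url, location, f'unsupported scheme {parts.scheme!r}')
    if not parts.hostname:
        raise InvalidRedirect(source_url, location, 'no host')
    if parts.netloc.endswith(':'):              # "host:" with nothing after the colon
        raise InvalidRedirect(source_url, location, 'empty port')
    try:
        _ = parts.port                          # raises ValueError for e.g. "host:abc"
    except ValueError as exc:
        raise InvalidRedirect(source_url, location, str(exc)) from None
    return target
```

With allow_redirects=False, as in the session above, the inputs would be the request URL and r.headers['location']; for the Patch.com response this raises InvalidRedirect with the reason "the scheme appears twice".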