kjd / idna
Internationalized Domain Names for Python (IDNA 2008 and UTS #46)
License: BSD 3-Clause "New" or "Revised" License
this would be great for Requests to have.
Regenerate the PVALID/CONTEXTO/CONTEXTJ data based on the Unicode 7 standard that was released recently. This is pending official publication of the 7.0.0 repertoire by IANA, which in turn is likely pending the resolution of the issue described in http://tools.ietf.org/html/draft-klensin-idna-5892upd-unicode70-00
On Python 2.6, message formatting like that found here: https://github.com/kjd/idna/blob/master/idna%2Fcore.py#L247-L254 is an error, because 2.6 doesn't support the auto-numbered {} format; you have to explicitly state which value you want (e.g. {0}).
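A minimal sketch of the portable form, assuming nothing about the library beyond the message text quoted in the tracebacks on this page. The variable names are made up for illustration; explicit indices work on 2.6 as well as on later versions.

```python
# Hypothetical values, chosen only to illustrate the format string.
cp, pos, label = 'U+005F', 3, 'ex_ample'

# Explicit positional indices: accepted by Python 2.6, 2.7 and 3.x.
msg = 'Codepoint {0} at position {1} of {2} not allowed'.format(cp, pos, repr(label))

# Auto-numbered fields ('{}') are only accepted from Python 2.7 / 3.1 onward.
same_on_newer = 'Codepoint {} at position {} of {} not allowed'.format(cp, pos, repr(label))
```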
The decode method can throw an exception when it finds characters not acceptable in IDNA2008. I think that the characters are acceptable in UTS46.
idna.decode("xn--co8ha.tk")
There isn't a way of signalling to decode that it should apply uts46 rules. UTS46 (in section 4.3) says:
Like [RFC3490], this will always produce a converted Unicode string. Unlike ToASCII of [RFC3490], this always signals whether or not there was an error.
The decode method currently indicates whether there was an error, but it does not always produce a converted unicode string.
The domain name above is a valid domain name and can be accessed: http://🐔🐔.tk/
Trying to encode this domain name also fails, even with uts46=True and transitional=True.
The python call
"xn--co8ha.tk".decode("idna")
does produce the right answer.
I would stick with the Python IDNA 2003 implementation, except that I need improved handling of the German ß character.
Hi,
observe:
>>> idna.encode("x" * 65)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/hynek/.virtualenvs/cust_ws/lib/python3.5/site-packages/idna/core.py", line 355, in encode
result.append(alabel(label))
File "/Users/hynek/.virtualenvs/cust_ws/lib/python3.5/site-packages/idna/core.py", line 276, in alabel
check_label(label)
File "/Users/hynek/.virtualenvs/cust_ws/lib/python3.5/site-packages/idna/core.py", line 253, in check_label
raise InvalidCodepoint('Codepoint {0} at position {1} of {2} not allowed'.format(_unot(cp_value), pos+1, repr(label)))
idna.core.InvalidCodepoint: Codepoint U+0027 at position 2 of "b'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'" not allowed
Codepoint U+0027 at position 2 of "b'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'" not allowed
The problem is in the following lines. You check the length of the label and you do raise an IDNAError('Label too long'); however, there's an except UnicodeError: and IDNAError happens to be a subclass of UnicodeError. That's where I closed my pdb session. :)
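The effect described above can be reproduced in isolation. This is only a sketch: the IDNAError class and check function below are stand-ins written for illustration, not the library's code; the one thing taken from the report is that IDNAError subclasses UnicodeError.

```python
class IDNAError(UnicodeError):
    """Stand-in for the library's exception (assumed to subclass UnicodeError)."""


def check(label):
    # Hypothetical function: the length error is raised inside a try block
    # whose handler was meant for ASCII-encoding failures only.
    try:
        if len(label) > 63:
            raise IDNAError('Label too long')
        return 'ok'
    except UnicodeError:
        # IDNAError is a UnicodeError, so the length error lands here
        # and is silently swallowed.
        return 'fell through to the non-ASCII path'
```

With a 65-character label, check() returns the fall-through string instead of surfacing 'Label too long', which matches the confusing error in the traceback.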
Hello @kjd, I have a URL that looks like this: https://tjhughes_co_uk.secure-cdn.visualsoft.co.uk/images/4-pack-solar-path-finder-p6049-14413_image.jpg I use a library that uses idna, and I encountered the following failure:
In [1]: url = 'https://tjhughes_co_uk.secure-cdn.visualsoft.co.uk/images/4-pack-solar-path-finder-p6049-14413_image.jpg'
...:
In [2]: import idna
In [3]: idna.encode(urlparse(url).netloc)
---------------------------------------------------------------------------
InvalidCodepoint Traceback (most recent call last)
<ipython-input-3-f510d576e964> in <module>()
----> 1 idna.encode(urlparse(url).netloc)
/home/pawel/.virtualenvs/scrapy/local/lib/python2.7/site-packages/idna/core.pyc in encode(s, strict, uts46, std3_rules, transitional)
353 trailing_dot = True
354 for label in labels:
--> 355 result.append(alabel(label))
356 if trailing_dot:
357 result.append(b'')
/home/pawel/.virtualenvs/scrapy/local/lib/python2.7/site-packages/idna/core.pyc in alabel(label)
274
275 label = unicode(label)
--> 276 check_label(label)
277 label = _punycode(label)
278 label = _alabel_prefix + label
/home/pawel/.virtualenvs/scrapy/local/lib/python2.7/site-packages/idna/core.pyc in check_label(label)
251 raise InvalidCodepointContext('Codepoint {0} not allowed at position {1} in {2}'.format(_unot(cp_value), pos+1, repr(label)))
252 else:
--> 253 raise InvalidCodepoint('Codepoint {0} at position {1} of {2} not allowed'.format(_unot(cp_value), pos+1, repr(label)))
254
255 check_bidi(label)
InvalidCodepoint: Codepoint U+005F at position 9 of u'tjhughes_co_uk' not allowed
In version 2.2 it complains about the character c after the underscore, so I'm guessing underscores in domain names are invalid.
In version 2.5 of idna I get
In [7]: idna.encode(urlparse(a).netloc)
---------------------------------------------------------------------------
IDNAError Traceback (most recent call last)
<ipython-input-7-6f26ef5f0d17> in <module>()
----> 1 idna.encode(urlparse(a).netloc)
/home/pawel/.virtualenvs/scrapy/local/lib/python2.7/site-packages/idna/core.pyc in encode(s, strict, uts46, std3_rules, transitional)
353 trailing_dot = True
354 for label in labels:
--> 355 result.append(alabel(label))
356 if trailing_dot:
357 result.append(b'')
/home/pawel/.virtualenvs/scrapy/local/lib/python2.7/site-packages/idna/core.pyc in alabel(label)
263 ulabel(label)
264 except IDNAError:
--> 265 raise IDNAError('The label {0} is not a valid A-label'.format(label))
266 if not valid_label_length(label):
267 raise IDNAError('Label too long')
IDNAError: The label tjhughes_co_uk is not a valid A-label
I tried
idna.encode(url, uts46=True, transitional=True)
but it still breaks. I'm using Python 2. I stumbled on this via Scrapy, which uses Twisted. Twisted does the following with idna: https://github.com/twisted/twisted/blob/trunk/src/twisted/internet/_idna.py#L28 Is that OK?
If this URL is invalid according to the IDNA specs (I'm not sure it is; is it invalid?), is there some way for me to encode it to IDNA? Or do we need to skip IDNA encoding on failures like this?
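One possible way to "skip IDNA encoding" for names that are already plain ASCII, sketched under assumptions: to_wire is a hypothetical helper, written for Python 3, and idna_encode is whatever encoder the caller supplies (for example idna.encode). Whether bypassing validation is acceptable depends on the application; this is not the library's or Twisted's behaviour.

```python
def to_wire(host, idna_encode):
    """Return the bytes to put on the wire for a hostname.

    Pure-ASCII names (including ones with underscores) are passed through
    untouched; only names with non-ASCII characters go to the IDNA encoder.
    """
    try:
        return host.encode('ascii')
    except UnicodeEncodeError:
        return idna_encode(host)
```

Called as to_wire(urlparse(url).netloc, idna.encode), the hostname from the report would be returned unchanged, because it never reaches the encoder.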
Some import statements do not use relative imports which makes vendoring more difficult.
Modified core.py, build-uts46data.py, and uts46data.py to support 64kb Java Class limits.
See https://github.com/metavero/idna/pulls
Thanks for this code, it's nice to see support for modern urls in Python! The built-in codec raises for any hostname containing ß, complaining
UnicodeError: ('IDNA does not round-trip', b'xn--einla-pqa', b'einlass')
idna 2008 fixes this stupidity.
However, when you're dealing with general-purpose urls you end up with hostnames like '::1', which is localhost for ipv6. This means that your module isn't so easy to drop in as a replacement. I either need to wrap it or stick try / except idna.core.InvalidCodepoint in my code, which looks a bit odd. I think it might be prettier if idna encode/decode grew options to pass-thru in this case?
>>> u='::1'
>>> u.encode('idna')
b'::1'
>>> import idna
>>> idna.encode(u)
Traceback (most recent call last):
File "/home/lindahl/.pyenv/versions/3.6.4/lib/python3.6/site-packages/idna/core.py", line 263, in alabel
ulabel(label)
File "/home/lindahl/.pyenv/versions/3.6.4/lib/python3.6/site-packages/idna/core.py", line 299, in ulabel
check_label(label)
File "/home/lindahl/.pyenv/versions/3.6.4/lib/python3.6/site-packages/idna/core.py", line 253, in check_label
raise InvalidCodepoint('Codepoint {0} at position {1} of {2} not allowed'.format(_unot(cp_value), pos+1, repr(label)))
idna.core.InvalidCodepoint: Codepoint U+003A at position 1 of '::1' not allowed
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/lindahl/.pyenv/versions/3.6.4/lib/python3.6/site-packages/idna/core.py", line 355, in encode
result.append(alabel(label))
File "/home/lindahl/.pyenv/versions/3.6.4/lib/python3.6/site-packages/idna/core.py", line 265, in alabel
raise IDNAError('The label {0} is not a valid A-label'.format(label))
idna.core.IDNAError: The label b'::1' is not a valid A-label
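Until such an option exists, the wrapping mentioned above could look roughly like this. It is a sketch for Python 3 using the standard ipaddress module; is_ip_literal and encode_host are hypothetical helper names, and idna_encode stands for the encoder being wrapped (for example idna.encode).

```python
import ipaddress


def is_ip_literal(host):
    """Return True if host is an IPv4/IPv6 address, optionally in [brackets]."""
    if host.startswith('[') and host.endswith(']'):
        host = host[1:-1]
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return False
    return True


def encode_host(host, idna_encode):
    # IP literals are not domain names, so pass them through unchanged.
    if is_ip_literal(host):
        return host.encode('ascii')
    return idna_encode(host)
```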
noarch/idna-2.6-py_0.tar.bz2 from https://anaconda.org/conda-forge/idna/files breaks conda update conda:
Package plan for installation in environment /opt/conda:
The following packages will be downloaded:
package | build
---------------------------|-----------------
conda-env-2.6.0 | 0 1017 B conda-forge
libffi-3.2.1 | 3 47 KB conda-forge
asn1crypto-0.22.0 | py35_0 149 KB conda-forge
certifi-2017.7.27.1 | py35_0 204 KB conda-forge
chardet-3.0.4 | py35_0 189 KB conda-forge
idna-2.6 | py_0 47 KB conda-forge
pycparser-2.18 | py35_0 169 KB conda-forge
pysocks-1.6.7 | py35_0 21 KB conda-forge
cffi-1.10.0 | py35_0 388 KB conda-forge
cryptography-2.0.3 | py35_0 853 KB conda-forge
pyopenssl-17.2.0 | py35_0 77 KB conda-forge
urllib3-1.22 | py35_0 155 KB conda-forge
requests-2.18.4 | py35_1 91 KB conda-forge
conda-4.3.24 | py35_0 514 KB conda-forge
------------------------------------------------------------
Total: 2.8 MB
The following NEW packages will be INSTALLED:
asn1crypto: 0.22.0-py35_0 conda-forge
certifi: 2017.7.27.1-py35_0 conda-forge
cffi: 1.10.0-py35_0 conda-forge
chardet: 3.0.4-py35_0 conda-forge
cryptography: 2.0.3-py35_0 conda-forge
idna: 2.6-py_0 conda-forge
libffi: 3.2.1-3 conda-forge
pycparser: 2.18-py35_0 conda-forge
pyopenssl: 17.2.0-py35_0 conda-forge
pysocks: 1.6.7-py35_0 conda-forge
urllib3: 1.22-py35_0 conda-forge
The following packages will be UPDATED:
conda: 4.1.11-py35_0 --> 4.3.24-py35_0 conda-forge
conda-env: 2.5.2-py35_0 --> 2.6.0-0 conda-forge
requests: 2.10.0-py35_0 --> 2.18.4-py35_1 conda-forge
Traceback (most recent call last):
File "/opt/conda/lib/python3.5/site-packages/conda/exceptions.py", line 634, in conda_exception_handler
return_value = func(*args, **kwargs)
File "/opt/conda/lib/python3.5/site-packages/conda/cli/main.py", line 114, in _main
imported = importlib.import_module(module)
File "/opt/conda/lib/python3.5/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 986, in _gcd_import
File "<frozen importlib._bootstrap>", line 969, in _find_and_load
File "<frozen importlib._bootstrap>", line 958, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 673, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 665, in exec_module
File "<frozen importlib._bootstrap>", line 222, in _call_with_frames_removed
File "/opt/conda/lib/python3.5/site-packages/conda/cli/main_update.py", line 10, in <module>
from .install import install
File "/opt/conda/lib/python3.5/site-packages/conda/cli/install.py", line 20, in <module>
from ..core.index import get_index, get_channel_priority_map
File "/opt/conda/lib/python3.5/site-packages/conda/core/index.py", line 8, in <module>
from .package_cache import PackageCache
File "/opt/conda/lib/python3.5/site-packages/conda/core/package_cache.py", line 9, in <module>
from .path_actions import CacheUrlAction, ExtractPackageAction
File "/opt/conda/lib/python3.5/site-packages/conda/core/path_actions.py", line 33, in <module>
from ..gateways.download import download
File "/opt/conda/lib/python3.5/site-packages/conda/gateways/download.py", line 10, in <module>
from requests.exceptions import ConnectionError, HTTPError, InvalidSchema, SSLError
File "/opt/conda/lib/python3.5/site-packages/requests/__init__.py", line 98, in <module>
from . import packages
File "/opt/conda/lib/python3.5/site-packages/requests/packages.py", line 7, in <module>
locals()[package] = __import__(package)
ImportError: No module named 'idna'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/conda/bin/conda", line 6, in <module>
sys.exit(conda.cli.main())
File "/opt/conda/lib/python3.5/site-packages/conda/cli/main.py", line 182, in main
return conda_exception_handler(_main, *args)
File "/opt/conda/lib/python3.5/site-packages/conda/exceptions.py", line 638, in conda_exception_handler
return handle_exception(e)
File "/opt/conda/lib/python3.5/site-packages/conda/exceptions.py", line 628, in handle_exception
print_unexpected_error_message(e)
File "/opt/conda/lib/python3.5/site-packages/conda/exceptions.py", line 590, in print_unexpected_error_message
stderrlogger.info(get_main_info_str(get_info_dict()))
File "/opt/conda/lib/python3.5/site-packages/conda/cli/main_info.py", line 162, in get_info_dict
from ..connection import user_agent
File "/opt/conda/lib/python3.5/site-packages/conda/connection.py", line 12, in <module>
from requests import Session, __version__ as REQUESTS_VERSION
File "/opt/conda/lib/python3.5/site-packages/requests/__init__.py", line 98, in <module>
from . import packages
File "/opt/conda/lib/python3.5/site-packages/requests/packages.py", line 7, in <module>
locals()[package] = __import__(package)
ImportError: No module named 'idna'
>>> import idna
>>> idna.encode('ドメイン.テスト')
b'xn--eckwd4c7c.xn--zckzah'
>>> idna.encode('☃')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib64/python3.4/site-packages/idna/core.py", line 355, in encode
result.append(alabel(label))
File "/usr/lib64/python3.4/site-packages/idna/core.py", line 276, in alabel
check_label(label)
File "/usr/lib64/python3.4/site-packages/idna/core.py", line 253, in check_label
raise InvalidCodepoint('Codepoint {0} at position {1} of {2} not allowed'.format(_unot(cp_value), pos+1, repr(label)))
idna.core.InvalidCodepoint: Codepoint U+2603 at position 1 of '☃' not allowed
Thank you for your work to create and maintain the idna module.
Could you consider adding an identifier of the license type to the LICENSE.rst file?
I checked that it's the 3-Clause BSD License, but that required additional effort.
Thank you.
Hello,
We have a Python program with a resident process running in a separate thread. We make requests in that thread, and when the master thread spots a timeout, a custom exception is dispatched to the child thread via an OS signal. We use Requests 2.12.4.
However, that exception is caught not by our handler (in the child thread's main loop) but by the except: directive in the alabel function deep in idna/core.py: https://github.com/kennethreitz/requests/blob/master/requests/packages/idna/core.py#L264 because that handler is an unconditional "catch-all" except: directive. That leads to very cryptic behaviour, such as the child thread not stopping and trying to access URLs with unnecessary IDNA encoding (all our URLs are plain ASCII).
The expected behaviour for that handler is to catch only the specific exception classes it knows how to handle, not everything.
The Requests maintainer asked me to open this issue here, since apparently the requests package vendors the idna package from this repo without edits. Original issue: kennethreitz/requests#3843
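The difference between the two handler styles can be shown without threads or signals. In this sketch every name is hypothetical: TimeoutSignal stands for the application's custom exception, and validate stands for any code that happens to be running when it arrives.

```python
class TimeoutSignal(Exception):
    """Hypothetical application exception delivered from outside."""


def validate(label):
    # Stands in for the exception arriving while validation is running.
    raise TimeoutSignal()


def with_bare_except(label):
    try:
        validate(label)
        return 'valid'
    except:  # catch-all: also swallows TimeoutSignal
        return 'treated as invalid'


def with_specific_except(label):
    try:
        validate(label)
        return 'valid'
    except ValueError:  # only the error class this code knows how to handle
        return 'treated as invalid'
```

with_bare_except quietly turns the timeout into an ordinary "invalid label" result, while with_specific_except lets TimeoutSignal propagate to the caller's own handler.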
RFC5892 A.7 states that the KATAKANA MIDDLE DOT character https://tools.ietf.org/html/rfc5892#appendix-A.7 requires at least one character in the {Hiragana, Katakana, Han} scripts. The validation in this library requires that ALL characters in the label (besides the dot character itself) be in those scripts.
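To make the difference between the two readings concrete, here is a rough sketch. It is not the library's code, and it approximates the Unicode Script property with character-name prefixes from the standard unicodedata module, which is an assumption made only to keep the example self-contained.

```python
import unicodedata

MIDDLE_DOT = '\u30fb'  # KATAKANA MIDDLE DOT
# Approximation of Script in {Hiragana, Katakana, Han} via character names.
_PREFIXES = ('HIRAGANA ', 'KATAKANA ', 'CJK UNIFIED IDEOGRAPH')


def _in_scripts(ch):
    return unicodedata.name(ch, '').startswith(_PREFIXES)


def middle_dot_ok_any(label):
    """Reading described for RFC 5892 A.7: at least one other character qualifies."""
    return any(_in_scripts(c) for c in label if c != MIDDLE_DOT)


def middle_dot_ok_all(label):
    """Stricter reading reported in this issue: every other character must qualify."""
    others = [c for c in label if c != MIDDLE_DOT]
    return bool(others) and all(_in_scripts(c) for c in others)
```

A label that mixes Latin letters with one Hiragana character around the middle dot passes the first check and fails the second, which is the discrepancy being reported.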
Hi,
I found an instance of domain which cannot be decoded by IDNA library.
In [3]: idna.decode('xn--hjdarna07-b07a.cf')
InvalidCodepoint Traceback (most recent call last)
in ()
----> 1 idna.decode('xn--hjdarna07-b07a.cf')
/usr/local/lib/python2.7/dist-packages/idna/core.pyc in decode(s, strict, uts46, std3_rules)
382 trailing_dot = True
383 for label in labels:
--> 384 result.append(ulabel(label))
385 if trailing_dot:
386 result.append(u'')
/usr/local/lib/python2.7/dist-packages/idna/core.pyc in ulabel(label)
301
302 label = label.decode('punycode')
--> 303 check_label(label)
304 return label
305
/usr/local/lib/python2.7/dist-packages/idna/core.pyc in check_label(label)
251 raise InvalidCodepointContext('Codepoint {0} not allowed at position {1} in {2}'.format(_unot(cp_value), pos+1, repr(label)))
252 else:
--> 253 raise InvalidCodepoint('Codepoint {0} at position {1} of {2} not allowed'.format(_unot(cp_value), pos+1, repr(label)))
254
255 check_bidi(label)
InvalidCodepoint: Codepoint U+10A5 at position 7 of u'hjdarn\u10a5a07' not allowed
The expected output is "hjdarnႥa07.cf". The domain exists (http://xn--hjdarna07-b07a.cf/) and is correct IDNA code (http://unicode.org/cldr/utility/idna.jsp?a=hjdarn%E1%82%A5a07.cf).
The fix for #36 (ebefacd) changes the exception hierarchy and breaks several downstream dependencies (twisted, pyca/cryptography, possibly more). Looking at the diff, would it be possible to revert and simply catch UnicodeEncodeError rather than UnicodeError? UnicodeEncodeError will be raised by encode, which is what you're trying to catch, while you don't want to catch any of the IDNAErrors. Alternately, this ugly code (which could be improved) also passes the tests:
def alabel(label):
    if not label:
        raise IDNAError('No Input')
    if not valid_label_length(label):
        raise IDNAError('Label too long')
    try:
        label = label.encode('ascii')
        encoded = True
    except UnicodeError:
        encoded = False
    if encoded is True:
        try:
            ulabel(label)
            return label
        except IDNAError:
            raise IDNAError('The label {0} is not a valid A-label'.format(label))
    label = unicode(label)
    check_label(label)
    label = _punycode(label)
    label = _alabel_prefix + label
    if not valid_label_length(label):
        raise IDNAError('Label too long')
    return label
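The first suggestion (narrowing the handler to UnicodeEncodeError) can be illustrated in a few lines. This is a sketch, not a patch: IDNAError here is a stand-in that keeps UnicodeError as its base class, and classify is a hypothetical function used only to show which exceptions the handler sees.

```python
class IDNAError(UnicodeError):
    """Stand-in that keeps UnicodeError as the base class."""


def classify(label):
    try:
        if len(label) > 63:
            raise IDNAError('Label too long')
        label.encode('ascii')
        return 'ascii'
    except UnicodeEncodeError:
        # Only the failure raised by str.encode is handled here;
        # IDNAError is not a UnicodeEncodeError, so it propagates.
        return 'needs punycode'
```

Under this arrangement the exception hierarchy stays as it was for downstream users, and the 'Label too long' error is no longer swallowed.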
Hello,
The change from IDNA 2003 to 2008 (and from Python's built-in codec to the idna module) is a bit complicated for application developers. Some documentation about application-level requirements before calling encode might be handy, i.e. RFC 5895 section 2. I'm just working through it now so I'm not sure if I understand it all correctly, but I've got:
1 Uppercase characters are mapped to their lowercase equivalents by
using the algorithm for mapping case in Unicode characters. This
step was chosen because the output will behave more like ASCII
host names behave.
domain = domain.lower()
2 Fullwidth and halfwidth characters (those defined with
Decomposition Types < wide > and < narrow >) are mapped to their
decomposition mappings as shown in the Unicode character
database. This step was chosen because many input mechanisms,
particularly in Asia, do not allow you to easily enter characters
in the form used by IDNA2008. Even if they do allow the correct
character form, the user might not know which form they are
entering.
def wide_narrow(c):
    decomp = unicodedata.decomposition(c)
    m = re.match('<(?:narrow|wide)> ([A-Z0-9]+)$', decomp)
    if m:
        return chr_func(int(m.group(1), 16)) # chr in Py3, unichr in Py2
    return c
domain = ''.join(wide_narrow(c) for c in domain)
3 All characters are mapped using Unicode Normalization Form C
(NFC). This step was chosen because it maps combinations of
combining characters into canonical composed form. As with the
fullwidth/halfwidth mapping, users are not generally aware of the
particular form of characters that they are entering, and
IDNA2008 requires that only the canonical composed forms from NFC
be used.
domain = unicodedata.normalize("NFC", domain)
4 [IDNA2008protocol] is specified such that the protocol acts on
the individual labels of the domain name. If an implementation
of this mapping is also performing the step of separation of the
parts of a domain name into labels by using the FULL STOP
character (U+002E), the IDEOGRAPHIC FULL STOP character (U+3002)
can be mapped to the FULL STOP before label separation occurs.
There are other characters that are used as "full stops" that one
could consider mapping as label separators, but their use as such
has not been investigated thoroughly. This step was chosen
because some input mechanisms do not allow the user to easily
enter proper label separators. Only the IDEOGRAPHIC FULL STOP
character (U+3002) is added in this mapping because the authors
have not fully investigated the applicability of other characters
and the environments where they should and should not be
considered domain name label separators.
domain = domain.replace('\u3002', '.')
^ Seems to be handled by the idna module.
Does this look right?
Step 2 is a bit awkward so maybe this all can be incorporated into the module?
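Put together, the four steps above might look like the following on Python 3. This is a sketch of the snippets in this issue, not part of the idna module: rfc5895_map is a hypothetical name, and str.lower() is used as a simple approximation of the Unicode case mapping the RFC refers to.

```python
import re
import unicodedata

# Matches decompositions such as '<wide> 0065' or '<narrow> 30A2'.
_WIDE_NARROW = re.compile(r'<(?:narrow|wide)> ([0-9A-F]+)$')


def _map_width(ch):
    """Step 2: map a fullwidth/halfwidth character to its decomposition."""
    m = _WIDE_NARROW.match(unicodedata.decomposition(ch))
    return chr(int(m.group(1), 16)) if m else ch


def rfc5895_map(domain):
    domain = domain.lower()                           # step 1: lowercase
    domain = ''.join(_map_width(c) for c in domain)   # step 2: width mapping
    domain = unicodedata.normalize('NFC', domain)     # step 3: NFC
    return domain.replace('\u3002', '.')              # step 4: ideographic full stop
```

The result would then be passed to the encoder; a fullwidth spelling of EXAMPLE followed by U+3002 and COM maps to plain example.com.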
Is idna supposed to work on Python 2.6? The documentation does not explicitly state that 2.7 is required and 2.6 is not. There has been at least one commit to improve 2.6 compat as well.
I've added two obvious fixes to the tests for 2.6, but I'm not exactly sure what the correct fix for the two remaining tests is.
Patch:
diff --git a/tests/test_idna_codec.py b/tests/test_idna_codec.py
index fe3737a..2d0cd14 100755
--- a/tests/test_idna_codec.py
+++ b/tests/test_idna_codec.py
@@ -24,7 +24,7 @@ class IDNACodecTests(unittest.TestCase):
)
for decoded, encoded in incremental_tests:
- if sys.version_info.major == 2:
+ if sys.version_info[0] == 2:
self.assertEqual("".join(codecs.iterdecode(encoded, "idna")),
decoded)
else:
diff --git a/tests/test_idna_uts46.py b/tests/test_idna_uts46.py
index d8a2836..c93c0df 100755
--- a/tests/test_idna_uts46.py
+++ b/tests/test_idna_uts46.py
@@ -51,6 +51,8 @@ class TestIdnaTest(unittest.TestCase):
return "%s.%d" % (super(TestIdnaTest, self).id(), self.lineno)
def shortDescription(self):
+ if not self.fields:
+ return ""
return "IdnaTest.txt line %d: %r" % (self.lineno,
u"; ".join(self.fields))
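The first hunk relies on the fact that sys.version_info only gained named attributes such as .major in Python 2.7, while indexing works on every version. A minimal illustration (the variable names are made up):

```python
import sys

# Index access works on 2.6 as well as on newer interpreters.
major = sys.version_info[0]
is_py2 = (major == 2)
```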
Failures:
======================================================================
ERROR: test_decode (tests.test_idna.IDNATests)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/rbu/devel/code/idna/tests/test_idna.py", line 258, in test_decode
self.assertEqual(idna.decode('xn---------90gglbagaar.aa'),
File "/home/rbu/devel/code/idna/idna/core.py", line 383, in decode
result.append(ulabel(label))
File "/home/rbu/devel/code/idna/idna/core.py", line 302, in ulabel
check_label(label)
File "/home/rbu/devel/code/idna/idna/core.py", line 254, in check_label
check_bidi(label)
File "/home/rbu/devel/code/idna/idna/core.py", line 70, in check_bidi
raise IDNABidiError('Unknown directionality in label {0} at position {1}'.format(repr(label), idx))
IDNABidiError: Unknown directionality in label u'\u0521\u0525\u0523-\u0523\u0523-----\u0521\u0523\u0523\u0523' at position 2
======================================================================
ERROR: test_encode (tests.test_idna.IDNATests)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/rbu/devel/code/idna/tests/test_idna.py", line 244, in test_encode
self.assertEqual(idna.encode(u'\u0521\u0525\u0523-\u0523\u0523-----\u0521\u0523\u0523\u0523.aa'),
File "/home/rbu/devel/code/idna/idna/core.py", line 354, in encode
result.append(alabel(label))
File "/home/rbu/devel/code/idna/idna/core.py", line 275, in alabel
check_label(label)
File "/home/rbu/devel/code/idna/idna/core.py", line 254, in check_label
check_bidi(label)
File "/home/rbu/devel/code/idna/idna/core.py", line 70, in check_bidi
raise IDNABidiError('Unknown directionality in label {0} at position {1}'.format(repr(label), idx))
IDNABidiError: Unknown directionality in label u'\u0521\u0525\u0523-\u0523\u0523-----\u0521\u0523\u0523\u0523' at position 2
----------------------------------------------------------------------
Ran 17 tests in 0.007s
FAILED (errors=2)
The tests do not pass according to Travis: https://travis-ci.org/kjd/idna/builds/61088615
Enable the ability to create IDNA and UTS46 data for any given version of Unicode, rather than the current version or the version that has been pegged by the IAB (see issue #8). This requires refactoring the build-idnadata.py support tool to accept a version argument, obtaining the relevant underlying data needed to implement RFC 5892, and generating IANA-style tables that are input to the existing methods.
This is useful for debugging, and also provides options for implementers who want to track to newer Unicode versions despite potential risks.
When idna is installed on Alpine Linux from the wheel on PyPI, the source files have 0700 permissions, meaning no one except root can read or execute them.
Consider the following
Python 2.7.12 (default, Jul 1 2016, 15:12:24)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import idna
>>> idna.encode('r6---sn-i5onxoxu-cxgl.c.doc-0-0-sj.sj.googleusercontent.com')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/dist-packages/idna/core.py", line 355, in encode
result.append(alabel(label))
File "/usr/local/lib/python2.7/dist-packages/idna/core.py", line 276, in alabel
check_label(label)
File "/usr/local/lib/python2.7/dist-packages/idna/core.py", line 239, in check_label
check_hyphen_ok(label)
File "/usr/local/lib/python2.7/dist-packages/idna/core.py", line 134, in check_hyphen_ok
raise IDNAError('Label has disallowed hyphens in 3rd and 4th position')
idna.core.IDNAError: Label has disallowed hyphens in 3rd and 4th position
Same issue for decoding. The hostname is valid, so it should just pass without error.
I came here because requests unconditionally does idna encoding of the hostname.
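For reference, the rule behind the error message can be expressed in one line. This is a sketch of the check as the message describes it, not the library's check_hyphen_ok; has_reserved_hyphens is a hypothetical name.

```python
def has_reserved_hyphens(label):
    """True if the 3rd and 4th characters of the label are both hyphens."""
    return label[2:4] == '--'
```

The first label of the hostname above, r6---sn-i5onxoxu-cxgl, trips this check even though it is pure ASCII and resolves in DNS.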
Please see commit ac7ea30
It seems that the U+1F52B PISTOL character added to Unicode in version 6.0 (2010) is not supported by the idna package:
>>> idna.encode(u'\U0001f52b.test')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/chirila/env/local/lib/python2.7/site-packages/idna/core.py", line 354, in encode
result.append(alabel(label))
File "/home/chirila/env/local/lib/python2.7/site-packages/idna/core.py", line 275, in alabel
check_label(label)
File "/home/chirila/env/local/lib/python2.7/site-packages/idna/core.py", line 252, in check_label
raise InvalidCodepoint('Codepoint {0} at position {1} of {2} not allowed'.format(_unot(cp_value), pos+1, repr(label)))
idna.core.InvalidCodepoint: Codepoint U+1F52B at position 1 of u'\U0001f52b' not allowed
>>> idna.decode('xn--bw8h.test')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/chirila/env/local/lib/python2.7/site-packages/idna/core.py", line 383, in decode
result.append(ulabel(label))
File "/home/chirila/env/local/lib/python2.7/site-packages/idna/core.py", line 302, in ulabel
check_label(label)
File "/home/chirila/env/local/lib/python2.7/site-packages/idna/core.py", line 252, in check_label
raise InvalidCodepoint('Codepoint {0} at position {1} of {2} not allowed'.format(_unot(cp_value), pos+1, repr(label)))
idna.core.InvalidCodepoint: Codepoint U+1F52B at position 1 of u'\U0001f52b' not allowed
>>>
While the python standard library can indeed encode/decode the codepoint:
>>> u'\U0001f52b.test'.encode("idna")
'xn--bw8h.test'
>>> 'xn--bw8h.test'.decode("idna")
u'\U0001f52b.test'
import idna
idna.encode('ex_ample.example.org')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/nail/home/benp/pg/utils/bravado-promised/virtualenv_run/local/lib/python2.7/site-packages/idna/core.py", line 354, in encode
result.append(alabel(label))
File "/nail/home/benp/pg/utils/bravado-promised/virtualenv_run/local/lib/python2.7/site-packages/idna/core.py", line 275, in alabel
check_label(label)
File "/nail/home/benp/pg/utils/bravado-promised/virtualenv_run/local/lib/python2.7/site-packages/idna/core.py", line 252, in check_label
raise InvalidCodepoint('Codepoint {0} at position {1} of {2} not allowed'.format(_unot(cp_value), pos+1, repr(label)))
idna.core.InvalidCodepoint: Codepoint U+005F at position 3 of u'ex_ample' not allowed
Seems like underscores in hostnames should be perfectly allowed? I can't find any RFC that suggests otherwise (and browsers seem happy with underscores in hostnames).
Perhaps this is more of a twisted bug where it shouldn't pass this sort of url to idna ever, not sure. This is just an issue I hit as an end user
Please see commit 2afd1a2
I have implemented UTS46, please see https://github.com/jribbens/idna
It passes 100% of the tests in IdnaTest.txt, except 34 which all contain 0x200c or 0x200d. I don't know what the problem is with those tests.
Can you pull this please?
Hello
Given the string u'\u2709' (or the encoded form xn--4bi), trying to use encode / decode on that string will always raise the following error:
Codepoint U+2709 at position 1 of u'\u2709' not allowed
The domain in question exists and is resolving, the standard python idna module correctly encodes/decodes the string.
Is this an issue with idna module or am I missing something?
Thanks for your code. I found a bug while using it.
When I try to encode a domain name containing an underscore (such as test_a.abc.com), I get an error. Although such names are non-standard, some people use them and they work normally in some situations, so I don't think this should be an error. What do you think?
>>> import idna
>>> idna.encode('abc.com')
b'abc.com'
>>> idna.encode('test_1.abc.com')
Traceback (most recent call last):
File "D:\Program Files\Python37\lib\site-packages\idna\core.py", line 263, in alabel
ulabel(label)
File "D:\Program Files\Python37\lib\site-packages\idna\core.py", line 299, in ulabel
check_label(label)
File "D:\Program Files\Python37\lib\site-packages\idna\core.py", line 253, in check_label
raise InvalidCodepoint('Codepoint {0} at position {1} of {2} not allowed'.format(_unot(cp_value), pos+1, repr(label)))
idna.core.InvalidCodepoint: Codepoint U+005F at position 5 of 'test_1' not allowed
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "D:\Program Files\Python37\lib\site-packages\idna\core.py", line 355, in encode
result.append(alabel(label))
File "D:\Program Files\Python37\lib\site-packages\idna\core.py", line 265, in alabel
raise IDNAError('The label {0} is not a valid A-label'.format(label))
idna.core.IDNAError: The label b'test_1' is not a valid A-label
The only licensing information is BSD-like in setup.py.
Hello,
I recently upgraded to requests 2.12 which uses idna and I started to see this import error:
(Pdb) request('get', url, params=params, **kwargs)
*** ImportError: No module named uts46data
Looking at the changes, I saw that idna was recently added.
My project uses import hooks, and the hook itself uses requests, which uses idna.
The idna code does an import inside the uts46_remap function, which is breaking my code.
Moving the import out of uts46_remap fixes this issue.
>>> idna.encode("one‐two.com".decode("utf-8"))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/dist-packages/idna/core.py", line 354, in encode
result.append(alabel(label))
File "/usr/lib/python2.7/dist-packages/idna/core.py", line 275, in alabel
check_label(label)
File "/usr/lib/python2.7/dist-packages/idna/core.py", line 252, in check_label
raise InvalidCodepoint('Codepoint {0} at position {1} of {2} not allowed'.format(_unot(cp_value), pos+1, repr(label)))
idna.core.InvalidCodepoint: Codepoint U+2010 at position 4 of u'one\u2010two' not allowed
I've been looking at IDNA domain registrations and using your library in conjunction with the built in python tools.
>>> domain = "xn--53hy7af013i.ws".encode("utf-8")
>>> domain.decode("idna")
'☕🦊✈.ws'
The IDNA package is a HUGE life saver for me. I monitored over 111,000 IDNA domains being registered, and a small percentage of them failed. I've attached the output, which I thought you might find useful.
As you can see above, there is an uptick of registrations of emoji domains now. Although it is not part of the specification, it would be very helpful if that was incorporated into this package.
I accidentally passed a name containing a space to idna.encode; it seems to encode it to bytes and then complains about the apostrophe of b'foo bar'.
Python2 correctly complains: idna.core.InvalidCodepoint: Codepoint U+0020 at position 4 of u'foo bar' not allowed
Test case:
import idna
idna.encode('foo bar')
Result:
$ python3 foo.py
Traceback (most recent call last):
File "foo.py", line 2, in <module>
idna.encode('foo bar')
File "/home/joern/env/lib/python3.5/site-packages/idna/core.py", line 355, in encode
result.append(alabel(label))
File "/home/joern/env/lib/python3.5/site-packages/idna/core.py", line 276, in alabel
check_label(label)
File "/home/joern/env/lib/python3.5/site-packages/idna/core.py", line 253, in check_label
raise InvalidCodepoint('Codepoint {0} at position {1} of {2} not allowed'.format(_unot(cp_value), pos+1, repr(label)))
idna.core.InvalidCodepoint: Codepoint U+0027 at position 2 of "b'foo bar'" not allowed
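One plausible way to end up with this message on Python 3 is calling str() on a bytes object, which returns its repr rather than decoding it. I have not confirmed that this is what idna does internally. The sketch below uses a hypothetical helper, first_disallowed, as a much simplified stand-in for idna's code point check (ASCII letters, digits and hyphen only), just to show how both reported positions could arise.

```python
label = 'foo bar'
encoded = label.encode('ascii')

# str() on bytes gives the repr, including the b prefix and the quotes.
mangled = str(encoded)
# decode() gives back the original text.
restored = encoded.decode('ascii')


def first_disallowed(text):
    """Return (1-based position, code point) of the first character that is
    not an ASCII letter, digit or hyphen, or None if there is none.
    Simplified stand-in, not idna's real validation."""
    for pos, ch in enumerate(text, start=1):
        if not (ord(ch) < 128 and (ch.isalnum() or ch == '-')):
            return pos, 'U+{0:04X}'.format(ord(ch))
    return None


print(repr(mangled), first_disallowed(mangled))
print(repr(restored), first_disallowed(restored))
```

With the mangled string the first offending character is the apostrophe, matching the Python 3 report; with the properly decoded string it is the space, matching the Python 2 report.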
I believe this to be a bug with idna rather than in requests but I think that @kennethreitz should probably be aware of this too:
Using python 3.4 and requests 2.12.1 I tried to get this URL:
requests.get('http://[::]:5000/files')
This produces the following errors:
python3 fileclient.py
Traceback (most recent call last):
File "/usr/local/lib/python3.4/dist-packages/requests/packages/idna/core.py", line 327, in uts46_remap
raise IndexError()
IndexError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.4/dist-packages/requests/models.py", line 370, in prepare_url
host = idna.encode(host, uts46=True).decode('utf-8')
File "/usr/local/lib/python3.4/dist-packages/requests/packages/idna/core.py", line 340, in encode
s = uts46_remap(s, std3_rules, transitional)
File "/usr/local/lib/python3.4/dist-packages/requests/packages/idna/core.py", line 332, in uts46_remap
_unot(code_point), pos + 1, repr(domain)))
requests.packages.idna.core.InvalidCodepoint: Codepoint U+005B not allowed at position1 in '[::1]'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "fileclient.py", line 55, in <module>
buy_file()
File "fileclient.py", line 17, in buy_file
response = requests.get(url=server_url+'files')
File "/usr/local/lib/python3.4/dist-packages/two1/bitrequests/bitrequests.py", line 169, in get
return self.request('get', url, max_price, **kwargs)
File "/usr/local/lib/python3.4/dist-packages/two1/bitrequests/bitrequests.py", line 134, in request
response = requests.request(method, url, **kwargs)
File "/usr/local/lib/python3.4/dist-packages/requests/api.py", line 56, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/local/lib/python3.4/dist-packages/requests/sessions.py", line 474, in request
prep = self.prepare_request(req)
File "/usr/local/lib/python3.4/dist-packages/requests/sessions.py", line 407, in prepare_request
hooks=merge_hooks(request.hooks, self.hooks),
File "/usr/local/lib/python3.4/dist-packages/requests/models.py", line 302, in prepare
self.prepare_url(url, params)
File "/usr/local/lib/python3.4/dist-packages/requests/models.py", line 372, in prepare_url
raise InvalidURL('URL has an invalid label.')
requests.exceptions.InvalidURL: URL has an invalid label.
I downgraded to requests 2.11.1 (prior to the idna update) and now my code works again.
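As a caller-side illustration (not how requests or idna actually resolved this), an IP literal could be detected with the standard library ipaddress module and passed through without IDNA encoding. The names is_ip_literal and prepare_host are hypothetical, and the encoder is passed in as a parameter so that the sketch does not assume any particular IDNA library.

```python
import ipaddress


def is_ip_literal(host):
    """Return True for IPv4 addresses and for IPv6 addresses, with or
    without the square brackets used in URLs."""
    if host.startswith('[') and host.endswith(']'):
        host = host[1:-1]
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return False
    return True


def prepare_host(host, encode_idna):
    """Skip IDNA encoding for IP literals; encode everything else."""
    if is_ip_literal(host):
        return host
    return encode_idna(host)


# Stand-in encoder using the built-in IDNA 2003 codec, for illustration only.
def stdlib_encode(name):
    return name.encode('idna').decode('ascii')


print(prepare_host('[::1]', stdlib_encode))
print(prepare_host('example.com', stdlib_encode))
```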
Hi 👊
This is my first visit to this fine repo, but it seems you have been working hard to keep all dependencies updated so far.
Once you have closed this issue, I'll create separate pull requests for every update as soon as I find one.
That's it for now!
Happy merging! 🤖
AIUI, when std3_rules is True, invalid characters for a domain should be rejected. Instead, uts46_remap allows them unless std3_rules is False.
The tests do not pass according to Travis: https://travis-ci.org/kjd/idna/builds/61088615
The Python 2 exception seems to come from assuming that chr() behaves as it does in Python 3. Mapping it to unichr() on Python 2 partially fixes the problem, but it still fails on a narrow Python build.
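A rough sketch of one way to cover the narrow-build case: on builds where sys.maxunicode is 0xFFFF, a code point above the Basic Multilingual Plane has to be built from a UTF-16 surrogate pair. The helper names surrogate_pair and safe_unichr are hypothetical and this is not necessarily the fix the library adopted.

```python
import sys

try:
    _unichr = unichr  # Python 2
except NameError:
    _unichr = chr  # Python 3


def surrogate_pair(cp):
    """Return the (high, low) UTF-16 surrogates for a code point above U+FFFF."""
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def safe_unichr(cp):
    """Like chr()/unichr(), but also works on narrow Python 2 builds."""
    if cp > 0xFFFF and sys.maxunicode == 0xFFFF:
        high, low = surrogate_pair(cp)
        return _unichr(high) + _unichr(low)
    return _unichr(cp)


print(repr(safe_unichr(0x1F414)))
```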
Hi!
I just happily discovered that idna has already been ported to Python 3! Would you mind releasing a 0.3.1 so people can see it on PyPI too?
Thanks!
Hi,
it would be great if you could upload a wheel of idna to PyPI, which installs much faster.
The current version of pip does it automatically but it would be nice if I could skip this step. :)
If you need any help, https://hynek.me/articles/sharing-your-labor-of-love-pypi-quick-and-dirty/ should get you started.
Thanks!
I am looking at bringing native IDNA2008 support into the Python core library, and had a long conversation with Nathaniel J. Smith (@njsmith) and Christian Heimes (@tiran) about the work required. The end goal of this work is to have Python natively handle IDNs as first-class citizens, and to reuse as much code as possible.
To summarize the current conversation, the first step would be to implement a new codec in Python's core library, and then to extend the standard library to handle IDNs natively, so that the following code snippet could work:
#!/usr/bin/env python3
import urllib
import urllib.request
req = urllib.request.Request('http://fuß.standcore.com')
response = urllib.request.urlopen(req)
the_page = response.read()
print(the_page.decode(encoding='utf-8'))
From the conversation on Zulip, the first step would be implementing idna2008 as a new encoding codec, and then work on modifying the core library to be able to accept and interoperate with IDNs seamlessly.
I'm willing to do much of the legwork required to get code integrated into CPython. My first question is what blockers (if any) exist in the implementation that would make it difficult to bring into CPython; any tips or suggestions to help move things forward are welcome. Right now, I'm just trying to get the ball rolling on a solid plan for having Python 3.8 treat IDNs as first-class citizens.
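For readers unfamiliar with how a codec plugs into Python, the registration mechanics can be sketched with codecs.register. The codec name idna2008_sketch is hypothetical, and the encode and decode functions below simply delegate to the built-in IDNA 2003 codec as a placeholder; a real implementation would apply IDNA 2008 rules instead.

```python
import codecs

CODEC_NAME = "idna2008_sketch"  # hypothetical name, not a real codec


def _encode(text, errors="strict"):
    # Placeholder: delegates to the built-in IDNA 2003 codec.
    return text.encode("idna"), len(text)


def _decode(data, errors="strict"):
    # Placeholder: delegates to the built-in IDNA 2003 codec.
    return bytes(data).decode("idna"), len(data)


def _search(name):
    """Codec search function: return CodecInfo for our name, else None."""
    if name == CODEC_NAME:
        return codecs.CodecInfo(name=CODEC_NAME, encode=_encode, decode=_decode)
    return None


codecs.register(_search)

print("example.com".encode(CODEC_NAME))
print(b"example.com".decode(CODEC_NAME))
```

Once a codec is registered this way, str.encode and bytes.decode can use it by name, which is the hook the standard library modules would rely on.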
I'm not really sure how to describe this in idna terms. I never used idna directly.
Here's the issue I opened for requests: psf/requests#4569
They told me to hand it over to you (they're too lazy to do it themselves).
Anyway, hope this is helpful. Sorry if not.
P.S. here's the exception, just for convenience:
idna.core.IDNAError: The label b'xn--mn8ha4uc' is not a valid A-label
Hello
We've encountered a regression with the 2.7 release and the additional validation of A-labels.
I'm not certain whether it should be fixed, but I just wanted to let you know.
# Context, handling of DNS records
host = u'*.foö.fi'
return {'host_ulabel' : host,
'host_alabel' : idna.encode(host, uts46=True)}
Which now raises idna.core.IDNAError: The label * is not a valid A-label
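Until the question of whether this should be fixed is settled, a caller could handle the wildcard label separately. This is only a sketch of a workaround: encode_with_wildcard is a hypothetical helper, and the encoder is passed in as a parameter. The stand-in used here is the built-in IDNA 2003 codec; in the reporter's setup it would be something like a function wrapping idna.encode with uts46=True.

```python
def encode_with_wildcard(host, encode):
    """Encode a DNS name, passing a leading '*.' wildcard label through
    unchanged and encoding only the remainder."""
    prefix = u'*.'
    if host.startswith(prefix):
        return b'*.' + encode(host[len(prefix):])
    return encode(host)


def stdlib_encode(name):
    # Stand-in encoder for illustration only (IDNA 2003, not IDNA 2008).
    return name.encode('idna')


host = u'*.fo\xf6.fi'
print({'host_ulabel': host,
       'host_alabel': encode_with_wildcard(host, stdlib_encode)})
```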
This URL works in Firefox but not in my App!
>>> idna.encode(u'sande.møre-og-romsdal.no', uts46=True)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/dist-packages/idna/core.py", line 339, in encode
s = uts46_remap(s, std3_rules, transitional)
File "/usr/local/lib/python2.7/dist-packages/idna/core.py", line 331, in uts46_remap
_unot(code_point), pos + 1, repr(domain)))
idna.core.InvalidCodepoint: Codepoint U+00B8 not allowed at position 9 in u'sande.m\xc3\xb8re-og-romsdal.no'
Unicode 7.0.0 introduced the ARABIC LETTER BEH WITH HAMZA ABOVE (U+08A1) character, which has been seen as controversial in some circles. While discussion continues on whether to amend the IDNA protocol or apply some other remediation for this character, the IAB has recommended that IANA essentially reverse publication of the 7.0.0 dataset on which the IDNA library relies. The library should therefore revert to the previous data set, which is from Unicode 6.3.0.
See https://www.iab.org/documents/correspondence-reports-documents/2015-2/iab-statement-on-identifiers-and-unicode-7-0-0/ for details.
I get IndexError instead of idna.core.IDNAError. If I understood correctly, this is not what should happen:
>>> import idna
>>> idna.decode('xn--eckwd4c7c.xn--zckzah')
'ドメイン.テスト'
>>> idna.decode('xn--eckwd4c7c.xn--zckzah ')
Traceback (most recent call last):
File "/usr/local/lib/python3.6/encodings/punycode.py", line 207, in decode
res = punycode_decode(input, errors)
File "/usr/local/lib/python3.6/encodings/punycode.py", line 194, in punycode_decode
return insertion_sort(base, extended, errors)
File "/usr/local/lib/python3.6/encodings/punycode.py", line 165, in insertion_sort
bias, errors)
File "/usr/local/lib/python3.6/encodings/punycode.py", line 146, in decode_generalized_number
% extended[extpos])
IndexError: string index out of range
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.6/site-packages/idna/core.py", line 389, in decode
s = ulabel(label)
File "/usr/local/lib/python3.6/site-packages/idna/core.py", line 307, in ulabel
label = label.decode('punycode')
IndexError: decoding with 'punycode' codec failed (IndexError: string index out of range)
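The traceback shows the error escaping from the standard library punycode codec. A minimal sketch of normalizing such failures into a single exception type, using only the standard library, might look like the following. LabelDecodeError and decode_punycode_label are hypothetical names, and this is not the library's own fix.

```python
class LabelDecodeError(ValueError):
    """Hypothetical single error type for any Punycode decoding failure."""


def decode_punycode_label(label):
    """Decode one 'xn--' label with the stdlib punycode codec, converting
    every codec failure into LabelDecodeError."""
    if not label.startswith('xn--'):
        raise LabelDecodeError('{0!r} has no xn-- prefix'.format(label))
    try:
        return label[4:].encode('ascii').decode('punycode')
    except (UnicodeError, IndexError) as exc:
        # Depending on the Python version, malformed input may surface as
        # IndexError or as a UnicodeError subclass; treat both the same way.
        raise LabelDecodeError('{0!r} is not valid Punycode'.format(label)) from exc


print(decode_punycode_label('xn--eckwd4c7c'))
try:
    decode_punycode_label('xn--zckzah ')
except LabelDecodeError as err:
    print('rejected:', err)
```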
д_ругойтест.рф
Codepoint U+005F at position 2 of 'д_ругойтест' not allowed
Full traceback:
Traceback (most recent call last):
File "/home/user/somedirectory/code.py", line 34, in <module>
punnycode_to_replace = idna.encode("д_ругойтест.рф")
File "/usr/local/lib/python3.5/dist-packages/idna/core.py", line 355, in encode
result.append(alabel(label))
File "/usr/local/lib/python3.5/dist-packages/idna/core.py", line 276, in alabel
check_label(label)
File "/usr/local/lib/python3.5/dist-packages/idna/core.py", line 253, in check_label
raise InvalidCodepoint('Codepoint {0} at position {1} of {2} not allowed'.format(_unot(cp_value), pos+1, repr(label)))
idna.core.InvalidCodepoint: Codepoint U+005F at position 2 of 'д_ругойтест' not allowed
[Finished in 0.4s with exit code 1]
`2017-07-04 16:19:44 [scrapy.core.scraper] ERROR: Error downloading <GET https://sloneczne_stablowice.forumoteka.pl/kategoria,4,mieszkancy-luzne-rozmowy.html>
Traceback (most recent call last):
File "c:\users\bukowa\vritualenv2\lib\site-packages\idna\core.py", line 263, in alabel
ulabel(label)
File "c:\users\bukowa\vritualenv2\lib\site-packages\idna\core.py", line 299, in ulabel
check_label(label)
File "c:\users\bukowa\vritualenv2\lib\site-packages\idna\core.py", line 253, in check_label
raise InvalidCodepoint('Codepoint {0} at position {1} of {2} not allowed'.format(_unot(cp_value), pos+1, repr(label)))
idna.core.InvalidCodepoint: Codepoint U+005F at position 10 of 'sloneczne_stablowice' not allowed
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "c:\users\bukowa\vritualenv2\lib\site-packages\twisted\internet\defer.py", line 1384, in inlineCallbacks
result = result.throwExceptionIntoGenerator(g)
File "c:\users\bukowa\vritualenv2\lib\site-packages\twisted\python\failure.py", line 393, in throwExceptionIntoGenerator
return g.throw(self.type, self.value, self.tb)
File "c:\users\bukowa\vritualenv2\lib\site-packages\scrapy\core\downloader\middleware.py", line 43, in process_request
defer.returnValue((yield download_func(request=request,spider=spider)))
File "c:\users\bukowa\vritualenv2\lib\site-packages\scrapy\utils\defer.py", line 45, in mustbe_deferred
result = f(*args, **kw)
File "c:\users\bukowa\vritualenv2\lib\site-packages\scrapy\core\downloader\handlers_init.py", line 65, in download_request
return handler.download_request(request, spider)
File "c:\users\bukowa\vritualenv2\lib\site-packages\scrapy\core\downloader\handlers\http11.py", line 63, in download_request
return agent.download_request(request)
File "c:\users\bukowa\vritualenv2\lib\site-packages\scrapy\core\downloader\handlers\http11.py", line 300, in download_request
method, to_bytes(url, encoding='ascii'), headers, bodyproducer)
File "c:\users\bukowa\vritualenv2\lib\site-packages\twisted\web\client.py", line 1633, in request
endpoint = self._getEndpoint(parsedURI)
File "c:\users\bukowa\vritualenv2\lib\site-packages\twisted\web\client.py", line 1617, in _getEndpoint
return self._endpointFactory.endpointForURI(uri)
File "c:\users\bukowa\vritualenv2\lib\site-packages\twisted\web\client.py", line 1494, in endpointForURI
uri.port)
File "c:\users\bukowa\vritualenv2\lib\site-packages\scrapy\core\downloader\contextfactory.py", line 59, in creatorForNetloc
return ScrapyClientTLSOptions(hostname.decode("ascii"), self.getContext())
File "c:\users\bukowa\vritualenv2\lib\site-packages\twisted\internet_sslverify.py", line 1152, in init
self._hostnameBytes = _idnaBytes(hostname)
File "c:\users\bukowa\vritualenv2\lib\site-packages\twisted\internet_idna.py", line 30, in _idnaBytes
return idna.encode(text)
File "c:\users\bukowa\vritualenv2\lib\site-packages\idna\core.py", line 355, in encode
result.append(alabel(label))
File "c:\users\bukowa\vritualenv2\lib\site-packages\idna\core.py", line 265, in alabel
raise IDNAError('The label {0} is not a valid A-label'.format(label))
idna.core.IDNAError: The label b'sloneczne_stablowice' is not a valid A-label`
idna == 2.5