
idna's People

Contributors

alex, cclauss, cornmander, diogoteles08, elliotwutingfeng, hugovk, j-bernard, jayaddison, jdufresne, jribbens, jschueller, kjd, mayeut, mgorny, mplonka, orianek, sethmlarson, slingamn, sreekanth370, tomprince, underrun, verhovsky

idna's Issues

Alternative handling of illegal IDNs (such as domains with emojis)

The decode method can throw an exception when it finds characters not acceptable in IDNA2008. I think that the characters are acceptable in UTS46.

idna.decode("xn--co8ha.tk")

There isn't a way of signalling to decode that it should apply uts46 rules. UTS46 (in section 4.3) says:

Like [RFC3490], this will always produce a converted Unicode string. Unlike ToASCII of [RFC3490], this always signals whether or not there was an error.

The decode method currently indicates whether there was an error, but it does not always produce a converted unicode string.

The domain name above is a valid domain name and can be accessed: http://🐔🐔.tk/

Trying to encode this domain name also fails, even with uts46=True and transitional=True.

The python call

"xn--co8ha.tk".decode("idna")

does produce the right answer.

I would stick with the Python IDNA 2003 implementation, except that I need improved handling of the German ß character.
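For callers who only need a displayable string, a Punycode-level decode can be sketched with the standard library alone. The helper name `lenient_decode` is hypothetical and is not part of the idna package. It skips all IDNA2008 and UTS46 validation and mapping, so its output should not be treated as a validated name.

```python
def lenient_decode(domain):
    """Decode every xn-- label with the stdlib punycode codec, without validation."""
    labels = []
    for label in domain.split("."):
        if label.lower().startswith("xn--"):
            # Strip the ACE prefix and decode the remainder as raw Punycode.
            label = label[4:].encode("ascii").decode("punycode")
        labels.append(label)
    return ".".join(labels)


print(lenient_decode("xn--co8ha.tk"))
```

Because nothing is checked, this is closer to "always produce a converted string" than to a conforming UTS46 implementation.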

Length checks don't work and lead to funky errors

Hi,

observe:

>>> idna.encode("x" * 65)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/hynek/.virtualenvs/cust_ws/lib/python3.5/site-packages/idna/core.py", line 355, in encode
    result.append(alabel(label))
  File "/Users/hynek/.virtualenvs/cust_ws/lib/python3.5/site-packages/idna/core.py", line 276, in alabel
    check_label(label)
  File "/Users/hynek/.virtualenvs/cust_ws/lib/python3.5/site-packages/idna/core.py", line 253, in check_label
    raise InvalidCodepoint('Codepoint {0} at position {1} of {2} not allowed'.format(_unot(cp_value), pos+1, repr(label)))
idna.core.InvalidCodepoint: Codepoint U+0027 at position 2 of "b'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'" not allowed
Codepoint U+0027 at position 2 of "b'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'" not allowed

The problem is in the following lines.

You check the length of the label and you do raise an IDNAError('Label too long'); however, there’s an except UnicodeError: and IDNAError happens to be a subclass of UnicodeError. That’s where I closed my pdb session. :)
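The masking effect described here can be reproduced in isolation. The sketch below is not the library's code: `IDNAError` is a local stand-in that mirrors the subclass relationship mentioned above, and both function names are hypothetical.

```python
class IDNAError(UnicodeError):
    """Local stand-in: like idna's IDNAError, it subclasses UnicodeError."""


def classify_buggy(label):
    try:
        raw = label.encode("ascii")
        if len(raw) > 63:
            raise IDNAError("Label too long")
        return "ascii"
    except UnicodeError:
        # Intended for encode() failures only, but IDNAError lands here too,
        # so the length error is silently discarded.
        return "non-ascii"


def classify_fixed(label):
    try:
        raw = label.encode("ascii")
    except UnicodeEncodeError:
        # Only the failure that encode() itself can raise is caught.
        return "non-ascii"
    if len(raw) > 63:
        raise IDNAError("Label too long")
    return "ascii"


print(classify_buggy("x" * 65))  # the over-long ASCII label is misclassified
```

Narrowing the handler to `UnicodeEncodeError` and moving the length check out of the `try` body are two independent ways to let the length error propagate.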

handling invalid domain name

Hello @kjd, I have a URL that looks like this: https://tjhughes_co_uk.secure-cdn.visualsoft.co.uk/images/4-pack-solar-path-finder-p6049-14413_image.jpg I use a library that uses idna, and I encountered the following failure:

In [1]: url = 'https://tjhughes_co_uk.secure-cdn.visualsoft.co.uk/images/4-pack-solar-path-finder-p6049-14413_image.jpg'
   ...: 

In [2]: import idna

In [3]: idna.encode(urlparse(url).netloc)
---------------------------------------------------------------------------
InvalidCodepoint                          Traceback (most recent call last)
<ipython-input-3-f510d576e964> in <module>()
----> 1 idna.encode(urlparse(url).netloc)

/home/pawel/.virtualenvs/scrapy/local/lib/python2.7/site-packages/idna/core.pyc in encode(s, strict, uts46, std3_rules, transitional)
    353         trailing_dot = True
    354     for label in labels:
--> 355         result.append(alabel(label))
    356     if trailing_dot:
    357         result.append(b'')

/home/pawel/.virtualenvs/scrapy/local/lib/python2.7/site-packages/idna/core.pyc in alabel(label)
    274 
    275     label = unicode(label)
--> 276     check_label(label)
    277     label = _punycode(label)
    278     label = _alabel_prefix + label

/home/pawel/.virtualenvs/scrapy/local/lib/python2.7/site-packages/idna/core.pyc in check_label(label)
    251                 raise InvalidCodepointContext('Codepoint {0} not allowed at position {1} in {2}'.format(_unot(cp_value), pos+1, repr(label)))
    252         else:
--> 253             raise InvalidCodepoint('Codepoint {0} at position {1} of {2} not allowed'.format(_unot(cp_value), pos+1, repr(label)))
    254 
    255     check_bidi(label)

InvalidCodepoint: Codepoint U+005F at position 9 of u'tjhughes_co_uk' not allowed

In version 2.2 it complains about character c after underscore so I'm guessing underscores in domain name are invalid.

In version 2.5 of idna I get

In [7]: idna.encode(urlparse(a).netloc)
---------------------------------------------------------------------------
IDNAError                                 Traceback (most recent call last)
<ipython-input-7-6f26ef5f0d17> in <module>()
----> 1 idna.encode(urlparse(a).netloc)

/home/pawel/.virtualenvs/scrapy/local/lib/python2.7/site-packages/idna/core.pyc in encode(s, strict, uts46, std3_rules, transitional)
    353         trailing_dot = True
    354     for label in labels:
--> 355         result.append(alabel(label))
    356     if trailing_dot:
    357         result.append(b'')

/home/pawel/.virtualenvs/scrapy/local/lib/python2.7/site-packages/idna/core.pyc in alabel(label)
    263             ulabel(label)
    264         except IDNAError:
--> 265             raise IDNAError('The label {0} is not a valid A-label'.format(label))
    266         if not valid_label_length(label):
    267             raise IDNAError('Label too long')

IDNAError: The label tjhughes_co_uk is not a valid A-label

I tried

idna.encode(url, uts46=True, transitional=True)

but it still breaks. I'm using Python 2. I stumbled on this via Scrapy, which uses Twisted. Twisted does the following with idna: https://github.com/twisted/twisted/blob/trunk/src/twisted/internet/_idna.py#L28 Is that OK?

If this url is invalid according to idna specs (I'm not sure it is, is it invalid?) is there some way for me to encode it to idna? Or do we need to skip idna encoding on failures like this?
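One way to skip IDNA encoding on failures like this is a wrapper that falls back to the plain ASCII bytes when the hostname is already ASCII. This is only a sketch: `encode_with_ascii_fallback` is a hypothetical helper, the encoder is passed in as a parameter (for example `idna.encode`), and it relies on the encoder's errors being `UnicodeError` subclasses, as `IDNAError` is. The fallback does no validation of the hostname.

```python
def encode_with_ascii_fallback(host, encoder):
    """Try the given IDNA encoder; pass pure-ASCII hostnames through on failure."""
    try:
        return encoder(host)
    except UnicodeError:
        if all(ord(ch) < 128 for ch in host):
            # Already ASCII (e.g. contains an underscore): send it as-is.
            return host.encode("ascii")
        # Non-ASCII input that the encoder rejected: nothing safe to return.
        raise


def strict_encoder(host):
    """Toy encoder for demonstration only: rejects underscores."""
    if "_" in host:
        raise UnicodeError("underscore not allowed")
    return host.encode("idna")


print(encode_with_ascii_fallback("tjhughes_co_uk.example", strict_encoder))
```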

Passthrough for things that appear as hostnames (like IP addresses)

Thanks for this code, it's nice to see support for modern urls in Python! The built-in codec raises for any hostname containing ß, complaining

UnicodeError: ('IDNA does not round-trip', b'xn--einla-pqa', b'einlass')

idna 2008 fixes this stupidity.

However, when you're dealing with general-purpose urls you end up with hostnames like '::1', which is localhost for ipv6. This means that your module isn't so easy to drop in as a replacement. I either need to wrap it or stick try / except idna.core.InvalidCodepoint in my code, which looks a bit odd. I think it might be prettier if idna encode/decode grew options to pass-thru in this case?

>>> u='::1'
>>> u.encode('idna')
b'::1'
>>> import idna
>>> idna.encode(u)
Traceback (most recent call last):
  File "/home/lindahl/.pyenv/versions/3.6.4/lib/python3.6/site-packages/idna/core.py", line 263, in alabel
    ulabel(label)
  File "/home/lindahl/.pyenv/versions/3.6.4/lib/python3.6/site-packages/idna/core.py", line 299, in ulabel
    check_label(label)
  File "/home/lindahl/.pyenv/versions/3.6.4/lib/python3.6/site-packages/idna/core.py", line 253, in check_label
    raise InvalidCodepoint('Codepoint {0} at position {1} of {2} not allowed'.format(_unot(cp_value), pos+1, repr(label)))
idna.core.InvalidCodepoint: Codepoint U+003A at position 1 of '::1' not allowed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/lindahl/.pyenv/versions/3.6.4/lib/python3.6/site-packages/idna/core.py", line 355, in encode
    result.append(alabel(label))
  File "/home/lindahl/.pyenv/versions/3.6.4/lib/python3.6/site-packages/idna/core.py", line 265, in alabel
    raise IDNAError('The label {0} is not a valid A-label'.format(label))
idna.core.IDNAError: The label b'::1' is not a valid A-label
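Until such an option exists, the pass-through can be done in a small wrapper. The sketch below uses the standard `ipaddress` module to detect IP literals. `encode_host` is a hypothetical name, and the default encoder is the built-in IDNA 2003 codec purely so the example runs without third-party packages; in practice one would pass `idna.encode` as `encoder`.

```python
import ipaddress


def encode_host(host, encoder=None):
    """Return IP literals unchanged; IDNA-encode everything else."""
    if encoder is None:
        # Assumption for the demo: fall back to the stdlib IDNA 2003 codec.
        encoder = lambda h: h.encode("idna")
    try:
        # Accept bracketed IPv6 literals such as "[::1]" as well.
        ipaddress.ip_address(host.strip("[]"))
    except ValueError:
        return encoder(host)
    return host.encode("ascii")


print(encode_host("::1"))
print(encode_host("example.com"))
```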

New package py_0 breaks conda update

noarch/idna-2.6-py_0.tar.bz2

From https://anaconda.org/conda-forge/idna/files

Breaks conda update conda

Package plan for installation in environment /opt/conda:

The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    conda-env-2.6.0            |                0         1017 B  conda-forge
    libffi-3.2.1               |                3          47 KB  conda-forge
    asn1crypto-0.22.0          |           py35_0         149 KB  conda-forge
    certifi-2017.7.27.1        |           py35_0         204 KB  conda-forge
    chardet-3.0.4              |           py35_0         189 KB  conda-forge
    idna-2.6                   |             py_0          47 KB  conda-forge
    pycparser-2.18             |           py35_0         169 KB  conda-forge
    pysocks-1.6.7              |           py35_0          21 KB  conda-forge
    cffi-1.10.0                |           py35_0         388 KB  conda-forge
    cryptography-2.0.3         |           py35_0         853 KB  conda-forge
    pyopenssl-17.2.0           |           py35_0          77 KB  conda-forge
    urllib3-1.22               |           py35_0         155 KB  conda-forge
    requests-2.18.4            |           py35_1          91 KB  conda-forge
    conda-4.3.24               |           py35_0         514 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         2.8 MB

The following NEW packages will be INSTALLED:

    asn1crypto:   0.22.0-py35_0      conda-forge
    certifi:      2017.7.27.1-py35_0 conda-forge
    cffi:         1.10.0-py35_0      conda-forge
    chardet:      3.0.4-py35_0       conda-forge
    cryptography: 2.0.3-py35_0       conda-forge
    idna:         2.6-py_0           conda-forge
    libffi:       3.2.1-3            conda-forge
    pycparser:    2.18-py35_0        conda-forge
    pyopenssl:    17.2.0-py35_0      conda-forge
    pysocks:      1.6.7-py35_0       conda-forge
    urllib3:      1.22-py35_0        conda-forge

The following packages will be UPDATED:

    conda:        4.1.11-py35_0                  --> 4.3.24-py35_0 conda-forge
    conda-env:    2.5.2-py35_0                   --> 2.6.0-0       conda-forge
    requests:     2.10.0-py35_0                  --> 2.18.4-py35_1 conda-forge

Traceback (most recent call last):
  File "/opt/conda/lib/python3.5/site-packages/conda/exceptions.py", line 634, in conda_exception_handler
    return_value = func(*args, **kwargs)
  File "/opt/conda/lib/python3.5/site-packages/conda/cli/main.py", line 114, in _main
    imported = importlib.import_module(module)
  File "/opt/conda/lib/python3.5/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 986, in _gcd_import
  File "<frozen importlib._bootstrap>", line 969, in _find_and_load
  File "<frozen importlib._bootstrap>", line 958, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 673, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 665, in exec_module
  File "<frozen importlib._bootstrap>", line 222, in _call_with_frames_removed
  File "/opt/conda/lib/python3.5/site-packages/conda/cli/main_update.py", line 10, in <module>
    from .install import install
  File "/opt/conda/lib/python3.5/site-packages/conda/cli/install.py", line 20, in <module>
    from ..core.index import get_index, get_channel_priority_map
  File "/opt/conda/lib/python3.5/site-packages/conda/core/index.py", line 8, in <module>
    from .package_cache import PackageCache
  File "/opt/conda/lib/python3.5/site-packages/conda/core/package_cache.py", line 9, in <module>
    from .path_actions import CacheUrlAction, ExtractPackageAction
  File "/opt/conda/lib/python3.5/site-packages/conda/core/path_actions.py", line 33, in <module>
    from ..gateways.download import download
  File "/opt/conda/lib/python3.5/site-packages/conda/gateways/download.py", line 10, in <module>
    from requests.exceptions import ConnectionError, HTTPError, InvalidSchema, SSLError
  File "/opt/conda/lib/python3.5/site-packages/requests/__init__.py", line 98, in <module>
    from . import packages
  File "/opt/conda/lib/python3.5/site-packages/requests/packages.py", line 7, in <module>
    locals()[package] = __import__(package)
ImportError: No module named 'idna'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/bin/conda", line 6, in <module>
    sys.exit(conda.cli.main())
  File "/opt/conda/lib/python3.5/site-packages/conda/cli/main.py", line 182, in main
    return conda_exception_handler(_main, *args)
  File "/opt/conda/lib/python3.5/site-packages/conda/exceptions.py", line 638, in conda_exception_handler
    return handle_exception(e)
  File "/opt/conda/lib/python3.5/site-packages/conda/exceptions.py", line 628, in handle_exception
    print_unexpected_error_message(e)
  File "/opt/conda/lib/python3.5/site-packages/conda/exceptions.py", line 590, in print_unexpected_error_message
    stderrlogger.info(get_main_info_str(get_info_dict()))
  File "/opt/conda/lib/python3.5/site-packages/conda/cli/main_info.py", line 162, in get_info_dict
    from ..connection import user_agent
  File "/opt/conda/lib/python3.5/site-packages/conda/connection.py", line 12, in <module>
    from requests import Session, __version__ as REQUESTS_VERSION
  File "/opt/conda/lib/python3.5/site-packages/requests/__init__.py", line 98, in <module>
    from . import packages
  File "/opt/conda/lib/python3.5/site-packages/requests/packages.py", line 7, in <module>
    locals()[package] = __import__(package)
ImportError: No module named 'idna'

idna-2.2: idna.encode('☃') does not return 'xn--n3h'

>>> import idna
>>> idna.encode('ドメイン.テスト')
b'xn--eckwd4c7c.xn--zckzah'
>>> idna.encode('☃')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.4/site-packages/idna/core.py", line 355, in encode
    result.append(alabel(label))
  File "/usr/lib64/python3.4/site-packages/idna/core.py", line 276, in alabel
    check_label(label)
  File "/usr/lib64/python3.4/site-packages/idna/core.py", line 253, in check_label
    raise InvalidCodepoint('Codepoint {0} at position {1} of {2} not allowed'.format(_unot(cp_value), pos+1, repr(label)))
idna.core.InvalidCodepoint: Codepoint U+2603 at position 1 of '☃' not allowed

`except` without exception class specified makes IDNA-encoding non-threadsafe.

Hello,

We have a Python program with a resident process running in a separate thread. We make requests in that thread, and when the master thread spots a timeout, a custom exception is dispatched to the child thread via an OS signal. We use Requests 2.12.4.

However, that exception is caught not by our handler (in the child thread's main loop) but by the except: directive in the alabel function deep in idna/core.py: https://github.com/kennethreitz/requests/blob/master/requests/packages/idna/core.py#L264 because that handler is an unconditional "catch-all" except: directive. This leads to very cryptic behaviour, such as the child thread not stopping and trying to access URLs with unnecessary IDNA encoding (all our URLs are plain ASCII).

Expected behaviour for that handler is not to catch everything, but to expect a specific exception class(es) and work just with them.

The Requests maintainer asked me to open this issue here, since the requests package apparently vendors the idna package from this repo without edits. Original issue: kennethreitz/requests#3843
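The difference between a bare `except:` and a handler for a specific class can be shown in a few lines. Everything here is illustrative: `WorkerTimeout` stands in for the custom exception described above, and the two wrapper functions are hypothetical, not code from idna.

```python
class WorkerTimeout(Exception):
    """Hypothetical stand-in for the custom timeout exception."""


def interrupted_call():
    # Simulates the timeout arriving while the guarded code is running.
    raise WorkerTimeout("master thread signalled a timeout")


def with_bare_except(func):
    try:
        return func()
    except:  # noqa: E722 - deliberately broad, to show the problem
        return "fell through to the fallback path"


def with_specific_except(func):
    try:
        return func()
    except UnicodeError:
        # Only the error this block is meant to handle is caught here.
        return "fell through to the fallback path"


print(with_bare_except(interrupted_call))  # the timeout is swallowed
```

With the specific handler, `WorkerTimeout` propagates to the caller's own handler instead of being absorbed.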

Failing domain 'xn--hjdarna07-b07a.cf'

Hi,

I found an instance of domain which cannot be decoded by IDNA library.

In [3]:  idna.decode('xn--hjdarna07-b07a.cf')

InvalidCodepoint                          Traceback (most recent call last)
 in ()
----> 1 idna.decode('xn--hjdarna07-b07a.cf')

/usr/local/lib/python2.7/dist-packages/idna/core.pyc in decode(s, strict, uts46, std3_rules)
    382         trailing_dot = True
    383     for label in labels:
--> 384         result.append(ulabel(label))
    385     if trailing_dot:
    386         result.append(u'')

/usr/local/lib/python2.7/dist-packages/idna/core.pyc in ulabel(label)
    301 
    302     label = label.decode('punycode')
--> 303     check_label(label)
    304     return label
    305 

/usr/local/lib/python2.7/dist-packages/idna/core.pyc in check_label(label)
    251                 raise InvalidCodepointContext('Codepoint {0} not allowed at position {1} in {2}'.format(_unot(cp_value), pos+1, repr(label)))
    252         else:
--> 253             raise InvalidCodepoint('Codepoint {0} at position {1} of {2} not allowed'.format(_unot(cp_value), pos+1, repr(label)))
    254 
    255     check_bidi(label)

InvalidCodepoint: Codepoint U+10A5 at position 7 of u'hjdarn\u10a5a07' not allowed

The expected output is "hjdarnႥa07.cf". The domain exists (http://xn--hjdarna07-b07a.cf/) and is valid IDNA (http://unicode.org/cldr/utility/idna.jsp?a=hjdarn%E1%82%A5a07.cf).

Change in exception hierarchy

The fix for #36 (ebefacd) changes the exception hierarchy and breaks several downstream dependencies (twisted, pyca/cryptography, possibly more). Looking at the diff, would it be possible to revert and simply catch UnicodeEncodeError rather than UnicodeError? UnicodeEncodeError will be raised by encode, which is what you're trying to catch, while you don't want to catch any of the IDNAErrors. Alternately, this ugly code (which could be improved) also passes the tests:

def alabel(label):

    if not label:
        raise IDNAError('No Input')

    if not valid_label_length(label):
        raise IDNAError('Label too long')

    try:
        label = label.encode('ascii')
        encoded = True
    except UnicodeError:
        encoded = False

    if encoded is True:
        try:
            ulabel(label)
            return label
        except IDNAError:
            raise IDNAError('The label {0} is not a valid A-label'.format(label))

    label = unicode(label)
    check_label(label)
    label = _punycode(label)
    label = _alabel_prefix + label

    if not valid_label_length(label):
        raise IDNAError('Label too long')

    return label

Document lowercase before encode?

Hello,

The change from IDNA 2003 to 2008 (and from Python's built-in codec to the idna module) is a bit complicated for application developers. Some documentation about application-level requirements before calling encode might be handy, i.e. RFC 5895 section 2. I'm just working through it now so I'm not sure if I understand it all correctly, but I've got:

1 Uppercase characters are mapped to their lowercase equivalents by
using the algorithm for mapping case in Unicode characters. This
step was chosen because the output will behave more like ASCII
host names behave.

domain = domain.lower()

2 Fullwidth and halfwidth characters (those defined with
Decomposition Types < wide > and < narrow >) are mapped to their
decomposition mappings as shown in the Unicode character
database. This step was chosen because many input mechanisms,
particularly in Asia, do not allow you to easily enter characters
in the form used by IDNA2008. Even if they do allow the correct
character form, the user might not know which form they are
entering.

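import re
import unicodedata

chr_func = chr  # chr in Py3; use unichr in Py2

def wide_narrow(c):
    # Map a fullwidth/halfwidth character to its decomposition mapping, if any.
    decomp = unicodedata.decomposition(c)
    m = re.match('<(?:narrow|wide)> ([A-Z0-9]+)$', decomp)
    if m:
        return chr_func(int(m.group(1), 16))
    return c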

domain = ''.join(wide_narrow(c) for c in domain)

3 All characters are mapped using Unicode Normalization Form C
(NFC). This step was chosen because it maps combinations of
combining characters into canonical composed form. As with the
fullwidth/halfwidth mapping, users are not generally aware of the
particular form of characters that they are entering, and
IDNA2008 requires that only the canonical composed forms from NFC
be used.

domain = unicodedata.normalize("NFC", domain)

4 [IDNA2008protocol] is specified such that the protocol acts on
the individual labels of the domain name. If an implementation
of this mapping is also performing the step of separation of the
parts of a domain name into labels by using the FULL STOP
character (U+002E), the IDEOGRAPHIC FULL STOP character (U+3002)
can be mapped to the FULL STOP before label separation occurs.
There are other characters that are used as "full stops" that one
could consider mapping as label separators, but their use as such
has not been investigated thoroughly. This step was chosen
because some input mechanisms do not allow the user to easily
enter proper label separators. Only the IDEOGRAPHIC FULL STOP
character (U+3002) is added in this mapping because the authors
have not fully investigated the applicability of other characters
and the environments where they should and should not be
considered domain name label separators.

domain = domain.replace('\u3002', '.')

^ Seems to be handled by the idna module.

Does this look right?

Step 2 is a bit awkward so maybe this all can be incorporated into the module?
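The four steps above can be combined into one pre-processing function. This is a minimal Python 3 sketch, not something the idna module provides: `rfc5895_map` and `map_wide_narrow` are hypothetical names, and `str.lower()` is used as an approximation of the Unicode case mapping the RFC refers to.

```python
import re
import unicodedata

_WIDE_NARROW = re.compile(r"<(?:narrow|wide)> ([0-9A-F]+)$")


def map_wide_narrow(ch):
    """Step 2: replace a fullwidth/halfwidth character by its decomposition."""
    match = _WIDE_NARROW.match(unicodedata.decomposition(ch))
    return chr(int(match.group(1), 16)) if match else ch


def rfc5895_map(domain):
    domain = domain.lower()                               # step 1: lowercase
    domain = "".join(map_wide_narrow(c) for c in domain)  # step 2: width mapping
    domain = unicodedata.normalize("NFC", domain)         # step 3: NFC
    return domain.replace("\u3002", ".")                  # step 4: ideographic full stop


print(rfc5895_map("\uff25\uff38\uff21\uff2d\uff30\uff2c\uff25\u3002\uff23\uff2f\uff2d"))  # example.com
```

The result would then be passed to the encoder; whether the encoder repeats any of these steps itself is not assumed here.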

Tests fail on Python 2.6

Is idna supposed to work on Python 2.6? The documentation does not explicitly state that 2.7 is required and 2.6 is not. There has been at least one commit to improve 2.6 compat as well.

I've added two obvious fixes to the tests for 2.6, but I'm not exactly sure what the correct fix for the two remaining tests is.

Patch:

diff --git a/tests/test_idna_codec.py b/tests/test_idna_codec.py
index fe3737a..2d0cd14 100755
--- a/tests/test_idna_codec.py
+++ b/tests/test_idna_codec.py
@@ -24,7 +24,7 @@ class IDNACodecTests(unittest.TestCase):
         )

         for decoded, encoded in incremental_tests:
-            if sys.version_info.major == 2:
+            if sys.version_info[0] == 2:
                 self.assertEqual("".join(codecs.iterdecode(encoded, "idna")),
                                 decoded)
             else:
diff --git a/tests/test_idna_uts46.py b/tests/test_idna_uts46.py
index d8a2836..c93c0df 100755
--- a/tests/test_idna_uts46.py
+++ b/tests/test_idna_uts46.py
@@ -51,6 +51,8 @@ class TestIdnaTest(unittest.TestCase):
         return "%s.%d" % (super(TestIdnaTest, self).id(), self.lineno)

     def shortDescription(self):
+        if not self.fields:
+            return ""
         return "IdnaTest.txt line %d: %r" % (self.lineno,
             u"; ".join(self.fields))

Failures:

======================================================================
ERROR: test_decode (tests.test_idna.IDNATests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/rbu/devel/code/idna/tests/test_idna.py", line 258, in test_decode
    self.assertEqual(idna.decode('xn---------90gglbagaar.aa'),
  File "/home/rbu/devel/code/idna/idna/core.py", line 383, in decode
    result.append(ulabel(label))
  File "/home/rbu/devel/code/idna/idna/core.py", line 302, in ulabel
    check_label(label)
  File "/home/rbu/devel/code/idna/idna/core.py", line 254, in check_label
    check_bidi(label)
  File "/home/rbu/devel/code/idna/idna/core.py", line 70, in check_bidi
    raise IDNABidiError('Unknown directionality in label {0} at position {1}'.format(repr(label), idx))
IDNABidiError: Unknown directionality in label u'\u0521\u0525\u0523-\u0523\u0523-----\u0521\u0523\u0523\u0523' at position 2

======================================================================
ERROR: test_encode (tests.test_idna.IDNATests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/rbu/devel/code/idna/tests/test_idna.py", line 244, in test_encode
    self.assertEqual(idna.encode(u'\u0521\u0525\u0523-\u0523\u0523-----\u0521\u0523\u0523\u0523.aa'),
  File "/home/rbu/devel/code/idna/idna/core.py", line 354, in encode
    result.append(alabel(label))
  File "/home/rbu/devel/code/idna/idna/core.py", line 275, in alabel
    check_label(label)
  File "/home/rbu/devel/code/idna/idna/core.py", line 254, in check_label
    check_bidi(label)
  File "/home/rbu/devel/code/idna/idna/core.py", line 70, in check_bidi
    raise IDNABidiError('Unknown directionality in label {0} at position {1}'.format(repr(label), idx))
IDNABidiError: Unknown directionality in label u'\u0521\u0525\u0523-\u0523\u0523-----\u0521\u0523\u0523\u0523' at position 2

----------------------------------------------------------------------
Ran 17 tests in 0.007s

FAILED (errors=2)

Unicode version agility

Enable the ability to create IDNA and UTS46 data for any given version of Unicode, rather than the current version or the version that has been pegged by the IAB (see issue #8). This requires refactoring the build-idnadata.py support tool to accept a version argument, obtaining the relevant underlying data needed to implement RFC 5892, and generating IANA-style tables that are input to the existing methods.

This is useful for debugging, and also provides options for implementers who want to track to newer Unicode versions despite potential risks.

Label has disallowed hyphens

Consider the following

Python 2.7.12 (default, Jul  1 2016, 15:12:24) 
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import idna
>>> idna.encode('r6---sn-i5onxoxu-cxgl.c.doc-0-0-sj.sj.googleusercontent.com')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/idna/core.py", line 355, in encode
    result.append(alabel(label))
  File "/usr/local/lib/python2.7/dist-packages/idna/core.py", line 276, in alabel
    check_label(label)
  File "/usr/local/lib/python2.7/dist-packages/idna/core.py", line 239, in check_label
    check_hyphen_ok(label)
  File "/usr/local/lib/python2.7/dist-packages/idna/core.py", line 134, in check_hyphen_ok
    raise IDNAError('Label has disallowed hyphens in 3rd and 4th position')
idna.core.IDNAError: Label has disallowed hyphens in 3rd and 4th position

Same issue for decoding. The hostname is valid, so it should just pass without error.

I came here because requests unconditionally does idna encoding of the hostname.
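To see which label trips the check before calling the encoder, the rule named in the error message can be approximated as below. `reserved_hyphen_labels` is a hypothetical helper, and the test on `label[2:4]` is an assumption based on the wording "3rd and 4th position", not a copy of the library's code. Labels with the `xn--` prefix are skipped because they are A-labels, which the traceback-producing check is not aimed at.

```python
def reserved_hyphen_labels(hostname):
    """List labels with '--' in the 3rd and 4th positions that are not A-labels."""
    offenders = []
    for label in hostname.split("."):
        if label[2:4] == "--" and not label.lower().startswith("xn--"):
            offenders.append(label)
    return offenders


host = "r6---sn-i5onxoxu-cxgl.c.doc-0-0-sj.sj.googleusercontent.com"
print(reserved_hyphen_labels(host))  # ['r6---sn-i5onxoxu-cxgl']
```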

Unable to encode/decode U+1F52B

It seems that the U+1F52B PISTOL character added to Unicode in version 6.0 (2010) is not supported by the idna package:

>>> idna.encode(u'\U0001f52b.test')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/chirila/env/local/lib/python2.7/site-packages/idna/core.py", line 354, in encode
    result.append(alabel(label))
  File "/home/chirila/env/local/lib/python2.7/site-packages/idna/core.py", line 275, in alabel
    check_label(label)
  File "/home/chirila/env/local/lib/python2.7/site-packages/idna/core.py", line 252, in check_label
    raise InvalidCodepoint('Codepoint {0} at position {1} of {2} not allowed'.format(_unot(cp_value), pos+1, repr(label)))
idna.core.InvalidCodepoint: Codepoint U+1F52B at position 1 of u'\U0001f52b' not allowed
>>> idna.decode('xn--bw8h.test')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/chirila/env/local/lib/python2.7/site-packages/idna/core.py", line 383, in decode
    result.append(ulabel(label))
  File "/home/chirila/env/local/lib/python2.7/site-packages/idna/core.py", line 302, in ulabel
    check_label(label)
  File "/home/chirila/env/local/lib/python2.7/site-packages/idna/core.py", line 252, in check_label
    raise InvalidCodepoint('Codepoint {0} at position {1} of {2} not allowed'.format(_unot(cp_value), pos+1, repr(label)))
idna.core.InvalidCodepoint: Codepoint U+1F52B at position 1 of u'\U0001f52b' not allowed
>>> 

The Python standard library, however, can encode and decode the code point:

>>> u'\U0001f52b.test'.encode("idna")
'xn--bw8h.test'
>>> 'xn--bw8h.test'.decode("idna")
u'\U0001f52b.test'

idna is dissatisfied with underscores

>>> import idna
>>> idna.encode('ex_ample.example.org')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/nail/home/benp/pg/utils/bravado-promised/virtualenv_run/local/lib/python2.7/site-packages/idna/core.py", line 354, in encode
    result.append(alabel(label))
  File "/nail/home/benp/pg/utils/bravado-promised/virtualenv_run/local/lib/python2.7/site-packages/idna/core.py", line 275, in alabel
    check_label(label)
  File "/nail/home/benp/pg/utils/bravado-promised/virtualenv_run/local/lib/python2.7/site-packages/idna/core.py", line 252, in check_label
    raise InvalidCodepoint('Codepoint {0} at position {1} of {2} not allowed'.format(_unot(cp_value), pos+1, repr(label)))
idna.core.InvalidCodepoint: Codepoint U+005F at position 3 of u'ex_ample' not allowed

Seems like underscores in hostnames should be perfectly allowed? I can't find any RFC that suggests otherwise (and browsers seem happy with underscores in hostnames).

Perhaps this is more of a Twisted bug, in that it shouldn't ever pass this sort of URL to idna; I'm not sure. This is just an issue I hit as an end user.

implement UTS46

I have implemented UTS46, please see https://github.com/jribbens/idna

It passes 100% of the tests in IdnaTest.txt, except 34 which all contain 0x200c or 0x200d. I don't know what the problem is with those tests.

Can you pull this please?

"codepoint not allowed" (with existing name)

Hello

Given the string u'\u2709' (or the encoded form xn--4bi)

Trying to use encode / decode on that string will always raise the following error:

Codepoint U+2709 at position 1 of u'\u2709' not allowed

The domain in question exists and resolves, and the standard Python idna codec correctly encodes and decodes the string.

Is this an issue with idna module or am I missing something?

Bug: fail to encode domain (test_a.abc.com)

Thanks for your code. I found a bug while using it.
When I try to encode a domain name containing an underscore (such as test_a.abc.com), I get an error. Although such names are non-standard, some people use them, and they work normally in some situations, so I don't think this should be an error. What do you think?

>>> import idna
>>> idna.encode('abc.com')
b'abc.com'
>>> idna.encode('test_1.abc.com')
Traceback (most recent call last):
  File "D:\Program Files\Python37\lib\site-packages\idna\core.py", line 263, in alabel
    ulabel(label)
  File "D:\Program Files\Python37\lib\site-packages\idna\core.py", line 299, in ulabel
    check_label(label)
  File "D:\Program Files\Python37\lib\site-packages\idna\core.py", line 253, in check_label
    raise InvalidCodepoint('Codepoint {0} at position {1} of {2} not allowed'.format(_unot(cp_value), pos+1, repr(label)))
idna.core.InvalidCodepoint: Codepoint U+005F at position 5 of 'test_1' not allowed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\Program Files\Python37\lib\site-packages\idna\core.py", line 355, in encode
    result.append(alabel(label))
  File "D:\Program Files\Python37\lib\site-packages\idna\core.py", line 265, in alabel
    raise IDNAError('The label {0} is not a valid A-label'.format(label))
idna.core.IDNAError: The label b'test_1' is not a valid A-label
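
One possible caller-side workaround, sketched under the assumption that pure-ASCII names need no IDNA conversion at all and can be passed through unchanged. The function name encode_lenient and its strict_encoder parameter are hypothetical; nothing here is part of the idna API.

```python
import re

# Letters, digits and hyphen, plus underscore. Underscore is not valid in
# hostnames proper, but names containing it are seen in practice.
_LENIENT_ASCII_LABEL = re.compile(r"[A-Za-z0-9_-]{1,63}")


def encode_lenient(domain, strict_encoder=None):
    """Pass pure-ASCII names through; defer everything else to strict_encoder."""
    labels = domain.split(".")
    if all(_LENIENT_ASCII_LABEL.fullmatch(label) for label in labels):
        return domain.encode("ascii")
    if strict_encoder is None:
        raise ValueError("non-ASCII label and no strict encoder supplied")
    return strict_encoder(domain)


print(encode_lenient("test_1.abc.com"))
```

With the idna package installed, the strict path could be supplied as `encode_lenient(name, strict_encoder=idna.encode)`. A name that mixes an underscore with non-ASCII characters would still be rejected by the strict encoder.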

Import error inside import hook.

Hello,

I recently upgraded to requests 2.12 which uses idna and I started to see this import error:

(Pdb) request('get', url, params=params, **kwargs)
*** ImportError: No module named uts46data

Looking at the changes, I saw that idna was added recently.
My projects use import hooks, and the hook itself uses requests, which uses idna.

The idna code performs an import inside the uts46_remap function, which breaks my code.
Moving that import out of uts46_remap fixes the issue.

Codepoint U+2010 not allowed

>>> idna.encode("one‐two.com".decode("utf-8"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/dist-packages/idna/core.py", line 354, in encode
    result.append(alabel(label))
  File "/usr/lib/python2.7/dist-packages/idna/core.py", line 275, in alabel
    check_label(label)
  File "/usr/lib/python2.7/dist-packages/idna/core.py", line 252, in check_label
    raise InvalidCodepoint('Codepoint {0} at position {1} of {2} not allowed'.format(_unot(cp_value), pos+1, repr(label)))
idna.core.InvalidCodepoint: Codepoint U+2010 at position 4 of u'one\u2010two' not allowed
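
U+2010 looks identical to the ASCII hyphen-minus (U+002D) in most fonts, so it can help to list the non-ASCII characters in a label before encoding it. A small stdlib-only sketch; the helper name describe_non_ascii is made up for this example.

```python
import unicodedata


def describe_non_ascii(label):
    """List (1-based position, codepoint, Unicode name) for non-ASCII characters."""
    return [
        (pos, "U+{:04X}".format(ord(ch)), unicodedata.name(ch, "<unnamed>"))
        for pos, ch in enumerate(label, start=1)
        if ord(ch) > 0x7F
    ]


print(describe_non_ascii("one\u2010two"))
```

The position is 1-based to match the numbering used in the error message above.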

Failing domain names

I've been looking at IDNA domain registrations, using your library in conjunction with the built-in Python tools.

>>> domain = "xn--53hy7af013i.ws".encode("utf-8")                                                                                     
>>> domain.decode("idna")         
'☕🦊✈.ws'

The IDNA package is a HUGE life saver for me. I monitored over 111,000 IDNA domains being registered, and a small percentage of them failed. I've attached the output, which I thought you might find useful.

As you can see above, there is an uptick of registrations of emoji domains now. Although it is not part of the specification, it would be very helpful if that was incorporated into this package.

failed_output.txt
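
For bulk runs like this, a rough sketch of how the failures could be collected together with the reason for each one. The function name partition_domains is hypothetical, and the decoder is passed in as a callable so that either the stdlib codec or idna.decode can be plugged in. It assumes the decoder signals failure with a ValueError subclass (UnicodeError is one) or an IndexError.

```python
def partition_domains(domains, decoder):
    """Split domains into {domain: decoded} and {domain: failure reason}."""
    decoded, failed = {}, {}
    for domain in domains:
        try:
            decoded[domain] = decoder(domain)
        except (ValueError, IndexError) as exc:
            failed[domain] = "{}: {}".format(type(exc).__name__, exc)
    return decoded, failed


def stdlib_decoder(domain):
    # Stdlib codec, which follows the older IDNA 2003 rules.
    return domain.encode("ascii").decode("idna")


ok, bad = partition_domains(["xn--4bi.ws", "example.com"], stdlib_decoder)
print(ok)
print(bad)
```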

Misleading error message: Codepoint U+0027 at position 2 of "b'foo bar'" not allowed

I accidentally passed a name with a blank to idna.encode; it seems to encode it to bytes and then complains about the apostrophe of b'foo bar'.
Python2 correctly complains: idna.core.InvalidCodepoint: Codepoint U+0020 at position 4 of u'foo bar' not allowed

Test case:

import idna
idna.encode('foo bar')

Result:

$ python3 foo.py
Traceback (most recent call last):
  File "foo.py", line 2, in <module>
    idna.encode('foo bar')
  File "/home/joern/env/lib/python3.5/site-packages/idna/core.py", line 355, in encode
    result.append(alabel(label))
  File "/home/joern/env/lib/python3.5/site-packages/idna/core.py", line 276, in alabel
    check_label(label)
  File "/home/joern/env/lib/python3.5/site-packages/idna/core.py", line 253, in check_label
    raise InvalidCodepoint('Codepoint {0} at position {1} of {2} not allowed'.format(_unot(cp_value), pos+1, repr(label)))
idna.core.InvalidCodepoint: Codepoint U+0027 at position 2 of "b'foo bar'" not allowed
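
The reported position and codepoint are consistent with the check running over the repr of a bytes object rather than over the original text. This is only a reading of the message, not a confirmed diagnosis of core.py. A stdlib-only illustration, with a made-up helper name:

```python
def codepoint_at(text, position):
    """Return the U+XXXX notation for the character at a 1-based position."""
    return "U+{:04X}".format(ord(text[position - 1]))


label_text = "foo bar"
label_bytes = b"foo bar"

# Position 4 of the text is the blank that should have been reported.
print(codepoint_at(label_text, 4))

# Position 2 of the bytes repr (b'foo bar') is the opening apostrophe.
print(codepoint_at(repr(label_bytes), 2))
```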

Codepoint U+005B not allowed at position1 in '[::1]'

I believe this to be a bug with idna rather than in requests but I think that @kennethreitz should probably be aware of this too:

Using python 3.4 and requests 2.12.1 I tried to get this URL:
requests.get('http://[::]:5000/files')

Which returns this bunch of errors:

python3 fileclient.py
Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/requests/packages/idna/core.py", line 327, in uts46_remap
    raise IndexError()
IndexError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/requests/models.py", line 370, in prepare_url
    host = idna.encode(host, uts46=True).decode('utf-8')
  File "/usr/local/lib/python3.4/dist-packages/requests/packages/idna/core.py", line 340, in encode
    s = uts46_remap(s, std3_rules, transitional)
  File "/usr/local/lib/python3.4/dist-packages/requests/packages/idna/core.py", line 332, in uts46_remap
    _unot(code_point), pos + 1, repr(domain)))
requests.packages.idna.core.InvalidCodepoint: Codepoint U+005B not allowed at position1 in '[::1]'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "fileclient.py", line 55, in <module>
    buy_file()
  File "fileclient.py", line 17, in buy_file
    response = requests.get(url=server_url+'files')
  File "/usr/local/lib/python3.4/dist-packages/two1/bitrequests/bitrequests.py", line 169, in get
    return self.request('get', url, max_price, **kwargs)
  File "/usr/local/lib/python3.4/dist-packages/two1/bitrequests/bitrequests.py", line 134, in request
    response = requests.request(method, url, **kwargs)
  File "/usr/local/lib/python3.4/dist-packages/requests/api.py", line 56, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python3.4/dist-packages/requests/sessions.py", line 474, in request
    prep = self.prepare_request(req)
  File "/usr/local/lib/python3.4/dist-packages/requests/sessions.py", line 407, in prepare_request
    hooks=merge_hooks(request.hooks, self.hooks),
  File "/usr/local/lib/python3.4/dist-packages/requests/models.py", line 302, in prepare
    self.prepare_url(url, params)
  File "/usr/local/lib/python3.4/dist-packages/requests/models.py", line 372, in prepare_url
    raise InvalidURL('URL has an invalid label.')
requests.exceptions.InvalidURL: URL has an invalid label.

I downgraded to requests 2.11.1 (prior to the idna update) and now my code works again.
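
A caller-side guard could skip IDNA encoding for IP literals altogether. The sketch below uses only the stdlib ipaddress module; is_ip_literal is a hypothetical helper, not something requests or idna provides.

```python
import ipaddress


def is_ip_literal(host):
    """Return True for IPv4/IPv6 literals, with or without URL-style brackets."""
    candidate = host
    if candidate.startswith("[") and candidate.endswith("]"):
        candidate = candidate[1:-1]
    try:
        ipaddress.ip_address(candidate)
    except ValueError:
        return False
    return True


for host in ("[::1]", "[::]", "127.0.0.1", "example.com"):
    print(host, is_ip_literal(host))
```

Hosts for which this returns True would be used as-is, and only the remaining hosts would be handed to idna.encode.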

Initial Update

Hi 👊

This is my first visit to this fine repo, but it seems you have been working hard to keep all dependencies updated so far.

Once you have closed this issue, I'll create separate pull requests for every update as soon as I find one.

That's it for now!

Happy merging! 🤖

Bringing idna into the Python core library

I am looking at bringing native IDNA2008 support into the Python core library, and had a long conversation with Nathaniel J. Smith (@njsmith) and Christian Heimes (@tiran) about the work required. The end goal of this work is to have Python natively handle IDNs as first-class citizens, reusing as much code as possible.

To summarize the current conversation, the first step would be implementing a new codec in Python's core library, and then extending the standard library to natively handle IDNs in such a way that the following code snippet could work:

#!/usr/bin/env python3
import urllib
import urllib.request

req = urllib.request.Request('http://fuß.standcore.com')
response = urllib.request.urlopen(req)
the_page = response.read()
print(the_page.decode(encoding='utf-8')) 

From the conversation on Zulip, the first step would be implementing idna2008 as a new encoding codec, and then work on modifying the core library to be able to accept and interoperate with IDNs seamlessly.

I'm willing to do much of the legwork required to get code integrated into CPython. My first question is what (if any) blockers exist in implementation that would make it difficult to bring into CPython, and any tips or suggestions to help bring things forward. Right now, I'm just trying to get the ball rolling on figuring out a solid plan on hopefully having Python 3.8 be able to treat IDNs as first class citizens.
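
To make the codec step concrete, here is a rough sketch of how a codec can be registered from pure Python with codecs.register. The codec name idna2008sketch is invented for this example, and the encode and decode functions simply delegate to the existing stdlib IDNA 2003 codec as a stand-in. A real implementation would put the IDNA 2008 logic there instead.

```python
import codecs

CODEC_NAME = "idna2008sketch"  # hypothetical name, chosen for this example


def _encode(text, errors="strict"):
    # Stand-in: delegates to the stdlib IDNA 2003 codec.
    return text.encode("idna"), len(text)


def _decode(data, errors="strict"):
    raw = bytes(data)  # the codec machinery may hand over a memoryview
    return raw.decode("idna"), len(raw)


def _search(name):
    """Codec search function: answer only for our own codec name."""
    if name == CODEC_NAME:
        return codecs.CodecInfo(name=CODEC_NAME, encode=_encode, decode=_decode)
    return None


codecs.register(_search)

print("\u2709".encode(CODEC_NAME))
```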

Getting exception for certain URLs that work with curl and other tools

I'm not really sure how to describe this in idna terms. I never used idna directly.

Here's the issue I opened for requests: psf/requests#4569

They told me to hand it over to you (they're too lazy to do it themselves).

Anyway, hope this is helpful. Sorry if not.

P.S. here's the exception, just for convenience:

idna.core.IDNAError: The label b'xn--mn8ha4uc' is not a valid A-label

2.7 regression (A-label validation)

Hello

We've encountered a regression with the 2.7 release and the additional validation of A-labels.

I'm not certain whether it should be fixed, but I just wanted to let you know.

# Context: handling of DNS records
import idna

host = u'*.foö.fi'

record = {'host_ulabel': host,
          'host_alabel': idna.encode(host, uts46=True)}

Which now raises idna.core.IDNAError: The label * is not a valid A-label
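
A possible workaround on the caller's side is to treat the wildcard label separately and encode only the rest of the name. The helper name encode_wildcard is hypothetical. The default encoder uses the stdlib codec so the sketch runs without extra packages, and the encoder parameter lets the real one be swapped in.

```python
def encode_wildcard(host, encoder=lambda name: name.encode("idna")):
    """Encode a host name, leaving a leading '*.' wildcard label untouched."""
    prefix = b""
    if host.startswith("*."):
        prefix, host = b"*.", host[2:]
    return prefix + encoder(host)


print(encode_wildcard("*.example.fi"))
```

With the idna package, the call from the report could become `encode_wildcard(host, encoder=lambda name: idna.encode(name, uts46=True))`.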

Cannot encode u'sande.møre-og-romsdal.no'

>>> idna.encode(u'sande.møre-og-romsdal.no', uts46=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/idna/core.py", line 339, in encode
    s = uts46_remap(s, std3_rules, transitional)
  File "/usr/local/lib/python2.7/dist-packages/idna/core.py", line 331, in uts46_remap
    _unot(code_point), pos + 1, repr(domain)))
idna.core.InvalidCodepoint: Codepoint U+00B8 not allowed at position 9 in u'sande.m\xc3\xb8re-og-romsdal.no'
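
The repr in the error shows \xc3\xb8 inside a unicode string. Those are the two UTF-8 bytes of ø, read as two separate characters. That suggests the input was mis-decoded before it reached idna (for example by the terminal or by the source file encoding), although the report itself does not confirm this. A stdlib-only sketch of the effect and one way to undo it; repair_mojibake is a made-up helper name.

```python
def repair_mojibake(text):
    """Undo UTF-8 bytes that were mistakenly decoded as Latin-1, if possible."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text  # not this kind of mojibake; leave unchanged


garbled = "sande.m\xc3\xb8re-og-romsdal.no"
repaired = repair_mojibake(garbled)
print(repaired)
```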

Implement IAB Guidance on Unicode 7.0.0

Unicode 7.0.0 introduced support for the ARABIC LETTER BEH WITH HAMZA ABOVE (U+08A1) character, which has been seen as controversial in some circles. While discussion continues on whether to amend the IDNA protocol or apply some other kind of remediation for this character, the IAB has recommended that IANA essentially reverse publication of the 7.0.0 dataset on which the IDNA library relies. Therefore the library should revert to the previous data set, which is from Unicode 6.3.0.

See https://www.iab.org/documents/correspondence-reports-documents/2015-2/iab-statement-on-identifiers-and-unicode-7-0-0/ for details.

IndexError instead of IDNAError

I get IndexError instead of idna.core.IDNAError - if I understood correctly this is not what should happen:

>>> import idna
>>> idna.decode('xn--eckwd4c7c.xn--zckzah')
'ドメイン.テスト'
>>> idna.decode('xn--eckwd4c7c.xn--zckzah ')
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/encodings/punycode.py", line 207, in decode
    res = punycode_decode(input, errors)
  File "/usr/local/lib/python3.6/encodings/punycode.py", line 194, in punycode_decode
    return insertion_sort(base, extended, errors)
  File "/usr/local/lib/python3.6/encodings/punycode.py", line 165, in insertion_sort
    bias, errors)
  File "/usr/local/lib/python3.6/encodings/punycode.py", line 146, in decode_generalized_number
    % extended[extpos])
IndexError: string index out of range

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/site-packages/idna/core.py", line 389, in decode
    s = ulabel(label)
  File "/usr/local/lib/python3.6/site-packages/idna/core.py", line 307, in ulabel
    label = label.decode('punycode')
IndexError: decoding with 'punycode' codec failed (IndexError: string index out of range)
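
Until this is handled inside the library, a caller could normalise the exception type itself. A minimal sketch using only the stdlib punycode codec; the function name decode_punycode_label is hypothetical, and no IDNA validation is performed on the result.

```python
def decode_punycode_label(label):
    """Decode one xn-- label, reporting any codec failure as ValueError."""
    if not label.lower().startswith("xn--"):
        return label
    try:
        return label[4:].encode("ascii").decode("punycode")
    except (UnicodeError, IndexError) as exc:
        # Depending on the Python version the codec may raise either type.
        raise ValueError("invalid punycode label {!r}".format(label)) from exc


print(decode_punycode_label("xn--4bi"))
```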

symbol '_' not allowed

д_ругойтест.рф
Codepoint U+005F at position 2 of 'д_ругойтест' not allowed

Full traceback:

Traceback (most recent call last):
  File "/home/user/somedirectory/code.py", line 34, in <module>
    punnycode_to_replace = idna.encode("д_ругойтест.рф")
  File "/usr/local/lib/python3.5/dist-packages/idna/core.py", line 355, in encode
    result.append(alabel(label))
  File "/usr/local/lib/python3.5/dist-packages/idna/core.py", line 276, in alabel
    check_label(label)
  File "/usr/local/lib/python3.5/dist-packages/idna/core.py", line 253, in check_label
    raise InvalidCodepoint('Codepoint {0} at position {1} of {2} not allowed'.format(_unot(cp_value), pos+1, repr(label)))
idna.core.InvalidCodepoint: Codepoint U+005F at position 2 of 'д_ругойтест' not allowed
[Finished in 0.4s with exit code 1]

The label {0} is not a valid A-label'.format(label) - Codepoint U+005F

2017-07-04 16:19:44 [scrapy.core.scraper] ERROR: Error downloading <GET https://sloneczne_stablowice.forumoteka.pl/kategoria,4,mieszkancy-luzne-rozmowy.html>
Traceback (most recent call last):
  File "c:\users\bukowa\vritualenv2\lib\site-packages\idna\core.py", line 263, in alabel
    ulabel(label)
  File "c:\users\bukowa\vritualenv2\lib\site-packages\idna\core.py", line 299, in ulabel
    check_label(label)
  File "c:\users\bukowa\vritualenv2\lib\site-packages\idna\core.py", line 253, in check_label
    raise InvalidCodepoint('Codepoint {0} at position {1} of {2} not allowed'.format(_unot(cp_value), pos+1, repr(label)))
idna.core.InvalidCodepoint: Codepoint U+005F at position 10 of 'sloneczne_stablowice' not allowed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\users\bukowa\vritualenv2\lib\site-packages\twisted\internet\defer.py", line 1384, in inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "c:\users\bukowa\vritualenv2\lib\site-packages\twisted\python\failure.py", line 393, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "c:\users\bukowa\vritualenv2\lib\site-packages\scrapy\core\downloader\middleware.py", line 43, in process_request
    defer.returnValue((yield download_func(request=request,spider=spider)))
  File "c:\users\bukowa\vritualenv2\lib\site-packages\scrapy\utils\defer.py", line 45, in mustbe_deferred
    result = f(*args, **kw)
  File "c:\users\bukowa\vritualenv2\lib\site-packages\scrapy\core\downloader\handlers\__init__.py", line 65, in download_request
    return handler.download_request(request, spider)
  File "c:\users\bukowa\vritualenv2\lib\site-packages\scrapy\core\downloader\handlers\http11.py", line 63, in download_request
    return agent.download_request(request)
  File "c:\users\bukowa\vritualenv2\lib\site-packages\scrapy\core\downloader\handlers\http11.py", line 300, in download_request
    method, to_bytes(url, encoding='ascii'), headers, bodyproducer)
  File "c:\users\bukowa\vritualenv2\lib\site-packages\twisted\web\client.py", line 1633, in request
    endpoint = self._getEndpoint(parsedURI)
  File "c:\users\bukowa\vritualenv2\lib\site-packages\twisted\web\client.py", line 1617, in _getEndpoint
    return self._endpointFactory.endpointForURI(uri)
  File "c:\users\bukowa\vritualenv2\lib\site-packages\twisted\web\client.py", line 1494, in endpointForURI
    uri.port)
  File "c:\users\bukowa\vritualenv2\lib\site-packages\scrapy\core\downloader\contextfactory.py", line 59, in creatorForNetloc
    return ScrapyClientTLSOptions(hostname.decode("ascii"), self.getContext())
  File "c:\users\bukowa\vritualenv2\lib\site-packages\twisted\internet\_sslverify.py", line 1152, in __init__
    self._hostnameBytes = _idnaBytes(hostname)
  File "c:\users\bukowa\vritualenv2\lib\site-packages\twisted\internet\_idna.py", line 30, in _idnaBytes
    return idna.encode(text)
  File "c:\users\bukowa\vritualenv2\lib\site-packages\idna\core.py", line 355, in encode
    result.append(alabel(label))
  File "c:\users\bukowa\vritualenv2\lib\site-packages\idna\core.py", line 265, in alabel
    raise IDNAError('The label {0} is not a valid A-label'.format(label))
idna.core.IDNAError: The label b'sloneczne_stablowice' is not a valid A-label

idna == 2.5
