Comments (18)
Oops. It's not IDNA2008: http://unicode.org/cldr/utility/character.jsp?a=2603 closing.
from idna.
It's valid uts46 though which is what browsers use. You might want to reconsider this.
from idna.
Nice utility http://unicode.org/cldr/utility/idna.jsp?a=%E2%98%83.net
from idna.
It's not valid UTS46 for IDNA 2008, it is only valid for IDNA 2003. Look for the "NV8" in the UTS46 table data. Now there may be an argument to add fall-through IDNA 2003 processing, but as of today this library only supports IDNA 2008.
from idna.
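For reference, the IDNA 2003 side of this can be checked with Python's standard library alone. This is a minimal sketch, assuming the built-in `encodings.idna` module (which implements IDNA 2003, not IDNA 2008); the variable names are illustrative only.

```python
import encodings.idna as idna2003

# IDNA 2003 ToASCII on a single label: nameprep first, then Punycode.
snowman_label = idna2003.ToASCII("\u2603")
print(snowman_label)

# The built-in "idna" codec applies the same 2003 rules to each label of a domain.
snowman_domain = "\u2603.net".encode("idna")
print(snowman_domain)
```

Both calls succeed because IDNA 2003 has no rule excluding U+2603; the rejection discussed above comes from the IDNA 2008 rules.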
+1 for optional 2003 fall-through, I run into real web corner cases that are IDNA2003.
from idna.
@kjd UTS46 (the actual document) doesn't have a processing mode where this code point is somehow rejected and this is the first implementation I have seen that does such a thing.
from idna.
That's because this isn't an implementation of UTS46, it's an implementation of IDNA2008. If for some reason you want only UTS46 and not IDNA2008 then presumably you can call idna.uts46_remap directly.
from idna.
idna.uts46_remap doesn't encode anything though...
from idna.
import idna
import encodings.idna as idna2003
def questionable_encode(s):
    try:
        return idna.encode(s, uts46=True)
    except idna.IDNAError:
        try:
            return idna2003.ToASCII(idna.uts46_remap(s))
        except:
            raise idna.IDNAError("Input string is supported by no flavour of IDNA")
>>> questionable_encode("\u2603")
"xn--n3h"
from idna.
Thanks! Stick that function in the library ;)
So is this a bona fide implementation of uts46? (Sorry for still not being totally clear on what the spec entails)
from idna.
Doh.
>>> questionable_encode("\u2603.net")
b'xn--.net-4g3b'
from idna.
This was just a quick function I rattled off the top of my head, not tested. You probably need to do a few more lines to break down the input string into individual labels to use the idna2003 portion. If we added support to do something like this in this library (see issue #18) then it will have a proper test suite etc.
from idna.
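The odd `xn--.net-4g3b` result is what happens when a whole domain is handed to a function that expects one label. A small sketch of the difference, assuming only the standard-library `encodings.idna` module; `whole` and `per_label` are illustrative names:

```python
import encodings.idna as idna2003

domain = "\u2603.net"

# ToASCII works on ONE label, so the "." here is treated as an ordinary
# character and ends up inside the Punycode output.
whole = idna2003.ToASCII(domain)

# Splitting on "." first encodes each label separately.
per_label = [idna2003.ToASCII(label) for label in domain.split(".")]

print(whole)
print(per_label)
```

Joining `per_label` with `b"."` then gives the domain in ASCII form.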
Ok, thanks. This version works for at least these two test inputs:
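import idna
import encodings.idna as idna2003
def questionable_encode(s):
    try:
        # First choice: IDNA 2008, with UTS46 mapping applied to the input.
        return idna.encode(s, uts46=True)
    except idna.IDNAError:
        try:
            # Fallback: IDNA 2003. ToASCII handles one label at a time,
            # so split the remapped string on "." before encoding.
            labels = idna.uts46_remap(s).split(".")
            punycode_labels = [idna2003.ToASCII(label) for label in labels]
            return b".".join(punycode_labels)
        except Exception:
            raise idna.IDNAError("Input string is supported by no flavour of IDNA")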
>>> questionable_encode("\u2603.net")
b'xn--n3h.net'
>>> questionable_encode("\u2603")
b'xn--n3h'
from idna.
"If we added support to do something like this in this library (see issue #18) then it will have a proper test suite etc."
Yes please!
from idna.
FYI, the function above mishandles faß.de. The correct result is fass.de.
>>> questionable_encode('faß.de')
b'xn--fa-hia.de'
In fact a number of the examples from http://unicode.org/cldr/utility/idna.jsp don't work.
from idna.
No, fass.de is only the correct result for transitional mode, which is not what we want to align on.
from idna.
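The two outputs for faß can be reproduced with the standard library, as a rough illustration of the split described above. This is a sketch under assumptions: the stdlib IDNA 2003 mapping stands in for transitional behaviour on this one label, and plain Punycode encoding stands in for keeping ß; neither is a full UTS46 implementation.

```python
import encodings.idna as idna2003

label = "fa\u00df"  # "faß"

# IDNA 2003 nameprep maps U+00DF (ß) to "ss", which matches the
# transitional result quoted above for this label.
mapped = idna2003.ToASCII(label)

# Keeping ß and Punycode-encoding the label directly gives the other form.
kept = b"xn--" + label.encode("punycode")

print(mapped)
print(kept)
```

The two forms are different names, which is why the choice between them matters.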
Oh. Well, Chromium does fass.de, Firefox does xn--fa-hia.de.
from idna.
Yeah I know, bugs have been filed.
from idna.