Giter VIP home page Giter VIP logo

Comments (14)

blakeembrey avatar blakeembrey commented on August 27, 2024 3

Here's a framework level function that should work (but could use some improvements):

function encoding (ignore) {
  return {
    encode: function (value) {
      return encodeURIComponent(value)
    },
    decode: function (value) {
      return value.replace(/(?:%[ef][0-9a-f](?:%[0-9a-f]{2}){2}|%[cd][0-9a-f]%[0-9a-f]{2}|%[0-9a-f]{2})/ig, function (m) {
        const char = decodeURIComponent(m)

        if (ignore.indexOf(char) > -1) return m

        return char
      })
    }
  }
}

Note: You can not technically just use decodeURI() because a : won't be decoded but it can be encoded.

from path-to-regexp.

dougwilson avatar dougwilson commented on August 27, 2024

I'm fine with this, as long as it expands to the correct Unicode normalization form, whatever the specs say IRIs are expanded to.

from path-to-regexp.

blakeembrey avatar blakeembrey commented on August 27, 2024

@dougwilson Awesome, I'll start an investigation on that 👍

from path-to-regexp.

dougwilson avatar dougwilson commented on August 27, 2024

Looks like it's RFC 3987 section 3.1 :)

Also, if it helps, Mojolicious already does this stuff, so it'd be interesting to know how they are doing their transformation, since it has worked well for a long time.

from path-to-regexp.

blakeembrey avatar blakeembrey commented on August 27, 2024

Very nice, thank you 😄

from path-to-regexp.

blakeembrey avatar blakeembrey commented on August 27, 2024

@dougwilson Just looking through the implementation, it looks like they decode it at the framework level before passing it to route matches? Did I understand that correctly? http://mojolicio.us/perldoc/Mojo/Path#to_route

Is that something that could be considered for Express 5.0?

from path-to-regexp.

dougwilson avatar dougwilson commented on August 27, 2024

I'll have to look more at their source code (which can be browsed at https://metacpan.org/source/SRI/Mojolicious-6.0/lib), but I'm open to whatever we want to do for Express 5.0. Currently Express 5.0 just passes the raw req.url down to this module, so theoretically that makes it the most flexible of all, but if it's not flexible enough, we can always make it more flexible :)

But really, what I wanted to know is how do they do the UTF-8 -> URI transformation. The characters in JavaScript source code are UCS-2 and so we need to have some kind of way to transform source code strings to URIs reliably. For example, if I type ú in source code, there are actually several different byte sequences that result in that character and even multiple different Unicode code points (U+00FA or U+0075 U+0301). Which ones do we support, or do we treat them the same?

Example:

decodeURIComponent('%C3%BA') // -> ú
decodeURIComponent('u%CC%81') // -> ú

Should a user have to understand that the ú they type in their editor may not match the ú that comes in the URI?

from path-to-regexp.

bajtos avatar bajtos commented on August 27, 2024

decode it at the framework level before passing it to route matches

This is a great idea, especially if there is a way how to get it working reliably for all edge cases. It should solve the problem where there are multiple ways how to encode a single character, be it ú as mentioned by @dougwilson, or characters like ~ that some clients encode and some don't (IIRC).

It may be possible to implement it in a backward-compatible way if we url-decode the path inputs passed by API users too.

from path-to-regexp.

blakeembrey avatar blakeembrey commented on August 27, 2024

@dougwilson So ES6 has a built in method for this: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/normalize

decodeURIComponent('u%CC%81').normalize() === decodeURIComponent('%C3%BA')

Edit: Only 0.12 for String#normalize, but we can also use a third party module like https://github.com/walling/unorm for older node versions. I think putting this at the framework level actually makes more sense here, because we'll be decoding it a bunch of times and it'll make the flow more confusing otherwise. If the path accepts unicode, I think we should be able to manually compare too req.url === 'ú'.

from path-to-regexp.

nicooprat avatar nicooprat commented on August 27, 2024

Any news on this?

from path-to-regexp.

blakeembrey avatar blakeembrey commented on August 27, 2024

@nicooprat This issue seems like the opposite to what you're looking for (this was about unicode inputs to path matching). From kadirahq/flow-router#599, it seems like what you'd like is the ability to specify the way to encode - currently it's only pretty but that won't be enough to handle unicode characters outside of ASCII.

from path-to-regexp.

blakeembrey avatar blakeembrey commented on August 27, 2024

By the way, I took another stab at doing this for the path-to-regexp 2.0. I don't think it's reasonably possible since every single character could be encoded differently. This has reinforced my view that it needs to be encoded at the router level. Would love additional input though.

For a simple example, type /+ into the browser. The + is not encoded. Now do encodeURIComponent (which you should be using programmatically to safely encode user input). Now you've got an escaped plus. The route will only match one or the other. This gets even more complex when you have multiple representations of the same character input. I'd recommend decoding at the framework level (it's more performant than using path-to-regexp and having this library decode the same string hundreds of times).

from path-to-regexp.

blakeembrey avatar blakeembrey commented on August 27, 2024

I have added a basic normalizePathname function to this library in 2c3baf1. It uses a simpler implementation than above so you need to make sure of two things if you use this function:

  1. You don't double decode a string (e.g. normalizePathname already calls decodeURIComponent, you shouldn't call it again in application code)
  2. You don't mind that %2F will become /, which will change the match (e.g. /test/route and /test%2Froute were previously different)

from path-to-regexp.

blakeembrey avatar blakeembrey commented on August 27, 2024

Unfortunately that was a quick 9 hours. I've released a 5.x which removes normalizePathname - I looked around to some routing libraries online and played with URL, and it wasn't consistent. Also noted that String.prototype.normalize would not be supported on IE. Instead, I've opted to document the solution in the README.

Note for implementors: if you're using URL, I added an encode option to pathToRegexp so you can encode paths using encodeURI. This will ensure path names from URL match against the RegEx as you cannot use the solution in the README (URL will always encode the pathname).

from path-to-regexp.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.