Parentheses within custom match regexp,about pillarjs/path-to-regexp

Comments (28)

plandem commented on July 24, 2024 1

damn...I needed something like you requested: /(list|edit/\d+) and can't solve it :(

from path-to-regexp.

crdrost commented on July 24, 2024 1

I am surprised that the primary hangup here is the semantics. To me the semantics are obvious:

Internal groups never propagate up. That is, any containing (___) is rewritten to (?:___). Anything else is likely to be unbelievably confusing to everyone involved and there is provably no loss in generality.
- Suppose you are a consumer of this library and you do want to capture something into its own group, your logic will be much easier to understand if you would be to use path-to-regexp with a path that looks like ':param(' + myRegexString+ ')' and then afterwords run new RegExp(myRegexString).exec(results.param) to extract the group. Any 'convenience' that we try to provide beyond this is going to confuse the readers of their code -- "wait what does THAT do?" -- when the alternative is to force them to write two more lines of code whose meanings are obvious to any reader. Like the only downside I can see is that TypeScript etc will not be able to detect that the match results are not null, but it's got various "I know it's not null I swear!" constructs so I'm not concerned.
The outer parens are part of the URL grammar, not the regexp. Writing :param(?=abc) throws a SyntaxError when the child RegExp is compiled because ? cannot be the first character of any valid RegExp, likewise with ?! and ?:.

I am under a bit of a deadline crunch but I might be able to submit a PR which parses JS regexp along these lines, if they're acceptable to you?

from path-to-regexp.

blakeembrey commented on July 24, 2024

You can't define groups within groups at this time as there's no defined way this should act. If you can come up with a proposal for how it behaves, we can look at implementing support for it.

from path-to-regexp.

dizlexik commented on July 24, 2024

I don't think there's any reason for groups within groups to have any special behavior outside of their usefulness for doing things like what I described in my original comment or, for another contrived example, like /:foo(b(ar|az)) to match /bar or /baz, etc.

I don't think inner groups should be parsed at all and should just be left alone as part of the custom regexp. Does that make sense?

from path-to-regexp.

blakeembrey commented on July 24, 2024

Not really. What do we do with the matches inside the parentheses? The example above would create two matches. What about other types of matching groups that don't have output in the RegExp? How are unbalanced brackets handled? You are welcome to submit a PR if you know how it should work.

Edit: I'd love to see it work, but the reason it does not exist is because it's not clear how/why.

from path-to-regexp.

dizlexik commented on July 24, 2024

I just spent a bit of time attempting to put together a pull request for this but it turns out it's probably going to be really difficult.

I found the regex that needs to be modified to accomplish what I've described and the portion of it that will require modification \$((?:\\\\.|[^\\\\()])+)\$

The problem is JavaScript's lack of recursive regex support so detecting inner parentheses will either mean drastically increasing the length and complexity of PATH_REGEXP to support an arbitrary number of levels of nested parentheses or introducing a new project dependency like XRegExp and utilizing its matchRecursive method to support unlimited levels. I think I could implement the former but it doesn't feel like a great solution. Using XRegExp seems like it could work but the decision to pull in a new dependency for this somewhat-edge case isn't mine to make.

It sure would be nice if this were supported but I can see why it might make more sense, at least for now, to leave it as it is and maybe include a note in the README about this limitation.

By the way I wrote a test for this case while working on my pull request. I'll include it here in case someone wants to move forward with this at some point:

[
  '/:test(\\d+(\\.\\d+)?)',
  null,
  [
    {
      name: 'test',
      prefix: '/',
      delimiter: '/',
      optional: false,
      repeat: false,
      partial: false,
      asterisk: false,
      pattern: '\\d+(\\.\\d+)'
    }
  ],
  [
    ['/123', ['/123', '123']],
    ['/abc', null],
    ['/123/abc', null],
    ['/123.123', ['/123.123', '123.123']],
    ['/123.abc', null]
  ],
  [
    [{ test: 'abc' }, null],
    [{ test: '123' }, '/123'],
    [{ test: '123.123' }, '/123.123'],
    [{ test: '123.abc' }, null]
  ]
]

from path-to-regexp.

blakeembrey commented on July 24, 2024

Yeah, it's tricky. The problem really comes from internal group matches. There's no clearly defined way of how the internal match would interact with the current support for tokens. It's also not clear how things like the reverse path-to-regexp would work either as a result. The clearest thing here is to make them non-matching groups, but that's a little opinionated too because maybe someone actually wants internal matches to be part of the result. The easiest way around it today would be to put the two matches next to each other as separate groups, but I'd love to find the correct way to support it how people want natively.

Edit: The above example is a good case. Right now, keys is based on what you have above - but the above pattern has two many matches and causes a mismatch on results to keys.

from path-to-regexp.

dizlexik commented on July 24, 2024

Yeah I was assuming they would be treated as non-matching somehow. I realize I'm probably coming off as really naive with this request but thanks for humoring me! You've clearly put a lot of thought into this yourself. I'll keep an eye on this to see where it goes, but thanks so much for all your hard work! Aside from this incredibly minor nitpick the library is amazing :)

from path-to-regexp.

blakeembrey commented on July 24, 2024

I wouldn't worry about the implementation, we can always make that work. The harder thing right now is deciding how it would work. Should internal brackets create a matching group? If they do, do we add a token to the list? If we add a token to the list, how do people use the reverse "create a path" from the tokens? I think that's a dead end, so a non-matching group is probably a better idea. In that case, we can replace ( with (?: so it becomes non-matching. After that, the next question is probably - how do we treat the outer "primary" parens? If the internal ones work with (?! and (?=, why wouldn't the outer ones?

I just know the issues around it, not the solutions, sorry 😄 Maybe the best way is just to allow (?:, (?= and (?= moving forward and add something to the tokens to indicate they are non-matching (not a key). Or we just keep it as is and leave those tokens for internal use only.

from path-to-regexp.

dizlexik commented on July 24, 2024

FWIW I'd be perfectly content with the non-matching group solution. The implementation is a little beyond me though since I think there's still the issue of balancing parentheses.

from path-to-regexp.

Woodya commented on July 24, 2024

@blakeembrey are you planning to add support for non-capturing groups?
those are quite integral to our use case. We're evaluating react-router, which is using this package and this is a bit of a blocker for a "smooth" transition.

from path-to-regexp.

blakeembrey commented on July 24, 2024

I haven't any plans, but can take another look at it. I'd also accept a PR that just ignores non-matching groups.

from path-to-regexp.

sorahn commented on July 24, 2024

@plandem when using React Router recently (which just hands it's path to this library) I discovered that this works for /foo and /bar

<Route path="/:_(foo|bar)" component={FooBar} />

And at this point I don't know it's a bug or a feature.

Edit:
Oh I see what happened.
Looking here: https://github.com/pillarjs/path-to-regexp#custom-match-parameters I can see that what I did was create a custom param named _ that only matched that regex. Interesting.

from path-to-regexp.

sergeysova commented on July 24, 2024

Any progress here?

from path-to-regexp.

sergeysova commented on July 24, 2024

I want something like:

/:username{/|/i/:image}

match:

/somename
/somename/i/imagename

don't match:

/somename/asdasd

from path-to-regexp.

blakeembrey commented on July 24, 2024

@sergeysova Why don't you use an array or a regular expression? You can also do /:username/(i/[^/]+)? but I'd advocate for the array as it's easier to understand.

from path-to-regexp.

sergeysova commented on July 24, 2024

@blakeembrey I want have named parameter :image

And more subroutes will add under /:username/

from path-to-regexp.

blakeembrey commented on July 24, 2024

@sergeysova Try an array? ['/:username', '/:username/i/:image']?

from path-to-regexp.

crdrost commented on July 24, 2024

Also if I don't get to this ever, I wanted to mention for anyone reading this thread that unless you are repeating something in parentheses indefinitely, you probably do not need () in your regexp.

Just to break down the cases in this, the first comment starts with the regexp \d+(\.\d+)?. First observe that (xyz)? is the same as (|xyz), so this is \d+(|\.\d+). Now distribute that prefix over the rest, to get \d+|\d+\.\d+.

A more complicated case: if you wanted to match a JSON float the regex for that is apparently: -?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?. Yuck. You can see that it has three groups, two with ? and one with |. But no indefinite repetition of a parenthesized component! So you can go through the 2^3 = 8 cases:

-?0
-?0[eE][+-]?\d+
-?0\.\d+
-?0\.\d+[eE][+-]?\d+
-?[1-9]\d*
-?[1-9]\d*[eE][+-]?\d+
-?[1-9]\d*\.\d+
-?[1-9]\d*\.\d+[eE][+-]?\d+

Putting these together with | yields a regex without ().

A language with support for list comprehensions can do a lot of this for you, e.g. in Python you can convert (abc|def|ghi) to the tuple ("abc", "def", "ghi"), as mentioned above you can convert (abc)? to (|abc) to ("", "abc") as well, and finally you can combine them with the list comprehension:

> print('|'.join(a + b + c + d 
    for a in ("-?",)
    for b in ("0", "[1-9]\d*")
    for c in ("", "\.\d+")
    for d in ("", "[eE][+-]?\d+")))
-?0|-?0[eE][+-]?\d+|-?0\.\d+|-?0\.\d+[eE][+-]?\d+|-?[1-9]\d*|-?[1-9]\d*[eE][+-]?\d+|-?[1-9]\d*\.\d+|-?[1-9]\d*\.\d+[eE][+-]?\d+

from path-to-regexp.

blakeembrey commented on July 24, 2024

You’re welcome to submit a PR, but it’s definitely not just semantics stopping an implementation. Balancing nested matching brackets is a big rewrite of the existing code, if you’re able to do it simply without many changes that’s fantastic and definitely please submit a PR. Currently, I believe the logic needs to be rewritten to support a lexer instead of using a regexp internally.

from path-to-regexp.

crdrost commented on July 24, 2024

Oh yeah definitely. The language of regular expressions is famously not regular.

from path-to-regexp.

MANTENN commented on July 24, 2024

What I found out is that if you are working with groups utilize an another set of parenthesis after. This of course will not work with nested grouping.
For example, this pattern /([F][S]|[R][A])(\d{5}) will match /FS12345 or RA12345, and it will not match FA12345.

Or use (\d+\.*\d+), and then split inside the component on the dot

from path-to-regexp.

garrettmaring commented on July 24, 2024

I believe that there is a good use case for having more control over the group: base64 validation:

^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$

This cannot be done with the current restrictions on custom match groups.

from path-to-regexp.

crdrost commented on July 24, 2024

@garrettmaring yeah that {4} followed by that * does indeed seem difficult or impossible. I would be surprised if / worked in the URL since it's also a path separator, but there is always RFC4648 base64url if you don't have that.

However, I would remind you and anyone else reading that these regexes are not in my view primarily security-related and so "validation" is a little bit of a dicey subject. Like, to my mind the point of the regex is to say "this route looks like _____". The basic problem is that you try to write an API which has both GET /widgets/:id and also GET /widgets/report and then run into the problem that the API is not aware that report is anything other than an :id substitution, so you write /widgets/:id([0-9]+) to make it clear that that's a number, say.

Base64 on the other hand is so permissive that it makes this very tricky because it doesn't rule out so many other routes -- so if :id is Base64 then you see a problem with the endpoint GET /widgets/smallest because smallest is the Base64 encoding of the byte string b2 66 a5 95 eb 2d. So once you start using base 64 IDs I would really start to recommend rewriting to GET /widgets/by-id/:id so that you have a separate namespace here to avoid the route conflicts, and then you don't need the regexes anymore.

So I think you're 100% right that there is a thing you cannot do here, but even if you could do it, you would not buy much over just the regex [-A-Za-z0-9_]*=?=? to disambiguate the routes (maybe your other routes contain a ~ or something) followed by checking req.params.id.length % 4 == 0 to get the rest of the validation in the actual client code.

from path-to-regexp.

sorahn commented on July 24, 2024

@crdrost At that point, I think you'd just be looking for some application logic to do that for you right?

GET /widgets/:thing then in the handler for that, if (thing === 'reports') { // do it; return; } else { // handle thing as id (validate guid, etc) },

from path-to-regexp.

crdrost commented on July 24, 2024

Yeah, I'd say so. What's really at stake here is that URL paths come from a filesystem world but are a generic text format being used to transmit data in a very free-form format.

If we were to instead take a more structured view towards API development, where we have valid API "midpoints" as well as the endpoints when the full URL is constructed, we might see this better. So each midpoint instead comes from some sort of philosophy, "the API structures we care about are Records (well-known properties which I can access, GET /my-record/property-name), Lists (collections whose items I can access under numeric keys, so GET /my-list/123 for the 124th element), Dictionaries (collections whose items I can access under string keys, so GET /my-dict/my-key), and Tagged Unions or Enums (a structure which can be one of many well-known kinds and you have to assert what kind it is in order to use it, GET /my-union-item/as-widget failing because it was actually a foobar, but GET /my-union-item/as-foobar succeeding and granting me access to some internal foobar-style properties)."

Most people seem to get by without explicit lists or tagged unions; lists are in some sense subsumed by dictionaries but tagged unions really have their own genuine power to them.

The error that is being described above is an error of an API midpoint which does not know whether it is a Dictionary or a Record and thus is trying to be both, and the proposed solution is to make the midpoint firmly into a Record and add a property by-id which maps to the desired Dictionary.

from path-to-regexp.

garrettmaring commented on July 24, 2024

@crdrost delta! I agree that validation of that sort doesn't make sense in the URL as a regex. Thanks for the reply 👍

from path-to-regexp.

blakeembrey commented on July 24, 2024

Added support with 1327699, but opted to not transform anything people write. It'll instead throw an error and prompt you to use the non-capturing group directly. So (?: will be allowed to be nested in the next major release.

from path-to-regexp.

Parentheses within custom match regexp about path-to-regexp HOT 28 CLOSED

Comments (28)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent