Comments (28)
damn...I needed something like you requested: /(list|edit/\d+) and can't solve it :(
from path-to-regexp.
I am surprised that the primary hangup here is the semantics. To me the semantics are obvious:
- Internal groups never propagate up. That is, any containing
(___)
is rewritten to(?:___)
. Anything else is likely to be unbelievably confusing to everyone involved and there is provably no loss in generality.- Suppose you are a consumer of this library and you do want to capture something into its own group, your logic will be much easier to understand if you would be to use path-to-regexp with a path that looks like
':param(' + myRegexString+ ')'
and then afterwords runnew RegExp(myRegexString).exec(results.param)
to extract the group. Any 'convenience' that we try to provide beyond this is going to confuse the readers of their code -- "wait what does THAT do?" -- when the alternative is to force them to write two more lines of code whose meanings are obvious to any reader. Like the only downside I can see is that TypeScript etc will not be able to detect that the match results are notnull
, but it's got various "I know it's not null I swear!" constructs so I'm not concerned.
- Suppose you are a consumer of this library and you do want to capture something into its own group, your logic will be much easier to understand if you would be to use path-to-regexp with a path that looks like
- The outer parens are part of the URL grammar, not the regexp. Writing
:param(?=abc)
throws a SyntaxError when the child RegExp is compiled because?
cannot be the first character of any valid RegExp, likewise with?!
and?:
.
I am under a bit of a deadline crunch but I might be able to submit a PR which parses JS regexp along these lines, if they're acceptable to you?
from path-to-regexp.
You can't define groups within groups at this time as there's no defined way this should act. If you can come up with a proposal for how it behaves, we can look at implementing support for it.
from path-to-regexp.
I don't think there's any reason for groups within groups to have any special behavior outside of their usefulness for doing things like what I described in my original comment or, for another contrived example, like /:foo(b(ar|az))
to match /bar
or /baz
, etc.
I don't think inner groups should be parsed at all and should just be left alone as part of the custom regexp. Does that make sense?
from path-to-regexp.
Not really. What do we do with the matches inside the parentheses? The example above would create two matches. What about other types of matching groups that don't have output in the RegExp? How are unbalanced brackets handled? You are welcome to submit a PR if you know how it should work.
Edit: I'd love to see it work, but the reason it does not exist is because it's not clear how/why.
from path-to-regexp.
I just spent a bit of time attempting to put together a pull request for this but it turns out it's probably going to be really difficult.
I found the regex that needs to be modified to accomplish what I've described and the portion of it that will require modification \\(((?:\\\\.|[^\\\\()])+)\\)
The problem is JavaScript's lack of recursive regex support so detecting inner parentheses will either mean drastically increasing the length and complexity of PATH_REGEXP
to support an arbitrary number of levels of nested parentheses or introducing a new project dependency like XRegExp and utilizing its matchRecursive method to support unlimited levels. I think I could implement the former but it doesn't feel like a great solution. Using XRegExp seems like it could work but the decision to pull in a new dependency for this somewhat-edge case isn't mine to make.
It sure would be nice if this were supported but I can see why it might make more sense, at least for now, to leave it as it is and maybe include a note in the README about this limitation.
By the way I wrote a test for this case while working on my pull request. I'll include it here in case someone wants to move forward with this at some point:
[
'/:test(\\d+(\\.\\d+)?)',
null,
[
{
name: 'test',
prefix: '/',
delimiter: '/',
optional: false,
repeat: false,
partial: false,
asterisk: false,
pattern: '\\d+(\\.\\d+)'
}
],
[
['/123', ['/123', '123']],
['/abc', null],
['/123/abc', null],
['/123.123', ['/123.123', '123.123']],
['/123.abc', null]
],
[
[{ test: 'abc' }, null],
[{ test: '123' }, '/123'],
[{ test: '123.123' }, '/123.123'],
[{ test: '123.abc' }, null]
]
]
from path-to-regexp.
Yeah, it's tricky. The problem really comes from internal group matches. There's no clearly defined way of how the internal match would interact with the current support for tokens. It's also not clear how things like the reverse path-to-regexp
would work either as a result. The clearest thing here is to make them non-matching groups, but that's a little opinionated too because maybe someone actually wants internal matches to be part of the result. The easiest way around it today would be to put the two matches next to each other as separate groups, but I'd love to find the correct way to support it how people want natively.
Edit: The above example is a good case. Right now, keys
is based on what you have above - but the above pattern has two many matches and causes a mismatch on results to keys.
from path-to-regexp.
Yeah I was assuming they would be treated as non-matching somehow. I realize I'm probably coming off as really naive with this request but thanks for humoring me! You've clearly put a lot of thought into this yourself. I'll keep an eye on this to see where it goes, but thanks so much for all your hard work! Aside from this incredibly minor nitpick the library is amazing :)
from path-to-regexp.
I wouldn't worry about the implementation, we can always make that work. The harder thing right now is deciding how it would work. Should internal brackets create a matching group? If they do, do we add a token to the list? If we add a token to the list, how do people use the reverse "create a path" from the tokens? I think that's a dead end, so a non-matching group is probably a better idea. In that case, we can replace (
with (?:
so it becomes non-matching. After that, the next question is probably - how do we treat the outer "primary" parens? If the internal ones work with (?!
and (?=
, why wouldn't the outer ones?
I just know the issues around it, not the solutions, sorry 😄 Maybe the best way is just to allow (?:
, (?=
and (?=
moving forward and add something to the tokens to indicate they are non-matching (not a key). Or we just keep it as is and leave those tokens for internal use only.
from path-to-regexp.
FWIW I'd be perfectly content with the non-matching group solution. The implementation is a little beyond me though since I think there's still the issue of balancing parentheses.
from path-to-regexp.
@blakeembrey are you planning to add support for non-capturing groups?
those are quite integral to our use case. We're evaluating react-router, which is using this package and this is a bit of a blocker for a "smooth" transition.
from path-to-regexp.
I haven't any plans, but can take another look at it. I'd also accept a PR that just ignores non-matching groups.
from path-to-regexp.
@plandem when using React Router recently (which just hands it's path to this library) I discovered that this works for /foo and /bar
<Route path="/:_(foo|bar)" component={FooBar} />
And at this point I don't know it's a bug or a feature.
Edit:
Oh I see what happened.
Looking here: https://github.com/pillarjs/path-to-regexp#custom-match-parameters I can see that what I did was create a custom param named _
that only matched that regex. Interesting.
from path-to-regexp.
Any progress here?
from path-to-regexp.
I want something like:
/:username{/|/i/:image}
match:
/somename
/somename/i/imagename
don't match:
/somename/asdasd
from path-to-regexp.
@sergeysova Why don't you use an array or a regular expression? You can also do /:username/(i/[^/]+)?
but I'd advocate for the array as it's easier to understand.
from path-to-regexp.
@blakeembrey I want have named parameter :image
And more subroutes will add under /:username/
from path-to-regexp.
@sergeysova Try an array? ['/:username', '/:username/i/:image']
?
from path-to-regexp.
Also if I don't get to this ever, I wanted to mention for anyone reading this thread that unless you are repeating something in parentheses indefinitely, you probably do not need ()
in your regexp.
Just to break down the cases in this, the first comment starts with the regexp \d+(\.\d+)?
. First observe that (xyz)?
is the same as (|xyz)
, so this is \d+(|\.\d+)
. Now distribute that prefix over the rest, to get \d+|\d+\.\d+
.
A more complicated case: if you wanted to match a JSON float the regex for that is apparently: -?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?
. Yuck. You can see that it has three groups, two with ?
and one with |
. But no indefinite repetition of a parenthesized component! So you can go through the 2^3 = 8 cases:
-?0
-?0[eE][+-]?\d+
-?0\.\d+
-?0\.\d+[eE][+-]?\d+
-?[1-9]\d*
-?[1-9]\d*[eE][+-]?\d+
-?[1-9]\d*\.\d+
-?[1-9]\d*\.\d+[eE][+-]?\d+
Putting these together with |
yields a regex without ()
.
A language with support for list comprehensions can do a lot of this for you, e.g. in Python you can convert (abc|def|ghi)
to the tuple ("abc", "def", "ghi")
, as mentioned above you can convert (abc)?
to (|abc)
to ("", "abc")
as well, and finally you can combine them with the list comprehension:
> print('|'.join(a + b + c + d
for a in ("-?",)
for b in ("0", "[1-9]\d*")
for c in ("", "\.\d+")
for d in ("", "[eE][+-]?\d+")))
-?0|-?0[eE][+-]?\d+|-?0\.\d+|-?0\.\d+[eE][+-]?\d+|-?[1-9]\d*|-?[1-9]\d*[eE][+-]?\d+|-?[1-9]\d*\.\d+|-?[1-9]\d*\.\d+[eE][+-]?\d+
from path-to-regexp.
You’re welcome to submit a PR, but it’s definitely not just semantics stopping an implementation. Balancing nested matching brackets is a big rewrite of the existing code, if you’re able to do it simply without many changes that’s fantastic and definitely please submit a PR. Currently, I believe the logic needs to be rewritten to support a lexer instead of using a regexp internally.
from path-to-regexp.
Oh yeah definitely. The language of regular expressions is famously not regular.
from path-to-regexp.
What I found out is that if you are working with groups utilize an another set of parenthesis after. This of course will not work with nested grouping.
For example, this pattern /([F][S]|[R][A])(\d{5})
will match /FS12345
or RA12345
, and it will not match FA12345
.
Or use (\d+\.*\d+)
, and then split inside the component on the dot
from path-to-regexp.
I believe that there is a good use case for having more control over the group: base64 validation:
^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$
This cannot be done with the current restrictions on custom match groups.
from path-to-regexp.
@garrettmaring yeah that {4}
followed by that *
does indeed seem difficult or impossible. I would be surprised if /
worked in the URL since it's also a path separator, but there is always RFC4648 base64url if you don't have that.
However, I would remind you and anyone else reading that these regexes are not in my view primarily security-related and so "validation" is a little bit of a dicey subject. Like, to my mind the point of the regex is to say "this route looks like _____". The basic problem is that you try to write an API which has both GET /widgets/:id
and also GET /widgets/report
and then run into the problem that the API is not aware that report
is anything other than an :id
substitution, so you write /widgets/:id([0-9]+)
to make it clear that that's a number, say.
Base64 on the other hand is so permissive that it makes this very tricky because it doesn't rule out so many other routes -- so if :id
is Base64 then you see a problem with the endpoint GET /widgets/smallest
because smallest
is the Base64 encoding of the byte string b2 66 a5 95 eb 2d
. So once you start using base 64 IDs I would really start to recommend rewriting to GET /widgets/by-id/:id
so that you have a separate namespace here to avoid the route conflicts, and then you don't need the regexes anymore.
So I think you're 100% right that there is a thing you cannot do here, but even if you could do it, you would not buy much over just the regex [-A-Za-z0-9_]*=?=?
to disambiguate the routes (maybe your other routes contain a ~
or something) followed by checking req.params.id.length % 4 == 0
to get the rest of the validation in the actual client code.
from path-to-regexp.
@crdrost At that point, I think you'd just be looking for some application logic to do that for you right?
GET /widgets/:thing
then in the handler for that, if (thing === 'reports') { // do it; return; } else { // handle thing as id (validate guid, etc) }
,
from path-to-regexp.
Yeah, I'd say so. What's really at stake here is that URL paths come from a filesystem world but are a generic text format being used to transmit data in a very free-form format.
If we were to instead take a more structured view towards API development, where we have valid API "midpoints" as well as the endpoints when the full URL is constructed, we might see this better. So each midpoint instead comes from some sort of philosophy, "the API structures we care about are Records (well-known properties which I can access, GET /my-record/property-name
), Lists (collections whose items I can access under numeric keys, so GET /my-list/123
for the 124th element), Dictionaries (collections whose items I can access under string keys, so GET /my-dict/my-key
), and Tagged Unions or Enums (a structure which can be one of many well-known kinds and you have to assert what kind it is in order to use it, GET /my-union-item/as-widget
failing because it was actually a foobar
, but GET /my-union-item/as-foobar
succeeding and granting me access to some internal foobar
-style properties)."
Most people seem to get by without explicit lists or tagged unions; lists are in some sense subsumed by dictionaries but tagged unions really have their own genuine power to them.
The error that is being described above is an error of an API midpoint which does not know whether it is a Dictionary or a Record and thus is trying to be both, and the proposed solution is to make the midpoint firmly into a Record and add a property by-id
which maps to the desired Dictionary.
from path-to-regexp.
@crdrost delta! I agree that validation of that sort doesn't make sense in the URL as a regex. Thanks for the reply 👍
from path-to-regexp.
Added support with 1327699, but opted to not transform anything people write. It'll instead throw an error and prompt you to use the non-capturing group directly. So (?:
will be allowed to be nested in the next major release.
from path-to-regexp.
Related Issues (20)
- Regression > 0.1.7 HOT 4
- regex match fail HOT 3
- Double decoding path parameters HOT 17
- Publish to NPM HOT 1
- Next dynamic route with spread (...) throws 'Unexpected MODIFIER at 70, expected END' HOT 2
- end option does not work as expected HOT 2
- When parsed to tokens, why are backslashes stripped from pattern? HOT 2
- Invalid regex patterns not properly detected. HOT 2
- Another dumb question: How to run tests? HOT 2
- Erroneous test result. HOT 3
- `?` in optional capture group isn't connected to group in output - named param HOT 11
- SyntaxError: Invalid regular expression: invalid group specifier name HOT 1
- The validate config in options is not work in compile function. HOT 1
- throw new TypeError(`Unexpected ${nextType} at ${index}, expected ${type}`) HOT 3
- How about match an empty array? HOT 2
- Regex for non-negative integers only matches two or more digits HOT 1
- Return the matched path/regex
- Escaping the question mark for a query string throws and error HOT 2
- Checking if a path is a subset of another path HOT 1
- params are being joined without `/` HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from path-to-regexp.