

anacronw commented on August 30, 2024

I found the rationale behind it; it's the default masking behavior for relation modifiers:

http://www.loc.gov/standards/sru/cql/contextSets/theCqlContextSet.html#relmods

According to this, I should specify my query as:

file =/cql.unmasked "some\\path\\in\\windows\.exe"

Despite this, I don't see how it is useful for the parser itself to process the escaping of masking characters. By consuming the backslashes, the parser is basically saying that index = "my \* pony" is the same as index = "my * pony", which is semantically untrue. The interpretation of the original query should retain the fact that the former means a literal asterisk.

I still believe that the line should be removed. I will submit a PR if requested.

from cql-java.

adamdickmeiss commented on August 30, 2024

Backslash has meaning both in CQL and in regular expressions, and that's part of the problem. Your initial CQL term is invalid: "\." has undefined semantics. Instead use

file = some\\path\\in\\windows\\.exe


anacronw commented on August 30, 2024

I don't think the expression you provided means the same thing. I think I need the expression to be, effectively, in "unmasked" mode, with the string passed through as-is.

However, what I don't understand is why this parser is interpreting the masking characters in the first place. Can you elaborate on that? I believe that should be the function of whatever is using the parser.

If the cql-java parser is interpreting the masking characters, then it's effectively ignoring the difference between, for example, a literal asterisk and the masked version of the asterisk. After tokenization, every user of this library would receive the same output.


adamdickmeiss commented on August 30, 2024

  1. unmasked: It's not supported by cql-java.
  2. cql-java preserves the backslash in its output term if it is followed by one of * ? ^ \. If followed by a double quote, it terminates. In all other cases it's eaten. Hence, for all those characters that DO have meaning in CQL, it is preserved, and a user can look at the resulting term and judge whether it's masked, anchored, etc. If the backslash is followed by any other character (including the period .), it is removed. The spec even calls that INVALID. This is why you should use a double backslash plus dot for your regular expression!
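The rule in point 2 can be sketched roughly as follows. This is a hypothetical illustration, not cql-java's actual lexer code. The class and method names are invented, and the double-quote case is deliberately not modelled, so `\"` is treated like any other character here.

```java
/**
 * Hypothetical sketch of the backslash rule described in point 2:
 * keep the backslash before a CQL masking character (* ? ^ \),
 * drop it before anything else. Double-quote handling is NOT modelled.
 */
public class MaskPreservingUnescape {

    // Characters that have meaning in CQL masking.
    static final String MASK_CHARS = "*?^\\";

    static String unescape(String term) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < term.length()) {
            char c = term.charAt(i);
            if (c == '\\' && i + 1 < term.length()) {
                char next = term.charAt(i + 1);
                if (MASK_CHARS.indexOf(next) >= 0) {
                    // Masking character: keep "\x" so consumers can tell it was escaped.
                    out.append(c).append(next);
                } else {
                    // Any other character (e.g. '.'): the backslash is eaten.
                    out.append(next);
                }
                i += 2;
            } else {
                // Ordinary character, or a trailing lone backslash: copy as-is.
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("windows\\.exe"));   // backslash eaten: windows.exe
        System.out.println(unescape("windows\\\\.exe")); // double backslash kept: windows\\.exe
        System.out.println(unescape("my \\* pony"));     // escaped asterisk kept: my \* pony
    }
}
```

Under this rule `windows\.exe` silently becomes `windows.exe`, while `windows\\.exe` keeps both backslashes. This is why the double-backslash form survives lexing.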

The CQL parser in YAZ differs in this respect: it always preserves backslashes. I like that better, and it seems you do too. It certainly makes it possible to support the unmasked option (the lexer does NOT need to care about a mode or similar, and everything is preserved).

indexdata/yaz@8521c0a

In the commit above you'll see that the regression results 06/03.xcql and 06/06.xcql were changed. Until that point they were identical.


anacronw commented on August 30, 2024

Ah, you're right, it does say it's an error in my case.

I still disagree with one point:

In all other cases it's eaten. Hence for all those characters that DO have meaning in CQL it is preserved and a user can look at the resulting term and judge whether masked/anchored etc.

They have meaning in CQL in masked mode, of course, I agree. However, why does that mean you must eat the backslash? As I said before, by eating the backslash before the character, you lose, for example, the difference between a literal asterisk and an asterisk used as a masking character.

I would say that by eating the backslash in that case, the user CANNOT judge whether it's masked or not, because whether I feed it to cql-java escaped or not, the lexer outputs the same thing.

Edit: Actually, I lost track of the current behavior and confused myself. I understand the point now: eating the backslash before all non-masking characters is essentially a matter of preference, since that case is undefined (or rather, invalid) by the spec. The YAZ project you mentioned does one thing and cql-java does another. Both are technically valid, but I prefer retaining the backslash in my case.

Thanks for working w/ me on this


adamdickmeiss commented on August 30, 2024

The first revisions of CQL only had the "masked" mode. Now that we have unmasked, regular, etc., I think it's time to update the spec. Of course, for regular expressions we can't have anything being lost (nor for any other masking rule, for that matter). So for that reason, IMHO, the lexer should never remove anything.

I tried updating cql-java a bit. It's not a huge change; a few tests need updating here and there. And clearly, people could not have relied on the behavior for a backslash followed by a non-masking character, because that was undefined or disallowed.


MikeTaylor commented on August 30, 2024

Sorry, I am late to this party.

Let's be clear on what's what here. Your problem is one of lexing strings, which is something the CQL parser does. That is a completely separate issue from what the spec calls "masking" (i.e. pattern matching), which is done by the search engine. So using a relation modifier, as in =/unmasked, makes no difference at all to how the string is lexed: it's just an instruction passed to the search engine, so that it knows how to interpret the query term.
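To illustrate the split between lexing and masking, here is a hypothetical search-engine-side sketch. It assumes the term arrives from the parser with its backslashes intact and turns a masked CQL term into a `java.util.regex` pattern. Per the CQL masking rules, `*` matches any sequence, `?` matches a single character, and `\x` means a literal `x`. The class and method names are invented, and `^` anchoring is not handled.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical search-engine-side sketch: interpret CQL masking on a term
 * that still carries its backslashes. '^' anchoring is not handled here.
 */
public class MaskedTermMatcher {

    static String toRegex(String term) {
        StringBuilder re = new StringBuilder();
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (c == '\\' && i + 1 < term.length()) {
                // Escaped character: match it literally.
                appendLiteral(re, term.charAt(++i));
            } else if (c == '*') {
                re.append(".*");   // zero or more characters
            } else if (c == '?') {
                re.append('.');    // exactly one character
            } else {
                appendLiteral(re, c);
            }
        }
        return re.toString();
    }

    // Letters and digits are safe as-is; java.util.regex allows a backslash
    // before any non-alphabetic character, so escape everything else.
    private static void appendLiteral(StringBuilder re, char c) {
        if (!Character.isLetterOrDigit(c)) {
            re.append('\\');
        }
        re.append(c);
    }

    static boolean matches(String term, String value) {
        return Pattern.matches(toRegex(term), value);
    }

    public static void main(String[] args) {
        System.out.println(matches("my * pony", "my little pony"));   // true: * is a wildcard
        System.out.println(matches("my \\* pony", "my little pony")); // false: \* is a literal asterisk
        System.out.println(matches("my \\* pony", "my * pony"));      // true
    }
}
```

The search engine can only tell `my \* pony` and `my * pony` apart if the lexer hands both through unchanged.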

The only legitimate use for backslashes in a CQL string is to prevent a double-quote from terminating a string, as in "I said \"Hello\".". In every other case, the backslash should be a literal, left for the search engine to interpret. This is what the specification is trying to say, albeit rather clumsily.
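A minimal sketch of the rule described above might look like this. It is hypothetical, not the CQL-Java lexer, and the names are invented. Inside a quoted string, a backslash only stops the next double quote from ending the string, and every backslash is copied into the term for the search engine. As an assumption, the sketch also keeps the backslash in front of an escaped quote, in line with the "the lexer should never remove anything" view. Whether to strip that particular backslash is a design choice the thread does not settle.

```java
/**
 * Hypothetical sketch: lex a quoted CQL string, letting a backslash
 * escape the next character (so \" does not terminate the string)
 * while copying every backslash through unchanged.
 */
public class PreservingQuotedLexer {

    /** Reads the quoted string whose opening '"' is at input[start]. */
    static String readQuoted(String input, int start) {
        if (input.charAt(start) != '"') {
            throw new IllegalArgumentException("expected opening double quote");
        }
        StringBuilder term = new StringBuilder();
        int i = start + 1;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '"') {
                return term.toString(); // unescaped quote ends the string
            }
            if (c == '\\' && i + 1 < input.length()) {
                // Keep the backslash AND the next character, whatever it is.
                term.append(c).append(input.charAt(i + 1));
                i += 2;
            } else {
                term.append(c);
                i++;
            }
        }
        throw new IllegalArgumentException("unterminated string");
    }

    public static void main(String[] args) {
        // CQL input: "I said \"Hello\"."  -> term: I said \"Hello\".
        System.out.println(readQuoted("\"I said \\\"Hello\\\".\"", 0));
        // CQL input: "windows\.exe"       -> term: windows\.exe (backslash kept)
        System.out.println(readQuoted("\"windows\\.exe\"", 0));
    }
}
```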

The fact that CQL-Java tries to be "helpful" by special-casing the masking characters is very _un_helpful, and only causes confusion. (BTW, that was almost certainly my own code originally, so I can be as rude about it as I like :-) )

My conclusion: the fix for this bug is to remove the special-casing in the CQL-Java lexer.


anacronw commented on August 30, 2024

I can attempt a pull request. I have it working locally, but I don't understand the rationale behind the regression tests that enforced the current behavior, so I haven't submitted it yet.
