couchbaselabs / gojsonsm
Go implementation of my JSONSM algorithm.
For this expression:
(county = "United States" OR country = "Canada" AND type="brewery") OR (type="beer" AND DATE(updated) >= DATE("2019-01-01"))
things work correctly.
However, when extra parentheses are added:
((county = "United States" OR country = "Canada") AND type="brewery") OR (type="beer" AND DATE(updated) >= DATE("2019-01-01"))
the logic is broken.
We need to investigate why the AST is different.
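As a point of reference, the two expressions are not logically equivalent, so some difference in the AST is expected. The sketch below assumes AND binds tighter than OR (as in N1QL, and as with && and || in Go). It models only the country/type clause, and the helper names are hypothetical, not gojsonsm code.

```go
package main

import "fmt"

// withoutParens mirrors: us OR canada AND brewery.
// AND binds tighter, so this is us OR (canada AND brewery).
func withoutParens(us, canada, brewery bool) bool {
	return us || canada && brewery
}

// withParens mirrors: (us OR canada) AND brewery.
func withParens(us, canada, brewery bool) bool {
	return (us || canada) && brewery
}

func main() {
	// A United States document that is not a brewery separates the two forms.
	fmt.Println(withoutParens(true, false, false)) // true
	fmt.Println(withParens(true, false, false))    // false
}
```

A document like this one is a useful test input, since a correct parser must return different results for the two expressions.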
The regex has an extra escape character
When the top level ExecNode contains Afters and does not get run, it could lead to incorrect results.
For example, something simple like:
field0 < field1
would not be executed and the result would be incorrect.
The XDCR team at Couchbase has requested that it be possible to have multiple roots specified (i.e., provide matching against explicitly, separately defined documents).
https://issues.couchbase.com/browse/MB-34363 was discovered when running a date comparison on invalid formats.
JSONSM should support performing filtering based on a regexp string match. This is an issue to track ownership, requirements and implementation.
Loops lack the logic that would let them skip processing entirely when the bintree already indicates that the loop's outcome will not matter to the overall result of the expression.
These can now be implemented by using the expression support which was recently introduced.
Right now a NullValue FastVal outputs 0 or 0.0 when its AsUint()/AsInt()/AsFloat64() is called.
Since 0 is a commonly used value, this could easily lead to an incorrect match. Consider the following test case, which incorrectly matches:
fe = &FilterExpression{}
err = parser.ParseString("fieldpath.path IS NOT NULL", fe)
...
userData := map[string]interface{}{
"fieldpath": map[string]interface{}{
"path": 0,
},
}
udMarsh, _ := json.Marshal(userData)
match, err := m.Match(udMarsh)
...
fieldpath.path is not null in this case, but because the nil value that the parser inserts on the right side of EqualsExpr equates to 0 when matched against the LHS's 0 (int), the evaluation produces an incorrect result.
The simplest way to do it IMO is to make NullValue output MinInt64/MaxUint64/MaxFloat64 when AsInt()/AsUint()/AsFloat() is called. Chances are, those values are unlikely to exist in real world scenarios.
Another option is to do more complicated checks in CompareOp, which is more correct, but that will break the pretty symmetric function of the current code.
Thoughts, @brett19 ?
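To make the second option concrete, here is a minimal sketch of a comparison that checks a type tag before comparing numbers. The kind, value, equalNaive and equalTyped names are hypothetical stand-ins and do not reflect gojsonsm's actual FastVal or CompareOp code.

```go
package main

import "fmt"

// kind is a hypothetical type tag; the real FastVal has its own.
type kind int

const (
	kindNull kind = iota
	kindInt
)

// value is a stand-in for a FastVal holding either null or an int64.
type value struct {
	k kind
	i int64
}

// equalNaive ignores the tag, so null (stored as 0) equals the integer 0.
func equalNaive(a, b value) bool {
	return a.i == b.i
}

// equalTyped checks the tag first, so null only ever equals null.
func equalTyped(a, b value) bool {
	if a.k == kindNull || b.k == kindNull {
		return a.k == b.k
	}
	return a.i == b.i
}

func main() {
	zero := value{k: kindInt, i: 0}
	null := value{k: kindNull}
	fmt.Println(equalNaive(zero, null)) // true: the false match described above
	fmt.Println(equalTyped(zero, null)) // false
}
```

Unlike the sentinel approach, this does not depend on MinInt64 or MaxUint64 never appearing in real documents, at the cost of an extra branch per comparison.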
N1QL supports XOR... and it'll be nice to have in case someone actually needs it
With the incorrect recursive grammar, the parser cannot hit a base case when the syntax is typed incorrectly.
A few odd parenthesis parsing cases:
This is valid:
(country="United States" OR country="Canada") AND type="brewery"
But this is invalid:
((country="United States" OR country="Canada") AND type="brewery")
due to potential recapturing of the AND clause?
One feature that would be good to have is time comparison in the matcher.
The thinking behind this is to have a new type of FastVal called "TimeValue", which will hold a Golang Time struct; the time package provides the comparators Before, After, Equal, etc. We'll be sticking with the RFC-3339 format that is used in the Golang time library, since N1QL uses ISO-8601, but Golang doesn't directly support ISO-8601.
The gap here is that a lot of documents may have values of the form "YYYY-MM-DD", and we can't do matching on those. Perhaps we can address that in another PR/issue. Is there a good way to address this at this point, though?
For the corner cases where the two sides of a comparison have different granularity, we will simply pass the values in and let the time library do the comparison. For example:
If the user checks whether a field is equal to the time "2018-11-21T00:01:02Z", but the document field has the actual time value "2018-11-21T00:01:02.03Z", the Equal() operator will return false.
The workaround is for the user to specify a range that covers that specific time in the document, as we don't want to spend resources guessing the user's intent.
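The behaviour described above can be checked directly against Go's time package. This is a minimal sketch using only the standard library; inRange and mustParse are hypothetical helpers, not part of gojsonsm.

```go
package main

import (
	"fmt"
	"time"
)

// inRange reports whether t lies in the half-open interval [start, end).
func inRange(t, start, end time.Time) bool {
	return !t.Before(start) && t.Before(end)
}

// mustParse parses an RFC 3339 timestamp and panics on malformed input.
func mustParse(s string) time.Time {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	want := mustParse("2018-11-21T00:01:02Z")
	doc := mustParse("2018-11-21T00:01:02.03Z")

	fmt.Println(doc.Equal(want))                           // false: differs by 30ms
	fmt.Println(inRange(doc, want, want.Add(time.Second))) // true: range workaround

	// A date-only value is rejected by the RFC 3339 layout,
	// but parses when an explicit date-only layout is supplied.
	_, err := time.Parse(time.RFC3339, "2018-11-21")
	fmt.Println(err != nil) // true
	day, err := time.Parse("2006-01-02", "2018-11-21")
	fmt.Println(day.Format(time.RFC3339), err)
}
```

One possible way to cover the "YYYY-MM-DD" case would be to try a short list of layouts in order, though that is a design choice the issue leaves open.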
There was a request that we rewrite the grammar into something more cross-platform so that it can be shared among the various implementations of JSONSM.
We can't expect users to enter the syntax perfectly w.r.t. whitespace. The parser should be smart enough to parse an expression regardless of its whitespace.
When an integer fastVal is used to compare against a float, and the float that is being compared to can overflow an int64, the comparison result is more than likely to be incorrect.
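One way to avoid the overflow is to range-check the float before converting it. The sketch below is an assumption about how such a comparison could be written, not the library's code; compareIntFloat and two63 are hypothetical names.

```go
package main

import (
	"fmt"
	"math"
)

// two63 is 2^63, the smallest float64 above every int64 value.
const two63 = 9223372036854775808.0

// compareIntFloat returns -1, 0 or 1 when i is less than, equal to or
// greater than f. ok is false when f is NaN and no ordering exists.
func compareIntFloat(i int64, f float64) (cmp int, ok bool) {
	if math.IsNaN(f) {
		return 0, false
	}
	if f >= two63 {
		return -1, true // f is above every int64
	}
	if f < -two63 {
		return 1, true // f is below every int64
	}
	whole := math.Trunc(f)
	t := int64(whole) // safe: -2^63 <= whole < 2^63
	switch {
	case i < t:
		return -1, true
	case i > t:
		return 1, true
	}
	// Integer parts agree, so the fractional part decides.
	switch frac := f - whole; {
	case frac > 0:
		return -1, true
	case frac < 0:
		return 1, true
	}
	return 0, true
}

func main() {
	i := int64(math.MaxInt64)
	fmt.Println(float64(i) == two63)       // true: converting the int rounds up
	fmt.Println(compareIntFloat(i, two63)) // -1 true
}
```

Comparing in the integer domain after the range check also avoids the precision loss that occurs when a large int64 is converted to float64.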
Given this JSON document:
{
"foo": [
{
"bar": [1, 2, 3],
"zot" : 2
}
]
}
The following match expression fails because zot appears in the JSON after bar, and the code that defers loop evaluation only looks for fields rooted at $doc (ignores fields rooted at outer loops).
any $f in $doc.foo {
any $b in $f.bar {
$f.zot == $b
}
}
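For testing, the expected result can be computed independently of key order with encoding/json. The following is a reference evaluation of the expression above, not the streaming matcher; the doc type and anyZotInBar are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// doc mirrors only the fields that the expression touches.
type doc struct {
	Foo []struct {
		Bar []float64 `json:"bar"`
		Zot float64   `json:"zot"`
	} `json:"foo"`
}

// anyZotInBar evaluates:
// any $f in $doc.foo { any $b in $f.bar { $f.zot == $b } }
func anyZotInBar(raw []byte) (bool, error) {
	var d doc
	if err := json.Unmarshal(raw, &d); err != nil {
		return false, err
	}
	for _, f := range d.Foo {
		for _, b := range f.Bar {
			if f.Zot == b {
				return true, nil
			}
		}
	}
	return false, nil
}

func main() {
	zotLast := []byte(`{"foo":[{"bar":[1,2,3],"zot":2}]}`)
	zotFirst := []byte(`{"foo":[{"zot":2,"bar":[1,2,3]}]}`)
	fmt.Println(anyZotInBar(zotLast))  // true <nil>
	fmt.Println(anyZotInBar(zotFirst)) // true <nil>
}
```

A matcher test could feed both orderings to the matcher and require that its result agrees with this reference.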
Matcher has support for mathRound. This issue tracks the implementation of functions, and specifically the ROUND() function, for simpleParser. The goal is to set it up so that future function implementations are painless.
There's a bug where the parser doesn't handle multi-token fields well, though it was supposed to.
When a numeric FastVal is created with JsonStringValue, it's unable to correctly execute AsUint() or AsInt(). This could lead to failed numerical comparisons.
Part of the SimpleParser's multi-token loop is incorrect: it should retry the whole loop if something is marked invalid.
The library should support providing an expression such as (tags[1] == "frank").
I think it might make sense to support comparing objects and/or arrays using the Equals/NotEquals operators. For objects, this would come in the form of comparison that the keys and values match (but not necessarily the ordering), and for arrays that the elements and ordering match. I think this might actually be possible to implement today using a compound operator which decomposes an array or object into a set of EQUALs checks (and maybe something to confirm the length of the object/array matching).
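The intended semantics (object keys compared without regard to order, array elements compared in order) match what reflect.DeepEqual gives on decoded JSON. The sketch below could serve as a reference for tests; jsonEqual is a hypothetical helper, and a real matcher would not decode whole documents this way.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// jsonEqual decodes both inputs and compares them structurally.
// Objects decode to maps (order-insensitive); arrays decode to slices
// (order-sensitive); all numbers decode to float64.
func jsonEqual(a, b string) (bool, error) {
	var va, vb interface{}
	if err := json.Unmarshal([]byte(a), &va); err != nil {
		return false, err
	}
	if err := json.Unmarshal([]byte(b), &vb); err != nil {
		return false, err
	}
	return reflect.DeepEqual(va, vb), nil
}

func main() {
	fmt.Println(jsonEqual(`{"a":1,"b":2}`, `{"b":2,"a":1}`)) // true <nil>
	fmt.Println(jsonEqual(`[1,2]`, `[2,1]`))                 // false <nil>
}
```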
JSONSM should support looping over objects as well as arrays.
There are a few functions which make a lot of sense to be included in JSONSM. Some of the ones that particularly come to mind are non-trivial mathematical functions, or date/time style functions (since JSON doesn't have one set standard here). JSONSM should be expanded to support this. This issue is for discussion on the possible implementations of this.
We should support the use of IS NULL and IS MISSING type expressions.
One of the requirements for XDCR as a consumer of gojsonsm is to be able to support (negative) lookahead/lookbehind (MB-30311). From various conversations in the past, we have decided to go with pcre as the library of choice for doing such matching.
This issue tracks the gojsonsm side of things.
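For context, Go's standard regexp package implements RE2 syntax, which has no lookaround, so these patterns fail to compile there. A small check (compiles is a hypothetical helper):

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles reports whether Go's standard regexp package accepts the pattern.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	fmt.Println(compiles(`foo(bar)?`))   // true
	fmt.Println(compiles(`foo(?!bar)`))  // false: negative lookahead
	fmt.Println(compiles(`(?<=foo)bar`)) // false: lookbehind
}
```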
At the end, resolve() marks any unresolved nodes as false.
But logically, given an expression
field1 <> "value"
where field1 is not present, the result will evaluate to false.
Since field1 is not present, logically, this statement should be true.
FastVal and the tokenizer incorrectly parse false values as TrueValue inside the FastVal.
Given a data structure of:
userData := map[string]interface{}{
"KEY": map[string]interface{}{
"internalKey": "value",
},
}
A filter expression of "KEY EXISTS" fails, even though technically it does exist.
From what I can tell, matchExec sees KEY, and then goes ahead and does 2 more token gets, which are ":" followed by "{".
Something like this:
NEIL DEBUG testMap: map[[$%XDCRInternalKey*%$]:TestDocKey Key:AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA [$%XDCRInternalMeta*%$]:map[AnotherXattr:TestValueString TestXattr:30]]
NEIL DEBUG Expr:
NEIL DEBUG objStart going into objOrArray
NEIL DEBUG tokenData: "[$%XDCRInternalKey*%$]"
NEIL DEBUG autostep tokenData: :
NEIL DEBUG autostep2 tokenData: "TestDocKey"
NEIL DEBUG KeyString: [$%XDCRInternalKey*%$]
NEIL DEBUG tokenData: "Key"
NEIL DEBUG autostep tokenData: :
NEIL DEBUG autostep2 tokenData: "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
NEIL DEBUG KeyString: Key
NEIL DEBUG tokenData: "[$%XDCRInternalMeta*%$]"
NEIL DEBUG autostep tokenData: :
NEIL DEBUG autostep2 tokenData: {
NEIL DEBUG KeyString: [$%XDCRInternalMeta*%$]
NEIL DEBUG KeyString found with token: 1 tokenData: { keyElem: :ops
[0] @ exists @
So it sees that there are no operations for tokenData "{" and then bails, and the original [0] gets lost.
We need additional test cases for the matcher's.
N1QL supports negative indexes on arrays, so people can do something like arr[-1]. Golang does not support this in its language constructs. But to keep things similar to N1QL, I'm thinking we may need to support this at the matcher level.
Any thoughts on this?
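If this is done at the matcher level, the index translation itself is small. A minimal sketch, assuming a negative index counts back from the end of the array as in arr[-1]; resolveIndex is a hypothetical helper.

```go
package main

import "fmt"

// resolveIndex maps a possibly negative index onto [0, n).
// ok is false when the index falls outside the array.
func resolveIndex(idx, n int) (resolved int, ok bool) {
	if idx < 0 {
		idx += n
	}
	if idx < 0 || idx >= n {
		return 0, false
	}
	return idx, true
}

func main() {
	tags := []string{"bob", "frank", "zed"}
	if i, ok := resolveIndex(-1, len(tags)); ok {
		fmt.Println(tags[i]) // zed
	}
	_, ok := resolveIndex(-4, len(tags))
	fmt.Println(ok) // false
}
```

The harder part for a streaming matcher is that the array length is not known until the array has been read, so a negative index would likely need to be resolved after the array closes.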
While the Transformer generates the correct output for this case, the Matcher will then panic if it sees any variable access on the RHS...
The Golang memory management system can cause memory to be arbitrarily allocated. This causes the testing to fail occasionally when it otherwise should be passing.
We currently back up the current position by the tokenData length:
startPos -= len(tokenData)
This is probably not entirely safe, and may lead to odd bugs. We should probably update the tokenizer to support fetching the last position, or potentially pass the last position through to matchExec so it can use that.
It was parsing true and false, but not as the JSON values of true and false.
Various places in the JSON reader assume that the input data is correct and will panic rather than error if (for instance) a function block is included with no function name.
Now that simpleParser and matcher both have a basic math function framework, this issue tracks other math functions that should be implemented.
Something like ABS(ROUND(-5.4))
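Nested calls fall out naturally if a function's argument is itself an expression node. The sketch below uses hypothetical node, constant and call types rather than the matcher's real ones.

```go
package main

import (
	"fmt"
	"math"
)

// node is a hypothetical expression node that evaluates to a number.
type node interface {
	eval() float64
}

// constant is a literal numeric value.
type constant float64

func (c constant) eval() float64 { return float64(c) }

// call applies a one-argument function to the value of another node,
// which may itself be a call.
type call struct {
	fn  func(float64) float64
	arg node
}

func (c call) eval() float64 { return c.fn(c.arg.eval()) }

// absOfRound builds and evaluates ABS(ROUND(x)).
func absOfRound(x float64) float64 {
	expr := call{fn: math.Abs, arg: call{fn: math.Round, arg: constant(x)}}
	return expr.eval()
}

func main() {
	fmt.Println(absOfRound(-5.4)) // 5
}
```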
Right now matcher assumes a valid param following a function name.
We should add support for taking multiple expressions and generating a matcher definition which can match against multiple of these expressions at once in one pass.
An expression such as NOT(name eq "frank") with the name field not existing will cause the expression to fail. This may be expected behaviour, but I think that the more intuitive behaviour would be to implement some form of post-completion resolution of unresolved leaf nodes to false.
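A minimal sketch of that proposal, with hypothetical state, resolveLeaf and notExpr names (the real bintree stores resolution state differently):

```go
package main

import "fmt"

// state is the resolution state of a leaf comparison.
type state int

const (
	unresolved state = iota // the field was never seen in the document
	isFalse
	isTrue
)

// resolveLeaf applies the proposed post-completion rule:
// an unresolved leaf is treated as false.
func resolveLeaf(s state) bool {
	return s == isTrue
}

// notExpr evaluates NOT(leaf) after the leaf has been resolved.
func notExpr(leaf state) bool {
	return !resolveLeaf(leaf)
}

func main() {
	// name is missing, so the leaf (name eq "frank") is unresolved.
	fmt.Println(notExpr(unresolved)) // true
	fmt.Println(notExpr(isTrue))     // false
}
```

Under this rule the NOT of a comparison on a missing field evaluates to true rather than failing.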
The existing logic in FastVal makes it so that checking a specific datatype is done using the dataType field (possibly should be moved to a function). However, the method to check if something is string-typed is called IsStringLike rather than simply IsString. This should be fixed to match the rest of the functions.
Since we currently support various math functions, it would make sense to add support for simple arithmetic operations, as these are supported by N1QL.
https://docs.couchbase.com/server/6.0/n1ql/n1ql-language-reference/arithmetic.html
These (+ - / * % -) would translate nicely into functions that are part of the match tree.
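As a rough illustration of that translation, each binary operator can be looked up as a two-argument function. The sketch covers only the binary operators, works on float64 throughout, and uses math.Mod for %; whether that matches N1QL's modulo semantics is an assumption. binaryOps and apply are hypothetical names.

```go
package main

import (
	"fmt"
	"math"
)

// binaryOps maps an operator token to a hypothetical function implementation.
var binaryOps = map[string]func(a, b float64) float64{
	"+": func(a, b float64) float64 { return a + b },
	"-": func(a, b float64) float64 { return a - b },
	"*": func(a, b float64) float64 { return a * b },
	"/": func(a, b float64) float64 { return a / b },
	"%": math.Mod,
}

// apply evaluates a op b; ok is false for an unknown operator.
func apply(op string, a, b float64) (result float64, ok bool) {
	fn, found := binaryOps[op]
	if !found {
		return 0, false
	}
	return fn(a, b), true
}

func main() {
	fmt.Println(apply("+", 2, 3)) // 5 true
	fmt.Println(apply("%", 7, 3)) // 1 true
	fmt.Println(apply("^", 2, 3)) // 0 false
}
```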
In the case of a very simple loop, the binTree bucket of the loop ends up being the root element. This causes the IsResolved checking within the matcher to trigger when the loop runs once, rather than when the stallIndex is reset after the loop.
Matcher has support for recursive function calls, so simpleMatcher should allow such syntax.
For some more complicated expressions, when the dependencies are entirely within a subtree, the after block is placed on the root of the document instead of on the subtree.
Expression:
$doc.name.first = $doc.name.last
Transformed:
match tree:
:elems
`name`:
:elems
`first`:
:store $1
`last`:
:store $2
:after:
#with $1:
:ops
[0] eq $2
bin tree:
[0:0] leaf
match buckets:
0: 0
num buckets: 1
num fetches: 2
max depth: 1
Field values should be encased with `
Should allow both " and ' for values.