couchbaselabs / gojsonsm

Go implementation of my JSONSM algorithm.

Go 100.00%

gojsonsm's People

nelio2k avsej griels brett19 ysui6888 tezheng sumukhbhat2701

gojsonsm's Issues

Extra Parenthesis leads to incorrect logic

For this expression:
(county = "United States" OR country = "Canada" AND type="brewery") OR (type="beer" AND DATE(updated) >= DATE("2019-01-01"))

things work correctly.

However, when extra parentheses are added:
((county = "United States" OR country = "Canada") AND type="brewery") OR (type="beer" AND DATE(updated) >= DATE("2019-01-01"))

the logic is broken.

Needs investigation into why the AST is different.

SimpleParser to parse floating point as Values

The regex has an extra escape character

Matcher does not execute the Afters in the top root node

When the top-level ExecNode contains Afters and they do not get run, the results can be incorrect.

For example, something simple like:
field0 < field1

would not be executed and the result would be incorrect.

Add support for multiple roots.

The XDCR team at Couchbase has requested that it be possible to specify multiple roots (i.e., to provide matching against explicitly, separately defined documents).

simpleParser to handle multi-word values

Potentially seg fault when comparing invalid dates

https://issues.couchbase.com/browse/MB-34363 was discovered when running a date comparison on invalid formats.

Add Support for LIKE (Regexp)

JSONSM should support performing filtering based on a regexp string match. This is an issue to track ownership, requirements and implementation.

loops don't exit early if they are resolved by another segment of the binary tree

Loops lack the logic to skip processing entirely when the bintree already indicates that the loop's outcome will not matter to the overall result of the expression.

FastVal IsUint() should check for uint types

Add support for `EXISTS`, `IS MISSING` and `IS NULL` to the grammar

These can now be implemented by using the expression support which was recently introduced.

Null Fastval should not output as 0 when used as numeric values

Right now a NullValue FastVal outputs 0 or 0.0 when its AsUint()/AsInt()/AsFloat64() is called.
Since 0 is a commonly used value, this can easily lead to an incorrect match result. Consider the following test case, which evaluates incorrectly:

	fe = &FilterExpression{}
	err = parser.ParseString("fieldpath.path IS NOT NULL", fe)
...
	userData := map[string]interface{}{
		"fieldpath": map[string]interface{}{
			"path": 0,
		},
	}
	udMarsh, _ := json.Marshal(userData)
	match, err := m.Match(udMarsh)
...

fieldpath.path is not null in this case, but the parser inserts a nil value on the right-hand side of the EqualsExpr, and that nil equates to 0 when compared with the LHS's 0 (int), so the evaluation produces the incorrect result.

The simplest way to do it IMO is to make NullValue output MinInt64/MaxUint64/MaxFloat64 when AsInt()/AsUint()/AsFloat() is called. Chances are, those values are unlikely to exist in real world scenarios.

Another option is to do more complicated checks in CompareOp, which is more correct, but that will break the pretty symmetric function of the current code.

Thoughts, @brett19 ?
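
To make the second option concrete, here is a minimal sketch of a comparison that checks for null before looking at numeric values. The `kind` and `value` types and both comparison functions are hypothetical stand-ins written for this illustration; they are not gojsonsm's FastVal or CompareOp.

```go
package main

import "fmt"

// kind and value are hypothetical stand-ins for illustration only;
// they are not gojsonsm's actual FastVal types.
type kind int

const (
	kindNull kind = iota
	kindInt
)

type value struct {
	kind kind
	i    int64
}

// asInt mirrors the behavior described in the issue: null collapses to 0.
func (v value) asInt() int64 {
	if v.kind == kindNull {
		return 0
	}
	return v.i
}

// naiveEquals compares only the numeric projections, so null equals 0.
func naiveEquals(a, b value) bool {
	return a.asInt() == b.asInt()
}

// nullAwareEquals checks the kinds first: null is equal only to null.
func nullAwareEquals(a, b value) bool {
	if a.kind == kindNull || b.kind == kindNull {
		return a.kind == b.kind
	}
	return a.i == b.i
}

func main() {
	zero := value{kind: kindInt, i: 0}
	null := value{kind: kindNull}
	fmt.Println(naiveEquals(zero, null))     // null is treated as 0 here
	fmt.Println(nullAwareEquals(zero, null)) // 0 is not null here
}
```

The null-aware version costs one extra branch per comparison, which is the loss of symmetry the issue mentions.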

Fix ambiguous use of 'Variables' between expressions and compiler

Matcher (and parser) to support XOR

N1QL supports XOR... and it'll be nice to have in case someone actually needs it

Incorrect FilterExpression recursive grammar could lead to stack overflow

With the incorrect recursive grammar, the parser cannot reach a base case when the syntax is typed incorrectly.

Parentheses parsing issues

A few odd parentheses parsing cases:

This is valid:
(country="United States" OR country="Canada") AND type="brewery"
But this is invalid:
((country="United States" OR country="Canada") AND type="brewery")

due to potential recapturing of the AND clause?

Matcher and simpleParser to support Time based comparison

It would be good for the matcher to support time comparisons.
The idea is to add a new type of FastVal called "TimeValue", which holds a Golang Time struct; the time package provides the comparators Before, After, Equal, etc. We'll stick with the RFC 3339 format used in the Golang time library: N1QL uses ISO-8601, but Golang doesn't directly support ISO-8601.

The gap here is that a lot of documents may have values of the form "YYYY-MM-DD", and we can't do matching on those. Perhaps we can address that in another PR/issue. Is there a good way to address this at this point, though?

For the corner cases where the two sides have different time precision, we will simply pass the values in and let the time library do the comparison. For example:
If the user checks whether a field is equal to the time "2018-11-21T00:01:02Z", but the document field has the actual time value "2018-11-21T00:01:02.03Z", then the Equal() operator will return false.

The workaround is for the user to specify a range that covers that specific time in the document, as we don't want to spend resources guessing the user's intent.
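
The example and the range workaround can be reproduced with the standard time package. This is only a sketch: `parseTime` and `inRange` are hypothetical helper names, and the date-only fallback layout is an assumption about one way the "YYYY-MM-DD" gap could be handled, not something this issue settled on.

```go
package main

import (
	"fmt"
	"time"
)

// parseTime tries RFC 3339 first and then falls back to a date-only layout.
// The fallback is an assumption made for illustration.
func parseTime(s string) (time.Time, error) {
	if t, err := time.Parse(time.RFC3339, s); err == nil {
		return t, nil
	}
	return time.Parse("2006-01-02", s)
}

// inRange reports whether t falls in the half-open interval [start, end).
func inRange(t, start, end time.Time) bool {
	return !t.Before(start) && t.Before(end)
}

func main() {
	filter, _ := parseTime("2018-11-21T00:01:02Z")
	doc, _ := parseTime("2018-11-21T00:01:02.03Z")

	// Equal compares the exact instants, so the fractional seconds matter.
	fmt.Println(filter.Equal(doc))

	// A one-second range starting at the filter time covers the document value.
	fmt.Println(inRange(doc, filter, filter.Add(time.Second)))

	// Date-only values parse through the fallback layout.
	day, err := parseTime("2018-11-21")
	fmt.Println(day, err)
}
```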

Rewrite parser using BNF or other grammar

There was a request that we rewrite the grammar into something more cross-platform so that it can be shared among the various implementations of JSONSM.

parser to handle parsing of expressions without spaces

Ideally, we can't expect users to enter the syntax perfectly w.r.t. white spaces. Parser should be smart enough about it to parse regardless of the white spaces.
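
One way to become independent of whitespace is to tokenize on the grammar's own delimiters instead of splitting on spaces. The sketch below does that with a regular expression; the token pattern is an assumed, simplified grammar chosen for this example and is not the one gojsonsm's parser uses.

```go
package main

import (
	"fmt"
	"regexp"
)

// tokenPattern is an assumed token grammar for illustration: quoted strings,
// two-character operators, single-character operators and parentheses, and
// bare words. It is not the grammar gojsonsm actually uses.
var tokenPattern = regexp.MustCompile(`"[^"]*"|>=|<=|<>|[=<>()]|[^\s=<>()"]+`)

// tokenize splits an expression into tokens without relying on whitespace.
func tokenize(expr string) []string {
	return tokenPattern.FindAllString(expr, -1)
}

func main() {
	fmt.Printf("%q\n", tokenize(`type="beer" AND DATE(updated)>=DATE("2019-01-01")`))
}
```

Because the two-character operators are listed before the single-character class, `>=` is returned as one token rather than as `>` followed by `=`.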

fastLitParser should parse number into float64 if int64 overflows

When an integer fastVal is used to compare against a float, and the float that is being compared to can overflow an int64, the comparison result is more than likely to be incorrect.
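
A minimal sketch of the requested fallback, using only the standard strconv package. `parseNumber` is a hypothetical helper written for this example, not part of fastLitParser.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// parseNumber is a hypothetical helper: it returns an int64 when the literal
// fits, and falls back to float64 only when the integer parse overflowed.
func parseNumber(s string) (interface{}, error) {
	i, err := strconv.ParseInt(s, 10, 64)
	if err == nil {
		return i, nil
	}
	if !errors.Is(err, strconv.ErrRange) {
		// Not an overflow (for example a syntax error), so report it as is.
		return nil, err
	}
	f, err := strconv.ParseFloat(s, 64)
	if err != nil {
		return nil, err
	}
	return f, nil
}

func main() {
	for _, lit := range []string{"42", "9223372036854775808"} {
		v, err := parseNumber(lit)
		fmt.Printf("%s -> %T %v (err: %v)\n", lit, v, v, err)
	}
}
```

Note that the float64 fallback loses precision for large values that are not exactly representable, so it only makes the comparison approximately right rather than exact.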

Failure when inner loop references not-yet-parsed field from outer loop

Given this JSON document:

{
  "foo": [
    {
      "bar": [1, 2, 3],
      "zot" : 2
    }
  ]
}

The following match expression fails because zot appears in the JSON after bar, and the code that defers loop evaluation only looks for fields rooted at $doc (ignores fields rooted at outer loops).

any $f in $doc.foo {
    any $b in $f.bar {
        $f.zot == $b
    }
}
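
For reference, the result this expression is expected to produce can be computed naively by decoding the whole document first, which makes field order irrelevant. The sketch below is only such a reference check; `matchNaive` is a hypothetical name, and the code says nothing about how the streaming matcher defers loop evaluation.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// matchNaive evaluates
//   any $f in $doc.foo { any $b in $f.bar { $f.zot == $b } }
// against a fully decoded document. It is a reference check only.
func matchNaive(doc []byte) (bool, error) {
	var parsed struct {
		Foo []struct {
			Bar []float64 `json:"bar"`
			Zot *float64  `json:"zot"` // pointer so a missing zot is distinguishable from 0
		} `json:"foo"`
	}
	if err := json.Unmarshal(doc, &parsed); err != nil {
		return false, err
	}
	for _, f := range parsed.Foo {
		if f.Zot == nil {
			continue
		}
		for _, b := range f.Bar {
			if *f.Zot == b {
				return true, nil
			}
		}
	}
	return false, nil
}

func main() {
	doc := []byte(`{"foo": [{"bar": [1, 2, 3], "zot": 2}]}`)
	fmt.Println(matchNaive(doc))
}
```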

Implement mathRound function in parser

Matcher has support for mathRound. This issue tracks the implementation of functions, and specifically the ROUND() function, for simpleParser. The goal is to set it up so that future function implementations are painless.

parser to handle multi-token fields using escape literal

There's a bug where the parser doesn't handle multi-token fields well, though it was supposed to.

FastVal need methods for dealing with JsonStringValue

When a numeric FastVal is created with JsonStringValue, it's unable to correctly execute AsUint() or AsInt(). This could lead to failed numerical comparisons.

Multi-token parser loop bug

Part of the SimpleParser's multi-token loop is incorrect: it should retry the whole loop if something is marked invalid.

support direct array indexing in expressions

The library should support providing an expression such as (tags[1] == "frank").

support for full object/array comparisons?

I think it might make sense to support comparing objects and/or arrays using the Equals/NotEquals operators. For objects, this would come in the form of comparison that the keys and values match (but not necessarily the ordering), and for arrays that the elements and ordering match. I think this might actually be possible to implement today using a compound operator which decomposes an array or object into a set of EQUALs checks (and maybe something to confirm the length of the object/array matching).
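
The semantics described above (object key order ignored, array order significant) happen to be what `reflect.DeepEqual` gives for values decoded by encoding/json, so they can be illustrated without any matcher machinery. This sketch decodes both sides fully, which is not how a streaming matcher would do it; `jsonEqual` is a hypothetical helper name.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// jsonEqual decodes both inputs and compares them structurally. Objects
// decode to maps (key order is irrelevant) and arrays decode to slices
// (element order matters).
func jsonEqual(a, b []byte) (bool, error) {
	var va, vb interface{}
	if err := json.Unmarshal(a, &va); err != nil {
		return false, err
	}
	if err := json.Unmarshal(b, &vb); err != nil {
		return false, err
	}
	return reflect.DeepEqual(va, vb), nil
}

func main() {
	fmt.Println(jsonEqual([]byte(`{"a":1,"b":2}`), []byte(`{"b":2,"a":1}`)))
	fmt.Println(jsonEqual([]byte(`[1,2]`), []byte(`[2,1]`)))
}
```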

Support looping over objects rather than just arrays

JSONSM should support performing looping over objects as well as arrays.

Add support for functions as part of filter expressions

There are a few functions which make a lot of sense to be included in JSONSM. Some of the ones that particularly come to mind are non-trivial mathematical functions, or date/time style functions (since JSON doesn't have one set standard here). JSONSM should be expanded to support this. This issue is for discussion on the possible implementations of this.

Support `IS NULL` and `IS MISSING` type expressions

We should support the use of IS NULL and IS MISSING type expressions.

Support for (negative) lookahead/lookbehind using pcre

One of the requirements for XDCR as a consumer of gojsonsm is to be able to support (negative) lookahead/lookbehind (MB-30311). From various conversations in the past, we have decided to go with pcre as the library of choice for doing such matching.

This issue tracks the gojsonsm side of things.
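
For context on why a separate library is needed: Go's standard regexp package implements RE2 syntax, which has no lookaround, so such patterns fail to compile. The sketch below only demonstrates that limitation; `supportsPattern` is a hypothetical helper, and no pcre binding is shown because the issue does not name one.

```go
package main

import (
	"fmt"
	"regexp"
)

// supportsPattern reports whether Go's standard regexp package accepts the pattern.
func supportsPattern(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	fmt.Println(supportsPattern(`foo(bar)?`))   // ordinary group
	fmt.Println(supportsPattern(`foo(?!bar)`))  // negative lookahead
	fmt.Println(supportsPattern(`(?<=foo)bar`)) // lookbehind
}
```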

NotEquals transformation should use NOT expression

At the end, resolve() marks any unresolved node as false.

As a result, given an expression
field1 <> "value"
where field1 is not present, the result evaluates to false.

Since field1 is not present, logically, this statement should be true.
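
The behavior the title asks for can be sketched by defining not-equals as the negation of equals, so a missing field makes equals false and not-equals true. The `equals` and `notEquals` functions below are hypothetical and operate on a decoded map purely for illustration; they do not reflect how the transformer builds its expression tree.

```go
package main

import "fmt"

// equals reports whether doc has the field with exactly the given string
// value. A missing field (or a non-string value) yields false.
func equals(doc map[string]interface{}, field, want string) bool {
	got, ok := doc[field].(string)
	return ok && got == want
}

// notEquals is defined as NOT(equals), so a missing field yields true.
func notEquals(doc map[string]interface{}, field, want string) bool {
	return !equals(doc, field, want)
}

func main() {
	doc := map[string]interface{}{"other": "x"}
	fmt.Println(equals(doc, "field1", "value"))
	fmt.Println(notEquals(doc, "field1", "value"))
}
```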

Boolean values are not parsed correctly

FastVal and the tokenizer incorrectly parse false values as TrueValue inside the FastVal.

Matcher doesn't perform EXIST correctly on key with value of embedded map

Given a data structure of:

	userData := map[string]interface{}{
		"KEY": map[string]interface{}{
			"internalKey": "value",
		},
	}

A filter expression of "KEY EXISTS" fails, even though technically it does exist.
From what I can tell, matchExec sees KEY, and then goes ahead and does 2 more token gets, which are ":" followed by "{".

Something like this:

NEIL DEBUG testMap: map[[$%XDCRInternalKey*%$]:TestDocKey Key:AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA [$%XDCRInternalMeta*%$]:map[AnotherXattr:TestValueString TestXattr:30]]
NEIL DEBUG Expr: $doc.[$%XDCRInternalMeta*%$] EXISTS
NEIL DEBUG objStart going into objOrArray
NEIL DEBUG tokenData: "[$%XDCRInternalKey*%$]"
NEIL DEBUG autostep tokenData: :
NEIL DEBUG autostep2 tokenData: "TestDocKey"
NEIL DEBUG KeyString: [$%XDCRInternalKey*%$]
NEIL DEBUG tokenData: "Key"
NEIL DEBUG autostep tokenData: :
NEIL DEBUG autostep2 tokenData: "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
NEIL DEBUG KeyString: Key
NEIL DEBUG tokenData: "[$%XDCRInternalMeta*%$]"
NEIL DEBUG autostep tokenData: :
NEIL DEBUG autostep2 tokenData: {
NEIL DEBUG KeyString: [$%XDCRInternalMeta*%$]
NEIL DEBUG KeyString found with token: 1 tokenData: { keyElem: :ops
[0] @ exists @

So it sees that there are no operations for tokenData "{" and then bails, and the original [0] gets lost.

Improve test cases

We need additional test cases for the matcher.

Matcher to support negative array index

N1QL supports negative index on array, so people can do something like arr[-1]. Golang seems to not want to support it in its language construct. But in trying to keep it similar to N1QL, I'm thinking we may need to support this at the matcher level.
Any thoughts on this?
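
At the matcher level this would amount to normalizing the index against the array length before use. A minimal sketch, with `resolveIndex` as a hypothetical helper name and out-of-range indexes simply reported as not found (an assumption; the issue does not say how they should behave):

```go
package main

import "fmt"

// resolveIndex maps a possibly negative index onto [0, length).
// Negative values count from the end, so -1 is the last element.
// The second result is false when the index is out of range.
func resolveIndex(idx, length int) (int, bool) {
	if idx < 0 {
		idx += length
	}
	if idx < 0 || idx >= length {
		return 0, false
	}
	return idx, true
}

func main() {
	arr := []string{"a", "b", "c"}
	if i, ok := resolveIndex(-1, len(arr)); ok {
		fmt.Println(arr[i])
	}
}
```

One practical wrinkle for a streaming matcher is that the array length is only known once the array has been fully read, so a negative index cannot be resolved until then.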

Matcher does not support variables on the RHS of an operation

While the Transformer generates the correct output for this case, the Matcher will then panic if it sees any variable access on the RHS...

bintree allocation check test fails spontaneously

The Golang memory management system can cause memory to be arbitrarily allocated, which causes the test to fail occasionally when it should otherwise pass.

Matcher finds slot start/end via odd means

We currently back up the current position by the tokenData length:

startPos -= len(tokenData)

This is probably not entirely safe, and may lead to odd bugs. We should probably update the tokenizer to support fetching the last position, or potentially pass the last position through to matchExec so it can use that.
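
As a sketch of the first suggestion, a tokenizer can record where the most recent token began so callers never have to reconstruct it from the token length. The `tokenizer` type below is a toy, space-delimited one written for this example; it is not gojsonsm's tokenizer, and its field names are assumptions.

```go
package main

import "fmt"

// tokenizer is a toy, space-delimited tokenizer used only to illustrate
// tracking the start offset of the last token.
type tokenizer struct {
	data      []byte
	pos       int // offset just past the most recent token
	lastStart int // offset where the most recent token began
}

// next returns the next token, or false when the input is exhausted.
func (t *tokenizer) next() ([]byte, bool) {
	for t.pos < len(t.data) && t.data[t.pos] == ' ' {
		t.pos++
	}
	if t.pos >= len(t.data) {
		return nil, false
	}
	t.lastStart = t.pos
	for t.pos < len(t.data) && t.data[t.pos] != ' ' {
		t.pos++
	}
	return t.data[t.lastStart:t.pos], true
}

func main() {
	tok := &tokenizer{data: []byte("  ab  cde")}
	for {
		token, ok := tok.next()
		if !ok {
			break
		}
		fmt.Printf("%q spans [%d, %d)\n", token, tok.lastStart, tok.pos)
	}
}
```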

simpleParser not parsing true and false values as the correct JSON type

It was parsing true and false, but not as the JSON values of true and false.

Expression JSON format readers do not do proper input validation

Various places in the JSON reader assume that the input data is correct and will panic rather than error if (for instance) a function block is included with no function name.

Implement misc math functions

Now that simpleParser and matcher both have a basic math function framework, this issue tracks other math functions that should be implemented.

Should we support embedded functions?

Something like ABS(ROUND(-5.4))
Right now matcher assumes a valid param following a function name.
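
If nesting were supported, a function's parameter would itself be an expression to evaluate rather than a plain value. A minimal recursive sketch follows; the `expr` type, the `funcs` table, and `eval` are all hypothetical, and `math.Round` is used only as a stand-in, since it may not match the rounding rules of the matcher's mathRound.

```go
package main

import (
	"fmt"
	"math"
)

// expr is a hypothetical expression node: either a numeric literal or a
// function call whose single argument is itself an expression.
type expr struct {
	fn  string  // empty for a literal
	arg *expr   // set when fn is non-empty
	num float64 // used when fn is empty
}

// funcs maps function names to implementations (stand-ins for illustration).
var funcs = map[string]func(float64) float64{
	"ABS":   math.Abs,
	"ROUND": math.Round,
}

// eval evaluates the innermost argument first, then applies each function.
func eval(e *expr) (float64, error) {
	if e.fn == "" {
		return e.num, nil
	}
	f, ok := funcs[e.fn]
	if !ok {
		return 0, fmt.Errorf("unknown function %q", e.fn)
	}
	if e.arg == nil {
		return 0, fmt.Errorf("function %q has no argument", e.fn)
	}
	v, err := eval(e.arg)
	if err != nil {
		return 0, err
	}
	return f(v), nil
}

func main() {
	// ABS(ROUND(-5.4))
	nested := &expr{fn: "ABS", arg: &expr{fn: "ROUND", arg: &expr{num: -5.4}}}
	fmt.Println(eval(nested))
}
```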

Add support for merged expressions

We should add support for taking multiple expressions and generating a matcher definition which can match against multiple of these expressions at once in one pass.

NOT objects do not invert unresolved values

An expression such as NOT(name eq "frank") with the name field not existing will cause the expression to fail. This may be expected behaviour, but I think that the more intuitive behaviour would be to implement some form of post-completion resolution of unresolved leaf nodes to false.

FastVal IsString function is called IsStringLike

The existing logic in FastVal makes it so that checking a specific datatype is done using the dataType field (possibly should be moved to a function). However, the method to check if something is string-typed is called IsStringLike rather than simply IsString. This should be fixed to match the rest of the functions.

Expression:
		$doc.name.first = $doc.name.last
Transformed:
		match tree:
		  :elems
		    `name`:
		      :elems
		        `first`:
		          :store $1
		        `last`:
		          :store $2
		  :after:
		    #with $1:
		      :ops
		        [0] eq $2
		bin tree:
		  [0:0] leaf
		match buckets:
		  0: 0
		num buckets: 1
		num fetches: 2
		max depth: 1

simpleParser field and value

Field values should be encased with `
Should allow both " and ' for values.
