Comments (7)
I am a person who is interested in contributing to open source. Looks like this issue is a good one for a starter. May I work on this issue?
from velox.
Welcome, @amaliujia . Sure, feel free to pick this up. CC: @atanu1991 who is working on a related issue #153
from velox.
I want to discuss a bit on high level before start to implement.
MultiRange is a class that contains a list of Filter, with nan_allowed and null_allowed. So my current idea of the implementation is:
- Two MultiRange can merge only if they have the same nan_allowed and null_allowed.
- Two MultiRange can merge only if they have the same data type on all the Filters.
- If 1 and 2 are true, then simply have a nested loop to do
for f1 in MultiRangeA.getFilters():
for f2 in MultiRangeB.getFilters():
f1.mergeWith(f2);
Can you give me some suggestions?
from velox.
Rui,
MultiRange represents and OR of multiple filters. NaN values pass only if MultiRange::nanAllowed is true. Null values pass only if nullAllowed is true.
Merging two filters is combining the filters with an AND. A value passes the new filter only if it passes both the original filters.
Here is how I would represent a merge of two MultiRange filters (with just 2 filters in each for simplicity):
(a OR b) AND (c OR d) = (a AND c) OR (a AND d) OR (b AND c) OR (b AND d)
Therefore, what you have proposed is very close.
- Calculate nullAllowed for the combined filter:
bool bothNullAllowed = nullAllowed_ && other->testNull();
- Calculate nanAllowed for the combined filter: {same as above}
- Loop over all combinations of filters from left and right side and merge these.
- Construct a new filter using flags and list of filters from previous steps.
In step (3), make sure to check if a merge of two filter is kAlwaysFalse, kIsNull, kIsNotNull. In this case, the filter can be dropped from the result (null and NaN flags in the nested filters do not count for MultiRange filter; only top-level flags take effect).
Step (3) may result in an empty list of filters or a list of one. If the list is empty, return kAlwaysFalse, kIsNull, or kIsNotNull as appropriate (see nullOrFalse(bothNullAllowed) in Filter.cpp). If the list contains just one filter, return that filter after incorporating the null flag, e.g. if bothNullAllowed && !filter->testNull() return filter.mergeWith(IsNull) else return filter.
That should be it.
(Check out #233 that implements merge logic for BytesValues and BytesRanges.)
from velox.
Thanks @mbasmanova!
I prepared a WIP impl based on your suggestions: #299.
Can you comment on the PR to see if I am on the right track? I haven't added test cases and I can add those later.
Sorry to occupy too much of your time. I am a beginner so probably need more guidance at this moment (for example on how to deal with nullness in Volex)? Maybe later I could help to contribute a doc about nullness, nan, always true, always false, is null, is not null with Filter.
from velox.
Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
from velox.
Looks like this issue was resolved with #299 . Closing.
from velox.
Related Issues (20)
- `setup-adapters.sh` script doesn't work on ARM Linux machines HOT 1
- Support "parquet_use_column_names" = false in Velox HOT 2
- Failed to read the parquet file HOT 3
- An unsupported nested encoding was found. HOT 5
- Velox doesn't support to read binary as string in Parquet HOT 5
- VeloxBackend logging does not respect spark.redaction.regex config property when logging spark config. HOT 1
- Split adapters install to corresponding platform setup scripts
- The find_package and linking of the 'gtest' library is wrong HOT 1
- Incorrect results when query has filters on info/metadata columns HOT 2
- Untracked directories will appears after the setup script is executed。
- Aggregation fuzzer is failing in nightly runs HOT 2
- Support inplace replace if possible for spark mask function
- Sort merge join failed with state.data == nullptr exception HOT 3
- Change FunctionSignature returnType to std::optional to support the return type varied with config HOT 16
- SIMDComparisonUtil.h:43:69: error: no matching function for call to ‘xsimd::batch_bool<signed char, xsimd::neon64>::batch_bool(xsimd::batch<signed char>)’
- bugs in Conbench performance report HOT 5
- Linux CI jobs have not been running since 2 weeks
- Parquet reader: can't read null map row in a single line file
- Build failed on mac m1 HOT 8
- str_to_map Spark function doesn't match Spark HOT 6
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from velox.