Currently, quick only supports point queries. Some use-cases need the support of range

Evaluate range queries about quick HOT 5 CLOSED

raminqaf commented on June 8, 2024

Evaluate range queries

from quick.

Comments (5)

DawidNiezgodka commented on June 8, 2024

We might also take a look at this utility class: https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/util/OrderedBytes.html

torbsto commented on June 8, 2024

Multi-Index
Keys:

User ID
Service ID

Query:

getCount(userId: 5, serviceId: 8) {
  count
}

Result:

"data": {
  "count": 100
}

Range
Keys:

Timestamp

Query:

getCount(timestampFrom: 0, timestampTo: 5) {
  count
}

Result:

"data": {
  "count": [10, 5, 8, 50, 12]
}

Index + Range
Keys:

User ID
Timestamp

Query:

getCount(userId: 5, timestampFrom: 0, timestampTo: 5) {
  count
}

Result:

"data": {
  "count": [5, 2, 3, 25, 7]
}

Multi-Index + Range
Keys:

User ID
Service ID
Timestamp

Query:

getCount(userId: 5, serviceId: 2, timestampFrom: 0, timestampTo: 5) {
  count
}

Result:

"data": {
  "count": [1, 4, 2, 12, 5]
}

DawidNiezgodka commented on June 8, 2024

The following approaches were examined:

Flattened strings,
Converting an Avro record to a byte array,
Converting a Protobuf message to a byte array,
Using a custom comparator.

Because of how both Avro and Protobuf serialise values, it is difficult to implement range queries on the resulting byte arrays: the byte order of the serialised keys does not match the logical order of the values.
Using a custom comparator is challenging because there is no simple way to swap it in for the default one (for example, through dependency injection). One possible approach would be to implement a custom state store, but this is challenging as well.
The most convenient method seems to be the use of flattened strings.
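To illustrate the ordering problem, here is a minimal sketch, assuming integer key fields are written as zig-zag varints (the encoding Avro's binary format uses for int and long; Protobuf also relies on varints). The class and method names are hypothetical and not part of quick. It compares the encoded bytes the way a byte-ordered store would, as unsigned bytes from left to right.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

class ZigZagOrderDemo {
    // Zig-zag maps signed ints to unsigned ones (0->0, -1->1, 1->2, -2->3, ...),
    // then the result is written as a varint: 7 payload bits per byte,
    // least significant group first, high bit set on all but the last byte.
    static byte[] encode(int value) {
        int zigzag = (value << 1) ^ (value >> 31);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((zigzag & ~0x7F) != 0) {
            out.write((zigzag & 0x7F) | 0x80);
            zigzag >>>= 7;
        }
        out.write(zigzag);
        return out.toByteArray();
    }

    // Unsigned lexicographic comparison of two byte arrays.
    static int compareBytes(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    public static void main(String[] args) {
        // 100 < 128 numerically, but the encoded 100 sorts AFTER the encoded 128.
        System.out.println(compareBytes(encode(100), encode(128)) > 0); // prints "true"
        // -1 < 0 numerically, but the encoded -1 sorts AFTER the encoded 0.
        System.out.println(compareBytes(encode(-1), encode(0)) > 0);    // prints "true"
    }
}
```

A range scan between the encoded bounds would therefore return the wrong set of keys, which matches the observation above.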

Concerning the approach with flattened strings:

For a single index or a multi-index, it is easy and works well.
If we include a range, it makes no difference whether it is a simple range, range + index, or range + multi-index.
In the case of (multi-)index + multi-range, it is not possible to run the query in a single pass, because the lexicographical order of the keys prevents it.
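The flattened-string idea can be sketched as follows. This is only an illustration under several assumptions: the key layout (userId_serviceId_timestamp), the zero-padding widths, the inclusive bounds, and all class and method names are hypothetical, all key parts are non-negative, and a TreeMap stands in for the lexicographically ordered store.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeMap;

class FlattenedKeyDemo {
    // Hypothetical layout: index parts first, range part last.
    // Zero-padding to a fixed width makes string order equal numeric order
    // (without it, "10" would sort before "2").
    static String key(int userId, int serviceId, long timestamp) {
        return String.format(Locale.ROOT, "%010d_%010d_%019d", userId, serviceId, timestamp);
    }

    // Multi-index + range: one contiguous scan between two flattened keys.
    static List<Integer> range(TreeMap<String, Integer> store,
                               int userId, int serviceId, long from, long to) {
        String lower = key(userId, serviceId, from);
        String upper = key(userId, serviceId, to);
        return new ArrayList<>(store.subMap(lower, true, upper, true).values());
    }

    public static void main(String[] args) {
        TreeMap<String, Integer> store = new TreeMap<>();
        int[] counts = {1, 4, 2, 12, 5};
        for (int t = 0; t < counts.length; t++) {
            store.put(key(5, 2, t), counts[t]);
        }
        store.put(key(5, 3, 2), 99); // different service, outside the scanned span
        store.put(key(5, 2, 9), 77); // same indexes, timestamp outside the range
        System.out.println(range(store, 5, 2, 0, 5)); // prints [1, 4, 2, 12, 5]
    }
}
```

Because the range part comes last in the key, all matching entries are adjacent in the sort order. With a second range placed before the timestamp, the entries between the lower and upper key would also include timestamps outside the requested window.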

raminqaf commented on June 8, 2024

Benchmarks (Part 1)

List vs. Range

List

Key/Value
askm -> [a, b, c, d, e]

This can be implemented and benchmarked in two ways:

Using Java lists and serializing the list on every put. For each incoming key-value pair, we use the key to look up the current value. If no value exists, we create a Java list, add the value to it, serialize it, and put it in RocksDB. If the key exists, we deserialize the byte array into a Java list, append the value, serialize the list again, and put it back.
We refer to this method below as putList/getList.
Using the merge operator of RocksDB. With this approach, we define and set a merge operator (e.g., StringAppendOperator), and RocksDB appends the values for a given key.
We refer to this method below as putMerge/getMerge.
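The two list variants can be sketched roughly as follows. This is not the benchmark code: a HashMap stands in for RocksDB, the comma-separated serialization is an assumption (it only works because UUIDs contain no commas), and putMerge merely imitates what an append-style merge operator would do inside the store.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ListStoreDemo {
    // In-memory stand-in for the key-value store (assumption, not RocksDB).
    private final Map<String, byte[]> store = new HashMap<>();

    static byte[] serialize(List<String> values) {
        return String.join(",", values).getBytes(StandardCharsets.UTF_8);
    }

    static List<String> deserialize(byte[] bytes) {
        String joined = new String(bytes, StandardCharsets.UTF_8);
        return new ArrayList<>(Arrays.asList(joined.split(",")));
    }

    // putList: read, deserialize, append, serialize, write back.
    void putList(String key, String value) {
        byte[] existing = store.get(key);
        List<String> values = existing == null ? new ArrayList<>() : deserialize(existing);
        values.add(value);
        store.put(key, serialize(values));
    }

    // putMerge: append the new value with a delimiter, without rebuilding a list.
    void putMerge(String key, String value) {
        store.merge(key, value.getBytes(StandardCharsets.UTF_8), (old, add) ->
                (new String(old, StandardCharsets.UTF_8) + ","
                        + new String(add, StandardCharsets.UTF_8))
                        .getBytes(StandardCharsets.UTF_8));
    }

    // getList/getMerge: a single lookup returns all values for the key.
    List<String> getList(String key) {
        byte[] bytes = store.get(key);
        return bytes == null ? List.of() : deserialize(bytes);
    }

    public static void main(String[] args) {
        ListStoreDemo demo = new ListStoreDemo();
        demo.putList("askm", "a");
        demo.putList("askm", "b");
        demo.putMerge("askm", "c");
        System.out.println(demo.getList("askm")); // prints [a, b, c]
    }
}
```

The read-modify-write cycle in putList grows more expensive as the list grows, which is one of the things the benchmark is meant to quantify.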

Range

Key/Value
askm_1 -> a
askm_2 -> b
askm_3 -> c
askm_4 -> d
askm_5 -> e

This approach flattens the list from the previous approach into individual key-value pairs. We append an identifier (an incrementing integer) to the key and use the put method of RocksDB to write the key/value pair. For each incoming key, we need to look up the largest existing suffix to calculate the new one. For reading, we implemented a prefixScan method, which retrieves all values for a given prefix.
We refer to this method below as putRange/getRange.
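A rough sketch of the range variant, again with a TreeMap standing in for the ordered store. The zero-padded suffix is an assumption added here so that the tenth entry sorts after the second one (the keys above are shown unpadded), base keys are assumed not to contain the "_" separator, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class RangeStoreDemo {
    // Sorted in-memory stand-in for the key-value store (assumption, not RocksDB).
    private final TreeMap<String, String> store = new TreeMap<>();

    private static String suffixed(String key, int suffix) {
        return String.format(Locale.ROOT, "%s_%010d", key, suffix);
    }

    // putRange: find the largest existing suffix for the key, then write suffix + 1.
    void putRange(String key, String value) {
        String prefix = key + "_";
        // Greatest stored key below the upper bound of this prefix.
        String last = store.lowerKey(prefix + Character.MAX_VALUE);
        int next = 1;
        if (last != null && last.startsWith(prefix)) {
            next = Integer.parseInt(last.substring(prefix.length())) + 1;
        }
        store.put(suffixed(key, next), value);
    }

    // getRange: seek to the prefix and read until the keys stop matching it.
    List<String> getRange(String key) {
        String prefix = key + "_";
        List<String> values = new ArrayList<>();
        for (Map.Entry<String, String> entry : store.tailMap(prefix, true).entrySet()) {
            if (!entry.getKey().startsWith(prefix)) {
                break;
            }
            values.add(entry.getValue());
        }
        return values;
    }

    public static void main(String[] args) {
        RangeStoreDemo demo = new RangeStoreDemo();
        for (String value : new String[] {"a", "b", "c", "d", "e"}) {
            demo.putRange("askm", value);
        }
        System.out.println(demo.getRange("askm")); // prints [a, b, c, d, e]
    }
}
```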

Setup

For N values for the user ID "askm", how fast are writes and reads when:

we use one list for all the values (implementations: RocksDB merge operator and Java list)
we use a separate key/value pair for each value (range)

The values are randomly generated UUIDs.
In the first approach, we have one record with a list of N values. In the second approach, we have N key/value pairs.

Possible values for N: 10/100/1000/10000/100000/1000000.

To keep the experiment setup simple, we do not use any column families (CF) and store the data as plain key/value pairs in RocksDB.

The values should be unique IDs. These values represent pointers to the actual value in the Value CF later on.

The benchmarks are running on my notebook:
Model: MacBook Pro (14-inch, 2021)
Chip: Apple M1 Pro
Memory: 32 GB
OS: macOS Monterey Version 12.4

Write throughput

For each value of N, we ingest the data into RocksDB (using putList, putMerge, and putRange) and calculate the average throughput.
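As a rough illustration of how the average write throughput could be computed, the sketch below times N writes and divides the count by the elapsed time. The helper names are hypothetical and a HashMap replaces RocksDB; a real benchmark would typically use a harness such as JMH with warm-up iterations rather than a single timed loop.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;
import java.util.function.Consumer;

class ThroughputDemo {
    // Average throughput = number of operations / elapsed time in seconds.
    static double opsPerSecond(long operations, long elapsedNanos) {
        return operations * 1_000_000_000.0 / elapsedNanos;
    }

    // Times n calls of the given write function with random UUID values.
    static double measureWrites(int n, Consumer<String> write) {
        // Generate the values up front so UUID creation is not part of the timing.
        String[] values = new String[n];
        for (int i = 0; i < n; i++) {
            values[i] = UUID.randomUUID().toString();
        }
        long start = System.nanoTime();
        for (String value : values) {
            write.accept(value);
        }
        long elapsed = Math.max(1, System.nanoTime() - start);
        return opsPerSecond(n, elapsed);
    }

    public static void main(String[] args) {
        Map<String, String> store = new HashMap<>(); // stand-in for the real store
        for (int n : new int[] {10, 100, 1000}) {
            store.clear();
            double throughput = measureWrites(n, v -> store.put("askm_" + store.size(), v));
            System.out.printf("N=%d: %.0f writes/s%n", n, throughput);
        }
    }
}
```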

Read throughput

Before the benchmark starts, we populate RocksDB with N randomly generated UUIDs for the key askm. Once this preparation is done, the list approach needs only one get request (getList, getMerge) to retrieve the whole byte array from RocksDB.
For the range approach, we have to make a prefix scan on our key (getRange) to retrieve all the values. This can be done using RocksDB's Prefix Seek and implementing our own prefix scan method.

raminqaf commented on June 8, 2024

We are now planning to implement the range query functionality. For more information, refer to #55.
