DawidNiezgodka commented on June 8, 2024

We might also take a look at this utility class: https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/util/OrderedBytes.html

from quick.

torbsto commented on June 8, 2024

Multi-Index
Keys:

  • User ID
  • Service ID

Query:

getCount(userId: 5, serviceId: 8) {
  count
}

Result:

"data": {
  "count": 100
}

Range
Keys:

  • Timestamp

Query:

getCount(timestampFrom: 0, timestampTo: 5) {
  count
}

Result:

"data": {
  "count": [10, 5, 8, 50, 12]
}

Index + Range
Keys:

  • User ID
  • Timestamp

Query:

getCount(userId: 5, timestampFrom: 0, timestampTo: 5) {
  count
}

Result:

"data": {
  "count": [5, 2, 3, 25, 7]
}

Multi-Index + Range
Keys:

  • User ID
  • Service ID
  • Timestamp

Query:

getCount(userId: 5, serviceId: 2, timestampFrom: 0, timestampTo: 5) {
  count
}

Result:

"data": {
  "count": [1, 4, 2, 12, 5]
}


DawidNiezgodka commented on June 8, 2024

The following approaches were examined:

  1. Flattened strings,
  2. Converting an Avro record to a byte array,
  3. Converting a Protobuf message to a byte array,
  4. Using a custom comparator.

Because of how both Avro and Protobuf serialise data, range queries are difficult to implement: the byte order of the serialised keys does not match the desired order of the values.
Using a custom comparator is challenging because there is no simple way to swap it in for the default one (for example, through dependency injection). One possible approach would be to implement a custom state store, but that is challenging as well.
The most convenient method seems to be flattened strings.
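The ordering problem can be illustrated with a variable-length integer encoding. The sketch below is an assumption about one likely cause, not necessarily the exact case examined here: it encodes non-negative integers as Protobuf-style base-128 varints and sorts them by their bytes. The helper name `encode_varint` is made up for this example.

```python
def encode_varint(n):
    """Encode a non-negative integer as a base-128 varint (Protobuf-style)."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


values = [1, 127, 129, 256, 300]

# Order a byte-wise (lexicographic) key comparison would produce.
by_varint = sorted(values, key=encode_varint)

# Fixed-width big-endian bytes keep the numeric order for non-negative values.
by_fixed = sorted(values, key=lambda n: n.to_bytes(8, "big"))

print(by_varint)  # [1, 127, 256, 129, 300]
print(by_fixed)   # [1, 127, 129, 256, 300]
```

Because the least significant group is written first, 256 sorts before 129 when the keys are compared byte by byte, so a range scan over such keys would not return a numeric range.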

Concerning the approach with flattened strings:

  1. For a single index or a multi-index, it is easy and works well,
  2. If we include a range, it makes no difference whether it is a simple range, range + index, or range + multi-index,
  3. In the case of (multi-)index + multi-range, the query cannot be answered in a single pass because lexicographical ordering does not allow it.
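A minimal sketch of the index + range case with flattened string keys follows. The separator, the zero-padding width, and the names `flat_key` and `range_query` are assumptions for illustration, and a sorted Python list stands in for the ordered key space of the store.

```python
from bisect import bisect_left, bisect_right

SEP = "_"
WIDTH = 10  # assumed fixed width so that timestamps sort numerically as strings


def flat_key(user_id, timestamp):
    """Flatten (user ID, timestamp) into one lexicographically sortable string."""
    return f"{user_id}{SEP}{timestamp:0{WIDTH}d}"


# Counts for user 5 at timestamps 0..4, plus one record for another user.
store = {flat_key(5, t): count for t, count in enumerate([5, 2, 3, 25, 7])}
store[flat_key(6, 2)] = 99
keys = sorted(store)


def range_query(user_id, ts_from, ts_to):
    """Return the values for one user with ts_from <= timestamp <= ts_to."""
    lo = bisect_left(keys, flat_key(user_id, ts_from))
    hi = bisect_right(keys, flat_key(user_id, ts_to))
    return [store[k] for k in keys[lo:hi]]


print(range_query(5, 0, 5))  # [5, 2, 3, 25, 7]
print(range_query(5, 1, 2))  # [2, 3]
```

The range component has to be the last part of the key for the matching keys to be contiguous. If a second range component were appended after the timestamp, the matches would be scattered across the key space, which is the limitation described in point 3.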


raminqaf commented on June 8, 2024

Benchmarks (Part 1)

List vs. Range

List

Key/Value
askm -> [a, b, c, d, e]

This can be implemented and benchmarked in two ways:

  1. Use Java lists and serialize the list on every put. For each incoming key-value pair, we use the key to get the current value. If the value doesn't exist, we create a Java list, add the value to it, serialize it, and put it in RocksDB. If the key exists, we deserialize the byte array to a Java list, append the value to it, then serialize and put it again.
    We refer to this method below as putList/getList.

  2. Use the merge operator of RocksDB. With this approach, we have to define and set our merge operator (e.g., StringAppendOperator), and RocksDB appends the values for a given key.
    We refer to this method below as putMerge/getMerge.
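The two list variants can be sketched as follows. This is not the benchmark code: plain dictionaries stand in for RocksDB, JSON stands in for the list serialization, and the comma delimiter in the merge variant is an assumption. With the real merge operator, RocksDB itself performs the concatenation rather than the caller.

```python
import json

list_store = {}   # in-memory stand-in for RocksDB (putList/getList)
merge_store = {}  # in-memory stand-in for RocksDB (putMerge/getMerge)


def put_list(key, value):
    """Read-modify-write: deserialize the list, append, serialize, put."""
    raw = list_store.get(key)
    values = json.loads(raw) if raw is not None else []
    values.append(value)
    list_store[key] = json.dumps(values).encode()


def get_list(key):
    """One get, then deserialize the whole list."""
    raw = list_store.get(key)
    return json.loads(raw) if raw is not None else []


def put_merge(key, value, delimiter=b","):
    """Mimic a string-append merge: no deserialization on the write path."""
    existing = merge_store.get(key)
    merge_store[key] = value if existing is None else existing + delimiter + value


def get_merge(key, delimiter=b","):
    """One get, then split the concatenated bytes."""
    raw = merge_store.get(key)
    return raw.split(delimiter) if raw is not None else []


for v in ["a", "b", "c"]:
    put_list(b"askm", v)
    put_merge(b"askm", v.encode())

print(get_list(b"askm"))   # ['a', 'b', 'c']
print(get_merge(b"askm"))  # [b'a', b'b', b'c']
```

The read-modify-write in `put_list` touches the whole list on every write, so its cost grows with the list length, while the append variant only handles the new value.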

Range

Key/Value
askm_1 -> a
askm_2 -> b
askm_3 -> c
askm_4 -> d
askm_5 -> e

This approach flattens the previous list approach into individual key-value pairs. We append an identifier (an incrementing integer) to the key and use the put method of RocksDB to write the key-value pair. For each incoming key, we first have to look up the largest existing suffix in order to calculate the new one. For reading, we implemented a prefixScan method, which retrieves all values for a given prefix.
We refer to this method below as putRange/getRange.
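A sketch of putRange/getRange with a prefix scan is shown below. A sorted list stands in for the ordered RocksDB key space, and the class and method names are hypothetical. One deviation from the example keys above is labeled as an assumption: the suffix is zero-padded, because with plain integers `askm_10` would sort before `askm_2` in lexicographic order.

```python
from bisect import bisect_left, insort

WIDTH = 10  # assumed zero-padding width for the suffix


class RangeStore:
    """In-memory stand-in for an ordered key-value store."""

    def __init__(self):
        self._keys = []  # kept sorted, like the key space of the store
        self._data = {}

    def _prefix_keys(self, prefix):
        """Prefix scan: seek to the first key >= prefix, stop at the first mismatch."""
        start = bisect_left(self._keys, prefix)
        matches = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            matches.append(key)
        return matches

    def put_range(self, key, value):
        """Find the largest existing suffix, then write under the next one."""
        matches = self._prefix_keys(key + "_")
        largest = int(matches[-1].rsplit("_", 1)[1]) if matches else 0
        full_key = f"{key}_{largest + 1:0{WIDTH}d}"
        insort(self._keys, full_key)
        self._data[full_key] = value

    def get_range(self, key):
        """Return all values stored under the given key prefix, in suffix order."""
        return [self._data[k] for k in self._prefix_keys(key + "_")]


store = RangeStore()
for v in ["a", "b", "c", "d", "e"]:
    store.put_range("askm", v)

print(store.get_range("askm"))  # ['a', 'b', 'c', 'd', 'e']
```

Including the separator in the scanned prefix keeps keys of a longer ID such as `askmx` out of the result.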

Setup

Given N values for the user ID "askm", how fast are writes and reads when:

  1. we use one list for all the values (implementations: RocksDB merge operator and Java list)
  2. we use a range, with one key per value

The values are randomly generated UUIDs.
In the first approach, we have one record with a list of N values. In the second approach, we have N key-value pairs.

Possible values for N: 10/100/1000/10000/100000/1000000.

To keep the experiment setup simple, we avoid column families (CF) and store the data as plain key-value pairs in RocksDB.

The values should be unique IDs. Later on, these values will represent pointers to the actual values in the value CF.

The benchmarks run on my notebook:
Model: MacBook Pro (14-inch, 2021)
Chip: Apple M1 Pro
Memory: 32 GB
OS: macOS Monterey Version 12.4

Write throughput

For each value of N, we ingest the data into RocksDB (putList, putMerge, and putRange) and calculate the average throughput.
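The measurement can be sketched roughly as below. This is a simplified harness under assumptions, not the benchmark that was actually run: `measure_write_throughput` and `put_append` are hypothetical names, and the put function writes to an in-memory dictionary instead of RocksDB.

```python
import time
import uuid


def measure_write_throughput(put, n, key="askm"):
    """Return the average number of writes per second for n puts under one key."""
    # Generate the UUIDs up front so that only the writes are timed.
    values = [str(uuid.uuid4()) for _ in range(n)]
    start = time.perf_counter()
    for value in values:
        put(key, value)
    elapsed = time.perf_counter() - start
    return n / elapsed if elapsed > 0 else float("inf")


sink = {}


def put_append(key, value):
    """Placeholder write path; a real run would call putList, putMerge, or putRange."""
    sink.setdefault(key, []).append(value)


for n in (10, 100, 1000):
    print(n, measure_write_throughput(put_append, n))
```

A real benchmark would also need warm-up iterations and repeated runs, which this sketch leaves out.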

Read throughput

Before the benchmark starts, we populate RocksDB with N randomly generated UUIDs for the key askm. Once the data is in place, the list approach needs only a single get request (getList, getMerge) to retrieve the whole byte array from RocksDB.
For the range approach, we have to run a prefix scan on our key (getRange) to retrieve all the values. This can be done by using Prefix Seek and implementing our own prefix scan method.


raminqaf commented on June 8, 2024

We are now planning to implement the range query functionality. For more information, refer to #55.
