Comments (5)
We might also take a look at this utility class: https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/util/OrderedBytes.html
from quick.
Multi-Index
Keys:
- User ID
- Service ID
Query:
getCount(userId: 5, serviceId: 8) {
  count
}
Result:
{
  "data": {
    "count": 100
  }
}
Range
Keys:
- Timestamp
Query:
getCount(timestampFrom: 0, timestampTo: 5) {
  count
}
Result:
{
  "data": {
    "count": [10, 5, 8, 50, 12]
  }
}
Index + Range
Keys:
- User ID
- Timestamp
Query:
getCount(userId: 5, timestampFrom: 0, timestampTo: 5) {
  count
}
Result:
{
  "data": {
    "count": [5, 2, 3, 25, 7]
  }
}
Multi-Index + Range
Keys:
- User ID
- Service ID
- Timestamp
Query:
getCount(userId: 5, serviceId: 2, timestampFrom: 0, timestampTo: 5) {
  count
}
Result:
{
  "data": {
    "count": [1, 4, 2, 12, 5]
  }
}
The following approaches were examined:
- Flattened strings,
- Converting an Avro record to a byte array,
- Converting a Protobuf message to a byte array,
- Using a custom comparator.
Because of how Avro and Protobuf serialise their data, it is difficult to implement range queries: the resulting byte arrays do not preserve the desired ordering.
Using a custom comparator is challenging because there is no simple way to swap it in for the default one (for example, through dependency injection). One possible approach would be to implement a custom state store, but that is challenging as well.
It seems that the most convenient method is the use of flattened strings.
Concerning the approach with flattened strings:
- For a single index or multi-index, it is easy and works well,
- If we include a range, it makes no difference whether it is a simple range, range + index, or range + multi-index,
- In the case of a (multi-)index + multi-range, it is not possible to answer the query in a single scan, as lexicographical ordering prohibits this.
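The lexicographical-order caveat can be demonstrated with a small sketch: with naive flattening, the string "askm_10" sorts before "askm_2", so a range scan would return values out of numeric order. Zero-padding the numeric component is one common workaround. This illustrates the ordering issue only; it is not quick's implementation.

```java
// Illustrative helpers for flattened-string keys.
class FlattenedKeys {
    // Naive flattening: lexicographic order diverges from numeric order
    // ("askm_10" sorts before "askm_2").
    static String naiveKey(String prefix, int ts) {
        return prefix + "_" + ts;
    }

    // Zero-padding the numeric part makes lexicographic order match numeric order.
    static String paddedKey(String prefix, int ts) {
        return String.format("%s_%010d", prefix, ts);
    }
}
```

With padded keys, a contiguous lexicographic scan over one range dimension is well defined; a second independent range dimension cannot be expressed as one contiguous key interval, which is why (multi-)index + multi-range needs more than a single scan.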
Benchmarks (Part 1)
List vs. Range
List
| Key/Value |
|---|
| askm -> [a, b, c, d, e] |
This can be implemented and benchmarked in two ways:
- Using Java lists and serializing the list on every put. For an incoming key/value pair, we use the key to fetch the current value. If the value doesn't exist, we create a Java list, add the value to it, serialize it, and put it into RocksDB. If the key exists, we deserialize the byte array into a Java list and append the value to it, then serialize and put it again.
  We call this method putList/getList later on.
- Using the merge operator of RocksDB. With this approach, we have to define and set our merge operator (e.g., StringAppendOperator), and RocksDB appends the values for a given key.
  We call this method putMerge/getMerge later on.
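The putList/getList read-modify-write cycle can be sketched without a RocksDB dependency. Here a HashMap stands in for the store, and values are joined with a comma as a stand-in serialization; both are assumptions for illustration, not the benchmark code.

```java
import java.nio.charset.StandardCharsets;
import java.util.*;

// Sketch of the putList/getList idea: one serialized list per key.
// A HashMap stands in for RocksDB; joining with ',' stands in for serialization.
class ListStore {
    private final Map<String, byte[]> db = new HashMap<>();

    void putList(String key, String value) {
        List<String> list = getList(key);        // read-modify-write on every put
        list.add(value);
        db.put(key, String.join(",", list).getBytes(StandardCharsets.UTF_8));
    }

    List<String> getList(String key) {
        byte[] bytes = db.get(key);
        if (bytes == null) return new ArrayList<>();
        String joined = new String(bytes, StandardCharsets.UTF_8);
        return new ArrayList<>(Arrays.asList(joined.split(",")));
    }
}
```

The sketch makes the cost structure visible: every putList deserializes and reserializes the entire list, so writes get slower as the list grows, while getList is a single point read.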
Range
| Key/Value |
|---|
| askm_1 -> a |
| askm_2 -> b |
| askm_3 -> c |
| askm_4 -> d |
| askm_5 -> e |
This approach flattens the previous list into individual key/value pairs. We append an identifier (an incremental integer) to the key and use RocksDB's put method to write the key/value pair. For an incoming key, we always need to look up the largest existing suffix to calculate the new one. For reading, we implemented a prefixScan method, which retrieves all values for a given prefix.
We call this method later on putRange/getRange.
Setup
For N values for userID "askm", how fast are writes and reads when:
- we use one list for all the values (implementation: use RocksDB merge operator and Java list)
- we use a range for each value
The values are randomly generated UUIDs.
The first approach stores one record holding a list of N values; the second stores N key/value pairs.
Possible values for N: 10/100/1000/10000/100000/1000000 randomly generated UUIDs.
To keep the experiment setup simple, we avoid any column families (CFs) and store the data as plain key/value pairs in RocksDB.
The values should be unique IDs. These values represent pointers to the actual value in the Value CF later on.
The benchmarks are running on my notebook:
Model: MacBook Pro (14-inch, 2021)
Chip: Apple M1 Pro
Memory: 32 GB
OS: macOS Monterey Version 12.4
Write throughput
For each value of N, we ingest the data into RocksDB (putList, putMerge, and putRange) and calculate the average throughput.
Read throughput
Before the benchmark starts, we populate RocksDB with N randomly generated UUIDs for the key askm. Once the preparation is done, for the list approach, we send a single get request (getList, getMerge) to retrieve the whole byte array from RocksDB.
For the range approach, we have to perform a prefix scan on our key (getRange) to retrieve all the values. This can be done using Prefix Seek and implementing our own prefix scan method.
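The write-throughput measurement described above could be harnessed roughly as follows; the helper name and the pluggable store call are assumptions for illustration, not the actual benchmark code.

```java
import java.util.*;
import java.util.function.Consumer;

// Hypothetical harness for the write benchmark: ingest N random UUIDs
// under one key and report average throughput in operations per second.
// The store call (putList, putMerge, or putRange) is passed in as a Consumer.
class Bench {
    static double writeThroughput(int n, Consumer<String> put) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            put.accept(UUID.randomUUID().toString());  // randomly generated UUID values
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        return n / seconds;
    }
}
```

The same shape works for the read benchmark by timing a single getList/getMerge call versus one getRange prefix scan after the store has been populated.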
We are now planning to implement the range query functionality. For more information, refer to #55