I ran one more test with the following table (2 partitions, 1 shard each):
CREATE TABLE uservisits (
"sourceIP" STRING,
"destinationURL" STRING,
"visitDate" TIMESTAMP,
"adRevenue" FLOAT,
"UserAgent" STRING INDEX USING FULLTEXT,
"cCode" STRING,
"lCode" STRING,
"searchWord" STRING,
"duration" INTEGER,
"y" GENERATED ALWAYS AS date_trunc('year', "visitDate") / 1000000000000,
INDEX uagent_plain USING PLAIN("UserAgent")
-- ^^ used for regex matching ^^ --
) CLUSTERED INTO 1 shards PARTITIONED BY("y") WITH (
number_of_replicas = 0
);
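As a rough illustration of why this schema ends up with only two partitions: the generated column "y" truncates "visitDate" to the start of its year and divides by 10^12. The sketch below mirrors that in Python, assuming timestamps are milliseconds since the epoch in UTC and that the division is integer division; `partition_value` is a hypothetical helper name, not part of CrateDB.

```python
from datetime import datetime, timezone


def partition_value(visit_date: datetime) -> int:
    """Mimic date_trunc('year', "visitDate") / 1000000000000.

    Assumes epoch milliseconds in UTC and integer division.
    """
    # Truncate to the first instant of the year (UTC).
    year_start = datetime(visit_date.year, 1, 1, tzinfo=timezone.utc)
    # Convert to milliseconds since the epoch.
    millis = int(year_start.timestamp()) * 1000
    # Integer-divide by 10^12, as in the generated column expression.
    return millis // 1_000_000_000_000


if __name__ == "__main__":
    for year in (1995, 2001, 2002, 2010):
        sample = datetime(year, 6, 15, tzinfo=timezone.utc)
        print(year, partition_value(sample))
```

Under these assumptions every visit date from 1970 through 2001 maps to 0 and every later date (up to 2033) maps to 1, which would match the two partitions seen in the test.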
I first loaded data files 0-8 and then benchmarked inserts from the 9th.
The preloaded data files 0-8 yield the following shard info:
cr> select size / 1024 / 1024, num_docs, table_name, partition_ident from sys.shards;
+----------------------------------------+----------+------------+-----------------+
| ((size / 1024::bigint) / 1024::bigint) | num_docs | table_name | partition_ident |
+----------------------------------------+----------+------------+-----------------+
| 971 | 5259365 | uservisits | 04130 |
| 373 | 1699738 | uservisits | 04132 |
+----------------------------------------+----------+------------+-----------------+
The benchmark results are:
Q: insert into uservisits ("adRevenue", "destinationURL", "searchWord", "UserAgent", "duration", "visitDate", "sourceIP", "lCode", "cCode") values ($1, $2, $3, $4, $5, $6, $7, $8, $9)
C: 20
| Version | Mean ± Stdev   | Min   | Median | Q3    | Max     |
| V1      | 6.815 ± 14.030 | 1.956 | 4.119  | 4.939 | 182.224 |
| V2      | 4.156 ± 17.014 | 0.692 | 1.945  | 2.600 | 291.667 |
Difference (V2 vs. V1): mean -48.47%, median -71.69%
This shows that as the shards grow larger, the optimized insert path provides a more significant performance improvement.
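The two percentages reported for V1 vs. V2 are consistent with a symmetric percent difference, i.e. the gap between the two values divided by their average. A minimal sketch, assuming that is how the benchmark tool derives them; `percent_diff` is a hypothetical name, and the inputs are the rounded values from the results, so the last digit may differ slightly from the reported figures:

```python
def percent_diff(old: float, new: float) -> float:
    """Symmetric percent difference: (new - old) relative to the mean of both."""
    return (new - old) / ((old + new) / 2) * 100


if __name__ == "__main__":
    # Rounded values taken from the benchmark results above.
    mean_v1, mean_v2 = 6.815, 4.156
    median_v1, median_v2 = 4.119, 1.945

    print(f"mean:   {percent_diff(mean_v1, mean_v2):.2f}%")
    print(f"median: {percent_diff(median_v1, median_v2):.2f}%")
```

A negative result means V2 is faster than V1. Note that this measure is not the same as a plain "percent reduction relative to V1", which would give smaller magnitudes for the same numbers.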