Comments (2)
Notes from today's discussion
storage
abstract block backends for file/cloud/...
block references in API, checksums on blocks
configurable block size
fsync costs depending on file size
async I/O?
use direct I/O (O_DIRECT)
group commit?
hashing
logical checks? format correct? validation
write-verify before transaction commit on checkpoint
WAL write-verify?
1 MB min for magnetic
32K for flash
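To make a few of these storage notes concrete, here is a minimal sketch of an abstract block backend with a per-block checksum and an optional write-verify step. It is not DuckDB's implementation: the names `BlockBackend`, `MemoryBackend`, `FileBackend` and `ChecksumError`, the 4-byte CRC32 header layout, and the choice of CRC32 itself are all assumptions made for illustration. The file backend uses ordinary buffered I/O plus `fsync`, not `O_DIRECT`, so its read-back is likely served from the OS page cache and only approximates a real write-verify.

```python
import os
import tempfile
import zlib
from abc import ABC, abstractmethod

CHECKSUM_BYTES = 4  # assumed layout: CRC32 stored in front of each block


class ChecksumError(IOError):
    """Raised when a block's stored checksum does not match its contents."""


class BlockBackend(ABC):
    """Hypothetical abstract backend addressing fixed-size blocks by id."""

    def __init__(self, block_size: int = 32 * 1024):  # configurable block size
        self.block_size = block_size

    @abstractmethod
    def _write_raw(self, block_id: int, raw: bytes) -> None: ...

    @abstractmethod
    def _read_raw(self, block_id: int) -> bytes: ...

    def write_block(self, block_id: int, payload: bytes, verify: bool = True) -> None:
        capacity = self.block_size - CHECKSUM_BYTES
        if len(payload) > capacity:
            raise ValueError("payload does not fit in one block")
        body = payload.ljust(capacity, b"\x00")  # zero-pad to a full block
        header = zlib.crc32(body).to_bytes(CHECKSUM_BYTES, "little")
        self._write_raw(block_id, header + body)
        if verify:
            # Write-verify: read the block back and re-check its checksum.
            self.read_block(block_id)

    def read_block(self, block_id: int) -> bytes:
        raw = self._read_raw(block_id)
        stored = int.from_bytes(raw[:CHECKSUM_BYTES], "little")
        body = raw[CHECKSUM_BYTES:]
        if len(raw) != self.block_size or zlib.crc32(body) != stored:
            raise ChecksumError(f"block {block_id} failed verification")
        return body


class MemoryBackend(BlockBackend):
    """In-memory stand-in for a non-file backend (for example cloud storage)."""

    def __init__(self, block_size: int = 32 * 1024):
        super().__init__(block_size)
        self.blocks = {}

    def _write_raw(self, block_id, raw):
        self.blocks[block_id] = raw

    def _read_raw(self, block_id):
        return self.blocks[block_id]


class FileBackend(BlockBackend):
    """Single-file backend; every block write is followed by an fsync."""

    def __init__(self, path: str, block_size: int = 32 * 1024):
        super().__init__(block_size)
        mode = "r+b" if os.path.exists(path) else "w+b"
        self.file = open(path, mode)

    def _write_raw(self, block_id, raw):
        self.file.seek(block_id * self.block_size)
        self.file.write(raw)
        self.file.flush()
        os.fsync(self.file.fileno())  # durability cost paid on each write

    def _read_raw(self, block_id):
        self.file.seek(block_id * self.block_size)
        return self.file.read(self.block_size)

    def close(self):
        self.file.close()
```

In this sketch callers only ever see block ids, so the same `write_block`/`read_block` path works for any backend, and a corrupted block surfaces as a `ChecksumError` on read rather than as silently wrong data. Batching several block writes behind one `fsync` (the "group commit?" question) would be a natural variation, but it is left out here.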
from duckdb.
I just wanted to add one more +1 to this feature! I know it is on the roadmap already, just excited for it! I also figured I'd lay out my use case in case it is helpful to see how DuckDB is coming in handy "in the wild". Since SQL is the language I'm best at, I really enjoy using DuckDB rather than fighting with data frame syntax.
I am starting with an 80 GB data set, and my data processing expands it to 400 GB. Each query uses almost all of my RAM (200 of 250 GB), so I have to disconnect and reconnect between queries to trigger a checkpoint. Each reconnect leads to a full re-checkpoint of the 80-400 GB database, which slows things down a fair bit (about 30 minutes of runtime for each of 100 queries...). I'm still on version 0.2.2, but I didn't see anything in the release notes that would change this behavior yet.
Thank you for all of the work you do!! Local big data processing on DuckDB is so close!!