Comments (7)
macOS uses compressed memory, which is why the physical memory usage might differ from the memory usage reported by DuckDB. DuckDB's memory limit and memory usage are measured in uncompressed memory.
from duckdb.
I would recommend using ndjson as your json format - that should be more (memory) efficient
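For readers whose data is a single top-level JSON array, a minimal sketch of converting it to ndjson (one JSON record per line) could look like the following. The function name json_array_to_ndjson and the file paths are hypothetical, and the sketch assumes the source file fits in memory once for the one-off conversion.

```python
import json
from pathlib import Path


def json_array_to_ndjson(src: Path, dst: Path) -> int:
    """Rewrite a file holding one top-level JSON array as newline-delimited JSON.

    Returns the number of records written.
    """
    # The whole array is parsed in one go; this is a one-off conversion step.
    records = json.loads(src.read_text(encoding="utf-8"))
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")

    with dst.open("w", encoding="utf-8") as out:
        for record in records:
            # json.dumps escapes embedded newlines, so each record stays on one line.
            out.write(json.dumps(record, ensure_ascii=False) + "\n")
    return len(records)
```

Nested columns are preserved as nested objects on each line, so the converted file holds the same data as the original array.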
from duckdb.
I assume the profiler is incorrect, but to verify this, try setting the thread limit to 1 (pragma threads=1) and try the experiments again.
My hunch is that the memory limit can be much lower without erroring in that case.
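As a rough sketch of that experiment from Python: the helper names limit_statements, apply_limits and demo are hypothetical, and the 1GB value is only a placeholder to lower step by step. The helpers work with any object that has an execute() method, which a DuckDB connection does.

```python
def limit_statements(threads: int, memory_limit: str) -> list[str]:
    """Build the SET statements that cap DuckDB's thread count and memory limit."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    # memory_limit is a size string such as '1GB'; single quotes are doubled for SQL.
    escaped = memory_limit.replace("'", "''")
    return [
        f"SET threads TO {int(threads)}",
        f"SET memory_limit = '{escaped}'",
    ]


def apply_limits(con, threads: int = 1, memory_limit: str = "1GB") -> None:
    """Run the statements on any object exposing execute(), e.g. a DuckDB connection."""
    for statement in limit_statements(threads, memory_limit):
        con.execute(statement)


def demo() -> None:
    """Apply the limits to an in-memory DuckDB database and read them back."""
    import duckdb  # imported lazily so the helpers above do not require DuckDB

    con = duckdb.connect()
    apply_limits(con, threads=1, memory_limit="1GB")
    # Read the settings back to confirm they took effect.
    print(
        con.execute(
            "SELECT current_setting('threads'), current_setting('memory_limit')"
        ).fetchone()
    )
```

Calling demo() where DuckDB is installed prints the two settings. Re-running the failing query after each lower memory_limit value shows how far the limit can drop before an out-of-memory error appears.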
from duckdb.
I assume the profiler is incorrect, but to verify this, try setting the thread limit to 1 (pragma threads=1) and try the experiments again.
Wouldn't that accomplish the same as the DUCKDB_WORKER_THREADS = 1 in the reproduction script?
I tried with duckdb_conn.execute("SET THREADS to 1;") and I am seeing the same results. I am also checking with asitop at the OS level to watch memory while it runs, and am seeing about 380.3 MiB, which matches.
from duckdb.
Using the DuckDB CLI with the default memory limit of 12.7GB and 10 threads doesn't seem to use a ton of memory for the file, but inside a server it seems to need more than 26GB of memory, even though it works fine on a 16 GB Mac M1. The example above is the closest I could find to replicate the behavior.
Might have to do with having a few nested columns 🤔
from duckdb.
macOS uses compressed memory, which is why the physical memory usage might differ from the memory usage reported by DuckDB. DuckDB's memory limit and memory usage are measured in uncompressed memory.
Thanks for your response and help. Good to know.
Are there any methods / tricks to be able to load / query JSON without so much memory?
For example, would any of the methods below help:
- loading the JSON with pandas and querying the resulting dataframe
- providing the JSON structure so it doesn't need to be inferred
- using fetch_record_batch
from duckdb.
I am also facing the same problem. Does anyone know how to resolve this?
from duckdb.