Comments (7)

Mytherin commented on June 16, 2024

macOS has compressed memory, which is why the physical memory usage may differ from the memory usage reported by DuckDB. DuckDB's memory limit and memory usage are measured in terms of uncompressed memory.

Mytherin commented on June 16, 2024

I would recommend using NDJSON (newline-delimited JSON) as your JSON format - that should be more memory-efficient.

Tishj commented on June 16, 2024

I assume the profiler is incorrect, but to verify this, try setting the thread limit to 1 (pragma threads=1) and running the experiments again.
My hunch is that the memory limit can be set much lower without erroring in that case.

cfahlgren1 commented on June 16, 2024

> I assume the profiler is incorrect, but to verify this, try setting the thread limit to 1 (pragma threads=1) and try the experiments again.

Wouldn't that accomplish the same thing as the

DUCKDB_WORKER_THREADS = 1

in the reproduction script?

I tried with a

duckdb_conn.execute("SET THREADS to 1;")

I am seeing the same results. I am also checking memory usage at the OS level with asitop while it runs, and I see about 380.3 MiB, which matches.

cfahlgren1 commented on June 16, 2024

Using the DuckDB CLI with the default memory limit of 12.7 GB and 10 threads doesn't use much memory for the file, but inside a server it seems to need more than 26 GB of memory, even though it works fine on a 16 GB Mac M1. The example above is the closest I could find to replicate the behavior.

Might have to do with having a few nested columns 🤔

cfahlgren1 commented on June 16, 2024

> MacOS has compressed memory which is why the physical memory usage might be different than the memory usage reported by DuckDB. DuckDB's memory limit and memory usage is measured in uncompressed memory usage.

Thanks for your response and help. Good to know.

Are there any methods or tricks to load/query JSON using less memory?

For example, would any of the methods below help:

  • loading JSON from pandas and querying the resulting dataframe
  • providing the JSON structure so it doesn't need to infer it
  • fetch_record_batch

MotazBellah commented on June 16, 2024

I am also facing the same problem. Does anyone know how to resolve this?
