Inconsistent resources used (csv.jl, closed issue)

pearcemc commented on September 27, 2024

Inconsistent resources used

from csv.jl.

Comments (7)

nalimilan commented on September 27, 2024

I think that's because WeakRefString objects can only be used with NullableArray at the moment, so lots of small strings need to be allocated when nullable=true. Does the dataset contain string columns? If you have a small number of categories, using a CategoricalArray would save a lot of memory (not sure it's supported yet).
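
To illustrate why a pooled (categorical) representation saves memory when there are only a few distinct values, here is a minimal Base-only sketch, assuming Julia 1.x. It is not the CategoricalArrays API: `pool_strings` is a hypothetical helper that stores one small integer code per row plus a single copy of each distinct string.

```julia
# Hypothetical helper (not part of CategoricalArrays): encode a string column
# as UInt32 codes plus a pool holding each distinct value once.
function pool_strings(col::Vector{String})
    levels = String[]                  # distinct values, in order of first appearance
    index = Dict{String,UInt32}()      # value => code
    codes = Vector{UInt32}(undef, length(col))
    for (i, s) in enumerate(col)
        # get! runs the do-block only when s has not been seen before
        codes[i] = get!(index, s) do
            push!(levels, s)
            UInt32(length(levels))
        end
    end
    return codes, levels
end

# 100_000 separately allocated strings, but only 5 distinct values
col = [string("cat", i % 5) for i in 1:100_000]
codes, levels = pool_strings(col)
decoded = levels[codes]                # look each code up in the pool

println("plain column:  ", Base.summarysize(col), " bytes")
println("pooled column: ", Base.summarysize(codes) + Base.summarysize(levels), " bytes")
```

The pooled form pays 4 bytes per row regardless of string length, so the saving grows with the number of rows and shrinks as the number of distinct values approaches the row count.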

pearcemc commented on September 27, 2024

@nalimilan Thanks, yes, that's correct.

I'm still not entirely sure why this should explain the disparity though as:

maximum length of any of the small strings is 14 UInt8s (so 14 bytes?)
minimum size of a WeakRefString appears to be 24 bytes (3 Int64s?)

I would have thought that the file should be small enough to fit into RAM using actual strings as:

nrow = 9999999
strlen = 14
colsize_mb = (nrow*strlen*sizeof(UInt8))/1e6
# 139.999986

There are 6 cols and I have 4GB RAM, so I'd expect a String version to take up roughly 840MB or less.

I'd expect a WeakRefString version of the CSV to be similar. When I end my Julia process after reading the file in with WeakRefString, about 1GB of RAM gets released (some cols are Int64), which is roughly consistent with the above numbers.
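
The back-of-the-envelope estimate above can be written out as a small sketch. `column_payload_mb` is a hypothetical helper, and the calculation deliberately ignores all per-object overhead; it assumes every one of the 6 columns holds either 14-byte strings or 24-byte fixed-size references, which overstates things since some columns are Int64.

```julia
# Hypothetical helper: raw payload of one column in MB, ignoring per-object overhead.
column_payload_mb(nrow, bytes_per_entry) = nrow * bytes_per_entry / 1e6

nrow  = 9_999_999
ncols = 6

string_mb = ncols * column_payload_mb(nrow, 14)  # 14 bytes of character data per entry
ref_mb    = ncols * column_payload_mb(nrow, 24)  # 24 bytes per fixed-size reference

println("string payload:    ", round(string_mb; digits=1), " MB")
println("reference payload: ", round(ref_mb; digits=1), " MB")
```

Both figures come out well under 4GB (roughly 840 MB and 1440 MB), so raw payload alone does not account for the disparity; any explanation has to come from per-object overhead.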

nalimilan commented on September 27, 2024

In Julia 0.5, String objects have a significant overhead due to their Array field (this will be much better in 0.6). I don't remember what the exact value is, but for short strings like yours it's a lot.

pearcemc commented on September 27, 2024

Thanks for following up. Investigating this, the container size of a String seems to be 8 bytes.

julia> N = 10000;

julia> v = @timed ["hello" for i in 1:N]
(String["hello","hello","hello","hello","hello","hello","hello","hello","hello","hello"  …  "hello","hello","hello","hello","hello","hello","hello","hello","hello","hello"],0.045251508,1375438,0.0,Base.GC_Diff(1375438,1,0,12641,0,0,0,0,0))

julia> v[3] #mem allocated
1375438

julia> tots = sizeof(v[1]) + length(v[1])*sizeof(v[1][1]) #size of the containers + size of the data contained? 
130000

julia> tots/N #cost per string
13.0

So if my strings are each 14 bytes + an 8-byte container = 22 bytes, this should still be smaller than a 24-byte WeakRefString, unless I'm misunderstanding the info given back by @timed.

So I'm still not sure the String overhead is sufficient to explain the nullable frame fitting into 1GB of memory while the typed version takes up 4GB + 4GB swap + ???.
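
As a cross-check that does not depend on how the result of @timed is interpreted, here is a minimal sketch assuming Julia 1.x, where `Base.summarysize` reports the memory reachable from an object. The numbers will not match Julia 0.5, whose String layout differs. The strings are built to be distinct on purpose, because a repeated literal such as "hello" can be one shared object that is counted only once.

```julia
N = 10_000

# Distinct strings, so each element is its own heap object
v = [string("hello", i) for i in 1:N]

payload = sum(sizeof, v)          # bytes of character data only
total   = Base.summarysize(v)     # array storage plus every string it references

per_string = total / N
overhead   = (total - payload) / N

println("payload per string:  ", payload / N, " bytes")
println("total per string:    ", per_string, " bytes")
println("overhead per string: ", overhead, " bytes")
```

The gap between the payload and the total is the per-string cost that a simple `nrow * strlen` estimate leaves out.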

quinnj commented on September 27, 2024

hey @pearcemc, thanks for opening an issue! Is there any chance you can share the file you're using? Could you also share your system info? I know there have been a few platform-specific issues before.

pearcemc commented on September 27, 2024

Sure, it's the page_views_sample.csv from here.

My laptop /proc/meminfo looks like:

ubuntu@ubuntu-UX21E:/db/outbrain$ cat /proc/meminfo | head
MemTotal:        3946968 kB
MemFree:          413704 kB
Buffers:           61688 kB
Cached:          1299912 kB
SwapCached:       130404 kB
Active:          2008912 kB
Inactive:        1269984 kB
Active(anon):    1748120 kB
Inactive(anon):   993240 kB
Active(file):     260792 kB

I'm on Julia 0.5.0.

quinnj commented on September 27, 2024

This should be fixed on master since we're now relying on plain Vector{Union{T, Null}} for non-String columns, and WeakRefStringArray for String arrays, which will be memory efficient.
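
To give a rough feel for why a plain vector with a small Union element type is memory efficient, here is a minimal sketch assuming Julia 1.x, where Base's `Missing` stands in for the `Null` type mentioned above. It only measures an Int64 column and says nothing about WeakRefStringArray itself.

```julia
N = 1_000_000

# Every 10th entry is missing; the rest are plain Int64 values
col = Union{Int64, Missing}[i % 10 == 0 ? missing : i for i in 1:N]

per_elem = Base.summarysize(col) / N

println("element type:      ", eltype(col))
println("missing entries:   ", count(ismissing, col))
println("bytes per element: ", per_elem)
```

The per-element cost stays close to the 8 bytes of an Int64 because the values are stored inline rather than as separately allocated objects.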
