Comments (7)
The limitation is based on `Int32` being the type to represent the size of a collection (`String` in this case), so it can only have `Int32::MAX` elements.
Now I'm wondering if it really makes sense to actually have such a large `String` instance. It might be quite inconvenient because char indices are non-linear. So that specific detail might be up for debate.
But other collections, particularly `Slice`, should certainly be able to hold more than 2GB.
@koute Do you have a real use case where you need this or is it just a theoretical discussion?
As a workaround, you can open the file and read it in chunks, performing operations on each individual chunk of data.
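A minimal sketch of that chunked approach, assuming the per-chunk operation is something that can be accumulated, such as counting newline bytes. The method name `count_newlines` and the 1 MiB default chunk size are illustrative choices, not part of the standard library:

```crystal
# Hypothetical helper: counts newline bytes in a file of any size
# without ever holding more than `chunk_size` bytes in memory.
def count_newlines(path : String, chunk_size : Int32 = 1 << 20) : Int64
  buffer = Bytes.new(chunk_size)
  newline = '\n'.ord.to_u8
  total = 0_i64
  File.open(path) do |file|
    # IO#read fills the buffer and returns the number of bytes read;
    # it returns 0 at end of file.
    while (read = file.read(buffer)) > 0
      total += buffer[0, read].count(newline)
    end
  end
  total
end
```

Each chunk stays well below `Int32::MAX` bytes, while the running total is an `Int64`, so the file itself can be larger than 2GB.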
Related: #8111 (comment), https://forum.crystal-lang.org/t/is-increasing-array-index-size-in-the-future/6610/2
from crystal.
Also related is #4011. It was originally about getting no error when accessing an out-of-range array index (which is fixed). Now it's only about improving the error message to point out the reason for overflow. I suppose this would be an improvement for this use case as well.
from crystal.
> @koute Do you have a real use case where you need this or is it just a theoretical discussion?

Yes, this is a real use case.
I was looking to switch from Ruby to Crystal for my scripts to speed them up, and this was the very first issue I encountered, which really surprised me. I expected Crystal not to have automatic bigint promotion like Ruby has, but I didn't expect it to use 32-bit integers for something like this, which seems baffling to me considering how tiny 2GB is nowadays.
I use `File.read` in Ruby on big files all the time to process them, especially in my quick & dirty scripts. I know this can be worked around by e.g. reading the file in chunks, and if this was not a quick & dirty script I would certainly do that, but for quick one-off scripts I just want to minimize the friction while writing them.
from crystal.
Not sure about Ruby's situation, but contiguous allocations in that size range will perform poorly with the Boehm GC, especially on Windows (contrast with #14395), so I don't think the standard library is going to accommodate them in the near future.
Memory-mapped I/O is a notable exception where you could have huge contiguous memory ranges without any allocation. To create a read-only view on Windows:
```crystal
lib LibC
  PAGE_READONLY = 0x02
  FILE_MAP_READ = 0x0004

  fun CreateFileMappingA(hFile : HANDLE, lpFileMappingAttributes : SECURITY_ATTRIBUTES*, flProtect : DWORD, dwMaximumSizeHigh : DWORD, dwMaximumSizeLow : DWORD, lpName : LPSTR) : HANDLE
  fun MapViewOfFile(hFileMappingObject : HANDLE, dwDesiredAccess : DWORD, dwFileOffsetHigh : DWORD, dwFileOffsetLow : DWORD, dwNumberOfBytesToMap : SizeT) : Void*
  fun UnmapViewOfFile(lpBaseAddress : Void*) : BOOL
end

File.open(...) do |file|
  handle = Crystal::System::FileDescriptor.windows_handle(file.fd)
  size = file.size
  mapping = LibC.CreateFileMappingA(handle, nil, LibC::PAGE_READONLY,
    LibC::DWORD.new!(size >> 32), LibC::DWORD.new!(size), nil)
  view = LibC.MapViewOfFile(mapping, LibC::FILE_MAP_READ, 0, 0, 0).as(UInt8*)

  # this should be okay even if `size > Int32::MAX`
  # bytes = Bytes.new(view, size, read_only: true)
  # io = IO::Memory.new(bytes, writeable: false)

  LibC.UnmapViewOfFile(view)
  LibC.CloseHandle(mapping)
end
```
or on Unix-like systems:
```crystal
File.open(...) do |file|
  size = file.size
  view = LibC.mmap(nil, size, LibC::PROT_READ, LibC::MAP_PRIVATE, file.fd, 0).as(UInt8*)

  # this should be okay even if `size > Int32::MAX`
  # bytes = Bytes.new(view, size, read_only: true)
  # io = IO::Memory.new(bytes, writeable: false)

  LibC.munmap(view, size)
end
```
If `Slice` does support 64-bit sizes, then `bytes` could probably act as a drop-in replacement for `File.read` or, more precisely, `File.open(..., &.getb_to_end)`. `io` would also work as long as you don't need any `IO::FileDescriptor`-specific functionality.
from crystal.
@HertzDevil Nice! I've been wondering about mmap for this use case, and this looks exciting.
from crystal.
This issue has been mentioned on Crystal Forum. There might be relevant details there:
https://forum.crystal-lang.org/t/built-in-support-for-mmap/6772/1
from crystal.
This issue has been mentioned on Crystal Forum. There might be relevant details there:
https://forum.crystal-lang.org/t/built-in-support-for-mmap/6772/2
from crystal.