Comments (7)

straight-shoota commented on June 17, 2024

The limitation comes from Int32 being the type that represents the size of a collection (String in this case), so it can hold at most Int32::MAX (2,147,483,647) elements.

Now I'm wondering if it really makes sense to actually have such a large String instance. It might be quite inconvenient because char indices are non-linear. So that specific detail might be up for debate.

But other collections, particularly Slice, should certainly be able to hold more than 2GB.

@koute Do you have a real use case where you need this or is it just a theoretical discussion?
As a workaround, you can open the file and read it in chunks, performing operations on each individual chunk of data.
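
A minimal sketch of that chunked approach, assuming the per-chunk work is something simple like counting newline bytes. The helper name `count_newlines` and the 64 KiB default buffer are illustrative choices, not anything from the standard library:

```crystal
# Hypothetical helper: counts newline bytes without ever building a String
# larger than the buffer, so the Int32 size limit is never approached.
def count_newlines(path : String, chunk_size : Int32 = 64 * 1024) : Int64
  buffer = Bytes.new(chunk_size)
  count = 0_i64
  File.open(path) do |file|
    loop do
      # IO#read fills as much of the buffer as it can and returns the
      # number of bytes read; 0 signals end of file.
      read = file.read(buffer)
      break if read == 0
      buffer[0, read].each { |byte| count += 1 if byte == 0x0a_u8 }
    end
  end
  count
end
```

Only the first `read` bytes of the buffer are valid on each pass, which is why the loop works on `buffer[0, read]` rather than the whole buffer.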

Related: #8111 (comment), https://forum.crystal-lang.org/t/is-increasing-array-index-size-in-the-future/6610/2

from crystal.

straight-shoota commented on June 17, 2024

Also related is #4011. It was originally about getting no error when accessing an out-of-range array index (which is fixed). Now it's only about improving the error message to point out the reason for overflow. I suppose this would be an improvement for this use case as well.

koute commented on June 17, 2024

> @koute Do you have a real use case where you need this or is it just a theoretical discussion?

Yes, this is a real use case.

I was looking to switch from Ruby to Crystal when writing my scripts to speed them up, and this was the very first issue I've encountered, which really surprised me (sure, I expected Crystal to not have automatic bigint promotion like Ruby has, but I didn't expect it to use 32-bit integers for something like this, which seems baffling to me considering how tiny 2GB is nowadays).

I use File.read in Ruby on big files all the time to process them, especially in my quick & dirty scripts. I know this can be worked around by e.g. reading the file in chunks, and if this was not a quick & dirty script I would certainly do that, but for quick one-off scripts I just want to minimize the friction while writing them.

HertzDevil commented on June 17, 2024

Not sure about Ruby's situation, but contiguous allocations in that size range will perform poorly with the Boehm GC, especially on Windows (contrast with #14395), so I don't think the standard library is going to accommodate them in the near future.

Memory-mapped I/O is a notable exception where you could have huge contiguous memory ranges without any allocation. To create a read-only view on Windows:

```crystal
lib LibC
  PAGE_READONLY = 0x02

  FILE_MAP_READ = 0x0004

  fun CreateFileMappingA(hFile : HANDLE, lpFileMappingAttributes : SECURITY_ATTRIBUTES*, flProtect : DWORD, dwMaximumSizeHigh : DWORD, dwMaximumSizeLow : DWORD, lpName : LPSTR) : HANDLE
  fun MapViewOfFile(hFileMappingObject : HANDLE, dwDesiredAccess : DWORD, dwFileOffsetHigh : DWORD, dwFileOffsetLow : DWORD, dwNumberOfBytesToMap : SizeT) : Void*
  fun UnmapViewOfFile(lpBaseAddress : Void*) : BOOL
end

File.open(...) do |file|
  handle = Crystal::System::FileDescriptor.windows_handle(file.fd)
  size = file.size
  mapping = LibC.CreateFileMappingA(handle, nil, LibC::PAGE_READONLY,
    LibC::DWORD.new!(size >> 32), LibC::DWORD.new!(size), nil)
  view = LibC.MapViewOfFile(mapping, LibC::FILE_MAP_READ, 0, 0, 0).as(UInt8*)

  # this should be okay even if `size > Int32::MAX`
  # bytes = Bytes.new(view, size, read_only: true)
  # io = IO::Memory.new(bytes, writeable: false)

  LibC.UnmapViewOfFile(view)
  LibC.CloseHandle(mapping)
end
```

or on Unix-like systems:

```crystal
File.open(...) do |file|
  size = file.size
  view = LibC.mmap(nil, size, LibC::PROT_READ, LibC::MAP_PRIVATE, file.fd, 0).as(UInt8*)

  # this should be okay even if `size > Int32::MAX`
  # bytes = Bytes.new(view, size, read_only: true)
  # io = IO::Memory.new(bytes, writeable: false)

  LibC.munmap(view, size)
end
```

If Slice does support 64-bit sizes, then bytes could probably act as a drop-in replacement for File.read or, more precisely, File.open(..., &.getb_to_end). io would also work as long as you don't need any IO::FileDescriptor-specific functionality.
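
As long as collection sizes are Int32, one way to still use the mapping is to walk it in windows that each fit in a Slice. The following is only a sketch for Unix-like systems under that assumption; `each_mapped_window` and its `window` parameter are hypothetical names, and the error handling is deliberately minimal:

```crystal
# Hypothetical helper: maps the file read-only and yields consecutive
# read-only windows, each no larger than `window` bytes.
def each_mapped_window(path : String, window : Int32 = Int32::MAX, & : Bytes ->) : Nil
  File.open(path) do |file|
    size = file.size
    return if size == 0 # mmap rejects zero-length mappings

    raw = LibC.mmap(nil, LibC::SizeT.new(size), LibC::PROT_READ, LibC::MAP_PRIVATE, file.fd, 0)
    raise RuntimeError.from_errno("mmap") if raw == LibC::MAP_FAILED
    view = raw.as(UInt8*)

    begin
      offset = 0_i64
      while offset < size
        # Each window length fits in Int32 even when the file does not.
        len = Math.min(window.to_i64, size - offset).to_i32
        yield Bytes.new(view + offset, len, read_only: true)
        offset += len
      end
    ensure
      # The yielded slices point into the mapping and must not be used after this.
      LibC.munmap(raw, LibC::SizeT.new(size))
    end
  end
end
```

Data that straddles a window boundary (a line or a multi-byte character, for example) has to be handled by the caller, which is the same caveat that applies to reading in chunks.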

ysbaddaden commented on June 17, 2024

@HertzDevil Nice! I've been wondering about mmap for this use case, and this looks exciting.

crysbot commented on June 17, 2024

This issue has been mentioned on Crystal Forum. There might be relevant details there:

https://forum.crystal-lang.org/t/built-in-support-for-mmap/6772/1

crysbot commented on June 17, 2024

This issue has been mentioned on Crystal Forum. There might be relevant details there:

https://forum.crystal-lang.org/t/built-in-support-for-mmap/6772/2
