Comments (5)

manucorporat commented on May 16, 2024

Hey! I guess I can change how we generate the archive so that it is not streamed, but it would still be a problem if someone uploads a stream-generated .zip that has this issue...

Thanks a lot for your help! Feel free to close the issue :)

from fflate.

101arrowz commented on May 16, 2024

I'll take a look. BTW you should probably try using unzip instead of unzipSync until I fix this issue because it is multithreaded and therefore usually faster.

101arrowz commented on May 16, 2024

@manucorporat I took a look at the example file and unfortunately you've come across one of the shortcomings of streaming unzip in general. This ZIP file was created in a streaming manner, i.e. the lengths of each file are not encoded in the archive until the very end of the archive. Therefore, to see when one file ends and the next begins, fflate has to iterate through each byte in the file and look for a magic number 0x4034B50 that signifies the start of the next file. Unfortunately, 307.dcm encodes this exact byte sequence at byte 420119, so the decompressor thinks it found a new file when in reality it's still in 307.dcm. The reason the non-streaming method works is that it has access to the entire file, so it can read the file lengths encoded at the end of the archive instead of trying to find the magic number.
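The false positive described above can be reproduced in a few lines. This is only a sketch of the general idea, not fflate's actual scanning code: the function names and the sample payload are made up for illustration. It scans a buffer for the local file header signature (the bytes 50 4B 03 04 on disk) and reports a match inside what is really just file data.

```typescript
// Local file header signature, stored little-endian as the bytes 50 4B 03 04.
const LOCAL_FILE_HEADER_SIG = 0x04034b50;

// Read an unsigned 32-bit little-endian integer starting at `offset`.
function readU32LE(buf: Uint8Array, offset: number): number {
  return (
    (buf[offset] |
      (buf[offset + 1] << 8) |
      (buf[offset + 2] << 16) |
      (buf[offset + 3] << 24)) >>> 0
  );
}

// Return every offset at which the four signature bytes appear.
function findSignatureOffsets(buf: Uint8Array): number[] {
  const offsets: number[] = [];
  for (let i = 0; i + 4 <= buf.length; i++) {
    if (readU32LE(buf, i) === LOCAL_FILE_HEADER_SIG) offsets.push(i);
  }
  return offsets;
}

// Hypothetical payload: ordinary file data that happens to contain the signature bytes.
const payload = new Uint8Array([0x10, 0x20, 0x50, 0x4b, 0x03, 0x04, 0x30, 0x40]);

// A scanner that only has the byte stream cannot tell this match from a real header.
const falsePositives = findSignatureOffsets(payload);
console.log(falsePositives); // [ 2 ]
```

Nothing in the byte stream distinguishes the match at offset 2 from a genuine header, which is why a streaming reader that does not yet know the entry's length can be misled.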

This isn't something that fflate can resolve; if you look around, you'll see that the reason most people don't offer streaming decompressors is this exact issue.

The conditions for this bug to occur are so specific that I thought it would never happen. I will document this behavior, but unfortunately I don't think there is any way to avoid this problem besides using a non-streaming API. Sorry for this bug; let me know if you have any questions.
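For contrast, here is a rough sketch of how a non-streaming reader can locate the metadata stored at the end of the archive. It is not fflate's implementation: the names are illustrative, ZIP64 is ignored, and the sample record is built by hand. It searches backwards for the end-of-central-directory record and reads the entry count and the central directory's position, from which the real file lengths can then be read.

```typescript
// End-of-central-directory (EOCD) signature and its minimum size without a comment.
const EOCD_SIG = 0x06054b50;
const EOCD_MIN_SIZE = 22;

interface EndOfCentralDirectory {
  totalEntries: number;
  centralDirectorySize: number;
  centralDirectoryOffset: number;
}

// Scan backwards from the end of the archive, because an archive comment
// of variable length may follow the fixed 22-byte record.
function findEndOfCentralDirectory(zip: Uint8Array): EndOfCentralDirectory | null {
  const view = new DataView(zip.buffer, zip.byteOffset, zip.byteLength);
  for (let i = zip.length - EOCD_MIN_SIZE; i >= 0; i--) {
    if (view.getUint32(i, true) === EOCD_SIG) {
      return {
        totalEntries: view.getUint16(i + 10, true),
        centralDirectorySize: view.getUint32(i + 12, true),
        centralDirectoryOffset: view.getUint32(i + 16, true),
      };
    }
  }
  return null; // no record found: not a ZIP, or the archive is truncated
}

// Hand-built EOCD record for illustration only (all values are made up).
const sampleEocd = new Uint8Array(EOCD_MIN_SIZE);
const sampleView = new DataView(sampleEocd.buffer);
sampleView.setUint32(0, EOCD_SIG, true);
sampleView.setUint16(8, 3, true); // entries on this disk
sampleView.setUint16(10, 3, true); // total entries
sampleView.setUint32(12, 150, true); // central directory size in bytes
sampleView.setUint32(16, 4096, true); // central directory offset

const parsedEocd = findEndOfCentralDirectory(sampleEocd);
```

This only works once the whole archive is available, which is the trade-off described above.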

manucorporat commented on May 16, 2024

Hello @101arrowz, thanks a lot for the amazing response!
I am so happy to learn all these interesting details about ZIP and compression. Bummer about the streaming issue; I guess the chances of it happening are higher for big files, or for a zip within a zip?

unzipSync and unzip are working great! Our code is already running within a worker, and Safari doesn't support workers within workers (which is lame), so we are forced to use the sync APIs.

We are using fflate at https://set.health/, and we love it!

Out of curiosity, do you know if .tar has a similar problem?

Would you be up for adding a second configuration argument to unzipSync() and unzip() to "filter" files (for performance reasons)?

101arrowz commented on May 16, 2024

> i guess the chances of it happening are more likely for big files, or a zip within a zip

Pretty much, but the magic number was specifically designed to be quite rare, so it's unfortunate this is a problem.

> Out of curiosity, do you know if .tar has a similar problem?

TAR is a far superior format to ZIP IMO. No, it does not suffer from this issue during decompression; however, it can't be streamed in compression.
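One way to read this comparison: a TAR header stores the entry's size before the entry's data, so a reader can jump straight to the next header instead of searching for a signature, while a writer has to know the size up front. The sketch below assumes the plain ustar layout (512-byte blocks, size as an octal ASCII string at offset 124) and ignores extensions such as GNU base-256 sizes. The function names and the demo header are illustrative.

```typescript
const TAR_BLOCK_SIZE = 512;

// Read the entry size from a 512-byte header: 12 bytes at offset 124,
// ASCII octal digits terminated by NUL or space.
function readTarEntrySize(header: Uint8Array): number {
  let text = "";
  for (let i = 124; i < 136; i++) {
    const c = header[i];
    if (c === 0 || c === 0x20) break;
    text += String.fromCharCode(c);
  }
  return text.length === 0 ? 0 : parseInt(text, 8);
}

// Entry data is padded to a multiple of 512 bytes, so the next header's
// position follows directly from the current header's offset and size.
function nextHeaderOffset(headerOffset: number, size: number): number {
  const padded = Math.ceil(size / TAR_BLOCK_SIZE) * TAR_BLOCK_SIZE;
  return headerOffset + TAR_BLOCK_SIZE + padded;
}

// Demo header with only the size field filled in (octal 1750 = decimal 1000).
const demoHeader = new Uint8Array(TAR_BLOCK_SIZE);
const sizeField = "00000001750";
for (let i = 0; i < sizeField.length; i++) {
  demoHeader[124 + i] = sizeField.charCodeAt(i);
}

const demoSize = readTarEntrySize(demoHeader);
const demoNext = nextHeaderOffset(0, demoSize);
```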

Since it seems that you're able to choose .zip or .tar, it looks like you have control over the file creation process. In that case, note that the zip command will not yield a streamed ZIP. It's important to note that ZIPs created without streaming can always be decompressed streaming, but ZIPs created with streaming can usually be decompressed streaming. So if you can force the ZIP to be created in a non-streaming manner, you will never face this problem. You can use InfoZIP to do this: zip -r myzip.zip folder/.
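If you want to check how an uploaded archive was produced, the ZIP format marks streamed entries with bit 3 of the general-purpose flags in the local file header, meaning the sizes and CRC follow the data in a data descriptor. The sketch below inspects only the first entry and assumes the archive starts with a local file header at offset 0. The helper names are made up for illustration and are not part of fflate.

```typescript
const LOCAL_HEADER_SIGNATURE = 0x04034b50;
const FLAG_DATA_DESCRIPTOR = 0x0008; // bit 3 of the general-purpose flags
const LOCAL_HEADER_FIXED_SIZE = 30;

// Returns true when the first entry defers its sizes to a trailing data
// descriptor, which is what a streaming writer produces.
function firstEntryUsesDataDescriptor(zip: Uint8Array): boolean {
  if (zip.length < LOCAL_HEADER_FIXED_SIZE) {
    throw new Error("buffer too short to hold a local file header");
  }
  const view = new DataView(zip.buffer, zip.byteOffset, zip.byteLength);
  if (view.getUint32(0, true) !== LOCAL_HEADER_SIGNATURE) {
    throw new Error("buffer does not start with a local file header");
  }
  const flags = view.getUint16(6, true); // flags live at offset 6
  return (flags & FLAG_DATA_DESCRIPTOR) !== 0;
}

// Hypothetical helper for the demo: a bare 30-byte header with the given flags.
function makeLocalHeader(flags: number): Uint8Array {
  const header = new Uint8Array(LOCAL_HEADER_FIXED_SIZE);
  const view = new DataView(header.buffer);
  view.setUint32(0, LOCAL_HEADER_SIGNATURE, true);
  view.setUint16(6, flags, true);
  return header;
}

const streamedEntry = firstEntryUsesDataDescriptor(makeLocalHeader(FLAG_DATA_DESCRIPTOR));
const seekableEntry = firstEntryUsesDataDescriptor(makeLocalHeader(0));
```

An upload handler could use a check like this to route archives with streamed entries to a non-streaming unzip path.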
