Comments (5)
Hey! I guess I can change how we generate the archive so that it isn't streamed, but it would still be a problem if someone uploads a streaming-generated .zip that has this issue...
Thanks a lot for your help! Feel free to close the issue :)
from fflate.
I'll take a look. BTW you should probably try using `unzip` instead of `unzipSync` until I fix this issue because it is multithreaded and therefore usually faster.
@manucorporat I took a look at the example file and unfortunately you've come across one of the shortcomings of streaming unzip in general. This ZIP file was created in a streaming manner, i.e. the lengths of each file are not encoded in the archive until the very end of the archive. Therefore, to see when one file ends and the next begins, `fflate` has to iterate through each byte in the file and look for a magic number, `0x04034B50`, that signifies the start of the next file. Unfortunately, `307.dcm` encodes this exact byte sequence at byte 420119, so the decompressor thinks it found a new file when in reality it's still in `307.dcm`. The reason the non-streaming method works is that it has access to the entire file, so it can read the file lengths encoded at the end of the archive instead of trying to find the magic number.
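To make the failure mode concrete, here is a minimal sketch of a naive signature scan. It is not fflate's actual code; `findSignatureOffsets` and the toy buffer are hypothetical names made up for illustration. The only fact it relies on is that the magic number is stored little-endian, so on disk it appears as the bytes `50 4B 03 04` ("PK\x03\x04").

```typescript
// On-disk form of the local file header signature 0x04034B50 (little-endian).
const LOCAL_HEADER_SIGNATURE = [0x50, 0x4b, 0x03, 0x04];

// Hypothetical helper: returns every offset at which the signature bytes occur.
// A scanner like this cannot tell a real header from payload bytes that match.
function findSignatureOffsets(data: Uint8Array): number[] {
  const offsets: number[] = [];
  for (let i = 0; i + LOCAL_HEADER_SIGNATURE.length <= data.length; i++) {
    if (LOCAL_HEADER_SIGNATURE.every((byte, j) => data[i + j] === byte)) {
      offsets.push(i);
    }
  }
  return offsets;
}

// Toy buffer (not a valid ZIP): two "real" headers with a look-alike in between.
const toyArchive = new Uint8Array([
  0x50, 0x4b, 0x03, 0x04, // offset 0: start of the first entry
  0x11, 0x22,             // file data
  0x50, 0x4b, 0x03, 0x04, // offset 6: file data that happens to match the signature
  0x33,                   // more file data
  0x50, 0x4b, 0x03, 0x04, // offset 11: start of the next entry
]);

const hits = findSignatureOffsets(toyArchive);
console.log(hits); // [0, 6, 11]
```

The scan reports three candidate entry starts even though only two are genuine, which is the same ambiguity described above for `307.dcm`.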
This isn't something that `fflate` can resolve; if you look around, you'll see that the reason most people don't offer streaming decompressors is this exact issue.
The conditions for this bug to occur are so specific that I thought it would never happen. I will document this behavior, but unfortunately I don't think there is any way to avoid this problem besides using a non-streaming API. Sorry for this bug; let me know if you have any questions.
Hello @101arrowz, thanks a lot for the amazing response!
I am so happy to learn all these interesting details about zip and compression. Bummer about the streaming issue; I guess the chances of it happening are higher for big files, or a zip within a zip?
`unzipSync` and `unzip` are working great! Our code is already running within a worker, and Safari doesn't support workers within workers (which is lame), so we are forced to use the sync APIs.
We are using fflate at https://set.health/, and we love it!
Out of curiosity, do you know if .tar has a similar problem?
Would you be up for adding a second, configuration argument to unzipSync() and unzip() to "filter" files (for performance reasons)?
> i guess the chances of it happening are more likely for big files, or a zip within a zip
Pretty much, but the magic number was specifically designed to be quite rare, so it's unfortunate this is a problem.
> Out of curiosity, do you know if .tar has a similar problem?
TAR is a far superior format to ZIP IMO. No, it does not suffer from this issue during decompression; however, it can't be streamed in compression.
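For context on why the two formats differ here: a TAR entry starts with a 512-byte header that records the entry's size up front (bytes 124 to 135, as an octal ASCII string), so a reader knows how many bytes to consume before the next header, and a writer has to know the size before it emits any data. The sketch below assumes the classic ustar layout and ignores extensions such as base-256 sizes; `readTarEntrySize` is a hypothetical helper, not part of fflate.

```typescript
// Hypothetical helper: reads the size field of a 512-byte TAR header block.
// Assumes the classic layout: 12 bytes at offset 124, octal ASCII digits.
function readTarEntrySize(header: Uint8Array): number {
  if (header.length < 512) {
    throw new Error("A TAR header block is 512 bytes");
  }
  let digits = "";
  for (let i = 124; i < 136; i++) {
    const c = header[i] ?? 0;
    if (c >= 0x30 && c <= 0x37) {
      digits += String.fromCharCode(c); // octal digit '0'..'7'
    } else if (digits.length > 0) {
      break; // NUL or space terminator after the digits
    }
  }
  return digits.length > 0 ? parseInt(digits, 8) : 0;
}

// Toy header: everything zeroed except the size field.
const toyHeader = new Uint8Array(512);
const sizeField = "00000001750"; // octal 1750 = decimal 1000
for (let i = 0; i < sizeField.length; i++) {
  toyHeader[124 + i] = sizeField.charCodeAt(i);
}

const entrySize = readTarEntrySize(toyHeader);
console.log(entrySize); // 1000
```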
Since it seems that you're able to choose `.zip` or `.tar`, it looks like you have control over the file creation process. In that case, note that the `zip` command will not yield a streamed ZIP. It's important to note that ZIPs created without streaming can always be decompressed in a streaming manner, whereas ZIPs created with streaming can only usually be. So if you can force the ZIP to be created in a non-streaming manner, you will never face this problem. You can use InfoZIP to do this: `zip -r myzip.zip folder/`.
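If you want to check whether an uploaded archive was written in a streaming manner, one option is to look at the general purpose bit flag in the local file header: per the ZIP specification, bit 3 (`0x0008`) means the sizes are stored in a data descriptor after the file data rather than in the header. The sketch below only inspects a single header at an offset you supply; `entryUsesDataDescriptor` and `makeToyHeader` are hypothetical helpers, not fflate APIs.

```typescript
// Hypothetical helper: true if the local file header at `headerOffset`
// has general purpose flag bit 3 set (sizes deferred to a data descriptor).
function entryUsesDataDescriptor(archive: Uint8Array, headerOffset = 0): boolean {
  const view = new DataView(archive.buffer, archive.byteOffset, archive.byteLength);
  // The fixed part of a local file header is 30 bytes and starts with 0x04034B50.
  if (
    archive.length < headerOffset + 30 ||
    view.getUint32(headerOffset, true) !== 0x04034b50
  ) {
    throw new Error("No local file header at the given offset");
  }
  const flags = view.getUint16(headerOffset + 6, true); // flags live at offset 6
  return (flags & 0x0008) !== 0;
}

// Toy 30-byte header with only the signature and flags filled in.
function makeToyHeader(flags: number): Uint8Array {
  const header = new Uint8Array(30);
  const view = new DataView(header.buffer);
  view.setUint32(0, 0x04034b50, true);
  view.setUint16(6, flags, true);
  return header;
}

const streamedEntry = entryUsesDataDescriptor(makeToyHeader(0x0008));
const regularEntry = entryUsesDataDescriptor(makeToyHeader(0x0000));
console.log(streamedEntry, regularEntry); // true false
```

Checking the first entry (offset 0) is a cheap heuristic: if it reports `true`, falling back to a non-streaming API such as `unzipSync` avoids the magic-number scan entirely.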