Comments (14)
That does present challenges, and I understand the implementation differences. You'd probably have to create a map of file name to "byte ranges" to know which file it's reading from at any given time. That would allow me to identify which archive file is corrupted, perhaps. Probably a lot of work for you.
I would probably have to add an optional interface that an io.ReaderAt could implement that allows you to query what the underlying file(s) were for a given byte offset and then surface that information in a read error. It would likely tie in with #38 to provide a hint if there was any encryption involved that could point to an incorrect password.
Having a list of archive names is probably the easier task. I only rely on that when the extraction finishes without errors. I'm currently more interested in "a full file list after successful extraction" than I am being able to identify the bad file.
Given I have to open every volume up front, a Volumes() method that just returns the full list would likely be correct in any case, so I'll add that.
I see @ulikunitz asked you a question on the PR you opened. Let me know if you need anything from me there to help answer that.
Yes, if you still have your forked copy of the xz library, could you add a fmt.Println or similar that dumps out the r.h.dictCap value from the archive header (where it was previously erroring if it was less than MinDictCap) when reading those archives. I'm curious if it's entirely unset, or just set to a value smaller than 4096.
from sevenzip.
I also have two sets of 7z files that this library will not extract. They're 10-15 years old. One set (4 files) is about 750MB. The other (3 files) is about 450MB. Are you interested in having these files to test with? Both produce this error:
os.Open: 1 error occurred:
* lzma: dictionary capacity too small
So it just returns either the sole archive filename, or the list of .001, .002, etc. filenames when the archive is split? That doesn't seem unreasonable to add.
I also have two sets of 7z files that this library will not extract. They're 10-15 years old. One set (4 files) is about 750MB. The other (3 files) is about 450MB. Are you interested in having these files to test with? Both produce this error:
os.Open: 1 error occurred:
* lzma: dictionary capacity too small
This error comes from the underlying LZMA library I'm using. The LZMA stream in the .7z archive should specify the size of the dictionary but maybe these old archives don't for some reason. This is the code doing the check:
https://github.com/ulikunitz/xz/blob/0b7c695d23f84aa7e968bbcaa1980847683d909a/lzma/reader.go#L72-L78
This code looks a bit suspect to me as I should be able to override the dictionary capacity, but the code checks the size in the stream before allowing the config to override it. I've forked the library and tweaked the order of the statements and pointed a branch to that fork. Can you try the dictcap branch of the sevenzip library and see if that works any better? If so, I'll file a PR with the xz library. Hopefully that will save passing 450-750 MB archives about 😉
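To illustrate why the order of the statements matters, here is a simplified sketch. It is not the xz library's actual code: the function names and the `minDictCap` constant are assumptions made for the example, with the 4096 minimum and the error text taken from this thread.

```go
package main

import (
	"errors"
	"fmt"
)

// minDictCap mirrors the 4096-byte minimum discussed in this thread.
const minDictCap = 1 << 12

var errDictCapTooSmall = errors.New("lzma: dictionary capacity too small")

// checkThenOverride validates the header value first, so a configured
// override never gets the chance to rescue a small header value.
func checkThenOverride(headerCap, configCap int) (int, error) {
	if headerCap < minDictCap {
		return 0, errDictCapTooSmall
	}
	if configCap > headerCap {
		return configCap, nil
	}
	return headerCap, nil
}

// overrideThenCheck applies the configured override first and only then
// validates the resulting capacity.
func overrideThenCheck(headerCap, configCap int) (int, error) {
	dictCap := headerCap
	if configCap > dictCap {
		dictCap = configCap
	}
	if dictCap < minDictCap {
		return 0, errDictCapTooSmall
	}
	return dictCap, nil
}

func main() {
	// A header advertising 2048 bytes with a 1 MiB override in the config.
	_, err := checkThenOverride(2048, 1<<20)
	fmt.Println("check first:   ", err)

	dictCap, err := overrideThenCheck(2048, 1<<20)
	fmt.Println("override first:", dictCap, err)
}
```

With the check first, the small header value fails regardless of the config. With the override first, the same input yields a usable capacity.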
Trying it shortly!
EDIT:
first go..
$ go get -u github.com/bodgit/sevenzip@dictcap
go: downloading github.com/bodgit/sevenzip v1.3.1-0.20221211005010-415a639b09a0
go: github.com/bodgit/sevenzip@dictcap: github.com/bodgit/sevenzip@v1.3.1-0.20221211005010-415a639b09a0: verifying module: github.com/bodgit/sevenzip@v1.3.1-0.20221211005010-415a639b09a0: reading https://sum.golang.org/lookup/github.com/bodgit/sevenzip@v1.3.1-0.20221211005010-415a639b09a0: 404 Not Found
server response: not found: fetch timed out
^^ There's an issue (closed yesterday) on the Go GitHub repo for this. Apparently it's not fixed. This worked:
GOPRIVATE=github.com/bodgit/sevenzip GOPROXY=direct go get -u github.com/bodgit/sevenzip@415a639b09a0342b3e014614f973273f3ebcd539
Both archives still produce the same error with that branch. EDIT: I suspect adding that replace in go.mod isn't working its way through my working tree.
I cloned your xz repo, pulled the dictcap branch, and replace'd it in my local go.mod. I even made sure your change was there, and added a word to the error message to make sure I had that change when the error reoccurred. Glad I did too because I typo'd the replace the first time around, and would not have caught it had I not changed the error.
Anywho.. it works now. That change allowed me to extract those archives. 🤞 you get upstream to accept it
ulikunitz/xz#52 raised for fixing the unreadable archives.
Now that I think about it, it would be best if the Volumes() method provided point-in-time data, so when we get an error, we could look at the last slice item returned to identify the corrupted archive. In other words, the method should only return files that have been processed, and the one currently processing (if it's not done). I'm taking this idea from here: https://github.com/nwaples/rardecode/blob/e2fa07408d4b19ae0500efbcc6983863c95f821e/reader.go#L359-L364
The 7-zip code has no idea that it's dealing with an archive split into multiple volumes; the OpenReaderWithPassword() function just opens all of the volumes and creates an io.ReaderAt implementation that spans the volumes and treats them as one big file and then passes it off to the rest of the code. That's the only interface the code needs.
I'm also under the impression that that particular RAR library treats archives in a linear manner; it starts at the beginning of the archive and processes each volume in order. Whereas my package allows you to access any file randomly in the archive so you can bounce around reading from any volume, plus all of the archive metadata is usually found at the end of the archive, so you need to have opened all of the volumes for the initial seek to hit the right location.
That does present challenges, and I understand the implementation differences. You'd probably have to create a map of file name to "byte ranges" to know which file it's reading from at any given time. That would allow me to identify which archive file is corrupted, perhaps. Probably a lot of work for you.
Having a list of archive names is probably the easier task. I only rely on that when the extraction finishes without errors. I'm currently more interested in "a full file list after successful extraction" than I am being able to identify the bad file.
I see @ulikunitz asked you a question on the PR you opened. Let me know if you need anything from me there to help answer that.
Thanks for all your help!
Sorry just saw this. Will do that shortly!
EDIT:
First archive (4 * 200MB) printed several numbers:
dictCap 2048
dictCap 67108864
dictCap 1048576
dictCap 1048576
dictCap 67108864
Next archive (3 * 200MB):
dictCap 1024
dictCap 67108864
dictCap 1048576
dictCap 1048576
dictCap 67108864
And a third archive (4 * 200MB):
dictCap 2048
dictCap 67108864
dictCap 1048576
dictCap 1048576
dictCap 67108864
Release v0.5.11 of the xz library has solved the problem. All 3 archives extract after updating that module.
First archive (4 * 200MB) printed several numbers :
dictCap 2048 dictCap 67108864 dictCap 1048576 dictCap 1048576 dictCap 67108864
The first stream that will be opened in each archive is the compressed archive metadata which will tend to be quite small. So it seems it's set, just to a value smaller than 4096, which apparently is invalid.
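For reference, the LZMA coder properties stored in a 7z archive are five bytes: one byte encoding lc/lp/pb followed by the dictionary size as a 32-bit little-endian integer. The sketch below decodes that size and compares it with the 4096 minimum mentioned above. The `dictSize` helper and the sample bytes are made up for illustration and are not taken from the archives in this thread.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// minDictCap is the 4096-byte minimum mentioned in this thread.
const minDictCap = 4096

// dictSize extracts the dictionary size from 5-byte LZMA properties:
// byte 0 encodes lc/lp/pb, bytes 1-4 hold the size in little-endian order.
func dictSize(props []byte) (uint32, error) {
	if len(props) < 5 {
		return 0, errors.New("LZMA properties need at least 5 bytes")
	}
	return binary.LittleEndian.Uint32(props[1:5]), nil
}

func main() {
	// 0x5d is the common lc=3, lp=0, pb=2 byte; 0x00000800 is 2048.
	props := []byte{0x5d, 0x00, 0x08, 0x00, 0x00}
	size, err := dictSize(props)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("dictCap %d, below minimum: %t\n", size, size < minDictCap)
}
```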
Release v0.5.11 of the xz library has solved the problem. All 3 archives extract after updating that module.
That would have been my next request. I'll merge the dependabot PR that bumps the xz package.
I've pushed v1.4.0 that has a Volumes() method and also uses v0.5.11 of the xz library.
Thanks for the feature request and bug report!
You rock man, appreciate the quick turnaround and A+ support!