arenn / gzrt
gzip Recovery Toolkit aka gzrecover
Home Page: http://www.urbanophile.com/arenn/hacking/gzrt/gzrt.html
Opened input file for reading: ../firefly-rk3288_sdk_git.tar.gz
Opened output file for writing: firefly-rk3288_sdk_git.tar.recovered
Found error at byte -394639277 in input stream
Found good data at byte -395313152 in input stream
Total decompressed output = -334166016 bytes
In [2]: 2**32 -394639277
Out[2]: 3900328019
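The negative offsets in the log above look like a byte counter that overflowed a signed 32-bit integer, which is what the `2**32` arithmetic suggests. A minimal Python sketch of that reinterpretation follows. The helper name `unwrap_int32` is made up for illustration, and it assumes the counter wrapped at most once; if the real value was 4 GiB or more, only its remainder modulo 2**32 can be recovered.

```python
def unwrap_int32(value: int) -> int:
    """Reinterpret a (possibly negative) signed 32-bit value as unsigned."""
    # For a positive modulus, Python's % always returns a non-negative result.
    return value % 2**32


# Values copied from the log above.
reported = {
    "error at byte": -394639277,
    "good data at byte": -395313152,
    "total decompressed output": -334166016,
}

for label, value in reported.items():
    print(f"{label}: {unwrap_int32(value)}")
```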
Very good utility for (re)downloading files from the Chinese baidu.com site.
Thanks a lot!
Hey this little program just saved me from having to tell the client that we had lost years of accounting data - thank you so much!
Since the purpose of the program is to recover corrupted files, wouldn't it make more sense if it could ignore read failures when it encounters them and just skip forward in the file?
At first I thought it was due to the corruption, but it's happening to lines before the corruption as well.
My file is a stream of text lines. When I get a new line of text, I compress it with gzip and append it to the end of the file. This has been working fine for a month or so. My computer froze and I restarted the process before I noticed the issue. If I use gunzip, the first 90% of the file is unzipped fine and the last 10% is missing. With gzrecover, it all seems to be there, but there are random characters at the start of every line, usually "XP" but sometimes up to 10 unprintable characters. I can understand them being there after the corruption, and I'm happy to go clean that up, but I'm not sure why it's happening in the first 90% of the file.
gziprecover 0.8 on Debian 11 amd64
My users often have huge .gz files that they would like to process in parallel.
Can gzrt be adapted so it can extract a valid gz-file in blocks?
Let us assume I have a 1 GB file.gz and I want to extract blocks of around 1 MB of compressed data. I want to do this in parallel. So first I want to identify positions where a valid gz-block starts:
$ gzrt --next-start-of-block 0
0
$ gzrt --next-start-of-block 1000000
1234888
$ gzrt --next-start-of-block 2000000
2123488
...
$ gzrt --next-start-of-block 999000000
999348877
The idea is to seek to the given byte position and then identify the next valid gz-block. When it is identified, print its byte position and exit.
After identifying where blocks start I would then be able to extract from one block to another:
gzrt --from-byte 0 --to-byte 1234888 | my_program &
gzrt --from-byte 1234888 --to-byte 2123488 | my_program &
gzrt --from-byte 2123488 --to-byte 3212348 | my_program &
...
gzrt --from-byte 998374753 --to-byte 999348877 | my_program &
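The proposal above can be approximated in Python for one special case. This is a sketch under stated assumptions and not something gzrt does: it only finds the start of whole gzip members (as in a file built by concatenating several .gz files), because deflate blocks inside a single member are not byte-aligned and cannot be located by a byte scan like this. The function names `next_member_start` and `extract_range` are hypothetical stand-ins for the requested `--next-start-of-block` and `--from-byte`/`--to-byte` options.

```python
import gzip
import zlib

GZIP_MAGIC = b"\x1f\x8b\x08"  # ID1, ID2, CM=8 (deflate), per RFC 1952
PROBE_BYTES = 4096            # how much input to test-decompress per candidate


def looks_like_member(data: bytes, pos: int) -> bool:
    """Test-decompress a small window to weed out chance matches of the magic."""
    probe = zlib.decompressobj(31)  # wbits=31: expect a gzip header and trailer
    try:
        probe.decompress(data[pos:pos + PROBE_BYTES])
    except zlib.error:
        return False
    return True


def next_member_start(data: bytes, offset: int) -> int:
    """Offset of the first plausible gzip member at or after offset, else -1."""
    pos = data.find(GZIP_MAGIC, offset)
    while pos != -1 and not looks_like_member(data, pos):
        pos = data.find(GZIP_MAGIC, pos + 1)
    return pos


def extract_range(data: bytes, start: int, end: int) -> bytes:
    """Decompress the member(s) stored in data[start:end]."""
    return gzip.decompress(data[start:end])


# Demo on a small three-member archive built in memory.
members = [gzip.compress(line, mtime=0) for line in (b"alpha\n", b"beta\n", b"gamma\n")]
archive = b"".join(members)

second = next_member_start(archive, 1)
print(second, extract_range(archive, second, len(archive)))
```

For a 1 GB input, the same scan would more sensibly run over an `mmap` of the file than over a `bytes` object holding all of it.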
$ wget http://www.cpan.org/src/5.0/perl-5.10.1.tar.gz
$ gzrecover perl-5.10.1.tar.gz
$ zcat perl-5.10.1.tar.gz |wc -c
59668480
$ wc -c perl-5.10.1.tar.recovered
59668481 perl-5.10.1.tar.recovered
The one extra byte is the last byte of the recovered file.
gzrt can work as a filter, but not in the obvious way that UNIX commands normally do:
cmd1 | cmd2 | cmd3
cmd1 infile | cmd2
cmd1 infile outfile
Consider adding support to do the above.
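The three invocation forms above can be sketched in Python as follows. This only illustrates the argument convention (no arguments: stdin to stdout; one: file to stdout; two: file to file). It performs plain gzip decompression of a single member and none of gzrecover's recovery logic, and the names `gunzip_stream` and `run` are hypothetical.

```python
import gzip
import io
import sys
import zlib
from contextlib import ExitStack


def gunzip_stream(src, dst, chunk_size=65536):
    """Copy src to dst, decompressing one gzip member chunk by chunk."""
    inflater = zlib.decompressobj(31)  # wbits=31: gzip container
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(inflater.decompress(chunk))
    dst.write(inflater.flush())


def run(argv):
    """Use argv[0] as input and argv[1] as output, falling back to stdin/stdout."""
    with ExitStack() as stack:
        src = stack.enter_context(open(argv[0], "rb")) if len(argv) >= 1 else sys.stdin.buffer
        dst = stack.enter_context(open(argv[1], "wb")) if len(argv) >= 2 else sys.stdout.buffer
        gunzip_stream(src, dst)


# In-memory demo; no real files or pipes are touched.
demo_out = io.BytesIO()
gunzip_stream(io.BytesIO(gzip.compress(b"hello\n")), demo_out)
print(demo_out.getvalue())
```

Calling `run(sys.argv[1:])` from a script would then let it sit in the middle of a pipeline, take an input file and write to a pipe, or take both file names.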