arenn / gzrt


gzip Recovery Toolkit aka gzrecover

Home Page: http://www.urbanophile.com/arenn/hacking/gzrt/gzrt.html

C 100.00%

gzrt's Issues

probably just variable type issue - number less than zero

Opened input file for reading: ../firefly-rk3288_sdk_git.tar.gz
Opened output file for writing: firefly-rk3288_sdk_git.tar.recovered
Found error at byte -394639277 in input stream
Found good data at byte -395313152 in input stream
Total decompressed output = -334166016 bytes

In [2]: 2**32 -394639277
Out[2]: 3900328019
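The arithmetic above suggests the byte counters are held in (or printed as) a signed 32-bit type and wrap once the input passes 2 GiB. The following is a minimal C sketch of that interpretation, not code taken from gzrt; the function names `unwrap_offset32` and `advance_offset` are made up for illustration. It shows how a wrapped value maps back to a non-negative offset and how a 64-bit counter sidesteps the problem.

```c
#include <assert.h>
#include <stdint.h>

/* Recover the smallest non-negative offset that a wrapped signed
   32-bit value could stand for. The true offset may be larger by a
   multiple of 2^32 if the counter wrapped more than once. */
static uint64_t unwrap_offset32(int32_t wrapped)
{
    return (uint64_t)(uint32_t)wrapped;
}

/* Advance a 64-bit byte counter; a counter this wide does not wrap
   for any realistic file size. */
static uint64_t advance_offset(uint64_t total, uint32_t nbytes)
{
    assert(total <= UINT64_MAX - nbytes);
    return total + nbytes;
}
```

Under this assumption, the reported -394639277 corresponds to byte 3900328019, matching the Python calculation. A fix along these lines would keep the counters in a 64-bit type and print them with a matching format specifier.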

A very good utility when (re)downloading files from the Chinese baidu.com site.
Thanks a lot!

Just want to say thank you

Hey, this little program just saved me from having to tell the client that we had lost years of accounting data. Thank you so much!

Program aborts on error

Since the purpose of the program is to recover corrupted files, wouldn't it make more sense if it could ignore read failures when it encounters them and just skip forward in the file?
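One way to picture the request is a read helper that treats an I/O error as a hole in the input rather than a fatal condition, similar in spirit to `dd conv=noerror,sync`. The sketch below assumes a POSIX system and is not how gzrecover currently reads its input; `read_or_skip` is a hypothetical name.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Read up to `len` bytes at `offset`. On a low-level I/O error (EIO),
   log it, zero-fill the buffer and report `len` bytes so the caller
   simply moves on to the next chunk. Other errors return -1 and end
   of file returns 0, as with pread() itself. */
static ssize_t read_or_skip(int fd, unsigned char *buf, size_t len, off_t offset)
{
    assert(buf != NULL);
    ssize_t n = pread(fd, buf, len, offset);
    if (n < 0 && errno == EIO) {
        fprintf(stderr, "read error at offset %lld, skipping %zu bytes\n",
                (long long)offset, len);
        memset(buf, 0, len);
        return (ssize_t)len;
    }
    return n;
}
```

The zero-filled chunk would look like corrupt compressed data to the recovery logic, which is the case the tool is already meant to handle.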

recovering but prepending characters to every line

At first I thought it was due to the corruption, but it's happening to lines before the corruption as well.

My file is a stream of text lines. When I get a new line of text, I compress it with gzip and append it to the end of the file. This has been working fine for a month or so. My computer froze, and I restarted the process before I noticed the issue. If I use gunzip, the first 90% of the file is unzipped fine and the last 10% is missing. With gzrecover, it all seems to be there, but there are random characters at the start of every line: usually "XP", but sometimes up to 10 unprintable characters. I can understand them being there after the corruption, and I'm happy to go clean that up, but I'm not sure why it's happening in the first 90% of the file.

gzrecover 0.8 on Debian 11 amd64

Wish: extract parts of gzip file

My users often have huge .gz files that they would like to process in parallel.

Can gzrt be adapted so it can extract a valid gz-file in blocks?

Let us assume I have a 1 GB file.gz and I want to extract blocks of around 1 MB of compressed data. I want to do this in parallel. So first I want to identify positions where a valid gz-block starts:

$ gzrt --next-start-of-block 0
0
$ gzrt --next-start-of-block 1000000
1234888
$ gzrt --next-start-of-block 2000000
2123488
...
$ gzrt --next-start-of-block 999000000
999348877

The idea is to seek to the byte position and then identify the next valid gz-block. Once it is identified, print the byte position and exit.

After identifying where blocks start I would then be able to extract from one block to another:

gzrt --from-byte 0 --to-byte 1234888 | my_program &
gzrt --from-byte 1234888 --to-byte 2123488 | my_program &
gzrt --from-byte 2123488 --to-byte 3212348 | my_program &
...
gzrt --from-byte 998374753 --to-byte 999348877 | my_program &
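The options shown above are proposals, not existing gzrt flags. As a rough illustration of the "find the next start" step, here is a C sketch that scans a buffer for the next plausible gzip member header as laid out in RFC 1952 (magic bytes 0x1f 0x8b, compression method 8, reserved flag bits zero). The name `find_gzip_member` is hypothetical. Note the limitation: this only finds boundaries between concatenated gzip members. A typical .gz file is a single member, and the deflate blocks inside it are not byte-aligned, so locating those would need a bit-level search with trial decompression instead.

```c
#include <assert.h>
#include <stddef.h>

/* Return the index of the first plausible gzip member header at or
   after `start`, or -1 if none is found. A match is only a candidate:
   the same four bytes can also occur by chance inside compressed data. */
static long find_gzip_member(const unsigned char *buf, size_t len, size_t start)
{
    assert(buf != NULL);
    for (size_t i = start; i + 4 <= len; i++) {
        if (buf[i] == 0x1f && buf[i + 1] == 0x8b &&
            buf[i + 2] == 0x08 && (buf[i + 3] & 0xe0) == 0)
            return (long)i;
    }
    return -1;
}
```

Each candidate offset would still have to be confirmed by actually inflating from that point before being handed to a parallel worker.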

gzrecover of perl-5.10.1.tar.gz fails by 1 byte

$ wget http://www.cpan.org/src/5.0/perl-5.10.1.tar.gz
$ gzrecover perl-5.10.1.tar.gz
$ zcat perl-5.10.1.tar.gz |wc -c
59668480
$ wc -c perl-5.10.1.tar.recovered
59668481 perl-5.10.1.tar.recovered

The 1 byte is the last byte.

Wish: more obvious filter support

gzrt can work as a filter, but not in the obvious way that UNIX commands normally do:

cmd1 | cmd2 | cmd3

cmd1 infile | cmd2

cmd1 infile outfile

Consider adding support to do the above.
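A common way to support all three forms is to fall back to standard input or output whenever a file argument is missing or given as "-". The C sketch below shows that convention only; it is not gzrecover's current argument handling, and `open_or_default` is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Open `path` with `mode`, or return `fallback` (typically stdin or
   stdout) when no path was given or the path is "-". Returns NULL if
   a named file cannot be opened. */
static FILE *open_or_default(const char *path, const char *mode, FILE *fallback)
{
    assert(mode != NULL && fallback != NULL);
    if (path == NULL || strcmp(path, "-") == 0)
        return fallback;
    return fopen(path, mode);
}
```

With this helper, a `main` could call `open_or_default(argc > 1 ? argv[1] : NULL, "rb", stdin)` for the input and `open_or_default(argc > 2 ? argv[2] : NULL, "wb", stdout)` for the output, which would cover `cmd1 | cmd2 | cmd3`, `cmd1 infile | cmd2`, and `cmd1 infile outfile`.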
