
Comments (10)

fcorbelli commented on September 17, 2024

It means breaking backward compatibility with zpaq, which I'm not going to do (I have a hundred other things to change that I need much more urgently, before something like that).
I also oppose, when not strictly necessary, the use of archives split over multiple files. They are fragile; there is nothing to be done about it. I wrote the backup and testbackup command pair to mitigate the problem.

commented on September 17, 2024

breaking backward compatibility with zpaq

I agree. However, a ZPAQ archive is essentially four kinds of blocks (c, d, h, and i) concatenated together: read the blocks in sequence and you get the ZPAQ archive. This is like a fixed-size split archive, except that in my case each file is one block long, the i blocks are stored in one file (maybe I could give up on that), and the d blocks are compressed with -m0. Compressing the archive again "recompresses" those blocks to -m5; the uncompressed block size stays the same. The responsibility for ensuring the blocks match lies with the operator.
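To make the "read the blocks in sequence and you get the ZPAQ archive" point concrete, here is a minimal sketch. The per-block file naming (blocks/NNNNNN.blk, chosen so that lexicographic order equals block order) is my own assumption for the example, not something defined by the spec.

```python
#!/usr/bin/env python3
# Minimal sketch: rebuild a single ZPAQ archive by concatenating per-block
# files in their original order. The "blocks/*.blk" naming scheme is a
# hypothetical convention where sorted filename order equals block order.
from pathlib import Path

block_dir = Path("blocks")
archive = Path("archive.zpaq")

with archive.open("wb") as out:
    for block_file in sorted(block_dir.glob("*.blk")):
        out.write(block_file.read_bytes())
```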

They are fragile, there is nothing to be done about it.

For smaller ones maybe, but in my experience a single multi-terabyte file is more fragile, especially when compressing or updating it in one attempt and the power is cut (this has happened to me many times, with the filesystem truncating the file). So I would argue that in my case storing the blocks separately is necessary to minimize the damage: a partial archive is still useful for recovering some of the files.

fcorbelli commented on September 17, 2024

For smaller ones maybe, but in my experience a single multi-terabyte file is more fragile [...] storing the blocks separately is necessary to minimize the damage: a partial archive is still useful for recovering some of the files.

Well, no.
An incomplete transaction will be discarded on the very next update.

If you want to freeze and resume a zpaq operation, you can always use a virtual machine and suspend it.

commented on September 17, 2024

An incomplete transaction will be discarded on the very next update

Yes, but with the "transaction" taking multiple weeks (for the initial archive) or multiple days (for updates to it), you lose everything, compared to losing just one (or a few) blocks. (It is the same with xz, but I use ZPAQ specifically for the ability to extract files without extracting the entire archive, plus dedup.) I have a UPS, but it only lasts a few minutes: enough to cancel the job and do an fsync + unmount of the drive (I put it in a script). In one case that was not enough and I lost the entire archive file, as it was truncated to 0 bytes for some reason.
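For what it's worth, the "cancel and fsync + unmount" script is roughly the following sketch; the PID and mount point are hypothetical placeholders, not the actual values from my setup.

```python
#!/usr/bin/env python3
# Rough sketch of the UPS low-battery script described above (not the exact one).
# ARCHIVER_PID and MOUNT_POINT are hypothetical placeholders.
import os
import signal
import subprocess

ARCHIVER_PID = 12345          # PID of the running zpaq process (placeholder)
MOUNT_POINT = "/mnt/archive"  # filesystem holding the archive (placeholder)

os.kill(ARCHIVER_PID, signal.SIGTERM)                 # stop the archiver
os.sync()                                             # flush dirty pages to disk
subprocess.run(["umount", MOUNT_POINT], check=True)   # unmount cleanly
```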

Use a virtual machine and suspend it.

Tried with QEMU. On top of the overhead, it did not work very well. I also tried criu, and whether it works depends on how lucky you are with the PID number. In fact, this is my main complaint with ZPAQ: to achieve the maximum space saving, I need to deduplicate and compress as much as possible in one go.

 avatar commented on September 17, 2024

Some clarification:

The split files are single blocks (all four types) as described in section 8 of the spec. Pass 1 creates a ZPAQ archive with -m0xx, but with the blocks stored as separate files. Pass 2 compresses the blocks, replacing the -m0xx d blocks with -m5xx ones. You can get the "traditional" ZPAQ file back by concatenating the blocks into one large file.

The "index" file I mentioned in the first post is an additional copy of the c/h/i blocks in one file (as mentioned at the end of section 8). Since the c, h, and i blocks are already in the output folder, the index is optional; it is for the case where only part of the archive is available (e.g. archives spanning multiple disks or computers, with a complete index on one (or all) of the disks, so you know which disk to take out based on the "missing blocks" output when you try extracting).

fcorbelli commented on September 17, 2024

This seems like overkill to gain some bytes with placebo-level compression.
Use a more reliable hypervisor (VMware, for example) if you really want to suspend and restart, and that's it.

commented on September 17, 2024

gain some bytes with the placebo-level compression

For ~500 GB of uncompressed raw DNG photos (which compress, but do not dedupe), ZPAQ with -m59 is around 6% smaller than xz -k --lzma2=dict=1610612736,mf=bt4,mode=normal,nice=273,depth=4294967295 (the maximum that can be specified for xz), so I do not consider that "placebo-level". With dedupe-able files the advantage would be even higher, and the larger the source, the more ZPAQ saves.
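As a rough sense of scale (assuming, since raw DNG compresses poorly, that the xz output stays close to the ~500 GB source size; the exact compressed sizes are not given above):

```python
# Back-of-the-envelope: what a 6% relative saving means for a ~500 GB set.
source_gb = 500
saving_gb = source_gb * 0.06   # ZPAQ -m59 vs. the xz invocation above
print(f"~{saving_gb:.0f} GB")  # ~30 GB
```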

Even without the time-versus-size argument, the "split blocks" format allows more flexibility, as I have mentioned above (partial archives, easy to spread across disks, no multi-terabyte files to deal with, etc.).

I have used ZPAQ for years to archive data, and I believe that my suggestion fixes all the "pain points" I have encountered.

VMware for example

I do not feel lucky enough for dkms.

fcorbelli commented on September 17, 2024

6% does not seem like a big gain.
I am quite confident that the cost in time, electricity, and heat of saving ~30 GB is not exactly worth it. About 5 euro.

However, I really don't think I will do such work. It's difficult, time-consuming, and would be used by a single user in the world. But not me 😄

commented on September 17, 2024

That is fine. I am closing the issue.

fcorbelli commented on September 17, 2024

Your request is legit, but way too complex.
Sorry
