
Comments (10)

fcorbelli commented on September 17, 2024

It means breaking backward compatibility with zpaq, which I'm not going to do (I have a hundred other things to change that I need much more urgently, before something like that).
I also oppose, when not strictly necessary, the use of archives split over multiple files. They are fragile; there is nothing to be done about it. I wrote the backup and testbackup command pair to mitigate the problem.

commented on September 17, 2024

breaking backward compatibility with zpaq

I agree. However, a ZPAQ archive is essentially four kinds of blocks (c, d, h, and i) concatenated together: read the blocks in sequence and you get the ZPAQ archive. This is like a fixed-size split archive, except that in my case each file is one block long, the i blocks are stored in one file (maybe I could give up on that), and the d blocks are compressed with -m0. Compressing the archive again "recompresses" those blocks to -m5; the uncompressed block size stays the same. The responsibility for ensuring the blocks match lies with the operator.
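To make the "read the blocks in sequence and you get the ZPAQ archive" point concrete, here is a minimal sketch. The per-block file naming (blocks/NNNNNN.blk, chosen so that lexicographic order equals block order) is my own assumption for the example, not something defined by the spec.

```python
#!/usr/bin/env python3
# Minimal sketch: rebuild a single ZPAQ archive by concatenating per-block
# files in their original order. The "blocks/*.blk" naming scheme is a
# hypothetical convention where sorted filename order equals block order.
from pathlib import Path

block_dir = Path("blocks")
archive = Path("archive.zpaq")

with archive.open("wb") as out:
    for block_file in sorted(block_dir.glob("*.blk")):
        out.write(block_file.read_bytes())
```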

They are fragile, there is nothing to be done about it.

For smaller ones maybe, but in my experience a single multi-terabyte file is more fragile, especially when compressing or updating it in one attempt and the power is cut (this has happened to me many times, with the filesystem truncating the file). So I would argue that in my case storing the blocks separately is necessary to minimize the damage: a partial archive is still useful for recovering some of the files.

fcorbelli commented on September 17, 2024

For smaller ones maybe, but in my experience a single multi-terabyte file is more fragile [...] storing the blocks separately is necessary to minimize the damage: a partial archive is still useful for recovering some of the files.

Well, no.
An incomplete transaction will be discarded on the very next update.

If you want to freeze and resume a zpaq operation, you can always use a virtual machine and suspend it.

commented on September 17, 2024

An incomplete transaction will be discarded on the very next update

Yes, but with the "transaction" taking multiple weeks (for the initial archive) or multiple days (for updates to it), you lose everything, compared to losing just one (or a few) blocks. (It is the same with xz, but I use ZPAQ specifically for the ability to extract files without extracting the entire archive, plus dedup.) I have a UPS, but it only lasts a few minutes: enough to cancel the job and do an fsync + unmount of the drive (I put it in a script). In one case that was not enough and I lost the entire archive file, as it was truncated to 0 bytes for some reason.
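For what it's worth, the "cancel and fsync + unmount" script is roughly the following sketch; the PID and mount point are hypothetical placeholders, not the actual values from my setup.

```python
#!/usr/bin/env python3
# Rough sketch of the UPS low-battery script described above (not the exact one).
# ARCHIVER_PID and MOUNT_POINT are hypothetical placeholders.
import os
import signal
import subprocess

ARCHIVER_PID = 12345          # PID of the running zpaq process (placeholder)
MOUNT_POINT = "/mnt/archive"  # filesystem holding the archive (placeholder)

os.kill(ARCHIVER_PID, signal.SIGTERM)                 # stop the archiver
os.sync()                                             # flush dirty pages to disk
subprocess.run(["umount", MOUNT_POINT], check=True)   # unmount cleanly
```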

Use a virtual machine and suspend it.

Tried with QEMU. On top of the overhead, it did not work very well. I also tried criu, and whether it works depends on how lucky you are with the PID number. In fact, this is my main complaint with ZPAQ: to achieve the maximum space saving, I need to deduplicate and compress as much as possible in one go.

 avatar commented on September 17, 2024

Some clarification:

The split files are single blocks (all four types) as described in section 8 of the spec. Pass 1 creates a ZPAQ archive with -m0xx, but with the blocks stored as separate files. Pass 2 compresses the blocks, replacing the -m0xx d blocks with -m5xx ones. You can get the "traditional" ZPAQ file back by concatenating the blocks into one large file.

The "index" file I mentioned in the first post is an additional copy of the c/h/i blocks in one file (as mentioned at the end of section 8). Since the c, h, and i blocks are already in the output folder, the index is optional; it is for the case where only part of the archive is available (e.g. archives spanning multiple disks or computers, with a complete index on one (or all) of the disks, so you know which disk to take out based on the "missing blocks" output when you try extracting).

fcorbelli commented on September 17, 2024

This seems like overkill to gain some bytes with placebo-level compression.
Use a more reliable hypervisor (VMware, for example) if you really want to suspend and restart, and that's it.

commented on September 17, 2024

gain some bytes with the placebo-level compression

For ~500 GB of uncompressed raw DNG photos (which compress, but do not dedupe), ZPAQ with -m59 is around 6% smaller than xz -k --lzma2=dict=1610612736,mf=bt4,mode=normal,nice=273,depth=4294967295 (the maximum that can be specified for xz), so I do not consider that "placebo-level". With dedupe-able files the advantage would be even higher, and the larger the source, the more ZPAQ saves.
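As a rough sense of scale (assuming, since raw DNG compresses poorly, that the xz output stays close to the ~500 GB source size; the exact compressed sizes are not given above):

```python
# Back-of-the-envelope: what a 6% relative saving means for a ~500 GB set.
source_gb = 500
saving_gb = source_gb * 0.06   # ZPAQ -m59 vs. the xz invocation above
print(f"~{saving_gb:.0f} GB")  # ~30 GB
```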

Even without the time-versus-size argument, the "split blocks" format allows more flexibility, as I have mentioned above (partial archives, easy to spread across disks, no multi-terabyte files to deal with, etc.).

I have used ZPAQ for years to archive data, and I believe that my suggestion fixes all the "pain points" I have encountered.

VMware for example

I do not feel lucky enough for dkms.

fcorbelli commented on September 17, 2024

6% does not seem like a big gain.
I am quite confident that the cost in time, electricity, and heat of saving ~30 GB is not exactly worth it. About 5 euro.

However, I really don't think I will do such work. It's difficult, time-consuming, and would be used by a single user in the world. But not me 😄

commented on September 17, 2024

That is fine. I am closing the issue.

fcorbelli commented on September 17, 2024

Your request is legit, but way too complex.
Sorry
