Comments (10)
It means breaking backward compatibility with zpaq, which I'm not going to do (there are a hundred other things I would need far more, before something like that).
I also oppose, when not strictly necessary, the use of archives split over multiple files. They are fragile; there is nothing to be done about it. I wrote the backup and testbackup command pair to mitigate the problem.
from zpaqfranz.
> breaking backward compatibility with zpaq
I agree. However, a ZPAQ archive is practically 4 kinds of blocks (`c`, `d`, `h`, and `i`) concatenated together: read the blocks in sequence and you get the ZPAQ archive. It is like a fixed-size split archive, except that in my case each file is one block long, the `i` blocks are stored in one file (maybe I could give up on that), and the `d` blocks are compressed with `-m0`. Compressing "recompresses" them to `-m5`; the uncompressed block size is the same. The responsibility for ensuring the blocks match lies with the operator.
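The "read the blocks in sequence" idea can be sketched in a few lines. This is a minimal illustration, not zpaqfranz's actual layout: the file names, the one-block-per-file convention, and the placeholder contents are all assumptions.

```shell
# Minimal sketch: one block per file, named so that lexical sort order
# equals block order; concatenating them reproduces the single-file archive.
work=$(mktemp -d)
printf 'C-block' > "$work/0001_c.blk"   # c: transaction/header block
printf 'D-block' > "$work/0002_d.blk"   # d: data block
printf 'H-block' > "$work/0003_h.blk"   # h: fragment hash block
printf 'I-block' > "$work/0004_i.blk"   # i: index block
# Reading the blocks in sequence yields the "traditional" archive:
cat "$work"/*.blk > "$work/archive.zpaq"
echo "archive is $(wc -c < "$work/archive.zpaq") bytes"
```

The only real requirement is that whatever naming scheme is used sorts the blocks back into their original order.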
> They are fragile; there is nothing to be done about it.
For smaller ones, maybe, but in my experience a single multi-terabyte file is more fragile, especially when trying to compress or update it in one attempt and the power is cut (this has happened to me many times, with the filesystem truncating the file). So I would argue that in my case storing the blocks separately is necessary to minimize damage: a partial archive is still useful for recovering some of the files.
> For smaller ones, maybe, but in my experience a single multi-terabyte file is more fragile […]
Well, no.
An incomplete transaction will be discarded on the very next update.
If you want to freeze and resume a zpaq operation, you can always use a virtual machine and suspend it.
> An incomplete transaction will be discarded on the very next update
Yes, but with the "transaction" taking multiple weeks (for the initial archive) or multiple days (for updates), you lose everything, compared to just one (or a few) blocks. (It is the same with `xz`, but I use ZPAQ specifically for the ability to extract files without extracting the entire archive, plus dedup.) I have a UPS, but it only lasts a few minutes: enough to cancel and do an `fsync` + `unmount` of the drive (I put it in a script). In one case that was not enough, and I lost the entire archive file as it was truncated to 0 bytes for some reason.
> Use a virtual machine and suspend it.
Tried with QEMU. On top of the overheads, it did not work very well. I also tried `criu`, and it may work depending on how lucky the PID number is. In fact, this is my main complaint with ZPAQ: to achieve the maximum space saving, I need to deduplicate and compress as much as possible in one go.
Some clarification:
The split files are single blocks (all 4 types of them) as described in section 8 of the spec. Pass 1 creates a ZPAQ archive with `-m0xx`, but with the blocks stored as separate files. Pass 2 compresses the blocks and replaces the `-m0xx` `d` blocks with `-m5xx` ones. You can get the "traditional" ZPAQ file by combining the blocks into one large file.
The "index" file I mentioned in the first post is an additional copy of the `c`/`h`/`i` blocks in one file (as mentioned at the end of section 8). Since the `c`, `h`, and `i` blocks are already in the output folder, the index is optional; it is for the case where only part of the archive is available (e.g. archives spanning multiple disks or computers, with a complete index on one (or all) of the disks, so you know which disk to take out based on the "missing blocks" output when you try extracting).
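The "missing blocks" check could work along the lines below. The index format used here (a plain-text list of block file names) is purely hypothetical, for illustration; the real index is a copy of the `c`/`h`/`i` blocks themselves.

```shell
# Hedged sketch: given a list of the block files the archive should contain,
# report which ones are absent from this disk, so the operator knows which
# other disk to fetch.
span=$(mktemp -d)
printf '0001_c.blk\n0002_d.blk\n0003_h.blk\n' > "$span/index.txt"
touch "$span/0001_c.blk" "$span/0003_h.blk"   # 0002_d.blk lives on another disk
missing=$(while read -r b; do
    [ -f "$span/$b" ] || echo "$b"            # list blocks not present here
done < "$span/index.txt")
echo "missing blocks: $missing"
```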
This seems quite overkill to gain some bytes over placebo-level compression.
Use a more reliable hypervisor (VMware, for example) if you really want to suspend and restart, and that's it.
> gain some bytes with the placebo-level compression
For ~500 GB of uncompressed raw DNG photos (which compress, but do not dedupe), ZPAQ with `-m59` is around 6% smaller than `xz -k --lzma2=dict=1610612736,mf=bt4,mode=normal,nice=273,depth=4294967295` (the maximum that can be specified for `xz`), so I do not consider that "placebo-level". With dedupe-able files the advantage would be even higher, and the larger the source, the more ZPAQ saves.
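As a back-of-the-envelope check of those figures (rough numbers from this thread, not measurements), 6% of ~500 GB is:

```shell
# 6% of ~500 GB, the approximate ZPAQ -m59 advantage over xz claimed above.
SOURCE_GB=500
SAVING_PCT=6
SAVED_GB=$(( SOURCE_GB * SAVING_PCT / 100 ))
echo "about ${SAVED_GB} GB saved"
```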
Even without the time-versus-size argument, the "split blocks" format allows more flexibility, as mentioned above (partial archives, easy to spread across disks, no need to deal with multi-terabyte files, etc.).
I have used ZPAQ for years to archive data, and I believe that my suggestion fixes all the "pain points" I have encountered.
> VMware, for example
I do not feel lucky enough for `dkms`.
6% does not seem like a big gain. I am quite confident that the cost in time, electricity, and heating to save 30 GB is not exactly worth it; about 5 euros.
However, I really don't think I will do such work. It's difficult, time-consuming, and would be used by a single user in the world. But not me 😄
That is fine. I am closing the issue.
Your request is legit, but way too complex.
Sorry