mhx / dwarfs
A fast high compression read-only file system for Linux, Windows and macOS
License: GNU General Public License v3.0
It took me 20 minutes to install everything for the project on one of my computers as a test.
I do not want to do it again.
A simple docker container with a volume would solve all of the problems.
It would also help to have a separate project for the mounting part.
Also, rather than creating a separate issue: I found that dwarfs doesn't return a failure exit code even if an image wasn't mounted successfully. So, for example, this doesn't work for now:
dwarfs image mountpoint || echo "Mount failed"
I apologize if there is a way to do this that I'm simply not seeing after reading the documentation several times, but there doesn't appear to be any direct way to extract the contents of a created dwarfs image, which is a serious missing feature.
Of course, this can be worked around by mounting the FUSE filesystem and copying the files out, but it seems like a big oversight (assuming there really is no way to do this and I'm not blind).
Apparently the erofs read-only file system has made it into the mainline stable Linux kernel (since 5.x): https://www.kernel.org/doc/html/latest/filesystems/erofs.html
It would therefore be nice to compare dwarfs against erofs, especially since its purpose seems similar (i.e. performance). From my own limited experimentation, erofs seems to be about twice as fast as squashfs.
For example, compressing dwarfs' own source and build folders (with a block size of 4 KiB and using LZ4 or LZ4HC where possible) yields the following:
# mkdwarfs -i . -o /tmp/dwfs.img -l 9 -S 12 -C lz4hc --no-owner --no-time
197M /tmp/dwfs.img
# mkdwarfs -i . -o /tmp/dwfs.img-d --no-owner --no-time
114M /tmp/dwfs.img-d
# mkfs.erofs -z lz4,9 -x -1 -T 0 -E force-inode-extended -- /tmp/erofs.img .
216M /tmp/erofs.img
# mksquashfs . /tmp/sqfs.img -b 4K -reproducible -mkfs-time 0 -all-time 0 -no-exports -no-xattrs -all-root -progress -comp lz4 -Xhc -noappend
196M /tmp/sqfs.img
Mounting them and reading yields:
- erofs: ~900 MiB/s, no difference on repeats;
- squashfs: ~400 MiB/s, no difference on repeats;
- dwarfs (4 KiB): ~250 MiB/s, and on repeat ~900 MiB/s;
- dwarfs (all defaults): ~300 MiB/s, and on repeat ~900 MiB/s.
Thus, and this is a guess, I would say that erofs is about as fast as dwarfs, at least on repeat reads (although all images were stored on tmpfs).
If one tries to set the cache size to an exact byte value (say 268435456, which should be 256m), it complains with:
terminate called after throwing an instance of 'dwarfs::runtime_error'
what(): invalid size suffix
Command terminated by signal 6
A very nice feature of cpio (actually its only mode of operation) and tar (via --files-from) is the option of specifying a list of files to include, instead of recursing through the root folder. Such a feature would allow one to easily exclude certain files from the source without having to resort to, e.g., rsync to build a temporary tree.
This could work in conjunction with -i as follows: any file within the list is treated as relative to the -i folder, regardless of whether it starts with /, ./ or a plain path, and a warning is issued if one tries to traverse outside the -i folder. For example, given that -i source is used:
- whatever is actually source/whatever;
- ./whatever is the same as above;
- /whatever is the same as above;
- ../whatever would issue an error, as it tries to escape the source;
- a/b/../../c is actually source/c, although it could issue a warning;
- /some-folder (given it is a folder) would not be recursed; only the folder itself is created within the resulting image (it is assumed that one would add other files afterwards).
Also, it would be nice to have an option to zero-terminate entries of the file list instead of newline-terminating them.
The above could be quite simple to implement; however, an even more useful option would be something like gen_init_cpio.c (https://github.com/torvalds/linux/blob/master/usr/gen_init_cpio.c#L452), which takes a file describing how a cpio archive (to be used for the initramfs) should be created (see the source code at the linked line for the file syntax). In addition to the file-list feature above, such a "file-system descriptor" would allow one to create a file system with any layout, without needing root credentials on one's machine.

As the title says, mounting any dwarfs image results in a seemingly successful mount, with all the filesystem contents visible, but no files within are accessible. Notably, this issue does not occur with the -f flag enabled, something I noticed when trying to get debug output. Also note that my distribution uses FUSE version 29. This happens with both the prebuilt dwarfs2 binary from the releases page and my own local builds.
Looks like another dependency issue. However, I don't understand C++ well enough to track it down myself.
Ebuild to reproduce issue:
dwarfs-0.4.0.ebuild.txt
Also additional build flags (see ebuild):
-DPREFER_SYSTEM_ZSTD=1
-DPREFER_SYSTEM_XXHASH=1
-DWITH_LEGACY_FUSE=0
environment.txt
emerge--info.txt
The complete build log:
build.log.txt
Extracting 2b2tca.dwarfs...
E 03:24:38.703663 dwarfs::runtime_error: LZMA: decompression failed (data is corrupt)
E 03:24:38.714209 dwarfs::runtime_error: LZMA: decompression failed (data is corrupt)
[... the same LZMA decompression error repeated ~60 more times ...]
I 03:24:41.164877 blocks created: 1298
I 03:24:41.164926 blocks evicted: 1290
I 03:24:41.164955 request sets merged: 9576
I 03:24:41.164983 total requests: 150010
I 03:24:41.165008 active hits (fast): 22247
I 03:24:41.165033 active hits (slow): 122699
I 03:24:41.165063 cache hits (fast): 3184
I 03:24:41.165090 cache hits (slow): 582
I 03:24:41.165117 total bytes decompressed: 87106428928
I 03:24:41.180369 average block decompression: 100.0%
I 03:24:41.180425 fast hit rate: 16.953%
I 03:24:41.180464 slow hit rate: 82.182%
I 03:24:41.180502 miss rate: 0.865%
dwarfs::runtime_error: extraction aborted
The image mounts just fine with dwarfs, and almost every file reads just fine; however, as soon as dwarfsextract encounters invalid data, it seems to completely bail out, unlike dwarfs. I would use dwarfs to extract instead, but its performance seems to be orders of magnitude slower.
I think the expected behavior ought to be to print a warning and skip the affected file instead.
PS: dwarfsextract doesn't seem to give realtime progress updates like mkdwarfs does, either :(
Commit 626b8c0 has to be in a release for this to work again.
Granted, you might be right. Next time I try DwarFS, I'll issue a sysctl -q vm.drop_caches=3, which, if I'm not mistaken, should drop the kernel file-system caches.
(In what follows I refer to the dwarfs image as just "image", and to the uncompressed files exposed through the mount point as "files".)
However, on the same topic, wouldn't it be useful to have the following complementary options:
- the dwarfs daemon accesses the image without using the kernel cache (either via O_DIRECT, or by using madvise with MADV_DONTNEED in the case of mmap access, after a block has been used);
At the moment I think that both the files and the image are eventually cached by the kernel, thus increasing the memory pressure on the system.
However, by using the two proposed options, one could fine-tune the CPU / memory usage to fit one's particular use case:
Originally posted by @cipriancraciun in #9 (comment)
If both the fuse2 and fuse3 versions are installed, DwarFS will build binaries for both (/usr/sbin/dwarfs for fuse3 and /usr/sbin/dwarfs2 for fuse2). How can I override that behaviour to build dwarfs with only fuse3 or only fuse2?
If tests are enabled with -DWITH_TESTS=ON and googletest (dev-cpp/gtest) is installed globally, dwarfs' cmake tries to download it regardless, which causes a crash with network-sandbox.
-- Configuring done
-- Generating done
-- Build files have been written to: /var/tmp/portage/sys-fs/dwarfs-0.5.2-r1/work/dwarfs-0.5.2/googletest-download
[1/9] Creating directories for 'googletest'
[2/9] Performing download step (git clone) for 'googletest'
FAILED: googletest-prefix/src/googletest-stamp/googletest-download
cd /var/tmp/portage/sys-fs/dwarfs-0.5.2-r1/work/dwarfs-0.5.2 && /usr/bin/cmake -P /var/tmp/portage/sys-fs/dwarfs-0.5.2-r1/work/dwarfs-0.5.2/googletest-download/googletest-prefix/tmp/googletest-gitclone.cmake && /usr/bin/cmake -E touch /var/tmp/portage/sys-fs/dwarfs-0.5.2-r1/work/dwarfs-0.5.2/googletest-download/googletest-prefix/src/googletest-stamp/googletest-download
Cloning into 'googletest-src'...
fatal: unable to access 'https://github.com/google/googletest.git/': Could not resolve host: github.com
Cloning into 'googletest-src'...
fatal: unable to access 'https://github.com/google/googletest.git/': Could not resolve host: github.com
Cloning into 'googletest-src'...
fatal: unable to access 'https://github.com/google/googletest.git/': Could not resolve host: github.com
-- Had to git clone more than once:
3 times.
CMake Error at googletest-download/googletest-prefix/tmp/googletest-gitclone.cmake:31 (message):
Failed to clone repository: 'https://github.com/google/googletest.git'
Suggested solution: use the system googletest library instead.
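One way this could look in the build system (a sketch only; the `PREFER_SYSTEM_GTEST` option name is illustrative, not an existing dwarfs flag) is to probe for an installed googletest first and only fall back to the download step when it is absent:

```cmake
# Sketch: prefer an installed googletest over downloading it at build time.
option(PREFER_SYSTEM_GTEST "Use system googletest if available" ON)

if(PREFER_SYSTEM_GTEST)
  find_package(GTest QUIET)
endif()

if(NOT GTest_FOUND)
  # fall back to the existing download step (ExternalProject/FetchContent)
endif()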
Here's the log:
nabla@satella /media/veracrypt1/squash $ mkdwarfs -i /media/veracrypt1/squash/mp/ -o "/run/media/nabla/General Store/TEMP/everything.dwarfs"
I 17:46:07.266160 scanning /media/veracrypt1/squash/mp/
E 18:14:36.699276 error reading entry: readlink('/media/veracrypt1/squash/mp//raid0array0-2tb-2018.sqsh/Program Files (x86)/Internet Explorer/ExtExport.exe'): Invalid argument
I 19:27:28.763515 assigning directory and link inodes...
I 19:27:29.319281 waiting for background scanners...
⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
scanning: /media/veracrypt1/squash/mp//pucktop-echidna-dec2020.sqsh/.local/share/Steam/steamapps/common/Half-Life 2/hl2/bin/server.so
694746 dirs, 299340/1488 soft/hard links, 1254591/5749940 files, 0 other
original size: 1.352 TiB, dedupe: 200.6 GiB (364325 files), segment: 0 B
filesystem: 0 B in 0 blocks (0 chunks, 888778/5384127 inodes)
compressed filesystem: 0 blocks/0 B written
▏ ▏ 0% /
*** Aborted at 1619766236 (Unix time, try 'date -d @1619766236') ***
*** Signal 7 (SIGBUS) (0x7fe8f3af8000) received by PID 15018 (pthread TID 0x7fe94c3e8640) (linux TID 15042) (code: nonexistent physical address), stack trace: ***
/usr/lib64/libfolly.so.0.58.0-dev(+0x2b64bf)[0x7fe9599e54bf]
/usr/lib64/libfolly.so.0.58.0-dev(_ZN5folly10symbolizer21SafeStackTracePrinter15printStackTraceEb+0x31)[0x7fe959924471]
/usr/lib64/libfolly.so.0.58.0-dev(+0x1f6112)[0x7fe959925112]
/lib64/libc.so.6(+0x396cf)[0x7fe9592286cf]
/usr/lib64/libxxhash.so.0(XXH3_64bits_update+0x774)[0x7fe958c6d584]
/usr/lib64/libdwarfs.so(+0x788cd)[0x7fe959e4f8cd]
/usr/lib64/libdwarfs.so(_ZN6dwarfs4file4scanERKSt10shared_ptrINS_4mmifEERNS_8progressE+0x95)[0x7fe959e5b525]
/usr/lib64/libdwarfs.so(+0xe9a89)[0x7fe959ec0a89]
/usr/lib64/libdwarfs.so(+0xf7f6b)[0x7fe959ecef6b]
/usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/libstdc++.so.6(+0xd315f)[0x7fe95949f15f]
/lib64/libpthread.so.0(+0x7fbd)[0x7fe959142fbd]
/lib64/libc.so.6(clone+0x3e)[0x7fe9592ee26e]
(safe mode, symbolizer not available)
Bus error
I left this running while trying to compress over 8 TiB of data, and after about 13 hours of scanning, it just sorta crashed and gave up. I don't really want to run it again to debug it or anything, so I'm just going to leave this here.
Running Gentoo Linux on a Ryzen 5 3600 with 64 GB of memory, if that helps.
Sorry about the lack of information. I'd really like to provide more - and if there's anything you'd like me to try to resolve this, let me know (I really like dwarfs, and was hoping it would work for this obscenely large dataset too!) Just uhhh.. keep in mind that I'm prooobably not going to wait 13 hours again unless I know it works :/
EDIT: Forgot to specify my version number, whoops. I'm using 0.5.4-rc2 from the GURU repository here: https://github.com/gentoo/guru/blob/master/sys-fs/dwarfs/dwarfs-0.5.4-r2.ebuild - I did build with -O3, but dwarfs seems to work just fine with smaller inputs, so I dunno. Specifically I'm using this: https://github.com/InBetweenNames/gentooLTO
Good day!
I'm a Gentoo GURU project (semi-official) maintainer, and I've ported dwarfs to Gentoo.
The dependency list says dwarfs requires libjemalloc-dev, but it builds successfully without it. Is jemalloc really needed?
Could you add a clear compile-time cmake option (something like the current WITH_LUA) to control whether jemalloc is actually used?
I'm trying to build 0.2.0 on openSUSE Tumbleweed, and after I had successfully run cmake and started building, it broke, stating it can't find the sparsehash dependency:
[ 92%] Linking CXX static library libfolly.a
/tmp/dwarfs-0.2.0/src/dwarfs/block_manager.cpp:32:10: fatal error: sparsehash/dense_hash_map: No such file or directory
32 | #include <sparsehash/dense_hash_map>
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
make[2]: *** [CMakeFiles/dwarfs.dir/build.make:108: CMakeFiles/dwarfs.dir/src/dwarfs/block_manager.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
[ 92%] Built target folly
make[1]: *** [CMakeFiles/Makefile2:284: CMakeFiles/dwarfs.dir/all] Error 2
make: *** [Makefile:171: all] Error 2
I think an extra check in CMake should solve this.
Hello!
mkdwarfs crashes for me when I use compression levels 6 and higher; lower levels work fine. This is the error I get:
*** Aborted at 1616416371 (Unix time, try 'date -d @1616416371') ***
*** Signal 4 (SIGILL) (0x4b9ec2) received by PID 24703 (pthread TID 0x25a2140) (linux TID 24703) (code: illegal operand), stack trace: ***
Illegal instruction (core dumped)
I'm using the latest (0.4.1) static executables from the releases page. OS is Arch Linux.
⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
scanning: /media/veracrypt3/dwarfs/mount//2b2tca.dwarfs/2b2tca/backups/backup-2b2tca-10june2019/main_map_nether/DIM-1/region/r.11.-2.mca
678293 dirs, 297774/10 soft/hard links, 46525/5413471 files, 0 other
original size: 91.28 GiB, dedupe: 24.74 GiB (15061 files), segment: 0 B
filesystem: 0 B in 0 blocks (0 chunks, 31454/5398400 inodes)
compressed filesystem: 0 blocks/0 B written
▏ ▏ 0% /
*** Aborted at 1623187919 (Unix time, try 'date -d @1623187919') ***
*** Signal 7 (SIGBUS) (0x7f3559c9b000) received by PID 5233 (pthread TID 0x7f35853eb700) (linux TID 5254) (code: nonexistent physical address), stack trace: ***
Bus error (core dumped)
The same issue, as described in issue #45, happened again. On the pre-compiled 0.5.5 release binaries. Twice, actually - the first time was on a completely different system, but the second time I was able to get a core dump. Even better, I actually think I know what's causing it.
When I ran mksquashfs instead of mkdwarfs on the exact same data, this happened:
nabla@satella /media/veracrypt3/dwarfs $ doas mksquashfs /media/veracrypt3/dwarfs/mount/ /media/veracrypt2/LiterallyEverything-08-Jun-2021.sqsh -comp zstd -Xcompression-level 22 -b 1M
Parallel mksquashfs: Using 12 processors
Creating 4.0 filesystem on /media/veracrypt2/LiterallyEverything-08-Jun-2021.sqsh, block size 1048576.
[| ] 33715/7846602 0%
Read failed because Input/output error
Failed to read file /media/veracrypt3/dwarfs/mount//2b2tca.dwarfs/2b2t.ca world and plugins/main map/region/r.-105.2.mca, creating empty file
[\ ] 34894/7846602 0%
Read failed because Input/output error
[| ] 34894/7846602 0%
Failed to read file /media/veracrypt3/dwarfs/mount//2b2tca.dwarfs/2b2t.ca world and plugins/main map/region/r.-13.5.mca, creating empty file
[/ ] 35075/7846602 0%
Read failed because Input/output error
Failed to read file /media/veracrypt3/dwarfs/mount//2b2tca.dwarfs/2b2t.ca world and plugins/main map/region/r.-1332.157.mcr, creating empty file
[/ ] 35771/7846602 0%
Read failed because Input/output error
Failed to read file /media/veracrypt3/dwarfs/mount//2b2tca.dwarfs/2b2t.ca world and plugins/main map/region/r.-18.-17.mcr, creating empty file
[/ ] 36003/7846602 0%
Read failed because Input/output error
[- ] 36003/7846602 0%
Failed to read file /media/veracrypt3/dwarfs/mount//2b2tca.dwarfs/2b2t.ca world and plugins/main map/region/r.-19.6.mcr, creating empty file
[- ] 47362/7846602 0%
Read failed because Input/output error
Failed to read file /media/veracrypt3/dwarfs/mount//2b2tca.dwarfs/2b2t.ca world and plugins/main map/region/r.15.7.mca, creating empty file
[/ ] 47552/7846602 0%
Read failed because Input/output error
[- ] 47553/7846602 0%
Failed to read file /media/veracrypt3/dwarfs/mount//2b2tca.dwarfs/2b2t.ca world and plugins/main map/region/r.16.-30.mcr, creating empty file
[=/ ] 48764/7846602 0%
Read failed because Input/output error
Failed to read file /media/veracrypt3/dwarfs/mount//2b2tca.dwarfs/2b2t.ca world and plugins/main map/region/r.19531.31249.mca, creating empty file
[=/ ] 54736/7846602 0%
Read failed because Input/output error
Failed to read file /media/veracrypt3/dwarfs/mount//2b2tca.dwarfs/2b2t.ca world and plugins/main map/region/r.45.-34.mcr, creating empty file
[=/ ] 63050/7846602 0%
Read failed because Input/output error
Failed to read file /media/veracrypt3/dwarfs/mount//2b2tca.dwarfs/2b2t.ca world and plugins/main map_nether/DIM-1/region/r.-14.-9.mcr, creating empty file
[=| ] 63522/7846602 0%
Read failed because Input/output error
[=/ ] 63522/7846602 0%
Failed to read file /media/veracrypt3/dwarfs/mount//2b2tca.dwarfs/2b2t.ca world and plugins/main map_nether/DIM-1/region/r.-16.-8.mca, creating empty file
[=| ] 63758/7846602 0%
Read failed because Input/output error
[=/ ] 63760/7846602 0%
Failed to read file /media/veracrypt3/dwarfs/mount//2b2tca.dwarfs/2b2t.ca world and plugins/main map_nether/DIM-1/region/r.-17.5.mcr, creating empty file
[=- ] 66762/7846602 0%
Read failed because Input/output error
[=\ ] 66762/7846602 0%
Failed to read file /media/veracrypt3/dwarfs/mount//2b2tca.dwarfs/2b2t.ca world and plugins/main map_nether/DIM-1/region/r.-33.21.mcr, creating empty file
[=\ ] 88907/7846602 1%
Read failed because Input/output error
Failed to read file /media/veracrypt3/dwarfs/mount//2b2tca.dwarfs/2b2tca/backups/backup-2b2tca-10june2019/main_map/region/r.-17.-6.mca, creating empty file
[=\ ] 91183/7846602 1%
Read failed because Input/output error
[=| ] 91183/7846602 1%
Failed to read file /media/veracrypt3/dwarfs/mount//2b2tca.dwarfs/2b2tca/backups/backup-2b2tca-10june2019/main_map/region/r.-3.31.mca, creating empty file
[==- ] 115004/7846602 1%
So, here's my theory (and I'm bad at these theories, so please take it with a grain of salt): mksquashfs probably just reads these files with regular old open() and read() calls, so whenever it encounters an I/O error, it can simply skip the file and create an empty one as if nothing happened. But mkdwarfs, as @mhx mentioned in the previous issue about this, makes extensive use of mmap, so perhaps every time an I/O error occurs, the region of memory representing the file it's trying to read becomes inaccessible, and a SIGBUS is raised instead?
Perhaps this SIGBUS could be caught, and behavior similar to mksquashfs preserved, whereby the file is simply skipped and replaced with an empty one; or maybe the file could be re-read several times before giving up and moving on?
Also of note - these I/O errors are coming from a mounted DwarFS filesystem. When I got the SIGBUS error in #45, I was trying to read from a bunch of SquashFS filesystems, not a bunch of DwarFS filesystems, so this could be an issue with the physical disk (Although I still think DwarFS should definitely be robust enough to skip these errors rather than completely bailing out, as I uh... do actually need to recover these files)
Even more bizarrely, despite both SquashFS and DwarFS failing consistently when trying to read roughly the same files, when I re-mounted the 2b2tca.dwarfs filesystem in question, I was able to read all of its contents without any I/O errors at all. Truly baffling.
Anyway, I have replied to the email I sent @mhx of the core dump for the previous issue with the new core dump (which is actually much smaller this time), so hopefully it's possible to figure out where this is specifically happening.
I'm struggling to come up with a proper systemd recipe. I'm a systemd noob.
When starting it as systemd service dependent on local-fs my mount is not readable by the local user.
systemctl enable dwarfs-mount
systemctl start dwarfs-mount
$ ls -al
drwxr-xr-x. 5 rurban 69632 Nov 30 13:48 perl
d?????????? ? ? ? ? perl.s
when starting it locally it works fine.
sudo cat /etc/systemd/user/dwarfs-mount.service
[Unit]
Description=Local DwarfFS Mounts
Documentation=man:dwarfs(1) https://github.com/mhx/dwarfs
DefaultDependencies=no
#ConditionKernelCommandLine=
OnFailure=emergency.target
Conflicts=umount.target
# Run after core mounts
After=-.mount var.mount
After=systemd-remount-fs.service
# But we run *before* most other core bootup services that need write access to /etc and /var
#Before=local-fs.target umount.target
#Before=systemd-random-seed.service plymouth-read-write.service systemd-journal-flush.service
#Before=systemd-tmpfiles-setup.service
[Service]
Type=oneshot
User=rurban
Group=root
RemainAfterExit=yes
ExecStart=/usr/local/bin/dwarfs /usr/src/perl/perl.dwarfs /usr/src/perl.s
StandardInput=null
StandardOutput=journal
StandardError=journal+console
[Install]
WantedBy=local-fs.target
Or maybe I should just add it to my fstab?
EDIT: Beware: with a syntax error in such a system unit depending on local-fs, you can lock yourself out and need to boot from USB. Emergency mode, i.e. rescue.target, will not work.
I'm trying to create an ebuild for the 0.3.0 dwarfs version, and here is the error:
/usr/bin/x86_64-pc-linux-gnu-g++ -DDWARFS_HAVE_LIBLZ4 -DDWARFS_HAVE_LIBLZMA -DDWARFS_HAVE_LIBZSTD -DDWARFS_STATIC_BUILD=OFF -DDWARFS_USE_JEMALLOC -DDWARFS_VERSION="" -DFMT_LOCALE -DFMT_SHARED -DGFLAGS_IS_A_DLL=0 -Ddwarfs_EXPORTS -Iinclude -I/usr/include/libiberty -isystem folly -isystem thrift -isystem fbthrift -isystem zstd/lib -isystem xxHash -isystem . -march=skylake -mtune=skylake -O2 -pipe -mmmx -msse -msse2 -msse3 -mssse3 -mcx16 -msahf -maes -mpclmul -mpopcnt -mabm -mfma -mbmi -msgx -mbmi2 -mavx -mavx2 -msse4.2 -msse4.1 -mlzcnt -mrtm -mhle -mrdrnd -mf16c -mfsgsbase -mrdseed -mprfchw -madx -mfxsr -mxsave -mxsaveopt -mclflushopt -mxsavec -mxsaves --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=6144 -fPIC -Wall -Wextra -pedantic -pthread -std=c++17 -MD -MT CMakeFiles/dwarfs.dir/src/dwarfs/logger.cpp.o -MF CMakeFiles/dwarfs.dir/src/dwarfs/logger.cpp.o.d -o CMakeFiles/dwarfs.dir/src/dwarfs/logger.cpp.o -c src/dwarfs/logger.cpp
src/dwarfs/logger.cpp: In member function ‘virtual void dwarfs::stream_logger::write(dwarfs::logger::level_type, const string&, const char*, int)’:
src/dwarfs/logger.cpp:102:51: error: variable ‘folly::symbolizer::Symbolizer symbolizer’ has initializer but incomplete type
102 | Symbolizer symbolizer(LocationInfoMode::FULL);
| ^
I have to admit I've been doing most of my compression tasks on machines with 64 GB of memory, so optimizing for low memory consumption hasn't really been a priority yet. There are some knobs you might be able to turn, though. I'm not sure large files per se are an issue, but a large number of files definitely is. You might be able to tweak --memory-limit a bit, which determines how many uncompressed blocks can be queued. If you lower this limit, the compressor pool may run out of blocks more quickly, resulting in overall slower compression. Reducing the number of workers (-N) might also help a bit.
A small update on this (apologies that it's on an unrelated issue). I did some experimentation and found that lowering the memory limit and the number of workers works in some instances but not in others. Large files seem to be the biggest hold-up; in particular, there was an instance where a 3.1 GB file seemingly had no way of compressing via dwarfs with my 16 GB of memory (even with very low options like -L1m -N1).
What I did find instead was that creating the image with the -l0 option and then recompressing it works in these cases without issue. Creating the initial image with -S24 results in very well recompressed files in these instances: the 3.1 GB file compressed down to 2.3 GB, whereas the default block size for -l0 resulted in a 2.6 GB file (which is approximately what mksquashfs -comp zstd -b 1M -Xcompression-level 22 also gave me).
Originally posted by @Phantop in #33 (comment)
At the moment dwarfs uses its own bundled libraries. That's bad practice when building in a source-based system like Gentoo: I've just found that revdep-rebuild (a utility which checks whether all packages are consistent) flags dwarfs:
RarogCmexDell ~ # revdep-rebuild --pretend
emerge --pretend --oneshot --complete-graph=y sys-fs/dwarfs:0
These are the packages that would be merged, in order:
Calculating dependencies... done!
[ebuild R ~] sys-fs/dwarfs-0.3.1-r2
So it would be good to unbundle at least the xxhash and zstd sources (at the moment they are pulled in via cmake, which I'm not able to patch because I don't know cmake), like the previous 0.2.4 version, which did not rely on bundled libraries.
The ebuild to play around with is in the GURU repository:
eselect repository enable guru
to the README, and somehow integrate the man/Makefile into cmake. I had to do it manually, and only then was I able to run sudo make install.
It would be useful to have an additional command line option telling mkdwarfs to create empty files when it can't access the source files. mksquashfs does this by default:
Failed to read file dir/filename, creating empty file
This is useful for preserving the file structure of an input directory even when not all files are readable (for instance, when a file is owned by another user).
Not a highly important feature, of course, but it would be nice to have if it's not too hard to implement.
So I've created quite a large image, about 800 MiB, which uncompressed is around 2.5 GiB.
I've tried starting dwarfs with -o cachesize=128m, ran find /tmp/dwarfs -type f -exec md5sum {} +, and after it was done the FUSE daemon process still retained ~800 MiB to ~1 GiB of RAM. (This is not virtual memory, but the RES column of htop, which reports the actual memory committed. When dwarfs starts, it reports ~150 MiB.)
Now, given that there are no longer any open files, and that I've already passed through all the files, there shouldn't be any uncompressed blocks (and thus decompression state) lingering around.
Even using an uncompressed image (i.e. -l 0) of ~400 MiB results in ~400 MiB of RAM usage.
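For reference, the numbers above can be reproduced by watching the resident set size of the FUSE daemon. A generic sketch (it measures the current shell via $$; substitute e.g. $(pgrep -f dwarfs) for the daemon's PID):

```shell
#!/bin/sh
# RES as reported by htop corresponds to the RSS column of ps
# (reported in KiB on Linux).
pid=$$                        # replace with the dwarfs daemon's PID
rss_kib=$(ps -o rss= -p "$pid" | tr -d ' ')
echo "RES of pid $pid: $rss_kib KiB"
```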
make[1]: *** [CMakeFiles/Makefile2:459: CMakeFiles/dwarfsextract.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
/usr/bin/ld: metadata_v2.cpp:(.text._ZN6dwarfs9metadata_INS_19debug_logger_policyEEC2ERNS_6loggerEN5folly5RangeIPKhEES9_RKNS_16metadata_optionsEib[_ZN6dwarfs9metadata_INS_19debug_logger_policyEEC2ERNS_6loggerEN5folly5RangeIPKhEES9_RKNS_16metadata_optionsEib]+0x65c): undefined reference to `fmt::v8::vformat[abi:cxx11](fmt::v8::basic_string_view<char>, fmt::v8::basic_format_args<fmt::v8::basic_format_context<fmt::v8::appender, char> >)'
/usr/bin/ld: metadata_v2.cpp:(.text._ZN6dwarfs9metadata_INS_19debug_logger_policyEEC2ERNS_6loggerEN5folly5RangeIPKhEES9_RKNS_16metadata_optionsEib[_ZN6dwarfs9metadata_INS_19debug_logger_policyEEC2ERNS_6loggerEN5folly5RangeIPKhEES9_RKNS_16metadata_optionsEib]+0x6d0): undefined reference to `fmt::v8::vformat[abi:cxx11](fmt::v8::basic_string_view<char>, fmt::v8::basic_format_args<fmt::v8::basic_format_context<fmt::v8::appender, char> >)'
/usr/bin/ld: libdwarfs.a(metadata_v2.cpp.o):metadata_v2.cpp:(.text._ZNK6dwarfs9metadata_INS_19debug_logger_policyEE4dumpERSoiRKNS_15filesystem_infoERKSt8functionIFvRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEjEE[_ZNK6dwarfs9metadata_INS_19debug_logger_policyEE4dumpERSoiRKNS_15filesystem_infoERKSt8functionIFvRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEjEE]+0x4b8): more undefined references to `fmt::v8::vformat[abi:cxx11](fmt::v8::basic_string_view<char>, fmt::v8::basic_format_args<fmt::v8::basic_format_context<fmt::v8::appender, char> >)' follow
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [CMakeFiles/dwarfsbench.dir/build.make:118: dwarfsbench] Error 1
make[1]: *** [CMakeFiles/Makefile2:599: CMakeFiles/dwarfsbench.dir/all] Error 2
/usr/bin/ld: folly/folly/experimental/exception_tracer/libfolly_exception_tracer.a(ExceptionStackTraceLib.cpp.o)(.debug_info+0x38): R_AARCH64_ABS64 used with TLS symbol _ZN12_GLOBAL__N_17invalidE
/usr/bin/ld: folly/folly/experimental/exception_tracer/libfolly_exception_tracer.a(ExceptionStackTraceLib.cpp.o)(.debug_info+0x5a): R_AARCH64_ABS64 used with TLS symbol _ZN12_GLOBAL__N_118uncaughtExceptionsE
/usr/bin/ld: folly/folly/experimental/exception_tracer/libfolly_exception_tracer.a(ExceptionStackTraceLib.cpp.o)(.debug_info+0x74): R_AARCH64_ABS64 used with TLS symbol _ZN12_GLOBAL__N_116caughtExceptionsE
/usr/bin/ld: folly/libfolly.a(CacheLocality.cpp.o)(.debug_info+0x13c2b): R_AARCH64_ABS64 used with TLS symbol _ZZN5folly18SequentialThreadId3getEvE5local
/usr/bin/ld: folly/libfolly.a(AsyncStack.cpp.o)(.debug_info+0x64): R_AARCH64_ABS64 used with TLS symbol _ZN5folly12_GLOBAL__N_127currentThreadAsyncStackRootE
/usr/bin/ld: folly/folly/experimental/exception_tracer/libfolly_exception_tracer.a(ExceptionTracerLib.cpp.o)(.debug_info+0x132bf): R_AARCH64_ABS64 used with TLS symbol _ZZN5folly15SharedMutexImplILb0EvSt6atomicNS_24SharedMutexPolicyDefaultEE26tls_lastDeferredReaderSlotEvE2tl
/usr/bin/ld: folly/folly/experimental/exception_tracer/libfolly_exception_tracer.a(ExceptionTracerLib.cpp.o)(.debug_info+0x1355d): R_AARCH64_ABS64 used with TLS symbol _ZZN5folly15SharedMutexImplILb0EvSt6atomicNS_24SharedMutexPolicyDefaultEE21tls_lastTokenlessSlotEvE2tl
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [CMakeFiles/mkdwarfs.dir/build.make:118: mkdwarfs] Error 1
make[1]: *** [CMakeFiles/Makefile2:564: CMakeFiles/mkdwarfs.dir/all] Error 2
/usr/bin/ld: folly/libfolly.a(SharedMutex.cpp.o)(.debug_info+0x6105): R_AARCH64_ABS64 used with TLS symbol _ZZN5folly15SharedMutexImplILb1EvSt6atomicNS_24SharedMutexPolicyDefaultEE21tls_lastTokenlessSlotEvE2tl
/usr/bin/ld: folly/libfolly.a(SharedMutex.cpp.o)(.debug_info+0x613c): R_AARCH64_ABS64 used with TLS symbol _ZZN5folly15SharedMutexImplILb1EvSt6atomicNS_24SharedMutexPolicyDefaultEE26tls_lastDeferredReaderSlotEvE2tl
/usr/bin/ld: folly/libfolly.a(SharedMutex.cpp.o)(.debug_info+0x61a3): R_AARCH64_ABS64 used with TLS symbol _ZZN5folly15SharedMutexImplILb0EvSt6atomicNS_24SharedMutexPolicyDefaultEE21tls_lastTokenlessSlotEvE2tl
/usr/bin/ld: folly/libfolly.a(SharedMutex.cpp.o)(.debug_info+0x61da): R_AARCH64_ABS64 used with TLS symbol _ZZN5folly15SharedMutexImplILb0EvSt6atomicNS_24SharedMutexPolicyDefaultEE26tls_lastDeferredReaderSlotEvE2tl
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [CMakeFiles/dwarfs_compat_test.dir/build.make:121: dwarfs_compat_test] Error 1
make[1]: *** [CMakeFiles/Makefile2:528: CMakeFiles/dwarfs_compat_test.dir/all] Error 2
make: *** [Makefile:163: all] Error 2
I noticed that, during the comparison with zpaq, the "placebo" compression mode (-m5) was used, while in reality the default one (-m1) is what's almost always used.
Could you please share the file you used for testing to make some analysis?
Thanks
As the title says, if the given source argument is a symlink to a folder, mkdwarfs fails with an error stating that it wants a folder.
As a workaround one could call mkdwarfs -i .../folder/ -o .../output.
I would suggest allowing (perhaps with a warning) a symlink to a folder as the source argument.
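Until then, a workaround sketch: resolve the symlink before handing it to mkdwarfs (the mkdwarfs call is echoed as a dry run):

```shell
#!/bin/sh
# Create a folder and a symlink to it, then resolve the link so
# mkdwarfs sees a real directory path.
mkdir -p real-folder
ln -sf real-folder folder-link
src=$(realpath folder-link)
echo mkdwarfs -i "$src" -o output.dwarfs
```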
For some reason, if running mkdwarfs over SSH under screen from urxvt (i.e. urxvt -e ssh user@host, then run screen, then run mkdwarfs), the progress bar gets garbled and outputs â¯â¯â¯â¯â¯â¯.
I've run strace and the progress bar seems to be written as:
[pid 17539] write(2, "\342\216\257\342\216\257\342\216\257\342\216\257\342\216\257\342\216\257\342\216\257\342\216\257\342\216\257\342\216\257\342\216"..., 1206)
Running with an empty environment (i.e. env -i) and even setting TERM to any of vt100, linux, rxvt-unicode, screen, screen.rxvt, xterm doesn't seem to fix it.
Granted, if one doesn't use screen, the progress bar looks OK. (Although locally on my laptop, with a newer screen, it does seem to work just fine.) My assumption is that the progress bar characters trick screen into displaying the wrong characters.
Perhaps add an option for an ASCII-only progress bar or VT100-only compliant codes. Alternatively, add an option to print the progress from time to time as simple print statements, as opposed to the current nice progress dashboard.
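A sketch of one possible fallback heuristic (an assumption about how such an option could decide, not what dwarfs does today): use the Unicode bar only when the locale advertises UTF-8. Running screen -U, which puts screen into UTF-8 mode, may also be worth trying.

```shell
#!/bin/sh
# Fall back to an ASCII bar character unless the locale is UTF-8.
case "${LC_ALL:-${LC_CTYPE:-${LANG:-}}}" in
  *UTF-8*|*utf8*) bar_char='⎯' ;;
  *)              bar_char='-' ;;
esac
echo "progress bar character: $bar_char"
```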
Unlike the previous time, now it crashes with any compression level.
*** Aborted at 1617652815 (Unix time, try 'date -d @1617652815') ***
*** Signal 4 (SIGILL) (0x4fd686) received by PID 9296 (pthread TID 0x2c0e1c0) (linux TID 9296) (code: illegal operand), stack trace: ***
Illegal instruction (core dumped)
This is with the static 0.5.0 binary from the releases page. Manually compiled dynamically linked build works fine.
After cloning the repository and following the steps, I get the following error:
0 [16:04] ~/workspace % git clone --recurse-submodules https://github.com/mhx/dwarfs
[...]
0 [16:04] ~/workspace % cd dwarfs
0 [16:04] ~/workspace/dwarfs@main % mkdir build
0 [16:04] ~/workspace/dwarfs@main % cd build
0 [16:04] workspace/dwarfs@main/build % cmake .. -DWITH_TESTS=1
-- The C compiler identification is GNU 10.2.1
-- The CXX compiler identification is GNU 10.2.1
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Setting build type to 'Release'
CMake Error at cmake/version.cmake:35 (message):
missing version files
Call Stack (most recent call first):
CMakeLists.txt:61 (include)
-- Configuring incomplete, errors occurred!
See also "/home/palaiologos/workspace/dwarfs/build/CMakeFiles/CMakeOutput.log".
1 [16:04] workspace/dwarfs@main/build %
Attaching my CMakeOutput.log. I'm using the following to build dwarfs:
129 [16:08] workspace/dwarfs@main/build % cmake --version
cmake version 3.18.4
CMake suite maintained and supported by Kitware (kitware.com/cmake).
0 [16:08] workspace/dwarfs@main/build % git --version
git version 2.32.0.rc0
0 [16:08] workspace/dwarfs@main/build % gcc --version
gcc (Debian 10.2.1-6) 10.2.1 20210110
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
0 [16:08] workspace/dwarfs@main/build % clang --version
Debian clang version 11.0.1-2
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
0 [16:08] workspace/dwarfs@main/build %
I've just downloaded both the 0.2.2 and 0.2.3 tar.gz bundles from GitHub (the releases pane on the right), and when trying to configure them with cmake, it complains that it fails to find CMakeLists.txt in both folly and fbthrift. Inspecting those folders, they are empty.
Doing a git clone and a submodule update does seem to fix the issue.
Thus it would be a good idea to either include those two dependencies in the bundles, or update the README to point out to users how to populate those folders.
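For anyone hitting this, a sketch of the commands that populate those folders from git (kept in variables and echoed, since they need network access; this does not help with the release tarballs themselves, which lack the .git metadata):

```shell
#!/bin/sh
# Either clone with submodules from scratch, or fix an existing clone.
fresh_clone='git clone --recurse-submodules https://github.com/mhx/dwarfs'
fix_existing='git submodule update --init --recursive'
echo "$fresh_clone"
echo "$fix_existing"
```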
There is a bug fix which disables the SSE2/AVX2 compile flags for the LtHash SIMD code when the arch does not match x86_64:
-- arch does not match x86_64, skipping setting SSE2/AVX2 compile flags for LtHash SIMD code
In certain configurations this autodetection does not work as expected :)
It is disabled on an Intel Skylake (Gentoo) machine:
uname -a
Linux RCEngine 5.11.0-pf6-RarogCmex #1 SMP PREEMPT Fri Apr 2 10:42:06 +05 2021 x86_64 Intel(R) Core(TM) i5-6600K CPU @ 3.50GHz GenuineIntel GNU/Linux
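A quick sanity check, assuming the cmake logic keys off the machine string reported by uname (that part is an assumption about the detection; the grep just shows what the CPU actually supports):

```shell
#!/bin/sh
# Compare the reported architecture with the CPU feature flags.
arch=$(uname -m)
echo "arch: $arch"
# List which of the relevant SIMD flags the CPU advertises (Linux only).
grep -o -E 'avx2|sse2' /proc/cpuinfo 2>/dev/null | sort -u
```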
DwarFS seems pretty nice. File access within the archive is quick. The main feature seems to be deduplication. Judging by the resulting file sizes, I'm guessing this is based on whole-file deduplication rather than being block-based?
Downside: DwarFS seems slow to create, compared to both wimlib's wimcapture and squashfs.
Testing with a copy of every released Wine version, extracted by doing for tag in $(git tag); do git archive --prefix=$tag/ $tag | tar -xC /mnt/wine; done (this naturally requires the Wine git repository):
$ time mkdwarfs -i wine -o wine.dwarfs
10:04:50.260867 scanning wine
10:04:59.692484 waiting for background scanners...
14:10:05.911222 assigning directory and link inodes...
14:10:06.312963 finding duplicate files...
14:10:11.475558 saved 53.69 GiB / 64.85 GiB in 2907224/3117413 duplicate files
14:10:11.475645 ordering 210189 inodes by similarity...
14:10:11.642889 210189 inodes ordered [167.2ms]
14:10:11.642926 assigning file inodes...
14:10:11.644702 building metadata...
14:10:11.644753 building blocks...
14:10:11.644802 saving names and links...
14:10:12.103973 updating name and link indices...
14:43:56.247557 waiting for block compression to finish...
14:43:56.247884 saving chunks...
14:43:56.275000 saving directories...
14:43:58.693785 waiting for compression to finish...
14:43:58.813425 compressed 64.85 GiB to 183.4 MiB (ratio=0.00276233)
14:43:59.328251 filesystem created without errors [1.675e+04s]
⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
waiting for block compression to finish
scanned/found: 362904/362904 dirs, 0/0 links, 3117413/3117413 files
original size: 64.85 GiB, dedupe: 53.69 GiB (2907224 files), segment: 6.309 GiB
filesystem: 4.847 GiB in 311 blocks (1460981 chunks, 210189/210189 inodes)
compressed filesystem: 311 blocks/183.4 MiB written
█████████████████████████████████████████████████████████████████████████▏100% /
real 279m9.270s
user 26m17.945s
sys 3m53.332s
$ time mksquashfs wine wine.squashfs -comp zstd
Parallel mksquashfs: Using 12 processors
Creating 4.0 filesystem on wine.squashfs, block size 131072.
[=================================================================================|] 3284743/3284743 100%
Exportable Squashfs 4.0 filesystem, zstd compressed, data block size 131072
compressed data, compressed metadata, compressed fragments,
compressed xattrs, compressed ids
duplicates are removed
Filesystem size 2074564.87 Kbytes (2025.94 Mbytes)
3.04% of uncompressed filesystem size (68204545.10 Kbytes)
Inode table size 31817047 bytes (31071.33 Kbytes)
28.29% of uncompressed inode table size (112449867 bytes)
Directory table size 28385936 bytes (27720.64 Kbytes)
41.57% of uncompressed directory table size (68284423 bytes)
Number of duplicate files found 2907225
Number of inodes 3480317
Number of files 3117413
Number of fragments 47404
Number of symbolic links 0
Number of device nodes 0
Number of fifo nodes 0
Number of socket nodes 0
Number of directories 362904
Number of ids (unique uids + gids) 1
Number of uids 1
chungy (1000)
Number of gids 1
chungy (1000)
real 153m19.319s
user 89m22.676s
sys 2m14.197s
$ time wimcapture --unix-data --solid wine wine.wim
Scanning "wine"
64 GiB scanned (3117413 files, 362904 directories)
Using LZMS compression with 12 threads
Archiving file data: 11 GiB of 11 GiB (100%) done
real 79m20.722s
user 42m30.817s
sys 1m37.350s
$ du wine.*
184M wine.dwarfs
2.0G wine.squashfs
173M wine.wim
wimlib is significantly faster to create this massive archive than DwarFS, and the resulting file size is marginally smaller. Git itself stores the Wine history in about 310MB, though that's not the fairest of comparisons given git's delta-based storage and the inclusion of every interim commit between the releases too.
DwarFS still beats this particular WIM archive for performance as a mounted file system, because I used solid compression and random access in wimlib is not fast in that configuration. I also think (correct me if I'm wrong!) that a solid archive was the fairer comparison, since DwarFS seems to group similar files together and compress them as one unit (311 blocks in this particular file system). wimcapture's non-solid mode compresses each stream individually, and the archive size balloons to 2.4 GB while making random access much quicker.
Hi,
DwarFS looks absolutely brilliant. But are there any plans to make it read-write, or is the plan to keep it as a read-only file system?
I would love to try it out for checking out several large (and similar, but different) git repositories, and then building them. We have several 300GB repositories at work, but most developers only have a 1T disk, so it quickly fills up.
Would you recommend an overlay filesystem on top of DwarFS for this use case, for now?
Thanks!
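A sketch of such an overlay setup: a read-only dwarfs mount as the lower layer, with writes going to a scratch directory. Directory names are placeholders; the mounts need root, so the commands are echoed as a dry run.

```shell
#!/bin/sh
# Prepare the overlay directories: read-only lower layer, writable
# upper layer, overlayfs work dir, and the merged view.
mkdir -p ro-mount upper work merged
echo dwarfs repo.dwarfs ro-mount
echo sudo mount -t overlay overlay \
  -o lowerdir=ro-mount,upperdir=upper,workdir=work merged
```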
At least for mkdwarfs, if one passes an extra argument, for example mkdwarfs -i ... -o ... Z (i.e. the Z), it is just silently ignored instead of failing.
Although this is not a major issue, in the case of wrapper scripts and automation tools a small bug in the user's code could, for example, fail to prepend -S before the user input, and that input would then be silently ignored by the tool.
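A defensive wrapper sketch for such automation: reject unexpected positional arguments instead of letting them be silently ignored. The strict four-argument shape (-i <in> -o <out>) is just for illustration, and the mkdwarfs call is echoed as a dry run.

```shell
#!/bin/sh
# Validate the argument shape before invoking the tool.
safe_mkdwarfs() {
  if [ $# -ne 4 ]; then
    echo "unexpected arguments: $*" >&2
    return 1
  fi
  echo mkdwarfs "$@"   # dry run; replace echo with the real invocation
}
safe_mkdwarfs -i src -o out.dwarfs                          # accepted
safe_mkdwarfs -i src -o out.dwarfs 12 2>/dev/null \
  || echo "rejected stray argument"
```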
I'm experimenting with using DwarFS for a very similar use case as yours (a build of each Rakudo commit). However, given there are several new commits every day, creating a new DwarFS image from scratch each time doesn't make sense. Would it be possible to add new data to the compressed filesystem without having to decompress/recompress the whole thing?
I was trying to create an image containing the latest tip of the CDNJS repository (https://github.com/cdnjs/cdnjs), and I encountered the following error twice:
waiting for block compression to finish
scanned/found: 544213/544213 dirs, 121/121 links, 7511842/7511842 files
original size: 235.1 GiB, dedupe: 77.02 GiB (6225274 files), segment: 63.22 GiB
filesystem: 94.81 GiB in 6068 blocks (9108158 chunks, 1286568/1286568 inodes)
compressed filesystem: 6068 blocks/9.235 GiB written
ERROR: std::out_of_range: _Map_base::at
Command exited with non-zero status 1
(If you want to check out the repository yourself, I strongly suggest using a shallow checkout, --depth 1 --single-branch, and prepare for a lot of waiting...) :)
/var/tmp/portage/sys-fs/dwarfs-0.5.6-r1/work/dwarfs-0.5.6/src/dwarfs/block_compressor.cpp:400:10: error: no type named 'mutex' in namespace 'std'
std::mutex mx_;
~~~~~^
/var/tmp/portage/sys-fs/dwarfs-0.5.6-r1/work/dwarfs-0.5.6/src/dwarfs/block_compressor.cpp:433:15: error: no type named 'mutex' in namespace 'std'
static std::mutex s_mx;
~~~~~^
/var/tmp/portage/sys-fs/dwarfs-0.5.6-r1/work/dwarfs-0.5.6/src/dwarfs/block_compressor.cpp:386:22: error: expected ';' after expression
std::lock_guard lock(mx_);
^
;
/var/tmp/portage/sys-fs/dwarfs-0.5.6-r1/work/dwarfs-0.5.6/src/dwarfs/block_compressor.cpp:386:12: error: no member named 'lock_guard' in namespace 'std'
std::lock_guard lock(mx_);
~~~~~^
[I] dev-libs/boost
Available versions: 1.76.0-r1(0/1.76.0)^t (~)1.77.0-r2(0/1.77.0)^t {bzip2 context debug doc icu lzma mpi +nls numpy python static-libs +threads tools zlib zstd ABI_MIPS="n32 n64 o32" ABI_S390="32 64" ABI_X86="32 64 x32" PYTHON_TARGETS="python3_8 python3_9 python3_10"}
Installed versions: 1.77.0-r2(0/1.77.0)^t(16:50:23 09/18/21)(bzip2 context icu lzma nls zlib zstd -debug -doc -mpi -numpy -python -tools ABI_MIPS="-n32 -n64 -o32" ABI_S390="-32 -64" ABI_X86="64 -32 -x32" PYTHON_TARGETS="python3_9 -python3_8 -python3_10")
Homepage: https://www.boost.org/
Description: Boost Libraries for C++
I've tried to use -o mlock=must and (as it should) it failed due to the per-user limits.
However, dwarfs (both the v2 and v3 FUSE drivers) aborted with an exception and failed to properly unmount the file system.
I think the FUSE driver should first try to execute all initialization steps, and only if they all succeed attempt to mount the filesystem.
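A pre-flight sketch of that ordering: check the memlock limit against the image size first, and only then attempt the mount. The dwarfs call is echoed as a dry run, and the empty stand-in image is just for the example.

```shell
#!/bin/sh
# Compare RLIMIT_MEMLOCK (ulimit -l, in KiB) with the image size
# before attempting a mount with -o mlock=must.
img=fs.dwarfs
: > "$img"                                          # stand-in image
limit=$(ulimit -l 2>/dev/null || echo unlimited)    # KiB or "unlimited"
size_kib=$(( $(wc -c < "$img") / 1024 ))
if [ "$limit" = unlimited ] || [ "$limit" -ge "$size_kib" ]; then
  echo dwarfs "$img" /mnt/img -o mlock=must
else
  echo "memlock limit $limit KiB < image size $size_kib KiB" >&2
fi
```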
After I successfully built 0.2.0, I tried to see how many dynamic libraries dwarfs depends on. Unfortunately there are quite a few, most of which aren't installed by default...
It would be lovely to have an option to build a statically linked executable that can easily be moved from one Linux instance to another.
For example, one important use case for dwarfs would be server deployments, where one could build a static image of an application (imagine a large Python / Ruby virtualenv and application with lots of files and assets).
$ readelf -d ./dwarfs
0x0000000000000001 (NEEDED) Shared library: [libboost_date_time.so.1.74.0]
0x0000000000000001 (NEEDED) Shared library: [libboost_filesystem.so.1.74.0]
0x0000000000000001 (NEEDED) Shared library: [libboost_program_options.so.1.74.0]
0x0000000000000001 (NEEDED) Shared library: [libboost_system.so.1.74.0]
0x0000000000000001 (NEEDED) Shared library: [libfmt.so.7]
0x0000000000000001 (NEEDED) Shared library: [libdouble-conversion.so.3]
0x0000000000000001 (NEEDED) Shared library: [libgflags.so.2.2]
0x0000000000000001 (NEEDED) Shared library: [libglog.so.0]
0x0000000000000001 (NEEDED) Shared library: [libunwind.so.8]
0x0000000000000001 (NEEDED) Shared library: [liblz4.so.1]
0x0000000000000001 (NEEDED) Shared library: [liblzma.so.5]
0x0000000000000001 (NEEDED) Shared library: [libzstd.so.1]
0x0000000000000001 (NEEDED) Shared library: [libfuse3.so.3]
0x0000000000000001 (NEEDED) Shared library: [libpthread.so.0]
0x0000000000000001 (NEEDED) Shared library: [libstdc++.so.6]
0x0000000000000001 (NEEDED) Shared library: [libgcc_s.so.1]
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
$ readelf -d ./mkdwarfs
0x0000000000000001 (NEEDED) Shared library: [libboost_date_time.so.1.74.0]
0x0000000000000001 (NEEDED) Shared library: [libboost_filesystem.so.1.74.0]
0x0000000000000001 (NEEDED) Shared library: [libboost_program_options.so.1.74.0]
0x0000000000000001 (NEEDED) Shared library: [libboost_system.so.1.74.0]
0x0000000000000001 (NEEDED) Shared library: [libfmt.so.7]
0x0000000000000001 (NEEDED) Shared library: [libdouble-conversion.so.3]
0x0000000000000001 (NEEDED) Shared library: [libgflags.so.2.2]
0x0000000000000001 (NEEDED) Shared library: [libglog.so.0]
0x0000000000000001 (NEEDED) Shared library: [libcrypto.so.1.1]
0x0000000000000001 (NEEDED) Shared library: [libunwind.so.8]
0x0000000000000001 (NEEDED) Shared library: [liblz4.so.1]
0x0000000000000001 (NEEDED) Shared library: [liblzma.so.5]
0x0000000000000001 (NEEDED) Shared library: [libzstd.so.1]
0x0000000000000001 (NEEDED) Shared library: [libstdc++.so.6]
0x0000000000000001 (NEEDED) Shared library: [libgcc_s.so.1]
0x0000000000000001 (NEEDED) Shared library: [libpthread.so.0]
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
Please provide a way to unbundle folly, fbthrift, fsst and parallel-hashmap. I'm packaging this for Gentoo and I'd really appreciate it.
So, I've been using DwarFS for a while now, and I'm loving it - I've had success with compressing my huge multi-terabyte backups with version 0.5.5, and all is well, except for one kind of massive problem: The mount times.... are atrocious!
I have a directory containing 1.5 TiB worth of separate DwarFS archives, but I'm going to focus on just one of them for this example, 3TBDRV-PartC-05-Jun-2021.dwarfs, which is 129.1 GiB large (191.5 GiB uncompressed, 452854 files). And oh boy, look at this:
I 17:17:22.057967 file system initialized [4460s]
That took over an hour to mount! (Ryzen 5 3600, 64 GB of RAM, DwarFS archive was stored on a Toshiba PC P300 3 TB hard disk)
In fact, this kind of extremely slow mounting time was consistent with all of the other archives too, even some stored on other drives:
I 16:03:03.018227 file system initialized [709.9ms]
I 16:03:03.849098 file system initialized [1.53s]
I 16:03:33.590178 file system initialized [31.27s]
I 16:03:55.387080 file system initialized [53.07s]
I 16:04:05.065723 file system initialized [62.75s]
I 16:04:29.313615 file system initialized [87s]
I 16:04:31.894768 file system initialized [89.57s]
I 16:04:37.869410 file system initialized [95.55s]
I 16:04:47.800754 file system initialized [105.5s]
I 16:07:38.591166 file system initialized [276.3s]
I 16:10:38.510593 file system initialized [456.2s]
I 16:11:29.503998 file system initialized [507.2s]
I 16:13:05.969340 file system initialized [603.7s]
I 16:26:54.248746 file system initialized [1432s]
I 16:27:27.300756 file system initialized [1465s]
I 16:30:56.814199 file system initialized [1675s]
I 16:45:29.011485 file system initialized [2547s]
I 16:45:40.813002 file system initialized [2558s]
I 16:46:33.700585 file system initialized [2611s]
I 17:12:16.627675 file system initialized [4154s]
I 17:13:47.702170 file system initialized [4245s]
I 17:17:22.057967 file system initialized [4460s]
I 17:27:46.640421 file system initialized [5084s]
I found this in the mkdwarfs documentation:
The metadata has been optimized for very little redundancy and leaving it uncompressed, the default for all levels below 7, has the benefit that it can be mapped to memory and used directly. This improves mount time for large file systems compared to e.g. an lzma compressed metadata block. If you don't care about mount time, you can safely choose lzma compression here, as the data will only have to be decompressed once when mounting the image.
If I'm reading this right, for all compression levels of 7 and above, DwarFS takes all of the metadata and decompresses it at once at mount time. Is there any way this could be improved? Decompressing everything at once seems like a bad idea. I don't want to outright disable metadata compression, as there's quite a lot of it and I get the feeling it benefits from being compressed, but these mount times really are excessive. It actually makes SquashFS preferable to DwarFS for the goal of having a compressed read-only filesystem that mounts quickly and is accessible quickly.
I'll admit I don't know much about DwarFS' actual internals, but how about this: what if the metadata was compressed in multiple separate chunks/blocks of a fixed size, and only the blocks that are actually needed get decompressed at any given time? Perhaps this could be made optional, or even the default at level 7 while 8 and 9 could compress the metadata all at once?
I'm not entirely sure on the specifics of how these mount times could be improved, but I feel like if level 7 is going to be the default then it should at least try to optimize the metadata for fast access (or at least, faster than this) somehow, without completely disabling compression, as compressing the metadata probably helps a lot with DwarFS' excellent space efficiency.
Or maybe compressing the metadata isn't worth it? The statement "The metadata has been optimized for very little redundancy" in the documentation seems to imply that compressing the metadata doesn't really help that much, are there any comparisons we can make between uncompressed and compressed metadata? How worthwhile even is compressing it? Should it continue to be enabled by default?
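On that tradeoff, one hedged workaround: rebuild an existing image at a level below 7, so the metadata block stays uncompressed and can be mapped directly at mount time. Whether --recompress honors -l this way should be checked against mkdwarfs(1) for your version, so the command is only echoed here:

```shell
#!/bin/sh
# Sketch: trade a slightly larger image for fast mounts by rebuilding
# with uncompressed metadata (levels below 7); --recompress avoids
# re-scanning the original input tree.
recompress_for_fast_mount() {
  echo mkdwarfs -i "$1" -o "$2" -l 6 --recompress
}
recompress_for_fast_mount slow.dwarfs fast.dwarfs
```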
Static build fails for dwarfsextract due to unresolved references from libarchive.a.
I guess this happens because libarchive from the focal binary package is statically linked against more dependencies than are specified in static_link.sh (as per https://launchpad.net/ubuntu/focal/+source/libarchive).
So the dwarfs static build requires either a custom libarchive.a that matches the supported formats, or a larger set of dependencies.
[430/431] Linking CXX executable dwarfsextract
FAILED: dwarfsextract
: && /bin/bash /mnt/d/Projects/5.Projects/tebako/deps/src/_dwarfs/cmake/static_link.sh dwarfsextract CMakeFiles/dwarfsextract.dir/src/dwarfsextract.cpp.o && :
...
/usr/bin/ld: /usr/lib/x86_64-linux-gnu/libarchive.a(archive_write_set_format_xar.o): in function `compression_code_bzip2':
(.text+0xc5d): undefined reference to `BZ2_bzCompress'
/usr/bin/ld: /usr/lib/x86_64-linux-gnu/libarchive.a(archive_write_set_format_xar.o): in function `xar_options':
(.text+0x11b1): undefined reference to `lzma_cputhreads'
/usr/bin/ld: /usr/lib/x86_64-linux-gnu/libarchive.a(archive_write_set_format_xar.o): in function `compression_end_bzip2':
(.text+0x17ec): undefined reference to `BZ2_bzCompressEnd'
/usr/bin/ld: /usr/lib/x86_64-linux-gnu/libarchive.a(archive_write_set_format_xar.o): in function `xar_compression_init_encoder':
(.text+0x20fa): undefined reference to `BZ2_bzCompressInit'
/usr/bin/ld: (.text+0x2261): undefined reference to `lzma_stream_encoder_mt'
/usr/bin/ld: /usr/lib/x86_64-linux-gnu/libarchive.a(archive_cryptor.o): in function `aes_ctr_encrypt_counter':
(.text+0x52): undefined reference to `nettle_aes_set_encrypt_key'
/usr/bin/ld: (.text+0x6d): undefined reference to `nettle_aes_encrypt'
/usr/bin/ld: /usr/lib/x86_64-linux-gnu/libarchive.a(archive_cryptor.o): in function `pbkdf2_sha1':
(.text+0x2a1): undefined reference to `nettle_pbkdf2_hmac_sha1'
/usr/bin/ld: /usr/lib/x86_64-linux-gnu/libarchive.a(archive_digest.o): in function `__archive_nettle_sha512final':
(.text+0x11): undefined reference to `nettle_sha512_digest'
/usr/bin/ld: /usr/lib/x86_64-linux-gnu/libarchive.a(archive_digest.o): in function `__archive_nettle_sha384update':
(.text+0x32): undefined reference to `nettle_sha512_update'
/usr/bin/ld: /usr/lib/x86_64-linux-gnu/libarchive.a(archive_digest.o): in function `__archive_nettle_sha512init':
(.text+0x49): undefined reference to `nettle_sha512_init'
/usr/bin/ld: /usr/lib/x86_64-linux-gnu/libarchive.a(archive_digest.o): in function `__archive_nettle_sha384final':
(.text+0x71): undefined reference to `nettle_sha384_digest'
/usr/bin/ld: /usr/lib/x86_64-linux-gnu/libarchive.a(archive_digest.o): in function `__archive_nettle_sha384init':
(.text+0x89): undefined reference to `nettle_sha384_init'
/usr/bin/ld: /usr/lib/x86_64-linux-gnu/libarchive.a(archive_digest.o): in function `__archive_nettle_sha256final':
(.text+0xb1): undefined reference to `nettle_sha256_digest'
/usr/bin/ld: /usr/lib/x86_64-linux-gnu/libarchive.a(archive_digest.o): in function `__archive_nettle_sha256update':
(.text+0xd2): undefined reference to `nettle_sha256_update'
/usr/bin/ld: /usr/lib/x86_64-linux-gnu/libarchive.a(archive_digest.o): in function `__archive_nettle_sha256init':
(.text+0xe9): undefined reference to `nettle_sha256_init'
/usr/bin/ld: /usr/lib/x86_64-linux-gnu/libarchive.a(archive_digest.o): in function `__archive_nettle_sha1final':
(.text+0x111): undefined reference to `nettle_sha1_digest'
/usr/bin/ld: /usr/lib/x86_64-linux-gnu/libarchive.a(archive_digest.o): in function `__archive_nettle_sha1update':
(.text+0x132): undefined reference to `nettle_sha1_update'
/usr/bin/ld: /usr/lib/x86_64-linux-gnu/libarchive.a(archive_digest.o): in function `__archive_nettle_sha1init':
(.text+0x149): undefined reference to `nettle_sha1_init'
/usr/bin/ld: /usr/lib/x86_64-linux-gnu/libarchive.a(archive_digest.o): in function `__archive_nettle_ripemd160final':
(.text+0x171): undefined reference to `nettle_ripemd160_digest'
/usr/bin/ld: /usr/lib/x86_64-linux-gnu/libarchive.a(archive_digest.o): in function `__archive_nettle_ripemd160update':
(.text+0x192): undefined reference to `nettle_ripemd160_update'
/usr/bin/ld: /usr/lib/x86_64-linux-gnu/libarchive.a(archive_digest.o): in function `__archive_nettle_ripemd160init':
(.text+0x1a9): undefined reference to `nettle_ripemd160_init'
/usr/bin/ld: /usr/lib/x86_64-linux-gnu/libarchive.a(archive_digest.o): in function `__archive_nettle_md5final':
(.text+0x1d1): undefined reference to `nettle_md5_digest'
/usr/bin/ld: /usr/lib/x86_64-linux-gnu/libarchive.a(archive_digest.o): in function `__archive_nettle_md5update':
(.text+0x1f2): undefined reference to `nettle_md5_update'
/usr/bin/ld: /usr/lib/x86_64-linux-gnu/libarchive.a(archive_digest.o): in function `__archive_nettle_md5init':
(.text+0x209): undefined reference to `nettle_md5_init'
/usr/bin/ld: /usr/lib/x86_64-linux-gnu/libarchive.a(archive_digest.o): in function `__archive_nettle_sha512update':
(.text+0x232): undefined reference to `nettle_sha512_update'
/usr/bin/ld: /usr/lib/x86_64-linux-gnu/libarchive.a(archive_hmac.o): in function `__hmac_sha1_init':
(.text+0x92): undefined reference to `nettle_hmac_sha1_set_key'
/usr/bin/ld: /usr/lib/x86_64-linux-gnu/libarchive.a(archive_hmac.o): in function `__hmac_sha1_final':
(.text+0x4d): undefined reference to `nettle_hmac_sha1_digest'
/usr/bin/ld: /usr/lib/x86_64-linux-gnu/libarchive.a(archive_hmac.o): in function `__hmac_sha1_update':
(.text+0x6e): undefined reference to `nettle_hmac_sha1_update'
/usr/bin/ld: /usr/lib/x86_64-linux-gnu/libarchive.a(archive_write_set_format_7zip.o): in function `compression_code_bzip2':
(.text+0x32d): undefined reference to `BZ2_bzCompress'
/usr/bin/ld: /usr/lib/x86_64-linux-gnu/libarchive.a(archive_write_set_format_7zip.o): in function `compression_end_bzip2':
(.text+0xc0c): undefined reference to `BZ2_bzCompressEnd'
/usr/bin/ld: /usr/lib/x86_64-linux-gnu/libarchive.a(archive_write_set_format_7zip.o): in function `_7z_compression_init_encoder':
It would be useful (for situations where a dwarfs image is part of another file) to have the ability to mount dwarfs images at an offset, like:
dwarfs -o offset=123 file mountpoint
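Until such an option exists, a workaround sketch that carves the embedded image out of the container file first. The container and its 13-byte header are synthetic stand-ins, and the dwarfs call is echoed as a dry run.

```shell
#!/bin/sh
# Copy everything past the offset into a standalone image file.
offset=13
printf 'HEADER-BYTES-' > container.bin          # 13 bytes of "header"
printf 'DWARFS-IMAGE-DATA' >> container.bin     # embedded image payload
tail -c +$((offset + 1)) container.bin > image.dwarfs
echo dwarfs image.dwarfs mountpoint
```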
Just like --no-owner and --no-time are useful to build "generic" images, it would also be useful to have an option that normalizes the file-system permissions. (At the moment they are taken verbatim.)
Perhaps the easiest solution would be the following two options:
- --perms-norm, which only cares whether any executable bit is set (be it user, group or others), and thus creates entries like r-x r-x r-x or r-- r-- r--;
- --perms-umask, which takes an octal value and caps the permissions; for example, --perms-umask 007 would only generate r-x r-x --- or r-- r-- ---.
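For concreteness, the masking such a hypothetical --perms-umask option would compute per entry (both the option name and its semantics are the proposal above, not an existing flag):

```shell
#!/bin/sh
# new_mode = mode & ~umask, all in octal.
cap_perms() {
  printf '%03o\n' $(( 0$1 & ~0$2 & 0777 ))
}
cap_perms 755 007   # executables become 750 (r-x r-x ---)
cap_perms 644 007   # plain files become 640 (rw- r-- ---)
```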
The complete build log:
build.log.txt
It seems to be an upstream issue of Folly: facebook/folly#1583
There is a fix:
facebook/proxygen#361
It was introduced when I upgraded the system.
I'm having a consistent problem with all filesystems created with -W values smaller than 8. When I try to copy the mounted filesystem to another location, or read several files sequentially, the process soon hangs indefinitely, and the process manager shows dwarfs at 0% CPU usage. I'm attaching a small sample file created with the options -S 26 -B 8 -W 4.
I'm using dwarfs (v0.5.6-16-g7345578, FUSE version 35).