Comments (23)
from bcachefs.
Thanks, I'll have a look at the dumps
from bcachefs.
Augh, the dumps are encrypted. The dump tool should ideally be removing the passphrase in the superblock, but that's going to take some work. I'll try working on that tomorrow...
from bcachefs.
So, having the dump tool run remove-passphrase after the dump is actually going to be a lot of work, due to how the qcow2 code works.
Would you be willing to run remove-passphrase and then take another dump?
I also just pushed a patch to just go RO instead of having a BUG_ON(), so even with that issue you still may be able to get data off (actually... it probably won't be able to do journal replay... shit, that's another thing to work on...)
from bcachefs.
Actually, dumping after removing the passphrase fails with
recovering from clean shutdown, journal seq 18446744073709551615
journal read done, 3302874 keys in 15170 entries, seq 40848
superblock journal seq (18446744073709551615) doesn't match journal (40848) after clean shutdown, exiting
Unable to continue, halting
bcachefs: bch2_fs_recovery() Error in recovery: cannot allocate memory (1)
filesystem contains errors: please report this to the developers
mount with -o fix_errors to repair
bcachefs: bch2_fs_open() bch_fs_open err opening /dev/sda1: fsck error
error opening /dev/sda1: Invalid argument
(Turns out I still had the original file system (good to know how to use qemu-img though!), but doing it from the repaired dumps doesn't work either.)
Trying to use fsck on the file system with the passphrase removed just segfaults with this in the journal:
Mar 07 23:28:55 nixos kernel: bch_alloc[/dev/[5171]: segfault at 7fdcd3ad6b60 ip 00007fdcd8d5211f sp 00007fdcd3ad6a60 error 6 in libc-2.27.so[7fdcd8d24000+13d000]
Mar 07 23:28:55 nixos kernel: Code: 54 55 53 48 81 ec 20 21 00 00 8b 8f c0 00 00 00 85 c9 0f 85 5b 01 00 00 c7 87 c0 00 00 00 ff ff ff ff 48 8d 84 24 20 01 00 00 <48> 89 bc 24 00 01 00 00 48 89 fb c7 84 24 e0 00 00 00 ff ff ff ff
Apparently, if you Ctrl-C out of fsck (because you get tired of the number of ys you have to enter), it segfaults when you run it a second time?
Trying again with a 'fresh' image however the fsck fails with
bcachefs: libbcachefs/buckets.c:651: bch2_mark_pointer: Assertion `!((new.dirty_sectors) != _res)' failed.
Aborted
PS: You don't need to jump through any hoops to try and get my data back, I have a setup to make a backup to the cloud every 5 minutes.
from bcachefs.
Sorry, I have the bad habit of editing my comments after posting them, which obviously doesn't show up in your email.
The process seems to be to take that dump, remove the passphrase, and then run fsck a couple of times on it. That will always fail with either
bcachefs: libbcachefs/buckets.c:651: bch2_mark_pointer: Assertion `!((new.dirty_sectors) != _res)' failed.
Aborted
or the segfault.
(Trying to fsck a dump with the passphrase still set always immediately fails with the former.)
I used qemu-nbd instead of qemu-img convert to access them as it seemed easier.
Here are the original dumps with the passphrase removed.
partition A
partition B
from bcachefs.
That was the process for the segfault I added in the edit.
The process for the seq mismatch should be to just take the dumps linked in the above comment and try to dump them again.
from bcachefs.
so, when the superblock clean shutdown section doesn't match the journal, we're trusting the superblock, not the journal. I can't think of any way in general to decide which we should trust, but in this case the superblock clean section has a garbage journal seq (U64_MAX, and I'd really like to know where that came from) so trusting the superblock instead of the journal may be the cause of some of the errors we're seeing later.
And, I reproduced that bucket sector count thing - it's an underflow. That's really weird - it doesn't matter what the filesystem contents are, that shouldn't be able to happen. But now I've got a repro, so I should be able to track it down.
from bcachefs.
aha
it's when a compressed extent gets split during journal replay
from bcachefs.
in this case the superblock clean section has a garbage journal seq (U64_MAX, and I'd really like to know where that came from)
It must be somewhere in the code called by remove-passphrase, since the same thing doesn't happen if I don't do that.
from bcachefs.
no, remove-passphrase doesn't actually start the filesystem, so it's not that. some sort of heisenbug.
anyways - fix is pushed
from bcachefs.
oh, that could certainly be caused by remove-passphrase.
will think on that one.
from bcachefs.
Are you able to reproduce the garbage journal seq in the superblock clean section? I added an assertion, so if you can trigger it again we should be able to see where it's coming from.
from bcachefs.
Here you go.
bcachefs: libbcachefs/super-io.c:1025: bch2_fs_mark_clean: Assertion `!(((__u64)(__le64)(sb_clean->journal_seq)) > ((s64)(((u64)~0ULL)>>1)))' failed.
Aborted
from bcachefs.
If I did it right this should be the backtrace:
#0 0x00007ffff7ab1be0 in raise () from /nix/store/681354n3k44r8z90m35hm8945vsp95h1-glibc-2.27/lib/libc.so.6
#1 0x00007ffff7ab2dc1 in abort () from /nix/store/681354n3k44r8z90m35hm8945vsp95h1-glibc-2.27/lib/libc.so.6
#2 0x00007ffff7aaa6e7 in __assert_fail_base () from /nix/store/681354n3k44r8z90m35hm8945vsp95h1-glibc-2.27/lib/libc.so.6
#3 0x00007ffff7aaa792 in __assert_fail () from /nix/store/681354n3k44r8z90m35hm8945vsp95h1-glibc-2.27/lib/libc.so.6
#4 0x0000000000478cef in bch2_fs_mark_clean ()
#5 0x000000000047bb1d in bch2_fs_read_only ()
#6 0x000000000047c6a5 in bch2_fs_stop ()
#7 0x000000000040a55c in cmd_remove_passphrase ()
#8 0x0000000000405506 in main ()
Do you need me to do it with debugging symbols or are the addresses good enough for you?
from bcachefs.
The new assertion is triggered by both set-passphrase and remove-passphrase; apart from that, the functionality of those commands seems unaffected.
from bcachefs.
I just pulled the fix into bcachefs-tools - can you try now?
from bcachefs.
That did the job.
from bcachefs.