Comments (23)

koverstreet avatar koverstreet commented on June 12, 2024 1

from bcachefs.

koverstreet avatar koverstreet commented on June 12, 2024

Thanks, I'll have a look at the dumps

koverstreet avatar koverstreet commented on June 12, 2024

Augh, the dumps are encrypted. The dump tool should ideally be removing the passphrase in the superblock, but that's going to take some work. I'll try working on that tomorrow...

koverstreet avatar koverstreet commented on June 12, 2024

So, having the dump tool run remove-passphrase after the dump is actually going to be a lot of work, due to how the qcow2 code works.

Would you be willing to run remove-passphrase and then take another dump?

I also just pushed a patch to just go RO instead of having a BUG_ON(), so even with that issue you still may be able to get data off (actually... it probably won't be able to do journal replay... shit, that's another thing to work on...)

koverstreet avatar koverstreet commented on June 12, 2024

hyperfekt avatar hyperfekt commented on June 12, 2024

Actually, dumping after removing the passphrase fails with

recovering from clean shutdown, journal seq 18446744073709551615
journal read done, 3302874 keys in 15170 entries, seq 40848
superblock journal seq (18446744073709551615) doesn't match journal (40848) after clean shutdown, exiting
Unable to continue, halting
bcachefs: bch2_fs_recovery() Error in recovery: cannot allocate memory (1)
filesystem contains errors: please report this to the developers
mount with -o fix_errors to repair
bcachefs: bch2_fs_open() bch_fs_open err opening /dev/sda1: fsck error
error opening /dev/sda1: Invalid argument

(Turns out I still had the original file system (good to know how to use qemu-img though!), but doing it from the repaired dumps doesn't work either.)
Trying to use fsck on the file system with the passphrase removed just segfaults with this in the journal:

Mar 07 23:28:55 nixos kernel: bch_alloc[/dev/[5171]: segfault at 7fdcd3ad6b60 ip 00007fdcd8d5211f sp 00007fdcd3ad6a60 error 6 in libc-2.27.so[7fdcd8d24000+13d000]
Mar 07 23:28:55 nixos kernel: Code: 54 55 53 48 81 ec 20 21 00 00 8b 8f c0 00 00 00 85 c9 0f 85 5b 01 00 00 c7 87 c0 00 00 00 ff ff ff ff 48 8d 84 24 20 01 00 00 <48> 89 bc 24 00 01 00 00 48 89 fb c7 84 24 e0 00 00 00 ff ff ff ff

Apparently, if you interrupt the fsck with Ctrl-C (because you get tired of the number of ys you have to enter), it segfaults when you attempt it a second time?
Trying again with a 'fresh' image however the fsck fails with

bcachefs: libbcachefs/buckets.c:651: bch2_mark_pointer: Assertion `!((new.dirty_sectors) != _res)' failed.
Aborted

PS: You don't need to jump through any hoops to try and get my data back, I have a setup to make a backup to the cloud every 5 minutes.

koverstreet avatar koverstreet commented on June 12, 2024

koverstreet avatar koverstreet commented on June 12, 2024

koverstreet avatar koverstreet commented on June 12, 2024

hyperfekt avatar hyperfekt commented on June 12, 2024

Sorry, I have the bad habit of editing my comments after posting them, which obviously doesn't show up in your email.
The process seems to be to take that dump, remove the passphrase, and then run fsck a couple of times on it. That will always fail with either

bcachefs: libbcachefs/buckets.c:651: bch2_mark_pointer: Assertion `!((new.dirty_sectors) != _res)' failed.
Aborted

or the segfault.
(Trying to fsck a dump with the passphrase still set always immediately fails with the former.)

I used qemu-nbd instead of qemu-img convert to access them as it seemed easier.

Here are the original dumps with the passphrase removed.
partition A
partition B

hyperfekt avatar hyperfekt commented on June 12, 2024

That was the process for the segfault I added in the edit.

The process for the seq mismatch should be to just take the dumps linked in the above comment and try to dump them again.

koverstreet avatar koverstreet commented on June 12, 2024

So, when the superblock clean shutdown section doesn't match the journal, we're trusting the superblock, not the journal. I can't think of any way in general to decide which we should trust, but in this case the superblock clean section has a garbage journal seq (U64_MAX, and I'd really like to know where that came from), so trusting the superblock instead of the journal may be the cause of some of the errors we're seeing later.

And, I reproduced that bucket sector count thing - it's an underflow. That's really weird - it doesn't matter what the filesystem contents are, that shouldn't be able to happen. But now I've got a repro for it, so I should be able to track it down.

koverstreet avatar koverstreet commented on June 12, 2024

aha

it's when a compressed extent gets split during journal replay

hyperfekt avatar hyperfekt commented on June 12, 2024

> in this case the superblock clean section has a garbage journal seq (U64_MAX, and I'd really like to know where that came from)

It must be somewhere in the code called by remove-passphrase, since the same thing doesn't happen if I don't do that.

koverstreet avatar koverstreet commented on June 12, 2024

no, remove-passphrase doesn't actually start the filesystem, so it's not that. some sort of heisenbug.

anyways - fix is pushed

koverstreet avatar koverstreet commented on June 12, 2024

oh, that could certainly be caused by remove-passphrase.

will think on that one.

koverstreet avatar koverstreet commented on June 12, 2024

Are you able to reproduce the garbage journal seq in the superblock clean section? I added an assertion, so if you can trigger it again we should be able to see where it's coming from.

hyperfekt avatar hyperfekt commented on June 12, 2024

Here you go.

bcachefs: libbcachefs/super-io.c:1025: bch2_fs_mark_clean: Assertion `!(((__u64)(__le64)(sb_clean->journal_seq)) > ((s64)(((u64)~0ULL)>>1)))' failed.
Aborted

koverstreet avatar koverstreet commented on June 12, 2024

hyperfekt avatar hyperfekt commented on June 12, 2024

If I did it right this should be the backtrace:

#0  0x00007ffff7ab1be0 in raise () from /nix/store/681354n3k44r8z90m35hm8945vsp95h1-glibc-2.27/lib/libc.so.6
#1  0x00007ffff7ab2dc1 in abort () from /nix/store/681354n3k44r8z90m35hm8945vsp95h1-glibc-2.27/lib/libc.so.6
#2  0x00007ffff7aaa6e7 in __assert_fail_base () from /nix/store/681354n3k44r8z90m35hm8945vsp95h1-glibc-2.27/lib/libc.so.6
#3  0x00007ffff7aaa792 in __assert_fail () from /nix/store/681354n3k44r8z90m35hm8945vsp95h1-glibc-2.27/lib/libc.so.6
#4  0x0000000000478cef in bch2_fs_mark_clean ()
#5  0x000000000047bb1d in bch2_fs_read_only ()
#6  0x000000000047c6a5 in bch2_fs_stop ()
#7  0x000000000040a55c in cmd_remove_passphrase ()
#8  0x0000000000405506 in main ()

Do you need me to do it with debugging symbols or are the addresses good enough for you?

hyperfekt avatar hyperfekt commented on June 12, 2024

The new assertion is triggered by both set-passphrase and remove-passphrase; apart from that, the functionality of those commands seems unaffected.

koverstreet avatar koverstreet commented on June 12, 2024

I just pulled the fix into bcachefs-tools - can you try now?

hyperfekt avatar hyperfekt commented on June 12, 2024

That did the job.
