Comments (28)
Is the log_090-20240301-205905.log file stored in the root dir?
yes (everything is in the root dir in our case)
Has anything changed hardware-wise?
nope, we've been using the same block layer for ages
Is it possible because of the increased writing this is the first time littlefs has wrapped around storage due to wear-leveling?
that is certainly a possibility.
How large is your storage?
in this case - 448 KB (we have devies with varying fs sizes). about half of it is one big file (index.html.gz), and apart from that it's a dozen or so small files - config, some storage... and the recent addition are the log files, a set of at most 4, at most 4K each.
see if the last blocks on the storage contain any interesting data
no, last 2 blocks appear to be data blocks (contain what appears to be valid failes - a log file and a config json file respectively).
if any other superblocks (blocks with "littlefs" at offset=8 bytes) exist on the system
no, in fact string littlefs
does not appear in the entire image
from littlefs.
assert that it never returns block 0 or 1
this would only work if you don't format
we only format externally, when preparing an image, so i can differentiate between usage by device vs mklfs (our image creation tool).
i added the assertion to lfs_alloc:
diff --git a/littlefs/lfs.c b/littlefs/lfs.c
index 105915b..d9ad91f 100644
--- a/littlefs/lfs.c
+++ b/littlefs/lfs.c
@@ -662,7 +662,11 @@ static int lfs_alloc(lfs_t *lfs, lfs_block_t *block) {
lfs->free.i += 1;
lfs->free.ack -= 1;
}
-
+// Debug LFS corruption
+// https://github.com/littlefs-project/littlefs/issues/953#issuecomment-1984394787
+#ifndef LFS_TOOLS_BUILD
+ LFS_ASSERT(!(*block == 0 || *block == 1));
+#endif
return 0;
}
}
also dropped LFS_NO_ASSERT
(which we had to save space). will try over the next few days and report.
from littlefs.
re-sent with a link instead
from littlefs.
What exact commit hash are you on? The line numbers don't seem to quite line up with 2.8.2.
it's 2.8.2 + assertion + lfs_probe from #947
Here it is, for reference: https://gist.github.com/rojer/cc5fd75cd65a7c64df85c73287616bcb
Do you ever move the lfs_file_t struct?
no, it is allocated on open and never moved while the file is open.
Does the lfs_file_t's .type field make sense?
i think it does. at least doesn't look obviously stomped over:
(gdb) frame 16
#16 0x420445ea in lfs_file_close (lfs=lfs@entry=0x3fca9538, file=file@entry=0x3fcc2ad0) at lfs.c:6067
6067 lfs.c: No such file or directory.
(gdb) print *file
$1 = {next = 0x0, id = 23, type = 1 '\001', m = {pair = {1, 0}, rev = 512, off = 4096, etag = 1343224952, count = 26, erased = false, split = false, tail = {4294967295, 4294967295}}, ctz = {head = 116, size = 136}, flags = 66818, pos = 136, block = 116, off = 136, cache = {block = 4294967295, off = 128, size = 8,
buffer = 0x3fcc2c64 '\377' <repeats 128 times>, "T"}, cfg = 0x3c170d38 <defaults>}
(gdb) print *lfs
$1 = {rcache = {block = 4294967295, off = 128, size = 128, buffer = 0x3fca95c0 '\377' <repeats 128 times>, "\200"}, pcache = {block = 0, off = 0, size = 24, buffer = 0x3fca9644 "\001\002"}, root = {117, 118}, mlist = 0x3fcc2ad0, seed = 596460605, gstate = {tag = 0, pair = {0, 0}}, gdisk = {tag = 0, pair = {0, 0}},
gdelta = {tag = 0, pair = {0, 0}}, free = {off = 116, size = 256, i = 3, ack = 254, buffer = 0x3fca96c8}, cfg = 0x3fca94e8, block_count = 256, name_max = 255, file_max = 2147483647, attr_max = 1022}
from littlefs.
no, we're ok wrt stack: 6K and 4K free out of 8K total. additionally, we have a stack canary watchpoint at the end of he stack (CONFIG_FREERTOS_WATCHPOINT_END_OF_STACK=y
if you are familiar with idf) - it causes and exception in case of stack overflow.
from littlefs.
we have an early indication that it is caused by a change between 2.7.0 and 2.8.2 - we managed to isolate what we think is a reliable repro for this, and going back to 2.7.0 code (with the same fs state), makes it go away (at least the lfs_prog assertion).
this needs more work to fully confirm (the week ended, leaving us with a bit of a cliffhanger there :) ) but if you focus on the changes between 2.7.0 and 2.8.2, it might be there somewhere.
from littlefs.
From an out-of-band discussion, thanks to work from Nikola Kosturski and @rojer a reproducible test case for the missing magic was found.
What's interesting is this turned out to not actually be an erroneous case as far as lfs_mount
is concerned. lfs_mount
is happy as long as a valid superblock entry exists anywhere in the superblock chain.
But this is an issue in terms of documentation and expected behavior. I've put up #959 to try to fix the state of things.
There is still an issue with the superblock entry disappearing entirely, which shouldn't happen and cause is unknown.
from littlefs.
make that three: another device has just succumbed to the same issue, whatever it is.
we are using 2.8.2 in disk format 2.0 mode.
the only somewhat relevant change in the environment has been file logging: we started logging to filesystem, which would increase write activity and fs utilization significantly.
from littlefs.
there is a slight suspicion that #901 may have been a catalyst for it, though it needs confirmation.
from littlefs.
Hmmm, this certainly looks like an unrelated mdir is being written to blocks 0x{0,1}. That at least rules out CTZ skip-list issues. Is the log_090-20240301-205905.log file stored in the root dir?
To be honest this sounds a lot like a driver issue, though I also know from other discussion you've been using littlefs for a while. Has anything changed hardware-wise? Another user had a similar issue that turned out to be using the wrong chip which led to silent address truncation, for example.
Is it possible because of the increased writing this is the first time littlefs has wrapped around storage due to wear-leveling? How large is your storage?
It would be interesting to see if the last blocks on the storage contain any interesting data.
there is a slight suspicion that #901 may have been a catalyst for it, though it needs confirmation.
I agree, it'd be interesting to see if any other superblocks (blocks with "littlefs" at offset=8 bytes) exist on the system. Well, if these devices started with blank storage.
It could also just be wrap-around related.
If you have devices under test still, it would be interesting to put an assert in lfs_alloc
(here) to assert that it never returns block 0 or 1 (actually why don't we already have this?).
I can share full dumps privately, please feel free to reach out.
Feel free to send it to geky at geky.net, though no promises I'll be able to find anything. It's a bit hard to understand a filesystem when the superblock is gone, since you don't know what blocks are out-of-date.
Aside from the above points, the only other thing I can think of would be to use ./scripts/readmdir.py
to try to see if any mdirs contain a pointer to blocks 0, 1, which shouldn't happen normally.
from littlefs.
to assert that it never returns block 0 or 1 (actually why don't we already have this?)
Ah, correction, we rely on lfs_alloc
returning blocks 0, 1 in lfs_format_
. So this would only work if you don't format.
from littlefs.
i sent filesystem dumps by email, along with some logs
from littlefs.
see if the last blocks on the storage contain any interesting data
no, last 2 blocks appear to be data blocks (contain what appears to be valid failes - a log file and a config json file respectively).
Well that rules out address truncation, though it might still be that some layer thinks the block device is larger than it actually is.
if any other superblocks (blocks with "littlefs" at offset=8 bytes) exist on the system
no, in fact string littlefs does not appear in the entire image
That makes an issue with #901 seem less likely. Adding more superblocks should make the "littlefs" string appear more often, not less.
i sent filesystem dumps by email, along with some logs
Hmm, I haven't received these yet. Not in spam either. GitHub is also sending notification updates through this domain just fine.
Maybe try sending an email with no attachments first? Gmail may just be being a butt...
from littlefs.
Two other possibilities come to mind:
-
in this case - 448 KB (we have devies with varying fs sizes)
It could be a lookahead buffer vs block count alignment issue (in littlefs). The block allocator/lookahead buffer was changed a bit recently. We have tests over this, but there can always be holes in tests.
What
lookahead_size
are you using? I'm assumingblock_count=112
for the 448 KiB. -
It could be something internally is causing littlefs to delete id 0 in the superblock, which is what the string "littlefs" and all the superblock info is attached to.
Though I'm not sure what could cause trigger this if it happens without powerloss...
from littlefs.
ok, so we did not get any crashes on the block alloc assertion, bu we are still getting corruptions where superblock ceases to be super. we added this to our write function:
if (block < 2 && off == 0) {
struct shos_str content = SHOS_MK_STR_N((const char *) buffer, size);
struct shos_str lfs = SHOS_MK_STR("littlefs");
assert(shos_strstr(content, lfs) != NULL);
}
which basically is - crash if writing block 0 or 1 and the first 128 byes (our io size) do not contain littlefs
.
and we get quite a few of these, for example:
...
#2 0x403924c6 in __assert_func (file=<optimized out>, file@entry=0x3c170a7d "shos_vfs_lfs.c", line=line@entry=69, func=<optimized out>, func@entry=0x3c170a3c <__func__.14970> "shos_lfs_prog", expr=<optimized out>, expr@entry=0x3c170a58 "shos_strstr(content, lfs) != NULL") at assert.c:85
#3 0x4203ffe8 in shos_lfs_prog (c=0x3fca94e8, block=0, off=0, buffer=0x3fca9644, size=128) at shos_vfs_lfs.c:69
#4 shos_lfs_prog (c=0x3fca94e8, block=0, off=0, buffer=0x3fca9644, size=128) at shos_vfs_lfs.c:64
#5 0x42041e30 in lfs_bd_flush (lfs=lfs@entry=0x3fca9538, pcache=pcache@entry=0x3fca9548, rcache=rcache@entry=0x3fca9538, validate=validate@entry=false) at lfs.c:181
#6 0x420422b4 in lfs_bd_sync (validate=false, rcache=0x3fca9538, pcache=0x3fca9548, lfs=0x3fca9538) at lfs.c:215
#7 lfs_dir_commitcrc (lfs=lfs@entry=0x3fca9538, commit=commit@entry=0x3fc9a0f8 <shos_task_stack+5620>) at lfs.c:1725
#8 0x42042e12 in lfs_dir_compact (lfs=lfs@entry=0x3fca9538, dir=dir@entry=0x3fc9a210 <shos_task_stack+5900>, attrs=attrs@entry=0x3fc9a2b0 <shos_task_stack+6060>, attrcount=attrcount@entry=2, source=source@entry=0x3fc9a210 <shos_task_stack+5900>, begin=begin@entry=0, end=end@entry=0) at lfs.c:2026
#9 0x42042ffc in lfs_dir_splittingcompact (begin=0, end=0, source=0x3fc9a210 <shos_task_stack+5900>, attrcount=2, attrs=0x3fc9a2b0 <shos_task_stack+6060>, dir=0x3fc9a210 <shos_task_stack+5900>, lfs=0x3fca9538) at lfs.c:2180
#10 lfs_dir_relocatingcommit (lfs=lfs@entry=0x3fca9538, dir=dir@entry=0x3fc9a210 <shos_task_stack+5900>, pair=pair@entry=0x3fcc2ad8, attrs=attrs@entry=0x3fc9a2b0 <shos_task_stack+6060>, attrcount=attrcount@entry=2, pdir=pdir@entry=0x3fc9a230 <shos_task_stack+5932>) at lfs.c:2298
#11 0x42043438 in lfs_dir_orphaningcommit (lfs=lfs@entry=0x3fca9538, dir=0x3fcc2ad8, attrs=0x3fc9a2b0 <shos_task_stack+6060>, attrcount=2) at lfs.c:2379
#12 0x420438ee in lfs_dir_commit (lfs=0x3fca9538, dir=<optimized out>, attrs=<optimized out>, attrcount=<optimized out>) at lfs.c:2551
#13 0x42043af2 in lfs_file_rawsync (file=0x3fcc2ad0, lfs=0x3fca9538) at lfs.c:3409
#14 lfs_file_rawsync (lfs=0x3fca9538, file=0x3fcc2ad0) at lfs.c:3373
#15 0x42043b2c in lfs_file_rawclose (lfs=lfs@entry=0x3fca9538, file=file@entry=0x3fcc2ad0) at lfs.c:3190
#16 0x420445ea in lfs_file_close (lfs=lfs@entry=0x3fca9538, file=file@entry=0x3fcc2ad0) at lfs.c:6067
#17 0x4204030e in shos_vfs_fs_lfs_close (fs=<optimized out>, fd=<optimized out>) at shos_vfs_lfs.c:364
...
contents of the buffer
passed to shos_lfs_prog (frame 3):
00000000 01 02 00 00 9f e0 03 f7 75 00 00 00 76 00 00 00 |........u...v...|
00000010 30 10 00 64 be b2 f8 b2 ff ff ff ff ff ff ff ff |0..d............|
00000020 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
00000030 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
00000040 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
00000050 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
00000060 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
00000070 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
another one:
...
#2 0x40097d2c in __assert_func (file=<optimized out>, line=<optimized out>, func=<optimized out>, expr=<optimized out>) at assert.c:85
#3 0x40105c2b in shos_lfs_prog (c=0x3ffcd248, block=0, off=0, buffer=0x3ffcd3a4, size=128) at shos_vfs_lfs.c:64
#4 shos_lfs_prog (c=0x3ffcd248, block=0, off=0, buffer=0x3ffcd3a4, size=128) at shos_vfs_lfs.c:64
#5 0x401073e8 in lfs_bd_flush (lfs=0x3ffcd298, pcache=0x3ffcd2a8, rcache=0x3ffcd298, validate=false) at lfs.c:181
#6 0x401074dc in lfs_bd_prog (size=0, buffer=<optimized out>, off=<optimized out>, block=<optimized out>, validate=<optimized out>, rcache=<optimized out>, pcache=<optimized out>, lfs=<optimized out>) at lfs.c:251
#7 lfs_bd_prog (lfs=0x3ffcd298, pcache=0x3ffcd2a8, rcache=0x3ffcd298, validate=false, block=0, off=<optimized out>, buffer=<optimized out>, size=1) at lfs.c:227
#8 0x40107525 in lfs_dir_commitprog (lfs=0x3ffcd298, commit=0x3ffc4000 <shos_task_stack+7188>, buffer=0x3ffc3ea4 <shos_task_stack+6840>, size=1) at lfs.c:1559
#9 0x401075d4 in lfs_dir_commitattr (lfs=0x3ffcd298, commit=0x3ffc4000 <shos_task_stack+7188>, tag=<optimized out>, buffer=0x3ffc3f88 <shos_task_stack+7068>) at lfs.c:1608
#10 0x401075f1 in lfs_dir_commit_commit (p=0x3ffc402c <shos_task_stack+7232>, tag=2348810469, buffer=0x3ffc3f88 <shos_task_stack+7068>) at lfs.c:1887
#11 0x40106aa5 in lfs_dir_traverse (lfs=0x3ffcd298, dir=0x3ffc4130 <shos_task_stack+7492>, off=<optimized out>, ptag=2348810469, attrs=0x3ffc41e0 <shos_task_stack+7668>, attrcount=2, tmask=tmask@entry=1074789376, begin=begin@entry=0, end=end@entry=23, diff=diff@entry=0,
cb=cb@entry=0x401075e0 <lfs_dir_commit_commit>, data=data@entry=0x3ffc402c <shos_task_stack+7232>, ttag=0) at lfs.c:1039
#12 0x40108110 in lfs_dir_compact (lfs=0x3ffcd298, dir=0x3ffc4130 <shos_task_stack+7492>, attrs=0x3ffc41e0 <shos_task_stack+7668>, attrcount=2, source=0x3ffc4130 <shos_task_stack+7492>, begin=<optimized out>, end=end@entry=23) at lfs.c:1969
#13 0x4010831e in lfs_dir_splittingcompact (begin=0, end=23, source=0x3ffc4130 <shos_task_stack+7492>, attrcount=2, attrs=0x3ffc41e0 <shos_task_stack+7668>, dir=0x3ffc4130 <shos_task_stack+7492>, lfs=0x3ffcd298) at lfs.c:2180
#14 lfs_dir_relocatingcommit (lfs=0x3ffcd298, dir=0x3ffc4130 <shos_task_stack+7492>, pair=0x3ffd6d90, attrs=0x3ffc41e0 <shos_task_stack+7668>, attrcount=2, pdir=<optimized out>) at lfs.c:2298
#15 0x4010873a in lfs_dir_orphaningcommit (lfs=0x3ffcd298, dir=0x3ffd6d90, attrs=0x3ffc41e0 <shos_task_stack+7668>, attrcount=2) at lfs.c:2379
#16 0x40108bac in lfs_dir_commit (lfs=0x3ffcd298, dir=0x3ffd6d90, attrs=0x3ffc41e0 <shos_task_stack+7668>, attrcount=2) at lfs.c:2551
#17 0x40108d58 in lfs_file_rawsync (file=0x3ffd6d88, lfs=0x3ffcd298) at lfs.c:3409
#18 lfs_file_rawsync (lfs=0x3ffcd298, file=0x3ffd6d88) at lfs.c:3373
#19 0x4010972f in lfs_file_sync (lfs=0x3ffcd298, file=0x3ffd6d88) at lfs.c:6083
#20 0x40105e81 in shos_vfs_fs_lfs_fsync (fs=<optimized out>, fd=0) at shos_vfs_lfs.c:380
#21 0x40104bb1 in shos_vfs_fsync (vfd=256) at shos_vfs.c:433
#22 0x400e84ad in esp_vfs_fsync (fd=<optimized out>) at vfs.c:565
...
...
buffer:
0000000 b5 00 00 00 f3 ff ff 1a ff ff ff ff ff ff ff ff |................|
00000010 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
00000020 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
00000030 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
00000040 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
00000050 ff ff ff ff ff ff ff ff ff ff ff 10 1f e0 78 40 |..............x@|
00000060 00 00 0a 63 6f 6e 66 39 2e 6a 73 6f 6e 20 30 00 |...conf9.json 0.|
00000070 02 1a 00 00 00 d2 06 00 00 6f d0 48 08 1f ff a8 |.........o.H....|
does this help? we have memory dumps corresponding to these but don't quite know what to look for.
from littlefs.
which basically is - crash if writing block 0 or 1 and the first 128 byes (our io size) do not contain
littlefs
.
Smart! It is only asserting when the cache is flushed, which is a bit removed from when the new metadata is "staged" (written to pcache), but this is still useful info.
You could add this assert to littlefs's internal lfs_bd_read
function, but I'm not sure it would really give us more than what you've found.
An aside: What exact commit hash are you on? The line numbers don't seem to quite line up with 2.8.2.
One thing to note is this stack trace is going through a normal compaction (lfs_dir_compact in lfs_dir_splittingcompact), NOT the expanding superblock compaction route. This is another point against a #901 bug, though it's still not impossible.
I'm writing as I'm looking into this, sorry if my thoughts are out of order or lead nowhere.
9f e0 03 f7
This is a very strange tag. The super-type 0x100 (0x9fe & 0x700) should be internal-only and never written to disk. But we never use the sub-type 0xfe with LFS_TYPE_FROM. Maybe this is caused by an underflow somewhere? Not sure how this tag is possible but it looks too much like a tag to ignore.
00000000 01 02 00 00 9f e0 03 f7 75 00 00 00 76 00 00 00 |........u...v...|
00000010 30 10 00 64 be b2 f8 b2 ff ff ff ff ff ff ff ff |0..d............|
This looks like a checksum tag (crc + padding) but... isn't a checksum tag? (0x5xx -> checksum, 0x3xx -> userattr?). Do yours file have userattrs that end in a bunch of 0xffs?
Looking at the second dump:
00000000 b5 00 00 00 f3 ff ff 1a ff ff ff ff ff ff ff ff |................|
00000010 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
00000020 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
00000030 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
00000040 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
00000050 ff ff ff ff ff ff ff ff ff ff ff 10 1f e0 78 40 |..............x@|
00000060 00 00 0a 63 6f 6e 66 39 2e 6a 73 6f 6e 20 30 00 |...conf9.json 0.|
00000070 02 1a 00 00 00 d2 06 00 00 6f d0 48 08 1f ff a8 |.........o.H....|
This commit is just a mess. The tag f3 ff ff 1a
has an immense size that is probably an underflow/uninitialized/etc. The 00 00 0a 63
tag is at least a proper looking name tag, but the preceding bytes don't make sense as far as I can tell.
On second thought, moving this assert into littlefs's internal lfs_bd_read
function may be useful to at least rule out driver/hardware related issues.
Something else I'm concerned about: conf9.json appears to reside in id=0. In blocks 0,1 this should normally be the superblock entry.
In lfs_file_t
, .m.pair={0,1} and .id=0 should NEVER happen. It may be fruitful to add an assert on this in lfs_file_sync
.
I'm not really sure what could cause this. Do you ever move the lfs_file_t
struct? Does the lfs_file_t
's .type field make sense? (littlefs always sets this to LFS_TYPE_REG, 0x001, otherwise maybe memory corruption?)
from littlefs.
and this is for the second one:
(gdb) frame 19
#19 0x4010972f in lfs_file_sync (lfs=0x3ffcd298, file=0x3ffd6d88) at lfs.c:6083
6083 lfs.c: No such file or directory.
(gdb) print *file
$1 = {next = 0x0, id = 14, type = 1 '\001', m = {pair = {1, 0}, rev = 180, off = 4096, etag = 1343224922, count = 23, erased = true, split = false, tail = {4294967295, 4294967295}}, ctz = {head = 25, size = 3354}, flags = 67842, pos = 3354, block = 25, off = 3354, cache = {block = 4294967295, off = 3328, size = 26,
buffer = 0x3ffd7000 '\377' <repeats 128 times>, " "}, cfg = 0x3f410b74 <defaults>}
(gdb) print *lfs
$1 = {rcache = {block = 1, off = 3968, size = 128, buffer = 0x3ffcd320 "\020\037\340x@"}, pcache = {block = 0, off = 0, size = 128, buffer = 0x3ffcd3a4 "\265"}, root = {1, 0}, mlist = 0x3ffd6d88, seed = 2227447927, gstate = {tag = 0, pair = {0, 0}}, gdisk = {tag = 0, pair = {0, 0}}, gdelta = {tag = 0, pair = {0,
0}}, free = {off = 106, size = 112, i = 33, ack = 112, buffer = 0x3ffcd428}, cfg = 0x3ffcd248, block_count = 112, name_max = 255, file_max = 2147483647, attr_max = 1022}
from littlefs.
Hmm, it doesn't look corrupted. So at least arbitrary memory corruption is unlikely.
It is weird that the id=14/23, but the tag's id ends up 0.
This may be a stretch, but have you checked for stack overflow? The commit sequence of code gets quite deep. A change in stack depth could also be explained by a littlefs version change.
Will dig more...
from littlefs.
I stand corrected, I forgot about the tag xoring. These dumps make more sense than I thought.
Running through readmdir.py
:
$ ./scripts/readmdir.py dump1 512 0 -a -T
mdir {0x0} rev 513 -> {0x75, 0x76}
off tag type id len
00000008: 601ffc08 hardtail . 8
00000008: 75 00 00 00 76 00 00 00 u...v...
00000014: 500ffc6c ccrc 0x0 . 108
00000014: be b2 f8 b2 ff ff ff ff ff ff ff ff ff ff ff ff ................
00000024: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
00000034: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
00000044: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
00000054: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
00000064: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
00000074: ff ff ff ff ff ff ff ff ff ff ff ff ............
...
This is a well formed tail commit, but the superblock is missing. And the file contents are missing?
$ ./scripts/readmdir.py dump2 512 0 -a -T
mdir {0x0} rev 181 (corrupted!)
off tag type id len
00000008: 0c0000e5 name 0xc0 0 229
00000008: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
00000018: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
00000028: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
00000038: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
00000048: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
00000058: ff ff ff 10 1f e0 78 40 00 00 0a 63 6f 6e 66 39 [email protected]
00000068: 2e 6a 73 6f 6e 20 30 00 02 1a 00 00 00 d2 06 00 .json 0.........
00000078: 00 6f d0 48 08 1f ff a8 ff ff ff ff ff ff ff ff .o.H............
00000088: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
00000098: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
000000a8: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
000000b8: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
000000c8: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
000000d8: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
000000e8: ff ff ff ff ff .....
...
This commit makes less sense. The type 0x0c0
doesn't exist, you would expect the superblock type here 0x0ff
. Its contents is are all 0xffs which also doesn't make sense, and it seems to run over the following tags, though that's likely just because the size was corrupted.
It's curious that these commits are doing very different things...
from littlefs.
it's 2.8.2 + assertion + lfs_probe from #947
I forgot I asked you to add asserts, that would offset the line numbers a bit wouldn't it...
we have an early indication that it is caused by a change between 2.7.0 and 2.8.2
Looking at the changes, the biggest ones are making block_count
optional and some cleanup to lfs_alloc
. Both could be culprits. I wonder if a lfs->cfg->block_count
-> lfs->block_count
change was missed somewhere...
we managed to isolate what we think is a reliable repro for this, and going back to 2.7.0 code
A reproducible example would be quite a good find :)
from littlefs.
In the context of #959, first dump makes perfect sense now. It's a superblock mdir with the superblock entry deleted (but valid tail pointer pointing to the new superblock).
The second dump still does not make sense:
$ ./scripts/readmdir.py dump2 512 0 -a -T
mdir {0x0} rev 181 (corrupted!)
off tag type id len
00000008: 0c0000e5 name 0xc0 0 229
00000008: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
00000018: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
00000028: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
00000038: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
00000048: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
00000058: ff ff ff 10 1f e0 78 40 00 00 0a 63 6f 6e 66 39 [email protected]
00000068: 2e 6a 73 6f 6e 20 30 00 02 1a 00 00 00 d2 06 00 .json 0.........
00000078: 00 6f d0 48 08 1f ff a8 ff ff ff ff ff ff ff ff .o.H............
00000088: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
00000098: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
000000a8: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
000000b8: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
000000c8: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
000000d8: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
000000e8: ff ff ff ff ff .....
...
It looks like the magic tag is being corrupted somehow.
The fact that there also looks like some remnants of a file (conf9.json) in the superblock suggests either superblock expansion hasn't occured, or the error is during superblock expansion.
Thinking out loud, I wonder if the block ends up not erased. This corruption could come from flash masking if multiple progs occur without an erase...
Also a random though, but to make results show up quicker, you could drop the block_cycles
very low, say block_cycles=1
. This is what we do in the local littlefs tests.
from littlefs.
Double checking block erase during superblock expansion: It seems hard for hard for littlefs to miss an erase in this case. It's the first thing we do in lfs_dir_compact
after creating the new superblock:
https://github.com/littlefs-project/littlefs/blob/4dd30c1b8f1b416633fe63a338ede8934b6449a9/lfs.c#L1967
from littlefs.
What is your opinion is #962 something similar or totally different?
What are you using to read out such nice hex dumps (are you also using some mcu with an external flash)?
from littlefs.
What is your opinion is #962 something similar or totally different?
This looks like the same issue.
#959 is a bit of a red herring. The magic string can go missing, but the current driver works around this without erroring. It's more a spec/impl disagreement.
Ignoring #959, there is still an issue where superblock expansion can lead to filesystem corruption. Seemingly due to a rogue tail pointer commit. Nikola has also found a stack trace that suggests lfs_dir_splittingcompact
is suspicious.
from littlefs.
What are you using to read out such nice hex dumps (are you also using some mcu with an external flash)?
That would be a question for @rojer, fortunately they've been able to send the extracted binaries.
There may be some external tool that can read SPI flash chips, otherwise you could write a loop to print the disk a byte at a time over your JTAG/debugger. Note if you go over JTAG you should really have some form of error-detection, JTAG/SWD can be noisy. The easiest thing to do is send each block three times.
If you also happen to have an SD card or network interface, going that route may be easier.
from littlefs.
re: dumps - some come from reading device's flash, some were taken in a gdb session while examining crash dump. after that, it's hexdump -C
, nothing special.
@geky speaking of #959 - i'm somewhat concerned by its current open status, do you think there may be something wrong with it? or awaiting review? we are about to ship a production release with it.
from littlefs.
@geky speaking of #959 - i'm somewhat concerned by its current open status, do you think there may be something wrong with it? or awaiting review? we are about to ship a production release with it.
I was hoping to resolve the remaining issue in the same release. Or at least better understand what is going on.
#959 is in a weird place where it's technically a bug, but it doesn't actually break anything. And since the current behavior is already out there and in use, it doesn't seem like the highest priority to bring in.
That and it's a behavior change which affects the disk, so we need to be confident it is correct. We can't really take it back after releasing.
from littlefs.
Until/unless we find more info, I'll go ahead and merge #959
from littlefs.
Related Issues (20)
- lfs_remove doesn't have unlink/rmdir distinction HOT 4
- When are file attributes finally written to "disk"? HOT 2
- Block count defined twice HOT 1
- What is the "bd" folder? HOT 1
- why t1_buf is sample of t2_buf at sometime?
- Expanding superblock deletes File HOT 7
- '-fcallgraph-info=su' breaking builds on some platforms HOT 9
- All the data is lost after update from disk 2.0 to disk 2.1 HOT 3
- v2.7 : long delay in open/close for spi nor HOT 8
- Data Flash values after erase is "undefined" - Would LittleFS work? HOT 1
- Do we really need directory structures in meta data layer for simple structure variable read/write? HOT 1
- running into RAM limitations for freertos tasks HOT 1
- LittleFS Display Directory Content HOT 3
- File creation with full path in the name HOT 2
- Query: failing to find files after a soak test HOT 8
- Usage of lfs_dir_seek HOT 2
- File system corruption when creating directories HOT 2
- used and free memory HOT 2
- issue when append data to a file HOT 2
- Wrong printf specifiers
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from littlefs.