dhara's Introduction

	   Dhara: NAND flash translation layer for small MCUs

		     Daniel Beer <[email protected]>
			       1 Apr 2017

Dhara is a small flash translation layer designed to be used in
resource-constrained systems for managing NAND flash. It provides a
mutable block interface with standard read and write operations. It has
the following additional features:

  * Perfect wear-levelling: the erase count of any two blocks will
    differ by at most 1.

  * Trim: logical sectors can be deleted to improve performance if their
    content is not required.

  * Data integrity: write() (and trim()) of logical sectors are
    atomic. If the power fails, the state rolls back to the last
    synchronization point. Synchronization points occur at regular
    intervals, but can also be reached on demand.

  * Real-time performance: all operations, including startup, are O(log
    n) worst case in the size of the chip, if bad blocks are uniformly
    distributed.

The implementation makes minimal assumptions regarding the underlying
NAND chip. It can be used with almost any chip available, and can take
advantage of extra hardware features where available. In particular:

  * No OOB data is consumed. All available OOB bytes can be spent on
    ECC.

  * It can take advantage of internally buffered copy operations, if the
    NAND chip supports these.

  * It can make use of the NAND chip's on-board ECC, if available. If
    software ECC is required, implementations of Hamming and BCH codes
    are provided in the source distribution (see the ecc/ directory).

  * It does not require partial page programmability. However, if this
    functionality is present, it can be taken advantage of by
    presenting a smaller pseudo-page size: divide the real page size
    by the number of allowed reprogram operations. The only restriction
    is that partial writes must contain complete ECC information (a
    configuration sketch follows this list).

  * It will take advantage of the ability to do partial reads if this is
    possible. Reads must be error checked and corrected (this can
    usually be done in units smaller than the page size).
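
As a concrete example of the pseudo-page arrangement above: a 2 KB page
that tolerates four program operations can be presented as four
512-byte pseudo-pages. A minimal configuration sketch in C follows;
log2_page_size and log2_ppb are the fields referenced elsewhere in this
document, while the num_blocks field name is an assumption:

    /* 2048-byte pages, 64 pages per block, 4 program operations
     * allowed per page: present 512-byte pseudo-pages, i.e. divide the
     * page size by 4 and multiply the pages-per-block count by 4. */
    const struct dhara_nand my_nand = {
            .log2_page_size = 9,    /* 512 = 2048 / 4 */
            .log2_ppb       = 8,    /* 256 = 64 * 4 */
            .num_blocks     = 2048, /* assumed field name */
    };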

The implementation consists of the files in the dhara/ subdirectory,
plus the NAND layer implementation, which you must provide. The
top-level interface is the set of functions described in map.h (see the
comments in the header file for more details):

    init: initialize a map layer instance
    resume: scan the map and recover the saved state
    clear: delete all data
    capacity, size: obtain usage statistics
    find: obtain the physical location of a logical sector
    read: read a logical sector
    write: write a logical sector
    copy_page: copy a raw flash page to a logical sector
    copy_sector: copy one logical sector to another
    trim: remove a logical sector from the map
    sync: ensure that changes to the map are committed
    gc: manually trigger garbage collection
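
For orientation, here is a minimal bring-up sketch using these
functions. The call signatures follow those quoted in the issue reports
later in this document; the buffer size and gc_ratio value are
illustrative, and error handling is abbreviated:

    #include "dhara/map.h"

    static struct dhara_map map;
    static uint8_t page_buf[2048];  /* one page, sized for your chip */

    int example_write(const struct dhara_nand *nand, const uint8_t *data)
    {
            dhara_error_t err = DHARA_E_NONE;

            dhara_map_init(&map, nand, page_buf, 4 /* gc_ratio */);

            /* Recover any state already on the chip. On a blank chip
             * this reports an error, meaning there is nothing to
             * resume -- the map is still usable afterwards. */
            dhara_map_resume(&map, &err);

            if (dhara_map_write(&map, 0, data, &err) < 0)
                    return -1;      /* inspect err */

            return dhara_map_sync(&map, &err);  /* reach a sync point */
    }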

To provide the NAND layer, implement the set of functions described in
nand.h (see comments for details). In summary, you must provide the
following operations:

    is_bad: determine whether a block is bad
    mark_bad: mark a NAND block as bad
    erase: erase a NAND block
    prog: program a NAND page, including ECC and checksums
    is_free: determine whether a page is erased (unprogrammed)
    read: read a (possibly partial) NAND page, and attempt ECC if
      necessary
    copy: copy one page to another, using internal buffers if possible

Check the datasheet for your chip for information on these operations.
In most cases, the manufacturer will specify a preferred layout scheme
for the ECC and bad block markers in the OOB region. Pay particular
attention to the marker used for factory-marked bad blocks!

If your ECC scheme is such that programming an all-0xff page is
equivalent to a no-op, then it's ok for your implementation of is_free()
to simply check for all-0xff page content.
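
A minimal sketch of that check, assuming your driver exposes a raw page
read; raw_page_read and MY_PAGE_SIZE are hypothetical names:

    #include "dhara/nand.h"

    #define MY_PAGE_SIZE 2048       /* match your chip (hypothetical) */

    /* Treat a page as free iff every byte reads back 0xff. Only valid
     * when programming an all-0xff page is a no-op under your ECC. */
    int dhara_nand_is_free(const struct dhara_nand *n, dhara_page_t p)
    {
            uint8_t buf[MY_PAGE_SIZE];
            size_t i;

            (void)n;
            if (raw_page_read(p, buf, sizeof(buf)) < 0)
                    return 0;       /* on read failure, assume not free */

            for (i = 0; i < sizeof(buf); i++)
                    if (buf[i] != 0xff)
                            return 0;
            return 1;
    }

Note the caveat raised in the issues below: this check cannot
distinguish an erased page from a programmed page whose content happens
to be all 0xff.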

Note that bad blocks need only be queried one at a time. It's not
necessary to maintain a bad-block table -- just the standard OOB marking
scheme is fine, and preserves the performance guarantees of the map
layer.

Also note that when implementing partial read, you must read enough of
the page that you're able to apply ECC and check for uncorrectable
errors. Uncorrectable errors *must* be detected in order for the data
integrity guarantees to be valid. This may require the use of a checksum
in addition to ECC. If this is done, ensure that the checksum bytes are
also protected by ECC!

Implementations of two popular ECC mechanisms (Hamming and BCH) can be
found in the ecc/ subdirectory. Each implements ECC over variable-sized
chunks (256 or 512 bytes are typical sizes). Multiple ECC chunks may be
required per page.
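
For example, a 2048-byte page protected with 512-byte chunks carries
four ECC chunks, or eight with 256-byte chunks.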

dhara's People

Contributors: dlbeer, fark

dhara's Issues

Handling of ECC error

Good day @dlbeer,

While working with your Dhara FTL for NAND flash, I noticed that there is no ECC error handling; the Dhara functions just return the error without taking any further action. Is it necessary to implement ECC error handling yourself? I saw a similar issue #24

Using a different GC ratio with an existing flash map

Thank you for the incredible library. It takes so many of the pain points out of using flash memory in embedded systems.

I'm finding myself in a position where I need a bit more control over GC. I've previously been using a GC ratio of 1, but may need to switch to manual GC. The dhara_map_init documentation states "You should always initialize the same chip with the same garbage collection ratio". Is there some wiggle room around that "should"? It doesn't say "must", so I interpret that it "could" be done. If that is the case, what are the possible ramifications of init'ing a chip (previously init'ed with GC ratio = 1) with a GC ratio = 0 and using dhara_map_gc for manual collection?
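
(For reference, a manual collection step with a GC ratio of 0 might
look like the sketch below. How dhara_map_gc behaves once nothing is
left to collect is not spelled out in this document, so treat that as
an assumption to verify against map.c:)

    /* Run one garbage collection step during idle time (gc_ratio == 0). */
    dhara_error_t err = DHARA_E_NONE;

    if (dhara_map_gc(&map, &err) < 0) {
            /* inspect err: nothing-to-collect vs. a real failure */
    }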

DHARA+FATFS: boot sector is remapped without a copy

Hello,

I have been facing a problem that I can't solve for more than two weeks.
I am using Dhara with the FatFs file system and a NAND flash.
I wrote a simple test which opens a file, writes 64 bytes of data and closes it.
The test loops about 1000-3000 times. The flash is large enough to hold all written data.
Everything works fine, except that sometimes the FatFs file-open function (f_open) crashes.
After that I can't mount the file system with the f_mount function.
I started digging into Dhara and found out more or less what is going on.
The f_mount function fails because it can't find the boot sector of the FAT file system.
This is the zero sector, which carries the 0xAA55 signature.
I added several logging statements to my test to track what happens to the zero sector.
I added an output log to the dhara_nand_copy function, and also to the dhara_nand_prog function, to track all writes to pages.
I also check the validity of the zero sector by reading its content through the dhara_map_find function.
So my test sequence looks like the following:

  1. format flash
  2. open file
  3. write 64bytes
  4. close file
  5. read zero sector and check that it has 0xAA55 signature
  6. goto 2

So I see that on f_close, Dhara starts copying pages, and sometimes I see that the boot sector is copied as well.
After that I see that the boot sector is mapped to a new location.
Everything looks fine, but when the error happens, the boot sector is mapped to a new location and I don't see that it was copied.
After that my boot sector check function fails and the file system is corrupted.
But if I read the sector using the old location, it is there.
So Dhara remaps the boot sector but does not copy it to that location.
How could that happen? Maybe my configuration is incorrect.
What should I check?

Here is a log example:

--- File closed
--- BOOT SECTOR is mapped to page 31334
--- file opened
---write 64 bytes
---close file
--- -copy from ---- to ---- (A LOT OF OTHER PAGES ARE COPIED)
--- -copy from ---- to ----
--- -copy from ---- to ----
--- -copy from 31334 to 31484 (HERE BOOT SECTOR IS COPIED)
--- -copy from ---- to ----
--- File closed
--- BOOT SECTOR is mapped to page 31484 (HERE BOOT SECTOR IS MAPPED TO A NEW LOCATION)
--- file opened
---write 64 bytes
---close file
--- -copy from ---- to ---- (A LOT OF OTHER PAGES ARE COPIED)
--- -copy from ---- to ----
--- -copy from ---- to ----
--- -copy from ---- to ---- (BOOT SECTOR is not copied here)
--- -copy from ---- to ----
--- -copy from ---- to ----
--- File closed
--- BOOT SECTOR is mapped to page 31498 (HERE BOOT SECTOR IS MAPPED TO A NEW LOCATION)
--- BOOT SECTOR is invalid

As you can see from the last log, the boot sector is at the new location but was not copied.
However, if I read the previous location (31484), the boot sector is there.
Why does Dhara map a page but not copy it?
Thanks,

Using dhara for nor and nand flash - some questions

Hi @dlbeer, thank you for providing dhara.

I am looking at using Dhara for both NOR and NAND flashes as a "virtual block manager" with wear levelling. Dhara already provides excellent support for NAND flashes, but for NOR flashes I am running into some issues:

  1. NOR flashes are a lot smaller than NAND flashes, so it is desirable to keep dhara_meta_size small. There are two possibilities for doing that: reduce the radix depth and reduce the meta entry.

I have been successful at reducing the radix depth (is it correct that this can be seen as a measure of the number of logical pages?).

Reducing the size of the meta entry (is it correct that this can be seen as a measure of the number of physical pages?) seems to be more difficult. What would be the best way to reduce this to e.g. a uint16_t?

  2. NOR flashes have the ability to continue writing to a page. Would it make sense to introduce a dhara_nand_rewrite() routine that tries to rewrite a page and, if it fails, falls back to the present behaviour? For NAND flashes this routine could just return an error.

Question on ECC depth/etc

Sorry - this is not really an issue; it is more of a documentation question.

How many correctable bits does the BCH implementation support?

i.e., typically some NAND solutions require 8 bits of correction.

The answer is not clear to me.

Thanks.

Add checksum to meta?

Hi,
Is it possible to write a checksum of the written/read page to the meta block? Some memory chips do not contain extra space (or a hardware option) for ECC. I see how to protect the journal with magic - just put e.g. a CRC16 into the first two bytes of page_buf - but I'm not sure whether it is possible to use the meta region for this purpose.

Trimming

Hello. I am currently using Dhara and I am happy with it, thank you very much.
I understand that trimming sectors is optional.
1 - Can you confirm that I can just ignore the trim function when I re-write a sector?
2 - Am I going to experience longer garbage collection phases if I don't trim sectors as they become unused?
3 - Generally speaking, what are the consequences?
I hope my question is clear.

dhara_journal_resume issue

I'm having an infrequent issue in dhara_journal_resume that occurs after power failure when epoch is 0xFF. During journal resume, find_root finds an empty page (all bytes are 0xFF so the epoch matches) which results in the journal being populated with invalid data.

It looks like find_root needs a check to make sure that it found a valid checkpoint to base the root on. I'm currently testing the following fix and it appears to have resolved the issue.

static int find_root(struct dhara_journal *j, dhara_page_t start,
		     dhara_error_t *err)
{
	const dhara_block_t blk = start >> j->nand->log2_ppb;
	int i = (start & ((1 << j->nand->log2_ppb) - 1)) >> j->log2_ppc;

	while (i >= 0) {
		const dhara_page_t p = (blk << j->nand->log2_ppb) +
			((i + 1) << j->log2_ppc) - 1;

		if (!dhara_nand_read(j->nand, p,
				     0, 1 << j->nand->log2_page_size,
						 j->page_buf, err) &&
+                   (hdr_has_magic(j->page_buf)) &&
		    (hdr_get_epoch(j->page_buf) == j->epoch)) {
			j->root = p - 1;
			return 0;
		}

		i--;
	}

	dhara_set_error(err, DHARA_E_TOO_BAD);
	return -1;
}

dhara_map_capacity returns inconsistent values

Hi there,

We are encountering a mysterious bug when writing a big file (~40 MB) to the NAND using Dhara in combination with the Reliance Edge API.

Reliance Edge works by writing the sector count to the first sector (which subsequently becomes read-only) when formatting. This value must stay consistent, as it is checked at runtime by calling dhara_map_capacity(). We notice that after we write a file bigger than ~40 MB, the value returned by dhara_map_capacity() changes. This prevents us from mounting the FS again, because the current sector count diverges from the one written in the first sector. It is interesting to point out that the value returned by Dhara is bigger than the previous one, and after we format the NAND again we no longer encounter the bug (the dhara_map_capacity() value remains consistent even after multiple writes).

We replicated the bug in the following way:

  1. Erase the NAND completely
  2. Format the NAND (Reliance Edge red_format)
  3. dhara_map_capacity returns 110313 (this value is then written to the first sector)
  4. Write a ~40 MB file (Reliance Edge red_write)
  5. dhara_map_capacity returns 112073
  6. red_mount then fails because 110313 != 112073
  7. Format the NAND again
  8. dhara_map_capacity returns 112073, and after that the value doesn't change anymore

It would be great if you have any ideas or suggestions on this! Thanks :) Apologies if it looks like a generic question, but due to the nature of our work I can't share too many details.

Expected overheads / available capacity

Hi,

Hopefully a quick question on how available capacity is calculated / may be maximised.

I've got Dhara working with a 2Gb Micron MT29F, and it appears to be working well. The APIs were straightforward to implement, and the documentation got me going quickly (great job). It stores and retrieves data; looking good.

Micron's flash has 64 x 2KB pages per block and 2048 blocks in total. Its ECC is such that each page can be written 4 times, so I've expressed this to Dhara as 512-byte pages with 256 pages per block.

The GC ratio is set to the example value from the readme (4).

After initialisation, dhara_map_capacity reports 307456 sectors, which at 512 bytes per sector is 150.125 MB.

Dhara is therefore using 41.3% of the raw 256 MB NAND capacity for its own internal purposes.

Nosing around in the code I saw that I could increase the GC ratio, which would result in a reduction of reserved capacity.

Presumably this would increase the number of garbage collection cycles, slowing writes as GC is performed more often?

Is there anything else I could do to increase usable capacity? I couldn't see many other levers to pull.

Thanks,

Phil

Metadata page refresh

After detecting a certain bit-flip threshold (e.g. 4 bit flips for an ECC scheme that tolerates 8) when reading data, it is a good idea to remap (refresh) a page. For sectors, this can be done by manually following a dhara_map_read() operation with a dhara_map_write() to the same sector. Is there a way to handle this condition when the page being read is a metadata page?
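
(For the sector case described above, the read-then-rewrite refresh
might look like this sketch; the buffer size is illustrative:)

    /* Refresh (remap) a logical sector after excessive corrected
     * bit-flips were observed while reading it. */
    static int refresh_sector(struct dhara_map *m, dhara_sector_t s)
    {
            uint8_t buf[2048];      /* page-sized, illustrative */
            dhara_error_t err = DHARA_E_NONE;

            if (dhara_map_read(m, s, buf, &err) < 0)
                    return -1;
            /* Writing it back forces a fresh physical location. */
            return dhara_map_write(m, s, buf, &err);
    }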

Does not work reliably with pages containing only FF

It is not clear to me how to implement dhara_nand_is_free() with an actual flash device. Checking whether the page contains all 0xFF seemed to be what was required, but that doesn't work reliably when the map is resumed.

In particular, there is no way to distinguish between an erased page and a programmed page containing all 0xFF.

If any pages were written with all-0xFF content, this can cause find_last_group() and find_head() to initialize the journal with a page that is already in use; the next write would then corrupt the data. Such pages may be at the end, or in the middle of something.

What would be the correct way to implement dhara_nand_is_free?

`dhara_journal_enqueue` can fail with `DHARA_E_NONE`

If dhara_journal_enqueue is called with a NULL data pointer, this logic can fall through to line 840 with my_err still set to DHARA_E_NONE:

		if (!(prepare_head(j, &my_err) ||
		      (data && dhara_nand_prog(j->nand, j->head, data,
					       &my_err))))
			return push_meta(j, meta, err);

		if (recover_from(j, my_err, err) < 0)
			return -1;

This causes it to pass DHARA_E_NONE into recover_from, which will treat it as a failure due to this test. The net effect is that dhara_journal_enqueue will return -1, but the *err value will be DHARA_E_NONE.

Will it be feasible to use little file system (LFS) along with the Dhara flash translation layer (FTL) on Micron NAND flash MT29F4G08AB?

So, we are trying to upgrade our existing Micron MT29 NAND flash from 2Gb to 4Gb on an Atmel controller. The existing file system is FAT32, but now we need to change the file system to littlefs (LFS). Would it be feasible to use this file system along with the flash translation layer from Dhara? I thank @hchaudhary1 and @geky for their insightful conversation.

Performance Tuning

Hello! Thank you for publishing Dhara.

I have been playing with it and wanted to share some performance numbers. I am using a NAND Flash chip on an embedded system. When I use my raw nand driver, I get the following numbers writing and reading full pages. (This output is from my UART shell:)

# nand_write_read

..............................
Write 3840.0 KB at 986.988 KB/s
..............................
Read 3840.0 KB at 1998.049 KB/s
Done. Marked 0 this run, 0 total
# 

However, when I run a Dhara write/read loop on a freshly erased NAND (with a brand new map/journal), I get these numbers:

# dhara_write_read

................................
Write 4000.0 KB at 620.230 KB/s
................................
Read 4000.0 KB at 271.348 KB/s
# 

I was surprised that reads are slower than writes. Dhara sector reads are 86% slower than raw NAND page reads. Dhara sector writes are 37% slower than raw NAND page writes.

Through logging, I can see that this is because of two reasons:

  • Dhara walks the Radix tree from the root for every sector lookup
  • Dhara does a partial page read each time it walks a node of the tree

Dhara walks the Radix tree from the root for every sector lookup

Here is the comment at the top of trace_path():

/* Trace the path from the root to the given sector, emitting
 * alt-pointers and alt-full bits in the given metadata buffer.

Since the vast majority of sector accesses are sequential, meaning the next target sector will be sector+1 (since file data is read or written in big streams), it seems like there could be an easy optimization to not search the entire tree every time. Perhaps it could remember the last sector found, and then start searching at the sibling node of the Radix tree. I looked at the loop but did not see an obvious way to implement that.

Dhara does a partial page read each time it walks a node of the tree

Here are some logs showing Dhara's partial page reads of metadata as it walks the tree. Notice that it does ~10 small metadata reads of len 132 bytes each, before it has the page address and can read the full page of len 2048 bytes (from offset 0) that the user is seeking.

E (11422) nand: nand_read_page: blk 18, pg 15, pg_addr 0x48F, pg_offset 548, p_data 0x20009354, len 132
E (11423) nand: nand_read_page: blk 9, pg 47, pg_addr 0x26F, pg_offset 284, p_data 0x20009354, len 132
E (11424) nand: nand_read_page: blk 5, pg 31, pg_addr 0x15F, pg_offset 152, p_data 0x20009354, len 132
E (11425) nand: nand_read_page: blk 3, pg 15, pg_addr 0xCF, pg_offset 1076, p_data 0x20009354, len 132
E (11426) nand: nand_read_page: blk 2, pg 15, pg_addr 0x8F, pg_offset 548, p_data 0x20009354, len 132
E (11427) nand: nand_read_page: blk 1, pg 47, pg_addr 0x6F, pg_offset 284, p_data 0x20009354, len 132
E (11428) nand: nand_read_page: blk 1, pg 31, pg_addr 0x5F, pg_offset 152, p_data 0x20009354, len 132
E (11429) nand: nand_read_page: blk 1, pg 15, pg_addr 0x4F, pg_offset 1076, p_data 0x20009354, len 132
E (11430) nand: nand_read_page: blk 1, pg 15, pg_addr 0x4F, pg_offset 548, p_data 0x20009354, len 132
E (11431) nand: nand_read_page: blk 1, pg 15, pg_addr 0x4F, pg_offset 284, p_data 0x20009354, len 132
E (11432) nand: nand_read_page: blk 1, pg 15, pg_addr 0x4F, pg_offset 152, p_data 0x20009354, len 132
E (11433) nand: nand_read_page: blk 1, pg 1, pg_addr 0x41, pg_offset 0, p_data 0x2000d1c0, len 2048
.E (11435) nand: nand_read_page: blk 18, pg 15, pg_addr 0x48F, pg_offset 548, p_data 0x20009354, len 132
E (11436) nand: nand_read_page: blk 9, pg 47, pg_addr 0x26F, pg_offset 284, p_data 0x20009354, len 132
E (11437) nand: nand_read_page: blk 5, pg 31, pg_addr 0x15F, pg_offset 152, p_data 0x20009354, len 132
E (11438) nand: nand_read_page: blk 3, pg 15, pg_addr 0xCF, pg_offset 1076, p_data 0x20009354, len 132
E (11439) nand: nand_read_page: blk 2, pg 15, pg_addr 0x8F, pg_offset 548, p_data 0x20009354, len 132
E (11440) nand: nand_read_page: blk 1, pg 47, pg_addr 0x6F, pg_offset 284, p_data 0x20009354, len 132
E (11441) nand: nand_read_page: blk 1, pg 31, pg_addr 0x5F, pg_offset 152, p_data 0x20009354, len 132
E (11442) nand: nand_read_page: blk 1, pg 15, pg_addr 0x4F, pg_offset 1076, p_data 0x20009354, len 132
E (11443) nand: nand_read_page: blk 1, pg 15, pg_addr 0x4F, pg_offset 548, p_data 0x20009354, len 132
E (11444) nand: nand_read_page: blk 1, pg 15, pg_addr 0x4F, pg_offset 284, p_data 0x20009354, len 132
E (11445) nand: nand_read_page: blk 1, pg 2, pg_addr 0x42, pg_offset 0, p_data 0x2000d1c0, len 2048
...etc...

The large number of semi-random accesses to different blocks and pages is the reason for Dhara's read slowness on my system. There is a considerable amount of overhead to kick off a transaction -- a 132-byte metadata read takes just about as long as a full 2048-byte page read, due to bus and DMA setup and kickoff.

It seems like there is an opportunity to cache and avoid all these NAND I/O calls. Either by caching ten metadata entries (since it seems to follow mostly the same tree path in sequential block reads, and they are only 132 bytes each), or by having one or two full-page caches at the driver level (so that, for example, all those reads to block 1, page 15 could be reduced to just one full-page read, and then the various offset reads for metadata could come from RAM).
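
(For what it's worth, the driver-level full-page cache suggested above
might be sketched as follows. raw_page_read is a hypothetical driver
call, and the use of DHARA_E_ECC as the failure code is an assumption:)

    #include <string.h>
    #include "dhara/nand.h"

    /* One-page read cache in the NAND layer: repeated partial metadata
     * reads of the same page are served from RAM. */
    static dhara_page_t cached_page = (dhara_page_t)-1;
    static uint8_t cache[2048];     /* page-sized, illustrative */

    int dhara_nand_read(const struct dhara_nand *n, dhara_page_t p,
                        size_t offset, size_t length,
                        uint8_t *data, dhara_error_t *err)
    {
            (void)n;
            if (p != cached_page) {
                    if (raw_page_read(p, cache, sizeof(cache)) < 0) {
                            dhara_set_error(err, DHARA_E_ECC);
                            return -1;
                    }
                    cached_page = p;
            }
            memcpy(data, cache + offset, length);
            return 0;
    }

    /* The cache must be invalidated whenever the same page is erased
     * or reprogrammed. */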

This is just for your information only. Dhara's performance already meets my system requirements, so I don't plan to implement any optimizations. But it seems like there is some low-hanging fruit that could result in significant performance improvements.

Thanks again!

prepare_write count check

Hi,

In the function prepare_write(), the check m->count >= dhara_map_capacity(m) comes before the call to trace_path(m, dst, NULL, meta, &my_err), and when the map count equals the map capacity it returns the error DHARA_E_MAP_FULL. But if the page already exists in the trace, the count should not be incremented, so the check is wrong. If I understand it correctly, the check should be made only before the m->count++ increment.

gc fails if checkpoint page is corrupted

Checkpoint page data might be corrupted (incomplete) due to power loss. If that happens, the related pages can no longer be collected as garbage, because the metadata read fails. Is this working as designed?

Can dhara be detected by linux as a device file?

Hello,

I am working with a Micron MT29FABABA flash and I would like to add files directly to it. To do this I wanted to mount the device somewhere and just copy the files to it (as with a USB key), but first it should be recognized as a device file (I am on Linux, so it should be visible in the /dev directory).

Is it possible to do this with dhara?

If not, is there any known way to make dhara visible to a file system and add files to the flash directly?

Can dhara be used for NOR flash?

This is a question, not an issue report (;

Can Dhara be used with no issues on a typical serial NOR flash? I have no experience with NAND and just basic knowledge of this technology, but I understand that the differences basically boil down to these:

  • NOR has no ECC, as it generally doesn't need it; it also has no OOB. If some cells become damaged, the chip is most likely dead anyway,
  • serial NOR is much smaller (usually a few MB),
  • the size of an erasable block is smaller (usually 4 kB - 64 kB),
  • the size of a programmable block is smaller (256/512 B).

I understand that the first point makes it impossible to check for read errors and to mark pages as "bad" - both of these are probably not needed, assuming that Dhara checks data consistency (for example, to recover after power failure) in some other way. For such a memory I would therefore implement dhara_nand_is_bad() to always return "block is perfectly OK" with no checks, and dhara_nand_mark_bad() to be a no-op (or even an assertion, as it should never be called anyway); a sketch follows.
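
(A minimal, untested sketch of those two callbacks for a NOR chip,
following the idea above:)

    #include "dhara/nand.h"

    /* NOR: no bad-block management needed. */
    int dhara_nand_is_bad(const struct dhara_nand *n, dhara_block_t b)
    {
            (void)n;
            (void)b;
            return 0;       /* every block reports as good */
    }

    void dhara_nand_mark_bad(const struct dhara_nand *n, dhara_block_t b)
    {
            (void)n;
            (void)b;        /* no-op: should never be called on NOR */
    }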

Is it a good or a bad idea to use Dhara on such a chip? Or does it just need some special handling somewhere?

Side question, assuming that I could use it: in most (all?) NOR flashes you can program individual bytes of a page in separate operations with no issues. I guess that setting log2_page_size to 1 is not such a great idea, but how low can this be set, and is there a "sweet spot"? Could I set it to the typical 256/512 bytes, or would a better option be to set it higher or lower?

Thanks in advance!

journal.c: cp_free does an illegal dhara_nand_is_free if on last group and none of the pages are free

} while (!cp_free(j, j->head));

We have seen a situation where we get a failed READPAGE command with IpCommandSequenceErrorFlagSet. This is the log:

[ 3.350000] w25n_transfer: Spi command failed with error: 7002
[ 3.350000] w25n_load_nand_buffer: Failed to load page at 0x10000804: 3
[ 3.360000] w25n_nand_read: Failed to load nand buffer with data from 0x10000804.
[ 3.360000] dhara_nand_is_free: Failed to check if page 65536 is free: 3

This basically means that we get an SPI error caused by a read outside the configured memory area. This is our dhara config:
/* W25N01GW (1Gb / 128 MB) memory capacity */

/* Log2 sizes defined for Dhara */
#define W25N01GW_PAGE_SIZE_LOG2       (11)
#define W25N01GW_PAGE_SIZE            (1 << W25N01GW_PAGE_SIZE_LOG2)
#define W25N01GW_BLOCK_COUNT          (1024)
#define W25N01GW_PAGES_PER_BLOCK_LOG2 (6)
#define W25N01GW_PAGES_PER_BLOCK      (1 << W25N01GW_PAGES_PER_BLOCK_LOG2)
#define W25N01GW_BLOCK_SIZE           (W25N01GW_PAGES_PER_BLOCK * W25N01GW_PAGE_SIZE)

To me it seems that if we for some reason end up in the last checkpoint group, and that group has no free pages, we will keep looking "over the edge", since we start by skipping ahead one page to the next user page before the call to cp_free.

Can you help me?

Issues with mapping on an empty flash chip

Hi,

I'm attempting to port this code into an embedded system with an initially clean flash chip (all sectors 0xFF).

Calling init followed by resume does not result in a map being written to the chip (verified by looking at my SPI trace to the chip).
The internal map structure is set to:
NandMap 0x2000db10 56 dhara_map
journal 0x2000db10 48 dhara_journal
nand 0x0003f1f8 0x2000db10 4 const dhara_nand*
page_buf 0x2000cb10 0x2000db14 4 uint8_t*
log2_ppc 0x04 0x2000db18 1 unsigned char
epoch 0x00 0x2000db19 1 unsigned char
flags 0x00 0x2000db1a 1 unsigned char
bb_current 0x00000000 0x2000db1c 4 unsigned int
bb_last 0x0000003f 0x2000db20 4 unsigned int
tail_sync 0x00000000 0x2000db24 4 unsigned int
tail 0x00000000 0x2000db28 4 unsigned int
head 0x00000000 0x2000db2c 4 unsigned int
root 0xffffffff 0x2000db30 4 unsigned int
recover_next 0xffffffff 0x2000db34 4 unsigned int
recover_root 0xffffffff 0x2000db38 4 unsigned int
recover_meta 0xffffffff 0x2000db3c 4 unsigned int
gc_ratio 0x01 0x2000db40 1 unsigned char
count 0x00000000 0x2000db44 4 unsigned int

The first time I write to the flash via dhara_map_write(...),
the internal map structure changes to:
NandMap 0x2000db10 56 dhara_map
journal 0x2000db10 48 dhara_journal
nand 0x0003f1f8 0x2000db10 4 const dhara_nand*
page_buf 0x2000cb10 0x2000db14 4 uint8_t*
log2_ppc 0x04 0x2000db18 1 unsigned char
epoch 0x00 0x2000db19 1 unsigned char
flags 0x01 0x2000db1a 1 unsigned char
bb_current 0x00000000 0x2000db1c 4 unsigned int
bb_last 0x0000003f 0x2000db20 4 unsigned int
tail_sync 0x00000000 0x2000db24 4 unsigned int
tail 0x00000000 0x2000db28 4 unsigned int
head 0x00000001 0x2000db2c 4 unsigned int
root 0x00000000 0x2000db30 4 unsigned int
recover_next 0xffffffff 0x2000db34 4 unsigned int
recover_root 0xffffffff 0x2000db38 4 unsigned int
recover_meta 0xffffffff 0x2000db3c 4 unsigned int
gc_ratio 0x01 0x2000db40 1 unsigned char
count 0x00000001 0x2000db44 4 unsigned int

At this point, if I do not call sync, no map is saved in flash and I cannot access the data across reboots (however, I can find and access the data before reboot).
If I call sync, the map changes to the following and is written to flash:

NandMap 0x2000db10 56 dhara_map
journal 0x2000db10 48 dhara_journal
nand 0x0003f1f8 0x2000db10 4 const dhara_nand*
page_buf 0x2000cb10 0x2000db14 4 uint8_t*
log2_ppc 0x04 0x2000db18 1 unsigned char
epoch 0x00 0x2000db19 1 unsigned char
flags 0x00 0x2000db1a 1 unsigned char
bb_current 0x00000000 0x2000db1c 4 unsigned int
bb_last 0x0000003f 0x2000db20 4 unsigned int
tail_sync 0x0000000e 0x2000db24 4 unsigned int
tail 0x0000000e 0x2000db28 4 unsigned int
head 0x00000010 0x2000db2c 4 unsigned int
root 0x0000000e 0x2000db30 4 unsigned int
recover_next 0xffffffff 0x2000db34 4 unsigned int
recover_root 0xffffffff 0x2000db38 4 unsigned int
recover_meta 0xffffffff 0x2000db3c 4 unsigned int
gc_ratio 0x01 0x2000db40 1 unsigned char
count 0x00000001 0x2000db44 4 unsigned int

At this point, if I call read on the same logical sector 0, dhara_map_find returns DHARA_E_NOT_FOUND. So the map appears to be losing the data immediately.

I feel like I may be missing an important step. Is there any documentation, or are there any examples or tutorials, for using Dhara? Any ideas as to what's going wrong? I've been stuck on this problem for a few days now.

Thanks so much for your help and the awesome FTL.
Nate

tests/jfill failure

Hello!
I added a call to srandom(time(NULL)) into sim_reset and ran "while tests/jfill.test; do :; done"
After a while:
jfill.test: tests/jtutil.c:61: jt_check: Assertion `root_offset < raw_size' failed.

At the time of the failure, root is before tail: root 862, tail 864, head 872.
It seems that the error recovery in journal_enqueue sometimes leaves a largish gap between root and head, so that journal_peek/dequeue get confused.

Example Programs for each feature

This is great code.
I don't see much documentation of the API. Is there example code for each feature, so that one can learn the API?

I do see the many test programs (perhaps these are the examples?), but I am not sure of the purpose of each of them, or where and when they apply.

Thanks

DHARA_E_MAP_FULL occurs

After using dhara_map_write() as a layer under FatFs, m->count becomes full and DHARA_E_MAP_FULL occurs. It can be decremented only with dhara_map_trim(). How should this be used?
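
(A hedged sketch of the intended pattern: the filesystem glue forwards
each freed sector to the map, which is what lets the count shrink. The
hook name is illustrative:)

    /* Called by the filesystem layer whenever a logical sector is freed. */
    void on_fs_sector_freed(struct dhara_map *m, dhara_sector_t s)
    {
            dhara_error_t err = DHARA_E_NONE;
            (void)dhara_map_trim(m, s, &err);   /* removes s from the map */
    }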

Resuming a cleared map

Resuming a cleared and synced map does not seem to produce the expected behavior.
I use the pattern (clear the map, then sync it) to perform a quick format of the flash device.
After playing with the simulator, it looks like there is no reset of the journal size cookie after a call to dhara_map_clear followed by dhara_map_sync. This implies that if there is a power failure after the call to dhara_map_sync, a dhara_map_resume will produce a map having the wrong number of allocated sectors. I found 3 solutions to this problem:

  1. Perform a full format of the device by erasing all blocks. This option is costly.
  2. Do a dummy write to any sector between map clear and sync. The resumed map will have 1 sector allocated.
  3. Set the journal size cookie when calling dhara_map_clear

The third option is my preferred solution:

 void dhara_map_clear(struct dhara_map *m)
 {
 	if (m->count) {
 		m->count = 0;
+		ck_set_count(dhara_journal_cookie(&m->journal), m->count);
 		dhara_journal_clear(&m->journal);
 	}
 }

Map Sync Time

We are implementing a time-sensitive logging application where we store periodic sensor data to multiple files. Our implementation connects ChaN's FatFs 0.14b with Dhara. Our flash is a 4Gb NAND with a 4k page size. Generally the system performs well and is robust. Every minute, we perform an f_sync to minimize the risk of data loss. We are storing data to at least 6 files, and the files stay open for the duration of the logging session. After a period of time (2 hours, for example), we would occasionally see an f_sync execution duration greater than 1 second. Our system does not have sufficient memory to buffer sensor data, resulting in loss of data from a sensor data buffer overflow.

Our analysis of the problem shows that garbage collection during dhara_map_sync, in the while (!dhara_journal_is_clean(&m->journal)) loop, is the primary cause of the system delay. Our mitigation is to exit the loop if the execution time exceeds our defined performance tolerance. We only enable the restriction on execution time while logging.

Is there unforeseen risk to the integrity of the journal with this approach?

//returns 0=success, -1=fail
int dhara_map_sync(struct dhara_map *m, dhara_error_t *err)
{
#if LIMITED_SYNC_EXECUTION_TIME
	TickType_t start_time = xTaskGetTickCount();
#endif
	while (!dhara_journal_is_clean(&m->journal)) {
#if LIMITED_SYNC_EXECUTION_TIME
		/* Exit early if we are spending too much time here. */
		if (system_is_logging()) {
			if ((xTaskGetTickCount() - start_time) > MAX_MAP_SYNC_MSEC) {
				*err = MAP_SYNC_TIMEOUT;
				return -1;
			}
		}
#endif
		dhara_page_t p = dhara_journal_peek(&m->journal);
		dhara_error_t my_err;
		int ret;

		if (p == DHARA_PAGE_NONE) {
			ret = pad_queue(m, &my_err);
		} else {
			ret = raw_gc(m, p, &my_err);
			if (!ret)
				dhara_journal_dequeue(&m->journal);
		}

		if ((ret < 0) && (try_recover(m, my_err, err) < 0))
			return -1;
	}

	return 0;
}

Map Sector Confusion

I feel I'm perhaps misunderstanding something with regard to how to implement the required functions correctly.

I have a simple test that attempts to write the first 4 sectors and then checks to see what they are mapped to (sector/page wise in FLASH).

The results seem incorrect. (I'm sure the problem lies with my nand.c functions or with my FLASH driver.)

I currently have the FLASH being erased on each boot just to ensure a clean slate.

The FLASH NAND chip that is being used is the W25N01GV by Winbond.
The CPU is an STM32F437VG running FreeRTOS.
All other tasks were disabled for this testing.

My log2_page_size is set to 9 (512-byte sectors), with log2_ppb set to 8, i.e. 256 x 512-byte sectors per block (64 2KB pages per block).

My original implementation had the page size set to the full 2KB page, which gave 15 user pages and then 1 checkpoint page. I thought that was perhaps too large, so I moved to using the 512-byte (sub-page) sectors. Exactly the same thing occurs in both cases, although the smaller log2_page_size appears easier to troubleshoot.

The results (output) that I'm seeing are as follows:

================ Block Device Init (Dhara FTL) ================
Dhara FTL - Map Init
FTL - Map Init Complete. Resuming...
NAND Flash Read - Sector: 3 - Page/SubSector: 0/3 - Offset: 0 - Length: 512
NAND Flash Read - Sector: 259 - Page/SubSector: 64/3 - Offset: 0 - Length: 512
NAND Flash Read - Sector: 515 - Page/SubSector: 128/3 - Offset: 0 - Length: 512
NAND Flash Read - Sector: 771 - Page/SubSector: 192/3 - Offset: 0 - Length: 512
NAND Flash Read - Sector: 1027 - Page/SubSector: 256/3 - Offset: 0 - Length: 512
NAND Flash Read - Sector: 1283 - Page/SubSector: 320/3 - Offset: 0 - Length: 512
NAND Flash Read - Sector: 1539 - Page/SubSector: 384/3 - Offset: 0 - Length: 512
NAND Flash Read - Sector: 1795 - Page/SubSector: 448/3 - Offset: 0 - Length: 512
Map Initialized. Could not find existing journal. head = 0
Error initializing journal: Too many bad blocks
Map Capacity: 152628
Initial Sync...
Map Count: 0
================ Write Operation ================
Attempting map write to sector 0...
NAND Flash Program - Sector: 0 - Page/SubSector: 0/0)
Successful map write to sector 0.
Attempting map write to sector 1...
NAND Flash Program - Sector: 1 - Page/SubSector: 0/1)
Successful map write to sector 1.
Attempting map write to sector 2...
NAND Flash Program - Sector: 2 - Page/SubSector: 0/2)
NAND Flash Program - Sector: 3 - Page/SubSector: 0/3)
Successful map write to sector 2.
Attempting map write to sector 3...
NAND Flash Read - Sector: 3 - Page/SubSector: 0/3 - Offset: 284 - Length: 132
NAND Flash Read - Sector: 3 - Page/SubSector: 0/3 - Offset: 20 - Length: 132
NAND Flash Read - Sector: 3 - Page/SubSector: 0/3 - Offset: 20 - Length: 132
NAND Flash Program - Sector: 4 - Page/SubSector: 1/0)
Successful map write to sector 3.
================ Sector Scan ================
NAND Flash Read - Sector: 3 - Page/SubSector: 0/3 - Offset: 284 - Length: 132
Logical sector 0 mapped to physical sector 2
NAND Flash Read - Sector: 3 - Page/SubSector: 0/3 - Offset: 284 - Length: 132
NAND Flash Read - Sector: 3 - Page/SubSector: 0/3 - Offset: 20 - Length: 132
Logical sector 1 mapped to physical sector 0
NAND Flash Read - Sector: 3 - Page/SubSector: 0/3 - Offset: 20 - Length: 132
Logical sector 2 mapped to physical sector 0
Logical sector 3 mapped to physical sector 4

As you can see the mapping is
0 -> 2
1 -> 0
2 -> 0
3 -> 4

when it should be
0 -> 0
1 -> 1
2 -> 2
3 -> 4

(with physical sector 3 being the checkpoint block for the first 3 sectors)

When I test my underlying NAND flash driver, I can write to specific pages and recall all the data, so at least my reading and writing are correct.

I had an early implementation that used some OOB data to indicate whether the sector (a 512-byte subset of a 2KB page) was free, but I disabled this functionality in favor of an is_free based on a non-0xFF check, to eliminate any possible issues with the custom OOB sector marking.

The relevant code for testing this in my task is as follows:

GC_RATIO = 4

debug_print("================ Block Device Init (Dhara FTL) ================\n");
debug_print("Dhara FTL - Map Init\n");
dhara_map_init(&BlockDevice.map, &BlockDevice.nand, (uint8_t*)&pageBuffer, GC_RATIO);
debug_print("FTL - Map Init Complete. Resuming...\n");
dhara_error_t resume_err = DHARA_E_NONE;
uint8_t map_initialized = dhara_map_resume(&BlockDevice.map, &resume_err);
if (map_initialized)
{
	debug_print("Map Initialized. Could not find existing journal. head = %lu\n", BlockDevice.map.journal.head);
	if (resume_err != DHARA_E_NONE)
		debug_print("\tError initializing journal: %s\n", dhara_strerror(resume_err));
}
else
{
	debug_print("Journal Resumed. head = %lu\n", BlockDevice.map.journal.head);
}
debug_print("Map Capacity: %lu\n", dhara_map_capacity(&BlockDevice.map));	
	
debug_print("Initial Sync...\n");
dhara_error_t sync_err = DHARA_E_NONE;	
dhara_map_sync(&BlockDevice.map, &sync_err);
if (sync_err != DHARA_E_NONE)
	debug_print("Error syncing map: %s\n", dhara_strerror(resume_err));
debug_print("Map Count: %lu\n", BlockDevice.map.count);
debug_print("================ Write Operation ================\n");
for (uint32_t curSector = START_SECTOR; curSector < START_SECTOR + SECTORS_TO_WRITE; curSector++)
{
	debug_print("Attempting map write to sector %lu...\n", curSector);
	pageToSave.Sectors[0][4] = curSector;	
	if (!dhara_map_write(&BlockDevice.map, curSector, (uint8_t*)&pageToSave, &writeError))
	{
		if (writeError != DHARA_E_NONE)
			debug_print("Error writing map sector %lu: %s\n", curSector, dhara_strerror(writeError));
		else
			debug_print("Successful map write to sector %lu.\n", curSector);
	}
	else
		debug_print("Failed to write map: %s\n", dhara_strerror(writeError));
}

dhara_page_t flashPage = 0;
dhara_error_t findError = DHARA_E_NONE;
debug_print("================ Sector Scan ================\n");
for (uint32_t curSector = START_SECTOR; curSector < START_SECTOR + SECTORS_TO_WRITE; curSector++)
{
	if (!dhara_map_find(&BlockDevice.map, curSector, &flashPage, &findError))
		debug_print("Logical sector %lu mapped to physical sector %lu\n", curSector, (uint32_t)flashPage);		
	else
		debug_print("map find failed to find logical sector %lu - Error: %s\n", curSector, dhara_strerror(findError));
}

If there is anything else I need to provide, please let me know.

Thank you so much for your work on this project and any help you can provide to improve my understanding of what I might be doing wrong!

prepare_write behavior

Hi,

Why does the auto_gc() check inside the prepare_write() function return 0 in case of error? Look at this code section:

static int prepare_write(struct dhara_map *m, dhara_sector_t dst, uint8_t *meta, dhara_error_t *err)
{
    dhara_error_t my_err;

    if (auto_gc(m, err) < 0)
        return 0; <<< Why is this zero and not -1?

    if (m->count >= dhara_map_capacity(m))
    {
        dhara_set_error(err, DHARA_E_MAP_FULL);
        return -1;
    }

    if (trace_path(m, dst, NULL, meta, &my_err) < 0) 
    {
        if (my_err != DHARA_E_NOT_FOUND) 
        {
            dhara_set_error(err, my_err);
            return -1;
        }        

        m->count++;
    }

    ck_set_count(dhara_journal_cookie(&m->journal), m->count);
    return 0;
}

example

Hello, I am a beginner. Is there a basic usage example? In the tests folder, I can't tell which one is the basic usage example.

Garbage Collection Instrumentation

When using manual GC (using a GC ratio of 0 and calling dhara_map_gc()), is there any way to determine whether (and how much) garbage exists? If not, is there a relatively simple way to add that functionality?

related to #27

Very aggressive garbage collection --- a configuration problem?

I'm using Dhara mostly successfully as the FTL layer on top of a Raspberry Pi Pico (see http://cowlark.com/2021-02-16-fuzix-pi-pico). The front end is using a traditional Unix filesystem; the filesystem block size is the same as the Dhara page size of 512 bytes, and the erase size is 4096 bytes. The actual implementation is at https://github.com/davidgiven/FUZIX/blob/rpipico/Kernel/platform-rpipico/devflash.c.

I've added trim support to the filesystem, so that it notifies Dhara when blocks are no longer in use by the filesystem. My understanding is that this helps Dhara do a better job of garbage collection, and that it gets very unhappy if it doesn't have a pool of unused blocks which it can write to. However, no matter what I do, every operation seems to result in substantial numbers of copies: for example, after deleting a 31kB file I can see the filesystem trim the 62 blocks which contained the file data --- but in the process Dhara has used the copy callback 1500 times! That's over half the data in the filesystem, and of course in order to copy those blocks it's had to erase them first...

Do you have any idea what might be happening here, and how I can stop it?

Using Dhara in Zephyr

Hello,

I am thinking of using Dhara in the Zephyr RTOS for an external NAND flash (Winbond W25M02GW). Are you aware of anyone having already done this? In case I have to do it myself, I presume the best way would be to use Dhara as the translation layer under the FAT_fs file system which is already part of Zephyr. Is this correct? Or would it be better to write a new file system wrapper that directly invokes Dhara? By the way, I have tried Little_FS, which also supports NAND and is already part of Zephyr, but I find it too slow for my application, so I am looking at alternatives; that is why I am looking at Dhara. I have also created a Zephyr file system wrapper for FLOGFS, another NAND file system, and that is fast enough for my needs, but it has another set of problems (weak handling of extreme cases).
thanks
regards
thanks
regards

binary license attribution?

Hello,

Very cool project. If I were to use this in a microcontroller without a user interface, how should I attribute the project? Do you require attribution for binary releases?

Thanks again!
