Giter VIP home page Giter VIP logo

Comments (10)

utzig avatar utzig commented on July 22, 2024 1

I was able with minor changes to perform the decryption when copying image data from the scratch area to the primary slot instead, and it seems to work fine. I haven't performed extensive testing though, so I might have missed something. Is there any good reason explaining why MCUboot is decrypting when writing to the scratch area?

I don't remember the exact reason, but have you tested your implementation on the simulator? What I have in mind is that there was one specific step, either the last or first sector swap that contained information that needs to be known in plain text to resume an interrupted swap, and one of those sectors might be in the scratch area when an interrupt happens. It could also be the whatever the reason was, it is not an issue anymore.

This would be especially valuable for MCUs having large internal flash memory sectors, like e.g. the STM32F413, which has 128 KiB flash sectors.

It would not be hard to update the swap with move algorithm so that it performs the moves and swaps using 128KB, which would also work for the external flash. The main issue would be that you need the least 256KB free, one for the status, and another for the spare sector where the first move is done. Using swap with scratch even the last sector can be used as long as there is enough space for status.

No-one actually knows, the issue has just occurred randomly on devices, only been seen on zephyr as far as I'm aware.

JUUL vaping devices have been running swap with scratch, although they use Mynewt. I have no idea how many devices they have on the field, but there's no lack of people using vaping devices! If there was a bug, I find it very hard to believe even statistically that they would not have found it. Anyone upgrading millions of devices should bump into every last bug there is. That said, there might be something Zephyr specific. The swap algorithms themselves are very deterministic for a given slot/scratch size and image size combination, so even someone with rudimentary testing skills should be able to find a way to consistently reproduce this bug, once found! Since that never happened, things which might have some impact, are flash driver specifics, like writes that return success when the write failed, probably more so for external (Q)SPI flashes, or something like multitasking being enabled and some other task/thread interfering with the swap. MCUboot was designed to run as a sole application without any other extraneous stuff happening.

All that said, swap with move is a much better algorithm if the limitations are removed. Btw, when I fist added it with PR #589 it was broken for more than 3 years until the issue was found and I fixed it with #1597. This was not a bug that "could" happen, but an issue that always happened as long as a reset happened at the right time. That it was broken for 3 years all the while people kept praising it, still makes me chuckle. And also a testament for my lack of testing this algorithm at the time, but at least I did a lot more testing after #1597 and it looked solid afterwards! ;-)

from mcuboot.

taltenbach avatar taltenbach commented on July 22, 2024 1

I don't remember the exact reason, but have you tested your implementation on the simulator? What I have in mind is that there was one specific step, either the last or first sector swap that contained information that needs to be known in plain text to resume an interrupted swap, and one of those sectors might be in the scratch area when an interrupt happens. It could also be the whatever the reason was, it is not an issue anymore.

My point is that only the payload of MCUboot images is encrypted, i.e. all metadata added for MCUboot (header, TLVs and trailer) are non-encrypted. That means unless MCUboot depends on data within the raw firmware image during the swap (which is not the case unless I'm mistaken), having the image in plain text in the scratch area shouldn't be required. However, I didn't tested on the simulator, I will definitely give it a try.

It would not be hard to update the swap with move algorithm so that it performs the moves and swaps using 128KB, which would also work for the external flash. The main issue would be that you need the least 256KB free, one for the status, and another for the spare sector where the first move is done. Using swap with scratch even the last sector can be used as long as there is enough space for status.

Yes, in fact I initially configured MCUboot with swap-move and it was working like a charm with 128 KiB sectors. As you mentioned, the issue here is the 256 KiB overhead: on the project I'm working on, the application is quite huge and losing 256 KiB (or even 128 KiB) of internal flash memory isn't an option. However, there is a lot of spare memory in the external flash, that's why I'm considering using swap-scratch and placing a large scratch area in external flash. Since the scratch area is large flash wear isn't an issue. But this solution is only possible assuming the firmware image is encrypted in the scratch area.

JUUL vaping devices have been running swap with scratch, although they use Mynewt. I have no idea how many devices they have on the field, but there's no lack of people using vaping devices! If there was a bug, I find it very hard to believe even statistically that they would not have found it. Anyone upgrading millions of devices should bump into every last bug there is. That said, there might be something Zephyr specific. The swap algorithms themselves are very deterministic for a given slot/scratch size and image size combination, so even someone with rudimentary testing skills should be able to find a way to consistently reproduce this bug, once found! Since that never happened, things which might have some impact, are flash driver specifics, like writes that return success when the write failed, probably more so for external (Q)SPI flashes, or something like multitasking being enabled and some other task/thread interfering with the swap. MCUboot was designed to run as a sole application without any other extraneous stuff happening.

This definitely makes sense to me and in hindsight I think using swap-scratch in my case is a way better option than implementing my own custom upgrade strategy. Perhaps there is actually an issue with the swap-scratch implementation, that occurs in very specific cases, but it sounds far more likely that the issue is due to a non-MCUboot component. Also, being used and tested for years I expect the swap-scratch implementation to be much more robust than any custom strategy I might design.

On the other hand, is the issue @nordicjm mentioned occurring when resetting the device during an upgrade process? If it does, I think there might be some nondeterministic/hardware-specific behavior here if the reset happens precisely while a sector containing an MCUboot trailer is being erased. My knowledge in hardware is limited so perhaps this doesn't make sense at all, but is there something guaranteeing that during an erase all bits flips exactly at the same time? My understanding would be that if a reset occurs during an erase, the sector that was being erased isn't in a defined state: some bits might have flipped but some other not. If that assumption is true, then if a reset occurs while a MCUboot trailer is being erased, there is a chance that the trailer's magic is good but the swap status totally corrupts, which might lead to MCUboot doing crazy things at next boot. Is MCUboot assuming that with a 16-byte long magic number, the likelihood of such an issue occurring is close to zero?

from mcuboot.

utzig avatar utzig commented on July 22, 2024 1

On the other hand, is the issue @nordicjm mentioned occurring when resetting the device during an upgrade process? If it does, I think there might be some nondeterministic/hardware-specific behavior here if the reset happens precisely while a sector containing an MCUboot trailer is being erased. My knowledge in hardware is limited so perhaps this doesn't make sense at all, but is there something guaranteeing that during an erase all bits flips exactly at the same time? My understanding would be that if a reset occurs during an erase, the sector that was being erased isn't in a defined state: some bits might have flipped but some other not. If that assumption is true, then if a reset occurs while a MCUboot trailer is being erased, there is a chance that the trailer's magic is good but the swap status totally corrupts, which might lead to MCUboot doing crazy things at next boot. Is MCUboot assuming that with a 16-byte long magic number, the likelihood of such an issue occurring is close to zero?

The whole reset recovery resilience is builtin; without having it calling this a secure bootloader would be quite a stretch! That's also why the simulator is used, when it runs an upgrade/revert, it resets at every flash write so we can be sure that every step is tested for resilience.

If you find any brick issues, please dump that last sector of the first slot, so we can find out at what step in the upgrade process it was running. I am afraid it won't be enough information to fix the issue, and extra debug traces would be required, but it's some more information than we have.

from mcuboot.

nordicjm avatar nordicjm commented on July 22, 2024

Swap using scratch shouldn't be used, swap using move should be used instead @d3zd3z

from mcuboot.

taltenbach avatar taltenbach commented on July 22, 2024

Swap using scratch shouldn't be used, swap using move should be used instead @d3zd3z

@nordicjm I'm working on a project with an STM32F413, which has 1.5 MiB of internal flash memory divided in 128 KiB sectors. Using swap-move would mean that two sectors of the internal flash memory could not be used for storing the app (one must be reserved for the swap-move algorithm and another for the MCUboot trailer). Wasting 256 KiB (~17 %) is not acceptable in my case, that's why I'm considering swap-scratch

from mcuboot.

nordicjm avatar nordicjm commented on July 22, 2024

Swap using scratch shouldn't be used, swap using move should be used instead @d3zd3z

@nordicjm I'm working on a project with an STM32F413, which has 1.5 MiB of internal flash memory divided in 128 KiB sectors. Using swap-move would mean that two sectors of the internal flash memory could not be used for storing the app (one must be reserved for the swap-move algorithm and another for the MCUboot trailer). Wasting 256 KiB (~17 %) is not acceptable in my case, that's why I'm considering swap-scratch

I would suggest using direct XIP or overwrite only in that case then. Swap using scratch is known to brick devices for reasons that are not known, it's use is not recommended and @d3zd3z has considered removing support from it from MCUboot entirely

from mcuboot.

taltenbach avatar taltenbach commented on July 22, 2024

Swap using scratch shouldn't be used, swap using move should be used instead @d3zd3z

@nordicjm I'm working on a project with an STM32F413, which has 1.5 MiB of internal flash memory divided in 128 KiB sectors. Using swap-move would mean that two sectors of the internal flash memory could not be used for storing the app (one must be reserved for the swap-move algorithm and another for the MCUboot trailer). Wasting 256 KiB (~17 %) is not acceptable in my case, that's why I'm considering swap-scratch

I would suggest using direct XIP or overwrite only in that case then. Swap using scratch is known to brick devices for reasons that are not known, it's use is not recommended and @d3zd3z has considered removing support from it from MCUboot entirely

Well, I wasn't aware of such a critical issue with swap-scratch, this is very valuable information, thank you. I saw you add a note about that in the Zephyr-specific documentation, perhaps it would be helpful to also add a (big) warning in design.md. Unless that issue is specific to Zephyr of course, but from what you say I understand the algorithm itself is suspected to be broken?

Unfortunately, in my case direct XIP is not possible and overwrite only not an option because the revert mechanism is desired. I guess I will have to develop my own strategy, perhaps something similar to #1902.

I saw several comments from @d3zd3z suggesting swap-scratch has no added value over swap-move and should not be used. I am missing something here? Because unless I'm mistaken, the swap-move requires one additional flash sector compared to swap-scratch (for storing the MCUboot trailer), which seems totally fine in case sector size is small, but can represent a significant waste of memory for MCUs having large flash sectors like numerous STM32 devices. IMO dropping support for swap-scratch without providing a revert-able alternative other than swap-move is likely to be an issue for a lot of projects

from mcuboot.

nordicjm avatar nordicjm commented on July 22, 2024

Well, I wasn't aware of such a critical issue with swap-scratch, this is very valuable information, thank you. I saw you add a note about that in the Zephyr-specific documentation, perhaps it would be helpful to also add a (big) warning in design.md. Unless that issue is specific to Zephyr of course, but from what you say I understand the algorithm itself is suspected to be broken?

No-one actually knows, the issue has just occurred randomly on devices, only been seen on zephyr as far as I'm aware.

Unfortunately, in my case direct XIP is not possible and overwrite only not an option because the revert mechanism is desired. I guess I will have to develop my own strategy, perhaps something similar to #1902.

Direct XIP does support revert which @de-nordic added, you have to set it up for revert and then it functions similarly to swap using move, you load an update and I believe mark it for test and reboot, it gets booted and if you reboot without confirming it that image slot gets erased and the previous image gets booted, otherwise if it is confirmed than the alternative slot gets erased and you use the newer image.

from mcuboot.

taltenbach avatar taltenbach commented on July 22, 2024

No-one actually knows, the issue has just occurred randomly on devices, only been seen on zephyr as far as I'm aware.

Noted, thanks for those details.

Direct XIP does support revert which @de-nordic added, you have to set it up for revert and then it functions similarly to swap using move, you load an update and I believe mark it for test and reboot, it gets booted and if you reboot without confirming it that image slot gets erased and the previous image gets booted, otherwise if it is confirmed than the alternative slot gets erased and you use the newer image.

Yes, sorry, my message wasn't clear enough, I meant:

  • Direct XIP is not possible in my case because the secondary slot is in external flash memory, which doesn't support XIP.
  • Overwrite only isn't an option either because the revert mechanism is desired.

from mcuboot.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.