Giter VIP home page Giter VIP logo

Comments (14)

ikifar2012 avatar ikifar2012 commented on August 27, 2024

The main issue this is being tracked under is home-assistant/supervisor#4381 it appears to happen only under bookworm and no root cause has been identified yet, the only thing I can say for certain is it seems to happen whenever the container is recreated and resolves itself with a restart, until the next supervisor update when the container is recreated yet again.

from supervised-installer.

Lockszmith-GH avatar Lockszmith-GH commented on August 27, 2024

thanks for confirming @ikifar2012.

So it seems, at least at this point, I would ask the maintainers for a system state that will report this unhealthy state - something that I can send an alert on, and track.

from supervised-installer.

donaldguy avatar donaldguy commented on August 27, 2024

See home-assistant/supervisor#4839 for my investigation of what concerning-things seems to be happening here

from supervised-installer.

donaldguy avatar donaldguy commented on August 27, 2024

The "short" version of that is

after a reboot with an existing hassio_supervisor container:

$ uptime
 18:44:13 up 1 min,  1 user,  load average: 1.65, 0.75, 0.28
$ docker logs hassio_supervisor 2>&1 | tail -2
24-01-25 18:44:04 INFO (MainThread) [supervisor.resolution.fixup] System autofix complete
24-01-25 18:44:04 INFO (MainThread) [supervisor.host.manager] Host information reload completed
$ ha supervisor info 2>&1 | grep 'healthy\|supported\|^version'
healthy: true
supported: true
version: 2023.12.1
version_latest: 2023.12.1
$ docker inspect hassio_supervisor -f '{{.HostConfig.SecurityOpt}}'
[apparmor=hassio-supervisor label=disable]

big "but"

docker inspect hassio_supervisor --format='{{.State.Pid}}' | \
    xargs pstree -p |  grep -o '([0-9]\+)' | sed -e 's/(/-p /' -e 's/)//' | paste -sd ' ' | \
    xargs sudo ps Z

(or if you prefer / don't want to install pstree, the single-depth: docker inspect hassio_supervisor --format='{{.State.Pid}}' | xargs ps Z --ppid

gives

LABEL                               PID TTY      STAT   TIME COMMAND
unconfined                         2503 ?        Ss     0:00 /package/admin/s6/command/s6-svscan -d4 -- /run/service
unconfined                         2638 ?        S      0:00 s6-supervise s6-linux-init-shutdownd
unconfined                         2641 ?        Ss     0:00 /package/admin/s6-linux-init/command/s6-linux-init-shutdownd -d3 -c /run/s6/basedir -g 300
unconfined                         2666 ?        S      0:00 s6-supervise s6rc-fdholder
unconfined                         2667 ?        S      0:00 s6-supervise s6rc-oneshot-runner
unconfined                         2680 ?        Ss     0:00 /package/admin/s6/command/s6-ipcserverd -1 -- /package/admin/s6/command/s6-ipcserver-acces
unconfined                         2722 ?        S      0:00 s6-supervise supervisor
unconfined                         2724 ?        S      0:00 s6-supervise watchdog
unconfined                         2726 ?        Ss     0:00 bash /usr/bin/bashio ./run watchdog
unconfined                         2727 ?        Ssl    0:06 python3 -m supervisor
unconfined                         7424 ?        S      0:00 sleep 300

that is the apparmor profile is loaded BUT NOT APPLIED to any supervisor process

but after supervisor upgrade

(or any other event which causes recreation of hassio_supervisor container on a running system)

odroidn2:~:# docker inspect hassio_supervisor --format='{{.State.Pid}}' | \
  xargs pstree -p |  grep -o '([0-9]\+)' | sed -e 's/(/-p /' -e 's/)//' | paste -sd ' ' | \
  xargs ps Z
LABEL                               PID TTY      STAT   TIME COMMAND
hassio-supervisor (enforce)        9035 ?        Ss     0:00 /package/admin/s6/command/s6-svscan -d4 -- /run/service
hassio-supervisor (enforce)        9077 ?        S      0:00 s6-supervise s6-linux-init-shutdownd
hassio-supervisor (enforce)        9079 ?        Ss     0:00 /package/admin/s6-linux-init/command/s6-linux-init-shutdownd -d3 -c /run/s6/basedir -g 300
hassio-supervisor (enforce)        9086 ?        S      0:00 s6-supervise s6rc-fdholder
hassio-supervisor (enforce)        9087 ?        S      0:00 s6-supervise s6rc-oneshot-runner
hassio-supervisor (enforce)        9095 ?        Ss     0:00 /package/admin/s6/command/s6-ipcserverd -1 -- /package/admin/s6/command/s6-ipcserver-acces
hassio-supervisor (enforce)        9135 ?        S      0:00 s6-supervise supervisor
hassio-supervisor (enforce)        9137 ?        S      0:00 s6-supervise watchdog
hassio-supervisor (enforce)        9138 ?        Ssl    0:03 python3 -m supervisor
hassio-supervisor (enforce)        9142 ?        Ss     0:00 bash /usr/bin/bashio ./run watchdog
hassio-supervisor (enforce)        9773 ?        S      0:00 sleep 30

odroidn2:~:# ha supervisor info 2>&1 | grep 'healthy\|supported\|^version'
healthy: false
supported: true
version: 2023.12.1
version_latest: 2023.12.1

odroidn2:~:# docker logs hassio_supervisor 2>&1 | tail -3
24-01-25 20:07:52 INFO (MainThread) [supervisor.host.manager] Host information reload completed
24-01-25 20:07:52 INFO (MainThread) [supervisor.resolution.evaluate] System evaluation complete
24-01-25 20:07:52 INFO (MainThread) [supervisor.jobs] 'ResolutionFixup.run_autofix' blocked from execution, system is not healthy - privileged

odroidn2:~:# docker logs hassio_supervisor 2>&1 | grep 'CRITICAL\|ERROR\|WARNING'
24-01-25 20:07:42 CRITICAL (MainThread) [supervisor.hardware.monitor] Not privileged to run udev monitor!
24-01-25 20:07:42 WARNING (MainThread) [supervisor.dbus.rauc] Host has no rauc support. OTA updates have been disabled.
24-01-25 20:07:42 WARNING (MainThread) [supervisor.dbus.agent] No OS-Agent support on the host. Some Host functions have been disabled.
24-01-25 20:07:52 CRITICAL (MainThread) [supervisor.core] System is running in an unhealthy state and needs manual intervention!

AppArmor profile now is in effect and not sufficient for (socket (2) as socket(PF_NETLINK, SOCK_RAW|SOCK_CLOEXEC|SOCK_NONBLOCK, NETLINK_KOBJECT_UEVENT) inside of) udev_monitor_new_from_netlink (3)

from supervised-installer.

donaldguy avatar donaldguy commented on August 27, 2024

Thus there appear to be two important issues:

  1. why is the apparmor profile as written not sufficient for this udev socket open on Debian 12

    • (while remaining sufficient on HAOS ... one assumes/hopes).
    • My best guess (which I should test, but haven't yet, is that its that deny network raw o nline 8? ; that or the policy needs to be being applied to the actual pyc in pyudev and/or libudev.so not just the python interpretor? or there is a new capability needed?)
  2. why is the apparmor profile not being applied to the container's proccesses on first-start after boot? (also no longer, for me, on siblings of ns pid1 spawned by docker exec)

    • (is this a docker/containerd.bug? a kernel bug? a systemd bug?

    • (or is it the initialization chain of hassio-apparmor being race-y? As I demonstrated in this comment it does at least seem like containerd will not start the container when the profile is not loaded - but I'm thinking maybe there is a race between allocation of the name and restore of its contents? [and that apparmor modules short circuit application of a profile if its found empty at the time?])

from supervised-installer.

donaldguy avatar donaldguy commented on August 27, 2024

I think I ended up leaving it out of the version posted in there, but its perhaps worth noting that with the profile applied, pyudev.Context().list_devices() still works fine [that is, I believe: udenv_enumerate_new (3) and udev_list_entry_get_next(3)) in the supervisor process itself (with debug/debug_wait]

so the problem seems to be potentially less the udev part writ large then the (monitor_new_from_)netlink part in particular

(udevadm monitor in the supervisor container (as well as in the hassio_audio container under docker-default apparmor) still works - but that appears to be by virtue of the Ux v ix the policy designates it with)

from supervised-installer.

donaldguy avatar donaldguy commented on August 27, 2024

speaking to OP's workaround:

For my part systemctl restart hassio-supervisor et al are not sufficient (at least not consistently1) to get from "bad" to "good", only a full system reboot - but this may have changed at some point since I started encountering the issue

Conversely, while I was writing home-assistant/supervisor#4839 I could have sworn at some point a "soft" restart was initally but not repeatably sufficient to get from "good" to "bad" (but that could, for example, be something like /run/supervisor/startup-marker causing a recreate instead of a docker stop / docker start)

Footnotes

  1. though one early time I was looking into this, the act of editing config.json or using ha supervisor options to set debug and debug_block and then bootstrapping my first supervisor vscode environment did appear to immediately sidestep the failure--launching it to full hesienbug status. I do not know if that suggests potentially an even more subtle timing issue (such as last supervisor TCP_WAIT-ish socket cleanup?) or if I just somewhere that time I e.g. rebooted the machine without really noting it as part of what I did.

from supervised-installer.

donaldguy avatar donaldguy commented on August 27, 2024

but stopping my thread-spamming, I would be very curious whether other people experiencing this sort of issue can replicate (or countermand) my catching the processes unconstrained

(using docker inspect hassio_supervisor --format='{{.State.Pid}}' | xargs ps Z --ppid, the deeper pstree|sed version, or just aa-status/apparmor_status though there you are looking for presence/absence of the hassio-supervisor profile in the list of processes)

from supervised-installer.

rlue avatar rlue commented on August 27, 2024

I just installed hassio supervised on a fresh debian (bookworm) system and happy to help debug, but much of this thread is over my head. What I can say with certainty is:

For my part systemctl restart hassio-supervisor et al are not sufficient (at least not consistently) to get from "bad" to "good", only a full system reboot

This was also my experience. Happy to run any commands and provide their output for corroboration.

from supervised-installer.

github-actions avatar github-actions commented on August 27, 2024

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates.
Please make sure to update to the latest version and check if that solves the issue. Let us know if that works for you by adding a comment 👍
This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.

from supervised-installer.

okin avatar okin commented on August 27, 2024

home-assistant/version#370 fixed home-assistant/supervisor#4381.
Is this issue fixed by the PR aswell?

from supervised-installer.

donaldguy avatar donaldguy commented on August 27, 2024

I would fully expect so, but you know, it needs to roll out first

for what it’s worth I belatedly applied my suggested workaround of pulling the profile entirely and haven’t had an issue since, though technically that’s a different thing

from supervised-installer.

ikifar2012 avatar ikifar2012 commented on August 27, 2024

home-assistant/version#370 fixed home-assistant/supervisor#4381. Is this issue fixed by the PR aswell?

Still awaiting this home-assistant/version#372 to be merged as well... We will see after the next supervisor update

from supervised-installer.

github-actions avatar github-actions commented on August 27, 2024

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates.
Please make sure to update to the latest version and check if that solves the issue. Let us know if that works for you by adding a comment 👍
This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.

from supervised-installer.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.