rancher / elemental

Elemental is an immutable Linux distribution built to run Rancher and its corresponding Kubernetes distributions RKE2 and k3s. It is built using the Elemental-toolkit.

Home Page: https://elemental.docs.rancher.com/

License: Apache License 2.0

Makefile 1.59% Go 47.26% Shell 11.71% TypeScript 29.37% Dockerfile 8.69% JavaScript 0.11% 1C Enterprise 1.28%
kubernetes docker containers rancher

elemental's Introduction

Elemental

Lint

Daily CI

CLI-K3s CLI-K3s-Upgrade CLI-RKE2 CLI-RKE2-Upgrade

UI-K3s UI-K3s-Upgrade UI-RKE2 UI-RKE2-Upgrade

Weekly CI

CLI-K3s-Airgap CLI-K3s-Scalability CLI-Multicluster

Goal

Elemental is a software stack enabling a centralized, full cloud-native OS management solution with Kubernetes.

Cluster Node OSes are built and maintained via container images through the Elemental Toolkit and installed on new hosts using the Elemental CLI.

The Elemental Operator and the Rancher System Agent enable Rancher Manager to fully control Elemental clusters, from the installation and management of the OS on the Nodes to the provisioning of new K3s or RKE2 clusters in a centralized way.

Follow our Quickstart or see the full docs for more info.

License

Copyright (c) 2020-2024 SUSE, LLC

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

elemental's People

Contributors

anmazzotti, cbosdo, cos-cibot, davidcassany, dependabot[bot], fgiudici, frelon, fullmetal-fred, github-actions[bot], ibuildthecloud, juadk, kkaempf, lagartoflojo, ldevulder, mbologna, mjura, mudler, nunix, rdoxenham, simonflood


elemental's Issues

TPM alternatives

Registering via TPM is limiting: we currently have a TPM emulation mechanism (#23), which is NOT safe to use outside of tests and which also forces the ros-installer to be built with cgo enabled.

This card is to discuss and identify potential alternatives that can run in parallel with, and be supported alongside, TPM.

Requirements:

The alternative has to take into account SDO usage and the supply chain, which spans from who actually sets up the nodes in the first place to last-mile node setup.

Ideally it should also cover portable cases, where registration data can be carried over different mediums (#3), but that can be considered separately.

Also reach out to the security team.

See also: #23 and for example the FIDO specs: https://fidoalliance.org/specs/FDO/FIDO-Device-Onboard-RD-v1.0-20201202.html#appendix-b-device-key-provisioning-with-ecdsa

https://fidoalliance.org/specs/FDO/fido-device-onboard-v1.0-ps-20210323/fido-device-onboard-v1.0-ps-20210323.html

Consume framework images in the generated chart

Currently the ros-operator charts point to OS images; they should instead point to framework images, which contain only the binary and nothing else.

Since ros-operator is built statically, it should be possible to run it distroless. That would dramatically reduce download and setup times for the chart.
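
For illustration, a hedged sketch of what the chart change could look like; the values keys and the framework image reference below are hypothetical, not the actual chart layout:

# Hypothetical values.yaml excerpt for the ros-operator chart.
image:
  # Before: a full OS image shipping the whole distribution
  # repository: quay.io/costoolkit/os2
  # After: a framework image carrying only the statically built operator binary
  repository: quay.io/costoolkit/ros-operator-framework
  tag: latest
  pullPolicy: IfNotPresent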

Failure pulling user-data from CDROM

The latest CI builds fail tests because of this:

INFO[0000] Processing stage step 'Pull data from provider (local)'  commands=0 delete_entities=0 entities=0 files=0 nameserver=0 step="Pull data from provider (local)"
panic: runtime error: index out of range [0] with length 0

goroutine 1 [running]:
github.com/diskfs/go-diskfs/filesystem/iso9660.parseDirEntry({0x10ab068, 0x10ab068, 0xa}, 0xc000206038)
        /luetbuild/go/src/github.com/mudler/yip/vendor/github.com/diskfs/go-diskfs/filesystem/iso9660/directoryentry.go:281 +0x56f
github.com/diskfs/go-diskfs/filesystem/iso9660.Read({0xc3f900, 0xc00000f138}, 0x5b000, 0x0, 0xc0002062d8)
        /luetbuild/go/src/github.com/mudler/yip/vendor/github.com/diskfs/go-diskfs/filesystem/iso9660/iso9660.go:223 +0x9fa
github.com/diskfs/go-diskfs/disk.(*Disk).GetFilesystem(0xc00007d5e0, 0x0)
        /luetbuild/go/src/github.com/mudler/yip/vendor/github.com/diskfs/go-diskfs/disk/disk.go:227 +0x238
github.com/davidcassany/linuxkit/pkg/metadata/providers.FindCIs()
        /luetbuild/go/src/github.com/mudler/yip/vendor/github.com/davidcassany/linuxkit/pkg/metadata/providers/provider_cdrom.go:85 +0x3a6
github.com/davidcassany/linuxkit/pkg/metadata/providers.ListCDROMs()
        /luetbuild/go/src/github.com/mudler/yip/vendor/github.com/davidcassany/linuxkit/pkg/metadata/providers/provider_cdrom.go:45 +0xae
github.com/mudler/yip/pkg/plugins.DataSources({{0x0, 0x0, 0x0}, {0x0, 0x0, 0x0}, {0x0, 0x0, 0x0}, {0x0, ...}, ...}, ...)
        /luetbuild/go/src/github.com/mudler/yip/pkg/plugins/datasource.go:49 +0xb4b
github.com/mudler/yip/pkg/executor.(*DefaultExecutor).Apply(0xc00023f400, {0x7fff88984ac2, 0x7}, {{0xc0002a2ec0, 0x1}, 0xc00024f4a0}, {0xc56f28, 0x10aad40}, {0xc3fc90, 0x10ab068})
        /luetbuild/go/src/github.com/mudler/yip/pkg/executor/default.go:162 +0xcd9
github.com/mudler/yip/pkg/executor.(*DefaultExecutor).run(0x20, {0x7fff88984ac2, 0x7}, {0xc0002a2de0, 0x1e}, {0xc56f28, 0x10aad40}, {0xc3fc90, 0x10ab068}, 0xb80300, ...)
        /luetbuild/go/src/github.com/mudler/yip/pkg/executor/default.go:92 +0x13a
github.com/mudler/yip/pkg/executor.(*DefaultExecutor).walkDir.func1({0xc0002a2de0, 0x0}, {0xc49240, 0xc0002a6340}, {0x0, 0x0})
        /luetbuild/go/src/github.com/mudler/yip/pkg/executor/default.go:72 +0x20b
github.com/twpayne/go-vfs.walk({0x7fe455ad6818, 0x10aad40}, {0xc0002a2de0, 0x1e}, 0xc000207bb0, {0xc49240, 0xc0002a6340}, {0x0, 0x0})
        /luetbuild/go/src/github.com/mudler/yip/vendor/github.com/twpayne/go-vfs/walk.go:30 +0xbb
github.com/twpayne/go-vfs.walk({0x7fe455ad6818, 0x10aad40}, {0x7fff88984aca, 0xb}, 0xc000207bb0, {0xc49240, 0xc000271ba0}, {0x0, 0x0})
        /luetbuild/go/src/github.com/mudler/yip/vendor/github.com/twpayne/go-vfs/walk.go:47 +0x2e5
github.com/twpayne/go-vfs.Walk({0x7fe455ad6818, 0x10aad40}, {0x7fff88984aca, 0xb}, 0x7fff88984aca)
        /luetbuild/go/src/github.com/mudler/yip/vendor/github.com/twpayne/go-vfs/walk.go:58 +0x66
github.com/mudler/yip/pkg/executor.(*DefaultExecutor).walkDir(0x10aad40, {0x7fff88984ac2, 0x7fe455ac3370}, {0x7fff88984aca, 0xb}, {0xc56f28, 0x10aad40}, {0xc3fc90, 0x10ab068})
        /luetbuild/go/src/github.com/mudler/yip/pkg/executor/default.go:55 +0x10e
github.com/mudler/yip/pkg/executor.(*DefaultExecutor).runStage(0xc00023f400, {0x7fff88984ac2, 0x7}, {0x7fff88984aca, 0xb}, {0xc56f28, 0x10aad40}, {0xc3fc90, 0x10ab068})
        /luetbuild/go/src/github.com/mudler/yip/pkg/executor/default.go:104 +0x21c
github.com/mudler/yip/pkg/executor.(*DefaultExecutor).Run(0xe, {0x7fff88984ac2, 0x7}, {0xc56f28, 0x10aad40}, {0xc3fc90, 0x10ab068}, {0xc00024ec60, 0x1, 0x3})
        /luetbuild/go/src/github.com/mudler/yip/pkg/executor/default.go:122 +0x147
github.com/mudler/yip/cmd.glob..func1(0x10709a0, {0xc00024ec60, 0x1, 0x3})
        /luetbuild/go/src/github.com/mudler/yip/cmd/root.go:107 +0x2a8
github.com/spf13/cobra.(*Command).execute(0x10709a0, {0xc000030090, 0x3, 0x3})
        /luetbuild/go/src/github.com/mudler/yip/vendor/github.com/spf13/cobra/command.go:860 +0x5f8

See also: diskfs/go-diskfs#103

cloud-init: no ssh keys created for additional users

This seems to create the 'vagrant' user, but not its ssh keys (note: root had the keys correctly added):

#cloud-config

# Add additional users or set the password/ssh keys for root
users:
- name: "root"
  passwd: "ros"
  ssh_authorized_keys:
  - https://raw.githubusercontent.com/hashicorp/vagrant/main/keys/vagrant.pub
- name: "vagrant"
  passwd: "vagrant"
  shell: "/bin/bash"
  homedir: "/run/vagrant"
  ssh_authorized_keys:
  - https://raw.githubusercontent.com/hashicorp/vagrant/main/keys/vagrant.pub

(Screenshot attached: 2022-02-11 10-04-16)

Originally posted by @mudler in #13 (comment)

luet-mtree is enabled while installing from container images

Version used: https://github.com/rancher-sandbox/os2/releases/tag/v0.1.0-alpha20

I tried to provision a server with this config:

#cloud-config
rancheros:
    tpm:
        emulated: true
        seed: 5
    install:
        registrationCaCert: |-
            -----BEGIN CERTIFICATE-----
            [...]
            -----END CERTIFICATE-----
        registrationURL: https://epinio-k3s/v1-rancheros/registration/vzrn69vhqdjw6w54xrsqdwzkj5rkz2jhmqq8j8vd577x8k72mnthbr
        containerImage: quay.io/costoolkit/os2:v0.1.0-alpha20-amd64

Download of the container image was fine, but I ended up with a luet error:

Mar 10 15:41:22 localhost ros-installer[1756]:   ERROR     Plugin luet-mtree at /usr/bin/luet-mtree had an error: error while executing plugin: exit status 1
Mar 10 15:41:22 localhost ros-installer[1756]: panic: fatal error

I attached the whole log file.
ros-installer.log

ros-operator needs to add the Rancher cacert to the machine registration

Problem statement

ros-operator currently returns

rancheros:
    install:
        registrationUrl: https://172.17.0.2/v1-rancheros/registration/g75mqpnrlrrgqqbf9cxwzw2b75bt5xngdv9pljprdd7vp4cqd27hjx

from the registrationUrl. However, ros-installer will choke on this, since the certificate is "signed with an unknown authority".

Adding the Rancher cacert (from 'left pane' -> 'global settings' -> 'Advanced settings' -> 'Show cacerts') as in

rancheros:
    install:
        registrationUrl: https://172.17.0.2/v1-rancheros/registration/g75mqpnrlrrgqqbf9cxwzw2b75bt5xngdv9pljprdd7vp4cqd27hjx
        registrationCaCert: |
          -----BEGIN CERTIFICATE-----
          MIIBpzCCAU2gAwIBAgIBADAKBggqhkjOPQQDAjA7MRwwGgYDVQQKExNkeW5hbWlj
          bGlzdGVuZXItb3JnMRswGQYDVQQDExJkeW5hbWljbGlzdGVuZXItY2EwHhcNMjIw
          MTI1MDgxMjAxWhcNMzIwMTIzMDgxMjAxWjA7MRwwGgYDVQQKExNkeW5hbWljbGlz
          dGVuZXItb3JnMRswGQYDVQQDExJkeW5hbWljbGlzdGVuZXItY2EwWTATBgcqhkjO
          PQIBBggqhkjOPQMBBwNCAAS8ac2BZIHiLMvAasGldNGLvPoxiNLeJgJdn3zEi605
          DKtPVeO9Wq3BY0lKZ0Lz0hEo03POVlJs0ieQjTH5c+1lo0IwQDAOBgNVHQ8BAf8E
          BAMCAqQwDwYDVR0TAQH/BAUwAwEB/zAdBgNVHQ4EFgQUtB+RIFNNC9JZWcvFJZy2
          AnNILRYwCgYIKoZIzj0EAwIDSAAwRQIgUdk6jxLGX8/AUSS9TiMnlT4sqVksXrWK
          B8gjIV8k9fkCIQDIuV1iRmHzeGLFuFgpkmrtxNQwrxpKhMcs/ZYrDXkW9w==
          -----END CERTIFICATE-----

fixes this problem.

Expected result

Pulling data from the registrationUrl (with wget, before building an installation image) should include the cacert in the yaml.

automatic CDROM ejection

The boot ISO should be ejected on shutdown if a config file option is set.

This is helpful in scenarios where automated installations are performed and the hypervisor does not unmount the ISO after the first installation.

Systemd seems to support this: a script can be placed in /usr/lib/systemd/system-shutdown/name.shutdown to execute commands during shutdown. I'm not sure whether CD-ROM ejection can be hooked there, but I believe it can; otherwise we have to fall back to some custom logic in the initrd. It is worth trying with systemd first.

Something along the lines of:

#!/bin/bash
# Find the block device carrying the COS_LIVE label (the live ISO) and toggle its tray.
dev=$(lsblk -nro NAME,LABEL | awk '/COS_LIVE/ {print $1; exit}')
[ -n "$dev" ] && eject -T "/dev/$dev"

or a service, like:

[Unit]
Description=Eject the DVD
Before=final.target
After=shutdown.target
DefaultDependencies=no

[Service]
Type=oneshot
ExecStart=/usr/bin/eject -m
StandardInput=tty-force
StandardOutput=inherit
StandardError=inherit

[Install]
WantedBy=shutdown.target

Either way this has to be dynamic. We don't always want to eject the CD-ROM, as that might not fit other scenarios (where the CD-ROM might be used to boot a full live system without anything being installed locally) and might break other flows we are not yet aware of.

We can think about enabling this feature automatically afterwards.
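
Something like the following could express the opt-in flag in the install config; the eject-cd key is hypothetical and only illustrates the intent:

#cloud-config
rancheros:
    install:
        # Hypothetical opt-in flag: when true, the shutdown hook would eject
        # the live CD-ROM once the installation has completed.
        eject-cd: true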

Generic install disk images

Once we can bootstrap from a container image, it might make things easier for demos and one-offs to have a generic disk image to boot from. Then you can run cos-installer with the image and config specified in flags.

Use self-hosted for release workflow

Currently the release pipeline runs on GitHub-hosted runners, while tests run on self-hosted ones.

The release pipeline, when switched to self-hosted runners, fails to build the GitHub action used for releasing:

Build container for action use: '/runner/_work/_actions/rancher-sandbox/github-action-ghr/v1/Dockerfile'.
  /usr/bin/docker build -t 60e226:177dbcbb71be43fa8f1b60253c7e6799 -f "/runner/_work/_actions/rancher-sandbox/github-action-ghr/v1/Dockerfile" "/runner/_work/_actions/rancher-sandbox/github-action-ghr/v1"
  Sending build context to Docker daemon  11.26kB
  
  Step 1/11 : FROM alpine:latest
  latest: Pulling from library/alpine
  59bf1c3509f3: Pulling fs layer
  59bf1c3509f3: Verifying Checksum
  59bf1c3509f3: Download complete
  59bf1c3509f3: Pull complete
  Digest: sha256:21a3deaa0d32a8057914f36584b5288d2e5ecc984380bc0118285c70fa8c9300
  Status: Downloaded newer image for alpine:latest
   ---> c059bfaa849c
  Step 2/11 : LABEL "com.github.actions.name"="GitHub Releases"
   ---> Running in dd22dc09cbb7
  Removing intermediate container dd22dc09cbb7
   ---> f7edb6940cad
  Step 3/11 : LABEL "com.github.actions.description"="Upload build artifacts to GitHub releases"
   ---> Running in af36c91de8d9
  Removing intermediate container af36c91de8d9
   ---> 5ad8b644f250
  Step 4/11 : LABEL "com.github.actions.icon"="tag"
   ---> Running in 53b937497f6f
  Removing intermediate container 53b937497f6f
   ---> 549c0713ba7a
  Step 5/11 : LABEL "com.github.actions.color"="black"
   ---> Running in b9db6624cb53
  Removing intermediate container b9db6624cb53
   ---> 4f43208847a5
  Step 6/11 : ENV GHR_FORK tcnksm/ghr
   ---> Running in a123a0a44c55
  Removing intermediate container a123a0a44c55
   ---> f5885d28db46
  Step 7/11 : ENV GHR_VERSION 0.13.0
   ---> Running in 65b7a923b1dd
  Removing intermediate container 65b7a923b1dd
   ---> 71bd9f98f423
  Step 8/11 : RUN apk add --no-cache bash curl xz zip coreutils
   ---> Running in 6a01f41c26cc
  fetch https://dl-cdn.alpinelinux.org/alpine/v3.15/main/x86_64/APKINDEX.tar.gz
  WARNING: Ignoring https://dl-cdn.alpinelinux.org/alpine/v3.15/main: DNS lookup error
  fetch https://dl-cdn.alpinelinux.org/alpine/v3.15/community/x86_64/APKINDEX.tar.gz
  WARNING: Ignoring https://dl-cdn.alpinelinux.org/alpine/v3.15/community: DNS lookup error
  ERROR: unable to select packages:
    bash (no such package):
      required by: world[bash]
    coreutils (no such package):
      required by: world[coreutils]
    curl (no such package):
      required by: world[curl]
    xz (no such package):
      required by: world[xz]
    zip (no such package):
      required by: world[zip]
  The command '/bin/sh -c apk add --no-cache bash curl xz zip coreutils' returned a non-zero code: 5
  Warning: Docker build failed with exit code 5, back off 3.415 seconds before retry.
  /usr/bin/docker build -t 60e226:177dbcbb71be43fa8f1b60253c7e6799 -f "/runner/_work/_actions/rancher-sandbox/github-action-ghr/v1/Dockerfile" "/runner/_work/_actions/rancher-sandbox/github-action-ghr/v1"
  Sending build context to Docker daemon  11.26kB
  
  Step 1/11 : FROM alpine:latest
   ---> c059bfaa849c
  Step 2/11 : LABEL "com.github.actions.name"="GitHub Releases"
   ---> Using cache
   ---> f7edb6940cad
  Step 3/11 : LABEL "com.github.actions.description"="Upload build artifacts to GitHub releases"
   ---> Using cache
   ---> 5ad8b644f250
  Step 4/11 : LABEL "com.github.actions.icon"="tag"
   ---> Using cache
   ---> 549c0713ba7a
  Step 5/11 : LABEL "com.github.actions.color"="black"
   ---> Using cache
   ---> 4f43208847a5
  Step 6/11 : ENV GHR_FORK tcnksm/ghr
   ---> Using cache
   ---> f5885d28db46
  Step 7/11 : ENV GHR_VERSION 0.13.0
   ---> Using cache
   ---> 71bd9f98f423
  Step 8/11 : RUN apk add --no-cache bash curl xz zip coreutils
   ---> Running in 868059029d35
  fetch https://dl-cdn.alpinelinux.org/alpine/v3.15/main/x86_64/APKINDEX.tar.gz
  WARNING: Ignoring https://dl-cdn.alpinelinux.org/alpine/v3.15/main: DNS lookup error
  fetch https://dl-cdn.alpinelinux.org/alpine/v3.15/community/x86_64/APKINDEX.tar.gz
  WARNING: Ignoring https://dl-cdn.alpinelinux.org/alpine/v3.15/community: DNS lookup error
  ERROR: unable to select packages:
    bash (no such package):
      required by: world[bash]
    coreutils (no such package):
      required by: world[coreutils]
    curl (no such package):
      required by: world[curl]
    xz (no such package):
      required by: world[xz]
    zip (no such package):
      required by: world[zip]
  The command '/bin/sh -c apk add --no-cache bash curl xz zip coreutils' returned a non-zero code: 5
  Warning: Docker build failed with exit code 5, back off 8.101 seconds before retry.
  /usr/bin/docker build -t 60e226:177dbcbb71be43fa8f1b60253c7e6799 -f "/runner/_work/_actions/rancher-sandbox/github-action-ghr/v1/Dockerfile" "/runner/_work/_actions/rancher-sandbox/github-action-ghr/v1"
  Sending build context to Docker daemon  11.26kB
  
  Step 1/11 : FROM alpine:latest
   ---> c059bfaa849c
  Step 2/11 : LABEL "com.github.actions.name"="GitHub Releases"
   ---> Using cache
   ---> f7edb6940cad
  Step 3/11 : LABEL "com.github.actions.description"="Upload build artifacts to GitHub releases"
   ---> Using cache
   ---> 5ad8b644f250
  Step 4/11 : LABEL "com.github.actions.icon"="tag"
   ---> Using cache
   ---> 549c0713ba7a
  Step 5/11 : LABEL "com.github.actions.color"="black"
   ---> Using cache
   ---> 4f43208847a5
  Step 6/11 : ENV GHR_FORK tcnksm/ghr
   ---> Using cache
   ---> f5885d28db46
  Step 7/11 : ENV GHR_VERSION 0.13.0
   ---> Using cache
   ---> 71bd9f98f423
  Step 8/11 : RUN apk add --no-cache bash curl xz zip coreutils
   ---> Running in be4ff142b419
  fetch https://dl-cdn.alpinelinux.org/alpine/v3.15/main/x86_64/APKINDEX.tar.gz
  fetch https://dl-cdn.alpinelinux.org/alpine/v3.15/community/x86_64/APKINDEX.tar.gz
  WARNING: Ignoring https://dl-cdn.alpinelinux.org/alpine/v3.15/main: DNS lookup error
  WARNING: Ignoring https://dl-cdn.alpinelinux.org/alpine/v3.15/community: DNS lookup error
  ERROR: unable to select packages:
    bash (no such package):
      required by: world[bash]
    coreutils (no such package):
      required by: world[coreutils]
    curl (no such package):
      required by: world[curl]
    xz (no such package):
      required by: world[xz]
    zip (no such package):
      required by: world[zip]
  The command '/bin/sh -c apk add --no-cache bash curl xz zip coreutils' returned a non-zero code: 5
Error: Docker build failed with exit code 5

See: https://github.com/rancher-sandbox/os2/runs/5138984981?check_suite_focus=true

Consume `ManagedOSVersion` from `ManagedOS`

Part of #58

Once #63 is in, we can consume the new Kind from ManagedOS and trigger upgrades as usual.

ManagedOS

ManagedOS needs to be able to refer to ManagedOSVersions for upgrades (extending it).

apiVersion: rancheros.cattle.io/v1
kind: ManagedOS
spec:
   # osImage: rancher/os2:v0.0.0
   managedVersionRef:
      name: "<X>"

We need the ManagedOS Kind to be able to refer to ManagedOSVersions

cloud-init: homedir and shell are required when creating new users

Booting with the following cloud-init file fails to add the vagrant user:

#cloud-config

# Add additional users or set the password/ssh keys for root
users:
- name: "root"
  passwd: "ros"
  ssh_authorized_keys:
  - https://raw.githubusercontent.com/hashicorp/vagrant/main/keys/vagrant.pub
- name: "vagrant"
  passwd: "vagrant"
  ssh_authorized_keys:
  - https://raw.githubusercontent.com/hashicorp/vagrant/main/keys/vagrant.pub

cos-setup-network output: (screenshot attached, 2022-02-11 10-04-16)

Update to latest cOS

Consume the latest toolkit changes, and move pins to current.

Respect version in `minVersion` field in `ManagedOSImageVersion`

Currently the ManagedOSImageVersion minVersion field is ignored, although it should gate versions that are not eligible for certain nodes.

This requires a mechanism to stream the bundle application result back into the state of a ManagedOS upgrade.
Currently we have a state associated with the ManagedOSImage (which is a Fleet bundle); maybe we can use that to figure out whether the ManagedOS image upgrade has been applied, and infer the OS version running on the nodes from it.

When this mechanism is in place, we can gate which ManagedOSImageVersion can be applied to a ManagedOS.

While doing this, we should keep in mind that there are several stakeholders involved (Harvester, the maintenance team, etc.), so we should discuss this broadly before taking any specific path for parsing versions. Ping also @kkaempf.
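
As an illustration of the gating, a hedged sketch reusing the minVersion field from the ManagedOSVersion CRD proposal further down this page; the names and version values here are made up:

apiVersion: rancheros.cattle.io/v1
kind: ManagedOSVersion
metadata:
  name: v0.2.0
spec:
  version: v0.2.0
  # Hypothetical gate: nodes whose current OS version is older than this
  # would not be eligible for the upgrade.
  minVersion: v0.1.0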

Setup `root` user, disable `root`/passwd login for SSH

Using the following config.yaml, I can ssh into the machine using the root passwd, both during and after the install.

#cloud-config
rancherd:
  role: cluster-init

disable_root: 1
ssh_pwauth: 0

# See https://rancher-sandbox.github.io/cos-toolkit-docs/docs/reference/cloud_init/ for a full syntax reference
stages:
  initramfs:
  - users:
      root:
        passwd: "ros"
  - disable_root: 1
  - ssh_pwauth: 0
  - files:
    - path: /etc/ssh/ssh_config
      content: |
          Port 22
          Protocol 2
          PasswordAuthentication no
          ChallengeResponseAuthentication no
          UsePAM no
          HostbasedAuthentication no
          IgnoreRhosts yes
          AuthenticationMethods publickey
          PubkeyAuthentication yes
          PermitEmptyPasswords no
          PermitRootLogin no
          DenyUsers root
      permissions: 0600
      owner: 0
      group: 0

I expected passwd auth and root login to be disabled for SSH, but that seems to not be the case. Why?

SSH `github`/`gitlab` auth not working after install

Using the following config.yaml I can ssh into the machine during the install, but not after. (It asks for passwd). Why?

#cloud-config
rancherd:
  role: cluster-init

# See https://rancher-sandbox.github.io/cos-toolkit-docs/docs/reference/cloud_init/ for a full syntax reference
stages:
  initramfs:
  - users:
      jsundqvist:
  network:
  - authorized_keys:
      jsundqvist:
      - github:jsundqvist

EDIT: Same for gitlab

Auto-installing ISO should not show `localhost login:` prompt

When using an ISO that was created with a config.yaml, it boots just like the regular ISO, showing the login prompt that suggests logging in with root:ros. Then, after about 50 seconds of silently finishing the installation, it reboots from the hard drive. This works, obviously, but is very confusing.

I would expect the auto-installing ISO not to show the login prompt, since it allows users to unknowingly interfere with the installation, and instead to show the logs of what the installer is doing.

`ManagedOS` upgrade command is hardcoded

Currently os2 applies Plans from ManagedOS resources in a hardcoded way: https://github.com/rancher-sandbox/os2/blob/9fed2edefc12cbcb426874db137631f9d52de3cd/pkg/controllers/managedos/template.go#L138 .
This makes it impossible to reuse the ManagedOS resource outside the os2 context, and it doesn't allow much flexibility if someone wants to customize the upgrade process.

What we could do instead is expose the ContainerSpec used in suc and, if it is empty, fall back to the current behaviour as the default. That way it will be easier later to create an OSUpgradeCatalog of ContainerSpecs based on the os2 derivative.
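
A hedged sketch of what exposing the upgrade container on the ManagedOS resource could look like; the upgradeContainer field and its contents are hypothetical and simply mirror a system-upgrade-controller container spec:

apiVersion: rancheros.cattle.io/v1
kind: ManagedOS
spec:
  osImage: rancher/os2:v0.0.0
  # Hypothetical field: if set, used as the SUC Plan's container spec;
  # if empty, the controller falls back to today's hardcoded command.
  upgradeContainer:
    image: rancher/os2:v0.0.0
    command: ["<custom-upgrade-command>"]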

CRD: `ManagedOSVersionChannel`

Part of #58

Create the following new CRDs and controllers for the resource types described below.

ManagedOSVersionChannel and the controller

The ManagedOSVersion resources are populated by a controller, or by some other Kubernetes mechanism that satisfies the ManagedOSVersionChannel source and constraints. The ManagedOSVersionChannel can be defined like this:

apiVersion: rancheros.cattle.io/v1
kind: ManagedOSVersionChannel
spec:
  type: "JSON"
  options: # map[string]interface{}
       url: "<file>.json"

where the channel type (e.g. JSON, URL, XML, GitHub) populates ManagedOSVersions using the specified options.

It defines the rules by which ManagedOSVersion resources should be populated (e.g. GitHub repository tags, releases, a custom JSON file to parse).

The controller makes sure to add labels/annotations where appropriate to signal that a ManagedOS version can be applied.
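
As a concrete example of the "custom file to parse" case, a hedged sketch of what the file referenced by spec.options.url might contain (shown as YAML for readability; a JSON equivalent would work the same). The field names are hypothetical:

# Hypothetical channel source file; each entry would be turned into a
# ManagedOSVersion resource by the controller.
versions:
  - version: v0.2.0
    minVersion: v0.1.0
    metadata:
      Changelog: "Fixes and updates"
  - version: v0.1.0
    metadata:
      Changelog: "Initial release"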

Ping the Harvester team about their strategy and whether this covers their use cases too. Note that they might already have a JSON file or something similar for their available versions, in which case we might want to sync up within this card, as we want to collect as much feedback as possible at this stage.

cc @bk201

Allow configuring ros-operator chart url/version

Currently our scripts/os2 binaries use a hardcoded chart URL and the latest DEVEL version (partly because we have not released a prod version of the ros-operator, of course).

There should be some kind of variable that allows us to override both the chart repo URL and the chart version, so we can test with different chart versions/repos.

A good entry point for this would be the Dapper config, which passes configuration into the build process.

One of the problems is that in some scripts (https://github.com/rancher-sandbox/os2/blob/master/framework/files/usr/sbin/ros-operator-install) the chart is fixed and we can only change it from the build process directly; in fact it uses a chart stored locally in the ISO.

So we may require extra configs under /usr/lib/rancheros-release (https://github.com/rancher-sandbox/os2/blob/master/Dockerfile#L161), which IMO is a bad thing. The chart version/repo should not be tied to the OS version; it should be independent, allowing you to upgrade the chart while leaving the system running. Before the separation this was not possible because the operator was tied to the OS, and it is still not possible now, since the scripts use a hardcoded local path for the operator and the operator is copied at build time.

So this might need a rethinking of the ros-operator-install script.

Also, I'm still not clear whether the ros-operator is an integral part of the os2 feature set or not. The operator can run anywhere and isn't really tied to os2, but it seems os2 wants to provide the operator out-of-the-box after install, which is odd.

Requirements

CRD and controller for OS versions

In order to be able to select available versions to upgrade our nodes to, from the UI or from kubectl, we need a generic ManagedOSVersion CRD/controller and a ManagedOSVersionChannel CRD to represent the version we want to upgrade to and how we obtain the lists of available versions. This will also help us later with moving image validation steps inside the Kubernetes cluster with a Kubewarden policy (as we can validate releases both while fetching new ones and while applying them).

Notes

API definitions: https://github.com/rancher-sandbox/os2/blob/master/pkg/apis/rancheros.cattle.io/v1/os.go#L28

Hostname is changed upon each reboot

After installation I set my hostname to ros-node-02 and configured a K3s cluster on it with rancherd/Rancher. All was fine, but after a reboot the hostname had been changed to something like rancher-XXXXX. I tried to reset the hostname, but hit the same issue after a reboot.
I'm able to reproduce this every time.

I have a DHCP/DNS server that fixes the IP address, so it's not an IP change on each reboot. I attached journalctl logs to this issue.
journalctl.log.gz

Split ros-operator into its own repo

It makes no sense to have all this mixed together with the os2 files. Everything is clumped into a ton of files to set up (the CI, the Makefile, the scripts, etc.), while ros-operator is very simple and could do with its own small repo in which we control everything and it is much clearer where to change things.

Plus we can release it separately, instead of in one big release with the installer, the chart, the ISO, etc.

ros-operator and its chart should live in their own repo for ease of releasing, testing, and updating. We also don't need os2 to test the ros-operator, which makes things simpler.

A test repo is available at https://github.com/Itxaka/ros-operator with the full package, CI stuff, release stuff.

Action items

extra yaml fields not functional when running manual installation

When an installation is performed manually with ros-installer --config-file, the runcmd and other extra YAML fields are not preserved:

Input config file:

#cloud-config
rancherd:
  role: cluster-init
  rancherValues:
    features:
    - multi-cluster-management=true
ssh_authorized_keys:
  - github:mudler
runcmd:
  - curl -fL https://raw.githubusercontent.com/rancher/rancherd/master/install.sh | sh -

After install:

rancher-8421:/oem # cat 99_custom.yaml 
#cloud-config
rancheros: {}
ssh_authorized_keys:
- github:mudler

It seems ros-installer is manipulating the YAML in a way that drops the extra fields.

Acceptance tests

Currently our smoke tests cover the following aspects (the VirtualBox ones are more complete, due to acceleration support in the runners):

  • CD datasource and first-user configuration
  • First services start correctly on boot (ros-operator, rancherd, k3s, etc.)
  • Starting up a k3s cluster on rancherd and installing the ros-operator on top (rancherd included)
  • Creation of MachineRegistration resources

In order to have an acceptance test suite, we should at least cover the following aspects:

  • Node upgrades managed by ros-operator
  • Manual node upgrades from the CLI
  • Installation and bootstrap of nodes from ISO and #6
  • Multi-machine tests

Action items

Enhance our test suite on GitHub Actions with new test scenarios:

  • Bootstrapping nodes with MachineRegistration. Currently our test suite generates MachineRegistration resources, but they are not consumed. Consider tests for both ISO and container images (depends on #6) (split in #41)
  • Node upgrades following https://rancher.github.io/os2/upgrade/#rancheros (split in #42)
  • Manual upgrades, following cOS default usage and ros-installer (split in #43) - In progress
  • Upgrades in a multi-node environment. After machines are bootstrapped, the test suite should try to upgrade the nodes with the ManagedOS resource and also manually from the CLI (split in #44)
  • Use ISO and container image to boot some VMs in the E2E tests (#147)

CRD: ManagedOSVersion

Part of #58

Create the following new CRDs and controllers for the resource types described below.

ManagedOSVersion

ManagedOSVersion indicates the system version we want to upgrade to, and can be referenced from a ManagedOS.

apiVersion: rancheros.cattle.io/v1
kind: ManagedOSVersion
spec:
  version: <X>
  # or, if specified by any channel, a custom upgrade containerSpec
  upgradeContainerSpec: ...
  tags: [ "", "" ]
  minVersion: ""
  metadata: # map[string]interface{} of free-form extra data
    CVEs: [""]
    Changelog: |
      ....

The ManagedOSVersion declares its compatibility versions (the range from which it can be upgraded, plus optional sets of constraints) and has to accommodate extra data (CVEs, changelogs, etc.). This information is meant purely for the UI/UX, so we can treat it as metadata, but keep in mind that it has to be parseable programmatically somehow.

Support container images as installation bootstrap

See: rancher/os2#8

https://github.com/rancher-sandbox/os2/blob/7741733950e35fea2ad3b30ab6290fcd16e61753/pkg/config/read.go#L209 needs to support container images; see the configuration test for elemental-cli, which is relevant for this purpose: https://github.com/rancher-sandbox/os2/blob/master/pkg/config/read_test.go#L118 (the test checks how the os2 config is translated into the env vars used during the elemental installation process).

ping @mudler

Allow for change of node-name prior to install

It would be nice if we could specify the node name via the MachineInventory object when joining the cluster.

This doesn't even need to be the hostname, since both k3s and rke2 have the --node-name flag when installing!
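
A hedged sketch of what this could look like on the MachineInventory object; the nodeName field is hypothetical, the idea being that it would simply be passed through as --node-name when the node joins:

apiVersion: rancheros.cattle.io/v1
kind: MachineInventory
metadata:
  name: ros-node-02
spec:
  # Hypothetical field: forwarded as --node-name to k3s/rke2 on join,
  # independent of the machine's actual hostname.
  nodeName: ros-node-02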

rd.neednet=1 is needed on boot in order to install ssh keys

When OEM files are copied into /oem (that is, after install) and not loaded from datasources, rd.neednet=1 is needed to guarantee that the boot and initramfs stages run when the network is available, for example to fetch users' SSH keys.

Enabling rd.neednet=1, on the other hand, makes it impossible to configure the network via cloud-init from datasources.

Currently os2 has this issue: an installation that provides keys doesn't necessarily work unless the network is properly brought up, or unless the system boots with rd.neednet=1 enabled (it is currently enabled).

This is related to rancher/elemental-toolkit#1140 and rancher/elemental-toolkit#388

upload artifacts separated

Currently we upload the full output dir, which means all the artifacts get lumped into one big zip of about 5 GB.

This is inconvenient if you want to download only one artifact, not to mention that GitHub download speeds are not the best and can be flaky.

It would be helpful to upload each artifact individually, so they can also be downloaded individually.
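
For example, a minimal GitHub Actions sketch using actions/upload-artifact once per artifact instead of once for the whole output dir; the artifact names and paths below are assumptions:

# Hypothetical step names and paths; the point is one upload per artifact,
# so each can be downloaded on its own instead of inside one ~5 GB zip.
- name: Upload ISO
  uses: actions/upload-artifact@v3
  with:
    name: iso
    path: build/*.iso
- name: Upload qcow2 image
  uses: actions/upload-artifact@v3
  with:
    name: qcow2
    path: build/*.qcow2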

Add/configure/find-a-way to use virtualization for VM tests

Current VM tests run on a macOS runner in GitHub Actions with VirtualBox, because virtualization is not available on the Linux workers. macOS runners are x86 for now but will be moved to aarch64 at some point in the future, so it could be a good idea to find another way to run these tests.

Rancher Desktop uses Cirrus CI for this, as KVM can be configured on the container image.

  • configure a runner to run it: nested virt on a self-hosted runner? Cirrus CI (a test is in progress on it)?
  • schedule test(s) that need virtualization (new ones, not those that already use VirtualBox)

http: panic serving pipe: runtime error:

The Rancher host ran into memory pressure and had to be rebooted. After reboot, RancherOS2 node onboarding fails and I see the attached log message coming from ros-operator:

time="2022-02-11T14:52:52Z" level=error msg="error syncing 'default/m-qemu-standard-pc-q35-ich9-2009-not-specified-vbvth': handler machine-inventory: clusters.provisioning.cattle.io \"custom\" not found, requeing" │
│ 2022/02/11 14:53:05 http: panic serving pipe: runtime error: invalid memory address or nil pointer dereference                                                                             │
│ goroutine 974 [running]:                                                                                                                                                                   │
│ net/http.(*conn).serve.func1(0xc0004425a0)                                                                                                                                                 │
│     /usr/lib64/go/1.16/src/net/http/server.go:1824 +0x153                                                                                                                                  │
│ panic(0x1c0bf20, 0x2efb6c0)                                                                                                                                                                │
│     /usr/lib64/go/1.16/src/runtime/panic.go:971 +0x499                                                                                                                                     │
│ github.com/rancher/os2/pkg/server.writeResponse(0x7fd6a83db200, 0xc000623e18, 0xc00000a1e0, 0x0, 0xc000623e18, 0xc000623e48)                                                               │
│     /home/abuild/rpmbuild/BUILD/os2-0.1.0-alpha13/pkg/server/server.go:175 +0xf3                                                                                                           │
│ github.com/rancher/os2/pkg/server.(*InventoryServer).handle(0xc000c4c600, 0x2138860, 0xc000b80a80, 0xc000784b00)                                                                           │
│     /home/abuild/rpmbuild/BUILD/os2-0.1.0-alpha13/pkg/server/server.go:152 +0x42e                                                                                                          │
│ github.com/rancher/os2/pkg/server.(*InventoryServer).ServeHTTP(0xc000c4c600, 0x2138860, 0xc000b80a80, 0xc000784b00)                                                                        │
│     /home/abuild/rpmbuild/BUILD/os2-0.1.0-alpha13/pkg/server/server.go:105 +0xa5                                                                                                           │
│ github.com/rancher/steve/pkg/auth.ToMiddleware.func1.1(0x2138860, 0xc000b80a80, 0xc000784900)                                                                                              │
│     /home/abuild/rpmbuild/BUILD/os2-0.1.0-alpha13/vendor/github.com/rancher/steve/pkg/auth/filter.go:167 +0x1c7                                                                            │
│ net/http.HandlerFunc.ServeHTTP(0xc000ba1500, 0x2138860, 0xc000b80a80, 0xc000784900)                                                                                                        │
│     /usr/lib64/go/1.16/src/net/http/server.go:2069 +0x44                                                                                                                                   │
│ net/http.serverHandler.ServeHTTP(0xc000b80540, 0x2138860, 0xc000b80a80, 0xc000784900)                                                                                                      │
│     /usr/lib64/go/1.16/src/net/http/server.go:2887 +0xa3                                                                                                                                   │
│ net/http.(*conn).serve(0xc0004425a0, 0x213bf40, 0xc000b58c40)                                                                                                                              │
│     /usr/lib64/go/1.16/src/net/http/server.go:1952 +0x8cd                                                                                                                                  │
│ created by net/http.(*Server).Serve                                                                                                                                                        │
│     /usr/lib64/go/1.16/src/net/http/server.go:3013 +0x39b         
