elemental-operator's Introduction

Rancher

This file is auto-generated from README-template.md, please make any changes there.

Rancher is an open source container management platform built for organizations that deploy containers in production. Rancher makes it easy to run Kubernetes everywhere, meet IT requirements, and empower DevOps teams.

Latest Release

  • v2.8
    • Latest - v2.8.5 - rancher/rancher:v2.8.5 / rancher/rancher:latest - Read the full release notes.
    • Stable - v2.8.5 - rancher/rancher:v2.8.5 / rancher/rancher:stable - Read the full release notes.
  • v2.7
    • Latest - v2.7.10 - rancher/rancher:v2.7.10 - Read the full release notes.
    • Stable - v2.7.10 - rancher/rancher:v2.7.10 - Read the full release notes.
  • v2.6
    • Latest - v2.6.14 - rancher/rancher:v2.6.14 - Read the full release notes.
    • Stable - v2.6.14 - rancher/rancher:v2.6.14 - Read the full release notes.

To get automated notifications of our latest release, you can watch the announcements category in our forums, or subscribe to the RSS feed https://forums.rancher.com/c/announcements.rss.

Quick Start

sudo docker run -d --restart=unless-stopped -p 80:80 -p 443:443 --privileged rancher/rancher

Open your browser to https://localhost

Installation

See Installing/Upgrading Rancher for all installation options.

Minimum Requirements

  • Operating Systems
    • Please see Support Matrix for specific OS versions for each Rancher version. Note that the link will default to the support matrix for the latest version of Rancher. Use the left navigation menu to select a different Rancher version.
  • Hardware & Software

Using Rancher

To learn more about using Rancher, please refer to our Rancher Documentation.

Source Code

This repo is a meta-repo used for packaging and contains the majority of Rancher codebase. For other Rancher projects and modules, see go.mod for the full list.

Rancher also includes other open source libraries and projects, see go.mod for the full list.

Build configuration

Refer to the build docs on how to customize the building and packaging of Rancher.

Support, Discussion, and Community

If you need any help with Rancher, please join us at either our Rancher forums or Slack, where most of our team hangs out.

Please submit any Rancher bugs, issues, and feature requests to rancher/rancher.

For security issues, please first check our security policy and email [email protected] instead of posting a public issue in GitHub. You may (but are not required to) use the GPG key located on Keybase.

License

Copyright (c) 2014-2024 Rancher Labs, Inc.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

elemental-operator's People

Contributors

alexander-demicev, anmazzotti, davidcassany, dependabot[bot], fgiudici, frelon, ibuildthecloud, kkaempf, ldevulder, mbologna, mjura, mudler, paynejacob, rdoxenham

elemental-operator's Issues

[operator] what is the --image-operator flag even for??

  • It's mandatory
  • It also has a default value, so why is it mandatory?
  • the default is a non-existent image (rancher/elemental-operator)
  • The config of the image is never used
  • We already have the operator binary running, what is the image for?

Goes like this:

  • Asks for an image to be provided
  • Has a default image, doesn't care
  • Default image doesn't exist
  • Proceeds to ignore the image completely
  • Refuses to elaborate

Chad flag, doesn't care about its value but it needs to be specified LOL

IMHO, drop that flag; it makes no sense anywhere!

"Failed creating partition: exit status 1" - log missing essential information

Output from journalctl -u elemental-operator after booting an ISO

-- Logs begin at Fri 2022-07-15 06:15:20 UTC, end at Fri 2022-07-15 06:18:02 UTC. --
Jul 15 06:15:40 localhost systemd[1]: Starting Elemental Operator Registration...
Jul 15 06:15:40 localhost elemental-operator[1600]: time="2022-07-15T06:15:40Z" level=info msg="Using TPMHash 1c9b6e73145c651a5cb73ee0eed13c2300effb030f4b7dbfefe5e7a63db96308 to dial wss://172.17.0.2/elemental/registration/b5msj66rsjm9twbnjnz2vsswpsr76w4hvl2mbn58jp9xn8cc94txsr"
Jul 15 06:15:40 localhost elemental-operator[1600]: Install environment:
Jul 15 06:15:40 localhost elemental-operator[1600]: ELEMENTAL_INSTALL_CLOUD_INIT=/tmp/2415592138.yip
Jul 15 06:15:40 localhost elemental-operator[1600]: ELEMENTAL_DEBUG=true
Jul 15 06:15:40 localhost elemental-operator[1600]: ELEMENTAL_INSTALL_REBOOT=true
Jul 15 06:15:40 localhost elemental-operator[1600]: ELEMENTAL_INSTALL_PASSWORD=<removed>
Jul 15 06:15:40 localhost elemental-operator[1600]: ELEMENTAL_INSTALL_TARGET=/dev/sda
Jul 15 06:15:40 localhost elemental-operator[1600]: ELEMENTAL_REGISTRATION_URL=https://172.17.0.2/elemental/registration/b5msj66rsjm9twbnjnz2vsswpsr76w4hvl2mbn58jp9xn8cc94txsr
Jul 15 06:15:40 localhost elemental-operator[1600]: ELEMENTAL_REGISTRATION_CA_CERT=-----BEGIN CERTIFICATE-----
Jul 15 06:15:40 localhost elemental-operator[1600]: MIIBpzCCAU2gAwIBAgIBADAKBggqhkjOPQQDAjA7MRwwGgYDVQQKExNkeW5hbWlj
Jul 15 06:15:40 localhost elemental-operator[1600]: bGlzdGVuZXItb3JnMRswGQYDVQQDExJkeW5hbWljbGlzdGVuZXItY2EwHhcNMjIw
Jul 15 06:15:40 localhost elemental-operator[1600]: NzA1MDcwOTU2WhcNMzIwNzAyMDcwOTU2WjA7MRwwGgYDVQQKExNkeW5hbWljbGlz
Jul 15 06:15:40 localhost elemental-operator[1600]: dGVuZXItb3JnMRswGQYDVQQDExJkeW5hbWljbGlzdGVuZXItY2EwWTATBgcqhkjO
Jul 15 06:15:40 localhost elemental-operator[1600]: PQIBBggqhkjOPQMBBwNCAAS2wzKIZ5azTvfTMHc4+Zrmy9efM3ndy2ocgycwTCVB
Jul 15 06:15:40 localhost elemental-operator[1600]: rBdbRGH7t1wjGHtpaYh9CR2AflK34+CQ/nibHJLGzH8Go0IwQDAOBgNVHQ8BAf8E
Jul 15 06:15:40 localhost elemental-operator[1600]: BAMCAqQwDwYDVR0TAQH/BAUwAwEB/zAdBgNVHQ4EFgQUa11N1nJQbJ8YX6MtGjva
Jul 15 06:15:40 localhost elemental-operator[1600]: X4KvRIUwCgYIKoZIzj0EAwIDSAAwRQIgNHzToz74FitImN5T4uCTdmOfYOKVwX0+
Jul 15 06:15:40 localhost elemental-operator[1600]: WyDGp78hxx8CIQDFThQWNi1j1JJTOiIJKXpqjrr4ndzWzsQHqRjveFqdBg==
Jul 15 06:15:40 localhost elemental-operator[1600]: -----END CERTIFICATE-----
Jul 15 06:15:40 localhost elemental-operator[1600]: ELEMENTAL_SYSTEM_AGENT_URL=https://172.17.0.2/k8s/clusters/local
Jul 15 06:15:40 localhost elemental-operator[1600]: ELEMENTAL_SYSTEM_AGENT_TOKEN=eyJhbGciOiJSUzI1NiIsImtpZCI6Ik1BU1FFdkhiVzI0ZFFKbF9TeDFNZzVtVHdlRHZURTZ0YmY4MHJCLWtxcjgifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJmbGVldC1kZWZhdWx0Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZWNyZXQubmFtZSI6Im15LW5vZGUtdG9rZW4tazhxeGIiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoibXktbm9kZSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6ImQ4MTIxOTA5LTJhNWEtNGNkZi05YjQzLWQyYTc1ZTgzZjZjZSIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDpmbGVldC1kZWZhdWx0Om15LW5vZGUifQ.esXZgq5-6vRPwjnXP3g9L-RzXQKrj0LPTc8oqQN5OUCYtr6PU7lbg-43zsN83_J-I89nmVvH2nwvD1BbucZ9B3U7aq_B2wPZ-_rRveUj4Xsbp-7DenXbS5rqXPjra8ieEhJ5Nr62sNPrPfDkm-lCtiu5olHFgroK2KWHfgrU3DizHqVEJfa2eJJ6ZA8hWMiYZ-JzAuDaoB2Q67FbAJhYRDtSXF7ei5KsfbOXb-FBdm94TDunN6qL2I-kgT06xR7_FWlvqLB5dpU7RFA0xN_aWF5dEj5UknZBenGLrgJRRdaQeDz_tHZmKSVBbviC9z7FoQde4ZHmrGaUplTzmN1F6g
Jul 15 06:15:40 localhost elemental-operator[1600]: ELEMENTAL_SYSTEM_AGENT_SECRET_NAME=m-qemu-standard-pc-q35-ich9-2009-not-specified
Jul 15 06:15:40 localhost elemental-operator[1600]: ELEMENTAL_SYSTEM_AGENT_SECRET_NAMESPACE=fleet-default
Jul 15 06:15:40 localhost elemental-operator[1651]: INFO[2022-07-15T06:15:40Z] Starting elemental version 0.0.15
Jul 15 06:15:40 localhost elemental-operator[1651]: INFO[2022-07-15T06:15:40Z] Install called
Jul 15 06:15:40 localhost elemental-operator[1651]: INFO[2022-07-15T06:15:40Z] Running before-install hook
Jul 15 06:15:40 localhost elemental-operator[1651]: INFO[2022-07-15T06:15:40Z] Partitioning device...
Jul 15 06:15:40 localhost elemental-operator[1651]: ERRO[2022-07-15T06:15:40Z] Failed creating partition: exit status 1
Jul 15 06:15:40 localhost elemental-operator[1651]: ERRO[2022-07-15T06:15:40Z] Failed creating bios partition
Jul 15 06:15:40 localhost elemental-operator[1651]: Error: 1 error occurred:
Jul 15 06:15:40 localhost elemental-operator[1651]:         * exit status 1
Jul 15 06:15:40 localhost systemd[1]: elemental-operator.service: Main process exited, code=exited, status=1/FAILURE
Jul 15 06:15:40 localhost elemental-operator[1600]: time="2022-07-15T06:15:40Z" level=fatal msg="failed to write elemental config: exit status 1"
Jul 15 06:15:40 localhost systemd[1]: elemental-operator.service: Failed with result 'exit-code'.
Jul 15 06:15:40 localhost systemd[1]: Failed to start Elemental Operator Registration.

There are several things missing from this log:

  • "Partitioning device..." should also print the actual device it's partitioning
  • "Failed creating partition" should print the partition number
  • How is the partition created? Is parted (or fdisk) being called? What is the full output of the partitioning tool?
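A minimal Go sketch of the kind of error reporting asked for above. It is not the elemental code: PartitionError and runPartitionTool are hypothetical names, and the tool being run is whatever the caller passes in, since the issue itself does not say which tool is used. The point is only that the device, the partition number, the exact command and the tool's combined stdout/stderr all end up in the error.

```go
package main

import (
	"fmt"
	"os/exec"
)

// PartitionError (hypothetical type) carries everything the log above lacks.
type PartitionError struct {
	Device    string
	Partition int
	Command   string
	Output    string
	Err       error
}

func (e *PartitionError) Error() string {
	return fmt.Sprintf("failed creating partition %d on %s: %v\ncommand: %s\noutput:\n%s",
		e.Partition, e.Device, e.Err, e.Command, e.Output)
}

// Unwrap exposes the underlying exec error to errors.Is / errors.As.
func (e *PartitionError) Unwrap() error { return e.Err }

// runPartitionTool runs the given tool and, on failure, returns an error that
// names the device and partition and includes the tool's full output.
func runPartitionTool(device string, partition int, tool string, args ...string) error {
	cmd := exec.Command(tool, args...)
	out, err := cmd.CombinedOutput() // stdout and stderr together
	if err != nil {
		return &PartitionError{
			Device:    device,
			Partition: partition,
			Command:   cmd.String(),
			Output:    string(out),
			Err:       err,
		}
	}
	return nil
}

func main() {
	// Stand-in command that fails the way a partitioning tool might.
	err := runPartitionTool("/dev/sda", 1, "sh", "-c", "echo 'device busy' >&2; exit 1")
	fmt.Println(err)
}
```

With something like this, "exit status 1" would be accompanied by the tool's own message instead of replacing it.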

Chunked upload of machineInventory smbios data

(Was: elemental-installer to honor nginx's packet size limits rancher/elemental#137)

"we figured out that issue with the Bad Websocket on the ROS installer. NGINX ingress by default needs flags appended to allow for headers larger than 8k. The header we were sending (containing MachineInventory data) was ~17k. This could be a bit of an issue since it requires modifications to the default configuration of nginx ingress which both RKE1 and 2 default to."

See also: https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/configmap/#large-client-header-buffers
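As a rough illustration of the chunking idea, here is a small Go sketch that splits a payload into pieces below a size limit. splitChunks is a hypothetical helper, the 4 KiB chunk size is an assumption chosen to stay under the 8k limit quoted above, and a real implementation would also have to account for header encoding overhead and for reassembly on the server.

```go
package main

import "fmt"

// splitChunks cuts data into consecutive pieces of at most size bytes.
// The returned slices share memory with data; nothing is copied.
func splitChunks(data []byte, size int) [][]byte {
	if size <= 0 {
		return nil
	}
	var chunks [][]byte
	for len(data) > 0 {
		n := size
		if len(data) < n {
			n = len(data)
		}
		chunks = append(chunks, data[:n])
		data = data[n:]
	}
	return chunks
}

func main() {
	payload := make([]byte, 17*1024) // roughly the ~17k mentioned above
	chunks := splitChunks(payload, 4*1024)
	for i, c := range chunks {
		fmt.Printf("chunk %d: %d bytes\n", i, len(c))
	}
}
```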

Custom cloud-init support in machineregistration crd

The config.Config struct, which ends up being the machine registration crd spec, includes a free-form map[string]interface{} meant to be used as a way to provide custom cloud-init configuration. Currently elemental-operator does not process it; the former ros-installer logic had some convoluted way to take it into account under some circumstances.

More specifically, elemental-operator creates its own cloud-init file for root password setup and root ssh keys setup (based on the config.Config.Elemental.Install data) and passes it to the elemental-cli as config.Config.Elemental.Install.ConfigURL. This logic has two consequences:

  • Setting configurl in the machine registration crd is ignored, as it is later overwritten by the self-created cloud-init file path.
  • If config.Config.Data is stored to a file and passed as a cloud-init config to elemental-cli through config.Config.Elemental.Install.ConfigURL, then we would silently ignore the password and ssh-keys setup from config.Config.Elemental.Install. I don't think merging both cloud-init configs is reasonable, as we support two different syntaxes, yip and cloud-init.

Probably what we should do is remove config.Config.Elemental.Install.Password and config.Config.Elemental.Install.SSHKeys from Install, along with their associated logic, and simply use config.Config.Data (or another free-form field) as the only cloud-init data source, explaining with documentation and examples how to create and set users. Less logic, more flexibility.

Also if we go that path I believe Data should belong to Elemental struct or even to Install struct and probably be renamed to something as obvious as CloudInit.

Any thoughts about the suggested approach are welcome 😄

Set hostname in cloud-config?

Previously, the hostname property was allowed in the cloud-config section to set the server hostname upon booting. Now this property is not allowed anymore, and machineName can be used for this only if it is explicitly set (this means that the default value of machineName will not be used to set the hostname if nothing is provided).
EDIT: on my last test the hostname is always localhost, so it seems that machineName is not used anymore to set the hostname? In fact after a reboot I have something like rancher-XXXXX as a hostname...

I tried to set machineName to something else: machineName: node-{{ trunc 4 .MachineID }} but in that case the automated installation is not executed, and MachineInventories is not created in Rancher Dashboard either. No error in the operator logs; I can see that Rancher Dashboard is contacted but nothing more.

We need a way to set the hostname, as setting machineName to something else doesn't seem to work (sounds like the clusterName property in Rancher, maybe machineName should always be auto-generated and read-only?). Maybe DHCP can be used to set a hostname, but a lot of customers may need to be able to force a hostname with a fixed IP.

Cloud config which is pulled from Kubernetes is unmarshalling into a struct

Previously we were unmarshalling the cloudConfig returned by Kubernetes into a map[string]interface{}, so it could be free-form. More importantly, when marshalled again down the line, it would also keep fields which are not part of the elemental configuration (for example, extra configuration that is part of the cloud-init syntax, which would otherwise be wiped out).

This card is about moving back to free-form unmarshalling so we can also keep keys that are not part of our configuration structure.
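The difference between the two approaches can be sketched in Go as follows. This uses encoding/json from the standard library purely for illustration (the operator may well go through a YAML library instead), and typedConfig is a hypothetical, cut-down struct, not the real config.Config.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// typedConfig is a hypothetical struct that only knows about "elemental".
type typedConfig struct {
	Elemental map[string]interface{} `json:"elemental,omitempty"`
}

// roundTripTyped decodes into the struct and encodes again.
// Keys the struct does not declare are silently dropped.
func roundTripTyped(in []byte) (string, error) {
	var c typedConfig
	if err := json.Unmarshal(in, &c); err != nil {
		return "", err
	}
	out, err := json.Marshal(c)
	return string(out), err
}

// roundTripFreeform decodes into a map and encodes again.
// Every key survives, whether or not it is part of our configuration.
func roundTripFreeform(in []byte) (string, error) {
	var m map[string]interface{}
	if err := json.Unmarshal(in, &m); err != nil {
		return "", err
	}
	out, err := json.Marshal(m)
	return string(out), err
}

func main() {
	in := []byte(`{"elemental":{"install":{"device":"/dev/sda"}},"users":[{"name":"root"}]}`)
	typed, _ := roundTripTyped(in)
	free, _ := roundTripFreeform(in)
	fmt.Println("typed:   ", typed) // the "users" key is gone
	fmt.Println("freeform:", free)  // the "users" key is kept
}
```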

Split elemental-operator server and client side in two different binaries

This card is to split the elemental-operator register subcommand out into its own binary. The server side is installed in the management cluster via a helm chart. The client side is something to include in the elemental nodes' OS to ensure they can eventually register and bootstrap, hence none of the controller logic is needed on the nodes.

Panic when trying to add labels via elemental-operator when no labels exist

If I register a machine and don't specify any labels to add at creation time, future calls to elemental-operator register --label "mykey=myval" hang. If I look at the logs in the operator, I see:

2022/07/27 19:44:52 http: panic serving pipe: assignment to entry in nil map
goroutine 1585 [running]:
net/http.(*conn).serve.func1()
        /usr/local/go/src/net/http/server.go:1802 +0xb9
panic({0x1b1a3e0, 0x20359e0})
        /usr/local/go/src/runtime/panic.go:1047 +0x266
github.com/rancher/elemental-operator/pkg/server.(*InventoryServer).ServeHTTP(0xc0016cd100, {0x2071e70, 0xc000988000}, 0x2032db8)
        /src/pkg/server/register.go:88 +0x55e
github.com/rancher/steve/pkg/auth.ToMiddleware.func1.1({0x2071e70, 0xc000988000}, 0xc00063d300)
        /go/pkg/mod/github.com/rancher/[email protected]/pkg/auth/filter.go:167 +0x3d4
net/http.HandlerFunc.ServeHTTP(0x0, {0x2071e70, 0xc000988000}, 0x464b8e)
        /usr/local/go/src/net/http/server.go:2047 +0x2f
net/http.serverHandler.ServeHTTP({0xc002732630}, {0x2071e70, 0xc000988000}, 0xc00063d300)
        /usr/local/go/src/net/http/server.go:2879 +0x43b
net/http.(*conn).serve(0xc0009280a0, {0x207a700, 0xc001d59440})
        /usr/local/go/src/net/http/server.go:1930 +0xb08
created by net/http.(*Server).Serve
        /usr/local/go/src/net/http/server.go:3034 +0x4e8

It looks like we need to check for nil at https://github.com/rancher/elemental-operator/blob/main/pkg/server/register.go#L81
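For reference, the nil check would look roughly like this in Go. The inventory type and addLabels function are hypothetical stand-ins, not the code in register.go; the sketch only shows that writing to a nil map panics with exactly the message above unless the map is allocated first.

```go
package main

import "fmt"

// inventory is a hypothetical stand-in for the object whose labels are updated.
type inventory struct {
	Labels map[string]string
}

// addLabels copies extra into inv.Labels, allocating the map first when it is
// nil. Without the nil check, the assignment below panics with
// "assignment to entry in nil map".
func addLabels(inv *inventory, extra map[string]string) {
	if inv.Labels == nil {
		inv.Labels = make(map[string]string, len(extra))
	}
	for k, v := range extra {
		inv.Labels[k] = v
	}
}

func main() {
	inv := &inventory{} // registered without labels, so Labels is nil
	addLabels(inv, map[string]string{"mykey": "myval"})
	fmt.Println(inv.Labels["mykey"])
}
```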

Operator writes config.yaml as JSON

After installing with elemental-operator there's /etc/rancher/elemental/agent/config.yaml with the following content

{"workDirectory":"/var/lib/elemental/agent/work","localEnabled":true,"localPlanDirectory":"/var/lib/elemental/agent/plans","appliedPlanDirectory":"/var/lib/elemental/agent/applied","remoteEnabled":true,"connectionInfoFile":"/var/lib/elemental/agent/elemental_connection.json"}

After node bootstrap, when rancher-system-agent takes over, it writes /etc/rancher/agent/config.yaml with

workDirectory: /var/lib/rancher/agent/work
appliedPlanDirectory: /var/lib/rancher/agent/applied
remoteEnabled: true
localEnabled: false
localPlanDirectory: /var/lib/rancher/agent/plans
preserveWorkDirectory: false
connectionInfoFile: /var/lib/rancher/agent/rancher2_connection_info.json

elemental-operator hangs after setting labels

system-agent calls elemental-operator as

elemental-operator register --label \"elemental.cattle.io/ExternalIP=$(hostname -I | awk '{print $1}')\" --label \"elemental.cattle.io/InternalIP=$(hostname -I | awk '{print $2}')\"

This results in the respective labels being added to the machineInventory (good!), but it also makes elemental-operator hang indefinitely (bad).

When called with --label, elemental-operator should only set labels and not try to call elemental install.
It should also return immediately after setting the labels.
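The requested control flow could be outlined in Go like this. Everything here is hypothetical: parseLabels and register are illustrative names, and setLabels and install are placeholders for whatever the client really does. The sketch only shows the early return when labels are given.

```go
package main

import (
	"fmt"
	"strings"
)

// parseLabels turns "key=value" strings into a map and rejects malformed entries.
func parseLabels(args []string) (map[string]string, error) {
	labels := make(map[string]string, len(args))
	for _, a := range args {
		k, v, ok := strings.Cut(a, "=")
		if !ok || k == "" {
			return nil, fmt.Errorf("invalid label %q, expected key=value", a)
		}
		labels[k] = v
	}
	return labels, nil
}

// register outlines the proposed behaviour: with labels, set them and return
// immediately; only without labels fall through to the install step.
func register(labelArgs []string, setLabels func(map[string]string) error, install func() error) error {
	if len(labelArgs) > 0 {
		labels, err := parseLabels(labelArgs)
		if err != nil {
			return err
		}
		return setLabels(labels) // no install, no waiting
	}
	return install()
}

func main() {
	err := register(
		[]string{"elemental.cattle.io/ExternalIP=10.0.0.5"},
		func(l map[string]string) error { fmt.Println("setting labels:", l); return nil },
		func() error { fmt.Println("installing"); return nil },
	)
	fmt.Println("error:", err)
}
```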

"registration:" components get dropped

Running kubectl apply -f with this registration.yaml

apiVersion: elemental.cattle.io/v1beta1
kind: MachineRegistration
metadata:
  name: rpi-cluster-nodes
  namespace: fleet-default
spec:
  config:
    cloud-config:
      users:
      - name: root
        passwd: root
    elemental:
      registration:
        emulated-tpm: true
        emulate-tpm-seed: 4
        no-smbios: false
      install:
        automatic: true
        reboot: true
        debug: true
        device: /dev/sdb
  machineName: m-${System Information/Manufacturer}-${System Information/Product Name}-${System Information/UUID}

causes elemental-operator to drop the registration: components. The Rancher manager UI only shows the following (note the empty registration value under spec.config.elemental):

apiVersion: elemental.cattle.io/v1beta1
kind: MachineRegistration
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"elemental.cattle.io/v1beta1","kind":"MachineRegistration","metadata":{"annotations":{},"name":"rpi-cluster-nodes","namespace":"fleet-default"},"spec":{"config":{"cloud-config":{"users":[{"name":"root","passwd":"root"}]},"elemental":{"install":{"automatic":true,"debug":true,"device":"/dev/sdb","reboot":true},"registration":{"emulate-tpm-seed":4,"emulated-tpm":true,"no-smbios":false}}},"machineName":"m-${System Information/Manufacturer}-${System Information/Product Name}-${System Information/UUID}"}}
  creationTimestamp: "2022-08-03T09:02:28Z"
  finalizers:
  - wrangler.cattle.io/machine-registration
  generation: 2
  managedFields:
  - apiVersion: elemental.cattle.io/v1beta1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
          .: {}
          v:"wrangler.cattle.io/machine-registration": {}
      f:spec:
        f:config:
          f:elemental:
            f:system-agent: {}
    manager: elemental-operator
    operation: Update
    time: "2022-08-03T09:02:28Z"
  - apiVersion: elemental.cattle.io/v1beta1
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        .: {}
        f:conditions: {}
        f:registrationToken: {}
        f:registrationURL: {}
        f:serviceAccountRef:
          .: {}
          f:kind: {}
          f:name: {}
          f:namespace: {}
    manager: elemental-operator
    operation: Update
    subresource: status
    time: "2022-08-03T09:02:28Z"
  - apiVersion: elemental.cattle.io/v1beta1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:kubectl.kubernetes.io/last-applied-configuration: {}
      f:spec:
        .: {}
        f:config:
          .: {}
          f:cloud-config:
            .: {}
            f:users: {}
          f:elemental:
            .: {}
            f:install:
              .: {}
              f:debug: {}
              f:device: {}
              f:reboot: {}
            f:registration: {}
        f:machineName: {}
    manager: kubectl-client-side-apply
    operation: Update
    time: "2022-08-03T09:02:28Z"
  name: rpi-cluster-nodes
  namespace: fleet-default
  resourceVersion: "15314"
  uid: 97801245-063d-4216-87dd-7d682199a7df
spec:
  config:
    cloud-config:
      users:
      - name: root
        passwd: root
    elemental:
      install:
        debug: true
        device: /dev/sdb
        reboot: true
      registration: {}
      system-agent: {}
  machineName: m-${System Information/Manufacturer}-${System Information/Product Name}-${System
    Information/UUID}
status:
  conditions:
  - lastUpdateTime: "2022-08-03T09:02:28Z"
    status: "True"
    type: Ready
  registrationToken: 4zgtpc66qkhhgvktsk75gzrqt4wrbpn2bptm4dvhf7s2xv8xh5rmk9
  registrationURL: https://192.168.0.33/elemental/registration/4zgtpc66qkhhgvktsk75gzrqt4wrbpn2bptm4dvhf7s2xv8xh5rmk9
  serviceAccountRef:
    kind: ServiceAccount
    name: rpi-cluster-nodes
    namespace: fleet-default

Add "smbios:" to machineRegistration options

When I download the machineRegistration, it already includes many of the possible #cloudConfig.elemental.install settings. 👍

However, it is missing the smbios: setting (which should default to true)

metadata.labels lost its flexibility

Andrew's (successful) run with Jacob's code had

  machineInventoryLabels:
    clusterName: test
  machineName: m-${System Information/Manufacturer}-${System Information/Product Name}-${SystemInformation/UUID}

in its machineRegistration

However, with the current elemental-operator, the line above gets rejected with

time="2022-07-14T12:55:56Z" level=error msg="error creating machine inventory: MachineInventory.elemental.cattle.io \"m-qemu-standard-pc-q35-ich9-2009-not-specified\" is invalid: [metadata.labels: Invalid value: \"m-${SystemInformation/Manufacturer}-${System Information/Product Name}-${SystemInformation/UUID}\": must be no more than 63 characters

in the elemental-operator log in the management cluster.

Even shortening it to

 machineName: m-${SystemInformation/UUID}

still doesn't work

time="2022-07-14T15:25:10Z" level=error msg="error creating machine inventory: MachineInventory.elemental.cattle.io \"m-qemu-standard-pc-q35-ich9-2009-not-specified\" is invalid: metadata.labels: Invalid value: \"m-${SystemInformation/UUID}\": a valid label must be an empty string or consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyValue',  or 'my_value',  or '12345', regex used for validation is '(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?')" 

Config variables mapping to and from machine registration crd

The machine registration crd spec is what ends up populating the config.Config struct in elemental-operator. When registering and fetching all the data, it looks like the data we retrieve does not match the config.Config variable names. It is unclear to me how this marshal and unmarshal process works, but I am pretty convinced we have some divergences in there.

I also bet this could be related to #46 as we already have a noSMBIOS variable inside config.Config.Elemental.Registration.
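One way such a divergence can arise, sketched in Go with encoding/json: if a struct tag does not match the key used in the spec, decoding succeeds but the field silently keeps its zero value. The struct and tag names below are invented for the illustration and are not taken from config.Config; whether this is what actually happens in the operator is an open question.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mismatched is a hypothetical struct whose tag differs from the spec key.
type mismatched struct {
	EmulateTPM bool `json:"emulateTPM"`
}

// matching is the same struct with a tag equal to the spec key.
type matching struct {
	EmulateTPM bool `json:"emulated-tpm"`
}

// decodeMismatched returns the field value after decoding with the wrong tag.
func decodeMismatched(spec []byte) (bool, error) {
	var c mismatched
	err := json.Unmarshal(spec, &c)
	return c.EmulateTPM, err
}

// decodeMatching returns the field value after decoding with the right tag.
func decodeMatching(spec []byte) (bool, error) {
	var c matching
	err := json.Unmarshal(spec, &c)
	return c.EmulateTPM, err
}

func main() {
	spec := []byte(`{"emulated-tpm": true}`)
	a, errA := decodeMismatched(spec)
	b, errB := decodeMatching(spec)
	fmt.Println(a, errA) // the value is lost without any error
	fmt.Println(b, errB)
}
```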

Remove indirect dependencies

As far as I understand all indirect dependencies are not really needed? So why are they there? Should we run a go mod tidy and clear them up?

Labels on machine-inventory-selector-ready are doing nothing

There is this part of the code[0] in which we check if the inventory has the following labels:

  • elemental.cattle.io/ExternalIP
  • elemental.cattle.io/InternalIP
  • elemental.cattle.io/Hostname

This is done in order to set them in MachineInventorySelector.Status.Addresses. I'm not sure where those end up, but they might be propagated to cluster.x-k8s.io/v1beta1/Machine.Status.Addresses.

The problem is that those labels are currently set by the bootstrap[1], and by that time MachineInventorySelector.Ready is true, so the controller never gets to the point of picking up the labels.

I'm not sure of the repercussions of this, but I notice that while we are setting the ExternalIP to some value, on the machine it appears as InternalIP.

It also seems that as part of the bootstrap we are not setting elemental.cattle.io/Hostname, so maybe that is the reason for the hostname change on restart?

So either drop that extra labeling on bootstrap, which never serves its purpose, or rework the controller to always pick up labels that are updated afterwards.

[0] https://github.com/rancher/elemental-operator/blob/main/pkg/controllers/machineinventoryselector/controller.go#L116
[1] https://github.com/rancher/elemental-operator/blob/main/pkg/controllers/machineinventoryselector/bootstrap.go#L136

ca-cert in config file breaks SSL for real certs

If I have a real cert for my Rancher Manager (or something farther upstream is terminating the SSL), the inclusion of ca-cert makes the registration command fail. While I can delete the key in the config that gets loaded in at install time, it's much harder to change the value that's loaded into /oem/registration/config.yaml for use in updating labels at run time.

What would be nice is to have a way to tell the registration endpoint to not pass ca-cert through.

elemental-system-agent vs. rancher-system-agent -- there's only place for one

elemental-operator register writes config for elemental-system-agent (built from github.com/rancher/system-agent)

The only thing elemental-system-agent does is to download and install rancher-system-agent (built, you guessed it, from github.com/rancher/system-agent)

๐Ÿคฆโ€โ™‚

We should understand what kind of configs elemental-operator writes, change them to be rancher-system-agent compatible and only start rancher-system-agent. See also #60

Background info: The "official" (?) way to download and start rancher-system-agent is a shell script, provided by the management cluster: curl -fL https://<rancher-url>/system-agent-install.sh

  • What does this script do ?
  • What does elemental-operator register do ?

Make sure we generate the correct helm chart

Problem

We have upstream operator images and OBS images, but we want to generate only one chart that can be used for both (the image should be tweakable from the values file).

Action item

  • Make sure we build from OBS the chart pointing to the release tagged images
  • Make sure charts from github releases workflows are working as well
  • Have a "-dev" chart pointing to master
  • Setup CI workflow for releases (GH/OBS)
  • Make sure all images referenced in operator are actually existing after merge of #3
  • help @kkaempf with OBS helm chart packaging
  • publish helm chart on registry.suse.com

"reboot: true" isn't honored

Setting

  reboot: true

in the machineRegistration isn't honored by elemental-operator. It just exits after installation.

elemental-operator register --label "..." hangs

While setting labels at installation time works, see

"elemental-operator register --label \"elemental.cattle.io/ExternalIP=$(hostname -I | awk '{print $1}')\" --label \"elemental.cattle.io/InternalIP=$(hostname -I | awk '{print $2}')\"",

I can't get a similar elemental-operator register --label "..." call to work in the installed system.

All I see is the operator version and the (correct) TPM hash with its wss://... url, and then elemental-operator just hangs.

`CACert` is missing and `registrationURL` it's "incomplete" from the MachineInventory status spec

Currently the generated registrationURL no longer includes the CACert url, and the registrationURL is incomplete, as it is not populated automatically from the rancher settings.

Action Items

  • Bring back CACert to be printed out in the generated cloud config. If not provided it should print out the one from the rancher settings
  • Bring back code to retrieve the rancher host from the rancher settings

Create a new entry point in elemental-operator server to download the registration config required on the client side

This issue is a result of the discussion in #67 (comment)

Currently, after adding a new machine registration crd into the management cluster, a new registrationURL is dynamically created, including the token required for this registration. To register, a call of elemental-operator register is required on the client side. To perform such a registration, the client requires the registrationURL and the associated certificate. To pass these two parameters we either use the command line elemental-operator register --registration-url https://... --registration-ca-cert <path|certificate_string> or embed this data into a config yaml file:

elemental:
  registration:
    url: https://...
    ca-cert: <certificate string>

For convenience, currently, we can instruct the elemental-operator service to generate this yaml by an unauthenticated get request to the registration URL. As noted in #67 (comment) this is odd and most probably a bad practice.

Currently the registrationURL looks like https://<hostname>/elemental/registration/<generatedtoken>. To obtain this URL we could do something like:

kubectl get -n fleet-default machineregistration.elemental.cattle.io <machineregistrationname> -o yaml

where the resulting yaml includes status.registrationURL. Then using this URL to run an unauthenticated get request responds with the registration config yaml including this same URL and the associated certificate.

Wouldn't it be better to implement a new entry point on the server side with something like

http://<hostname>/elemental/registration/config/<machineregistrationname>

that responds with

elemental:
 registration:
   url: https://...
   ca-cert: <certificate string>

and, as suggested in the mentioned comment, simply return 403 for unauthenticated calls to the registrationURL?

Allow for setting static IP address in MachineInventory

While this can be done with manually editing the system-agent "plan", it would be very useful to allow for setting a static IP address just by changing a field in the MachineInventory (even if the underlying data path is the same).

We would then likely need to have a status field stating that it's waiting for a change to be validated that goes back to "ready" after the application gets set and communicated back to the operator.

Use cloud-init files instead of a unit service file for elemental-operator

Elemental-operator is called by an elemental-operator.service file included in the Elemental OS. From the #50 investigations it looks like this has some potential to collide with other ongoing services. More specifically, it seems to have issues when executed before cos-setup-network.service ends (which happens one or two minutes after reaching the login prompt). I am wondering if we shouldn't rely on calling elemental-operator from one of our yip stages (network.after?) for a simpler, more cloud-init centric boot process management. This could be extended to other one-shot services too, if any.

This is also to prevent having to run changes such as https://build.opensuse.org/request/show/990272 which are not obvious.

Checksum issue in MachineInventories

Environment:

  • Rancher 2.6.6
  • Operator version 0.4.2, commit f243498, commit date git20220802

Issue seen in the rancher/elemental E2E test: https://github.com/rancher/elemental/runs/7648734615?check_suite_focus=true

After a manual test, the issue is that after Elemental OS installation the new node doesn't bootstrap the cluster part. With a quick check I can see that there is an issue with the checksum in MachineInventories: the appliedChecksum and the checksum variables are different, and there is now a failedChecksum variable (which is expected in that case):

apiVersion: elemental.cattle.io/v1beta1
kind: MachineInventory
metadata:
  creationTimestamp: "2022-08-03T09:44:13Z"
  generation: 1
  labels:
    cluster-id: id-cluster-k3s
    manager: elemental-operator
    operation: Update
    subresource: status
    time: "2022-08-03T09:46:13Z"
  name: m-qemu-standard-pc-q35-ich9-2009-55392308-8b8f-4a04-b3b9-6
  namespace: fleet-default
  ownerReferences:
  - apiVersion: elemental.cattle.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: MachineInventorySelector
    name: selector-cluster-k3s-nw4p6
    uid: 6432f38f-30f4-4905-9343-75519d0b7e56
  resourceVersion: "579181"
  uid: abc09ba2-65c1-4286-8346-2c61b7cd3e89
spec:
  tpmHash: 81ac219a55a63d5323da922872a5a27aee2f8cfb780dce328664ed0661f9a1e1
status:
  conditions:
  - lastUpdateTime: "2022-08-03T09:44:13Z"
    status: "True"
    type: Initialized
  - lastUpdateTime: "2022-08-03T09:46:13Z"
    status: "True"
    type: Ready
  - lastUpdateTime: "2022-08-03T09:46:13Z"
    status: "True"
    type: PlanReady
  plan:
    appliedChecksum: 44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a
    checksum: 17079797e37c49cf93f87d645b21f1484a4b706097763e9b02e81ca98fa055b8
    failedChecksum: 17079797e37c49cf93f87d645b21f1484a4b706097763e9b02e81ca98fa055b8
    secretRef:
      name: m-qemu-standard-pc-q35-ich9-2009-55392308-8b8f-4a04-b3b9-6
      namespace: fleet-default

I have no issue with an ISO image generated on 20220801 with this operator version:

  • Operator version 0.4.1, commit 6b52b44, commit date git20220729

So the issue appeared between the two versions, but I wasn't able to find when exactly.
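
For reference, the relationship between the three checksum fields in the dump above can be sketched as follows. The state names ("failed", "applied", "pending") are illustrative labels, not values used by the operator. The sketch also compares the reported appliedChecksum with the SHA-256 of the two-byte string `{}`; that this points to an empty plan having been applied last is only an assumption.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// PlanStatus mirrors the checksum fields shown under status.plan.
type PlanStatus struct {
	Checksum        string
	AppliedChecksum string
	FailedChecksum  string
}

// Classify returns an illustrative label for the plan state.
func Classify(p PlanStatus) string {
	switch {
	case p.Checksum == "":
		return "no plan"
	case p.FailedChecksum == p.Checksum:
		return "failed"
	case p.AppliedChecksum == p.Checksum:
		return "applied"
	default:
		return "pending"
	}
}

// sha256Hex returns the hex-encoded SHA-256 digest of data.
func sha256Hex(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

func main() {
	// Values copied from the MachineInventory in this report.
	p := PlanStatus{
		Checksum:        "17079797e37c49cf93f87d645b21f1484a4b706097763e9b02e81ca98fa055b8",
		AppliedChecksum: "44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a",
		FailedChecksum:  "17079797e37c49cf93f87d645b21f1484a4b706097763e9b02e81ca98fa055b8",
	}
	fmt.Println("plan state:", Classify(p))
	fmt.Println("appliedChecksum is sha256 of {}:", p.AppliedChecksum == sha256Hex([]byte("{}")))
}
```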

SMBIOS data missing from machineInventory

Although elemental-register --debug shows

...
      NoSMBIOS: false,
...

there's no SMBIOS data in the created machineInventory

apiVersion: elemental.cattle.io/v1beta1
kind: MachineInventory
metadata:
  creationTimestamp: "2022-08-03T09:11:58Z"
  generation: 1
  managedFields:
  - apiVersion: elemental.cattle.io/v1beta1
    fieldsType: FieldsV1
    fieldsV1:
      f:spec:
        .: {}
        f:tpmHash: {}
    manager: elemental-operator
    operation: Update
    time: "2022-08-03T09:11:58Z"
  - apiVersion: elemental.cattle.io/v1beta1
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        .: {}
        f:conditions: {}
        f:plan:
          .: {}
          f:checksum: {}
          f:secretRef:
            .: {}
            f:name: {}
            f:namespace: {}
    manager: elemental-operator
    operation: Update
    subresource: status
    time: "2022-08-03T09:11:58Z"
  name: m-raspberrypi-rpi-30303031-3030-3030-6130-393533306132
  namespace: fleet-default
  resourceVersion: "18822"
  uid: 6b9d3e35-7866-4e8c-94a6-8a4f3cc18a10
spec:
  tpmHash: 2d4bb87828a11cd790634d33e51cfacc9ea1cbb7937eda4e9210eed9c7410f8b
status:
  conditions:
  - lastUpdateTime: "2022-08-03T09:11:58Z"
    status: "True"
    type: Initialized
  - lastUpdateTime: "2022-08-03T09:11:58Z"
    message: waiting for plan to be applied
    status: "False"
    type: Ready
  - lastUpdateTime: "2022-08-03T09:11:58Z"
    message: waiting for plan to be applied
    status: "False"
    type: PlanReady
  plan:
    checksum: 44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a
    secretRef:
      name: m-raspberrypi-rpi-30303031-3030-3030-6130-393533306132
      namespace: fleet-default

"Fetched configuration from manager cluster:" is empty

Environment

  • Rancher 2.6.7-rc5
  • Operator version 0.4.2, commit f243498, commit date git20220802

Client node on KVM

live-cloud-config.yaml

elemental:
  registration:
    url: https://192.168.0.33/elemental/registration/52qcfrjxzww69q5hzt8kn62rprlw726qhc5srcjqhvr7hxwhmvmpsv
    ca-cert: |-
      -----BEGIN CERTIFICATE-----
      MIIBvjCCAWOgAwIBAgIBADAKBggqhkjOPQQDAjBGMRwwGgYDVQQKExNkeW5hbWlj
      bGlzdGVuZXItb3JnMSYwJAYDVQQDDB1keW5hbWljbGlzdGVuZXItY2FAMTY1OTUx
      NTM0NzAeFw0yMjA4MDMwODI5MDdaFw0zMjA3MzEwODI5MDdaMEYxHDAaBgNVBAoT
      E2R5bmFtaWNsaXN0ZW5lci1vcmcxJjAkBgNVBAMMHWR5bmFtaWNsaXN0ZW5lci1j
      YUAxNjU5NTE1MzQ3MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEjln0FIGi9jbg
      1o8K5wRqv13v41k2zyfCRrKnVYQ7YoOMJ7Y8DWzB6jYXcFdy9X02hsnU/ji4mOD1
      H/OIyZ6g3qNCMEAwDgYDVR0PAQH/BAQDAgKkMA8GA1UdEwEB/wQFMAMBAf8wHQYD
      VR0OBBYEFKmWVNdsdI+3Uzc/K2D5vKHtladEMAoGCCqGSM49BAMCA0kAMEYCIQC5
      4Q0+vZskKYwWGCusF4YovjLfVBvwJlrX2/fmAU9YnwIhAPchW2a8GgA1dtgVmhmk
      VK+z+TCHPBJkGzhYciFWFExu
      -----END CERTIFICATE-----

Client fails to read machineRegistration from management cluster:

# elemental-register --debug /run/initramfs/live
INFO[0000] Register version 0.4.2, commit f243498, commit date git20220802 
DEBU[0000] scanning config path /run/initramfs/live     
INFO[0000] reading config file /run/initramfs/live/livecd-cloud-config.yaml 
DEBU[0000] input config:
config.Config{
  Elemental: config.Elemental{
    Install: config.Install{
      Firmware: "",
      Device: "",
      NoFormat: false,
      ConfigURLs: nil,
      ISO: "",
      SystemURI: "",
      Debug: false,
      TTY: "",
      PowerOff: false,
      Reboot: false,
      EjectCD: false,
    },
    Registration: config.Registration{
      URL: "https://192.168.0.33/elemental/registration/52qcfrjxzww69q5hzt8kn62rprlw726qhc5srcjqhvr7hxwhmvmpsv",
      CACert: "-----BEGIN CERTIFICATE-----\nMIIBvjCCAWOgAwIBAgIBADAKBggqhkjOPQQDAjBGMRwwGgYDVQQKExNkeW5hbWlj\nbGlzdGVuZXItb3JnMSYwJAYDVQQDDB1keW5hbWljbGlzdGVuZXItY2FAMTY1OTUx\nNTM0NzAeFw0yMjA4MDMwODI5MDdaFw0zMjA3MzEwODI5MDdaMEYxHDAaBgNVBAoT\nE2R5bmFtaWNsaXN0ZW5lci1vcmcxJjAkBgNVBAMMHWR5bmFtaWNsaXN0ZW5lci1j\nYUAxNjU5NTE1MzQ3MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEjln0FIGi9jbg\n1o8K5wRqv13v41k2zyfCRrKnVYQ7YoOMJ7Y8DWzB6jYXcFdy9X02hsnU/ji4mOD1\nH/OIyZ6g3qNCMEAwDgYDVR0PAQH/BAQDAgKkMA8GA1UdEwEB/wQFMAMBAf8wHQYD\nVR0OBBYEFKmWVNdsdI+3Uzc/K2D5vKHtladEMAoGCCqGSM49BAMCA0kAMEYCIQC5\n4Q0+vZskKYwWGCusF4YovjLfVBvwJlrX2/fmAU9YnwIhAPchW2a8GgA1dtgVmhmk\nVK+z+TCHPBJkGzhYciFWFExu\n-----END CERTIFICATE-----",
      EmulateTPM: false,
      EmulatedTPMSeed: 1,
      NoSMBIOS: false,
      Labels: map[string]string{},
    },
    SystemAgent: config.SystemAgent{
      URL: "",
      Token: "",
      SecretName: "",
      SecretNamespace: "",
    },
  },
  CloudConfig: map[string]interface {}(nil),
} 
INFO[0000] Using TPMHash 1c9b6e73145c651a5cb73ee0eed13c2300effb030f4b7dbfefe5e7a63db96308 to dial wss://192.168.0.33/elemental/registration/52qcfrjxzww69q5hzt8kn62rprlw726qhc5srcjqhvr7hxwhmvmpsv 
DEBU[0000] Fetched configuration from manager cluster:

 
DEBU[0000] Computed environment variables:              
INFO[2022-08-03T08:50:18Z] Starting elemental version 0.0.15            
Error: at least a target device must be supplied
FATA[0000] failed calling elemental client: exit status 1 

Metrics endpoint for operator pod

It would be very helpful to expose some basic counters to a Prometheus (or OpenTelemetry) collector.

Some potential metrics off the top of my head:

  • Basic request latencies and active connections
  • Count of MachineInventory with labels
  • Count of MachineInventory without labels
  • Total count of machines registered ever (re-registrations should double count)
  • Count of machines for each OS image tag
  • Average time between onboarding a machine and its registration into a cluster
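
As a rough illustration of the labeled/unlabeled counters, here is a sketch that renders them in the Prometheus text exposition format using only the standard library. The metric name `elemental_machine_inventories` and the `Inventory` type are made up for this example; a real implementation would more likely use a Prometheus client library and the operator's own types.

```go
package main

import (
	"fmt"
	"strings"
)

// Inventory is a simplified stand-in for a MachineInventory object.
type Inventory struct {
	Name   string
	Labels map[string]string
}

// RenderMetrics counts inventories with and without labels and writes
// the result as a gauge in Prometheus text exposition format.
func RenderMetrics(items []Inventory) string {
	var labeled, unlabeled int
	for _, item := range items {
		if len(item.Labels) > 0 {
			labeled++
		} else {
			unlabeled++
		}
	}
	var b strings.Builder
	b.WriteString("# TYPE elemental_machine_inventories gauge\n")
	fmt.Fprintf(&b, "elemental_machine_inventories{labeled=\"true\"} %d\n", labeled)
	fmt.Fprintf(&b, "elemental_machine_inventories{labeled=\"false\"} %d\n", unlabeled)
	return b.String()
}

func main() {
	items := []Inventory{
		{Name: "m-one", Labels: map[string]string{"cluster-id": "id-cluster-k3s"}},
		{Name: "m-two"},
		{Name: "m-three"},
	}
	fmt.Print(RenderMetrics(items))
}
```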

Friendlier URLs for registration endpoint

If something goes wrong with a deployment, it would be nice to have a way to specify a nicer URL that people need to type in. This could also help with identifying which cloud-init was run when a machine registered.

It might also be worth having a way to mark one of these as "default", to simplify the (maybe most common?) use case where only one registration endpoint is required.
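
One possible shape for this, sketched with a hypothetical in-memory `Registry` that maps friendly names to registration tokens and falls back to a default when no name is given. The type, its methods, and the example names and tokens are all assumptions for illustration only.

```go
package main

import (
	"errors"
	"fmt"
)

// Registry maps friendly registration names to their tokens.
type Registry struct {
	tokens      map[string]string
	defaultName string
}

// NewRegistry returns an empty Registry.
func NewRegistry() *Registry {
	return &Registry{tokens: map[string]string{}}
}

// Add stores a token under a friendly name and optionally marks it as default.
func (r *Registry) Add(name, token string, isDefault bool) {
	r.tokens[name] = token
	if isDefault {
		r.defaultName = name
	}
}

// Resolve returns the token for name, or for the default entry when name is empty.
func (r *Registry) Resolve(name string) (string, error) {
	if name == "" {
		if r.defaultName == "" {
			return "", errors.New("no default registration configured")
		}
		name = r.defaultName
	}
	token, ok := r.tokens[name]
	if !ok {
		return "", fmt.Errorf("unknown registration %q", name)
	}
	return token, nil
}

func main() {
	r := NewRegistry()
	r.Add("edge-lab", "example-token-1", true)
	r.Add("factory", "example-token-2", false)

	token, _ := r.Resolve("") // empty name falls back to the default
	fmt.Println(token)
}
```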

Log version, github commit id, or build time

When elemental-operator starts up under Rancher Manager, this is what I see in its log

time="2022-07-13T11:44:57Z" level=info msg="Starting controller at namespace cattle-elemental-system. Upgrade sync interval at: 1h0m0s"
time="2022-07-13T11:44:57Z" level=info msg="Applying CRD managedosimages.elemental.cattle.io"
...

what I'd like to see is something like

time="2022-07-13T11:44:56Z" level=info msg="elemental-operator, version 0.1.1 @bcfe4d0, built 2022-Jul-13 11:32Z"
...
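
A common way to get such a line in Go is to inject the values at build time with `-ldflags "-X ..."` and log them first thing at startup. The sketch below assumes variables named `version`, `commit` and `buildDate` in package `main`; the operator's actual variable names and logger may differ.

```go
package main

import (
	"fmt"
	"log"
)

// These defaults are meant to be overridden at build time, for example:
//
//	go build -ldflags "-X main.version=0.1.1 -X main.commit=bcfe4d0 -X main.buildDate=2022-07-13T11:32Z"
var (
	version   = "dev"
	commit    = "unknown"
	buildDate = "unknown"
)

// banner formats the startup line with the build information.
func banner() string {
	return fmt.Sprintf("elemental-operator, version %s @%s, built %s", version, commit, buildDate)
}

func main() {
	// Log the build information before anything else starts.
	log.Print(banner())
}
```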

TPM emulation settings don't get copied to installed system

While I have

...
spec:
  config:
    cloud-config:
      users:
      - name: root
        passwd: root
    elemental:
      install:
        debug: true
        device: /dev/mmcblk1
        reboot: false
      registration:
        emulate-tpm: true
        emulated-tpm-seed: 3
        no-smbios: false
      system-agent: {}
  machineName: m-${System Information/Manufacturer}-${System Information/Product Name}-${System
    Information/UUID}
...

in the machineRegistration, /oem/registration/config.yaml (in the installed system) does not show the tpm emulation parameters:

elemental:
  registration:
    url: https://192.168.0.33/elemental/registration/pc8slmdvrpwh8tt5tlv79q8s74q2r27sgsz2pxv8bbcpz22kzjbzxm
    ca-cert: |-
      -----BEGIN CERTIFICATE-----
      MIIBpzCCAU2gAwIBAgIBADAKBggqhkjOPQQDAjA7MRwwGgYDVQQKExNkeW5hbWlj
      bGlzdGVuZXItb3JnMRswGQYDVQQDExJkeW5hbWljbGlzdGVuZXItY2EwHhcNMjIw
      ODAzMTQzMzMzWhcNMzIwNzMxMTQzMzMzWjA7MRwwGgYDVQQKExNkeW5hbWljbGlz
      dGVuZXItb3JnMRswGQYDVQQDExJkeW5hbWljbGlzdGVuZXItY2EwWTATBgcqhkjO
      PQIBBggqhkjOPQMBBwNCAASnKoQc7dt6p+cvQAN0HWuVkZ7CQ1xB+uEZ22d9yKnM
      6QnyJAQfohuzGBipibg02NReLnTkY1hu4T3zZOwSmPSzo0IwQDAOBgNVHQ8BAf8E
      BAMCAqQwDwYDVR0TAQH/BAUwAwEB/zAdBgNVHQ4EFgQUV9KbCuKnYr0V/2UH5u8c
      D5Qy6VQwCgYIKoZIzj0EAwIDSAAwRQIhAJHWr4XLCHfSvFK4EBijlTmpieeFxQME
      LPlrJxW+3S3wAiALM+MRedogV/C16tpJDlkOUhyhHAQMyE8KV0K/xDUmMg==
      -----END CERTIFICATE-----
