bottlerocket-os / bottlerocket

An operating system designed for hosting containers

Home Page: https://bottlerocket.dev

License: Other

Shell 1.96% Rust 95.53% Go 2.51%

operating-system containers linux rust

bottlerocket's Issues

Define initial OpenAPI spec

As a first step to building the client (#26), we need to define the OpenAPI spec.

Integrate k8s and EKS related bits into Thar

After investigation / discussion by @bcressey , @tjkirch , and @zmrow we've come up with the action items to make this happen:

Investigate the minimum settings needed to make k8s run pods
Add k8s settings to models in apiserver
Build a tool to handle settings that need to be dynamically set at runtime (Sundog, implemented in #49 )
Build a "setting generator" program that knows how to fetch k8s related settings at runtime
Add base64 decode helper to template library to render decoded base64 text

Prototype: data store migration and helpers

create a migration trait
write a prototype data store migration that implements the trait
create a CLI shell for migrations
write migration helpers for running the migration against given data/metadata

Build and document the on box update workflow

#28 has most of this.

%.makepkg isn't regenerated when sources or Cargo.lock changes

Now that spec2pkg generates rules to automatically download files, including those dynamically referenced from a Cargo.lock, it should be re-run whenever a sources file or a Cargo.lock (if present) is updated.

cc @bcressey who might have some ideas about how to do this, since I don't want to make a sources or Cargo.lock file required for any given package (filesystem should continue to have neither).

Templating of service restart commands

Feedback from #7:

We need templating of restart commands in order to handle commands with arguments that are based on changed settings.

For example, instead of reading /etc/hostname, systemd's hostnamectl accepts the new hostname as an argument. To do this, we'd want to have a template command like hostnamectl set-hostname {{ settings.hostname }}
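
A minimal sketch of what rendering such a command could look like. It uses a hand-rolled `{{ key }}` substitution as a stand-in for the real template library, and `render_command` and the flat settings map are hypothetical names used only for illustration:

```rust
use std::collections::HashMap;

/// Render `{{ key }}` placeholders in a restart command from a flat settings map.
/// Hypothetical helper; fails if a placeholder is unclosed or has no value.
fn render_command(template: &str, settings: &HashMap<&str, &str>) -> Result<String, String> {
    let mut out = String::new();
    let mut rest = template;
    while let Some(start) = rest.find("{{") {
        // Copy the literal text before the placeholder.
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        let end = after
            .find("}}")
            .ok_or_else(|| "unclosed '{{' in template".to_string())?;
        let key = after[..end].trim();
        let value = settings
            .get(key)
            .ok_or_else(|| format!("no value for '{}'", key))?;
        out.push_str(value);
        rest = &after[end + 2..];
    }
    out.push_str(rest);
    Ok(out)
}
```

Failing on a missing key, rather than rendering an empty argument, keeps a half-configured command from being executed.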

Migration: OS integration

Integrate the migrations from #34 and migrator from #62 into the OS.

A service for the migrator to run early during boot: @tjk
Create the default data store with appropriate links, not just as a directory (in apiserver, or in service if done after #119): @zmrow
Data store format indicator in the image: @tjk
Update code to reference /current instead of /v1 for data store path
Automated cleanup of old data stores and migrations

tough: Signature verification fails if ignored data is present

If a metadata file contains a key that we ignore, signature verification will fail because it will not be present when we re-serialize the data to verify the signature.

Actually implement canonical JSON (#111)
Add an "ignored data" map to all the structs (#145)

Extend API server to allow common runc and containerd config options

Using Wants not Requires on configured.target

(See #96)

The kubelet, docker, and containerd service files have dependencies on configured.target that are expressed with Wants instead of Requires.

This means that if configured.target fails at boot, kubelet and friends will start, even though we know we haven't generated the configuration files they need.

(This was done to avoid an even worse problem - if we used Requires, then kubelet and friends would stop or restart if the configuration utilities stopped or restarted; the configuration utilities should definitely not have an impact on the runtime of kubelet and friends once started.)

We couldn't find a way to express what we want in systemd, but we should continue research to show that it doesn't actually exist, and then consider our possible approaches:

augment systemd
wrap kubelet (and friends) to add domain knowledge
your idea here?

'sources' filename isn't checked

I was working on updating Rust and I updated the checksums in the sources file but forgot to update the filenames. I had deleted the files matching those filenames. The package still built. I think it should fail to build if the filenames in sources don't exist or the checksums don't match what's given for an exact filename.

systemd target representing configured system

We need a systemd target representing the point at which all userdata (moondog) and dynamic (sundog) configuration is complete. This is what services like k8s and EKS will depend on, because they need the user's cluster settings to be fully applied to the system before starting.

Create target
Update k8s/EKS service units to depend on it, in addition to network or other existing dependencies

Design Admin container

This should include a narrative on how users will interact with the OS including how they will access host logs, communicate with the API server, debug currently running containers and otherwise introspect the running system.

Migration: API integration

Assuming the API will expose system update status, we will want to expose migration status as part of it - at least as a potential failure reason.

Migration: build system integration

Figure out which migrations should be built and included in the image (could start with a hardcoded list we keep up to date?)
Build the relevant migrations during the build process, each of which will be a Cargo project in a directory under workspaces/api/migration/migrations/vX.Y
Rename the built binaries to include the version from the path (see above) to fit migration conventions. (The final name would be invalid to Cargo, so it must be a rename.)
Install the migrations to /var/lib/thar/datastore/migrations in the image.

Build system should pass --nocheck to rpmbuild when target arch does not match host arch

#7 (comment)

tough: Be more forgiving if the cached root.json is corrupt

The caching is outside of the spec so we have to be careful.

What does happen if the cached root is corrupt? It doesn't mean the datastore.read would necessarily fail, but it still might not verify below, and we don't retry with the given root.
What if the user gave a new root on purpose because they know the old/cached one is bad/outdated? We're ignoring them.

What if we load and verify both and take the higher version number? If the user-supplied root doesn't validate itself, that seems like an Error regardless of the cache; if the cached root doesn't validate itself we should probably remove it and continue with the user-supplied root; if neither validates, obviously Error.

Originally posted by @tjkirch in #38 (comment)
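
The selection rule proposed above can be sketched as follows. `Root`, `Choice`, and `choose_root` are hypothetical names, and each `Result` stands for "this root was loaded and validated itself", which the real library would compute from signatures:

```rust
#[derive(Debug, Clone, PartialEq)]
struct Root {
    version: u64,
}

#[derive(Debug, PartialEq)]
enum RootError {
    /// The user-supplied root did not validate itself.
    UserRootInvalid,
}

/// Which root to trust, and whether the cached copy should be deleted.
#[derive(Debug, PartialEq)]
struct Choice {
    root: Root,
    remove_cache: bool,
}

fn choose_root(
    user: Result<Root, String>,
    cached: Option<Result<Root, String>>,
) -> Result<Choice, RootError> {
    // An invalid user-supplied root is an error regardless of the cache,
    // which also covers the "neither validates" case.
    let user = user.map_err(|_| RootError::UserRootInvalid)?;
    match cached {
        // Both valid: take the higher version number.
        Some(Ok(c)) if c.version > user.version => Ok(Choice { root: c, remove_cache: false }),
        Some(Ok(_)) | None => Ok(Choice { root: user, remove_cache: false }),
        // Corrupt cache: drop it and continue with the user-supplied root.
        Some(Err(_)) => Ok(Choice { root: user, remove_cache: true }),
    }
}
```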

Design the update "server" and "client"

Single settings commit at boot

We should update moondog and sundog so they don't both commit their settings changes at boot, and instead introduce a tiny new service that commits settings. It will depend on moondog and sundog, and the new target from #77 will depend on it, so ordering will be preserved.

Come up with name for committer service
Move commit code from moondog/sundog to this service
Add service and make target from #77 depend on it

tbs: base64 helper should fail if multiple params are given

The base64 helper is built to decode a single parameter. This makes sense in the context of our templates because each parameter is a setting. If a user needs to decode multiple parameters (settings), it should be done in multiple blocks, e.g. {{base64_decode setting1}} : {{base64_decode setting2}}.

Expected behavior
The helper should fail if multiple params are passed to it: {{base64_decode setting1 setting2}}.

Current behavior
The helper will decode the first param in the list and happily carry on with life.
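
A sketch of the expected behavior, assuming the helper receives its parameters as a slice. `base64_decode_helper` is a hypothetical stand-in for the real Handlebars helper, and the small decoder exists only to keep the example free of external crates:

```rust
/// Decode standard-alphabet base64, with or without trailing `=` padding.
fn base64_decode(input: &str) -> Result<Vec<u8>, String> {
    let mut out = Vec::new();
    let mut buf: u32 = 0;
    let mut bits: u32 = 0;
    for c in input.trim_end_matches('=').bytes() {
        let v = match c {
            b'A'..=b'Z' => c - b'A',
            b'a'..=b'z' => c - b'a' + 26,
            b'0'..=b'9' => c - b'0' + 52,
            b'+' => 62,
            b'/' => 63,
            _ => return Err(format!("invalid base64 byte {:#x}", c)),
        };
        // Accumulate 6 bits per character and emit a byte once 8 are available.
        buf = (buf << 6) | u32::from(v);
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((buf >> bits) as u8);
            buf &= (1 << bits) - 1;
        }
    }
    Ok(out)
}

/// Hypothetical helper body: exactly one parameter is accepted.
fn base64_decode_helper(params: &[&str]) -> Result<String, String> {
    match params {
        [one] => String::from_utf8(base64_decode(one)?).map_err(|e| e.to_string()),
        _ => Err(format!(
            "base64_decode requires exactly one parameter, got {}",
            params.len()
        )),
    }
}
```

Matching on the slice shape rejects both the empty and the multi-parameter case in one place.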

moondog: Only run at first boot

Moondog should be able to track the state of the current boot. If the current boot is not the first boot, exit and don't set settings from userdata.

tough: Don't require the developer to check data store path permissions

We currently have to remember to call this. What if we make new() take Into<CheckedPath>, write impl TryInto<CheckedPath> for AsRef<Path> which would call check_permissions, and AsRef<Path> for CheckedPath? And then add join to CheckedPath, which would also check_permissions. No way to forget then.

Originally posted by @tjkirch in #38 (comment)
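
One way the idea could look, as a sketch only. The permission policy in `check_permissions` (no group or world write bit) is an assumption made for illustration, not the library's actual check, and the conversion is written as `TryFrom<&Path>` because a blanket impl over every `AsRef<Path>` type is not something this sketch attempts:

```rust
use std::convert::TryFrom;
use std::fs;
use std::io;
use std::os::unix::fs::PermissionsExt;
use std::path::{Path, PathBuf};

/// Assumed policy: the path must exist and must not be writable by group or others.
fn check_permissions(path: &Path) -> io::Result<()> {
    let mode = fs::metadata(path)?.permissions().mode();
    if mode & 0o022 != 0 {
        return Err(io::Error::new(
            io::ErrorKind::PermissionDenied,
            format!("{} is writable by group or others", path.display()),
        ));
    }
    Ok(())
}

/// A path that can only be constructed after its permissions were checked.
#[derive(Debug)]
struct CheckedPath(PathBuf);

impl TryFrom<&Path> for CheckedPath {
    type Error = io::Error;

    fn try_from(path: &Path) -> Result<Self, Self::Error> {
        check_permissions(path)?;
        Ok(CheckedPath(path.to_path_buf()))
    }
}

impl CheckedPath {
    /// Joining re-checks, so a child path cannot skip the check either.
    fn join<P: AsRef<Path>>(&self, child: P) -> io::Result<CheckedPath> {
        CheckedPath::try_from(self.0.join(child).as_path())
    }
}

impl AsRef<Path> for CheckedPath {
    fn as_ref(&self) -> &Path {
        &self.0
    }
}
```

Because the only constructors run the check, a function that accepts `CheckedPath` cannot be handed an unchecked path.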

tbs: Templates with helpers that require params should not render if no params are supplied

Helpers with required params (base64_decode for example) should fail the template rendering process if no params are given. Currently (with Handlebars 1.1), templates render but don't include the value (because one wasn't supplied).

Helper base64_decode requires one param, e.g. {{base64_decode param}}. A template containing {{base64_decode}} will render, which is undesired behavior.

Migration runner

We need a tool to run the migrations created using #44. At a high level, it needs to:

Take the data store path to migrate, and the version to migrate to
Find appropriate migrations on disk (perhaps being told some potential locations)
Run the migrations on a copy of the data store
If successful, flip the data store link to the copy
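
The copy, migrate, and flip steps above can be sketched as follows, assuming a filesystem data store reached through a symlink. `Migration`, `copy_dir`, and `migrate` are hypothetical names, and a migration is modeled as a plain function rather than an external binary:

```rust
use std::fs;
use std::io;
use std::os::unix::fs::symlink;
use std::path::Path;

/// A migration is modeled here as a function that edits a data store in place.
type Migration = fn(&Path) -> io::Result<()>;

/// Recursively copy a directory tree.
fn copy_dir(from: &Path, to: &Path) -> io::Result<()> {
    fs::create_dir_all(to)?;
    for entry in fs::read_dir(from)? {
        let entry = entry?;
        let dest = to.join(entry.file_name());
        if entry.file_type()?.is_dir() {
            copy_dir(&entry.path(), &dest)?;
        } else {
            fs::copy(entry.path(), &dest)?;
        }
    }
    Ok(())
}

/// Run migrations on a copy of the store behind `link`, then flip the link.
fn migrate(link: &Path, new_store: &Path, migrations: &[Migration]) -> io::Result<()> {
    let current = fs::canonicalize(link)?;
    copy_dir(&current, new_store)?;
    for migration in migrations {
        if let Err(e) = migration(new_store) {
            // On failure the link is untouched; only the copy is discarded.
            let _ = fs::remove_dir_all(new_store);
            return Err(e);
        }
    }
    // Create the new link under a temporary name, then rename it over the
    // old one so readers never observe a missing link.
    let tmp = link.with_extension("tmp");
    symlink(new_store, &tmp)?;
    fs::rename(&tmp, link)
}
```

Working on a copy means a failed migration leaves the original data store and its link exactly as they were.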

Generate all API models using OpenAPI

As part of #26, we may think about forking/writing our own generator for the API client. As part of that effort, it may be a great idea to also write a slimmed-down generator specifically for our API models.

This generator would create a Models library crate that contains only the models that apiserver and thar-be-settings use.

This would have multiple benefits:

Separation of concerns: the apiserver shouldn't need to be coupled to what is in the datastore. It only needs to know how to serve it.
Adding new settings to the API is as simple as adding them to the OpenAPI yaml and re-generating.
Less mucking about in the apiserver Rust code (no need to create structs, etc.)

tough: Add logging

We need some kind of output/logging story so we can output/log when this happens :)

Originally posted by @tjkirch in #38 (comment)

Templates with helpers aren't parsed correctly

Configuration file templates are parsed for settings in thar-be-settings. These settings are used to query the API for their associated values. Following the update to Handlebars 2.x, this parsing is broken.

base64_decode is a helper in both examples below.

Current behavior
Parsing template {{base64_decode settings.hostname}} returns base64_decode

Expected behavior
Parsing template {{base64_decode settings.hostname}} should return setting settings.hostname.

Handle k8s settings changes after boot

Our default settings are missing restart commands for kubernetes. We need to determine the effect of updating the kubernetes settings in terms of system services that need to restart, and then redesign the "services" layout in the default settings, adding restart-commands if needed.

Unfriendly error on /settings/pending when there's nothing pending

This 500 error is not very friendly, making it sound like you've done something wrong when requesting pending settings, when in fact there just are no pending settings. We should return an empty Settings, e.g. {}

$ curl 'localhost:4242/settings/pending'
{"description":"Found no 'settings' in datastore"}

tough: Figure out how to make hash types forward compatible

Originally from this thread: #38 (comment)

Consider making sha256 an Option<Decoded<Hex>> and providing code for making that easy to use throughout the library.

This is harder than just changing the internals, because the pub Target struct currently provides the hash information. Maybe it doesn't need to in order to be useful, since this library provides hash checking.

tough: fetch/io functions are a mess

Parameter ordering for functions in the fetch and io modules is a complete mess and needs to be refactored.

tough: Implement "latest known time"

I'd recommend we actually implement the "latest known time" bit referenced in the spec; if we come up without NTP access and the system clock is behind, we'd be vulnerable to rollback attacks, unless we store the latest known time and take that into account.

Originally posted by @tjkirch in #38 (comment)
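
A sketch of how a stored "latest known time" could guard expiration checks against a system clock that is behind. `effective_now` and `is_expired` are hypothetical names, and how the stored time is persisted is left out:

```rust
use std::time::{Duration, SystemTime};

/// The time to trust: the later of the system clock and the stored latest known time.
/// The returned value is also what would be persisted as the new latest known time.
fn effective_now(system_clock: SystemTime, latest_known: Option<SystemTime>) -> SystemTime {
    match latest_known {
        Some(t) if t > system_clock => t,
        _ => system_clock,
    }
}

/// Metadata is expired if its expiry is not after the trusted time.
fn is_expired(
    expires: SystemTime,
    system_clock: SystemTime,
    latest_known: Option<SystemTime>,
) -> bool {
    effective_now(system_clock, latest_known) >= expires
}

/// Convenience for building example timestamps.
fn at(secs: u64) -> SystemTime {
    SystemTime::UNIX_EPOCH + Duration::from_secs(secs)
}
```

With a clock reset to `at(50)` and a stored time of `at(150)`, metadata expiring at `at(100)` is still treated as expired, which is the rollback case described above.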

tough: HTTP retries

On connection resets, the Reader needs to retry the HTTP connection for the rest of the range (similar to wget’s default behavior).
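
A sketch of the retry behavior, with the HTTP layer abstracted away. `RetryReader` is a hypothetical type, and `open` stands in for "issue a range request starting at this byte offset"; no real HTTP client API is shown:

```rust
use std::io::{self, Read};

/// Wraps a reader that can be reopened at a byte offset, and reopens it
/// when a read fails with a connection reset.
struct RetryReader<F, R> {
    open: F,
    inner: Option<R>,
    offset: u64,
    retries_left: u32,
}

impl<F, R> RetryReader<F, R>
where
    F: FnMut(u64) -> io::Result<R>,
    R: Read,
{
    fn new(open: F, max_retries: u32) -> Self {
        RetryReader { open, inner: None, offset: 0, retries_left: max_retries }
    }
}

impl<F, R> Read for RetryReader<F, R>
where
    F: FnMut(u64) -> io::Result<R>,
    R: Read,
{
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        loop {
            if self.inner.is_none() {
                // (Re)open the stream for the rest of the range.
                self.inner = Some((self.open)(self.offset)?);
            }
            let result = self.inner.as_mut().unwrap().read(buf);
            match result {
                Ok(n) => {
                    self.offset += n as u64;
                    return Ok(n);
                }
                Err(e) if e.kind() == io::ErrorKind::ConnectionReset && self.retries_left > 0 => {
                    self.retries_left -= 1;
                    self.inner = None;
                }
                Err(e) => return Err(e),
            }
        }
    }
}
```

Tracking the offset inside the wrapper lets callers keep using plain `Read` without knowing that a reconnect happened.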

Template parsing should get keys from conditional statements

Parsing templates for keys will not return keys embedded in conditional expressions.

For example, parsing the following will return a single key [bridge_ip], but should return [bridge_ip, foo]:

{{#if foo}}
{{bridge_ip}}
{{/if}}
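
A sketch of key extraction that also looks inside block expressions. It scans the template text directly instead of using the Handlebars parser, so `template_keys` and its rules are assumptions for illustration: a tag starting with `#` contributes its parameters, and a tag with parameters treats its first word as a helper name rather than a key.

```rust
use std::collections::BTreeSet;

/// Collect the setting keys referenced by a template.
fn template_keys(template: &str) -> BTreeSet<String> {
    let mut keys = BTreeSet::new();
    let mut rest = template;
    while let Some(start) = rest.find("{{") {
        let after = &rest[start + 2..];
        let end = match after.find("}}") {
            Some(end) => end,
            None => break, // unclosed tag: stop scanning
        };
        let tag = after[..end].trim();
        rest = &after[end + 2..];
        // Closing tags and `else` carry no keys.
        if tag.starts_with('/') || tag == "else" {
            continue;
        }
        let mut words = tag.split_whitespace();
        let first = match words.next() {
            Some(word) => word,
            None => continue,
        };
        let params: Vec<&str> = words.collect();
        if first.starts_with('#') || !params.is_empty() {
            // `#if foo` or `helper key`: the parameters are the keys.
            keys.extend(params.into_iter().map(String::from));
        } else {
            // Plain `{{key}}`.
            keys.insert(first.to_string());
        }
    }
    keys
}
```

For the template above this yields `bridge_ip` and `foo`.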

Build a POC OpenAPI client

Investigate and use OpenAPI to build a POC client for the API.

Add a readme

We need an overall readme for the project that we can iterate on. We also need component specific documentation within each rust project and as part of the build system.

Script to create AMI from built image

We need a script to build an AMI from a Thar image. This may just be for the short-term, depending on how we continue with build automation and what other tools become available.

tough: Do we have to make a thing out of URLs ending with slashes?

Do we have to make a thing out of URLs ending with slashes? That seems like something that url.join would do for us, or we could do ourselves to remove an error variant and this unfriendly warning.

Originally posted by @tjkirch in #38 (comment)

tough: Perhaps allow for long-running update processes

What's the intended usage pattern for Repository? It seems like it does all of the work for TUF's client application detailed workflow during its load, and then the primary interface is targets to see what's available and read_target to fetch something.

This seems OK for one-shot processes that know what they want, but not for long-running systems that want to check for updates in a known repository later.

Perhaps we should split creation? To me it seems like the phases are (1) load local root, (2) try updating remote root, (3) remote timestamp, (4) snapshot and targets, (5) goto 3 and do 2,4,5 as necessary? And convenience methods to do 1-5 and 3-5. The current workflow would be the 1-5 method. (Obviously the fixed expires would need to be dynamic, roles would need to be stored, etc. to make this work.)

Originally posted by @tjkirch in #38 (comment)

`signpost mark-successful-boot` needs to run at boot

Figure out when in the boot process this should run
Write a systemd unit

tough: Clean up error type

Split the error type into separate modules?
Clean up places where context is provided by string arguments and use enums instead

Migration: update system integration

Before rebooting for an update:

Pull migration list from the TUF metadata
Select relevant migrations for this update based on incoming/outgoing versions of update
Download (to persistent storage) and verify relevant migrations from metadata-specified location if they are not already included in the image (with matching checksum)
Cache TUF metadata required to verify the migration from the root.json installed on the verity partition
- Add functionality to record and retrieve older root.json versions in tough
Read the directory as a filesystem-based TUF repo when running migrations
Use pentacle to execute the trusted version as a sealed memfd

RFC design & in-depth proposal process

I think we should consider modeling a process much like rust-lang's rfcs project where motivations and proposed implementation/experience is outlined quite explicitly.

https://github.com/rust-lang/rfcs/

I've tried this out to some extent in #35 and I think I like it for our process of deliberation.

What do other folks think?

Add runc and containerd packages to Thar

Signpost should be capable of being used as a library

As the title suggests, update Signpost so we can use pieces of it elsewhere.

Metadata kind as enum

Feedback from #7:

The md field of Metadata (in the apiserver model) could be an enum of known metadata variants (like "AffectedServices") instead of a string. This could help us type-check different kinds of metadata for safety.
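
A sketch of what the enum could look like. `MetadataKind` is a hypothetical name, and the string form of the variant is taken from the name quoted above, which may not match what the API actually stores:

```rust
use std::str::FromStr;

/// Known kinds of metadata; unknown strings are rejected at parse time.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MetadataKind {
    AffectedServices,
}

impl MetadataKind {
    /// String form used when talking to the data store (assumed spelling).
    fn as_str(self) -> &'static str {
        match self {
            MetadataKind::AffectedServices => "AffectedServices",
        }
    }
}

impl FromStr for MetadataKind {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "AffectedServices" => Ok(MetadataKind::AffectedServices),
            other => Err(format!("unknown metadata kind '{}'", other)),
        }
    }
}
```

Adding a new kind then means adding a variant, and the compiler flags every `match` that has not been updated.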

Migration: handle data store format changes

i.e. "major" version changes, for example if we change from a filesystem data store to SQLite or LMDB.

the migrator has to run migrations for major-version transitions (because we need to finish any minor version migrations before the major, and there may be more minor after the major)
the migrator has to handle major version links
research other changes that need to be made

thar-be-settings doesn't handle restart command status correctly (also logging output will be ugly)

https://github.com/amazonlinux/PRIVATE-thar/blob/ad9e9b1a9cc03dae0579d33ff9b32b4fa201a9bb/workspaces/api/thar-be-settings/src/service.rs#L115-L120
In the snippet above, we are correctly handling the case where a command simply fails to run (nonexistent command, etc), but we are not handling the case where the command runs but its status is nonzero.

https://github.com/amazonlinux/PRIVATE-thar/blob/ad9e9b1a9cc03dae0579d33ff9b32b4fa201a9bb/workspaces/api/thar-be-settings/src/service.rs#L121-L122
Also, the above lines log the stdout and stderr directly. As it turns out, both of them are type Vec<u8> which will log a giant ugly string of numbers.
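
A sketch of handling both problems with `std::process::Command`. This is not the linked code; `run_restart_command` is a hypothetical function. It treats a nonzero exit status as an error and converts the captured bytes to text before they reach a log message:

```rust
use std::process::Command;

/// Run a restart command, returning its stdout as text on success.
fn run_restart_command(program: &str, args: &[&str]) -> Result<String, String> {
    // Case 1: the command could not be started at all.
    let output = Command::new(program)
        .args(args)
        .output()
        .map_err(|e| format!("failed to run '{}': {}", program, e))?;

    // stdout and stderr are Vec<u8>; convert them lossily so they log as text.
    let stdout = String::from_utf8_lossy(&output.stdout);
    let stderr = String::from_utf8_lossy(&output.stderr);

    // Case 2: the command ran but reported failure.
    if !output.status.success() {
        return Err(format!(
            "'{}' failed with {}: {}",
            program,
            output.status,
            stderr.trim()
        ));
    }
    Ok(stdout.into_owned())
}
```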

Migration: migration helper functions

As we start writing data store migrations, we should add more functions to the helper library so they become easier and easier to write.

One helper function I expect will be commonly used is defaults_for, which will return the default values for a given subtree of settings. This could be useful in a few cases:

when we're adding a new subtree, assuming the apiserver or another component doesn't populate new subtrees with defaults automatically
if we know the old values for a tree that's changing aren't useful and we just want to replace them with the new defaults
if the old values are useful but we still need the new defaults to fill everything in appropriately

API server model: prevent serialization of nested empty structs

In this code:

https://github.com/amazonlinux/PRIVATE-thar/blob/8044c6cee5b170fe0a40f2c64088f696c830f49f/workspaces/api/apiserver/src/model.rs#L9-L34

and given this struct:

Settings {
    timezone: None,
    hostname: None,
    docker: Some(DockerSettings {
        bridge_ip: None,
    }),
}

this would be serialized in JSON as {"docker": {}} when it seems like it should serialize as {}.

Some options I can think of:

Make Settings.docker a DockerSettings instead of an Option<DockerSettings>, and #[derive(Default)] (or a more specific impl) for DockerSettings so that it can be deserialized when the key isn't present
Keep the struct the same, but change #[serde(skip_serializing_if = "Option::is_none")] on Settings.docker to a new function, DockerSettings::is_all_none (or skip_serialize or whatever seems right) so that serialization is skipped.

I think I like the first option better; if every key in every settings struct is always an Option<T>, it's redundant for the substructs themselves to be optional as well.
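
For comparison, the second option could be sketched like this. The structs are reduced to what the example needs, and `docker_is_empty` is a hypothetical name; serde itself is left out so the block stays dependency-free, with the attribute shown only in a comment:

```rust
#[derive(Debug, Default)]
struct DockerSettings {
    bridge_ip: Option<String>,
}

impl DockerSettings {
    /// True when no field carries a value, so the struct would serialize as `{}`.
    fn is_all_none(&self) -> bool {
        self.bridge_ip.is_none()
    }
}

/// Predicate in the shape serde's `skip_serializing_if` expects: it receives
/// a reference to the field. On `Settings.docker` it would be attached as
/// `#[serde(skip_serializing_if = "docker_is_empty")]`.
fn docker_is_empty(docker: &Option<DockerSettings>) -> bool {
    docker.as_ref().map_or(true, DockerSettings::is_all_none)
}
```

Every nested settings struct would need its own `is_all_none`, which is one reason the first option may be less work to maintain.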
