janma / nomad-driver-nspawn
A Nomad task driver for systemd-nspawn
License: MIT License
Errors in task logs:
Jul 03, '23 00:25:31 -0700 | Driver Failure | rpc error: code = Unknown desc = systemd-nspawn failed to start task
Jul 03, '23 00:25:31 -0700 | Driver | Failed to parse UID "auto": Invalid argument
I used a slightly modified version of the nginx example:
job "nginx" {
  datacenters = ["dc1"]
  type        = "service"
  group "linux" {
    count = 1
    network {
      port "http" {
        static = "8080"
        to     = "80"
      }
    }
    task "nginx" {
      driver = "nspawn"
      config {
        image       = "/home/sergei/projs/nomad/Nginx/image"
        resolv_conf = "copy-host"
        command     = ["/bin/bash", "-c", "nginx && tail -f /var/log/nginx/access.log " ]
        boot        = false
        process_two = true
        ports       = ["http"]
      }
    }
  }
}
I can't quite figure out which parameter the UID belongs to.
I'm just wondering if it's possible to get CSI working with nspawn. Based on my limited knowledge of how CSI works, it should be possible, since all that would need to be done is to bind mount the temporary directory created by the CSI driver to the correct place in the container.
Do you have any plans to implement this?
Support something like this:
group "example" {
  count = 1
  volume "nix-store" {
    type      = "host"
    source    = "nix-store"
    read_only = true
  }
  task "example" {
    driver = "nspawn"
    volume_mount {
      volume      = "nix-store"
      destination = "/nix/store"
    }
    config {
      image = "debian"
    }
  }
}
It would be implemented the same way as the existing bind and bind_read_only options, so it sounds easy to add.
Happy to send a PR for this.
https://www.nomadproject.io/docs/job-specification/volume
https://www.nomadproject.io/docs/job-specification/volume_mount
I guess we can only implement the host volume type, as I don't know whether CSI integrates with nspawn.
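A minimal sketch of the proposed mapping, assuming Nomad hands the driver each resolved host volume mount as a host path, a destination path and a read-only flag. The hostMount type and bindArgs function are hypothetical, not the driver's existing code. Read-only mounts become systemd-nspawn's --bind-ro= flag and writable ones --bind=, both of which take a SOURCE:DEST pair.

```go
package main

import "fmt"

// hostMount is a hypothetical stand-in for what Nomad resolves from a group
// "volume" block (type = "host") combined with a task "volume_mount" block.
type hostMount struct {
	HostPath string // path on the client that the host volume points at
	TaskPath string // "destination" inside the container
	ReadOnly bool   // assumed true if the volume or the mount is read_only
}

// bindArgs turns resolved host volume mounts into systemd-nspawn flags,
// mirroring how bind / bind_read_only map to --bind and --bind-ro.
// Assumes the paths contain no ':' characters (no escaping is done).
func bindArgs(mounts []hostMount) []string {
	args := make([]string, 0, len(mounts))
	for _, m := range mounts {
		flag := "--bind"
		if m.ReadOnly {
			flag = "--bind-ro"
		}
		args = append(args, fmt.Sprintf("%s=%s:%s", flag, m.HostPath, m.TaskPath))
	}
	return args
}

func main() {
	// Hypothetical: the client's "nix-store" host volume points at /nix/store.
	mounts := []hostMount{{HostPath: "/nix/store", TaskPath: "/nix/store", ReadOnly: true}}
	fmt.Println(bindArgs(mounts))
}
```

Under that assumption, the example job above would yield a single --bind-ro=/nix/store:/nix/store argument, appended to the command line the same way the existing bind options are.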
When starting a job that exits immediately (e.g. due to a bad nginx config), the driver thinks the job has failed because the machine doesn't show up in machinectl:
2020-07-11T14:18:13.777+0200 [ERROR] client.driver_mgr.nomad-driver-nspawn: failed to get machine information: driver=nspawn @module=nspawn error="timed out while getting machine properties: No machine 'example-6e0636df-97ce-5a06-b354-ce79a7b7c63a' known" timestamp=2020-07-11T14:18:13.776+0200
This is probably because the service is already gone at that point and therefore never shows up in machinectl.
Instead, I would expect the logs to show up when viewing the alloc in Nomad, but they are empty because the allocation is never considered successfully created:
ID        Node ID   Task Group  Version  Desired  Status  Created    Modified
694b6326  430116cf  example     0        run      failed  2m10s ago  1m39s
nomad alloc logs 694b6326
Error reading file: Unexpected response code: 404 (task "example" not started yet. No logs available)
This makes it very hard to figure out why an nspawn command failed.
Hey! Just found your project! Cool!
I've been working on something related, https://github.com/arianvp/nomad-driver-systemd (though it's a bit abandoned and I was planning to work on it again): scheduling systemd units using Nomad. Systemd units have all the same isolation primitives as systemd-nspawn containers, and there are also portable services (https://systemd.io/PORTABLE_SERVICES/), a new systemd feature for container-like workloads.
I was wondering: would you accept a PR adding support for running regular systemd services and portable services to this project? I think the projects are similar enough that fragmentation doesn't make sense here.
Hi there,
thanks for this Nomad plugin. I wanted to try out this driver, but I get the error shown below when allocating a job.
I can see where in the driver's code the error is raised, but I can't work out why it fails to get the machine addresses.
Could you advise me on why this is happening?
I have nspawn installed and was able to boot and play around with an nspawn container outside of Nomad.
I'm running all of this on a fresh Ubuntu 20.04 instance. Is there anything on my machine I could check that would provide more info?
job "debian" {
  type        = "service"
  datacenters = ["dc1"]
  group "debian" {
    task "debian" {
      driver = "nspawn"
      config {
        image       = "debian-buster"
        resolv_conf = "copy-host"
        image_download {
          url = "https://nspawn.org/storage/debian/buster/tar/image.tar.xz"
        }
      }
      resources {
        cpu    = 2000
        memory = 1024
      }
    }
  }
}
nomad alloc status
Recent Events:
Time                       Type             Description
2020-09-05T21:59:05+02:00  Alloc Unhealthy  Unhealthy because of failed task
2020-09-05T21:59:05+02:00  Not Restarting   Error was unrecoverable
2020-09-05T21:59:05+02:00  Driver Failure   rpc error: code = Unknown desc = timed out while getting machine addresses:
2020-09-05T21:59:05+02:00  Driver           buster login:
2020-09-05T21:59:05+02:00  Driver           Failed to set alternative interface name 've-debian-b594aee4-72b5-aa4d-12b8-3178280d1d08' to 've-debian-buTh1', ignoring: Operation not supported
2020-09-05T21:58:32+02:00  Driver           Downloading image
2020-09-05T21:58:32+02:00  Task Setup       Building Task Directory
Hello there,
Thanks for creating this nspawn driver for nomad! I've been playing around with it this weekend :)
I tried to enable read_only / volatile in a container and it refused to start due to this validation rule:
nomad-driver-nspawn/nspawn/nspawn.go
Line 224 in 0217d36
I've looked online and couldn't find any info about this, so I'm curious about your experience with these flags interacting with -U. I haven't tried volatile because I lack suitable containers, but running systemd-nspawn directly with -U --read-only seems to work fine. Has there been a recent change in systemd that made this combination work?
Thanks,
xkxx
It seems that restarting the Nomad service (for example when upgrading Nomad or reloading its configuration) restarts jobs run by the nspawn driver. Docker jobs stay alive and are not restarted when Nomad restarts.
I observed the following errors in logs:
2020-10-26T19:28:26.592+0100 [ERROR] client.alloc_runner.task_runner: error recovering task; cleaning up: alloc_id=2fb194a7-5964-07f6-e9da-b3c09abfb3a5 task= error="rpc error: code = Unknown desc = failed to decode driver config: EOF" task_id=2fb194a7-5964-07f6-e9da-b3c09abfb3a5//0af76d99
2020-10-26T19:28:26.592+0100 [WARN] client.alloc_runner.task_runner: error destroying unrecoverable task: alloc_id=2fb194a7-5964-07f6-e9da-b3c09abfb3a5 task= error="rpc error: code = Unknown desc = task not found for given id" task_id=2fb194a7-5964-07f6-e9da-b3c09abfb3a5//0af76d99
The failed jobs are then reallocated and run fine; however, it's undesirable that they are restarted at all.
Would it be hard to support that?