tritondatacenter / containerpilot
A service for autodiscovery and configuration of applications running in containers
License: Mozilla Public License 2.0
Update: I thought this was due to using the Catalog API rather than the Agent API but we are already using the Agent API. Needs a bit more investigation - perhaps my expectations are wrong.
The service definition that I get is:
{
"Node": "consul-server1",
"Address": "10.111.0.1",
"ServiceID": "my-service-8d462d139c39",
"ServiceName": "my-service",
"ServiceTags": [],
"ServiceAddress": "10.222.0.100",
"ServicePort": 3000,
"ServiceEnableTagOverride": false,
"CreateIndex": 2403,
"ModifyIndex": 2409
},
What I expect to get is:
{
"Node": "consul-agent1",
"Address": "10.222.1.100",
"ServiceID": "my-service-8d462d139c39",
"ServiceName": "my-service",
"ServiceTags": [],
"ServiceAddress": "10.222.0.100",
"ServicePort": 3000,
"ServiceEnableTagOverride": false,
"CreateIndex": 2403,
"ModifyIndex": 2409
},
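If it helps narrow this down, here's a minimal sketch (assuming the standard github.com/hashicorp/consul/api client and a hypothetical agent address) of registering through a specific local agent and reading the entry back; in Consul, the Node and Address of a catalog entry reflect whichever agent owns the registration:

package main

import (
	"fmt"
	"log"

	consul "github.com/hashicorp/consul/api"
)

func main() {
	// Point the client at the local agent (hypothetical address), not the server.
	cfg := consul.DefaultConfig()
	cfg.Address = "consul-agent1:8500"

	client, err := consul.NewClient(cfg)
	if err != nil {
		log.Fatal(err)
	}

	// Register through the Agent API; the catalog entry's Node/Address
	// come from the agent that performed this registration.
	err = client.Agent().ServiceRegister(&consul.AgentServiceRegistration{
		ID:      "my-service-8d462d139c39",
		Name:    "my-service",
		Address: "10.222.0.100",
		Port:    3000,
	})
	if err != nil {
		log.Fatal(err)
	}

	// Read the entry back via the health endpoint to inspect Node/Address.
	entries, _, err := client.Health().Service("my-service", "", false, nil)
	if err != nil {
		log.Fatal(err)
	}
	for _, e := range entries {
		fmt.Println(e.Node.Node, e.Node.Address, e.Service.Address, e.Service.Port)
	}
}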
I can't seem to get containerbuddy to use CONSUL_HTTP_TOKEN. Logging from the agent:
`Feb 11 14:07:12 host consul[18926]: agent: Synced service 'mesos-consul:xxxxxxxx:tst-logredis:31398'
Feb 11 14:08:41 host consul: 2016/02/11 14:08:41 [ERR] http: Request PUT /v1/agent/check/pass/containername-81eb092d0568?note=ok, error: CheckID does not have associated TTL from=172.17.0.3:55587
Feb 11 14:08:41 host consul[18926]: http: Request PUT /v1/agent/check/pass/containername-81eb092d0568?note=ok, error: CheckID does not have associated TTL from=172.17.0.3:55587
Feb 11 14:08:41 host consul: 2016/02/11 14:08:41 [WARN] agent: Service 'containername-81eb092d0568' registration blocked by ACLs
Feb 11 14:08:41 host consul[18926]: agent: Service 'containername-81eb092d0568' registration blocked by ACLs
Feb 11 14:08:41 host consul: 2016/02/11 14:08:41 [WARN] agent: Check 'containername-81eb092d0568' registration blocked by ACLs
Feb 11 14:08:41 host consul[18926]: agent: Check 'containername-81eb092d0568' registration blocked by ACLs
Feb 11 14:08:51 host consul: 2016/02/11 14:08:51 [WARN] agent: Check 'containername-81eb092d0568' registration blocked by ACLs
Feb 11 14:08:51 host consul[18926]: agent: Check 'containername-81eb092d0568' registration blocked by ACLs
Feb 11 14:08:53 host consul: 2016/02/11 14:08:53 [WARN] agent: Check 'containername-649ba536bd72' missed TTL, is now critical
Feb 11 14:08:53 host consul[18926]: agent: Check 'containername-649ba536bd72' missed TTL, is now critical
Feb 11 14:08:53 host consul[18926]: agent: Check 'containername-649ba536bd72' registration blocked by ACLs
Feb 11 14:08:53 host consul: 2016/02/11 14:08:53 [WARN] agent: Check 'containername-649ba536bd72' registration blocked by ACLs`
Am I correct in believing this is simply not supported at the moment? I've tried the 0.1.1 release and building off of master.
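For reference, the Go Consul client reads CONSUL_HTTP_TOKEN from the environment when building its default config; a minimal sketch (assuming github.com/hashicorp/consul/api) of wiring the token in either way:

package discovery

import (
	"os"

	consul "github.com/hashicorp/consul/api"
)

// newConsulClient builds a client. DefaultConfig already honors the
// CONSUL_HTTP_TOKEN environment variable, but the token can also be set
// explicitly on the config for clarity.
func newConsulClient() (*consul.Client, error) {
	cfg := consul.DefaultConfig()
	if token := os.Getenv("CONSUL_HTTP_TOKEN"); token != "" {
		cfg.Token = token
	}
	return consul.NewClient(cfg)
}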
Allow the configuration of a service registration to include tags that the discovery backend can expose for consumption by backend handlers (e.g. "prod", "dev").
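A rough sketch of what passing those tags through to Consul could look like (the ServiceConfig type here is hypothetical; Tags is a real field on the client's AgentServiceRegistration):

package discovery

import consul "github.com/hashicorp/consul/api"

// ServiceConfig is a hypothetical slice of the Containerbuddy service config.
type ServiceConfig struct {
	Name string   `json:"name"`
	Port int      `json:"port"`
	Tags []string `json:"tags"` // e.g. ["prod"] or ["dev"]
}

// register passes the configured tags straight through to the Consul
// registration; consul-template and other consumers can then filter on them.
func register(client *consul.Client, svc ServiceConfig, address string) error {
	return client.Agent().ServiceRegister(&consul.AgentServiceRegistration{
		ID:      svc.Name,
		Name:    svc.Name,
		Port:    svc.Port,
		Address: address,
		Tags:    svc.Tags,
	})
}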
Some fields that are required need to have better error messages.
Some bad UX I've seen.
When running in bridge mode, none of the interfaces available to containerbuddy contains the actual IP that needs to be advertised to consul.
I'd actually expect that leaving out the interfaces configuration would magically use the consul agent IP. But whatever the case, it seems useful to be able to explicitly set the IP through a variable for docker bridged-mode users.
All commands in the app.json are just strings. Currently, Containerbuddy splits these strings on space.
This could be fragile if the command requires an argument to have a space.
This leaves three options:
Supporting a JSON array of arguments makes practical sense:
All executable fields, such as onStart and onChange, accept either a string or an array. If a string is given, the command and its arguments are separated by spaces; otherwise, the first element of the array is the command path and the rest are its arguments.
String Command
"health": "/usr/bin/curl --fail -s http://localhost/app"
Array Command
"health": [
"/usr/bin/curl",
"--fail",
"-s",
"http://localhost/app"
]
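A sketch of how a command field could accept either form when decoding the config, assuming a custom JSON unmarshaler (illustrative, not the actual Containerbuddy type):

package config

import (
	"encoding/json"
	"strings"
)

// CommandArgs holds an exec-style argv; it accepts either a JSON string
// (split on spaces) or a JSON array of strings.
type CommandArgs []string

func (c *CommandArgs) UnmarshalJSON(data []byte) error {
	// Try the array form first: ["/usr/bin/curl", "--fail", ...]
	var args []string
	if err := json.Unmarshal(data, &args); err == nil {
		*c = args
		return nil
	}
	// Fall back to the string form: "/usr/bin/curl --fail ..."
	var s string
	if err := json.Unmarshal(data, &s); err != nil {
		return err
	}
	*c = strings.Fields(s)
	return nil
}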
So there are multiple issues with the local deployment…
cd ./examples/nginx
./start -p example -f docker-compose-local.yml
This should be:
cd ./examples
./start.sh -p example -f docker-compose-local.yml
There is no executable called start in ./examples/nginx; it is in ./examples and it's called start.sh.
The build and start of nginx is broken.
When it tries to start nginx I get this error:
Successfully built a09f1b400979
Creating example_nginx_1...
Cannot start container c0bd9d5f1e31674146c0800c884a273580fa35b6f93356c66aaeea2e0fe449d9: [8] System error: exec: "opt/containerbuddy/containerbuddy": stat opt/containerbuddy/containerbuddy: no such file or directory
This is obviously because there is no file like /opt/containerbuddy/containerbuddy in the repo nor in the built image.
Accept a signal to reload the Containerbuddy configuration file on external changes. Implementation details: there is a quit channel for each of the goroutines running checkHealth or checkForChange, so the easiest way to reload the configuration would be to send the quit signal to all these goroutines and recreate them with the new config.
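A minimal sketch of that flow, with placeholder names (Config, quitters, and the anonymous poll loop stand in for the real checkHealth/checkForChange machinery):

package poller

import "time"

// Config is a hypothetical stand-in for the parsed Containerbuddy config.
type Config struct {
	Services []string
}

var quitters []chan struct{}

// startPollers launches one polling goroutine per service (standing in for
// checkHealth / checkForChange) and records a quit channel for each.
func startPollers(cfg *Config) {
	for range cfg.Services {
		quit := make(chan struct{})
		quitters = append(quitters, quit)
		go func(quit <-chan struct{}) {
			ticker := time.NewTicker(time.Second)
			defer ticker.Stop()
			for {
				select {
				case <-ticker.C:
					// run the health check / onChange handler here
				case <-quit:
					return
				}
			}
		}(quit)
	}
}

// reload shuts every poller down and recreates them from the new config.
func reload(newCfg *Config) {
	for _, quit := range quitters {
		close(quit)
	}
	quitters = nil
	startPollers(newCfg)
}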
It would be great if we could get automated builds on TravisCI.
Could make reviewing easier, since Travis will update the build status on the PR.
This came up in discussion in #51, and we decided to split this problem out for later work.
This enhancement includes:
From @misterbisson:
22 opens up the door to more than just deregistering services. Consider running Couchbase in Docker. The blueprint and demo makes deploying Couchbase and scaling it up easy (repo), but scaling down requires more steps that haven't been automated.
A SIGTERM handler could be exactly what's needed to add that automation. If it could also execute a user-defined executable (and wait for it), it would allow us to mark the Couchbase node for removal from the cluster and automatically rebalance the data to the remaining nodes before stopping it.
I haven't tested it, but I think the right command to call would be:
couchbase-cli rebalance -c 127.0.0.1:8091 -u $COUCHBASE_USER -p $COUCHBASE_PASS --server-remove=${IP_PRIVATE}:8091
And when that is done, it should be safe to stop (and remove/delete) the container.
Hey,
I'm getting an error related to Nginx port when running the example locally.
I did some digging and this command does not work.
NGINX_PORT=$(docker inspect example_nginx_1 | json -a NetworkSettings.Ports."80/tcp".0.HostPort)
Error produced:
command not found: json
write /dev/stdout: broken pipe
So I suggest using the command below instead (tested and working):
Docker docs -> Find a Specific Port Mapping
NGINX_PORT=$(docker inspect --format='{{(index (index .NetworkSettings.Ports "80/tcp") 0).HostPort}}' ${PREFIX}_nginx_1)
I can submit a PR if necessary, just let me know :)
We've seen a number of cases where we want Containerbuddy to perform some kind of bootstrapping behavior prior to starting our application.
The implementation I'm thinking of is that we'd add a new configuration value onStart to the Config struct. We would then execute a run of a user-defined executable just before the main application is run.
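A minimal sketch of the shape this could take; the OnStart field and runOnStart helper are assumptions for illustration, not the final design:

package core

import (
	"os"
	"os/exec"
)

// Config is a hypothetical slice of the Containerbuddy configuration.
type Config struct {
	OnStart []string `json:"onStart"` // user-defined bootstrap command
}

// runOnStart executes the onStart command, if any, and reports its exit code
// so the caller can bail out before the main application is started.
func runOnStart(cfg *Config) (int, error) {
	if len(cfg.OnStart) == 0 {
		return 0, nil
	}
	cmd := exec.Command(cfg.OnStart[0], cfg.OnStart[1:]...)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		if exitErr, ok := err.(*exec.ExitError); ok {
			return exitErr.ExitCode(), err
		}
		return 1, err
	}
	return 0, nil
}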
The first thing my container does is:
PUT /v1/agent/check/pass/tst-logredis2-tst-5a49064b027c?note=ok&token=xxxxxxxxxxxxx
Which fails with (output from tcpdump):
HTTP/1.1 500 Internal Server Error
Date: Fri, 04 Mar 2016 08:29:42 GMT
Content-Length: 36
Content-Type: text/plain; charset=utf-8
CheckID does not have associated TTL
Consul agent logs:
Mar 4 09:59:27 tst-xxxx-xxxx-001 consul: 2016/03/04 09:59:27 [ERR] http: Request PUT /v1/agent/check/pass/tst-logredis2-tst-5a49064b027c?note=ok&token=, error: CheckID does not have associated TTL from=X.X.X.X:60299
Afterwards containerbuddy registers the service and check (just the check tcpdump below)
PUT /v1/agent/check/register?token=x
{"ID":"tst-logredis2-tst-5a49064b027c","Name":"tst-logredis2-tst-5a49064b027c","Notes":"TTL for tst-logredis2-tst set by containerbuddy","ServiceID":"tst-logredis2-tst-5a49064b027c","TTL":"30s"}
This behaviour seems incorrect - shouldn't the service be registered first?
Here's my app.json:
{
"consul": "{{.HOST}}:8500",
"stopTimeout": -1,
"services": [
{
"name": "tst-logredis2-tst",
"port": 8080,
"health": [
"socat",
"-",
"TCP4:localhost:8080"
],
"poll": 10,
"ttl": 30,
"interfaces": [
"eth0[0]",
"x.x.x.x/16",
"inet",
"inet6"
]
}
]
}
Ran into this while working on #91. The etcd configuration is supposed to accept a list of endpoints as documented in the README:
{
"etcd": {
"endpoints": ["http://etcd1:4001"]
}
}
But this throws the error Must provide etcd endpoints, which comes from where we switch over the config values. Passing a single string for a single host works fine. This section of our etcd tests also works, which suggests that the problem is in the configuration file parsing and not the switch where we're throwing the error.
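One possible culprit, hedged since I haven't traced the parser: if the raw config is decoded into interface{} first, a JSON array arrives as []interface{} rather than []string, so a type switch that only handles string and []string would fall through to the error. A sketch of handling all three cases:

package config

import "fmt"

// parseEndpoints accepts the raw "endpoints" value from an already-decoded
// config map and normalizes it to a []string.
func parseEndpoints(raw interface{}) ([]string, error) {
	switch v := raw.(type) {
	case string:
		return []string{v}, nil
	case []string:
		return v, nil
	case []interface{}:
		// encoding/json decodes JSON arrays into []interface{}, so this is
		// the case an array of endpoints actually hits.
		out := make([]string, 0, len(v))
		for _, item := range v {
			s, ok := item.(string)
			if !ok {
				return nil, fmt.Errorf("endpoint is not a string: %v", item)
			}
			out = append(out, s)
		}
		return out, nil
	default:
		return nil, fmt.Errorf("must provide etcd endpoints")
	}
}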
My goal is to make it somewhat easy to create new integration tests by factoring
out the common logic into a simple framework.
It might be overkill though, maybe there is an easier way to do it.
- integration_tests/fixtures - contains a folder for each test harness
- integration_tests/tests - contains a folder for each integration test
- integration_tests/fixtures/fixture_name - contains a Dockerfile for building a Containerbuddy app, resulting in an image named fixture_name. The Dockerfile should specify an ENTRYPOINT/CMD and should not require any arguments.
- integration_tests/tests/test_name - contains a test.sh script and a docker-compose.yml for setting up the test environment. If test.sh returns success (0) then the test passed, otherwise it failed.

This script can make some assumptions:

- containerbuddy_etcd and containerbuddy_consul are running and are fresh (no data)
- the fixtures in integration_tests/fixtures are created and are available as images, requiring no arguments

An integration_test Makefile target runs the top-level script:

integration_test: build
	./test.sh

The script in the top-level folder named test.sh will, for each folder in integration_tests/fixtures in alpha order:

- copy build into integration_test/fixtures/fixture_name/build so it can easily be sourced by the Dockerfile
- cd integration_tests/fixtures/fixture_name
- docker build -t fixture_name .
- remove the build folder to clean up (should add to .gitignore also)

Note: since fixtures are created in alpha order, they can have FROM directives for previously created images, so as to reduce duplication.

Then, for each folder in integration_tests/tests in alpha order, it will:

- reset containerbuddy_etcd and containerbuddy_consul
- use docker-compose.yml to bring up the test environment
- run integration_test/tests/test_name/test.sh
Bugs like #40 may be difficult to find if we are not exercising more complicated code paths.
Need to add some scripts to test critical paths for a running instance of Containerbuddy.
We should be able to compose a set of docker containers which can exercise code paths that are important from outside the codebase.
I'm still trying to hunt down the details, but while working on autopilotpattern/mysql#1 I'm discovering that a configuration reload can apparently cause Containerbuddy to stop making forward progress.
I'm going to try to work up a reproduction, because it definitely doesn't happen every time, so there might be a race with other signals or something like that.
I'm trying to combine Weave and containerbuddy to register weave IPs in consul-dns. Unfortunately, discovery uses the first IP that it finds from the list of interfaces - which in my case is usually the docker bridge, not the one I want (weave interface).
I'd like to configure containerbuddy to use ethwe directly, because I know my containers will be on the weave network:
Example of my container's interfaces on the Weave network
root@aab54fefc215:/# ifconfig
eth0 Link encap:Ethernet HWaddr 02:42:ac:11:00:02
inet addr:172.17.0.2 Bcast:0.0.0.0 Mask:255.255.0.0
inet6 addr: fe80::42:acff:fe11:2/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:8 errors:0 dropped:0 overruns:0 frame:0
TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:648 (648.0 B) TX bytes:648 (648.0 B)
ethwe Link encap:Ethernet HWaddr ee:71:f3:9b:ed:4a
inet addr:10.130.16.1 Bcast:0.0.0.0 Mask:255.255.224.0
inet6 addr: fe80::ec71:f3ff:fe9b:ed4a/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1410 Metric:1
RX packets:8 errors:0 dropped:0 overruns:0 frame:0
TX packets:9 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:648 (648.0 B) TX bytes:690 (690.0 B)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
Adding an interface override like "interface": "ethwe" may help me get the correct IP:
{
"consul": "consul:8500",
"services": [
{
"name": "nginx",
"port": 80,
"interface": "ethwe",
"health": "/usr/bin/curl --fail -s http://localhost/health.txt",
"poll": 10,
"ttl": 25
}
],
"backends": [ ... ]
}
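For illustration, a sketch of how an interface-name override could resolve the advertised IP with the standard library net package (not the current Containerbuddy lookup code):

package discovery

import (
	"fmt"
	"net"
)

// ipForInterface returns the first IPv4 address bound to the named
// interface, e.g. "ethwe".
func ipForInterface(name string) (string, error) {
	iface, err := net.InterfaceByName(name)
	if err != nil {
		return "", err
	}
	addrs, err := iface.Addrs()
	if err != nil {
		return "", err
	}
	for _, addr := range addrs {
		if ipnet, ok := addr.(*net.IPNet); ok {
			if ip4 := ipnet.IP.To4(); ip4 != nil {
				return ip4.String(), nil
			}
		}
	}
	return "", fmt.Errorf("no IPv4 address on interface %s", name)
}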
Containerbuddy should be performing the duties of the init process if it is run as PID 1: reaping child processes.
One possible way to do this is by using this go library:
https://github.com/ramr/go-reaper
Although since Containerbuddy is already handling signals, we could just run:
func reapChildren() {
	// Collect the exit status of any terminated child, retrying if the
	// wait is interrupted by a signal.
	var wstatus syscall.WaitStatus
	pid, err := syscall.Wait4(-1, &wstatus, 0, nil)
	for syscall.EINTR == err {
		pid, err = syscall.Wait4(-1, &wstatus, 0, nil)
	}
	if syscall.ECHILD == err {
		// No remaining children to reap.
		return
	}
	log.Printf("Reaped: pid=%d, wstatus=%+v", pid, wstatus)
}

// ... in handleSignals: only subscribe to SIGCHLD when we are PID 1
if 1 == os.Getpid() {
	signal.Notify(sig, syscall.SIGCHLD)
}

// ... and in the signal-handling switch:
case syscall.SIGCHLD:
	reapChildren()
Running ./containerbuddy -version fails to bring up the GitHash and Version identifiers:
docker run --rm -it -v build/containerbuddy:/containerbuddy debian:jessie /bin/bash
root@5823395a7845:/# ./containerbuddy -version
Version:
GitHash:
I've been able to verify that it works with the 0.1.2-RC build so presumably this got introduced when we moved the directories around in 0.1.3.
On my local workstation, I have the following crazy amount of interfaces:
docker0 Link encap:Ethernet HWaddr 02:42:27:ff:b2:cc
inet addr:172.17.0.1 Bcast:0.0.0.0 Mask:255.255.0.0
inet6 addr: fe80::42:27ff:feff:b2cc/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:128306 errors:0 dropped:0 overruns:0 frame:0
TX packets:158590 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:8752297 (8.7 MB) TX bytes:483052292 (483.0 MB)
eth0 Link encap:Ethernet HWaddr 10:c3:7b:45:a2:ff
inet addr:192.168.0.7 Bcast:192.168.0.255 Mask:255.255.255.0
inet6 addr: fe80::12c3:7bff:fe45:a2ff/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:5764724 errors:0 dropped:0 overruns:0 frame:0
TX packets:2369492 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:5741557978 (5.7 GB) TX bytes:390985056 (390.9 MB)
Interrupt:18 Memory:fbf00000-fbf20000
eth1 Link encap:Ethernet HWaddr 40:16:7e:37:99:a6
UP BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
Interrupt:19 Memory:fb800000-fb820000
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:353966 errors:0 dropped:0 overruns:0 frame:0
TX packets:353966 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:57731938 (57.7 MB) TX bytes:57731938 (57.7 MB)
lxcbr0 Link encap:Ethernet HWaddr 7a:01:23:0a:6a:1f
inet addr:10.0.3.1 Bcast:10.0.3.255 Mask:255.255.255.0
inet6 addr: fe80::7801:23ff:fe0a:6a1f/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:18425 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 B) TX bytes:3809150 (3.8 MB)
veth9a44ba0 Link encap:Ethernet HWaddr 3e:fb:89:da:76:7f
inet6 addr: fe80::3cfb:89ff:feda:767f/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:8 errors:0 dropped:0 overruns:0 frame:0
TX packets:33 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:648 (648.0 B) TX bytes:7584 (7.5 KB)
vmnet1 Link encap:Ethernet HWaddr 00:50:56:c0:00:01
inet addr:10.99.99.1 Bcast:10.99.99.255 Mask:255.255.255.0
inet6 addr: fe80::250:56ff:fec0:1/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:160348 errors:0 dropped:16648 overruns:0 frame:0
TX packets:18424 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
vmnet8 Link encap:Ethernet HWaddr 00:50:56:c0:00:08
inet addr:10.88.88.1 Bcast:10.88.88.255 Mask:255.255.255.0
inet6 addr: fe80::250:56ff:fec0:8/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:222122 errors:0 dropped:16648 overruns:0 frame:0
TX packets:240397 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
wlan0 Link encap:Ethernet HWaddr 00:24:01:ee:dc:bc
UP BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
If you look at eth1, you can see that nothing is plugged into it, so its IP addresses will be empty.
When I try to execute HAProxy using the HAProxy base image for v1.6 I get
docker run --rm mbbender/haproxy /opt/containerbuddy/containerbuddy -config file:///opt/containerbuddy/haproxy.json /usr/local/sbin/haproxy -f /usr/local/etc/haproxy/haproxy.cfg
2015/11/06 04:34:46 fork/exec : no such file or directory
It takes a bit for that error to show up, maybe 5-10 seconds so it's not an immediate thing. If I run this image without the containerbuddy wrapper it works as expected.
My image only adds consul-template to the mix which isn't even in play in any of these tests so I don't expect that should matter.
I don't know go so I'm attempting to troubleshoot/debug but not having much luck.
I realized that if I stop and start the container, containerbuddy does not refresh the service IP address in consul. I would have to deregister the service manually. Is this a bug?
When I run tests with the -race flag to detect data races, there's what looks to be a linker failure associated with a missing glibc component. We should fix this so that we can run the race detector as part of our integration tests.
$ make test
docker rm -f containerbuddy_consul > /dev/null 2>&1 || true
docker run -d -m 256m --name containerbuddy_consul \
progrium/consul:latest -server -bootstrap-expect 1 -ui-dir /ui
bf62fd69cf6a9208790d33de6536c804d6a671889cb61775e56f901d69c1b20a
docker rm -f containerbuddy_etcd > /dev/null 2>&1 || true
docker run -d -m 256m --name containerbuddy_etcd -h etcd quay.io/coreos/etcd:v2.0.8 \
-name etcd0 \
-advertise-client-urls http://etcd:2379,http://etcd:4001 \
-listen-client-urls http://0.0.0.0:2379,http://0.0.0.0:4001 \
-initial-advertise-peer-urls http://etcd:2380 \
-listen-peer-urls http://0.0.0.0:2380 \
-initial-cluster-token etcd-cluster-1 \
-initial-cluster etcd0=http://etcd:2380 \
-initial-cluster-state new
04fb620862a9f165f051abe0b0b9c70bbc299033e445cf2fa279b11e07201f25
docker run --rm --link containerbuddy_consul:consul --link containerbuddy_etcd:etcd \
-v /src/containerbuddy:/go/src/containerbuddy \
-v /src/containerbuddy/.godeps:/go/src \
-v /src/containerbuddy/build:/build \
-v /src/containerbuddy/cover:/cover \
-v /src/containerbuddy/examples:/root/examples:ro \
-v /src/containerbuddy/Makefile.docker:/go/makefile:ro \
-e LDFLAGS='-X main.GitHash=8e9e751 -X main.Version=dev-build-not-for-release' \
containerbuddy_build test
cd /go/src/containerbuddy && go test -v -race -coverprofile=/cover/coverage.out
# testmain
runtime/race(.text): __libc_malloc: not defined
runtime/race(.text): getuid: not defined
runtime/race(.text): pthread_self: not defined
runtime/race(.text): madvise: not defined
runtime/race(.text): madvise: not defined
runtime/race(.text): madvise: not defined
runtime/race(.text): sleep: not defined
runtime/race(.text): usleep: not defined
runtime/race(.text): abort: not defined
runtime/race(.text): isatty: not defined
runtime/race(.text): __libc_free: not defined
runtime/race(.text): getrlimit: not defined
runtime/race(.text): pipe: not defined
runtime/race(.text): __libc_stack_end: not defined
runtime/race(.text): getrlimit: not defined
runtime/race(.text): setrlimit: not defined
runtime/race(.text): setrlimit: not defined
runtime/race(.text): setrlimit: not defined
runtime/race(.text): exit: not defined
runtime/race(.text.unlikely): __errno_location: not defined
runtime/race(.text): undefined: __libc_malloc
/usr/local/go/pkg/tool/linux_amd64/link: too many errors
FAIL containerbuddy [build failed]
makefile:36: recipe for target 'test' failed
make: *** [test] Error 2
make: *** [test] Error 2
The PR #4 provided by @bbox-kula made the changes we needed to have a generic interface for service discovery so we can have pluggable backends as intended. Including a second implementation of this backend as an example would make sure this works out in practice.
TODO on this issue: pick a second backend
Our first build on TravisCI failed: https://travis-ci.org/joyent/containerbuddy/builds/94954762
The failing test is one that passed when run locally. I believe there's a race in the signal-sending setup or tear-down.
=== RUN TestMaintenanceSignal
2015/12/04 20:13:36 we are paused!
--- FAIL: TestMaintenanceSignal (0.00s)
signals_test.go:51: Should not be in maintenance mode after receiving second SIGUSR1
=== RUN TestTerminateSignal
.2015/12/04 20:13:36 Caught SIGTERM
2015/12/04 20:13:36 Deregister service: test-service
--- PASS: TestTerminateSignal (1.00s)
I'm looking into it, but I'm going to cc @justenwalker because he's been in this area of the code recently.
From https://www.joyent.com/blog/automatic-dns-updates-with-containerbuddy:
The problem is that if we simply remove the old container we'll have a period of lost traffic between the time we remove the container and the TTL expires. We need a way to signal Containerbuddy to mark the node for planned maintenance. I'll circle back to that in a revision to Containerbuddy and discuss this change in an upcoming post.
This is poorly thought through so far, but I wanted to open up discussion for it.
Health checking gives us a binary way to determine if the app is working or not. If the app is not healthy, we should not send any requests to it and we most likely need to spawn a replacement. But scaling depends on more than binary app health. Every app has one or more performance indicators that can be used to determine if the app is nearing overload and should be scaled up or is too lightly loaded and should be scaled down. In the spirit of what Containerbuddy does to make applications container-native, I think it might also make sense to add an awareness of those performance indicators.
Some performance indicators can be read from the system. The five minute load average reported by the kernel may work well for many apps, but many of the most interesting indicators come from the app itself.
In MySQL, the number of Query entries from SHOW PROCESSLIST that are in any Waiting state can be a very significant performance indicator, and it's one that is best retrieved from within the container.
In Nginx, the average request processing time is useful info, but that's only output in the logs, which are hopefully not inside the container (if they were, ngxtop would be a nice tool to help understand them). But we do expect http_stub_status_module in our triton-nginx image, so we can look at that instead. Active connections vs. the worker_connections limit is a hugely important number there. Any delta between accepts and handled is a huge red flag. Or, perhaps, the Waiting number is an inverse indicator (high numbers indicate low activity, low numbers are high activity, zero could be critical). (See autopilotpattern/nginx#3 for further musing on this.)
Here's what I'm imagining it would look like in the config (though it would be obviously unlikely to actually mix Nginx and MySQL in a single image):
"kpis": [
{
"name": "system-loadavg",
"poll": 103,
"kpi": "/opt/containerbuddy/system-loadavg.sh"
},
{
"name": "mysql-queries-waiting",
"poll": 31,
"kpi": "/my/bin/mysql-queries-waiting.sh"
},
{
"name": "nginx-busy",
"poll": 17,
"kpi": "/my/bin/nginx-busy.sh"
},
{
"name": "nginx-missed-connections",
"poll": 137,
"kpi": "/my/bin/nginx-missed-connections.sh"
}
]
In order, the above four KPI entries would return:
1. The system load average.
2. The number of Query entries from SHOW PROCESSLIST that are in any Waiting state. 0 is great. 1 or above can be trouble. 10 or more is probably critical.
3. (${Active_connections} - ${Waiting}) / ${worker_connections}, a decimal value. 1 is maxed out, 0.5 is 50% busy.
4. The delta between the accepts and handled vars, an integer value. Anything other than 0 here is probably critical.
The executables that get these KPIs are to be provided by the app developer/packager, though it might make sense to have a common way to get the system load average as part of Containerbuddy.
As with the health checks, the executables would be run periodically at the time specified. The KPI values could then be returned on stdout as newline-delimited JSON as they're executed, for interpretation and use by whatever tools are reading the logs.
I wanted to offer those specific use cases as an exercise to figure out what data would be handled here. In short, I'm thinking each KPI is a numeric (decimal allowed) time series value. The executable will simply return that value and Containerbuddy can pass it on with the name given in the config.
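To make that concrete, here's a rough sketch (all names hypothetical) of a poller that runs each KPI executable on its interval and emits the name and numeric value as newline-delimited JSON on stdout:

package kpi

import (
	"encoding/json"
	"os"
	"os/exec"
	"strconv"
	"strings"
	"time"
)

// KPI mirrors one entry from the proposed "kpis" config block.
type KPI struct {
	Name string `json:"name"`
	Poll int    `json:"poll"` // seconds between runs
	Cmd  string `json:"kpi"`  // executable that prints a single number
}

// watch runs the KPI executable every Poll seconds and writes one JSON line
// per sample, e.g. {"name":"nginx-busy","value":0.42,"ts":...}, to stdout.
func (k KPI) watch() {
	enc := json.NewEncoder(os.Stdout)
	ticker := time.NewTicker(time.Duration(k.Poll) * time.Second)
	defer ticker.Stop()
	for range ticker.C {
		out, err := exec.Command(k.Cmd).Output()
		if err != nil {
			continue // a failed KPI probe is skipped, not fatal
		}
		value, err := strconv.ParseFloat(strings.TrimSpace(string(out)), 64)
		if err != nil {
			continue
		}
		enc.Encode(map[string]interface{}{
			"name":  k.Name,
			"value": value,
			"ts":    time.Now().Unix(),
		})
	}
}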
Containerbuddy is a bit chatty. Health checks in particular pile up in the Docker logs. Adding (and documenting) log levels wouldn't be a bad idea. A couple options:
When I try to run the example either locally or on Trident, it gets to where it starts writing the template file and never exits. See pastebin http://pastebin.com/GQW5KGE6. It has been running at least half an hour.
If Triton allows multiple IPs per interface (@bahamat on #50), and we potentially have IPV6 (#52), then perhaps we need to allow services to select their IP in a more flexible way.
Propose something like:
{
"cidr": "192.168.0.0/16"
}
instead of (or in addition to) the "interfaces" option.
Proposed behavior:
1. interfaces given - pick the first IP on the first found interface listed in interfaces - already supported
2. cidr given - pick the first IP address on any interface matching the CIDR
3. Both given - pick the first IP address on the first found interface (listed in interfaces) matching the CIDR
Note: the first IP could be IPV4 or IPV6 - maybe that's a flag or environment variable that dictates which one is preferred? Depends on what is decided in #52 I suppose. In the cidr case, it will always be determined by whatever version of CIDR is given, so this disambiguation logic applies to 1 and 2.
Also, case 4 may be unnecessary if each interface will only have at most 1 v4 and 1 v6 address. In this case it should probably be a parse error instead of handling it specially.
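For illustration, a sketch of what the cidr matching could look like using the standard library (an assumption about the mechanics, not a spec):

package discovery

import (
	"fmt"
	"net"
)

// ipForCIDR returns the first address on any interface that falls inside
// the configured CIDR, e.g. "192.168.0.0/16".
func ipForCIDR(cidr string) (string, error) {
	_, subnet, err := net.ParseCIDR(cidr)
	if err != nil {
		return "", err
	}
	addrs, err := net.InterfaceAddrs()
	if err != nil {
		return "", err
	}
	for _, addr := range addrs {
		if ipnet, ok := addr.(*net.IPNet); ok && subnet.Contains(ipnet.IP) {
			return ipnet.IP.String(), nil
		}
	}
	return "", fmt.Errorf("no interface address matches %s", cidr)
}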
According to the description, containerbuddy accepts POSIX signals to change its runtime behavior. However, when we tested it, we found that the signals were not accepted by the process at all, and after diving into the code we have some hints:
// Line 25 in containerbuddy/main.go
// Run the onStart handler, if any, and exit if it returns an error
if onStartCode, err := run(config.onStartCmd); err != nil {
	os.Exit(onStartCode)
}

// Set up handlers for polling and to accept signal interrupts
if 1 == os.Getpid() {
	reapChildren()
}
handleSignals(config)
handlePolling(config)
In the above code snippet, handleSignals is not called until the run procedure has finished; however, run blocks until the launched process itself exits. Given this design, handleSignals never takes effect at all.
So is this a design bug? A similar project, https://github.com/Yelp/dumb-init, handles signals correctly.
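One possible restructuring, sketched with placeholder names: install the signal handler before launching the long-running command, and run that command asynchronously so the handler loop stays reachable:

package core

import (
	"os"
	"os/exec"
	"os/signal"
	"syscall"
)

// startApp launches the main application without blocking, so signal
// handling works regardless of how long the application runs.
func startApp(args []string) (*exec.Cmd, error) {
	cmd := exec.Command(args[0], args[1:]...)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd, cmd.Start()
}

func mainLoop(args []string) error {
	// Subscribe to signals before anything long-running starts.
	sig := make(chan os.Signal, 1)
	signal.Notify(sig, syscall.SIGTERM, syscall.SIGUSR1)

	cmd, err := startApp(args)
	if err != nil {
		return err
	}
	done := make(chan error, 1)
	go func() { done <- cmd.Wait() }()

	for {
		select {
		case s := <-sig:
			// toggle maintenance mode, deregister services, etc.
			_ = s
		case err := <-done:
			return err // the application exited
		}
	}
}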
Rather than having to push a commit to bump the version (like this #23), it'd be better to be able to have a -version flag that exposes the version, which we can inject via LDFLAGS.
In the code we'd need a couple of global variables:
var Version string // version for this build, set at build time via LDFLAGS
var GitHash string // short-form hash of the commit at HEAD of this build, set at build time via LDFLAGS
At the top of the makefile we can have:
VERSION ?= dev-build-not-for-release
LDFLAGS := '-X containerbuddy.GitHash $(shell git rev-parse --short HEAD) -X containerbuddy.Version ${VERSION}'
Our go build would then include -ldflags ${LDFLAGS} to inject those values into the global variables, and then we'd just need a flag argument to echo that text to stdout.
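A minimal sketch of the flag handling that would pair with those variables (output format matching the examples below):

package main

import (
	"flag"
	"fmt"
	"os"
)

var (
	Version string // set at build time via -ldflags (see LDFLAGS above)
	GitHash string // set at build time via -ldflags (see LDFLAGS above)
)

func main() {
	version := flag.Bool("version", false, "print version and exit")
	flag.Parse()
	if *version {
		fmt.Printf("Version: %s\nGitHash: %s\n", Version, GitHash)
		os.Exit(0)
	}
	// ... normal startup continues here
}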
This means that when we build during development, we'll get:
$ make build
...
$ ./build/containerbuddy -version
Version: dev-build-not-for-release
GitHash: deadbeef
But when we do a release build, we'll get:
$ VERSION=0.0.2 make build
...
$ ./build/containerbuddy -version
Version: 0.0.2
GitHash: deadbeef
$ VERSION=0.0.2 make release
...
Upload this file to Github release:
464ee7708b3bd93c6c996a22f76c426bc144ec71 release/containerbuddy-0.0.2-alpha.tar.gz
Godep is pretty much the de facto tool for managing dependencies in Go projects. I suggest we consider using Godep instead of manually checking out and manipulating the source repositories.
This requires that the directory structure be altered to conform more closely to the golang coding standards.
Following the discussion in #80, I'm going to move the example applications into the integration tests. This makes sure we keep the example applications working, and also lets us point to more complex example applications in different repos for more production-ready examples.
It would be nice to be able to reference environment variables within the configuration file, e.g. if I want to use one Dockerfile that can be used on a 'dev consul' and later in production, I might choose to change (future) tag support or use a different consul master port.
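A sketch of one way this could work, rendering the raw config through Go's text/template with the environment exposed as data before JSON parsing (an illustration, not Containerbuddy's actual template support):

package config

import (
	"bytes"
	"os"
	"strings"
	"text/template"
)

// renderConfig substitutes {{ .Env.NAME }} style references in the raw
// config file with values from the process environment.
func renderConfig(raw []byte) ([]byte, error) {
	env := map[string]string{}
	for _, kv := range os.Environ() {
		parts := strings.SplitN(kv, "=", 2)
		env[parts[0]] = parts[1]
	}
	tmpl, err := template.New("config").Parse(string(raw))
	if err != nil {
		return nil, err
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, struct{ Env map[string]string }{env}); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}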
After #76 landed we ended up with a main.go and separate containerbuddy
package. This opens up the suggestion of making the core functionality of Containerbuddy a library that other applications could import. Then the Containerbuddy binary that we ship would be the main.go and probably the configuration loaders? I've experimented with this a little bit back before we added a lot of our features but it seems like it's feasible and useful.
This would definitely be a post-1.0 release item and needs some discussion about where the "seams" are before we start putting up a bunch of PRs. I do want to avoid making the project unapproachably over-factored.
Unfortunately, I did not do adequate testing for #38 and introduced a bug.
Parsing the config yields a Cmd object for all commands. However, health checks and backend change hooks are run more than once. Once run, the Cmd object is effectively dead - it cannot be run again.
The result is that we hit the failure case in the executeAndWait function and exit prematurely.
This does not affect the preStop, postStop, and onStart Commands since they are only ever run once.
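The underlying constraint is that an os/exec Cmd can only be started once; a sketch of the fix shape, rebuilding the Cmd from stored path and arguments on every invocation (names hypothetical):

package core

import "os/exec"

// Command stores the pieces needed to build a fresh exec.Cmd on each run,
// since a started exec.Cmd cannot be reused.
type Command struct {
	Path string
	Args []string
}

// executeAndWait builds a new Cmd every time, so health checks and onChange
// handlers can run repeatedly without hitting "exec: already started".
func (c *Command) executeAndWait() error {
	cmd := exec.Command(c.Path, c.Args...)
	return cmd.Run()
}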
At some point the time to start the build container got noticeably long, which means test runs have gone from a few seconds to ~20. I can reproduce this both locally and on TravisCI. It's not the tests themselves:
$ time make test
docker rm -f containerbuddy_consul > /dev/null 2>&1 || true
docker run -d -m 256m --name containerbuddy_consul \
progrium/consul:latest -server -bootstrap-expect 1 -ui-dir /ui
199c7b2d0d11c6450887b329aeb86e737770f6a93733c149d9397a9018385b43
docker rm -f containerbuddy_etcd > /dev/null 2>&1 || true
docker run --rm --link containerbuddy_consul:consul --link containerbuddy_etcd:etcd -v /Users/tim.gross/src/justenwalker/containerbuddy/vendor:/go/src -v /Users/tim.gross/src/justenwalker/containerbuddy:/go/src/github.com/joyent/containerbuddy -v /Users/tim.gross/src/justenwalker/containerbuddy/build:/build -v /Users/tim.gross/src/justenwalker/containerbuddy/cover:/cover -v /Users/tim.gross/src/justenwalker/containerbuddy/examples:/root/examples:ro -v /Users/tim.gross/src/justenwalker/containerbuddy/Makefile.docker:/go/makefile:ro -e LDFLAGS='-X containerbuddy.GitHash=4ae8cb3 -X containerbuddy.Version=dev-build-not-for-release' containerbuddy_build test
...
(long pause)
...
cd /go/src/github.com/joyent/containerbuddy && go test -v -coverprofile=/cover/coverage.out ./containerbuddy
...
(lots of output)
...
ok github.com/joyent/containerbuddy/containerbuddy 6.130s
make test 0.73s user 0.04s system 2% cpu 26.594 total
Going to mark this as an enhancement for post-1.0 release and take it on myself, probably while working on #75 in parallel.
Given the following config file:
{
"consul": "consul:8500",
"logging": {
"level": "DEBUG",
"format": "default",
"output": "stderr"
},
"services": [
{
"name": "myservice",
"port": 80,
"poll": 10,
"ttl": 30
}
]
}
we expect the output to be:
$ ./containerbuddy -config file:///containerbuddy.json hello-world.sh
2016/03/09 19:16:18 `health` is required in service myservice
but instead we get:
$ ./containerbuddy -config file:///containerbuddy.json hello-world.sh
$ echo $?
1
This failure happens after the logging framework is set up in this section. But if we force the failure to happen before we set up the logging, for example with the following config file, we get the same non-logging behavior:
{
"consul": "consul:8500",
"etcd": "etcd:4001",
"logging": {
"level": "DEBUG",
"format": "default",
"output": "stderr"
},
"services": [
{
"name": "myservice",
"port": 80,
"poll": 10,
"ttl": 30
}
]
}
Just a few days ago there was a folder with example files and now it has gone missing. A few links coming from the blog are broken now.
The way that the DOCKERMAKE is used in #57 is causing all dependencies to be re-cloned into the container for every test run. I'm going to look into this to reduce test iteration time.
Currently, we have preStart, preStop and postStart events. As the container is running though, it may be useful to have periodic tasks execute to report on status to external systems, separate from the health checks that report to the service discovery backend.
The primary use case would be a logical extension point for push-style metrics without having to build any backends into Containerbuddy directly. (See #27 for discussion.)
Configuration may look something like:
{
"onScheduled": [
{ "frequency": "1s", "command": [ "/bin/push_metrics.sh" ] },
{ "frequency": "10s", "command": [ "/bin/push_other_metrics.sh" ] }
]
}
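A rough sketch of how those scheduled tasks could be driven, assuming the frequency and command shapes shown above (everything else is a placeholder):

package tasks

import (
	"os/exec"
	"time"
)

// Task mirrors one entry in the proposed "onScheduled" block.
type Task struct {
	Frequency string   `json:"frequency"` // e.g. "1s", "10s"
	Command   []string `json:"command"`
}

// run executes the task command at the configured frequency until the
// stop channel is closed.
func (t Task) run(stop <-chan struct{}) error {
	if len(t.Command) == 0 {
		return nil
	}
	freq, err := time.ParseDuration(t.Frequency)
	if err != nil {
		return err
	}
	ticker := time.NewTicker(freq)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			// Fire and forget; a failing metrics push shouldn't kill the loop.
			_ = exec.Command(t.Command[0], t.Command[1:]...).Run()
		case <-stop:
			return nil
		}
	}
}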
Some other things to consider:
Observation: when creating the default.conf file via the consul template, an upstream is created for all services. If your cluster has containers which do not expose an external port and are not managed by nginx, including an upstream for them will break the nginx config file because it will try to emit a port of 0.
This doesn't affect the example programs but would affect other situations.
I have modified my code to only create entries in the conf file for those services which are linked to nginx.
There are undoubtedly cleaner methods than the one I am using, because I am certainly not a bash guru, but I am including the file in case it is of interest (and in the hope that you say "Geez Don, it is a lot simpler to just do this ...").
start.sh.zip
When I add a tag to a service, I cannot access it in a consul template.
hexo.json
{
"consul": "consul:8500",
"onStart": "/opt/containerbuddy/reload-hexo.sh",
"services": [
{
"name": "hexo",
"port": 4000,
"tags":["joyent.blog.vawter.com"],
"health": "/usr/bin/curl --fail -s http://localhost:4000/",
"poll": 10,
"ttl": 25
}
],
"backends": [
]
}
template
{{range services}}
{{range service .Name}}
{{.Name}}
{{.Address}}
{{.Port}}
{{.Tags}}
{{end}}
{{end}}
template output
hexo
192.168.128.235
4000
[]