nimbusproject / cloudinit.d

An API for launching, configuring, and maintaining services in the clouds

License: Other

Python 99.46% Shell 0.54%

cloudinit.d's People

ooici scion-network buzztroll lelou6666 cloudxtreme

cloudinit.d's Issues

Instances launch even though plan is invalid

I ran a launch plan that referenced a service that does not exist. Cloudinit.d detected this, but not before it had launched instances for all levels.

$ cloudinitd -vvv -x -n Dave12 boot main.conf
Starting up run Dave12
Validating the launch plan.
 Started IaaS work for rabbit
 Started IaaS work for provisioner
 Started IaaS work for epu-sleepers
Starting the launch plan.
Begin boot level 1...
Level 1 ERROR.
service cassandra not found
see /Users/david/.cloudinitd/ for more details
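A minimal sketch of validating every service reference before any IaaS work starts, so that an invalid plan never launches instances. The plan shape used here is a hypothetical simplification, not cloudinitd's real config model: a list of boot levels, each mapping a service name to the names it references. `find_missing_services` and `validate_plan` are illustrative names.

```python
from typing import Dict, List

# Hypothetical, simplified plan shape: one dict per boot level, mapping each
# service name to the names of the services it references.
Plan = List[Dict[str, List[str]]]


def find_missing_services(levels: Plan) -> List[str]:
    """Return referenced service names that no level defines."""
    defined = {name for level in levels for name in level}
    missing: List[str] = []
    for level in levels:
        for refs in level.values():
            for ref in refs:
                if ref not in defined and ref not in missing:
                    missing.append(ref)
    return missing


def validate_plan(levels: Plan) -> None:
    """Raise before any instance is launched if the plan is invalid."""
    missing = find_missing_services(levels)
    if missing:
        raise ValueError("service(s) not found: " + ", ".join(missing))
```

Suppose a plan's second level references `cassandra` but no level defines it. Calling `validate_plan` first raises `ValueError: service(s) not found: cassandra`, and no IaaS work is started.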

test for ssh port running without using ssh

Right now cloudinitd repeatedly tries ssh until it succeeds, in order to detect when the ssh service is available. However, there is no way to tell whether a failure happened because nothing is listening on port 22 (yet) or because of an authentication error. In the first case we want to retry many times; in the second we do not. As a result, error cases take a long time to be identified.

We should switch this to a socket connection check.
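A sketch of the socket check using Python's standard `socket.create_connection`. The helper names `port_is_open` and `wait_for_port` are hypothetical, and the retry count and delay are placeholder values. Once the port accepts TCP connections, a later ssh failure can be treated as a real error (for example, authentication) and not retried.

```python
import socket
import time


def port_is_open(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # Only the TCP handshake is attempted; no ssh authentication happens.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, unreachable, or timed out: nothing is
        # listening on the port (yet), so retrying makes sense.
        return False


def wait_for_port(host: str, port: int = 22, attempts: int = 60,
                  delay: float = 5.0, timeout: float = 5.0) -> bool:
    """Poll the port until it opens or the attempts run out."""
    for attempt in range(attempts):
        if port_is_open(host, port, timeout):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```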

assign public IP addresses

Some clouds have a specific, separate step, required after submission, that assigns a public IP address to the VM instance. We should add support for this to cloudinit.d.

bootpgm variable is not being expanded

Merge branch 'master' into master

May I suggest that you guys consider git pull --rebase when everybody is working on master?

http://purebreeze.com/2010/08/how-do-you-avoid-git-merge-commits/

Passing in a single object as a security group when a list is required by the API

Tests are needed for this
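The fix can be as small as normalizing the setting before it reaches the IaaS call. The sketch below assumes the value may arrive as `None`, as a single group name string, or as an iterable of names. `as_group_list` is a hypothetical helper, and the trailing assertions are a starting point for the tests the issue asks for.

```python
from typing import Iterable, List, Optional, Union


def as_group_list(groups: Optional[Union[str, Iterable[str]]]) -> List[str]:
    """Normalize a security-group setting into the list the API expects."""
    if groups is None:
        return []
    if isinstance(groups, str):
        # A single name: wrap it instead of iterating over its characters.
        return [groups]
    return list(groups)


# Minimal tests for the single-object and list cases.
assert as_group_list("default") == ["default"]
assert as_group_list(["web", "ssh"]) == ["web", "ssh"]
assert as_group_list(None) == []
```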

Provide the run name as a dep variable.

Providing the run name as a dep variable would give tools started on localhost access to the cloudinitd db, so they could extract configuration for AMQP, etc.

detect duplicate service names

If two services in a launch plan have the same name, confusing errors occur. Cloudinitd should detect this before running the plan and return a clearer error.
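A sketch of the up-front check. It assumes the service names from every level have already been collected into one sequence; both function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def duplicate_service_names(names: Iterable[str]) -> List[str]:
    """Return each service name that appears more than once, sorted."""
    counts = Counter(names)
    return sorted(name for name, n in counts.items() if n > 1)


def check_unique_services(names: Iterable[str]) -> None:
    """Fail before the plan runs, with an explicit message."""
    dups = duplicate_service_names(names)
    if dups:
        raise ValueError("duplicate service name(s) in launch plan: "
                         + ", ".join(dups))
```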

ctrl+c is not always killing cloudinitd

This is likely due to threads that wait on boto.

infinite poll

When launching a rabbit node off of f9e2563, it seems to run the ssh /bin/true check forever instead of progressing to bootpgm.

"hostname:" does not work with defaults

If there is a default image/iaas configuration in the cloudinitd plan, the "hostname:" mechanism does not seem to work.

API access to IaaS status

It would be good for higher-level tools to be able to find out, via the API, the IaaS status of any service's VM (if the service is associated with a VM). Ideally a flag would control whether that information comes from the database or from a new query.

terminate script is not called

With this conf: "terminatepgm: cassandra-unload.sh", I put a wget call to a webserver in the script. The call does not show up in the webserver logs (and cassandra is not unloaded).

turn on and off https

Some clouds will not be running HTTPS, so we need a way to turn security off for them.

more than 10 bootlevels can be out of order
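The issue does not state the cause. One plausible cause, which is an assumption here, is that level keys are ordered by string comparison, so "level10" sorts before "level2". A sketch of numeric ordering follows, using hypothetical keys of the form `levelN`.

```python
import re


def level_number(key: str) -> int:
    """Extract the trailing integer from a level key such as 'level10'."""
    match = re.search(r"(\d+)$", key)
    if match is None:
        raise ValueError("not a numbered boot level: %r" % key)
    return int(match.group(1))


keys = ["level2", "level10", "level1", "level11"]
# Plain string sorting puts level10 and level11 before level2.
assert sorted(keys) == ["level1", "level10", "level11", "level2"]
# Sorting on the parsed number gives the intended boot order.
assert sorted(keys, key=level_number) == ["level1", "level2", "level10", "level11"]
```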

output file

Cloudinitd should produce an output JSON file containing all of the attributes associated with each service, so that it can more easily be used as a scripting tool.
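A sketch of writing that file with the standard `json` module. It assumes the attributes have already been gathered into a mapping of service name to attribute dict. The function names, and the attribute names in the usage note, are hypothetical.

```python
import json
from typing import Any, Dict

Services = Dict[str, Dict[str, Any]]


def render_output(services: Services) -> str:
    """Serialize every service's attributes as one JSON document."""
    return json.dumps(services, indent=2, sort_keys=True)


def write_output(path: str, services: Services) -> None:
    """Write the JSON document to path."""
    with open(path, "w") as f:
        f.write(render_output(services))
        f.write("\n")
```

A script could then `json.load` the file and read a value such as `services["rabbit"]["hostname"]` without touching the cloudinitd database.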

repair + terminate problem

After a successful repair operation (where successful means it started a new replacement node and configured it), I ran terminate at the end of the run. Terminate did not remove the original node. That is bad in itself, and I would also have expected the original node to be terminated at the time repair decided to make a replacement.

VMs not killed after a terminatepgm fails

A user reported that a launch plan failed to terminate all VMs after a terminatepgm returned nonzero. However, the cloudinitd DB file was still removed.

Make local host case a first class feature

Right now cloudinitd assumes that it will ssh into remote hosts. For development cases it is often convenient to run everything on a localhost. This can be done currently if sshd is running on the localhost, but this can be a bit awkward. We may want to investigate making this a first class feature.

local_exe is not expanding ~/

Before a command argument string is sent to fab, each argument is quoted with Python's pipes.quote() function. This quotes paths that contain a ~. When ssh is used, these arguments are expanded by sshd. However, when fab local() is called, they are never expanded. We need to make this behavior consistent.
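A sketch that expands `~` on the local path before quoting. `quote_local_args` is a hypothetical name. It is an assumption that the local case should be fixed this way, as opposed to changing how the remote case is quoted.

```python
import os
import shlex
from typing import Iterable


def quote_local_args(args: Iterable[str]) -> str:
    """Build a command string for a local run, expanding ~ before quoting.

    In Python 3, pipes.quote is the same function as shlex.quote. It wraps
    any argument containing '~' in single quotes, so the local shell never
    expands it. Expanding first makes the local case match the ssh case.
    """
    return " ".join(shlex.quote(os.path.expanduser(arg)) for arg in args)
```

For example, `shlex.quote("~/data")` returns `'~/data'` (quoted, so unexpanded), whereas `quote_local_args(["ls", "~/data"])` passes the user's absolute home path instead.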

move temporary json files to the log directory

cloudinit get_dep_keys() incomplete

get_dep_keys does not return the service database keys. They are a static set of keys that the user can still ask for, so this is not too much of a problem.

Allow a failed boot to be continued via repair

If a boot fails after performing a large amount of work, the author of the boot plan may wish to make a small tweak to the plan to fix a bug and then let that boot continue. Restarting from the beginning can be costly.

We should let the user run --repair on a failed boot.

Tests that create 1 VM and then use that for many future instances.

cleanup service directory in tmp on terminate

This feature will be relevant when all services are running on the same VM (or in the laptop development case).

termination is incomplete after level failure

So this happened:

Launched a plan with 3 levels, 1 VM per level. All VMs started booting immediately.
Level1's boot program failed with an error and init stopped
I ran cloudinitd terminate to kill the nodes, but only the level1 node was killed. The console output, however, indicated that all had been killed.

add a feature to list the iaas handle of every vm ever created for a boot

This is a fail-safe way to make sure you do not leak resources. Running VMs often cost money, so we need to give the user a way to check that no bug, misuse, or early termination of cloudinitd left VMs orphaned and running.

add feature to reload a config file to an already booted run

debugging help

This will be more important once the code is in the wild. Two suggestions for helping debug things based on logs:

Use of DEBUG should trigger a logging format string that has line #s
Severe errors should trigger a stacktrace print to the logs when DEBUG is set
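Both suggestions map directly onto the standard `logging` module. In the sketch below, `%(filename)s:%(lineno)d` adds the source location only in debug mode, and `exc_info=True` appends the current traceback. The logger name `cloudinitd` and both helper names are assumptions.

```python
import logging
from typing import Optional, TextIO


def configure_logging(debug: bool, stream: Optional[TextIO] = None) -> logging.Logger:
    """Use a format with file name and line number only when debugging."""
    logger = logging.getLogger("cloudinitd")
    handler = logging.StreamHandler(stream)  # None means sys.stderr
    if debug:
        fmt = "%(asctime)s %(levelname)s %(filename)s:%(lineno)d %(message)s"
    else:
        fmt = "%(asctime)s %(levelname)s %(message)s"
    handler.setFormatter(logging.Formatter(fmt))
    logger.handlers = [handler]  # replace any handler from an earlier call
    logger.setLevel(logging.DEBUG if debug else logging.INFO)
    return logger


def log_severe(logger: logging.Logger, msg: str, debug: bool) -> None:
    """Log a severe error; call this from inside an except block.

    With exc_info=True, logging appends the active exception's traceback.
    """
    logger.error(msg, exc_info=debug)
```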

Infinite loop if terminate twice in a row

If any service in the plan has a hostname that is a variable, ex:

hostname: ${otherservice.hostname}

and the launch is terminated with --noclean and then terminated again, the process goes into an infinite loop.

IaaS type with unsupported features

A few IaaS platforms (Nimbus, Eucalyptus) implement the EC2 interface without a complete feature set; security groups are one example. As things stand, if a security group is set, it is simply passed through to the IaaS platform and things should still work. However, they may not work as expected. We should log a warning when this happens, and we should add a check to the validate step that describes the problem when a plan is checked.

Missing environment variable errors are misleading

In at least one scenario, a missing environment variable resulted in an error about the dep variable being assigned instead of the env itself. This can be confusing to debug. The dep config:

rabbitmq_host: env.RABBITMQ_HOST

The error:

The service basenode has no attr by the name of rabbitmq_host. Please check your config files.
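A sketch of resolving `env.` references so that the error names the missing environment variable instead of the dep attribute. The handling of `env.NAME` values and the name `resolve_dep_value` are assumptions modeled on the config line above, not cloudinitd's actual resolver.

```python
import os
from typing import Mapping, Optional


def resolve_dep_value(dep_name: str, value: str,
                      environ: Optional[Mapping[str, str]] = None) -> str:
    """Resolve a dep value; values of the form 'env.NAME' come from the environment."""
    env = os.environ if environ is None else environ
    if value.startswith("env."):
        var = value[len("env."):]
        if var not in env:
            # Name both the dep and the missing variable in the error.
            raise LookupError(
                "dep %r refers to environment variable %s, which is not set"
                % (dep_name, var))
        return env[var]
    return value
```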

database load is hanging

On a problematic run, I noticed that epumgmt's load of the cloudinitd database hangs indefinitely.

set service directory based on presence of an env

Currently all programs (bootpgm, readypgm, etc.) are copied to the directory /tmp/nimbusready/. We should allow a user to override the baseenv when an environment variable is set.
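A sketch of that override. The variable name `CLOUDINITD_SERVICE_DIR` is hypothetical because the issue does not name one. The default is the directory named above.

```python
import os
from typing import Mapping, Optional

DEFAULT_SERVICE_DIR = "/tmp/nimbusready"
# Hypothetical variable name; the issue does not specify one.
SERVICE_DIR_ENV = "CLOUDINITD_SERVICE_DIR"


def service_dir(environ: Optional[Mapping[str, str]] = None) -> str:
    """Return the override from the environment, else the default directory."""
    env = os.environ if environ is None else environ
    # An empty value falls back to the default as well.
    return env.get(SERVICE_DIR_ENV) or DEFAULT_SERVICE_DIR
```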

expand multiple variables in a single line

Cloudinitd should be able to expand many variables in a single line. A test is needed for this.
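A sketch that uses `re.sub` with a callback. This replaces every `${name}` occurrence in the line, not just the first. The `${service.attr}` syntax follows the `${otherservice.hostname}` example earlier on this page. Supplying values as a flat dict keyed by `service.attr`, and the name `expand`, are assumptions. The assertion at the end is the kind of test the issue asks for.

```python
import re
from typing import Mapping

# Matches ${name}; the name may contain dots, e.g. ${rabbit.hostname}.
_VAR = re.compile(r"\$\{([^}]+)\}")


def expand(line: str, lookup: Mapping[str, object]) -> str:
    """Replace every ${name} occurrence in line."""
    def repl(match):
        name = match.group(1)
        if name not in lookup:
            raise LookupError("undefined variable: %s" % name)
        return str(lookup[name])
    # re.sub replaces all non-overlapping matches by default.
    return _VAR.sub(repl, line)


values = {"rabbit.hostname": "h1", "rabbit.port": 5672}
assert expand("amqp://${rabbit.hostname}:${rabbit.port}/", values) == "amqp://h1:5672/"
```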

Console output message about logging was incorrect

When terminating, checking the status of, or repairing a launch, the output directing the user to the right log file is incorrect if the -n option is used.

IaaS validation broken

In "cb_iaas.py" it looks like "g_validate_funcs" only considers "nimbus" vs. "ec2". But the value of "iaas" for EC2 is based on boto regions, so there seems to be a mismatch. Here is the error when attempting to start with "us-west-1". It reports a problem even though there is not actually one:

iaas type us-west-1 has a problem: u'us-west-1'
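A sketch of one way to close the mismatch: map the plan's `iaas` value to a validation family before looking up the validator, so region names resolve to "ec2". The region set is an assumed, partial list; in boto 2, `boto.ec2.regions()` can supply the live list instead. `iaas_family` is a hypothetical helper, since the real `cb_iaas.py` code is not shown here.

```python
# Assumed, partial list of EC2 region names as boto reports them.
EC2_REGIONS = frozenset([
    "us-east-1", "us-west-1", "us-west-2", "eu-west-1",
    "ap-southeast-1", "ap-northeast-1", "sa-east-1",
])


def iaas_family(iaas: str) -> str:
    """Map a plan's iaas value to the key used to pick a validation function."""
    name = iaas.strip().lower()
    if name == "nimbus":
        return "nimbus"
    if name == "ec2" or name in EC2_REGIONS:
        return "ec2"
    raise ValueError("iaas type %s is not recognized" % iaas)
```

The validator lookup would then key on the family, e.g. `g_validate_funcs[iaas_family(iaas)]`, instead of the raw region name.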