nimbusproject / cloudinit.d
An API for launching/configuring/and maintaining services in the clouds
License: Other
I ran a launch plan that referenced a service that does not exist. cloudinit.d detected this, but not before it had launched instances for all levels.
$ cloudinitd -vvv -x -n Dave12 boot main.conf
Starting up run Dave12
Validating the launch plan.
Started IaaS work for rabbit
Started IaaS work for provisioner
Started IaaS work for epu-sleepers
Starting the launch plan.
Begin boot level 1...
Level 1 ERROR.
service cassandra not found
see /Users/david/.cloudinitd/ for more details
Right now cloudinitd repeatedly tries ssh until it succeeds. This is done to detect when the ssh service becomes available. However, there is no way to determine whether a failure is because no service is listening on port 22 (yet) or because of an authentication error. In the first case we want to retry many times; in the second we do not. As a result, error cases take a long time to be identified.
We should switch this to a socket connection check.
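A minimal sketch of what such a check could look like, using only the Python standard library. The function name `port_is_open` and the default timeout are made up for illustration and are not part of cloudinitd. The idea is to retry this cheap check until the port accepts connections, and only then attempt ssh once, so that an authentication failure shows up immediately.

```python
import socket


def port_is_open(host, port=22, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper: it only proves that something is listening on the
    port, it does not attempt any ssh authentication.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False
```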
Some clouds have a specific and separate step that is required after submission to assign a public IP address to the VM instance. We should add support for this to cloudinit.d
May I suggest that you guys consider git pull --rebase
when everybody is working on master?
http://purebreeze.com/2010/08/how-do-you-avoid-git-merge-commits/
Tests are needed for this
Providing the run name as a dep variable would give tools started on localhost access to the cloudinitd db, so they could extract configuration for amqp, etc.
If two services in a launch plan have the same name, awkward errors occur. cloudinitd should detect this before running the plan and return a better error.
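A sketch of the kind of pre-flight check this would need, assuming the service names have already been collected from the plan into a list. `find_duplicate_service_names` is a hypothetical helper, not an existing cloudinitd function.

```python
from collections import Counter


def find_duplicate_service_names(names):
    """Return the sorted list of service names that appear more than once."""
    counts = Counter(names)
    return sorted(name for name, n in counts.items() if n > 1)
```

Validation could then fail early with a message that names the offending services when the returned list is not empty.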
This is likely due to threads that wait on boto.
Launching a rabbit node off of f9e2563, it seems to run the ssh /bin/true check forever instead of progressing to bootpgm.
If there is a default image/iaas configuration in the cloudinitd plan, the "hostname:" mechanism does not seem to work.
It would be good if higher-level tools could query, via the API, the IaaS status of any service's VM (if it is associated with a VM). It would be best if there was a flag that controlled whether the information comes from the database or from a new query.
With this conf: "terminatepgm: cassandra-unload.sh", I put a wget call to a webserver in the script. The call is not showing up in the webserver logs (and cassandra is not getting unloaded).
Some clouds will not be running https. We need to be able to turn off security for them.
Cloudinitd should have an output json file full of all of the attributes associated with each service so that it can more easily be used as a scripting tool.
After a successful repair operation (where successful means it started a new replacement node and configured it), I ran terminate at the end of the run. Terminate did not remove the original node (which is not good in and of itself but also I would think should have been terminated at the time repair decided to make a replacement).
A user reported a launch plan failing to terminate all VMs after a terminatepgm returned nonzero. However the cloudinitd DB file was still removed.
Right now cloudinitd assumes that it will ssh into remote hosts. For development cases it is often convenient to run everything on a localhost. This can be done currently if sshd is running on the localhost, but this can be a bit awkward. We may want to investigate making this a first class feature.
Before sending a command argument string to fab, each argument is quoted with Python's pipes.quote() function. This quotes paths with a ~ in them. In the case where ssh is used, these arguments are expanded by sshd. However, in the case where fab local() is called, they are never expanded. We need to make this behavior consistent.
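One possible way to handle the local case, sketched under the assumption that expanding the ~ ourselves before quoting is acceptable. `quote_arg_for_local` is a hypothetical name. The sketch uses `shlex.quote`, the Python 3 home of the same quoting function (the `pipes` module was removed in Python 3.13).

```python
import os
import shlex


def quote_arg_for_local(arg):
    """Expand a leading ~ first, then shell-quote the result.

    Quoting alone would protect the ~ from the shell, so a locally run
    command would receive the literal '~/...' string.
    """
    return shlex.quote(os.path.expanduser(arg))
```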
get_dep_keys does not return the service database keys. They are a static set of keys that the user can still ask for, so this is not too much of a problem.
If a boot fails after performing a large amount of work, the author of the boot plan may wish to make a small tweak to the boot plan to fix a bug and then allow that boot to continue. Restarting from the beginning can be costly.
We should let the user run --repair on a failed boot.
This feature will be relevant for the case where all services are running on the same VM (or the laptop development case).
So this happened:
cloudinitd terminate
to kill the nodes but only the level1 node was killed. The console output indicated that all had been killed, however.
This is a fail-safe way to make sure you do not leak resources. Running VMs often cost money; we need to give the user a way to do a sanity check that some bug, misuse, or early termination of cloudinitd did not leave some VMs orphaned and running.
This will be more important when the code is in the wild, two suggestions for helping debug things based on logs:
If any service in the plan has a hostname that is a variable, ex:
hostname: ${otherservice.hostname}
and the launch is terminated with --noclean and then terminated again, the process will go into an infinite loop.
A few IaaS platforms (nimbus, eucalyptus) implement the EC2 interface but without a complete feature set, for example security groups. As is, if a security group is set it will just be passed through to the IaaS platform and things should still work. However, they may not work as expected. We should log a warning when this happens, and we should add a level to validate to describe the problem when a plan is checked.
In at least one scenario, a missing environment variable resulted in an error about the dep variable being assigned instead of an error about the environment variable itself. This can be confusing to debug. The dep config:
rabbitmq_host: env.RABBITMQ_HOST
The error:
The service basenode has no attr by the name of rabbitmq_host. Please check your config files.
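A sketch of how the lookup could report the missing environment variable directly, assuming dep values of the form env.NAME are resolved in one place. `resolve_dep_value` and `MissingEnvError` are hypothetical names, not cloudinitd code.

```python
import os


class MissingEnvError(LookupError):
    """Raised when a dep value points at an unset environment variable."""


def resolve_dep_value(attr, raw, environ=None):
    """Resolve a dep value, naming the env variable if it is not set."""
    environ = os.environ if environ is None else environ
    prefix = "env."
    if raw.startswith(prefix):
        var = raw[len(prefix):]
        if var not in environ:
            raise MissingEnvError(
                "environment variable %s is not set (needed for dep "
                "variable %s)" % (var, attr)
            )
        return environ[var]
    # Plain values are returned unchanged.
    return raw
```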
Noticed on a problematic run: epumgmt load of the cloudinitd database hangs indefinitely.
Currently all programs (bootpgm, readypgm, etc.) are copied to the directory /tmp/nimbusready/. We should allow a user to override the baseenv with an environment variable.
cloudinitd should be able to expand many variables in a single line. A test is needed for this.
When terminating, checking the status of, or repairing a launch, the output directing the user to the right log file is incorrect if the -n option is used.
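A sketch of the behavior such a test would exercise, assuming the ${service.attr} syntax seen elsewhere in launch plans and a flat dictionary of already-resolved values. `expand_vars` is a hypothetical stand-in, not the actual cloudinitd expansion code.

```python
import re

# Matches ${name}; the name is everything up to the closing brace.
_VAR = re.compile(r"\$\{([^}]+)\}")


def expand_vars(line, values):
    """Replace every ${name} in line with values[name]."""
    def _lookup(match):
        key = match.group(1)
        if key not in values:
            raise KeyError(key)
        return str(values[key])

    return _VAR.sub(_lookup, line)
```

A test could then assert that a line with two references, such as `amqp://${rabbit.hostname}:${rabbit.port}/`, comes back with both replaced.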
In "cb_iaas.py" it looks like "g_validate_funcs" only considers "nimbus" vs. "ec2". But the value of "iaas" for ec2 is based on boto regions. So there seems to be a mismatch. Here's the error when attempting to start with "us-west-1", it is saying there is a problem but there is not actually a problem:
iaas type us-west-1 has a problem: u'us-west-1'
With launch plan integration01 level4, I am seeing a boot program's output after the boot program and ready program output have already happened.
Right now all timeout settings are hard-coded; this needs to be cleaned up.
Right now the dryrun option leaves a useless database on the filesystem.