
cloudinit.d's People

Contributors

buzztroll, labisso, oldpatricka, priteau, timf

cloudinit.d's Issues

Instances launch even though plan is invalid

I ran a launch plan that referenced a service that does not exist. cloudinit.d detected this, but not before it had launched instances for all levels.

$ cloudinitd -vvv -x -n Dave12 boot main.conf
Starting up run Dave12
Validating the launch plan.
 Started IaaS work for rabbit
 Started IaaS work for provisioner
 Started IaaS work for epu-sleepers
Starting the launch plan.
Begin boot level 1...
Level 1 ERROR.
service cassandra not found
see /Users/david/.cloudinitd/ for more details

test for ssh port running without using ssh

Right now cloudinitd repeatedly tries ssh until it succeeds. This is done to test when the ssh service is available. However, there is no way to tell whether a failure is because no service is listening on port 22 (yet) or because of an authentication error. In the first case we want to retry many times; in the second we do not. This causes error cases to take a long time to be identified.

We should switch this to a socket connection check.
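A minimal sketch of what such a check could look like, using only the Python standard library. The function names, the retry count, and the delay are made up for illustration and are not taken from the cloudinitd code.

```python
import socket
import time


def port_is_open(host, port=22, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError (refused, timed out,
        # unresolvable host) when nothing can be reached.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_port(host, port=22, attempts=30, delay=2.0):
    """Poll until the port accepts connections or attempts run out.

    attempts and delay are illustrative defaults, not cloudinitd settings.
    """
    for _ in range(attempts):
        if port_is_open(host, port):
            return True
        time.sleep(delay)
    return False
```

Once the port accepts connections, a single ssh attempt can be made, and a failure at that point is much more likely to be an authentication or configuration problem than a host that is still booting.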

assign public IP addresses

Some clouds require a separate step after submission to assign a public IP address to the VM instance. We should add support for this to cloudinit.d.

detect duplicate service names

If two services in a launch plan have the same name, awkward errors occur. Cloudinitd should detect this before running the plan and return a better error.
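A rough sketch of the check, assuming the service names have already been collected into a list while the plan is parsed. The function name is hypothetical, and how cloudinitd actually gathers the names is not shown here.

```python
from collections import Counter


def find_duplicate_names(service_names):
    """Return the sorted list of names that appear more than once."""
    counts = Counter(service_names)
    return sorted(name for name, n in counts.items() if n > 1)


# Example: report duplicates before any IaaS work starts.
duplicates = find_duplicate_names(["rabbit", "provisioner", "rabbit"])
if duplicates:
    print("duplicate service names in launch plan: " + ", ".join(duplicates))
```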

infinite poll

I launched a rabbit node off of f9e2563, and it seems to be running the ssh /bin/true check forever instead of progressing to bootpgm.

API access to IaaS status

It would be good for higher level tools to, via API, figure out the IaaS status of any service's VM (if it is associated with a VM). It would be best if there was a flag that controlled whether the information can come from the database or generate a new query.

terminate script is not called

With this conf: "terminatepgm: cassandra-unload.sh", I put a wget call to a webserver in the script. The call is not showing up in the webserver logs (and cassandra is not getting unloaded).

output file

Cloudinitd should write an output JSON file containing all of the attributes associated with each service, so that it can more easily be used as a scripting tool.
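One possible shape for such a file, sketched under the assumption that the attributes are available as a dictionary keyed by service name. The function name and the "state" attribute are invented for the example; only "hostname" is an attribute mentioned elsewhere in these issues.

```python
import json


def write_service_attributes(services, path):
    """Write a {service name: {attribute: value}} mapping as JSON."""
    with open(path, "w") as f:
        # sort_keys keeps the file stable between runs, which helps diffs.
        json.dump(services, f, indent=2, sort_keys=True)


# Hypothetical data for illustration only.
example = {
    "rabbit": {"hostname": "rabbit.example.org", "state": "running"},
    "provisioner": {"hostname": "prov.example.org", "state": "running"},
}
```

A script could then read the file back with json.load and look up, say, the hostname of a given service.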

repair + terminate problem

After a successful repair operation (where successful means it started a new replacement node and configured it), I ran terminate at the end of the run. Terminate did not remove the original node. That is not good in and of itself, but I would also think the original node should have been terminated at the time repair decided to make a replacement.

Make local host case a first class feature

Right now cloudinitd assumes that it will ssh into remote hosts. For development cases it is often convenient to run everything on localhost. This can be done currently if sshd is running on localhost, but it can be a bit awkward. We may want to investigate making this a first-class feature.

local_exe is not expanding ~/

Before sending a command argument string to fab, each argument is quoted with Python's pipes.quote() function. This quotes paths with a ~ in them. In the case where ssh is used, these arguments are expanded by sshd. However, in the case where fab local() is called, they are never expanded. We need to make this behavior consistent.
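The quoting behavior can be reproduced in isolation. The sketch below uses shlex.quote, which is what pipes.quote pointed to in Python 3 (the pipes module no longer ships with recent Python versions). Expanding ~ before quoting in the local case is one possible fix, not necessarily the one cloudinitd should adopt, and quote_local_arg is a hypothetical helper name.

```python
import os
import shlex


def quote_local_arg(arg):
    """Expand a leading ~ ourselves, then quote for the local shell."""
    return shlex.quote(os.path.expanduser(arg))


# '~' is not in the set of characters shlex.quote leaves alone, so the
# path gets wrapped in single quotes and the shell will not expand it.
print(shlex.quote("~/bin/run.sh"))

# Expanding first yields an absolute path that survives quoting.
print(quote_local_arg("~/bin/run.sh"))
```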

cloudinit get_dep_keys() incomplete

get_dep_keys does not return the service database keys. They are a static set of keys that the user can still ask for, so this is not too much of a problem.

Allow a failed boot to be continued via repair

If a boot fails after performing a large amount of work, the author of the boot plan may wish to make a small tweak to the plan to fix a bug and then allow that boot to continue. Restarting from the beginning can be costly.

We should let the user run --repair on a failed boot.

termination is incomplete after level failure

So this happened:

  1. Launched a plan with 3 levels, 1 VM per level. All VMs started booting immediately.
  2. Level1's boot program failed with an error and init stopped.
  3. I ran cloudinitd terminate to kill the nodes, but only the level1 node was killed. However, the console output indicated that all had been killed.

debugging help

This will be more important when the code is in the wild. Two suggestions for helping debug things based on logs:

  1. Use of DEBUG should trigger a logging format string that includes line numbers
  2. Severe errors should trigger a stacktrace print to the logs when DEBUG is set
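Both suggestions can be sketched with the standard logging module. The format strings and the helper names make_logger and log_severe are made up for the example and do not reflect cloudinitd's existing logging setup.

```python
import logging
import sys

# With DEBUG on, include the file name and line number of the log call.
DEBUG_FORMAT = "%(asctime)s %(levelname)s [%(filename)s:%(lineno)d] %(message)s"
PLAIN_FORMAT = "%(asctime)s %(levelname)s %(message)s"


def make_logger(name, debug=False, stream=None):
    """Build a logger whose format depends on whether DEBUG is set."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG if debug else logging.INFO)
    handler = logging.StreamHandler(stream or sys.stderr)
    handler.setFormatter(
        logging.Formatter(DEBUG_FORMAT if debug else PLAIN_FORMAT))
    logger.handlers = [handler]
    logger.propagate = False
    return logger


def log_severe(logger, message):
    """Log an error; attach the stack trace only when DEBUG is enabled.

    Intended to be called from inside an except block.
    """
    logger.error(message, exc_info=logger.isEnabledFor(logging.DEBUG))
```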

Infinite loop if terminate twice in a row

If any service in the plan has a hostname that is a variable, for example:

hostname: ${otherservice.hostname}

and the launch is terminated with --noclean and then terminated again, the process will go into an infinite loop.

IaaS type with unsupported features

A few IaaS platforms (nimbus, eucalyptus) implement the EC2 interface but without a complete feature set, for example security groups. As is, if a security group is set, it will just be passed through to the IaaS platform and things should still work. However, they may not work as expected. We should log a warning when this happens, and we should add a level to validate to describe the problem when a plan is checked.

Missing environment variable errors are misleading

In at least one scenario, a missing environment variable resulted in an error about the dep variable being assigned instead of about the environment variable itself. This can be confusing to debug. The dep config:

rabbitmq_host: env.RABBITMQ_HOST

The error:

The service basenode has no attr by the name of rabbitmq_host. Please check your config files.
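A sketch of a lookup that names the missing environment variable directly. It assumes dep values of the form env.NAME as in the config line above; the function name, the exception class, and the message wording are hypothetical.

```python
import os


class MissingEnvError(Exception):
    """Raised when a dep refers to an environment variable that is unset."""


def resolve_env_value(dep_name, raw_value, environ=None):
    """Resolve 'env.NAME' against the environment, else return as-is."""
    environ = os.environ if environ is None else environ
    prefix = "env."
    if not raw_value.startswith(prefix):
        return raw_value
    var = raw_value[len(prefix):]
    if var not in environ:
        # Name both the dep and the variable so the cause is obvious.
        raise MissingEnvError(
            "dep '%s' refers to environment variable %s, which is not set"
            % (dep_name, var))
    return environ[var]
```

With this, the rabbitmq_host case would fail with a message that mentions RABBITMQ_HOST rather than a missing attribute on the service.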

IaaS validation broken

In "cb_iaas.py" it looks like "g_validate_funcs" only considers "nimbus" vs. "ec2". But the value of "iaas" for ec2 is based on boto regions, so there seems to be a mismatch. Here's the error when attempting to start with "us-west-1"; it reports a problem where there is none:

iaas type us-west-1 has a problem: u'us-west-1'
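The error text looks like the repr of a failed dictionary lookup, though that is a guess from the message alone. If so, one way out is to treat any value that is not an explicitly registered type as an EC2 region. The sketch below is not the code in cb_iaas.py; the validator functions and the table name are placeholders.

```python
def validate_nimbus(iaas):
    """Placeholder validator; returns a list of problems (none here)."""
    return []


def validate_ec2_like(iaas):
    """Placeholder validator for EC2 and EC2 region names."""
    return []


# Only explicitly special-cased types are registered.
VALIDATORS = {"nimbus": validate_nimbus}


def pick_validator(iaas):
    """Fall back to the EC2 validator for region names like 'us-west-1'."""
    return VALIDATORS.get(iaas, validate_ec2_like)
```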
