
leanercloud / autospotting


Saves up to 90% of AWS EC2 costs by automating the use of spot instances on existing AutoScaling groups. Installs in minutes using CloudFormation or Terraform. Convenient to deploy at scale using StackSets. Uses tagging to avoid launch configuration changes. Automated spot termination handling. Reliable fallback to on-demand instances.

Home Page: https://autospotting.io

License: Open Software License 3.0

Makefile 1.35% Go 98.43% Dockerfile 0.11% HTML 0.11%
aws autoscaling-groups aws-lambda spot-instances cost ec2 amazon-web-services terraform-module infrastructure aws-autoscaling

autospotting's People

Contributors

artemnikitin, atillamas, bfin, binarylogic, codenoid, cristim, gabegorelick, gjmveloso, grahamlyons, jjones-smug, jwineinger, karock, lenucksi, mello7tre, mrwacky42, nyoroon, phils, prayashm, raravena80, renzof, rgarcia, roeyazroel, russellballestrini, salvianreynaldi, shugyousha, tapirs, thebigjc, tootedom, vivekdubey, xlr-8


autospotting's Issues

Working with Elastic Beanstalk

I just tested it with Elastic Beanstalk and it works great. Just add the tags when creating a new environment.

Maybe add to README?

Roey

Implement global configuration options when executed from Lambda

Currently the autospotting CLI tool supports a number of flags when executed manually, but those are completely ignored when executed from Lambda, which is not yet configurable.

This should be implemented by customizing our CloudWatch events.

  • The CloudFormation stack should have configuration parameters corresponding to each of the global command-line flags
  • A custom CloudWatch event should be generated by the CloudFormation template based on those stack parameters
  • The event data should be handled by the autospotting Lambda handler, which would generate the configuration data structure from it, just as is done for the command-line flags
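The last step above could be sketched roughly as follows. This is only an illustration, not the project's code: the `Config` type, its fields, and the JSON keys are hypothetical, and it assumes the custom CloudWatch event delivers the stack parameters as a JSON object. Keys missing from the event keep their default values, mirroring how unset command-line flags behave.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Config is a hypothetical stand-in for the global options. The field
// names and JSON keys are illustrative, not taken from autospotting.
type Config struct {
	Regions string `json:"regions"`
	Verbose bool   `json:"verbose"`
}

// defaultConfig returns the values used when an option is not supplied.
func defaultConfig() Config {
	return Config{Regions: "all"}
}

// configFromEvent overlays the options found in the event payload on top
// of the defaults. An empty payload yields the defaults unchanged.
func configFromEvent(payload []byte) (Config, error) {
	cfg := defaultConfig()
	if len(payload) == 0 {
		return cfg, nil
	}
	if err := json.Unmarshal(payload, &cfg); err != nil {
		// Discard any partially applied values on a parse error.
		return defaultConfig(), fmt.Errorf("parsing event payload: %w", err)
	}
	return cfg, nil
}

func main() {
	cfg, err := configFromEvent([]byte(`{"verbose": true}`))
	fmt.Println(cfg, err)
}
```

Decoding into a struct that already holds the defaults keeps the CLI and Lambda paths on one set of default values.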

Some spot instances are created without tags

Some spot requests are fulfilled, but no 'launched-for-asg' tag is assigned to the resulting instance.

03:08:27 autoscaling.go:443: production-ecs-cluster Created spot instance request sir-dbqr5wjj
03:08:27 autoscaling.go:483: production-ecs-cluster Failed to create tags for the spot instance request InvalidSpotInstanceRequestID.NotFound: The spot instance request ID 'sir-dbqr5wjj' does not exist
03:08:28 autoscaling.go:335: production-ecs-cluster Refreshed details for sir-dbqr5wjj {
03:08:28 SpotInstanceRequestId: "sir-dbqr5wjj",
03:08:33 autoscaling.go:335: production-ecs-cluster Refreshed details for sir-dbqr5wjj {
03:08:33 SpotInstanceRequestId: "sir-dbqr5wjj",
03:08:38 autoscaling.go:335: production-ecs-cluster Refreshed details for sir-dbqr5wjj {
03:08:38 SpotInstanceRequestId: "sir-dbqr5wjj",
03:08:43 autoscaling.go:335: production-ecs-cluster Refreshed details for sir-dbqr5wjj {
03:08:43 SpotInstanceRequestId: "sir-dbqr5wjj",
03:08:48 autoscaling.go:335: production-ecs-cluster Refreshed details for sir-dbqr5wjj {
03:08:48 SpotInstanceRequestId: "sir-dbqr5wjj",
03:08:53 autoscaling.go:335: production-ecs-cluster Refreshed details for sir-dbqr5wjj {
03:08:53 SpotInstanceRequestId: "sir-dbqr5wjj",
03:08:58 autoscaling.go:335: production-ecs-cluster Refreshed details for sir-dbqr5wjj {
03:08:58 SpotInstanceRequestId: "sir-dbqr5wjj",
03:09:04 autoscaling.go:335: production-ecs-cluster Refreshed details for sir-dbqr5wjj {
03:09:04 SpotInstanceRequestId: "sir-dbqr5wjj",

Use eawsy/aws-lambda-go for packaging

This has a couple of benefits

  • simpler, less custom build scripts
  • builds in a Docker container, so development needs only Docker installed instead of a full Go toolchain
  • remove the Python wrapper entirely
  • slightly faster execution

Possible issues

  • the shipping of the instance information would need some refactoring, perhaps using go-bindata instead of including a blob
  • local execution would need some investigation

Fix handling of instance storage mapping

There is a bug in instance store handling.

If the launch configuration has more device mappings than are available on the specified instance type, some of them are ignored when launching the instances, so the instance actually has fewer ephemeral instance store volumes than specified in the launch configuration.

Since we compare storage in the instance information with the number of ephemeral devices in the launch configuration, this causes the storage comparison to fail for a lot of otherwise compatible and closely priced instances, potentially leaving us only with much more expensive instances likely to fail when compared by price.

The storage comparison should instead consider the minimum between the number of ephemeral devices specified in the launch configuration and the number of devices available for that instance type.
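A minimal sketch of the proposed comparison, assuming the volume counts are already known as plain integers. The function names are hypothetical, and reading "that instance type" as the type of the instance being replaced is an interpretation of the paragraph above, not something confirmed by the code base.

```go
package main

import "fmt"

// effectiveEphemeralVolumes returns the number of instance store volumes an
// instance really gets: mappings beyond what the type offers are ignored.
func effectiveEphemeralVolumes(launchConfigMappings, typeVolumes int) int {
	if launchConfigMappings < typeVolumes {
		return launchConfigMappings
	}
	return typeVolumes
}

// storageCompatible reports whether a candidate spot instance type offers at
// least as many volumes as the current instance effectively has attached.
// Hypothetical helper, shown only to illustrate the min-based comparison.
func storageCompatible(launchConfigMappings, currentTypeVolumes, candidateTypeVolumes int) bool {
	attached := effectiveEphemeralVolumes(launchConfigMappings, currentTypeVolumes)
	return candidateTypeVolumes >= attached
}

func main() {
	// The launch configuration lists 4 mappings, but the current type has 1 volume.
	fmt.Println(storageCompatible(4, 1, 2)) // candidate with 2 volumes
	fmt.Println(storageCompatible(4, 1, 0)) // candidate with no volumes
}
```

With the minimum applied, a candidate with two volumes is accepted here even though the launch configuration lists four mappings, because the running instance only ever had one.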

undefined: Asset

I'm trying to test this out but I'm having trouble even getting it running locally. Following the SETUP.md, I get this:

$ go get github.com/cristim/autospotting
# github.com/cristim/autospotting
src/github.com/cristim/autospotting/autospotting.go:53: undefined: Asset
src/github.com/cristim/autospotting/autospotting.go:58: undefined: Asset

Configuration option for the number of instances to be allowed per instance-type/AZ combination

At the moment this number is hardcoded to 20% in the autospotting instance replacement logic, which should be changed.

  • we need a configuration option added to the logic that would allow an arbitrary number (the related logic may also need some clean-ups).
  • the new option should be exposed through a new command-line flag, defaulting to the current hardcoded value.
  • the new flag needs to be exposed by the CloudFormation stack as a parameter, also with the same default value.
  • it needs to be configurable on a group level override using tags
  • the changed code needs to have unit tests
  • the new option needs to be documented in the README, perhaps in multiple sections if applicable.

Pick the replaced on-demand instances based on the uptime

The instances should be replaced in a way that minimizes the wasted runtime hours that the user gets charged. We should pick the on-demand instance which will be closer to a full instance hour when eventually terminated.

Max((uptime_minutes + grace_period_minutes + 15 ) % 60)

The 15 minutes are a buffer that allows us to launch a spot instance and replace the on-demand one. It could also be parameterized, although I don't think it is worth the effort.
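The selection rule above might look like this in Go. It is a sketch under assumptions: the `onDemandInstance` type and the function names are hypothetical, and uptime is taken to be available in whole minutes.

```go
package main

import "fmt"

// replacementBufferMinutes is the 15-minute buffer from the formula above.
const replacementBufferMinutes = 15

// onDemandInstance is a hypothetical, minimal view of a running instance.
type onDemandInstance struct {
	ID            string
	UptimeMinutes int
}

// minutesIntoHour computes (uptime + grace period + buffer) % 60, i.e. how far
// into its current billed hour the instance will be when it gets terminated.
func minutesIntoHour(uptimeMinutes, gracePeriodMinutes int) int {
	return (uptimeMinutes + gracePeriodMinutes + replacementBufferMinutes) % 60
}

// pickInstanceToReplace returns the instance with the highest score, which is
// the one closest to completing a full billed hour. Ties go to the first one.
func pickInstanceToReplace(instances []onDemandInstance, gracePeriodMinutes int) (onDemandInstance, bool) {
	var best onDemandInstance
	bestScore := -1
	for _, inst := range instances {
		score := minutesIntoHour(inst.UptimeMinutes, gracePeriodMinutes)
		if score > bestScore {
			best, bestScore = inst, score
		}
	}
	return best, bestScore >= 0
}

func main() {
	instances := []onDemandInstance{{"i-a", 10}, {"i-b", 40}, {"i-c", 70}}
	if inst, ok := pickInstanceToReplace(instances, 0); ok {
		fmt.Println("replace", inst.ID)
	}
}
```

With no grace period the three instances score 25, 55 and 25, so the one with 40 minutes of uptime is chosen.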

Make it self-contained, and disable auto-updates

Fork the code into a self-contained implementation which is packaged entirely into the Lambda function's code, without external runtime dependencies.

This may be done through a port to one of the Lambda-based frameworks such as apex, serverless or sparta, after investigating which of those suits us best.

Agent Fault during execution

Getting the following error:

`autoscaling.go:732: memory compatible, continuing evaluation
2016/10/05 09:08:22 Request body type has been overwritten. May cause race conditions
2016/10/05 09:08:22 Request body type has been overwritten. May cause race conditions
2016/10/05 09:08:22 Request body type has been overwritten. May cause race conditions
2016/10/05 09:08:22 Request body type has been overwritten. May cause race conditions
2016/10/05 09:08:22 Request body type has been overwritten. May cause race conditions
2016/10/05 09:08:22 Request body type has been overwritten. May cause race conditions
2016/10/05 09:08:22 Request body type has been overwritten. May cause race conditions
2016/10/05 09:08:22 Request body type has been overwritten. May cause race conditions
2016/10/05 09:08:22 Request body type has been overwritten. May cause race conditions
2016/10/05 09:08:23 Request body type has been overwritten. May cause race conditions
2016/10/05 09:08:23 Request body type has been overwritten. May cause race conditions
2016/10/05 09:08:23 Request body type has been overwritten. May cause race conditions
2016/10/05 09:08:23 Request body type has been overwritten. May cause race conditions
2016/10/05 09:08:24 Request body type has been overwritten. May cause race conditions
2016/10/05 09:08:24 Request body type has been overwritten. May cause race conditions
2016/10/05 09:08:24 Request body type has been overwritten. May cause race conditions
2016/10/05 09:08:24 Request body type has been overwritten. May cause race conditions
2016/10/05 09:08:24 Request body type has been overwritten. May cause race conditions
2016/10/05 09:08:26 Request body type has been overwritten. May cause race conditions
autoscaling.go:489: Throttling: Rate exceeded
status code: 400, request id: 467a74c5-8adb-11e6-b8f6-5b4ea7c39369
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x478404]

goroutine 151 [running]:
panic(0x84c520, 0xc420014040)
/home/travis/.gimme/versions/go1.7.linux.amd64/src/runtime/panic.go:500 +0x1a1
github.com/cristim/autospotting/core.(*autoScalingGroup).countAttachedInstanceStoreVolumes(0xc420252740, 0xc42066ee40)
/home/travis/gopath/src/github.com/cristim/autospotting/core/autoscaling.go:842 +0x34
github.com/cristim/autospotting/core.(*autoScalingGroup).getCompatibleSpotInstanceTypes(0xc420252740, 0xc4203f5660, 0xa, 0xc4202ef800, 0xc420641ce0, 0x38, 0x0)
/home/travis/gopath/src/github.com/cristim/autospotting/core/autoscaling.go:747 +0xa16
github.com/cristim/autospotting/core.(*autoScalingGroup).getCheapestCompatibleSpotInstanceType(0xc420252740, 0xc4203f5660, 0xa, 0xc4202ef800, 0xc420641ca0)
/home/travis/gopath/src/github.com/cristim/autospotting/core/autoscaling.go:665 +0x22f
github.com/cristim/autospotting/core.(*autoScalingGroup).launchCheapestSpotInstance(0xc420252740, 0xc42032b338)
/home/travis/gopath/src/github.com/cristim/autospotting/core/autoscaling.go:361 +0x2f1
github.com/cristim/autospotting/core.(*autoScalingGroup).process(0xc420252740)
/home/travis/gopath/src/github.com/cristim/autospotting/core/autoscaling.go:63 +0x55b
github.com/cristim/autospotting/core.(*region).processEnabledAutoScalingGroups.func1(0xc4204ec420, 0xc42022cce0, 0x16, 0xc4204ec420, 0xc42025a280, 0x0, 0x0, 0x0)
/home/travis/gopath/src/github.com/cristim/autospotting/core/region.go:195 +0x74
created by github.com/cristim/autospotting/core.(*region).processEnabledAutoScalingGroups
/home/travis/gopath/src/github.com/cristim/autospotting/core/region.go:197 +0x112
END RequestId: 8e105fb4-8ada-11e6-83d3-57bdd7783738
REPORT RequestId: 8e105fb4-8ada-11e6-83d3-57bdd7783738 Duration: 9197.08 ms Billed Duration: 9200 ms Memory Size: 128 MB Max Memory Used: 15 MB `

Choose more reliable instance types

Use the Spot Bid Advisor information about the likelihood of instances being terminated, in order to prefer instances unlikely to be terminated in the near future.

Increase the automated test coverage to an acceptable value

I'd like to see it achieve somewhere around 80%

Summary of files:

React on the 2min termination notice

Each of the instances about to be terminated can poll a metadata entry which will specify when the termination is imminent.

This should be handled in some way that may be specific to the application, so the user should be allowed to run some code that cleanly takes that instance out of the pool.

Spot requests fail when no SSH key is configured

Hi!

While playing around with autospotting, I configured a Launch Configuration that did not define an SSH key. In this setup, the spot requests failed with this error message:

failed: Invalid value '' for keyPairNames. It should not be blank (400 response code)
bad-parameters: Your Spot request failed due to bad parameters.

It would be nice if autospotting supported this setup.
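One possible fix, sketched with a hypothetical `launchSpec` type rather than the real request structure: leave the key name unset when the launch configuration has none, instead of sending an empty string. This assumes the request type models the key name as an optional pointer, as the AWS SDK for Go does for string fields.

```go
package main

import "fmt"

// launchSpec is a hypothetical, reduced launch specification. A nil KeyName
// means "no key pair" and would be omitted from the spot request.
type launchSpec struct {
	ImageID      string
	InstanceType string
	KeyName      *string
}

// keyNameOrNil converts the launch configuration's key name into an optional
// value, so that a blank name is never sent to the API.
func keyNameOrNil(name string) *string {
	if name == "" {
		return nil
	}
	return &name
}

func main() {
	spec := launchSpec{
		ImageID:      "ami-00000000", // placeholder image ID
		InstanceType: "m3.medium",
		KeyName:      keyNameOrNil(""),
	}
	fmt.Println(spec.KeyName == nil)
}
```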

Failed to tag instance

This has happened before. The waitFor function should be enough, but it looks like the request itself doesn't exist.

The instance was launched successfully with the right user data.

autoscaling.go:313: test-ecs-cluster Waiting for spot instance for spot instance request sir-7tag5hag
autoscaling.go:468: awseb-e-sku5tnq3xu-stack-AWSEBAutoScalingGroup-1F01KRYAGNGAD Failed to create tags for the spot instance request InvalidSpotInstanceRequestID.NotFound: The spot instance request ID 'sir-r1vi76vg' does not exist
status code: 400, request id: d00b11c4-ba27-4934-ab5e-5a189b6b1c64

Consider availability zone when picking an instance type

Our instances all run in the us-east-1d availability zone. When the autospotting process runs, it finds the cg1.4xlarge instance type as the cheapest option and tries to use that to request spot instances. Unfortunately, the cg1.4xlarge instance type isn't available in the us-east-1d AZ, only in us-east-1c. We get this error on the spot request: "capacity-not-available: There is no Spot capacity available that matches your request."

For use cases like ours that are limited to a specific AZ, it would be really helpful to consider the AZ when retrieving spot pricing info, and only pick an instance type if it's available in the AZ.

When this happens, it continues to request a new spot instance each time the process runs, which fairly quickly uses up the maximum number of open spot instance requests that AWS allows, preventing other spot instance requests.
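A rough sketch of an AZ-aware selection, assuming each spot price entry carries its availability zone and that a type with no price entry for a zone should not be requested there. The `spotPrice` type, the function name, and the prices are made up for illustration.

```go
package main

import "fmt"

// spotPrice is a hypothetical, reduced spot price history entry.
type spotPrice struct {
	InstanceType     string
	AvailabilityZone string
	Price            float64
}

// cheapestInZone returns the cheapest entry offered in the given zone and
// reports false when the zone has no entries at all.
func cheapestInZone(prices []spotPrice, zone string) (spotPrice, bool) {
	var best spotPrice
	found := false
	for _, p := range prices {
		if p.AvailabilityZone != zone {
			continue // priced in another zone only, skip it
		}
		if !found || p.Price < best.Price {
			best, found = p, true
		}
	}
	return best, found
}

func main() {
	// Illustrative prices, not real market data.
	prices := []spotPrice{
		{"cg1.4xlarge", "us-east-1c", 0.10},
		{"m3.medium", "us-east-1d", 0.20},
		{"c3.large", "us-east-1d", 0.15},
	}
	if p, ok := cheapestInZone(prices, "us-east-1d"); ok {
		fmt.Println(p.InstanceType)
	}
}
```

Filtering before comparing prices means the globally cheapest type is never requested in a zone where it has no price entry, which should also stop the repeated failing requests described above.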

Try it out on EC2 classic and fix any issues

The current code was recently changed in order to work on VPC, and was only tested on VPC and DefaultVPC environments, so it may have regressions on EC2 Classic.

Those need to be checked and ironed out if found.

Properly handle SecurityGroups in DefaultVPC environments

When using a stack created for EC2 classic, the groups are created by name.

In VPC (including DefaultVPC) they need to be given by ID.

The code may need to query the groups by name and return their ID, and pass them by ID on VPC environments.

Improve the instance replacement logic

The code needs to be cleaned up at least as per some of the code review comments posted on #46.

Any function that is changed needs to have its unit tests updated or created if missing.

CodeDeploy support

My current autoscaling groups use CodeDeploy to deploy the web applications. CodeDeploy uses hooks in the AutoScaling group for that. Is this supported out of the box?

Keep at least one (or more) reserved instance running

I have a web application, and it is essential that a number of instances are always running, so that the web application stays accessible even if all spot instances are shut down.

Is it possible to configure Autospotting to keep one or more specific reserved instances running without replacement, while replacing the rest of the instances with spot ones?

Whitelist/blacklist certain instance types via configuration

I saw forks that hard-coded some specific instance types they need in their environment. Other people complained that some instances are problematic in certain availability zones and had them hardcoded out.

This could be configurable using CloudFormation stack parameters, passed as variables to the Lambda function now that Lambda supports this feature. We could add two new CloudFormation stack options:

WhiteListedInstanceTypes: m3.medium,c4.large
BlackListedInstanceTypes: c3.large

On the other hand, I think that such a global setting may not be desirable in some cases, so it may be better to also be able to apply it on a per-group basis, using additional tags set on the AutoScaling group:

AutoSpotting_WhiteListedInstanceTypes: c3.large,c4.large
AutoSpotting_BlackListedInstanceTypes:  c3.xlarge
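Parsing such comma-separated values and applying them could be sketched as follows. The function names are hypothetical, and the precedence rules (blacklist wins, an empty whitelist allows everything) are assumptions the issue does not spell out.

```go
package main

import (
	"fmt"
	"strings"
)

// parseInstanceTypes turns "c3.large, c4.large" into a set, trimming spaces
// and ignoring empty entries.
func parseInstanceTypes(value string) map[string]bool {
	set := make(map[string]bool)
	for _, item := range strings.Split(value, ",") {
		if t := strings.TrimSpace(item); t != "" {
			set[t] = true
		}
	}
	return set
}

// instanceTypeAllowed applies the lists. Assumed semantics: a blacklisted
// type is always rejected, and an empty whitelist places no restriction.
func instanceTypeAllowed(instanceType string, whitelist, blacklist map[string]bool) bool {
	if blacklist[instanceType] {
		return false
	}
	if len(whitelist) > 0 && !whitelist[instanceType] {
		return false
	}
	return true
}

func main() {
	whitelist := parseInstanceTypes("c3.large,c4.large")
	blacklist := parseInstanceTypes(" c3.xlarge")
	for _, t := range []string{"c4.large", "c3.xlarge", "m3.medium"} {
		fmt.Println(t, instanceTypeAllowed(t, whitelist, blacklist))
	}
}
```

The same two functions would serve both the global stack parameters and the per-group tags, with the group-level values taking the place of the global ones when present.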

Failed to describe AutoScaling tags in <region> AccessDenied

I'm getting the following error:

region.go:248: Failed to describe AutoScaling tags in us-east-1 AccessDenied: User: arn:aws:sts::308824460317:assumed-role/AutoSpotting-LambdaExecutionRole-17GHN91Z6W9OE/AutoSpotting-LambdaFunction-V9LHZ91FKJK4 is not authorized to perform: autoscaling:DescribeTags

I worked around it by adding the "AmazonEC2FullAccess" policy to the autospotting role.

I'm using a very early version of the stack, so maybe the role needs to be updated.

Currently this is the structure of the role:

{
    "Statement": [
        {
            "Action": [
                "autoscaling:DescribeAutoScalingGroups",
                "autoscaling:DescribeLaunchConfigurations",
                "autoscaling:AttachInstances",
                "autoscaling:DetachInstances",
                "ec2:CreateTags",
                "ec2:DescribeInstances",
                "ec2:DescribeRegions",
                "ec2:DescribeSpotInstanceRequests",
                "ec2:DescribeSpotPriceHistory",
                "ec2:RequestSpotInstances",
                "ec2:TerminateInstances",
                "iam:PassRole",
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "*",
            "Effect": "Allow"
        }
    ]
}
