
blox's Introduction

Blox: Open Source schedulers for Amazon ECS

Blox provides open source schedulers optimized for running applications on Amazon ECS. Developers now have greater control over how their applications are deployed across clusters of resources, run and scale in production, and can take advantage of powerful placement capabilities of Amazon ECS. Blox is being delivered as a managed service via the Amazon ECS Console, API and CLIs. Blox v1.0 provides daemon scheduling for Amazon ECS. We will continue to add additional schedulers as part of this project. Blox schedulers are built using AWS primitives, and the Blox designs and code are open source. If you are interested in learning more or collaborating on the designs, please read the design. If you are currently using Blox v0.3, please read the FAQ.

Project structure

For an overview of the components of Blox, run:

./gradlew projects

Testing

To run the full unit test suite, run:

./gradlew check

This will run the same tests that we run in the Travis CI build.

Deploying

First, take a look at what Blox will put in your personal stack by running the showStackConfig task:

$ ./gradlew showStackConfig

> Task :showStackConfig
Blox deployment stack configuration:

  Default resource name         (blox.name): blox-<username>-alpha-us-west-2 (default)
  API Gateway stage            (blox.stage): alpha (default)
  Stack prefix                (blox.prefix): <username>-alpha (default)
  AWS Region                  (blox.region): us-west-2 (default)
  AWS Credential Profile     (blox.profile): blox-<username>-alpha-us-west-2 (default)
  Cloudformation stack name (blox.cfnStack): blox-<username>-alpha-us-west-2 (default)
  Deployment S3 bucket name (blox.s3Bucket): blox-<username>-alpha-us-west-2 (default)

To customize these values, modify ~/.gradle/gradle.properties to override the property listed.

AWS CLI configuration for profile blox-<username>-alpha-us-west-2:

The config profile (blox-<username>-alpha-us-west-2) could not be found

If you wish to customize any of these values, you can do so by overriding the property in parentheses using any of the supported ways to override Gradle properties. The easiest way is to override it for your user in ~/.gradle/gradle.properties:

blox.profile=default
blox.region=us-east-1

Next, in order to deploy your personal stack:

  • install the official AWS CLI

  • create an IAM user with the following permissions:

    {
        "Version":"2012-10-17",
        "Statement":[{
            "Effect":"Allow",
            "Action":[
                "s3:*",
                "lambda:*",
                "apigateway:*",
                "cloudformation:*",
                "iam:*",
                "execute-api:*",
                "events:DescribeRule"
            ],
            "Resource":"*"
        }]
    }
    

    These permissions are very broad, so we recommend using a separate test account.

  • configure the AWS Credential Profile shown by the showStackConfig task with the AWS credentials of the user you created above:

    aws configure --profile blox-<username>-alpha-us-west-2 set region us-west-2
    aws configure --profile blox-<username>-alpha-us-west-2
    
  • create an S3 bucket to store all resources to be deployed (code, CloudFormation templates, etc.):

    ./gradlew createBucket
    
  • deploy the Blox stack:

    ./gradlew deploy
    

End to end testing

Once you have a stack deployed, you can test it with:

./gradlew testEndToEnd

Contact

License

All projects under Blox are released under Apache 2.0 and contributions are accepted under individual Apache Contributor Agreements.

blox's Issues

Flaky Test?

I'm still new to Go and the testing framework being used, but sometimes when I run make it fails with one error, and this output looks related:

--- PASS: TestRunLoadInstancesReturnsError (0.00s)
panic: Fail in goroutine after TestOverlappingRunInvocationsAreSkipped has completed

goroutine 9 [running]:
panic(0x51cce0, 0xc42016e910)
	/usr/local/go/src/runtime/panic.go:500 +0x1a1
testing.(*common).Fail(0xc4200a4540)
	/usr/local/go/src/testing/testing.go:412 +0x11f
testing.(*common).FailNow(0xc4200a4540)
	/usr/local/go/src/testing/testing.go:431 +0x2b
testing.(*common).Fatalf(0xc4200a4540, 0x5fef7b, 0x24, 0xc4201a6390, 0x3, 0x3)
	/usr/local/go/src/testing/testing.go:496 +0x83
github.com/blox/blox/vendor/github.com/golang/mock/gomock.(*Controller).Call(0xc42027f200, 0x54faa0, 0xc4203f2570, 0x5ee0c6, 0x9, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/Users/williamthurston/go/src/github.com/blox/blox/vendor/github.com/golang/mock/gomock/controller.go:113 +0x452
github.com/blox/blox/cluster-state-service/handler/mocks.(*MockTaskLoader).LoadTasks(0xc4203f2570, 0x26, 0x0)
	/Users/williamthurston/go/src/github.com/blox/blox/cluster-state-service/handler/mocks/taskloader_mocks.go:45 +0x73
github.com/blox/blox/cluster-state-service/handler/reconcile.(*Reconciler).RunOnce(0xc420404d20, 0x0, 0x0)
	/Users/williamthurston/go/src/github.com/blox/blox/cluster-state-service/handler/reconcile/reconciler.go:89 +0xbf
github.com/blox/blox/cluster-state-service/handler/reconcile.(*Reconciler).Run.func1(0xc420404d20)
	/Users/williamthurston/go/src/github.com/blox/blox/cluster-state-service/handler/reconcile/reconciler.go:69 +0x2f
created by github.com/blox/blox/cluster-state-service/handler/reconcile.(*Reconciler).Run
	/Users/williamthurston/go/src/github.com/blox/blox/cluster-state-service/handler/reconcile/reconciler.go:73 +0x1c6
FAIL	github.com/blox/blox/cluster-state-service/handler/reconcile	0.092s

daemon-scheduler constraint doesn't hold in manual start of the container

I was testing daemon-scheduler locally and observed this behavior:

  • I created an environment and a deployment for the daemon-scheduler. The ECS service involved initially had 0 containers running, and the cluster had 3 hosts.
  • After the deployment, 1 copy of the container was running on each host.
  • I ssh'd onto one of the hosts and killed the container; the log showed that, as expected, a new one was started. Adding a new host to the cluster also worked.
  • I manually started a container of the same service on one of the hosts, and both containers kept running. Here I (kind of) expected the daemon to step in and enforce the constraint by removing one.
  • I killed the container started by the daemon and left the one I started running; the daemon started a new one again, so 2 copies were still running.

Is my assumption of how this should work wrong?
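Whether daemon-scheduler should stop a copy it did not start is exactly the open question here. If the intended rule were "exactly one running copy per container instance, regardless of who started it", a reconciliation pass could compute a plan like the sketch below. Task, Plan, and planOnePerInstance are hypothetical names, and keeping the lexically first task ARN is an arbitrary tie-breaker.

```go
package main

import (
	"fmt"
	"sort"
)

// Task is a hypothetical, minimal view of a running ECS task.
type Task struct {
	ARN      string
	Instance string // container instance the task runs on
}

// Plan lists instances that need a task started and tasks that should be
// stopped so that each instance runs exactly one copy.
type Plan struct {
	StartOn []string
	Stop    []string
}

// planOnePerInstance counts running tasks per instance. Instances with no
// task get a start; instances with more than one keep the first task (by
// ARN order) and stop the rest.
func planOnePerInstance(instances []string, running []Task) Plan {
	byInstance := map[string][]string{}
	for _, t := range running {
		byInstance[t.Instance] = append(byInstance[t.Instance], t.ARN)
	}
	var p Plan
	for _, inst := range instances {
		arns := byInstance[inst]
		if len(arns) == 0 {
			p.StartOn = append(p.StartOn, inst)
			continue
		}
		sort.Strings(arns)
		p.Stop = append(p.Stop, arns[1:]...)
	}
	return p
}

func main() {
	p := planOnePerInstance(
		[]string{"i-1", "i-2", "i-3"},
		[]Task{{"task-a", "i-1"}, {"task-b", "i-2"}, {"task-c", "i-2"}},
	)
	fmt.Println(p.StartOn, p.Stop) // [i-3] [task-c]
}
```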

[Proposal] Support Identity for scheduled services

Description

This feature provides the ability to associate specific resources to a task that survives task restarts, such as a unique token, DNS hostname or data volume.

Motivation

Strong identity is a cornerstone of running persistent applications that have a notion of membership. Identity can be provided by supplying the same token each time a task instance is launched or relaunched, by giving the task the same DNS name when it is relaunched, or by making sure the state of a task instance (e.g., data volumes) moves along with it.

Use cases

  • Ability to run clustered applications like Kafka. For example, Kafka requires a broker id when it’s launched. When the broker exits and needs to be relaunched, it should launch with the state such as broker id and the broker data.
  • Ability to run applications that advertise their location. Peers of this application would need to reach each other to form a cluster. When a peer exits and gets relaunched it would need to get the same DNS name so that other peers are able to reach it.

CI setup

Set up Travis CI to trigger builds on pull requests and check-ins.

[Proposal] Support highly available Blox deployments

Description

Currently Blox is a single-instance application stack that can be run locally or orchestrated by ECS. To be more resilient to failures, every component in the Blox framework should be highly available. Blox is made of stateless services along with a stateful datastore. All components should be configured to restart automatically on exit to improve reliability. The components should also run replicated, so that when one instance fails, others can take over its responsibilities. etcd, the datastore in the stack, should be set up to run as a cluster.

Motivation

In order to improve the production readiness of Blox framework, single points of failure need to be eliminated and a redundant system design put in place to offer an acceptable level of uptime for the users.

Use cases

  • Ability to achieve a high degree of operational uptime when deploying Blox in production.

Support retention of terminal records independent of ECS retention

Currently ECS cleans up terminal cluster state, such as stopped tasks, from its database after a certain period of time. When cluster-state-service reconciles and learns that ECS has cleaned up these terminal records, it deletes them from its local store as well. We should look into providing a configurable retention period for terminal records so that the data remains available for auditing and debugging for a longer period of time.

Add metrics to daemon-scheduler

daemon-scheduler should gather metrics that help assess its operational characteristics and expose them via an API endpoint, for example the number of deployments created, the number of environments, and scheduling latency.

[Proposal] Support for Mesos Scheduling Frameworks

Description

This feature supports Mesos scheduling frameworks, such as Marathon and Chronos, with Amazon ECS.

Motivation

The Mesos ecosystem has scheduling frameworks that cover a wide range of use cases. This feature lets customers choose from existing schedulers that meet their needs. It also enables customers that already use Mesos scheduling frameworks to continue to use those frameworks without the need to operate a resource manager.

Use cases

  • Allows customers to use Mesos scheduling frameworks with Amazon ECS.
  • Supports the same scheduling framework in multiple environments.

Kinesis in CFN template

Add a Kinesis stream option to the CloudFormation template and make sure it is used and tested in the Cucumber tests.

Support multiple data stores

Are there any plans for the cluster-state-service to support other data stores such as Consul?

On a side-note, is there a better place to ask questions like these?

[Proposal] Support for Kubernetes

Description

This feature adds support for the Kubernetes framework.

It is currently possible to run Kubernetes on AWS but the whole setup is complex and having it as a managed service integrated with AWS would be great.

EDIT: As mentioned in the comments, supporting the Kubernetes APIs is a better framing of this request.

Motivation

Kubernetes is a very popular framework for running containers. It has a big community and tooling around it.

Use cases

Allow the use of the Kubernetes container scheduler and any of its other features that can fit into ECS.

List Resources API

Provide an API that will return the available and remaining resources in a container instance.
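ECS already reports registeredResources and remainingResources for each container instance (the same fields cluster-state-service returns from its instance API), so such an API could largely pass those through. The underlying arithmetic is registered capacity minus what running tasks reserve, as in this sketch with a hypothetical Resources type covering only CPU units and memory (MiB).

```go
package main

import "fmt"

// Resources is a hypothetical summary: CPU in ECS CPU units, memory in MiB.
type Resources struct {
	CPU    int64
	Memory int64
}

// remaining subtracts what each running task reserves from what the
// container instance registered with.
func remaining(registered Resources, tasks []Resources) Resources {
	r := registered
	for _, t := range tasks {
		r.CPU -= t.CPU
		r.Memory -= t.Memory
	}
	return r
}

func main() {
	registered := Resources{CPU: 2048, Memory: 3768}
	reserved := []Resources{{CPU: 512, Memory: 1024}, {CPU: 256, Memory: 512}}
	fmt.Println(remaining(registered, reserved)) // {1280 2232}
}
```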

Support environment updates

The daemon-scheduler API and demo CLI don't seem to support updating environments. Whenever I modify a task definition, I have been deleting the environment, stopping its tasks manually, and recreating it. Being able to update an environment to a new task definition revision and have the change propagate to existing deployments would be useful.

[Bug] Fix resource information in CSS instance API responses

The resource field 'value' in responses to instance API calls (e.g. curl -v http://localhost:3000/v1/instances) is always null:

"registeredResources": [
  {
    "name": "CPU",
    "type": "INTEGER",
    "value": null
  },
  {
    "name": "MEMORY",
    "type": "INTEGER",
    "value": null
  },
  {
    "name": "PORTS",
    "type": "STRINGSET",
    "value": null
  },
  {
    "name": "PORTS_UDP",
    "type": "STRINGSET",
    "value": null
  }
],
"remainingResources": [
  {
    "name": "CPU",
    "type": "INTEGER",
    "value": null
  },
  {
    "name": "MEMORY",
    "type": "INTEGER",
    "value": null
  },
  {
    "name": "PORTS",
    "type": "STRINGSET",
    "value": null
  },
  {
    "name": "PORTS_UDP",
    "type": "STRINGSET",
    "value": null
  }
]

Delete environment

Currently the delete environment API deletes the environment regardless of whether it has running tasks. Add a check that prevents deleting an "active" environment, i.e. one that has running tasks, and provide a force-delete option that stops the tasks before deleting the environment.

This will also help with test cleanup as deleting the environment after the test runs will clean up its artifacts (tasks started by the environment) as well.
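A minimal sketch of the proposed check, assuming the caller can list an environment's running tasks and stop them one by one. Env, deleteEnvironment, and the stopTask/remove callbacks are hypothetical; passing the side effects in as functions also makes the behavior easy to exercise in tests.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrEnvironmentActive is returned when an environment still has running tasks.
var ErrEnvironmentActive = errors.New("environment has running tasks; use force to delete")

// Env is a hypothetical environment record.
type Env struct {
	Name         string
	RunningTasks []string // task ARNs
}

// deleteEnvironment refuses to delete an active environment unless force is
// set, in which case it stops every task before removing the environment.
func deleteEnvironment(env *Env, force bool, stopTask func(arn string) error, remove func(name string) error) error {
	if len(env.RunningTasks) > 0 {
		if !force {
			return ErrEnvironmentActive
		}
		for _, arn := range env.RunningTasks {
			if err := stopTask(arn); err != nil {
				return fmt.Errorf("stopping %s: %w", arn, err)
			}
		}
		env.RunningTasks = nil
	}
	return remove(env.Name)
}

func main() {
	env := &Env{Name: "web-daemon", RunningTasks: []string{"task-1", "task-2"}}
	stop := func(arn string) error { fmt.Println("stopping", arn); return nil }
	remove := func(name string) error { fmt.Println("deleted", name); return nil }

	fmt.Println(deleteEnvironment(env, false, stop, remove)) // refused: still active
	fmt.Println(deleteEnvironment(env, true, stop, remove))  // stops both tasks, then deletes
}
```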

Support for multiple accounts

Today, CSS consumes data for clusters within a single account. We should add support for multiple accounts.

Also, since we deal with a single account, APIs referring to entities (like cluster, etc.) by just names and not ARNs work fine. We'll have to figure out a way for data disambiguation when we support multiple accounts.
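ARNs already carry the disambiguating information: an ECS cluster ARN has the form arn:aws:ecs:<region>:<account-id>:cluster/<name>. A sketch of keying clusters by account, region, and name parsed from the ARN; ClusterKey and parseClusterARN are hypothetical, and the parser does no further validation of the individual fields.

```go
package main

import (
	"fmt"
	"strings"
)

// ClusterKey identifies a cluster unambiguously across accounts and regions.
type ClusterKey struct {
	Account string
	Region  string
	Name    string
}

// parseClusterARN splits an ARN of the form
// arn:<partition>:ecs:<region>:<account-id>:cluster/<name>.
func parseClusterARN(arn string) (ClusterKey, error) {
	parts := strings.SplitN(arn, ":", 6)
	if len(parts) != 6 || parts[0] != "arn" || parts[2] != "ecs" {
		return ClusterKey{}, fmt.Errorf("not an ECS ARN: %q", arn)
	}
	name := strings.TrimPrefix(parts[5], "cluster/")
	if name == parts[5] || name == "" {
		return ClusterKey{}, fmt.Errorf("not a cluster ARN: %q", arn)
	}
	return ClusterKey{Account: parts[4], Region: parts[3], Name: name}, nil
}

func main() {
	key, err := parseClusterARN("arn:aws:ecs:us-west-2:123456789012:cluster/default")
	fmt.Println(key, err) // {123456789012 us-west-2 default} <nil>
}
```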

[Proposal] Support Time based Task deployments

Description

This feature provides the ability to launch task definitions at a specified time or frequency using a cron-like syntax.

Motivation

Many system maintenance and batch jobs need to be automatically run at a specified time or frequency.

Use cases

  • Ability to aggregate data in a database for reporting, auditing, etc.
  • Ability to backup data at a particular time.
  • Ability to launch resource-intensive tasks during periods of lower utilization.

Implement the existing AWS API for all list and describe calls

Rather than a separate client and separate API I'd prefer to just use the existing AWS SDK and command line tools to query my cluster state from this system, just without the risk of throttling that exists when speaking to the hosted API.

It'd also be cool if it implemented the various write APIs as well and forwarded them, but that's probably less valuable for me.

cluster-state-service task is failing

Hi,
I am using the AWS installation described in https://github.com/blox/blox/tree/dev/deploy#local-installation.
The service starts successfully, but the bloxoss/cluster-state-service:0.1.0 task keeps failing with the error below:
2016-12-12T02:06:29Z [INFO] Reconciler loading tasks and instances
2016-12-12T02:07:29Z [CRITICAL] Error starting event stream handler: Error bootstrapping: Failed to reconcile. Could not load tasks.: Error loading tasks from data store: Error loading tasks from store: Context deadline is exceeded: context deadline exceeded

Am I missing something here?

All other tasks are running fine:
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
00999f64e828 bloxoss/daemon-scheduler:0.1.0 "/daemon-scheduler --" 5 minutes ago Up 4 minutes 0.0.0.0:32770->2000/tcp ecs-BloxFramework-1-scheduler-8cb28feefab9e48c5900
ede79b28d18c quay.io/coreos/etcd:v3.0.13 "/usr/local/bin/etcd " 5 minutes ago Up 5 minutes 2379-2380/tcp ecs-BloxFramework-1-etcd-faba859abcfba88e4e00
0c5c1af7f37d amazon/amazon-ecs-agent:latest "/agent" 2 days ago Up 2 days ecs-agent

Input validation

Validate input parameters like cluster name, task definition and deployment tokens to daemon-scheduler APIs.
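For cluster names, ECS documents the rule as up to 255 letters (uppercase and lowercase), numbers, hyphens, and underscores. A minimal sketch of checking that before calling ECS, assuming the API takes cluster names rather than ARNs; validateClusterName is hypothetical, and task definitions and deployment tokens would need their own rules.

```go
package main

import (
	"fmt"
	"regexp"
)

// clusterNamePattern follows the ECS rule for cluster names: 1 to 255
// letters, digits, hyphens, and underscores.
var clusterNamePattern = regexp.MustCompile(`^[A-Za-z0-9_-]{1,255}$`)

// validateClusterName rejects names ECS would not accept.
func validateClusterName(name string) error {
	if !clusterNamePattern.MatchString(name) {
		return fmt.Errorf("invalid cluster name %q", name)
	}
	return nil
}

func main() {
	for _, name := range []string{"default", "blox-prod_1", "bad name", ""} {
		fmt.Println(name, validateClusterName(name) == nil)
	}
}
```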

Support unlimited cluster-state-service task filter combinations

Currently the cluster-state-service /v1/tasks API method only allows the cluster and status filters to be combined. You can't filter by cluster and startedBy, or by all three. We should refactor the CSS filter-tasks method to support any combination of filters, which will also future-proof it for new filters down the road.
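One way to support every combination is to treat each query parameter as an independent predicate and require all of them to match, so adding a filter means adding one map entry rather than a new code path. A sketch with hypothetical Task, Filter, buildFilters, and apply names; it compares exact strings only, so matching a cluster given by name versus ARN would need extra handling.

```go
package main

import "fmt"

// Task is a hypothetical task record with the fields /v1/tasks filters on.
type Task struct {
	ARN       string
	Cluster   string
	Status    string
	StartedBy string
}

// Filter is a predicate over tasks; a task matches when every filter passes.
type Filter func(Task) bool

// fieldGetters maps a query parameter name to the field it filters on.
var fieldGetters = map[string]func(Task) string{
	"cluster":   func(t Task) string { return t.Cluster },
	"status":    func(t Task) string { return t.Status },
	"startedBy": func(t Task) string { return t.StartedBy },
}

// buildFilters turns any combination of query parameters into filters.
func buildFilters(query map[string]string) ([]Filter, error) {
	var filters []Filter
	for name, want := range query {
		get, ok := fieldGetters[name]
		if !ok {
			return nil, fmt.Errorf("unsupported filter %q", name)
		}
		want := want // per-iteration copy for the closure (needed before Go 1.22)
		filters = append(filters, func(t Task) bool { return get(t) == want })
	}
	return filters, nil
}

// apply returns the tasks that pass every filter.
func apply(tasks []Task, filters []Filter) []Task {
	var out []Task
	for _, t := range tasks {
		match := true
		for _, f := range filters {
			if !f(t) {
				match = false
				break
			}
		}
		if match {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	tasks := []Task{
		{"task-1", "prod", "RUNNING", "daemon-scheduler"},
		{"task-2", "prod", "RUNNING", "manual"},
		{"task-3", "test", "STOPPED", "daemon-scheduler"},
	}
	filters, err := buildFilters(map[string]string{"cluster": "prod", "startedBy": "daemon-scheduler"})
	if err != nil {
		panic(err)
	}
	fmt.Println(apply(tasks, filters)) // [{task-1 prod RUNNING daemon-scheduler}]
}
```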

[Proposal] Create Web User Interface

Description

This feature is to provide a web user interface for consuming, visualizing, and modifying the Blox state.

Motivation

Having a web UI would be a nice addition for visualizing cluster state and managing deployed environments. Please submit comments for any features you would like to see in the web UI.

Use cases

  • View cluster-state-service instances and tasks.
  • View daemon-scheduler environments and deployments.

Support version parameter in Streaming API

Streaming API in cluster-state-service allows any consumer to listen for state changes in a streaming fashion. This API should support a fromVersion parameter so that any consumer can catch up to the events from the point where they dropped off.
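A sketch of the catch-up logic behind a fromVersion parameter, assuming the service keeps a bounded, version-ordered log of recent events with contiguous version numbers, and that fromVersion means "the last version I saw" (an exclusive bound, which is a design choice). If the consumer is further behind than the retained log reaches, replay is impossible and it has to resync. Event, eventsSince, and ErrVersionTooOld are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// Event is a hypothetical state-change event; the service would assign each
// one a strictly increasing, contiguous Version.
type Event struct {
	Version int64
	Payload string
}

// ErrVersionTooOld means events after fromVersion were already discarded,
// so the consumer must reload full state instead of replaying.
var ErrVersionTooOld = errors.New("fromVersion is older than the retained events")

// eventsSince returns the events with Version greater than fromVersion.
// retained must be sorted by Version.
func eventsSince(retained []Event, fromVersion int64) ([]Event, error) {
	if len(retained) > 0 && fromVersion < retained[0].Version-1 {
		return nil, ErrVersionTooOld
	}
	i := sort.Search(len(retained), func(i int) bool {
		return retained[i].Version > fromVersion
	})
	return retained[i:], nil
}

func main() {
	retained := []Event{{3, "task-1 RUNNING"}, {4, "task-2 PENDING"}, {5, "task-1 STOPPED"}}
	missed, err := eventsSince(retained, 3)
	fmt.Println(len(missed), err) // 2 <nil>
}
```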

Filter based on remaining resources

Provide an API that can be used to search for container instances with available resources, for example a query that identifies instances with 128 MiB of memory and 2 CPUs available. Schedulers can use this to quickly identify candidate container instances on which to place a task.
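A minimal sketch of the query, assuming remaining CPU is expressed in ECS CPU units (1024 units = 1 vCPU) and memory in MiB. Instance and withCapacity are hypothetical; a linear scan is fine for modest clusters, while a large fleet might want an index ordered by remaining capacity.

```go
package main

import "fmt"

// Instance is a hypothetical view of a container instance's remaining
// resources: CPU in CPU units and memory in MiB.
type Instance struct {
	ARN             string
	RemainingCPU    int64
	RemainingMemory int64
}

// withCapacity returns the instances that can fit a task needing cpu units
// and memory MiB.
func withCapacity(instances []Instance, cpu, memory int64) []Instance {
	var out []Instance
	for _, in := range instances {
		if in.RemainingCPU >= cpu && in.RemainingMemory >= memory {
			out = append(out, in)
		}
	}
	return out
}

func main() {
	instances := []Instance{
		{"i-1", 1024, 4096},
		{"i-2", 2048, 512},
		{"i-3", 4096, 64},
	}
	// 2 vCPUs = 2048 CPU units, plus 128 MiB of memory.
	for _, in := range withCapacity(instances, 2*1024, 128) {
		fmt.Println(in.ARN) // i-2
	}
}
```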

Add metrics to cluster-state-service

cluster-state-service should gather metrics that help assess its operational characteristics and expose them via an API endpoint, for example the number of ECS events processed, the number of reconciliations, and the last and next reconciliation times.

Metadata API

Add a metadata API to daemon-scheduler which prints version info and any other metadata that we want to expose.
