This repository outlines the formal API design for TaskCluster.
- AMQP is only for message exchange, not for keep-alive, state, or data; we should only use it for events that are relevant right now.
- Task state should be eventually consistent on S3; data that doesn't change should be uploaded directly to S3.
- The database should hold state while a task is executing; once a task is resolved (`completed` or `failed`) it can be removed from the database.
- Keep it simple.
- Be dynamic initially, while allowing for less dynamic behavior as we evolve the system.
- Task, a unit of work executed by the task cluster.
- Artifact, a result generated by a worker.
- Queue, the place that holds the state of pending and running tasks and ensures they are eventually scheduled (or failed if they time out).
- Resolution, a task is resolved exactly once; the two resolved states are `completed` and `failed`; if a task gets canceled, it fails.
Whenever we talk about an id, it is (with the exception of `run-id`) a string of at most 36 alphanumeric characters (plus `_` and `-`).
Name | Description |
---|---|
`task-id` | Identifies a unique task |
`run-id` | Identifies a run of a task (this is an integer, max 999) |
`worker-group` | Identifies a group of workers |
`worker-id` | Identifies a specific worker within a group |
`provisioner-id` | Identifies a provisioner |
`worker-type` | Identifies a worker type for a given provisioner |
Note that `worker-id` and `worker-type` are not globally unique; they are merely identifiers within a given `worker-group` and `provisioner-id`, respectively.
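
As a rough sketch, validating identifiers against these rules might look as follows (the helper names are hypothetical; the character set, the 36-character limit, and the 999 cap on `run-id` come straight from the rules above):

```ts
// Sketch: validating identifiers against the rules described above.

// At most 36 characters drawn from alpha-numerics plus `_` and `-`.
const IDENTIFIER_PATTERN = /^[A-Za-z0-9_-]{1,36}$/;

function isValidIdentifier(id: string): boolean {
  return IDENTIFIER_PATTERN.test(id);
}

// `run-id` is the exception: an integer starting from 1, at most 999.
function isValidRunId(runId: number): boolean {
  return Number.isInteger(runId) && runId >= 1 && runId <= 999;
}

console.log(isValidIdentifier("aws-provisioner")); // true
console.log(isValidIdentifier("has space"));       // false
console.log(isValidRunId(1));                      // true
console.log(isValidRunId(1000));                   // false
```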
The only identifiers assigned by the queue are `run-id` and `task-id`. The `run-id` is assigned by the queue because we want run-ids to form a numerically increasing sequence per task. The `task-id` is assigned by the queue to ensure it is a random UUID, as a security measure: non-random `task-id`s would allow new tasks to overwrite older tasks, which is a security issue.
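
Purely as an illustration, a random `task-id` could be produced from a v4 UUID along these lines; the exact encoding the queue uses is not specified here, so treat this as an assumption:

```ts
import { randomUUID } from "node:crypto";

// Sketch: generate a random task-id as a v4 UUID (36 characters,
// alpha-numerics plus `-`), so task-ids cannot be guessed and used
// to overwrite older tasks.
function generateTaskId(): string {
  return randomUUID(); // e.g. "110ec58a-a0f2-4ac4-8393-c866d813b8d1"
}
```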
The other identifiers are dynamically allocated when you call into the queue. For example, if you want to add a new machine type and provisioner, you just give them unique names and start submitting tasks for them.
By convention, non-UUID identifiers should either be prefixed with the IRC nickname of the person who invented them, or registered in the queue documentation, to ensure that they are unique. For details see the Future Security Design section below.
Worker identification: the alert reader will notice that a worker is identified by two ids, `worker-group` and `worker-id`. A group of workers could, for instance, identify a master node that manages a cluster of specialized hardware. The `worker-group` could also identify a multi-core EC2 instance, under which each `worker-id` identifies a process. The `worker-group` identifier is often useful for routing, whereas the `worker-id` (in combination with `worker-group`) identifies the process, specialized hardware node, or folder within which the task ran.
The task status structure contains all data stored by the queue about a task. The purpose of this structure is to track the state of a task until it is resolved.
```js
{
  "task_id":        // Unique task identifier
  "provisioner_id": // Provisioner identifier
  "worker_type":    // Type of worker to be provisioned by the provisioner
  "runs": [
    {
      "run_id":       // run-id, an integer starting from 1
      "worker_group": // Worker group identifier
      "worker_id":    // Worker identifier
    }
  ],
  "state":       // pending|running|completed|failed
  "reason":      // String such as none, retries-failed, timeout, canceled
  "routing":     // Task specific routing keys
  "retries":     // Number of retries left
  "priority":    // Double, relative priority
  "created":     // Creation time (ISO 8601)
  "deadline":    // Deadline for resolution; after this the task is either failed or completed
  "taken_until": // Time until the task reverts from running to pending
}
```
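
For readers who prefer types, here is a minimal TypeScript sketch of the same structure together with the `taken_until` rule; the field names mirror the JSON above, but the type itself is illustrative rather than an official schema:

```ts
// Sketch of the task status structure; names mirror the JSON above.
interface Run {
  run_id: number;        // integer starting from 1
  worker_group: string;
  worker_id: string;
}

interface TaskStatus {
  task_id: string;
  provisioner_id: string;
  worker_type: string;
  runs: Run[];
  state: "pending" | "running" | "completed" | "failed";
  reason: string;        // e.g. "none", "retries-failed", "timeout", "canceled"
  routing: string;       // task specific routing keys
  retries: number;       // number of retries left
  priority: number;      // double, relative priority
  created: string;       // ISO 8601
  deadline: string;      // ISO 8601
  taken_until: string;   // ISO 8601
}

// A running task whose `taken_until` has passed should revert to pending.
function shouldRevertToPending(task: TaskStatus, now: Date): boolean {
  return task.state === "running" && now > new Date(task.taken_until);
}
```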
The actual task definition, results, and logs should be stored in S3; the queue will sign URLs for the worker so it can upload these files without AWS credentials.
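
A sketch of how such a signed upload URL could be produced with the AWS SDK for JavaScript v3; the bucket name and key layout here are made-up examples, not the queue's actual layout:

```ts
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";
import { getSignedUrl } from "@aws-sdk/s3-request-presigner";

// Sketch: the queue signs a PUT URL so the worker can upload an
// artifact directly to S3 without holding AWS credentials.
const s3 = new S3Client({ region: "us-west-2" });

async function signArtifactUploadUrl(taskId: string, runId: number, name: string) {
  const command = new PutObjectCommand({
    Bucket: "taskcluster-artifacts",        // hypothetical bucket
    Key: `${taskId}/runs/${runId}/${name}`, // hypothetical key layout
  });
  // URL expires after one hour; the worker PUTs the file to this URL.
  return getSignedUrl(s3, command, { expiresIn: 3600 });
}
```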
As the system evolves we may want to shift away from ensuring identifier uniqueness by convention. Specifically, we will probably want provisioners to register with the queue and provide a JSON schema for the task payloads they accept, as well as define the set of OAuth scopes required to post tasks for workers provisioned by that provisioner.
Essentially, we'll need to lock down the system so that there are different scopes for posting and consuming tasks with a given `provisioner-id`. Additionally, registering JSON schemas for each `worker-type` would allow us to reject invalid tasks much sooner.
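
For illustration, a provisioner might register a payload schema for one of its worker types roughly like this (the fields shown are invented for the example; nothing in this document defines the actual schema shape):

```ts
// Sketch: a hypothetical JSON schema registered for a worker-type,
// letting the queue reject tasks with invalid payloads up front.
const workerTypePayloadSchema = {
  $schema: "http://json-schema.org/draft-04/schema#",
  type: "object",
  properties: {
    command:    { type: "array", items: { type: "string" } }, // command to run
    env:        { type: "object" },                           // environment variables
    maxRunTime: { type: "integer", minimum: 1 },              // seconds
  },
  required: ["command"],
};
```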
This document leaves these security considerations as future work, as initially we'll just want something fairly dynamic.