
alcor's Introduction

"This Project has been archived by the owner, who is no longer providing support. The project remains available to authorized users on a "read only" basis."

License: MIT

Alcor

A Hyperscale Cloud Native SDN Platform

Introduction

Cloud computing means scale and on-demand resource provisioning. As more enterprise customers migrate their on-premises workloads to the cloud, the user base of a cloud provider could grow by 10x in just a few years. This requires a cloud virtual networking system with a more scalable and extensible design. As part of a community effort, Alcor is an open-source, cloud-native platform that provides a highly available, high-performance, large-scale virtual networking control plane and management plane with a high resource provisioning rate.

Alcor leverages the latest SDN and container technologies as well as an advanced distributed system design to support deployment, configuration and scale-out of millions of VMs and containers. It is built on a distributed micro-services architecture that provides a uniform way to secure, connect, and monitor control plane micro-services, along with fine-grained control of service-to-service communication including load balancing, retries, failovers, and rate limits. Alcor also offers a way to unify VM and container networking management, and its application-aware fast path delivers ultra-low latency and high throughput when provisioning containers and serverless applications.

The following diagram illustrates the high-level architecture of the Alcor control plane.

Alcor architecture

Detailed design docs:

Key Features

Cloud-Native Architecture

Alcor leverages Kubernetes and Istio to build its distributed micro-services architecture. Depending on the control plane load, Alcor Controller scales out with multiple instances and each instance is a Kubernetes application. One step further, each application contains various infrastructure microservices to manage different types of network resources.

Throughput-Optimal Design

Alcor focuses on top-down throughput optimization on every system layer including API, Controller, messaging mechanism, and host agent. For example, a batch API is provided to support deploying a group of ports with a single POST call, and a message batching mechanism is proposed on a per-host basis, which is capable of driving groups (potentially thousands) of resources to the same host in one shot.
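The per-host message batching idea can be illustrated with a minimal sketch. It is not Alcor's implementation: the `PortConfig` record, its field names, and the `batch_by_host` helper are hypothetical and only show how a group of ports could be collected into one message per destination host.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PortConfig:
    """Hypothetical port record; field names are illustrative only."""
    port_id: str
    host_ip: str


def batch_by_host(ports):
    """Group port configurations so that each host receives a single message."""
    batches = defaultdict(list)
    for port in ports:
        batches[port.host_ip].append(port)
    return dict(batches)


# Three ports arriving in one batch request, two of them on the same host.
ports = [
    PortConfig("port-1", "10.0.0.1"),
    PortConfig("port-2", "10.0.0.2"),
    PortConfig("port-3", "10.0.0.1"),
]
batches = batch_by_host(ports)
for host_ip, group in batches.items():
    print(host_ip, [p.port_id for p in group])
```

With this grouping, the number of messages sent grows with the number of hosts touched rather than with the number of resources deployed.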

Fast Resource Provisioning

To support time-critical applications, Alcor enables a direct communication channel from Controller to Host Agent. This channel bypasses message queueing systems such as Kafka and uses gRPC, offering a 10x latency improvement over Kafka.

Planned Features

A list of planned features is included in our current roadmap. Some highlighted items:

  1. Major VPC features (e.g., security group, ACL, QoS)
  2. Controller services break-down
  3. Compatibility with OVS
  4. Controller grey release
  5. Performance comparison with Neutron and many more...

Repositories

The Alcor project is divided across a few GitHub repositories.

  • alcor/alcor: This is the main repository of the Alcor Regional Controller that you are currently looking at. It hosts the controllers' source code, build and deployment instructions, and various documents that detail the design of Alcor.

  • alcor/alcor_control_agent: This repository contains the source code for a host-level stateless agent that connects regional controllers to the host data-plane component. It is responsible for programming the on-host data plane with network configuration for CRUD of VPC, subnet, port, security group, etc., and for monitoring the network health of containers and VMs on the host.

  • alcor/integration: The integration repository contains code and scripts for end-to-end integration of the Alcor control plane with popular orchestration platforms and data plane implementations. We currently support integration with Kubernetes (via CNI plugin) and Mizar Data Plane. We will continue to integrate with other orchestration systems and data plane implementations.

  • alcor/meeting: The meeting repository is used to store all the meeting notes and recorded video clips for the Alcor Open Source project.

Directory Structure

The main repository of the Alcor Regional Controller is organized as follows:

  • docs: design, test and API documentation.
  • lib: common libraries shared among various microservices.
  • services: API gateway and multiple microservices. Each sub-directory includes both source and test code.
  • web: defines customer-facing web objects.
  • schema: defines the protobuf schema for agent-controller communication.
  • config: configuration files used by controllers.
  • scripts: build, deployment and testing scripts.
  • kubernetes: yaml files used for controller deployment in a Kubernetes cluster.
  • legacy: legacy controller code that has been retired.

alcor's People

Contributors

chenpiaoping, cj-chung, cvedetect, davidliu506, dependabot[bot], er1cthe0ne, eric-yuan, gzure, haboy52581, issacyxw, kevin-zhonghao, kimeunju108, kiran1048, phudtran, pkommoju, skdong, songxiaoyan, vanderchen, yanmo96, zhdgao, zmn223, zzxgzgz

alcor's Issues

Control/data plane E2E test setup

Test framework development and environment setup for the following two scenarios:

  1. One VPC (with one transit router) + two subnets (each with one transit switch and two endpoints)

  2. One VPC (with two transit routers) + two subnets (each with two transit switches and four endpoints)
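The two scenarios above can be written down as test parameters. This is a minimal sketch, assuming a test framework that takes a declarative topology description; the `Scenario` class and its field names are hypothetical and not part of Alcor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """Hypothetical description of one E2E topology (single VPC)."""
    transit_routers: int
    subnets: int
    transit_switches_per_subnet: int
    endpoints_per_subnet: int

    def total_transit_switches(self):
        return self.subnets * self.transit_switches_per_subnet

    def total_endpoints(self):
        return self.subnets * self.endpoints_per_subnet


SCENARIOS = {
    "scenario-1": Scenario(transit_routers=1, subnets=2,
                           transit_switches_per_subnet=1, endpoints_per_subnet=2),
    "scenario-2": Scenario(transit_routers=2, subnets=2,
                           transit_switches_per_subnet=2, endpoints_per_subnet=4),
}

for name, s in SCENARIOS.items():
    print(name, s.total_transit_switches(), "switches,", s.total_endpoints(), "endpoints")
```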

[Micro-service] Route manager

Requirement of Route manager:

  • Expose Route CRUD API
  • Manage routing rules for VPC and subnet
  • Interact with vpc manager and subnet manager for creation of vpc and subnet, respectively
  • Deployed as a K8s service

[Micro-service] VPC manager

Requirement of VPC manager:

  • Expose VPC CRUD API
  • Manage the lifecycle of VPC resource
  • Interact with other microservices including the route manager and subnet manager
  • Deployed as a K8s service

Logging configuration

More configuration should be available such as:

  • log engine (e.g., java logger or log4j)
  • log file size

Data model improvement: split to customer request, response, and data persistence

Requests

Alcor currently uses a simplified data model in which customer request input, request response, persisted data, and messages exchanged among microservices remain very similar. This simplified data model serves its purpose to some extent. It is time for us to revisit and improve the data model.

  • D1: Customer request input is from end-customers which only contains information available from customer or admin users
  • D2: Request response to customer request
  • D3: Data persisted to database usually contains a superset of D1 (plus resource allocated by control plane but invisible to customers).
  • D4: Message exchanged among microservices is provided by resource owners (e.g., Port Manager) and contains information required by other microservices (e.g., Subnet Manager). It is a subset of D3.
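The split into D1 through D4 can be sketched as four separate types. This is only an illustration under stated assumptions: the class names and every field (`name`, `subnet_id`, `mac_address`, `host_id`) are hypothetical, and the choice of which fields appear in the response (D2) is a guess, since the proposal does not list them.

```python
from dataclasses import dataclass, fields


@dataclass
class PortRequest:      # D1: only what a customer or admin can supply
    name: str
    subnet_id: str


@dataclass
class PortEntity:       # D3: persisted superset of D1
    id: str
    name: str
    subnet_id: str
    mac_address: str    # allocated by the control plane (hypothetical field)
    host_id: str        # internal detail, invisible to customers (hypothetical)


@dataclass
class PortResponse:     # D2: what is returned to the customer (assumed fields)
    id: str
    name: str
    subnet_id: str


@dataclass
class PortMessage:      # D4: subset of D3 shared with other microservices
    id: str
    subnet_id: str
    mac_address: str


def project(entity, target_cls):
    """Copy only the fields that target_cls declares from the persisted entity."""
    names = [f.name for f in fields(target_cls)]
    return target_cls(**{n: getattr(entity, n) for n in names})


request = PortRequest(name="web-port", subnet_id="subnet-1")
entity = PortEntity(id="port-1", name=request.name, subnet_id=request.subnet_id,
                    mac_address="02:00:00:00:00:01", host_id="host-42")
response = project(entity, PortResponse)
message = project(entity, PortMessage)
print(response)
print(message)
```

Keeping D2 and D4 as projections of D3 means internal fields such as the hypothetical `host_id` cannot leak into a customer response by accident.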

[Micro-service] Node manager

Summary
The node manager is a control plane component responsible for managing and configuring physical servers/nodes/hosts inside a data center. It maintains the detailed configuration for each node and collects its health state continuously at a configurable interval. The node health states could be used collectively in many user scenarios, for example, to determine if a deployment of a network resource (for example, a port) is successful, or if a load balancing backend is reachable.

Use Cases

  • Port manager talks to node manager to pull node details from a node id (UUID).

Basic Requirements

  1. Manage the mapping from node id to node info (including IP, mac, veth)
  2. Populate the node mapping at controller launch time (through a configuration file) during new region or data center buildout
  3. Expose a CRUD API to register, update, or delete a node.
  4. Define a health model, data schema and APIs for node health
  5. Work together with other services including port manager.
  6. Require HA and high scalability, as 100,000+ nodes could upload health data continuously.

Advanced Requirements

  1. Node info extensible to more node resources including FPGA etc.
  2. Fallback mechanism when a node state is stale (e.g., proactively pull instead of waiting for the agent to push)
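The requirements above can be sketched as a small in-memory registry. This is a simplified illustration, not the node manager's design: `NodeInfo`, `NodeRegistry`, and all method names are hypothetical, there is no persistence or HA, and the staleness rule (no health report within a fixed window) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class NodeInfo:
    """Hypothetical node record keyed by node id (UUID in practice)."""
    node_id: str
    ip: str
    mac: str
    veth: str
    last_health_report: float = 0.0  # timestamp of the last pushed health state


class NodeRegistry:
    def __init__(self, stale_after_seconds):
        self._nodes = {}
        self._stale_after = stale_after_seconds

    def register(self, node):
        if node.node_id in self._nodes:
            raise ValueError(f"node {node.node_id} already registered")
        self._nodes[node.node_id] = node

    def get(self, node_id):
        return self._nodes[node_id]

    def update(self, node):
        if node.node_id not in self._nodes:
            raise KeyError(node.node_id)
        self._nodes[node.node_id] = node

    def delete(self, node_id):
        del self._nodes[node_id]

    def report_health(self, node_id, now):
        """Record a health push from the agent on the given node."""
        self._nodes[node_id].last_health_report = now

    def stale_nodes(self, now):
        """Ids of nodes with no report in the window; candidates for a proactive pull."""
        return [n.node_id for n in self._nodes.values()
                if now - n.last_health_report > self._stale_after]


registry = NodeRegistry(stale_after_seconds=30)
registry.register(NodeInfo("node-1", "10.0.0.1", "02:00:00:00:00:01", "veth0"))
registry.report_health("node-1", now=100.0)
print(registry.stale_nodes(now=120.0))
print(registry.stale_nodes(now=131.0))
```

Passing `now` explicitly keeps the sketch deterministic; a real service would read a clock and would need a store that scales to the node counts mentioned above.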

Support deletion port E2E scenario

ACA change is completed by this PR (futurewei-cloud/alcor-control-agent#166). We will need PM to determine if it is a port delete for both scenarios below and then send down the port delete with the corresponding neighbor delete (same logic as port create and send down neighbor table). See here for context: futurewei-cloud/alcor-control-agent#166 (comment)

First scenario: create a VM without specifying a port, then delete the VM

  1. Create a VM giving it a created subnet but not port
  2. Neutron will create a port for the new VM
  3. Delete the above VM
  4. Controller receives /delete request from clients and will issue a DELETE GS to ACA
  5. Action item - needs to confirm DPM can send down the corresponding delete port to ACA
  6. Action item - ACA add support for port delete operation

Second scenario: create a VM with a specified port, then delete the VM

  1. Create new port on the horizon UI
  2. Create a VM using the new port above
  3. Delete the above VM
  4. Controller receives a /update request from clients with no device_id and device_owner, and controller passes the update request to ACA
  5. Action item - needs to update port.proto to include device ID and device owner fields (status: DONE)
  6. Action item - needs to confirm DPM can send down the corresponding update port to ACA (status: DONE, PR #571)
  7. Action item - ACA add support for port update with device ID and device owner fields = null (status: DONE, ACA PR #227)
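The decision described in the two scenarios can be sketched as follows. This is an illustrative reading of the steps above, not the controller's logic: the function names, the operation labels, and the dictionary shape of the request body are all hypothetical.

```python
def clears_device_binding(update_body):
    """True when an update carries no device_id and no device_owner."""
    return not update_body.get("device_id") and not update_body.get("device_owner")


def plan_port_operations(method, body=None):
    """Return the (hypothetical) operations to send down to ACA for a request."""
    if method == "DELETE":
        # Scenario 1: port delete is sent with the corresponding neighbor delete.
        return ["DELETE_PORT", "DELETE_NEIGHBOR"]
    if method == "UPDATE":
        if clears_device_binding(body or {}):
            # Scenario 2: the VM was deleted but the port remains; forward an
            # update with device ID and device owner unset.
            return ["UPDATE_PORT_CLEAR_DEVICE"]
        return ["UPDATE_PORT"]
    raise ValueError(f"unsupported method: {method}")


print(plan_port_operations("DELETE"))
print(plan_port_operations("UPDATE", {"device_id": None, "device_owner": None}))
print(plan_port_operations("UPDATE", {"device_id": "vm-1", "device_owner": "compute:nova"}))
```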
