
maestro's Introduction

WARNING: Version v9.x of Maestro is being deprecated; the complete guide for the new version, v10.x, can be found here.


Maestro


Dedicated game server management service, designed to manage multiple game server fleets in isolated schedulers.

What does the module do?

A Game Room is a dedicated game server that runs in a match execution context; a group of game rooms (a fleet) is organized in a Scheduler.

Maestro manages multiple game room fleets with user-created custom specifications (schedulers).

What problem is solved?

Maestro orchestrates game room fleets according to user specifications (schedulers): it can maintain a fixed number of game rooms up and running, or apply autoscaling policies for cost optimization.

Ideally, Maestro should be used by games that implement a dedicated game server (DGS) architecture.

Recommended Integration Phase: Alpha

This module is not required for prototyping, but it is recommended to include it in the alpha phase so future development and deployment of new game room versions can be better managed.

Dependencies

Maestro does not have any dependencies, but it provides an events forwarding feature that can be used to integrate with external matchmaking services.

How is the problem solved?

Currently, the only runtime that Maestro supports is Kubernetes.

With a scheduler, the user can define how a game room can be built and deployed on the runtime. Each scheduler manages a fleet of game rooms in isolated namespaces.

Usually a scheduler will define which Docker image will be used, its commands, the resources that need to be allocated (CPU and memory), and other parameters for fleet management.
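As a purely illustrative sketch, a scheduler could describe a fleet along these lines. The field names below are assumptions made up for illustration, not Maestro's documented schema; see the docs folder for the real format.

# Hypothetical scheduler sketch -- field names are illustrative assumptions,
# not the actual Maestro schema.
name: my-game-fleet                        # scheduler name, also used for the isolated namespace
image: my-registry/my-game-server:1.2.3    # Docker image for each game room
command:
  - ./start-server
  - --port=7777
resources:
  cpu: 500m                                # CPU reserved per game room
  memory: 512Mi                            # memory reserved per game room
rooms:
  min: 10                                  # keep at least 10 rooms up and running
  max: 100                                 # upper bound used by autoscaling policies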

Every action that Maestro performs on the runtime to manage its resources is encapsulated in what we call an Operation. An Operation can be triggered by the user or by Maestro itself. Maestro uses a queue to control the flow of operations and a worker that keeps processing them.

Maestro provides two APIs for external and internal communication:

  • management-api: used by users to manage schedulers and operations.
  • rooms-api: used by game rooms to report their status back to Maestro.

Additional information

Documentation can be found in the docs folder. This module is supported by the Wildlife's multiplayer team.

Position            | Name
--------------------|-------------------
Owner Team          | Multiplayer Team
Documentation Owner | Guilherme Carvalho

maestro's People

Contributors

andrehp, arthur29, caiordjesus, capella, cscatolini, dependabot[bot], diegopereirawildlife, felipejfc, gabrielcorado, ghostec, guilhermocc, henriqueoelze, henrod, hspedro, jaywildlife, joaobologna, leohahn, lftakakura, luizmiranda7, lyracampos, manuellysuzik, matheuscscp, mflilian, ramonberrutti, reinaldooli, rodopoulos, rsafonseca, victor-carvalho, wl-yankabuki


maestro's Issues

Add support to nodeAffinity

Reference:

https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#node-affinity-beta-feature

We need that to complement taints/tolerations.

Example from our monitoring pods:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: prometheus
  name: prometheus
  namespace: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      name: prometheus
      labels:
        app: prometheus
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                - key: "monitoring-dedicated"
                  operator: In
                  values: ["true"]
      tolerations:
      - key: "dedicated"
        operator: "Equal"
        value: "monitoring"
        effect: "NoSchedule"
      containers:
      - name: prometheus
        image: quay.io/coreos/prometheus:v1.1.1
        ports:
        - name: web
          containerPort: 9090

This deployment will tolerate the taint dedicated=monitoring:NoSchedule while also only scheduling pods onto nodes that have the label monitoring-dedicated=true.

Taints and tolerations alone can't guarantee in which node the pods will be scheduled.

maestro needs a ui

it would be nice to have a good visualization of everything that's happening

update does not seem to be working as expected

today I tried to update a scheduler configuration and got the following:

➜  fpsmultiplayer git:(maestro_master) ✗ maestro update docker/schedulerv2.yaml
Status: 500
Response: {"code":"MAE-003","description":"timeout during room deletion","error":"error when deleting old rooms. Maestro will scale up, if necessary, with previous room configuration.","success":false}

taking a look here:

➜  fpsmultiplayer git:(maestro_master) ✗ kprod get pods -n overkill-v2
NAME                   READY     STATUS    RESTARTS   AGE
overkill-v2-34a5d12d   1/1       Running   0          1m
overkill-v2-56f719c6   1/1       Running   0          1m
overkill-v2-77c7cd73   1/1       Running   0          1m
overkill-v2-9dd97f7b   1/1       Running   0          1m
overkill-v2-cd7ea664   1/1       Running   1          10h
overkill-v2-d6f6762b   1/1       Running   0          47m
overkill-v2-df7d6a5e   1/1       Running   0          10h
overkill-v2-e46e5780   1/1       Running   1          10h
overkill-v2-e72c7149   1/1       Running   0          5h
overkill-v2-ebcc99a9   1/1       Running   0          1m

it seems that maestro killed some pods but also left others running and didn't even try to delete them.

we could use some form of self-healing here

Maestro should delete a pod if it can't start because of PodFitsHostPorts

Maestro is not behaving as expected when it chooses a hostPort that is not available on any of the Kubernetes nodes. In these cases it should either edit the pod to change the chosen hostPort, or delete it and create a new one to trigger another random port assignment.

Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  49s (x26 over 8m)  default-scheduler  0/10 nodes are available: 1 Insufficient cpu, 1 PodFitsHostPorts, 1 PodToleratesNodeTaints, 9 MatchNodeSelector.
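For context, the PodFitsHostPorts predicate fails when a pod requests a hostPort that is already taken on every candidate node. A minimal illustration of the kind of container spec involved (image name and port number are made up):

containers:
- name: game-room
  image: my-game-server:1.2.3
  ports:
  - containerPort: 7777
    hostPort: 7777    # if this port is already bound on every node, the pod
                      # stays Pending with the PodFitsHostPorts message above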

Maestro moving to v10.x with breaking changes

Summary

Hey everyone!
We're deprecating the current version of Maestro (v9) in favor of a new major version (v10).
v10 is a complete revamp of Maestro, with a totally different codebase and architecture. Although some endpoints have correlations, the structure is different and this new version is incompatible with the current one at almost all levels.


v10 adoption

After being released, v10 will still be missing some already planned features, like autoscaling. That said, depending on your needs, it may be better to wait for these features to arrive in new minor versions through Q2 and Q3, although we still recommend using v10 if possible.

v9 end of life

The v9 code will still be hosted in the "v9" branch in the repository.

We will keep supporting v9 after v10 is released (more details below). However, after that period, the version will no longer receive any updates.

Here is the complete timeline of the v9 next steps:

Date       | Status     | Comments
-----------|------------|---------
2022-02-14 | Deprecated | Only security and performance fixes will be accepted. The package will be marked as “deprecated”, and v10 will be recommended.
2022-12-14 | Inactive   | Only critical security fixes will be accepted.
2023-06-14 | Read-only  | No more code changes will be accepted, and the “v9” branch will become read-only.

Issues and pull requests

All issues regarding v9 will need to be labeled. Then, a contributor will review them to decide whether they will be accepted, following the v9 status (Deprecated, Inactive, or Read-only).

cache room host and port

It would be useful to cache the room address and port so that the kube API is not accessed on every ping/status/roomEvent.

Multiple schedulers

It would be better to start a goroutine for each scheduler in the controller to parallelize deployment.

Update on new affinity and toleration

During a scheduler update, pods must be rescheduled to new nodes (deleted and recreated) when affinity and toleration change.

Just add a new case to the function MustUpdatePods in controller/controller.go.

ErrImagePull

ErrImagePull should stop the operation right away and return an error message, not wait for the timeout.

Update occupiedTimeout

As of commit b26ed73, when updating a scheduler with a new occupiedTimeout, the value is not updated on the watcher.

Maybe save it in Redis and always read it from there.

HostNetwork is in the wrong place on pod.yaml

In models/pod.go, "hostNetwork" is set in the wrong place in the podYaml variable. It must not be inside the containers array, but at the same level as the containers key.

When correcting it, it is important to test that the fix won't affect communication with other services, because with hostNetwork enabled the pod cannot access Kubernetes services and internal IPs.
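For reference, in the Kubernetes pod spec hostNetwork is a pod-level field, a sibling of the containers key. An illustrative corrected snippet (not the actual podYaml template from models/pod.go) would look like this:

spec:
  hostNetwork: true                      # pod-level field, same level as "containers"
  dnsPolicy: ClusterFirstWithHostNet     # commonly needed with hostNetwork so the pod
                                         # can still resolve in-cluster service names
  containers:
  - name: game-room
    image: my-game-server:1.2.3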

Cancel Operation

Maestro should have a cancel route to cancel update and create operations
