K8sClusterManagers.jl

A Julia cluster manager for provisioning workers in a Kubernetes (K8s) cluster.

Pairs well with julia_pod for interactive Julia development within a K8s pod.

K8sClusterManager

The K8sClusterManager is intended to be used from a Pod running inside a Kubernetes cluster.

Assuming you have kubectl installed locally and configured to connect to a cluster, you can easily create an interactive Julia REPL session running from within the cluster by executing:

kubectl run -it example-manager-pod --image julia:1

Or equivalently, using a K8s manifest named example-manager-pod.yaml containing:

apiVersion: v1
kind: Pod
metadata:
  name: example-manager-pod
spec:
  containers:
  - name: manager
    image: julia:1
    stdin: true
    tty: true

and running the following commands will also create a Julia REPL running inside a Kubernetes Pod:

kubectl apply -f example-manager-pod.yaml

# Once the pod is running
kubectl attach -it pod/example-manager-pod

Now, in this Julia REPL session, you can add two workers via:

julia> using Pkg; Pkg.add("K8sClusterManagers")

julia> using Distributed, K8sClusterManagers

julia> addprocs(K8sClusterManager(2))
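
To confirm that the workers actually came up, one option is to ask each worker for its hostname. The following is a minimal sketch that uses only the Distributed standard library; inside a Kubernetes Pod the hostname is normally the Pod name, so the output should line up with what kubectl get pods reports. Run in a session without extra workers, it simply reports the current process.

```julia
using Distributed

# Ask every worker process for the hostname of the machine (Pod) it runs on.
# `workers()` falls back to the current process when no workers were added.
hosts = Dict(pid => remotecall_fetch(gethostname, pid) for pid in workers())

for pid in sort(collect(keys(hosts)))
    println("worker ", pid, " runs on ", hosts[pid])
end
```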

Advanced configuration

K8sClusterManager exposes a configure keyword argument that can be used to make modifications to the Pod manifest when defining workers.

When launching the cluster, the function configure(pod) is called, where pod is a dict-like object representing the YAML/JSON Pod manifest. The function must return an object of the same type. For example, to make the workers require GPU resources, you could write the following:

function my_gpu_configurator(pod)
    worker_container = pod["spec"]["containers"][1]
    worker_container["resources"]["limits"]["nvidia.com/gpu"] = 1
    return pod
end
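
The configurator above assumes that the "resources" and "limits" entries already exist in the container spec. A more defensive sketch, shown here against a small hand-written stand-in manifest rather than a real worker Pod spec, creates the missing levels with get!. The name gpu_configurator, its gpus keyword, and the stand-in manifest are illustrative assumptions, not part of the package.

```julia
# Hypothetical defensive configurator: creates "resources" and "limits"
# if they are missing instead of assuming they exist.
function gpu_configurator(pod; gpus=1)
    container = pod["spec"]["containers"][1]
    resources = get!(container, "resources", Dict{String,Any}())
    limits = get!(resources, "limits", Dict{String,Any}())
    limits["nvidia.com/gpu"] = gpus
    return pod
end

# Stand-in manifest for illustration only; a real worker manifest has more fields.
pod = Dict{String,Any}(
    "spec" => Dict{String,Any}(
        "containers" => Any[Dict{String,Any}("name" => "worker", "image" => "julia:1")],
    ),
)

gpu_configurator(pod)
```

Such a function would then be supplied through the configure keyword argument described above, for example K8sClusterManager(2; configure=gpu_configurator).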

To get an example of the pod object that might be passed into configure, call:

using K8sClusterManagers, JSON
pod = K8sClusterManagers.worker_pod_spec(manager_name="example", image="julia", cmd=`julia`)
JSON.print(pod, 4)

Useful Commands

Monitor the status of all your Pods:

watch kubectl get pods,services

Stream the stdout of the worker "example-manager-pod-worker-9001":

kubectl logs -f pod/example-manager-pod-worker-9001

Cleaning up or killing all your Pods from within Julia can currently be slow or ineffective, especially if the driver Julia session dies unexpectedly. It may be necessary to kill your workers from the command line:

kubectl delete pod/example-manager-pod-worker-9001 --grace-period=0 --force=true

It may be convenient to set a common label in your worker Pod specs so that you can select them all by label with -l='...' and kill all the worker Pods in a single invocation.
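
One way to set such a label is through a configure function. The sketch below adds a label to the Pod metadata; the function name, the label key "example-group", and its value "my-workers" are hypothetical choices for illustration, and the stand-in manifest is hand-written rather than produced by the package.

```julia
# Hypothetical configurator that tags every worker Pod with a common label.
function label_configurator(pod)
    metadata = get!(pod, "metadata", Dict{String,Any}())
    labels = get!(metadata, "labels", Dict{String,Any}())
    labels["example-group"] = "my-workers"  # illustrative key and value
    return pod
end

# Stand-in manifest for illustration only.
pod = Dict{String,Any}("metadata" => Dict{String,Any}("name" => "example-worker"))
label_configurator(pod)

# With this label in place, all workers could be removed from a shell with:
#   kubectl delete pods -l example-group=my-workers
```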

Display info about a Pod. This is especially useful for troubleshooting a Pod that is taking longer than expected to get up and running:

kubectl describe pod/example-manager-pod

Troubleshooting

If you get deserialize errors during interactions between driver and worker processes, make sure you are using the same version of Julia on the driver as on all the workers!
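
A quick way to compare versions from the driver is to fetch VERSION from every process, as in the sketch below, which relies only on the Distributed standard library. If the versions differ enough, this check may itself fail with a serialization error, in which case comparing the Julia versions of the container images directly is the safer route.

```julia
using Distributed

# Collect the Julia version reported by every process, driver included.
versions = Dict(pid => remotecall_fetch(() -> VERSION, pid) for pid in procs())

# Keep only the processes whose version differs from the driver's.
mismatched = filter(p -> p.second != VERSION, versions)

if isempty(mismatched)
    println("All processes run Julia ", VERSION)
else
    println("Version mismatch on processes: ", mismatched)
end
```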

If you aren't sure what went wrong, check the logs! The syntax is

kubectl logs -f pod/pod_name

where pod_name is the name of the Pod, which you can get from kubectl get pods.

Testing

The K8sClusterManagers package includes tests that expect access to a Kubernetes cluster. The tests should run in any Kubernetes cluster but have only been run with minikube.

Minikube

  1. Install Docker or Docker Desktop
  2. If using Docker Desktop: set the resources to a minimum of 3 CPUs and 2.25 GB of memory
  3. Install minikube
  4. Start the Kubernetes cluster: minikube start
  5. Use the in-cluster Docker daemon for image builds: eval $(minikube docker-env) (Note: only works with single-node clusters)
  6. Run the K8sClusterManagers.jl tests

Contributors

omus, christopher-dg, ericphanson, kimlaberinto
