polyaxon / charts

Helm charts for creating reproducible and maintainable deployments of Polyaxon with Kubernetes.

License: Apache License 2.0

Smarty 89.31% Shell 7.57% Python 1.70% Mustache 1.42%

kubernetes helm helm-charts deep-learning tensorflow pytorch scikit-learn machine-learning polyaxon helm-chart mlops gitops k8s distributed-systems

Polyaxon-charts

Polyaxon charts is a set of Helm charts for creating reproducible and maintainable deployments of Polyaxon and its components with Kubernetes.

This repo includes:

Helm chart for self-hosted Polyaxon version
Helm chart for Polyaxon agent to be deployed with Polyaxon cloud or Polyaxon EE
Helm charts for distributed training operators.

Install

Setup Helm

Before installing the platform, the agent, or the training operators, make sure you have Helm installed.

Namespace

$ kubectl create namespace polyaxon

namespace "polyaxon" created

Polyaxon charts repo

You can add the Polyaxon charts repository to Helm so that you can install Polyaxon and the other charts provided by Polyaxon from it. This lets you refer to a chart by name without having to use a long URL each time.

$ helm repo add polyaxon https://charts.polyaxon.com
$ helm repo update
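In automated setups it can help to add the repo only when it is missing. The following is a minimal sketch, not part of the charts: has_repo is a hypothetical helper that reads the table printed by `helm repo list` on stdin and checks its first column for the given repo name.

```shell
# Hypothetical helper (not part of Helm or the Polyaxon charts).
# Reads the output of `helm repo list` on stdin and succeeds if a repo
# with the given name appears in the first column (header row skipped).
has_repo() {
  awk -v name="$1" 'NR > 1 && $1 == name { found = 1 } END { exit !found }'
}
```

It could be used as: helm repo list 2>/dev/null | has_repo polyaxon || helm repo add polyaxon https://charts.polyaxon.com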

TFJob/PytorchJob/MXJob/XGBoostJob/MPIJob

To install the distributed training jobs:

$ helm install plxtj polyaxon/trainingjobs --namespace=polyaxon

Deploying Polyaxon Agent to a Kubernetes namespace

The agent chart can be installed on a single node or on a multi-node cluster, in which case you need to provide a volume with ReadWriteMany or a cloud bucket for the artifacts store.
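On a multi-node cluster, one way to check whether an existing claim qualifies is to inspect its access modes. This is a minimal sketch, assuming the modes were fetched with `kubectl get pvc <claim> --namespace polyaxon -o jsonpath='{.spec.accessModes}'`. supports_rwx is a hypothetical helper and <claim> is a placeholder for your own claim name.

```shell
# Hypothetical helper (not part of kubectl or the Polyaxon charts).
# Succeeds if the given access-mode string, e.g. the jsonpath output
# for .spec.accessModes, lists ReadWriteMany.
supports_rwx() {
  case "$1" in
    *ReadWriteMany*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example: supports_rwx "$modes" || echo "claim does not offer ReadWriteMany"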

For more information, please visit the docs for the agent deployment.

Deploying Polyaxon to a Kubernetes namespace

This platform chart can be installed on a single node or on a multi-node cluster, in which case you need to provide a volume with ReadWriteMany or a cloud bucket for the artifacts store.

For more information, please visit the docs for the platform deployment.

The platform chart bootstraps a Polyaxon deployment on a Kubernetes cluster using the Helm package manager.

It also packages a PostgreSQL dependency for Polyaxon (we recommend that you bring your own PostgreSQL instance instead of using the built-in subchart):

PostgreSQL

If you deploy the scheduler or the enterprise version, you will need Redis (we recommend that you bring your own Redis instance instead of using the built-in subchart):

Redis

Uninstalling the Chart

To uninstall/delete the <RELEASE_NAME> deployment:

$ helm delete <RELEASE_NAME>

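or, on Helm 2, with the --purge flag, which also removes the release record so that the name can be reused (Helm 3 no longer has this flag):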

$ helm delete <RELEASE_NAME> --purge

The command removes all the Kubernetes components associated with the chart and deletes the release.
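Which command applies depends on the Helm major version: --purge exists only in Helm 2, while in Helm 3 `helm uninstall` removes the release record by default. The following is a minimal sketch of selecting the command in a script. helm_major and uninstall_cmd are hypothetical helpers, and they only print the command rather than run it.

```shell
# Hypothetical helpers (not part of Helm or the Polyaxon charts).

# Extract the Helm major version from a version string such as
# "v3.0.2+g19e47ee" (Helm 3) or "Client: v2.15.1+gcf1de4f" (Helm 2).
helm_major() {
  printf '%s\n' "$1" | sed -n 's/.*v\([0-9][0-9]*\)\..*/\1/p' | head -n 1
}

# Print the command that removes a release.
# Arguments: release name, Helm major version, namespace (used on Helm 3).
uninstall_cmd() {
  release=$1
  major=$2
  namespace=$3
  if [ "$major" -ge 3 ]; then
    printf 'helm uninstall %s --namespace %s\n' "$release" "$namespace"
  else
    printf 'helm delete %s --purge\n' "$release"
  fi
}
```

For example: uninstall_cmd polyaxon "$(helm_major "$(helm version --short)")" polyaxon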

charts's Issues

Failed to delete helm polyaxon

Hi guys,

I ran into a problem when deleting the Polyaxon Helm release. The command was:

helm delete polyaxon --purge

But it throws the error: Error: jobs.batch "polyaxon-clean-experiments" already exists

I hope someone can help me with this. Thank you

polyaxon-rabbitmq-ha-0 state is CrashLoopBackOff

Hello All,

It seems that the polyaxon-rabbitmq-ha-0 is not starting.

I used the following command to install the chart.

sudo polyaxon admin deploy -f config.yml

The log of the container is:

2020-03-08 02:19:38.666 [info] <0.8.0> Feature flags: list of feature flags found:
2020-03-08 02:19:38.666 [info] <0.8.0> Feature flags:   [ ] drop_unroutable_metric
2020-03-08 02:19:38.666 [info] <0.8.0> Feature flags:   [ ] empty_basic_get_metric
2020-03-08 02:19:38.666 [info] <0.8.0> Feature flags:   [ ] implicit_default_bindings
2020-03-08 02:19:38.666 [info] <0.8.0> Feature flags:   [ ] quorum_queue
2020-03-08 02:19:38.666 [info] <0.8.0> Feature flags:   [ ] virtual_host_metadata
2020-03-08 02:19:38.666 [info] <0.8.0> Feature flags: feature flag states written to disk: yes
2020-03-08 02:19:38.698 [info] <0.266.0> ra: meta data store initialised. 0 record(s) recovered
2020-03-08 02:19:38.698 [info] <0.271.0> WAL: recovering ["/var/lib/rabbitmq/mnesia/rabbit@polyaxon-rabbitmq-ha-0.polyaxon-rabbitmq-ha-discovery.polyaxon.svc.cluster.local/quorum/rabbit@polyaxon-rabbitmq-ha-0.polyaxon-rabbitmq-ha-discovery.polyaxon.svc.cluster.local/00000003.wal"]
2020-03-08 02:19:38.699 [info] <0.275.0>
 Starting RabbitMQ 3.8.0 on Erlang 22.1.5
 Copyright (C) 2007-2019 Pivotal Software, Inc.
 Licensed under the MPL.  See https://www.rabbitmq.com/

  ##  ##      RabbitMQ 3.8.0
  ##  ##
  ##########  Copyright (C) 2007-2019 Pivotal Software, Inc.
  ######  ##
  ##########  Licensed under the MPL.  See https://www.rabbitmq.com/

  Doc guides: https://rabbitmq.com/documentation.html
  Support:    https://rabbitmq.com/contact.html
  Tutorials:  https://rabbitmq.com/getstarted.html
  Monitoring: https://rabbitmq.com/monitoring.html

  Logs: <stdout>

  Config file(s): /etc/rabbitmq/rabbitmq.conf

  Starting broker...2020-03-08 02:19:38.699 [info] <0.275.0>
 node           : rabbit@polyaxon-rabbitmq-ha-0.polyaxon-rabbitmq-ha-discovery.polyaxon.svc.cluster.local
 home dir       : /var/lib/rabbitmq
 config file(s) : /etc/rabbitmq/rabbitmq.conf
 cookie hash    : z9Efk6foMzTWv7yMOCE7Sg==
 log(s)         : <stdout>
 database dir   : /var/lib/rabbitmq/mnesia/rabbit@polyaxon-rabbitmq-ha-0.polyaxon-rabbitmq-ha-discovery.polyaxon.svc.cluster.local
2020-03-08 02:19:38.712 [info] <0.275.0> Running boot step pre_boot defined by app rabbit
2020-03-08 02:19:38.712 [info] <0.275.0> Running boot step rabbit_core_metrics defined by app rabbit
2020-03-08 02:19:38.713 [info] <0.275.0> Running boot step rabbit_alarm defined by app rabbit
2020-03-08 02:19:38.715 [info] <0.281.0> Memory high watermark set to 244 MiB (256000000 bytes) of 64408 MiB (67536941056 bytes) total
2020-03-08 02:19:38.717 [info] <0.283.0> Enabling free disk space monitoring
2020-03-08 02:19:38.717 [info] <0.283.0> Disk free limit set to 50MB
2020-03-08 02:19:38.719 [info] <0.275.0> Running boot step code_server_cache defined by app rabbit
2020-03-08 02:19:38.719 [info] <0.275.0> Running boot step file_handle_cache defined by app rabbit
2020-03-08 02:19:38.719 [info] <0.286.0> Limiting to approx 65436 file handles (58890 sockets)
2020-03-08 02:19:38.720 [info] <0.287.0> FHC read buffering:  OFF
2020-03-08 02:19:38.720 [info] <0.287.0> FHC write buffering: ON
2020-03-08 02:19:38.720 [info] <0.275.0> Running boot step worker_pool defined by app rabbit
2020-03-08 02:19:38.720 [info] <0.276.0> Will use 16 processes for default worker pool
2020-03-08 02:19:38.720 [info] <0.276.0> Starting worker pool 'worker_pool' with 16 processes in it
2020-03-08 02:19:38.720 [info] <0.275.0> Running boot step database defined by app rabbit
2020-03-08 02:19:38.720 [info] <0.275.0> Node database directory at /var/lib/rabbitmq/mnesia/rabbit@polyaxon-rabbitmq-ha-0.polyaxon-rabbitmq-ha-discovery.polyaxon.svc.cluster.local is empty. Assuming we need to join an existing cluster or initialise from scratch...
2020-03-08 02:19:38.720 [info] <0.275.0> Configured peer discovery backend: rabbit_peer_discovery_k8s
2020-03-08 02:19:38.720 [info] <0.275.0> Will try to lock with peer discovery backend rabbit_peer_discovery_k8s
2020-03-08 02:19:38.721 [info] <0.275.0> Peer discovery backend does not support locking, falling back to randomized delay
2020-03-08 02:19:38.721 [info] <0.275.0> Peer discovery backend rabbit_peer_discovery_k8s supports registration.
2020-03-08 02:19:38.721 [info] <0.275.0> Will wait for 1820 milliseconds before proceeding with registration...
2020-03-08 02:19:40.588 [info] <0.275.0> Failed to get nodes from k8s - {failed_connect,[{to_address,{"kubernetes.default.svc.cluster.local",443}},
                 {inet,[inet],nxdomain}]}
2020-03-08 02:19:40.589 [error] <0.274.0> CRASH REPORT Process <0.274.0> with 0 neighbours exited with reason: no case clause matching {error,"{failed_connect,[{to_address,{\"kubernetes.default.svc.cluster.local\",443}},\n                 {inet,[inet],nxdomain}]}"} in rabbit_mnesia:init_from_config/0 line 140 in application_master:init/4 line 138
2020-03-08 02:19:40.589 [info] <0.43.0> Application rabbit exited with reason: no case clause matching {error,"{failed_connect,[{to_address,{\"kubernetes.default.svc.cluster.local\",443}},\n                 {inet,[inet],nxdomain}]}"} in rabbit_mnesia:init_from_config/0 line 140
{"Kernel pid terminated",application_controller,"{application_start_failure,rabbit,{bad_return,{{rabbit,start,[normal,[]]},{'EXIT',{{case_clause,{error,\"{failed_connect,[{to_address,{\\"kubernetes.default.svc.cluster.local\\",443}},\n                 {inet,[inet],nxdomain}]}\"}},[{rabbit_mnesia,init_from_config,0,[{file,\"src/rabbit_mnesia.erl\"},{line,140}]},{rabbit_mnesia,init_with_lock,3,[{file,\"src/rabbit_mnesia.erl\"},{line,120}]},{rabbit_mnesia,init,0,[{file,\"src/rabbit_mnesia.erl\"},{line,87}]},{rabbit_boot_steps,'-run_step/2-lc$^1/1-1-',1,[{file,\"src/rabbit_boot_steps.erl\"},{line,55}]},{rabbit_boot_steps,run_step,2,[{file,\"src/rabbit_boot_steps.erl\"},{line,59}]},{rabbit_boot_steps,'-run_boot_steps/1-lc$^0/1-0-',1,[{file,\"src/rabbit_boot_steps.erl\"},{line,28}]},{rabbit_boot_steps,run_boot_steps,1,[{file,\"src/rabbit_boot_steps.erl\"},{line,29}]},{rabbit,start,2,[{file,\"src/rabbit.erl\"},{line,975}]}]}}}}}"}
Kernel pid terminated (application_controller) ({application_start_failure,rabbit,{bad_return,{{rabbit,start,[normal,[]]},{'EXIT',{{case_clause,{error,"{failed_connect,[{to_address,{\"kubernetes.defau

Crash dump is being written to: /var/log/rabbitmq/erl_crash.dump...done

Environment:

kubectl

Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.6", GitCommit:"72c30166b2105cd7d3350f2c28a219e6abcd79eb", GitTreeState:"clean", BuildDate:"2020-01-18T23:31:31Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.6", GitCommit:"72c30166b2105cd7d3350f2c28a219e6abcd79eb", GitTreeState:"clean", BuildDate:"2020-01-18T23:23:21Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}

helm

Client: &version.Version{SemVer:"v2.15.1", GitCommit:"cf1de4f8ba70eded310918a8af3a96bfe8e7683b", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.15.1", GitCommit:"cf1de4f8ba70eded310918a8af3a96bfe8e7683b", GitTreeState:"clean"}

polyaxon-cli

---
Metadata-Version: 2.1
Name: polyaxon-cli
Version: 0.6.0
Summary: Command Line Interface (CLI) for Polyaxon.
Home-page: https://github.com/polyaxon/polyaxon-cli
Author: Mourad Mourafiq
Author-email: [email protected]
Installer: pip
License: MIT
Location: /usr/local/lib/python3.5/dist-packages
Requires: polyaxon-client, raven, pathlib, polyaxon-deploy, click, tabulate, click-completion, polyaxon-dockerizer
Classifiers:
  Programming Language :: Python
  Programming Language :: Python :: 2
  Programming Language :: Python :: 2.7
  Programming Language :: Python :: 3
  Programming Language :: Python :: 3.5
  Programming Language :: Python :: 3.6
  Programming Language :: Python :: 3.7
  Operating System :: OS Independent
  Intended Audience :: Developers
  Intended Audience :: Science/Research
  Topic :: Scientific/Engineering :: Artificial Intelligence
Entry-points:
  [console_scripts]
  polyaxon = polyaxon_cli.main:cli

ImagePullBackOff for in cluster Docker Registry

Hello All,

I can't start jobs/experiments/notebooks due to them freezing in the starting phase.

After looking into it, I found out that the pods are failing with ImagePullBackOff.

A quick describe yielded the following events:

Events:
  Type     Reason     Age                     From                 Message
  ----     ------     ----                    ----                 -------
  Normal   Scheduled  7m19s                   default-scheduler    Successfully assigned polyaxon/plx-notebook-cb72a35b76ea4c2fa8a65e795960346f-87f76cd6b-sxvhs to oodapow-pc
  Normal   Pulling    7m18s                   kubelet, oodapow-pc  Pulling image "polyaxon/polyaxon-init:0.6.1"
  Normal   Pulled     7m17s                   kubelet, oodapow-pc  Successfully pulled image "polyaxon/polyaxon-init:0.6.1"
  Normal   Created    7m17s                   kubelet, oodapow-pc  Created container polyaxon-init-job
  Normal   Started    7m16s                   kubelet, oodapow-pc  Started container polyaxon-init-job
  Warning  Failed     6m34s (x3 over 7m16s)   kubelet, oodapow-pc  Error: ErrImagePull
  Warning  Failed     5m55s (x5 over 7m15s)   kubelet, oodapow-pc  Error: ImagePullBackOff
  Normal   Pulling    5m41s (x4 over 7m16s)   kubelet, oodapow-pc  Pulling image "127.0.0.1:31813/quick-start_1:5ef3f22108184273bf4a696378c6a2d5"
  Warning  Failed     5m41s (x4 over 7m16s)   kubelet, oodapow-pc  Failed to pull image "127.0.0.1:31813/quick-start_1:5ef3f22108184273bf4a696378c6a2d5": rpc error: code = Unknown desc = failed to resolve image "127.0.0.1:31813/quick-start_1:5ef3f22108184273bf4a696378c6a2d5": no available registry endpoint: failed to do request: Head https://127.0.0.1:31813/v2/quick-start_1/manifests/5ef3f22108184273bf4a696378c6a2d5: http: server gave HTTP response to HTTPS client
  Normal   BackOff    2m15s (x19 over 7m15s)  kubelet, oodapow-pc  Back-off pulling image "127.0.0.1:31813/quick-start_1:5ef3f22108184273bf4a696378c6a2d5"

Environment:

kubectl

Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.6", GitCommit:"72c30166b2105cd7d3350f2c28a219e6abcd79eb", GitTreeState:"clean", BuildDate:"2020-01-18T23:31:31Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.6", GitCommit:"72c30166b2105cd7d3350f2c28a219e6abcd79eb", GitTreeState:"clean", BuildDate:"2020-01-18T23:23:21Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}

helm

Client: &version.Version{SemVer:"v2.15.1", GitCommit:"cf1de4f8ba70eded310918a8af3a96bfe8e7683b", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.15.1", GitCommit:"cf1de4f8ba70eded310918a8af3a96bfe8e7683b", GitTreeState:"clean"}

polyaxon-cli

---
Metadata-Version: 2.1
Name: polyaxon-cli
Version: 0.6.0
Summary: Command Line Interface (CLI) for Polyaxon.
Home-page: https://github.com/polyaxon/polyaxon-cli
Author: Mourad Mourafiq
Author-email: [email protected]
Installer: pip
License: MIT
Location: /usr/local/lib/python3.5/dist-packages
Requires: polyaxon-client, raven, pathlib, polyaxon-deploy, click, tabulate, click-completion, polyaxon-dockerizer
Classifiers:
  Programming Language :: Python
  Programming Language :: Python :: 2
  Programming Language :: Python :: 2.7
  Programming Language :: Python :: 3
  Programming Language :: Python :: 3.5
  Programming Language :: Python :: 3.6
  Programming Language :: Python :: 3.7
  Operating System :: OS Independent
  Intended Audience :: Developers
  Intended Audience :: Science/Research
  Topic :: Scientific/Engineering :: Artificial Intelligence
Entry-points:
  [console_scripts]
  polyaxon = polyaxon_cli.main:cli

config.yml

rbac:
  enabled: false

serviceType: NodePort

broker: redis
rabbitmq-ha:
  enabled: false

Can't install on 1.16.2 from master (with 1.16 fix)

Hi,
Doing:

    git clone https://github.com/polyaxon/polyaxon-chart
    helm dependency update ./polyaxon
    helm install ./polyaxon --name=polyaxon --namespace=polyaxon -f ../polyaxon/config.yaml  --dry-run --debug

I get:

[debug] Created tunnel using local port: '38219'

[debug] SERVER: "127.0.0.1:38219"

[debug] Original chart version: ""
[debug] CHART PATH: /.../polyaxon-chart/polyaxon

Error: unable to recognize "": no matches for kind "Deployment" in version "extensions/v1beta1"

That's on k8s 1.16.2 using microk8s

kubectl version 
Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.2", GitCommit:"c97fe5036ef3df2967d086711e6c0c405941e14b", GitTreeState:"clean", BuildDate:"2019-10-17T17:16:09Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.2", GitCommit:"c97fe5036ef3df2967d086711e6c0c405941e14b", GitTreeState:"clean", BuildDate:"2019-10-15T19:09:08Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}

This is with master at the following HEAD:

commit 5b5b5f8b52619f98defadc4fcb647abeb0db3993 (HEAD -> master, origin/master, origin/HEAD)
Author: Rinat Shigapov <[email protected]>
Date:   Tue Oct 22 20:00:50 2019 +0300

    add K8S 1.16 support (#51)
    
    * add k8s 1.16 support
    
    * streams is unready when api server uses SSL+NodePort
    
    * revert postgres dependency

Hooks Postgres Password

The password is set to POLYAXON_DB_USER but should be read from the password secret or POLYAXON_DB_PASSWORD instead. It doesn't seem to exist in the config map though so not quite sure how you would like me to fix this.

https://github.com/polyaxon/polyaxon-chart/blob/02d740f7eb2e730976e9c55efa284d8061135a70/polyaxon/templates/hooks/tables-job.yaml#L43

gitlab oauth

I can see that the chart was compatible with gitlab auth in the past, but now I'm not sure if it is compatible or only ldap.

Can you clarify it for me?

Thank you

Juanma

Following the README instructions for experiments and jobs fails on the YAML

With the YAML:

---
version: 1
kind: job
environment:
  persistence:
    data:
        mountPath: "/data"
        existingClaim: "pvc-data"
        readOnly: false
  resources:
    cpu:
      limits: 1
      requests: 1
    memory:
      requests: 5120
      limits: 5120

build:
  dockerfile: polyaxon/Dockerfile
run:
  cmd:
    - python polyaxon/temp.py
    - sleep 10

I get the error: Polyaxonfile is not valid. Error message `{'environment': {'persistence': {'data': ['Not a valid list.']}}}`.

Certificate error when running helm repo add

Hi -

I'm getting the following error when running helm repo add polyaxon https://charts.polyaxon.com:

Error: Looks like "https://charts.polyaxon.com" is not a valid chart repository or cannot be reached: Get https://charts.polyaxon.com/index.yaml: x509: certificate signed by unknown authority

What's strange is that this seems to be dependent upon the environment from which I'm running helm repo add. I can add the repo just fine when running locally or on AWS, but when running on a particular on-prem server, I get the error. All of these environments are running the same OS and the same versions of Kubernetes/Helm.

Running helm repo add incubator https://kubernetes-charts-incubator.storage.googleapis.com/ as a general helm test, it works fine. I also notice that wget https://charts.polyaxon.com/index.yaml and curl https://charts.polyaxon.com/index.yaml throw a similar cert error to the helm command.

I know very little about SSL/TLS, so I'm unsure how to get around this issue. The one thing I've tried is to pass a cert explicitly using helm's --cert-file argument, which did not work.

Any thoughts on what to do here?

no matches for kind "Deployment" in version "extensions/v1beta1"

Hi all

Unable to deploy helm chart with helm 3

Observing

helm install polyaxon --namespace polyaxon polyaxon/polyaxon
Error: unable to build kubernetes objects from release manifest: unable to recognize "": no matches for kind "Deployment" in version "extensions/v1beta1"

Expected:
polyaxon to be deployed on k8s

k8s version

Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.1", GitCommit:"4485c6f18cee9a5d3c3b4e523bd27972b1b53892", GitTreeState:"clean", BuildDate:"2019-07-18T09:18:22Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.2+k3s1", GitCommit:"cdab19b09a84389ffbf57bebd33871c60b1d6b28", GitTreeState:"clean", BuildDate:"2020-01-27T18:09:26Z", GoVersion:"go1.13.6", Compiler:"gc", Platform:"linux/amd64"}

helm version

version.BuildInfo{Version:"v3.0.2", GitCommit:"19e47ee3283ae98139d98460de796c1be1e3975f", GitTreeState:"clean", GoVersion:"go1.13.5"}

Same error even with

postgres:
    enabled: false

Also unable to install version 0.5.6

h install polyaxon --namespace polyaxon polyaxon/polyaxon --version 0.5.6
Error: unable to build kubernetes objects from release manifest: [unable to recognize "": no matches for kind "Deployment" in version "extensions/v1beta1", unable to recognize "": no matches for kind "StatefulSet" in version "apps/v1beta1", unable to recognize "": no matches for kind "StatefulSet" in version "apps/v1beta2"]

Would appreciate if someone could help troubleshoot.

Helm chart fails if tls is enabled

I tried to follow the SSL guide here with the example ingress configuration:

serviceType: ClusterIP
ingress:
  enabled: true
  hostName: polyaxon.acme.com
  tls:
  - secretName: polyaxon.acme-tls
    hosts:
      - polyaxon.acme.com

Unfortunately helm upgrade using version 0.4.3 fails with the following error:

Error: YAML parse error on polyaxon/templates/ing.yaml: error converting YAML to JSON: yaml: line 39: did not find expected '-' indicator
Error: UPGRADE FAILED: YAML parse error on polyaxon/templates/ing.yaml: error converting YAML to JSON: yaml: line 39: did not find expected '-' indicator

It seems like this line is causing the trouble.
For now, switching back to HTTP works for me. Or am I missing something?

Should allow TLS in ingress

We should have an option for injecting TLS settings into the ingress.

ing.yaml

spec:
  tls:
  - hosts:
    - polyaxon.domain.com
    secretName: polyaxon-tls

Add ability to specify custom labels for pods.

I want to add my own labels to polyaxon pods. I can't do that now outside of using kustomize.

CrashLoopBackOff polyaxon-redis-*-0

Hello All,

It seems that both polyaxon-redis-master-0 and polyaxon-redis-slave-0 are not starting.

> kubectl get pods -n polyaxon

NAME                                            READY   STATUS             RESTARTS   AGE
polyaxon-docker-registry-78c9b9c9dd-f8lgk       1/1     Running            0          7m36s
polyaxon-polyaxon-api-59b45bccc6-kl8nj          2/2     Running            0          7m35s
polyaxon-polyaxon-beat-776b89ccfd-tm7cx         2/2     Running            0          7m36s
polyaxon-polyaxon-events-7774c88844-2hzlf       1/1     Running            0          7m35s
polyaxon-polyaxon-hpsearch-cf5ffd5f5-pgsl9      1/1     Running            0          7m36s
polyaxon-polyaxon-k8s-events-6d77b8c499-4m78j   1/1     Running            0          7m36s
polyaxon-polyaxon-monitors-d55dbf7dd-mdvw8      1/1     Running            0          7m36s
polyaxon-polyaxon-scheduler-5767dc68cd-zwvwh    1/1     Running            0          7m35s
polyaxon-postgresql-0                           1/1     Running            0          7m35s
polyaxon-redis-master-0                         0/1     CrashLoopBackOff   6          7m35s
polyaxon-redis-slave-0                          0/1     CrashLoopBackOff   6          7m35s

I used the following commands to install the chart.

# Create a namespace
$ kubectl create namespace polyaxon

# Add Polyaxon charts repo
$ helm repo add polyaxon https://charts.polyaxon.com

# Deploy Polyaxon
$ helm install polyaxon/polyaxon \
    --name=polyaxon \
    --namespace=polyaxon \
    -f config.yaml

The config.yaml file looks like this:

rbac:
  enabled: false

serviceType: NodePort

broker: redis
rabbitmq-ha:
  enabled: false

The log of the polyaxon-redis-master-0 pod is:

> kubectl logs -n polyaxon polyaxon-redis-master-0

1:C 06 May 2020 16:26:51.332 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 06 May 2020 16:26:51.333 # Redis version=5.0.4, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 06 May 2020 16:26:51.333 # Configuration loaded
1:M 06 May 2020 16:26:51.334 * Running mode=standalone, port=6379.
1:M 06 May 2020 16:26:51.334 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:M 06 May 2020 16:26:51.334 # Server initialized
1:M 06 May 2020 16:26:51.334 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
1:M 06 May 2020 16:26:51.334 * Reading RDB preamble from AOF file...
1:M 06 May 2020 16:26:51.335 * Reading the remaining AOF tail...
1:M 06 May 2020 16:26:52.096 # Bad file format reading the append only file: make a backup of your AOF file, then use ./redis-check-aof --fix <filename>

The log of the polyaxon-redis-slave-0 pod is:

> kubectl logs -n polyaxon polyaxon-redis-slave-0

INFO  ==> ** Starting Redis **
1:C 06 May 2020 16:29:27.315 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 06 May 2020 16:29:27.315 # Redis version=5.0.4, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 06 May 2020 16:29:27.315 # Configuration loaded
1:S 06 May 2020 16:29:27.316 * Running mode=standalone, port=6379.
1:S 06 May 2020 16:29:27.316 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:S 06 May 2020 16:29:27.316 # Server initialized
1:S 06 May 2020 16:29:27.316 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
1:S 06 May 2020 16:29:27.316 * Reading RDB preamble from AOF file...
1:S 06 May 2020 16:29:27.316 * Reading the remaining AOF tail...
1:S 06 May 2020 16:29:27.348 # Bad file format reading the append only file: make a backup of your AOF file, then use ./redis-check-aof --fix <filename>

Environment:

kubectl

Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.6", GitCommit:"72c30166b2105cd7d3350f2c28a219e6abcd79eb", GitTreeState:"clean", BuildDate:"2020-01-18T23:31:31Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.6", GitCommit:"72c30166b2105cd7d3350f2c28a219e6abcd79eb", GitTreeState:"clean", BuildDate:"2020-01-18T23:23:21Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}

helm

Client: &version.Version{SemVer:"v2.15.1", GitCommit:"cf1de4f8ba70eded310918a8af3a96bfe8e7683b", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.15.1", GitCommit:"cf1de4f8ba70eded310918a8af3a96bfe8e7683b", GitTreeState:"clean"}

polyaxon-cli

---
Metadata-Version: 2.1
Name: polyaxon-cli
Version: 0.6.0
Summary: Command Line Interface (CLI) for Polyaxon.
Home-page: https://github.com/polyaxon/polyaxon-cli
Author: Mourad Mourafiq
Author-email: [email protected]
Installer: pip
License: MIT
Location: /usr/local/lib/python3.5/dist-packages
Requires: polyaxon-client, raven, pathlib, polyaxon-deploy, click, tabulate, click-completion, polyaxon-dockerizer
Classifiers:
  Programming Language :: Python
  Programming Language :: Python :: 2
  Programming Language :: Python :: 2.7
  Programming Language :: Python :: 3
  Programming Language :: Python :: 3.5
  Programming Language :: Python :: 3.6
  Programming Language :: Python :: 3.7
  Operating System :: OS Independent
  Intended Audience :: Developers
  Intended Audience :: Science/Research
  Topic :: Scientific/Engineering :: Artificial Intelligence
Entry-points:
  [console_scripts]
  polyaxon = polyaxon_cli.main:cli

Private registry for docker images

I'm behind a proxy and I need to push every image to a private registry. I think exposing values like the ones below would solve this issue:

Parameter	Description	Default
`global.imageRegistry`	Global Docker image registry	`nil`
`global.imagePullSecrets`	Global Docker registry secret names as an array	`[]`
`api.image.registry`	API image registry	`docker.io`
`api.image.repository`	API image name	`polyaxon/polyaxon-api`
`api.image.tag`	API image tag	`0.5.6`

install polyaxon by using helm (offline) error

Hello! I am interested in Polyaxon and want to run a demo by installing it in minikube, but I hit a problem at the very beginning.

Steps I followed to install Polyaxon in minikube:

1. Start minikube:
   minikube start --cpus 4 --memory 8192 --disk-size=40g --driver=hyperkit
2. Download polyaxon-charts and generate polyaxon-1.0.8.tgz.
3. Install offline with Helm:
   helm install polyaxon-1.0.8.tgz

But it fails with:

Error: render error in "polyaxon/templates/streams-deployment.yaml": template: polyaxon/templates/streams-deployment.yaml:63:3: executing "polyaxon/templates/streams-deployment.yaml" at <include "config.artifactsStore.mount" .>: error calling include: template: polyaxon/templates/partials/_stores.tpl:16:39: executing "config.artifactsStore.mount" at <eq .Values.artifactsStore.kind "host_path">: error calling eq: invalid type for comparison

That is the error I get. Has anyone else run into the same problem?

version information:

My OS is macOS 10.15.3.
The minikube version is v1.9.2.
The Helm version is:

Client: &version.Version{SemVer:"v2.16.6", GitCommit:"dd2e5695da88625b190e6b22e9542550ab503a47", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.16.6", GitCommit:"dd2e5695da88625b190e6b22e9542550ab503a47", GitTreeState:"clean"}

The Polyaxon charts version is the latest as of now.

THX! BEST WISHES~

Error: requirements.lock is out of sync with requirements.yaml

Issue: I was trying to install Polyaxon with Helm in an automated environment, and the install failed with the message:
Error: requirements.lock is out of sync with requirements.yaml

While the issue can easily be solved by hand in manual deployments, it would help automation if the lock file in the repository were updated.
