
Comments (5)

Tomcli commented on September 27, 2024

Hi @stock99, it looks like the script didn't find the right pod name in your Kubernetes cluster. Can you echo your pod names with the commands below? Thanks.

ui_pod=$(kubectl get pods | grep ffdl-ui | awk '{print $1}')
restapi_pod=$(kubectl get pods | grep ffdl-restapi | awk '{print $1}')
grafana_pod=$(kubectl get pods | grep prometheus | awk '{print $1}')

echo $ui_pod
echo $restapi_pod
echo $grafana_pod

Also, the pod/ format was introduced in kubectl client v1.10.0, so I recommend updating your kubectl client to v1.10.0 or later.
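As a rough illustration of how a script could tolerate both forms, the sketch below strips any resource prefix from a name, so that "pod/<name>" and a bare "<name>" are treated the same. The helper name strip_resource_prefix is hypothetical and is not part of the FfDL scripts; it only shows the idea under the assumption that the prefix is the sole difference between client versions.

```shell
#!/bin/sh
# Hypothetical helper (not from the FfDL repo): remove everything up to and
# including the last "/" so that "pod/ffdl-ui-abc" and "ffdl-ui-abc" both
# yield the bare pod name.
strip_resource_prefix() {
    printf '%s\n' "${1##*/}"
}

# Both calls print the same bare name.
strip_resource_prefix "pod/ffdl-ui-b6cbb98f-c4zpm"
strip_resource_prefix "ffdl-ui-b6cbb98f-c4zpm"
```

The parameter expansion is plain POSIX shell, so it does not depend on the kubectl version in use.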

from ffdl.

stock99 commented on September 27, 2024

Hi Tomcli,
It looks like the kubectl that comes with the kubeadm-dind installation script isn't the latest one (it is 1.8.x). Even if I install the latest version via snap, the installation script still seems to enforce the use of 1.8.15. Should I adjust any environment variable?

echo $ui_pod
ffdl-ui-b6cbb98f-c4zpm
echo $restapi_pod
ffdl-restapi-84bcb74478-t8df6
echo $grafana_pod
prometheus-5f85fd7695-gb568

kubectl version
Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.15", GitCommit:"c2bd642c70b3629223ea3b7db566a267a1e2d0df", GitTreeState:"clean", BuildDate:"2018-07-11T17:59:56Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.15", GitCommit:"c2bd642c70b3629223ea3b7db566a267a1e2d0df", GitTreeState:"clean", BuildDate:"2018-07-11T17:52:15Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}

snap list
Name Version Rev Tracking Publisher Notes
aws-cli 1.15.71 135 stable aws✓ classic
core 16-2.35.5 5742 stable canonical✓ core
helm 2.11.0 63 stable snapcrafters classic
kubectl 1.12.1 462 stable canonical✓ classic


Tomcli commented on September 27, 2024

Hi @stock99, I updated the script in #150 so that it runs with K8S 1.8.x. Let me know if you encounter any new issues.


stock99 commented on September 27, 2024

It seems to be OK now after removing 'pod/' in the script. The connection error in the opening post occurred because I mistyped one of the export statements in the dind installation.

But then I got an error message from the test routine make test-push-data-s3 && make test-job-submit:
Getting all models ...
Handling connection for 32060
ID Name Framework Training status Submitted Completed

0 records found.
Makefile:213: recipe for target 'test-job-submit' failed
make: *** [test-job-submit] Error 1

The console log is attached:
error_log.txt


chengboonrong commented on September 27, 2024

Can anyone help? I got these error messages when running make test-job-submit:

Downloading Docker images and test training data. This may take a while.
Context "dind" modified.
error: there is no need to specify a resource type as a separate argument when passing arguments in resource/name form (e.g. 'kubectl get resource/<resource_name>' instead of 'kubectl get resource resource/<resource_name>'
Submitting example training job (tf-model)
S3 URL: http://:30381 REST URL: http://localhost:31961
Executing in etc/examples/tf-model: DLAAS_URL=http://localhost:31961 DLAAS_USERNAME=test-user DLAAS_PASSWORD=test /home/chris/FfDL/cli/bin/ffdl-linux train manifest.yml .
sed: can't read : No such file or directory
name: tf_convolutional_network_tutorial
description: Convolutional network model using tensorflow
version: "1.0"
gpus: 0
cpus: 0.5
memory: 1Gb
learners: 1

# Object stores that allow the system to retrieve training data.
data_stores:
  - id: sl-internal-os
    type: mount_cos
    training_data:
      container: tf_training_data
    training_results:
      container: tf_trained_model
    connection:
      auth_url: http://10.192.0.3:30417
      user_name: test
      password: test

framework:
  name: tensorflow
  version: "1.5.0-py3"
  command: >
    python3 convolutional_network.py --trainImagesFile ${DATA_DIR}/train-images-idx3-ubyte.gz
      --trainLabelsFile ${DATA_DIR}/train-labels-idx1-ubyte.gz --testImagesFile ${DATA_DIR}/t10k-images-idx3-ubyte.gz
      --testLabelsFile ${DATA_DIR}/t10k-labels-idx1-ubyte.gz --learningRate 0.001
      --trainingIters 2000
  # Change trainingIters to 20000 if you want your model to have over 80% Accuracy rate.

evaluation_metrics:
  type: tensorboard
  in: "$JOB_STATE_DIR/logs/tb"
  # (Eventual) Available event types: 'images', 'distributions', 'histograms', 'images'
  # 'audio', 'scalars', 'tensors', 'graph', 'meta_graph', 'run_metadata'
  #  event_types: [scalars]
/home/chris/FfDL/etc/examples/tf-model
Deploying model with manifest 'manifest_testrun.yml' and model files in '.'...
Handling connection for 31961
Handling connection for 31961
FAILED
Error 200: OK

Test job submitted. Track the status via "DLAAS_URL=http://localhost:31961 DLAAS_USERNAME=test-user DLAAS_PASSWORD=test /home/chris/FfDL/cli/bin/ffdl-linux list".
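In the log above, the S3 URL is printed with no host (http://:30381) and sed reports that it cannot read an empty file name, which would be consistent with a shell variable being empty at that point. As a minimal sketch of a fail-fast check, and not the actual Makefile logic, a guard like the following could report the empty value before it is used. The names require_value and s3_host are hypothetical.

```shell
#!/bin/sh
# Hypothetical guard (names are illustrative, not from the FfDL Makefile):
# print an error and return non-zero when a required value is empty.
require_value() {
    # $1 = human-readable label, $2 = value to check
    if [ -z "$2" ]; then
        echo "error: $1 is empty" >&2
        return 1
    fi
}

s3_host=""   # stands in for a lookup that returned nothing
require_value "S3 host" "$s3_host" || echo "stopping before building the S3 URL"
```

With a check of this kind, the run would stop at the first missing value instead of continuing to the job submission step.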

