controlcpluscontrolv / golem-image-classifier

Golem Image Classifier is built using the Service API of Yapapi to interact with the Golem Network (golem.network). It was built as part of a Gitcoin bounty.

License: GNU General Public License v3.0

Dockerfile 1.41% Python 98.59%

machine-learning golem yagna

golem-image-classifier's Introduction

Golem Image Classifier Service

This service was designed for the bounty put out by Golem as seen here. Running it requires an existing requestor node setup; if you don't have one, a quick primer can be found here. The service defines a requestor.py subsystem that can be given predict and train tasks. A dataset is required to make predictions. Right now the neural net itself is vgg16, but I am looking into EfficientNetV2.

Using the Service

Clone this repo into the folder of your choice. The main component needed for testing is the requestor.py script, but the entire service code is included in the service folder if you need to check something. Next, download the Model Weights and name the file "vgg16.h5". This step is required because the service needs these weights for initialization. Make sure the weights are in the same folder as the requestor script.

The service responds to 2 main types of requests, Predict and Train. The requestor script itself is used as a subprocess that must be initialized with parameters before being incorporated into a larger process. See demo.py for examples. A couple of files are needed, and all are packed as .tar.gz to reduce the time spent sending data, so more can be spent on training. Arguments for tasks are separated by spaces.

Demo

Demo video here

Files used in demo.py

demo.py showcases how to interact with the requestor in an automated way. Running the requestor as a subprocess allows it to interconnect with existing ML implementations without needing to build additional network handling to send and receive requests; stdin and stdout are handled directly instead. Running demo.py does a couple of things:

First, requestor.py is started with -d dataset -c dog monkey cat cow as args. This tells it the classes to be used in the demo; dataset is the name of our dataset, so it searches for "dataset.tar.gz".
Once initialized, the requestor is sent a task to predict with test1.jpg. This should respond with monkey in stdout, marking the prediction.

Another task is then sent with more validation images and training images. Once received, the neural net trains on those images and returns "Model Training Success" once complete.

Finally, the test image is sent again to verify that the neural net is working and was trained properly.

The demo is primarily there to showcase how to incorporate the requestor as a subprocess module. I chose this approach to ease integration into existing neural net implementations, as it's similar to using another library, but with an added daemon process.
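The subprocess pattern described above can be sketched as follows. This is a minimal illustration, not the code in demo.py: the helper names (build_requestor_command, start_requestor, send_task, read_reply) are hypothetical, and the sketch assumes the requestor prints other status lines besides the reply, so the caller supplies a predicate that recognises the line it is waiting for.

```python
import subprocess
import sys


def build_requestor_command(dataset, classes, script="requestor.py"):
    """Return the argv list for starting the requestor with -d and -c."""
    return [sys.executable, script, "-d", dataset, "-c", *classes]


def start_requestor(dataset, classes):
    """Start the requestor as a child process with line-buffered text pipes."""
    return subprocess.Popen(
        build_requestor_command(dataset, classes),
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
        bufsize=1,
    )


def send_task(proc, task):
    """Write one task to the child's stdin, terminated by a single newline."""
    proc.stdin.write(task.rstrip("\n") + "\n")
    proc.stdin.flush()


def read_reply(proc, is_reply):
    """Read stdout lines until is_reply(line) is true; None if the stream ends."""
    for line in proc.stdout:
        line = line.strip()
        if is_reply(line):
            return line
    return None
```

With the real script this would be used along the lines of proc = start_requestor("dataset", ["dog", "monkey", "cat", "cow"]), then send_task(proc, "predict test1.jpg"), then read_reply with a predicate that matches one of the class names.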

Requestor

The requestor script requires 2 things upon initialization: a dataset archive in .tar.gz format with a layout similar to the one shown in /services/dataset, and a list of class names.

Example - requestor.py -d dataset -c dog monkey cat cow
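For readers wiring up a similar command line, the following sketch shows one way the -d and -c flags from the example could be parsed. It is an assumption that argparse is a reasonable fit here; the long option names (--dataset, --classes) and the archive attribute are hypothetical and are not taken from requestor.py.

```python
import argparse


def parse_requestor_args(argv):
    """Parse -d <dataset> and -c <class> [<class> ...] from an argv list."""
    parser = argparse.ArgumentParser(prog="requestor.py")
    parser.add_argument(
        "-d", "--dataset", required=True,
        help="dataset name without the .tar.gz suffix",
    )
    parser.add_argument(
        "-c", "--classes", required=True, nargs="+",
        help="one or more class names, separated by spaces",
    )
    args = parser.parse_args(argv)
    # The README states that "-d dataset" makes the script look for dataset.tar.gz.
    args.archive = args.dataset + ".tar.gz"
    return args
```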

It then prompts the user for the task they would like to execute. Tasks are single strings with arguments separated by a space. The 2 types of tasks the requestor responds to are "predict" and "train".
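As an illustration of the task format, here is a small sketch that splits a task string into its name and arguments and checks the argument count. The function name parse_task and the TASK_ARITY table are hypothetical; the counts (one argument for predict, two for train) follow the Predict and Train sections of this README.

```python
# Number of arguments each task takes, per the Predict and Train sections.
TASK_ARITY = {"predict": 1, "train": 2}


def parse_task(line):
    """Split a task string into (name, args), rejecting malformed tasks."""
    parts = line.split()  # split() also drops a trailing line end character
    if not parts:
        raise ValueError("empty task")
    name, args = parts[0], parts[1:]
    if name not in TASK_ARITY:
        raise ValueError(f"unknown task: {name!r}")
    if len(args) != TASK_ARITY[name]:
        raise ValueError(f"{name} expects {TASK_ARITY[name]} argument(s), got {len(args)}")
    return name, args
```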

Predict

Required Args

A .jpg file in the same directory as the requestor script

Returns a label in stdout.

Example - "predict test1.jpg"

A line end character may be needed if you are queueing tasks via stdin rather than entering them manually.

Train

Required Args

A .tar.gz archive containing training images. Note that these images must be directly inside the archive, not in a subdirectory within it.
A .tar.gz archive containing validation images. Note that these images must be directly inside the archive, not in a subdirectory within it.

Returns "Model Successfully Trained" in stdout

Example - "train train.tar.gz valid.tar.gz"

A line end character may be needed if you are queueing tasks via stdin rather than entering them manually.
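Since the Train task expects images at the root of each archive, a sketch of building such an archive with the standard tarfile module may help. The helper names (make_flat_archive, is_flat) are hypothetical, and restricting the default to .jpg files is an assumption based on the predict example; adjust the suffixes to match your data.

```python
import tarfile
from pathlib import Path


def make_flat_archive(image_dir, archive_path, suffixes=(".jpg",)):
    """Pack the images in image_dir into a .tar.gz with no directory prefix."""
    images = sorted(
        p for p in Path(image_dir).iterdir()
        if p.is_file() and p.suffix.lower() in suffixes
    )
    with tarfile.open(archive_path, "w:gz") as archive:
        for image in images:
            # arcname drops the parent directory, so the file sits at the archive root.
            archive.add(str(image), arcname=image.name)
    return [image.name for image in images]


def is_flat(archive_path):
    """Return True if every member is a regular file at the archive root."""
    with tarfile.open(archive_path, "r:gz") as archive:
        return all(m.isfile() and "/" not in m.name for m in archive.getmembers())
```

Passing arcname is the important part: without it, tarfile stores each file under the path it was added from, which would place the images in a subdirectory inside the archive.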

Modifying for Personal/Business Use

If you plan to modify this for personal or business use, use a dataset with the same format as shown in /service/dataset and pack it as .tar.gz, then use the demo.py script as an example to base your own script on. The vgg16.h5 weights can be swapped for those of other neural nets with minimal modification, since the neural net is initialized from the weights, but the optimizations are for vgg16, so you may encounter irregularities or errors.

Swapping out datasets while keeping vgg16 will work fine, though, so long as the dataset is in the proper format. It is recommended to incorporate the requestor.py script as a subprocess into your ML implementation. Tasks can be queued up via stdin, each terminated with a line end character. See demo.py for more information.

Questions?

If you have any extra questions, make sure to reach out to Nebula on the Golem Discord!

golem-image-classifier's Issues

/model/ model folder not found

Model folder is on the volume, the path is correct, and the files are there. It works on Docker, after all, so it must be some permissions issue. Not sure how to fix.

  File "/home/controlc/.local/lib/python3.9/site-packages/yapapi/services.py", line 469, in _run_instance
    batch = batch_task.result()
  File "/home/controlc/Downloads/gimg4/requestor.py", line 48, in start
    yield self._ctx.commit()
  File "/home/controlc/.local/lib/python3.9/site-packages/yapapi/services.py", line 478, in _run_instance
    fut_result = yield batch
  File "/home/controlc/.local/lib/python3.9/site-packages/yapapi/engine.py", line 556, in process_batches
    results = await get_batch_results()
  File "/home/controlc/.local/lib/python3.9/site-packages/yapapi/engine.py", line 542, in get_batch_results
    raise CommandExecutionError(evt.command, evt.message, evt.stderr)
yapapi.rest.activity.CommandExecutionError: Command '{'run': {'entry_point': '/golem/run/imageclassifier.py', 'args': ('--predict', '/golem/work/data/test', '--batch', '1'), 'capture': {'stdout': {'stream': {}}, 'stderr': {'stream': {}}}}}' failed on provider; message: 'ExeScript command exited with code 1'; stderr: '2021-07-13 03:14:24.576198: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-07-13 03:14:24.576234: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Traceback (most recent call last):
  File "/golem/run/imageclassifier.py", line 53, in <module>
    predict(args.predict, int(args.batch))
  File "/golem/run/imageclassifier.py", line 31, in predict
    model = keras.models.load_model("/golem/work/model")
  File "/usr/local/lib/python3.8/site-packages/keras/saving/save.py", line 206, in load_model
    return saved_model_load.load(filepath, compile, options)
  File "/usr/local/lib/python3.8/site-packages/keras/saving/saved_model/load.py", line 109, in load
    meta_graph_def = loader_impl.parse_saved_model(path).meta_graphs[0]
  File "/usr/local/lib/python3.8/site-packages/tensorflow/python/saved_model/loader_impl.py", line 113, in parse_saved_model
    raise IOError(
OSError: SavedModel file does not exist at: /golem/work/model/{saved_model.pbtxt|saved_model.pb}
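The final OSError says that Keras looked for saved_model.pb or saved_model.pbtxt directly under /golem/work/model and found neither. One way to narrow this down, offered as a diagnostic sketch rather than a confirmed fix, is to report what the directory actually contains before load_model is called. The function name describe_model_dir is hypothetical.

```python
from pathlib import Path

# File names taken from the OSError message in the trace above.
SAVED_MODEL_FILES = ("saved_model.pb", "saved_model.pbtxt")


def describe_model_dir(model_dir):
    """Return "ok" if a SavedModel file is present, else a short diagnosis."""
    path = Path(model_dir)
    if not path.is_dir():
        return f"{path} is not a directory"
    names = sorted(p.name for p in path.iterdir())
    if any(name in SAVED_MODEL_FILES for name in names):
        return "ok"
    return f"{path} has no saved_model.pb or saved_model.pbtxt; contents: {names}"
```

Printing the result of describe_model_dir("/golem/work/model") at the start of the predict step would show whether the directory is missing, empty, or holds the model one level deeper than expected.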

Code Review

This is a general issue for discussion and changes pertaining to Code Review from the Golem Team.

Worker for provider '' failed; reason: 'coroutine' object has no attribute '__anext__'

File send issue

Exit error

yapapi.rest.activity.CommandExecutionError: Command '{'run': {'entry_point': 'imageclassifier.py', 'args': ('--trainmodel True',), 'capture': {'stdout': {'stream': {}}, 'stderr': {'stream': {}}}}}' failed on provider; message: 'Runtime error: Error { code: Internal, message: "Running process failed, exit code: 8", context: {} }'

Fix Issues defined in code Review #2

Keeping this to a minimum because this is me talking to myself

Transform to

Clean base model, i.e. a pretrained net you feed data into for a prediction
Always-running script
HTTP frontend to communicate with requestor

Foreman runs without doing anything

log below

I don't even know anymore

Traceback (most recent call last):
  File "/home/controlc/.local/lib/python3.9/site-packages/yapapi/services.py", line 469, in _run_instance
    batch = batch_task.result()
  File "/home/controlc/Downloads/gimg5/requestor.py", line 62, in start
    service = yield self._ctx.commit()
  File "/home/controlc/.local/lib/python3.9/site-packages/yapapi/services.py", line 478, in _run_instance
    fut_result = yield batch
  File "/home/controlc/.local/lib/python3.9/site-packages/yapapi/engine.py", line 556, in process_batches
    results = await get_batch_results()
  File "/home/controlc/.local/lib/python3.9/site-packages/yapapi/engine.py", line 542, in get_batch_results
    raise CommandExecutionError(evt.command, evt.message, evt.stderr)
yapapi.rest.activity.CommandExecutionError: Command '{'run': {'entry_point': '/bin/sh', 'args': ('-c', '/golem/run/ImageClassification.py', '&'), 'capture': {'stdout': {'stream': {}}, 'stderr': {'stream': {}}}}}' failed on provider; message: 'ExeScript command exited with code 2'; stderr: '/golem/run/ImageClassification.py: 1: /golem/run/ImageClassification.py: import: not found
/golem/run/ImageClassification.py: 2: /golem/run/ImageClassification.py: import: not found
/golem/run/ImageClassification.py: 3: /golem/run/ImageClassification.py: from: not found
/golem/run/ImageClassification.py: 4: /golem/run/ImageClassification.py: from: not found
/golem/run/ImageClassification.py: 5: /golem/run/ImageClassification.py: from: not found
/golem/run/ImageClassification.py: 6: /golem/run/ImageClassification.py: from: not found
/golem/run/ImageClassification.py: 7: /golem/run/ImageClassification.py: from: not found
/golem/run/ImageClassification.py: 8: /golem/run/ImageClassification.py: from: not found
/golem/run/ImageClassification.py: 9: /golem/run/ImageClassification.py: from: not found
/golem/run/ImageClassification.py: 10: /golem/run/ImageClassification.py: from: not found
/golem/run/ImageClassification.py: 11: /golem/run/ImageClassification.py: from: not found
/golem/run/ImageClassification.py: 12: /golem/run/ImageClassification.py: from: not found
/golem/run/ImageClassification.py: 13: /golem/run/ImageClassification.py: from: not found
/golem/run/ImageClassification.py: 14: /golem/run/ImageClassification.py: import: not found
/golem/run/ImageClassification.py: 15: /golem/run/ImageClassification.py: import: not found
/golem/run/ImageClassification.py: 16: /golem/run/ImageClassification.py: import: not found
/golem/run/ImageClassification.py: 17: /golem/run/ImageClassification.py: import: not found
/golem/run/ImageClassification.py: 18: /golem/run/ImageClassification.py: from: not found
/golem/run/ImageClassification.py: 19: /golem/run/ImageClassification.py: from: not found
/golem/run/ImageClassification.py: 20: /golem/run/ImageClassification.py: train_path: not found
/golem/run/ImageClassification.py: 21: /golem/run/ImageClassification.py: valid_path: not found
/golem/run/ImageClassification.py: 22: /golem/run/ImageClassification.py: test_path: not found
/golem/run/ImageClassification.py: 23: /golem/run/ImageClassification.py: Syntax error: "(" unexpected
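The "import: not found" lines indicate that /bin/sh interpreted the Python file as a shell script. In addition, sh -c takes its command as a single string, so a separate '&' argument becomes $0 instead of backgrounding the command. A sketch of building the argument tuple differently is shown below; the helper name is hypothetical, and the interpreter name python3 is an assumption about what is available in the provider image.

```python
import shlex


def background_python_args(script_path, interpreter="python3"):
    """Build the args tuple for /bin/sh so the script runs under Python in the background."""
    # The whole command, including the trailing &, must be one string after -c.
    command = f"{shlex.quote(interpreter)} {shlex.quote(script_path)} &"
    return ("-c", command)
```

Whether a backgrounded process keeps running after the command returns depends on the runtime, which this sketch does not address.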

Test Hash

41fedaae93cd8014cc7134d56a5302eeb7933dec312bd214bcf00ed7

Prediction not Printing out

Model Transferred
instances: [('q53.b2', 'starting')]
instances: [('q53.b2', 'starting')]
[CommandExecuted(agr_id='b45fc9470111d48441ddc3bb18df788c47876c4ddbee3cf24e58e9bc14eab880', script_id='6', cmd_idx=0, command={'run': {'entry_point': '/bin/ls', 'args': ('/golem/work/dataset/test',), 'capture': {'stdout': {'stream': {}}, 'stderr': {'stream': {}}}}}, success=True, message=None, stdout='Unknown\n', stderr=None)]
What task do you wish to run? [predict/train] : predict
What is the name of the image you wish to identify : test2.jpg
All instances started :)
instances: [('q53.b2', 'running')]
instances: [('q53.b2', 'running')]
instances: [('q53.b2', 'running')]
Test Image Sent!
instances: [('q53.b2', 'running')]
instances: [('q53.b2', 'running')]
instances: [('q53.b2', 'running')]
[CommandExecuted(agr_id='b45fc9470111d48441ddc3bb18df788c47876c4ddbee3cf24e58e9bc14eab880', script_id='8', cmd_idx=0, command={'run': {'entry_point': '/golem/run/imageclassifier.py', 'args': ('--predict', '/golem/work/dataset/test', '--batch', '1'), 'capture': {'stdout': {'stream': {}}, 'stderr': {'stream': {}}}}}, success=True, message=None, stdout='Found 1 images belonging to 1 classes.\n[3]\n', stderr="2021-07-18 23:30:21.807033: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory\n2021-07-18 23:30:21.807067: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.\n2021-07-18 23:30:24.213196: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory\n2021-07-18 23:30:24.213225: W tensorflow/stream_executor/cuda/cuda_driver.cc:326] failed call to cuInit: UNKNOWN ERROR (303)\n2021-07-18 23:30:24.213241: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host ((none)): /proc/driver/nvidia/version does not exist\n2021-07-18 23:30:24.213694: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA\nTo enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.\n2021-07-18 23:30:33.567080: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)\n2021-07-18 23:30:33.568744: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 3792000000 Hz\n")]
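In the last event the captured stdout ends with "[3]", which looks like a class index rather than a label. A sketch of turning that into a class name on the requestor side follows. The function name label_from_stdout is hypothetical, and it assumes the index refers to the position in the class list passed with -c, which the log does not confirm.

```python
import re


def label_from_stdout(stdout, classes):
    """Map a trailing "[n]" line in the provider's stdout to classes[n], or None."""
    lines = [line.strip() for line in stdout.splitlines() if line.strip()]
    if not lines:
        return None
    match = re.fullmatch(r"\[\s*(\d+)\s*\]", lines[-1])
    if match is None:
        return None
    index = int(match.group(1))
    # Assumption: the index follows the order of the class list given to the requestor.
    return classes[index] if index < len(classes) else None
```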