thefoundryvisionmongers / nuke-ml-server
A Nuke client plug-in which connects to a Python server to allow Machine Learning inference in Nuke.
License: Apache License 2.0
Hi! Many thanks for this great project.
I'm trying to write some integration tests, but without much success. Here is what I tried so far, but I get the errors below.
image_node = nuke.nodes.Read(file = "/path/to/file.jpg")
ml_node = nuke.nodes.MLClient()
# Maybe something like this would trigger the node to connect to the server?
# ml_node.knobs()['connect'].execute()
ml_node.setInput(0, image_node)
nuke.execute(ml_node, 1, 2)
# RuntimeError: MLClient1 cannot be executed
# MLClient1 cannot be executed
Any advice is highly appreciated
Is it possible to use these tools without running as root (uid 0)?
Add the ability to load and unload models on the fly, while the server is running
Whenever I save a Nuke script with MLClient nodes and then load that script back, I get error messages: "MLClientXX.YY: no such knob", where YY is a custom attribute in my ML model.
As a result, all the fields go back to their default values, including the input that specifies the path to the pre-trained model.
Is that behaviour expected?
Is there something I can do to avoid losing information?
Or perhaps can you fix on your end?
Thank you!!
Just leaving this here to help people out.
When I type the name of my machine, or "localhost" in the "host" Input for the MLClient node it fails to connect giving me the error message:
Hostname is invalid
If I type the typical localhost IP 127.0.0.1 it fails with the message:
Could not connect to server. Please check your host / port numbers.
It will ONLY work if I type the IP address returned by the command line:
hostname -I | awk '{print $1}'
It would be great to resolve the host name.
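For reference, here is roughly what resolving the host name before connecting could look like. The MLClient itself is C++, so this Python sketch is only illustrative of the behaviour, not the actual client code:

```python
import socket

def resolve_host(host):
    """Resolve a host name (e.g. "localhost" or a machine name) to an
    IPv4 address string; an IP address passes through unchanged."""
    try:
        return socket.gethostbyname(host)
    except socket.gaierror:
        raise ValueError("Hostname is invalid: %s" % host)
```

Note that 127.0.0.1 failing to connect may be a separate issue: if the server binds only to the machine's external interface, the loopback address won't reach it even when resolution succeeds.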
If I have a few MLClient nodes, all pointing to the same model class with custom knobs, and then I hit the "Connect" button, then things change in unpredictable ways:
Ideally it would be less destructive, and would first try to retain the knobs and their values or expressions.
Hi folks,
I'm modifying the server to run with Python3, as my ML environments require it.
I think I may have run into some issues with protobuf. In my server output, I see the following:
Server -> Listening on port: 55555
Server -> Receiving message of size: 6
Server -> 6 bytes read
Server -> Message parsed
Server -> Received info request
Server -> Serializing message
Server -> Sending response message of size: 98
Server -> -----------------------------------------------
But the client output reports reading data of size 0 instead of 98:
Client -> Connected to 172.17.0.2
Client -> Sending info request
Client -> Created message
Client -> Serialized message
Client -> Created char array of length 6
Client -> Copied to char array
Client -> Message sent
Client -> Reading header data
Client -> Reading data of size: 0
Client -> Deserializing message
Client -> Closed connection
This is my first time using protobuf, so I thought I'd ask whether there's anything obvious I should be looking for as to why it works with Python 2 and not Python 3.
cheers
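A common Python 2 → 3 pitfall with this kind of socket code (a guess, since I don't know the exact cause here): in Python 3, `socket.recv()` returns `bytes`, so header code that mixes `str` and `bytes` can silently produce a size of 0. Packing the size as fixed-width binary with `struct` sidesteps the ambiguity. This sketch assumes a 4-byte big-endian size prefix for illustration; it is not necessarily the wire format nuke-ML-server actually uses:

```python
import struct

def pack_message(payload):
    # Prefix the serialized protobuf payload with a fixed-width,
    # big-endian 4-byte size header (assumed format, for illustration).
    return struct.pack(">I", len(payload)) + payload

def unpack_header(header):
    # In Python 3 this must operate on bytes, not str.
    return struct.unpack(">I", header)[0]
```

Either way, checking every place the server formats or parses the size header for a str/bytes mismatch would be my first step.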
Hi!
Is there a plan to add support for other kinds of native Nuke knobs?
I listed a few in the title just as example.
I wonder if it could check for the attribute "shape" and, if it's a tuple, map it to the appropriate knob? That way it would be compatible with PyTorch or NumPy tensors, for example. Just an idea.
Also along these lines, instead of using the data type as the only input, perhaps there could be a way to specify metadata about each input, which could distinguish between RGB and 3D position types, as well as provide min/max values for a float input.
I think all that could go a long way to create intuitive interfaces for ML models embedded in Nuke.
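To make the shape-mapping idea concrete, something like this on the server side could pick a knob type from an input's `shape` attribute. The knob names here (`FloatKnob`, `ColorKnob`, `ArrayKnob`) are hypothetical placeholders, not actual nuke-ML-server API:

```python
def knob_for_input(value):
    """Pick a (hypothetical) knob type from an input's shape attribute.
    Works for anything with a tuple-like .shape, e.g. NumPy or PyTorch tensors;
    values without a shape are treated as scalars."""
    shape = tuple(getattr(value, "shape", ()))
    if shape == ():
        return "FloatKnob"   # scalar
    if shape == (3,):
        return "ColorKnob"   # could equally be a 3D position, which is
                             # exactly why extra metadata would help
    return "ArrayKnob"
```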
First of all, thanks for this plugin! I managed to build and install it (I think), but when I try to drop the node into Nuke I get the following error:
/usr/local/home/fcole/.nuke/MLClient.so: undefined symbol: _ZNK2DD5Image2Op15input_longlabelB5cxx11Ei
I checked the build paths and it does seem like it is built against the Nuke version I am running (11.2v3). I've had a couple other Nuke installs on this machine, though, so wondering if this error could be caused by finding a stale library somewhere.
TCL expressions are another common trick used in comp Nuke scripts, and it would be great if they also worked on MLClient nodes.
They didn't work for my custom string knob, which would always receive the literal string containing the TCL expression.
Is that something that would need to happen in the server side? If so... could that be provided as an API to make it simpler to write wrappers for pre-trained models?
Hello,
During server installation for nuke-ml-server on Ubuntu 18.04, I get the following error when I run this command:
sudo docker build -t mlserver -f Dockerfile .
ERROR :
WARNING: Discarding https://files.pythonhosted.org/packages/4a/85/db5a2df477072b2902b0eb892feb37d88ac635d36245a72a6a69b23b383a/PyYAML-3.12.tar.gz#sha256=592766c6303207a20efc445587778322d7f73b161bd994f227adaa341ba212ab (from https://pypi.org/simple/pyyaml/). Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
ERROR: Could not find a version that satisfies the requirement pyyaml==3.12 (from versions: 3.10, 3.11, 3.12, 3.13b1, 3.13rc1, 3.13, 4.2b1, 4.2b2, 4.2b4, 5.1b1, 5.1b3, 5.1b5, 5.1, 5.1.1, 5.1.2, 5.2b1, 5.2, 5.3b1, 5.3, 5.3.1, 5.4b1, 5.4b2, 5.4, 5.4.1, 6.0b1, 6.0, 6.0.1)
ERROR: No matching distribution found for pyyaml==3.12
This error comes as part of the requirements installation for the 'detectron' repository: https://github.com/facebookresearch/Detectron/blob/main/requirements.txt
If you can give me any pointers on how to solve this that would be very helpful!
Kind regards,
Shashwat
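One possible workaround, judging from the log above (the PyYAML 3.12 sdist is discarded because its `setup.py` no longer builds under recent Python/setuptools): relax the pin in a local copy of the requirements file before building the image. Whether Detectron actually works with a newer PyYAML is untested, so treat this as a sketch:

```python
import re

def relax_pyyaml_pin(requirements_text):
    """Replace an exact pyyaml==3.12 pin with a minimum-version
    constraint that recent pip can still resolve."""
    return re.sub(r"pyyaml==3\.12", "pyyaml>=5.1",
                  requirements_text, flags=re.IGNORECASE)
```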
I just realized that if I instantiate more than one MLClient node for my custom class, both Nuke nodes seem to talk to a single instance of my BaseModel-derived class on the server side.
I was hoping that each instance of my BaseModel class would somehow be associated with one nuke node, so I could do things like, keep a reference to a pre-trained model and re-use it for each new input.
But as it is, it seems I would have to keep in my BaseModel-derived class some notion of a cache that keeps alive any model the user is using in that Nuke script. For example, say the user is comparing the outputs of two pre-trained models.
Would it be possible to send messages to the server when nodes are removed/added so that cache can purge some items?
Or better yet, would it be possible for the Server to instantiate one BaseModel object per nuke node?
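Server-side, one way to approximate per-node instances without changing the protocol much would be a registry keyed by some node identifier sent with each request. `node_id` here is a hypothetical field, not something the current message format carries:

```python
class ModelRegistry:
    """Keep one model instance per (node_id, model_name) pair."""
    def __init__(self, factory):
        self._factory = factory      # callable: model_name -> model instance
        self._instances = {}

    def get(self, node_id, model_name):
        key = (node_id, model_name)
        if key not in self._instances:
            self._instances[key] = self._factory(model_name)
        return self._instances[key]

    def release(self, node_id):
        # Purge all instances for a deleted node; this would need the
        # node-removed message suggested above to ever be called.
        for key in [k for k in self._instances if k[0] == node_id]:
            del self._instances[key]
```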
Is it possible to install the software without access to the internet if the files are downloaded in advance?
Many visual effects facilities have large investments in CPU-only render farms; is it possible to do inference on a distributed CPU render farm?
Cross posting from the community forum where I posted a simpler repro: https://community.foundry.com/discuss/topic/159878/continuous-render-for-planariop
I'm trying to modify the MLClient / Server so that the server can pass progressive updates to the PlanarIop in Nuke. The reason for this is that my model is an optimisation-based style transfer, which can take a few minutes to run through sufficient epochs. My hope was to pass an update through for, say, every 10 epochs, so that the user gets some kind of colour output quickly which then refines in front of them.
The only alternative I could think of was a button the user can press repeatedly to progressively refine the result, but that's not a great user experience.
Just tagging @ringdk as I'm not sure if this repo is still actively maintained. For what it's worth, I attempted to use Torchscript / Copycat, however a limitation there is that you cannot initialise an optimiser in Torchscript, which leaves my model dead in the water.
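For what it's worth, the server half of this could be restructured as a generator that yields an intermediate frame every N epochs; how those partial results would actually reach the PlanarIop is the open question, since the current request/response protocol expects a single reply. A minimal sketch, where `step` stands in for one optimisation epoch:

```python
def progressive_inference(image, step, total_epochs, epochs_per_update=10):
    """Run `total_epochs` of optimisation, yielding an intermediate
    result every `epochs_per_update` epochs and always at the end."""
    result = image
    for epoch in range(1, total_epochs + 1):
        result = step(result)
        if epoch % epochs_per_update == 0 or epoch == total_epochs:
            yield epoch, result
```

Each yielded result would then be serialized and pushed to the client as it arrives, instead of waiting for the final epoch.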
I have a custom ML model that has inputs, which MLClient created as dynamic knobs.
If I select that node and hit Ctrl+G then I get a pop-up with several error messages of the type:
<node>.<custom_knob>: no such knob
I open the group and introspect the MLClient node and I see it has lost all dynamic knobs and I have to click "Connect" to get them back, with all the values lost.
Hi Folks
The license for the project is listed as Apache License, Version 2.0. At the same time the readme and the MLClient node reads: "This is strictly non-commercial". These are legally in conflict. The Apache license you assigned to the code means I can pretty much do whatever I want with the code and sell it to whomever I want. Also the license is irrevocable.
Copied from the included Apache License 2.0 in this project:
- Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
You really should remove the "This is strictly non-commercial" part from the readme and the node.
Is it possible to do training from the nuke-ML-server?
The examples are inference in Caffe2 from Facebook.
Is it possible to pass labels or other ground-truth data to the ML model and have it learn from the toolset? It seems it is inference-only.
Where would the model checkpoints be stored?
Would the data rate be adequate?
Hey, I know that at least CUDA 10 and cuDNN 7.4 are required for Turing-based cards (the RTX range); has this been tested on Turing cards?
I'm sure this problem can get very complicated and may require custom implementations, but I was wondering whether you have intentions or ideas on how to manage the limited GPU resources across all MLClient nodes instantiated in a nuke scene.
Nuke MLClient nodes could be talking with different or same classes in the MLServer side, using any kind of back end (pytorch, tensorflow, ..).
This is somewhat related to issue #21 but it goes beyond that because it deals with all the classes used in a Nuke session.
Here's a broad idea that may be a good discussion starter:
It feels that this more general approach would make issue #21 irrelevant and it would deal with complex scenarios, including multi-gpu.
Do you see a benefit adding something like that to the MLServer API?
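As a discussion starter in code form: the server could gate inference behind a pool sized to the number of GPUs, handing each request a device index and blocking when all devices are busy. This is a framework-agnostic sketch of the idea, not a claim about how MLServer schedules work today:

```python
import queue

class GpuPool:
    """Hand out GPU indices to concurrent inference requests,
    blocking when all devices are in use."""
    def __init__(self, num_gpus):
        self._free = queue.Queue()
        for i in range(num_gpus):
            self._free.put(i)

    def acquire(self):
        return self._free.get()   # blocks until a device is free

    def release(self, gpu_index):
        self._free.put(gpu_index)
```

Each BaseModel-derived class would then place its tensors on the device index it was handed, regardless of back end (PyTorch, TensorFlow, ...), which is what would make this multi-GPU aware.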
I'm trying to use the MLClient node as if it were a regular Nuke node, and this is another common need: copying and pasting a node should preserve its knob values. They are not copied, I guess because they are dynamic knobs. I think simple things like that would be expected in order to consider this approach viable for production.
Server -> Receiving message of size: 24883378
Server -> 24883378 bytes read
Server -> Message parsed
Server -> Received inference request
Server -> Requesting inference on model: densepose
Server -> Starting inference
WARNING:root:[====DEPRECATE WARNING====]: you are creating an object from CNNModelHelper class which will be deprecated soon. Please use ModelHelper object with brew module. For more information, please refer to caffe2.ai and python/brew.py, python/brew_test.py for more information.
Server -> Exception caught on inference on model:
Server -> Serializing message
Server -> Sending response message of size: 18
Server -> ----------------------------------------------
But in the good news pile, MaskRCNN works.
I am using CentOS.
The Dockerfile is for Ubuntu. Can you make one for CentOS?
Is there a reason why CMake 3 is a requirement?
[kognat@vxfhost Server]$ sudo docker run --runtime=nvidia -v /home/kognat/dev/nuke-ML-server/Models:/workspace/ml-server/models:ro -it nuke-ml-magic:latest
[sudo] password for kognat:
root@a1b2b7bf4646:/workspace/ml-server# python
Python 2.7.16 |Anaconda, Inc.| (default, Mar 14 2019, 21:00:58)
[GCC 7.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow
2019-05-25 07:49:46.583944: F tensorflow/core/platform/cpu_feature_guard.cc:37] The TensorFlow library was compiled to use AVX instructions, but these aren't available on your machine.
Aborted (core dumped)
Party over dude.
Sorry about being lazy but a picture says it all
protobuf-3.5.1 was compiled as follows
cd ~/dev/
wget https://github.com/protocolbuffers/protobuf/releases/download/v3.5.1/protobuf-cpp-3.5.1.tar.gz
tar -zxvf protobuf-cpp-3.5.1.tar.gz
cd protobuf-3.5.1/cmake
mkdir build && cd build
cmake3 .. -DCMAKE_INSTALL_PREFIX=~/opt/protobuf-3.5.1 -DCMAKE_POSITION_INDEPENDENT_CODE=ON
make -j12
make install
Then the Plugin was compiled as follows
cd ~/dev
git clone https://github.com/TheFoundryVisionmongers/nuke-ML-server
cd nuke-ML-server/build/
cmake3 .. -DCMAKE_INSTALL_PREFIX=~/opt/protobuf-3.5.1 -DNUKE_INSTALL_PATH=/usr/local/Nuke11.3v4/
make -j12
Then Nuke was run
export NUKE_PATH=/home/kognat/dev/nuke-ML-server/build/Plugins/Client
/usr/local/Nuke11.3v4/Nuke11.3
See screenshot attached.
Add an option to shut down the server, for example via a keyboard shortcut and/or some other mechanism.
Hi there! Thank you so much for sharing this implementation.
The current code lists the models under the directory in which the server was launched, and that seems somewhat restrictive.
Would it be possible for you to add support to multiple locations for the models that the server can see?
Ideally I would like to launch the server and specify, perhaps in an environment variable, multiple directories where it could search for models, separated by ":" (following Linux conventions).
And perhaps "baseModel.py" should live in a separate directory, so that one could point to its location using PYTHONPATH when launching the server; that would make it possible for all custom models, wherever they are, to import the base class.
How do you like that?
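For example, the server could honour a colon-separated search path from an environment variable; `ML_SERVER_MODEL_PATH` is a made-up name for illustration:

```python
import os

def model_search_dirs(env_var="ML_SERVER_MODEL_PATH", default="models"):
    """Return the list of directories to scan for models, taken from a
    colon-separated environment variable, falling back to a default."""
    raw = os.environ.get(env_var, default)
    return [d for d in raw.split(":") if d]
```

The server's model-discovery loop would then iterate over this list instead of a single hard-coded directory.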
The contents and definition of the dynamic knobs are serialized in the Nuke script correctly, and they are retrieved during load, but they aren't applied to the node until the node is evaluated (i.e. plugged into the viewer).
I believe this is not ideal, because:
In my mind, what should change is the moment where UI is updated. To me, it should only occur in three situations:
I don't think the UI should change at all during evaluation. If for any reason the UI is out of date with the actual model on the server (i.e. the names/types of input parameters are different), that should be detected during evaluation and an error displayed, letting the user know they should click "Connect" to refresh the UI.
That way one can load a Nuke script with MLClient nodes and, even if the server is down, all the knobs will retain their values or expressions. Loading the script, trying to infer, seeing errors in the viewer and saving back to file won't be a destructive operation.
Thanks for sharing this great project and its plugin for introducing deep learning into Nuke in simple steps.
I am also doing some similar work in Nuke, but my solution is the Blink API. The Blink API can use cross-platform GPU and CPU computing, and can work like OpenGL, which has been used for cross-platform inference (TensorFlow.js using WebGL). So it is suitable for building a cross-platform inference engine and is easy to deploy in the real world. As for using a full-pipeline deep learning framework, installing a full deep learning environment is still hard nowadays, especially for Windows users. Also, if users do not have an Nvidia graphics card, using the GPU might be a problem for them. Because of this, I think the Blink API (OpenGL/OpenCL, or an inference engine like TensorRT, TVM...) is more suitable for near-future use.
I have done a small experiment using the Blink API for inference, and it worked. The model is a sequential convolutional neural network with 10 conv2D layers, which can achieve most low-level visual tasks such as super-resolution, deblurring and denoising. The model was hard-coded in the C++ plugin. Complex model support can be enabled by introducing a computational graph. The conv2D implementation is the simplest convolution, but the speed is acceptable.
Here is the example code for the kernel (conv2D 9x9 64 with batch normalization and relu).
// Copyright (c) 2019 Hepesu Animation Toolkits Project. All Rights Reserved.
#define epsilon 1e-7
inline float batchNorm(float x, float mean, float var, float gamma, float beta){
return (x - mean) / (sqrt(var) + epsilon) * gamma + beta;
}
inline float relu(float x){
return max(0.0f, x);
}
kernel ConvBlockAKernel : public ImageComputationKernel<ePixelWise>
{
Image<eRead, eAccessRanged2D, eEdgeConstant> src;
Image<eWrite> dst;
param:
float weight[81];
float bias;
float gamma;
float beta;
float mean;
float var;
int outputChannel;
local:
int2 _filterOffset;
int2 _kernelSize;
void init()
{
_kernelSize[0] = 9;
_kernelSize[1] = 9;
int2 filterRadius(_kernelSize[0] / 2, _kernelSize[1] / 2);
_filterOffset[0] = -filterRadius[0];
_filterOffset[1] = -filterRadius[1];
src.setRange(-filterRadius[0], -filterRadius[1], filterRadius[0], filterRadius[1]);
}
void process() {
// Init value with 0.0
float value = 0.0;
for (int in_channel = 0; in_channel < src.kComps; in_channel++){
// Iterate in ks x ks range
for(int j = 0; j < _kernelSize[1]; j++) {
for(int i = 0; i < _kernelSize[0]; i++) {
value += weight[j * 9 + i] * src(i + _filterOffset[0], j + _filterOffset[1], in_channel);
}
}
}
// Add bias then bn and relu
dst(outputChannel >= dst.kComps ? dst.kComps - 1 : outputChannel) = relu(batchNorm(value + bias, mean, var, gamma, beta));
}
};
Here is the code for calling this kernel in plugin. I am using setParamValue to pass weights of the model. If the weights are huge, this can be done by passing them as an image source.
// Copyright (c) 2019 Hepesu Animation Toolkits Project. All Rights Reserved.
Blink::Kernel blockA(_convBlockAWideProgram, computeDevice, imagesIA, kBlinkCodegenDefault);
for (int outChan = 0; outChan < 64; ++outChan){
float weights[81];
for (int i = 0; i < 81; ++i)
weights[i] = dequantize(wide_block1_weight[outChan][i], wide_block1_weight_max, wide_block1_weight_min);
blockA.setParamValue("weight", weights, 81);
blockA.setParamValue("bias", dequantize(wide_block1_bias[outChan], wide_block1_bias_max, wide_block1_bias_min));
blockA.setParamValue("mean", dequantize(wide_block1_mean[outChan], wide_block1_mean_max, wide_block1_mean_min));
blockA.setParamValue("var", dequantize(wide_block1_var[outChan], wide_block1_var_max, wide_block1_var_min));
blockA.setParamValue("gamma", dequantize(wide_block1_gamma[outChan], wide_block1_gamma_max, wide_block1_gamma_min));
blockA.setParamValue("beta", dequantize(wide_block1_beta[outChan], wide_block1_beta_max, wide_block1_beta_min));
blockA.setParamValue("outputChannel", outChan);
blockA.iterate();
}
The performance can be further improved by using Winograd or GEMM, which most deep learning frameworks use. With these, inference can be supported on all platforms, even for Nuke 9, and users do not need to install any other software or drivers. Also, inference is done stripe by stripe, so it behaves like most nodes: users do not need to wait for the whole image, which may take a very long time and huge memory. But due to limitations of the Blink API, implementing a general inference engine (with a computational graph) is not easy. The GEMM and a complex computational graph might exceed the limits of the Blink API. So for the latest Nuke, I prefer to use the TensorRT inference engine to do the job.
The server solution is great for testing the newest deep learning technology, and I think this is the mainstream for the future. But it would be great if the inference could be done on the local machine, so customers can use deep learning tools just like other simple nodes.
I was looking at this issue
https://forums.docker.com/t/libc-incompatibilities-when-will-they-emerge/9895
Will this Dockerfile image run on the glibc 2.12 found in CentOS 6 images?