jiazhihao / taso
The Tensor Algebra SuperOptimizer for Deep Learning
License: Apache License 2.0
Hi, I pulled the docker image. The programs in the example folder run nicely. However if I try to run TASO on an ONNX model, for example by calling python python/test.py
I get
Traceback (most recent call last):
File "python/test.py", line 1, in <module>
import taso
File "/usr/TASO/python/taso/__init__.py", line 1, in <module>
from .core import *
ModuleNotFoundError: No module named 'taso.core'
I tried to recompile the project but I am still getting the same error.
If I perform the optimization on a computer without a GPU and then run the optimized model on a GPU, will that affect performance?
Hi. I applied TASO (commit dce8c4d) to several models from onnx-models. One frequent error I got is:
Cuda failure: 77
/workspace/taso/src/cudnn/element_kernel.cu:242
Aborting...
Affected models are: inception-v2-9, mnist-8.log, resnet101-v2-7, resnet18-v2-7, roberta-base-11, shufflenet-9, vgg19-7, yolov4.
Since trivial mnist is in the list, I suspect that the problem was caused by some environment bug, such as package version mismatch or alike.
The error message is not very verbose, and the named CUDA line doesn't look suspicious. I would be glad to provide more debugging information, but unfortunately I'm not an expert in low-level CUDA. Could you please suggest what I can do to collect more information?
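One way to collect more information, sketched here under assumptions: CUDA kernel launches are asynchronous, so the file and line printed with the failure may not be where the fault actually happened. Setting the CUDA_LAUNCH_BLOCKING=1 environment variable makes launches synchronous, so the reported location is more likely to point at the failing kernel. For context, on CUDA toolkits older than 10.1, runtime error 77 is cudaErrorIllegalAddress. The helper name below is hypothetical and not part of TASO.

```python
import os
import subprocess

def run_with_blocking_launches(cmd):
    """Run cmd with CUDA_LAUNCH_BLOCKING=1 and capture its output.

    Hypothetical helper for illustration; it is not part of TASO.
    """
    env = dict(os.environ)
    env["CUDA_LAUNCH_BLOCKING"] = "1"  # make kernel launches synchronous
    return subprocess.run(cmd, env=env, capture_output=True, text=True)
```

The failing script can then be rerun through this helper, passing the same command line that produced the error, and the captured stdout and stderr attached to the report.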
For example, graph optimization can fuse a conv layer and a batchnorm layer into a new conv layer and calculate the new fused conv weight parameters. Can TASO calculate the new weights as part of the transformation after fusing conv and batchnorm?
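For reference, the arithmetic behind folding a batchnorm layer into the preceding convolution can be sketched as follows. This is the standard inference-time identity, not a description of how TASO implements its fuse operator; the function name and the flattened per-channel weight layout are assumptions made for illustration.

```python
import math

def fold_batchnorm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold y = gamma * (conv(x) - mean) / sqrt(var + eps) + beta into conv.

    weights: one flat list of kernel values per output channel.
    bias, gamma, beta, mean, var: one value per output channel.
    Returns (new_weights, new_bias) for a single equivalent convolution.
    """
    new_weights, new_bias = [], []
    for c, kernel in enumerate(weights):
        # Per-channel scale applied by batchnorm at inference time.
        scale = gamma[c] / math.sqrt(var[c] + eps)
        new_weights.append([w * scale for w in kernel])
        new_bias.append(beta[c] + (bias[c] - mean[c]) * scale)
    return new_weights, new_bias
```

Because the scale and shift are constant per output channel at inference time, the fused layer needs no extra operators at runtime.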
@jiazhihao Hi, I ran into a problem when trying to optimize the inception_v3 model with TASO. Could you please show me how to deal with it, and how to make all the operators of inception_v3 supported by TASO? Thanks.
Steps:
1. Convert the inception_v3 model from .pb to .onnx (using tf2onnx).
2. Load and optimize inception_v3.onnx with TASO.
This occurred when I loaded inception_v3.onnx into TASO:
(vir-taso) [root@centos /taso-master/taso/examples/myexamples]# python inceptionV3_taso_opt.py
Load onnx model...
0 Add
cuDNN does noot suppoort zero stride for broadcast
Consider switch to other library for broadcastable operators.
python: /taso-master/taso/src/cudnn/cuda_helper.cu:94: void helperSetBroadcastableTensorDescriptor(const taso::Tensor&, const taso::Tensor&, cudnnTensorDescriptor_t): Assertion `false' failed.
Aborted
Some useful information:
inception_v3 model: https://github.com/dmlc/web-data/blob/master/tensorflow/models/InceptionV3/inception_v3_2016_08_28_frozen-with_shapes.pb
GPU:NVIDIA Tesla P100 PCIe 16GB
CUDA Version 9.0.176 / CUDNN Version 7.4.1
Hi, I tried generating substitutions with a larger search space of depth 4 (MAX_NUM_OPS=4) in generator.cc. The algorithm examined around 300 million graphs, which took up all available memory (128 GB main memory plus 28 GB swap space), and then crashed.
Would it be possible to run the algorithm with larger search space if more memory is given? Would optimizing with substitutions with depth of 4 produce a substantially better speed than the default setting in TASO at the moment?
I think further substitutions could be found if we fix the "Cannot find input tensor" and "unsupported" issues.
Below is the log:
Found unsupported ONNX operator: LRN (Skipped)
Cannot find input tensor for operator: name(Pad) type(Pad) (Skipped)
Cannot find input tensor for operator: name(pooling) type(MaxPool) (Skipped)
Cannot find input tensor for operator: name(convolution1) type(Conv) (Skipped)
Cannot find input tensor for operator: name(activation1) type(Relu) (Skipped)
Found unsupported ONNX operator: LRN (Skipped)
Cannot find input tensor for operator: name(Pad1) type(Pad) (Skipped)
Cannot find input tensor for operator: name(pooling1) type(MaxPool) (Skipped)
Cannot find input tensor for operator: name(convolution2) type(Conv) (Skipped)
Cannot find input tensor for operator: name(activation2) type(Relu) (Skipped)
Found unsupported ONNX operator: LRN (Skipped)
Found unsupported ONNX operator: Flatten (Skipped)
Cannot find input tensor for operator: name(innerProduct) type(Gemm) (Skipped)
Found unsupported ONNX operator: Flatten (Skipped)
Cannot find input tensor for operator: name(innerProduct1) type(Gemm) (Skipped)
cost[Conv2D]: i(1 3 128 128) w(20 3 5 5) s(1 1) p(1) cost(0.0544) total_cost(0.0544)
cost[Activation]: mode(8) cost(0.0124) total_cost(0.0669)
Cost metrics: exe_time(0.0669) flops(0.0871) memory_access(6.6384) kernel_launches(2)
===== Start Cost-Based Backtracking Search =====
[0] cost = 0.0669 bestCost = 0.0669 candidates.size() = 0
[1] cost = 0.0273 bestCost = 0.0273 candidates.size() = 0
===== Finish Cost-Based Backtracking Search =====
cost[Conv2D]: i(1 3 128 128) w(20 3 5 5) s(1 1) p(1) cost(0.0273) total_cost(0.0273)
Cost metrics: exe_time(0.0273) flops(0.0882) memory_access(5.4653) kernel_launches(1)
In the new_weight method in ops.cc, the current logic is: when weight_initial != NULL, call allocate_memory; otherwise weight_ptr is a null pointer.
I don't know whether this logic is intended, but it leads to an exception.
Looking into allocate_memory in ops_cudnn.cu, there is actually a null check inside it.
So I think that even if we don't check weight_initial in ops.cc:new_weight, the logic still holds and no exception is raised.
Hi,
I first downloaded the .onnx model from this repo:
https://github.com/onnx/models/tree/master/vision/classification/resnet/resnet50
Then I used the code below to load the model:
old_model = taso.load_onnx("./resnet50.onnx")
The following error message appears:
File "/home/ubuntu/taso/python/taso/init.py", line 730, in load_onnx
assert len(node_list) == len(mode.graph.node), "Internal error when reording ONNX operators"
I am wondering how to solve this problem, thanks!
I ran resnext50.py from the examples and found that some weight tensors are 0.0.
Just as this error shows
conv + bn will be converted to fuse -> new_conv->bias->add, so the new graph will be fuse->new_conv->bias->add->relu. We then want to merge conv + relu into conv, but there is no edge from conv to relu.
Hi, I tried to optimize an ONNX model, but got a CUDNN_STATUS_BAD_PARAM error at src/cudnn/conv2d_kernel.cu:149.
My onnx model could be downloaded from
https://drive.google.com/open?id=1JOoKnXf69hbBpyAWMIEhHdc4Iapv5kcR.
Hi, how many transformation rules does TASO currently use?
verify.py hangs when running on TASO/graph_subst.pb. The problematic rules seem to be 99, 100, 130, 131.
I also noticed TASO/graph_subst.pb contains 132 rules in total, while the generated pb file has 819 rules. Could you help me understand the difference? Thanks!
Does TASO support TensorRT?
https://github.com/jiazhihao/TASO/blob/master/src/core/transpose.cc#L60
Hi, it seems that the "!=" here should be changed to "==" : )
for (int i = 0; i < ndim; i++)
for (int j = i + 1; j < ndim; j++)
if (permArray[i] != permArray[j]) {
return Op::INVALID_OP;
}
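The intent of the check, rejecting a permutation that repeats an axis, can be sketched in Python as below. It mirrors the proposed "==" comparison; the function name is hypothetical, and the range check is an extra assumption that is not present in the quoted snippet.

```python
def is_valid_permutation(perm):
    """Return True if perm uses each axis index 0..len(perm)-1 exactly once."""
    ndim = len(perm)
    # Extra check (assumption): every entry must be a valid axis index.
    if any(p < 0 or p >= ndim for p in perm):
        return False
    # Same pairwise test as the quoted loop, with '==' instead of '!='.
    for i in range(ndim):
        for j in range(i + 1, ndim):
            if perm[i] == perm[j]:
                return False  # a repeated axis makes the transpose invalid
    return True
```

With the original "!=", any permutation with two distinct entries would be rejected, which is the opposite of what a transpose needs.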
Just as the title:
CUDNN failure: CUDNN_STATUS_BAD_PARAM
/disk2/ouhang.oh/experiment/TASO/src/cudnn/element_kernel.cu:193
The op is OP_EW_MUL
ele->inputs[0]
$12 = {static MAX_KEY_LENGTH = 274, static MAGIC_NUMBER = 23333, numDim = 4, dim = {1, 64,
128, 96, 21845, 1068004771, 1050824725, 1075071798}, stride = {786432, 12288, 96, 1,
1068058575, 1050951425, 1075066352, 1050770695}, idx = 0, op = {static INVALID_OP = {
static INVALID_OP = , guid = 0,
ptr = 0x0}, guid = 255, ptr = 0x555584b2b2b0}, data_ptr = 0x0, split = {{
ele->inputs[1]
$13 = {static MAX_KEY_LENGTH = 274, static MAGIC_NUMBER = 23333, numDim = 3, dim = {64, 1,
1, 1075096021, 1051062975, 1075074253, 1068048265, 1075113656}, stride = {1, 1, 1,
1050865948, 1075080782, 1068015193, 1075058777, 1067985559}, idx = 0, op = {
static INVALID_OP = {
static INVALID_OP = , guid = 0,
ptr = 0x0}, guid = 106, ptr = 0x555583fa0900}, data_ptr = 0x7ffef4625000, split = {{
static NO_SPLIT = {
static NO_SPLIT = , num = 0, pos = {
0 <repeats 32 times>}}, num = 0, pos = {1068053135, 1050845104, 1075074414,
ele->outputs[0]
$14 = {static MAX_KEY_LENGTH = 274, static MAGIC_NUMBER = 23333, numDim = 4, dim = {1, 64,
128, 96, 0, 0, 0, 0}, stride = {786432, 12288, 96, 1, 1075072159, 1068021451,
1075087136, 1068032736}, idx = 0, op = {static INVALID_OP = {
static INVALID_OP = , guid = 0,
ptr = 0x0}, guid = 0, ptr = 0x0}, data_ptr = 0x0, split = {{static NO_SPLIT = {
static NO_SPLIT = , num = 0, pos = {
0 <repeats 32 times>}}, num = 0, pos = {1068014156, 1050874984, 1075088149,
I'm able to build the source code for TASO (albeit with the following warning that I'm not sure how to fix):
-- Configuring done
CMake Warning at CMakeLists.txt:69 (add_library):
Cannot generate a safe linker search path for target taso_runtime because
files in some directories may conflict with libraries in implicit
directories:
link library [libcublas.so] in /usr/lib/x86_64-linux-gnu may be hidden by files in:
/usr/lib/x86_64-linux-gnu/stubs
Some of these libraries may not be found correctly.
-- Generating done
-- Build files have been written to: /tmp/taso/build
but once I try to install, I get this error:
...
[ 59%] Building CUDA object CMakeFiles/taso_runtime.dir/src/cudnn/concat_kernel.cu.o
In file included from /usr/lib/gcc/x86_64-linux-gnu/6/include/stddef.h:217:0:
/usr/local/cuda-10.0/include/crt/host_runtime.h:19:2: warning: #warning "crt/device_functions.h is an internal header file and must not be used directly. Please use cuda_runtime_api.h or cuda_runtime.h instead." [-Wcpp]
#warning "crt/device_functions.h is an internal header file and must not be used directly. Please use cuda_runtime_api.h or cuda_runtime.h instead."
^~~~~~~
In file included from /tmp/tmpxft_00003481_00000000-5_activation_kernel.cudafe1.stub.c:6:0,
from tmpxft_00003481_00000000-5_activation_kernel.cudafe1.stub.c:1:
/usr/local/cuda-10.0/include/crt/host_runtime.h:19:2: warning: #warning "crt/device_functions.h is an internal header file and must not be used directly. Please use cuda_runtime_api.h or cuda_runtime.h instead." [-Wcpp]
#warning "crt/device_functions.h is an internal header file and must not be used directly. Please use cuda_runtime_api.h or cuda_runtime.h instead."
^~~~~~~
In file included from /usr/lib/gcc/x86_64-linux-gnu/6/include/stddef.h:217:0:
/usr/local/cuda-10.0/include/crt/host_runtime.h:19:2: warning: #warning "crt/device_functions.h is an internal header file and must not be used directly. Please use cuda_runtime_api.h or cuda_runtime.h instead." [-Wcpp]
#warning "crt/device_functions.h is an internal header file and must not be used directly. Please use cuda_runtime_api.h or cuda_runtime.h instead."
^~~~~~~
In file included from /tmp/tmpxft_0000348d_00000000-5_batchnorm_kernel.cudafe1.stub.c:6:0,
from tmpxft_0000348d_00000000-5_batchnorm_kernel.cudafe1.stub.c:1:
/usr/local/cuda-10.0/include/crt/host_runtime.h:19:2: warning: #warning "crt/device_functions.h is an internal header file and must not be used directly. Please use cuda_runtime_api.h or cuda_runtime.h instead." [-Wcpp]
#warning "crt/device_functions.h is an internal header file and must not be used directly. Please use cuda_runtime_api.h or cuda_runtime.h instead."
^~~~~~~
In file included from /usr/lib/gcc/x86_64-linux-gnu/6/include/stddef.h:217:0:
/usr/local/cuda-10.0/include/crt/host_runtime.h:19:2: warning: #warning "crt/device_functions.h is an internal header file and must not be used directly. Please use cuda_runtime_api.h or cuda_runtime.h instead." [-Wcpp]
#warning "crt/device_functions.h is an internal header file and must not be used directly. Please use cuda_runtime_api.h or cuda_runtime.h instead."
^~~~~~~
In file included from /tmp/tmpxft_000034a3_00000000-5_cast_kernel.cudafe1.stub.c:6:0,
from tmpxft_000034a3_00000000-5_cast_kernel.cudafe1.stub.c:1:
/usr/local/cuda-10.0/include/crt/host_runtime.h:19:2: warning: #warning "crt/device_functions.h is an internal header file and must not be used directly. Please use cuda_runtime_api.h or cuda_runtime.h instead." [-Wcpp]
#warning "crt/device_functions.h is an internal header file and must not be used directly. Please use cuda_runtime_api.h or cuda_runtime.h instead."
^~~~~~~
[ 61%] Building CUDA object CMakeFiles/taso_runtime.dir/src/cudnn/constant_kernel.cu.o
[ 62%] Building CUDA object CMakeFiles/taso_runtime.dir/src/cudnn/conv2d_kernel.cu.o
[ 64%] Building CUDA object CMakeFiles/taso_runtime.dir/src/cudnn/cuda_helper.cu.o
In file included from /usr/lib/gcc/x86_64-linux-gnu/6/include/stddef.h:217:0:
/usr/local/cuda-10.0/include/crt/host_runtime.h:19:2: warning: #warning "crt/device_functions.h is an internal header file and must not be used directly. Please use cuda_runtime_api.h or cuda_runtime.h instead." [-Wcpp]
#warning "crt/device_functions.h is an internal header file and must not be used directly. Please use cuda_runtime_api.h or cuda_runtime.h instead."
^~~~~~~
In file included from /tmp/tmpxft_000034c7_00000000-5_concat_kernel.cudafe1.stub.c:6:0,
from tmpxft_000034c7_00000000-5_concat_kernel.cudafe1.stub.c:1:
/usr/local/cuda-10.0/include/crt/host_runtime.h:19:2: warning: #warning "crt/device_functions.h is an internal header file and must not be used directly. Please use cuda_runtime_api.h or cuda_runtime.h instead." [-Wcpp]
#warning "crt/device_functions.h is an internal header file and must not be used directly. Please use cuda_runtime_api.h or cuda_runtime.h instead."
^~~~~~~
/tmp/tmpxft_000034c7_00000000-5_concat_kernel.cudafe1.stub.c: In function ‘void __device_stub__Z18assign_with_stridePfPKfiii(_ZN4taso8DATATYPEE*, const _ZN4taso8DATATYPEE*, int, int, int)’:
/tmp/tmpxft_000034c7_00000000-5_concat_kernel.cudafe1.stub.c:12:149: error: ‘__args_arr’ was not declared in this scope
void __device_stub__Z18assign_with_stridePfPKfiii(_ZN4taso8DATATYPEE *__par0, const _ZN4taso8DATATYPEE *__par1, int __par2, int __par3, int __par4){__cudaSetupArgSimple(__par0, 0UL);__cudaSetupArgSimple(__par1, 8UL);__cudaSetupArgSimple(__par2, 16UL);__cudaSetupArgSimple(__par3, 20UL);__cudaSetupArgSimple(__par4, 24UL);__cudaLaunch(((char *)((void ( *)(_ZN4taso8DATATYPEE *, const _ZN4taso8DATATYPEE *, int, int, int))assign_with_stride)));}
^
/tmp/tmpxft_000034c7_00000000-5_concat_kernel.cudafe1.stub.c:12:149: error: ‘__args_idx’ was not declared in this scope
void __device_stub__Z18assign_with_stridePfPKfiii(_ZN4taso8DATATYPEE *__par0, const _ZN4taso8DATATYPEE *__par1, int __par2, int __par3, int __par4){__cudaSetupArgSimple(__par0, 0UL);__cudaSetupArgSimple(__par1, 8UL);__cudaSetupArgSimple(__par2, 16UL);__cudaSetupArgSimple(__par3, 20UL);__cudaSetupArgSimple(__par4, 24UL);__cudaLaunch(((char *)((void ( *)(_ZN4taso8DATATYPEE *, const _ZN4taso8DATATYPEE *, int, int, int))assign_with_stride)));}
^
CMakeFiles/taso_runtime.dir/build.make:862: recipe for target 'CMakeFiles/taso_runtime.dir/src/cudnn/concat_kernel.cu.o' failed
make[2]: *** [CMakeFiles/taso_runtime.dir/src/cudnn/concat_kernel.cu.o] Error 1
make[2]: *** Waiting for unfinished jobs....
In file included from /usr/lib/gcc/x86_64-linux-gnu/6/include/stddef.h:217:0:
/usr/local/cuda-10.0/include/crt/host_runtime.h:19:2: warning: #warning "crt/device_functions.h is an internal header file and must not be used directly. Please use cuda_runtime_api.h or cuda_runtime.h instead." [-Wcpp]
#warning "crt/device_functions.h is an internal header file and must not be used directly. Please use cuda_runtime_api.h or cuda_runtime.h instead."
^~~~~~~
In file included from /tmp/tmpxft_000034ed_00000000-5_constant_kernel.cudafe1.stub.c:6:0,
from tmpxft_000034ed_00000000-5_constant_kernel.cudafe1.stub.c:1:
/usr/local/cuda-10.0/include/crt/host_runtime.h:19:2: warning: #warning "crt/device_functions.h is an internal header file and must not be used directly. Please use cuda_runtime_api.h or cuda_runtime.h instead." [-Wcpp]
#warning "crt/device_functions.h is an internal header file and must not be used directly. Please use cuda_runtime_api.h or cuda_runtime.h instead."
^~~~~~~
In file included from /usr/lib/gcc/x86_64-linux-gnu/6/include/stddef.h:217:0:
/usr/local/cuda-10.0/include/crt/host_runtime.h:19:2: warning: #warning "crt/device_functions.h is an internal header file and must not be used directly. Please use cuda_runtime_api.h or cuda_runtime.h instead." [-Wcpp]
#warning "crt/device_functions.h is an internal header file and must not be used directly. Please use cuda_runtime_api.h or cuda_runtime.h instead."
^~~~~~~
In file included from /tmp/tmpxft_00003517_00000000-5_cuda_helper.cudafe1.stub.c:6:0,
from tmpxft_00003517_00000000-5_cuda_helper.cudafe1.stub.c:1:
/usr/local/cuda-10.0/include/crt/host_runtime.h:19:2: warning: #warning "crt/device_functions.h is an internal header file and must not be used directly. Please use cuda_runtime_api.h or cuda_runtime.h instead." [-Wcpp]
#warning "crt/device_functions.h is an internal header file and must not be used directly. Please use cuda_runtime_api.h or cuda_runtime.h instead."
^~~~~~~
In file included from /usr/lib/gcc/x86_64-linux-gnu/6/include/stddef.h:217:0:
/usr/local/cuda-10.0/include/crt/host_runtime.h:19:2: warning: #warning "crt/device_functions.h is an internal header file and must not be used directly. Please use cuda_runtime_api.h or cuda_runtime.h instead." [-Wcpp]
#warning "crt/device_functions.h is an internal header file and must not be used directly. Please use cuda_runtime_api.h or cuda_runtime.h instead."
^~~~~~~
In file included from /tmp/tmpxft_00003503_00000000-5_conv2d_kernel.cudafe1.stub.c:6:0,
from tmpxft_00003503_00000000-5_conv2d_kernel.cudafe1.stub.c:1:
/usr/local/cuda-10.0/include/crt/host_runtime.h:19:2: warning: #warning "crt/device_functions.h is an internal header file and must not be used directly. Please use cuda_runtime_api.h or cuda_runtime.h instead." [-Wcpp]
#warning "crt/device_functions.h is an internal header file and must not be used directly. Please use cuda_runtime_api.h or cuda_runtime.h instead."
^~~~~~~
/tmp/tmpxft_00003517_00000000-5_cuda_helper.cudafe1.stub.c: In function ‘void __device_stub__Z13assign_kernelPfif(float*, int, float)’:
/tmp/tmpxft_00003517_00000000-5_cuda_helper.cudafe1.stub.c:12:83: error: ‘__args_arr’ was not declared in this scope
void __device_stub__Z13assign_kernelPfif(float *__par0, int __par1, float __par2){__cudaSetupArgSimple(__par0, 0UL);__cudaSetupArgSimple(__par1, 8UL);__cudaSetupArgSimple(__par2, 12UL);__cudaLaunch(((char *)((void ( *)(float *, int, float))assign_kernel)));}
^
/tmp/tmpxft_00003517_00000000-5_cuda_helper.cudafe1.stub.c:12:83: error: ‘__args_idx’ was not declared in this scope
void __device_stub__Z13assign_kernelPfif(float *__par0, int __par1, float __par2){__cudaSetupArgSimple(__par0, 0UL);__cudaSetupArgSimple(__par1, 8UL);__cudaSetupArgSimple(__par2, 12UL);__cudaLaunch(((char *)((void ( *)(float *, int, float))assign_kernel)));}
^
/tmp/tmpxft_00003517_00000000-5_cuda_helper.cudafe1.stub.c: In function ‘void __device_stub__Z11copy_kernelPfPKfi(float*, const float*, int)’:
/tmp/tmpxft_00003517_00000000-5_cuda_helper.cudafe1.stub.c:1:96: error: ‘__args_arr’ was not declared in this scope
#pragma GCC diagnostic push
^
/tmp/tmpxft_00003517_00000000-5_cuda_helper.cudafe1.stub.c:1:96: error: ‘__args_idx’ was not declared in this scope
#pragma GCC diagnostic push
^
CMakeFiles/taso_runtime.dir/build.make:934: recipe for target 'CMakeFiles/taso_runtime.dir/src/cudnn/cuda_helper.cu.o' failed
make[2]: *** [CMakeFiles/taso_runtime.dir/src/cudnn/cuda_helper.cu.o] Error 1
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/taso_runtime.dir/all' failed
make[1]: *** [CMakeFiles/taso_runtime.dir/all] Error 2
Makefile:129: recipe for target 'all' failed
make: *** [all] Error 2
The problems seem unrelated, but I thought I'd include the warnings to provide all the context.
Thanks for your help!
What is xflow for? Is it a Python library or not?
I ran pip install xflow, but when I use xflow to optimize my ONNX model, it returns an error:
graph = xflow.load_onnx(args.file)
NameError: name 'xflow' is not defined
Can you tell me the difference between xflow.optimize() and taso.optimize()?
Thanks.
In split.cc:137 parent.divide(left, right, curPos); the assertion is broken.
I have a temp fix to make:
SplitInfo parent = inputs[0].split[axis], left, right;
parent.num = inputs[0].dim[axis];
But the overall logic for initializing the tensor.split field should be checked as well.
After running python3 setup.py install, importing taso gives:
from .core import *
ImportError: No module named core
docker image nvidia
-- The CXX compiler identification is GNU 7.4.0
-- The CUDA compiler identification is NVIDIA 10.0.130
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc -- works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Found Protobuf: /usr/local/lib/libprotobuf.so;-lpthread (found version "3.6.1")
-- PROTOBUF=/usr/local/lib/libprotobuf.so
-- Performing Test SUPPORT_CXX11
-- Performing Test SUPPORT_CXX11 - Success
-- Found Threads: TRUE
-- Found CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda
-- Found CUDA_CUDA_LIBRARY=/usr/local/cuda/targets/x86_64-linux/lib/stubs/libcuda.so
-- Found CUDA_CUDART_LIBRARY=/usr/local/cuda/lib64/libcudart.so
-- Found CUDA_NVRTC_LIBRARY=CUDA_NVRTC_LIBRARY-NOTFOUND
-- Found CUDA_CUDNN_LIBRARY=CUDA_CUDNN_LIBRARY-NOTFOUND
-- Found CUDA_CUBLAS_LIBRARY=CUDA_CUBLAS_LIBRARY-NOTFOUND
-- CUDA_INCLUDE_DIR=/usr/local/cuda/include
CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
Please set them or make sure they are set and tested correctly in the CMake files:
CUDA_CUBLAS_LIBRARY
linked by target "taso_runtime" in directory /storage/data/taso
CUDA_CUDNN_LIBRARY
linked by target "taso_runtime" in directory /storage/data/taso
-- Configuring incomplete, errors occurred!
See also "/storage/data/taso/build/CMakeFiles/CMakeOutput.log".
See also "/storage/data/taso/build/CMakeFiles/CMakeError.log".
The code seems to be missing implementations of the BroadcastAdd, FuseConvBatchNormBias, and FuseConvBatchNormAlphaVar operators, either in C or with DNNL, so the code won't compile.
Do you plan to support DNNL fully?
Hi,
I followed the installation instructions and successfully ran the Python examples.
I followed SOSP19AE.pdf and successfully compiled the generator and generated new graph substitution rules. However, when using the new substitutions in substitutions.pb, I got an assertion error in GraphXfer::create_operator_from_pb() as follows:
python: /usr/TASO/src/core/substitution.cc:351: void taso::GraphXfer::create_operator_from_pb(const GraphSubst::Operator&, std::map<int, taso::TensorX>&, bool): Assertion `false' failed.
Aborted (core dumped)
I commented out lines 1762 and 1763 of TASO/src/generator/generator.cc (at commit a310b60) so that the operator Constant_IMM, which caused the Assertion `false' failure, is not used.
I rebuilt and ran python examples/resnet50.py but still get an error:
python: /usr/TASO/src/core/substitution.cc:309: void taso::GraphXfer::create_operator_from_pb(const GraphSubst::Operator&, std::map<int, taso::TensorX>&, bool): Assertion `pbOp.input_size() == 2' failed.
Aborted (core dumped)
How to fix this error?
Is there a version of 'generator.cc' that was used in the paper?
After pulling and launching the TASO docker container, I am unable to run the basic example script. It exits with the following error:
meistecl@su-lee:~/Documents/repositories/taso$ docker/run_docker.sh tasoml/cuda100
WORKSPACE: /usr/TASO
IMAGE NAME: tasoml/cuda100
DOCKER BINARY: nvidia-docker
root@su-lee:/usr/TASO# python examples/resnet50.py
Cuda failure: 2
/usr/TASO/src/cudnn/ops_cudnn.cu:42
Aborting...
root@su-lee:/usr/TASO#
Additionally, if I copy a sample ONNX model file to the container (just a generic .onnx chosen from their GitHub repo) and run taso.load() on the file, it exits with the same error.
Thank you for your help!
for op:
input: "conv1"
input: "Pad_pads"
input: "Pad_value"
output: "legacy_padded_tensor"
name: "Pad"
op_type: "Pad"
domain: ""
Below assertion fails:
outputs = xf_operators[op.op_type](op, graph, tensors, model.graph.initializer)
if not isinstance(outputs, list):
outputs = [outputs]
assert len(outputs) == len(op.output), "Number of output tensors mismatch"
and
def _pad(op, graph, tensors, initializer):
inputs = _get_inputs(op, graph, tensors, initializer)
attrs = _parse_attribute(op.attribute)
# Currently treat pad as a no op
assert sum(attrs['pads']) == 0
return inputs
This is because the op has no pads attribute (its pads are given by the input "Pad_pads").
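A possible direction, sketched under assumptions: from opset 11 onward, ONNX Pad takes its pads as the second input rather than as an attribute, which matches the op shown above. The helper below falls back to that input when the attribute is absent. The names resolve_pads and is_noop_pad and the plain-dict arguments are hypothetical stand-ins for the loader's data structures, not TASO's actual API.

```python
def resolve_pads(attrs, input_names, constants):
    """Return the pads of a Pad op, or None if they cannot be determined.

    attrs: parsed attributes of the op (dict).
    input_names: the op's input names, in order.
    constants: constant tensors known at load time, keyed by name (dict).
    """
    if "pads" in attrs:  # older style: pads stored as an attribute
        return list(attrs["pads"])
    if len(input_names) >= 2 and input_names[1] in constants:
        return list(constants[input_names[1]])  # newer style: second input
    return None

def is_noop_pad(attrs, input_names, constants):
    """True only when the pads are known and every entry is zero."""
    pads = resolve_pads(attrs, input_names, constants)
    return pads is not None and all(p == 0 for p in pads)
```

Checking every entry against zero, instead of summing, also avoids treating a mix of positive and negative pads as a no-op.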
Need to support taso::OP_EW_DIV in element_cudnn.cu:41
Traceback (most recent call last):
File "examples/test_onnx.py", line 4, in
graph = taso.load_onnx("/home/taso/yolov3.onnx")
File "/home/.local/lib/python3.7/site-packages/taso-0.1.0-py3.7-linux-x86_64.egg/taso/init.py", line 459, in load_onnx
assert False, "Unsupported ONNX operator: {}".format(op.op_type)
AssertionError: Unsupported ONNX operator: Squeeze
Error in sys.excepthook:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/apport_python_hook.py", line 63, in apport_excepthook
from apport.fileutils import likely_packaged, get_recent_crashes
File "/usr/lib/python3/dist-packages/apport/init.py", line 5, in
from apport.report import Report
File "/usr/lib/python3/dist-packages/apport/report.py", line 30, in
import apport.fileutils
File "/usr/lib/python3/dist-packages/apport/fileutils.py", line 23, in
from apport.packaging_impl import impl as packaging
File "/usr/lib/python3/dist-packages/apport/packaging_impl.py", line 24, in
import apt
File "/usr/lib/python3/dist-packages/apt/init.py", line 23, in
import apt_pkg
ModuleNotFoundError: No module named 'apt_pkg'
Original exception was:
Traceback (most recent call last):
File "examples/test_onnx.py", line 4, in
graph = taso.load_onnx("/home/taso/yolov3.onnx")
File "/home/.local/lib/python3.7/site-packages/taso-0.1.0-py3.7-linux-x86_64.egg/taso/init.py", line 459, in load_onnx
assert False, "Unsupported ONNX operator: {}".format(op.op_type)
AssertionError: Unsupported ONNX operator: Squeeze
Without it, if op1 depends on op2's output, the runtime here may evaluate op1 first which causes failure
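The ordering requirement described here can be illustrated with a small topological sort (Kahn's algorithm). This is a generic sketch, not TASO's code; the function name and the dictionary-based graph encoding are assumptions.

```python
from collections import deque

def topological_order(deps):
    """Order operators so each one comes after the operators it depends on.

    deps maps an operator name to the list of operators whose outputs it reads.
    Raises ValueError if the dependencies contain a cycle.
    """
    ops = set(deps)
    for producers in deps.values():
        ops.update(producers)
    # Number of distinct producers each operator is still waiting for.
    pending = {op: len(set(deps.get(op, ()))) for op in ops}
    consumers = {op: [] for op in ops}
    for op, producers in deps.items():
        for producer in set(producers):
            consumers[producer].append(op)
    ready = deque(sorted(op for op in ops if pending[op] == 0))
    order = []
    while ready:
        op = ready.popleft()
        order.append(op)
        for consumer in consumers[op]:
            pending[consumer] -= 1
            if pending[consumer] == 0:
                ready.append(consumer)
    if len(order) != len(ops):
        raise ValueError("dependency cycle detected")
    return order
```

For the case in the text, topological_order({"op1": ["op2"]}) places op2 before op1, so op1 is never evaluated before its input exists.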
python examples/test_onnx.py -f convert_mx_onnx/mx_resnet18.onnx
I ran the above script with the provided docker image. The ONNX file convert_mx_onnx/mx_resnet18.onnx is my ResNet-18 converted from MXNet. The source graph is as below:
(image: source graph)
The optimized graph is obviously different from the source graph: it has no BN layers and has multiple outputs.
The graph created by examples/resnext50.py looks normal, and it also has no BN layers. Has BN been merged into the Conv layers? Can anyone explain this? Thanks.
examples/test_onnx.py
error info as below:
Traceback (most recent call last):
File "examples/test_onnx.py", line 16, in <module>
print(" original_cost = {}".format(graph.cost()))
AttributeError: 'taso.core.PyGraph' object has no attribute 'cost'
Hi. I just found an error when applying TASO to the bertsquad-8 model from onnx-models.
Note that I was on dce8c4d at the time of the experiment.
Traceback (most recent call last):
File "taso/examples/test_onnx.py", line 12, in <module>
graph = taso.load_onnx(args.file)
File "/opt/conda/lib/python3.8/site-packages/taso-0.1.0-py3.8-linux-x86_64.egg/taso/__init__.py", line 805, in load_onnx
outputs = xf_operators[op.op_type](op, graph, tensors, model.graph.initializer)
File "/opt/conda/lib/python3.8/site-packages/taso-0.1.0-py3.8-linux-x86_64.egg/taso/__init__.py", line 539, in _slice
assert len(inputs) >= 3, "Slice requires at least 3 inputs"
AssertionError: Slice requires at least 3 inputs
Hello,
From the Dockerfile, I see that you are using Cuda 10.0 and Ubuntu 16.
Can I use the latest versions instead? Will it work?
Hi, I want to compile the generator to generate the substitution sets, but compilation fails with "generator.cc:17:23: fatal error: xflow/ops.h: No such file or directory". I noticed that xflow (https://github.com/dsouzajude/xFlow/tree/1.0.0) is a Python project and doesn't provide any C source code. How can I get those xflow C files? Thanks a lot if someone knows how to handle this problem and can provide some hints.
Hi,
Thanks for the great work. Does it support optimizing the computation graph of PyTorch models for faster training? If so, are there any benchmarks?
Here is the error I found after applying TASO (commit dce8c4d ) to the GPT-2 model of onnx-models.
Traceback (most recent call last):
File "taso/examples/test_onnx.py", line 19, in <module>
onnx.checker.check_model(onnx_model)
File "/opt/conda/lib/python3.8/site-packages/onnx/checker.py", line 91, in check_model
C.check_model(model.SerializeToString())
onnx.onnx_cpp2py_export.checker.ValidationError: Graph must be in single static assignment (SSA) form, however 'data' has been used as graph input names multiple times.
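To see which names trip this check, one option is to count the graph input names before calling the checker. The sketch below works on a plain list of names; the function name is hypothetical.

```python
from collections import Counter

def duplicated_names(names):
    """Return the names that occur more than once, in first-seen order."""
    counts = Counter(names)
    seen = set()
    result = []
    for name in names:
        if counts[name] > 1 and name not in seen:
            seen.add(name)
            result.append(name)
    return result
```

For a loaded model, the list could be built as [i.name for i in onnx_model.graph.input] and passed to this function.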
Just as the title:
(gdb) p input
$1 = (const taso::Tensor &) @0x555589dcf678: {static MAX_KEY_LENGTH = 274, static MAGIC_NUMBER = 23333,
numDim = 1, dim = {1, -2008300352, 21845, -1980118144, 21845, 1057128885, 1021911609, 1035326753}, stride = {1,
1048604497, -1069232552, -1078297696, 1065886482, 1049973618, -1096231070, -1140642707}, idx = 0, op = {
static INVALID_OP = {static INVALID_OP = , guid = 0,
ptr = 0x0}, guid = 105, ptr = 0x55558837f1c0}, data_ptr = 0x7ffef061ea00, split = {{static NO_SPLIT = {
static NO_SPLIT = , num = 0, pos = {
(gdb) p output
$2 = (const taso::Tensor &) @0x555589dd1178: {static MAX_KEY_LENGTH = 274, static MAGIC_NUMBER = 23333,
numDim = 4, dim = {1, 64, 128, 128, 0, 0, 0, 0}, stride = {1048576, 16384, 128, 1, 1045711936, -1098908207,
1011689796, 1054273973}, idx = 0, op = {static INVALID_OP = {
static INVALID_OP = , guid = 0, ptr = 0x0}, guid = 0,
ptr = 0x0}, data_ptr = 0x0, split = {{static NO_SPLIT = {
static NO_SPLIT = , num = 0, pos = {
I built TASO from source according to the install doc:
mkdir build; cd build; cmake ..
sudo make install -j 4
cd ../python
python setup.py install
The output of cmake .. is as below:
-- The CXX compiler identification is GNU 5.4.0
-- The CUDA compiler identification is NVIDIA 9.0.176
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc -- works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Found Protobuf: /root/3rdparty/protobuf-3.9.0/lib/libprotobuf.so;-lpthread (found version "3.9.0")
-- PROTOBUF=/root/3rdparty/protobuf-3.9.0/lib/libprotobuf.so
-- Performing Test SUPPORT_CXX11
-- Performing Test SUPPORT_CXX11 - Success
-- Found Threads: TRUE
-- Found CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda
-- Found CUDA_CUDA_LIBRARY=/usr/local/cuda/targets/x86_64-linux/lib/stubs/libcuda.so
-- Found CUDA_CUDART_LIBRARY=/usr/local/cuda/lib64/libcudart.so
-- Found CUDA_NVRTC_LIBRARY=/usr/local/cuda/lib64/libnvrtc.so
-- Found CUDA_CUDNN_LIBRARY=/usr/lib/x86_64-linux-gnu/libcudnn.so
-- Found CUDA_CUBLAS_LIBRARY=/usr/local/cuda/lib64/libcublas.so
-- CUDA_INCLUDE_DIR=/usr/local/cuda/include
-- Configuring done
-- Generating done
-- Build files have been written to: /mnt/tscpfs2/xiaotao.chen/Repositories/TASO/build
The output of make install -j is as below:
Install the project...
-- Install configuration: ""
-- Installing: /usr/local/lib/libtaso_runtime.so
-- Set runtime path of "/usr/local/lib/libtaso_runtime.so" to ""
-- Up-to-date: /usr/local/./include
-- Installing: /usr/local/./include/taso
-- Installing: /usr/local/./include/taso/ops.h
-- Installing: /usr/local/./include/taso/substitution.h
-- Installing: /usr/local/./include/taso/cuda_helper.h
After running python setup.py install, I checked that taso is installed with pip list|grep taso:
taso 0.1.0
Then I tried to run the example with python examples/resnext50.py, and it shows:
Traceback (most recent call last):
File "examples/resnext50.py", line 1, in <module>
import taso as ts
File "/mnt/tscpfs2/xiaotao.chen/Repositories/TASO/python/taso/__init__.py", line 1, in <module>
from .core import *
ImportError: No module named core
I don't know which step I missed. Could you help? Thanks. @jiazhihao
After running python3 setup.py install, import taso gives:
from .core import *
ImportError: No module named core
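Both reports above fail on `from .core import *`, and the first traceback resolves `taso` to the source checkout (`.../TASO/python/taso/__init__.py`). One way to narrow this down is to check whether a compiled `core` extension sits next to the `__init__.py` that Python actually picks up. The helper below is only a diagnostic sketch: `find_compiled_core` and `report` are hypothetical names, not part of TASO, and the sketch assumes the extension is built as a `core.*` file with one of the interpreter's native extension suffixes.

```python
import importlib.util
import os
from importlib.machinery import EXTENSION_SUFFIXES


def find_compiled_core(package_dir, module_name="core"):
    """Return paths of compiled extension files for module_name in package_dir."""
    found = []
    for entry in sorted(os.listdir(package_dir)):
        if not entry.startswith(module_name + "."):
            continue
        # EXTENSION_SUFFIXES lists the suffixes this interpreter can import,
        # for example ".so" on Linux or ".pyd" on Windows.
        if any(entry.endswith(suffix) for suffix in EXTENSION_SUFFIXES):
            found.append(os.path.join(package_dir, entry))
    return found


def report(package_name="taso"):
    """Print where the package resolves to and whether a compiled core is there."""
    spec = importlib.util.find_spec(package_name)
    if spec is None or not spec.submodule_search_locations:
        print(package_name, "is not importable from here")
        return
    for location in spec.submodule_search_locations:
        print("package directory:", location)
        print("compiled core files:", find_compiled_core(location) or "none")


if __name__ == "__main__":
    report()
```

If the reported package directory is the source tree and no compiled file is listed, running the same check from a directory outside the checkout, with the same interpreter that ran setup.py, shows whether an installed copy behaves differently.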
Thanks for the great work. Recently I have been trying to reproduce the experiments described in your docs. Everything works fine except the experiments measuring inference latency between MetaFlow and TASO. They fail on our 2080 Ti platform with the following output:
python3 nasrnn.py
Cuda failure: 2
/home/edge/hanskalan/sosp19ae/src/cudnn/ops_cudnn.cu:51
Aborting...
When I run the same experiment on our Tesla P100 platform with the same configuration (at least, I have exported the same environment variables in ~/.bashrc and /etc/profile), I can successfully execute python3 examples/model.py and get the expected result. However, when I enter the examples directory and execute python3 model.py, I get the following failure message:
python: /home/user/hanskalan/sosp19ae/src/core/substitution.cc:312: static void XFlow::GraphXfer::load_graph_xfer_from_pb_file(XFlow::Model*, std::vector<XFlow::GraphXfer*>&, std::__cxx11::string): Assertion `collection.ParseFromIstream(&input)' failed.
Aborted (core dumped)
By the way, I have also tried executing python3 examples/model.py on the 2080 Ti platform, but it fails with the same error message as before.
Is there anything wrong with how I am reproducing the experiment? Thank you.
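The `ParseFromIstream` assertion appears only when the script is started from inside `examples`, which is consistent with (though not proof of) a file being opened through a path relative to the current working directory. A quick way to test that hypothesis is to print how a relative path resolves from each launch directory. The file name `graph_subst.pb` used below is an assumption for illustration, and `describe_relative_path` is a hypothetical helper; substitute whatever path your build actually loads.

```python
import os


def describe_relative_path(rel_path, cwd=None):
    """Show where rel_path points when resolved against cwd, and whether it exists."""
    cwd = os.getcwd() if cwd is None else cwd
    resolved = os.path.normpath(os.path.join(cwd, rel_path))
    return {"cwd": cwd, "resolved": resolved, "exists": os.path.isfile(resolved)}


if __name__ == "__main__":
    # Assumed file name; run once from the repository root and once from examples/.
    print(describe_relative_path("graph_subst.pb"))
```

If the file exists from one directory and not from the other, launching from the working directory where it resolves is the simplest check of whether the path is the cause.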
Hi, dear TASO authors.
I tried to optimize BERT using the ONNX model at https://github.com/onnx/models/tree/master/text/machine_comprehension/bert-squad.
An error came out: python3: /home/workspace/TASO/taso/src/core/reshape.cc:39: taso::Tensor* taso::Graph::reshape(taso::TensorHandle, const std::vector<int>&): Assertion `input_size == 1' failed.
The error seems to be caused by def _reshape(op, graph, tensors, initializer): when the shape is not in initializer but is produced by other ops, the shape list stays empty, so reshape fails.
def _reshape(op, graph, tensors, initializer):
    inputs = _get_inputs(op, graph, tensors, initializer)
    assert len(inputs) == 2
    shape = list()
    for data in initializer:
        if data.name == op.input[1]:
            shape = list()
            if data.int64_data != []:
                for dim in data.int64_data:
                    shape.append(dim)
            elif data.raw_data and data.raw_data != []:
                shape_in_array = numpy_helper.to_array(data)
                for dim in shape_in_array:
                    shape.append(dim)
    outputs = graph.reshape(inputs[0], tuple(shape))
    return outputs
Look forward to your commit, thanks!
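To make the failure mode described above visible earlier, the shape lookup can distinguish "the shape is a constant in the initializer" from "the shape is computed by another op". The following is a minimal sketch, not the TASO implementation: `static_shape_from_initializer` is a hypothetical helper, it only needs objects with `name`, `int64_data` and `raw_data` attributes, and it assumes the shape tensor is INT64 when it decodes `raw_data`.

```python
import struct


def static_shape_from_initializer(initializer, name):
    """Return the shape stored under name, or None if name is not a constant."""
    for data in initializer:
        if data.name != name:
            continue
        if len(data.int64_data) > 0:
            return [int(dim) for dim in data.int64_data]
        if data.raw_data:
            # Assumes an INT64 tensor; ONNX stores raw_data in little-endian order.
            count = len(data.raw_data) // 8
            return list(struct.unpack("<%dq" % count, data.raw_data))
        return []
    # The shape is not a constant, for example because another op produces it.
    return None
```

A caller such as `_reshape` could then raise a descriptive error, or compute the shape another way, when the result is `None`, instead of passing an empty tuple on to `graph.reshape`.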
Failure in measure_element_cost cudnnOpTensor
Since I couldn't get the name of this op, it is hard to debug. I hope such information can be added to the TASO OpBase class or somewhere else suitable.
Dear TASO authors. There are no conditional statements inside the project file:
https://github.com/jiazhihao/TASO/blob/master/CMakeLists.txt
I have observed that I can build "generator.cc" via https://github.com/jiazhihao/TASO/blob/master/src/generator/compile.sh
But why is it not part of CMakeLists.txt?
Is it for historical reasons, or because of some hidden problem?
This will make it easier to add new operator attributes.
Hello, firstly, thank you for making your code available! I was able to build and install TASO as described in the installation instructions. I am using an older GPU (NVIDIA GTX650) with CUDA 10.2 and cuDNN 8. I modified TASO to allocate less memory at startup and am able to run the ResNet50 and ResNext50 python examples.
However, it turns out that TASO is only applying a single substitution for each of these examples to generate the final optimized graph. This is the case even with higher values of alpha like 1.2 and a larger iteration budget. Also, I was able to verify that TASO is only loading around 130 rewrite rules from the graph_subst.pb file that is in the git repo. I was expecting this number to be closer to 700 (as mentioned in your SOSP paper). Are there any additional steps I need to run in order for more rewrite rules to be considered?
Hello, I have seen your video about TASO on YouTube. To speed up the search process, you partition the graph first. Does that mean the substitution result depends heavily on the graph partition? Residual blocks and inception modules can easily be divided into subgraphs, but for other tasks this process will be more complicated. Will graph partitioning be the bottleneck of this graph substitution method? I look forward to your answer. Thank you!
Hi,
Thanks for the great work. I ran into a problem while using the code.
To export the optimized nasrnn model, I added these three lines at the end of examples/nasrnn.py:
onnx_model = xf.export_onnx(new_graph)
onnx.checker.check_model(onnx_model)
onnx.save(onnx_model, "nasrnn_opt_xflow.onnx")
Then I ran the command $ python examples/nasrnn.py.
The program gives the following error message in xf.export_onnx(new_graph):
......
cost[Concat]: numInputs(2) cost(0.0001) total_cost(1.3592)
cost[Matmul]: input(1:1024 1024:1) weight(1024:512 512:1) cost(0.0116) total_cost(1.3708)
Cost metrics: exe_time(1.3708) flops(0.0196) memory_access(0.4395) kernel_launches(190)
op.guid=53520 mytype=Concat inedges=2
Traceback (most recent call last):
File "examples/nasrnn.py", line 36, in <module>
onnx_model = xf.export_onnx(new_graph)
File "/home/yaoyao/anaconda3/lib/python3.7/site-packages/xflow-0.0.0-py3.7-linux-x86_64.egg/xflow/__init__.py", line 277, in export_onnx
inputs.append(_input_tensor_name(graph, e, op))
File "/home/yaoyao/anaconda3/lib/python3.7/site-packages/xflow-0.0.0-py3.7-linux-x86_64.egg/xflow/__init__.py", line 240, in _input_tensor_name
return "{}{}_{}".format(mytype, op['guid'], input_weight_names[mytype][inedge['dstIdx']])
KeyError: 'Concat'
Thanks a lot if you can help!
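The traceback shows the name being built from `input_weight_names[mytype][inedge['dstIdx']]`, and the `KeyError: 'Concat'` means that table has no entry for `Concat`. The sketch below reproduces that lookup pattern with a fallback for missing operator types. It is an illustration only: the table contents and the `input<N>` fallback suffix are assumptions, and the real entries in `xflow/__init__.py` may differ.

```python
# Hypothetical table; the real one lives in xflow/__init__.py and may differ.
input_weight_names = {
    "Conv": ["input", "weight", "bias"],
    "Matmul": ["input", "weight"],
}


def input_tensor_name(mytype, guid, dst_idx, table=input_weight_names):
    """Build a tensor name, falling back to a generic suffix for unknown ops."""
    names = table.get(mytype)
    if names is not None and dst_idx < len(names):
        suffix = names[dst_idx]
    else:
        # Assumed fallback so operators without a table entry do not raise KeyError.
        suffix = "input{}".format(dst_idx)
    return "{}{}_{}".format(mytype, guid, suffix)
```

Whether a generic suffix is acceptable depends on how the exporter matches these names against the rest of the graph, so treat this as a way to get past the `KeyError` while debugging rather than as a confirmed fix.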
Here is the list of possibly unsupported operators besides StridedSlice:
NonMaxSuppressionV2
Fill
CropAndResize
TopKV2
ResizeNearestNeighbor
Merge
I see there is a TopK in ONNX, but I don't know what the difference is yet; I am investigating now.
For this operator, please see: onnx/tensorflow-onnx#715
That's why I convert it as a custom operator instead of using the official conversion.
The same goes for ResizeNearestNeighbor and Fill.
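A list like the one above can be produced mechanically by comparing the operator types in a converted model against the set of operators an importer handles. The sketch below assumes you already have the operator types as strings; with the `onnx` package they can be collected as `[node.op_type for node in onnx.load(path).graph.node]`. The `supported` set in the usage line is a made-up example, not TASO's actual list.

```python
from collections import Counter


def unsupported_ops(op_types, supported):
    """Count operator types that do not appear in the supported set."""
    return Counter(op for op in op_types if op not in supported)


if __name__ == "__main__":
    # Made-up inputs for illustration only.
    ops = ["Conv", "Relu", "TopKV2", "Fill", "Fill", "Conv"]
    print(unsupported_ops(ops, supported={"Conv", "Relu"}))
```

Counting occurrences rather than just collecting names helps decide which missing operator to tackle first.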