jiazhihao / taso

The Tensor Algebra SuperOptimizer for Deep Learning

License: Apache License 2.0

CMake 1.03% C++ 65.97% Python 20.57% Cuda 11.91% Shell 0.26% Dockerfile 0.27%

deep-learning deep-neural-networks inference-optimization

taso's Issues

python test from docker image not working

Hi, I pulled the docker image. The programs in the example folder run nicely. However, if I try to run TASO on an ONNX model, for example by calling python python/test.py, I get

Traceback (most recent call last):
  File "python/test.py", line 1, in <module>
    import taso
  File "/usr/TASO/python/taso/__init__.py", line 1, in <module>
    from .core import *
ModuleNotFoundError: No module named 'taso.core'

I tried to recompile the project but I am still getting the same error.
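
One way to narrow this down is to check which copy of the package Python resolves and whether a compiled core extension sits next to __init__.py. In the traceback above, the package resolves to the source tree (/usr/TASO/python/taso). The sketch below is a generic diagnostic, not part of TASO; the helper name locate_extension is hypothetical, and it assumes the extension is built as a file named core.* inside the package directory.

```python
import importlib.util
from pathlib import Path


def locate_extension(package, ext_name):
    """Return (package_dir, matching_files) without importing the package.

    `locate_extension` is a hypothetical helper for illustration only.
    """
    # find_spec on a top-level name locates the package but does not run
    # its __init__.py, so a broken `from .core import *` cannot fail here.
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None, []
    pkg_dir = Path(list(spec.submodule_search_locations)[0])
    # Assumption: a built extension appears as e.g. core.<abi-tag>.so
    matches = sorted(p.name for p in pkg_dir.glob(ext_name + ".*"))
    return str(pkg_dir), matches


if __name__ == "__main__":
    print(locate_extension("taso", "core"))
```

If the list of matches is empty for the directory that is actually being imported, the extension was probably not built or not installed into that location.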

CPU/GPU

If I run the optimization on a computer without a GPU and then run the optimized model on a GPU, will that affect performance?

Cuda error 77, please suggest how to debug

Hi. I applied TASO (commit dce8c4d) to the several models of onnx-models. One frequent error I got is:

Cuda failure: 77
/workspace/taso/src/cudnn/element_kernel.cu:242
Aborting...

Affected models are: inception-v2-9, mnist-8.log, resnet101-v2-7, resnet18-v2-7, roberta-base-11, shufflenet-9, vgg19-7, yolov4.
Since the trivial mnist model is in the list, I suspect that the problem is caused by some environment bug, such as a package version mismatch or the like.

The error message is not very verbose, and the named CUDA line doesn't look suspicious. I would be glad to provide more debugging information, but unfortunately I'm not an expert in low-level CUDA. Could you please suggest what I can do to collect more information?

can taso convert old weights to new weights?

For example, graph optimization can fuse a conv layer and a batchnorm layer into a new conv layer and compute the new fused conv weights. Can TASO compute the new weights during the transformation that fuses conv and batchnorm?
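
For reference, the standard algebra for folding an inference-time batch norm into the preceding convolution can be sketched as follows. This is a generic illustration in plain Python, not TASO's implementation; the function name fold_batchnorm and the flattened per-output-channel weight layout are assumptions made for the example.

```python
import math


def fold_batchnorm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold y = gamma * (conv(x) - mean) / sqrt(var + eps) + beta into conv.

    weights: one flattened kernel (list of floats) per output channel.
    bias, gamma, beta, mean, var: one float per output channel.
    Hypothetical helper; the layout is an assumption for this sketch.
    """
    new_weights, new_bias = [], []
    for c, kernel in enumerate(weights):
        # Per-channel scale applied by batch norm at inference time.
        scale = gamma[c] / math.sqrt(var[c] + eps)
        new_weights.append([w * scale for w in kernel])
        new_bias.append((bias[c] - mean[c]) * scale + beta[c])
    return new_weights, new_bias
```

Applying the folded weights and bias to an input should give the same result as running the original convolution followed by batch norm, up to floating-point error.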

Unsupported zero stride for broadcast

@jiazhihao Hi, I ran into a problem when trying to optimize the inception_v3 model with TASO. Could you please show me how to deal with it, and how to make all the operators of inception_v3 supported by TASO? Thanks.
Steps:
1. Convert the inception_v3 model from .pb to .onnx (using tf2onnx).
2. Load and optimize inception_v3.onnx with TASO.
The error occurred when I loaded inception_v3.onnx into TASO:

(vir-taso) [root@centos /taso-master/taso/examples/myexamples]# python inceptionV3_taso_opt.py
Load onnx model...
0 Add
cuDNN does noot suppoort zero stride for broadcast
Consider switch to other library for broadcastable operators.
python: /taso-master/taso/src/cudnn/cuda_helper.cu:94: void helperSetBroadcastableTensorDescriptor(const taso::Tensor&, const taso::Tensor&, cudnnTensorDescriptor_t): Assertion `false' failed.
Aborted

Some useful information:
inception_v3 model: https://github.com/dmlc/web-data/blob/master/tensorflow/models/InceptionV3/inception_v3_2016_08_28_frozen-with_shapes.pb
GPU:NVIDIA Tesla P100 PCIe 16GB
CUDA Version 9.0.176 / CUDNN Version 7.4.1

Larger search depth exhausts memory space

Hi, I tried generating substitutions with a larger search space with depth=4 (MAX_NUM_OPS=4) in generator.cc. The algorithm examined around 300 million graphs, which took up all of the memory (128 GB of main memory plus 28 GB of swap space), and then crashed.

Would it be possible to run the algorithm with a larger search space if more memory is given? Would optimizing with substitutions of depth 4 produce a substantially better speedup than TASO's current default setting?

Successfully Run but Little Improvement

I think further substitution may be generated if we can fix "Cannot find input tensor" and "unsupported" issue.
Below is the log:

Found unsupported ONNX operator: LRN (Skipped)
Cannot find input tensor for operator: name(Pad) type(Pad) (Skipped)
Cannot find input tensor for operator: name(pooling) type(MaxPool) (Skipped)
Cannot find input tensor for operator: name(convolution1) type(Conv) (Skipped)
Cannot find input tensor for operator: name(activation1) type(Relu) (Skipped)
Found unsupported ONNX operator: LRN (Skipped)
Cannot find input tensor for operator: name(Pad1) type(Pad) (Skipped)
Cannot find input tensor for operator: name(pooling1) type(MaxPool) (Skipped)
Cannot find input tensor for operator: name(convolution2) type(Conv) (Skipped)
Cannot find input tensor for operator: name(activation2) type(Relu) (Skipped)
Found unsupported ONNX operator: LRN (Skipped)
Found unsupported ONNX operator: Flatten (Skipped)
Cannot find input tensor for operator: name(innerProduct) type(Gemm) (Skipped)
Found unsupported ONNX operator: Flatten (Skipped)
Cannot find input tensor for operator: name(innerProduct1) type(Gemm) (Skipped)
cost[Conv2D]: i(1 3 128 128) w(20 3 5 5) s(1 1) p(1) cost(0.0544) total_cost(0.0544)
cost[Activation]: mode(8) cost(0.0124) total_cost(0.0669)
Cost metrics: exe_time(0.0669) flops(0.0871) memory_access(6.6384) kernel_launches(2)

    ===== Start Cost-Based Backtracking Search =====
    [0] cost = 0.0669 bestCost = 0.0669 candidates.size() = 0
    [1] cost = 0.0273 bestCost = 0.0273 candidates.size() = 0
    ===== Finish Cost-Based Backtracking Search =====

    cost[Conv2D]: i(1 3 128 128) w(20 3 5 5) s(1 1) p(1) cost(0.0273) total_cost(0.0273)
    Cost metrics: exe_time(0.0273) flops(0.0882) memory_access(5.4653) kernel_launches(1)

Support PRelu in TASO

When weight_initial is NULL then new_weight will fail

In the new_weight method in ops.cc, the current logic calls allocate_memory only when weight_initial != NULL; otherwise weight_ptr is left as a null pointer.

I don't know whether this logic is intended, but it leads to an exception.

Looking into ops_cudnn.cu:allocate_memory, there is already a null check inside.

So I think that even if we don't check weight_initial in ops.cc:new_weight, the logic still holds and no exception is raised.

AssertionError: Internal error when reording ONNX operators

Hi,
I first download the .onnx model from this repo:
https://github.com/onnx/models/tree/master/vision/classification/resnet/resnet50

Then I use the code below to load the model
old_model = taso.load_onnx("./resnet50.onnx")

An error message appears that
File "/home/ubuntu/taso/python/taso/__init__.py", line 730, in load_onnx
assert len(node_list) == len(model.graph.node), "Internal error when reording ONNX operators"

I am wondering how to solve this problem, thanks!

zero weight in resnext50

I ran resnext50.py from the examples folder and found that some weight tensors are 0.0.

AssertionError: Unsupported ONNX operator: StridedSlice

Just as this error shows

failed merge conv + bn + relu

conv + bn is converted to fuse -> new_conv -> bias -> add, so the new graph becomes fuse -> new_conv -> bias -> add -> relu. We then want to merge conv + relu into conv, but there is no edge from conv to relu.

CUDNN failure: CUDNN_STATUS_BAD_PARAM-src/cudnn/conv2d_kernel.cu

Hi, I try to optimize onnx model, but got CUDNN_STATUS_BAD_PARAM error at src/cudnn/conv2d_kernel.cu:149.
My onnx model could be downloaded from
https://drive.google.com/open?id=1JOoKnXf69hbBpyAWMIEhHdc4Iapv5kcR.

How many rules are there in TASO now?

Hi, now how many transformation rules are used in TASO?

verify.py hangs on graph_subst.pb

verify.py hangs when running on TASO/graph_subst.pb. The problematic rules seem to be 99, 100, 130, 131.

I also noticed TASO/graph_subst.pb contains 132 rules in total, while the generated pb file has 819 rules. Could you help me understand the difference? Thanks!

TensorRT support

Does TASO support TensorRT?

A bug about transpose???

https://github.com/jiazhihao/TASO/blob/master/src/core/transpose.cc#L60
Hi dear author, it seems that the " != " should be modified to "==" : )

  for (int i = 0; i < ndim; i++)
    for (int j = i + 1; j < ndim; j++)
      if (permArray[i] != permArray[j]) {
        return Op::INVALID_OP;
      }
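
To illustrate the point, here is a Python sketch of the same pairwise check (the original is C++; has_duplicate_axes is a hypothetical name). With ==, only permutations that repeat an axis are flagged. With !=, any permutation of two or more distinct axes would be flagged, which is the behavior questioned above.

```python
def has_duplicate_axes(perm):
    """Return True if any axis index appears more than once in `perm`.

    Mirrors the nested loop above, using `==` for the comparison.
    """
    ndim = len(perm)
    for i in range(ndim):
        for j in range(i + 1, ndim):
            if perm[i] == perm[j]:
                return True
    return False


if __name__ == "__main__":
    print(has_duplicate_axes([0, 2, 1]))  # a valid permutation
    print(has_duplicate_axes([0, 0, 1]))  # axis 0 repeated
```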

Documentation for the Python interface

CUDNN failure: CUDNN_STATUS_BAD_PARAM TASO/src/cudnn/element_kernel.cu:193

Just as the title:
CUDNN failure: CUDNN_STATUS_BAD_PARAM
/disk2/ouhang.oh/experiment/TASO/src/cudnn/element_kernel.cu:193

The op is OP_EW_MUL

ele->inputs[0]

$12 = {static MAX_KEY_LENGTH = 274, static MAGIC_NUMBER = 23333, numDim = 4, dim = {1, 64,
128, 96, 21845, 1068004771, 1050824725, 1075071798}, stride = {786432, 12288, 96, 1,
1068058575, 1050951425, 1075066352, 1050770695}, idx = 0, op = {static INVALID_OP = {
static INVALID_OP = , guid = 0,
ptr = 0x0}, guid = 255, ptr = 0x555584b2b2b0}, data_ptr = 0x0, split = {{

ele->inputs[1]

$13 = {static MAX_KEY_LENGTH = 274, static MAGIC_NUMBER = 23333, numDim = 3, dim = {64, 1,
1, 1075096021, 1051062975, 1075074253, 1068048265, 1075113656}, stride = {1, 1, 1,
1050865948, 1075080782, 1068015193, 1075058777, 1067985559}, idx = 0, op = {
static INVALID_OP = {
static INVALID_OP = , guid = 0,
ptr = 0x0}, guid = 106, ptr = 0x555583fa0900}, data_ptr = 0x7ffef4625000, split = {{
static NO_SPLIT = {
static NO_SPLIT = , num = 0, pos = {
0 <repeats 32 times>}}, num = 0, pos = {1068053135, 1050845104, 1075074414,

ele->outputs[0]

$14 = {static MAX_KEY_LENGTH = 274, static MAGIC_NUMBER = 23333, numDim = 4, dim = {1, 64,
128, 96, 0, 0, 0, 0}, stride = {786432, 12288, 96, 1, 1075072159, 1068021451,
1075087136, 1068032736}, idx = 0, op = {static INVALID_OP = {
static INVALID_OP = , guid = 0,
ptr = 0x0}, guid = 0, ptr = 0x0}, data_ptr = 0x0, split = {{static NO_SPLIT = {
static NO_SPLIT = , num = 0, pos = {
0 <repeats 32 times>}}, num = 0, pos = {1068014156, 1050874984, 1075088149,

Error building TASO from source code

I'm able to build the source code for TASO (albeit with the following warning that I'm not sure how to fix):

-- Configuring done
CMake Warning at CMakeLists.txt:69 (add_library):
  Cannot generate a safe linker search path for target taso_runtime because
  files in some directories may conflict with libraries in implicit
  directories:

    link library [libcublas.so] in /usr/lib/x86_64-linux-gnu may be hidden by files in:
      /usr/lib/x86_64-linux-gnu/stubs

  Some of these libraries may not be found correctly.


-- Generating done
-- Build files have been written to: /tmp/taso/build

but once I try to install, I get this error:

...
[ 59%] Building CUDA object CMakeFiles/taso_runtime.dir/src/cudnn/concat_kernel.cu.o
In file included from /usr/lib/gcc/x86_64-linux-gnu/6/include/stddef.h:217:0:
/usr/local/cuda-10.0/include/crt/host_runtime.h:19:2: warning: #warning "crt/device_functions.h is an internal header file and must not be used directly.  Please use cuda_runtime_api.h or cuda_runtime.h instead." [-Wcpp]
 #warning "crt/device_functions.h is an internal header file and must not be used directly.  Please use cuda_runtime_api.h or cuda_runtime.h instead."
  ^~~~~~~
In file included from /tmp/tmpxft_00003481_00000000-5_activation_kernel.cudafe1.stub.c:6:0,
                 from tmpxft_00003481_00000000-5_activation_kernel.cudafe1.stub.c:1:
/usr/local/cuda-10.0/include/crt/host_runtime.h:19:2: warning: #warning "crt/device_functions.h is an internal header file and must not be used directly.  Please use cuda_runtime_api.h or cuda_runtime.h instead." [-Wcpp]
 #warning "crt/device_functions.h is an internal header file and must not be used directly.  Please use cuda_runtime_api.h or cuda_runtime.h instead."
  ^~~~~~~
In file included from /usr/lib/gcc/x86_64-linux-gnu/6/include/stddef.h:217:0:
/usr/local/cuda-10.0/include/crt/host_runtime.h:19:2: warning: #warning "crt/device_functions.h is an internal header file and must not be used directly.  Please use cuda_runtime_api.h or cuda_runtime.h instead." [-Wcpp]
 #warning "crt/device_functions.h is an internal header file and must not be used directly.  Please use cuda_runtime_api.h or cuda_runtime.h instead."
  ^~~~~~~
In file included from /tmp/tmpxft_0000348d_00000000-5_batchnorm_kernel.cudafe1.stub.c:6:0,
                 from tmpxft_0000348d_00000000-5_batchnorm_kernel.cudafe1.stub.c:1:
/usr/local/cuda-10.0/include/crt/host_runtime.h:19:2: warning: #warning "crt/device_functions.h is an internal header file and must not be used directly.  Please use cuda_runtime_api.h or cuda_runtime.h instead." [-Wcpp]
 #warning "crt/device_functions.h is an internal header file and must not be used directly.  Please use cuda_runtime_api.h or cuda_runtime.h instead."
  ^~~~~~~
In file included from /usr/lib/gcc/x86_64-linux-gnu/6/include/stddef.h:217:0:
/usr/local/cuda-10.0/include/crt/host_runtime.h:19:2: warning: #warning "crt/device_functions.h is an internal header file and must not be used directly.  Please use cuda_runtime_api.h or cuda_runtime.h instead." [-Wcpp]
 #warning "crt/device_functions.h is an internal header file and must not be used directly.  Please use cuda_runtime_api.h or cuda_runtime.h instead."
  ^~~~~~~
In file included from /tmp/tmpxft_000034a3_00000000-5_cast_kernel.cudafe1.stub.c:6:0,
                 from tmpxft_000034a3_00000000-5_cast_kernel.cudafe1.stub.c:1:
/usr/local/cuda-10.0/include/crt/host_runtime.h:19:2: warning: #warning "crt/device_functions.h is an internal header file and must not be used directly.  Please use cuda_runtime_api.h or cuda_runtime.h instead." [-Wcpp]
 #warning "crt/device_functions.h is an internal header file and must not be used directly.  Please use cuda_runtime_api.h or cuda_runtime.h instead."
  ^~~~~~~
[ 61%] Building CUDA object CMakeFiles/taso_runtime.dir/src/cudnn/constant_kernel.cu.o
[ 62%] Building CUDA object CMakeFiles/taso_runtime.dir/src/cudnn/conv2d_kernel.cu.o
[ 64%] Building CUDA object CMakeFiles/taso_runtime.dir/src/cudnn/cuda_helper.cu.o
In file included from /usr/lib/gcc/x86_64-linux-gnu/6/include/stddef.h:217:0:
/usr/local/cuda-10.0/include/crt/host_runtime.h:19:2: warning: #warning "crt/device_functions.h is an internal header file and must not be used directly.  Please use cuda_runtime_api.h or cuda_runtime.h instead." [-Wcpp]
 #warning "crt/device_functions.h is an internal header file and must not be used directly.  Please use cuda_runtime_api.h or cuda_runtime.h instead."
  ^~~~~~~
In file included from /tmp/tmpxft_000034c7_00000000-5_concat_kernel.cudafe1.stub.c:6:0,
                 from tmpxft_000034c7_00000000-5_concat_kernel.cudafe1.stub.c:1:
/usr/local/cuda-10.0/include/crt/host_runtime.h:19:2: warning: #warning "crt/device_functions.h is an internal header file and must not be used directly.  Please use cuda_runtime_api.h or cuda_runtime.h instead." [-Wcpp]
 #warning "crt/device_functions.h is an internal header file and must not be used directly.  Please use cuda_runtime_api.h or cuda_runtime.h instead."
  ^~~~~~~
/tmp/tmpxft_000034c7_00000000-5_concat_kernel.cudafe1.stub.c: In function ‘void __device_stub__Z18assign_with_stridePfPKfiii(_ZN4taso8DATATYPEE*, const _ZN4taso8DATATYPEE*, int, int, int)’:
/tmp/tmpxft_000034c7_00000000-5_concat_kernel.cudafe1.stub.c:12:149: error: ‘__args_arr’ was not declared in this scope
 void __device_stub__Z18assign_with_stridePfPKfiii(_ZN4taso8DATATYPEE *__par0, const _ZN4taso8DATATYPEE *__par1, int __par2, int __par3, int __par4){__cudaSetupArgSimple(__par0, 0UL);__cudaSetupArgSimple(__par1, 8UL);__cudaSetupArgSimple(__par2, 16UL);__cudaSetupArgSimple(__par3, 20UL);__cudaSetupArgSimple(__par4, 24UL);__cudaLaunch(((char *)((void ( *)(_ZN4taso8DATATYPEE *, const _ZN4taso8DATATYPEE *, int, int, int))assign_with_stride)));}
                                                                                                                                                     ^
/tmp/tmpxft_000034c7_00000000-5_concat_kernel.cudafe1.stub.c:12:149: error: ‘__args_idx’ was not declared in this scope
 void __device_stub__Z18assign_with_stridePfPKfiii(_ZN4taso8DATATYPEE *__par0, const _ZN4taso8DATATYPEE *__par1, int __par2, int __par3, int __par4){__cudaSetupArgSimple(__par0, 0UL);__cudaSetupArgSimple(__par1, 8UL);__cudaSetupArgSimple(__par2, 16UL);__cudaSetupArgSimple(__par3, 20UL);__cudaSetupArgSimple(__par4, 24UL);__cudaLaunch(((char *)((void ( *)(_ZN4taso8DATATYPEE *, const _ZN4taso8DATATYPEE *, int, int, int))assign_with_stride)));}
                                                                                                                                                     ^
CMakeFiles/taso_runtime.dir/build.make:862: recipe for target 'CMakeFiles/taso_runtime.dir/src/cudnn/concat_kernel.cu.o' failed
make[2]: *** [CMakeFiles/taso_runtime.dir/src/cudnn/concat_kernel.cu.o] Error 1
make[2]: *** Waiting for unfinished jobs....
In file included from /usr/lib/gcc/x86_64-linux-gnu/6/include/stddef.h:217:0:
/usr/local/cuda-10.0/include/crt/host_runtime.h:19:2: warning: #warning "crt/device_functions.h is an internal header file and must not be used directly.  Please use cuda_runtime_api.h or cuda_runtime.h instead." [-Wcpp]
 #warning "crt/device_functions.h is an internal header file and must not be used directly.  Please use cuda_runtime_api.h or cuda_runtime.h instead."
  ^~~~~~~
In file included from /tmp/tmpxft_000034ed_00000000-5_constant_kernel.cudafe1.stub.c:6:0,
                 from tmpxft_000034ed_00000000-5_constant_kernel.cudafe1.stub.c:1:
/usr/local/cuda-10.0/include/crt/host_runtime.h:19:2: warning: #warning "crt/device_functions.h is an internal header file and must not be used directly.  Please use cuda_runtime_api.h or cuda_runtime.h instead." [-Wcpp]
 #warning "crt/device_functions.h is an internal header file and must not be used directly.  Please use cuda_runtime_api.h or cuda_runtime.h instead."
  ^~~~~~~
In file included from /usr/lib/gcc/x86_64-linux-gnu/6/include/stddef.h:217:0:
/usr/local/cuda-10.0/include/crt/host_runtime.h:19:2: warning: #warning "crt/device_functions.h is an internal header file and must not be used directly.  Please use cuda_runtime_api.h or cuda_runtime.h instead." [-Wcpp]
 #warning "crt/device_functions.h is an internal header file and must not be used directly.  Please use cuda_runtime_api.h or cuda_runtime.h instead."
  ^~~~~~~
In file included from /tmp/tmpxft_00003517_00000000-5_cuda_helper.cudafe1.stub.c:6:0,
                 from tmpxft_00003517_00000000-5_cuda_helper.cudafe1.stub.c:1:
/usr/local/cuda-10.0/include/crt/host_runtime.h:19:2: warning: #warning "crt/device_functions.h is an internal header file and must not be used directly.  Please use cuda_runtime_api.h or cuda_runtime.h instead." [-Wcpp]
 #warning "crt/device_functions.h is an internal header file and must not be used directly.  Please use cuda_runtime_api.h or cuda_runtime.h instead."
  ^~~~~~~
In file included from /usr/lib/gcc/x86_64-linux-gnu/6/include/stddef.h:217:0:
/usr/local/cuda-10.0/include/crt/host_runtime.h:19:2: warning: #warning "crt/device_functions.h is an internal header file and must not be used directly.  Please use cuda_runtime_api.h or cuda_runtime.h instead." [-Wcpp]
 #warning "crt/device_functions.h is an internal header file and must not be used directly.  Please use cuda_runtime_api.h or cuda_runtime.h instead."
  ^~~~~~~
In file included from /tmp/tmpxft_00003503_00000000-5_conv2d_kernel.cudafe1.stub.c:6:0,
                 from tmpxft_00003503_00000000-5_conv2d_kernel.cudafe1.stub.c:1:
/usr/local/cuda-10.0/include/crt/host_runtime.h:19:2: warning: #warning "crt/device_functions.h is an internal header file and must not be used directly.  Please use cuda_runtime_api.h or cuda_runtime.h instead." [-Wcpp]
 #warning "crt/device_functions.h is an internal header file and must not be used directly.  Please use cuda_runtime_api.h or cuda_runtime.h instead."
  ^~~~~~~
/tmp/tmpxft_00003517_00000000-5_cuda_helper.cudafe1.stub.c: In function ‘void __device_stub__Z13assign_kernelPfif(float*, int, float)’:
/tmp/tmpxft_00003517_00000000-5_cuda_helper.cudafe1.stub.c:12:83: error: ‘__args_arr’ was not declared in this scope
 void __device_stub__Z13assign_kernelPfif(float *__par0, int __par1, float __par2){__cudaSetupArgSimple(__par0, 0UL);__cudaSetupArgSimple(__par1, 8UL);__cudaSetupArgSimple(__par2, 12UL);__cudaLaunch(((char *)((void ( *)(float *, int, float))assign_kernel)));}
                                                                                   ^
/tmp/tmpxft_00003517_00000000-5_cuda_helper.cudafe1.stub.c:12:83: error: ‘__args_idx’ was not declared in this scope
 void __device_stub__Z13assign_kernelPfif(float *__par0, int __par1, float __par2){__cudaSetupArgSimple(__par0, 0UL);__cudaSetupArgSimple(__par1, 8UL);__cudaSetupArgSimple(__par2, 12UL);__cudaLaunch(((char *)((void ( *)(float *, int, float))assign_kernel)));}
                                                                                   ^
/tmp/tmpxft_00003517_00000000-5_cuda_helper.cudafe1.stub.c: In function ‘void __device_stub__Z11copy_kernelPfPKfi(float*, const float*, int)’:
/tmp/tmpxft_00003517_00000000-5_cuda_helper.cudafe1.stub.c:1:96: error: ‘__args_arr’ was not declared in this scope
 #pragma GCC diagnostic push
                                                                                                ^
/tmp/tmpxft_00003517_00000000-5_cuda_helper.cudafe1.stub.c:1:96: error: ‘__args_idx’ was not declared in this scope
 #pragma GCC diagnostic push
                                                                                                ^
CMakeFiles/taso_runtime.dir/build.make:934: recipe for target 'CMakeFiles/taso_runtime.dir/src/cudnn/cuda_helper.cu.o' failed
make[2]: *** [CMakeFiles/taso_runtime.dir/src/cudnn/cuda_helper.cu.o] Error 1
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/taso_runtime.dir/all' failed
make[1]: *** [CMakeFiles/taso_runtime.dir/all] Error 2
Makefile:129: recipe for target 'all' failed
make: *** [all] Error 2

the problems seem unrelated but I thought I'd include the warnings to provide all the context.

Thanks for your help!

xflow error

What is xflow used for? Is it a Python library or not?
I ran pip install xflow, but when I use xflow to optimize my ONNX model, it returns an error:
graph = xflow.load_onnx(args.file)
NameError: name 'xflow' is not defined

Can you tell me the difference between xflow.optimize() and taso.optimize()?
Thanks.

assert num == 0 in split

In split.cc:137 parent.divide(left, right, curPos); the assertion is broken.

I have a temp fix to make:

SplitInfo parent = inputs[0].split[axis], left, right;
parent.num = inputs[0].dim[axis];

But the overall logic should also be checked as well in initialization of tensor.split field

from .core import * ImportError: No module named core

After doing
python3 setup.py install

When I import taso it gives

from .core import *
ImportError: No module named core

Error building from source

docker image nvidia


-- The CXX compiler identification is GNU 7.4.0
-- The CUDA compiler identification is NVIDIA 10.0.130
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc -- works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Found Protobuf: /usr/local/lib/libprotobuf.so;-lpthread (found version "3.6.1")
-- PROTOBUF=/usr/local/lib/libprotobuf.so
-- Performing Test SUPPORT_CXX11
-- Performing Test SUPPORT_CXX11 - Success
-- Found Threads: TRUE
-- Found CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda
-- Found CUDA_CUDA_LIBRARY=/usr/local/cuda/targets/x86_64-linux/lib/stubs/libcuda.so
-- Found CUDA_CUDART_LIBRARY=/usr/local/cuda/lib64/libcudart.so
-- Found CUDA_NVRTC_LIBRARY=CUDA_NVRTC_LIBRARY-NOTFOUND
-- Found CUDA_CUDNN_LIBRARY=CUDA_CUDNN_LIBRARY-NOTFOUND
-- Found CUDA_CUBLAS_LIBRARY=CUDA_CUBLAS_LIBRARY-NOTFOUND
-- CUDA_INCLUDE_DIR=/usr/local/cuda/include
CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
Please set them or make sure they are set and tested correctly in the CMake files:
CUDA_CUBLAS_LIBRARY
    linked by target "taso_runtime" in directory /storage/data/taso
CUDA_CUDNN_LIBRARY
    linked by target "taso_runtime" in directory /storage/data/taso

-- Configuring incomplete, errors occurred!
See also "/storage/data/taso/build/CMakeFiles/CMakeOutput.log".
See also "/storage/data/taso/build/CMakeFiles/CMakeError.log".

DNNL partly implemented - won't compile

The code seems to miss BroadcastAdd, FuseConvBatchNormBias, and FuseConvBatchNormAlphaVar operators implementation in C or with DNNL so the code won't compile.

Do you plan to support DNNL fully?

Assertion failed when using new substitution rules.

Hi,

I followed the installation instructions and successfully ran the Python examples.

I followed SOSP19AE.pdf and successfully compiled the generator and generated new graph substitution rules. However, when using the new substitutions, I got an assertion error in GraphXfer::create_operator_from_pb () in substitutions.pb as follows:

python: /usr/TASO/src/core/substitution.cc:351: void taso::GraphXfer::create_operator_from_pb(const GraphSubst::Operator&, std::map<int, taso::TensorX>&, bool): Assertion `false' failed.
Aborted (core dumped)

I commented out lines

TASO/src/generator/generator.cc

Line 1762 in a310b60

ops.push_back(new ConstantIMMTemp(w1.numDim, w1.dim));

and

TASO/src/generator/generator.cc

Line 1763 in a310b60

operator_names[ops.back()] = "Constant_IMM";

In src/generator/generator.cc to not use operator Constant_IMM which caused Assertion 'false' failed.

I build and run python examples/resnet50.py but still get the error:

python: /usr/TASO/src/core/substitution.cc:309: void taso::GraphXfer::create_operator_from_pb(const GraphSubst::Operator&, std::map<int, taso::TensorX>&, bool): Assertion `pbOp.input_size() == 2' failed.
Aborted (core dumped)

How to fix this error?
Is there a version of 'generator.cc' that was used in the paper?

Error running example from docker image

After pulling and launching the TASO docker container, I am unable to run the basic example script. It exits with the following error:

meistecl@su-lee:~/Documents/repositories/taso$ docker/run_docker.sh tasoml/cuda100
WORKSPACE: /usr/TASO
IMAGE NAME: tasoml/cuda100
DOCKER BINARY: nvidia-docker
root@su-lee:/usr/TASO# python examples/resnet50.py 
Cuda failure: 2
/usr/TASO/src/cudnn/ops_cudnn.cu:42
Aborting...
root@su-lee:/usr/TASO#

Additionally, if I copy a sample onnx model file to the container (just a generic .onnx chosen from their GitHub repo) and run taso.load() on the file, it exits with the same error.

Thank you for your help!

output length mismatch and no attr

for op:
input: "conv1"
input: "Pad_pads"
input: "Pad_value"
output: "legacy_padded_tensor"
name: "Pad"
op_type: "Pad"
domain: ""

Below assertion fails:

outputs = xf_operators[op.op_type](op, graph, tensors, model.graph.initializer)
if not isinstance(outputs, list):
    outputs = [outputs]
assert len(outputs) == len(op.output), "Number of output tensors mismatch"

and

def _pad(op, graph, tensors, initializer):
    inputs = _get_inputs(op, graph, tensors, initializer)
    attrs = _parse_attribute(op.attribute)
    # Currently treat pad as a no op
    assert sum(attrs['pads']) == 0
    return inputs

This fails because the op has no pads attribute.
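
In the node above, the padding arrives as a second input (Pad_pads) rather than as an attribute, which matches the form ONNX uses for Pad from opset 11 onward. A minimal sketch of handling both forms is shown below. It works on plain dictionaries rather than on TASO's or ONNX's real objects, and resolve_pads and the dictionary layout are hypothetical.

```python
def resolve_pads(node, initializers):
    """Return the pad values for a Pad node, from attribute or input.

    node: {"inputs": [names...], "attrs": {...}}  (hypothetical layout)
    initializers: {tensor_name: list of ints}     (hypothetical layout)
    """
    attrs = node.get("attrs", {})
    # Older form: pads stored as an attribute on the node.
    if "pads" in attrs:
        return list(attrs["pads"])
    # Newer form: pads passed as the second input, backed by an initializer.
    inputs = node.get("inputs", [])
    if len(inputs) >= 2 and inputs[1] in initializers:
        return list(initializers[inputs[1]])
    raise ValueError("Pad node has neither a 'pads' attribute nor a constant pads input")


if __name__ == "__main__":
    pad_node = {"inputs": ["conv1", "Pad_pads", "Pad_value"], "attrs": {}}
    print(resolve_pads(pad_node, {"Pad_pads": [0] * 8}))
```

With the pads resolved this way, the existing "treat pad as a no op" check could still be applied to the returned list, and the function should return a single output tensor rather than the full input list so that the output count matches.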

void taso::Model::measure_element_cost(taso::Element*): Assertion `false' failed.

Need to support taso::OP_EW_DIV in element_cudnn.cu:41

Unsupported ONNX operator: Squeeze

Traceback (most recent call last):
  File "examples/test_onnx.py", line 4, in <module>
    graph = taso.load_onnx("/home/taso/yolov3.onnx")
  File "/home/.local/lib/python3.7/site-packages/taso-0.1.0-py3.7-linux-x86_64.egg/taso/__init__.py", line 459, in load_onnx
    assert False, "Unsupported ONNX operator: {}".format(op.op_type)
AssertionError: Unsupported ONNX operator: Squeeze
Error in sys.excepthook:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/apport_python_hook.py", line 63, in apport_excepthook
    from apport.fileutils import likely_packaged, get_recent_crashes
  File "/usr/lib/python3/dist-packages/apport/__init__.py", line 5, in <module>
    from apport.report import Report
  File "/usr/lib/python3/dist-packages/apport/report.py", line 30, in <module>
    import apport.fileutils
  File "/usr/lib/python3/dist-packages/apport/fileutils.py", line 23, in <module>
    from apport.packaging_impl import impl as packaging
  File "/usr/lib/python3/dist-packages/apport/packaging_impl.py", line 24, in <module>
    import apt
  File "/usr/lib/python3/dist-packages/apt/__init__.py", line 23, in <module>
    import apt_pkg
ModuleNotFoundError: No module named 'apt_pkg'

Original exception was:
Traceback (most recent call last):
  File "examples/test_onnx.py", line 4, in <module>
    graph = taso.load_onnx("/home/taso/yolov3.onnx")
  File "/home/.local/lib/python3.7/site-packages/taso-0.1.0-py3.7-linux-x86_64.egg/taso/__init__.py", line 459, in load_onnx
    assert False, "Unsupported ONNX operator: {}".format(op.op_type)
AssertionError: Unsupported ONNX operator: Squeeze

Add topological sort in init.py

Without it, if op1 depends on op2's output, the runtime may evaluate op1 first, which causes a failure.
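
As a rough illustration of the idea, the sketch below orders operators with Kahn's algorithm so that every producer comes before its consumers. It uses plain dictionaries instead of ONNX node objects; topological_order and the "name"/"inputs"/"outputs" layout are assumptions for the example, not TASO code.

```python
from collections import deque


def topological_order(nodes):
    """Return `nodes` reordered so each node follows the nodes it depends on.

    nodes: list of {"name": str, "inputs": [str], "outputs": [str]}.
    Tensor names with no producing node (graph inputs, weights) are ignored.
    """
    producer = {}
    for idx, node in enumerate(nodes):
        for out in node["outputs"]:
            producer[out] = idx

    indegree = [0] * len(nodes)
    consumers = [[] for _ in nodes]
    for idx, node in enumerate(nodes):
        for name in node["inputs"]:
            src = producer.get(name)
            if src is not None:
                indegree[idx] += 1
                consumers[src].append(idx)

    # Start from nodes whose inputs are all graph inputs or weights.
    ready = deque(i for i, d in enumerate(indegree) if d == 0)
    order = []
    while ready:
        i = ready.popleft()
        order.append(nodes[i])
        for j in consumers[i]:
            indegree[j] -= 1
            if indegree[j] == 0:
                ready.append(j)

    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle")
    return order
```

A length mismatch between the sorted list and the original node list indicates a cycle or an unresolved dependency, which is the kind of condition an assertion after reordering would catch.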

The optimized graph has no BN layer and differs from the source ONNX graph

script

python examples/test_onnx.py -f convert_mx_onnx/mx_resnet18.onnx

I ran the above script with the provided docker image. The ONNX file convert_mx_onnx/mx_resnet18.onnx is my resnet18 converted from MXNet. The source graph is shown below:
source graph

But the optimized graph is obviously different from the source graph: it has no BN layer and has multiple outputs.

optimized source graph

The graph created by examples/resnext50.py looks normal, and it also has no BN layer. Has BN been merged into the Conv layer? Can anyone explain this? Thanks.

another problem in `examples/test_onnx.py`

error info as below:

Traceback (most recent call last):
  File "examples/test_onnx.py", line 16, in <module>
    print(" original_cost = {}".format(graph.cost()))
AttributeError: 'taso.core.PyGraph' object has no attribute 'cost'

Bertsquad-8 from onnx-models: AssertionError: Slice requires at least 3 inputs

Hi. I just found an error when applying TASO to the bertsquad-8 model of onnx-models.
Note that I was on dce8c4d at the time of the experiment.

Traceback (most recent call last):
  File "taso/examples/test_onnx.py", line 12, in <module>
    graph = taso.load_onnx(args.file)
  File "/opt/conda/lib/python3.8/site-packages/taso-0.1.0-py3.8-linux-x86_64.egg/taso/__init__.py", line 805, in load_onnx
    outputs = xf_operators[op.op_type](op, graph, tensors, model.graph.initializer)
  File "/opt/conda/lib/python3.8/site-packages/taso-0.1.0-py3.8-linux-x86_64.egg/taso/__init__.py", line 539, in _slice
    assert len(inputs) >= 3, "Slice requires at least 3 inputs"
AssertionError: Slice requires at least 3 inputs

Compiling TASO

Hello,

From the Dockerfile, I see that you are using CUDA 10.0 and Ubuntu 16.
Can I use the latest versions instead? Will it work?

Compile generator.cc: cannot find "xflow/ops.h" file

Hi, I want to compile the generator to generate the substitution sets, but compilation fails with "generator.cc:17:23: fatal error: xflow/ops.h: No such file or directory". I noticed that xflow (https://github.com/dsouzajude/xFlow/tree/1.0.0) is a Python project and doesn't provide any C source code. How can I get the xflow C header files? Thanks a lot to anyone who knows how to handle this problem and can provide some hints.

optimize pytorch computation graph for training

Hi,
Thanks for the great work. Does TASO support optimizing a PyTorch computation graph for faster training? If so, are there any benchmarks?

Gpt2-10 model from onnx-models: Graph must be in single static assignment (SSA) form

Here is the error I found after applying TASO (commit dce8c4d) to the GPT-2 model from onnx-models.

Traceback (most recent call last):
  File "taso/examples/test_onnx.py", line 19, in <module>
    onnx.checker.check_model(onnx_model)
  File "/opt/conda/lib/python3.8/site-packages/onnx/checker.py", line 91, in check_model
    C.check_model(model.SerializeToString())
onnx.onnx_cpp2py_export.checker.ValidationError: Graph must be in single static assignment (SSA) form, however 'data' has been used as graph input names multiple times.
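
The checker message says that the name 'data' appears more than once among the graph inputs. A quick way to see which names are affected is to count them. This is a minimal sketch; `duplicated_names` is a hypothetical helper, and the list in the example is made up for illustration.

```python
from collections import Counter


def duplicated_names(names):
    """Return the sorted names that occur more than once."""
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)


if __name__ == "__main__":
    # Made-up example list; see the note below for a real model.
    print(duplicated_names(["data", "weight0", "data", "bias0"]))
```

With a real model, the list would come from `[i.name for i in onnx_model.graph.input]`, which shows whether the exporter emitted the same input name for several tensors.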

cuda_helper.cu:82: void helperSetBroadcastableTensorDescriptor(const taso::Tensor&, const taso::Tensor&, cudnnTensorDescriptor_t): Assertion `false' failed.

As the title says:

(gdb) p input
$1 = (const taso::Tensor &) @0x555589dcf678: {static MAX_KEY_LENGTH = 274, static MAGIC_NUMBER = 23333,
numDim = 1, dim = {1, -2008300352, 21845, -1980118144, 21845, 1057128885, 1021911609, 1035326753}, stride = {1,
1048604497, -1069232552, -1078297696, 1065886482, 1049973618, -1096231070, -1140642707}, idx = 0, op = {
static INVALID_OP = {static INVALID_OP = , guid = 0,
ptr = 0x0}, guid = 105, ptr = 0x55558837f1c0}, data_ptr = 0x7ffef061ea00, split = {{static NO_SPLIT = {
static NO_SPLIT = , num = 0, pos = {

(gdb) p output
$2 = (const taso::Tensor &) @0x555589dd1178: {static MAX_KEY_LENGTH = 274, static MAGIC_NUMBER = 23333,
numDim = 4, dim = {1, 64, 128, 128, 0, 0, 0, 0}, stride = {1048576, 16384, 128, 1, 1045711936, -1098908207,
1011689796, 1054273973}, idx = 0, op = {static INVALID_OP = {
static INVALID_OP = , guid = 0, ptr = 0x0}, guid = 0,
ptr = 0x0}, data_ptr = 0x0, split = {{static NO_SPLIT = {
static NO_SPLIT = , num = 0, pos = {

Runtime Error: ImportError: No module named core

I built TASO from source according to the install doc.

mkdir build; cd build; cmake ..
sudo make install -j 4
cd ../python
python setup.py install

The output of cmake .. is as follows:

-- The CXX compiler identification is GNU 5.4.0
-- The CUDA compiler identification is NVIDIA 9.0.176
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc -- works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Found Protobuf: /root/3rdparty/protobuf-3.9.0/lib/libprotobuf.so;-lpthread (found version "3.9.0") 
-- PROTOBUF=/root/3rdparty/protobuf-3.9.0/lib/libprotobuf.so
-- Performing Test SUPPORT_CXX11
-- Performing Test SUPPORT_CXX11 - Success
-- Found Threads: TRUE  
-- Found CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda
-- Found CUDA_CUDA_LIBRARY=/usr/local/cuda/targets/x86_64-linux/lib/stubs/libcuda.so
-- Found CUDA_CUDART_LIBRARY=/usr/local/cuda/lib64/libcudart.so
-- Found CUDA_NVRTC_LIBRARY=/usr/local/cuda/lib64/libnvrtc.so
-- Found CUDA_CUDNN_LIBRARY=/usr/lib/x86_64-linux-gnu/libcudnn.so
-- Found CUDA_CUBLAS_LIBRARY=/usr/local/cuda/lib64/libcublas.so
-- CUDA_INCLUDE_DIR=/usr/local/cuda/include
-- Configuring done
-- Generating done
-- Build files have been written to: /mnt/tscpfs2/xiaotao.chen/Repositories/TASO/build

The output of make install -j is as follows:

Install the project...
-- Install configuration: ""
-- Installing: /usr/local/lib/libtaso_runtime.so
-- Set runtime path of "/usr/local/lib/libtaso_runtime.so" to ""
-- Up-to-date: /usr/local/./include
-- Installing: /usr/local/./include/taso
-- Installing: /usr/local/./include/taso/ops.h
-- Installing: /usr/local/./include/taso/substitution.h
-- Installing: /usr/local/./include/taso/cuda_helper.h

After running python setup.py install, I checked that taso is installed with pip list|grep taso:

taso                               0.1.0

Then I tried to run the example with python examples/resnext50.py, and it shows:

Traceback (most recent call last):
  File "examples/resnext50.py", line 1, in <module>
    import taso as ts
  File "/mnt/tscpfs2/xiaotao.chen/Repositories/TASO/python/taso/__init__.py", line 1, in <module>
    from .core import *
ImportError: No module named core

I don't know which step I missed. Could you help? Thanks. @jiazhihao
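
The traceback shows `taso/__init__.py` being loaded from the source tree rather than from the installed package, and the compiled `core` extension may simply not exist in that directory (this is an assumption, not a confirmed diagnosis). The sketch below uses only the standard library to report where a package would be imported from and whether a given submodule file is present; `describe_package` and `has_submodule` are hypothetical helper names.

```python
import importlib.util
import os


def describe_package(name):
    """Return the directory a package resolves to and the files inside it."""
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.submodule_search_locations:
        return None
    pkg_dir = list(spec.submodule_search_locations)[0]
    return {"dir": pkg_dir, "files": sorted(os.listdir(pkg_dir))}


def has_submodule(name, sub):
    """Check for sub as a directory, sub.py, or a compiled sub.<tag>.so / .pyd."""
    info = describe_package(name)
    if info is None:
        return False
    return any(f == sub or f.startswith(sub + ".") for f in info["files"])


if __name__ == "__main__":
    # Demonstrated on a standard-library package; use "taso" and "core" in practice.
    print(describe_package("json")["dir"])
    print(has_submodule("json", "decoder"))
```

If the reported directory is the repository's `python/taso` folder and it contains no `core` file, running the script from a different working directory, or rebuilding the extension, would be the next things to try.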

from .core import * ImportError: No module named core

After doing
python3 setup.py install

When I import taso it gives

from .core import *
ImportError: No module named core

Some problem happened on different experiment platforms

Thanks for the great work. Recently I have been trying to run the experiments described in your docs. Everything works fine except the experiments that compare inference latency between MetaFlow and TASO. They fail on our 2080 Ti platform with the following output:

python3 nasrnn.py

Cuda failure: 2
/home/edge/hanskalan/sosp19ae/src/cudnn/ops_cudnn.cu:51
Aborting...

However, on our Tesla P100 platform with the same configuration (at least, I exported the same environment variables in ~/.bashrc and /etc/profile), I can successfully run python3 examples/model.py and get the expected result. But when I enter the examples directory and run python3 model.py, I get the following failure:

python: /home/user/hanskalan/sosp19ae/src/core/substitution.cc:312: static void XFlow::GraphXfer::load_graph_xfer_from_pb_file(XFlow::Model*, std::vector<XFlow::GraphXfer*>&, std::__cxx11::string): Assertion `collection.ParseFromIstream(&input)' failed.
Aborted (core dumped)

By the way, I have also tried running python3 examples/model.py on the 2080 Ti platform, but it fails with the same error message as before.

Is there anything wrong with how I am reproducing the experiment? Thank you.
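
One explanation consistent with the P100 behaviour, stated as an assumption: the substitution file is opened through a path relative to the current working directory, so the same script succeeds from the repository root and fails from inside examples. A script can avoid that dependency by resolving data files relative to its own location. The sketch below is illustrative only; `resolve_near` is a hypothetical helper, and the file name in the usage comment is an example.

```python
from pathlib import Path


def resolve_near(anchor_file, relative_name, max_up=3):
    """Look for relative_name next to anchor_file, then in up to max_up parents."""
    start = Path(anchor_file).resolve().parent
    for directory in [start, *start.parents][: max_up + 1]:
        candidate = directory / relative_name
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(
        "{} not found within {} levels above {}".format(relative_name, max_up, start)
    )


# Example usage inside a script (file name chosen for illustration):
#     pb_path = resolve_near(__file__, "graph_subst.pb")
```

This does not address the "Cuda failure: 2" on the 2080 Ti, which is a separate error.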

Error when optimizing BERT-SQuAD

Hi, dear TASO authors.
I tried to optimize BERT with the ONNX model from https://github.com/onnx/models/tree/master/text/machine_comprehension/bert-squad.
An error came out: python3: /home/workspace/TASO/taso/src/core/reshape.cc:39: taso::Tensor* taso::Graph::reshape(taso::TensorHandle, const std::vector<int>&): Assertion `input_size == 1' failed.

The error seems to be caused by def _reshape(op, graph, tensors, initializer): when the shape is not in the initializer but is produced by another op, the shape list stays empty, so the reshape fails.

def _reshape(op, graph, tensors, initializer):
    inputs = _get_inputs(op, graph, tensors, initializer)
    assert len(inputs) == 2
    shape = list()
    for data in initializer:
        if data.name == op.input[1]:
            shape = list()
            if data.int64_data != []:
                for dim in data.int64_data:
                    shape.append(dim)
            elif data.raw_data and data.raw_data != []:
                shape_in_array = numpy_helper.to_array(data)
                for dim in shape_in_array:
                    shape.append(dim)
    outputs = graph.reshape(inputs[0], tuple(shape))
    return outputs

Look forward to your commit, thanks!
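
The quoted function leaves `shape` empty without any signal when the shape tensor is not a constant. A minimal sketch of a lookup that separates "found" from "not found" is shown below, so the failure surfaces in Python with the tensor name instead of as a C++ assertion. `FakeInitializer` and `static_shape` are hypothetical stand-ins that mimic only the `name` and `int64_data` fields used above; this is not TASO's code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeInitializer:
    """Hypothetical stand-in for an ONNX initializer entry."""
    name: str
    int64_data: List[int] = field(default_factory=list)


def static_shape(shape_input_name, initializers):
    """Return the constant shape as a tuple, or raise if it is not a constant."""
    for data in initializers:
        if data.name == shape_input_name:
            return tuple(int(dim) for dim in data.int64_data)
    raise ValueError(
        "shape input '{}' is not an initializer; it is probably computed by "
        "another operator at run time".format(shape_input_name)
    )


if __name__ == "__main__":
    inits = [FakeInitializer("shape0", [1, -1, 768])]
    print(static_shape("shape0", inits))
```

Supporting shapes computed by other operators would need more than this, for example constant folding before loading, which the thread does not cover.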

CUDNN failure: CUDNN_STATUS_BAD_PARAM

Failure in measure_element_cost cudnnOpTensor

Since I couldn't get the name of the failing op, this is hard to debug. I hope such information can be added to the TASO OpBase class or somewhere else suitable.

Why "generator" building is not a part of CMakeLists.txt?

Dear TASO authors. There are no conditional statements inside the project file:
https://github.com/jiazhihao/TASO/blob/master/CMakeLists.txt

I have observed that I can build "generator.cc" via https://github.com/jiazhihao/TASO/blob/master/src/generator/compile.sh

But why is it not part of CMakeLists.txt?
Is this for historical reasons, or is there some hidden problem?

Change attribute type from enum to string

This will make it easier to add new operator attributes.

Problems with Rule Application (ResNet50 and ResNext50) and Number of Default Rules

Hello, firstly, thank you for making your code available! I was able to build and install TASO as described in the installation instructions. I am using an older GPU (NVIDIA GTX650) with CUDA 10.2 and cuDNN 8. I modified TASO to allocate less memory at startup and am able to run the ResNet50 and ResNext50 python examples.

However, it turns out that TASO is only applying a single substitution for each of these examples to generate the final optimized graph. This is the case even with higher values of alpha like 1.2 and a larger iteration budget. Also, I was able to verify that TASO is only loading around 130 rewrite rules from the graph_subst.pb file that is in the git repo. I was expecting this number to be closer to 700 (as mentioned in your SOSP paper). Are there any additional steps I need to run in order for more rewrite rules to be considered?

Graph partition

Hello, I have seen your video about TASO on YouTube. To speed up the search process, you partition the graph first. Does that mean the substitution result depends heavily on the graph partition? ResBlock and Inception modules can easily be divided into subgraphs, but for other tasks this process will be more complicated. Will graph partitioning be the bottleneck of this graph substitution method? Looking forward to your answer. Thank you!

xf.export_onnx error

Hi,

Thanks for the great work. I ran into a problem while using the code.

To export the optimized nasrnn model, I added these three lines at the end of examples/nasrnn.py:

onnx_model = xf.export_onnx(new_graph)
onnx.checker.check_model(onnx_model)
onnx.save(onnx_model, "nasrnn_opt_xflow.onnx")

Then I ran the command $ python examples/nasrnn.py.
The program gives the following error message in the call to xf.export_onnx(new_graph):

......
        cost[Concat]: numInputs(2) cost(0.0001) total_cost(1.3592)
        cost[Matmul]: input(1:1024 1024:1) weight(1024:512 512:1) cost(0.0116) total_cost(1.3708)
        Cost metrics: exe_time(1.3708) flops(0.0196) memory_access(0.4395) kernel_launches(190)
op.guid=53520 mytype=Concat inedges=2
Traceback (most recent call last):
  File "examples/nasrnn.py", line 36, in <module>
    onnx_model = xf.export_onnx(new_graph)
  File "/home/yaoyao/anaconda3/lib/python3.7/site-packages/xflow-0.0.0-py3.7-linux-x86_64.egg/xflow/__init__.py", line 277, in export_onnx
    inputs.append(_input_tensor_name(graph, e, op))
  File "/home/yaoyao/anaconda3/lib/python3.7/site-packages/xflow-0.0.0-py3.7-linux-x86_64.egg/xflow/__init__.py", line 240, in _input_tensor_name
    return "{}{}_{}".format(mytype, op['guid'], input_weight_names[mytype][inedge['dstIdx']])
KeyError: 'Concat'

Thanks a lot if you can help!
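
Judging from the last traceback frame, the lookup table `input_weight_names` has no entry for `Concat`, which takes a variable number of inputs. A minimal sketch of a name builder that falls back to a positional name when the table has no entry is shown below. The function `input_tensor_name`, the table contents, and the `input<N>` naming scheme are all assumptions for illustration; whether the exporter accepts such names is not confirmed by the thread.

```python
def input_tensor_name(op_type, guid, dst_idx, known_names):
    """Build "<type><guid>_<slot>", falling back to a positional slot name."""
    names = known_names.get(op_type)
    if names is not None and dst_idx < len(names):
        suffix = names[dst_idx]
    else:
        # No per-slot names for this operator: use the input position instead.
        suffix = "input{}".format(dst_idx)
    return "{}{}_{}".format(op_type, guid, suffix)


if __name__ == "__main__":
    # Hypothetical table with one entry; the real table's contents are not shown above.
    table = {"Conv": ["input", "weight"]}
    print(input_tensor_name("Concat", 53520, 1, table))
```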

Possible unsupported operators

Here is the list of possible unsupported operators besides StridedSlice:

NonMaxSuppressionV2
Fill
CropAndResize
TopKV2
ResizeNearestNeighbor
Merge

I see there is a TopK operator in ONNX, but I don't know what the difference is yet; I am investigating now.
For this operator, please see: onnx/tensorflow-onnx#715
That is why I convert it as a custom operator instead of using the official conversion.
The same applies to ResizeNearestNeighbor and Fill.
