
ideep's Introduction

Intel® Optimization for Chainer*

Chainer Backend for Intel Architecture, a Chainer module providing a NumPy-like API and DNN acceleration using MKL-DNN.

Requirements

This preview version is tested on Ubuntu 16.04, CentOS 7.4, and OS X.

Minimum requirements:

  • cmake 3.0.0+
  • C++ compiler with C++11 standard support (GCC 5.3+ if you want to build tests)
  • Python 2.7.6+, 3.5.2+, 3.6.0+
  • NumPy 1.13
  • SWIG 3.0.12
  • Doxygen 1.8.5
  • (optional) MPICH devel 3.2

Other requirements:

  • Testing utilities
    • Gtest
    • pytest

Installation

Install setuptools:

If you have an old version of setuptools, upgrade it:

pip install -U setuptools

Install the Python package from source:

CentOS:

git submodule update --init && mkdir build && cd build && cmake3 ..
cd ../python
python setup.py install

Other:

git submodule update --init && mkdir build && cd build && cmake ..
cd ../python
python setup.py install

Install the Python package via PyPI:

pip install ideep4py

Since Python 3.7 doesn't work with numpy==1.13, the iDeep4py wheel for Python 3.7 is built against numpy==1.16.0; remember to upgrade NumPy to 1.16.0 before installing the Python 3.7 wheel. We suggest installing the Python package inside a virtualenv to avoid installing packages globally, which could break system tools or other projects.

Install the Python package via Conda:

conda install -c intel ideep4py

Install the Python package via Docker:

We provide official Docker images for different platforms on Docker Hub.

docker pull chainer/chainer:latest-intel-python2
docker run -it chainer/chainer:latest-intel-python2 /bin/bash

Multinode support:

Non-blocking multinode data parallelism is supported. The system is required to meet the MPICH dependency, and you need to replace the cmake command in the build process as follows:

Make sure your MPI executable is in PATH:

PATH=$PATH:<path-to-mpiexec>
# use the following line when you execute cmake or cmake3
# CentOS:
cmake3 -Dmultinode=ON ..
# Other:
cmake -Dmultinode=ON ..

Execute the test:

cd total_reduce/test
mpirun -N 4 python3 test_1payload_inplace.py

The commands above start 4 MPI processes on your machine and conduct a blocking allreduce operation among all 4 processes. To test in a real multinode environment, prepare your host list file and use the following commands:

cd total_reduce/test
mpirun -f <hostlist> -N 4 python3 test_1payload_inplace.py
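
For reference, here is a conceptual sketch of the collective this test exercises, written against plain MPI in C++. This is not ideep's total_reduce implementation; the payload shape is made up for illustration.

// Build with: mpicxx allreduce_sketch.cc && mpirun -N 4 ./a.out
#include <mpi.h>
#include <cstdio>

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  float payload[4] = {static_cast<float>(rank), 1.f, 2.f, 3.f};
  // In-place sum across all ranks, analogous in spirit to what
  // test_1payload_inplace.py exercises.
  MPI_Allreduce(MPI_IN_PLACE, payload, 4, MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);
  if (rank == 0) std::printf("payload[0] after allreduce: %f\n", payload[0]);
  MPI_Finalize();
  return 0;
}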


License

MIT License (see LICENSE file).

ideep's People

Contributors

caozhongz, delock, fengyuan14, gujinghui, hongzhen1, manofmountain, mingxiaoh, opencici2006, penghuicheng, wuhuikx, zhouhaiy


ideep's Issues

question about which version to use

Hi, I'm trying to reproduce MLPerf/inference_results_v0.5, and I've run into some trouble with the ideep API.
For example, in auto in_format = dataOrder_ == "NCHW" ? ideep::format::nchw : ideep::format::nhwc;, I think format has changed to format_tag. And reinit has changed to init, right?
But I don't know all the correspondences. Should I use an older version of ideep as the third_party dependency of PyTorch and rebuild PyTorch from source? I'd really appreciate any help.
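
For readers hitting the same rename, here is a hedged sketch of the old and new spellings. It assumes the newer ideep headers expose an ideep::format_tag alias (mirroring oneDNN 1.x); exact names may differ per branch.

// Old branches (mkl-dnn 0.x) spelled layouts as ideep::format::nchw.
// Newer branches (onednn 1.x) use format_tag, so the line above becomes:
#include <ideep.hpp>
#include <string>

ideep::format_tag pick_format(const std::string &dataOrder_) {
  return dataOrder_ == "NCHW" ? ideep::format_tag::nchw
                              : ideep::format_tag::nhwc;
}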

Memory Leakage in mpool

Hi, folks,
We noticed some memory leakage when using IDEEP. Here is the top of the stack trace:

==1260204==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 1464320 byte(s) in 1 object(s) allocated from:
    #0 0x385e10 in posix_memalign (/data/users/yinghai/fbsource/fbcode/buck-out/dev/gen/fblearner/predictor/model/tests/caffe2_xray_memory_leak+0x385e10)
    #1 0x7f3ca066f72d in ideep::utils::scratch_allocator::mpool::malloc(unsigned long) third-party-buck/gcc-5-glibc-2.23/build/ideep/include/ideep/allocators.hpp:141
    #2 0x7f3ca066ea3b in char* ideep::utils::scratch_allocator::malloc<ideep::computation>(unsigned long) third-party-buck/gcc-5-glibc-2.23/build/ideep/include/ideep/allocators.hpp:196
    #3 0x7f3ca066dd55 in void ideep::param::init<ideep::utils::scratch_allocator, ideep::computation>(ideep::param::descriptor const&) third-party-buck/gcc-5-glibc-2.23/build/ideep/include/ideep/tensor.hpp:551
    #4 0x7f3ca066cbac in ideep::param::reshape(std::vector<int, std::allocator<int> >) third-party-buck/gcc-5-glibc-2.23/build/ideep/include/ideep/tensor.hpp:651
    #5 0x7f3ca099a14f in caffe2::IDEEPSqueezeOp::RunOnDevice() caffe2/caffe2/ideep/operators/squeeze_op.cc:58

So the memory is allocated at

int rc = ::posix_memalign(&ptr, alignment_, len);

Looking around, its free function doesn't actually call free(). Instead, it moves blocks to a free list.

void free(void *ptr) {

Am I missing something?
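
For context, here is a self-contained sketch of the free-list behavior described above. It is a simplified stand-in for ideep::utils::scratch_allocator::mpool, not its actual code: blocks handed back via free() are cached for reuse rather than released, which is why LeakSanitizer reports them at exit unless the pool is drained.

#include <cstdlib>
#include <map>
#include <vector>

class mpool_sketch {
  std::map<size_t, std::vector<void *>> free_list_;  // size -> cached blocks
 public:
  void *malloc(size_t len) {
    auto &bucket = free_list_[len];
    if (!bucket.empty()) {  // reuse a previously "freed" block
      void *p = bucket.back();
      bucket.pop_back();
      return p;
    }
    void *p = nullptr;
    if (::posix_memalign(&p, 64, len) != 0)  // fresh allocation, as in the trace
      return nullptr;
    return p;
  }
  void free(void *ptr, size_t len) {  // never calls ::free(); caches instead
    free_list_[len].push_back(ptr);
  }
};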

Doesn't expose version

There is no way to tell the version of ideep from the library itself. There should be.

No support for zero-dim tensors

Hi folks, I noticed that the following code will throw an error

ideep::tensor::resize(dims, itensor::data_type::f32);

if dims contains a zero dimension, for example (0, 2). This is a legitimate tensor shape that emerges in the R-CNN use case. In fact, the old MKL-ML operators support such shapes. I don't know whether this is a regression in IDEEP or MKL-DNN. Please help us take a look. Thanks.

You can use this tiny test case to reproduce the issue:
pytorch/pytorch#8459

@4pao @gujinghui
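
Here is a hedged reproduction sketch based on the snippet above. Type and method spellings are assumed from this issue (itensor is assumed to alias ideep::tensor, as the original snippet implies), and it requires the ideep headers:

#include <ideep.hpp>

int main() {
  using itensor = ideep::tensor;
  itensor t;
  itensor::dims dims{0, 2};                 // legitimate zero-size shape
  t.resize(dims, itensor::data_type::f32);  // reportedly throws here
  return 0;
}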

[feature request] add example code for ideep on the pytorch_dnnl{,_dev} branch

I'd like to request a simple .cc example file using ideep on the pytorch_dnnl{,_dev} branch.

Developers of some distributions, say Debian, won't include the sources from git submodules when doing the packaging. Given that, these developers need a way of sanity testing to ensure that the packaged ideep indeed works with the separately packaged onednn.

Thank you very much :-)
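
As a starting point, here is a hedged sketch of the kind of minimal sanity test being requested. API spellings are assumed from snippets elsewhere in this tracker (e.g. tensor::resize) and may differ on the pytorch_dnnl branch:

#include <ideep.hpp>
#include <iostream>

int main() {
  ideep::tensor t;
  // Allocate a small 4-D f32 tensor; success here exercises both the
  // packaged ideep headers and the separately packaged onednn library.
  t.resize({2, 3, 4, 5}, ideep::tensor::data_type::f32);
  std::cout << "ideep sanity check passed" << std::endl;
  return 0;
}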

Issue with `to_bytes(const int)`

Hey folks, I'm looking at this line:

auto len = sizeof(arg) - (__builtin_clz(arg) / 8);

Looks like it's trying to squeeze out the leading 0 bits. But I don't understand the next line:

bytestring(as_cstring, len);

where the address of as_cstring starts with zero bytes. Can someone help me understand what it is doing? Thanks.

And BTW, __builtin_clz(0) is undefined behavior...
https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html
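
For illustration, here is a self-contained sketch of the byte-squeezing idea under discussion (not ideep's actual helper), including a guard for the __builtin_clz(0) undefined behavior noted above:

#include <cstdio>
#include <string>

std::string to_bytes_sketch(const unsigned int arg) {
  if (arg == 0) return std::string(1, '\0');  // avoid __builtin_clz(0) UB
  auto len = sizeof(arg) - (__builtin_clz(arg) / 8);
  // On a little-endian machine the low-order (significant) bytes come first
  // in memory, so truncating to len drops the leading zero bytes rather than
  // the payload; that is why the string does not start with zeros.
  return std::string(reinterpret_cast<const char *>(&arg), len);
}

int main() {
  std::printf("%zu bytes\n", to_bytes_sketch(0x1234).size());  // prints 2
  return 0;
}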

Wheels should not depend on libpythonX.Y.so.1

Currently the iDeep wheels provided on PyPI depend on libpythonX.Y.so.1.
This violates the manylinux1 rule: https://www.python.org/dev/peps/pep-0513/#libpythonx-y-so-1
Moreover, they cannot be run in some environments.

Furthermore, explicit linking to libpython creates problems in the common configuration where Python is not built with --enable-shared. In particular, on Debian and Ubuntu systems, apt install pythonX.Y does not even install libpythonX.Y.so.1, meaning that any wheel that did depend on libpythonX.Y.so.1 could fail to import.

Why would there be a free in `get_mkldnn_primitive_desc_t()`?

Hi folks,
We are debugging an ASAN use-after-free issue in IDEEP ops.

The offending part is

(4-byte-read-heap-use-after-free)
#0 0x7f727ea48ae2 in caffe2::Tensor::GetDeviceType() const caffe2/caffe2/core/tensor.h:177
    #1 0x7f727f0bde16 in bool caffe2::Blob::IsType<caffe2::Tensor>(caffe2::DeviceType) const caffe2/caffe2/core/blob.h:72
    #2 0x7f727f0bd81a in caffe2::CopyIDEEPToCPUOp::RunOnDevice() caffe2/caffe2/ideep/operators/utility_ops.cc:34
    #3 0x7f727ee140cc in caffe2::IDEEPOperator::Run(int) caffe2/caffe2/ideep/utils/ideep_operator.h:54
    #4 0x7f727ec60a1d in caffe2::SimpleNet::Run() caffe2/caffe2/core/net_simple.cc:63

And the memory that is being read is freed at

#0 0x43da00 in operator delete(void*) ()
    #1 0x7f727ee2b29e in ideep::param::get_mkldnn_primitive_desc_t() const ideep/include/ideep/tensor.hpp:629
    #2 0x7f727ee26049 in ideep::param::get_descriptor() const ideep/include/ideep/tensor.hpp:647
    #3 0x7f727f08c42c in void ideep::batch_normalization_forward_inference::compute<ideep::utils::allocator>(ideep::tensor const&, ideep::tensor const&, ideep::tensor const&, ideep::tensor const&, ideep::tensor const&, ideep::tensor&, float) ideep/include/ideep/computations.hpp:2595
    #4 0x7f727f08b773 in caffe2::IDEEPSpatialBNOp::RunOnDevice() caffe2/caffe2/ideep/operators/spatial_batch_norm_op.cc:38
    #5 0x7f727ee140cc in caffe2::IDEEPOperator::Run(int) caffe2/caffe2/ideep/utils/ideep_operator.h:54
    #6 0x7f727ec60a1d in caffe2::SimpleNet::Run() caffe2/caffe2/core/net_simple.cc:63

As I'm looking at the code, I don't understand why ideep::param::get_mkldnn_primitive_desc_t() const would induce a free. Any ideas?

Update MKL-DNN submodule

When building PyTorch from source with the -DUSE_MKL=ON and -DUSE_IDEEP=ON flags, the compilation of MKL-DNN fails with GCC 8 because the submodule version of MKL-DNN is too old.

A bugfix for MKL-DNN was made recently (see oneapi-src/oneDNN#283) that should solve the issue. For now, the problem can be worked around by building with the -DCMAKE_CXX_FLAGS=-Wno-format-truncation flag.

mkl-dnn version used by PyTorch causes internal compiler error when built by latest VS2019

See below:

C:\Users\circleci\project\build\win_tmp\bin\sccache-cl.exe   /TP -DDNNL_ENABLE_CONCURRENT_EXEC -DDNNL_ENABLE_MAX_CPU_ISA -DDNNL_X64=1 -DIDEEP_USE_MKL -DNOMINMAX -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DTH_BLAS_MKL -DWIN32_LEAN_AND_MEAN -D_CRT_SECURE_NO_DEPRECATE=1 -D_OPENMP_NOFORCE_MANIFEST -D_WIN -D__STDC_CONSTANT_MACROS -D__STDC_LIMIT_MACROS -I..\cmake\..\third_party\benchmark\include -Icaffe2\contrib\aten -I..\third_party\onnx -Ithird_party\onnx -I..\third_party\foxi -Ithird_party\foxi -I..\third_party\ideep\mkl-dnn\include -Ithird_party\ideep\mkl-dnn\include -I..\third_party\ideep\mkl-dnn\src -I..\cmake\..\third_party\googletest\googlemock\include -I..\cmake\..\third_party\googletest\googletest\include -I..\third_party\protobuf\src -Iwin_tmp\mkl\include -I..\third_party -I..\cmake\..\third_party\eigen -IC:\Jenkins\Miniconda3\include -IC:\Jenkins\Miniconda3\lib\site-packages\numpy\core\include -I..\cmake\..\third_party\pybind11\include /DWIN32 /D_WINDOWS /GR /EHsc /w /bigobj -openmp:experimental -DNDEBUG -openmp:experimental  /MP    /wd4800 /wd4068 /wd4305 /wd4551 /wd4244  /MD /O2 /Ob2 /DNDEBUG /w /bigobj -DNDEBUG -DUSE_GCC_GET_CPUID -DUSE_AVX -DUSE_AVX2 -std:c++14 /showIncludes /Fothird_party\ideep\mkl-dnn\src\cpu\CMakeFiles\dnnl_cpu.dir\gemm_convolution_utils.cpp.obj /Fdthird_party\ideep\mkl-dnn\src\cpu\CMakeFiles\dnnl_cpu.dir\ /FS -c ..\third_party\ideep\mkl-dnn\src\cpu\gemm_convolution_utils.cpp
FAILED: third_party/ideep/mkl-dnn/src/cpu/CMakeFiles/dnnl_cpu.dir/gemm_convolution_utils.cpp.obj 
C:\Users\circleci\project\build\win_tmp\bin\sccache-cl.exe   /TP -DDNNL_ENABLE_CONCURRENT_EXEC -DDNNL_ENABLE_MAX_CPU_ISA -DDNNL_X64=1 -DIDEEP_USE_MKL -DNOMINMAX -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DTH_BLAS_MKL -DWIN32_LEAN_AND_MEAN -D_CRT_SECURE_NO_DEPRECATE=1 -D_OPENMP_NOFORCE_MANIFEST -D_WIN -D__STDC_CONSTANT_MACROS -D__STDC_LIMIT_MACROS -I..\cmake\..\third_party\benchmark\include -Icaffe2\contrib\aten -I..\third_party\onnx -Ithird_party\onnx -I..\third_party\foxi -Ithird_party\foxi -I..\third_party\ideep\mkl-dnn\include -Ithird_party\ideep\mkl-dnn\include -I..\third_party\ideep\mkl-dnn\src -I..\cmake\..\third_party\googletest\googlemock\include -I..\cmake\..\third_party\googletest\googletest\include -I..\third_party\protobuf\src -Iwin_tmp\mkl\include -I..\third_party -I..\cmake\..\third_party\eigen -IC:\Jenkins\Miniconda3\include -IC:\Jenkins\Miniconda3\lib\site-packages\numpy\core\include -I..\cmake\..\third_party\pybind11\include /DWIN32 /D_WINDOWS /GR /EHsc /w /bigobj -openmp:experimental -DNDEBUG -openmp:experimental  /MP    /wd4800 /wd4068 /wd4305 /wd4551 /wd4244  /MD /O2 /Ob2 /DNDEBUG /w /bigobj -DNDEBUG -DUSE_GCC_GET_CPUID -DUSE_AVX -DUSE_AVX2 -std:c++14 /showIncludes /Fothird_party\ideep\mkl-dnn\src\cpu\CMakeFiles\dnnl_cpu.dir\gemm_convolution_utils.cpp.obj /Fdthird_party\ideep\mkl-dnn\src\cpu\CMakeFiles\dnnl_cpu.dir\ /FS -c ..\third_party\ideep\mkl-dnn\src\cpu\gemm_convolution_utils.cpp
C:\Users\circleci\project\third_party\ideep\mkl-dnn\src\cpu\gemm_convolution_utils.cpp(401) : fatal error C1001: Internal compiler error.
(compiler file 'd:\agent\_work\7\s\src\vctools\Compiler\Utc\src\p2\main.c', line 195)
 To work around this problem, try simplifying or changing the program near the locations listed above.
If possible please provide a repro here: https://developercommunity.visualstudio.com 
Please choose the Technical Support command on the Visual C++ 
 Help menu, or open the Technical Support help file for more information
  cl!RaiseException()+0x69
  cl!RaiseException()+0x69
  cl!CloseTypeServerPDB()+0x22e6b
  cl!CloseTypeServerPDB()+0xcd30a
Microsoft (R) C/C++ Optimizing Compiler Version 19.27.29111 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

Would it be possible to update the version of MKL-DNN used by the ideep:pytorch branch to include oneapi-src/oneDNN#805?

oneDNN 3.x support timelines

Hi, could someone please provide details on (i) the timeline for oneDNN 3.x support in ideep, and (ii) whether ideep_dev_3.0 is the development branch for it?
Thank you!

Question: interoperability with TBB

Hi, we're using TBB in our project; in particular, we're using the MKL-DNN library compiled with TBB instead of OpenMP. I noticed there's explicit usage of OpenMP through pragmas in the ideep source files. What are the implications of using the TBB version of MKL-DNN with ideep?

Is a Windows build functional?

Hi!
I'd like to know if a Windows build is functional at the moment (since it's not one of the recommended platforms).
I tried compiling and installing it by following the readme instructions, but I'm stuck when installing the package with cd ../python && py -3 setup.py install.
Intel MKL is correctly found, but I'm getting LINK errors such as gemm_convolution.obj : error LNK2019: unresolved external symbol _cblas_sgemm.

I attached the full trace so you can check that MKL is found, as well as the different unresolved symbols.
trace.txt

tests break: fatal error: 'mkl_vsl.h' file not found

In file included from /usr/ports/math/ideep/work/ideep-2.0.0-119-gb57539e/include/ideep.hpp:44:
/usr/ports/math/ideep/work/ideep-2.0.0-119-gb57539e/include/ideep/computations.hpp:61:10: fatal error: 'mkl_vsl.h' file not found
#include <mkl_vsl.h>
         ^~~~~~~~~~~
1 error generated.

FreeBSD 12

ideep4py cannot be installed on OSX

I cannot install ideep4py on OSX.

I cloned the code on the master branch and followed README.md, but python setup.py install doesn't work: it returns ideep4py/py/mm/mdarray.h:37:10: fatal error: 'forward_list' file not found, so I tried CFLAGS=-stdlib=libc++ python setup.py install.

However, another error occurs, as shown below.

running install
Installing ...
CMake Warning (dev) at CMakeLists.txt:3 (project):
  Policy CMP0048 is not set: project() command manages VERSION variables.
  Run "cmake --help-policy CMP0048" for policy details.  Use the cmake_policy
  command to set the policy and suppress this warning.

  The following variable(s) would be set to empty:

    CMAKE_PROJECT_VERSION
    CMAKE_PROJECT_VERSION_MAJOR
    CMAKE_PROJECT_VERSION_MINOR
    CMAKE_PROJECT_VERSION_PATCH
This warning is for project developers.  Use -Wno-dev to suppress it.

-- VTune profiling environment is unset
CMake Deprecation Warning at mkl-dnn/CMakeLists.txt:21 (cmake_policy):
  The OLD behavior for policy CMP0048 will be removed from a future version
  of CMake.

  The cmake-policies(7) manual explains that the OLD behaviors of all
  policies are deprecated and that a policy should be set to OLD only under
  specific short-term circumstances.  Projects should be ported to the NEW
  behavior and not rely on setting a policy to OLD.


CMake Deprecation Warning at mkl-dnn/CMakeLists.txt:22 (cmake_policy):
  The OLD behavior for policy CMP0054 will be removed from a future version
  of CMake.

  The cmake-policies(7) manual explains that the OLD behaviors of all
  policies are deprecated and that a policy should be set to OLD only under
  specific short-term circumstances.  Projects should be ported to the NEW
  behavior and not rely on setting a policy to OLD.


-- CMAKE_BUILD_TYPE is unset, defaulting to Release
-- Detecting Intel(R) MKL: trying mklml_intel
-- Intel(R) MKL: include /Users/nogu-atsu/ideep/mkl-dnn/external/mklml_mac_2018.0.3.20180406/include
-- Intel(R) MKL: lib /Users/nogu-atsu/ideep/mkl-dnn/external/mklml_mac_2018.0.3.20180406/lib/libmklml.dylib
-- Intel(R) MKL: OpenMP lib /Users/nogu-atsu/ideep/mkl-dnn/external/mklml_mac_2018.0.3.20180406/lib/libiomp5.dylib
-- Could NOT find OpenMP_C (missing: OpenMP_C_FLAGS OpenMP_C_LIB_NAMES)
-- Could NOT find OpenMP_CXX (missing: OpenMP_CXX_FLAGS OpenMP_CXX_LIB_NAMES)
-- Could NOT find OpenMP (missing: OpenMP_C_FOUND OpenMP_CXX_FOUND)
-- VTune profiling environment is unset
-- Detecting Intel(R) MKL: trying mklml_intel
-- Intel(R) MKL: include /Users/nogu-atsu/ideep/mkl-dnn/external/mklml_mac_2018.0.3.20180406/include
-- Intel(R) MKL: lib /Users/nogu-atsu/ideep/mkl-dnn/external/mklml_mac_2018.0.3.20180406/lib/libmklml.dylib
-- Intel(R) MKL: OpenMP lib /Users/nogu-atsu/ideep/mkl-dnn/external/mklml_mac_2018.0.3.20180406/lib/libiomp5.dylib
-- Configuring done
-- Generating done
-- Build files have been written to: /Users/nogu-atsu/ideep/build
[  0%] Linking CXX shared library libmkldnn.dylib
Undefined symbols for architecture x86_64:
  "_cblas_gemm_s8u8s32", referenced from:
      mkldnn::impl::cpu::_gemm_u8s8s32x_convolution_fwd_t<true, (mkldnn_data_type_t)1>::execute_forward() in gemm_u8s8s32x_convolution.cpp.o
      mkldnn::impl::cpu::_gemm_u8s8s32x_convolution_fwd_t<true, (mkldnn_data_type_t)2>::execute_forward() in gemm_u8s8s32x_convolution.cpp.o
      mkldnn::impl::cpu::_gemm_u8s8s32x_convolution_fwd_t<true, (mkldnn_data_type_t)5>::execute_forward() in gemm_u8s8s32x_convolution.cpp.o
      mkldnn::impl::cpu::_gemm_u8s8s32x_convolution_fwd_t<true, (mkldnn_data_type_t)6>::execute_forward() in gemm_u8s8s32x_convolution.cpp.o
      mkldnn::impl::cpu::_gemm_u8s8s32x_convolution_fwd_t<false, (mkldnn_data_type_t)1>::execute_forward() in gemm_u8s8s32x_convolution.cpp.o
      mkldnn::impl::cpu::_gemm_u8s8s32x_convolution_fwd_t<false, (mkldnn_data_type_t)2>::execute_forward() in gemm_u8s8s32x_convolution.cpp.o
      mkldnn::impl::cpu::_gemm_u8s8s32x_convolution_fwd_t<false, (mkldnn_data_type_t)5>::execute_forward() in gemm_u8s8s32x_convolution.cpp.o
      ...
  "_cblas_saxpy", referenced from:
      mkldnn::impl::cpu::gemm_inner_product_fwd_t<(mkldnn_data_type_t)1>::execute_forward() in gemm_inner_product.cpp.o
  "_cblas_sgemm", referenced from:
      mkldnn::impl::cpu::_gemm_convolution_fwd_t<true, false, (mkldnn::impl::cpu::cpu_isa_t)0>::execute_forward() in gemm_convolution.cpp.o
      mkldnn::impl::cpu::_gemm_convolution_fwd_t<false, false, (mkldnn::impl::cpu::cpu_isa_t)0>::execute_forward() in gemm_convolution.cpp.o
      mkldnn::impl::cpu::_gemm_convolution_bwd_data_t<false, (mkldnn::impl::cpu::cpu_isa_t)0>::execute_backward_data() in gemm_convolution.cpp.o
      mkldnn::impl::cpu::_gemm_convolution_bwd_weights_t<false, (mkldnn::impl::cpu::cpu_isa_t)0>::execute_backward_weights() in gemm_convolution.cpp.o
      mkldnn::impl::cpu::gemm_inner_product_fwd_t<(mkldnn_data_type_t)1>::execute_forward() in gemm_inner_product.cpp.o
      mkldnn::impl::cpu::gemm_inner_product_bwd_data_t<(mkldnn_data_type_t)1>::execute_backward_data() in gemm_inner_product.cpp.o
      mkldnn::impl::cpu::gemm_inner_product_bwd_weights_t<(mkldnn_data_type_t)1>::execute_backward_weights() in gemm_inner_product.cpp.o
      ...
  "_cblas_sgemm_alloc", referenced from:
      mkldnn::impl::cpu::_ref_rnn_common_t<(mkldnn_prop_kind_t)64>::pack_weights(int, int, int, int, int, int, int, float**, float const*) in ref_rnn.cpp.o
      mkldnn::impl::cpu::_ref_rnn_common_t<(mkldnn_prop_kind_t)128>::pack_weights(int, int, int, int, int, int, int, float**, float const*) in ref_rnn.cpp.o
  "_cblas_sgemm_compute", referenced from:
      mkldnn::impl::cpu::_ref_rnn_common_t<(mkldnn_prop_kind_t)64>::packed_gemm(int, int, int, int, int, int, int, int, int, float const*, float*, float*, bool, float) in ref_rnn.cpp.o
      mkldnn::impl::cpu::_ref_rnn_common_t<(mkldnn_prop_kind_t)128>::packed_gemm(int, int, int, int, int, int, int, int, int, float const*, float*, float*, bool, float) in ref_rnn.cpp.o
  "_cblas_sgemm_free", referenced from:
      mkldnn::impl::cpu::_ref_rnn_common_t<(mkldnn_prop_kind_t)64>::free_packed_weights(int, int, float**) in ref_rnn.cpp.o
      mkldnn::impl::cpu::_ref_rnn_common_t<(mkldnn_prop_kind_t)128>::free_packed_weights(int, int, float**) in ref_rnn.cpp.o
  "_cblas_sgemm_pack", referenced from:
      mkldnn::impl::cpu::_ref_rnn_common_t<(mkldnn_prop_kind_t)64>::pack_weights(int, int, int, int, int, int, int, float**, float const*) in ref_rnn.cpp.o
      mkldnn::impl::cpu::_ref_rnn_common_t<(mkldnn_prop_kind_t)128>::pack_weights(int, int, int, int, int, int, int, float**, float const*) in ref_rnn.cpp.o
  "_cblas_sscal", referenced from:
      mkldnn::impl::cpu::ref_softmax_fwd_t<(mkldnn_data_type_t)1>::execute_forward_dense() in ref_softmax.cpp.o
      mkldnn::impl::cpu::ref_softmax_fwd_t<(mkldnn_data_type_t)1>::_scal(int, float, float*) in ref_softmax.cpp.o
  "_vsExp", referenced from:
      mkldnn::impl::cpu::ref_softmax_fwd_t<(mkldnn_data_type_t)1>::execute_forward_dense() in ref_softmax.cpp.o
      mkldnn::impl::cpu::ref_softmax_fwd_t<(mkldnn_data_type_t)1>::_exp(int, float const*, float*) in ref_softmax.cpp.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [mkl-dnn/src/libmkldnn.0.14.0.dylib] Error 1
make[1]: *** [mkl-dnn/src/CMakeFiles/mkldnn.dir/all] Error 2
make: *** [all] Error 2
running build
[The same CMake warnings and mkldnn linker errors repeat verbatim for the build step, again ending with make: *** [all] Error 2.]
running build_py
running build_ext
[The same CMake warnings and mkldnn linker errors repeat verbatim a third time for the build_ext step.]
building 'ideep4py._ideep4py' extension
swigging ideep4py/py/ideep4py.i to ideep4py/py/ideep4py_wrap.cpp
swig -python -c++ -builtin -modern -modernargs -Iideep4py/py/mm -Iideep4py/py/primitives -Iideep4py/py/swig_utils -o ideep4py/py/ideep4py_wrap.cpp ideep4py/py/ideep4py.i
gcc -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -stdlib=libc++ -Iideep4py/include -Iideep4py/include/mklml -Iideep4py/include/ideep -Iideep4py/py/mm -Iideep4py/py/primitives -I/Users/nogu-atsu/.pyenv/versions/anaconda3-4.3.1/include/python3.6m -I/Users/nogu-atsu/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/site-packages/numpy/core/include -c ideep4py/py/ideep4py_wrap.cpp -o build/temp.macosx-10.7-x86_64-3.6/ideep4py/py/ideep4py_wrap.o -std=c++11 -Wno-unknown-pragmas -march=native -mtune=native -D_TENSOR_MEM_ALIGNMENT_=4096
In file included from ideep4py/py/ideep4py_wrap.cpp:3920:
ideep4py/py/mm/mdarray.h:41:10: fatal error: 'ideep.hpp' file not found
#include "ideep.hpp"
         ^~~~~~~~~~~
1 error generated.
error: command 'gcc' failed with exit status 1

How can I solve this?

question: mkl-dnn upgrade for PyTorch

Hi Intel ideep team, thanks for the awesome work of integrating mkl-dnn with PyTorch. 😃

I noticed that the pytorch_dnnl branch is used for this integration, and the latest commit shows that ideep uses version 1.5 of mkl-dnn.

$ cd /root/pytorch/third_party/ideep
$ git log
commit 938cc68897bb46b8d4b228966edd9e23e471cf3b (HEAD, origin/pytorch_dnnl)
Author: pinzhenx <[email protected]>
Date:   Tue Jun 16 18:54:21 2020 +0000

    bump onednn to v1.5

Our application runs PyTorch on Intel CPU architecture, and we rely on mkl-dnn for better performance.

We'd like to know when ideep will upgrade mkl-dnn for PyTorch. Do you have a timetable for doing so?

What's the relationship between IDEEP and MKLDNN

Hi folks,
I haven't dug very deep into either the IDEEP or MKLDNN codebase yet, so just a high-level question: what is the relationship between IDEEP and MKLDNN? Is IDEEP just a wrapper around MKLDNN primitives to make writing DNN operators easier? Thanks.

which ideep branch is meant for pytorch integration?

Hi, I see several branches in ideep with pytorch references; for example, I see pytorch-internal and pytorch_dnnl.
Currently I have this PR against pytorch-internal, and I'm looking to get it into PyTorch 1.13. Please let me know if this is not the correct branch; I'd appreciate any feedback on the PR. Thank you!

Recent force push means all submodule pointers broke

Recently the ideep repository had a force push from https://github.com/pytorch/ideep/commit/fb1adc449de61b56e92f8a81e02b91c068209f47 to 526cf81.

This generally makes downstream users grumpy because it means that their submodule pointers stop working with:

fatal: reference is not a tree: fb1adc449de61b56e92f8a81e02b91c068209f47
Unable to checkout 'fb1adc449de61b56e92f8a81e02b91c068209f47' in submodule path 
'third_party/ideep'

Generally, it's better to avoid force pushes. But if there is no other option, I'd recommend keeping the old commits reachable with a tag.

Big performance penalty due to repeated reordering of weights

We noticed that IDEEP spends quite a bit of time reordering tensors. This problem is especially obvious when the input size is small (e.g. batch_size=1). After some profiling, I noticed that during our Caffe2 run, most of the time is spent reordering tensors, weight tensors in particular. This does not make sense; it should be done only once, as suggested in the MKL-DNN example (https://github.com/intel/mkl-dnn/blob/master/examples/simple_net.cpp#L798). Hopefully we can resolve this issue, as it is killing performance and makes IDEEP noncompetitive against the MKLML ops.

Code of interest:

if (src.get_descriptor() != comp.expected_src_descriptor()) {
  src_in.init<alloc, convolution_forward>(
      comp.expected_src_descriptor());
  reorder::compute(src, src_in);
}
if (weights.get_descriptor() != comp.expected_weights_descriptor()) {
  weights_in.init<alloc, convolution_forward>(
      comp.expected_weights_descriptor());
  reorder::compute(weights, weights_in);
}

@gujinghui @jgong5
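
For reference, here is a minimal self-contained sketch of the reorder-once pattern the MKL-DNN example recommends. Tensor, Desc, and reorder() are simplified stand-ins for ideep::tensor, descriptors, and reorder::compute, not the library's actual types:

#include <map>
#include <utility>

struct Desc {
  int format;
  bool operator!=(const Desc &o) const { return format != o.format; }
};
struct Tensor {
  Desc desc;
  const void *data = nullptr;
};

// Stand-in for the expensive reorder::compute call in the snippet above.
Tensor reorder(const Tensor &src, const Desc &dst_desc) {
  return Tensor{dst_desc, src.data};
}

// Cache keyed by (weight buffer, expected format): the reorder runs only on
// the first forward pass; later passes with the same weights reuse the result.
const Tensor &reordered_weights(const Tensor &weights, const Desc &expected) {
  static std::map<std::pair<const void *, int>, Tensor> cache;
  auto key = std::make_pair(weights.data, expected.format);
  auto it = cache.find(key);
  if (it == cache.end())
    it = cache.emplace(key, reorder(weights, expected)).first;
  return it->second;
}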

question: project status?

ideep contains an ancient copy of mkl-dnn (a 0.x version), while the latest version is oneDNN (1.x, or 2.x beta). What's the status of this project? And do you have a plan to upgrade the embedded copy of oneDNN?

I'm asking because I'm preparing packages for the official Debian archive, and I don't want to deal with inactive projects.
pytorch/pytorch#37332

Thanks in advance :-)

How to install ideep4py on Windows?

I cannot install ideep4py on Windows.

Is it possible to install ideep4py on Windows?
If it is possible, could you tell me how to install it?
Intel MKL is already installed on Windows.

Can you give me advice?

Build ideep with installed mkl-dnn?

Hi, I just wonder if I can build ideep with an already-installed mkl-dnn.
Currently, I get the following error message if I don't git clone --recursive:

CMake Error at cmake/mkldnn.cmake:16 (add_subdirectory):
  The source directory

    ....../ideep/mkl-dnn

  does not contain a CMakeLists.txt file.
Call Stack (most recent call first):
  CMakeLists.txt:13 (include)

I'd prefer to build ideep against the newest mkl-dnn, which has already been installed.

Cheers,
Pei

How to use Conv+Sum+Relu fusion?

Hi, folks,
I'm trying to make sense of this constraint for the ConvFusion (iattr::residue) op:
https://github.com/pytorch/pytorch/blob/150af6ac1eaedf8aa2ca2a1ca9938bfb3d24d1c5/caffe2/ideep/operators/conv_fusion_op.cc#L144

Basically, what it means is that we need to overwrite the input S with the output. How does this even work in inference? After one run, the value of my S has changed, and a subsequent run will observe different weights. It doesn't look correct to me. Any thoughts? @4pao

Make the Windows build functional

Hi! I tried to compile ideep with MSVC, but met the following kinds of issues:

  1. symbol annotation: __attribute__((visibility("default"))) -> __declspec(dllexport) / __declspec(dllimport) (see the sketch after this list)
  2. unsupported OpenMP clauses: #ifndef _MSC_VER
  3. unsupported C99 VLAs in MSVC: int a[c]; -> int* a = new int[c];
  4. inline assembly code
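
A common portability shim for the first kind of issue looks like this. It is a sketch, not ideep's actual macro; IDEEP_API and BUILDING_IDEEP_DLL are hypothetical names:

#if defined(_MSC_VER)
  #ifdef BUILDING_IDEEP_DLL              // hypothetical build-system define
    #define IDEEP_API __declspec(dllexport)
  #else
    #define IDEEP_API __declspec(dllimport)
  #endif
#else
  #define IDEEP_API __attribute__((visibility("default")))
#endif

IDEEP_API void some_exported_function();  // example usage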

I'm able to resolve the first three kinds of issues but am blocked by the last one, because I have little knowledge of assembly code. I really think you should support ideep on Windows, because it is heavily used in deep learning libraries like PyTorch [1]. Without that support, even if MKLDNN itself is supported on Windows, we cannot actually use it in those frameworks. Doesn't that sound a little bit weird? What's more, the work shouldn't take too much time, because I was able to solve issue kinds 1-3 in only 3-4 hours.

References:
[1] pytorch/pytorch#15982

Any possibility to implement ChannelShuffle op?

Hi folks,
ShuffleNet is a pretty popular model (https://arxiv.org/abs/1707.01083), and we tried to run it on IDEEP ops. The only thing MKLDNN doesn't support is the ChannelShuffle primitive, so we ended up falling back to our own implementation and paying for the context switch. We noticed that MKLDNN's depthwise CNN really helps in this model, but the performance is hampered by the context switch, which manifests as reordering of inputs at the conv op. Since channel shuffle is conceptually similar to reorder, is there any possibility of adding support for it? Here is a reference implementation of ChannelShuffle, just to give an idea of what it is: https://github.com/pytorch/pytorch/blob/master/caffe2/operators/channel_shuffle_op.h#L14-L64

Conceptually, you can think of it as splitting the channel dimension c into two dimensions g and k (c = g * k), transposing g and k, and merging them back into c.
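
For concreteness, here is a self-contained reference sketch of exactly that g/k transpose over an NCHW buffer (illustrative only, not the Caffe2 implementation linked above):

// x and y are NCHW buffers; c must be divisible by g, and hw = H * W.
void channel_shuffle(const float *x, float *y, int n, int c, int hw, int g) {
  const int k = c / g;
  for (int i = 0; i < n; ++i)
    for (int gi = 0; gi < g; ++gi)
      for (int ki = 0; ki < k; ++ki)
        for (int s = 0; s < hw; ++s)
          // Output channel ki*g + gi reads input channel gi*k + ki:
          // the (g, k) view of the channel axis is transposed to (k, g).
          y[(i * c + ki * g + gi) * hw + s] =
              x[(i * c + gi * k + ki) * hw + s];
}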

oneDNN compatibility?

It looks like DNNL is now oneDNN. Will ideep be upgraded to be compatible with oneDNN?

Thank you...

file INSTALL cannot find "/opt/intel/mkl/include/../lib/libmklml_intel.so"

  1. Almost there, but installing ideep still failed. Where is mklml?
CMake Error at cmake_install.cmake:45 (file):
  file INSTALL cannot find "/opt/intel/mkl/include/../lib/libmklml_intel.so".


Makefile:88: recipe for target 'install' failed
make: *** [install] Error 1
building 'ideep4py._ideep4py' extension
swigging ideep4py/py/ideep4py.i to ideep4py/py/ideep4py_wrap.cpp
swig -python -c++ -builtin -modern -modernargs -Iideep4py/py/mm -Iideep4py/py/primitives -Iideep4py/py/swig_utils -o ideep4py/py/ideep4py_wrap.cpp ideep4py/py/ideep4py.i
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fdebug-prefix-map=/build/python3.6-EKG1lX/python3.6-3.6.5=. -specs=/usr/share/dpkg/no-pie-compile.specs -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -Iideep4py/include -Iideep4py/include/mklml -Iideep4py/include/ideep -Iideep4py/py/mm -Iideep4py/py/primitives -I/usr/include/python3.6m -I~/.local/lib/python3.6/site-packages/numpy/core/include -c ideep4py/py/ideep4py_wrap.cpp -o build/temp.linux-x86_64-3.6/ideep4py/py/ideep4py_wrap.o -std=c++11 -Wno-unknown-pragmas -march=native -mtune=native -D_TENSOR_MEM_ALIGNMENT_=4096 -fopenmp
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
In file included from ideep4py/py/ideep4py_wrap.cpp:3920:0:
ideep4py/py/mm/mdarray.h:41:10: fatal error: ideep.hpp: No such file or directory
 #include "ideep.hpp"
          ^~~~~~~~~~~
compilation terminated.
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
  2. And if I carry out the installation from within the out-of-source build folder by running
    sudo make install

it's odd that the ideep installation always removes the mkl symbolic link.

Any further suggestions?
