ornl-qci / exatn
Hierarchical Tensor Networks at Exascale
License: BSD 3-Clause "New" or "Revised" License
Right now the tensor runtime depends on the library exatn-runtime-graph, which provides tensor_exec_state.cpp. This runtime-graph library is installed in plugins/ because it also provides the graph implementation plugins.
We need to separate tensor_graph.hpp and tensor_exec_state.* from the specific graph implementation plugins. The former should be installed in lib/ and the latter in plugins/.
When I build exatn with mpich, it builds fine. However, make install still sets the runtime path to openmpi, which is probably picked up by CMake's automatic MPI finder. We need to make sure that CMake no longer finds MPI on its own but uses the MPI implementation provided by the user.
D = Q7 * Q8, where
Q7 and Q8 reside on {0,1},
D resides on 0.
Process 0 replicates Q7 and Q8 from process 1, but D is local.
Current storage domain rule: Tensor operand residence domains must coincide with the execution domain.
Corrected storage domain rule: (1) The execution domain must have access to complete tensor operands; (2) The residence domain of the output tensors must not exceed the execution domain.
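A toy sketch of the corrected rule (illustrative only; domains are modeled here as plain sets of process ranks, not the actual ExaTN API):

```python
def check_corrected_rule(exec_domain, output_domains):
    """Corrected rule, part (2): the residence domain of every output
    tensor must not exceed the execution domain. Part (1) is satisfied
    by replicating input operands into the execution domain, as process 0
    does with Q7 and Q8 above."""
    return all(set(dom) <= set(exec_domain) for dom in output_domains)

# D = Q7 * Q8 executed on process 0, with D residing on {0}: allowed
assert check_corrected_rule({0}, [{0}])
# an output residing on {0, 1} could not be produced by execution on {0}
assert not check_corrected_rule({0}, [{0, 1}])
```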
We need to update tpls/CMakeLists.txt to provide a more robust build of the ExaTensor submodule.
After following the prescribed steps, I get the following build error after running this command:
FC=gfortran-9 CC=gcc-9 CXX=g++-9 cmake .. -DEXATN_BUILD_TESTS=TRUE -DPYTHON_INCLUDE_DIR=$(python3 -c "import sysconfig; print(sysconfig.get_paths()['platinclude'])")
-- The C compiler identification is GNU 9.1.0
-- The CXX compiler identification is GNU 9.1.0
-- Checking whether C compiler has -isysroot
-- Checking whether C compiler has -isysroot - yes
-- Checking whether C compiler supports OSX deployment target flag
-- Checking whether C compiler supports OSX deployment target flag - yes
-- Check for working C compiler: /usr/local/bin/gcc-9
-- Check for working C compiler: /usr/local/bin/gcc-9 -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Checking whether CXX compiler has -isysroot
-- Checking whether CXX compiler has -isysroot - yes
-- Checking whether CXX compiler supports OSX deployment target flag
-- Checking whether CXX compiler supports OSX deployment target flag - yes
-- Check for working CXX compiler: /usr/local/bin/g++-9
-- Check for working CXX compiler: /usr/local/bin/g++-9 -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
CMake Error at CMakeLists.txt:11 (exatn_configure_library_rpath):
Unknown CMake command "exatn_configure_library_rpath".
CMake Warning (dev) in CMakeLists.txt:
No cmake_minimum_required command is present. A line of code such as
cmake_minimum_required(VERSION 3.14)
should be added at the top of the file. The version specified may be lower
if you wish to support older CMake versions for this project. For more
information run "cmake --help-policy CMP0000".
This warning is for project developers. Use -Wno-dev to suppress it.
-- Configuring incomplete, errors occurred!
import numpy as np
import exatn

a = np.array([
[1., 0, 0],
[0., 1, 1]
])
b = np.array([
[1., 0, 3, 0],
[1, 1, 2, 2],
[-1, 1, -2, 2],
])
exatn.createTensor('C1', [2, 4], 0.)
exatn.createTensor('A1', np.array(a, copy=True)) # copy to prevent exatn from changing the array data
exatn.createTensor('B1', np.array(b, copy=True))
exatn.evaluateTensorNetwork('test', 'C1(a, c) = A1(a, b) * B1(b, c)')
c1_exatn = exatn.getLocalTensor('C1')
assert np.allclose(c1_exatn, np.dot(a, b)) # raises assertion error
"""
The result should be
[[1 0 3 0],
[0 2 0 4]]
Easily checkable by hand for above example matrices
"""
The data received back from exatn by exatn.getLocalTensor(name) matches the original array.
exatn.initTensorRnd
exatn.createTensor('Xr', [2, 3], 0)
exatn.createTensor('Yr', [3, 4], 0)
exatn.initTensorRnd('Xr')
exatn.initTensorRnd('Yr')
xr = exatn.getTensorData('Xr')
yr = exatn.getTensorData('Yr')
zr = np.dot(xr, yr)
exatn.createTensor('Zr', [2, 4], 0)
exatn.evaluateTensorNetwork('rnd', 'Zr(a, b) = Xr(a, c) * Yr(c, b)')
zr_exatn = exatn.getTensorData('Zr')
assert np.allclose(zr_exatn, zr), "Results do not match!" #Raises as well
Having found this issue, I tried to understand what numbers exatn returns: they were correct for the scalar product of vectors and somewhat correct for matrix-vector multiplication. I noticed that for square matrices, when computing A*B, exatn returns B*A. However, it turns out to be more complicated: rectangular matrices do not merely produce an index mismatch, they produce completely different numbers. It turns out exatn re-arranges data in an interesting way: it applies a function S to each input and S^-1 to the output, as defined below.
Continuing from the previous code snippet:
def S(x):
    """ Reshape to the reverse-ordered shape,
    then transpose to match the original shape.
    """
    x = x.reshape(*reversed(x.shape))
    x = x.transpose()  # reverses the order of axes by default
    return x

def inv_S(x):
    """ Inverse of S. """
    x = x.transpose()  # reverses the order of axes by default
    x = x.reshape(*reversed(x.shape))
    return x
adj_zr = inv_S(np.dot(S(xr), S(yr)))
assert np.allclose(zr_exatn, adj_zr) # Will NOT raise!
>>> S(np.array([
[1, 2, 3, 4],
[5, 6, 7, 8]
]))
array([[1, 3, 5, 7],
[2, 4, 6, 8]])
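For reference, S is exactly a reinterpretation of the same flat row-major buffer in column-major (Fortran) order; a quick self-contained NumPy check:

```python
import numpy as np

def S(x):
    """Reshape to the reverse-ordered shape, then transpose back."""
    return x.reshape(*reversed(x.shape)).transpose()

x = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8]])
s = S(x)
# same flat (row-major) buffer, reread in column-major order
f = x.ravel().reshape(x.shape, order='F')
assert (s == f).all()
assert s.tolist() == [[1, 3, 5, 7], [2, 4, 6, 8]]
```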
The same issue is observed when contracting larger tensor networks.
x, y, z = [np.random.randn(*sh) for sh in [
(2, 3),
(3, 2, 2),
(2, 2, 2)
]] # generate some random data
# %%
exatn.createTensor('X', np.array(x, copy=True))
exatn.createTensor('Y', np.array(y, copy=True))
exatn.createTensor('Z', np.array(z, copy=True))
tn = exatn.TensorNetwork('test2')
tn.appendTensor(1, 'X')
tn.appendTensor(2, 'Y', [(1, 0)])
tn.appendTensor(3, 'Z', [(1, 0), (2,1)])
tn.printIt()
# This is equivalent to exatn.evaluateTensorNetwork('test2', 'F0(a,b) = X(a,c) * Y(c,d,e) * Z(d,e,b)')
exatn.evaluate(tn)
result = tn.getTensor(0)
result_name = result.getName()
result_data = exatn.getTensorData(result_name)
### Compare results to einsum
einsum_data = np.einsum('ij,jkl,klm->im', x, y, z)
assert np.allclose(einsum_data, result_data) # Raises
### Fix numerics with reshape-transform
einsum_data_adj = inv_S(np.einsum('ij,jkl,klm->im', S(x), S(y), S(z)))
assert np.allclose(einsum_data_adj, result_data) # Does not raise!
Note that S(A)*S(B) produces completely different numbers from A*B; it is not just an issue of indexing/shaping convention. To obtain A*B from exatn one has to compute S(S^-1(A) * S^-1(B)).
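Modeling the observed behavior in plain NumPy (exatn_dot below is only a stand-in reproducing the observations above, not the actual library call), the work-around looks like this:

```python
import numpy as np

def S(x):
    return x.reshape(*reversed(x.shape)).transpose()

def inv_S(x):
    x = x.transpose()
    return x.reshape(*reversed(x.shape))

def exatn_dot(a, b):
    # stand-in reproducing what exatn appears to compute, NOT the library
    return inv_S(np.dot(S(a), S(b)))

rng = np.random.default_rng(0)
a = rng.standard_normal((2, 3))
b = rng.standard_normal((3, 4))

# work-around: wrap the inputs in S^-1 and the output in S
fixed = S(exatn_dot(inv_S(a), inv_S(b)))
assert np.allclose(fixed, np.dot(a, b))
```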
All the tests and examples in the python folder use tensors that are fixed points of S. They are either constant-valued (simple.py, hamiltonian.py, ...) or diagonal and square (quantum_circuit_network.py, large_circuit.py, ...) and hence do not change when reshaped and transposed.
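A quick check that such tensors are indeed fixed points of S (S as defined above):

```python
import numpy as np

def S(x):
    return x.reshape(*reversed(x.shape)).transpose()

d = np.diag([1.0, 2.0, 3.0])      # square and diagonal
c = np.full((2, 3, 4), 0.5)       # constant-valued
assert (S(d) == d).all()
assert (S(c) == c).all()
```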
I looked through the C++ tests in src/exatn/tests/NumServerTester.cpp and it looks like the same problem is there as well; for example, on line 1332 all tensors are again square and diagonal.
What implications does this have on cache efficiency?
Exatn was built from latest devel branch (commit 7c722c5)
More detailed examples at https://github.com/danlkv/QTensor/blob/dev/scratchpad/exatn/exatn_demo.py
The ExaTN build with cuTensor has a bug on the ExaTN side. However, the ExaTN build with cuTensorNet and cuTensor works correctly via cuTensorNet.
I'm trying to build exatn so I can use it in conjunction with tnqvm, but I've hit a couple of issues.
First, I had to comment out lines 17-31 in tpls/CMakeLists.txt, where it does git submodule update --init --recursive, as git complains that this should be run in a top-level directory and returns a non-zero value. Given the advice in the README to run git submodule update --init --recursive in the correct place, I suspect this part of the CMake is redundant anyway.
Second, I can get the libraries to build, but fail at runtime with an error while loading shared libraries, specifically libtalsh. In the link line during the build, talsh is linked with a full path to its relative location, but this is no longer valid once moved to the install directory. I can fix it by manually relinking with -L<path-to-install-directory>/lib -ltalsh, but I'm hoping for a CMake solution...
All help appreciated!
Relevant extract from ldd libexatn.so:
../../../installdir/lib/libtalsh.so => not found
../../../../installdir/lib/libtalsh.so => not found
../../../installdir/lib/libtalsh.so => not found
../../../installdir/lib/libtalsh.so => not found
I'd like to create an executable called exatn-config that downstream users can leverage in standard Makefile builds. For downstream CMake users, we will have them just use exatn::exatn targets, but for Makefile users we need to be able to provide the include headers and link line that they can integrate in their makefiles. My inspiration for this is the llvm-config executable.
Users should be able to run commands like this
$ exatn-config --libs (to return the link line)
$ exatn-config --includes (to get the include paths)
$ exatn-config --ldflags (to get any linker flags)
$ exatn-config --cxxflags (to get all cxx flags)
I think we could do this pretty easily with a Python script. We could reference the necessary CMake variables in that script and use CMake's configure_file() to substitute the concrete values.
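A rough sketch of what such a generated script could look like. The @...@ placeholders are what CMake's configure_file() would substitute at install time; the flag values below are purely illustrative, not the real ExaTN link line:

```python
#!/usr/bin/env python3
import sys

# configure_file() would replace this placeholder with the install prefix
PREFIX = "@CMAKE_INSTALL_PREFIX@"

# illustrative values only; the real lists come from CMake variables
FLAGS = {
    "--libs":     f"-L{PREFIX}/lib -lexatn -ltalsh",
    "--includes": f"-I{PREFIX}/include/exatn",
    "--ldflags":  f"-Wl,-rpath,{PREFIX}/lib",
    "--cxxflags": "-std=c++14 -fopenmp",
}

def exatn_config(args):
    """Return the requested lines, mirroring llvm-config's behavior."""
    return [FLAGS[a] for a in args if a in FLAGS]

if __name__ == "__main__":
    print("\n".join(exatn_config(sys.argv[1:])))
```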
Setting CMAKE_INSTALL_PREFIX installs all libraries as expected; however, this causes tests to fail at runtime with the following error:
[service-registry] Could not open plugin directory.
Could not find service with name eager-dag-executor. Perhaps the service is not Identifiable.
#ERROR(exatn::service): Invalid ExaTN service: eager-dag-executor in the Service Registry.
Could not find service with name boost-digraph. Perhaps the service is not Identifiable.
#ERROR(exatn::service): Invalid ExaTN service: boost-digraph in the Service Registry.
[login2:32431] *** Process received signal ***
[login2:32431] Signal: Segmentation fault (11)
[login2:32431] Signal code: Address not mapped (1)
[login2:32431] Failing at address: (nil)
This is due to exatn::initialize() calling serviceRegistry->initialize with no arguments, which in turn causes the exatnPluginPath to be set to $HOME/.exatn/plugins by default.
Trying to run cmake for ExaTn using Intel's C/C++/Fortran compilers indicates that they are unsupported, then throws an error.
OS Information:
Linux version 3.10.0-957.10.1.el7.x86_64 ([email protected]) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-36) (GCC) ) #1 SMP Thu Feb 7 07:12:53 UTC 2019
(base) build$ CC=icc CXX=icpc FC=ifort cmake .. -DEXATN_BUILD_TESTS=TRUE -DCMAKE_BUILD_TYPE=Release \
-DPATH_INTEL_ROOT=$MKLROOT/.. -DBLAS_LIB=MKL -DCMAKE_C_COMPILER=icc -DCMAKE_CXX_COMPILER=icpc \
-DCMAKE_Fortran_COMPILER=ifort -DCMAKE_INSTALL_PREFIX=~/.exatn
-- The CXX compiler identification is Intel 19.1.0.20191121
-- The Fortran compiler identification is Intel 19.1.0.20191121
-- Check for working CXX compiler: /soft/compilers/intel-2020/compilers_and_libraries_2020.0.166/linux/bin/intel64/icpc
-- Check for working CXX compiler: /soft/compilers/intel-2020/compilers_and_libraries_2020.0.166/linux/bin/intel64/icpc -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Check for working Fortran compiler: /soft/compilers/intel-2020/compilers_and_libraries_2020.0.166/linux/bin/intel64/ifort
-- Check for working Fortran compiler: /soft/compilers/intel-2020/compilers_and_libraries_2020.0.166/linux/bin/intel64/ifort -- works
-- Detecting Fortran compiler ABI info
-- Detecting Fortran compiler ABI info - done
-- Checking whether /soft/compilers/intel-2020/compilers_and_libraries_2020.0.166/linux/bin/intel64/ifort supports Fortran 90
-- Checking whether /soft/compilers/intel-2020/compilers_and_libraries_2020.0.166/linux/bin/intel64/ifort supports Fortran 90 -- yes
-- Found OpenMP_CXX: -qopenmp (found version "5.0")
-- Found OpenMP_Fortran: -qopenmp (found version "5.0")
-- Found OpenMP: TRUE (found version "5.0")
-- Found MPI_CXX: /home/cibrahim/anaconda3/lib/libmpi.so (found version "3.1")
-- Found MPI_Fortran: /home/cibrahim/anaconda3/lib/libmpi_usempif08.so (found version "3.1")
-- Found MPI: TRUE (found version "3.1")
-- MPIRUN: /home/cibrahim/anaconda3/bin/mpiexec
-- The C compiler identification is Intel 19.1.0.20191121
-- Check for working C compiler: /soft/compilers/intel-2020/compilers_and_libraries_2020.0.166/linux/bin/intel64/icc
-- Check for working C compiler: /soft/compilers/intel-2020/compilers_and_libraries_2020.0.166/linux/bin/intel64/icc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
CMake Warning at tpls/cppmicroservices/CMakeLists.txt:108 (message):
You are using an unsupported compiler! Compilation has only been tested
with Clang (Linux or Apple), GCC and MSVC.
CMake Error at tpls/cppmicroservices/CMakeLists.txt:111 (if):
if given arguments:
"CMAKE_CXX_COMPILER_VERSION" "AND" "CMAKE_CXX_COMPILER_VERSION" "VERSION_LESS"
Unknown arguments specified
-- Configuring incomplete, errors occurred!
See also "/home/cibrahim/exatn/build/CMakeFiles/CMakeOutput.log".
This method currently only re-orders links, but not the TensorShape and TensorSignature. It also needs an output tensor reset.
ExaTN needs to install TAL-SH headers as well.
I believe you need to update to if(NOT EXISTS "${BLAS_PATH}/libblas.so")
For some reason, ExaTN's CMake is still discovering MPI (see below). Didn't we turn that off to make sure the MPI implementation is determined by the user?
-- Found MPI_CXX: /usr/local/mpi/openmpi/git/lib/libmpi.so (found version "3.1")
-- Found MPI_Fortran: /usr/local/mpi/openmpi/git/lib/libmpi_usempif08.so (found version "3.1")
-- Found MPI: TRUE (found version "3.1")
-- Found CUDA: /usr/local/cuda (found version "10.0")
-- MPIRUN: /usr/local/mpi/openmpi/git/bin/mpiexec
src/runtime/tests/TensorRuntimeTester
terminate called after throwing an instance of 'std::runtime_error'
what(): Bundle#5 start failed: libtalsh.so: cannot open shared object file: No such file or directory
Aborted (core dumped)
Hi dear authors, when I tried to build the project, an error occurred at the final stage.
My build instructions: CC=gcc CXX=g++ FC=gfortran cmake .. -DCMAKE_BUILD_TYPE=Release -DEXATN_BUILD_TESTS=TRUE -DBLAS_LIB=ATLAS -DBLAS_PATH=/usr/lib/x86_64-linux-gnu/blas
The error information:
Building CXX object src/parser/syntax_handler/CMakeFiles/taprol-syntax-handler.dir/taprol_syntax_handler.cpp.o
/home/lhp/exatn/src/parser/syntax_handler/taprol_syntax_handler.cpp:16:50: error: expected class-name before ‘{’ token
16 | class TaProlSyntaxHandler : public SyntaxHandler {
| ^
/home/lhp/exatn/src/parser/syntax_handler/taprol_syntax_handler.cpp: In constructor ‘{anonymous}::TaProlSyntaxHandler::TaProlSyntaxHandler()’:
/home/lhp/exatn/src/parser/syntax_handler/taprol_syntax_handler.cpp:18:27: error: class ‘{anonymous}::TaProlSyntaxHandler’ does not have any field named ‘SyntaxHandler’
18 | TaProlSyntaxHandler() : SyntaxHandler("taprol") {}
| ^~~~~~~~~~~~~
/home/lhp/exatn/src/parser/syntax_handler/taprol_syntax_handler.cpp: In member function ‘void {anonymous}::TaProlSyntaxHandler::GetReplacement(clang::Preprocessor&, clang::Declarator&, clang::CachedTokens&, llvm::raw_string_ostream&)’:
/home/lhp/exatn/src/parser/syntax_handler/taprol_syntax_handler.cpp:44:11: error: ‘getDeclText’ was not declared in this scope
44 | OS << getDeclText(PP, D) << "{\n";
| ^~~~~~~~~~~
/home/lhp/exatn/src/parser/syntax_handler/taprol_syntax_handler.cpp: At global scope:
/home/lhp/exatn/src/parser/syntax_handler/taprol_syntax_handler.cpp:62:8: error: ‘SyntaxHandlerRegistry’ does not name a type
62 | static SyntaxHandlerRegistry::Add<TaProlSyntaxHandler>
| ^~~~~~~~~~~~~~~~~~~~~
make[2]: *** [src/parser/syntax_handler/CMakeFiles/taprol-syntax-handler.dir/build.make:76: src/parser/syntax_handler/CMakeFiles/taprol-syntax-handler.dir/taprol_syntax_handler.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:2334: src/parser/syntax_handler/CMakeFiles/taprol-syntax-handler.dir/all] Error 2
make: *** [Makefile:146: all] Error 2
It seems that just editing the taprol_syntax_handler.cpp file could solve this problem, but I couldn't figure out how to fix it.
Thank you so much.
Leverage pybind11 to provide a Python API for the numerics package data structures
exatn::sync executed on an exatn::TensorOperation uses the Id assigned during operation submission. This Id refers to a local DAG node. Two potential problems: (1) an attempt to sync a tensor operation that has never been submitted will result in a SegFault since the Id is not set yet; (2) the global exatn::sync is expected to destroy the current DAG, so a later attempt to access its node will result in a SegFault. The proper way to sync a tensor operation is to sync every output operand.
On branch devel: after I updated the TensorMethod interface, which is now called TensorFunctor, two tests (NumServerTester, TensorRuntimeTester) began to fail with the error below. I have no idea why it is complaining now, as I see no relation between CPPMicroservices and my changes. Note that I built the MPI version of ExaTN (gcc/8.2.0, mpich/3.2.1).
terminate called after throwing an instance of 'std::runtime_error'
what(): Bundle#3 start failed: libantlr4-runtime.so.SOVERSION: cannot open shared object file: No such file or directory
Aborted
It looks like when one has two or more Activators in the same folder, they create a multiple-definition problem during linking. Right now the devel branch fails because there are two activators in the runtime/executor folder, specifically NodeExecutorActivator and GraphExecutorActivator. It turns out both generate the same activator ctor and dtor, resulting in a multiple-definition error when linking exatn-runtime. For now I have deactivated both Activators in runtime/executor/CMakeLists.txt. This needs to be fixed, and we need to document explicitly how to properly use Activators in the code.
Excuse me, when I run the .py files in python/examples/, I get a failure.
For example, I type the python contraction.py command to run contraction.py.
The result is as follows:
A1 shape: (2, 3)
B1 shape: (3, 4)
Traceback (most recent call last):
File "contraction.py", line 37, in
test_exatn()
File "contraction.py", line 18, in test_exatn
exatn.createTensor('C1', [2, 4], 0.0)
AttributeError: module 'exatn' has no attribute 'createTensor'
What should I do?
Hi, I try to run python/tests/test-tensorRuntime.py and get the following error:
File "test-tensorRuntime.py", line 23, in <module>
create_tensor1.setTensorOperand(tensor1)
TypeError: setTensorOperand(): incompatible function arguments. The following argument types are supported:
1. (self: exatn._pyexatn.TensorOperation, arg0: exatn::numerics::Tensor, arg1: bool) -> None
Invoked with: <exatn._pyexatn.TensorOpCreate object at 0x7f272d4af3b0>, <exatn._pyexatn.Tensor object at 0x7f272d4af2d0>
Changing line 20 from create_tensor0.setTensorOperand(tensor0) to create_tensor0.setTensorOperand(tensor0, True) helps: the same error is then raised on the next line.
Compiled with:
cmake .. -DEXATN_BUILD_TESTS=TRUE -DPYTHON_INCLUDE_DIR=$(python -c "import sysconfig; print(sysconfig.get_paths()['platinclude'])") -DENABLE_CUDA=True
In src/runtime/tensor_runtime.cpp (lines 10-11), the TensorRuntime constructor is supposed to create an instance of TensorGraphExecutor and TensorNodeExecutor via the call to exatn::getService<> from CppMicroServices. However, it results in a SegFault in all CTEST tests, which makes me believe something got screwed up again with the CppMicroServices usage. Lines 10-11 are currently commented out, but we need to uncomment them and fix the SegFaults.
Linking to the BLAS libraries required by ExaTENSOR/TALSH is a mess because there is no standard way of doing so. Below are the details of how different BLAS libraries should be linked, namely which specific environment variables the ExaTENSOR/TALSH Makefile expects for each particular BLAS library. It is highly unlikely we can import this from CMake, so I would recommend requiring the user to explicitly provide the necessary paths for the chosen BLAS library (BLAS_LIB) and then adding the necessary linked libraries for each choice inside CMakeLists.txt:
ATLAS (any default Linux/Mac BLAS):
BLASLIB=ATLAS
PATH_BLAS_ATLAS (where libblas.so is)
Linked libraries: libblas
MKL (Intel MKL):
BLASLIB=MKL
PATH_INTEL (intel root directory, for example /opt/intel)
Linked libraries (GNU compiler): libmkl_intel_lp64, libmkl_gnu_thread, libmkl_core, libpthread, libm, libdl
Linked libraries (Intel compiler): libmkl_intel_lp64, libmkl_intel_thread, libmkl_core, libpthread, libm, libdl, libiomp5
ACML:
BLASLIB=ACML
PATH_BLAS_ACML (where libacml_mp.so is)
Linked libraries: libacml_mp
ESSL (IBM ESSL on Summit):
BLASLIB=ESSL
PATH_BLAS_ESSL (where libessl.so is)
PATH_IBM_XL_CPP (where IBM XL C++ libraries are)
PATH_IBM_XL_FOR (where IBM XL Fortran libraries are)
PATH_IBM_XL_SMP (where IBM XL SMP libraries are)
Linked libraries: libessl, libxlf90_r, libxlfmath
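The per-BLAS link sets above can be restated as a simple table (Python used here only to restate the mapping and make it checkable; the real logic would live in CMakeLists.txt):

```python
# Link libraries per BLAS choice, as listed above; MKL is split by compiler.
BLAS_LINK_LIBS = {
    "ATLAS": ["blas"],
    "MKL_GNU": ["mkl_intel_lp64", "mkl_gnu_thread", "mkl_core",
                "pthread", "m", "dl"],
    "MKL_INTEL": ["mkl_intel_lp64", "mkl_intel_thread", "mkl_core",
                  "pthread", "m", "dl", "iomp5"],
    "ACML": ["acml_mp"],
    "ESSL": ["essl", "xlf90_r", "xlfmath"],
}

def link_line(blas_lib, blas_path):
    """Build a Makefile-style link line for the chosen BLAS."""
    return f"-L{blas_path} " + " ".join(f"-l{l}" for l in BLAS_LINK_LIBS[blas_lib])

assert link_line("ATLAS", "/usr/lib") == "-L/usr/lib -lblas"
```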
Recently (branch devel) I decided to use talsh::Tensor instead of TensorDenseBlock in tensor_method.hpp (inside the TensorMethod interface, which is now called TensorFunctor). Consequently, I went into exatn-py.cpp and replaced TensorDenseBlock with talsh::Tensor, which necessitated including talshxx.hpp, which in turn includes talsh.h, which is a C header with many symbols in global scope. This C part did not build in exatn-py.cpp, so I got rid of it. But now it does not look like I can refer to talsh::Tensor in the Python bindings (see the commented block there). What would a solution be here?
Branch devel: after gluing together exatn::numerics and exatn::runtime, it looks like the CppMicroServices exatn::getService does not work (it does not discover the tensor runtime services). Three tests fail with the same error.
The build fails under GCC 13 since the implicit inclusion of std headers has changed; see https://gcc.gnu.org/gcc-13/porting_to.html
The pull request is up at ORNL-QCI/CppMicroServices#1
This repo needs to update to the then-merged version.
Sometimes the memory manager runs out of its RAM buffer space, even in CPU-only runs. That is, the index splitting algorithm does not accurately model the RAM buffer fragmentation. In this case, one can fall back to a regular RAM allocator on the Host.
When merging two tensor networks via TensorNetwork::appendTensorNetwork() and TensorNetwork::appendTensorNetworkGate(), we need to enable adjustment of tensor ids in the secondary (appended) tensor network such that all tensors in the resulting tensor network will have unique distinct ids regardless of which ids they had in the secondary tensor network.
Also, the combined tensor network name rules need to be clarified.
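A minimal sketch of the id adjustment this implies (plain Python, not the ExaTN API; names are illustrative):

```python
def remap_secondary_ids(primary_ids, secondary_ids):
    """Give every tensor of the appended (secondary) network a fresh id
    that does not collide with ids already used in the primary network."""
    next_id = max(primary_ids, default=0) + 1
    mapping = {}
    for old_id in secondary_ids:
        mapping[old_id] = next_id
        next_id += 1
    return mapping

# ids 1 and 2 clash with the primary network, so both are renumbered
assert remap_secondary_ids({1, 2, 3}, [1, 2]) == {1: 4, 2: 5}
```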
We need a separate API for retrieving services from CppMicroServices
The path ${CUQUANTUM_PATH}/lib64/ may not be valid for all cuQuantum installations. For example, when I unzip the latest beta version, the path is lib, not lib64.
When running this simple ExaTn benchmark, ExaTn throws an error and becomes unresponsive.
Python seems to be trying to use an anaconda MKL library, despite ExaTn being built with a specific, separate MKL installation.
Output:
(base) $ ipython simple.py
#DEBUG(exatn::runtime::TensorRuntime)[MAIN_THREAD]: DAG executor set to lazy-dag-executor + talsh-node-executor
#DEBUG(exatn::runtime::TalshNodeExecutor): TAL-SH initialized with Host buffer size of 1072693248 bytes
INTEL MKL ERROR: /home/.../anaconda3/lib/libmkl_avx.so: undefined symbol: mkl_sparse_optimize_bsr_trsm_i8.
Intel MKL FATAL ERROR: Cannot load libmkl_avx.so or libmkl_def.so.
ExaTn CMake Configuration:
cmake .. -DEXATN_BUILD_TESTS=TRUE -DCMAKE_BUILD_TYPE=Release -DPATH_INTEL_ROOT=$MKLROOT/.. -DBLAS_LIB=MKL -DCMAKE_INSTALL_PREFIX=~/.exatn
OS Version Information:
Linux version 3.10.0-957.10.1.el7.x86_64 ([email protected]) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-36) (GCC) ) #1 SMP Thu Feb 7 07:12:53 UTC 2019
The ExaTN Driver-rpc test (client + server) is failing because it cannot find service HamiltonianTest for some reason:
~/src/exatn/build_mkl_cuda_openmpi_debug/src/driver-rpc/mpi/tests$ mpiexec -n 1 ./server_test : -n 1 ./client_test
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from client_test
[ RUN ] client_test.checkSimple
Could not find service with name HamiltonianTest. Perhaps the service is not Identifiable.
#ERROR(exatn::service): Invalid ExaTN service: HamiltonianTest in the Service Registry.
client_test: /home/div/src/exatn/src/exatn/./exatn_service.hpp:28: std::shared_ptr<_Tp> exatn::getService(const string&) [with Service = talsh::TensorFunctorexatn::Identifiable; std::__cxx11::string = std::__cxx11::basic_string]: Assertion `false' failed.
[exadesktop:11617] *** Process received signal ***
[exadesktop:11617] Signal: Aborted (6)
[exadesktop:11617] Signal code: (-6)
[mpi-server] starting server at port name 3961454594.0:3924421586
[exadesktop:11617] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x12890)[0x7f70ff98f890]
[exadesktop:11617] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xc7)[0x7f70f903ee97]
[exadesktop:11617] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x141)[0x7f70f9040801]
[exadesktop:11617] [ 3] /lib/x86_64-linux-gnu/libc.so.6(+0x3039a)[0x7f70f903039a]
[exadesktop:11617] [ 4] /lib/x86_64-linux-gnu/libc.so.6(+0x30412)[0x7f70f9030412]
[exadesktop:11617] [ 5] ./client_test[0x408d60]
[exadesktop:11617] [ 6] ./client_test[0x407a59]
[exadesktop:11617] [ 7] /home/div/src/exatn/build_mkl_cuda_openmpi_debug/lib/libgtestd.so(_ZN7testing8internal38HandleSehExceptionsInMethodIfSupportedINS_4TestEvEET0_PT_MS4_FS3_vEPKc+0x65)[0x7f70f9a3d955]
[exadesktop:11617] [ 8] /home/div/src/exatn/build_mkl_cuda_openmpi_debug/lib/libgtestd.so(_ZN7testing8internal35HandleExceptionsInMethodIfSupportedINS_4TestEvEET0_PT_MS4_FS3_vEPKc+0x5a)[0x7f70f9a37745]
[exadesktop:11617] [ 9] /home/div/src/exatn/build_mkl_cuda_openmpi_debug/lib/libgtestd.so(_ZN7testing4Test3RunEv+0xee)[0x7f70f9a16c88]
[exadesktop:11617] [10] /home/div/src/exatn/build_mkl_cuda_openmpi_debug/lib/libgtestd.so(_ZN7testing8TestInfo3RunEv+0x10f)[0x7f70f9a17557]
[exadesktop:11617] [11] /home/div/src/exatn/build_mkl_cuda_openmpi_debug/lib/libgtestd.so(_ZN7testing8TestCase3RunEv+0x107)[0x7f70f9a17bc9]
[exadesktop:11617] [12] /home/div/src/exatn/build_mkl_cuda_openmpi_debug/lib/libgtestd.so(_ZN7testing8internal12UnitTestImpl11RunAllTestsEv+0x2a9)[0x7f70f9a227f9]
[exadesktop:11617] [13] /home/div/src/exatn/build_mkl_cuda_openmpi_debug/lib/libgtestd.so(_ZN7testing8internal38HandleSehExceptionsInMethodIfSupportedINS0_12UnitTestImplEbEET0_PT_MS4_FS3_vEPKc+0x65)[0x7f70f9a3e966]
[exadesktop:11617] [14] /home/div/src/exatn/build_mkl_cuda_openmpi_debug/lib/libgtestd.so(_ZN7testing8internal35HandleExceptionsInMethodIfSupportedINS0_12UnitTestImplEbEET0_PT_MS4_FS3_vEPKc+0x5a)[0x7f70f9a38553]
[exadesktop:11617] [15] /home/div/src/exatn/build_mkl_cuda_openmpi_debug/lib/libgtestd.so(_ZN7testing8UnitTest3RunEv+0xba)[0x7f70f9a213b4]
[exadesktop:11617] [16] ./client_test(_Z13RUN_ALL_TESTSv+0x11)[0x4089e0]
[exadesktop:11617] [17] ./client_test(main+0x3a)[0x408406]
[exadesktop:11617] [18] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7f70f9021b97]
[exadesktop:11617] [19] ./client_test[0x40792a]
[exadesktop:11617] *** End of error message ***
When building with MPI, the ExaTensor build now fails with
make[4]: Entering directory '/home/cades/dev/exatn/tpls/ExaTensor/DDSS'
/usr/bin/gfortran ./OBJ/main.o libddss.a -lgomp -L/usr/lib/openmpi/lib -L. -L. -lstdc++ -o test_ddss.x
libddss.a(service_mpi.o): In function `__service_mpi_MOD_quit':
service_mpi.F90:(.text+0x633): undefined reference to `mpi_wtime_'
service_mpi.F90:(.text+0x83a): undefined reference to `mpi_finalize_'
service_mpi.F90:(.text+0x8dd): undefined reference to `mpi_abort_'
libddss.a(service_mpi.o): In function `__service_mpi_MOD_dil_global_comm_barrier':
service_mpi.F90:(.text+0x9c8): undefined reference to `mpi_barrier_'
service_mpi.F90:(.text+0x9f7): undefined reference to `mpi_allreduce_'
service_mpi.F90:(.text+0xa43): undefined reference to `mpi_barrier_'
service_mpi.F90:(.text+0xa72): undefined reference to `mpi_allreduce_'
...
If I go into that directory and run
mpifort ./OBJ/main.o libddss.a -lgomp -L/usr/lib/openmpi/lib -L. -L. -lstdc++ -o test_ddss.x
it compiles.
It appears that the new
ifeq ($(EXATN_SERVICE),YES)
FCOMP = $(CMAKE_Fortran_COMPILER)
else
FCOMP = $(COMP_PREF) $(FC_$(WRAP))
endif
blocks in the Makefiles also need to take into account whether MPI_LIB is NONE and, when it is not, use the appropriate $(PATH_$(MPI_LIB)_BIN)/mpi{cc,cxx,fort} wrapper.
Hi,
I try to build exatn and run into an undefined reference error.
See the linked gist for the logs and all the versions reported by cmake:
-- The CXX compiler identification is GNU 10.1.0
-- The Fortran compiler identification is GNU 10.1.0
⟩ uname -a
Linux archlinux 5.6.15-arch1-1 #1 SMP PREEMPT Wed, 27 May 2020 23:42:26 +0000 x86_64 GNU/Linux
https://gist.github.com/DaniloZZZ/e391f25e2c8a908d7f7fe533a2134de4
I do have MPI installed and it looks like I'm having the issue described in #28 .
However, I get the same errors on another Windows WSL machine (Ubuntu 20.04), which does not have MPI, and CMake doesn't find it there.
I just pulled a fresh version of the repo on my mac. I configured with
FC=gfortran-8 CXX=g++-8 CC=gcc-8 cmake .. -DEXATN_BUILD_TESTS=TRUE
and ran make, and observed in the ExaTensor build
cd /Users/aqw/exatn/build && /usr/local/lib/python3.7/site-packages/cmake/data/CMake.app/Contents/bin/cmake -E cmake_depends "Unix Makefiles" /Users/aqw/exatn /Users/aqw/exatn/tpls/boost-cmake /Users/aqw/exatn/build /Users/aqw/exatn/build/tpls/boost-cmake /Users/aqw/exatn/build/tpls/boost-cmake/CMakeFiles/Boost_regex.dir/DependInfo.cmake --color=
cd /Users/aqw/exatn/tpls/ExaTensor && /usr/local/lib/python3.7/site-packages/cmake/data/CMake.app/Contents/bin/cmake -E env CPP_GNU=/usr/local/bin/g++-8 CC_GNU=/usr/local/bin/gcc-8 FC_GNU=/usr/local/bin/gfortran-8 GPU_CUDA=NOCUDA MPILIB=NONE BLASLIB=NONE EXA_NO_BUILD=NO PATH_NONE= EXA_TALSH_ONLY=YES EXATN_SERVICE=YES EXA_OS=NO_LINUX make
Dependee "/Users/aqw/exatn/build/tpls/boost-cmake/CMakeFiles/Boost_regex.dir/DependInfo.cmake" is newer than depender "/Users/aqw/exatn/build/tpls/boost-cmake/CMakeFiles/Boost_regex.dir/depend.internal".
Dependee "/Users/aqw/exatn/build/tpls/boost-cmake/CMakeFiles/CMakeDirectoryInformation.cmake" is newer than depender "/Users/aqw/exatn/build/tpls/boost-cmake/CMakeFiles/Boost_regex.dir/depend.internal".
Scanning dependencies of target Boost_regex
/Applications/Xcode.app/Contents/Developer/usr/bin/make -C ./TALSH
g++ -I. -I. -I. -c -O3 -fopenmp -DNO_GPU -DNO_AMD -DNO_PHI -DNO_BLAS -DNO_LINUX -fPIC -DEXATN_SERVICE -std=c++11 timer.cpp -o ./OBJ/timer.o
clang: error: unsupported option '-fopenmp'
make[4]: *** [OBJ/timer.o] Error 1
make[3]: *** [ExaTensor] Error 2
make[2]: *** [tpls/CMakeFiles/exatensor-build] Error 2
make[1]: *** [tpls/CMakeFiles/exatensor-build.dir/all] Error 2
Even though CPP_GNU was set in tpls/CMakeLists.txt, it did not propagate down to TALSH/Makefile. I modified CPP_GNU = g++ in TALSH/Makefile to CPP_GNU ?= g++ and it worked.
Python's numpy.array assumes the row-major storage layout by default, whereas ExaTN tensors are stored in the column-major way. The current pybind11 interface does not account for that and produces invalid NumPy arrays, so it is not usable at all right now. The affected functions are initTensorData, where the supplied C++ std::vector assumes column-major storage, and getLocalTensor, where the returned C++ local tensor copy assumes column-major storage. In both cases these are currently mapped to row-major NumPy arrays, which is invalid. We need to make sure that the incoming/outgoing NumPy arrays are constructed as column-major (Python allows arbitrary striding). Do you know how to do this (how to make all NumPy arrays column-major in our pybind11 interface)?
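On the NumPy side, a column-major array is requested with order='F' or np.asfortranarray (on the pybind11 side, py::array::f_style is the corresponding flag). A minimal illustration of the two layouts:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)    # row-major (C-contiguous) by default
f = np.asfortranarray(a)          # same logical values, column-major storage
assert f.flags['F_CONTIGUOUS']
assert (a == f).all()             # element [i, j] is identical in both

# the underlying flat buffers differ:
assert a.ravel(order='K').tolist() == [0, 1, 2, 3, 4, 5]  # rows first
assert f.ravel(order='K').tolist() == [0, 3, 1, 4, 2, 5]  # columns first
```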