
exatn's Issues

CPPMicroservices fail two tests

On branch devel: after I updated the TensorMethod interface, which is now called TensorFunctor, two tests (NumServerTester, TensorRuntimeTester) began to fail with the error below. I have no idea why it is complaining now, as I see no relation between my changes and CPPMicroservices. Note that I built the MPI version of ExaTN (gcc/8.2.0, mpich/3.2.1).

terminate called after throwing an instance of 'std::runtime_error'
what(): Bundle#3 start failed: libantlr4-runtime.so.SOVERSION: cannot open shared object file: No such file or directory
Aborted

ExaTn Python Example Throws Undefined Symbol error.

When running this simple ExaTn benchmark, ExaTn throws an error and becomes unresponsive.

Python seems to be trying to use an anaconda MKL library, despite ExaTn being built with a specific, separate MKL installation.

Output:

(base) $ ipython simple.py
#DEBUG(exatn::runtime::TensorRuntime)[MAIN_THREAD]: DAG executor set to lazy-dag-executor + talsh-node-executor
#DEBUG(exatn::runtime::TalshNodeExecutor): TAL-SH initialized with Host buffer size of 1072693248 bytes
INTEL MKL ERROR: /home/.../anaconda3/lib/libmkl_avx.so: undefined symbol: mkl_sparse_optimize_bsr_trsm_i8.
Intel MKL FATAL ERROR: Cannot load libmkl_avx.so or libmkl_def.so.

ExaTn CMake Configuration:

cmake ..  -DEXATN_BUILD_TESTS=TRUE -DCMAKE_BUILD_TYPE=Release -DPATH_INTEL_ROOT=$MKLROOT/.. -DBLAS_LIB=MKL -DCMAKE_INSTALL_PREFIX=~/.exatn

OS Version Information:
Linux version 3.10.0-957.10.1.el7.x86_64 ([email protected]) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-36) (GCC) ) #1 SMP Thu Feb 7 07:12:53 UTC 2019

exatn::sync(TensorOperation) is inconsistent

exatn::sync executed on an exatn::TensorOperation uses the id assigned to the operation during submission. This id refers to a local DAG node, which creates two potential problems: (1) an attempt to sync a tensor operation that has never been submitted will segfault because the id has not been set yet; (2) the global exatn::sync is expected to destroy the current DAG, so a later attempt to access one of its nodes will also segfault. A proper way to sync a tensor operation is to sync every one of its output operands (see the sketch below).
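
A minimal sketch of that workaround from the Python side, assuming the bindings expose a by-name exatn.sync(tensor_name) (the exact signature here is an assumption):

import exatn

def sync_operation_outputs(output_tensor_names):
    """Sync a tensor operation by syncing each of its output operands by name,
    instead of relying on the operation's local DAG node id."""
    for name in output_tensor_names:
        exatn.sync(name)  # blocks until all pending updates to this tensor finish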

CPP Microservices SegFault

In src/runtime/tensor_runtime.cpp (lines 10-11), the TensorRuntime constructor is supposed to create instances of TensorGraphExecutor and TensorNodeExecutor via calls to exatn::getService<> from CPPMicroservices. However, this segfaults in all CTest tests, which makes me believe something got broken again in the CPPMicroservices usage. Lines 10-11 are currently commented out, but we need to uncomment them and fix the segfaults.

Numerics Python API

Leverage pybind11 to provide a Python API for the numerics package data structures. A usage sketch of what this could look like appears below.
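
For reference, these call names appear in other issues on this page, but the exact signatures in this sketch are illustrative:

import numpy as np
import exatn

exatn.createTensor('A1', np.ones((2, 3)))          # tensor initialized from numpy data
exatn.createTensor('B1', np.ones((3, 4)))
exatn.createTensor('C1', [2, 4], 0.0)              # tensor from a shape and an initial value
exatn.evaluateTensorNetwork('ex1', 'C1(a,c) = A1(a,b) * B1(b,c)')
c1 = exatn.getLocalTensor('C1')                    # copy the result back to Python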

Error thrown when trying to build ExaTn using Intel compilers.

Running CMake for ExaTn with Intel's C/C++/Fortran compilers produces a warning that they are unsupported and then fails with an error.

OS Information:
Linux version 3.10.0-957.10.1.el7.x86_64 ([email protected]) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-36) (GCC) ) #1 SMP Thu Feb 7 07:12:53 UTC 2019

(base) build$ CC=icc CXX=icpc FC=ifort cmake ..  -DEXATN_BUILD_TESTS=TRUE -DCMAKE_BUILD_TYPE=Release \
-DPATH_INTEL_ROOT=$MKLROOT/.. -DBLAS_LIB=MKL -DCMAKE_C_COMPILER=icc -DCMAKE_CXX_COMPILER=icpc \
-DCMAKE_Fortran_COMPILER=ifort -DCMAKE_INSTALL_PREFIX=~/.exatn
                                                                                                                
-- The CXX compiler identification is Intel 19.1.0.20191121
-- The Fortran compiler identification is Intel 19.1.0.20191121
-- Check for working CXX compiler: /soft/compilers/intel-2020/compilers_and_libraries_2020.0.166/linux/bin/intel64/icpc
-- Check for working CXX compiler: /soft/compilers/intel-2020/compilers_and_libraries_2020.0.166/linux/bin/intel64/icpc -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Check for working Fortran compiler: /soft/compilers/intel-2020/compilers_and_libraries_2020.0.166/linux/bin/intel64/ifort
-- Check for working Fortran compiler: /soft/compilers/intel-2020/compilers_and_libraries_2020.0.166/linux/bin/intel64/ifort  -- works
-- Detecting Fortran compiler ABI info
-- Detecting Fortran compiler ABI info - done
-- Checking whether /soft/compilers/intel-2020/compilers_and_libraries_2020.0.166/linux/bin/intel64/ifort supports Fortran 90
-- Checking whether /soft/compilers/intel-2020/compilers_and_libraries_2020.0.166/linux/bin/intel64/ifort supports Fortran 90 -- yes
-- Found OpenMP_CXX: -qopenmp (found version "5.0") 
-- Found OpenMP_Fortran: -qopenmp (found version "5.0") 
-- Found OpenMP: TRUE (found version "5.0")  
-- Found MPI_CXX: /home/cibrahim/anaconda3/lib/libmpi.so (found version "3.1") 
-- Found MPI_Fortran: /home/cibrahim/anaconda3/lib/libmpi_usempif08.so (found version "3.1") 
-- Found MPI: TRUE (found version "3.1")  
-- MPIRUN: /home/cibrahim/anaconda3/bin/mpiexec
-- The C compiler identification is Intel 19.1.0.20191121
-- Check for working C compiler: /soft/compilers/intel-2020/compilers_and_libraries_2020.0.166/linux/bin/intel64/icc
-- Check for working C compiler: /soft/compilers/intel-2020/compilers_and_libraries_2020.0.166/linux/bin/intel64/icc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
CMake Warning at tpls/cppmicroservices/CMakeLists.txt:108 (message):
  You are using an unsupported compiler! Compilation has only been tested
  with Clang (Linux or Apple), GCC and MSVC.


CMake Error at tpls/cppmicroservices/CMakeLists.txt:111 (if):
  if given arguments:

    "CMAKE_CXX_COMPILER_VERSION" "AND" "CMAKE_CXX_COMPILER_VERSION" "VERSION_LESS"

  Unknown arguments specified


-- Configuring incomplete, errors occurred!
See also "/home/cibrahim/exatn/build/CMakeFiles/CMakeOutput.log".

ExaTN CMAKE MPI

For some reason, ExaTN's CMake is still discovering MPI on its own (see below). Didn't we turn that off to make sure the MPI installation is determined by the user?

-- Found MPI_CXX: /usr/local/mpi/openmpi/git/lib/libmpi.so (found version "3.1")
-- Found MPI_Fortran: /usr/local/mpi/openmpi/git/lib/libmpi_usempif08.so (found version "3.1")
-- Found MPI: TRUE (found version "3.1")
-- Found CUDA: /usr/local/cuda (found version "10.0")
-- MPIRUN: /usr/local/mpi/openmpi/git/bin/mpiexec

Python bindings

Recently (branch devel) I decided to use talsh::Tensor instead of TensorDenseBlock in tensor_method.hpp (inside the TensorMethod interface, which is now called TensorFunctor). Consequently, I went into exatn-py.cpp and replaced TensorDenseBlock with talsh::Tensor, which necessitated including talshxx.hpp, which in turn includes talsh.h, which is a C header with many symbols in GLOBAL scope. This C part did not build in exatn-py.cpp, so I removed it. But now it does not look like I can refer to talsh::Tensor in the Python bindings (see the commented block there). What would a solution be here?

TensorRuntime test failure

src/runtime/tests/TensorRuntimeTester
terminate called after throwing an instance of 'std::runtime_error'
what(): Bundle#5 start failed: libtalsh.so: cannot open shared object file: No such file or directory
Aborted (core dumped)

error thrown when building

Hi dear authors, when I tried to build the project, an error occurred at the final stage.

My build instructions: CC=gcc CXX=g++ FC=gfortran cmake .. -DCMAKE_BUILD_TYPE=Release -DEXATN_BUILD_TESTS=TRUE -DBLAS_LIB=ATLAS -DBLAS_PATH=/usr/lib/x86_64-linux-gnu/blas

The error information:

Building CXX object src/parser/syntax_handler/CMakeFiles/taprol-syntax-handler.dir/taprol_syntax_handler.cpp.o
/home/lhp/exatn/src/parser/syntax_handler/taprol_syntax_handler.cpp:16:50: error: expected class-name before ‘{’ token
   16 | class TaProlSyntaxHandler : public SyntaxHandler {
      |                                                  ^
/home/lhp/exatn/src/parser/syntax_handler/taprol_syntax_handler.cpp: In constructor ‘{anonymous}::TaProlSyntaxHandler::TaProlSyntaxHandler()’:
/home/lhp/exatn/src/parser/syntax_handler/taprol_syntax_handler.cpp:18:27: error: class ‘{anonymous}::TaProlSyntaxHandler’ does not have any field named ‘SyntaxHandler’
   18 |   TaProlSyntaxHandler() : SyntaxHandler("taprol") {}
      |                           ^~~~~~~~~~~~~
/home/lhp/exatn/src/parser/syntax_handler/taprol_syntax_handler.cpp: In member function ‘void {anonymous}::TaProlSyntaxHandler::GetReplacement(clang::Preprocessor&, clang::Declarator&, clang::CachedTokens&, llvm::raw_string_ostream&)’:
/home/lhp/exatn/src/parser/syntax_handler/taprol_syntax_handler.cpp:44:11: error: ‘getDeclText’ was not declared in this scope
   44 |     OS << getDeclText(PP, D) << "{\n";
      |           ^~~~~~~~~~~
/home/lhp/exatn/src/parser/syntax_handler/taprol_syntax_handler.cpp: At global scope:
/home/lhp/exatn/src/parser/syntax_handler/taprol_syntax_handler.cpp:62:8: error: ‘SyntaxHandlerRegistry’ does not name a type
   62 | static SyntaxHandlerRegistry::Add<TaProlSyntaxHandler>
      |        ^~~~~~~~~~~~~~~~~~~~~
make[2]: *** [src/parser/syntax_handler/CMakeFiles/taprol-syntax-handler.dir/build.make:76: src/parser/syntax_handler/CMakeFiles/taprol-syntax-handler.dir/taprol_syntax_handler.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:2334: src/parser/syntax_handler/CMakeFiles/taprol-syntax-handler.dir/all] Error 2
make: *** [Makefile:146: all] Error 2

It seems that editing the taprol_syntax_handler.cpp file could solve this problem, but I couldn't figure out how to fix it.
Thank you so much.

Tensor network merging

When merging two tensor networks via TensorNetwork::appendTensorNetwork() and TensorNetwork::appendTensorNetworkGate(), we need to enable adjustment of tensor ids in the secondary (appended) tensor network, such that all tensors in the resulting tensor network have unique ids regardless of which ids they had in the secondary network (see the sketch below).
Also, the naming rules for the combined tensor network need to be clarified.
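
A minimal sketch of the intended id adjustment, written as plain Python over an id-to-tensor map (the real C++ API will differ):

def merge_networks(primary, secondary):
    """primary and secondary map tensor id -> tensor; returns a merged map
    in which every appended tensor receives a fresh, unique id."""
    merged = dict(primary)
    next_id = max(primary, default=0) + 1  # first id past those already in use
    for old_id in sorted(secondary):       # ids of the appended network are discarded
        merged[next_id] = secondary[old_id]
        next_id += 1
    return merged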

How do I run the .py files in python/examples/?

Excuse me, when I run the .py files in python/examples/, I get a failure.
For example, I run contraction.py with the command python contraction.py.
The result is as follows:
A1 shape: (2, 3)
B1 shape: (3, 4)
Traceback (most recent call last):
  File "contraction.py", line 37, in <module>
    test_exatn()
  File "contraction.py", line 18, in test_exatn
    exatn.createTensor('C1', [2, 4], 0.0)
AttributeError: module 'exatn' has no attribute 'createTensor'

What should I do?

exatn::getService failure

Branch devel. After gluing together exatn::numerics and exatn::runtime, it looks like the CPPMicroservices exatn::getService does not work (it does not discover the tensor runtime services). Three tests fail with the same error.

Build problems

I'm trying to build exatn so I can use it in conjunction with tnqvm, but I've hit a couple of issues.

First, I had to comment out lines 17-31 in tpls/CMakeLists.txt, where it runs git submodule update --init --recursive, as git complains that this should be run from the top-level directory and returns a non-zero exit code. Given the advice in the README to run git submodule update --init --recursive in the correct place, I suspect this part of the CMake is redundant anyway.

Second, I can get the libraries to build, but they fail at runtime with an error while loading shared libraries, specifically libtalsh. In the link line during the build, talsh is linked via a relative path to its build location, which is no longer valid once the library is moved to the install directory. I can fix it by manually relinking with -L<path-to-install-directory>/lib -ltalsh, but I'm hoping for a CMake solution...

All help appreciated!

Relevant extract from ldd libexatn.so:

        ../../../installdir/lib/libtalsh.so => not found
        ../../../../installdir/lib/libtalsh.so => not found
        ../../../installdir/lib/libtalsh.so => not found
        ../../../installdir/lib/libtalsh.so => not found

Fail to build: undefined reference to OMP_

Hi

I'm trying to build exatn and ran into an undefined-reference error.
See the gist linked below for logs and all the versions reported by CMake.

-- The CXX compiler identification is GNU 10.1.0
-- The Fortran compiler identification is GNU 10.1.0
⟩ uname -a
Linux archlinux 5.6.15-arch1-1 #1 SMP PREEMPT Wed, 27 May 2020 23:42:26 +0000 x86_64 GNU/Linux

https://gist.github.com/DaniloZZZ/e391f25e2c8a908d7f7fe533a2134de4

I do have MPI installed, and it looks like I'm having the issue described in #28.

However, I get the same errors on another Windows WSL machine (Ubuntu 20.04), which does not have MPI installed and where CMake does not find it.

CPP MicroServices failure

It looks like having two or more Activators in the same folder creates a multiple-definition problem during linking. Right now the devel branch fails because there are two activators in the runtime/executor folder, namely NodeExecutorActivator and GraphExecutorActivator. It turns out both generate the same activator ctor and dtor, resulting in a multiple-definition error when linking exatn-runtime. For now I have deactivated both Activators in runtime/executor/CMakeLists.txt. This needs to be fixed, and we need to document explicitly how to use Activators properly in the code.

Mac OS X Build error -fopenmp, used g++ instead of CMAKE_CXX_COMPILER

I just pulled a fresh version of the repo on my mac. I configured with

FC=gfortran-8 CXX=g++-8 CC=gcc-8 cmake .. -DEXATN_BUILD_TESTS=TRUE

and ran make, and observed in the ExaTensor build

cd /Users/aqw/exatn/build && /usr/local/lib/python3.7/site-packages/cmake/data/CMake.app/Contents/bin/cmake -E cmake_depends "Unix Makefiles" /Users/aqw/exatn /Users/aqw/exatn/tpls/boost-cmake /Users/aqw/exatn/build /Users/aqw/exatn/build/tpls/boost-cmake /Users/aqw/exatn/build/tpls/boost-cmake/CMakeFiles/Boost_regex.dir/DependInfo.cmake --color=
cd /Users/aqw/exatn/tpls/ExaTensor && /usr/local/lib/python3.7/site-packages/cmake/data/CMake.app/Contents/bin/cmake -E env CPP_GNU=/usr/local/bin/g++-8 CC_GNU=/usr/local/bin/gcc-8 FC_GNU=/usr/local/bin/gfortran-8 GPU_CUDA=NOCUDA MPILIB=NONE BLASLIB=NONE EXA_NO_BUILD=NO PATH_NONE= EXA_TALSH_ONLY=YES EXATN_SERVICE=YES EXA_OS=NO_LINUX make
Dependee "/Users/aqw/exatn/build/tpls/boost-cmake/CMakeFiles/Boost_regex.dir/DependInfo.cmake" is newer than depender "/Users/aqw/exatn/build/tpls/boost-cmake/CMakeFiles/Boost_regex.dir/depend.internal".
Dependee "/Users/aqw/exatn/build/tpls/boost-cmake/CMakeFiles/CMakeDirectoryInformation.cmake" is newer than depender "/Users/aqw/exatn/build/tpls/boost-cmake/CMakeFiles/Boost_regex.dir/depend.internal".
Scanning dependencies of target Boost_regex
/Applications/Xcode.app/Contents/Developer/usr/bin/make -C ./TALSH
g++ -I. -I. -I. -c -O3 -fopenmp -DNO_GPU -DNO_AMD -DNO_PHI -DNO_BLAS -DNO_LINUX -fPIC -DEXATN_SERVICE -std=c++11 timer.cpp -o ./OBJ/timer.o
clang: error: unsupported option '-fopenmp'
make[4]: *** [OBJ/timer.o] Error 1
make[3]: *** [ExaTensor] Error 2
make[2]: *** [tpls/CMakeFiles/exatensor-build] Error 2
make[1]: *** [tpls/CMakeFiles/exatensor-build.dir/all] Error 2

Even though CPP_GNU was set in tpls/CMakeLists.txt, it did not propagate down to TALSH/Makefile. I modified CPP_GNU = g++ in TALSH/Makefile to CPP_GNU ?= g++ and it worked.

ExaTENSOR BLAS libraries

Linking the BLAS libraries required by ExaTENSOR/TALSH is a mess because there is no standard way of doing so. Below are the details of how the different BLAS libraries should be linked, namely which environment variables the ExaTENSOR/TALSH Makefile expects for each particular BLAS library (the same mapping is restated as data in the sketch after the list). It is highly unlikely we can import this from CMake, so I would recommend requiring the user to explicitly provide the necessary PATHs for the chosen BLAS library (BLAS_LIB) and then adding the required link libraries for each choice inside CMakeLists.txt.

ATLAS (any default Linux/Mac BLAS):
BLASLIB=ATLAS
PATH_BLAS_ATLAS (where libblas.so is)
Linked libraries: libblas

MKL (Intel MKL):
BLASLIB=MKL
PATH_INTEL (intel root directory, for example /opt/intel)
Linked libraries (GNU compiler): libmkl_intel_lp64, libmkl_gnu_thread, libmkl_core, libpthread, libm, libdl
Linked libraries (Intel compiler): libmkl_intel_lp64, libmkl_intel_thread, libmkl_core, libpthread, libm, libdl, libiomp5

ACML:
BLASLIB=ACML
PATH_BLAS_ACML (where libacml_mp.so is)
Linked libraries: libacml_mp

ESSL (IBM ESSL on Summit):
BLASLIB=ESSL
PATH_BLAS_ESSL (where libessl.so is)
PATH_IBM_XL_CPP (where IBM XL C++ libraries are)
PATH_IBM_XL_FOR (where IBM XL Fortran libraries are)
PATH_IBM_XL_SMP (where IBM XL SMP libraries are)
Linked libraries: libessl, libxlf90_r, libxlfmath
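
As a reference for implementing that CMakeLists.txt logic, the list above restated as data (library and variable names are verbatim from the list; the structure and any use of this dict are illustrative):

# Sketch: BLAS_LIB choice -> expected path variables and link libraries.
BLAS_LINK_CONFIG = {
    'ATLAS': {'paths': ['PATH_BLAS_ATLAS'],
              'libs': ['blas']},
    'MKL':   {'paths': ['PATH_INTEL'],
              'libs_gnu':   ['mkl_intel_lp64', 'mkl_gnu_thread', 'mkl_core',
                             'pthread', 'm', 'dl'],
              'libs_intel': ['mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core',
                             'pthread', 'm', 'dl', 'iomp5']},
    'ACML':  {'paths': ['PATH_BLAS_ACML'],
              'libs': ['acml_mp']},
    'ESSL':  {'paths': ['PATH_BLAS_ESSL', 'PATH_IBM_XL_CPP',
                        'PATH_IBM_XL_FOR', 'PATH_IBM_XL_SMP'],
              'libs': ['essl', 'xlf90_r', 'xlfmath']},
}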

Running out of RAM buffer space

Sometimes the memory manager runs out of its RAM buffer space, even in CPU-only runs. That is, the index-splitting algorithm does not accurately model RAM buffer fragmentation. In this case, one can fall back to a regular RAM allocator on the Host.

ExaTN driver-rpc test is failing

The ExaTN driver-rpc test (client + server) is failing because, for some reason, it cannot find the service HamiltonianTest:
~/src/exatn/build_mkl_cuda_openmpi_debug/src/driver-rpc/mpi/tests$ mpiexec -n 1 ./server_test : -n 1 ./client_test
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from client_test
[ RUN ] client_test.checkSimple
Could not find service with name HamiltonianTest. Perhaps the service is not Identifiable.
#ERROR(exatn::service): Invalid ExaTN service: HamiltonianTest in the Service Registry.
client_test: /home/div/src/exatn/src/exatn/./exatn_service.hpp:28: std::shared_ptr<_Tp> exatn::getService(const string&) [with Service = talsh::TensorFunctor<exatn::Identifiable>; std::__cxx11::string = std::__cxx11::basic_string<char>]: Assertion `false' failed.
[exadesktop:11617] *** Process received signal ***
[exadesktop:11617] Signal: Aborted (6)
[exadesktop:11617] Signal code: (-6)
[mpi-server] starting server at port name 3961454594.0:3924421586
[exadesktop:11617] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x12890)[0x7f70ff98f890]
[exadesktop:11617] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xc7)[0x7f70f903ee97]
[exadesktop:11617] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x141)[0x7f70f9040801]
[exadesktop:11617] [ 3] /lib/x86_64-linux-gnu/libc.so.6(+0x3039a)[0x7f70f903039a]
[exadesktop:11617] [ 4] /lib/x86_64-linux-gnu/libc.so.6(+0x30412)[0x7f70f9030412]
[exadesktop:11617] [ 5] ./client_test[0x408d60]
[exadesktop:11617] [ 6] ./client_test[0x407a59]
[exadesktop:11617] [ 7] /home/div/src/exatn/build_mkl_cuda_openmpi_debug/lib/libgtestd.so(_ZN7testing8internal38HandleSehExceptionsInMethodIfSupportedINS_4TestEvEET0_PT_MS4_FS3_vEPKc+0x65)[0x7f70f9a3d955]
[exadesktop:11617] [ 8] /home/div/src/exatn/build_mkl_cuda_openmpi_debug/lib/libgtestd.so(_ZN7testing8internal35HandleExceptionsInMethodIfSupportedINS_4TestEvEET0_PT_MS4_FS3_vEPKc+0x5a)[0x7f70f9a37745]
[exadesktop:11617] [ 9] /home/div/src/exatn/build_mkl_cuda_openmpi_debug/lib/libgtestd.so(_ZN7testing4Test3RunEv+0xee)[0x7f70f9a16c88]
[exadesktop:11617] [10] /home/div/src/exatn/build_mkl_cuda_openmpi_debug/lib/libgtestd.so(_ZN7testing8TestInfo3RunEv+0x10f)[0x7f70f9a17557]
[exadesktop:11617] [11] /home/div/src/exatn/build_mkl_cuda_openmpi_debug/lib/libgtestd.so(_ZN7testing8TestCase3RunEv+0x107)[0x7f70f9a17bc9]
[exadesktop:11617] [12] /home/div/src/exatn/build_mkl_cuda_openmpi_debug/lib/libgtestd.so(_ZN7testing8internal12UnitTestImpl11RunAllTestsEv+0x2a9)[0x7f70f9a227f9]
[exadesktop:11617] [13] /home/div/src/exatn/build_mkl_cuda_openmpi_debug/lib/libgtestd.so(_ZN7testing8internal38HandleSehExceptionsInMethodIfSupportedINS0_12UnitTestImplEbEET0_PT_MS4_FS3_vEPKc+0x65)[0x7f70f9a3e966]
[exadesktop:11617] [14] /home/div/src/exatn/build_mkl_cuda_openmpi_debug/lib/libgtestd.so(_ZN7testing8internal35HandleExceptionsInMethodIfSupportedINS0_12UnitTestImplEbEET0_PT_MS4_FS3_vEPKc+0x5a)[0x7f70f9a38553]
[exadesktop:11617] [15] /home/div/src/exatn/build_mkl_cuda_openmpi_debug/lib/libgtestd.so(_ZN7testing8UnitTest3RunEv+0xba)[0x7f70f9a213b4]
[exadesktop:11617] [16] ./client_test(_Z13RUN_ALL_TESTSv+0x11)[0x4089e0]
[exadesktop:11617] [17] ./client_test(main+0x3a)[0x408406]
[exadesktop:11617] [18] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7f70f9021b97]
[exadesktop:11617] [19] ./client_test[0x40792a]
[exadesktop:11617] *** End of error message ***

exatn and runtime depend on libexatn-runtime-graph installed in plugins/

Right now the tensor runtime depends on the library exatn-runtime-graph, which provides tensor_exec_state.cpp. This runtime-graph library is installed in plugins/ because it also provides the graph implementation plugins.

We need to separate tensor_graph.hpp and tensor_exec_state.* from the specific graph implementation plugins: the former should be installed in lib/ and the latter in plugins/.

Custom install locations not supported

Setting CMAKE_INSTALL_PREFIX installs all libraries as expected; however, it causes the tests to fail at runtime with the following error:

[service-registry] Could not open plugin directory.
Could not find service with name eager-dag-executor. Perhaps the service is not Identifiable.
#ERROR(exatn::service): Invalid ExaTN service: eager-dag-executor in the Service Registry.
Could not find service with name boost-digraph. Perhaps the service is not Identifiable.
#ERROR(exatn::service): Invalid ExaTN service: boost-digraph in the Service Registry.
[login2:32431] *** Process received signal ***
[login2:32431] Signal: Segmentation fault (11)
[login2:32431] Signal code: Address not mapped (1)
[login2:32431] Failing at address: (nil)

This is due to exatn::initialize() calling serviceRegistry->initialize() with no arguments, which in turn causes exatnPluginPath to be set to its default, $HOME/.exatn/plugins.

Hard-coded OpenMPI references in CMAKE

When I build exatn with mpich, it builds fine. However, make install still sets the runtime path to OpenMPI, which is probably picked up by CMake's automatic MPI discovery. We need to make sure that CMake no longer finds MPI on its own but uses the MPI installation provided by the user.

exatn-config executable

I'd like to create an executable called exatn-config that downstream users can leverage in standard Makefile builds. For downstream CMake users, we will have them just use exatn::exatn targets, but for Makefile users we need to be able to provide the include headers and link line that they can integrate in their makefiles. My inspiration for this is the llvm-config executable.

Users should be able to run commands like this

$ exatn-config --libs (to return the link line)
$ exatn-config --includes (to get the include paths)
$ exatn-config --ldflags (to get any linker flags)
$ exatn-config --cxxflags (to get all cxx flags)

I think we could do this pretty easily with a Python script: reference the necessary CMake variables in a template and use CMake's configure_file() to write out the concrete values. A sketch follows below.
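
A rough sketch of such a script; the @...@ placeholders would be substituted by configure_file(), and all variable names and flags here are illustrative assumptions:

#!/usr/bin/env python3
# exatn-config sketch: print compile/link information for Makefile users.
import argparse

LIBS     = "-L@EXATN_LIB_DIR@ -lexatn -lexatn-runtime -ltalsh"  # names assumed
INCLUDES = "-I@EXATN_INCLUDE_DIR@"
LDFLAGS  = "-Wl,-rpath,@EXATN_LIB_DIR@"
CXXFLAGS = "-std=c++14 @EXATN_EXTRA_CXX_FLAGS@"

def main():
    parser = argparse.ArgumentParser(prog="exatn-config")
    parser.add_argument("--libs", action="store_true", help="print the link line")
    parser.add_argument("--includes", action="store_true", help="print the include paths")
    parser.add_argument("--ldflags", action="store_true", help="print the linker flags")
    parser.add_argument("--cxxflags", action="store_true", help="print the C++ compile flags")
    args = parser.parse_args()
    pieces = []
    if args.libs:
        pieces.append(LIBS)
    if args.includes:
        pieces.append(INCLUDES)
    if args.ldflags:
        pieces.append(LDFLAGS)
    if args.cxxflags:
        pieces.append(CXXFLAGS)
    print(" ".join(pieces))

if __name__ == "__main__":
    main()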

Python tests error: incompatible function arguments.

Hi, I tried to run python/tests/test-tensorRuntime.py and got the following error:

File "test-tensorRuntime.py", line 23, in <module>
  create_tensor1.setTensorOperand(tensor1)
TypeError: setTensorOperand(): incompatible function arguments. The following argument types are supported:
  1. (self: exatn._pyexatn.TensorOperation, arg0: exatn::numerics::Tensor, arg1: bool) -> None

Invoked with: <exatn._pyexatn.TensorOpCreate object at 0x7f272d4af3b0>, <exatn._pyexatn.Tensor object at 0x7f272d4af2d0>

Changing line 20 from create_tensor0.setTensorOperand(tensor0) to create_tensor0.setTensorOperand(tensor0, True) helps; the same error is then raised on the next line (see the spelled-out fix below).
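
Spelled out, the change that makes the test progress (the semantics of the second boolean are not documented in the error message, so True here simply mirrors what worked above):

create_tensor0.setTensorOperand(tensor0, True)  # line 20
create_tensor1.setTensorOperand(tensor1, True)  # line 23 needs the same change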

  • OS: Arch Linux x86_64
  • Python 3.7.3
  • gcc and g++: 8.3.0
  • cuda: 10.1.105-12

compiled with this:

cmake .. -DEXATN_BUILD_TESTS=TRUE -DPYTHON_INCLUDE_DIR=$(python -c "import sysconfig; print(sysconfig.get_paths()['platinclude'])") -DENABLE_CUDA=True

Wrong numerical results

1. Incorrect numerics

For matrices generated by numpy:

a = np.array([
    [1., 0, 0],
    [0., 1, 1]
])
b = np.array([
    [1., 0, 3, 0],
    [1,  1,  2, 2],
    [-1, 1, -2, 2],
])
exatn.createTensor('C1', [2, 4], 0.)
exatn.createTensor('A1', np.array(a, copy=True)) # copy to prevent exatn from changing the array data
exatn.createTensor('B1', np.array(b, copy=True))
exatn.evaluateTensorNetwork('test', 'C1(a, c) = A1(a, b) * B1(b, c)')

c1_exatn = exatn.getLocalTensor('C1')

assert np.allclose(c1_exatn, np.dot(a, b)) # raises assertion error

"""
The result should be
[[1 0 3 0],
 [0 2 0 4]]

Easily checkable by hand for above example matrices
"""

The data received back from exatn via exatn.getLocalTensor(name) match the original arrays.

For matrices generated by exatn.initTensorRnd:

exatn.createTensor('Xr', [2, 3], 0)
exatn.createTensor('Yr', [3, 4], 0)
exatn.initTensorRnd('Xr')
exatn.initTensorRnd('Yr')

xr = exatn.getTensorData('Xr')
yr = exatn.getTensorData('Yr')
zr = np.dot(xr, yr)

exatn.createTensor('Zr', [2, 4], 0)
exatn.evaluateTensorNetwork('rnd', 'Zr(a, b) = Xr(a, c) * Yr(c, b)')
zr_exatn = exatn.getTensorData('Zr')

assert np.allclose(zr_exatn, zr), "Results do not match!" #Raises as well

What is going on

Having found this issue, I tried to understand what numbers exatn actually returns: they were correct for the scalar product of vectors and partially correct for matrix-vector multiplication.
I noticed that for square matrices, when asked for A*B, exatn returns B*A.
However, it turns out to be more complicated: rectangular matrices do not merely come back with mismatched indices, they produce completely different numbers.

It turns out exatn re-arranges the data in an interesting way: it applies a function S to each input and S^-1 to the output, as defined below.

Continuing from the previous code snippet:

def S(x):
    """ Reshape to reverse-ordered shape,
     then transpose to match the original shape
    """
    x = x.reshape(*reversed(x.shape))
    x = x.transpose() # reverses order of axes by default
    return x

def inv_S(x):
    """ Inverse of S. """
    x = x.transpose() # reverses order of axes by default
    x = x.reshape(*reversed(x.shape))
    return x

adj_zr = inv_S(np.dot(S(xr), S(yr)))


assert np.allclose(zr_exatn, adj_zr) # Will NOT raise!

For illustration, here is S applied to a 2x4 matrix:

>>> S(np.array([
    [1, 2, 3, 4],
    [5, 6, 7, 8]
]))
array([[1, 3, 5, 7],
       [2, 4, 6, 8]])

The same issue is observed when contracting larger tensor networks:

x, y, z = [np.random.randn(*sh) for sh in [
    (2, 3),
    (3, 2, 2),
    (2, 2, 2)
]] # generate some random data
exatn.createTensor('X', np.array(x, copy=True))
exatn.createTensor('Y', np.array(y, copy=True))
exatn.createTensor('Z', np.array(z, copy=True))

tn = exatn.TensorNetwork('test2')
tn.appendTensor(1, 'X')
tn.appendTensor(2, 'Y', [(1, 0)])
tn.appendTensor(3, 'Z', [(1, 0), (2,1)])
tn.printIt()

# This is equivalent to exatn.evaluateTensorNetwork('test2', 'F0(a,b) = X(a,c) * Y(c,d,e) * Z(d,e,b)')
exatn.evaluate(tn)
result = tn.getTensor(0)
result_name = result.getName()
result_data = exatn.getTensorData(result_name)

### Compare results to einsum
einsum_data = np.einsum('ij,jkl,klm->im', x, y, z)
assert np.allclose(einsum_data, result_data) # Raises

### Fix numerics with reshape-transform

einsum_data_adj = inv_S(np.einsum('ij,jkl,klm->im', S(x), S(y), S(z)))

assert np.allclose(einsum_data_adj, result_data) # Does not raise!

Note that S(A)*S(B) produces completely different numbers from A*B; this is not just an issue of indexing/shaping convention. To obtain A*B from exatn, one has to perform S(S^-1(A)*S^-1(B)), as in the sketch below.
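
Continuing from the snippets above, a hypothetical workaround under this model: pre-transform the inputs with inv_S and post-transform the output with S to recover the conventional product.

exatn.createTensor('Xw', np.array(inv_S(xr), copy=True))
exatn.createTensor('Yw', np.array(inv_S(yr), copy=True))
exatn.createTensor('Zw', [2, 4], 0)
exatn.evaluateTensorNetwork('wrk', 'Zw(a, b) = Xw(a, c) * Yw(c, b)')

zw_exatn = S(exatn.getTensorData('Zw'))
assert np.allclose(zw_exatn, np.dot(xr, yr))  # should pass if the S-model above holds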

2. Tests are not robust

All the tests and examples in the python folder use tensors that are fixed points of S. They are either constant-valued (simple.py, hamiltonian.py, ...) or diagonal and square (quantum_circuit_network.py, large_circuit.py, ...) and hence do not change when reshaped and transposed.

I looked through the C++ tests in src/exatn/tests/NumServerTester.cpp, and it looks like the same problem is there as well: for example, on line 1332 all tensors are again square and diagonal.

What implications does this have for cache efficiency?

Exatn was built from latest devel branch (commit 7c722c5)
More detailed examples at https://github.com/danlkv/QTensor/blob/dev/scratchpad/exatn/exatn_demo.py

ExaTensor CMAKE_*_COMPILER Update breaks MPI build

When building with MPI, the ExaTensor build now fails with

make[4]: Entering directory '/home/cades/dev/exatn/tpls/ExaTensor/DDSS'
/usr/bin/gfortran ./OBJ/main.o libddss.a -lgomp -L/usr/lib/openmpi/lib -L. -L. -lstdc++ -o test_ddss.x
libddss.a(service_mpi.o): In function `__service_mpi_MOD_quit':
service_mpi.F90:(.text+0x633): undefined reference to `mpi_wtime_'
service_mpi.F90:(.text+0x83a): undefined reference to `mpi_finalize_'
service_mpi.F90:(.text+0x8dd): undefined reference to `mpi_abort_'
libddss.a(service_mpi.o): In function `__service_mpi_MOD_dil_global_comm_barrier':
service_mpi.F90:(.text+0x9c8): undefined reference to `mpi_barrier_'
service_mpi.F90:(.text+0x9f7): undefined reference to `mpi_allreduce_'
service_mpi.F90:(.text+0xa43): undefined reference to `mpi_barrier_'
service_mpi.F90:(.text+0xa72): undefined reference to `mpi_allreduce_'
...

If I go into that directory and run

mpifort ./OBJ/main.o libddss.a -lgomp -L/usr/lib/openmpi/lib -L. -L. -lstdc++ -o test_ddss.x

it compiles.

It appears that the new

ifeq ($(EXATN_SERVICE),YES)
FCOMP = $(CMAKE_Fortran_COMPILER)
else
FCOMP = $(COMP_PREF) $(FC_$(WRAP))
endif

calls in the Makefiles need to also check whether MPI_LIB is NONE and, when it is not, use the appropriate $(PATH_$(MPI_LIB)_BIN)/mpi{cc,cxx,fort} wrapper instead.

Tensor storage domain after replication

D = Q7 * Q8, where
Q7 and Q8 reside on {0,1},
D resides on 0.
Process 0 replicates Q7 and Q8 from process 1, but D is local.
Current storage domain rule: Tensor operand residence domains must coincide with the execution domain.
Corrected storage domain rule: (1) The execution domain must have access to complete tensor operands; (2) The residence domain of the output tensors must not exceed the execution domain.

Python numPy::array conversion invalid

Python numpy arrays assume a row-major storage layout by default, whereas ExaTN tensors are stored column-major. The current pybind11 interface does not account for this and produces invalid numpy arrays, so it is not usable at all right now. The affected functions are initTensorData, where the supplied C++ std::vector assumes column-major storage, and getLocalTensor, where the returned C++ local tensor copy assumes column-major storage. In both cases these are currently mapped to row-major numpy arrays, which is invalid. We need to make sure the incoming/outgoing numpy arrays are constructed as column-major (Python allows arbitrary striding). Does anyone know how to do this (how to make all numpy arrays column-major in our pybind11 interface)? See the illustration below.
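
A numpy-only illustration of the layout mismatch (on the C++ side, pybind11 can request Fortran ordering via py::array::f_style; whether that is the right fix here is an open question):

import numpy as np

a = np.arange(6).reshape(2, 3)   # numpy default: row-major (C-contiguous)
f = np.asfortranarray(a)         # same logical values, column-major memory

assert np.array_equal(a, f)                                # contents agree
assert a.ravel(order='K').tolist() == [0, 1, 2, 3, 4, 5]   # rows are contiguous
assert f.ravel(order='K').tolist() == [0, 3, 1, 4, 2, 5]   # columns are contiguous
# A raw buffer hand-off that ignores this difference scrambles the data,
# which is exactly the invalid conversion described above.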

Mac OSX Build Error

After following the prescribed steps, I get the following configure error when running this command:

FC=gfortran-9 CC=gcc-9 CXX=g++-9 cmake .. -DEXATN_BUILD_TESTS=TRUE -DPYTHON_INCLUDE_DIR=$(python3 -c "import sysconfig; print(sysconfig.get_paths()['platinclude'])")
-- The C compiler identification is GNU 9.1.0
-- The CXX compiler identification is GNU 9.1.0
-- Checking whether C compiler has -isysroot
-- Checking whether C compiler has -isysroot - yes
-- Checking whether C compiler supports OSX deployment target flag
-- Checking whether C compiler supports OSX deployment target flag - yes
-- Check for working C compiler: /usr/local/bin/gcc-9
-- Check for working C compiler: /usr/local/bin/gcc-9 -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Checking whether CXX compiler has -isysroot
-- Checking whether CXX compiler has -isysroot - yes
-- Checking whether CXX compiler supports OSX deployment target flag
-- Checking whether CXX compiler supports OSX deployment target flag - yes
-- Check for working CXX compiler: /usr/local/bin/g++-9
-- Check for working CXX compiler: /usr/local/bin/g++-9 -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
CMake Error at CMakeLists.txt:11 (exatn_configure_library_rpath):
  Unknown CMake command "exatn_configure_library_rpath".


CMake Warning (dev) in CMakeLists.txt:
  No cmake_minimum_required command is present.  A line of code such as

    cmake_minimum_required(VERSION 3.14)

  should be added at the top of the file.  The version specified may be lower
  if you wish to support older CMake versions for this project.  For more
  information run "cmake --help-policy CMP0000".
This warning is for project developers.  Use -Wno-dev to suppress it.

-- Configuring incomplete, errors occurred!
