llnl / chai

Copy-hiding array abstraction to automatically migrate data between memory spaces

License: BSD 3-Clause "New" or "Revised" License

CMake 5.73% C++ 88.97% Shell 4.49% Dockerfile 0.82%

cpp data-abstraction gpu memory-management blt raja portability radiuss

chai's Introduction

CHAI v2024.02.1

CHAI is a library that handles automatic data migration to different memory spaces behind an array-style interface. It was designed to work with RAJA and integrates with it. CHAI could be used with other C++ abstractions, as well.
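The "copy-hiding" idea can be shown with a self-contained toy: data migrates when the array handle is copied into a kernel launched for a particular execution space. Everything below is an illustrative assumption, not CHAI's implementation. `ToyManagedArray`, `Space`, `g_current_space` and `toy_forall` are hypothetical, and the "GPU" buffer is ordinary host memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical execution spaces for this toy model (not chai::ExecutionSpace).
enum class Space { None, Cpu, Gpu };

// Hypothetical "currently executing" space; toy_forall sets it while it
// copies the kernel body.
Space g_current_space = Space::None;

// State shared by every copy of a handle: one buffer per space and a note of
// which buffer holds the most recent data.
struct Record {
  std::vector<double> cpu;
  std::vector<double> gpu;  // plain host memory standing in for device memory
  Space last_touched = Space::Cpu;
};

class ToyManagedArray {
 public:
  explicit ToyManagedArray(std::size_t n)
      : rec_(std::make_shared<Record>()), size_(n) {
    rec_->cpu.resize(n);
    rec_->gpu.resize(n);
    active_ = rec_->cpu.data();
  }

  // Copying is where migration hides: if a space is active and the fresh
  // data lives in the other buffer, copy it across and point this handle at it.
  ToyManagedArray(const ToyManagedArray& other)
      : rec_(other.rec_), active_(other.active_), size_(other.size_) {
    if (g_current_space == Space::None) return;
    const bool to_gpu = (g_current_space == Space::Gpu);
    std::vector<double>& dst = to_gpu ? rec_->gpu : rec_->cpu;
    const std::vector<double>& src = to_gpu ? rec_->cpu : rec_->gpu;
    if (rec_->last_touched != g_current_space) {
      std::copy(src.begin(), src.end(), dst.begin());
      rec_->last_touched = g_current_space;
    }
    active_ = dst.data();
  }

  double& operator[](std::size_t i) const {
    assert(i < size_);
    return active_[i];
  }

 private:
  std::shared_ptr<Record> rec_;
  double* active_;
  std::size_t size_;
};

// Hypothetical launcher: set the space, copy the body (which copy-constructs
// every captured ToyManagedArray), then run the copy serially.
template <typename Body>
void toy_forall(Space space, std::size_t n, const Body& body) {
  g_current_space = space;
  Body kernel = body;
  g_current_space = Space::None;
  for (std::size_t i = 0; i < n; ++i) kernel(i);
}
```

A kernel launched for `Space::Gpu` sees the migrated buffer. An existing host handle keeps reading its stale copy until the handle is copied again in a CPU context.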

CHAI uses CMake and BLT to handle builds. Make sure that you have a modern compiler loaded and the configuration is as simple as:

$ git submodule update --init --recursive
$ mkdir build && cd build
$ cmake -DCUDA_TOOLKIT_ROOT_DIR=/path/to/cuda ../

CMake will provide output about which compiler is being used, and what version of CUDA was detected. Once CMake has completed, CHAI can be built with Make:

$ make

For more advanced configuration you can use standard CMake variables.

More information is available in the CHAI documentation.

Authors

The original developers of CHAI are:

Holger Jones
David Poliakoff
Peter Robinson

Contributors include:

David Beckingsale
Riyaz Haque
Adam Kunen

Release

Unlimited Open Source - BSD Distribution

For release details and restrictions, please read the LICENSE file. It is also linked here: LICENSE

LLNL-CODE-705877
OCEC-16-189

chai's Issues

Umpire memory sanitizer issue with 2.2.1 and clang 9 on Lassen

When using /usr/tce/packages/clang/clang-upstream-2019.03.26/bin/clang++ to build the 2.2.1 release on Lassen I get the following error

/usr/WS1/geosadmn/geosx/thirdPartyLibs/build-lassen-clang@upstream-release/chai/src/chai/src/tpl/umpire/src/umpire/util/memory_sanitizers.hpp:28:10: fatal error: 'sanitizer/asan_interface.h' file not found
#include <sanitizer/asan_interface.h>

I think this was fixed in a more recent version of Umpire, and develop builds fine.

I'm updating the GEOSX tpls and it would be nice to be able to point to a CHAI release instead of a commit off of develop.

Fix dynamic_pointer_cast for managed_ptr

The CPU pointer is cast using dynamic_cast, but the GPU pointer is cast using a static_cast. The latter needs to be changed to a dynamic_cast.
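The difference the issue relies on can be seen in plain host C++. `dynamic_cast` (and `std::dynamic_pointer_cast` for `shared_ptr`) check the dynamic type and report a mismatch. A `static_cast` to the wrong derived type is undefined behavior and cannot be detected afterwards. The class names are hypothetical. This sketch does not cover the GPU side, because whether `dynamic_cast` is usable in device code depends on the compiler.

```cpp
#include <cassert>
#include <memory>

// Hypothetical hierarchy used only for illustration.
struct Base {
  virtual ~Base() = default;
};
struct Derived : Base {
  int value = 42;
};
struct Other : Base {};

// dynamic_cast inspects the dynamic type at run time and yields nullptr when
// the object is not actually a Derived.
Derived* checked_cast(Base* p) { return dynamic_cast<Derived*>(p); }

// The shared_ptr counterpart from <memory>: an empty result on a type mismatch.
std::shared_ptr<Derived> checked_pointer_cast(const std::shared_ptr<Base>& p) {
  return std::dynamic_pointer_cast<Derived>(p);
}

// A static_cast<Derived*> applied to an Other would compile just as well, but
// it has undefined behavior, so the mismatch cannot be detected afterwards.
```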

CMakeLists.txt flow control statement nesting issue

Trying to build chai using spack -- under spack/spack#25321 -- and getting the following error (that also appeared when building care and glvis):

...
==> [2021-08-12-00:04:42.293990] chai: Building chai-develop-wzzsm5uvishgiebaczrwbzakg646fb3m [CMakePackage]
==> [2021-08-12-00:04:42.310775] chai: Executing phase: 'cmake'
==> [2021-08-12-00:04:42.319438] 'cmake' '-G' 'Unix Makefiles' '-DCMAKE_INSTALL_PREFIX:STRING=/home/software/radiuss/[padded-to-512-chars]/morepadding/linux-ubuntu18.04-x86_64/gcc-7.5.0/chai-develop-wzzsm5uvishgiebaczrwbzakg646fb3m' '-DCMAKE_BUILD_TYPE:STRING=RelWithDebInfo' '-DCMAKE_INTERPROCEDURAL_OPTIMIZATION:BOOL=OFF' '-DCMAKE_VERBOSE_MAKEFILE:BOOL=ON' '-DCMAKE_INSTALL_RPATH_USE_LINK_PATH:BOOL=OFF' '-DCMAKE_INSTALL_RPATH:STRING=/home/software/radiuss/[padded-to-512-chars]/morepadding/linux-ubuntu18.04-x86_64/gcc-7.5.0/chai-develop-wzzsm5uvishgiebaczrwbzakg646fb3m/lib;/home/software/radiuss/[padded-to-610-chars]/morepadding/linux-ubuntu18.04-x86_64/gcc-7.5.0/chai-develop-wzzsm5uvishgiebaczrwbzakg646fb3m/lib64;/home/software/radiuss/[padded-to-612-chars]/linux-ubuntu18.04-x86_64/gcc-7.5.0/umpire-develop-iztrmuhvzvvffh32phtiyy62u3fdvjtx/lib;/home/software/radiuss/[padded-to-600-chars]/linux-ubuntu18.04-x86_64/gcc-7.5.0/camp-master-kf2ofmqh54quzf4bysluyumymjrszhen/lib' '-DCMAKE_PREFIX_PATH:STRING=/home/software/radiuss/[padded-to-512-chars]/linux-ubuntu18.04-x86_64/gcc-7.5.0/umpire-develop-iztrmuhvzvvffh32phtiyy62u3fdvjtx;/home/software/radiuss/[padded-to-596-chars]/linux-ubuntu18.04-x86_64/gcc-7.5.0/camp-master-kf2ofmqh54quzf4bysluyumymjrszhen;/home/software/radiuss/[padded-to-593-chars]/linux-ubuntu18.04-x86_64/gcc-7.5.0/blt-0.4.1-w3rx3gge3lvk5u63sh2rr2f6zfarqcrz;/home/software/radiuss/[padded-to-591-chars]/linux-ubuntu18.04-x86_64/gcc-7.5.0/cmake-3.21.1-4uhxhmxa4yv6z4nzjnskjrwgmwmsduik' '-DBLT_SOURCE_DIR=/home/software/radiuss/[padded-to-512-chars]/linux-ubuntu18.04-x86_64/gcc-7.5.0/blt-0.4.1-w3rx3gge3lvk5u63sh2rr2f6zfarqcrz' '-DENABLE_CUDA=OFF' '-DENABLE_HIP=OFF' '-DENABLE_PICK:BOOL=ON' '-Dumpire_DIR:PATH=/home/software/radiuss/[padded-to-512-chars]/linux-ubuntu18.04-x86_64/gcc-7.5.0/umpire-develop-iztrmuhvzvvffh32phtiyy62u3fdvjtx/share/umpire/cmake' '-DENABLE_TESTS=OFF' '-DENABLE_BENCHMARKS:BOOL=OFF' '-DENABLE_EXAMPLES:BOOL=ON' '-DENABLE_BENCHMARKS=OFF' 
'/tmp/root/spack-stage/spack-stage-chai-develop-wzzsm5uvishgiebaczrwbzakg646fb3m/spack-src'
CMake Error at CMakeLists.txt:50 (endif):
  Flow control statements are not properly nested.


-- Configuring incomplete, errors occurred!
...

Adding these packages to Spack Cloud CI PR builds is being deferred until the impact of the fix is assessed.

chai-develop-cmakelist-flow-error.txt

Looking at the code, it appears the error has been around since a commit 10 months ago. See line 49 of CHAI/CMakeLists.txt at commit be56276:

endif()

Support out of source BLT with BLT_SOURCE_DIR

It would be useful if you supported out of source BLT instances. Here is an example of how to do so:

################################
# BLT
################################
if (DEFINED BLT_SOURCE_DIR)
  # Support having a shared BLT outside of the repository if given a BLT_SOURCE_DIR
  if (NOT EXISTS ${BLT_SOURCE_DIR}/SetupBLT.cmake)
    message(FATAL_ERROR "Given BLT_SOURCE_DIR does not contain SetupBLT.cmake")
  endif()
else()
  # Use internal BLT if no BLT_SOURCE_DIR is given
  set(BLT_SOURCE_DIR "${PROJECT_SOURCE_DIR}/cmake/blt" CACHE PATH "")
  if (NOT EXISTS ${BLT_SOURCE_DIR}/SetupBLT.cmake)
    message(FATAL_ERROR
      "The BLT submodule is not present. "
      "Run the following two commands in your git repository: \n"
      " git submodule init\n"
      " git submodule update" )
  endif()
endif()

include(${BLT_SOURCE_DIR}/SetupBLT.cmake)

Consider removing disabled execution spaces from enum

We just hit a case where we were pulling out a raw pointer using chai::ExecutionSpace::GPU, and its value was greater than NUM_EXECUTION_SPACES because CHAI was built with CUDA disabled. Totally a bug on our end, but I think it would be preferable for uses of disabled execution spaces to not compile at all. This would involve changing the current definition:

enum ExecutionSpace {
  /*! Default, no execution space. */
  NONE = 0,
  /*! Executing in CPU space */
  CPU,
#if defined(CHAI_ENABLE_CUDA) || defined(CHAI_ENABLE_HIP) || defined(CHAI_ENABLE_GPU_SIMULATION_MODE)
  /*! Execution in GPU space */
  GPU,
#endif
#if defined(CHAI_ENABLE_UM)
  UM,
#endif
#if defined(CHAI_ENABLE_PINNED)
  PINNED,
#endif
  // NUM_EXECUTION_SPACES should always be last!
  /*! Used to count total number of spaces */
  NUM_EXECUTION_SPACES
#if !defined(CHAI_ENABLE_CUDA) && !defined(CHAI_ENABLE_HIP) && !defined(CHAI_ENABLE_GPU_SIMULATION_MODE)
  ,GPU
#endif
#if !defined(CHAI_ENABLE_UM)
  ,UM
#endif
#if !defined(CHAI_ENABLE_PINNED)
  ,PINNED
#endif
};

The new code would remove entries after NUM_EXECUTION_SPACES:

enum ExecutionSpace {
  /*! Default, no execution space. */
  NONE = 0,
  /*! Executing in CPU space */
  CPU,
#if defined(CHAI_ENABLE_CUDA) || defined(CHAI_ENABLE_HIP) || defined(CHAI_ENABLE_GPU_SIMULATION_MODE)
  /*! Execution in GPU space */
  GPU,
#endif
#if defined(CHAI_ENABLE_UM)
  UM,
#endif
#if defined(CHAI_ENABLE_PINNED)
  PINNED,
#endif
  // NUM_EXECUTION_SPACES should always be last!
  /*! Used to count total number of spaces */
  NUM_EXECUTION_SPACES
};
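Here is a self-contained sketch of why the proposal helps. In a CPU-only build, the current layout still defines `GPU`, but its value lies past `NUM_EXECUTION_SPACES`. With the proposed layout, naming `GPU` is a compile error. `SKETCH_ENABLE_GPU` is a hypothetical stand-in for the real `CHAI_ENABLE_*` macros, and the two enums are placed in namespaces only so that both can coexist in one file.

```cpp
#include <cassert>

// Hypothetical build switch standing in for CHAI_ENABLE_CUDA / CHAI_ENABLE_HIP /
// CHAI_ENABLE_GPU_SIMULATION_MODE. It is left undefined here (a CPU-only build).
// #define SKETCH_ENABLE_GPU

namespace current_layout {
enum ExecutionSpace {
  NONE = 0,
  CPU,
#if defined(SKETCH_ENABLE_GPU)
  GPU,
#endif
  NUM_EXECUTION_SPACES
#if !defined(SKETCH_ENABLE_GPU)
  , GPU  // still compiles, but its value lies past NUM_EXECUTION_SPACES
#endif
};
}  // namespace current_layout

namespace proposed_layout {
enum ExecutionSpace {
  NONE = 0,
  CPU,
#if defined(SKETCH_ENABLE_GPU)
  GPU,
#endif
  NUM_EXECUTION_SPACES
};
}  // namespace proposed_layout

// A per-space table sized by NUM_EXECUTION_SPACES. Indexing it with
// current_layout::GPU in a CPU-only build would read out of bounds; with the
// proposed layout, writing proposed_layout::GPU simply fails to compile.
void* per_space_pointers[proposed_layout::NUM_EXECUTION_SPACES] = {};
```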

std::filesystem not found in default build

Building with the default configuration fails because std::filesystem is not found.
Changing BLT_CXX_STD from c++14 to c++17 solves the issue.
The std::filesystem include should be guarded, or support for C++14 dropped.
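One way to guard the include is to use `std::filesystem` only when the header exists and the code is compiled as C++17 or later. Otherwise the sketch falls back to a C++14-compatible check. The `file_exists` helper and the `SKETCH_HAVE_STD_FILESYSTEM` macro are hypothetical. Where CHAI actually uses `std::filesystem` is not stated in the issue.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Use std::filesystem only when the header exists AND the translation unit is
// compiled as C++17 or later; otherwise fall back to a C++14-compatible path.
#if defined(__has_include)
#  if __has_include(<filesystem>) && __cplusplus >= 201703L
#    include <filesystem>
#    define SKETCH_HAVE_STD_FILESYSTEM 1
#  endif
#endif
#ifndef SKETCH_HAVE_STD_FILESYSTEM
#  define SKETCH_HAVE_STD_FILESYSTEM 0
#endif

// Returns true if something exists at `path` (the C++14 fallback only
// detects files it can open for reading).
bool file_exists(const std::string& path) {
#if SKETCH_HAVE_STD_FILESYSTEM
  return std::filesystem::exists(path);
#else
  std::ifstream in(path);  // opening fails if the file is absent
  return in.good();
#endif
}
```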

Your current Documentation is broken

The documentation at https://chai.readthedocs.io/en/develop/, which is linked from your README, has major sections missing.
For example, the tutorial section is a stub: https://chai.readthedocs.io/en/develop/tutorial.html
The code documentation is also a stub:
https://chai.readthedocs.io/en/develop/code_documentation.html

I came across your library and would like to use it, but I'd love to be able to see what it does first :)

README instructions outdated (?), build fails with CUDA enabled

On an Ubuntu 20.04 machine with CUDA 11.4 and g++ 9.3, I followed the instructions in the README:

$ git clone git@github.com:LLNL/CHAI.git
...
$ cd CHAI
$ git submodule update --init --recursive
...
$ mkdir build && cd build
$ cmake -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda ../
...
-- CUDA Support is Off 
...
CMake Warning:
  Manually-specified variables were not used by the project:

    CUDA_TOOLKIT_ROOT_DIR

So it seems the toolkit directory is ignored and CUDA is not actually enabled (?). If we force CUDA on, CMake configures as expected, but the library itself fails to build:

$ cmake -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda -DENABLE_CUDA=ON ../
...
-- CUDA Support is ON
...
-- Configuring done
-- Generating done
-- Build files have been written to: ...
$ make -j
[  0%] Building CXX object blt/thirdparty_builtin/googletest-master-2020-01-07/googletest/CMakeFiles/gtest.dir/src/gtest-all.cc.o
[  0%] Building CXX object blt/tests/smoke/CMakeFiles/blt_cuda_version_smoke.dir/blt_cuda_version_smoke.cpp.o
[  3%] Building CUDA object blt/tests/smoke/CMakeFiles/blt_cuda_smoke.dir/blt_cuda_smoke.cpp.o
...
[ 94%] Linking CUDA device code CMakeFiles/chai-example.exe.dir/cmake_device_link.o
/usr/bin/ld: ../lib/libumpire.a(Allocator.cpp.o): in function `__sti____cudaRegisterAll()':
tmpxft_0001b8fd_00000000-6_Allocator.cudafe1.cpp:(.text+0xee3): undefined reference to `__cudaRegisterLinkedBinary_44_tmpxft_0001b8fd_00000000_7_Allocator_cpp1_ii_a17095a1'
/usr/bin/ld: ../lib/libumpire.a(Replay.cpp.o): in function `__sti____cudaRegisterAll()':
tmpxft_0001b8fe_00000000-6_Replay.cudafe1.cpp:(.text+0x6fb): undefined reference to `__cudaRegisterLinkedBinary_41_tmpxft_0001b8fe_00000000_7_Replay_cpp1_ii_5eca6429'
/usr/bin/ld: ../lib/libumpire.a(ResourceManager.cpp.o): in function `__sti____cudaRegisterAll()':
tmpxft_0001b8f9_00000000-6_ResourceManager.cudafe1.cpp:(.text+0xe1a3): undefined reference to `__cudaRegisterLinkedBinary_50_tmpxft_0001b8f9_00000000_7_ResourceManager_cpp1_ii_42a9a1b2'

There are many more errors like this.

Optionally allow allocations to be zeroed

We would like to be able to request that an allocation be zeroed when it is made, similar to calloc instead of malloc, but with all the host/device and pool support that Umpire and CHAI provide. Allocations can happen lazily, on either the device or the host, so (I think) CHAI has the best information about where and when an allocation happens. That would let the zeroing be done most efficiently, without extra memory transfers.

@rchen20, @ajkunen, Verinder Rana, and John Loffeld also know the details. What other information is needed to flesh this request out?
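A host-only sketch of the requested semantics: memory is allocated lazily on first access, and an optional flag makes that allocation use `calloc` instead of `malloc`, so no separate zeroing pass is needed. `LazyBuffer` is hypothetical and ignores device memory and pools entirely. On a GPU, the equivalent step would be a device-side fill such as `cudaMemset` right after allocation. Allocation failure is not handled in this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical lazily allocated host buffer with an optional "zero on
// allocation" flag. Not CHAI or Umpire code.
class LazyBuffer {
 public:
  LazyBuffer(std::size_t count, bool zero_init)
      : count_(count), zero_init_(zero_init) {}
  ~LazyBuffer() { std::free(data_); }
  LazyBuffer(const LazyBuffer&) = delete;
  LazyBuffer& operator=(const LazyBuffer&) = delete;

  // Memory is allocated on first access. With zero_init set, calloc returns
  // memory that is already zeroed, so no memset pass is needed.
  double* data() {
    if (data_ == nullptr && count_ > 0) {
      void* p = zero_init_ ? std::calloc(count_, sizeof(double))
                           : std::malloc(count_ * sizeof(double));
      data_ = static_cast<double*>(p);
    }
    return data_;
  }

  bool allocated() const { return data_ != nullptr; }

 private:
  std::size_t count_;
  bool zero_init_;
  double* data_ = nullptr;
};
```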

Exit time error with 2.3.0

When using CHAI 2.3.0 with Umpire 4.1.2 and RAJA 0.13.0, I get a segfault when freeing a ManagedArray that is destroyed at exit time (after main). This occurs with GCC 8.3.1 but not with clang 10.0.1. With CHAI 2.2.1, Umpire 4.1.2, and RAJA 0.12.1, both compilers work fine.

terminate called after throwing an instance of 'umpire::util::Exception'
  what():  ! Umpire Exception [/usr/WS2/corbett5/geosx/uberenv_libs/builds/spack-stage-umpire-4.1.2-qs26ycp2lqi32tsvjid6wzboag6p4lrq/spack-src/src/umpire/ResourceManager.cpp:405]:  getAllocator Allocator "2" not found. Available allocators: 
    Backtrace: 18 frames
    0 0x2aaaae7d7cd5 No dladdr: /usr/WS2/corbett5/geosx/uberenv_libs/[email protected]/[email protected]/lib/libumpire.so(+0x3dcd5) [0x2aaaae7d7cd5]
    1 0x2aaaae7d7d40 No dladdr: /usr/WS2/corbett5/geosx/uberenv_libs/[email protected]/[email protected]/lib/libumpire.so(_ZN6umpire4util10backtracerINS0_12trace_alwaysEE13get_backtraceERNS0_9backtraceE+0x10) [0x2aaaae7d7d40]
    2 0x2aaaae7d33b2 No dladdr: /usr/WS2/corbett5/geosx/uberenv_libs/[email protected]/[email protected]/lib/libumpire.so(_ZN6umpire15ResourceManager12getAllocatorEi+0x1b2) [0x2aaaae7d33b2]
    3 0x2aaaad18baa3 No dladdr: /usr/WS2/corbett5/geosx/[email protected]/lib/libgeosx_core.so(_ZN4chai12ArrayManager4freeEPNS_13PointerRecordENS_14ExecutionSpaceE+0x113) [0x2aaaad18baa3]
    4 0x4bc17e No dladdr: ./tests/testFunctions(_ZN7LvArray10ChaiBufferIlE4freeEv+0x40) [0x4bc17e]
    5 0x4b475e No dladdr: ./tests/testFunctions(_ZN7LvArray18bufferManipulation4freeINS_10ChaiBufferIlEEEEvRT_l+0x2f) [0x4b475e]
    6 0x2aaaac43f684 No dladdr: /usr/WS2/corbett5/geosx/[email protected]/lib/libgeosx_core.so(_ZN7LvArray5ArrayIlLi1EN4camp7int_seqIlJLl0EEEElNS_10ChaiBufferEED1Ev+0x2e) [0x2aaaac43f684]
    7 0x2aaaac84b8f8 No dladdr: /usr/WS2/corbett5/geosx/[email protected]/lib/libgeosx_core.so(_ZN5geosx13TableFunctionD1Ev+0x30) [0x2aaaac84b8f8]
    8 0x2aaaac84b98c No dladdr: /usr/WS2/corbett5/geosx/[email protected]/lib/libgeosx_core.so(_ZN5geosx13TableFunctionD0Ev+0x18) [0x2aaaac84b98c]
    9 0x2aaaac7c0d9d No dladdr: /usr/WS2/corbett5/geosx/[email protected]/lib/libgeosx_core.so(_ZN5geosx12MappedVectorINS_14dataRepository5GroupEPS2_SslE11deleteValueIS3_EENSt9enable_ifIXsrSt7is_sameIT_S3_E5valueEvE4typeEl+0x5d) [0x2aaaac7c0d9d]
    10 0x2aaaac7bf9dd No dladdr: /usr/WS2/corbett5/geosx/[email protected]/lib/libgeosx_core.so(_ZN5geosx12MappedVectorINS_14dataRepository5GroupEPS2_SslE5clearEv+0x49) [0x2aaaac7bf9dd]
    11 0x2aaaac7bee80 No dladdr: /usr/WS2/corbett5/geosx/[email protected]/lib/libgeosx_core.so(_ZN5geosx12MappedVectorINS_14dataRepository5GroupEPS2_SslED1Ev+0x18) [0x2aaaac7bee80]
    12 0x2aaaac7ba938 No dladdr: /usr/WS2/corbett5/geosx/[email protected]/lib/libgeosx_core.so(_ZN5geosx14dataRepository5GroupD1Ev+0x42) [0x2aaaac7ba938]
    13 0x2aaaac86d056 No dladdr: /usr/WS2/corbett5/geosx/[email protected]/lib/libgeosx_core.so(_ZN5geosx15FunctionManagerD1Ev+0x2a) [0x2aaaac86d056]
    14 0x2aaac4934ce9 No dladdr: /lib64/libc.so.6(+0x39ce9) [0x2aaac4934ce9]
    15 0x2aaac4934d37 No dladdr: /lib64/libc.so.6(+0x39d37) [0x2aaac4934d37]
    16 0x2aaac491d55c No dladdr: /lib64/libc.so.6(__libc_start_main+0xfc) [0x2aaac491d55c]
    17 0x4a4459 No dladdr: ./tests/testFunctions() [0x4a4459]
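The empty "Available allocators:" list suggests that the allocator registry had already been torn down when the exit-time destructor ran. The issue does not confirm this. It would be the classic static destruction order problem. A common C++ remedy is a construct-on-first-use registry that is intentionally never destroyed. The sketch below shows that pattern with a hypothetical `Registry`. It is not Umpire's `ResourceManager`, and Umpire's actual fix, if any, is not described here.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical allocator registry standing in for a process-wide manager.
class Registry {
 public:
  void add(int id, const std::string& name) { names_[id] = name; }
  bool has(int id) const { return names_.count(id) != 0; }

 private:
  std::map<int, std::string> names_;
};

// Construct-on-first-use, intentionally never destroyed: objects whose
// destructors run after main() can still reach a live registry.
Registry& registry() {
  static Registry* instance = new Registry();
  return *instance;
}

// Hypothetical object whose destructor consults the registry, like a
// ManagedArray freed by a static object's destructor at exit.
struct ExitTimeUser {
  int id;
  ~ExitTimeUser() { assert(registry().has(id)); }
};
```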

Guard test folder with ENABLE_TESTS

You can't currently disable tests.

Original (src/CMakeLists.txt:117):
add_subdirectory(tests)

Proposed Fix:
if(ENABLE_TESTS)
add_subdirectory(tests)
endif()

Issues using ManagedArrays as class members

So, I've recently been trying out CHAI for some work here at LLNL, in an effort to improve the GPU performance of a number of mechanics models.

I ran into some issues with code that looked something like the example below.

Also, sorry about the slightly long example... I wanted a smallish reproducer that is a fair representation of the toy problem I've been working on. Part of the issue here is probably also related to my previous CHAI issue #174 about wanting to use custom Umpire resources for the ManagedArrays.

edit:

I realize I forgot to put the error message in here and what versions of the library I'm using and all:

CUDAassert: an illegal memory access was encountered /g/g20/carson16/snls_vectorization/raja/install_dir/include/RAJA/policy/cuda/MemUtils_CUDA.hpp 183
terminate called after throwing an instance of 'std::runtime_error'
  what():  CUDAassert

This error occurs when calling chai::ManagedArray::operator[] in the case where the ManagedArrays are initialized through the memoryManager object rather than through the class constructor.

Additional useful information: I'm using the tagged releases CHAI v2.3.0, RAJA v0.13.0, and Umpire v4.1.2. Umpire and RAJA are built outside of CHAI because of some weird issues I ran into with Umpire and camp when building them in-source. My cmake invocation on rzansel looked something like:

cmake ../ -DENABLE_RAJA_PLUGIN=ON -DENABLE_CUDA=ON -DCUDA_TOOLKIT_ROOT_DIR=/usr/tce/packages/cuda/cuda-10.1.243/ -Dumpire_DIR=${UMPIRE_DIR} -DRAJA_DIR=${RAJA_DIR} -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=../install_dir/

#ifndef PROB_GPU_THREADS
#define PROB_GPU_THREADS 256
#endif

#include "RAJA/RAJA.hpp"

#include "umpire/strategy/DynamicPool.hpp"
#include "umpire/Allocator.hpp"
#include "umpire/ResourceManager.hpp"
#include "chai/ManagedArray.hpp"

/// The PROB_FORALL wrapper where GPU threads are set to a default value
#define PROB_FORALL(i, st, end, ...)           \
PROB_ForallWrap<PROB_GPU_THREADS>(		  \
st,                                            \
end,                                           \
[=] __device__ (int i) {__VA_ARGS__},     \
[&] (int i) {__VA_ARGS__})

/// The MFEM_FORALL wrapper that allows one to change the number of GPU threads
#define PROB_FORALL_T(i, threads, st, end, ...)  \
PROB_ForallWrap<threads>(			          \
st,                                              \
end,                                             \
[=] __device__ (int i) {__VA_ARGS__},       \
[&] (int i) {__VA_ARGS__})
   /// This has largely been inspired by the MFEM device
   /// class, since they make use of it with their FORALL macro
   /// It's recommended to only have one object for the lifetime
   /// of the material models being used, so no clashing with
   /// multiple objects can occur in regards to which models
   /// run on what ExecutionSpace backend.

   class Device {
      private:
         static Device device_singleton;
         chai::ExecutionSpace _es;
         static Device& Get() { return device_singleton; }
      public:
#ifdef __CUDACC__
         Device() : _es(chai::ExecutionSpace::GPU) {}
#else
         Device() : _es(chai::ExecutionSpace::CPU) {}
#endif
         Device(chai::ExecutionSpace es) : _es(es) {
            Get()._es = es;
         }
         void SetBackend(chai::ExecutionSpace es) { Get()._es = es; }
         static inline chai::ExecutionSpace GetBackend() { return Get()._es; }
         ~Device() {
#ifdef __CUDACC__
            Get()._es = chai::ExecutionSpace::GPU;
#else
            Get()._es = chai::ExecutionSpace::CPU;
#endif
         }
   };

      Device Device::device_singleton;
      

   /// The forall kernel body wrapper. It should be noted that one
   /// limitation of this wrapper is that the lambda captures can
   /// only capture functions / variables that are publicly available
   /// if this is called within a class object.
   template <const int NUMTHREADS, typename DBODY, typename HBODY>
   inline void PROB_ForallWrap(const int st,
                               const int end,
                               DBODY &&d_body,
                               HBODY &&h_body)
   {
      // Additional backends can be added as seen within the MFEM_FORALL
      // which this was based on.
      
      // Device::Backend makes use of a global variable
      // so as long as this is set in one central location
      // and you don't have multiple Device objects changing
      // the backend things should just work no matter where this
      // is used.
      switch(Device::GetBackend()) {
#ifdef HAVE_RAJA
   #ifdef RAJA_ENABLE_CUDA
         case(chai::ExecutionSpace::GPU): {
            printf("Running on GPU...\n");
            RAJA::forall<RAJA::cuda_exec<NUMTHREADS>>(RAJA::RangeSegment(st, end), d_body);
            break;
         }
   #endif
   #ifdef RAJA_ENABLE_OPENMP
         case(chai::ExecutionSpace::NONE): {
            RAJA::forall<RAJA::omp_parallel_for_exec>(RAJA::RangeSegment(st, end), h_body);
            break;
         }
   #endif
#endif
         case(chai::ExecutionSpace::CPU):
         default: {
            // Moved from a for loop to raja forall so that the chai ManagedArray
            // would automatically move the memory over
            RAJA::forall<RAJA::seq_exec>(RAJA::RangeSegment(st, end), h_body);
            break;
         }
      } // End of switch
   } // end of forall wrap

class memoryManager2 {
   public:
      memoryManager2() :
      _complete(false),
      _rm(umpire::ResourceManager::getInstance())
      {
         _host_allocator = _rm.getAllocator("HOST");
   #ifdef __CUDACC__
         // Do we want to make this pinned memory instead?
         _device_allocator = _rm.makeAllocator<umpire::strategy::DynamicPool>
                            ("DEVICE_pool", _rm.getAllocator("DEVICE"));
   #endif
      }
     
      /** Changes the internal host allocator to be one that
       *  corresponds with the integer id provided. This method
       *  should be preferably called before the class is initialized
       *  as complete.
       *  This host allocator should hopefully not be a pooled memory allocator
       *  due to performance reasons.  
       */
      __host__
      void setHostAllocator(int id)
      {
         if(_rm.getAllocator(id).getPlatform() == umpire::Platform::host) {
            _host_allocator = _rm.getAllocator(id);
         } else {
            printf("memoryManager::setHostAllocator. The supplied id should be associated with a host allocator");
         }
      }
      /** Changes the internal device allocator to be one that
       *  corresponds with the integer id provided. This method
       *  should be preferably called before the class is initialized
       *  as complete.
       *  This device allocator should hopefully be a pooled memory allocator
       *  due to performance reasons.
       */
      __host__
      void setDeviceAllocator(int id)
      {
   #ifdef __CUDACC__
         // We don't want to disassociate our default device allocator from
         // Umpire just in case it still has memory associated with it floating around.
         if(_rm.getAllocator(id).getPlatform() == umpire::Platform::cuda) {
            _device_allocator = _rm.getAllocator(id);
         } else {
            printf("memoryManager::setDeviceAllocator. The supplied id should be associated with a device allocator");
         }
   #endif
      }
      /// Tells the class that it is now considered completely initialized
      __host__
      void complete() { _complete = true; }
      /// Returns a boolean for whether or not the class is complete
      __host__
      bool getComplete() { return _complete; }


      template<typename T>
      __host__
      inline
      chai::ManagedArray<T> allocManagedArray(std::size_t size=0)
      {
         chai::ManagedArray<T> array(size, 
         std::initializer_list<chai::ExecutionSpace>{chai::CPU
#if defined(CHAI_ENABLE_CUDA) || defined(CHAI_ENABLE_HIP)
            , chai::GPU
#endif
            },
            std::initializer_list<umpire::Allocator>{_host_allocator
#if defined(CHAI_ENABLE_CUDA) || defined(CHAI_ENABLE_HIP)
            , _device_allocator
#endif
         });

         return array;
      }

      template<typename T>
      __host__
      inline
      chai::ManagedArray<T>* allocPManagedArray(std::size_t size=0)
      {
         auto array = new chai::ManagedArray<T>(size, 
         std::initializer_list<chai::ExecutionSpace>{chai::CPU
#if defined(CHAI_ENABLE_CUDA) || defined(CHAI_ENABLE_HIP)
            , chai::GPU
#endif
            },
            std::initializer_list<umpire::Allocator>{_host_allocator
#if defined(CHAI_ENABLE_CUDA) || defined(CHAI_ENABLE_HIP)
            , _device_allocator
#endif
         });

         return array;

      }

      virtual ~memoryManager2(){}
  private:
      bool _complete = false;
#ifdef HAVE_UMPIRE
      umpire::Allocator _host_allocator;
#ifdef __CUDACC__
      umpire::Allocator _device_allocator;
#endif
      umpire::ResourceManager& _rm;
#endif
};

class testCase
{
   public:
      chai::ManagedArray<double> data_public;
      // This is the only way I've found to work...
      // If these aren't initialized here and instead done down below in init()
      // then things fail.
      testCase(int nBatch) : data_public(chai::ManagedArray<double>(nBatch)),
      data_private(chai::ManagedArray<double>(nBatch)) 
      {
         init(nBatch);
      }
      ~testCase()
      {
         data_public.free();
         data_private.free();
      }

      void init(int nBatch)
      {
         memoryManager2 mm;

         auto test = mm.allocManagedArray<double>(nBatch);

         // Now this should work...
         printf("Running a simple test of things...");
         PROB_FORALL(i, 0, nBatch, {
            test[i] = 1.0;
         });

         test.free();

         // neither of these methods work
         // You just get that 
         // I've also tried making this a pointer as well...
         // data_public = mm.allocManagedArray<double>(nBatch); 
         // data_private = mm.allocManagedArray<double>(nBatch);
         printf("\nin init for testCase class\n");
         PROB_FORALL(i, 0, nBatch, {
            data_public[i] = 1.0;
            data_private[i] = 1.0;
         });
         printf("it worked \n");
      }

   private:
      chai::ManagedArray<double> data_private;
};

int main()
{
    testCase test(5000);
    return 0;
}
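In the reproducer above, `data_public[i]` inside the `[=] __device__` lambda in `init()` is really `this->data_public[i]`. In a member function, `[=]` captures the `this` pointer rather than copying the members. A plausible contributor to the illegal memory access is that the kernel dereferences a host `this` pointer and the ManagedArray copy constructor never runs. The issue does not confirm this. A commonly used pattern is to copy members into locals (for example `auto dp = data_public;`) before the loop. The standalone sketch below shows only the capture difference, with a hypothetical `Holder` class. It involves neither CHAI nor CUDA.

```cpp
#include <cassert>
#include <functional>

// Hypothetical class mirroring testCase: a member read inside a [=] lambda.
class Holder {
 public:
  int member = 1;

  // [=] inside a member function captures the `this` pointer, not a copy of
  // `member`; the body reads this->member when it runs.
  std::function<int()> capture_via_this() {
    return [=]() { return member; };
  }

  // Copying the member into a local first makes the lambda capture the value
  // itself (for a ManagedArray, that copy is what CHAI hooks into).
  std::function<int()> capture_local_copy() {
    int local = member;
    return [=]() { return local; };
  }
};
```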

How to cite CHAI?

Hello! Is there a preferred way to cite CHAI?

Deprecate getPointer method in favor of data method

Can we mark the getPointer method as deprecated since the data method does the same thing and is more concise and consistent with the std library?
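Here is a minimal sketch of the requested deprecation using the standard C++14 `[[deprecated]]` attribute. Calls to the old name still compile but produce a warning at each call site. `ArrayHandle` is a hypothetical stand-in, not CHAI's `ManagedArray`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical array handle; member names mirror the issue.
template <typename T>
class ArrayHandle {
 public:
  explicit ArrayHandle(std::size_t n) : storage_(n) {}

  // Preferred accessor, consistent with std::vector::data().
  T* data() { return storage_.data(); }

  // Old spelling kept for compatibility; compilers warn at each call site.
  [[deprecated("use data() instead")]]
  T* getPointer() { return data(); }

 private:
  std::vector<T> storage_;
};
```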

Ability to use temporary memory resource sets within ManagedArrays

@davidbeckingsale, for a library I work on here at LLNL, I've recently been looking at integrating CARE in order to help simplify our internal memory management and our forall-type kernels, given the ever-increasing need to support more and more hardware. I've been talking with @adayton1 about this, and he's been super helpful.

During some of our more recent discussions, I asked how to make sure CARE uses the resource sets that an application code has told our library to use; these would usually be resource sets associated with temporary memory. He pointed me to https://github.com/LLNL/CARE/blob/develop/src/care/care.cpp#L22, but warned me that this would change the global allocators that CHAI is using. I would rather avoid that for obvious reasons, so @adayton1 mentioned there might be a way within CHAI to have a ManagedArray constructor take a list of allocators, which should let me use these temporary memory resource sets. I figured it would be a good idea to open an issue and see if you have any advice or ideas on how best to handle this use case in CHAI, which could then be propagated up into CARE's use of ManagedArrays/host_device_ptrs.

CNMEM / CNEM option documented but unused?

An option named ENABLE_CNMEM or ENABLE_CNEM is mentioned in the docs but seems unused.

In fact, it seems it was removed in d7d0d5f.

I’ll remove it from the doc.

ManagedArrays copy constructed outside of a lambda capture are broken

EDIT: See my comment below for an even simpler test case.

Data movement is triggered by the copy constructor, right? I have a case where a ManagedArray is a member of a class. The copy constructor of the class is called, which calls the copy constructor of the ManagedArray. However, the following test case fails.

class TestClass {
public:
  TestClass(chai::ManagedArray<int> values) : m_values(values) {}
  TestClass(const TestClass& other) : m_values(other.m_values) {}
  CHAI_HOST_DEVICE int getValue(const int i) const { return m_values[i]; }
private:
  chai::ManagedArray<int> m_values;
};

CUDA_TEST(managed_ptr, cuda_inner_ManagedArray)
{
  const int expectedValue = rand();

  chai::ManagedArray<int> array(1, chai::CPU);
  array[0] = expectedValue;

  TestClass temp(array);
  chai::ManagedArray<int> results(1, chai::GPU);

  forall(cuda(), 0, 1, [=] __device__ (int i) {
    results[i] = temp.getValue(i);
  });

  results.move(chai::CPU);
  ASSERT_EQ(results[0], expectedValue);
}

Need testing on ManagedArray<const T>

#include <chai/ManagedArray.hpp>
int main(){
  chai::ManagedArray<const double> foo(50);
  const double* foo2 = (const double*) foo;
}

This will cause an error in the casting operator, as registerTouch is ill-defined:

ArrayManager.hpp:113:8: note: candidate function not viable: no known conversion from 'const double *' to 'void *' for 1st argument;

I'm writing workarounds, but it'd be nice for this to work.
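The quoted error arises because a `const double*` cannot convert implicitly to `void*`. A sketch of one workaround is to go through `const void*` and then `const_cast`, so that a hook taking `void*` also accepts pointers to const. `registerTouchSketch` and `touch` are hypothetical stand-ins for CHAI's internals. The hook must only record the address: writing through it to an object defined `const` would be undefined behavior. Changing the hook to take `const void*` would be the cleaner alternative.

```cpp
#include <cassert>

// Hypothetical stand-in for a bookkeeping hook that takes a non-const void*.
static void* g_last_touched = nullptr;
void registerTouchSketch(void* p) { g_last_touched = p; }

// Works for both T = double and T = const double: the pointer is converted to
// const void* first, then const_cast removes the qualifier explicitly.
template <typename T>
void touch(T* p) {
  registerTouchSketch(const_cast<void*>(static_cast<const void*>(p)));
}
```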

More configuration checking

Should CHAI_ENABLE_PINNED=ON be a configuration error if CUDA/HIP/GPU simulation mode are all disabled?
Should CHAI_ENABLE_UM=ON be a configuration error if CUDA/HIP/GPU simulation mode are all disabled?

Also, if CUDA/HIP/GPU simulation mode are all disabled, should the resource manager be required to be off?
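The issue asks about CMake-time errors. A compile-time analogue of the first two questions can be written in a header. In the sketch below, the macro names follow the issue, while wiring them up this way and the `config_is_valid` helper are assumptions. The resource-manager question is not covered.

```cpp
#include <cassert>

#if defined(CHAI_ENABLE_CUDA) || defined(CHAI_ENABLE_HIP) || \
    defined(CHAI_ENABLE_GPU_SIMULATION_MODE)
constexpr bool kAnyGpu = true;
#else
constexpr bool kAnyGpu = false;
#endif

#if defined(CHAI_ENABLE_PINNED)
constexpr bool kPinned = true;
#else
constexpr bool kPinned = false;
#endif

#if defined(CHAI_ENABLE_UM)
constexpr bool kUm = true;
#else
constexpr bool kUm = false;
#endif

// Proposed rule: pinned and unified memory require some GPU backend.
constexpr bool config_is_valid(bool pinned, bool um, bool any_gpu) {
  return any_gpu || (!pinned && !um);
}

static_assert(config_is_valid(kPinned, kUm, kAnyGpu),
              "CHAI_ENABLE_PINNED/CHAI_ENABLE_UM require CUDA, HIP, or GPU simulation mode");
```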

Zero-length ManagedArrays leak memory

We've seen an issue with CHAI allocating zero-sized arrays: they are reported as memory leaks by the gcc memory sanitizer and by valgrind.

We can see this on a CPU-only build with the default malloc Umpire allocator.
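The issue does not identify the cause. One common way zero-length allocations leak is when the release path is skipped for size 0: `std::malloc(0)` may return a unique non-null pointer, and that pointer must still be freed. The hypothetical `HostBuffer` below shows the safe pattern of releasing unconditionally. It is not CHAI's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical host buffer. std::malloc(0) may return either nullptr or a
// unique non-null pointer; a non-null result must still be passed to free().
class HostBuffer {
 public:
  explicit HostBuffer(std::size_t bytes)
      : bytes_(bytes), ptr_(std::malloc(bytes)) {}

  // Release unconditionally: std::free(nullptr) is a no-op, so there is no
  // need for a "size == 0" special case (which is where leaks creep in).
  ~HostBuffer() { std::free(ptr_); }

  HostBuffer(const HostBuffer&) = delete;
  HostBuffer& operator=(const HostBuffer&) = delete;

  std::size_t size() const { return bytes_; }

 private:
  std::size_t bytes_;
  void* ptr_;
};
```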

build failure on LC toss3 machines

When using the Umpire host-config to build CHAI, I get an error.

The command:

cmake -C ../src/tpl/umpire/host-configs/toss_3_x86_64_ib/clang_4_0_0.cmake ..

gives:

CMake Error at blt/cmake/thirdparty/SetupCUDA.cmake:6 (enable_language):
  The CMAKE_CUDA_COMPILER:

    /usr/tce/packages/cuda/cuda-9.1.85/bin/nvcc

  is not a full path to an existing compiler tool.

  Tell CMake where to find the compiler by setting either the environment
  variable "CUDACXX" or the CMake cache entry CMAKE_CUDA_COMPILER to the full
  path to the compiler, or to the compiler name if it is in the PATH.
Call Stack (most recent call first):
  blt/cmake/thirdparty/SetupThirdParty.cmake:77 (include)
  blt/SetupBLT.cmake:100 (include)
  CMakeLists.txt:102 (include)


-- Configuring incomplete, errors occurred!
See also "/g/g15/settgast/workspace/Codes/geosx/CHAI/build-quartz/CMakeFiles/CMakeOutput.log".
See also "/g/g15/settgast/workspace/Codes/geosx/CHAI/build-quartz/CMakeFiles/CMakeError.log".

If I disable CUDA, I get the following error:

CMake Error: Error required internal CMake variable not set, cmake may not be built correctly.
Missing variable is:
CMAKE_CUDA_COMPILE_WHOLE_COMPILATION
CMake Error: Error required internal CMake variable not set, cmake may not be built correctly.
Missing variable is:
CMAKE_CUDA_DEVICE_LINK_EXECUTABLE
-- Generating done
-- Build files have been written to: /g/g15/settgast/workspace/Codes/geosx/CHAI/build-quartz

I get this last error on many platforms using various hostconfigs. The August20 snapshot doesn't have these problems.

Compilation issue

Issue with

gcc --version       
gcc (GCC) 13.1.1 20230429

and

clang --version
clang version 15.0.7
Target: x86_64-pc-linux-gnu

/chai/src/chai/src/tpl/umpire/src/umpire/tpl/camp/include/camp/resource.hpp:72:22: error: ‘runtime_error’ is not a member of ‘std’
   72 |           throw std::runtime_error("Incompatible Resource type get cast.");

and

/chai/src/chai/src/tpl/umpire/src/umpire/tpl/camp/include/camp/resource/host.hpp:58:21: error: there are no arguments to ‘malloc’ that depend on a template parameter, so a declaration of ‘malloc’ must be available [-fpermissive]
   58 |         return (T *)malloc(sizeof(T) * n);

and

/chai/src/chai/src/tpl/umpire/src/umpire/tpl/camp/include/camp/resource/host.hpp:66:71: error: ‘free’ was not declared in this scope
   66 |       void deallocate(void *p, MemoryAccess = MemoryAccess::Device) { free(p); }

These errors come from camp, pulled in through Umpire. Is it possible to update these TPLs?

https://github.com/LLNL/camp/blob/main/include/camp/resource/host.hpp has
https://github.com/LLNL/camp/blob/main/include/camp/resource.hpp seems to not need anymore
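Each of these errors is a missing standard header: `std::runtime_error` needs `<stdexcept>`, and `malloc`/`free` need `<cstdlib>`. A likely trigger is that newer standard libraries include fewer headers transitively. The minimal host resource below, in the spirit of the quoted camp code, includes both headers explicitly. `HostResourceSketch` is a hypothetical name, not camp's class.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>    // std::malloc, std::free
#include <stdexcept>  // std::runtime_error

// Hypothetical host resource; every standard facility it uses has its
// header included explicitly instead of relying on transitive includes.
class HostResourceSketch {
 public:
  template <typename T>
  T* allocate(std::size_t n) {
    void* p = std::malloc(sizeof(T) * n);
    if (p == nullptr && n != 0) {
      throw std::runtime_error("host allocation failed");
    }
    return static_cast<T*>(p);
  }

  void deallocate(void* p) { std::free(p); }
};
```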

Configure fails to recognize external Camp built with Raja: The following imported targets are referenced, but are missing: blt::openmp

We obviously do not want to build camp twice and create a conflict at installation; however, passing -Dcamp_DIR= does not work:

CMake Error at src/tpl/umpire/src/tpl/CMakeLists.txt:106 (find_package):
  Found package configuration file:

    /opt/local/lib/cmake/camp/campConfig.cmake

  but it set camp_FOUND to FALSE so package "camp" is considered to be NOT
  FOUND.  Reason given by package:

  The following imported targets are referenced, but are missing: blt::openmp

-- Configuring incomplete, errors occurred!

Both RAJA and camp were built with OpenMP enabled, so I do not understand what else it wants.

Can it be fixed and perhaps made more transparent?

Linking against cuda_runtime rather than cudart

Hello,

I'm having trouble linking my project to chai because chai wants to pull in cuda_runtime:

set_target_properties(chai PROPERTIES
  INTERFACE_INCLUDE_DIRECTORIES "/home/amklinv/spack/opt/spack/linux-ubuntu20.04-skylake/gcc-11.2.0/umpire-6.0.0-vzkbb7g3yc57jqa5xwodjynhrx5z2azs/include;${_IMPORT_PREFIX}/include"
  INTERFACE_LINK_LIBRARIES "umpire;cuda_runtime;RAJA;cuda"
)

On my system with CUDA 11.4.4, there is no cuda_runtime.so:

libOpenCL.so             libcudadevrt.a                libcuinj64.so                libcusparse.so             libnppicc.so.11           libnppim.so              libnppitc_static.a       libnvptxcompiler_static.a
libOpenCL.so.1           libcudart.so                  libcuinj64.so.11.4           libcusparse.so.11          libnppicc.so.11.4.0.110   libnppim.so.11           libnpps.so               libnvrtc-builtins.so
libOpenCL.so.1.0         libcudart.so.11.0             libcuinj64.so.11.4.120       libcusparse.so.11.6.0.120  libnppicc_static.a        libnppim.so.11.4.0.110   libnpps.so.11            libnvrtc-builtins.so.11.4
libOpenCL.so.1.0.0       libcudart.so.11.4.148         libculibos.a                 libcusparse_static.a       libnppidei.so             libnppim_static.a        libnpps.so.11.4.0.110    libnvrtc-builtins.so.11.4.152
libaccinj64.so           libcudart_static.a            libcurand.so                 liblapack_static.a         libnppidei.so.11          libnppist.so             libnpps_static.a         libnvrtc.so
libaccinj64.so.11.4      libcufft.so                   libcurand.so.10              libmetis_static.a          libnppidei.so.11.4.0.110  libnppist.so.11          libnvToolsExt.so         libnvrtc.so.11.2
libaccinj64.so.11.4.120  libcufft.so.10                libcurand.so.10.2.5.120      libnppc.so                 libnppidei_static.a       libnppist.so.11.4.0.110  libnvToolsExt.so.1       libnvrtc.so.11.4.152
libcublas.so             libcufft.so.10.5.2.100        libcurand_static.a           libnppc.so.11              libnppif.so               libnppist_static.a       libnvToolsExt.so.1.0.0   stubs
libcublas.so.11          libcufft_static.a             libcusolver.so               libnppc.so.11.4.0.110      libnppif.so.11            libnppisu.so             libnvblas.so
libcublas.so.11.6.5.2    libcufft_static_nocallback.a  libcusolver.so.11            libnppc_static.a           libnppif.so.11.4.0.110    libnppisu.so.11          libnvblas.so.11
libcublasLt.so           libcufftw.so                  libcusolver.so.11.2.0.120    libnppial.so               libnppif_static.a         libnppisu.so.11.4.0.110  libnvblas.so.11.6.5.2
libcublasLt.so.11        libcufftw.so.10               libcusolverMg.so             libnppial.so.11            libnppig.so               libnppisu_static.a       libnvjpeg.so
libcublasLt.so.11.6.5.2  libcufftw.so.10.5.2.100       libcusolverMg.so.11          libnppial.so.11.4.0.110    libnppig.so.11            libnppitc.so             libnvjpeg.so.11
libcublasLt_static.a     libcufftw_static.a            libcusolverMg.so.11.2.0.120  libnppial_static.a         libnppig.so.11.4.0.110    libnppitc.so.11          libnvjpeg.so.11.5.2.120
libcublas_static.a       libcufilt.a                   libcusolver_static.a         libnppicc.so               libnppig_static.a         libnppitc.so.11.4.0.110  libnvjpeg_static.a

I don't know whether this is something that was renamed in CUDA, but RAJA correctly links against cudart_static.

Double free corruption when chai is used both in a shared object and an executable linking to it

Hi,
I don't know whether this is a genuine issue or simply a misuse of the library. I have a shared object and a main executable, both compiled against the static CHAI library, and the executable links dynamically to the shared object. In this setup I get a double free error at the end of execution. This is most likely because the static block that registers the CHAI plugin runs twice, which adds CHAI to the list of plugins twice. At exit, the cleanup is then done twice on the same object.
Here's a minimal working example.
The shared object's header:

$ cat testchai.hpp
#ifndef TESTCHAI_H
#define TESTCHAI_H
#include "chai/ArrayManager.hpp"
class TestChai
{
  public:
  void testChai();
};
#endif

and cpp file:

$ cat testchai.cpp
#include "testchai.hpp"
void TestChai::testChai()
{
  chai::ArrayManager *rm = chai::ArrayManager::getInstance();
}

and the main executable:

$ cat testchaimain.cpp
#include "chai/ArrayManager.hpp"
#include "testchai.hpp"
int main()
{
  chai::ArrayManager *rm = chai::ArrayManager::getInstance();
  TestChai t;
  t.testChai();
  return 0;
}

If I now compile the shared object:

$ g++ -o testchai.o -c testchai.cpp -I/path/to/chai/include -I/path/to/raja/include
$ g++ -shared -o testchai.so testchai.o /path/to/chai/lib/libchai.a /path/to/raja/libRAJA.a

and the executable:

$ g++ -o testchaimain.o -c testchaimain.cpp -I/path/to/chai/include -I/path/to/raja/include
$ g++ -o testchaimain testchaimain.o /path/to/chai/lib/libchai.a /path/to/chai/lib/libumpire.a /path/to/raja/libRAJA.a testchai.so

When I run it, I get a double free error:

$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:.
$ ./testchaimain
free(): double free detected in tcache 2
Aborted (core dumped)

Is this a genuine bug, or is it simply not allowed to have multiple objects compiled against the static CHAI library?
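The failure mode the reporter suspects (one registration block running twice on the same object) can be modeled in isolation. The sketch below is hypothetical: `Plugin`, `PluginList`, and `entriesAfterRepeatedInit` are illustrative names, not CHAI's or RAJA's real registration classes, and it assumes both static initializers end up touching the same plugin object. It contrasts an unconditional append, which stores the same pointer twice, with a guarded append that ignores duplicates.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical plugin and registry (NOT CHAI's or RAJA's real classes).
struct Plugin {
  int id = 0;
};

class PluginList {
 public:
  // Unconditional append: a second run of the same static initializer
  // stores the same pointer again, so per-entry cleanup would hit it twice.
  void addNaive(Plugin* p) { plugins_.push_back(p); }

  // Guarded append: a pointer already in the list is ignored.
  bool addOnce(Plugin* p) {
    if (std::find(plugins_.begin(), plugins_.end(), p) != plugins_.end()) {
      return false;
    }
    plugins_.push_back(p);
    return true;
  }

  std::size_t size() const { return plugins_.size(); }

 private:
  std::vector<Plugin*> plugins_;
};

// Simulates the registration block running `times` times on one object
// and returns how many entries end up in the list.
std::size_t entriesAfterRepeatedInit(int times, bool guarded) {
  static Plugin plugin;  // the single object every "initializer" registers
  PluginList list;
  for (int i = 0; i < times; ++i) {
    if (guarded) {
      list.addOnce(&plugin);
    } else {
      list.addNaive(&plugin);
    }
  }
  return list.size();
}
```

With two runs, the naive list holds two entries for one object and the guarded list holds one. This models only the registry side of the hypothesis, not how the linker resolves the duplicated static library.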

Release doesn't include submodules?

When building CHAI from the release tarball, I get the following error:

CMake Error at CMakeLists.txt:62 (message):
        The BLT submodule is not present.

Build configuration with RAJA plugin enabled fails to add BLT stub

When trying to build CHAI with the RAJA plugin enabled, I get the following CMake error:

CMake Error at blt/cmake/BLTMacros.cmake:550 (add_library):
  add_library cannot create target "blt_stub" because another target with the
  same name already exists.  The existing target is an interface library
  created in source directory
  "/home/jhdavis/repos/perf-port/BabelStream/CHAI/src/tpl/umpire".  See
  documentation for policy CMP0002 for more details.
Call Stack (most recent call first):
  src/tpl/raja/cmake/SetupPackages.cmake:123 (blt_import_library)
  src/tpl/raja/CMakeLists.txt:118 (include)

The following steps reproduce the error:

git clone [email protected]:LLNL/CHAI.git
cd CHAI
git submodule update --init --recursive
mkdir build && cd build
cmake -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-11.6 -DENABLE_CUDA=ON -DENABLE_TESTS=Off -DENABLE_BENCHMARKS=Off -DCHAI_ENABLE_RAJA_PLUGIN=ON -DCMAKE_CUDA_ARCHITECTURES=86 ../

It seems like this might be related to this issue as well. I'm using CMake 3.22.1 and CUDA 11.6, and I'm cloning from the develop branch.

type of element size

ManagedArray::m_elems is a uint, which is an alias for "unsigned int". This propagates to the allocation/constructor functions as well. Shouldn't this be "size_t"?
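To make the concern concrete: storing a `std::size_t` element count in an `unsigned int` silently narrows it, wrapping modulo 2^32 where `unsigned int` is 32 bits. The helpers below are hypothetical (not CHAI's API). The exact wrapped value assumes a 32-bit `unsigned int` and a 64-bit `size_t`, which holds on common 64-bit platforms but is not guaranteed by the standard.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical helpers contrasting the two choices of count type.
unsigned int storeAsUint(std::size_t requested) {
  // Unsigned-to-unsigned narrowing is well defined but keeps only the low
  // bits: counts above UINT_MAX wrap around.
  return static_cast<unsigned int>(requested);
}

std::size_t storeAsSizeT(std::size_t requested) {
  return requested;  // no narrowing
}

bool fitsInUint(std::size_t requested) {
  return requested <= std::numeric_limits<unsigned int>::max();
}
```

For example, a request for 5,000,000,000 elements would be stored as 705,032,704 in an `unsigned int` under the assumptions above.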

Issue with ManagedArray::Operator[] and RAJA

@davidbeckingsale and others

I recently ran into a case in an application I'm working on where I set the initial chai::ExecutionSpace of a chai::ManagedArray to chai::ExecutionSpace::GPU, and then ran a RAJA forall loop on the CPU over it to initialize the data for some unit tests. Later, when I tried to access the data on the host using ManagedArray::operator[], I got a segfault. My understanding from talking with @davidbeckingsale offline is that this should work, since the data should now be allocated on the host/CPU.

I've included an MWE that reproduces the issue.

One other issue I noticed is that capturing by reference in the lambda for the host RAJA loop also resulted in a segfault. I've included that below as a commented-out section of code. Note that this sort of lambda capture works fine if the initial ExecutionSpace is set to the CPU.

Also, I'm using the following hashes for CHAI, RAJA, and Umpire:
CHAI: df3e8e0
RAJA: 3047fa720132d19ee143b1fcdacaa72971f5988c (v0.13.0 tagged release)
Umpire: 447f4640eff7b8f39d3c59404f3b03629b90c021 (v4.1.2 tagged release)

Additional information:
Compiled on rzansel with gcc/7.3.1, cuda/10.1.243, and cmake/3.14.5

#include <iostream>

#include "RAJA/RAJA.hpp"

#include "umpire/strategy/DynamicPool.hpp"
#include "umpire/Allocator.hpp"
#include "umpire/ResourceManager.hpp"

#include "chai/config.hpp"
#include "chai/ExecutionSpaces.hpp"
#include "chai/ManagedArray.hpp"

int main()
{

      auto& rm = umpire::ResourceManager::getInstance();
      auto host_allocator = rm.getAllocator("HOST");
#ifdef __CUDACC__
      auto device_allocator = rm.makeAllocator<umpire::strategy::DynamicPool>
                              ("DEVICE_pool", rm.getAllocator("DEVICE"));
#endif

      const int size = 5000;

      chai::ManagedArray<double> array(size, 
      std::initializer_list<chai::ExecutionSpace>{chai::CPU
#if defined(CHAI_ENABLE_CUDA) || defined(CHAI_ENABLE_HIP)
         , chai::GPU
#endif
         },
         std::initializer_list<umpire::Allocator>{host_allocator
#if defined(CHAI_ENABLE_CUDA) || defined(CHAI_ENABLE_HIP)
         , device_allocator
#endif
      },
      chai::ExecutionSpace::GPU);

      std::cout << "Running GPU runs" << std::endl;
      // This works
      RAJA::forall<RAJA::cuda_exec<256>>(RAJA::RangeSegment(0, size),
         [=] __device__ (int i) {
            array[i] = i;
      });

      std::cout << "Running CPU runs" << std::endl;
      // This should work but fails
      // RAJA::forall<RAJA::seq_exec>(RAJA::RangeSegment(0, size),
      //    [&] (int i) {
      //       array[i] = i;
      //    });
      // This works
      RAJA::forall<RAJA::seq_exec>(RAJA::RangeSegment(0, size),
         [=] (int i) {
            array[i] = i;
         });
      std::cout << "Printing out data" << std::endl;
      // These work
      // std::cout << array.data(chai::ExecutionSpace::CPU)[0] << std::endl;
      // std::cout << array.data()[0] << std::endl;
      // This should work since we last ran things on the CPU but fails
      std::cout << array[0] << std::endl;
      array.free();
      return 0;
}
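On the by-reference capture: as we understand CHAI's design, data migration is triggered when a `ManagedArray` is copy-constructed, which happens when a lambda captures it by value. The toy below uses a hypothetical `CopyCounter`, not `chai::ManagedArray`, and shows only the underlying C++ rule: a `[=]` capture copy-constructs the captured object, while a `[&]` capture performs no copy, so a copy-constructor hook never runs.

```cpp
#include <cassert>

// Hypothetical stand-in (NOT chai::ManagedArray) that counts how often it
// is copy-constructed.
struct CopyCounter {
  static int copies;
  int value = 0;
  CopyCounter() = default;
  CopyCounter(const CopyCounter& other) : value(other.value) { ++copies; }
};
int CopyCounter::copies = 0;

// Number of copy constructions caused by building and calling a [=] lambda.
int copiesForValueCapture() {
  CopyCounter::copies = 0;
  CopyCounter c;
  auto body = [=]() { return c.value; };  // copies c into the closure
  body();
  return CopyCounter::copies;
}

// Same, but with a [&] lambda: the closure only refers to c.
int copiesForReferenceCapture() {
  CopyCounter::copies = 0;
  CopyCounter c;
  auto body = [&]() { return c.value; };  // no copy of c
  body();
  return CopyCounter::copies;
}
```

The by-value count is at least one. It can be higher if the closure itself is copied, for example when a loop construct takes the body by value. The by-reference count is always zero.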