stdgpu: Efficient STL-like Data Structures on the GPU
Home Page: https://stotko.github.io/stdgpu/
License: Apache License 2.0
The insert function here can only insert at the last position, which clearly differs from the intent of insert, which should allow insertion at any position.
I noticed that some functions like stdgpu::detail::memcpy are non-async and run on the DEFAULT CUDA stream. More details: stdgpu::detail::memcpy depends on dispatch_memcpy, which looks like:
dispatch_memcpy(void* destination,
                const void* source,
                index64_t bytes,
                dynamic_memory_type destination_type,
                dynamic_memory_type source_type)
{
    ...
    // uses the default stream here
    STDGPU_CUDA_SAFE_CALL(cudaMemcpy(destination, source, static_cast<std::size_t>(bytes), kind));
}
For example, if we use CUDA Graphs and try to capture all operations on a stream, an error is raised because different streams (the default one and the user's) get mixed:
stdgpu : CUDA ERROR :
Error : operation would make the legacy stream depend on a capturing blocking stream
File : external/stdgpu/src/stdgpu/cuda/impl/memory.cpp:123
Function : void stdgpu::cuda::dispatch_memcpy(void *, const void *, stdgpu::index64_t, stdgpu::dynamic_memory_type, stdgpu::dynamic_memory_type)
So my request: run all stdgpu operations on a user-specified CUDA stream.
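A minimal sketch of what a stream-aware variant could look like. The extra stream parameter and the switch to cudaMemcpyAsync are assumptions for illustration, not the actual stdgpu API:

```cuda
// Sketch only: the signature mirrors the snippet above; the `stream`
// parameter and the use of cudaMemcpyAsync are hypothetical.
void dispatch_memcpy(void* destination,
                     const void* source,
                     index64_t bytes,
                     dynamic_memory_type destination_type,
                     dynamic_memory_type source_type,
                     cudaStream_t stream = cudaStreamLegacy)
{
    ...
    // An asynchronous copy on the caller's stream avoids touching the
    // legacy default stream during CUDA Graph capture.
    STDGPU_CUDA_SAFE_CALL(
        cudaMemcpyAsync(destination, source, static_cast<std::size_t>(bytes), kind, stream));
}
```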
Our bitset class is a GPU version of std::bitset which, however, is designed to cover more use cases. In particular, its interface and implementation (run-time fixed-sized) sit somewhere between std::bitset (compile-time fixed-sized) and boost::dynamic_bitset (run-time dynamic-sized). This may lead to confusion if users expect the exact same API as std::bitset.
There are several ways to address this issue:
- Rename it to dynamic_bitset and extend its API to match (as closely as possible) boost::dynamic_bitset.
- Rename it to vector<bool> and change/extend its API to match (as closely as possible) vector.
At the moment, the last option seems to be a good compromise. However, it does not fully solve the problem regarding potential user confusion. Since any of the options will break the API, this change is considered for stdgpu 2.0.0.
Building the VS project fails when the backend is CUDA 12.5, i.e. with STDGPU_BACKEND equal to STDGPU_BACKEND_CUDA. Configuring succeeds; building fails:
Selecting Windows SDK version 10.0.20348.0 to target Windows 10.0.22610.
Created device flags : $<$<COMPILE_LANGUAGE:CUDA>:-Xcompiler=/W2>
Created test device flags : $<$<COMPILE_LANGUAGE:CUDA>:-Wno-deprecated-declarations>
Detected user-provided CCs : 52
Created host flags : $<$<COMPILE_LANGUAGE:CXX>:/W2>
Created test host flags : $<$<COMPILE_LANGUAGE:CXX>:/wd4996>
Could NOT find Doxygen (missing: DOXYGEN_EXECUTABLE) (Required is at least version "1.9.1")
CMake Deprecation Warning at test/googletest-1.11.0/CMakeLists.txt:4 (cmake_minimum_required):
Compatibility with CMake < 3.5 will be removed from a future version of
CMake.
Update the VERSION argument <min> value or use a ...<max> suffix to tell
CMake that the project does not need compatibility with older versions.
CMake Deprecation Warning at test/googletest-1.11.0/googletest/CMakeLists.txt:56 (cmake_minimum_required):
Compatibility with CMake < 3.5 will be removed from a future version of
CMake.
Update the VERSION argument <min> value or use a ...<max> suffix to tell
CMake that the project does not need compatibility with older versions.
************************ stdgpu Configuration Summary *************************
General:
Version : 1.3.0
System : Windows
Build type :
Build:
STDGPU_BACKEND : STDGPU_BACKEND_CUDA
STDGPU_BUILD_SHARED_LIBS : OFF
STDGPU_SETUP_COMPILER_FLAGS : ON
STDGPU_TREAT_WARNINGS_AS_ERRORS : OFF
STDGPU_ANALYZE_WITH_CLANG_TIDY : OFF
STDGPU_ANALYZE_WITH_CPPCHECK : OFF
Configuration:
STDGPU_ENABLE_CONTRACT_CHECKS : ON
STDGPU_USE_32_BIT_INDEX : ON
Examples:
STDGPU_BUILD_EXAMPLES : ON
Tests:
STDGPU_BUILD_TESTS : ON
STDGPU_BUILD_TEST_COVERAGE : OFF
Documentation:
Doxygen : NO
*******************************************************************************
Configuring done (4.1s)
Build started at 21:00...
1>------ Build started: Project: ZERO_CHECK, Configuration: Debug x64 ------
1>1>Checking Build System
2>------ Build started: Project: stdgpu, Configuration: Debug x64 ------
2>Building Custom Rule E:/Repos/open3d/build/stdgpu/src/ext_stdgpu/src/stdgpu/CMakeLists.txt
2>iterator.cpp
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(90,29): error C2059: syntax error: ':'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(101,29): error C2059: syntax error: ':'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(115,55): error C2059: syntax error: ':'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(130,55): error C2059: syntax error: ':'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(208,40): error C2059: syntax error: ':'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(218,49): error C2059: syntax error: ':'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(252,37): error C2059: syntax error: ':'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(263,9): error C2059: syntax error: 'volatile'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(271,5): error C3861: '__syncthreads': identifier not found
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(280,12): error C3861: '__syncthreads_and': identifier not found
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(289,12): error C3861: '__syncthreads_or': identifier not found
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(298,5): error C3861: '__syncwarp': identifier not found
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(307,12): error C3861: '__any_sync': identifier not found
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(316,12): error C3861: '__all_sync': identifier not found
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(325,12): error C3861: '__ballot_sync': identifier not found
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(335,9): error C2059: syntax error: 'volatile'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(346,9): error C2059: syntax error: 'volatile'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(357,9): error C2059: syntax error: 'volatile'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(368,12): error C3861: '__shfl_sync': identifier not found
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(377,35): error C2059: syntax error: ':'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(388,39): error C2059: syntax error: ':'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(398,9): error C2059: syntax error: 'volatile'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(406,9): error C2059: syntax error: 'volatile'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(415,39): error C2065: 'threadIdx': undeclared identifier
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(416,40): error C2065: 'threadIdx': undeclared identifier
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(417,13): error C2065: 'threadIdx': undeclared identifier
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(427,34): error C2059: syntax error: ':'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(438,34): error C2059: syntax error: ':'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(479,39): error C2059: syntax error: ':'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(489,39): error C2059: syntax error: ':'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(499,39): error C2059: syntax error: ':'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(509,39): error C2059: syntax error: ':'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cuda\std\detail\libcxx\include\__cuda\ptx\ptx_helper_functions.h(40,44): error C3861: '__cvta_generic_to_shared': identifier not found
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cuda\std\detail\libcxx\include\__cuda\ptx\ptx_helper_functions.h(60,44): error C3861: '__cvta_generic_to_global': identifier not found
2>E:\Repos\open3d\build\stdgpu\src\ext_stdgpu\src\stdgpu\impl\iterator_detail.h(450,26): error C3856: 'is_proxy_reference': symbol is not a class template
2>E:\Repos\open3d\build\stdgpu\src\ext_stdgpu\src\stdgpu\impl\iterator_detail.h(450,70): error C2065: 'Container': undeclared identifier
2>E:\Repos\open3d\build\stdgpu\src\ext_stdgpu\src\stdgpu\impl\iterator_detail.h(450,43): error C2923: 'stdgpu::detail::back_insert_iterator_proxy': 'Container' is not a valid template type argument for parameter 'Container'
2>E:\Repos\open3d\build\stdgpu\src\ext_stdgpu\src\stdgpu\impl\iterator_detail.h(450,43): error C2143: syntax error: missing ';' before 'stdgpu::detail::back_insert_iterator_proxy'
2>E:\Repos\open3d\build\stdgpu\src\ext_stdgpu\src\stdgpu\impl\iterator_detail.h(450,79): error C2059: syntax error: '>'
2>E:\Repos\open3d\build\stdgpu\src\ext_stdgpu\src\stdgpu\impl\iterator_detail.h(451,7): error C2059: syntax error: 'public'
2>E:\Repos\open3d\build\stdgpu\src\ext_stdgpu\src\stdgpu\impl\iterator_detail.h(451,22): error C2872: 'detail': ambiguous symbol
2>E:\Repos\open3d\build\stdgpu\src\ext_stdgpu\src\stdgpu\impl\iterator_detail.h(451,30): error C2039: 'true_type': is not a member of 'thrust::detail'
2>E:\Repos\open3d\build\stdgpu\src\ext_stdgpu\src\stdgpu\impl\iterator_detail.h(452,1): error C2143: syntax error: missing ';' before '{'
2>E:\Repos\open3d\build\stdgpu\src\ext_stdgpu\src\stdgpu\impl\iterator_detail.h(452,1): error C2447: '{': missing function header (old-style formal list?)
2>E:\Repos\open3d\build\stdgpu\src\ext_stdgpu\src\stdgpu\impl\iterator_detail.h(459,22): error C2872: 'detail': ambiguous symbol
2>E:\Repos\open3d\build\stdgpu\src\ext_stdgpu\src\stdgpu\impl\iterator_detail.h(467,22): error C2872: 'detail': ambiguous symbol
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\thrust\for_each.h(101,38): error C2872: 'detail': ambiguous symbol
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\thrust\for_each.h(101,46): error C2039: 'execution_policy_base': is not a member of 'thrust::detail'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\thrust\for_each.h(101,67): error C2988: unrecognizable template declaration/definition
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\thrust\for_each.h(101,67): error C2143: syntax error: missing ',' before '<'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\thrust\for_each.h(165,40): error C2872: 'detail': ambiguous symbol
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\thrust\for_each.h(165,48): error C2039: 'execution_policy_base': is not a member of 'thrust::detail'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\thrust\for_each.h(165,69): error C2988: unrecognizable template declaration/definition
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\thrust\for_each.h(165,69): error C2143: syntax error: missing ',' before '<'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\thrust\detail\for_each.inl(41,40): error C2872: 'detail': ambiguous symbol
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\thrust\detail\for_each.inl(41,48): error C2039: 'execution_policy_base': is not a member of 'thrust::detail'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\thrust\detail\for_each.inl(41,69): error C2988: unrecognizable template declaration/definition
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\thrust\detail\for_each.inl(41,69): error C2143: syntax error: missing ',' before '<'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\thrust\detail\for_each.inl(68,42): error C2872: 'detail': ambiguous symbol
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\thrust\detail\for_each.inl(68,50): error C2039: 'execution_policy_base': is not a member of 'thrust::detail'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\thrust\detail\for_each.inl(68,71): error C2988: unrecognizable template declaration/definition
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\thrust\detail\for_each.inl(68,71): error C2143: syntax error: missing ',' before '<'
2>limits.cpp
2>Generating Code...
2>Done building project "stdgpu.vcxproj" -- FAILED.
========== Build: 1 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========
========== Build completed at 21:00 and took 01.772 seconds ==========
Hello, I found that sometimes queue.size() < 0; I guess it is because more than one thread pops the empty queue at the same time.
I can see this warning while the program is running:
stdgpu/src/stdgpu/impl/deque_detail.cuh
Lines 434 to 441 in d9a4587
The queue also becomes invalid (queue.valid() = false) after the CUDA kernel, which causes unexpected results when I try to reuse it in another kernel like:
__global__ void kernel1(stdgpu::deque<int> queue)
{
    // push and pop operations
}

__global__ void kernel2(stdgpu::deque<int> queue)
{
    // push and pop operations
}

int main()
{
    kernel1<<<1, 1>>>(queue);
    kernel2<<<1, 1>>>(queue);
}
What should I do? Thanks in advance!
In contrast to boost, thrust and others, stdgpu is not a header-only library and hence requires shipping a compiled library. The following modules currently require source file compilation:
- bitset : Contains host-only functions which also contain code executed on the device.
- device : Contains a function relying on backend-specific host API functions.
- iterator : Only contains a wrapper function to hide the dependency on memory from the header.
- limits : Contains the definition of static member variables.
- memory : Both the general as well as the backend-specific parts handle the allocation and memcpy parts in the sources.
- mutex : Contains host-only functions which also contain code executed on the device.
Inlining bitset and mutex will make the library independent of the required GPU architecture, e.g. the compute capability set for CUDA. Even if we decide not to go for header-only, achieving architecture independence might be a good compromise.
Up to now, the container classes have a fixed capacity and are created using the non-standard createDeviceObject factory function. Furthermore, since ease of use in GPU kernels is considered a key feature, the copy constructors are currently restricted to perform only shallow copies rather than deep copies. This behavior makes the containers still feel non-standard and unintuitive to some degree, especially for new users.
In order to fix both issues, the design of the copy operations needs to be revised to match the STL more closely. At first glance, this seems to be an easy task: let the copy operations perform deep copies and provide a reference_wrapper<T> class which can be used on the GPU. However, objects (or at least their states) need to be copied from CPU to GPU memory in order to allow for the proper execution of an operation. Since we want to make the containers work for as many backends and use cases as possible, we cannot make any assumptions about how this transfer will be performed or whether it really requires calling the copy constructor. reference_wrapper<T> does not solve this problem since it points to the original object, which lives in CPU memory.
Therefore, the current proposal would be:
- Provide a shallow_copy_wrapper<T> class (suggestions for a better name are welcome) which wraps the object state. This class is copyable such that the object state can be easily passed to the GPU, similar to reference_wrapper<T>. However, if the state of the original object is changed, e.g. due to a resize operation, this change will not be visible or propagated to the wrapper, invalidating it. Thus, we trade object consistency for GPU support. shallow_copy_wrapper<T> is only intended to allow crossing memory boundaries and to enable container usage on the GPU. For CPU usage, std::reference_wrapper<T> should be used instead if required.
- Remove the createDeviceObject and destroyDeviceObject factory functions.
This change will break existing usage within kernels and thrust algorithms (functors). A reasonable transition strategy would be to introduce shallow_copy_wrapper<T> in the last minor release of version 1 (which might be 1.3.0) and provide an option to disable the copy constructor and copy assignment operators. This way, users could start porting to the new copy model and would only need to move away from the factory functions in version 2.0.0.
Describe the bug
unordered_map creation crashes with CUDA 12.x (testing with a release build).
Steps to reproduce
In a .cu file's main():
stdgpu::unordered_map<int, int> map = stdgpu::unordered_map<int, int>::createDeviceObject(1);
Expected behavior
Obvious.
Actual behavior
Exception thrown at 0x00007FFD29D140AC in CudaHelloWorld.exe: Microsoft C++ exception: thrust::system::system_error at memory location 0x0000009373EFF210.
Unhandled exception at 0x00007FFD29D140AC in CudaHelloWorld.exe: Microsoft C++ exception: thrust::system::system_error at memory location 0x0000009373EFF210.
System (please complete the following information):
Describe the bug
The insertion of unordered_map does not work correctly with capacity=4.
Steps to reproduce
Change n from 100 to 4 in examples/cuda/unordered_map.cu, then compile and run this example code.
Expected behavior
the terminal output should be:
The duplicate-free map of numbers contains 5 elements (5 expected) and the computed sums are (2, 4) ((10, 30) expected)
Actual behavior
the terminal actually shows:
The duplicate-free map of numbers contains 2 elements (5 expected) and the computed sums are (2, 4) ((10, 30) expected)
System (please complete the following information):
In order to implement several functions in a portable manner across the backends, thrust is used as the abstraction of choice. This may, however, limit extending support to other possible libraries, making it desirable to reduce the coupling to thrust. Throughout the whole code base, the following classes and functions are used:

Header algorithm:
- all_of
- copy
- copy_if
- count_if
- equal
- fill
- for_each : Used in the stdgpu::for_each_index implementation.
- generate
- reduce
- sequence
- sort
- tabulate
- transform
- transform_reduce : Used in the stdgpu::transform_reduce_index implementation.

Header execution:
- device : Used in stdgpu::execution::device.
- host : Used in stdgpu::execution::host.

Header functional:
- equal_to<>
- identity<>
- plus<>

Header iterator:
- counting_iterator : Used in the stdgpu::for_each_index and stdgpu::transform_reduce_index implementations.
- discard_iterator
- distance
- iterator_adaptor : Used for thrust-compatible container iterators.
- transform_iterator : Used in the stdgpu::transform_range implementation.
- zip_iterator

Header utility:
- pair : Used in stdgpu::pair.
- tuple

Other Headers:
- random

Note: The examples may still make use of thrust to demonstrate interoperability.
1. When I want to use other containers (e.g., list, array), do I need to implement them myself?
2. The HIP version of the examples does not use the container classes; are they supported?
Is your feature request related to a problem? Please describe.
Thanks for an awesome library! I have a question/suggestion on a doc update to clarify how capacity, max_size and bucket_count are defined.
I'm using max_load_factor=1, and my interpretation is unclear on one point:
When I calculate the current load factor, should I use "size/bucket_count" or "size/max_size"?
Describe the solution you'd like
Updated docs to better describe these variables
This issue comprises a list of deprecated functionality that will be removed in version 2.0.0:

Sources:
- bit : ispow2(), log2pow2(), mod2()
- bitset : Replace internal non-static member _bit_per_block by a static member version (changes object size)
- cstdlib : sizedivPow2(std::size_t, std::size_t), sizediv_t
- memory : safe_pinned_host_allocator, default_allocator_traits
- mutex : mutex_ref
- ranges : device_range(T*, index_t), host_range(T*, index_t), non-const begin() and end() member functions
- unordered_map, unordered_set : createDeviceObject(index_t, index_t), excess_count(), total_count()

CMake:
- STDGPU_ENABLE_AUXILIARY_ARRAY_WARNING, STDGPU_ENABLE_MANAGED_ARRAY_WARNING, STDGPU_USE_FAST_DESTROY, STDGPU_USE_FIBONACCI_HASHING
C++17 was released 5 years ago and the default compilers on Ubuntu 20.04 and Ubuntu 22.04 all support this standard. Support in CUDA was added with CUDA 11.0 (released March 2020), and a potential future SYCL backend will require C++17 anyway. Furthermore, Ubuntu 18.04 will reach EOL soon, in April 2023, and should no longer be used. Therefore, it makes sense to raise the requirements, which will also simplify and unblock future developments:
Drop Ubuntu 18.04 support (EOL in April 2023)
GCC 7 -> 9, Clang 6 -> 10, MSVC 19.20 already sufficient
CMake 3.15 -> 3.18 (CUDA support for C++17), also aligns with the requirements for the Clang CUDA compiler
thrust 1.9.2 -> thrust 1.9.9
CUDA 10.0 -> 11.0
Library Code:
- limits -> unblocks header-only
- Replace *::value by the shorter *_v versions
- void_t in type_traits
- attribute.h and use native attributes
- STDGPU_HAS_CXX_17
- atomic backports in CUDA backend for CC 3.0 and lower (support removed with CUDA 11.0+)
- memory : to_address implementation with if constexpr
- std::byte over unsigned char in bit
- noexcept qualifier to function signatures

CMake:
- FindCUDAToolkit.cmake module
- set_device_flags.cmake
Until now, the header names follow a simple convention. There are essentially three different types of extensions:
- .h : Header files that can be used in both CUDA (.cu) and C++ (.cpp) code.
- .cuh : Header files that can be used exclusively in CUDA (.cu) code.
With the recent addition of an OpenMP backend, this scheme no longer fits as the device code is now also compiled in .cpp files. Furthermore, forward declarations follow a different convention which is also not obvious. This will also be the case for future backends. Thus, the current scheme is confusing and should be changed. Essentially, we have the following options:
- Use .h for all files including forward declaration files. This has the advantage of being a simple and uniform solution, but makes the above inclusion limitations less obvious.
Since the involved changes would lead to an API break, this will only be included in version 2.0.0.
Does it make sense to add list and forward_list containers?
What are some ways to optimize this project from the HIP direction?
Amazing work! This is not a bug report; I'm just leaving it here in case someone needs it in the future.
For those trying to compile this on Ubuntu 22.04 and the like, I managed to solve the following error:
/usr/include/c++/11/bits/std_function.h:435:145: error: parameter packs not expanded with ‘...’:
  435 |         function(_Functor&& __f)
      |
The problem is that the default C++ compiler on Ubuntu 22.04 does not play well with nvcc; more info in the related issues at the bottom.
Install an older compiler and tell nvcc which host compiler to use via CMake:
sudo apt install g++-10
cmake -DCMAKE_CUDA_HOST_COMPILER=/usr/bin/g++-10 -Bbuild
At the time of this writing, stdgpu implements two different backends: CUDA (NVIDIA GPUs) and OpenMP (CPUs and GPUs). Extending this effort by introducing a ROCm backend (AMD GPUs) would further increase the number of use cases and improve the hardware support. This requires the following tasks to be completed:
exchange
and compare_exchange
required).
The containers in the STL that support random access implement two different functions to access specific elements in the container: operator[] and at(). operator[] just accesses the element without bounds checks, while at() performs the same operation but does bounds checks and throws a std::out_of_range exception when the index is out of bounds.
I propose that the containers (basically deque and vector) implement at() in terms of operator[], and perform bounds checks only when calling at(). That way they are more compliant with the STL ones. This is my proposed solution for vector (for deque it is the same idea):
template <typename T>
inline STDGPU_DEVICE_ONLY typename vector<T>::reference
vector<T>::at(const vector<T>::index_type n)
{
return const_cast<typename vector<T>::reference>(static_cast<const vector<T>*>(this)->at(n));
}
template <typename T>
inline STDGPU_DEVICE_ONLY typename vector<T>::const_reference
vector<T>::at(const vector<T>::index_type n) const
{
STDGPU_EXPECTS(0 <= n);
STDGPU_EXPECTS(n < size());
STDGPU_EXPECTS(occupied(n));
return this->operator[](n);
}
template <typename T>
inline STDGPU_DEVICE_ONLY typename vector<T>::reference
vector<T>::operator[](const vector<T>::index_type n)
{
return _data[n];
}
template <typename T>
inline STDGPU_DEVICE_ONLY typename vector<T>::const_reference
vector<T>::operator[](const vector<T>::index_type n) const
{
return _data[n];
}
Describe the bug
Current stdgpu seems to be NOT compatible with NVIDIA Thrust?
Steps to reproduce
git clone https://github.com/stotko/stdgpu.git
mkdir build
cd build
cmake ../
Expected behavior
Successfully built and run.
Actual behavior
Both GCC and Clang fail.
[ 1%] Building CXX object src/stdgpu/CMakeFiles/stdgpu.dir/impl/iterator.cpp.o
cd ....../stdgpu/build/src/stdgpu && /usr/bin/c++ -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_CUDA -Dstdgpu_EXPORTS -I....../stdgpu/src/stdgpu/.. -I....../stdgpu/build/src/stdgpu/include -isystem /usr/local/cuda/include -fPIC -Wall -pedantic -Wextra -Wshadow -Wsign-compare -Wconversion -Wfloat-equal -Wundef -Wdouble-promotion -MD -MT src/stdgpu/CMakeFiles/stdgpu.dir/impl/iterator.cpp.o -MF CMakeFiles/stdgpu.dir/impl/iterator.cpp.o.d -o CMakeFiles/stdgpu.dir/impl/iterator.cpp.o -c ....../stdgpu/src/stdgpu/impl/iterator.cpp
In file included from /usr/local/cuda/include/nv/detail/__target_macros:13,
from /usr/local/cuda/include/nv/target:195,
from /usr/local/cuda/include/cub/detail/device_synchronize.cuh:23,
from /usr/local/cuda/include/thrust/system/cuda/detail/util.h:36,
from /usr/local/cuda/include/thrust/system/cuda/detail/malloc_and_free.h:26,
from /usr/local/cuda/include/thrust/system/detail/adl/malloc_and_free.h:42,
from /usr/local/cuda/include/thrust/system/detail/generic/memory.inl:22,
from /usr/local/cuda/include/thrust/system/detail/generic/memory.h:69,
from /usr/local/cuda/include/thrust/detail/reference.h:28,
from ....../stdgpu/src/stdgpu/../stdgpu/iterator.h:30,
from ....../stdgpu/src/stdgpu/impl/iterator.cpp:16:
/usr/local/cuda/include/cub/util_device.cuh: In function ‘cudaError_t cub::PtxVersionUncached(int&)’:
/usr/local/cuda/include/cub/util_device.cuh:368:15: error: invalid conversion from ‘EmptyKernelPtr’ {aka ‘void (*)()’} to ‘const void*’ [-fpermissive]
368 | if (CubDebug(result = cudaFuncGetAttributes(&empty_kernel_attrs,
| ^~~~~~~~
| |
| EmptyKernelPtr {aka void (*)()}
In file included from /usr/local/cuda/include/thrust/system/cuda/detail/guarded_cuda_runtime_api.h:38,
from /usr/local/cuda/include/thrust/system/cuda/detail/malloc_and_free.h:19,
from /usr/local/cuda/include/thrust/system/detail/adl/malloc_and_free.h:42,
from /usr/local/cuda/include/thrust/system/detail/generic/memory.inl:22,
from /usr/local/cuda/include/thrust/system/detail/generic/memory.h:69,
from /usr/local/cuda/include/thrust/detail/reference.h:28,
from ....../stdgpu/src/stdgpu/../stdgpu/iterator.h:30,
from ....../stdgpu/src/stdgpu/impl/iterator.cpp:16:
/usr/local/cuda/include/cuda_runtime_api.h:4337:125: note: initializing argument 2 of ‘cudaError_t cudaFuncGetAttributes(cudaFuncAttributes*, const void*)’
4337 | extern __host__ __cudart_builtin__ cudaError_t CUDARTAPI cudaFuncGetAttributes(struct cudaFuncAttributes *attr, const void *func);
| ~~~~~~~~~~~~^~~~
make[2]: *** [src/stdgpu/CMakeFiles/stdgpu.dir/build.make:93: src/stdgpu/CMakeFiles/stdgpu.dir/impl/iterator.cpp.o] Error 1
make[2]: Leaving directory '....../stdgpu/build'
make[1]: *** [CMakeFiles/Makefile2:318: src/stdgpu/CMakeFiles/stdgpu.dir/all] Error 2
make[1]: Leaving directory '....../stdgpu/build'
make: *** [Makefile:149: all] Error 2
[ 1%] Building CXX object src/stdgpu/CMakeFiles/stdgpu.dir/impl/iterator.cpp.o
cd ....../build/src/stdgpu && /usr/bin/clang++ -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_CUDA -I....../src/stdgpu/.. -I....../build/src/stdgpu/include -isystem /usr/local/cuda/include -Wall -pedantic -Wextra -Wshadow -Wsign-compare -Wconversion -Wfloat-equal -Wundef -Wdouble-promotion -MD -MT src/stdgpu/CMakeFiles/stdgpu.dir/impl/iterator.cpp.o -MF CMakeFiles/stdgpu.dir/impl/iterator.cpp.o.d -o CMakeFiles/stdgpu.dir/impl/iterator.cpp.o -c ....../src/stdgpu/impl/iterator.cpp
In file included from ....../src/stdgpu/impl/iterator.cpp:16:
In file included from ....../src/stdgpu/../stdgpu/iterator.h:30:
In file included from /usr/local/cuda/include/thrust/detail/reference.h:28:
In file included from /usr/local/cuda/include/thrust/system/detail/generic/memory.h:69:
In file included from /usr/local/cuda/include/thrust/system/detail/generic/memory.inl:22:
In file included from /usr/local/cuda/include/thrust/system/detail/adl/malloc_and_free.h:42:
In file included from /usr/local/cuda/include/thrust/system/cuda/detail/malloc_and_free.h:26:
In file included from /usr/local/cuda/include/thrust/system/cuda/detail/util.h:38:
/usr/local/cuda/include/cub/util_device.cuh:368:33: error: no matching function for call to 'cudaFuncGetAttributes'
if (CubDebug(result = cudaFuncGetAttributes(&empty_kernel_attrs,
^~~~~~~~~~~~~~~~~~~~~
/usr/local/cuda/include/cub/util_debug.cuh:115:64: note: expanded from macro 'CubDebug'
#define CubDebug(e) CUB_NS_QUALIFIER::Debug((cudaError_t) (e), __FILE__, __LINE__)
^
/usr/local/cuda/include/nv/detail/__target_macros:455:78: note: expanded from macro 'NV_IF_TARGET'
# define NV_IF_TARGET(cond, t, ...) _NV_BLOCK_EXPAND(_NV_TARGET_IF(cond, t, __VA_ARGS__))
^
/usr/local/cuda/include/nv/detail/__target_macros:419:74: note: expanded from macro '_NV_TARGET_IF'
# define _NV_TARGET_IF(cond, t, ...) _NV_IF( _NV_ARCH_COND_CAT(cond), t, __VA_ARGS__)
^
note: (skipping 24 expansions in backtrace; use -fmacro-backtrace-limit=0 to see all)
/usr/local/cuda/include/nv/detail/__preprocessor:83:47: note: expanded from macro '_NV_STRIP_PAREN'
#define _NV_STRIP_PAREN(...) _NV_STRIP_PAREN1(__VA_ARGS__)
^~~~~~~~~~~
/usr/local/cuda/include/nv/detail/__preprocessor:82:48: note: expanded from macro '_NV_STRIP_PAREN1'
#define _NV_STRIP_PAREN1(...) _NV_STRIP_PAREN2 __VA_ARGS__
^~~~~~~~~~~
/usr/local/cuda/include/nv/detail/__preprocessor:81:31: note: expanded from macro '_NV_STRIP_PAREN2'
#define _NV_STRIP_PAREN2(...) __VA_ARGS__
^~~~~~~~~~~
/usr/local/cuda/include/cuda_runtime_api.h:4337:58: note: candidate function not viable: no known conversion from 'EmptyKernelPtr' (aka 'void (*)()') to 'const void *' for 2nd argument; take the address of the argument with &
extern __host__ __cudart_builtin__ cudaError_t CUDARTAPI cudaFuncGetAttributes(struct cudaFuncAttributes *attr, const void *func);
^
1 error generated.
make[2]: *** [src/stdgpu/CMakeFiles/stdgpu.dir/build.make:93: src/stdgpu/CMakeFiles/stdgpu.dir/impl/iterator.cpp.o] Error 1
make[2]: Leaving directory '....../build'
make[1]: *** [CMakeFiles/Makefile2:403: src/stdgpu/CMakeFiles/stdgpu.dir/all] Error 2
make[1]: Leaving directory '....../build'
make: *** [Makefile:149: all] Error 2
System (please complete the following information):
Hello, I'm trying to embed stdgpu into my project. I wrote the CMake file following the tutorial and the CMake configuration succeeded.
However, when I tried to build my own project, it failed and raised many errors.
[ 70%] Built target stdgpu
[ 80%] Built target foo
[ 90%] Building CXX object CMakeFiles/parallel_cache.dir/main.cpp.o
In file included from /home/yanglinzhuo/parallel_cache/stdgpu/src/stdgpu/../stdgpu/platform.h:34:0,
from /home/yanglinzhuo/parallel_cache/stdgpu/src/stdgpu/../stdgpu/iterator.h:33,
from /home/yanglinzhuo/parallel_cache/stdgpu_test.cuh:5,
from /home/yanglinzhuo/parallel_cache/main.cpp:3:
/home/yanglinzhuo/parallel_cache/stdgpu/src/stdgpu/../stdgpu/cuda/platform.h:48:37: error: expected unqualified-id before ‘sizeof’
#define STDGPU_CUDA_DEVICE_ONLY sizeof("STDGPU ERROR: Wrong compiler detected! Device-only functions must be compiled with the device compiler!")
^
/home/yanglinzhuo/parallel_cache/stdgpu/src/stdgpu/../stdgpu/cuda/platform.h:48:37: note: in definition of macro ‘STDGPU_CUDA_DEVICE_ONLY’
#define STDGPU_CUDA_DEVICE_ONLY sizeof("STDGPU ERROR: Wrong compiler detected! Device-only functions must be compiled with the device compiler!")
^~~~~~
/home/yanglinzhuo/parallel_cache/stdgpu/src/stdgpu/../stdgpu/platform.h:80:34: note: in expansion of macro ‘STDGPU_DETAIL_CAT2_DIRECT’
#define STDGPU_DETAIL_CAT2(A, B) STDGPU_DETAIL_CAT2_DIRECT(A, B)
^~~~~~~~~~~~~~~~~~~~~~~~~
...
I omit many error lines because they are similar. The main error here is error: expected unqualified-id before ‘sizeof’.
I'm confused by these errors and have no idea how to fix them.
Since I'm new to building with CMake, I suspect there may be some mistakes in my CMake file.
The following is my project's structure:
And the following is my CMake file:
cmake_minimum_required(VERSION 3.18)
project(parallel_cache)
set(CMAKE_CXX_STANDARD 14)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(Torch_DIR /usr/local/libtorch/share/cmake/Torch) # My libtorch path
find_package(Torch REQUIRED)
# Exclude the examples from the build
set(STDGPU_BUILD_EXAMPLES OFF CACHE INTERNAL "")
# Exclude the tests from the build
set(STDGPU_BUILD_TESTS OFF CACHE INTERNAL "")
add_subdirectory(stdgpu)
set_property(TARGET stdgpu PROPERTY CUDA_ARCHITECTURES 60)
add_library(foo stdgpu)
set_target_properties(foo PROPERTIES LINKER_LANGUAGE CXX)
target_link_libraries(foo PUBLIC stdgpu::stdgpu)
add_executable(${PROJECT_NAME} "main.cpp" "stdgpu_test.cuh")
target_link_libraries(parallel_cache PUBLIC "${TORCH_LIBRARIES}")
target_link_libraries(parallel_cache PUBLIC foo)
set_property(TARGET parallel_cache PROPERTY CXX_STANDARD 14)
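For reference, the error pattern above (expected unqualified-id before ‘sizeof’ from STDGPU_CUDA_DEVICE_ONLY) typically means a translation unit that includes stdgpu's CUDA headers is being compiled by the host compiler. A minimal sketch of a configuration that avoids this, assuming main.cpp is renamed to main.cu so it is compiled by the device compiler (a sketch only, not a verified fix):

```cmake
# Sketch only: compile translation units that include stdgpu headers as CUDA.
cmake_minimum_required(VERSION 3.18)
project(parallel_cache LANGUAGES CXX CUDA)

set(STDGPU_BUILD_EXAMPLES OFF CACHE INTERNAL "")
set(STDGPU_BUILD_TESTS OFF CACHE INTERNAL "")
add_subdirectory(stdgpu)

# main.cu (renamed from main.cpp) includes the stdgpu headers and is therefore
# compiled with nvcc instead of the host compiler.
add_executable(parallel_cache main.cu)
set_property(TARGET parallel_cache PROPERTY CUDA_ARCHITECTURES 60)
target_link_libraries(parallel_cache PUBLIC stdgpu::stdgpu)
```

The intermediate foo library from the original file is dropped here, since add_library(foo stdgpu) treats "stdgpu" as a source file name rather than a target.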
My system configuration is:
I would appreciate any help. Thanks.
As the containers should mimic their C++ counterparts as closely as possible in terms of functionality, both per-element and iterator-based member functions are considered and provided. While the former allow for easy usage in the native context, that is, e.g., in CUDA kernels for the CUDA backend, the latter iterator-based versions can be considered to follow algorithm semantics. However, they lack support for execution_policy parameters, prohibiting greater flexibility such as the use of asynchronous CUDA streams. The affected functionality is listed below:
- createDeviceObject and destroyDeviceObject
- bitset: set, reset, flip, count, all, any, none
- deque: clear, device_range, valid
- memory: createDeviceArray, destroyDeviceArray, and for symmetry reasons also the respective host versions
- mutex: valid
- queue: valid
- stack: valid
- unordered_map, unordered_set: device_range, insert, erase, clear, valid
- vector: insert, erase, clear, valid
Option 1:
Add a respective execution_policy parameter to all of these functions. This could either follow algorithm and make the policy the first parameter, in which case each function must be duplicated. Alternatively, it could be passed as the last parameter with a default value, at the cost of an interface that is inconsistent with algorithm.
Option 2:
Add a scoped_execution_policy class which acts as a customizable default policy for all calls within its scope. While this minimizes the required changes to the containers, proper global management may be hard to implement, as the class types of the policies could theoretically be arbitrary.
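Option 1 could look roughly like the following self-contained sketch (type and member names here are illustrative only, not stdgpu's actual API):

```cpp
#include <utility>

// Hypothetical sketch of Option 1: each member function gains a policy-first
// overload following the algorithm convention, while the policy-free version
// forwards to it with a default policy.
struct default_policy {};  // stand-in for a backend execution policy type

class vector_like {
public:
    // existing interface: forwards to the policy-aware overload
    void clear() { clear(default_policy{}); }

    // policy-first overload, mirroring the algorithm interface; a real
    // implementation would dispatch the work onto the policy's stream
    template <typename ExecutionPolicy>
    void clear(ExecutionPolicy&& /*policy*/) { size_ = 0; }

    int size() const { return size_; }

private:
    int size_ = 3;  // pretend the container currently holds 3 elements
};
```

The duplication cost mentioned above is visible here: every affected member function needs both the plain and the policy-taking form.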
Is your feature request related to a problem? Please describe.
I'd like to use this library in a hardware-vendor-agnostic way, and more specifically with WebGPU. Are there any plans to support this in the future? The OpenCL subset can compile to SPIR-V, so that could be another option?
Describe the bug
unordered map creation freezes async processes
Steps to reproduce
runBuldKernel<<<block_size_x, thread_size_x, 0, build_stream>>>(ng, object_size_ui);
// The line below would only complete when runBuldKernel is done
stdgpu::unordered_map<uint32_t, uint32_t> map = stdgpu::unordered_map<uint32_t, uint32_t>::createDeviceObject(8);
Expected behavior
The map creation and memory allocation should complete right away, without waiting for runBuldKernel to complete
Actual behavior
The map creation and memory allocation completes only after runBuldKernel is done
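A plausible cause (an assumption, not verified against stdgpu's internals) is that createDeviceObject ultimately allocates with cudaMalloc, which synchronizes with in-flight device work. Since CUDA 11.2, stream-ordered allocation avoids this device-wide synchronization; a minimal sketch using plain runtime API calls:

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Sketch only: unlike cudaMalloc, cudaMallocAsync is enqueued on the given
// stream and does not wait for kernels still running on other streams.
void* allocate_on_stream(std::size_t bytes, cudaStream_t stream)
{
    void* ptr = nullptr;
    cudaMallocAsync(&ptr, bytes, stream);  // requires CUDA >= 11.2
    return ptr;
}
```

An allocation path like this would let the map creation proceed without waiting for runBuldKernel.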
I'm working on an OpenMP/HIP code and trying to include stdgpu as a subproject. What I need is to
It seems like stdgpu needs -DCMAKE_CXX_COMPILER=hcc passed to CMake to build the HIP backend, but hcc doesn't support -fopenmp yet. So the OpenMP libraries are missing when a find_package(OpenMP) is encountered. Is there any way to work around this?
I've tried to set the compiler to clang. The compiler complains that -hc is an unknown argument. It looks like this is an hcc argument required by the rocthrust::rocthrust target.
clang++ -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_HIP -Dstdgpu_EXPORTS
...
-isystem /opt/rocm-3.3.0/hip/include -isystem /opt/rocm/include -stdlib=libc++ -O3 -DNDEBUG -fPIC -hc -fPIC -std=c++14 -o CMakeFiles/stdgpu.dir/impl/iterator.cpp.o -c /external/stdgpu/src/stdgpu/impl/iterator.cpp
clang-9: error: unknown argument: '-hc'
Hi there,
I have been able to build stdgpu under Windows with MSVC 2019 and CUDA 10.2 with no problems. The example projects work fine. However, taking the installed library and using it in a test setup produces some errors (the same is true for the add_subdirectory route). My CMakeLists.txt is:
cmake_minimum_required(VERSION 3.1)
set (CMAKE_CXX_STANDARD 14)
project(VoxelGrid LANGUAGES CXX CUDA)
file(GLOB srcfiles
${PROJECT_SOURCE_DIR}/src/*.h
${PROJECT_SOURCE_DIR}/src/*.cpp
)
include_directories(${PROJECT_SOURCE_DIR}/src)
set(stdgpu_DIR ${PROJECT_SOURCE_DIR}/3rdParty/stdgpu/lib/cmake/stdgpu)
find_package(stdgpu 1.0.0 REQUIRED)
add_executable(VoxelGridTest exe/main.cpp ${srcfiles})
target_link_libraries(VoxelGridTest PUBLIC stdgpu::stdgpu)
In main.cpp I copied the unordered_map example. The build fails with:
FAILED: CMakeFiles/VoxelGridTest.dir/exe/main.cpp.obj
C:\PROGRA~2\MICROS~2\2019\COMMUN~1\VC\Tools\MSVC\1427~1.291\bin\Hostx64\x64\cl.exe /nologo /TP -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_CUDA -I..\..\src -I..\..\3rdParty\stdgpu\include -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\include" /DWIN32 /D_WINDOWS /W3 /GR /EHsc /MD /Zi /O2 /Ob1 /DNDEBUG -std:c++14 /showIncludes /FoCMakeFiles\VoxelGridTest.dir\exe\main.cpp.obj /FdCMakeFiles\VoxelGridTest.dir\ /FS -c ..\..\exe\main.cpp
C:\dev\VoxelGrid\3rdParty\stdgpu\include\stdgpu\atomic.cuh(140): error C2059: syntax error: 'sizeof'
C:\dev\VoxelGrid\3rdParty\stdgpu\include\stdgpu/atomic.cuh(332): note: see reference to class template instantiation 'stdgpu::atomic<T>' being compiled
C:\dev\VoxelGrid\3rdParty\stdgpu\include\stdgpu\atomic.cuh(141): error C2238: unexpected token(s) preceding ';'
C:\dev\VoxelGrid\3rdParty\stdgpu\include\stdgpu\atomic.cuh(150): error C2059: syntax error: 'sizeof'
C:\dev\VoxelGrid\3rdParty\stdgpu\include\stdgpu\atomic.cuh(152): error C2238: unexpected token(s) preceding ';'
C:\dev\VoxelGrid\3rdParty\stdgpu\include\stdgpu\atomic.cuh(160): error C2059: syntax error: 'sizeof'
C:\dev\VoxelGrid\3rdParty\stdgpu\include\stdgpu\atomic.cuh(162): error C2238: unexpected token(s) preceding ';'
(the same C2988 / C2059 / C2238 error pattern repeats for the remaining member functions of stdgpu::atomic and stdgpu::atomic_ref in atomic.cuh; roughly 90 similar lines omitted)
C:\dev\VoxelGrid\3rdParty\stdgpu\include\stdgpu\atomic.cuh(551): error C2988: unrecognizable template declaration/definition
C:\dev\VoxelGrid\3rdParty\stdgpu\include\stdgpu\atomic.cuh(551): error C2059: syntax error: 'sizeof'
C:\dev\VoxelGrid\3rdParty\stdgpu\include\stdgpu\atomic.cuh(551): fatal error C1003: error count exceeds 100; stopping compilation
We had a similar issue under Linux. Thanks in advance for the support!
Hello, I'm new to this library. I wonder what I should do if I want to use something like stdgpu::unordered_set<thrust::pair<int, int>> or stdgpu::unordered_set<myStruct>? When I try to use stdgpu::unordered_set<thrust::pair<int, int>>::createDeviceObject(n), this error occurs:
/usr/local/include/c++/9.5.0/type_traits(2378): error: class "std::enable_if<false, thrust::pair<int, int>>" has no member "type"
detected during:
instantiation of type "std::enable_if_t<false, thrust::pair<int, int>>"
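The error indicates that no hash specialization exists for thrust::pair<int, int>. The usual fix is to supply a custom hash functor as a template argument; the pattern is illustrated here with std::unordered_set, and stdgpu::unordered_set is assumed to accept an analogous Hash template parameter (an assumption, check the header):

```cpp
#include <cstddef>
#include <functional>
#include <unordered_set>
#include <utility>

// Custom hash functor for a pair of ints; the same idea applies to
// user-defined structs by hashing their members.
struct pair_hash {
    std::size_t operator()(const std::pair<int, int>& p) const {
        // combine both members with a simple multiplicative mix
        return std::hash<int>{}(p.first) * 31u + std::hash<int>{}(p.second);
    }
};

// Passing the functor as the Hash template argument resolves the missing
// enable_if/hash specialization.
using pair_set = std::unordered_set<std::pair<int, int>, pair_hash>;
```

For device-side use, the functor's operator() would additionally need to be callable on the device (e.g. marked with the appropriate host/device annotations).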
The recent improvements regarding the allocation system (see #56, #58, #61) increase the conformance of the container implementations with the C++ standard. A further (small) step is the introduction of an allocator object member in the container classes. However, since this will be a breaking change, it is postponed to stdgpu 2.0.0.
Hi, thank you for this great project, it really hits the spot.
Is there any plan to support an option for building a shared library via CMake? The option would be helpful for integrating this repo into other projects (such as a custom PyTorch layer).
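Such an option is usually wired up via the conventional BUILD_SHARED_LIBS switch; a minimal sketch (variable and target names here are illustrative, not stdgpu's actual build files):

```cmake
# Sketch only: when add_library is called without an explicit STATIC/SHARED
# keyword, the standard BUILD_SHARED_LIBS variable selects the library type.
option(BUILD_SHARED_LIBS "Build shared instead of static libraries" OFF)
add_library(stdgpu ${STDGPU_SOURCES})  # type follows BUILD_SHARED_LIBS
```

Consumers could then simply configure with -DBUILD_SHARED_LIBS=ON.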
I don't know if this feature already exists, but do unordered_maps support types other than <int, int>, like <int, stdgpu::unordered_set>? Similarly, can unordered_set support pair<int, int> using a hash function such as boost::hash<pair<int, int>>, or vectors using your own hash function for vectors?
This library is excellent, btw! It solves so many issues by providing STL-like containers for the GPU :)
Describe the bug
Cannot configure the project because Findthrust.cmake@15 does not find Thrust properly.
Look at this code:
string(REGEX REPLACE "#define THRUST_VERSION[ \t]+" "" THRUST_VERSION_STRING ${THRUST_VERSION_STRING})
It does not take into account that #define THRUST_VERSION may be followed by a comment.
Unfortunately, CUDA 12.4's thrust/version.h has a comment after THRUST_VERSION:
/*! \def THRUST_VERSION
* \brief The preprocessor macro \p THRUST_VERSION encodes the version
* number of the Thrust library as MMMmmmpp.
*
* \note THRUST_VERSION is formatted as `MMMmmmpp`, which differs from `CCCL_VERSION` that uses `MMMmmmppp`.
*
* <tt>THRUST_VERSION % 100</tt> is the sub-minor version.
* <tt>THRUST_VERSION / 100 % 1000</tt> is the minor version.
* <tt>THRUST_VERSION / 100000</tt> is the major version.
*/
#define THRUST_VERSION 200301 // macro expansion with ## requires this to be a single value
So ${THRUST_VERSION_STRING} in CMake was parsed as 200301 // macro expansion with ## requires this to be a single value, which led to subsequent errors in the version checks.
If you think what I'm saying makes sense, I'll be happy to fix it.
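One possible fix, sketched here with the same variable names as the snippet above (untested against the actual Findthrust.cmake module): match only the numeric token instead of stripping a prefix, so any trailing comment is dropped:

```cmake
# Sketch only: extract just the digits of THRUST_VERSION, ignoring any
# trailing comment on the #define line.
string(REGEX MATCH "#define THRUST_VERSION[ \t]+([0-9]+)" _unused "${THRUST_VERSION_STRING}")
set(THRUST_VERSION_STRING "${CMAKE_MATCH_1}")
```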
Hello, I use unordered_set together with vector and queue in my code like:
// do something...
element = queue.pop();
// do something...
vector.push_back(...);
// do something...
auto dup_res = unordered_set.insert(...);
if (dup_res.second)
queue.push(...);
And I found that in my CUDA kernel, the insert operation of unordered_set sometimes returns false because _excess_list_positions is empty:
stdgpu/src/stdgpu/impl/unordered_base_detail.cuh
Lines 931 to 953 in 00820f9
if (result.second == operation_status::failed_collision)
{
if (full())
std::printf("full \n");
if (_excess_list_positions.empty())
std::printf("list empty\n");
}
if (result.second == operation_status::failed_no_action_required)
{
std::printf("no_action_required");
}
I am confused by this kind of failure. What does it mean, and what should I do? Thanks in advance!
The backend system is currently restricted to build and install the library only for a single backend.
Current behavior:
- STDGPU_BACKEND is set to either STDGPU_BACKEND_CUDA (default) or STDGPU_BACKEND_OPENMP to control which backend will be used.
- stdgpu::stdgpu is built for the particular choice of STDGPU_BACKEND. Other backends will not be considered at all.

Proposed behavior:
- Introduce options STDGPU_ENABLE_<BACKEND>, where <BACKEND> is one of CUDA, OPENMP.
- Build a target stdgpu::<BACKEND> for each enabled backend using the backend-specific settings and dependency checks.
- Define stdgpu::stdgpu as an alias target to stdgpu::<BACKEND>, serving as a default which can be controlled via STDGPU_BACKEND to match current behavior.

This will make the system more flexible and allow users to choose freely between all enabled backends in their projects rather than being globally restricted to a single choice. Note that linking to more than one backend at the same time will be considered undefined behavior/an ODR violation.
Furthermore, if only a single backend should be used at all times, this intent can also be expressed more clearly by linking to stdgpu::<BACKEND> rather than the configuration-dependent stdgpu::stdgpu target.
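The proposal could be wired up roughly as follows (a sketch only; source lists and internal target names are placeholders, not stdgpu's actual build files):

```cmake
# Sketch only: one concrete target per enabled backend, plus a default alias.
option(STDGPU_ENABLE_CUDA "Build the CUDA backend" ON)
option(STDGPU_ENABLE_OPENMP "Build the OpenMP backend" OFF)

if(STDGPU_ENABLE_CUDA)
    add_library(stdgpu_cuda ${STDGPU_CUDA_SOURCES})    # backend-specific settings here
    add_library(stdgpu::CUDA ALIAS stdgpu_cuda)
endif()

if(STDGPU_ENABLE_OPENMP)
    add_library(stdgpu_openmp ${STDGPU_OPENMP_SOURCES})
    add_library(stdgpu::OPENMP ALIAS stdgpu_openmp)
endif()

# stdgpu::stdgpu remains a default alias, controlled via STDGPU_BACKEND
if(STDGPU_BACKEND STREQUAL "STDGPU_BACKEND_CUDA")
    add_library(stdgpu::stdgpu ALIAS stdgpu_cuda)
elseif(STDGPU_BACKEND STREQUAL "STDGPU_BACKEND_OPENMP")
    add_library(stdgpu::stdgpu ALIAS stdgpu_openmp)
endif()
```

Since ALIAS targets cannot be exported directly, the install/export logic would need a corresponding adjustment per backend.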