stdgpu: Efficient STL-like Data Structures on the GPU#
Features#
stdgpu is an open-source library providing generic GPU data structures for fast and reliable data management.
Lightweight C++17 library with minimal dependencies
CUDA, OpenMP, and HIP (experimental) backends
Familiar STL-like GPU containers
High-level, agnostic container functions like
insert(begin, end)
, to write shared C++ codeLow-level, native container functions like
find(key)
, to write custom CUDA kernels, etc.Interoperability with thrust GPU algorithms
Instead of providing yet another ecosystem, stdgpu is designed to be a lightweight container library. Previous libraries such as thrust, VexCL, ArrayFire or Boost.Compute focus on the fast and efficient implementation of various algorithms and only operate on contiguously stored data. stdgpu follows an orthogonal approach and focuses on fast and reliable data management to enable the rapid development of more general and flexible GPU algorithms just like their CPU counterparts.
At its heart, stdgpu offers the following GPU data structures and containers:
Atomic primitive types and references
Space-efficient bit array
Dynamically sized double-ended queue
Container adapters
Hashed collection of unique keys and key-value pairs
Dynamically sized contiguous array
In addition, stdgpu also provides further commonly used helper functionality in algorithm, bit, contract, cstddef, execution, functional, iterator, limits, memory, mutex, numeric, ranges, type_traits, utility.
Examples#
In order to reliably perform complex tasks on the GPU, stdgpu offers flexible interfaces that can be used in both agnostic code, e.g. via the algorithms provided by thrust, as well as in native code, e.g. in custom CUDA kernels.
For instance, stdgpu is extensively used in SLAMCast, a scalable live telepresence system, to implement real-time, large-scale 3D scene reconstruction as well as real-time 3D data streaming between a server and an arbitrary number of remote clients.
Agnostic code. In the context of SLAMCast, a simple task is the integration of a range of updated blocks into the duplicate-free set of queued blocks for data streaming which can be expressed very conveniently:
#include <stdgpu/cstddef.h> // stdgpu::index_t
#include <stdgpu/iterator.h> // stdgpu::make_device
#include <stdgpu/unordered_set.cuh> // stdgpu::unordered_set
class stream_set
{
public:
void
add_blocks(const short3* blocks,
const stdgpu::index_t n)
{
set.insert(stdgpu::make_device(blocks),
stdgpu::make_device(blocks + n));
}
// Further functions
private:
stdgpu::unordered_set<short3> set;
// Further members
};
Native code. More complex operations such as the creation of the duplicate-free set of updated blocks or other algorithms can be implemented natively, e.g. in custom CUDA kernels with stdgpu’s CUDA backend enabled:
#include <stdgpu/cstddef.h> // stdgpu::index_t
#include <stdgpu/unordered_map.cuh> // stdgpu::unordered_map
#include <stdgpu/unordered_set.cuh> // stdgpu::unordered_set
__global__ void
compute_update_set(const short3* blocks,
const stdgpu::index_t n,
const stdgpu::unordered_map<short3, voxel*> tsdf_block_map,
stdgpu::unordered_set<short3> mc_update_set)
{
// Global thread index
stdgpu::index_t i = blockIdx.x * blockDim.x + threadIdx.x;
if (i >= n) return;
short3 b_i = blocks[i];
// Neighboring candidate blocks for the update
short3 mc_blocks[8]
= {
short3(b_i.x - 0, b_i.y - 0, b_i.z - 0),
short3(b_i.x - 1, b_i.y - 0, b_i.z - 0),
short3(b_i.x - 0, b_i.y - 1, b_i.z - 0),
short3(b_i.x - 0, b_i.y - 0, b_i.z - 1),
short3(b_i.x - 1, b_i.y - 1, b_i.z - 0),
short3(b_i.x - 1, b_i.y - 0, b_i.z - 1),
short3(b_i.x - 0, b_i.y - 1, b_i.z - 1),
short3(b_i.x - 1, b_i.y - 1, b_i.z - 1),
};
for (stdgpu::index_t j = 0; j < 8; ++j)
{
// Only consider existing neighbors
if (tsdf_block_map.contains(mc_blocks[j]))
{
mc_update_set.insert(mc_blocks[j]);
}
}
}
More examples can be found in the examples
directory.
Citation#
If you use stdgpu in one of your projects, please cite the following publications:
stdgpu: Efficient STL-like Data Structures on the GPU
@UNPUBLISHED{stotko2019stdgpu,
author = {Stotko, P.},
title = {{stdgpu: Efficient STL-like Data Structures on the GPU}},
year = {2019},
month = aug,
note = {arXiv:1908.05936},
url = {https://arxiv.org/abs/1908.05936}
}
@article{stotko2019slamcast,
author = {Stotko, P. and Krumpen, S. and Hullin, M. B. and Weinmann, M. and Klein, R.},
title = {{SLAMCast: Large-Scale, Real-Time 3D Reconstruction and Streaming for Immersive Multi-Client Live Telepresence}},
journal = {IEEE Transactions on Visualization and Computer Graphics},
volume = {25},
number = {5},
pages = {2102--2112},
year = {2019},
month = may
}