VectorVisor is a vectorizing binary translator for GPUs, designed to make it easy to run many copies of a single-threaded WebAssembly program in parallel using GPUs

License: Apache License 2.0

vectorvisor's Introduction

VectorVisor

GPU programming models offer a level of abstraction that is either too low-level (e.g., OpenCL, CUDA) or too high-level (e.g., TensorFlow, Halide), depending on the language. Not all applications fit into either category, resulting in lost opportunities for GPU acceleration.

VectorVisor is a vectorized binary translator which remedies this issue by taking in existing single-threaded WebAssembly programs and running many copies of them using a GPU. Unlike OpenCL or CUDA programs, we provide support for system calls and a CPU-like flat memory model. While less efficient than manual translation, this approach substantially reduces the barrier to accelerating throughput-oriented workloads using GPUs, ultimately improving the throughput of applications that would otherwise run on CPUs.

For more details, see our USENIX ATC 2023 paper: https://www.usenix.org/conference/atc23/presentation/ginzburg

Installation, Setup, and Hardware Compatibility

Installation & Setup

VectorVisor requires OpenCL 1.2 or later, along with the appropriate GPU drivers and the OpenCL development header files. VectorVisor is written in Rust and requires a recent stable Rust toolchain to compile.

OpenCL and GPU driver setups can be verified by running:

clinfo

VectorVisor can be built using cargo:

cargo build --release

Hardware Compatibility

VectorVisor was built with compatibility in mind and should, in theory, run on any GPU supporting OpenCL 1.2. In practice, VectorVisor has mostly been evaluated using NVIDIA GPUs on Linux. No Windows-based setups have been evaluated; before attempting one, ensure that TDR (Timeout Detection and Recovery) is either disabled or set to a larger timeout value.

Devices with full functionality should be able to run any of our benchmarks or examples. Partial functionality varies by device, but these devices should be able to run short examples at a minimum. All NVIDIA configurations below are evaluated using Ubuntu 18.04 LTS. The AMD Radeon Pro V520 is evaluated using a preconfigured AWS Linux image (see make_image_amd.py in the benchmarks/ directory).

Vendor                 | Evaluated OS | GPU                 | Level of Support
NVIDIA                 | Linux        | GTX 1080 Ti         | Full
NVIDIA                 | Linux        | RTX 2080 Ti         | Full
NVIDIA                 | Linux        | RTX 3080 Ti         | Full
NVIDIA                 | Linux        | T4                  | Full
NVIDIA                 | Linux        | A10G                | Full
NVIDIA                 | Linux        | V100                | Full
AMD (ROCm/HSA OpenCL)  | Linux        | AMD Radeon Pro V520 | Partial ⚠️
Intel                  | macOS        | Iris Pro            | Partial ⚠️

Intel devices have limited support: they fail on programs more complex than our smoke tests (compilation failures, possibly due to bugs in the Intel OpenCL C compiler). AMD devices (ROCm/HSA OpenCL) run, but sometimes crash. Generally, NVIDIA GPUs obtain the best performance, although newer dedicated Intel and AMD GPUs have not been tested.

All non-NVIDIA targets should be run with the following flag:

--nvidia=false

AMD targets additionally need to be run with:

--patch=true
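
The vendor rules above can be summarised in a small helper. This is only a sketch: the `Vendor` enum and `vendor_flags` function are hypothetical names used for illustration, not part of VectorVisor, and it assumes NVIDIA targets need no extra flags because `--nvidia` appears to default to true and `--patch` to false.

```rust
/// GPU vendors discussed in the compatibility notes (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Vendor {
    Nvidia,
    Amd,
    Intel,
}

/// Extra command-line flags to pass for a given vendor.
/// Assumption: NVIDIA relies on the defaults and needs nothing extra.
fn vendor_flags(vendor: Vendor) -> Vec<&'static str> {
    match vendor {
        Vendor::Nvidia => vec![],
        // AMD is a non-NVIDIA target and also needs the patch flag.
        Vendor::Amd => vec!["--nvidia=false", "--patch=true"],
        Vendor::Intel => vec!["--nvidia=false"],
    }
}
```

The returned flags would be appended after the `--` in a `cargo run --release -- ...` invocation.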

Configuring VectorVisor

VectorVisor has many configuration options, which can be listed with "--help":

cargo run --release -- --help

We include a benchmark suite (benchmarks/run_benchmarks_aws.py), which provides examples of how to run our sample benchmarks using different configurations. Generally, real applications require a heap size of 3-4 MiB, a stack size of 128 KiB, and a hypercall buffer of 128-512 KiB, along with several other flags regarding application partitioning and "pretty" inputs. Different GPU configurations support different numbers of concurrent VMs, depending on the available GPU memory (e.g., 11 GiB, 16 GiB, 24 GiB) and application requirements.
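
As a rough illustration of how these sizes bound the number of concurrent VMs, the sketch below divides GPU memory by a per-VM footprint. It assumes each VM needs only its heap, stack, and hypercall buffer, which ignores any other per-VM or global allocations, so the result is an upper bound rather than a recommended setting. The function and constant names are hypothetical and do not come from VectorVisor.

```rust
const KIB: u64 = 1024;
const MIB: u64 = 1024 * KIB;
const GIB: u64 = 1024 * MIB;

/// Bytes needed per VM, counting only heap, stack, and hypercall buffer
/// (a simplifying assumption; real usage has additional overheads).
fn per_vm_bytes(heap: u64, stack: u64, hcall_buf: u64) -> u64 {
    heap + stack + hcall_buf
}

/// Upper bound on the number of VMs that fit in `gpu_mem` bytes.
/// Returns 0 if the per-VM footprint is zero, to avoid dividing by zero.
fn max_vm_count(gpu_mem: u64, heap: u64, stack: u64, hcall_buf: u64) -> u64 {
    gpu_mem
        .checked_div(per_vm_bytes(heap, stack, hcall_buf))
        .unwrap_or(0)
}
```

For example, with a 3 MiB heap, a 128 KiB stack, and a 128 KiB hypercall buffer, `max_vm_count(16 * GIB, 3 * MIB, 128 * KIB, 128 * KIB)` evaluates to 5041; the practical limit will be lower once other allocations are accounted for.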

Example Usage

We include both complete end-to-end benchmarks and a series of simple smoke tests to confirm that VectorVisor is working properly.

Simple Examples

We include a series of simple examples in the examples/ directory. The "printreturn" flag is useful for debugging simple programs that return a value.

cd examples/
cargo run --release -- -i arithmetic/factorial.wat --printreturn=true

Running Full Applications

Our end-to-end benchmarks are built with the "wasm-serverless-invoke" library, which provides an interface for VectorVisor to transfer inputs to and from running VMs on the GPU. Examples of programs using this library can be found in the benchmarks/ directory (e.g., benchmarks/scrypt/, benchmarks/average/, benchmarks/imageblur-bmp/, ...).

vectorvisor's People

Contributors

engshahrad, samginzburg

vectorvisor's Issues

Why Only Single Threaded Applications?

First off, I just wanted to say this is an amazing project, and I see a lot of potential for where this could lead!

But onto the main subject: is there a specific technical limitation stopping the translation of multi-threaded applications onto GPUs, or is it unsupported because WasmTime's WASI doesn't have support for threads yet? If it's the latter, I'd love to work on supporting WASIX and converting from WasmTime to Wasmer! If it's the former, then that's too bad :/

I read the paper and was a little confused that the reason applications need to be single-threaded was never addressed, so I thought I'd open an issue in case anyone else is curious in the future.

OpenCL failed to compile/vectorvisor stall

Hello, I am looking into this project to learn what you have done in this work.
I tried to run the scrypt benchmark in the repository by running the following commands (see build.sh in the attachment).

cd benchmark/scrypt
cargo build --release
cargo run --release -- --ip=0.0.0.0 --heap=3145728 --stack=262144 --hcallsize=131072 --partition=false --serverless=true --vmcount=4096 --vmgroups=1 --interleave=1 --pinput=true --fastreply=true --lgroup=64 --nvidia=true --input=benchmarks/scrypt/target/wasm32-wasi/release/scrypt.wasm

The generated OpenCL code failed to compile, with messages like these (the full log is in compilation_failed.txt):

<kernel>:157873:37: error: redefinition of '__func_func_getenv'
<kernel>:168648:37: error: redefinition of '__func_func__ZN3std5alloc8rust_oom17hb466c6b0b424784eE'
<kernel>:176935:37: error: redefinition of '__func_func__ZN3std3sys4wasi4once4Once4call17h43619e2953d53b25E'
<kernel>:179098:37: error: redefinition of '__func_func__ZN4core6option13expect_failed17h8f72e66e0b3163c7E'
<kernel>:181066:37: error: redefinition of '__func_func__ZN72_$LT$sha2sha256Sha256$u20$as$u20$digestfixedFixedOutputDirty$GT$19finalize_into_dirty17h563df1210a5950c5E'
<kernel>:181743:37: error: redefinition of '__func_func__ZN4sha26sha2569Engine2566update17hcb501717ee07d7caE'
<kernel>:234127:37: error: redefinition of '__func_func__ZN3std3sys4wasi4once4Once4call17hee18ac680eb799ccE'
<kernel>:238120:37: error: redefinition of '__func_func___main_void'

I think this is because some functions appear in more than one partition; the following modification (cfg_optimizer.rs in the attachment) seems to solve the problem.

diff --git a/src/opencl_writer/cfg_optimizer.rs b/src/opencl_writer/cfg_optimizer.rs
index 9a5e6a5..2bb119c 100644
--- a/src/opencl_writer/cfg_optimizer.rs
+++ b/src/opencl_writer/cfg_optimizer.rs
@@ -254,6 +254,9 @@ pub fn form_partitions(

         current_partition.insert(String::from(f_name.clone()));

+               let func_copies = include_limit.get(&f_name).cloned().unwrap_or(0);
+               include_limit.insert(f_name.clone(), func_copies + 1);
+
         let (loop_called_fns, called_fns) = get_called_funcs(
             writer_ctx,
             &indirect_call_mapping_formatted,

I'm not sure if this is what the function is intended to do (it seems to support more than one copy of a function; how would the other components handle that?).
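
For reference, the two added lines amount to incrementing a per-function copy counter each time a function is placed in a partition. A minimal standalone sketch of that counting, assuming `include_limit` is a `HashMap<String, u32>` (the value type and the `record_copy` helper are assumptions for illustration, not taken from the repository):

```rust
use std::collections::HashMap;

/// Record one more copy of `f_name` and return the updated count.
/// Equivalent to the get/unwrap_or(0)/insert sequence in the patch,
/// written with the entry API.
fn record_copy(include_limit: &mut HashMap<String, u32>, f_name: &str) -> u32 {
    let copies = include_limit.entry(f_name.to_string()).or_insert(0);
    *copies += 1;
    *copies
}
```

Whether the rest of form_partitions then uses this count to stop a function from being emitted twice is not something this sketch shows.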

By the way, does the --partition parameter have something to do with this? I previously ran similar commands without disabling partitioning; they seemed to pass compilation, but resulted in a memory access violation.

With the patch, the VM started. Then I executed

go run run_scrypt.go 127.0.0.1 8000 1 1 300 256

to start the test, but nothing happened. The log is in the attachment (run.txt).

The line Set entry point: ["func_strncmp"] seems strange; I would expect something like __start or main.

With other configurations, I received illegal memory access or unsupported hypercall errors.

The attachment includes the scrypt.wasm generated by cargo.

Do you have some advice? Thank you.
Attach_20230925.tar.gz

Issue while cloning the repo

No idea why I am getting this error. Is the remote corrupted?

dangu@Desktop MINGW64 /d/dev/3rd
$ git clone https://github.com/SamGinzburg/VectorVisor/
Cloning into 'VectorVisor'...
remote: Enumerating objects: 10710, done.
remote: Counting objects: 100% (119/119), done.
remote: Compressing objects: 100% (93/93), done.
remote: Total 10710 (delta 63), reused 52 (delta 25), pack-reused 10591
Receiving objects: 100% (10710/10710), 220.62 MiB | 34.03 MiB/s, done.
Resolving deltas: 100% (7373/7373), done.
error: invalid path 'benchmarks/2022-12-18-02:01:45/a10g_cuda/gpu_cuda_bench_imageblur_bmp_0.txt'
fatal: unable to checkout working tree
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry with 'git restore --source=HEAD :/'

Failed to compile the project

Hello, I am trying to build the project. After cloning the repository and running cargo build --release, the build fails with this error:

error: could not compile cap-primitives (lib) due to 2 previous errors

and the error that triggers it is

error[E0061]: this method takes 1 argument but 2 arguments were supplied
  --> /home/zyl/.cargo/registry/src/rsproxy.cn-0dccff568467c15b/cap-primitives-0.25.3/src/rustix/linux/fs/set_times_impl.rs:20:25
   |
20 |             return file.set_times(
   |                         ^^^^^^^^^
21 |                 atime.map(SystemTimeSpec::into_std),
22 |                 mtime.map(SystemTimeSpec::into_std),
   |                 ----------------------------------- unexpected argument of type `Option<fs_set_times::SystemTimeSpec>`
   |
note: expected `FileTimes`, found `Option<SystemTimeSpec>`

I have tried cargo update, but it still fails. I am wondering whether this is my mistake or not. Thanks!

CL_DEVICE_NOT_FOUND

Running

cd examples/
RUST_BACKTRACE=full cargo run --release -- -i binops/sub.wat --printreturn=true

gives the following errors

     Running `/home/user/Code/VectorVisor/target/release/vectorvisor -i binops/sub.wat --printreturn=true`
[src/main.rs:364] matches.clone() = ArgMatches {
    args: {
        "hcallsize": MatchedArg {
            occurs: 0,
            indices: [
                30,
            ],
            vals: [
                "16384",
            ],
        },
        "pinput": MatchedArg {
            occurs: 0,
            indices: [
                37,
            ],
            vals: [
                "true",
            ],
        },
        "mexec": MatchedArg {
            occurs: 0,
            indices: [
                36,
            ],
            vals: [
                "1",
            ],
        },
        "patch": MatchedArg {
            occurs: 0,
            indices: [
                12,
            ],
            vals: [
                "false",
            ],
        },
        "serverless": MatchedArg {
            occurs: 0,
            indices: [
                27,
            ],
            vals: [
                "false",
            ],
        },
        "printreturn": MatchedArg {
            occurs: 1,
            indices: [
                4,
            ],
            vals: [
                "true",
            ],
        },
        "vmgroups": MatchedArg {
            occurs: 0,
            indices: [
                8,
            ],
            vals: [
                "1",
            ],
        },
        "globals-buffer-size": MatchedArg {
            occurs: 0,
            indices: [
                20,
            ],
            vals: [
                "",
            ],
        },
        "forceinline": MatchedArg {
            occurs: 0,
            indices: [
                25,
            ],
            vals: [
                "false",
            ],
        },
        "maxdup": MatchedArg {
            occurs: 0,
            indices: [
                33,
            ],
            vals: [
                "1",
            ],
        },
        "ip": MatchedArg {
            occurs: 0,
            indices: [
                28,
            ],
            vals: [
                "127.0.0.1",
            ],
        },
        "disablefastcalls": MatchedArg {
            occurs: 0,
            indices: [
                34,
            ],
            vals: [
                "false",
            ],
        },
        "localworkgroup": MatchedArg {
            occurs: 0,
            indices: [
                35,
            ],
            vals: [
                "999999",
            ],
        },
        "input": MatchedArg {
            occurs: 1,
            indices: [
                2,
            ],
            vals: [
                "binops/sub.wat",
            ],
        },
        "cflags": MatchedArg {
            occurs: 0,
            indices: [
                16,
            ],
            vals: [
                "",
            ],
        },
        "volatile": MatchedArg {
            occurs: 0,
            indices: [
                23,
            ],
            vals: [
                "false",
            ],
        },
        "maxloc": MatchedArg {
            occurs: 0,
            indices: [
                32,
            ],
            vals: [
                "500000",
            ],
        },
        "debugcallprint": MatchedArg {
            occurs: 0,
            indices: [
                15,
            ],
            vals: [
                "false",
            ],
        },
        "isgpu": MatchedArg {
            occurs: 0,
            indices: [
                10,
            ],
            vals: [
                "true",
            ],
        },
        "callstack": MatchedArg {
            occurs: 0,
            indices: [
                14,
            ],
            vals: [
                "1024",
            ],
        },
        "ldflags": MatchedArg {
            occurs: 0,
            indices: [
                17,
            ],
            vals: [
                "",
            ],
        },
        "interleave": MatchedArg {
            occurs: 0,
            indices: [
                9,
            ],
            vals: [
                "1",
            ],
        },
        "jitcache": MatchedArg {
            occurs: 0,
            indices: [
                22,
            ],
            vals: [
                "false",
            ],
        },
        "partition": MatchedArg {
            occurs: 0,
            indices: [
                24,
            ],
            vals: [
                "true",
            ],
        },
        "nvidia": MatchedArg {
            occurs: 0,
            indices: [
                11,
            ],
            vals: [
                "true",
            ],
        },
        "wasmtime": MatchedArg {
            occurs: 0,
            indices: [
                26,
            ],
            vals: [
                "false",
            ],
        },
        "max_smem_demo_space": MatchedArg {
            occurs: 0,
            indices: [
                39,
            ],
            vals: [
                "0",
            ],
        },
        "profile": MatchedArg {
            occurs: 0,
            indices: [
                41,
            ],
            vals: [
                "false",
            ],
        },
        "fastreply": MatchedArg {
            occurs: 0,
            indices: [
                38,
            ],
            vals: [
                "false",
            ],
        },
        "numfuncs": MatchedArg {
            occurs: 0,
            indices: [
                19,
            ],
            vals: [
                "",
            ],
        },
        "heap": MatchedArg {
            occurs: 0,
            indices: [
                6,
            ],
            vals: [
                "1048576",
            ],
        },
        "stack": MatchedArg {
            occurs: 0,
            indices: [
                5,
            ],
            vals: [
                "1048576",
            ],
        },
        "vmcount": MatchedArg {
            occurs: 0,
            indices: [
                7,
            ],
            vals: [
                "64",
            ],
        },
        "unsafewrite": MatchedArg {
            occurs: 0,
            indices: [
                13,
            ],
            vals: [
                "false",
            ],
        },
        "reqtimeout": MatchedArg {
            occurs: 0,
            indices: [
                40,
            ],
            vals: [
                "2000",
            ],
        },
        "entry": MatchedArg {
            occurs: 0,
            indices: [
                18,
            ],
            vals: [
                "",
            ],
        },
        "compile": MatchedArg {
            occurs: 0,
            indices: [
                21,
            ],
            vals: [
                "false",
            ],
        },
        "port": MatchedArg {
            occurs: 0,
            indices: [
                29,
            ],
            vals: [
                "8000",
            ],
        },
        "partitions": MatchedArg {
            occurs: 0,
            indices: [
                31,
            ],
            vals: [
                "100",
            ],
        },
    },
    subcommand: None,
    usage: Some(
        "USAGE:\n    vectorvisor [OPTIONS] --input <>",
    ),
}
[src/main.rs:420] compile_args.clone() = ""
[src/opencl_writer.rs:2499] program_start_mem_pages = 1
[src/opencl_writer.rs:2500] program_start_max_pages = 16
Func func__start cannot be optimized
[src/opencl_writer.rs:3214] &fast_function_set.len() = 1
Compiled: 2 functions
Entry point: 1
Globals buffer: 0
interleave: 1
thread 'main' panicked at src/main.rs:842:85:
called `Result::unwrap()` on an `Err` value: 

################################ OPENCL ERROR ############################### 

Error executing function: clGetDeviceIDs  

Status error code: CL_DEVICE_NOT_FOUND (-1)  

Please visit the following url for more information: 

https://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/clGetDeviceIDs.html#errors  

############################################################################# 

stack backtrace:
   0:     0x60ab8c096b3c - std::backtrace_rs::backtrace::libunwind::trace::h67a838aed1f4d6ec
                               at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
   1:     0x60ab8c096b3c - std::backtrace_rs::backtrace::trace_unsynchronized::h1d1786bb1962baf8
                               at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:     0x60ab8c096b3c - std::sys_common::backtrace::_print_fmt::h5a0b1f807a002d23
                               at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/sys_common/backtrace.rs:67:5
   3:     0x60ab8c096b3c - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::hf84ab6ad0b91784c
                               at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/sys_common/backtrace.rs:44:22
   4:     0x60ab8c0c1bdc - core::fmt::rt::Argument::fmt::h28f463bd1fdabed5
                               at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/core/src/fmt/rt.rs:138:9
   5:     0x60ab8c0c1bdc - core::fmt::write::ha37c23b175e921b3
                               at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/core/src/fmt/mod.rs:1114:21
   6:     0x60ab8c09385e - std::io::Write::write_fmt::haa1b000741bcbbe1
                               at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/io/mod.rs:1763:15
   7:     0x60ab8c096924 - std::sys_common::backtrace::_print::h1ff1030b04dfb157
                               at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/sys_common/backtrace.rs:47:5
   8:     0x60ab8c096924 - std::sys_common::backtrace::print::hb982056c6f29541c
                               at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/sys_common/backtrace.rs:34:9
   9:     0x60ab8c097f93 - std::panicking::default_hook::{{closure}}::h11f92f82c62fbd68
                               at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/panicking.rs:272:22
  10:     0x60ab8c097cb4 - std::panicking::default_hook::hb8810fe276772c66
                               at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/panicking.rs:292:9
  11:     0x60ab8c098515 - std::panicking::rust_panic_with_hook::hd2f0efd2fec86cb0
                               at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/panicking.rs:731:13
  12:     0x60ab8c098411 - std::panicking::begin_panic_handler::{{closure}}::h3651b7fc4f61d784
                               at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/panicking.rs:609:13
  13:     0x60ab8c097066 - std::sys_common::backtrace::__rust_end_short_backtrace::hbc468e4b98c7ae04
                               at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/sys_common/backtrace.rs:170:18
  14:     0x60ab8c098162 - rust_begin_unwind
                               at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/panicking.rs:597:5
  15:     0x60ab8b6f6505 - core::panicking::panic_fmt::h979245e2fdb2fabd
                               at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/core/src/panicking.rs:72:14
  16:     0x60ab8b6f69e3 - core::result::unwrap_failed::h8c4b86241881fbbb
                               at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/core/src/result.rs:1652:5
  17:     0x60ab8b8186fb - vectorvisor::main::h75b24b4e6aaf8cfd
  18:     0x60ab8b7e4813 - std::sys_common::backtrace::__rust_begin_short_backtrace::h4c45c397cd46ab21
  19:     0x60ab8b8d7c39 - std::rt::lang_start::{{closure}}::ha2fe3d93a5592b89
  20:     0x60ab8c08e14b - core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once::hf9057cfaeeb252e2
                               at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/core/src/ops/function.rs:284:13
  21:     0x60ab8c08e14b - std::panicking::try::do_call::h629e203a624883e4
                               at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/panicking.rs:504:40
  22:     0x60ab8c08e14b - std::panicking::try::h7b61614724d6a4f1
                               at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/panicking.rs:468:19
  23:     0x60ab8c08e14b - std::panic::catch_unwind::h354ac1c0268491d8
                               at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/panic.rs:142:14
  24:     0x60ab8c08e14b - std::rt::lang_start_internal::{{closure}}::h919fee3c5ba8f617
                               at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/rt.rs:148:48
  25:     0x60ab8c08e14b - std::panicking::try::do_call::h54583f67455bff32
                               at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/panicking.rs:504:40
  26:     0x60ab8c08e14b - std::panicking::try::hb0e12c4e01d39dc2
                               at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/panicking.rs:468:19
  27:     0x60ab8c08e14b - std::panic::catch_unwind::h367b6339e3ca9a3b
                               at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/panic.rs:142:14
  28:     0x60ab8c08e14b - std::rt::lang_start_internal::ha5ce8533eaa0fda8
                               at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/rt.rs:148:20
  29:     0x60ab8b81baf5 - main
  30:     0x7bea23829d90 - __libc_start_call_main
                               at ./csu/../sysdeps/nptl/libc_start_call_main.h:58:16
  31:     0x7bea23829e40 - __libc_start_main_impl
                               at ./csu/../csu/libc-start.c:392:3
  32:     0x60ab8b6f6ca5 - _start
  33:                0x0 - <unknown>

Current system information

Number of platforms:                             2
  Platform Profile:                              FULL_PROFILE
  Platform Version:                              OpenCL 2.0 pocl 1.8  Linux, None+Asserts, RELOC, LLVM 11.1.0, SLEEF, DISTRO, POCL_DEBUG
  Platform Name:                                 Portable Computing Language
  Platform Vendor:                               The pocl project
  Platform Extensions:                           cl_khr_icd cl_pocl_content_size
  Platform Profile:                              FULL_PROFILE
  Platform Version:                              OpenCL 3.0 CUDA 12.3.68
  Platform Name:                                 NVIDIA CUDA
  Platform Vendor:                               NVIDIA Corporation
  Platform Extensions:                           cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_device_uuid cl_khr_pci_bus_info cl_khr_external_semaphore cl_khr_external_memory cl_khr_external_semaphore_opaque_fd cl_khr_external_memory_opaque_fd


  Platform Name:                                 Portable Computing Language
Number of devices:                               1
  Device Type:                                   CL_DEVICE_TYPE_CPU
  Vendor ID:                                     8086h
  Max compute units:                             16
  Max work items dimensions:                     3
    Max work items[0]:                           4096
    Max work items[1]:                           4096
    Max work items[2]:                           4096
  Max work group size:                           4096
  Preferred vector width char:                   16
  Preferred vector width short:                  16
  Preferred vector width int:                    8
  Preferred vector width long:                   4
  Preferred vector width float:                  8
  Preferred vector width double:                 4
  Native vector width char:                      16
  Native vector width short:                     16
  Native vector width int:                       8
  Native vector width long:                      4
  Native vector width float:                     8
  Native vector width double:                    4
  Max clock frequency:                           4800Mhz
  Address bits:                                  64
  Max memory allocation:                         8589934592
  Image support:                                 Yes
  Max number of images read arguments:           128
  Max number of images write arguments:          128
  Max image 2D width:                            16384
  Max image 2D height:                           16384
  Max image 3D width:                            2048
  Max image 3D height:                           2048
  Max image 3D depth:                            2048
  Max samplers within kernel:                    16
  Max size of kernel argument:                   1024
  Alignment (bits) of base address:              1024
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                                     Yes
    Quiet NaNs:                                  Yes
    Round to nearest even:                       Yes
    Round to zero:                               Yes
    Round to +ve and infinity:                   Yes
    IEEE754-2008 fused multiply-add:             Yes
  Cache type:                                    Read/Write
  Cache line size:                               64
  Cache size:                                    16777216
  Global memory size:                            22889979904
  Constant buffer size:                          262144
  Max number of constant args:                   8
  Local memory type:                             Global
  Local memory size:                             262144
  Max pipe arguments:                            16
  Max pipe active reservations:                  1
  Max pipe packet size:                          1024
  Max global variable size:                      0
  Max global variable preferred total size:      0
  Max read/write image args:                     128
  Max on device events:                          1024
  Queue on device max size:                      262144
  Max on device queues:                          1
  Queue on device preferred size:                16384
  SVM capabilities:                              
    Coarse grain buffer:                         Yes
    Fine grain buffer:                           Yes
    Fine grain system:                           No
    Atomics:                                     Yes
  Preferred platform atomic alignment:           0
  Preferred global atomic alignment:             0
  Preferred local atomic alignment:              0
  Kernel Preferred work group size multiple:     8
  Error correction support:                      0
  Unified memory for Host and Device:            1
  Profiling timer resolution:                    1
  Device endianess:                              Little
  Available:                                     Yes
  Compiler available:                            Yes
  Execution capabilities:                                
    Execute OpenCL kernels:                      Yes
    Execute native function:                     Yes
  Queue on Host properties:                              
    Out-of-Order:                                Yes
    Profiling :                                  Yes
  Queue on Device properties:                            
    Out-of-Order:                                Yes
    Profiling :                                  Yes
  Platform ID:                                   0x70150b916008
  Name:                                          pthread-Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz
  Vendor:                                        GenuineIntel
  Device OpenCL C version:                       OpenCL C 1.2 pocl
  Driver version:                                1.8
  Profile:                                       FULL_PROFILE
  Version:                                       OpenCL 1.2 pocl HSTR: pthread-x86_64-pc-linux-gnu-skylake
  Extensions:                                    cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_fp64 cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp64


  Platform Name:                                 NVIDIA CUDA
Number of devices:                               1
  Device Type:                                   CL_DEVICE_TYPE_GPU
  Vendor ID:                                     10deh
  Max compute units:                             10
  Max work items dimensions:                     3
    Max work items[0]:                           1024
    Max work items[1]:                           1024
    Max work items[2]:                           64
  Max work group size:                           1024
  Preferred vector width char:                   1
  Preferred vector width short:                  1
  Preferred vector width int:                    1
  Preferred vector width long:                   1
  Preferred vector width float:                  1
  Preferred vector width double:                 1
  Native vector width char:                      1
  Native vector width short:                     1
  Native vector width int:                       1
  Native vector width long:                      1
  Native vector width float:                     1
  Native vector width double:                    1
  Max clock frequency:                           1809Mhz
  Address bits:                                  64
  Max memory allocation:                         1586053120
  Image support:                                 Yes
  Max number of images read arguments:           256
  Max number of images write arguments:          16
  Max image 2D width:                            16384
  Max image 2D height:                           32768
  Max image 3D width:                            16384
  Max image 3D height:                           16384
  Max image 3D depth:                            16384
  Max samplers within kernel:                    32
  Max size of kernel argument:                   4352
  Alignment (bits) of base address:              4096
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                                     Yes
    Quiet NaNs:                                  Yes
    Round to nearest even:                       Yes
    Round to zero:                               Yes
    Round to +ve and infinity:                   Yes
    IEEE754-2008 fused multiply-add:             Yes
  Cache type:                                    Read/Write
  Cache line size:                               128
  Cache size:                                    491520
  Global memory size:                            6344212480
  Constant buffer size:                          65536
  Max number of constant args:                   9
  Local memory type:                             Scratchpad
  Local memory size:                             49152
  Max pipe arguments:                            0
  Max pipe active reservations:                  0
  Max pipe packet size:                          0
  Max global variable size:                      0
  Max global variable preferred total size:      0
  Max read/write image args:                     0
  Max on device events:                          0
  Queue on device max size:                      0
  Max on device queues:                          0
  Queue on device preferred size:                0
  SVM capabilities:                              
    Coarse grain buffer:                         Yes
    Fine grain buffer:                           No
    Fine grain system:                           No
    Atomics:                                     No
  Preferred platform atomic alignment:           0
  Preferred global atomic alignment:             0
  Preferred local atomic alignment:              0
  Kernel Preferred work group size multiple:     32
  Error correction support:                      0
  Unified memory for Host and Device:            0
  Profiling timer resolution:                    1000
  Device endianess:                              Little
  Available:                                     Yes
  Compiler available:                            Yes
  Execution capabilities:                                
    Execute OpenCL kernels:                      Yes
    Execute native function:                     No
  Queue on Host properties:                              
    Out-of-Order:                                Yes
    Profiling :                                  Yes
  Queue on Device properties:                            
    Out-of-Order:                                No
    Profiling :                                  No
  Platform ID:                                   0x59b5a6ffd670
  Name:                                          NVIDIA GeForce GTX 1060 6GB
  Vendor:                                        NVIDIA Corporation
  Device OpenCL C version:                       OpenCL C 1.2 
  Driver version:                                545.23.08
  Profile:                                       FULL_PROFILE
  Version:                                       OpenCL 3.0 CUDA
  Extensions:                                    cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_device_uuid cl_khr_pci_bus_info cl_khr_external_semaphore cl_khr_external_memory cl_khr_external_semaphore_opaque_fd cl_khr_external_memory_opaque_fd
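
The listing above can be checked against the OpenCL 1.2+ requirement by reading each device's `Name:` and `Version:` lines. The following is a minimal sketch, not part of VectorVisor: the helper names `device_versions` and `meets_minimum` are hypothetical, and the parsing assumes the `clinfo` layout shown here (other `clinfo` implementations format their output differently). The sample text is abbreviated from the two devices listed above.

```python
import re

# Matches device-level "Name:" lines, but not "Platform Name:".
NAME_RE = re.compile(r"^\s*Name:\s+(.*\S)\s*$")
# Matches device-level "Version:" lines, but not "Driver version:"
# or "Device OpenCL C version:".
VERSION_RE = re.compile(r"^\s*Version:\s+OpenCL (\d+)\.(\d+)")


def device_versions(clinfo_text):
    """Return [(device name, (major, minor)), ...] in order of appearance."""
    devices = []
    name = None
    for line in clinfo_text.splitlines():
        m = NAME_RE.match(line)
        if m:
            name = m.group(1)
            continue
        m = VERSION_RE.match(line)
        if m and name is not None:
            devices.append((name, (int(m.group(1)), int(m.group(2)))))
            name = None
    return devices


def meets_minimum(version, minimum=(1, 2)):
    """True if the (major, minor) version is at least the required minimum."""
    return version >= minimum


# Abbreviated from the clinfo output above (assumed layout).
SAMPLE = """\
  Name:                                          pthread-Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz
  Vendor:                                        GenuineIntel
  Device OpenCL C version:                       OpenCL C 1.2 pocl
  Driver version:                                1.8
  Version:                                       OpenCL 1.2 pocl HSTR: pthread-x86_64-pc-linux-gnu-skylake
  Platform Name:                                 NVIDIA CUDA
  Name:                                          NVIDIA GeForce GTX 1060 6GB
  Vendor:                                        NVIDIA Corporation
  Device OpenCL C version:                       OpenCL C 1.2 
  Driver version:                                545.23.08
  Version:                                       OpenCL 3.0 CUDA
"""

for name, version in device_versions(SAMPLE):
    status = "ok" if meets_minimum(version) else "too old"
    print(f"{name}: OpenCL {version[0]}.{version[1]} ({status})")
```

To run this against a real system, the output of `clinfo` could be captured (for example with `subprocess.run`) and passed to `device_versions` in place of `SAMPLE`.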
