
bolt's Introduction

Bolt is a C++ template library optimized for heterogeneous computing. Bolt is designed to provide high-performance library implementations for common algorithms such as scan, reduce, transform, and sort. The Bolt interface was modeled on the C++ Standard Template Library (STL). Developers familiar with the STL will recognize many of the Bolt APIs and customization techniques.

The primary goal of Bolt is to make it easier for developers to utilize the inherent performance and power efficiency benefits of heterogeneous computing. It has interfaces that are easy to use, and has comprehensive documentation for the library routines, memory management, control interfaces, and host/device code sharing.

Compared to writing the equivalent functionality in OpenCL™, you’ll find that Bolt requires significantly fewer lines-of-code and less developer effort. Bolt is designed to provide a standard way to develop an application that can execute on either a regular CPU, or use any available OpenCL™ capable accelerated compute unit, with a single code path.

Here's a link to our BOLT wiki page.

Prerequisites

Windows

  1. Visual Studio 2010 or later (VS2012 for C++ AMP)
  2. Tested with 32/64-bit Windows® 7/8 and Windows® Blue
  3. CMake 2.8.10
  4. TBB 4.1 Update 1 or above (for the Multicore CPU path only). See Building Bolt with TBB.
  5. APP SDK 2.8 or later.

Note: If both Visual Studio 2012 and Visual Studio 2010 are installed, Visual Studio 2010 should be updated to SP1.

Linux

  1. GCC 4.6.3 and above
  2. Tested with openSUSE 12.3, RHEL 6.4 64-bit, RHEL 6.3 32-bit, Ubuntu 13.04
  3. CMake 2.8.10
  4. TBB 4.1 Update 1 or above (for the Multicore CPU path only). See Building Bolt with TBB.
  5. APP SDK 2.8 or later.

Note: The pre-built Bolt binaries for Linux are built with GCC 4.7.3, and applications should be built with the same version. Otherwise, build Bolt from source with GCC 4.6.3 or higher.

Catalyst™ package

The latest Catalyst driver contains the most recent OpenCL runtime. The recommended package is the latest 13.11 beta driver.

Catalyst 13.4 and higher is supported.

Note: Catalyst 13.9 is not supported.

Supported Devices

AMD APU Family with AMD Radeon™ HD Graphics

  • A-Series
  • C-Series
  • E-Series
  • E2-Series
  • G-Series
  • R-Series

AMD Radeon™ HD Graphics

  • 7900 Series (7990, 7970, 7950)
  • 7800 Series (7870, 7850)
  • 7700 Series (7770, 7750)

AMD Radeon™ HD Graphics

  • 6900 Series (6990, 6970, 6950)
  • 6800 Series (6870, 6850)
  • 6700 Series (6790, 6770, 6750)
  • 6600 Series (6670)
  • 6500 Series (6570)
  • 6400 Series (6450)
  • 6xxxM Series

AMD Radeon™ Rx 2xx Graphics

  • R9 2xx Series
  • R8 2xx Series
  • R7 2xx Series

AMD FirePro™ Professional Graphics

  • W9100

Compiled Windows binary packages (zip archives) for Bolt may be downloaded from the Bolt landing page hosted on AMD's Developer Central website.

Examples

The simple example below shows how to use Bolt to sort a random array of 8192 integers.

#include <bolt/cl/sort.h>
#include <vector>
#include <algorithm>

int main()
{
    // generate random data (on host)
    size_t length = 8192;
    std::vector<int> a(length);
    std::generate(a.begin(), a.end(), rand);

    // sort, run on best device in the platform
    bolt::cl::sort(a.begin(), a.end());
    return 0;
}

The code will be familiar to programmers who have used the C++ Standard Template Library; the difference is the include file (bolt/cl/sort.h) and the bolt::cl namespace before the sort call. Bolt developers do not need to learn a new device-specific programming model to leverage the power and performance advantages of heterogeneous computing.

#include <bolt/cl/device_vector.h>
#include <bolt/cl/scan.h>
#include <vector>
#include <numeric>

int main()
{
    size_t length = 1024;

    // create a device_vector and initialize it to 1
    bolt::cl::device_vector< int > boltInput( length, 1 );

    // calculate the inclusive scan of the device_vector
    bolt::cl::inclusive_scan( boltInput.begin(), boltInput.end(), boltInput.begin() );

    // create a std::vector and initialize it to 1
    std::vector<int> stdInput( length, 1 );

    // calculate the inclusive scan of the std::vector
    bolt::cl::inclusive_scan( stdInput.begin(), stdInput.end(), stdInput.begin() );
    return 0;
}

This example shows how Bolt simplifies management of heterogeneous memory. The creation and destruction of device resident memory is abstracted inside of the bolt::cl::device_vector <> class, which provides an interface familiar to nearly all C++ programmers. All of Bolt’s provided algorithms can take either the normal std::vector or the bolt::cl::device_vector<> class, which allows the user to control when and where memory is transferred between host and device to optimize performance.

Copyright and Licensing information

© 2012,2014 Advanced Micro Devices, Inc. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

bolt's People

Contributors

avinashcpandey, bensander, guacamoleo, hsa-libraries, jayavanth, mattpd, ravibanger


bolt's Issues

ConstantIteratorTest

The following code from ConstantIteratorTest should use bolt::cl::device_vector, but uses std::vector instead:

TYPED_TEST_P( CountingIterator, DeviceTransformVector )
{
    // initialize the data vector to be sequential numbers
    std::vector< TypeParam > devVec( 3 );
    bolt::cl::transform( devVec.begin( ), devVec.end( ), bolt::cl::make_counting_iterator( 42 ),
                         devVec.begin( ), bolt::cl::plus< TypeParam >( ) );
    EXPECT_EQ( 42, devVec[ 0 ] );
    EXPECT_EQ( 43, devVec[ 1 ] );
    EXPECT_EQ( 44, devVec[ 2 ] );
}

bolt 1.2, typo in transform_reduce.inl

Bolt 1.2, file include/bolt/cl/detail/transform_reduce.inl, lines 446-447:

dblog->CodePathTaken(BOLTLOG::BOLT_TRANSFORMREDUCE,BOLTLOG::BOLT_MULTICORE_CPU,"

::Transform_Reduce::MULTICORE_CPU");

The string literal is split across two source lines, which makes the compiler issue unnecessary warnings.

Z Koza

Bolt 1.2: bolt::cl::min_element and bolt::cl::max_element have issues when used on a device_vector with iterators.

This issue can be observed only with Bolt, i.e., when calling bolt::cl::min_element and bolt::cl::max_element on a device_vector. std::min_element and std::max_element work fine.

CODE:

//code for BOLT_MIN_ELEMENT:

TEST(sanity_min_element_2bolt_cl_device_vect_loop, ints_loop){
    int size = 10;
    bolt::cl::device_vector<int> intStdVect (size);
    bolt::cl::device_vector<int> intBoltVect (size);

    for (int i = 0; i < size; i++){
        intBoltVect[i] = (int)std::rand() % 65535;
        intStdVect[i] = intBoltVect[i];
    }

    bolt::cl::device_vector<int>::iterator std_min_ele;
    for (int i = 0; i < 1000; i++){
        std_min_ele = std::min_element (intStdVect.begin(), intStdVect.end());
    }

    bolt::cl::device_vector<int>::iterator bolt_min_ele;
    for (int i = 0; i < 1000; i++){
        bolt_min_ele = bolt::cl::min_element(intBoltVect.begin(), intBoltVect.end());
    }
    EXPECT_EQ(*std_min_ele, *bolt_min_ele) << std::endl;
}

// code for BOLT_MAX_ELEMENT:

TEST(sanity_max_element_2bolt_cl_device_vect_loop, ints_loop){
    int size = 10;
    bolt::cl::device_vector<int> intStdVect (size);
    bolt::cl::device_vector<int> intBoltVect (size);

    for (int i = 0; i < size; i++){
        intBoltVect[i] = (int)std::rand() % 65535;
        intStdVect[i] = intBoltVect[i];
    }

    bolt::cl::device_vector<int>::iterator std_max_ele;
    for (int i = 0; i < 1000; i++){
        std_max_ele = std::max_element (intStdVect.begin(), intStdVect.end());
    }

    bolt::cl::device_vector<int>::iterator bolt_max_ele;
    for (int i = 0; i < 1000; i++){
        bolt_max_ele = bolt::cl::max_element(intBoltVect.begin(), intBoltVect.end());
    }
    EXPECT_EQ(*std_max_ele, *bolt_max_ele) << std::endl;
}

please remove boost

The boost dependency is not needed and creates compiler issues with the different versions of boost out there.

You should use C++11 directly. And please update the CMake files to support C++11 flags for gcc.

Missing Linux installation instructions

After downloading the binary tarball for Linux and unpacking it, I find a directory with some stuff in it, but no instructions on installation. For example, should one copy include, lib and lib64 to /usr/local? Or is it intended that Bolt applications should set up -I and -L compiler flags to wherever one unpacked Bolt? Is any special care needed if one already has Boost (and no doubt a different version of Boost) installed to avoid version conflicts?

throw opencl kernel compile issue when run test opencl case for example clBolt.Test.StableSort

Hi,
I cloned the Bolt code and built it on ROCm 1.9 with the OpenCL runtime. The project builds successfully with the CMake command "cmake -DBOOST_LIBRARYDIR=/home/qcxie/software/boost_1_65_1/stage/lib -DBOOST_ROOT=/home/qcxie/software/boost_1_65_1 -DGTEST_ROOT=/home/qcxie/software/boost_1_65_1 -DCMAKE_BUILD_TYPE=Debug -DBolt_BUILD64=1 -DCMAKE_CXX_FLAGS="-std =c++14 -fpermissive -I /opt/rocm/opencl/include -L/opt/rocm/opencl/lib/x86_64 -lOpenCL" ../",
but running a test case such as clBolt.Test.StableSort throws an OpenCL kernel compile error:
" error: unknown type name 'namespace' namespace bolt { namespace cl { "
How should I configure the build, or which build-program options should I set, to fix this issue? Thanks very much.

Missing includes of <boost/thread/lock_guard.hpp>

When I try to build Bolt from source, I get numerous errors about lock_guard because <boost/thread/lock_guard.hpp> is not included anywhere. This could be because I'm using my system's version of Boost (1.53) rather than using the superbuild.

Bolt 1.2: std::partial_sum has compilation issues when used with a UDD through a transform_iterator.

CODE:

int get_global_id(int i);

int global_id;

BOLT_FUNCTOR(UDD_trans,
struct UDD_trans
{
    int i;
    float f;

    UDD_trans() { }

    UDD_trans(int val1)
    {
        i = val1;
    }

    bool operator == (const UDD_trans& other) const
    {
        return ((i + f) == (other.i + other.f));
    }

    UDD_trans operator() () const
    {
        UDD_trans temp;
        temp.i = get_global_id(0);
        return temp;
        //return get_global_id(0);
    }
};
);

int get_global_id(int i)
{
    return global_id++;
}

BOLT_FUNCTOR(add_UDD,
struct add_UDD
{
    int operator() (const UDD_trans x) const { return x.i + 3; }
    typedef int result_type;
};
);

BOLT_TEMPLATE_REGISTER_NEW_ITERATOR( bolt::cl::device_vector, int, UDD_trans );
BOLT_TEMPLATE_REGISTER_NEW_TRANSFORM_ITERATOR( bolt::cl::transform_iterator, add_UDD, UDD_trans );

int main()
{
    int length = 5;

    std::vector< UDD_trans > svInVec1( length );
    std::vector< int > stlOut( length );

    add_UDD add1;
    UDD_trans gen_udd(0);

    // ADD
    bolt::cl::transform_iterator< add_UDD, std::vector< UDD_trans >::const_iterator > sv_trf_begin1( svInVec1.begin(), add1 );
    bolt::cl::transform_iterator< add_UDD, std::vector< UDD_trans >::const_iterator > sv_trf_end1( svInVec1.end(), add1 );

    global_id = 0;
    std::generate( svInVec1.begin(), svInVec1.end(), gen_udd );

    bolt::cl::plus<int> pls;
    bolt::cl::control ctrl = bolt::cl::control::getDefault();

    global_id = 0;

    // STD_PARTIAL_SUM
    std::partial_sum( sv_trf_begin1, sv_trf_end1, stlOut.begin(), pls );

    for (int i = 0; i < length; i++)
    {
        std::cout << "Val = " << stlOut[i] << "\n";
    }

    return 0;
}

Style checker to use for Bolt

Does Bolt have a recommended style checker? I noticed that even though Bolt has a coding style guideline, not all of it is followed or enforced. In particular, the use of tab characters and indentation seems inconsistent, varying depending on who checks in the code.

The guideline states "Use only spaces, and indent 2 spaces at a time", it seems most of the code is indented with 4 spaces and uses tab characters as well as spaces.

(screenshot of inconsistent indentation omitted)

CountingIterator with bolt::cl::copy

Hello,

I'm testing new functionality of Bolt 1.3 library on GPU.

I've followed the example from the following page: http://developer.amd.com/community/blog/2013/04/26/details-of-the-bolt-beta/

I found the iterator functionality very attractive, but the example from that page does not seem to work correctly, at least under Linux:

#include "bolt/cl/device_vector.h"
#include "bolt/cl/iterator/counting_iterator.h"
#include "bolt/cl/copy.h"

int main( int argc, char* argv[] )
{
    bolt::cl::device_vector< int > devV( 100 );
    bolt::cl::copy( bolt::cl::make_counting_iterator< int >( 10 ),
                    bolt::cl::make_counting_iterator< int >( 10 + devV.size( ) ),
                    devV.begin( ) );
}

The destination vector is not changed at all. When I execute it using bolt::cl::transform instead of bolt::cl::copy, it works correctly.

scatter_if in bolt::amp not possible?

Hi,
I'm studying Bolt and wanted to implement an example program that needs scatter_if operation (http://thrust.github.io/doc/group__scattering.html#ga1079bc05bcb3d4b5080f1e07444fee37). I started to port thrust scatter_if code, which uses permutation_iterator but came across this (https://groups.google.com/forum/#!topic/thrust-users/Xe2JkFy_hUk). The Google Group post claims that permutation_iterator in AMP kernel is not possible because of the restriction AMP put on the use of pointer in kernel (ie. restrict(amp)). Is this true? If so, is it possible to implement permutation_iterator in bolt::cl?

BTW, Bolt forum in AMD Dev Central does not seem to work correctly. It is set to private and my post there does not seem to go through. :(

Problems coaxing CMake to not include 32-bit compiler flags.

Just as the title says, I'm having trouble convincing CMake to let go of the -m32 compiler flag. Everything is working except for that one crucial hangup.

I'm using CMake-3.0.

I've tried to manually edit the CMAKE_CXX_COMPILER and CMAKE_EXE_LINKER variables from the command line with little success.

Any assistance would be most well received.

Cross platform issues when importing into OpenCV

While developing the OpenCL module of OpenCV (the open-source computer vision library), I intended to import the Bolt library and use its sorting and scan APIs, but in browsing the source code I noticed that the AMD-only static OpenCL C++ template extension is heavily used in the OpenCL kernels.

For OpenCV, we must ensure that OpenCL works on most platforms, not only AMD's. Do you have any plans to make Bolt available on non-AMD APP SDK platforms, such as NVIDIA's and Intel's OpenCL SDKs?


To work around this, I adapted some of the code from Bolt and used macros to simulate templates. What I added in this branch is sort_by_key using OpenCL. I also included a radix sort for float types, which Bolt does not support yet.

Please see the following links:
Host code
Radix sort kernel file

Thanks!

Bolt 1.3, Windows: scan-family test cases fail for the OpenCL CPU path.

Test case failures were observed for the bolt::cl scan family (exclusive_scan, inclusive_scan, transform_exclusive_scan, transform_inclusive_scan), but only on the OpenCL CPU path. The same test cases pass on the other paths (GPU, Automatic, MultiCoreCpu, and serial CPU).

EXAMPLE:
TEST (sanity_exclusive_scan_stdVectVsDeviceVectWithIters, floatSameValuesSerialRange){
    int size = 1000;
    TAKE_THIS_CONTROL_PATH
    bolt::cl::device_vector< float > boltInput( size, 1.125f );
    bolt::cl::device_vector< float >::iterator boltEnd =
        bolt::cl::exclusive_scan( my_ctl, boltInput.begin( ), boltInput.end( ), boltInput.begin( ), 2.0f );

    std::vector< float > stdInput( size, 1.125f );
    std::vector< float >::iterator stdEnd =
        bolt::cl::exclusive_scan( stdInput.begin( ), stdInput.end( ), stdInput.begin( ), 2.0f );

    EXPECT_FLOAT_EQ( (*(boltEnd - 1)), (*(stdEnd - 1)) ) << std::endl;
}

Don't default to 32 bit builds.

Linux systems do not have 32-bit headers and libraries installed by default. Debugging the errors that arise from this creates a lot of overhead on the user's end.
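A possible workaround, assuming the Bolt_BUILD64 option that appears in the configure command quoted in the StableSort issue above, is to request a 64-bit build explicitly at configure time:

```shell
# request a 64-bit build so CMake does not add the -m32 flag
# (Bolt_BUILD64 is the option name seen elsewhere on this page;
# verify it against your checkout's CMakeLists.txt)
cmake -DBolt_BUILD64=1 ..
```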

unable to input bolt::cl::transform_iterator into bolt::cl::copy

#include <iostream>
#include <vector>
#include <bolt/cl/iterator/counting_iterator.h>
#include <bolt/cl/iterator/transform_iterator.h>
#include <bolt/cl/functional.h>
#include <bolt/cl/device_vector.h>
#include <bolt/cl/copy.h>

BOLT_FUNCTOR(GetSquare,
struct GetSquare
{
public:
    int operator()(const int& globalId) const
    {
        return globalId*globalId;
    }
};);

int main()
{
    const std::size_t n=10;

    bolt::cl::control ctrl = bolt::cl::control::getDefault();

    bolt::cl::device_vector<int> debug(n);

    auto globalId = bolt::cl::make_counting_iterator(0);

    // This is OK
    // bolt::cl::transform(globalId, globalId + n, debug.begin(), GetSquare());

    // This causes compilation error
    auto square = bolt::cl::make_transform_iterator(globalId, GetSquare());
    bolt::cl::copy(square, square + n, debug.begin());

    for(int i = 0; i < n; i++)
    {
        std::cout << i << ": " << debug[i] << std::endl;
    }

    return 0;
}

This problem seems to arise because bolt::cl::transform_iterator provides getContainer() only as a template method:

template<typename Container >
Container& getContainer() const
{
    return this->base().getContainer( );
}

whereas bolt::cl::copy needs ITERATOR::getContainer() without any template argument:

V_OPENCL( kernels[whichKernel].setArg( 0, first.getContainer().getBuffer()), "Error setArg kernels[ 0 ]" );

This can be solved with a C++11 trailing return type:

auto getContainer() const -> decltype(base().getContainer())
{
    return this->base().getContainer();
}

I don't have any idea in C++03 (boost::result_of?).

Differences in develop and master branch.

I noticed that there are some differences between the develop branch and the master/v1.0 branch. Is that intentional? Most of the differences are minor documentation changes that probably won't matter much, but it seems something was missed when syncing the v1.0/master/develop branches in preparation for the v1.0 release.

I'm just getting started with the Git way of doing things; I've only been using SVN. So it's possible that I'm missing something...

Also, to submit an entry for Bolt Sample Code Contest, I should open a pull request to develop branch, right? Or does it not matter?

Java-bindings for Bolt

Enhancement:
Some JNA or JNI binding for a Bolt wrapper, to get a Bolt-like API with GPU access and performance from Java.

Since Aparapi seems to be on hold, experimental, and solving different problems than Bolt does, and Sumatra may not even be available with Java 9 in 2016, it would be very nice to get Java access to GPU reduce, sort, and similar operations.

Open source OpenCL Static C++ Kernel Language Extension

Hi,
Sorry, as this is surely not the best place to post, but I remember a presentation at APU13 saying AMD would open-source its "OpenCL Static C++ Kernel Language Extension" so that all OpenCL implementations could take advantage of it. I think it was in the Bolt presentation, and it was said to arrive in either Q1 2014 or H1 2014, so it must be coming soon, right?
Also, Bolt takes advantage of it right now in the OpenCL path, right?
