
genomeworks's Introduction

GenomeWorks

Overview

GenomeWorks is a GPU-accelerated library for biological sequence analysis. This section provides a brief overview of the different components of GenomeWorks. For more detailed API documentation please refer to the documentation.

Clone GenomeWorks

Latest released version

This clones the repo at the master branch, which contains the latest released version plus hot-fixes.

git clone --recursive -b master https://github.com/clara-parabricks/GenomeWorks.git

Latest development version

This clones the repo at the default branch, which is set to the latest development branch. This branch changes frequently as features and bug fixes are pushed.

git clone --recursive https://github.com/clara-parabricks/GenomeWorks.git
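Both clone commands rely on `--recursive` to fetch the 3rdparty submodules. The snippet below is an illustrative post-clone sanity check (the `GenomeWorks` directory name is assumed from the commands above):

```shell
# Illustrative post-clone check. An empty or '-'-prefixed
# `git submodule status` listing means the submodules were not
# initialized, i.e. --recursive was omitted from the clone.
cd GenomeWorks 2>/dev/null || echo "GenomeWorks directory not found"
git submodule status 2>/dev/null | head -n 5
echo "submodule check complete"
```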

System Requirements

Minimum requirements -

  1. Ubuntu 16.04 or Ubuntu 18.04
  2. CUDA 10.0+ (official instructions for installing CUDA are available here)
  3. GPU generation Pascal and later (compute capability >= 6.0)
  4. gcc/g++ 5.4.0+ / 7.x.x
  5. Python 3.6.7+
  6. CMake (>= 3.10.2)
  7. autoconf (required to output SAM/BAM files)
  8. automake (required to output SAM/BAM files)
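The minimums above can be eyeballed with a quick shell loop (illustrative only; the printed version strings still need to be compared against the list by hand):

```shell
# Print the first version line of each required tool, or flag it as missing.
for tool in gcc g++ cmake python3 autoconf automake; do
  if command -v "$tool" >/dev/null 2>&1; then
    printf '%-10s %s\n' "$tool" "$("$tool" --version 2>/dev/null | head -n1)"
  else
    printf '%-10s NOT FOUND\n' "$tool"
  fi
done
```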

GenomeWorks Setup

Build and Install

To build and install GenomeWorks -

mkdir build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=install -Dgw_cuda_gen_all_arch=OFF
make -j install

NOTE : The gw_cuda_gen_all_arch=OFF option pre-generates optimized code only for the GPU(s) on your system. To build a binary that pre-generates optimized code for all common GPU architectures, please remove the option or set it to ON.

NOTE : (OPTIONAL) To enable outputting overlaps in SAM/BAM format, pass the gw_build_htslib=ON option.

Package generation

Package generation puts the libraries, headers and binaries built by the make command above into a .deb/.rpm for portability and easy installation. The package generation itself doesn't guarantee any cross-platform compatibility.

It is recommended that a separate build and packaging be performed for each distribution and CUDA version that needs to be supported.

The type of package (deb vs rpm) is determined automatically based on the platform the code is being run on. To generate a package for the SDK -

make package
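The detection itself is handled by the project's CMake/CPack configuration; conceptually it boils down to inspecting the distribution, along the lines of this hand-rolled sketch (not the project's actual logic):

```shell
# Guess the native package format from /etc/os-release (sketch only).
ID=unknown; ID_LIKE=""
[ -r /etc/os-release ] && . /etc/os-release
case "$ID $ID_LIKE" in
  *debian*|*ubuntu*) echo "deb" ;;
  *rhel*|*fedora*|*centos*|*suse*) echo "rpm" ;;
  *) echo "unknown" ;;
esac
```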

genomeworks Python API

The Python API for the GenomeWorks SDK is available through the genomeworks Python package. More details on how to use and develop genomeworks can be found in the README under the pygenomeworks folder.

Development Support

Enable Unit Tests

To enable unit tests, add -Dgw_enable_tests=ON to the cmake command in the build step.

This builds GTest based unit tests for all applicable modules, and installs them under ${CMAKE_INSTALL_PREFIX}/tests. These tests are standalone binaries and can be executed directly. e.g.

cd $INSTALL_DIR
./tests/cudapoatests

Enable Benchmarks

To enable benchmarks, add -Dgw_enable_benchmarks=ON to the cmake command in the build step.

This builds Google Benchmark based microbenchmarks for applicable modules. The built benchmarks are installed under ${CMAKE_INSTALL_PREFIX}/benchmarks/<module> and can be run directly.

e.g.

$INSTALL_DIR/benchmarks/cudapoa/multibatch

A description of each of the benchmarks is present in a README under the module's benchmark folder.

Enable Doc Generation

To enable documentation generation for GenomeWorks, please install Doxygen on your system. Once Doxygen has been installed, run the following to build the documentation.

make docs

Docs are also generated as part of the default all target when Doxygen is available on the system.

To disable documentation generation add -Dgw_generate_docs=OFF to the cmake command in the build step.

Code Formatting

GenomeWorks makes use of clang-format to format its source and header files. To make use of auto-formatting, clang-format must be installed from the LLVM package (for the latest builds, refer to http://releases.llvm.org/download.html).

Once clang-format has been installed, make sure the binary is in your path.
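A one-liner to confirm the binary is visible before invoking the format targets (illustrative):

```shell
# Report where clang-format resolves from, or that it is missing.
if command -v clang-format >/dev/null 2>&1; then
  echo "clang-format found: $(command -v clang-format)"
else
  echo "clang-format not found in PATH; install it from the LLVM releases page"
fi
```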

To add a folder to the auto-formatting list, use the macro gw_enable_auto_formatting(FOLDER). This adds all C++ source/header files in that folder to the formatting list.

To auto-format, run the following in your build directory.

make format

To check whether files are correctly formatted, run the following in your build directory.

make check-format

Running CI Tests Locally

Please note: your git repository will be mounted into the container, and any untracked files will be removed from it. Before executing the CI locally, stash untracked files or add them to the index.

Requirements:

  1. docker (https://docs.docker.com/install/linux/docker-ce/ubuntu/)
  2. nvidia-docker (https://github.com/NVIDIA/nvidia-docker)
  3. nvidia-container-runtime (https://github.com/NVIDIA/nvidia-container-runtime)

Run the following command to execute the CI build steps inside a container locally:

bash ci/local/build.sh -r <GenomeWorks repo path>

The ci/local/build.sh script was adapted from rapidsai/cudf.

The default docker image is clara-genomics-base:cuda10.0-ubuntu16.04-gcc5-py3.7. Other images from the gpuci/clara-genomics-base repository can be used instead via the -i argument:

bash ci/local/build.sh -r <GenomeWorks repo path> -i gpuci/clara-genomics-base:cuda10.0-ubuntu18.04-gcc7-py3.6

genomeworks's People

Contributors

ahehn-nv, akkamesh, alexomics, atadkase, cjw85, edawson, epislim, gputester, gsneha26, iiseymour, lanzju76, larsbishop, mike-wendt, mimaric, nvericx, nvvishanthi, ohadmo, pb-dseifert, r-mafi, rached-ab, rilango, vellamike


genomeworks's Issues

improve alignment error handling in kernel

In the CUDA aligner, valid inputs can sometimes lead to errors in processing, e.g. when the Hirschberg processing stack is full. We should have an error-handling mechanism that flags alignments that could not be processed correctly so they can be reported back to the caller.

Specific case - cudaaligner/src/hirschberg_myers_gpu.cu has a printf("ERROR: Stack full") case.

Unable to clone

Hi,
The clone command fails with access right error.

$ git clone --recursive git@github.com:clara-genomics/ClaraGenomicsAnalysis.git
Cloning into 'ClaraGenomicsAnalysis'...
The authenticity of host 'github.com (140.82.114.3)' can't be established.
RSA key fingerprint is SHA256:nThbg6kXUpJWGl7E1IGOCspRomTxdCARLviKw6E5SY8.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'github.com,140.82.114.3' (RSA) to the list of known hosts.
git@github.com: Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

Any thought?

genome_simulator slow on large genomes

After genome_simulator prints out

100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 120/120 [05:41<00:00,  1.76s/it]
Simulating reads:
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 120/120 [06:54<00:00,  3.46s/it]

If running with a large genome (e.g. 100 MB @ 30x) there is a very long period where a single CPU is at 100% utilisation. This could probably be sped up through multiprocessing and/or other means.

[cudamapper] Number of overlaps generated has dependency on `index_size`

A regression appears to have been introduced whereby whatever the index_size variable is set to affects the number of overlaps computed. This results in very small differences between the number of overlaps detected before and after read-level chunking. Example:

   899904 res_new.out
   899973 res_old.out

This is likely to be an off-by-one error at some point in the read-level chunking

Graph representation in python and serialization methods

The python bindings are very helpful to perform alignments and consensus calls, but there currently isn't a good way to work with the resulting graph structures in python. The structures are available in C++, but there are some nuances to them. It would be nice if there were some examples (with documentation) of working with the resulting graphs in C++ and some bindings (or a new interface) to work with them in python as well.

It would also be helpful if there were a method that can be called after performing an alignment or consensus call that would serialize the graph (in DOT or some similar format) so it can be easily inspected / visualized after creation.

cuda aligner API to take in max available memory and max ref/query sizes

  1. Each aligner batch takes in the max memory and max ref/query sizes and determines how many alignments can be performed in the batch.
  2. Provide an API to check the max number of alignments possible.
  3. The actual max number of alignments may be larger depending on the inputs processed so far; the actual max can be determined by continually adding alignments and checking the return value of the add-alignment API call.

Add build dependencies to Conda for GPUCI builds

Some recent GPUCI failures (e.g. this one) have revealed that we are sensitive to which packages are installed on the GPUCI VM/docker instances we are running.

Conda should be used as much as possible to allow our tests to run on a clean CI instance. This includes at a minimum:

  1. CMake
  2. Flake

clang-format: fix member initializer formatting

clang-format formats the member initializer of a constructor as a single long line regardless of the length of this line.
For long lines clang-format should introduce line breaks in some sensible way.

Shared objects not being detected by python when importing `claragenomics.bindings`

When installing pyclaragenomics in a venv, the following error occurs when running samples/tests:

Traceback (most recent call last):
  File "./sample_cudapoa", line 18, in <module>
    from claragenomics.bindings import cudapoa
ImportError: liblogging.so: cannot open shared object file: No such file or directory

It seems that if ClaraGenomicsAnalysis/pyclaragenomics/cga_build/install/lib/ is added to LD_LIBRARY_PATH, this problem is resolved.

This error does not seem to occur when running in a Conda environment, but does in a venv (as reported by @mimaric ).
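Until the RPATH handling is fixed, the workaround above can be applied per-shell. The snippet below assumes the checkout lives under $HOME (the lib path is the one from this report; adjust it to your checkout):

```shell
# Reported workaround: prepend the pyclaragenomics install lib dir to the
# dynamic loader search path (path taken from the report above; assumed
# to live under $HOME for illustration).
export LD_LIBRARY_PATH="$HOME/ClaraGenomicsAnalysis/pyclaragenomics/cga_build/install/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
```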

Improve README for different various parts of the SDK

Currently the READMEs are not set up in an easy-to-use manner. The following needs to be done -

  • A proper README for each section (main, benchmarks, samples, APIs, tests)
  • Link all READMEs from the main one to provide connected information from a single location

About singlebatch

I went to build/install/benchmarks/cudapoa/ and when I run singlebatch, I get the following output

$ ./singlebatch
2019-07-29 13:14:25
Running ./singlebatch
Run on (16 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x8)
  L1 Instruction 64K (x8)
  L2 Unified 512K (x8)
  L3 Unified 8192K (x2)
Load Average: 0.33, 0.15, 0.16
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
------------------------------------------------------------------
Benchmark                        Time             CPU   Iterations
------------------------------------------------------------------
BM_SingleBatchTest/1          1594 ms         1593 ms            1
BM_SingleBatchTest/4          2429 ms         2427 ms            1
BM_SingleBatchTest/16         2687 ms         2685 ms            1
BM_SingleBatchTest/64         3231 ms         3228 ms            1
BM_SingleBatchTest/256        8233 ms         8225 ms            1
terminate called after throwing an instance of 'std::runtime_error'
  what():  GPU Error:: out of memory /home/mahmood/cactus/cl/ClaraGenomicsAnalysis/cudapoa/src/allocate_block.cpp 46
Aborted (core dumped)

I would like to test a specific batch size and not variable sizes. It seems that singlebatch is a binary file. Any idea for that?

Setting the cuda toolkit location with PyClaraGenomics

It would be useful to be able to override CUDA_TOOLKIT_ROOT_DIR when building the Python bindings. The patch below passes the environment variable CUDA_TOOLKIT_ROOT_DIR to CMake if it is set; let me know if you want a PR for this.

--- a/pyclaragenomics/setup.py
+++ b/pyclaragenomics/setup.py
@@ -35,6 +35,7 @@ class CMakeWrapper():
         self.cmake_root_dir = os.path.abspath(cmake_root_dir)
         self.cmake_install_dir = os.path.join(self.build_path, "install")
         self.cmake_extra_args = cmake_extra_args
+        self.cuda_toolkit_root_dir = os.environ.get("CUDA_TOOLKIT_ROOT_DIR")
 
     def run_cmake_cmd(self):
         cmake_args = ['-DCMAKE_INSTALL_PREFIX=' + self.cmake_install_dir,
@@ -42,6 +43,9 @@ class CMakeWrapper():
                       '-DCMAKE_INSTALL_RPATH=' + os.path.join(self.cmake_install_dir, "lib")]
         cmake_args += [self.cmake_extra_args]
 
+        if self.cuda_toolkit_root_dir:
+            cmake_args += ["-DCUDA_TOOLKIT_ROOT_DIR=%s" % self.cuda_toolkit_root_dir]
+
         if not os.path.exists(self.build_path):
             os.makedirs(self.build_path)

SketchElementImpl::ReadidPositionDirection became new SketchElement

SketchElement/Minimizer objects are not used anymore. IndexGPU internally relies on SketchElementImpl::ReadidPositionDirection. Its output consists of the content of SketchElementImpl::ReadidPositionDirection split into three separate arrays.

Look into ways to:

  1. Change the interface of Index so that SketchElementImpl::ReadidPositionDirection does not have to be split into three arrays
  2. Refactor the code to reflect the current state of not using SketchElement objects

Move enums back to enum classes in C++ CGA

Because of cython limitations, C++ enum classes had to be converted to enums for compatibility. However, there seem to be some workarounds in cython land to make up for that limitation. Worth investigating those WARs to avoid violating good C++ coding guidelines

Path Missing CMakeLists.txt

I tried to install ClaraGenomics on Ubuntu 18.04 using the command cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=install, but I'm getting this error:

The source directory /home/vaibhavcurl/ClaraGenomicsAnalysis-master/3rdparty/bioparser does not contain a CMakeLists.txt file.

The source directory /home/vaibhavcurl/ClaraGenomicsAnalysis-master/3rdparty/spdlog does not contain a CMakeLists.txt file.
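This error pattern usually indicates that the 3rdparty submodules were never fetched, i.e. the repository was downloaded as an archive or cloned without --recursive (see the clone instructions above). A likely fix, run from the checkout root, is sketched below (demonstrated in a throwaway repo so the snippet is self-contained; in practice run only the `git submodule` line in your checkout):

```shell
# Fetch missing submodules into an already-cloned checkout.
cd "$(mktemp -d)" && git init -q demo && cd demo
git submodule update --init --recursive   # no-op here; fetches 3rdparty/ in a real checkout
echo "submodules updated"
```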

Add CIGAR alignments to cudamapper using cudaaligner

  • Add optional cigar attribute to Overlap objects.
  • Add -a flag to cudamapper for the option of computing alignments
  • Once overlapping is complete, alignments can be completed in batches using cudaaligner.
  • If alignments are computed, they should be added to the PAF file (the relevant modification needs to be performed in the print_paf function).

Remove Ubuntu dependency

On Linux distributions which aren't Ubuntu or CentOS, building the source code fails with an 'unrecognized distro' fatal error. This error occurs in packaging, which is not relevant to building the rest of the code for use on a given machine, and so should not block it.

[cudapoa] Lost lines in MSA output

In the example pyclaragenomics/samples/sample_cudapoa, the maximum sequences per POA is specified as 100, though the outputs are only 99 long. Changing the maximum sequences to 50 results in outputs of length 49. It appears that it is the final input sequence that is lost.

Do final steps of index generation in IndexGPU on GPU

As specified in PR #134, the last part of building the index in IndexGPU (done in details::index_gpu::build_index()) is still done on the CPU and takes about half of the total execution time of IndexGPU generation.

Look for a way to move it to the GPU

Index should accept SketchElement implementation as template parameter

Currently Index works with pointers to SketchElement, meaning we have to use std::vector<std::unique_ptr<SketchElement>>, which is bad for performance and makes it hard to use that data on the GPU.
Change the implementation so that Index (or its constructor) accepts one implementation of SketchElement and then works with std::vector<SketchElementImpl>.

[cudapoa] CudaPoaBatch.get_msa() incorrectly reports success on failure with large inputs

The results of at least the Python binding can be unexpected when the maximum MSA width is surpassed (default 1024, from cudapoa_kernels.cuh). I’ve observed the status being reported as 0 while the results are slightly mangled.

For example, I’ve input 70 sequences of ~660 bases; the status is 0, and the lengths of the strings returned for the MSA are not equal (often the first being longer than the rest). Taking one or a few bases away from the inputs gives MSA lines uniformly of length 1023.

Solve performance regression caused by chunked Index Generator

#100 allows indexing of an arbitrarily large set of sequences but introduces a performance regression. This is caused by the fact that several sorted lists of SketchElements now need to be merged together. The merging is not being performed in an optimal way and could be improved with multithreading to run in ~log(N) time. There is also the possibility of performing it on the GPU.

revert CGA_CU_CHECK_ERR to abort on error

Revert the CGA_CU_CHECK_ERR functionality to abort on error for Release builds, and to assert(false) and then abort for Debug builds.

Potentially add cudaDeviceSynchronize in debug builds to catch errors when they occur

Make SDK functions "current device"-neutral

If we assume our users use CUDA also outside of our library, we should also ensure that our methods are "current device"-neutral, i.e. that we reset the device (cudaSetDevice) at the end of each method to the value it had when it entered the method.

`IndexGenerator` and `Matcher` unable to allocate memory on GPU when number of reads too large

When running overlaps with a FASTA/FASTQ file that is too large (e.g. >500 MB), the following error is encountered:

terminate called after throwing an instance of 'claragenomics::device_memory_allocation_exception'
  what():  Could not allocate device memory!

This happens because the on-device memory requirements of IndexGenerator and Matcher scale with the size of the input reads.

The solution is to implement a "chunked" version of IndexGenerator and Matcher.

Still about single batch

For my GPU analyses, I tried running single-batch with the nv profiler. It seems that there are only two kernels, where the dominant one is generatePOAKernel. The other is generateConsensusKernel, which is not important. So this benchmark is not going to solve the problem and is only good for generating the graph. Am I right? I am not an expert in this field and want to analyze some GPU things. I don't know if that graph generation is a big problem.

A single run of batch=256 takes:

Time = 8094 ms
CPU = 8092 ms
Iterations = 1

So, where is the GPU in the results?
