Giter VIP home page Giter VIP logo

tensor-episdet's Introduction

tensor-episdet

This repository includes implementations of two exhaustive epistasis detection algorithms targeting the NVIDIA Turing GPU architecture. These implementations have been devised around the efficient use of novel tensor core capabilities introduced in the Turing architecture pertaining to support for tensorized fused XOR+POPC operations. These operations, which are typically targeted at processing certain types of quantized neural networks, have been repurposed for achieving breakthrough performance in 2-way and 3-way epistasis detection searches with challenging datasets. The tensor cores are accessed through the Warp Matrix Multiply-Accumulate (WMMA) API via the CUTLASS CUDA C++ template abstractions library.

What is Epistasis Detection?

Epistasis detection is a computationally complex bioinformatics application with significant societal impact. It is used in the search of new correlations between genetic markers, such as single-nucleotide polymorphisms (SNPs), and phenotype (e.g. a particular disease state). Finding new associations between genotype and phenotype can contribute to improved preventive care, personalized treatments and to the development of better drugs for more conditions.

Setup

Requirements

  • CUDA Toolkit (version 10 or more recent)
  • CUTLASS (version 1.3.X)
  • NVIDIA GPU with tensor cores (minimum Turing architecture, e.g. Geforce 2060)

Compilation

Compiling binary (tensor-episdet.triplets.k2.bin) for performing 3-way searches using K2 Bayesian scoring:

$ make triplets_k2

Compiling binary (tensor-episdet.pairs.k2.bin) for performing 2-way searches using K2 Bayesian scoring:

$ make pairs_k2

Mutual information scoring can be used instead of K2 scoring, through replacing k2 by mi in the Makefile argument.

Important parameters such as the number of SNPs per block (BLOCK_SIZE) and the number of CUDA streams used to process different rounds (NUM_STREAMS) can be changed by modifying the Makefile. Depending on the dataset characteristics, specializing these parameters (e.g. using a larger block size when processing datasets with more SNPs) can have a significant influence on the performance achieved.

The Makefile is expecting the CUTLASS library to be inside the project root directory in a folder named cutlass. If you installed the library in a different directory, you must modify the Makefile accordingly.

Notice that the application expects that the input dataset is in a particular binarized format. You can download an example dataset with 4096 SNPs and 262144 samples from here. Due to the way data is processed using matrix operations, the number of bits per {SNP, allele} in the dataset files (*.bin) representing cases or controls (stored in different files) must be a multiple of 1024 bits. In situations where the number of cases or controls is not a multiple of 1024, the input binary dataset is expected to be padded with zeros (i.e. unset bits).

Usage example

Running a 3-way search with a synthetic example dataset with 4096 SNPs (11,444,858,880 triplets of SNPs to evaluate) and 262144 samples (131072 controls and 131072 cases):

$ ./bin/tensor-episdet.triplets.k2.bin datasets/db_4096snps_262144samples.txt   

This example is expected to take slightly less than 2 minutes to execute and to achieve a performance of above 25 tera sets (triplets) of SNPs processed per second (scaled to sample size), when executed on a system with a GeForce 2070S GPU. Higher performance can be achieved when processing more challenging datasets with more SNPs.

The construction of contingency tables, a phase of epistasis detection searching that counts of occurrences of the possible genotypes in cases and controls resulting from combining pairs/triplets of SNPs, represents the most computationaly complex portion of the application. Thus, running the same example with Mutual Information instead of K2 Bayesian scoring is expected to achieve comparable performance.

In papers and reports, please refer to this tool as follows

R. Nobre, A. Ilic, S. Santander-Jiménez and L. Sousa, "Retargeting Tensor Accelerators for Epistasis Detection," in IEEE Transactions on Parallel and Distributed Systems, vol. 32, no. 9, pp. 2160-2174, 1 Sept. 2021, doi: 10.1109/TPDS.2021.3060322.

BibTeX:

@ARTICLE{9357942,
author={R. {Nobre} and A. {Ilic} and S. {Santander-Jiménez} and L. {Sousa}},
journal={IEEE Transactions on Parallel and Distributed Systems}, 
title={Retargeting Tensor Accelerators for Epistasis Detection}, 
year={2021},
volume={32},
number={9},
pages={2160-2174},
doi={10.1109/TPDS.2021.3060322}}

For additional readings in high-throughput epistasis detection, you can take a look at our IPDPS 2020 and JSSPP 2020 papers.

tensor-episdet's People

Contributors

rjfnobre avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.