pRIblast

pRIblast is a highly efficient, parallel application for extensive lncRNA-RNA interaction prediction. It builds on RIblast, the work of T. Fukunaga and M. Hamada, and has been optimized to minimize I/O latency and memory usage.

Version

Version 0.0.3.

Requirements

To compile and execute pRIblast, the following software is required:

  • GNU Make.
  • C++ compiler (with support for OpenMP and the C++17 standard).
  • MPI implementation (MPI-3 compliant).

For instance, a valid combination of these tools may be: GNU Make v3.82, GCC v9.3.0 and OpenMPI v3.1.4.

Compilation

Download the source code from this repository, either with Git or as a copy from GitHub, and run GNU Make to compile pRIblast. This creates a binary file named pRIblast in the target folder of your current working directory.

Execution

To execute pRIblast, launch it through the MPI runtime as follows:

$ mpirun -np <p> -x OMP_NUM_THREADS=<t> pRIblast <options>

where <p> is the number of processes that will exist in the MPI group and <t> is the number of threads spawned per MPI process.

As for the program options, RIblast's official repository provides a detailed list of the available execution modes (i.e. database construction and RNA interaction search) and their per-mode parameters. In addition, pRIblast implements new options that give fine-grained control over the execution of the parallel algorithm. Those options are:

 (db) -a  <str>, sets the parallel algorithm used to distribute data among processes (block | heap | dynamic)
 (db) -p  <str>, sets a per process local path for fast writing of temporary output files
 (db) -c  <int>, sets the database page size (smaller page implies less memory usage)
(ris) -a  <str>, sets the parallel algorithm used to distribute data among processes (block | area | dynamic)
(ris) -p  <str>, sets a per process local path for fast writing of temporary output files

Execution example

Suppose you want to run both the db and ris steps with the dynamic algorithm on a 16-node multicore cluster. The inputs are db.fa, a FASTA file with the RNA sequences used to construct the database, and ris.fa, a FASTA file with the RNA sequences whose interactions against the database you want to predict. The database page size is 500 sequences, and each node runs 1 process with 16 threads. In addition, every node has a local, temporary disk mounted at /tmp/scratch that allows fast writing of temporary output files.

First, create the target RNA database by running the pRIblast database construction step:

$ mpirun -np 16 -x OMP_NUM_THREADS=16 \
         pRIblast db -i db.fa -o rna-db -a dynamic -p /tmp/scratch -c 500

Then, predict interactions against the database by running the pRIblast RNA interaction search step:

$ mpirun -np 16 -x OMP_NUM_THREADS=16 \
         pRIblast ris -i ris.fa -o predictions.txt -d rna-db -a dynamic -p /tmp/scratch

Note that the -p option is not mandatory, but it is highly recommended when every node has a local, temporary disk attached, because it greatly reduces I/O latency. Note also that the -c option is only available for the database construction step. It sets the page size of the database, i.e. the number of RNA sequences loaded into memory at once. The smaller the page size, the less memory the ris step uses.
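
To make the role of the page size concrete, the following sketch splits a database of sequences into pages of at most a given size, so that only one page has to be held in memory at a time. The function name SplitIntoPages and the representation of a page as an index range are assumptions made for illustration; the actual on-disk layout used by pRIblast may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: returns one half-open index range [first, second)
// per page, each covering at most page_size sequences.
std::vector<std::pair<std::size_t, std::size_t>> SplitIntoPages(
    std::size_t num_sequences, std::size_t page_size) {
  assert(page_size > 0);
  std::vector<std::pair<std::size_t, std::size_t>> pages;
  for (std::size_t begin = 0; begin < num_sequences; begin += page_size) {
    // The last page may be shorter than page_size.
    pages.emplace_back(begin, std::min(begin + page_size, num_sequences));
  }
  return pages;
}
```

For example, SplitIntoPages(1200, 500) yields three ranges, the last of which holds the remaining 200 sequences. Halving the page size roughly halves the number of sequences resident at once, at the cost of more pages to iterate over.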

Configuration of threads, processes and algorithms

To achieve maximum performance, avoid the pure-block algorithm (-a block); its only purpose is benchmarking. If the computing nodes have a high number of CPU cores, use the heap algorithm (database construction step) and the area-sum algorithm (-a area, RNA interaction search step), which take advantage of the multithreading performance optimization heuristics developed within the tool. In that case, spawn one process per socket and run as many threads as the socket has cores. If the number of available nodes and/or the number of CPU cores per node is low, use the dynamic algorithm instead and spawn one process per core.
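
This README does not describe how each algorithm assigns sequences to processes. Purely to illustrate why a size-aware distribution can outperform a plain block split, the sketch below compares a contiguous block partition with a greedy assignment driven by a min-heap of process loads. Both functions, their names, and the use of sequence length as the cost measure are assumptions and are not taken from the pRIblast source.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Largest per-process load when sequences are split into contiguous blocks.
long MaxLoadBlock(const std::vector<long>& lengths, int procs) {
  assert(procs > 0);
  const std::size_t n = lengths.size();
  const std::size_t np = static_cast<std::size_t>(procs);
  long worst = 0;
  for (std::size_t p = 0; p < np; p++) {
    // Process p owns the half-open index range [begin, end).
    const std::size_t begin = n * p / np;
    const std::size_t end = n * (p + 1) / np;
    long load = 0;
    for (std::size_t i = begin; i < end; i++) load += lengths[i];
    worst = std::max(worst, load);
  }
  return worst;
}

// Largest per-process load when each sequence, longest first, is given to
// the currently least-loaded process (tracked with a min-heap).
long MaxLoadGreedyHeap(std::vector<long> lengths, int procs) {
  assert(procs > 0);
  std::sort(lengths.begin(), lengths.end(), std::greater<long>());
  std::priority_queue<long, std::vector<long>, std::greater<long>> loads;
  for (int p = 0; p < procs; p++) loads.push(0);
  long worst = 0;
  for (long len : lengths) {
    const long load = loads.top() + len;  // least-loaded process takes it
    loads.pop();
    loads.push(load);
    worst = std::max(worst, load);
  }
  return worst;
}
```

With sequence lengths {8, 7, 6, 1, 1, 1} and two processes, the block split leaves one process with a load of 21, while the greedy assignment brings the maximum down to 13. A skewed length distribution of this kind is where a block split tends to leave processes idle.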

Cite us

If you use pRIblast in your research, please cite our work using the following references:

@article{amatria2023priblast,
  title={pRIblast: A highly efficient parallel application for comprehensive {lncRNA--RNA} interaction prediction},
  author={Amatria-Barral, I{\~n}aki and Gonz{\'a}lez-Dom{\'\i}nguez, Jorge and Touri{\~n}o, Juan},
  journal={Future Generation Computer Systems},
  volume={138},
  pages={270--279},
  year={2023}
}

@inproceedings{amatria2023parallel,
  author={Amatria-Barral, I{\~n}aki and Gonz{\'a}lez-Dom{\'\i}nguez, Jorge and Touri{\~n}o, Juan},
  title={Parallel construction of {RNA} databases for extensive {lncRNA--RNA} interaction prediction},
  booktitle={Proceedings of the 38th ACM/SIGAPP Symposium on Applied Computing},
  series={SAC '23},
  pages={555--558},
  year={2023},
  address={Tallinn, Estonia}
}

License

pRIblast is free software and is distributed under the MIT License. However, pRIblast makes use of several modules that are not original pieces of work. Their usage is therefore subject to their corresponding THIRDPARTYLICENSE, and all rights are reserved to their authors.

priblast's Issues

Possible loss of performance due to passing two vectors by value to a method which is run in a computation heavy loop

double SeedSearch::CalcAccessibility(vector<float> accessibility, vector<float> conditional_accessibility, int sp, int length){
  double temp = accessibility[sp];
  for(int i = _min_accessible_length; i < length; i++){
    temp += conditional_accessibility[sp+i];
  }
  return(temp);
}

TODO:

  1. Check whether passing these two vectors by value actually incurs a significant penalty.
  2. Edit the code and pass the two vectors by reference.
  3. Benchmark and submit a PR, if necessary.
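
Step 2 of the list above could look like the following sketch. It is a standalone free function rather than the actual SeedSearch member: the name CalcAccessibilityByRef and the extra parameter min_accessible_length, which stands in for the member _min_accessible_length, are assumptions made so the example compiles on its own. The arithmetic is unchanged; only the parameter passing differs, so the two vectors are no longer copied on every call.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical standalone variant of SeedSearch::CalcAccessibility.
// min_accessible_length replaces the member _min_accessible_length (assumption).
double CalcAccessibilityByRef(const std::vector<float>& accessibility,
                              const std::vector<float>& conditional_accessibility,
                              int sp, int length, int min_accessible_length) {
  // Guard the indices that the code below will touch.
  assert(sp >= 0 && static_cast<std::size_t>(sp) < accessibility.size());
  assert(length <= min_accessible_length ||
         static_cast<std::size_t>(sp + length - 1) <
             conditional_accessibility.size());
  double temp = accessibility[sp];
  for (int i = min_accessible_length; i < length; i++) {
    temp += conditional_accessibility[sp + i];
  }
  return temp;
}
```

Because the parameters are const references, existing call sites that pass vectors keep compiling unchanged, which makes the before/after benchmark in step 3 straightforward to set up.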

pRIblast running question

mpirun -np 6 -x OMP_NUM_THREADS=6 pRIblast ris -i LncRNA.fa -o ZMpredictions.txt -d ZM -a area

[lzu-MZ72-HB0-00:2149410] *** Process received signal ***
[lzu-MZ72-HB0-00:2149410] Signal: Aborted (6)
[lzu-MZ72-HB0-00:2149410] Signal code: (-6)
[lzu-MZ72-HB0-00:2149410] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x14db7f442520]
[lzu-MZ72-HB0-00:2149410] [ 1] /lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c)[0x14db7f4969fc]
[lzu-MZ72-HB0-00:2149410] [ 2] /lib/x86_64-linux-gnu/libc.so.6(raise+0x16)[0x14db7f442476]
[lzu-MZ72-HB0-00:2149410] [ 3] /lib/x86_64-linux-gnu/libc.so.6(abort+0xd3)[0x14db7f4287f3]
[lzu-MZ72-HB0-00:2149410] [ 4] /mnt/sdb/fanglf/anaconda3/envs/CIRI/lib/libstdc++.so.6(+0xb135a)[0x14db7f8b135a]
[lzu-MZ72-HB0-00:2149410] [ 5] /mnt/sdb/fanglf/anaconda3/envs/CIRI/lib/libstdc++.so.6(+0xb13c5)[0x14db7f8b13c5]
[lzu-MZ72-HB0-00:2149410] [ 6] /mnt/sdb/fanglf/anaconda3/envs/CIRI/lib/libstdc++.so.6(+0xb1658)[0x14db7f8b1658]
[lzu-MZ72-HB0-00:2149410] [ 7] /mnt/sdb/fanglf/anaconda3/envs/CIRI/lib/libstdc++.so.6(_ZSt20__throw_length_errorPKc
[lzu-MZ72-HB0-00:2149410] [ 8] pRIblast(+0x35277)[0x557d90c24277]
[lzu-MZ72-HB0-00:2149410] [ 9] pRIblast(+0x1a10e)[0x557d90c0910e]
[lzu-MZ72-HB0-00:2149410] [10] pRIblast(+0x3029b)[0x557d90c1f29b]
[lzu-MZ72-HB0-00:2149410] [11] pRIblast(+0xee2a)[0x557d90bfde2a]
[lzu-MZ72-HB0-00:2149410] [12] /lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x14db7f429d90]
[lzu-MZ72-HB0-00:2149410] [13] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x14db7f429e40]
[lzu-MZ72-HB0-00:2149410] [14] pRIblast(+0x11c05)[0x557d90c00c05]
[lzu-MZ72-HB0-00:2149410] *** End of error message ***
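
Frame [7] of the trace is std::__throw_length_error, which libstdc++ calls when a container or string is asked to hold more elements than max_size() allows. One common way to reach it is a negative or overflowed integer that gets converted to an unsigned size. Whether this is what happens inside pRIblast cannot be determined from the trace alone; the sketch below only reproduces the exception type in isolation, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical probe: returns true if reserving `count` elements raises
// std::length_error, false if the reservation succeeds.
bool ReserveThrowsLengthError(long count) {
  std::vector<float> buffer;
  // A negative count wraps around to a huge unsigned value.
  const std::size_t requested = static_cast<std::size_t>(count);
  try {
    buffer.reserve(requested);  // throws if requested > buffer.max_size()
  } catch (const std::length_error&) {
    return true;
  }
  assert(buffer.capacity() >= requested);
  return false;
}
```

Calling ReserveThrowsLengthError(-1) returns true, while a small positive count returns false. When the exception is not caught, as in the trace above, the C++ runtime calls abort, which matches the Aborted (6) signal reported by the MPI runtime.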
