phylo42 / rappas

Rapid, Alignment-free, Phylogenetic Placement via Ancestral Sequences

License: GNU General Public License v3.0

rappas's Introduction

NO LONGER MAINTAINED. PLEASE SWITCH TO ITS SUCCESSOR, "EPIK"!

RAPPAS can be considered deprecated. Please switch to its successor, EPIK

This repository is hardly maintained anymore.
Development has moved to C++.
The second iteration of the approach has been renamed EPIK.
It is order(s) of magnitude faster and requires order(s) of magnitude less RAM.

Repository: EPIK repository

Improvements compared to RAPPAS : https://doi.org/10.1093/bioinformatics/btad692

RAPPAS

Documents: Usage, Tutorial and Benchmarking
Please cite:

Rapid Alignment-free Phylogenetic Placement via Ancestral Sequences

Benjamin Linard, Krister Swenson, Fabio Pardi.

Bioinformatics, Volume 35, Issue 18, 15 September 2019, Pages 3303–3312

doi.org/10.1093/bioinformatics/btz068

RAPPAS is a program dedicated to "Phylogenetic Placement" (PP) of metagenomic or metabarcoding reads on a reference tree. As opposed to previous PP programs, RAPPAS is based on the phylo-kmers idea, detailed in this manuscript, and uses a two-step approach divided into a) the database build, and b) the placement itself.

This README contains short instructions to build and launch RAPPAS.

More detailed instructions, test datasets, pre-built databases, tutorials and discussions related to phylogenetic placement are available on the wiki page.

Important notices

01/2020

RAxML-ng support was added in RAPPAS v1.20.
Ancestral reconstructions (AR) from phyml (>=3.3.20190909), paml (>=4.9) or raxml-ng (>=0.9.0)
are now compatible with RAPPAS. As they produce similar results, the choice of AR software
depends mostly on the time/RAM balance you can handle
(see "Note on PhyML, PAML and RAxML binaries" below).
In your publications, please remember to cite both RAPPAS and the selected AR software.

09/2019

Due to an output format change in PhyML, RAPPAS may fail with PhyML versions <3.3.20190909.
We STRONGLY recommend using PhyML >=3.3.20190909 and RAPPAS >=1.12!
You can download this version of PhyML from its Git repository.

**If you install RAPPAS with conda, the correct version will be automatically selected.**

06/2019

When placing on large trees, remember that the default limits of PhyML are quite low.
(<4000 taxa, sequence labels <1000 characters)
You can work with larger trees and longer sequence labels after the following changes:

1) Modify the following lines in the src/utilities.h source file of PhyML :
#define  T_MAX_FILE          2000
#define  T_MAX_NAME          5000
#define  N_MAX_OTU         262144
2) Recompile phyml after these changes.
3) Launch RAPPAS.

Description

RAPPAS (Rapid Alignment-free Phylogenetic Placement via Ancestral Sequences) is a program dedicated to "Phylogenetic Placement" (PP) of metagenomic reads on a reference tree. As opposed to previous PP programs, RAPPAS is based on the phylo-kmers idea, detailed in this manuscript, and uses a two-step approach divided into a) the database build, and b) the placement itself.

The main advantage of RAPPAS is that it is alignment-free: once step (a) (the DB build) is performed, metagenomic reads can be placed directly on a reference tree WITHOUT aligning them to the reference alignment on which the tree was built (as required by other approaches).

The second advantage of RAPPAS is its algorithm based on phylo-kmers matches, making its execution time linear with respect to the length of the placed sequences.

RAPPAS was funded by the European Union’s Horizon 2020 research and innovation programme under grant agreement No 634650. (Virogenesis.eu)

Installation

Bioconda installation

If you use bioconda, you can install RAPPAS with the following commands:

conda config --add channels bioconda
conda config --add channels conda-forge
conda install rappas

Then you can run RAPPAS with the command:

rappas [options]

If you need to customize JVM parameters (for instance to request more memory with the -Xmx/-Xms options), use the following bash command:

java -Xmx16G -jar $(which RAPPAS.jar) [options]

From sources: prerequisites

RAPPAS compilation requires a clean javac compiler installation. Java >=1.8 is compulsory as some operations are based on Lambda expressions.
Apache Ant is used to facilitate the compilation.

We provide instructions for Debian-based Linux distributions. For compiling Java sources with Apache Ant on other operating systems, please perform analogous operations on your system.

Using OpenJDK 1.8 (more recent versions also compatible):

#install packages
sudo apt-get update
sudo apt-get install openjdk-8-jdk
#update relevant symlinks to make v1.8 default
sudo update-java-alternatives --set java-1.8.0-openjdk-amd64

Using the proprietary Oracle JDK 1.8 (more recent versions also compatible):

#install packages
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer
#update relevant symlinks to make v1.8 default
sudo apt-get install oracle-java8-set-default

Installation of Apache Ant:

sudo apt-get install ant

From sources: download and compilation

#download git repository
git clone -b master https://github.com/blinard-BIOINFO/RAPPAS.git
#compile
cd RAPPAS && ant -f build-cli.xml

The executable RAPPAS.jar can then be found in the ./dist directory.

You can also create a standalone executable using a stub. If you use this approach, be aware that you will need to set the Java virtual machine options manually by modifying the stub.sh file, and that they cannot be modified later.

cd ..
cat stub.sh dist/RAPPAS.jar > rappas && chmod +x rappas

When using the standalone binary instead of the jar file, remember to replace 'java [jvm_options] -jar RAPPAS.jar' with 'rappas' in all instructions and tutorials below.

Usage

Reference Dataset

First, one has to prepare a reference dataset designed to answer a biological question. Typically, in the context of metagenomics and taxonomic identifications, a marker gene (16S rRNA, cox1, rbcl...) is used to build a reference species tree. This species tree is the basis for phylogenetic placement of marker gene(s). For RAPPAS, the reference dataset is composed of:

A reference alignment of all sequences of this marker gene
A phylogenetic tree inferred from this reference alignment

Such reference marker gene datasets can be found, for instance, from:

"The All-Species Living Tree" Project (LTP, bacterial and archaeal rRNAs) : https://www.arb-silva.de/projects/living-tree/,
Greengenes (bacterial 16S) : http://greengenes.secondgenome.com/,
Any marker database from EukRef: http://eukref.org/databases/,
The curated database of Eukref : http://eukref.org/databases/,
Or built internally in the lab.

Sharing pre-built RAPPAS databases

While our goal is to provide stable databases that can be broadly shared, RAPPAS is still in active development. Some major changes may make a recent version (e.g. v1.05) incompatible with a DB built using an older version (e.g. v1.01). Such incompatibilities are clearly stated on the "release" page and in the changelog. Also be aware that, because RAPPAS uses internal Java serialization, you may run into trouble when sharing databases between users running different Java versions, or Oracle JDK versus OpenJDK.

RAPPAS database build

Basic command

java -Xmx8G -jar RAPPAS.jar -p b -s [nucl|prot] -b ARbinary -w workdir -r reference_alignment.fasta -t reference_tree.newick

where

option	expected value	description
-p (--phase)	"b"	Invokes the "database build" process.
-s (--states)	"nucl" or "prot"	Set if we use a nucleotide or protein analysis.
-b (--arbinary)	binary of PhyML (>=3.3.20190909), PAML (>=4.9) or RAxML-ng (>=0.9.0)	Set the path to the binary used for ancestral sequence reconstruction (see note below).
-w (--workdir)	directory	Set the directory to save the database in.
-r (--refalign)	file	The reference alignment, in fasta format.
-t (--reftree)	file	The reference tree, in newick format.

Note on PhyML, PAML and RAxML binaries: Currently, the following programs are fully supported by RAPPAS for generating ancestral sequence posterior probabilities:

RAxML-ng : Fastest & recommended, but may require lots of RAM.
PhyML : ~10x slower, requires slightly less memory.
PAML : ~100x slower, but very low memory footprint.

Note on -Xmx[x]G option: The database build can be memory intensive for values of k>=10. To make RAPPAS run smoothly, allocate more memory (more heap) to the java process using the option -Xmx[x]G, where [x] is replaced by an integer value. For instance, -Xmx8G will extend the java heap to a maximum of 8 GB of memory, -Xmx16G will extend it to a maximum of 16 GB ...
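
To confirm that a heap setting was actually picked up, the JVM can be asked for its own limit. The sketch below is not part of RAPPAS: the class name HeapCheck is hypothetical, and it only relies on the standard Runtime API.

```java
// Hypothetical helper, not part of RAPPAS: reports the heap ceiling of the
// JVM it runs in, which reflects the -Xmx value given on the command line.
class HeapCheck {
    // Maximum heap the JVM will attempt to use, in GiB.
    static double maxHeapGiB() {
        return Runtime.getRuntime().maxMemory() / (1024.0 * 1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        System.out.printf("Max heap: %.2f GiB%n", maxHeapGiB());
    }
}
```

Launching it with the same -Xmx value you plan to give RAPPAS should print a figure close to that value; the JVM may report slightly less than what was requested.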

You can use the latest versions provided on the authors' websites. PhyML requires at least version 3.3 (see PhyML Git), but we recommend the HACKED VERSIONS available in this git repository in the /depbin directory.

These are based on slightly modified sources of PhyML and PAML: no change in ML computations, but useless outputs are skipped, making the ancestral reconstruction process faster (in particular for PAML).

The reconstruction will result in the production of a directory structure and a database file in the given "workdir":

file or directory	description
[DBname].union	The RAPPAS database itself.
[workdir]/extended_tree	Temporary files used at DB construction, allowing the exploration of ghost nodes.
[workdir]/AR	Temporary files used at DB construction, the raw output of PhyML or PAML.
[workdir]/logs	As the name says.

Query placement

After building the RAPPAS DB, placement commands can be run any number of times on different query sequence datasets. RAPPAS v1.00 places 1,000,000 metagenomic reads of 150 bp in ~30-40 minutes, using a single core of a normal desktop PC.

java -Xmx8G -jar RAPPAS.jar -p p -d database.union -q queries.fasta

where

option	expected value	description
-p (--phase)	"p"	Invokes the "placement" process.
-d (--database)	file	the *.union file created at previous DB build step.
-q (--queries)	file	The query reads, in fasta format.

Note on -Xmx[x]G option: Reuse the value used in the database build phase, as loading the database requires roughly the same amount of memory.

The *.jplace file describing the placements of all queries will be written in the ./workdir/logs directory.

To learn more about:

the jplace format.
the exploitation of phylogenetic placement results (OTU alpha diversity, Unifrac-like measures...).

Other options

Output options: These options relate to the jplace output file, which summarizes the placement results. They are analogous to pplacer options.

option	expected value {default}	description
--keep-at-most	integer>=1 {7}	Maximum number of placements reported per query in the jplace output. (p phase)
--keep-factor	float in ]0;1] {0.01}	Report placement with likelihood_ratio higher than (factor x best_likelihood_ratio). (p phase)
--write-reduction	file	Write reduced alignment to a file (see --ratio-reduction). (b phase)
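
How --keep-at-most and --keep-factor combine can be sketched as follows. This is only an illustration of the rule stated in the table, not RAPPAS's actual code: the class and method names are hypothetical, and keeping a ratio exactly equal to the threshold is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch only; RAPPAS's real filtering code may differ.
class KeepFilter {
    // Returns the likelihood ratios that survive both filters, best first.
    // keepAtMost and keepFactor mirror --keep-at-most and --keep-factor.
    static List<Double> filter(List<Double> ratios, int keepAtMost, double keepFactor) {
        List<Double> sorted = new ArrayList<>(ratios);
        sorted.sort(Comparator.reverseOrder());
        List<Double> kept = new ArrayList<>();
        if (sorted.isEmpty()) {
            return kept;
        }
        double threshold = keepFactor * sorted.get(0); // factor x best ratio
        for (double ratio : sorted) {
            if (kept.size() >= keepAtMost) break; // --keep-at-most reached
            if (ratio < threshold) break;         // below --keep-factor cut-off
            kept.add(ratio);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Double> ratios = Arrays.asList(0.60, 0.30, 0.09, 0.005, 0.005);
        // With the defaults {7} and {0.01}, the cut-off is 0.01 x 0.60.
        System.out.println(filter(ratios, 7, 0.01));
    }
}
```

With the example values, the two placements at 0.005 fall below the cut-off and are dropped, while lowering keepAtMost to 2 would keep only the two best placements.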

Algorithm options: These options control the database build: ancestral reconstruction, alignment reduction and phylo-kmer computation.

option	expected value {default}	description
-a (--alpha)	float in ]0,Inf] {1.0}	Shape parameter used in AR. (b phase)
-c (--categories)	int in [1,Inf] {4}	# categories used in AR. (b phase)
-k	integer>=3 {8}	The k-mer length used at DB build.
-m (--model)	string {GTR|LG}	Model used in AR, one of the following: (for nucl) JC69, HKY85, K80, F81, TN93, GTR ; (for amino) LG, WAG, JTT, Dayhoff, DCMut, CpREV, mMtREV, MtMam, MtArt (b phase)
--arparameters	string	Parameters passed to the software used for ancestral sequence reconstruction. Overrides -a,-c,-m options. Value must be quoted by ' or ". Do not set options -i,-u,--ancestral (managed by RAPPAS). PhyML example: "-m HIVw -c 10 -f m -v 0.0 --r_seed 1" (b phase)
--convertUO	none	U,O amino acids are converted to C,L to allow correct ancestral reconstruction (b phase)
--force-root	none	Root input tree if non rooted. (b phase)
--ratio-reduction	float in ]0,1] {0.99}	Ratio for alignment reduction, i.e. sites holding >99% gaps are ignored. (b phase)
--no-reduction	none	Do not operate alignment reduction. This will keep all sites of input reference alignment but may produce erroneous ancestral k-mers. (b phase)
--gap-jump-thresh	float in ]0,1] {0.3}	Gap ratio above which gap jumps are activated, for instance if the reference alignment holds more than 30% of gaps. (b phase)
--omega	float in ]0,#states] {1.5}	Ratio levelling the probability threshold used in phylo-kmer filtering. (b phase)
--use_unrooted	none	Confirms you accept to use an unrooted reference tree (option -t). The trifurcation described in the newick file will be considered as root. Be aware that meaningless roots may impact accuracy. (b phase)
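
The alignment reduction controlled by --ratio-reduction can be pictured with the sketch below. It is an assumption-laden illustration, not the RAPPAS implementation: the class name is hypothetical, rows are assumed to have equal length, and '-' is assumed to be the only gap character.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; not RAPPAS's actual reduction code.
class GapColumnFilter {
    // Drops every column whose gap fraction exceeds maxGapRatio.
    // All rows are assumed to have the same length and to use '-' for gaps.
    static List<String> reduce(List<String> rows, double maxGapRatio) {
        List<String> result = new ArrayList<>();
        if (rows.isEmpty()) {
            return result;
        }
        List<StringBuilder> kept = new ArrayList<>();
        for (int r = 0; r < rows.size(); r++) {
            kept.add(new StringBuilder());
        }
        int columns = rows.get(0).length();
        for (int c = 0; c < columns; c++) {
            int gaps = 0;
            for (String row : rows) {
                if (row.charAt(c) == '-') gaps++;
            }
            if ((double) gaps / rows.size() > maxGapRatio) {
                continue; // site holds too many gaps: ignore it
            }
            for (int r = 0; r < rows.size(); r++) {
                kept.get(r).append(rows.get(r).charAt(c));
            }
        }
        for (StringBuilder sb : kept) {
            result.add(sb.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("AC-T", "A--T", "G--T", "A--A");
        // Third column is 100% gaps and is removed at the default 0.99.
        System.out.println(reduce(rows, 0.99));
    }
}
```

In the example, the second column (75% gaps) survives the default ratio of 0.99 but would also be removed with a stricter ratio such as 0.5.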

Debug options: Avoid using debug options if you are not involved in RAPPAS development.

option	expected value {default}	description
--ambwithmax	none	Treat ambiguities with max, not mean. (p phase)
--aronly	none	Launches ancestral reconstruction, but does not build the phylo-kmers DB. (b phase)
--ardir	directory	Skips ancestral sequence reconstruction, and uses outputs of PhyML or PAML present in the specified directory. (b phase)
--dbinram	none	Operate "b" phase followed by "p" phase in one run, without saving DB to a file and placing directly queries given via -q .
--do-n-jumps	none	Shifts to n jumps. (b phase)
--no-gap-jumps	none	Deactivate k-mer gap jumps, even if reference alignment has a proportion of gaps higher than "--gap-jump-thresh". (b phase)
--noamb	none	Do not treat ambiguous states, corresponding k-mers are skipped. (p phase)
--threads	integer	Number of cores used by RAxML-ng to accelerate ancestral reconstruction. (b phase)

License

RAPPAS is available under the GPLv3 license.

rappas's Issues

All the placement on half of the tree

Hi, we are trying to use RAPPAS on a tree with around 64K leaves. Attached are the backbone tree, backbone sequences, and query sequences. We used the same commands as the ones in the README and used raxml-ng to build the database. The resulting placements for all the queries are on the same half of the tree, which is not correct. Could you give us some help on that? Thanks!
rappas_data_001.zip

Error while running Raxml-ng

Dear author,

I ran rappas on a Linux system with 256 GB RAM and 32 cores (64 threads).

The run failed with the error below

rappas -p b -s nucl -w /home/snyoo/pp_result/rappas/db/ -b /home/snyoo/miniconda3/bin/raxml-ng -r /data/snyoo/DB/WITCH-NG/target/release/UNITE_witch_ng.afa -t /data/snyoo/DB/WITCH-NG/target/release/UNITE_witch_ng.nwk --no-reduction -v 1 --force-root --threads 16
wordDir: /home/snyoo/pp_result/rappas/db
Using AR binary provided by user: /home/snyoo/miniconda3/bin/raxml-ng
Original alignment will be used (no gapped columns removed).
Default k=8 will be used.
Default omega=1.5 will be used.
Default ratio-reduction=0.99 will be used.
Default injectionPerBranch=1 will be used.
################################################

RAPPAS v1.21

---------------------------------------------

Rapid Alignment-free Phylogenetic Placement

via Ancestral Sequences

Linard B, Swenson KM, Pardi F

LIRMM, Univ. of Montpellier, CNRS, France

https://doi.org/10.1101/328740

benjamin/dot/linard/at/lirmm/dot/fr

################################################
workDir=/home/snyoo/pp_result/rappas/db
Starting db_build pipeline...
Set analysis for DNA
User did not set model parameters, using default: m=GTR;a=1.0;c=4;
main_v2.Main_DBBUILD_3--> k=8
main_v2.Main_DBBUILD_3--> factor=1.5
main_v2.Main_DBBUILD_3--> PPStarThreshold=3.9106607E-4
main_v2.Main_DBBUILD_3--> log10(PPStarThreshold)=-3.40775
main_v2.Main_DBBUILD_3--> Loading alignment: /data/snyoo/DB/WITCH-NG/target/release/UNITE_witch_ng.afa
Some ambiguous states were found in the alignment. (use '-v 1' to know more)
alignement.Alignment--> Ambiguous state: char='B' occurences=1
alignement.Alignment--> Ambiguous state: char='d' occurences=1
alignement.Alignment--> Ambiguous state: char='D' occurences=2
alignement.Alignment--> Ambiguous state: char='H' occurences=2
alignement.Alignment--> Ambiguous state: char='K' occurences=177
alignement.Alignment--> Ambiguous state: char='k' occurences=22
alignement.Alignment--> Ambiguous state: char='M' occurences=140
alignement.Alignment--> Ambiguous state: char='m' occurences=16
alignement.Alignment--> Ambiguous state: char='N' occurences=1047
alignement.Alignment--> Ambiguous state: char='n' occurences=178
alignement.Alignment--> Ambiguous state: char='R' occurences=575
alignement.Alignment--> Ambiguous state: char='r' occurences=31
alignement.Alignment--> Ambiguous state: char='S' occurences=141
alignement.Alignment--> Ambiguous state: char='s' occurences=14
alignement.Alignment--> Ambiguous state: char='v' occurences=1
alignement.Alignment--> Ambiguous state: char='V' occurences=2
alignement.Alignment--> Ambiguous state: char='W' occurences=198
alignement.Alignment--> Ambiguous state: char='w' occurences=21
alignement.Alignment--> Ambiguous state: char='Y' occurences=1023
alignement.Alignment--> Ambiguous state: char='y' occurences=49
main_v2.Main_DBBUILD_3--> Alignment read: Dimension: 38208x28381 (colxline)
main_v2.Main_DBBUILD_3--> Gap ratio: 69.73624883831596
main_v2.Main_DBBUILD_3--> >=0.3, gap jumps activated.
main_v2.Main_DBBUILD_3--> Loading tree: /data/snyoo/DB/WITCH-NG/target/release/UNITE_witch_ng.nwk
tree.NewickReader--> Rooting of input unrooted Tree !
tree.PhyloTree--> Building tree indexation
main_v2.Main_DBBUILD_3--> Original tree read.
tree.PhyloTree--> Building tree indexation
Injecting fake nodes...
tree.PhyloTree--> Building tree indexation
tree.ExtendedTree--> # nodes in tree before extension: 56761
tree.ExtendedTree--> # nodes in tree after extension: 283801
tree.PhyloTree--> Building tree indexation
main_v2.Main_DBBUILD_3--> RelaxedTree contains 141901 leaves
main_v2.Main_DBBUILD_3--> RelaxedTree contains 113520 FAKE_X new leaves
main_v2.Main_DBBUILD_3--> Write extended alignment (fasta): /home/snyoo/pp_result/rappas/db/extended_trees/extended_align.fasta
main_v2.Main_DBBUILD_3--> Write extended alignment (phylip): /home/snyoo/pp_result/rappas/db/extended_trees/extended_align.phylip
main_v2.Main_DBBUILD_3--> Write extended newick tree: /home/snyoo/pp_result/rappas/db/extended_trees/extended_tree_withBL.tree
main_v2.Main_DBBUILD_3--> Write extended newick tree with branch length: /home/snyoo/pp_result/rappas/db/extended_trees/extended_tree_withBL_withoutInterLabels.tree
main_v2.Main_DBBUILD_3--> Storing binary version of Extended Tree.
I guess, from binary name, we are using RAXML-NG.
Launching ancestral reconstruction...
inputs.ARProcessLauncher--> RAXML-NG AR was selected.
inputs.ARProcessLauncher--> Ancestral reconstruct command: [/home/snyoo/miniconda3/bin/raxml-ng, --ancestral, --msa, /home/snyoo/pp_result/rappas/db/extended_trees/extended_align.phylip, --tree, /home/snyoo/pp_result/rappas/db/extended_trees/extended_tree_withBL_withoutInterLabels.tree, --threads, 16, --redo, --precision, 9, --seed, 1, --force, msa, --data-type, DNA, --model, GTR+G4{1.0}+IU{0}+FC, --blopt, nr_safe, --opt-model, on, --opt-branches, on]
inputs.ARProcessLauncher--> Current directory:/home/snyoo/pp_result/rappas/db/AR
inputs.ARProcessLauncher--> External process operating reconstruction is logged in: /home/snyoo/pp_result/rappas/db/AR/AR_sdtout.txt
inputs.ARProcessLauncher--> Launching ancestral reconstruction (go and take a coffee, it might take hours if > 5000 leaves!) ...
Output from external software:

RAxML-NG v. 1.0.1 released on 19.09.2020 by The Exelixis Lab.
Developed by: Alexey M. Kozlov and Alexandros Stamatakis.
Contributors: Diego Darriba, Tomas Flouri, Benoit Morel, Sarah Lutteropp, Ben Bettisworth.
Latest version: https://github.com/amkozlov/raxml-ng
Questions/problems/suggestions? Please visit: https://groups.google.com/forum/#!forum/raxml
RAxML-NG was called at 12-Apr-2023 17:02:56 as follows:

/home/snyoo/miniconda3/bin/raxml-ng --ancestral --msa /home/snyoo/pp_result/rappas/db/extended_trees/extended_align.phylip --tree /home/snyoo/pp_result/rappas/db/extended_trees/extended_tree_withBL_withoutInterLabels.tree --threads 16 --redo --precision 9 --seed 1 --force msa --data-type DNA --model GTR+G4{1.0}+IU{0}+FC --blopt nr_safe --opt-model on --opt-branches on

Analysis options:
run mode: Ancestral state reconstruction
start tree(s): user
random seed: 1
tip-inner: ON
pattern compression: OFF
per-rate scalers: OFF
site repeats: OFF
branch lengths: proportional (ML estimate, algorithm: NR-SAFE)
SIMD kernels: AVX2
parallelization: coarse-grained (auto), PTHREADS (16 threads), thread pinning: OFF

[00:00:00] Reading alignment from file: /home/snyoo/pp_result/rappas/db/extended_trees/extended_align.phylip

Alignment comprises 1 partitions and 38208 sites

NOTE: Binary MSA file created: /home/snyoo/pp_result/rappas/db/extended_trees/extended_align.phylip.raxml.rba

Parallel reduction/worker buffer size: 1 KB / 0 KB

[00:01:12] Loading user starting tree(s) from: /home/snyoo/pp_result/rappas/db/extended_trees/extended_tree_withBL_withoutInterLabels.tree
[00:01:14] Data distribution: max. partitions/sites/weight per thread: 1 / 2388 / 38208
[00:01:14] Data distribution: max. searches per worker: 1

Starting ML tree search with 1 distinct starting trees

Ancestral reconstruction finished. Return to RAPPAS process.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!! Raxml-ng outputs are missing, the process may have failed... !!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Some clues may be found in the AR/AR_sdterr.txt or AR/AR_sdtour.txt files.

AR_sdtout.txt
AR_sdterr.txt

I couldn't find any error messages in the txt files.

Thank you in advance for your assistance.

Best regards,

Shinnam Yoo

RAPPAS is not compatible with phyml-v3.3.20180621 and later (only phyml-v3.3.20180214)

There is a format change in the ancestral state output which crashes the parser of RAPPAS.
This needs to be solved rapidly to allow compatibility with all PhyML 3.3+ versions.

Inquiry About Increasing Limits in "src/utilities.h" and Recompiling PhyML

Dear PhyML Community,

I hope this message finds you well. I have been using RAPPAS for my research and have encountered a limitation in the default values of certain parameters in the "src/utilities.h" source file.

As per the documentation, I understand that I can increase the limits for the following parameters in "src/utilities.h":

#define T_MAX_FILE 2000
#define T_MAX_NAME 5000
#define N_MAX_OTU 262144

However, I am seeking some guidance on the next steps to proceed with this modification and successfully recompile PhyML. Changing these lines to new values is sufficient in most cases, but I want to ensure I do it correctly and avoid any potential issues.

Could you kindly provide me with the step-by-step instructions on how to make these changes in "src/utilities.h" and then recompile PhyML? Any additional considerations or recommendations during this process would also be highly appreciated.

Thank you for your time and support. I look forward to your guidance to overcome this limitation and continue using PhyML for my research.

Best regards,
Milli

Verify PhyML RAM requirements and compatibility with system

PhyML can be really demanding in RAM with large trees; this information is given in PhyML's stdout. It should be checked and tested by RAPPAS to verify that the process can run on the current architecture.

Alternatively, RAPPAS could advise using PAML if the RAM requirements of PhyML are too high.

Allow Nexus format for input tree

Allow (non)interleaved Phylip format for input alignment

If it does not exist, the /log dir is not created during the placement phase.

It needs to be created at both DB build and placement phases.

Redirection of AR software stdout and stderr is buggy

The issue is probably related to the thread redirecting the stream. It doesn't affect the process, but some lines are not correctly redirected to the log files.
This thread should be removed for a cleaner redirection to both the log files and stdout.

Better logging related to fasta integrity

When using an empty file as fasta input, RAPPAS produces the error below. File integrity should be checked BEFORE analyzing any non-existent sequence.

Analyzing query sequences...
java.lang.ArithmeticException: / by zero
at inputs.FASTAPointer.checkSize(FASTAPointer.java:242)
at inputs.FASTAPointer.<init>(FASTAPointer.java:55)
at main_v2.Main_PLACEMENT_v07.doPlacements(Main_PLACEMENT_v07.java:182)
at main_v2.Main_v2.main(Main_v2.java:162)
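
The kind of pre-check requested here could look like the sketch below. It is a hypothetical helper, not taken from the RAPPAS sources: it counts header lines and fails with a readable message when the input holds no sequence, instead of reaching a division by zero later on.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical pre-check, not taken from the RAPPAS sources.
class FastaPrecheck {
    // Counts FASTA header lines ('>' at line start) and rejects empty input.
    static long countRecords(Stream<String> lines) {
        long records = lines.filter(line -> line.startsWith(">")).count();
        if (records == 0) {
            throw new IllegalArgumentException("No sequence found in FASTA input.");
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        // Path of the query file, passed as first argument (assumed usage).
        Path fasta = Paths.get(args[0]);
        try (Stream<String> lines = Files.lines(fasta)) {
            System.out.println("Sequences found: " + countRecords(lines));
        }
    }
}
```

Calling such a check before any size computation turns the ArithmeticException above into an explicit "no sequence found" error.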

ERROR: ERROR model initialization |JC69| (LIBPLL-5001): DNA model not found: JC69

I installed both rappas (v1.21) and raxml-ng (v. 1.0.2) using conda.
When I set the DNA model to JC69 or HKY85, the program shows the following error message.

ERROR: ERROR model initialization |JC69| (LIBPLL-5001): DNA model not found: JC69

Ancestral reconstruction finished. Return to RAPPAS process.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!! Raxml-ng outputs are missing, the process may have failed... !!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Some clues may be found in the AR/AR_sdterr.txt  or AR/AR_sdtour.txt files.

But when I run with the K80, K81 and GTR models, everything runs without any problem.
I think the problem might be that one of the programs recognizes JC and HKY, while the other recognizes JC69 and HKY85.

Could you check this problem?
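
If the naming mismatch suspected above is indeed the cause, a small translation step before launching the AR binary would be enough. The sketch below is hypothetical and not part of RAPPAS; the alias table assumes that RAxML-NG names these two models JC and HKY.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical translation layer; the alias table is an assumption based on
// the error message above and on the model names suspected by the reporter.
class ModelNames {
    private static final Map<String, String> RAXMLNG_ALIASES = new HashMap<>();
    static {
        RAXMLNG_ALIASES.put("JC69", "JC");
        RAXMLNG_ALIASES.put("HKY85", "HKY");
    }

    // Returns the name to pass to RAxML-NG; other names pass through unchanged.
    static String toRaxmlNg(String rappasModel) {
        return RAXMLNG_ALIASES.getOrDefault(rappasModel, rappasModel);
    }

    public static void main(String[] args) {
        for (String model : new String[] {"JC69", "HKY85", "GTR"}) {
            System.out.println(model + " -> " + toRaxmlNg(model));
        }
    }
}
```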

RAPPAS accumulates floating-point rounding errors during computation

RAPPAS may miscalculate the scores of k-mers in a window for k sufficiently high if many k-mers are alive in this window. I found examples (D652 dataset, k=10, o=1.5) where the scores computed by RAPPAS v1.21 differ from the real values (computed manually) by up to 1e-5 (in non-log values). While it does not seem to be a lot, the compound effect of those little differences while placing queries produces placements that are different compared to RAPPAS2. (for the examples I found, XPAS scores are much closer to the real values).

This happens in src/core/algos/WordExplorer_v3.java:

currentLogSum+=session.parsedProbas.getPP(nodeId, i, j);
...
currentLogSum-=session.parsedProbas.getPP(nodeId, i, j);

where the class variable currentLogSum accumulates the rounding error from repeatedly adding and subtracting the same values. The more k-mers are alive in the window, the larger the error (the variable is changed O(|alive k-mers|) times).

The easy fix is to make currentLogSum a local variable (change it only O(k) times instead).
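
The effect can be reproduced outside RAPPAS with a minimal sketch. The class and method names below are hypothetical and the values are random stand-ins for log probabilities; it only contrasts a long-lived accumulator with a sum recomputed locally, and is not the WordExplorer_v3 code.

```java
import java.util.Random;

// Illustrative sketch only; not the WordExplorer_v3 code.
class LogSumDrift {
    // Window score recomputed in a local variable: O(k) additions.
    static float freshSum(float[] window) {
        float sum = 0f;
        for (float v : window) {
            sum += v;
        }
        return sum;
    }

    // Same score held in a shared accumulator that also goes through
    // `rounds` add/subtract pairs, as when k-mers are explored and undone.
    static float reusedSum(float[] window, float[] explored, int rounds) {
        float acc = freshSum(window);
        for (int r = 0; r < rounds; r++) {
            float x = explored[r % explored.length];
            acc += x; // extend the k-mer
            acc -= x; // backtrack: a no-op in exact arithmetic, rounded twice here
        }
        return acc;
    }

    public static void main(String[] args) {
        Random rng = new Random(1);
        float[] window = new float[10];
        float[] explored = new float[1000];
        // Stand-in log10 probabilities in (-3.4, 0], an arbitrary test range.
        for (int i = 0; i < window.length; i++) window[i] = -3.4f * rng.nextFloat();
        for (int i = 0; i < explored.length; i++) explored[i] = -3.4f * rng.nextFloat();

        float fresh = freshSum(window);
        float reused = reusedSum(window, explored, 1_000_000);
        System.out.println("fresh  = " + fresh);
        System.out.println("reused = " + reused);
        System.out.println("drift  = " + Math.abs(fresh - reused));
    }
}
```

Both methods compute the same quantity in exact arithmetic, so any non-zero drift printed is pure rounding error; the local recomputation bounds the number of rounded operations by the window length.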
