nstenz / ticr
License: GNU General Public License v3.0
You have to run the convergence test for a MrBayes analysis on the same computer where you ran MrBayes.
For example, I ran MrBayes on one computer with:
TICR/scripts/mb.pl alleleA.tar.gz -m mb-block.txt -o mb-output
Then I copied the output folder to a different machine and tried to run the mb.pl script with the -c option for a convergence check:
TICR/scripts/mb.pl alleleA.tar.gz mb-output -c 0.05
and I get the following error:
Could not locate archive in 'mb-alleleB-red'.
You must specify a file containing a valid MrBayes block which will be appended to each gene.
Usage: mb.pl ([PARTITION TARBALL] [-m MRBAYES BLOCK]) || ([MRBAYES TARBALL] [-c THRESHOLD] || [-r THRESHOLD])
If I run the same command on the same computer where I ran the MrBayes analysis, I do not get any error (only the warnings described in the other issue). This is strange because I can see the mb.tar file, but the script cannot locate it.
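Two things may be worth checking on the second machine. First, the usage line lists the MrBayes tarball (or output folder) as the only positional argument for -c, whereas the command above passes both alleleA.tar.gz and mb-output; elsewhere on this page the check is run as `perl mb.pl mb-alleleA -c 0.05`. Second, it helps to confirm that the copied folder really contains an archive the script can match. A minimal Python sketch, assuming the archive name ends in `.mb.tar` (the naming seen elsewhere on this page, e.g. `mb-output/1_seqgen.mb.tar`); `find_mb_archives` is a hypothetical helper, not part of TICR.

```python
from pathlib import Path


def find_mb_archives(output_dir):
    """Return the *.mb.tar files directly inside output_dir, sorted by name.

    Assumption: mb.pl names its archive '<prefix>.mb.tar'. This helper is
    hypothetical and only mirrors that naming convention.
    """
    out = Path(output_dir)
    if not out.is_dir():
        raise NotADirectoryError(f"{out} is not a directory")
    # Keep regular files only; a broken symlink or a directory is skipped.
    return sorted(p for p in out.iterdir()
                  if p.is_file() and p.name.endswith(".mb.tar"))
```

If this finds the archive but mb.pl still reports "Could not locate archive", the difference between the two machines is more likely in how the script is invoked or in file permissions than in the files themselves.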
@crsl4: quite a few things are hard-coded in the slurm-based pipeline, such as folder names or the name of the MrBayes block. Could these assumptions be removed, or at least documented in the associated readme? I believe the folder name is re-used to create a new folder containing the new alignments.
Could the readme also describe the translate file that gets created, and what it contains?
Dear Prof. Stenz,
When I run the getTreeBranchLengths.r script, I keep getting the following error. I tried two data sets from the ml.pl script. Could you please tell me what the problem might be?
outgroup: last taxon
tree was read. 17 taxa.
listed the 976 quartets associated with edges in tree.
Error in dat$CF[i] <- cf[ind, resolution] : replacement has length zero
Execution halted
Thank you so much!
Best regards,
Lin Bai
South China Botanical Garden
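In R, an assignment such as `dat$CF[i] <- cf[ind, resolution]` fails with "replacement has length zero" when the right-hand side is empty, which suggests (this is an inference, not confirmed in the report) that a quartet listed from the tree was not found in the concordance-factor table. One common reason is a mismatch between the taxon names in the tree and those in the CF table. A minimal Python sketch of that check, assuming a plain Newick string and a list of taxon names taken from the CF table; `newick_taxa` and `name_mismatches` are hypothetical helpers, and the regex only handles simple unquoted labels.

```python
import re


def newick_taxa(newick):
    """Leaf labels from a simple Newick string (unquoted labels only)."""
    # A leaf label follows '(' or ',' and runs until '(', ')', ',', ':', ';'
    # or whitespace. Labels after ')' (internal node labels) are not matched.
    return set(re.findall(r"[(,]\s*([^(),:;\s]+)", newick))


def name_mismatches(newick, cf_taxa):
    """Return (in tree only, in CF table only), each as a sorted list."""
    tree = newick_taxa(newick)
    table = set(cf_taxa)
    return sorted(tree - table), sorted(table - tree)
```

For example, a tree containing `D_x` checked against a table that spells the taxon `D` returns `(['D_x'], ['D'])`; two empty lists mean the names agree and the error has some other cause.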
Hi TICR team,
I ran the first script, mdl.pl, with example/chr4-subset.nex from the example folder as input, using this command:
It started running without any warnings or error messages and produced a folder that includes:
It ran for two days without producing any further results, so I stopped it.
How long should it take to finish? And why do I not get the mdl output?
Here is the output that was printed on screen:
Script was called as follows:
perl mdl.pl example/chr4-subset.nex -b 100 -f 10000
Will now proceed to breakdown 'example/chr4-subset.nex' using a forced breakpoint after every 10000 characters, and a minimum block size of 100.
PAUP settings: gaps will not be treated as characters, missing and ambiguous sites will be included.
MDL settings: nletters = 4, nbestpart = 1, ngroupmax = 10000.
Input file 'example/chr4-subset.nex' appears to be a Nexus file.
paup>paup>paup>paup>paup> [... the prompt "paup>" repeated many more times ...]
800 total parsimony-informative sites found for 'example/chr4-subset.nex'.
Parsimony analyses will be performed on 36 different blocks.
Job server successfully created.
Determining commands for each block... done. (0 - 35)
I look forward to hearing from you.
Best wishes,
Niloo
We are running the bucky.pl script in the cluster and we get the following error:
$ bucky.pl sample0-mb/sample1.mb.tar -o sample1-bucky
Checking for BUCKy version >= 1.4.4...
BUCKy version: 1.4.4.
BUCKy version check passed.
Script was called as follows:
perl bucky.pl sample0-mb/sample1.mb.tar -o sample1-bucky
Found 10 taxa shared across all genes in this archive, 210 of 210 possible quartets will be run using output from 141 total genes.
Summarizing MrBayes output for 141 genes.
Job server successfully created.
Could not lock 'sample1.BUCKy.tar': Bad file descriptor.
Could not lock 'sample1.BUCKy.tar': Bad file descriptor.
All connections closed.
Total execution time: 6 minutes, 32 seconds.
We do not get the error when running bucky.pl on the laptop.
The operating system in the cluster is:
$ cat /proc/version
Linux version 3.10.0-1127.13.1.el7.x86_64 ([email protected]) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-39) (GCC) ) #1 SMP Tue Jun 23 15:46:38 UTC 2020
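One hypothesis (not confirmed in the report) is that the cluster's working directory is on a network filesystem such as NFS, where Linux emulates flock() with fcntl() byte-range locks; an exclusive byte-range lock on a file opened read-only fails with EBADF, "Bad file descriptor". A minimal Python sketch to test locking in a given directory; `try_exclusive_lock` is a hypothetical helper, and whether bucky.pl opens the archive read-only is an assumption.

```python
import errno
import fcntl


def try_exclusive_lock(path, mode="rb"):
    """Try a non-blocking exclusive flock on `path` opened with `mode`.

    Returns None on success, or the errno name (e.g. 'EBADF') on failure.
    """
    with open(path, mode) as fh:
        try:
            fcntl.flock(fh.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            return errno.errorcode.get(exc.errno, str(exc.errno))
        # Release the lock before the file is closed.
        fcntl.flock(fh.fileno(), fcntl.LOCK_UN)
        return None
```

Running it on a scratch file in the cluster working directory with both "rb" and "r+b" would be informative: if the read-only attempt returns 'EBADF' while the read-write one succeeds, the filesystem's lock emulation is a likely culprit, and running from local disk (such as /tmp) may avoid the error.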
I identified a problem with the TICR perl scripts.
For the analyses, I want to run the mb.pl and bucky.pl scripts. Both scripts accept the option --machine-file hosts.txt to specify a set of computers over which to parallelize the computations. However, you run into a problem if you use the same computer to start both jobs (one after the other).
Using the data from the PhyloNetworks wiki page, you can run the following from darwin00:
~/software/TICR/scripts/mb.pl baseline.gamma0.3_n30/1_seqgen.tar.gz -m mb-block.txt --machine-file hosts.txt -o mb-output
This runs smoothly, but if you then run the following, also on darwin00:
~/software/TICR/scripts/bucky.pl mb-output/1_seqgen.mb.tar -o bucky-output --machine-file hosts.txt
The analyses finish, but the main job gets stuck and cannot close the connections to the parallel threads.
However, if you switch to darwin02 and run the same command again:
~/software/TICR/scripts/bucky.pl mb-output/1_seqgen.mb.tar -o bucky-output --machine-file hosts.txt
Then everything works out fine.
So, you need to run mb.pl on darwin00 and bucky.pl on darwin02. This is particularly tedious because the scripts must be run from /tmp (not afs). If you run from afs (even with "stashticket" and "screen"), the script loses permission to write files. So, to run the script on a different computer, you need to copy the files into /tmp again.
I discussed this with Noah a long time ago, and (I think) he could not find the source of the problem. For the snaq runs, I had to change computers; he also suggested changing the port option in the Perl script (which I don't recall how to use).
Bottom line: there seems to be a bug in the scripts that prevents them from properly closing the connections to the parallel threads when a script is run for the second time on the same machine (the first time, the connections close fine).
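If the hang is caused by the first job server's port still being held on darwin00 (a hypothesis, consistent with the suggestion to change the port option but not confirmed), a quick check before launching the second script could tell. A minimal Python sketch; the port the Perl scripts actually use is not given here, so `port_is_free` is a hypothetical helper and the port to test must be filled in by the user.

```python
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can currently bind to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            # Typically EADDRINUSE: another socket already owns this port.
            return False
        return True
```

Calling it on darwin00 right after mb.pl exits, and again after the hung bucky.pl is stopped, would show whether the port is still occupied between runs.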
I tried running the mb.pl script with -T 24. This does not seem to work: mb.pl uses all threads available on the server (64 in total, Linux). I also tried -T 5 and -T 9, with the same result; the script used all available threads.
Any comments please? Thanks.
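The behaviour one would expect from a cap like -T can be illustrated in Python: at most N analyses should run at the same time. This sketch only models that contract with a thread pool and measures the peak concurrency; it does not reflect how mb.pl schedules jobs, and a cap on concurrent jobs would not, on its own, limit any threads used inside each MrBayes process. `run_capped` is a hypothetical helper.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_capped(jobs, max_workers):
    """Run callables with at most max_workers running at once.

    Returns the peak number of jobs observed running simultaneously.
    """
    lock = threading.Lock()
    running = 0
    peak = 0

    def wrapped(job):
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        try:
            return job()
        finally:
            with lock:
                running -= 1

    # The pool never starts more than max_workers jobs concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(wrapped, jobs))
    return peak
```

With 20 short jobs and max_workers=3, the returned peak should never exceed 3; observing 64 busy threads with -T 5 therefore points either to the cap not being applied or to threads spawned inside each job.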
I suspect that there is a bug in the counter of the number of analyses in the mb.pl script.
I have a tarball with 12 genes (so, 12 analyses), but when I run the script, I get:
Script was called as follows:
perl mb.pl alleleA.tar.gz -m mb-block.txt -o mb-alleleA
Appending MrBayes block to each gene... done.
Job server successfully created.
Analyses complete: 1/23.
So, it is trying to do 23 analyses.
I created a tar file with only two nexus files (two genes), and I still get:
Script was called as follows:
perl mb.pl alleleA.tar.gz -m mb-block.txt -o mb-alleleA
Appending MrBayes block to each gene... done.
Job server successfully created.
Analyses complete: 4/4.
All connections closed.
Total execution time: 1 hour, 30 minutes, 12 seconds.
This bug does not happen every time. I have run the scripts with many other files, and this has not happened. The bug might have to do with the specific dataset that I have, but unfortunately, I cannot share it.
I just wanted to document in case someone else encounters a similar issue.
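One possibility worth ruling out (a guess, not a confirmed cause): tarballs created on macOS can include AppleDouble "._*" metadata files alongside each real file, which would inflate any count of archive entries; the two-gene archive reporting 4/4 would fit that pattern. A minimal Python sketch for listing what the archive actually contains; `classify_members` and its categories are hypothetical and say nothing about how mb.pl itself counts analyses.

```python
import tarfile
from collections import Counter


def classify_members(tar_path):
    """Count regular files in a tarball, flagging likely non-gene entries.

    Categories (hypothetical): 'appledouble' for macOS '._*' metadata files,
    'hidden' for other dot-files, 'gene' for everything else.
    """
    counts = Counter()
    with tarfile.open(tar_path, "r:*") as tar:  # handles .tar and .tar.gz
        for m in tar.getmembers():
            if not m.isfile():
                continue  # skip directories, links, etc.
            base = m.name.rsplit("/", 1)[-1]
            if base.startswith("._"):
                counts["appledouble"] += 1
            elif base.startswith("."):
                counts["hidden"] += 1
            else:
                counts["gene"] += 1
    return counts
```

If 'appledouble' entries show up, recreating the archive without them (on macOS, setting COPYFILE_DISABLE=1 when running tar is a common way to omit them) would be a quick test.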
When checking the convergence of MrBayes runs, I get a repeated warning:
perl mb.pl mb-alleleA -c 0.05
Argument "1C_A-red.nex.tar.gz" isn't numeric in numeric comparison (<=>) at /software/TICR/scripts/mb.pl line 634.
This line is repeated many times (copied here only once), and the count does not match the number of gene files: I have 12 genes, and I got the warning 60 times.
I cannot share the files to reproduce this, but maybe someone else has encountered this issue.
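Perl emits this warning when the numeric comparison operator <=> is applied to a string that does not look like a number, so line 634 is probably comparing file names numerically (an inference from the message; the script is not shown here). If so, the number of warnings depends on how many comparisons the sort makes rather than on the number of genes, which could explain why the count does not match. In Perl, string comparison uses cmp instead; if numeric ordering of names like "2..." before "10..." is intended, a "natural" sort key does both. A minimal Python sketch of such a key; `natural_key` is a hypothetical helper, not TICR code.

```python
import re


def natural_key(name):
    """Split a name into text and integer chunks so '2' sorts before '10'.

    re.split with a capturing group alternates text, digits, text, ...,
    so chunks at the same position always have the same type.
    """
    return [int(tok) if tok.isdigit() else tok.lower()
            for tok in re.split(r"(\d+)", name)]
```

For example, sorting ["10C_A.nex", "2C_A.nex", "1C_A-red.nex.tar.gz"] with key=natural_key gives ["1C_A-red.nex.tar.gz", "2C_A.nex", "10C_A.nex"], with no numeric-conversion warnings.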
This is not really an issue, more of a possible improvement.
For the tree, TICR.r assumes that the tree file is named name.QMClengths.tre.
This can be confusing if the tree came from a program other than QMC.
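A sketch of the suggested improvement, written in Python for illustration even though TICR.r is an R script: accept an explicit tree file, and fall back to the current naming convention only when none is given. `resolve_tree_path` and its parameters are hypothetical.

```python
from pathlib import Path


def resolve_tree_path(name, tree_file=None, directory="."):
    """Return the tree file to use.

    An explicit tree_file wins; otherwise fall back to the current
    '<name>.QMClengths.tre' convention inside `directory`.
    """
    if tree_file is not None:
        path = Path(tree_file)
    else:
        path = Path(directory) / f"{name}.QMClengths.tre"
    if not path.is_file():
        raise FileNotFoundError(path)
    return path
```

Keeping the old name as the default means existing pipelines keep working, while trees from other programs can be passed in without renaming.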
Apparently, if MrBayes is installed with conda install -c BioBuilds mrbayes, the mb.pl script does not run properly. If MrBayes is installed from the developers' website, things work.
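When two installations coexist, the shell may pick up a different MrBayes binary than expected. A minimal Python sketch for checking which binary is first on PATH and whether it comes from the active conda environment; it assumes the executable is called mb and relies on conda setting CONDA_PREFIX in an activated environment. `locate_mrbayes` is a hypothetical helper, and it does not diagnose why the BioBuilds build fails with mb.pl.

```python
import os
import shutil


def locate_mrbayes(name="mb", path=None):
    """Return (resolved_path, from_active_conda_env) for the MrBayes binary.

    `name` defaults to 'mb', the usual MrBayes executable name; `path`
    overrides the PATH search string (useful for testing).
    """
    exe = shutil.which(name, path=path)
    prefix = os.environ.get("CONDA_PREFIX")
    # True only if the resolved binary lives under the active conda prefix.
    in_conda = bool(exe and prefix and
                    os.path.realpath(exe).startswith(os.path.realpath(prefix) + os.sep))
    return exe, in_conda
```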
@crsl4: do we need the files example/bucky-slurm-submit.sh, example/mb-slurm-submit.sh and example/paste-mb-block.jl? They seem to duplicate the files in scripts-cluster.
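If each example file has a counterpart in scripts-cluster, a byte-for-byte comparison answers the question directly. A minimal Python sketch using the standard library's filecmp.cmp with shallow=False; the pairing of example files with scripts-cluster files is an assumption, since the counterpart names are not given here, and `compare_pairs` is a hypothetical helper.

```python
import filecmp
from pathlib import Path


def compare_pairs(pairs):
    """For each (a, b) pair, report 'identical', 'different', or 'missing'."""
    result = {}
    for a, b in pairs:
        pa, pb = Path(a), Path(b)
        key = (str(pa), str(pb))
        if not (pa.is_file() and pb.is_file()):
            result[key] = "missing"
        elif filecmp.cmp(pa, pb, shallow=False):  # compare contents, not stat
            result[key] = "identical"
        else:
            result[key] = "different"
    return result
```

Usage would look like compare_pairs([("example/mb-slurm-submit.sh", "scripts-cluster/<counterpart>")]), with the placeholder replaced by the actual file name; "different" results would then need a manual diff to decide which copy to keep.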