nstenz / ticr

License: GNU General Public License v3.0

Languages: Makefile 0.07%, C 6.79%, C++ 4.97%, R 23.22%, Shell 1.19%, Perl 61.12%, Julia 2.64%

ticr's People

cecileane, justincbagley, crsl4, csy99

ticr's Issues

convergence check (option -c) does not work on different machine

You have to run the convergence check for a MrBayes analysis on the same computer where you ran MrBayes.
For example, I ran MrBayes on one computer with:

TICR/scripts/mb.pl alleleA.tar.gz -m mb-block.txt -o mb-output

But then I copied the output folder to a different machine and tried to run the mb.pl script with option -c for the convergence check:

TICR/scripts/mb.pl alleleA.tar.gz mb-output -c 0.05

and I get the following error:

Could not locate archive in 'mb-alleleB-red'.
You must specify a file containing a valid MrBayes block which will be appended to each gene.

Usage: mb.pl ([PARTITION TARBALL] [-m MRBAYES BLOCK]) || ([MRBAYES TARBALL] [-c THRESHOLD] || [-r THRESHOLD])

If I run the same command on the same computer where I ran the MrBayes analysis, I do not get any error (only the warnings described in the other issue). This is strange because I can see the mb.tar file, but the script cannot locate it.

slurm-based pipeline

@crsl4: quite a few things are hard-coded in the slurm-based pipeline, such as folder names and the name of the MrBayes block. Could these assumptions be removed, or at least be documented in the associated readme? I believe the folder name is reused to create a new folder containing the new alignments.

Could the readme also mention the translate file that gets created, and describe what it contains?

problem with running "getTreeBranchLengths.r"

Dear Prof. Stenz,
When I run the "getTreeBranchLengths.r" script, I keep getting the following error. I tried two data sets from the "ml.pl" script. Could you please tell me what the problem might be?
outgroup: last taxon
tree was read. 17 taxa.
listed the 976 quartets associated with edges in tree.
Error in dat$CF[i] <- cf[ind, resolution] : replacement has length zero
Execution halted

Thank you so much!
Best regards,
Lin Bai
South China Botanical Garden

running mdl.pl

Hi, dear TICR team,

I ran the first script, mdl.pl, with chr4-subset.nex from the example folder as input, using this command:

mdl.pl input.fa -b 100 -f 10000

It starts running without any warnings or error messages and produces a folder that contains:

Link to the input (chr4-subset.nex)
chr4-subset-reduced-1.nex
these three empty folders:
mdl-genes
mdl-partitions
mdl-scores

It ran for two days without producing any further results, and then I stopped it.

How long should it take to finish? And why do I not get any output from mdl?

Here is the output printed on the screen:

Script was called as follows:
perl mdl.pl example/chr4-subset.nex -b 100 -f 10000
Will now proceed to breakdown 'example/chr4-subset.nex' using a forced breakpoint after every 10000 characters, and a minimum block size of 100.
PAUP settings: gaps will not be treated as characters, missing and ambiguous sites will be included.
MDL settings: nletters = 4, nbestpart = 1, ngroupmax = 10000.
Input file 'example/chr4-subset.nex' appears to be a Nexus file.
paup>paup>paup>paup>paup>... (the "paup>" prompt repeated many times)
800 total parsimony-informative sites found for 'example/chr4-subset.nex'.
Parsimony analyses will be performed on 36 different blocks.
Job server successfully created.
Determining commands for each block... done. (0 - 35)

I look forward to hearing from you.

With best wishes,
Niloo

Error: Bad file descriptor

We are running the bucky.pl script on a cluster and get the following error:

$ bucky.pl sample0-mb/sample1.mb.tar -o sample1-bucky

Checking for BUCKy version >= 1.4.4...
  BUCKy version: 1.4.4.
  BUCKy version check passed.

Script was called as follows:
perl bucky.pl sample0-mb/sample1.mb.tar -o sample1-bucky

Found 10 taxa shared across all genes in this archive, 210 of 210 possible quartets will be run using output from 141 total genes.
Summarizing MrBayes output for 141 genes.
Job server successfully created.

Could not lock 'sample1.BUCKy.tar': Bad file descriptor.
Could not lock 'sample1.BUCKy.tar': Bad file descriptor.

  All connections closed.
Total execution time: 6 minutes, 32 seconds.

We do not get this error when running bucky.pl on a laptop.

The operating system on the cluster is:

$ cat /proc/version
Linux version 3.10.0-1127.13.1.el7.x86_64 ([email protected]) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-39) (GCC) ) #1 SMP Tue Jun 23 15:46:38 UTC 2020

problem when closing parallel threads

I identified a problem with the TICR perl scripts.

For my analyses, I want to run the mb.pl and bucky.pl scripts. Both scripts accept the option --machine-file hosts.txt to specify a set of computers over which to parallelize the computations. However, you run into a problem if you use the same computer to start both jobs (one after the other).

Using the data from the PhyloNetworks wiki page, you can run the following from darwin00:

~/software/TICR/scripts/mb.pl baseline.gamma0.3_n30/1_seqgen.tar.gz -m mb-block.txt --machine-file hosts.txt -o mb-output

which runs smoothly, but if you then run the following, also from darwin00:

~/software/TICR/scripts/bucky.pl mb-output/1_seqgen.mb.tar -o bucky-output --machine-file hosts.txt

the analyses finish, but the main job gets stuck and is unable to close the connection to the parallel threads.

However, if you switch to darwin02 and run the same command again:

~/software/TICR/scripts/bucky.pl mb-output/1_seqgen.mb.tar -o bucky-output --machine-file hosts.txt

then everything works fine.

So, you need to run mb.pl on darwin00 and bucky.pl on darwin02. This is particularly tedious because you need to run the script from /tmp (not AFS): if you run it from AFS (even with "stashticket" and "screen"), the script loses permission to write files. Therefore, to run the script from a different computer, you need to copy the files into /tmp on that computer again.

I discussed this with Noah a long time ago and, as far as I remember, he could not find the source of the problem. For the SNaQ runs, I had to switch computers; he also suggested changing the port option in the Perl script (which I don't recall how to use).

Bottom line: there seems to be a bug in the scripts that prevents them from properly closing the connections to the parallel threads when a script is run for the second time from the same computer (the first time, the connections close fine).

Problem with -T command

I ran the mb.pl script with -T 24 so that it would use only 24 threads. This does not seem to work, since mb.pl uses all threads available on the server (64 in total, Linux). I also tried -T 5 and -T 9 with the same result: the script used all available threads.
Any comments? Thanks.

bug in counter of the number of analyses in mb.pl

I suspect that there is a bug in the counter of the number of analyses in the mb.pl script.
I have a tarball with 12 genes (so 12 analyses), but when I run the script, I get:

Script was called as follows:
perl mb.pl alleleA.tar.gz -m mb-block.txt -o mb-alleleA

Appending MrBayes block to each gene... done.

Job server successfully created.

  Analyses complete: 1/23.

So it is trying to run 23 analyses.

I created a tarball with only two Nexus files (two genes), and I still get:

Script was called as follows:
perl mb.pl alleleA.tar.gz -m mb-block.txt -o mb-alleleA

Appending MrBayes block to each gene... done.

Job server successfully created.

  Analyses complete: 4/4.
  All connections closed.
Total execution time: 1 hour, 30 minutes, 12 seconds.

This bug does not happen every time. I have run the scripts with many other files, and it has not happened. The bug might be related to my specific dataset, but unfortunately I cannot share it.
I just wanted to document it in case someone else encounters a similar issue.

repeated warning with option -c in mb.pl

When checking the convergence of MrBayes runs, I get a repeated warning:

perl mb.pl mb-alleleA -c 0.05

Argument "1C_A-red.nex.tar.gz" isn't numeric in numeric comparison (<=>) at /software/TICR/scripts/mb.pl line 634.

This line is repeated many times (copied here only once). It is not repeated once per gene file: I have 12 genes, and I got the warning 60 times.
I cannot share the files to reproduce this, but maybe someone else has encountered this issue.