mashtree's Introduction

mashtree

Create a tree using Mash distances.

For simple usage, see mashtree --help. This is an example command:

mashtree *.fastq.gz > tree.dnd

For confidence values, use mashtree_bootstrap.pl or mashtree_jackknife.pl; run either with --help for usage.

Two modes: fast or accurate

Input files: fastq files are interpreted as raw read files. Fasta, GenBank, and EMBL files are interpreted as genome assemblies. Any of these file types can also be compressed with gz, bz2, or zip.

Output files: Newick (.dnd). If --outmatrix is supplied, then a distance matrix too.

See the documentation on the algorithms for more information.
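The input rules above can be sketched as a small shell helper that reports how a filename would be treated. This is only an illustration of the README text, not mashtree's own detection code: the function name classify_input and the exact extension list (.fastq, .fasta, .gbk, .embl, .msh) are assumptions, and mashtree may recognize other extensions.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of mashtree): classify a filename the way
# the README describes. The extension list is an assumption for illustration.
classify_input() {
  local name="${1##*/}"          # drop any leading directory
  # Strip one compression suffix (gz, bz2, or zip), if present.
  case "$name" in
    *.gz)  name="${name%.gz}" ;;
    *.bz2) name="${name%.bz2}" ;;
    *.zip) name="${name%.zip}" ;;
  esac
  case "$name" in
    *.fastq)              echo "reads" ;;      # raw reads
    *.fasta|*.gbk|*.embl) echo "assembly" ;;   # genome assemblies
    *.msh)                echo "sketch" ;;     # premade Mash sketch
    *)                    echo "unknown" ;;
  esac
}

# Print one "file<TAB>class" line per argument given to the script.
for f in "$@"; do
  printf '%s\t%s\n' "$f" "$(classify_input "$f")"
done
```

Running the script on a mixed directory before calling mashtree is a quick way to spot files that would not be read as intended.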

Faster

mashtree --numcpus 12 *.fastq.gz [*.fasta] > mashtree.dnd

More accurate

You can get a more accurate tree with the minimum abundance finder. Simply give --mindepth 0. This step helps ignore very rare kmers, which are more likely to be read errors.

mashtree --mindepth 0 --numcpus 12 *.fastq.gz [*.fasta] > mashtree.dnd

Adding confidence values

Mashtree can add confidence values using jack knifing. For each jack knife tree, 50% of hashes are used. Confidence values are calculated from the jack knife trees using BioPerl. When using this method, you can pass flags to mashtree using the double-dash like in the example below.

Added in version 0.40.

mashtree_jackknife.pl --reps 100 --numcpus 12 *.fastq.gz -- --min-depth 0 > mashtree.jackknife.dnd
mashtree_jackknife.pl --help # additional usage help

Bootstrapping was added in version 0.55. This runs mashtree itself multiple times, each with a random seed.

mashtree_bootstrap.pl --reps 100 --numcpus 12 *.fastq.gz -- --min-depth 0 > mashtree.bootstrap.dnd

Usage

Usage: mashtree [options] *.fastq *.fasta *.gbk *.msh > tree.dnd
NOTE: fastq files are read as raw reads;
      fasta, gbk, and embl files are read as assemblies;
      Input files can be gzipped.
--tempdir            ''   If specified, this directory will not be
                          removed at the end of the script and can
                          be used to cache results for future
                          analyses.
                          If not specified, a dir will be made for you
                          and then deleted at the end of this script.
--numcpus            1    This script uses Perl threads.
--outmatrix          ''   If specified, will write a distance matrix
                          in tab-delimited format
--file-of-files           If specified, mashtree will try to read
                          filenames from each input file. The file of
                          files format is one filename per line. This
                          file of files cannot be compressed.
--outtree                 If specified, the tree will be written to
                          this file and not to stdout. Log messages
                          will still go to stderr.
--version                 Display the version and exit

TREE OPTIONS
--truncLength        250  How many characters to keep in a filename
--sort-order         ABC  For neighbor-joining, the sort order can
                          make a difference. Options include:
                          ABC (alphabetical), random, input-order

MASH SKETCH OPTIONS
--genomesize         5000000
--mindepth           5    If mindepth is zero, then it will be
                          chosen in a smart but slower method,
                          to discard lower-abundance kmers.
--kmerlength         21
--sketch-size        10000

Installation

Please see INSTALL.md

Further documentation

For perl library help, run perldoc on a .pm file, e.g., perldoc lib/Mashtree/Db.pm.

For executable help run --help, e.g., mashtree_bootstrap.pl --help.

For more information and help please see the docs folder

For more information on plugins, see the plugins folder. (in development)

For more information on contributions, please see CONTRIBUTING.md.

References

Citation

JOSS

Katz, L. S., Griswold, T., Morrison, S., Caravas, J., Zhang, S., den Bakker, H.C., Deng, X., and Carleton, H. A., (2019). Mashtree: a rapid comparison of whole genome sequence files. Journal of Open Source Software, 4(44), 1762, https://doi.org/10.21105/joss.01762

Poster

Katz, L. S., Griswold, T., & Carleton, H. A. (2017, October 8-11). Generating WGS Trees with Mashtree. Poster presented at the American Society for Microbiology Conference on Rapid Applied Microbial Next-Generation Sequencing and Bioinformatic Pipelines, Washington, DC. Poster number 27.

mashtree's People

Contributors

csoneson, fbristow, lskatz, manwar

mashtree's Issues

Possible SQL injection attack

At the moment you are manually constructing SQL INSERT commands:

    $insertSQL.=qq( ("$query", "$subject", $distance), );

The better solution is to use SQL "prepared" statements which will do the quoting etc for you, and also simplify your code.

my $sth = $dbh->prepare("INSERT INTO table (genome1,genome2,distance) VALUES (?,?,?)");
foreach (entry) {
  $sth->execute($query, $subject, $distance) or die "failed to insert";
}
$sth->finish;

This may also help with issue #18 ?

INSTALL.md Quicktree unzipping cmd error

Quicktree instructions:

...
$ wget https://github.com/khowe/quicktree/archive/v2.5.zip
$ tar xvf v2.5.tar.gz 
...

There are different files mentioned (zip is correct).

Also, it would be easier to copy-paste commands without the leading dollar sign.

Continue canceled job without calculating the distances again

Is your feature request related to a problem? Please describe.
I've stopped the tree building process and I'd like mashtree to continue where it left off. Creating msh files again is skipped but what about distances already added to the sql database? If existing entries to the database are already checked, then I've got some other issue and this one can be closed.

Describe the solution you'd like
If distances.sql exists, exclude all of the comparisons already done. This is probably a bit tricky because the other genomes are given in the mshList.txt

Describe alternatives you've considered

  1. Print out the whole database, check which msh is already compared to all other mshs and remove the line from mshList.txt.
  2. Start again, delete the database but keep the msh files and hope no further interruptions (currently due to optimization) are needed.

Additional context
I'm running mashtree on an HPC with 15k bacterial genomes and trying to optimize the resource allocation. The latest issue is mashtree grinding to a halt, probably because mash subprocesses are using up all 1024 file descriptors (handles).
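If the descriptor limit really is the cause (the report above only suspects it), the current limits can be inspected and the soft limit raised before launching mashtree. This is a generic shell sketch, not something mashtree provides; the function names fd_limits and raise_fd_limit are made up for illustration.

```shell
#!/usr/bin/env bash
# Print the soft and hard limits on open file descriptors for this shell.
fd_limits() {
  printf 'soft=%s hard=%s\n' "$(ulimit -Sn)" "$(ulimit -Hn)"
}

# Try to raise the soft limit to the hard limit. Must be called directly in
# the shell that will start mashtree (not in a subshell), because child
# processes inherit the limit from their parent.
raise_fd_limit() {
  local hard
  hard="$(ulimit -Hn)"
  ulimit -Sn "$hard" 2>/dev/null || return 1
}

fd_limits
```

On a cluster, the hard limit is usually set by the administrators or the scheduler, so raise_fd_limit can only go as far as that ceiling allows.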

The phylip conversion step too slow

When Mashtree is applied to a large number of genomes (e.g., 10k), the phylip conversion step seems to be the bottleneck. I'm no expert on Perl, but it seems to me that it might be due to the way the phylip string ($str) is created: Mashtree uses the concatenation assignment operator .= for extending the string, e.g.

$str.=sprintf("%0.10f ",$$distanceHash{$name[$j]});

I'm unsure about how this is handled by Perl, but in Python this would be very slow and creating a list of strings that are joined in the end is the recommended practice. Google suggests that the same may apply to Perl too: https://stackoverflow.com/a/3104548/4641846.

Select outgroup

Hi team,

Currently generating a guide tree for a group of marsupials, yet mashtree is grouping my placental mammals within the marsupial lineages.

Would you please be able to write some software so that we can define our outgroup species relative to current evolutionary understanding (ie. marsupials, monotremes and placental mammals are independent lineages).

Cheers,
Anna

Running mashtree locally with local dependencies

Due to server restrictions I can't install anything system-wide so to use mashtree I ended up going through the mashtree code and made references to a local dir where mash, quicktree and perl libs are located (also setting Bio::Tree::DistanceFactory->new(-method=>"UPGMA")).

We're currently planning to use mashtree as a part of our software but adding a fixed modified mashtree doesn't look nice. Having the same problem, we don't expect the users to install anything globally.

Should I be looking for other solutions or is it an option that can be added?

Rooted tree with BioPerl using UPGMA

This is more of a discussion/suggestion/pro-BioPerl argument than an issue.

Using an older version of mashtree (0.30) with BioPerl I needed a rooted tree which NJ does not produce.
That said, changing the reoccurring line
my $dfactory = Bio::Tree::DistanceFactory->new(-method=>"NJ");
in all files to
my $dfactory = Bio::Tree::DistanceFactory->new(-method=>"UPGMA");
did the trick.

Since there's no way to root with mashtree(?) maybe it's worth adding a "tree constructing algorithm" as an option?

v0.4.9 => "Failed test 'Saving sketches"

t/02_lambda.t ............. 2/6
    #   Failed test 'MD5 of sample1.fastq.gz.msh'
    #   at t/02_lambda.t line 96.
    #          got: 'f52e4232f5c7c46e77a15a9ac7ee1bfd'
    #     expected: 'b737ed5e87f4851181c0f3027848ab4b'

    #   Failed test 'MD5 of sample4.fastq.gz.msh'
    #   at t/02_lambda.t line 96.
    #          got: 'd4e868824b2bae710aaad61da87d13ec'
    #     expected: '53d545265d37632dbfcb0d2556e8aba6'

    #   Failed test 'MD5 of sample2.fastq.gz.msh'
    #   at t/02_lambda.t line 96.
    #          got: '68df1823233fbea03e82778b3c3c4a2e'
    #     expected: '90e5a975176b4add3eb13891a1ee8368'

    #   Failed test 'MD5 of sample3.fastq.gz.msh'
    #   at t/02_lambda.t line 96.
    #          got: '5d9db49d34e6193cfca45db1005613ef'
    #     expected: '52fbe9f503f240065def9872cfd8e308'
    # Looks like you failed 4 tests of 4.

#   Failed test 'Saving sketches'
#   at t/02_lambda.t line 99.
# Looks like you failed 1 test of 6.
t/02_lambda.t ............. Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/6 subtests

mash sketch ERROR: Unrecognized option: -S

I'm running mashtree 0.57 (anaconda version) on SRA sequences assembled into contigs/scaffolds and I'm getting the following error:

mashtree: main: Temporary directory will be /local/136489.1.eddie/MASHTREE.7_2A9p
mashtree: main: mashtree on 2339 files
mashtree: mashSketch(TID1): This thread will work on 2339 sketches
mashtree: mashSketch(TID1): Working on file 1 out of 2339
mashtree: mashSketch(TID1): ERROR running mash sketch -S 42 -k 19 -s 10000   -o /local/136489.1.eddie/MASHTREE.7_2A9p/ERR2632350_ctg2.fasta ootw/ERR2632350_ctg2.fasta 2>&1!
  ERROR: Unrecognized option: -S

Thread 1 terminated abnormally: mashtree: main::mashSketch: Died
Stopped at /exports/cmvm/eddie/eb/groups/fitzgerald_grp/software/conda_envs/jamieG/envs/tree/bin/mashtree line 298.
mashtree: mashDistance: Waiting to join thread (1/1, TID2)
mashtree: mashDistance: Databasing distances (1/1, TID2)
mashtree: mashDistance: Converting to phylip format into /local/136489.1.eddie/MASHTREE.7_2A9p/distances.phylip
mashtree: Mashtree::createTreeFromPhylip: Can't call method "as_text" on an undefined value
Stopped at /exports/cmvm/eddie/eb/groups/fitzgerald_grp/software/conda_envs/jamieG/envs/tree/lib/site_perl/5.26.2/Mashtree.pm line 217.

I'm not sure what the error means or where I might be going wrong?
I'm using 2 8G cores of a high compute cluster (sge) and I ran mashtree with the command: mashtree --numcpus 2 --kmerlength 19 --outtree ./mashtree.dnd ootw/* where all the files in ootw are .fasta genome assemblies.

Any suggestions would be greatly appreciated!

t/10_sqlite.t ..... Dubious, test returned 1 (wstat 256, 0x100)

ashtree.pl: Mashtree::createTreeFromPhylip: Creating tree with BioPerl
t/03_subsample.t .. ok
t/10_sqlite.t ..... 5/6
#   Failed test 'Added distances to the database'
#   at t/10_sqlite.t line 44.
# Looks like you failed 1 test of 6.
t/10_sqlite.t ..... Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/6 subtests

Test Summary Report
-------------------
t/01_filetypes.t (Wstat: 512 Tests: 1 Failed: 0)
  Non-zero exit status: 2
  Parse errors: Bad plan.  You planned 2 tests but ran 1.
t/10_sqlite.t   (Wstat: 256 Tests: 6 Failed: 1)
  Failed test:  5
  Non-zero exit status: 1
Files=5, Tests=13, 75 wallclock secs ( 0.03 usr  0.01 sys + 73.08 cusr  1.33 csys = 74.45 CPU)
Result: FAIL
Failed 2/5 test programs. 1/13 subtests failed.
make: *** [Makefile:907: test_dynamic] Error 1
FAIL

CPAN 0.22 failing?

Does it usually take a while?

$ cpanm Mashtree
--> Working on Mashtree
Fetching http://www.cpan.org/authors/id/L/LS/LSKATZ/Mashtree-0.22.tar.gz ... FAIL
! Download http://www.cpan.org/authors/id/L/LS/LSKATZ/Mashtree-0.22.tar.gz failed. Retrying ...
! Download http://www.cpan.org/authors/id/L/LS/LSKATZ/Mashtree-0.22.tar.gz failed. Retrying ...
! Download http://www.cpan.org/authors/id/L/LS/LSKATZ/Mashtree-0.22.tar.gz failed. Retrying ...
! Failed to download http://www.cpan.org/authors/id/L/LS/LSKATZ/Mashtree-0.22.tar.gz
! Failed to fetch distribution Mashtree-0.22

SQLite error: too many terms in compound SELECT

Not an active issue - I've got it working. Just thought I'd post it here for future reference!

I was running Mashtree with a lot of bacterial genomes (about 400) and DBD::SQLite was crashing with a 'too many terms in compound SELECT' error. I found that SQLite has a limit on the number of terms in a compound select statement which is set at compile time using the SQLITE_MAX_COMPOUND_SELECT parameter.

I got around the problem by compiling the DBD::SQLite module with an increased value for that parameter:

wget http://search.cpan.org/CPAN/authors/id/I/IS/ISHIGAKI/DBD-SQLite-1.54.tar.gz
tar -xf DBD-SQLite-1.54.tar.gz
cd DBD-SQLite-1.54
sed -i 's|SQLITE_MAX_COMPOUND_SELECT 500|SQLITE_MAX_COMPOUND_SELECT 1000000|g' sqlite3.c
perl Makefile.PL INSTALL_BASE=$HOME/.local/perl
make
make install

I'm not sure how high I needed to make it, but 1000000 worked fine.

Then I needed to make sure Perl was using this version of the module:

export PERL5LIB=$HOME/.local/perl/lib/perl5:$HOME/.local/perl/lib/perl5/x86_64-linux-thread-multi:$PERL5LIB

And then Mashtree happily worked for all 400 genomes!

Faster kmer counting

Don't make a dependency out of it, but if jellyfish, khmer, or kanalyze is present, use that. Make a flag to be able to specify one of these kmer counters or "pure-perl." When the flag is not specified, Mashtree will attempt to find a faster kmer counter than pure perl.

Installation issue mash tree

I installed mash tree first with bioconda:
conda install mashtree

I did the same for mash and it worked fine. However for mashtree I get the error
"BEGIN failed--compilation aborted at /Users/X/miniconda3/bin/mashtree line 22"

I then downloaded the package directly from GitHub and tried again, but there is no such file as Makefile.PL

Any advice?

0.29 failed tests on cpanm install

md5_hex(cat $sqliteFile) and $dbMd5sum don't seem to be equal on our setup.

cpanm Mashtree
...
sh: mashtree: command not found
01_filetypes.t: Mashtree::treeDist: Can't call method "get_nodes" on an undefined value
Stopped at lib/Mashtree.pm line 249.
# Looks like you planned 2 tests but ran 1.
# Looks like your test exited with 2 just after 1.
t/01_filetypes.t ..
Dubious, test returned 2 (wstat 512, 0x200)
Failed 1/2 subtests

# Failed test 'Added distances to the database'
# at t/10_sqlite.t line 44.
# Looks like you failed 1 test of 6.
t/10_sqlite.t .....
Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/6 subtests
...

Help understanding bootstrap results

Hi

I've run mashtree_bootstrap.pl with 100 replicates and 1000 replicates (see attached images). Would I be correct in assuming the 100 is a percentage? When I first ran it with 100 I assumed it was the number of replicates which agreed with that branch and not a percentage of replicates. There's nowhere in the documentation I've seen which says this.

Given that some branches have low percentage confidence (e.g. 21%), could I interpret this to mean there are missing genomes for these branches to account for the low percentage, and that if they were later included (a much larger comparison) this would boost the branch confidence in this branch region?

Thanks for your help.

Bootstrap 100 rep
[image: pangenome_tree_bootstrap]

Bootstrap 1000 rep
[image: pangenome_tree_bootstrap_1000rep]

No need to uncompress data for mash

Mash can read .gz files so we don't need to uncompress the files to --tempdir ?

We killed a server today when /tmp filled up running mashtree. Whoops :)

Is .gz support new to Mash 2.0 ?
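Until decompression to the temporary directory is avoided, one workaround is to point --tempdir (documented in the usage text) at a filesystem with enough free space instead of /tmp. The sketch below is a generic shell helper, not part of mashtree; the function names free_kb and pick_tempdir, the candidate directories, and the space threshold are all assumptions for illustration.

```shell
#!/usr/bin/env bash
# Free space, in kilobytes, on the filesystem that holds the given path.
# df -P guarantees one output line per filesystem, so line 2 is the data row.
free_kb() {
  df -Pk "$1" | awk 'NR==2 {print $4}'
}

# Print the first existing directory with at least need_kb kilobytes free.
# Usage: pick_tempdir NEED_KB DIR [DIR ...]
pick_tempdir() {
  local need_kb="$1"; shift
  local d
  for d in "$@"; do
    if [ -d "$d" ] && [ "$(free_kb "$d")" -ge "$need_kb" ]; then
      echo "$d"
      return 0
    fi
  done
  return 1
}

# Example (placeholder paths and threshold):
#   base="$(pick_tempdir 50000000 /scratch /tmp)" || exit 1
#   mkdir -p "$base/mashtree_tmp"
#   mashtree --tempdir "$base/mashtree_tmp" *.fastq.gz > tree.dnd
```

Note that, per the usage text, a directory given with --tempdir is not removed at the end of the run, so it has to be cleaned up by hand.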

Faster! Better multithreading

If more cpus are given than genomes, see if we can distribute the load so that we multithread mash and maybe the kmer counters.

Confidence values

Mashtree can add confidence values using jack knifing.

As far as I understand the jack knifing code, it samples the hashes, which itself is already a sample of all k-mers. I think a more sound way would be to bootstrap mash itself. In the case of mashtree, just run mash multiple times with different seed values (not applicable to precomputed sketches). This then gives a new, random sample of hashes each time, a new distance matrix, and a new tree. For the latter you can then compute the consensus tree and support values.

Shouldn't be too hard to implement, except I don't know perl.

Mashtree::createTreeFromPhylip: Can't call method "as_text" on an undefined value

Describe the bug
I'm trying to compute a tree from precomputed sketches (sketches.tar.gz). I always get the following error (see also the full log.txt):

mashtree: mashDistance: Databasing distances (5/8, TID13)
mashtree: mashDistance: Waiting to join thread (6/8, TID14)
mashtree: mashDistance: Databasing distances (6/8, TID14)
mashtree: mashDistance: Waiting to join thread (7/8, TID15)
mashtree: mashDistance: Databasing distances (7/8, TID15)
mashtree: mashDistance: Waiting to join thread (8/8, TID16)
mashtree: mashDistance: Databasing distances (8/8, TID16)
mashtree: mashDistance: Converting to phylip format into /var/folders/5c/b4402v855qd5lf_dr0cj93gh0000gn/T/MASHTREE.YtCWcT/distances.phylip
mashtree: Mashtree::createTreeFromPhylip: Can't call method "as_text" on an undefined value 
Stopped at /Users/karel/miniconda/envs/mashtree/lib/site_perl/5.26.2/Mashtree.pm line 217.

Desktop (please complete the following information):

  • OS: iOS
  • Version Mashtree 1.0.4 (tested also with 1.2.0)
  • which method did you install with? conda

Allow use of premade sketches in conjunction with --file-of-files

Mashtree v1.1.2 appears to assume that a supplied FOFN contains fastqs, and unlike the command line args, is not accepting of .msh sketches. It would be useful to be able to supply sketches inside a FOFN when dealing with large datasets approaching ARG_MAX when input with shell globbing.
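For context, a file of files in the format the usage text describes (one filename per line, uncompressed) can be built without ever expanding a glob on the command line, which is what keeps large datasets under ARG_MAX. A minimal sketch follows; the helper name make_fofn and the example paths are hypothetical, and whether .msh entries are accepted depends on the mashtree version, as this issue reports.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a sorted list of matching files, one per line.
# find receives the pattern as a single quoted argument, so the shell never
# expands it into a long argument list.
# Caveat: filenames containing newlines would break the one-per-line format.
make_fofn() {
  local dir="$1" pattern="$2" out="$3"
  find "$dir" -type f -name "$pattern" | sort > "$out"
}

# Example (placeholder paths):
#   make_fofn reads '*.fastq.gz' reads.fofn
#   mashtree --file-of-files reads.fofn > tree.dnd
```

Sorting the list keeps the input order reproducible between runs.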

pod documentation

TODO for myself: make POD documentation to make this project more visible on CPAN.

Installing mashtree on CentOS (Docker image)

This is more a comment than an issue (can be added to the INSTALL.md?).

Though this is related to the discussion on #43, someone else might benefit from it.
Since there are other dependencies besides mashtree that my program needs, I decided to try putting it all into a Docker container.

Using CentOS image which should be as clean as it gets (no other programs) I managed to get mashtree working with the following commands:

yum -y install perl-devel perl-Env expat-devel

export PATH=$HOME/bin:$PATH
export PERL5LIB=$PERL5LIB:$HOME/lib/perl5
yum -y install cpanminus wget
yum -y install unzip gcc bzip2 which

cpanm --notest BioPerl Bio::Sketch::Mash DBD::SQLite DBI Graph::Dijkstra

mkdir -pv $HOME/bin/build
cd $HOME/bin/build
wget https://github.com/khowe/quicktree/archive/v2.5.zip
unzip v2.5.zip 
cd quicktree-2.5
make
mv quicktree $HOME/bin/

mkdir -pv $HOME/bin/build
cd $HOME/bin/build
wget https://github.com/marbl/Mash/releases/download/v2.2/mash-Linux64-v2.2.tar
tar xvf mash-Linux64-v2.2.tar
mv -v mash-Linux64-v2.2/mash $HOME/bin/

cpanm -l ~ Mashtree
mashtree --help

Note that CentOS already had sqlite3. If needed, it can be installed with
yum -y install sqlite.x86_64
(yum -y install sqlite3 does not work).

The tricky part was figuring out which of the system dependencies were needed. For example while installing mashtree, quicktree was reported missing because "which" wasn't installed.

Tree build fail for many isolates

Hi @lskatz ,

Thank you very much for the useful program. I am trying to run mashtree on more than 1000 isolates, but I am running into a problem. The error message is below.

mashtree.pl: mashDistance: Converting to phylip format into mash_tmp/distances.phylip
DBD::SQLite::db prepare failed: Expression tree is too large (maximum depth 1000) at /home/linuxbrew/.linuxbrew/Cellar/perl/5.26.1/lib/perl5/site_perl/5.26.1/Mashtree/Db.pm line 301.
mashtree.pl: Mashtree::Db::toString_phylip: DBD::SQLite::db prepare failed: Expression tree is too large (maximum depth 1000)
Stopped at /home/linuxbrew/.linuxbrew/Cellar/perl/5.26.1/lib/perl5/site_perl/5.26.1/Mashtree/Db.pm line 301.

Cheers,

Bio::Tree::DistanceFactory

Hi there,

I am using mash-Linux64-v1.1.1 and I receive the message mashtree.pl: Bio::Tree::DistanceFactory::nj: Can't locate object method "add_Descendents" via package "Bio::Tree::Node" while calling the program with 2 fasta files. I have currently perl 5, version 18, subversion 2 (v5.18.2) installed.

Am I doing something wrong ?

Thanks

0.27 failed on quicktree test

mashtree.pl: Mashtree::createTreeFromPhylip: Creating tree with QuickTree

#   Failed test 'Mashtree produced the expected tree'
#   at t/01_filetypes.t line 20.
#          got: '(CFSAN001112.ref:0.00001,CFSAN001115.ref:0.00000,(((CFSAN001140_1:0.00019,CFSAN000968.ref:0.00001):0.00000,CFSAN000189.ref:0.00000):0.00001,((CFSAN000961.gbk:0.00005,CFSAN000211.gbk:0.00045):0.00004,(CFSAN000191.ref:0.00000,CFSAN000189.gbk:0.00000):0.00001):0.00001):0.00000);'
#     expected: '(CFSAN000189.gbk:0.00000,(CFSAN001112.ref:0.00001,CFSAN001140_1:0.00019):0.00001,((CFSAN000968.ref:0.00000,CFSAN001115.ref:0.00000):0.00001,(CFSAN000189.ref:0.00000,(CFSAN000191.ref:0.00002,(CFSAN000211.gbk:0.00045,CFSAN000961.gbk:0.00005):0.00003):0.00000):0.00001):0.00000);'
# Looks like you failed 1 test of 2.
quicktree -v
quicktree 2.2

0.29 cpanm install mashtree executable not found

We've installed Mashtree via cpanm (--force as our md5 wasn't matching).

cpanm Mashtree

We get the deprecation warning as expected:

mashtree.pl -h
mashtree.pl: Mashtree::logmsg: WARNING: the executable mashtree.pl is deprecated. Please switch to the mashtree executable (without the .pl).
mashtree.pl: main::main: mashtree.pl: use distances from Mash (min-hash algorithm) to make a NJ tree
  Usage: mashtree.pl [options] *.fastq *.fasta *.gbk *.msh > tree.dnd

...

Stopped at /software/pathogen/external/lib/bin/mashtree.pl line 57.

But, the mashtree executable isn't copied into our bin directory, we get mashtree.pl and mashtree_wrapper.pl. Had a hunt around the module directory, but couldn't see it there to symlink across.

mashtree.pl -v
mashtree.pl: Mashtree::logmsg: WARNING: the executable mashtree.pl is deprecated. Please switch to the mashtree executable (without the .pl).
Mashtree 0.29

Usage of the mashtree_wrapper.pl

Can the mashtree_wrapper.pl be used in the latest version? It seems like it is still calling the mashtree.pl file.
After replacing mashtree.pl with mashtree in the script, it runs; however, I encounter the following error.

Use of uninitialized value $dist in concatenation (.) or string at /scratch/installers/mashtree/mashtree-master/bin/../lib/Mashtree/Db.pm line 239.
Use of uninitialized value $dist in concatenation (.) or string at /scratch/installers/mashtree/mashtree-master/bin/../lib/Mashtree/Db.pm line 239.
Use of uninitialized value $dist in concatenation (.) or string at /scratch/installers/mashtree/mashtree-master/bin/../lib/Mashtree/Db.pm line 239.
Use of uninitialized value $dist in concatenation (.) or string at /scratch/installers/mashtree/mashtree-master/bin/../lib/Mashtree/Db.pm line 239.
Use of uninitialized value $dist in concatenation (.) or string at /scratch/installers/mashtree/mashtree-master/bin/../lib/Mashtree/Db.pm line 239.
Use of uninitialized value $dist in concatenation (.) or string at /scratch/installers/mashtree/mashtree-master/bin/../lib/Mashtree/Db.pm line 239.
mashtree: Mashtree::createTreeFromPhylip: Creating tree with QuickTree
mv: cannot stat '/tmp/MASHTREE.CbWrGN/observed/tree.dnd.tmp': No such file or directory
mashtree_wrapper.pl: main::main: Died 

Thanks.

mashtree_jackknife.pl exited

Hi!
I am planning to use mashtree_bootstrap.pl --reps 100 --numcpus 12 *.msh -- --min-depth 0 > mashtree.bootstrap.dnd but on anaconda there is no mashtree_bootstrap.pl and by git cloning I got this:

t/06_jackknifingHashes.t .. 1/5 Bailout called.  Further testing stopped:  mashtree_jackknife.pl exited with an error code 0
# Can't locate List/MoreUtils.pm in @INC (you may need to install the List::MoreUtils module) (@INC contains: /home/edgar/Mash/mashtree/blib/lib /home/edgar/Mash/mashtree/blib/arch /home/edgar/perl5/lib/perl5/x86_64-linux-thread-multi /home/edgar/perl5/lib/perl5/x86_64-linux-thread-multi /home/edgar/perl5/lib/perl5 /home/edgar/perl5/lib/perl5/x86_64-linux-thread-multi /home/edgar/perl5/lib/perl5 /home/edgar/perl5/lib/perl5/x86_64-linux-thread-multi /home/edgar/perl5/lib/perl5 /home/edgar/anaconda2/lib/perl5/site_perl/5.22.0/x86_64-linux-thread-multi /home/edgar/anaconda2/lib/perl5/site_perl/5.22.0 /home/edgar/anaconda2/lib/perl5/5.22.0/x86_64-linux-thread-multi /home/edgar/anaconda2/lib/perl5/5.22.0 .) at ./bin/mashtree_jackknife.pl line 14.
# BEGIN failed--compilation aborted at ./bin/mashtree_jackknife.pl line 14.
FAILED--Further testing stopped: mashtree_jackknife.pl exited with an error code 0
Makefile:925: recipe for target 'test_dynamic' failed
make: *** [test_dynamic] Error 255

This is when I installed mashtree by using: conda install -c bioconda mashtree, but I got this warning:

(base) edgar@pathos:~/anaconda2/bin$ mashtree.pl --help
mashtree.pl: Mashtree::logmsg: WARNING: the executable mashtree.pl is deprecated. Please switch to the mashtree executable (without the .pl).

Thank you for your help.

use of uninitialized value message

Use of uninitialized value in concatenation (.) or string at /home/linuxbrew/.linuxbrew/Cellar/perl/5.26.2/lib/perl5/site_perl/5.26.2/Mashtree/Db.pm line 237.

Even though I see this message, the run seems to produce a valid result.

GBK support

Hello.

It would be great to have support for GENBANK files.

Thank you.

Anders.

Use of uninitialized value $dist in concatenation (.) or string

I ran mashtree, and it was ok.

Then I re-ran mashtree (due to forgetting to capture stdout), and this time it gave this error:

mashtree: mashDistance: Writing a distance matrix to out.mat
Use of uninitialized value $dist in concatenation (.) or string at /home/linuxbrew/.linuxbrew/Cellar/perl/5.28.0/lib/perl5/site_perl/5.28.0/Mashtree/Db.pm line 239.
Use of uninitialized value $dist in concatenation (.) or string at /home/linuxbrew/.linuxbrew/Cellar/perl/5.28.0/lib/perl5/site_perl/5.28.0/Mashtree/Db.pm line 239.
Use of uninitialized value $dist in concatenation (.) or string at /home/linuxbrew/.linuxbrew/Cellar/perl/5.28.0/lib/perl5/site_perl/5.28.0/Mashtree/Db.pm line 239.
Use of uninitialized value $dist in concatenation (.) or string at /home/linuxbrew/.linuxbrew/Cellar/perl/5.28.0/lib/perl5/site_perl/5.28.0/Mashtree/Db.pm line 239.
mashtree: Mashtree::createTreeFromPhylip: Creating tree with QuickTree

Remove `Thread::Queue` in favor of arrays

There is a potential speed up if I just send an equal number of files to each thread instead of a Queue. Thread::Queue locks up each thread momentarily each time there is a dequeue statement. However, this lock would not occur if I just send an array of files to each thread. Randomizing the array would help distribute the load to each thread but there could be a smarter way to do it.

There is a queue for both mash sketch and mash dist.

Weird -h behaviour

0.57

mashtree -h

Stopped at /home/linuxbrew/.linuxbrew/bin/mashtree line 57.

echo $?
255
