yihan_modified_astral's Introduction

DESCRIPTION:

ASTRAL is a Java program for estimating a species tree given a set of unrooted gene trees. ASTRAL is statistically consistent under the multi-species coalescent model (and thus is useful for handling incomplete lineage sorting, ILS). It finds the species tree that shares the maximum number of induced quartet trees with the set of gene trees. The algorithm has an exact version that can run on small datasets (fewer than 18 taxa) and a more useful heuristic version (the default) that can handle large datasets (tested for up to 1000 taxa and 1000 genes).
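
To make the optimization criterion concrete, the following Python sketch counts, by brute force, how many quartet trees induced by a set of gene trees are also induced by a candidate species tree. It only illustrates the score; it is not ASTRAL's algorithm or code. The split-based tree representation and the names `induced_quartet` and `quartet_score` are assumptions made for this example.

```python
from itertools import combinations


def induced_quartet(splits, taxa, quartet):
    """Return the quartet topology a tree induces on four taxa, or None.

    `splits` lists, for each internal branch, the set of taxa on one side of
    that branch; `taxa` is the tree's full leaf set. None is returned when a
    taxon is missing from the tree or the quartet is unresolved (polytomy).
    """
    if not set(quartet) <= taxa:
        return None
    for side in splits:
        inside = frozenset(x for x in quartet if x in side)
        if len(inside) == 2:
            # Two taxa on each side of this branch: the quartet is resolved.
            return frozenset([inside, frozenset(quartet) - inside])
    return None


def quartet_score(species_tree, gene_trees):
    """Count gene-tree quartet trees that the species tree also induces.

    Each tree is a (splits, taxa) pair as described in `induced_quartet`.
    """
    species_splits, species_taxa = species_tree
    score = 0
    for quartet in combinations(sorted(species_taxa), 4):
        expected = induced_quartet(species_splits, species_taxa, quartet)
        if expected is None:
            continue
        for gene_splits, gene_taxa in gene_trees:
            if induced_quartet(gene_splits, gene_taxa, quartet) == expected:
                score += 1
    return score


taxa = frozenset("ABCDE")
# Unrooted trees ((A,B),C,(D,E)) and ((A,C),B,(D,E)), written as splits.
species = ([frozenset("AB"), frozenset("DE")], taxa)
gene_same = ([frozenset("AB"), frozenset("DE")], taxa)
gene_other = ([frozenset("AC"), frozenset("DE")], taxa)

print(quartet_score(species, [gene_same, gene_other]))
```

The sketch enumerates every quartet, so its cost grows with the fourth power of the number of taxa. It is meant for checking the definition on toy inputs, not for real datasets.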

The algorithm used is described in:

  • Mirarab, Siavash, Rezwana Reaz, Md. Shamsuzzoha Bayzid, Theo Zimmermann, M. Shel Swenson, and Tandy Warnow. “ASTRAL: Genome-Scale Coalescent-Based Species Tree Estimation.” Bioinformatics (ECCB special issue) 30, no. 17 (2014): i541–i548. doi:10.1093/bioinformatics/btu462.
  • Mirarab, Siavash, and Tandy Warnow. “ASTRAL-II: Coalescent-Based Species Tree Estimation with Many Hundreds of Taxa and Thousands of Genes.” Bioinformatics (ISMB special issue) 31, no. 12 (2015): i44–i52. doi:10.1093/bioinformatics/btv234.

The code given here corresponds to ASTRAL-II.

See our tutorial in addition to the rest of this README file. Also, the chapter of my dissertation that describes ASTRAL in detail is provided here.

Email: [email protected] for questions.

INSTALLATION:

No installation is required to run ASTRAL. Download the zip file and extract its contents to a folder of your choice, or clone the GitHub repository. You can then run make.sh to build the project, or simply use the jar file that is included with the repository.

ASTRAL is a Java-based application and should run in any environment (Windows, Linux, Mac, etc.) as long as Java is installed. Java 1.5 or later is required. We have tested ASTRAL only on Linux and Mac.

To test your installation, go to the place where you uncompressed ASTRAL, and run:

java -jar astral.4.7.12.jar -i test_data/song_primates.424.gene.tre

This should finish quickly. There are also other sample input files under test_data/ that can be used.

ASTRAL can be run from any directory. You just need to run java -jar /path/to/astral/astral.4.7.12.jar. You can also move astral.4.7.12.jar to any location you like and run it from there, but note that you need to move the lib directory along with it.

EXECUTION:

ASTRAL currently has no GUI. You need to run it from the command line. In a terminal, go to the location where you have downloaded the software, and issue the following command:

java -jar astral.4.7.12.jar

This will give you a list of options available in ASTRAL.

To find the species tree given a set of gene trees in a file called in.tree, use:

java -jar astral.4.7.12.jar -i in.tree

The results will be written to standard output. To save the results in a file, use the -o option (strongly recommended, unless you are using a pipeline):

java -jar astral.4.7.12.jar -i in.tree -o out.tre

The input gene trees can have missing taxa, polytomies (unresolved branches), and multiple individuals per species. When multiple individuals from the same species are available, a mapping file needs to be provided using the -a option. This mapping file should have one line per species, and each line needs to be in one of two formats:

species_name [number of individuals] individual_1 individual_2 ...

species_name:individual_1,individual_2,...

The code for handling multiple individuals is in its infancy and might not work well yet. Stay tuned for improvements to this feature. As of July 2015, we strongly recommend that you test the multiind branch if you have multiple individuals per species.
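
As an illustration of the second mapping format (species_name:individual_1,individual_2,...), here is a minimal Python sketch that produces one line per species. The function name `mapping_lines`, the example species and individual names, and the duplicate check are assumptions of this sketch, not part of ASTRAL.

```python
def mapping_lines(species_to_individuals):
    """Format one 'species:ind1,ind2,...' line per species.

    `species_to_individuals` maps a species name to the list of individual
    names (the leaf labels used in the gene trees) that belong to it.
    """
    lines = []
    seen = set()
    for species, individuals in species_to_individuals.items():
        if not individuals:
            raise ValueError(f"species {species!r} has no individuals")
        for name in individuals:
            # Sanity check added by this sketch: an individual should be
            # assigned to a single species only.
            if name in seen:
                raise ValueError(f"individual {name!r} is listed twice")
            seen.add(name)
        lines.append(f"{species}:{','.join(individuals)}")
    return lines


# Hypothetical species and individual names, for illustration only.
example = {"human": ["human_1", "human_2"], "chimp": ["chimp_1"]}
print("\n".join(mapping_lines(example)))
```

Writing the returned lines to a file, one per line, gives a file that can be passed with the -a option.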

Bootstrapping:

To perform 100 replicates of multi-locus bootstrapping (Seo 2008), use:

java -jar astral.4.7.12.jar -i best_ml -b bs_paths -r 100

In this command, bs_paths is a file that gives the location of gene tree bootstrap files, one line per gene. best_ml has all the "main" trees (e.g. best ML trees) in one file.
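
The bs_paths file can be generated with a few lines of Python. The sketch below builds its contents from a list of bootstrap file paths and checks that there is exactly one path per gene. The function name `bs_paths_text` and the example paths are hypothetical; how your bootstrap files are named and laid out on disk is up to you.

```python
def bs_paths_text(bootstrap_files, num_genes):
    """Return the contents of a bs_paths file: one bootstrap file per line.

    `bootstrap_files` lists, for each gene, the path of the file holding that
    gene's bootstrap trees; `num_genes` is the number of genes expected.
    """
    paths = [path.strip() for path in bootstrap_files]
    if any(not path for path in paths):
        raise ValueError("empty path in bootstrap file list")
    if len(paths) != num_genes:
        raise ValueError(
            f"expected {num_genes} bootstrap files, got {len(paths)}"
        )
    return "\n".join(paths) + "\n"


# Hypothetical directory layout, for illustration only.
files = ["genes/gene1/bootstrap.tre", "genes/gene2/bootstrap.tre"]
print(bs_paths_text(files, num_genes=2), end="")
```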

Bootstrap Output:

The output file generated when using the bootstrapping feature with 100 replicates (-r 100) contains the following trees, in this order:

  • 100 bootstrapped replicate trees; each tree is the result of running ASTRAL on a set of bootstrap gene trees (one per gene).
  • A greedy consensus of the 100 bootstrapped replicate trees; this tree has support values drawn on branches based on the bootstrap replicate trees. Support values show the percentage of bootstrap replicates that contain a branch.
  • The “main” ASTRAL tree; this is the result of running ASTRAL on the best_ml input gene trees. This main tree also includes support values, which are again drawn based on the 100 bootstrap replicate trees.

If the -r option is set to anything other than 100, the number of replicates is adjusted accordingly.
Note that by default (i.e., when no -r is given), ASTRAL performs only 100 replicates, regardless of the number of replicates in your bootstrapped gene trees. If you want to bootstrap with a different number of replicates, you must use -r.
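
The layout described above can be taken apart with a short script. The following Python sketch splits the contents of a bootstrap output file into the replicate trees, the greedy consensus tree, and the main tree. It assumes the file holds one Newick tree per line, which is not stated above and should be checked against your own output; the function name `split_bootstrap_output` is hypothetical.

```python
def split_bootstrap_output(text, replicates=100):
    """Split bootstrap output into (replicate trees, consensus, main tree).

    Assumes one Newick tree per non-empty line, in the order: `replicates`
    bootstrap replicate trees, the greedy consensus tree, the main tree.
    """
    trees = [line.strip() for line in text.splitlines() if line.strip()]
    if len(trees) != replicates + 2:
        raise ValueError(
            f"expected {replicates + 2} trees, found {len(trees)}"
        )
    return trees[:replicates], trees[replicates], trees[replicates + 1]


# Toy output with 2 replicates (as with -r 2), for illustration only.
toy = "((a,b),(c,d));\n((a,c),(b,d));\n((a,b),(c,d));\n((a,b),(c,d));\n"
replicate_trees, consensus, main = split_bootstrap_output(toy, replicates=2)
print(len(replicate_trees), consensus, main)
```

The `replicates` argument should match the value given to -r (100 when -r is not given).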

Also related to bootstrapping are the -g option (to enable gene/site resampling) and the -s option (to set the seed number).

Memory:

For big datasets (say, more than 100 taxa), increasing the memory available to Java can result in speedups. Note that you should give Java only as much memory as is free on your machine. So, for example, if you have 3GB of free memory, you can invoke ASTRAL using the following command to make all 3GB available to Java:

java -Xmx3000M -jar astral.4.7.12.jar -i in.tree
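
When ASTRAL is called from a script or pipeline, the command line can be assembled programmatically. The Python sketch below builds the argument list using only the options shown in this README (-Xmx, -jar, -i, -o). The function name `astral_command` and its parameters are hypothetical.

```python
def astral_command(jar, gene_trees, output=None, max_heap_mb=None):
    """Build the argument list for an ASTRAL run.

    `jar` is the path to the ASTRAL jar file, `gene_trees` the input file,
    `output` an optional output file (-o), and `max_heap_mb` an optional
    Java heap limit in megabytes (-Xmx).
    """
    cmd = ["java"]
    if max_heap_mb is not None:
        if max_heap_mb <= 0:
            raise ValueError("max_heap_mb must be positive")
        # JVM options must come before -jar.
        cmd.append(f"-Xmx{max_heap_mb}M")
    cmd += ["-jar", jar, "-i", gene_trees]
    if output is not None:
        cmd += ["-o", output]
    return cmd


print(" ".join(astral_command("astral.4.7.12.jar", "in.tree", max_heap_mb=3000)))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)` from the Python standard library.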

Acknowledgment:

ASTRAL code uses bytecode and some reverse-engineered code from the PhyloNet package (with permission from the authors).

Bug Reports:

contact [email protected]
