Reference-free Orthology-free Alignment-free DIscordance aware Estimation of Species tree (ROADIES)

Introduction

Welcome to the official repository of ROADIES, a novel pipeline for inferring species trees directly from raw genome assemblies. ROADIES offers a fully automated, easy-to-use, and scalable solution that eliminates error-prone manual steps and provides unique flexibility in adjusting the tradeoff between accuracy and runtime.

For more detailed information on all the features and settings of ROADIES, please refer to our Wiki.


Figure: ROADIES pipeline stages

Quick Install

Using DockerHub

To run ROADIES using DockerHub, follow these steps:

  1. Pull the ROADIES Docker image from DockerHub:
docker pull ang037/roadies:latest
  2. Run the Docker container:
docker run -it ang037/roadies:latest

Using Docker locally

First, clone the repository (requires git to be installed on the system):

git clone https://github.com/TurakhiaLab/ROADIES.git
cd ROADIES

Then build and run the Docker container:

docker build -t roadies_image .
docker run -it roadies_image

Using installation script (requires sudo access)

First, clone the repository:

git clone https://github.com/TurakhiaLab/ROADIES.git
cd ROADIES

Then, execute the installation script:

chmod +x roadies_env.sh
source roadies_env.sh

This installs and builds all tools and dependencies. Once setup is complete, the script prints "Setup complete" in the terminal and activates the roadies_env environment with all Conda packages installed.

Required dependencies

To run this script, ensure the required dependencies are installed. On Ubuntu, you can install them with:

sudo apt-get install -y wget unzip make g++ python3 python3-pip python3-setuptools git default-jre libgomp1 libboost-all-dev cmake

Note: If you encounter issues with the Boost library, add the Boost installation path to $CPLUS_LIBRARY_PATH and export it in ~/.bashrc so the setting persists across sessions.
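Before running the installation script, a quick way to see which of these command-line tools are already on your PATH is a small Python check like the one below. It is a minimal sketch, not part of ROADIES: the tool list is an assumption that maps the apt packages above to their executables (for example, default-jre is probed via `java` and python3-pip via `pip3`), and libraries such as libgomp1 and libboost-all-dev cannot be detected this way.

```python
import shutil
from typing import Iterable, List

# Hypothetical mapping of the apt packages above to executables to probe.
# Libraries (libgomp1, libboost-all-dev) have no executable and are not checked.
REQUIRED_TOOLS = ["wget", "unzip", "make", "g++", "python3", "pip3", "git", "java", "cmake"]


def missing_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the names in `tools` that shutil.which cannot find on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH:", " ".join(missing))
    else:
        print("All checked tools were found on PATH")
```

Anything reported as missing can be installed with the apt-get command above before sourcing roadies_env.sh.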


Quick Start

Once setup is done, you can run the ROADIES pipeline using the provided test dataset. Follow these steps for a 16-core machine:

  1. Go to the ROADIES repository directory (if not already there):
cd ROADIES
  2. Create a directory for the test data and download the test datasets with this one-line command:
mkdir -p test/test_data && cat test/input_genome_links.txt | xargs -I {} sh -c 'wget -O test/test_data/$(basename {}) {}'
  3. Run the pipeline from the ROADIES directory:
python run_roadies.py --cores 16

The second command downloads the 11 Drosophila genome assemblies (links provided in test/input_genome_links.txt) and saves them in the test/test_data directory. The third command runs the ROADIES pipeline on these 11 genomes and, upon completion, saves the final Newick tree as roadies.nwk in a separate output_files folder.
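If the shell one-liner is inconvenient (for example, on a system without wget), the same download step can be sketched in Python using only the standard library. This is an illustrative sketch, not a script shipped with ROADIES; the function names `download_plan` and `download_all` are hypothetical. As in the one-liner, each file is saved under test/test_data using the last path component of its URL, except that this sketch drops any query string, which the shell `basename` would keep.

```python
import os
import urllib.request
from posixpath import basename  # URL paths always use "/"
from typing import List, Tuple
from urllib.parse import urlparse


def download_plan(links_file: str, out_dir: str) -> List[Tuple[str, str]]:
    """Pair each URL listed in `links_file` with its destination in `out_dir`.

    The file name is the last component of the URL path, mirroring
    `wget -O test/test_data/$(basename {})`. Blank lines are skipped.
    """
    plan = []
    with open(links_file) as fh:
        for line in fh:
            url = line.strip()
            if not url:
                continue
            name = basename(urlparse(url).path)
            plan.append((url, os.path.join(out_dir, name)))
    return plan


def download_all(links_file: str = "test/input_genome_links.txt",
                 out_dir: str = "test/test_data") -> None:
    """Create `out_dir` (like `mkdir -p`) and download every listed file."""
    os.makedirs(out_dir, exist_ok=True)
    for url, dest in download_plan(links_file, out_dir):
        urllib.request.urlretrieve(url, dest)
```

Calling `download_all()` from the ROADIES directory should leave the same files in test/test_data as the shell command, after which the pipeline can be run as shown above.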


Run ROADIES with your own datasets

To run ROADIES with your own datasets, follow these steps:

  1. Specify Input Genomic Dataset: Update the config.yaml file (located in the config folder of the ROADIES directory) by setting the GENOMES parameter to the path of your input datasets. Ensure all input genome assemblies are in .fa or .fa.gz format and named after the species (e.g., Aardvark.fa).

Note: Each file should contain the genome assembly of exactly one species. If a file contains multiple species, split it into individual genome files (faSplit can be used for this: faSplit byname <input.fa> <output_dir>).

  2. Configure Other Parameters: Adjust other parameters in config.yaml as needed. Detailed information on each parameter is available in the Usage section.

  3. Run the Pipeline: Execute the pipeline with the following command (example for 16 cores):

python run_roadies.py --cores 16

The output species tree in Newick format will be saved as roadies.nwk in the output_files folder.

  4. Modes of operation: ROADIES supports three modes of operation (fast, balanced, accurate) that control the accuracy-runtime tradeoff. Select a mode with one of the following commands (accurate is the default):
python run_roadies.py --cores 16 --mode accurate

python run_roadies.py --cores 16 --mode balanced

python run_roadies.py --cores 16 --mode fast
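The input requirements from step 1 (one species per file, a .fa or .fa.gz extension, and a file name matching the species) can be checked, and a multi-record FASTA split, with a short Python sketch. It is not part of ROADIES: `species_name`, `split_by_name`, and `write_split` are hypothetical helpers. `split_by_name` splits on the first word of each header line, similar in spirit to `faSplit byname`, so it only makes sense when each record in the file is a separate species; `write_split` reads uncompressed FASTA only.

```python
import os
from typing import Dict, List, Optional

# Input extensions accepted by ROADIES, per step 1 above.
FASTA_SUFFIXES = (".fa.gz", ".fa")


def species_name(filename: str) -> Optional[str]:
    """Return the species encoded in a file name ('Aardvark.fa' -> 'Aardvark'),
    or None if the extension is not .fa or .fa.gz."""
    base = os.path.basename(filename)
    for suffix in FASTA_SUFFIXES:
        if base.endswith(suffix) and len(base) > len(suffix):
            return base[: -len(suffix)]
    return None


def split_by_name(fasta_text: str) -> Dict[str, str]:
    """Split multi-record FASTA text into {record_name: record_text}.

    The record name is the first whitespace-delimited word of the header.
    Records sharing a name are merged; lines before the first header are ignored.
    """
    records: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            words = line[1:].split()
            current = words[0] if words else "unnamed"
            records.setdefault(current, []).append(line)
        elif current is not None:
            records[current].append(line)
    return {name: "\n".join(lines) + "\n" for name, lines in records.items()}


def write_split(fasta_path: str, out_dir: str) -> List[str]:
    """Write each record of an uncompressed FASTA to out_dir/<name>.fa."""
    with open(fasta_path) as fh:
        parts = split_by_name(fh.read())
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for name, text in parts.items():
        path = os.path.join(out_dir, name + ".fa")
        with open(path, "w") as out:
            out.write(text)
        written.append(path)
    return written
```

Running `species_name` over every file in the directory given as GENOMES is a quick way to spot files that will not match the expected naming before launching `run_roadies.py`.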

Citing ROADIES

If you use ROADIES in your research or publications, please cite the following paper:

Gupta A, Mirarab S, Turakhia Y (2024). Accurate, scalable, and fully automated inference of species trees from raw genome assemblies using ROADIES. bioRxiv. https://www.biorxiv.org/content/10.1101/2024.05.27.596098v1.

Accessing ROADIES output files

The gene trees and species trees generated by ROADIES for the manuscript are deposited in Dryad. To access them, please refer to the following:

Gupta, Anshu; Mirarab, Siavash; Turakhia, Yatish (2024). Accurate, scalable, and fully automated inference of species trees from raw genome assemblies using ROADIES [Dataset]. Dryad. https://doi.org/10.5061/dryad.tht76hf73.

Contributors

ang037, cryszzz, stachyris, ttl074, yatisht
