Automated Retrieval of Orthologous DNA Sequences from GenBank

phylotaR is an R implementation of the PhyLoTa sequence cluster pipeline. It has been tested and demonstrated on Unix and Windows. For more information, visit the phylotaR website.

Install

From CRAN:

install.packages('phylotaR')

Or, download the development package from GitHub:

devtools::install_github(repo='ropensci/phylotaR', build_vignettes=TRUE)

Full functionality depends on a local copy of BLAST+ (>= 2.0.0). For details on downloading and compiling BLAST+ on your machine please visit the NCBI website.
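Before running the pipeline, it can help to confirm that the BLAST+ directory actually contains the executables. The following is a minimal sketch in base R: `find_blast_tools` is a hypothetical helper, not part of phylotaR, and the assumption that `makeblastdb` and `blastn` are the tools to look for is ours, so adjust the list to your setup.

```r
# Hypothetical helper (not part of phylotaR): report which BLAST+ tools
# are present in a directory. The default tool names are an assumption.
find_blast_tools <- function(dir, tools = c("makeblastdb", "blastn")) {
  # On Windows the executables carry an .exe suffix
  suffix <- if (.Platform$OS.type == "windows") ".exe" else ""
  paths <- file.path(dir, paste0(tools, suffix))
  found <- file.exists(paths)
  names(found) <- tools
  found
}

blast_dir <- '[FILEPATH TO COMPILED BLAST+ TOOLS]'
found <- find_blast_tools(blast_dir)
if (!all(found)) {
  message("Not found in ", blast_dir, ": ",
          paste(names(found)[!found], collapse = ", "))
}
```

If every entry of `found` is `TRUE`, the same directory path can be passed on to the pipeline setup.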

Pipeline

phylotaR runs the PhyLoTa pipeline in four automated stages: identify and retrieve taxonomic information on all descendant nodes of the taxonomic group of interest (taxise); download sequence data for every identified node (download); identify orthologous clusters using BLAST (cluster); and identify sister clusters for the sets of clusters identified in the previous stage (cluster^2). After these stages are complete, phylotaR provides tools for exploring, identifying and exporting suitable clusters for subsequent analysis.

[Figure: the phylotaR pipeline]

For more information on the pipeline and how it works see the publication, phylotaR: An Automated Pipeline for Retrieving Orthologous DNA Sequences from GenBank in R.

Running

At a minimum, you only need to provide the taxonomic ID of your taxonomic group of interest. For example, if you are interested in primates, you can visit the NCBI taxonomy home page and search for primates to look up their ID. Once you have the ID, the phylotaR pipeline can be run with the following script.

library(phylotaR)
wd <- '[FILEPATH TO WORKING DIRECTORY]'
ncbi_dr <- '[FILEPATH TO COMPILED BLAST+ TOOLS]'
txid <- 9443  # primates ID
setup(wd = wd, txid = txid, ncbi_dr = ncbi_dr)
run(wd = wd)

The pipeline can be stopped and restarted at any point without loss of data. For more details on this script, including how to change parameters, check the log and inspect the pipeline, see the package vignette.

library(phylotaR)
vignette("phylotaR")

Timings

How long does it take for a phylotaR pipeline to complete? The table below lists the runtimes in minutes for several demonstration taxonomic groups.

| Taxon | N. taxa | N. sequences | N. clusters | Taxise (mins.) | Download (mins.) | Cluster (mins.) | Cluster^2 (mins.) | Total (mins.) |
|---|---|---|---|---|---|---|---|---|
| Anisoptera | 1175 | 11432 | 796 | 1.6 | 23 | 48 | 0.017 | 72 |
| Acipenseridae | 51 | 2407 | 333 | 0.1 | 6.9 | 6.4 | 0.017 | 13 |
| Tinamiformes | 25 | 251 | 98 | 0.067 | 2.4 | 0.18 | 0.017 | 2.7 |
| Aotus | 13 | 1499 | 193 | 0.067 | 3.2 | 0.6 | 0 | 3.9 |
| Bromeliaceae | 1171 | 9833 | 724 | 1.2 | 28 | 37 | 0.033 | 66 |
| Cycadidae | 353 | 8331 | 540 | 0.32 | 19 | 18 | 0.033 | 37 |
| Eutardigrada | 261 | 960 | 211 | 0.3 | 11 | 1.8 | 0.05 | 14 |
| Kazachstania | 40 | 623 | 101 | 0.1 | 20 | 3 | 0.05 | 23 |
| Platyrrhini | 212 | 12731 | 3112 | 0.35 | 51 | 6.9 | 1.2 | 60 |

To run these same demonstrations see demos/demo_run.R.
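To see which stage dominates each run, the stage timings from the table can be loaded into a data frame and summarised. This is only an illustrative sketch in base R: the column names are our own, and because the published stage times are rounded, the recomputed totals may differ slightly from the Total column.

```r
# Stage runtimes in minutes, copied from the timings table
timings <- data.frame(
  taxon = c("Anisoptera", "Acipenseridae", "Tinamiformes", "Aotus",
            "Bromeliaceae", "Cycadidae", "Eutardigrada", "Kazachstania",
            "Platyrrhini"),
  n_sequences = c(11432, 2407, 251, 1499, 9833, 8331, 960, 623, 12731),
  taxise   = c(1.6, 0.1, 0.067, 0.067, 1.2, 0.32, 0.3, 0.1, 0.35),
  download = c(23, 6.9, 2.4, 3.2, 28, 19, 11, 20, 51),
  cluster  = c(48, 6.4, 0.18, 0.6, 37, 18, 1.8, 3, 6.9),
  cluster2 = c(0.017, 0.017, 0.017, 0, 0.033, 0.033, 0.05, 0.05, 1.2)
)

stages <- c("taxise", "download", "cluster", "cluster2")

# Total runtime per taxon, recomputed from the (rounded) stage times
timings$total <- rowSums(timings[, stages])

# Name of the slowest stage for each taxon
timings$slowest <- stages[max.col(as.matrix(timings[, stages]),
                                  ties.method = "first")]

# Rough throughput: sequences processed per minute of total runtime
timings$seqs_per_min <- round(timings$n_sequences / timings$total)

print(timings[, c("taxon", "total", "slowest", "seqs_per_min")])
```

In these demonstrations the download stage is the slowest step for most groups, while the cluster stage takes over for the two groups with the longest runtimes, Anisoptera and Bromeliaceae.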

License

MIT

Version

Version 1.

Authors

Dom Bennett (maintainer, R package dev), Hannes Hettling (workhorse code dev), Rutger Vos, Alexander Zizka and Alexandre Antonelli

Reference

Bennett, D., Hettling, H., Silvestro, D., Zizka, A., Bacon, C., Faurby, S., … Antonelli, A. (2018). phylotaR: An Automated Pipeline for Retrieving Orthologous DNA Sequences from GenBank in R. Life, 8(2), 20. DOI:10.3390/life8020020

Sanderson, M. J., Boss, D., Chen, D., Cranston, K. A., & Wehe, A. (2008). The PhyLoTA Browser: Processing GenBank for molecular phylogenetics research. Systematic Biology, 57(3), 335–346. DOI:10.1080/10635150802158688

