willpearse / phylogenerator

Automated Phylogeny Generation for Ecologists

License: Other

R 1.91% Python 98.09%

phylogenerator's Issues

The paper's out...

...so change the readme!

use referenceDownload in thorough searches, etc.

OSError: [Errno 21] Is a directory: 'standard-RAxML'

Yes, my alignment was terrible... matK interspersed with rbcL willy-nilly, but even with garbage in / garbage out it should still run through the pipeline, right? Any idea what might be happening here? Have I perhaps not installed RAxML correctly?

DNA ALIGNMENT

Choose one alignment method ('muscle', 'mafft', 'clustalo', 'prank'), or...
'everything' - all four and compare their outputs
'quick' - do only the first three
Return will use MAFFT; prank is very slow!

DNA Alignment (default - mafft):
Starting alignment...
...aligning gene no. 1
......with MAFFT

Alignment complete!

ALIGNMENT CHECKING

Gene: rbcL
ID  Alignment  Length  Med. Gaps  SD Gaps  Min-Max Gaps    Med. Gap Frac.  M-M Gap Frac.  Warn?
0   mafft      1764    800.0      266.41   249.0 - 1232.0  0.45            0.141-0.698    !!!!

'output' - write out alignments. I recommend you look at your alignment before continuing
'DNA' - return to DNA editing stage
'align' - return to alignment stage, discarding current alignments.
'trimal' - automatically trim your sequences using trimAl
'raxml=X' - run X RAxML runs for each alignment, and calculate the R-F distances between the trees and alignments (slow)
'metal' - calculate SSP distances between alignments using metal
'clustal-x2' - open the Clustal-X2 website to download this alignment viewer

TIPS:
*_If the column 'Warn?' has '!!!' in it, BEWARE! Your alignment likely has problems._
Bad sequences cause bad alignments. Be careful in the DNA check stage, and return there now if necessary
'output' your alignments, and open them in something like Clustal-X2. You will immediately see sequences that should be RELOADed or TRIMMed
When downloading Clustal, make sure you get the graphical Clustal-X2, not the command line version

Hit enter to continue and choose one final alignment per gene

Alignment Checking:output
...output written!
Alignment Checking:

CONSTRAINT TREE
I recommend you use a constraint tree with this program
'newick' - supply your own constraint tree
'phylomatic' - use Phylomatic to generate a tree
'taxonomy' - download the NCBI taxonomy for your species (does not generate a constraint tree)
Warning: Phylomatic can trim the end off species names, causing conflicts with phyloGenerator that are hard to detect. Rooted phylogenies are not valid constraints.
Otherwise, press enter to continue without a constraint tree.

TIPS:
If you choose 'taxonomy', it will be written out to your working directory now. Use that to make a constraint tree!
If you have access to a reference phylogeny, try using Phylomatic
A constraint tree makes your phylogeny much more likely to be right. Use one!

Constraint Method: taxonomy

Creating a 'taxonomy' for your species from GenBank
...lineages found!
Constraint Method:
...Continuing without constraint tree

PHYLOGENY BUILDING
You can either build a maximum likelihood tree ('raxml') or a Bayesian tree ('beast')
If unsure, hit enter to use RAxML - using BEAST safely will require some knowledge of phylogenetics

Phylogeny Building (default raxml):
...using RAxML...
RAXML:
'integratedBootstrap=X' - conduct X number of bootstraps and a thorough ML search in one run (!)
'restart=X' - conduct X number of full ML searches (!)
'partitions' - concatenate all genes into a single partition (not the default)
Specify multiple options with hyphens (e.g., 'restart=5-partitions'), but do not mix options marked with '(!)'
Hit enter to conduct one search

TIPS:
The integrated boostrap method is fast, and gives confidence intervals on your tree, and a value of 1000 is probably more than adequate for most trees
Phylogeny Building (RAxML - default 1 search): integratedBootstrap=1000
Traceback (most recent call last):
File "./phyloGenerator.py", line 3790, in <module>
main()
File "./phyloGenerator.py", line 3756, in main
currentState.phylogen()
File "./phyloGenerator.py", line 3122, in phylogen
raxmlSetup('')
File "./phyloGenerator.py", line 2984, in raxmlSetup
self.phylogeny = RAxML(align, method=self.phylogenyMethods+'localVersion', constraint=self.constraint, timeout=999999, partitions=partitions)
File "./phyloGenerator.py", line 904, in RAxML
os.remove(each)
OSError: [Errno 21] Is a directory: 'standard-RAxML'
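
One plausible reading of this traceback is that the clean-up loop around line 904 hands every matching path to os.remove, including a folder called 'standard-RAxML' sitting in the working directory. The sketch below shows a more defensive clean-up under that assumption. `remove_files_only` is a hypothetical helper, not a function in phyloGenerator, and it deliberately leaves directories alone rather than deleting a user's RAxML source tree.

```python
import os


def remove_files_only(paths):
    """Delete regular files and return the paths that were skipped as directories.

    Hypothetical helper: directories are never removed, only reported,
    and paths that do not exist are ignored.
    """
    skipped = []
    for each in paths:
        if os.path.isdir(each):
            skipped.append(each)   # os.remove would raise Errno 21 here
        elif os.path.exists(each):
            os.remove(each)
    return skipped
```

Anything returned in the skipped list could be shown to the user as a hint to move program source folders out of the working directory.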

COI trimming

COI is not always stored as COI in GenBank - need a way to have aliases of genes when searching/trimming...
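
A rough sketch of how aliases could be folded into a single search term, assuming the query follows the `Species[Organism] AND gene[Gene]` pattern that appears in phyloGenerator's own log output. The function names are hypothetical, and the alias list in the usage note is only an illustration.

```python
def gene_term(names):
    """Build one Entrez clause that matches any of the given gene names."""
    cleaned = [name.strip() for name in names if name.strip()]
    if not cleaned:
        raise ValueError("at least one gene name is required")
    clauses = ["{0}[Gene]".format(name) for name in cleaned]
    if len(clauses) == 1:
        return clauses[0]
    return "(" + " OR ".join(clauses) + ")"


def search_term(species, names):
    """Combine a species name with the main gene name and its aliases."""
    return "({0}[Organism]) AND {1}".format(species, gene_term(names))
```

For example, `search_term("Apis mellifera", ["COI", "COX1", "CO1"])` gives `(Apis mellifera[Organism]) AND (COI[Gene] OR COX1[Gene] OR CO1[Gene])`.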

Controlling for phylogeny in a comparative life-history study

I'd like to use your program to obtain family and order classification for around 2500 bird species across most of the 143 families identified by Sibley & Ahlquist (1990). I already have classification to genus level but anything else time saving would be fantastic!

Any help would be great!

Alistair Baxter

Crash mode

...would be nice if pG wrote out on crashes as much as it could (e.g., after someone escapes a prank run...)

Do constraint trees with taxonIDs work?

...Could be a Newick-format problem in BioPython; a user has reported (not yet verified) problems with constraint trees that use taxon IDs

Trim using < differentiating by gene

e.g., < 1000 rbcL in trim mode

HT to Ben Warren (again!)

Can't deal with 'blank' genes if downloading multiple genes

Allow users to enter DNA alignments, not just DNA sequences

A user has requested this; I will try to get round to it!

Aligning empty genes

pG will attempt to align empty genes (i.e., genes with no DNA data). It shouldn't do this, as (quite legitimately) alignment programs don't like it!
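
A minimal sketch of the check, assuming sequences are held as a list of per-species rows indexed by gene (as `self.sequences[i][k]` in the tracebacks suggests) with empty entries stored as `None` or an empty value. `genes_with_data` is a hypothetical name.

```python
def genes_with_data(sequences, genes):
    """Return (index, name) pairs for genes that have at least one sequence.

    `sequences` is assumed to be a list of rows, one per species, where
    row[k] is the sequence for gene k or a false value if nothing was found.
    """
    keep = []
    for k, gene in enumerate(genes):
        if any(k < len(row) and row[k] for row in sequences):
            keep.append((k, gene))
    return keep
```

Only the genes returned here would then be passed on to the alignment programs.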

Browser interface

I got the following from a reviewer:

The command-line interface, while fairly well-designed, is still a potential problem for some users. An equivalent web browser-based interface would help and should be feasible: use Python's built-in CGI server (SimpleHTTPServer) to serve pages locally, and use the webbrowser module to load 'http://localhost' when launching the program. (Note that I'm not suggesting the authors host a public web server themselves.) Assuming you intend to maintain and improve phyloGenerator, I encourage you to look into doing this for a future release.

If you have strong views about the terminal-based interface, please let me know. I'm going to try and implement this, although it may well be at the expense of the terminal-based interface (i.e., I might not keep both going).

Cheers,

Will
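
As a rough illustration of the reviewer's suggestion, here is a sketch of a page served locally. It is written for Python 3, where SimpleHTTPServer became part of `http.server`; the page content and the names `StartPage` and `start_local_server` are placeholders, not part of phyloGenerator.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class StartPage(BaseHTTPRequestHandler):
    """Serve a single placeholder page for every GET request."""

    def do_GET(self):
        body = b"<html><body><h1>phyloGenerator</h1></body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the terminal quiet


def start_local_server():
    """Start the server on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), StartPage)
    thread = threading.Thread(target=server.serve_forever)
    thread.daemon = True
    thread.start()
    return server
```

After `server = start_local_server()`, calling `webbrowser.open("http://127.0.0.1:%d/" % server.server_address[1])` would load the page in the user's default browser. Binding to 127.0.0.1 keeps the interface local, in line with the reviewer's note about not hosting a public server.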

APG III references

It would be nice if pG came with APGIII built into it, or at least a reference on the website...

Integrate Windows and Mac Properly

...the downloads are (slightly) different Python files; the Python build scripts should be integrated too

Clarify requirements in README

The read-me file currently says:

Install Python >=2.6; Numpy and SciPy for Python; Biopython >=2.5.

The current version of Biopython is 1.60 (one, sixty), so ">=2.5" cannot be right.

Also, having looked at the code, it will not run under both Python 2 and Python 3 as written; for instance, it uses print statements. Saying "Python >=2.6" is therefore potentially confusing, because some users might try it under Python 3. I would suggest saying "install Python 2.6 or 2.7" (Python 2.7 is the final Python 2.x release).
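
Alongside the README change, a start-up guard could make the requirement explicit. This is only a sketch: `python_is_supported` is a hypothetical name, and the accepted versions (2.6 and 2.7) are taken from the suggestion above rather than from the code.

```python
import sys


def python_is_supported(version_info=None):
    """Return True only for Python 2.6 or 2.7 (the versions suggested above)."""
    major, minor = (version_info or sys.version_info)[:2]
    return major == 2 and minor in (6, 7)
```

At the top of the script this could be used as `if not python_is_supported(): sys.exit("phyloGenerator needs Python 2.6 or 2.7")`. Note that a file containing Python 2 print statements fails with a SyntaxError under Python 3 before any of its code runs, so the guard would have to live in a small launcher that is itself valid in both versions.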

PATHd8 issues on Linux

PATHd8 seems to be causing problems for some people - it's not running.

Traceback (most recent call last):
  File "phyloGenerator.py", line 3790, in <module>
    main()
  File "phyloGenerator.py", line 3764, in main
    currentState.rateSmooth()
  File "phyloGenerator.py", line 3260, in rateSmooth
    success = PATHd8()
  File "phyloGenerator.py", line 3161, in PATHd8
    self.smoothPhylogeny = rateSmooth(self.phylogeny, sequenceLength=length)
  File "phyloGenerator.py", line 1686, in rateSmooth
    with open(tempPATHd8Output, 'r') as tempFile:
IOError: [Errno 2] No such file or directory: 'tempPATHd8Output'
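
The IOError only says that the output file is missing, which hides the likelier problem that PATHd8 itself never ran. A small sketch of a clearer failure, assuming the output is read from a temporary file as in the traceback; `read_pathd8_output` is a hypothetical helper.

```python
import os


def read_pathd8_output(path):
    """Read PATHd8's output file, failing with a clearer message if it is absent."""
    if not os.path.isfile(path):
        raise RuntimeError(
            "PATHd8 produced no output at %r; check that the PATHd8 binary "
            "exists, is executable, and was built for this platform" % path
        )
    with open(path, "r") as handle:
        return handle.read()
```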

Allow taxonID and species-name searches at the same time

Helps with sub-species searches for some users

Attempt to save output on exit

It would be nice if the program attempted to save something on exit if there's an error. Admittedly, there shouldn't be an error (...!...) but it would still be nice if it tried.
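
One possible shape for this, sketched under the assumption that the program's state can be pickled: wrap each stage so that whatever has been gathered so far is written out before the error propagates. `run_with_rescue` and the rescue file are hypothetical.

```python
import pickle


def run_with_rescue(step, state, rescue_path):
    """Run step(state); on any failure, pickle state to rescue_path and re-raise.

    BaseException is caught on purpose so that Ctrl-C (KeyboardInterrupt),
    for example during a long alignment, also triggers the save.
    """
    try:
        return step(state)
    except BaseException:
        with open(rescue_path, "wb") as handle:
            pickle.dump(state, handle)
        raise
```

Because the exception is re-raised, the user still sees the original traceback; the rescue file is an extra, not a replacement for the error.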

Reduce filesize of applications

...likely by stripping out scipy as you only use it for ~2 things, none of which are that key/hard to write yourself!

Better constraint options

Gene choice from multiple possibilities

Trim en masse according to criteria

Search with NCBI taxon IDs instead of species names

Hi Will,

Would it be possible to use taxon IDs, as they are more specific, instead of species names?

Thanks,
Dom

Use replacements from same genus if no species in GenBank

Server re-tries

...can hang. Sometimes this is GenBank, but sometimes it's pG and there's definitely a more intelligent way of handling this...
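
One common way to avoid hanging is a bounded retry loop with growing pauses. The sketch below is generic and makes no claim about how pG currently retries: `with_retries` is a hypothetical helper, `call` stands for whatever function performs the GenBank request, and it is assumed that failures surface as IOError.

```python
import time


def with_retries(call, attempts=4, first_delay=1.0, sleep=time.sleep):
    """Call call() up to `attempts` times (attempts >= 1), doubling the pause each time.

    The last failure is re-raised so the caller can decide what to do,
    instead of the program waiting indefinitely.
    """
    delay = first_delay
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except IOError:
            if attempt == attempts:
                raise
            sleep(delay)
            delay *= 2
```

The `sleep` argument exists so the waiting can be replaced in tests.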

Unit tests?

Some of your more technically inclined potential users would be concerned at the lack of unit tests in the repository. The provision of test data is a good first step - and could be the basis of a test suite.

Once you have a basic test script, which can return zero on success or non-zero on an error, this could be used for automated testing. If you are not already familiar with TravisCI and its excellent GitHub integration, I would suggest looking into that: http://travis-ci.org/. This would require the Linux binaries to be available as Debian/Ubuntu packages, or as simple-to-download 32-bit Linux binaries.
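
A sketch of what such a test script might look like. `gap_fraction` is a stand-in written for this example, not a function taken from phyloGenerator; a real suite would import and test the program's own functions against the provided test data.

```python
import unittest


def gap_fraction(sequence):
    """Stand-in function under test: fraction of '-' characters in a sequence."""
    return sequence.count("-") / float(len(sequence)) if sequence else 0.0


class GapFractionTest(unittest.TestCase):
    def test_half_gaps(self):
        self.assertEqual(gap_fraction("A-C-"), 0.5)

    def test_empty_sequence(self):
        self.assertEqual(gap_fraction(""), 0.0)


def run_tests():
    """Run the suite and return 0 on success, 1 on any failure or error."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(GapFractionTest)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return 0 if result.wasSuccessful() else 1
```

Ending the script with `sys.exit(run_tests())` would give the zero or non-zero exit status that an automated service can act on.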

Can't unzip zip file for mac

I keep getting error messages when attempting to unzip the zip file. Wonder if you can try making another one, or making a tarball instead?

Print version number on run-through

...can't hurt! Thanks Rampal!

Interactive tests

Set a list of 'select this, then this' -type options, which would also make a nice tutorial

Log files output

Server error when checking NCBI for data

I keep getting stuck at this step no matter what I try. My internet was definitely working all the time throughout this:

ross@ross-envy:~/workspace/pearsepg$ ./phyloGenerator-master/phyloGenerator.py 


Welcome to phyloGenerator! Let's make a phylogeny!
---Please go to http://willpearse.github.com/phyloGenerator for help
---Written by Will Pearse ([email protected])

This program is easier to use with a wider console window
Mac/Linux: Drag the edge of your terminal window with the mouse
PC: Right click the command prompt icon, select properties,
	click the 'layout' tab, and increase 'screen buffer'
	and 'window' widths to at least '160'

When downloading sequence data, you will see warnings relating to
'missing DTD files. Do not be alarmed; this is normal, and will
have no effect on your output.

Please input a 'stem' name to act as a prefix to all output (e.g., 'stemName_phylogeny.tre')

Stem name: dog

Please input an *existing* directory for all your output
	(hit enter to use /home/ross/workspace/pearsepg
Working directory (/home/ross/workspace/pearsepg): 
Please enter the gene(s) you want to use (e.g., 'COI' for cytochrome oxidase one')
Specify 'aliases' (alternate names) by listing them after the main name using '-'
Gene names may contain spaces, or (as with the command line) they can be replaced with '_'
e.g., COI-this_is_an_alis-this is also an alias
If you wish to use the defaults for your taxa, please enter 'plant', 'invertebrate', or 'vertebrate' instead
Each gene on a separate line, and an empty line to continue

plant


DNA INPUT

If you already have DNA sequences in a FASTA file, please enter its location
If you have more than one set of sequences, please separate the file locations with commas
Otherwise, hit enter to continue

File locations: 

No DNA loaded

DNA DOWNLOAD

Please enter the location of the list of species for which you want to build a phylogeny
Each species must be on a new line

/home/ross/workspace/pearsepg/phyloGenerator-master/chinaspp.txt

6098 species loaded.
Please enter a valid email address to download sequence data from GenBank

Email: [email protected]

To use the referenceDownload method, enter locations of sequence files (on separate lines), finishing with an empty line.
Just hit enter to perform a standard search (this is probably the option you're looking for).
refDownload: 
Searching for: Acanthus ebracteatus
!!!Server error checking (((Acanthus ebracteatus[Organism]) AND rbcL[Gene]) NOT partial [Title]) NOT genome [Title]  - retrying...
!!!!!!Unreachable. Returning nothing.
Traceback (most recent call last):
  File "./phyloGenerator-master/phyloGenerator.py", line 4122, in <module>
    main()
  File "./phyloGenerator-master/phyloGenerator.py", line 4033, in main
    currentState.loadGenBank()
  File "./phyloGenerator-master/phyloGenerator.py", line 2333, in loadGenBank
    self.sequences, self.genes = findGenes(self.speciesNames, self.genes, seqChoice=self.seqChoice, verbose=True, download=True, thorough=True, targetNoGenes=self.nGenes, spacer=self.spacer, delay=self.delay, taxonIDs=self.taxonIDs)
  File "./phyloGenerator-master/phyloGenerator.py", line 476, in findGenes
    sequence, _ = sequenceDownload(speciesList[i], geneNames[k], noSeqs=noSeqs, includePartial=includePartial, includeGenome=includeGenome, seqChoice=seqChoice, download=download, thorough=thorough, retMax=retMax, taxonID=taxonIDs)
  File "./phyloGenerator-master/phyloGenerator.py", line 339, in sequenceDownload
    seq = dwnSeq(includeGenome=False, includePartial=False, gene=gene)
  File "./phyloGenerator-master/phyloGenerator.py", line 332, in dwnSeq
    if int(firstSearch['Count']):
TypeError: tuple indices must be integers, not str
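
The log suggests that, after the "Unreachable. Returning nothing." message, `firstSearch` is a tuple rather than the dictionary-like search result that `dwnSeq` expects. That is an inference from the traceback, not something confirmed in the source. A defensive sketch under that assumption, with `hit_count` as a hypothetical helper:

```python
def hit_count(search_result):
    """Return the number of hits, or 0 if the search failed or has no usable count."""
    try:
        return int(search_result["Count"])
    except (TypeError, KeyError, ValueError):
        # TypeError covers tuples and None; KeyError a result without 'Count'
        return 0
```

With this, `if hit_count(firstSearch):` treats an unreachable server as "no sequences found" instead of crashing, although reporting the failure to the user would still be worthwhile.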

Potential integratedBootstrap PATHd8 interaction

Something odd happens when a Mac user goes from integratedBootstrap to PATHd8 (and potentially to BEAST as well). It sounds like a [] issue with BioPython, but I need to check.

Order of species placed in BEAST constraints seem to be reflected in phylogeny output

A user has reported that the order of species placed inside a large polytomy in their phylogeny is reflected in the branching order of those species in their output phylogeny.

I'm currently investigating; I'm not sure what could be causing their error, but would be grateful if anyone experiencing similar issues could send me their input files.

Thanks very much,

Will

Can't make constraint tree from GenBank output

renameSequences error

Using my own fasta file I get this error.
The names look like this:

Anacanthocoris_striicornis
Except one, which looks like this:
Diaphorina_citri diaci_nymph_66660000040632

Not sure which it is failing on.

ERROR

........
Other modes: 'reload', 'trim', 'replace', 'merge'. Hit enter to continue.

DNA Editing (delete):
Traceback (most recent call last):
File "phyloGenerator.py", line 3730, in <module>
main()
File "phyloGenerator.py", line 3680, in main
currentState.renameSequences()
File "phyloGenerator.py", line 3267, in renameSequences
if self.sequences[i][k]:
IndexError: list index out of range
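
It is not established which name triggers the error. If the header containing a space is the culprit (an assumption), a quick check like the one below would at least locate such headers before the file is loaded. `suspicious_headers` is a hypothetical helper.

```python
def suspicious_headers(fasta_text):
    """Return (line number, header) for FASTA headers that are not a single token."""
    flagged = []
    for number, line in enumerate(fasta_text.splitlines(), start=1):
        if line.startswith(">") and len(line[1:].split()) != 1:
            flagged.append((number, line[1:].strip()))
    return flagged
```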

ross@ross-envy:~/workspace/pearsepg/phyloGenerator-master$ ./setupLinux.py /home/ross/workspace/pearsepg/BEASTv1.8.4/bin /home/ross/workspace/pearsepg/prank/bin/ /home/ross/workspace/pearsepg/pathd8 /home/ross/workspace/pearsepg/phylocom-4.2/src /home/ross/workspace/pearsepg/metal-linux64-1.1 /home/ross/workspace/pearsepg/trimAl/source /home/ross/workspace/pearsepg/clustalofolder/ /home/ross/workspace/pearsepg/standard-RAxML-master
Linux configuration script for phyloGenerator

Pass, as additional command line arguments, the path where you've downloaded all your files
	and *which folder contains BEAST*
	e.g., './setupLinux.py /home/will/phyloGenerator /home/will/phyloGenerator/BEAST\ v1.7.4'
Make sure all programs are executable - if unsure, make them so
	e.g., 'chmod +x NAMEOFPROGRAM'
The resulting 'requires' folder must contain only output from this script
Do not leave source code from the programs phyloGenerator uses in the same folder
	as phyloGenerator.py, or (for safety) in your output 'working directory'
	This will cause obscure-looking errors from phyloGenerator!
Checking and configuring external programs

ln: failed to create symbolic link '/home/ross/workspace/pearsepg/phyloGenerator-master/requires/mafft': File exists
ln: failed to create symbolic link '/home/ross/workspace/pearsepg/phyloGenerator-master/requires/muscle': File exists
ln: failed to create symbolic link '/home/ross/workspace/pearsepg/phyloGenerator-master/requires/prank': Permission denied
ln: failed to create symbolic link '/home/ross/workspace/pearsepg/phyloGenerator-master/requires/clustalo': Permission denied
ln: failed to create symbolic link '/home/ross/workspace/pearsepg/phyloGenerator-master/requires/metal': Permission denied
ln: failed to create symbolic link '/home/ross/workspace/pearsepg/phyloGenerator-master/requires/trimal': Permission denied
ln: failed to create symbolic link '/home/ross/workspace/pearsepg/phyloGenerator-master/requires/phylomatic': Permission denied
ln: failed to create symbolic link '/home/ross/workspace/pearsepg/phyloGenerator-master/requires/PATHd8': File exists
ln: failed to create symbolic link '/home/ross/workspace/pearsepg/phyloGenerator-master/requires/raxml': Permission denied
ln: failed to create symbolic link '/home/ross/workspace/pearsepg/phyloGenerator-master/requires/beast': File exists
Checking Python libraries

ln: failed to create symbolic link '/home/ross/workspace/pearsepg/phyloGenerator-master/requires/treeannotator': File exists


CONGRATULATIONS!
phyloGenerator is setup. You should now be able to run it by typing './phyloGenerator.py'
ross@ross-envy:~/workspace/pearsepg/phyloGenerator-master$ ./phyloGenerator.py
bash: ./phyloGenerator.py: Permission denied
ross@ross-envy:~/workspace/pearsepg/phyloGenerator-master$ chmod +x phyloGenerator.py 
ross@ross-envy:~/workspace/pearsepg/phyloGenerator-master$ ./phyloGenerator.py
Traceback (most recent call last):
  File "./phyloGenerator.py", line 33, in <module>
    import dendropy#To drop tips...
ImportError: No module named dendropy
ross@ross-envy:~/workspace/pearsepg/phyloGenerator-master$ ./phyloGenerator.py
Traceback (most recent call last):
  File "./phyloGenerator.py", line 33, in <module>
    import dendropy#To drop tips...
ImportError: No module named dendropy
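
The setup script printed "CONGRATULATIONS!" even though dendropy was not installed, so an explicit import check may be worth adding to its "Checking Python libraries" step. A minimal sketch; `missing_modules` is a hypothetical helper, and the module list in the usage note is inferred from the README and this traceback.

```python
import importlib


def missing_modules(names):
    """Return the subset of module names that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing
```

For example, `missing_modules(["Bio", "numpy", "scipy", "dendropy"])` would return `["dendropy"]` on the machine above, and setup could then stop with an instruction to install it rather than report success.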