
phylogenerator's People

Contributors

rossmounce, willpearse

phylogenerator's Issues

Is DendroPy required as a pre-installation for the linux install?

Just making sure I can run it ahead of the workshop...
I can't believe it's been over 3 years since I last looked at this! How time flies!

ross@ross-envy:~/workspace/pearsepg/phyloGenerator-master$ ./setupLinux.py /home/ross/workspace/pearsepg/BEASTv1.8.4/bin /home/ross/workspace/pearsepg/prank/bin/ /home/ross/workspace/pearsepg/pathd8 /home/ross/workspace/pearsepg/phylocom-4.2/src /home/ross/workspace/pearsepg/metal-linux64-1.1 /home/ross/workspace/pearsepg/trimAl/source /home/ross/workspace/pearsepg/clustalofolder/ /home/ross/workspace/pearsepg/standard-RAxML-master
Linux configuration script for phyloGenerator

Pass, as additional command line arguments, the path where you've downloaded all your files
	and *which folder contains BEAST*
	e.g., './setupLinux.py /home/will/phyloGenerator /home/will/phyloGenerator/BEAST\ v1.7.4'
Make sure all programs are executable - if unsure, make them so
	e.g., 'chmod +x NAMEOFPROGRAM'
The resulting 'requires' folder must contain only output from this script
Do not leave source code from the programs phyloGenerator uses in the same folder
	as phyloGenerator.py, or (for safety) in your output 'working directory'
	This will cause obscure-looking errors from phyloGenerator!
Checking and configuring external programs

ln: failed to create symbolic link '/home/ross/workspace/pearsepg/phyloGenerator-master/requires/mafft': File exists
ln: failed to create symbolic link '/home/ross/workspace/pearsepg/phyloGenerator-master/requires/muscle': File exists
ln: failed to create symbolic link '/home/ross/workspace/pearsepg/phyloGenerator-master/requires/prank': Permission denied
ln: failed to create symbolic link '/home/ross/workspace/pearsepg/phyloGenerator-master/requires/clustalo': Permission denied
ln: failed to create symbolic link '/home/ross/workspace/pearsepg/phyloGenerator-master/requires/metal': Permission denied
ln: failed to create symbolic link '/home/ross/workspace/pearsepg/phyloGenerator-master/requires/trimal': Permission denied
ln: failed to create symbolic link '/home/ross/workspace/pearsepg/phyloGenerator-master/requires/phylomatic': Permission denied
ln: failed to create symbolic link '/home/ross/workspace/pearsepg/phyloGenerator-master/requires/PATHd8': File exists
ln: failed to create symbolic link '/home/ross/workspace/pearsepg/phyloGenerator-master/requires/raxml': Permission denied
ln: failed to create symbolic link '/home/ross/workspace/pearsepg/phyloGenerator-master/requires/beast': File exists
Checking Python libraries

ln: failed to create symbolic link '/home/ross/workspace/pearsepg/phyloGenerator-master/requires/treeannotator': File exists


CONGRATULATIONS!
phyloGenerator is setup. You should now be able to run it by typing './phyloGenerator.py'
ross@ross-envy:~/workspace/pearsepg/phyloGenerator-master$ ./phyloGenerator.py
bash: ./phyloGenerator.py: Permission denied
ross@ross-envy:~/workspace/pearsepg/phyloGenerator-master$ chmod +x phyloGenerator.py 
ross@ross-envy:~/workspace/pearsepg/phyloGenerator-master$ ./phyloGenerator.py
Traceback (most recent call last):
  File "./phyloGenerator.py", line 33, in <module>
    import dendropy#To drop tips...
ImportError: No module named dendropy
ross@ross-envy:~/workspace/pearsepg/phyloGenerator-master$ ./phyloGenerator.py
Traceback (most recent call last):
  File "./phyloGenerator.py", line 33, in <module>
    import dendropy#To drop tips...
ImportError: No module named dendropy
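
The traceback shows phyloGenerator.py importing dendropy at line 33, so the script cannot start without DendroPy installed. A minimal sketch of a pre-flight check is below. The helper name `missing_modules` is hypothetical and not part of phyloGenerator; the module list combines dendropy (from the traceback) with the libraries the README is quoted as naming (Numpy, SciPy, Biopython).

```python
def missing_modules(names):
    """Return the module names that fail to import."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


# Assumed list: dendropy is confirmed by the traceback, the rest by the README.
REQUIRED = ["dendropy", "Bio", "numpy", "scipy"]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing Python modules: " + ", ".join(missing))
```

Running a check like this before the workshop would report every missing library at once, rather than one ImportError per attempt.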

Server error when checking NCBI for data

I keep getting stuck at this step no matter what I try. My internet was definitely working all the time throughout this:

ross@ross-envy:~/workspace/pearsepg$ ./phyloGenerator-master/phyloGenerator.py 


Welcome to phyloGenerator! Let's make a phylogeny!
---Please go to http://willpearse.github.com/phyloGenerator for help
---Written by Will Pearse ([email protected])

This program is easier to use with a wider console window
Mac/Linux: Drag the edge of your terminal window with the mouse
PC: Right click the command prompt icon, select properties,
	click the 'layout' tab, and increase 'screen buffer'
	and 'window' widths to at least '160'

When downloading sequence data, you will see warnings relating to
'missing DTD files. Do not be alarmed; this is normal, and will
have no effect on your output.

Please input a 'stem' name to act as a prefix to all output (e.g., 'stemName_phylogeny.tre')

Stem name: dog

Please input an *existing* directory for all your output
	(hit enter to use /home/ross/workspace/pearsepg
Working directory (/home/ross/workspace/pearsepg): 
Please enter the gene(s) you want to use (e.g., 'COI' for cytochrome oxidase one')
Specify 'aliases' (alternate names) by listing them after the main name using '-'
Gene names may contain spaces, or (as with the command line) they can be replaced with '_'
e.g., COI-this_is_an_alis-this is also an alias
If you wish to use the defaults for your taxa, please enter 'plant', 'invertebrate', or 'vertebrate' instead
Each gene on a separate line, and an empty line to continue

plant


DNA INPUT

If you already have DNA sequences in a FASTA file, please enter its location
If you have more than one set of sequences, please separate the file locations with commas
Otherwise, hit enter to continue

File locations: 

No DNA loaded

DNA DOWNLOAD

Please enter the location of the list of species for which you want to build a phylogeny
Each species must be on a new line

/home/ross/workspace/pearsepg/phyloGenerator-master/chinaspp.txt

6098 species loaded.
Please enter a valid email address to download sequence data from GenBank

Email: [email protected]

To use the referenceDownload method, enter locations of sequence files (on separate lines), finishing with an empty line.
Just hit enter to perform a standard search (this is probably the option you're looking for).
refDownload: 
Searching for: Acanthus ebracteatus
!!!Server error checking (((Acanthus ebracteatus[Organism]) AND rbcL[Gene]) NOT partial [Title]) NOT genome [Title]  - retrying...
!!!!!!Unreachable. Returning nothing.
Traceback (most recent call last):
  File "./phyloGenerator-master/phyloGenerator.py", line 4122, in <module>
    main()
  File "./phyloGenerator-master/phyloGenerator.py", line 4033, in main
    currentState.loadGenBank()
  File "./phyloGenerator-master/phyloGenerator.py", line 2333, in loadGenBank
    self.sequences, self.genes = findGenes(self.speciesNames, self.genes, seqChoice=self.seqChoice, verbose=True, download=True, thorough=True, targetNoGenes=self.nGenes, spacer=self.spacer, delay=self.delay, taxonIDs=self.taxonIDs)
  File "./phyloGenerator-master/phyloGenerator.py", line 476, in findGenes
    sequence, _ = sequenceDownload(speciesList[i], geneNames[k], noSeqs=noSeqs, includePartial=includePartial, includeGenome=includeGenome, seqChoice=seqChoice, download=download, thorough=thorough, retMax=retMax, taxonID=taxonIDs)
  File "./phyloGenerator-master/phyloGenerator.py", line 339, in sequenceDownload
    seq = dwnSeq(includeGenome=False, includePartial=False, gene=gene)
  File "./phyloGenerator-master/phyloGenerator.py", line 332, in dwnSeq
    if int(firstSearch['Count']):
TypeError: tuple indices must be integers, not str
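
The final TypeError suggests that, after the "Unreachable. Returning nothing." message, the search handed back a tuple that was then indexed with 'Count'. What the search function actually returns on failure is inferred from the error, not confirmed. A defensive sketch, with the hypothetical helper `hit_count`, might look like this:

```python
def hit_count(search_result):
    """Return the number of hits, or 0 if the result is not a usable record."""
    try:
        return int(search_result["Count"])
    except (TypeError, KeyError, ValueError):
        # TypeError: result is a tuple or None; KeyError: no 'Count' field;
        # ValueError: 'Count' is not a number.
        return 0
```

With a guard like this, an unreachable server would be treated as "no sequences found" for that species instead of ending the whole run.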

Browser interface

I got the following from a reviewer:

The command-line interface, while fairly well-designed, is still a potential problem for some users. An equivalent web browser-based interface would help and should be feasible: use Python's built-in CGI server (SimpleHTTPServer) to serve pages locally, and use the webbrowser module to load 'http://localhost' when launching the program. (Note that I'm not suggesting the authors host a public web server themselves.) Assuming you intend to maintain and improve phyloGenerator, I encourage you to look into doing this for a future release.

If you have strong views about the terminal-based interface, please let me know. I'm going to try and implement this, although it may well be at the expense of the terminal-based interface (i.e., I might not keep both going).

Cheers,

Will
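
For anyone curious what the reviewer's suggestion could look like, here is a minimal sketch using only the standard library. It is written for Python 3, where the relevant module is `http.server` (under Python 2 the equivalent was SimpleHTTPServer). The function names `start_local_server` and `launch` are hypothetical, and the sketch only serves static files; it does not run any phyloGenerator step.

```python
import functools
import threading
import webbrowser
from http.server import HTTPServer, SimpleHTTPRequestHandler


def start_local_server(directory, port=0):
    """Serve `directory` on localhost in a background thread.

    port=0 asks the operating system for any free port.
    Returns (server, url).
    """
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    server = HTTPServer(("127.0.0.1", port), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    url = "http://127.0.0.1:%d/" % server.server_address[1]
    return server, url


def launch(directory):
    """Start the server and open the default browser at its address."""
    server, url = start_local_server(directory)
    webbrowser.open(url)
    return server
```

Binding to 127.0.0.1 keeps the interface local to the user's machine, which matches the reviewer's note that no public web server is being suggested.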

Clarify requirements in README

The read-me file currently says:

Install Python >=2.6; Numpy and SciPy for Python; Biopython >=2.5.

The current version of Biopython is 1.60 (one, sixty), so the '>=2.5' requirement must be wrong.

Also, having looked at the code, it will not run under Python 3 as it stands; for instance, it uses print statements. Saying Python >=2.6 is therefore potentially confusing, as some users might try it under Python 3. I would suggest saying install Python 2.6 or 2.7 (since Python 2.7 is the final Python 2.x release).
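
Beyond rewording the README, the script could check the interpreter version at start-up. A minimal sketch, assuming the 2.6/2.7 range suggested here; the function names are hypothetical:

```python
import sys


def is_supported(version_info):
    """True for Python 2.6 or 2.7, the range suggested for the README."""
    return tuple(version_info[:2]) in ((2, 6), (2, 7))


def require_supported_python():
    """Exit with a readable message on any other interpreter."""
    if not is_supported(sys.version_info):
        sys.exit("phyloGenerator needs Python 2.6 or 2.7; found %d.%d"
                 % tuple(sys.version_info[:2]))
```

The check has to avoid Python 2-only syntax itself, otherwise a Python 3 user would see a SyntaxError before the message.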

Attempt to save output on exit

It would be nice if the program attempted to save something on exit if there's an error. Admittedly, there shouldn't be an error (...!...) but it would still be nice if it tried.
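
One possible shape for this, as a sketch only: wrap each stage so that the current state is pickled before the error propagates. The names `run_with_rescue`, `step`, `state` and `rescue_path` are all hypothetical, and the state object is assumed to be picklable.

```python
import pickle


def run_with_rescue(step, state, rescue_path):
    """Run step(state); on any failure, pickle `state` before re-raising."""
    try:
        return step(state)
    except BaseException:
        # BaseException also covers Ctrl-C (KeyboardInterrupt).
        with open(rescue_path, "wb") as handle:
            pickle.dump(state, handle)
        raise
```

Re-raising keeps the original traceback visible, so the user still sees what went wrong.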

APG III references

It would be nice if pG came with APGIII built into it, or at least a reference on the website...

Crash mode

...would be nice if pG wrote out on crashes as much as it could (e.g., after someone escapes a prank run...)

Can't unzip zip file for mac

I keep getting error messages when attempting to unzip the zip file. Wonder if you can try making another one, or making a tarball instead?

Server re-tries

...can hang. Sometimes this is GenBank, but sometimes it's pG and there's definitely a more intelligent way of handling this...
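
A common way to stop retries from hanging is to cap the number of attempts and wait longer between each one. The sketch below is a generic illustration, not phyloGenerator's current behaviour; `retry` and its parameters are hypothetical.

```python
import time


def retry(call, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call `call()` up to `attempts` times, doubling the wait each time."""
    last_error = None
    for attempt in range(attempts):
        try:
            return call()
        except Exception as error:  # narrow this to network errors in real use
            last_error = error
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    raise RuntimeError("gave up after %d attempts: %s" % (attempts, last_error))
```

Bounded attempts do not help if a single request never returns; the standard library's `socket.setdefaulttimeout` is one way to cap that as well.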

Aligning empty genes

pG will attempt to align empty genes (i.e., genes with no DNA data). It shouldn't do this, as (quite legitimately) alignment programs don't like it!
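
A sketch of the check, assuming sequences are stored so that `sequences[i][k]` is species i's entry for gene k and is a false value (None or '') when nothing was found. That layout is an assumption, and `genes_with_data` is a hypothetical helper.

```python
def genes_with_data(sequences, genes):
    """Return (index, name) for each gene that has at least one sequence."""
    keep = []
    for k, gene in enumerate(genes):
        # A gene is kept if any species has a non-empty entry in column k.
        if any(len(row) > k and row[k] for row in sequences):
            keep.append((k, gene))
    return keep
```

Only the genes returned here would then be passed to the alignment programs.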

Order of species placed in BEAST constraints seems to be reflected in phylogeny output

A user has reported that the order of species placed inside a large polytomy in their phylogeny is reflected in the branching order of those species in their output phylogeny.

I'm currently investigating; I'm not sure what could be causing their error, but would be grateful if anyone experiencing similar issues could send me their input files.

Thanks very much,

Will

COI trimming

COI is not always stored as COI in GenBank - need a way to have aliases of genes when searching/trimming...
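
One way aliases could feed into a search is to OR them together in a single clause using the `[Gene]` field tag. The sketch below only builds the query string; the function names are hypothetical and the alias names in the usage note are illustrative.

```python
def gene_clause(names):
    """Join a gene name and its aliases into one OR'd [Gene] clause."""
    return "(" + " OR ".join("%s[Gene]" % name for name in names) + ")"


def build_query(species, names):
    """Combine an organism filter with the gene clause."""
    return "(%s[Organism]) AND %s" % (species, gene_clause(names))
```

For example, `gene_clause(["COI", "COX1"])` gives `(COI[Gene] OR COX1[Gene])`.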

renameSequences error

Using my own fasta file I get this error.
The names look like this:

Anacanthocoris_striicornis
Except one, which is like this:
Diaphorina_citri diaci_nymph_66660000040632

Not sure which it is failing on.

ERROR

........
Other modes: 'reload', 'trim', 'replace', 'merge'. Hit enter to continue.

DNA Editing (delete):
Traceback (most recent call last):
  File "phyloGenerator.py", line 3730, in <module>
    main()
  File "phyloGenerator.py", line 3680, in main
    currentState.renameSequences()
  File "phyloGenerator.py", line 3267, in renameSequences
    if self.sequences[i][k]:
IndexError: list index out of range
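
Since it is unclear which name triggers the failure, a quick check of the FASTA headers can at least list the odd ones out. The sketch below flags headers containing whitespace, like the Diaphorina_citri one; whether that is the actual cause of the IndexError is not confirmed. `headers_with_whitespace` is a hypothetical helper.

```python
def headers_with_whitespace(fasta_text):
    """Return FASTA header names (without '>') that contain spaces or tabs."""
    flagged = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            name = line[1:].strip()
            if len(name.split()) > 1:
                flagged.append(name)
    return flagged
```

Renaming any flagged header to a single underscore-joined name and re-running would show whether the space is the problem.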

Unit tests?

Some of your more technically inclined potential users would be concerned at the lack of unit tests in the repository. The provision of test data is a good first step - and could be the basis of a test suite.

Once you have a basic test script that returns zero on success and non-zero on error, it could be used for automated testing. If you are not already familiar with TravisCI and its excellent GitHub integration, I would suggest looking into it (http://travis-ci.org/). This would require the Linux binaries to be available as Debian/Ubuntu packages, or as simple-to-download 32-bit Linux binaries.
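
As a sketch of what such a script could look like with the standard `unittest` module: the function under test, `parse_gene_spec`, is a hypothetical stand-in that splits a 'name-alias-alias' gene specification, and is not taken from phyloGenerator's code.

```python
import unittest


def parse_gene_spec(spec):
    """Split 'COI-alias one-alias_two' into (name, [aliases]); '_' means space."""
    parts = [part.replace("_", " ") for part in spec.split("-")]
    return parts[0], parts[1:]


class ParseGeneSpecTest(unittest.TestCase):
    def test_name_only(self):
        self.assertEqual(parse_gene_spec("COI"), ("COI", []))

    def test_aliases(self):
        self.assertEqual(parse_gene_spec("COI-cox_1-cytochrome oxidase"),
                         ("COI", ["cox 1", "cytochrome oxidase"]))


def run_tests():
    """Return 0 if every test passes, 1 otherwise (usable as an exit code)."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ParseGeneSpecTest)
    result = unittest.TextTestRunner().run(suite)
    return 0 if result.wasSuccessful() else 1
```

Ending the script with `sys.exit(run_tests())` gives the zero or non-zero exit status that a continuous integration service checks.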

Controlling for phylogeny in a comparative life-history study

I'd like to use your program to obtain family and order classification for around 2500 bird species across most of the 143 families identified by Sibley & Ahlquist (1990). I already have classification to genus level but anything else time saving would be fantastic!

Any help would be great!

Alistair Baxter

undo mode?

When doing a big thing (like thorough downloading), it might be an idea to have an 'undo' button.

That would probably be hard to write, but you could have a 'backup' option where the internal state is deepcopied to new pG.sequence slots, and then you could 'revert' back to them as needed.
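
A minimal sketch of the backup/revert idea using `copy.deepcopy`. The class name `Snapshots` and its methods are hypothetical; the state is assumed to be any deep-copyable object.

```python
import copy


class Snapshots(object):
    """Keep deep copies of a state object so it can be reverted."""

    def __init__(self):
        self._stack = []

    def backup(self, state):
        """Store an independent copy of the current state."""
        self._stack.append(copy.deepcopy(state))

    def revert(self):
        """Return the most recent backup, or raise if there is none."""
        if not self._stack:
            raise ValueError("nothing to revert to")
        return self._stack.pop()
```

Each backup costs a full copy of the sequence data, so in practice it may be worth taking one only before slow steps such as a thorough download.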

NCBI taxonomy checker

If a genus doesn't exist in NCBI, there's a chance the species name might be found somewhere else. This could lead to weird THOROUGH replacements (Eric's diatoms!)

Interactive tests

Set a list of 'select this, then this' -type options, which would also make a nice tutorial

PATHd8 issues on Linux

PATHd8 seems to be causing problems for some people - it's not running.

  File "phyloGenerator.py", line 3790, in <module>
    main()
  File "phyloGenerator.py", line 3764, in main
    currentState.rateSmooth()
  File "phyloGenerator.py", line 3260, in rateSmooth
    success = PATHd8()
  File "phyloGenerator.py", line 3161, in PATHd8
    self.smoothPhylogeny = rateSmooth(self.phylogeny, sequenceLength=length)
  File "phyloGenerator.py", line 1686, in rateSmooth
    with open(tempPATHd8Output, 'r') as tempFile:
IOError: [Errno 2] No such file or directory: 'tempPATHd8Output'
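
The IOError only says that the expected output file was never written, which hides whether PATHd8 failed to start or ran and produced nothing. A diagnostic sketch follows; `check_program_output` is a hypothetical helper and the paths passed to it are up to the caller.

```python
import os


def check_program_output(program_path, output_path):
    """Return a list of human-readable problems; empty means all looks fine."""
    problems = []
    if not os.path.isfile(program_path):
        problems.append("program not found: %s" % program_path)
    elif not os.access(program_path, os.X_OK):
        problems.append("program is not executable: %s" % program_path)
    if not os.path.isfile(output_path):
        problems.append("expected output missing: %s" % output_path)
    return problems
```

Reporting these before opening the output file would tell a Linux user whether the fix is 'chmod +x' or something inside PATHd8 itself.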

OSError: [Errno 21] Is a directory: 'standard-RAxML'

Yes, my alignment was terrible (matK interspersed with rbcL willy-nilly), but even with garbage in / garbage out it should still run through the pipeline, right? Any idea what might be happening here? Have I perhaps not installed RAxML correctly?

DNA ALIGNMENT

Choose one alignment method ('muscle', 'mafft', 'clustalo', 'prank'), or...
'everything' - all four and compare their outputs
'quick' - do only the first three
Return will use MAFFT; prank is very slow!

DNA Alignment (default - mafft):
Starting alignment...
...aligning gene no. 1
......with MAFFT

Alignment complete!

ALIGNMENT CHECKING

Gene: rbcL
ID  Alignment  Length  Med. Gaps  SD Gaps  Min-Max Gaps    Med. Gap Frac.  M-M Gap Frac.  Warn?
0   mafft      1764    800.0      266.41   249.0 - 1232.0  0.45            0.141-0.698    !!!!

'output' - write out alignments. I recommend you look at your alignment before continuing
'DNA' - return to DNA editing stage
'align' - return to alignment stage, discarding current alignments.
'trimal' - automatically trim your sequences using trimAl
'raxml=X' - run X RAxML runs for each alignment, and calculate the R-F distances between the trees and alignments (slow)
'metal' - calculate SSP distances between alignments using metal
'clustal-x2' - open the Clustal-X2 website to download this alignment viewer

TIPS:
*_If the column 'Warn?' has '!!!' in it, BEWARE! Your alignment likely has problems._
Bad sequences cause bad alignments. Be careful in the DNA check stage, and return there now if necessary
'output' your alignments, and open them in something like Clustal-X2. You will immediately see sequences that should be RELOADed or TRIMMed
When downloading Clustal, make sure you get the graphical Clustal-X2, not the command line version

Hit enter to continue and choose one final alignment per gene

Alignment Checking:output
...output written!
Alignment Checking:

CONSTRAINT TREE
I recommend you use a constraint tree with this program
'newick' - supply your own constraint tree
'phylomatic' - use Phylomatic to generate a tree
'taxonomy' - download the NCBI taxonomy for your species (does not generate a constraint tree)
Warning: Phylomatic can trim the end off species names, causing conflicts with phyloGenerator that are hard to detect. Rooted phylogenies are not valid constraints.
Otherwise, press enter to continue without a constraint tree.

TIPS:
If you choose 'taxonomy', it will be written out to your working directory now. Use that to make a constraint tree!
If you have access to a reference phylogeny, try using Phylomatic
A constraint tree makes your phylogeny much more likely to be right. Use one!

Constraint Method: taxonomy

Creating a 'taxonomy' for your species from GenBank
...lineages found!
Constraint Method:
...Continuing without constraint tree

PHYLOGENY BUILDING
You can either build a maximum likelihood tree ('raxml') or a Bayesian tree ('beast')
If unsure, hit enter to use RAxML - using BEAST safely will require some knowledge of phylogenetics

Phylogeny Building (default raxml):
...using RAxML...
RAXML:
'integratedBootstrap=X' - conduct X number of bootstraps and a thorough ML search in one run (!)
'restart=X' - conduct X number of full ML searches (!)
'partitions' - concatenate all genes into a single partition (not the default)
Specify multiple options with hyphens (e.g., 'restart=5-partitions'), but do not mix options marked with '(!)'
Hit enter to conduct one search

TIPS:
The integrated boostrap method is fast, and gives confidence intervals on your tree, and a value of 1000 is probably more than adequate for most trees
Phylogeny Building (RAxML - default 1 search): integratedBootstrap=1000
Traceback (most recent call last):
  File "./phyloGenerator.py", line 3790, in <module>
    main()
  File "./phyloGenerator.py", line 3756, in main
    currentState.phylogen()
  File "./phyloGenerator.py", line 3122, in phylogen
    raxmlSetup('')
  File "./phyloGenerator.py", line 2984, in raxmlSetup
    self.phylogeny = RAxML(align, method=self.phylogenyMethods+'localVersion', constraint=self.constraint, timeout=999999, partitions=partitions)
  File "./phyloGenerator.py", line 904, in RAxML
    os.remove(each)
OSError: [Errno 21] Is a directory: 'standard-RAxML'
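
The traceback ends with `os.remove(each)` failing on a directory named 'standard-RAxML', which suggests a cleanup step matched the RAxML source folder sitting in the working directory. That reading is an inference from the error. A sketch of a cleanup that skips anything that is not a regular file, with the hypothetical helper `remove_files_only`:

```python
import os


def remove_files_only(paths):
    """Delete regular files; return the paths skipped because they are not files."""
    skipped = []
    for path in paths:
        if os.path.isfile(path):
            os.remove(path)
        else:
            skipped.append(path)
    return skipped
```

Skipping rather than deleting directories is the cautious choice here, since the folder may hold the user's own RAxML download. In the meantime, moving 'standard-RAxML' out of the working directory may be worth trying.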
