deprekate / phanotate

PHANOTATE: a tool to annotate phage genomes.

License: GNU General Public License v3.0

Python 100.00%

genomics phage annotation annotate-phage-genomes

phanotate's Introduction

PHANOTATE is a tool to annotate phage genomes. It assumes that non-coding bases in a phage genome are disadvantageous, and populates a weighted graph to find the optimal path through the six frames of the DNA, where open reading frames are beneficial paths and gaps and overlaps are penalized paths.
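The path-finding idea can be illustrated with a toy example. The sketch below is not PHANOTATE's actual graph or scoring code: the function name `cheapest_path`, the node positions and all weights are hypothetical, and overlaps are left out so that every edge runs left to right. Positions along the genome are nodes, ORF edges get negative weights (beneficial), gap edges get positive weights (penalized), and the lowest-cost path from the first to the last position is returned.

```python
from collections import defaultdict

def cheapest_path(edges, source, target):
    """Return (cost, path) of the lowest-weight path in a DAG.

    edges: iterable of (left, right, weight) with left < right, so sorting
    the nodes numerically gives a valid topological order.
    """
    outgoing = defaultdict(list)
    nodes = {source, target}
    for left, right, weight in edges:
        outgoing[left].append((right, weight))
        nodes.update((left, right))

    best = {source: 0.0}   # lowest cost found so far for each node
    previous = {}          # back-pointers used to rebuild the path
    for node in sorted(nodes):
        if node not in best:
            continue       # node is unreachable from the source
        for right, weight in outgoing[node]:
            cost = best[node] + weight
            if right not in best or cost < best[right]:
                best[right] = cost
                previous[right] = node

    if target not in best:
        raise ValueError("no path to target")

    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return best[target], path[::-1]

# Hypothetical toy genome of 100 bases: two candidate ORFs and three gaps.
toy_edges = [
    (0, 10, 1.0),      # gap before the first ORF (penalized)
    (10, 60, -50.0),   # ORF (beneficial, negative weight)
    (60, 70, 1.0),     # short gap between the ORFs
    (70, 100, -20.0),  # second ORF
    (0, 100, 10.0),    # one long gap that skips both ORFs
]
cost, path = cheapest_path(toy_edges, 0, 100)
print(cost, path)
```

With these made-up weights the path through both ORFs is cheaper than the single long gap, which is the behaviour the description above is aiming for.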

To install PHANOTATE from PyPI:

pip3 install phanotate

or from source:

git clone https://github.com/deprekate/PHANOTATE.git
pip3 install PHANOTATE/.

PHANOTATE Example

Run on included sample data:

phanotate.py tests/NC_001416.1.fasta

The output lists the predicted ORFs and should look like:

125     187     +
191     736     +
741     2636    +
2633    2839    +
2836    4437    +
4319    5737    +
...
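For downstream scripting, the tabular output can be read with a few lines of Python. This is a minimal sketch, not part of PHANOTATE: the names `OrfCall` and `parse_tabular` are hypothetical, lines starting with `#` are treated as comments, and the optional CONTIG and SCORE columns follow the `#START STOP FRAME CONTIG SCORE` header that appears in output quoted further down this page. Coordinates are assumed to be 1-based and inclusive.

```python
from typing import List, NamedTuple, Optional

class OrfCall(NamedTuple):
    start: int
    stop: int
    strand: str
    contig: Optional[str]
    score: Optional[float]

def parse_tabular(text: str) -> List[OrfCall]:
    """Parse PHANOTATE-style tabular text into a list of OrfCall tuples."""
    calls = []
    for line in text.splitlines():
        line = line.strip()
        # skip blank lines, comment lines and "..." elision markers
        if not line or line.startswith("#") or line.startswith("..."):
            continue
        fields = line.split()
        contig = fields[3] if len(fields) > 3 else None
        score = float(fields[4]) if len(fields) > 4 else None
        calls.append(OrfCall(int(fields[0]), int(fields[1]), fields[2], contig, score))
    return calls

# Example text assembled from lines shown on this page.
example = """\
#id: NC_001416
#START STOP FRAME CONTIG SCORE
125     187     +
191     736     +       NC_001416       -3012.018383359018091774783529
"""
for call in parse_tabular(example):
    # inclusive coordinates, so the length is the difference plus one
    length = abs(call.stop - call.start) + 1
    print(call.start, call.stop, call.strand, length)
```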

PHANOTATE can write several output formats, selected with -f: genbank, gff, gff3, fasta

Output a genbank file that contains the genes and genome:

$ phanotate.py tests/phiX174.fasta -f genbank | head 
LOCUS       phiX174                 5386 bp 
FEATURES             Location/Qualifiers
     CDS             100..627
                     /note=score:-4.827981E+02
     CDS             687..1622
                     /note=score:-4.857517E+06
     CDS             1686..3227
                     /note=score:-3.785434E+10
     CDS             3224..3484
                     /note=score:-3.779878E+02

Output the nucleotide bases of the gene calls in fasta format:

$ phanotate.py tests/phiX174.fasta -f fna | head -n2
>phiX174_CDS_[100..627] [note=score:-4.827981E+02]
atgtttcagacttttatttctcgccataattcaaactttttttctgataagctggttctcacttctgttactccagcttcttcggcacctgttttacagacacctaaagctacatcgtcaacgttatattttgatagtttgacggttaatgctggtaatggtggttttcttcattgcattcagatggatacatctgtcaacgccgctaatcaggttgtttctgttggtgctgatattgcttttgatgccgaccctaaattttttgcctgtttggttcgctttgagtcttcttcggttccgactaccctcccgactgcctatgatgtttatcctttgaatggtcgccatgatggtggttattataccgtcaaggactgtgtgactattgacgtccttccccgtacgccgggcaataacgtttatgttggtttcatggtttggtctaactttaccgctactaaatgccgcggattggtttcgctgaatcaggttattaaagagattatttgtctccagccacttaagtga

Output the amino-acids of the gene calls in fasta format:

$ phanotate.py tests/phiX174.fasta -f faa | head -n2
>phiX174_CDS_[100..627] [note=score:-4.827981E+02]
MFQTFISRHNSNFFSDKLVLTSVTPASSAPVLQTPKATSSTLYFDSLTVNAGNGGFLHCIQMDTSVNAANQVVSVGADIAFDADPKFFACLVRFESSSVPTTLPTAYDVYPLNGRHDGGYYTVKDCVTIDVLPRTPGNNVYVGFMVWSNFTATKCRGLVSLNQVIKEIICLQPLK*
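The FASTA headers above carry the contig name, the location and the score, so they can be turned back into structured records. The following is a sketch under the assumption that every header follows the pattern shown in these examples (`<contig>_CDS_[<location>] [note=score:<score>]`, with `complement(...)` marking the reverse strand); `parse_header` is a hypothetical helper, not a PHANOTATE function.

```python
import re

HEADER = re.compile(
    r"^>?(?P<contig>.+)_CDS_\[(?P<loc>[^\]]+)\]\s+\[note=score:(?P<score>[^\]]+)\]"
)
LOCATION = re.compile(r"(?P<left>\d+)\.\.(?P<right>\d+)")

def parse_header(header):
    """Split a PHANOTATE-style FASTA header into its parts."""
    match = HEADER.match(header.strip())
    if match is None:
        raise ValueError(f"unrecognized header: {header!r}")
    location = match.group("loc")
    span = LOCATION.search(location)
    if span is None:
        raise ValueError(f"unrecognized location: {location!r}")
    return {
        "contig": match.group("contig"),
        "left": int(span.group("left")),
        "right": int(span.group("right")),
        # reverse-strand calls are written as complement(left..right)
        "strand": "-" if location.startswith("complement") else "+",
        "score": float(match.group("score")),
    }

print(parse_header(">phiX174_CDS_[100..627] [note=score:-4.827981E+02]"))
```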

phanotate's Issues

Include Aragorn for tRNA masking in addition to tRNAscan-SE

Hi,
I recently annotated phages with phanotate in an environment where tRNAscan-SE was installed, so tRNA masking used tRNAscan-SE. However, in my final NCBI submission I also added tRNA predictions from Aragorn, and a couple of those tRNAs overlapped a CDS.
Could you please support Aragorn for tRNA masking as well on systems where it is installed?
For now, I am going to drop the CDS manually.
Thank you,
Ilya.
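Until masking covers both tools, the overlapping calls can at least be found automatically. This is a rough sketch of such a post-processing check, assuming the CDS calls and the Aragorn tRNA predictions have already been parsed into 1-based inclusive coordinate pairs; the function names and the example coordinates are hypothetical.

```python
def overlaps(a, b):
    """True if two 1-based inclusive intervals (left, right) share a base."""
    return a[0] <= b[1] and b[0] <= a[1]

def cds_overlapping_trnas(cds_calls, trnas):
    """Return [(cds, [overlapping tRNAs])] for every CDS that hits a tRNA.

    cds_calls: tuples whose first two items are start and stop (any order).
    trnas:     (start, stop) tuples (any order).
    """
    flagged = []
    for cds in cds_calls:
        left, right = sorted(cds[:2])
        hits = [t for t in trnas if overlaps((left, right), tuple(sorted(t)))]
        if hits:
            flagged.append((cds, hits))
    return flagged

# Hypothetical coordinates, for illustration only.
cds_calls = [(100, 627, "+"), (687, 1622, "+"), (3000, 2500, "-")]
trnas = [(600, 672), (2990, 3060)]
for cds, hits in cds_overlapping_trnas(cds_calls, trnas):
    print(cds, "overlaps", hits)
```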

AttributeError: 'int' object has no attribute 'position'

Occasionally, phanotate gives the error:

Traceback (most recent call last):
  File "PHANOTATE/phanotate.py", line 61, in <module>
    file_handling.write_output(id, args, my_path, my_graph, my_orfs)
  File "PHANOTATE/lib/file_handling.py", line 80, in write_output
    if(left.position == 0 and right.position == last_node.position):
AttributeError: 'int' object has no attribute 'position'

Here is the 1.9Kb sequence that generates the error:
https://www.dropbox.com/s/bv56zgc0ohf0w1z/test2.fna?dl=0

Prodigal runs fine and identifies 5 genes on this sequence.

Any ideas what could be wrong?

How to use the output file .gb to assign functions to ORFs.

Hello,
Thanks for developing this tool. I completed the annotation of some phages, but I don't know how to assign functions to those ORFs. When I try to use the genbank file in RAST or PROKKA, those programs re-call the ORFs, so the PHANOTATE calls are not preserved.

Thanks a lot

Inquiry on upload

Do you have any plans to upload the newest version to the Bioconda repository?

Thanks

Cutoff for phanotate score

Hi!

Thank you for Phanotate, the tool is really useful and easy to use. I've been using it for my own work, and I was wondering what is a safe cutoff for selecting the ORFs predicted by Phanotate?

Also, I've written a short script to parse the Phanotate output and spit out fasta-format. Please use it as you want/need.

Thank you so much !

Cheers

#!/usr/bin/env python3
import argparse

from Bio import SeqIO
from Bio.SeqRecord import SeqRecord

def get_args():
    """Get command-line arguments"""

    parser = argparse.ArgumentParser(
        description='Parse results phanotate',
        formatter_class=argparse.ArgumentDefaultsHelpFormatter)

    parser.add_argument('-i',
                        '--pinput',
                        help='Phanotate output file',
                        metavar='PINPUT',
                        type=str,
                        default="")

    parser.add_argument('-f',
                        '--finput',
                        help='Sequence fasta file',
                        metavar='FINPUT',
                        type=str,
                        default="")

    parser.add_argument('-p',
                        '--protein',
                        help='Protein output file name',
                        metavar='PROT',
                        type=str,
                        default="")

    parser.add_argument('-g',
                        '--gene',
                        help='Gene output file name',
                        metavar='GENE',
                        type=str,
                        default="")

    return parser.parse_args()

def main():
    args = get_args()

    # index the input contigs by their fasta identifier
    record_dict = SeqIO.to_dict(SeqIO.parse(args.finput, "fasta"))

    phanotate_CDS = []
    phanotate_proteins = []

    # read the phanotate tabular output (START STOP FRAME CONTIG SCORE)
    with open(args.pinput) as f:
        for line in f:
            # skip blank lines and the comment lines from the file
            if not line.strip() or line.startswith('#'):
                continue
            data = line.split()
            curr_record = record_dict[data[3]]

            # coordinates are treated as 1-based and inclusive, so the
            # python slice runs from (smaller - 1) up to larger
            left, right = sorted((int(data[0]), int(data[1])))
            curr_CDS = curr_record.seq[left - 1:right]

            # get reverse complement if strand == -
            if data[2] == "-":
                curr_CDS = curr_CDS.reverse_complement()

            # get a name for the CDS and create the records
            CDS_id = data[3] + "_CDS_" + str(len(phanotate_CDS) + 1)
            phanotate_CDS.append(
                SeqRecord(curr_CDS, id=CDS_id, name='', description=''))
            phanotate_proteins.append(
                SeqRecord(curr_CDS.translate(), id=CDS_id, name='', description=''))

            # print verifications
            print(CDS_id, str(left) + "--" + str(right), data[2])

    # write gene (nucleotide) output file
    SeqIO.write(phanotate_CDS, args.gene, "fasta")

    # write protein output file
    SeqIO.write(phanotate_proteins, args.protein, "fasta")

if __name__ == '__main__':
    main()

Develop: incorrect output

The develop branch has two output errors:

1. Genbank files (printed to STDOUT) occasionally contain an additional line of the format:

92033_gap 92355_gap

2. Fasta files (whether written to STDOUT or to a file) crash with the error:

  File "/home/edwa0468/GitHubs/PHANOTATE/venv/lib/python3.9/site-packages/phanotate-2.0-py3.9-linux-x86_64.egg/phanotate/file_handling.py", line 157, in write_fasta
    if(feature.type == 'CDS'):
AttributeError: 'NoneType' object has no attribute 'type'

An additional gap object, which is not a CDS, is added to the record, and so the code

feature = contig_orfs.get_feature(left, right)

returns None.

For (1), the gap lines are printed at line 126 of file_handling.py; for (2), the None is not handled correctly at lines 155-157 of file_handling.py.

No ORFs for 38kb phage with 33 Prodigal genes

I've now run phanotate on a large number of phage sequences. It seems to produce output in most cases. However, for a small subset of sequences, no ORFs are found. These sequences are often >20Kb and contain numerous genes called by Prodigal.

Here is one example you can download and test yourself:
https://www.dropbox.com/s/xxpagf1qvcj6f57/test.fna?dl=0

In this example, the sequence is a 38kb phage with 33 Prodigal genes.

Any ideas what might be going on here?

Continuously increasing RAM demand

Hi Katelyn,

thanks for PHANOTATE, great tool!

I'm using it for larger sets of viral contigs, and noticed that it does not seem to free up RAM between contigs. On my current set with 100+ sequences, it needs >5GB RAM towards the end. Obviously, I can split the input file to get around that, but I suspect it wouldn't be too difficult to free the memory after a contig has been processed. That might make sense as an improvement for future versions.

Cheers
Thomas
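The splitting workaround mentioned above needs only the standard library. This is a minimal sketch, assuming a plain multi-FASTA input; `split_fasta` and the `contig_00001.fasta` naming scheme are hypothetical choices, not anything PHANOTATE provides.

```python
import tempfile
from pathlib import Path

def split_fasta(fasta_path, out_dir):
    """Write each record of a multi-FASTA file to its own file.

    Returns the list of files written, in input order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    handle = None
    with open(fasta_path) as source:
        for line in source:
            if line.startswith(">"):
                # a new record starts: close the previous file, open the next
                if handle is not None:
                    handle.close()
                path = out_dir / f"contig_{len(written) + 1:05d}.fasta"
                handle = open(path, "w")
                written.append(path)
            if handle is not None:
                handle.write(line)
    if handle is not None:
        handle.close()
    return written

# Small self-contained demonstration using a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    multi = Path(tmp) / "contigs.fasta"
    multi.write_text(">contig_a\nACGT\nACGT\n>contig_b\nGGCC\n")
    parts = split_fasta(multi, Path(tmp) / "split")
    print([p.name for p in parts])
```

Each resulting file can then be passed to phanotate.py in a separate process, so memory is released when that process exits.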

"parallel edges are forbidden" error in v1.6.4 but not in v1.5.1

Hi,

On certain sequences (perhaps because they have large gaps between ORFs), phanotate v1.6.4 throws an error. This is not observed with phanotate v1.5.1, which returns ORF predictions.

(phanotate_164) mjt % phanotate.py --version
1.6.4
(phanotate_164) mjt % phanotate.py -f tabular uParvo3481.fasta -o uParvo3481.phan_v1.6.4.tsv
Traceback (most recent call last):
  File "/Users/michaeltisza/miniconda3/envs/phanotate_164/bin/phanotate.py", line 49, in <module>
    graph = functions.get_graph(orfs)
  File "/Users/michaeltisza/miniconda3/envs/phanotate_164/lib/python3.10/site-packages/phanotate_modules/functions.py", line 429, in get_graph
    G.add_edge(Edge(left_node, right_node, score ))	
  File "/Users/michaeltisza/miniconda3/envs/phanotate_164/lib/python3.10/site-packages/phanotate_modules/graphs.py", line 74, in add_edge
    raise ValueError("parallel edges are forbidden")
ValueError: parallel edges are forbidden

The sequence below was used as the input. Notably, this contig is not predicted to belong to a phage genome, and therefore this may be the intended behavior of phanotate.

Best,

Mike

uParvo_sus_pigfeces_SRR11413591_3481
CTGATGAAGATAATAATTTATCAGATCTTGTAAAAATATCTCCAAAATATTTCAAGATTTCTACTGGTGGTGAAAC
TACTATTAATCCTGCATTTAGTACTGACAGTACAAATGCTGATCCTGATTATCGTTTATGGAGTATGACAAAACAA
GGACTTGATCAAAATAACTCTCCTTGTTTAAATTATAACATTTGGAAGAATGCTAGTAGATTTTATATATTCAGTT
TTGCTGAGAATTTCAGCTTACTTGATAATAATAACTATATTAATTATGAATTAAGATTTTCTGACGACGATACTGT
TGATATTCCTGAAAAGGTCAATATTCATAGAATTTATTTGAAGGATTATCTAACTATTTTTGAAAGTCAAACTGAA
TAATTGAGAAGAATTTATTCGTTGAAACAACTTTTTTACTTTGTTTCTAACAGTGACGTAAGTGACGATAGTGACG
ATACTTTATAAACTTTTAAAATAGAACAGATTTTCAGTTTTTCTTAAAATCTTAAAAATAAAAATCTTTTATTAAA
TTATCGTCACTATTGTCACTGTAAATTATAAAATCAAGCAAATTAAACAAAATATAGTAAAATCATCTGTATGACA
TTCCCTTAAACGTTAAAAAGTAGGGAGTGACGTTAAATATAAATTATCGTCACTAGTGTCGATAATTAACGTCACT
GTTAAAGATTTCTAGAAAAATTTAGACTAACTGTTAAAATTATTATCGTCACTGTTTAAATTATTATCGTCACTGT
TAAATATTAAGTGACGTTAAACAAAGTGAAACGCTGAATTGCTTATTCAAAGTAAAAGAATTAATTTAAGATTTTT
AATCTTCTTCAAATTCACAAGCATCAATACGATCTTCTGGTAAAATATTATATTTTTCCTCAATAAGATCTCTTAC
ATAGTGTAAATCAATTAAACGAATACCATTAGAGCGTCTATATACATTATTTCCAAATAATTTGGTCATTTCTTTA
CTAATATTACGTATACTAAGATCTTTAATATCCTCATCAGCACAAATTCTAATTTGATTTACAAGTTCAGATGGAG
TATATGTATCAAATAACTTATTATAATATATAACTTCCACAATCCATGGCATATTACATCTTTGCATAGTATCTTT
TGAAGGAGTATATATATTTCTATGGAAATCATATGAAGAATCATACCATTCACAAGCTTTATGATAACAACTTGAT
AAGAAATCAGGATTATTTTCCCATTTTGTTCTAAATTCTTTTGCAAATGATTTTTTAACAGGAACCCCTTGTAAAT
TATAAAATAATCCACGACGTTCACCAGTTTCCCATTCAAGAGGTATACCTTGAGTAGTATTACTAAATAAGAATAA
TGTTGATTTATTATCTTTTGTTTCAATATGACCATATTTATGATTTACTGTAACTTCAGATGCTGTAATATAATCT
TTAATCTTATTAACAGTATCAGTGGATCTATCATGACATTCATTAATCATAACAACACATTTACCTGCATTTACAT
TAAATTGTCCAACAACTTTATCTAGTGTTGATTTTGCAGTTTCAATAGGATCATACCACTTACTTACTATATCAAA
AAGAGTATCTTTACCATATCCTTGATTTGATGCTATACAAACAGCAATTTCAGTACGATATCCAGGTTTCATAGCA
AGTGATCCTAATAGTTTCTTAAGAGGTTCAATATATTGTAATTGTAATTCGACACAATTTGATTCACCATATGCAA
AAGTGTTAACAAATTTTTCAAAATCATCACCAATTTCTTTATTGTAATGTCCATTAATAATTTCAGGTTTTATGCC
CATAAACATACTGTAATATAAATCCCCATTTTGTGTTTTTATTAACCATTTACGAGGATTATCAATATCATCTTTT
CTATATGCAGTTTCAATATAAGAGTATAAGTGTTCACATCCGTTTTGAGCTAAAATTGCTTTAAAATCACTAGGTT
TATACATTGTCATATTATTAATAGTATATCTGCACATTACAGTATTTGTCATAATATCAATAACAACATACTGTCT
AATATAATTAATTTGTCTGTCAATACGATCTCCTTGTTTTAATGACTTTAATTTTCCAAAACTGAATGGTTCATTT
GGATATTCAATACATTCTGGTTTTGCTGATAAATATTCAGCATATGTGTTATATTTACATGGCTGATATTCCATAG
GATCAAAATCATTTTGTTCGAAATCTTTTGATATAAATTTGATATTATCAAGAGGAATATAGAAATCAATATTATG
TATATATTCATTAATAGCATCTGTTAATTGATTAGGTGTTAATTTAATTTTTTCTAGTTGATCCTTTCTTATCATG
CATCCATCAAAACAATATATGCATGTACTAATATCAATACCAAGATCACTTAATTTCTTATAAACTTGACGAATTA
TATTAACTTCTAAGAATTGCATATATCTACTAATACATGAATTATCAAGTTCATATTGTGATAATTTTCCTGTTGA
TTTACTGTCTGCAATAAGAATATTCTTTAATTTACAACAATGTTGTTTACCATCAGGATTTTGATACTTATTATTT
TTAATAGTACAGTTTGTTTTAATAGTAATTTTATTAGCATGATCAATAATAATTTTTCTTGAATTTTGCATTTCAT
TATAATAATTAATAACAAATTCTGTTGGTTGTACATTTTTAGCATTATTATTATACCAAGTTCTATAATTACCACC
AAATCCAATCATTAAAAATAATCTTTTTGCTAAATCACGTGAAACTTTACATGATTCCATTATTTCTTTAAGATAT
TTATCTCTATTATTTATGTATTCACCCAAATACTTACCATTTGTTATTGAATACATAAAATTAGGATATGCATTAA
CAATATCAATATCGATATAATTATCTTTATATAATGCAGCTCTTATTTCTCTCATAAAATGACAGGCACCACAAGC
ATTACAATTATTTACAGTTGCATATTTTCTATATAATCCGAATTTGTCACCACCACCTGATGAATATAGAATATCT
ATTGATTTATCTGCTGATAAAACATTTTTATTAATAGAATTTTCTAAATAACATAAATAAACTGATGGATTTACAC
CTTGACTGATTTCAACATTATTTCTTAATATTTTTTCTTTTTCTATTAAAAAAGAAATACATCTTCTTAAAGTGTC
TTCATTGTATACTTCATAGTAAGGTTGGCATAAAATATCAGATTCCATTTTACTTTGTATTTATTTAATAAATAAA
AAAAATTTTTAAAATAGAAAATAAAAATTAGTTAGAATTAATTTAGATAAAATTTATGATTACTAAGAAAAAACTT
GCAGTGACGATAATAGTTTAAACAGTTAGTCTAAATTTTTCTAGAAATCTTTAACAGTGCCGATATTTATCGTCAC
TAGTGACGTTAAACACCAAATTATCGTCACTGTCAATTACTATCAATTTGTTACTATTTAA

Interpretation of results

Hi deprekate,

I have used Phanotate but I am not sure what the output means. I ran it on a single metagenome-assembled genome (MAG) and got multiple start and stop codon frames within a single contig.

The scores are all negative; are those E-values? I am not sure which contig to choose.
After that, what is the best way to identify phages? BLASTing the sequences?

I am new in extracting phage sequences from metagenomics so any advice is greatly appreciated.

Cheers and many thanks

Alan

fastpathz cannot traverse the network: TypeError: no path to target

Running phanotate on some phage gives me this error. The below is from Hubei odonate virus 11:

(phan.env) [jmeppley@tyrosine phanotate]$ phanotate.py -o test/NC_032956.ncbi.faa -f fasta test/NC_032956.ncbi.fasta 
/mnt/data0/jmeppley/projects/nanopore_biller/viemes_by_depth/phage_clusters/phanotate/phan.env/bin/phanotate.py:4: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
  __import__('pkg_resources').require('phanotate==1.6.3')
Traceback (most recent call last):
  File "/mnt/data0/jmeppley/projects/nanopore_biller/viemes_by_depth/phage_clusters/phanotate/phan.env/bin/phanotate.py", line 7, in <module>
    exec(compile(f.read(), __file__, 'exec'))
  File "/mnt/data0/jmeppley/projects/nanopore_biller/viemes_by_depth/phage_clusters/phanotate/PHANOTATE/phanotate.py", line 63, in <module>
    shortest_path = fz.get_path(source=source, target=target)
TypeError: No path to target

I have seen the error in phanotate version 1.5.1 (python 3.7 and 3.10) and the latest from github (1.6.3) on python 3.10.

Can you create a release with the current version?

Doing so I can refer to the release rather than the repository, for reproducibility.

Releases page: https://github.com/deprekate/PHANOTATE/releases

Thanks!
Andrea

Different outputs between 1.5.1 and 1.6.1. Also, odd hash symbols.

Hello,

We are noticing that the ORFs called in 1.5.1 are different than 1.6.1.
Also, we are seeing +, *, and # randomly being put in sequences in 1.6.1.
I assume that the * are stop codons?
What are the + and # symbols?

For example:

J02459.1_CDS_[complement(25396..26973)] [note=score:-7.092482E+07]
YNLKSQ#LSPPI#GRHLFLFRKDARLPTLLIRKQCYKIEQKLLVPLFTL+PLQF#PKHPVFLLHSISNYLFFKSVEGLYKKRNPLS#LMIQLESFNSLPKFVLEEFR#VLNHHTGLANRL#LLLSQELWSNESPHESL#VPVKRAHNLPNGLNSLKVKSKLSFSRGPKSLNYGKLHCQLRYGSCL#VRK+PALPTPFCC#NLPY#PR+WLGLLLLIKKRLLLSE+LSS+GFPIKASEQSQSKSRKLGKGGFLVGLGRFPCVMKI#PEFLRRSLSNFSDPLEANLK#SRK#PTYSHLSFSVETELR+FFDSSLLLKSLYLE+SE#RFILLNSALLGV#VHLLLFKFNKILL+VFIDEAYSRPVR#QNK*+CSNHLQQPLFSNQNKLLGLQVDVGGNERRKNACNSLNELRALPHRY#RVRGHHDVLQGFRTYTFQDASSLRYL#+AGL#LCKPLLNPQNALHKNELHCLRPMVVNNSVRQLSLERILW#DFLILPVYPNLPAWQNFRYYYLSLLPFHVT

HMMER and DIAMOND don't like the # symbols. Do we filter based on score? Can we remove them? Are they stop codons?
What are they?

many thanks,
Rick
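As a stopgap for tools that reject these characters, the symbols can be masked before searching. This sketch only makes the file parseable: what `#` and `+` stand for is not stated in this thread, so replacing them with `X` is an assumption made here for illustration, and `mask_symbols` and `mask_fasta` are hypothetical helper names. A sequence containing many such symbols may simply not be a real protein.

```python
def mask_symbols(protein, symbols="#+", replacement="X"):
    """Replace each character listed in `symbols` with `replacement`."""
    table = str.maketrans({symbol: replacement for symbol in symbols})
    return protein.translate(table)

def mask_fasta(lines, symbols="#+", replacement="X"):
    """Yield FASTA lines with sequence lines masked and headers untouched."""
    for line in lines:
        if line.startswith(">"):
            yield line
        else:
            yield mask_symbols(line, symbols, replacement)

# Fragment of the sequence quoted above.
fragment = "YNLKSQ#LSPPI#GRHLFLFRKDARLPTLLIRKQCYKIEQKLLVPLFTL+PLQF#PKHP"
print(mask_symbols(fragment))
print(fragment.count("#"), fragment.count("+"))
```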

pip install failed

Hello, and thanks for this tool! When I try to install this software with "pip install phanotate", I get the following error:

Python 3.6.7
GCC 7.3.0

(pgcgap) [lyj@admin Tools]$ pip install phanotate
Processing /mnt/data/home/lyj/.cache/pip/wheels/35/9c/ec/dce8f8b32571277dd99d2c2453f62170c528add4071acaabce/phanotate-1.4.0-cp36-cp36m-linux_x86_64.whl
Collecting fastpath>=1.2
Using cached fastpath-1.2.tar.gz (32 kB)
Building wheels for collected packages: fastpath
Building wheel for fastpath (setup.py) ... error
ERROR: Command errored out with exit status 1:
command: /mnt/data/home/lyj/miniconda2/envs/pgcgap/bin/python3.6 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-lfrz6s87/fastpath/setup.py'"'"'; file='"'"'/tmp/pip-install-lfrz6s87/fastpath/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-kjx0k26r
cwd: /tmp/pip-install-lfrz6s87/fastpath/
Complete output (10 lines):
running bdist_wheel
running build
running build_ext
building 'fastpath' extension
creating build
creating build/temp.linux-x86_64-3.6
creating build/temp.linux-x86_64-3.6/src
gcc -DNDEBUG -fwrapv -O2 -Wall -Wstrict-prototypes -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /mnt/data/home/lyj/miniconda2/envs/pgcgap/include -DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem /mnt/data/home/lyj/miniconda2/envs/pgcgap/include -fPIC -I. -I... -I/tmp/pip-install-lfrz6s87/fastpath/include -I/mnt/data/home/lyj/miniconda2/envs/pgcgap/include/python3.6m -c src/fastpath-py.c -o build/temp.linux-x86_64-3.6/src/fastpath-py.o
gcc: error: unrecognized command line option ‘-fno-plt’
error: command 'gcc' failed with exit status 1

ERROR: Failed building wheel for fastpath
Running setup.py clean for fastpath
Failed to build fastpath
Installing collected packages: fastpath, phanotate
Running setup.py install for fastpath ... error
ERROR: Command errored out with exit status 1:
command: /mnt/data/home/lyj/miniconda2/envs/pgcgap/bin/python3.6 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-lfrz6s87/fastpath/setup.py'"'"'; file='"'"'/tmp/pip-install-lfrz6s87/fastpath/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record /tmp/pip-record-8qq524vd/install-record.txt --single-version-externally-managed --compile --install-headers /mnt/data/home/lyj/miniconda2/envs/pgcgap/include/python3.6m/fastpath
cwd: /tmp/pip-install-lfrz6s87/fastpath/
Complete output (10 lines):
running install
running build
running build_ext
building 'fastpath' extension
creating build
creating build/temp.linux-x86_64-3.6
creating build/temp.linux-x86_64-3.6/src
gcc -DNDEBUG -fwrapv -O2 -Wall -Wstrict-prototypes -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /mnt/data/home/lyj/miniconda2/envs/pgcgap/include -DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem /mnt/data/home/lyj/miniconda2/envs/pgcgap/include -fPIC -I. -I... -I/tmp/pip-install-lfrz6s87/fastpath/include -I/mnt/data/home/lyj/miniconda2/envs/pgcgap/include/python3.6m -c src/fastpath-py.c -o build/temp.linux-x86_64-3.6/src/fastpath-py.o
gcc: error: unrecognized command line option ‘-fno-plt’
error: command 'gcc' failed with exit status 1
----------------------------------------
ERROR: Command errored out with exit status 1: /mnt/data/home/lyj/miniconda2/envs/pgcgap/bin/python3.6 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-lfrz6s87/fastpath/setup.py'"'"'; file='"'"'/tmp/pip-install-lfrz6s87/fastpath/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record /tmp/pip-record-8qq524vd/install-record.txt --single-version-externally-managed --compile --install-headers /mnt/data/home/lyj/miniconda2/envs/pgcgap/include/python3.6m/fastpath Check the logs for full command output.

Any help is greatly appreciated!

YangjieLi

Add a zenodo id

Can you please link to zenodo and then make a new release and you will get a DOI. Then you can add the badge, too

Warning: tRNAscan not found, proceding without tRNA masking.

Dear developer,
Thank you for making this pipeline publicly available.

Why this warning? Does this mean the tool cannot detect the tRNA sequence region when it is present?

Below the output on my run:

mifer@fmichodigni:~/PHANOTATE$ ./phanotate.py tests/NC_001416.1.fasta
Warning: tRNAscan not found, proceding without tRNA masking.
#id: NC_001416
#START STOP FRAME CONTIG SCORE
191 736 + NC_001416 -3012.018383359018091774783529
...............

Protein sequence from gene call

What's the best way to determine if an ORF is protein coding and the protein sequence that it encodes?
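For the second half of the question, the protein encoded by a called ORF can be obtained by slicing the genome and translating. The sketch below uses only the standard library and the standard codon assignments (which the bacterial and phage table 11 shares); it assumes 1-based inclusive coordinates, and `translate` and `orf_protein` are hypothetical helpers rather than PHANOTATE functions. It says nothing about whether the ORF is truly protein coding.

```python
from itertools import product

# Standard genetic code in NCBI order (bases T, C, A, G).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(dna):
    return dna.translate(COMPLEMENT)[::-1]

def translate(dna):
    """Translate DNA codon by codon; unknown codons become X."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3   # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))

def orf_protein(genome, start, stop, strand):
    """Protein for an ORF given 1-based inclusive coordinates and a strand."""
    left, right = sorted((start, stop))
    dna = genome[left - 1:right]
    if strand == "-":
        dna = reverse_complement(dna)
    return translate(dna)

# First codons of the phiX174 gene call shown in the README, plus a stop codon.
print(translate("atgtttcagacttaa"))
```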

Protein Fasta format output (.faa)?

How do you output a protein fasta (.faa)?
Also, I keep getting an error for start/stop options?
Do I use prodigal on the orfs this calls?

Makefile error

Showing 2 errors during install below:

git clone --recursive git@github.com:deprekate/PHANOTATE.git
Cloning into 'PHANOTATE'...
git@github.com: Permission denied (publickey).
fatal: Could not read from remote repository.

PHANOTATE-master kshingle$ make
cd fastpath && /Applications/Xcode.app/Contents/Developer/usr/bin/make
make[1]: *** No targets specified and no makefile found. Stop.
make: *** [all] Error 2

multithreading

Any chance you have multithreading implemented? If I have a single file with thousands of genomes it'd be super helpful to not have to split it into thousands of little files under the hood.

Thanks so much,
Braden
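In the meantime, several phanotate.py processes can be run side by side from a small driver script. This is a sketch, not a built-in feature: it assumes the genomes are already in separate files, that `phanotate.py` is on the PATH, and that the `-f tabular` and `-o` options behave as in the commands quoted elsewhere on this page. `build_command` and `run_all` are hypothetical names, and the `runner` parameter exists only so the logic can be tried without PHANOTATE installed.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def build_command(fasta, out_dir, executable="phanotate.py"):
    """Command line for one genome, writing <name>.tsv into out_dir."""
    out = Path(out_dir) / (Path(fasta).stem + ".tsv")
    return [executable, str(fasta), "-f", "tabular", "-o", str(out)]

def run_all(fasta_files, out_dir, workers=4, runner=subprocess.run):
    """Run one process per input file, at most `workers` at a time.

    Threads are enough here because each thread only waits for its
    child process. Returns a list of (command, result) pairs.
    """
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    commands = [build_command(fasta, out_dir) for fasta in fasta_files]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda cmd: runner(cmd, check=False), commands))
    return list(zip(commands, results))

# Show the command that would be run for one (hypothetical) input file.
print(build_command("phage1.fasta", "results"))
```

A call such as `run_all(sorted(Path("genomes").glob("*.fasta")), "results", workers=8)` would then process a directory of genomes; the directory names here are placeholders.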

suitable for metagenomic assemblies?

Dear deprekate

Thank you so much for developing this, I'm sure it'll come extremely useful for anyone interested in phages :)

Just wanted to ask you if PHANOTATE is suitable for gene calling in phage contigs retrieved from metagenomic assemblies?

Thanks!

missing phanotate_lastal_alignments.tgz

The alignments file is not available anymore: https://edwards.sdsu.edu/data/phanotate_lastal_alignments.tgz
Is there another way of recreating the proteins?

Truncated proteins at genome end

Hi,
I am encountering cases where there are truncated proteins (missing stop codon) called by phanotate at the genome end. The common case was when there was another protein on the other side of the contig which was missing a start codon. Thus, when rotating the genome, a complete CDS would be found.
However, after rotating the genome, I still see rare cases of truncated proteins at genome end for which I can't find a logical continuation on the other side of the contig. Is this the intended behavior? Should I discard these proteins in post processing?
Thanks,
Ilya.
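Truncated calls like these can be flagged automatically before deciding what to do with them. The check below is a sketch that assumes the nucleotide sequences come from the `-f fna` output and are already in coding orientation; the start codon set (ATG, GTG, TTG) is an assumption about which starts are acceptable, and `cds_problems` is a hypothetical helper.

```python
START_CODONS = {"ATG", "GTG", "TTG"}   # assumed set of acceptable starts
STOP_CODONS = {"TAA", "TAG", "TGA"}

def cds_problems(dna):
    """Return a list of reasons why a CDS looks incomplete (empty if none)."""
    dna = dna.upper()
    problems = []
    if len(dna) % 3:
        problems.append("length not a multiple of 3")
    if dna[:3] not in START_CODONS:
        problems.append("no start codon")
    if dna[-3:] not in STOP_CODONS:
        problems.append("no stop codon")
    return problems

# Hypothetical gene calls keyed by an identifier.
calls = {
    "complete": "ATGTTTCAGACTTAA",
    "truncated_at_end": "ATGTTTCAGACT",
    "truncated_at_start": "TTTCAGACTTAA",
}
for name, dna in calls.items():
    print(name, cds_problems(dna) or "ok")
```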

TypeError: a bytes-like object is required, not 'str'

Hello! I've been happily calling genes using phanotate but I ran across 4 genomes on NCBI that mysteriously resist my attempts: MG945313, MG945322, MK765556 and MK765660. They all give me the error pasted below. It does not have anything to do with the file name, fasta header or the file itself as far as I can tell, since changing those makes no difference. Simply adding the offending genome into a previously working file causes an error so it appears to be the nucleotide sequence?

Traceback (most recent call last):
  File "/stor/work/Ochman/paul/PHANOTATE/phanotate.py", line 40, in <module>
    my_graph = functions.get_graph(my_orfs)
  File "/stor/work/Ochman/paul/PHANOTATE/lib/functions.py", line 354, in get_graph
    add_trnas(my_orfs, G)
  File "/stor/work/Ochman/paul/PHANOTATE/lib/functions.py", line 469, in add_trnas
    column = line.split('\t')
TypeError: a bytes-like object is required, not 'str'
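In Python 3 this message is raised whenever `split('\t')` is called on a bytes object with a str separator. The sketch below reproduces that and shows two common remedies: decoding before splitting, or asking `subprocess` for text output. Where `line` actually comes from inside `add_trnas` is an assumption here (output captured from an external tRNA tool seems likely but is not confirmed by the traceback), and the example line and the `tab_fields` helper are made up.

```python
import subprocess
import sys

def tab_fields(line):
    """Split a tab-separated line, accepting either bytes or str."""
    if isinstance(line, bytes):
        line = line.decode("utf-8", errors="replace")
    return line.rstrip("\r\n").split("\t")

# Made-up line standing in for one row of a tRNA tool's tabular output.
raw = b"seq1\t1\t17\t89\tLys\n"

try:
    raw.split("\t")            # bytes split with a str separator
except TypeError as error:
    print("reproduced:", error)

print(tab_fields(raw))

# Alternative: let subprocess decode the output, so every line is a str.
completed = subprocess.run(
    [sys.executable, "-c", "print('seq1\\t1\\t17')"],
    capture_output=True,
    text=True,
)
print(tab_fields(completed.stdout))
```

Why only some genomes trigger the error is not explained by this sketch; it would be consistent with the failing branch only being reached when the tRNA step produces output for that sequence, but that is a guess.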