mircare / porter5
Fast, state-of-the-art ab initio prediction of protein secondary structure in 3 and 8 classes
Home Page: http://distilldeep.ucd.ie/porter/
License: Other
I get the following error:
> python3 Porter5/Porter5.py -i Porter5/example/2FLGA.fasta --cpu 4 --fast
> ~~~~~~~~~ Prediction of Porter5/example/2FLGA.fasta started ~~~~~~~~~
> wc: Porter5/example/2FLGA.fasta.psi: No such file or directory
> awk: fatal: cannot open file `Porter5/example/2FLGA.fasta.psi' for reading (No such file or directory)
> HHblits executed in 1.23s
> Traceback (most recent call last):
> File "/mnt/data/software/Porter5/Porter5/scripts/process-alignment.py", line 37, in <module>
> sequences = lines[0] = len(lines) - 1
> IndexError: list assignment index out of range
> Traceback (most recent call last):
> File "Porter5/Porter5.py", line 82, in <module>
> flatpsi_ann = open(filename+".flatpsi.ann", "r").readlines()
> FileNotFoundError: [Errno 2] No such file or directory: 'Porter5/example/2FLGA.fasta.flatpsi.ann'
I know this has been mentioned in two other issues, but the solutions there did not help me.
[DEFAULT]
psiblast = /mnt/data/software/Porter5/blast/ncbi-blast-2.10.1+/bin/psiblast
uniref90 = /mnt/data/software/Porter5/uniref90/uniref90.fasta
hhblits = /usr/bin/hhblits
uniprot20 = /mnt/data/software/Porter5/uniprot20/uniprot20_2016_02/uniprot20_2016_02
The only intermediate file produced is 2FLGA.fasta.flatpsi.
Contents of 2FLGA.fasta.log (I checked uniprot20_2016_02_cs219.ffindex; it is not empty and does not seem to be corrupted):
- 20:47:21.837 INFO: Search results will be written to Porter5/example/2FLGA.hhr
ffindex.c:452 ffindex_index_parse: mlock: Cannot allocate memory
- 20:47:22.228 WARNING: In /build/hhsuite-rhvmkl/hhsuite-3.0~beta2+dfsg/src/hhdatabase.cpp:50: FFindexDatabase:
- 20:47:22.228 WARNING: Could not read index file/mnt/data/software/Porter5/uniprot20/uniprot20_2016_02/uniprot20_2016_02_cs219.ffindex. Is the file empty or corrupted?
ffindex.c:452 ffindex_index_parse: mlock: Cannot allocate memory
- 20:47:22.657 WARNING: In /build/hhsuite-rhvmkl/hhsuite-3.0~beta2+dfsg/src/hhdatabase.cpp:50: FFindexDatabase:
- 20:47:22.657 WARNING: Could not read index file/mnt/data/software/Porter5/uniprot20/uniprot20_2016_02/uniprot20_2016_02_a3m.ffindex. Is the file empty or corrupted?
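The `mlock: Cannot allocate memory` lines show that ffindex_index_parse called mlock(), which pins pages in RAM, and that each failed call is followed by a warning that the index file could not be read. Assuming (the log does not confirm this) that the per-process locked-memory limit is involved, a quick way to inspect that limit from Python is sketched below; in the shell, `ulimit -l` reports the same soft limit in kilobytes.

```python
import resource


def memlock_limits():
    """Return the (soft, hard) RLIMIT_MEMLOCK values in bytes; None means unlimited."""
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)

    def normalize(value):
        # RLIM_INFINITY is the sentinel the OS uses for "no limit".
        return None if value == resource.RLIM_INFINITY else value

    return normalize(soft), normalize(hard)


if __name__ == "__main__":
    soft, hard = memlock_limits()
    print("locked-memory limit (soft):", "unlimited" if soft is None else f"{soft} bytes")
    print("locked-memory limit (hard):", "unlimited" if hard is None else f"{hard} bytes")
```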
I read your paper and saw that you run PSI-BLAST to obtain the PSSM of each amino acid sequence. I also run PSI-BLAST, against the NR database, but it takes one hour per sequence. Why is your PSI-BLAST run so much faster? Is the problem on my side? My command line is:
psiblast -query test.fasta -db ./blast/db/nr/nr -num_threads 10 -out_ascii_pssm test.pssm -num_iterations 3 -evalue 0.001
I have a question
I pulled the mircare/porter5 image from Docker Hub
and set up config.ini as follows:
[DEFAULT]
psiblast = psiblast
uniref90 = /uniref90/uniref90.fasta
hhblits = hhblits
uniprot20 = /uniprot20/uniprot20_2016_20/uniprot20_2016_20
and then ran the command below:
python3 Porter5.py -i query/2FLGA.fasta --cpu 4 --setup
PSI-BLAST executed in 1302.09s
wc: query/2FLGA.fasta.psi: No such file or directory
awk: cannot open query/2FLGA.fasta.psi (No such file or directory)
HHblits executed in 0.07s
Traceback (most recent call last):
File "/Porter5/scripts/process-alignment.py", line 37, in <module>
sequences = lines[0] = len(lines) - 1
IndexError: list assignment index out of range
Traceback (most recent call last):
File "Porter5.py", line 82, in <module>
flatpsi_ann = open(filename+".flatpsi.ann", "r").readlines()
FileNotFoundError: [Errno 2] No such file or directory: 'query/2FLGA.fasta.flatpsi.ann'
I don't know why this error happens.
Can anybody help me?
Hi All,
First off, my system runs Pop!_OS Linux with 16 GB of memory and a 6-core Ryzen 5 5600X processor. I run Porter5 in parallel on multiple sequences with CPU set to 3 and parallel set to 2, though I also see the issue below with other settings.
Usually on the first test run I have no issues, but after a couple of test runs on a particular class of proteins I get the following error in the log file:
13:39:28.388 INFO: Search results will be written to test011_TCF3_ZNF384.hhr
13:39:29.524 INFO: Searching 8290206 column state sequences.
13:39:29.552 INFO: test011_TCF3_ZNF384.fasta is in A2M, A3M or FASTA format
13:39:29.553 INFO: Iteration 1
13:39:29.853 INFO: Prefiltering database
13:41:21.032 INFO: HMMs passed 1st prefilter (gapless profile-profile alignment) : 556889
13:41:28.048 WARNING: database contains sequences that exceeds maximum allowed size (maxres = 20001). Maxres can be increased with parameter -maxres.
13:41:28.153 INFO: HMMs passed 2nd prefilter (gapped profile-profile alignment) : 99165
13:41:28.153 INFO: HMMs passed 2nd prefilter and not found in previous iterations : 99165
13:41:28.153 INFO: Scoring 99165 HMMs using HMM-HMM Viterbi alignment
13:41:28.180 INFO: Alternative alignment: 0
13:41:28.512 INFO: 2000 alignments done
13:41:28.958 INFO: 4000 alignments done
13:41:29.513 INFO: 6000 alignments done
13:41:30.172 INFO: 8000 alignments done
13:41:30.940 INFO: 10000 alignments done
13:41:31.826 INFO: 12000 alignments done
13:41:32.833 INFO: 14000 alignments done
13:41:33.956 INFO: 16000 alignments done
13:41:35.187 INFO: 18000 alignments done
13:41:36.559 INFO: 20000 alignments done
13:41:38.078 INFO: 22000 alignments done
13:41:39.746 INFO: 24000 alignments done
13:41:41.591 INFO: 26000 alignments done
13:41:43.606 INFO: 28000 alignments done
13:41:45.820 INFO: 30000 alignments done
13:41:48.245 INFO: 32000 alignments done
13:41:50.946 INFO: 34000 alignments done
13:41:53.940 INFO: 36000 alignments done
13:41:57.304 INFO: 38000 alignments done
13:42:01.173 INFO: 40000 alignments done
13:42:05.471 INFO: 42000 alignments done
13:42:09.795 INFO: 44000 alignments done
13:42:12.701 INFO: 46000 alignments done
13:42:14.711 INFO: 48000 alignments done
13:42:16.676 INFO: 50000 alignments done
13:42:18.732 INFO: 52000 alignments done
13:42:20.696 INFO: 54000 alignments done
13:42:20.699 WARNING: Number of match columns too large. Only first 19999 match columns will be kept!
13:42:20.704 WARNING: Number of match columns too large. Only first 19999 match columns will be kept!
13:42:22.779 INFO: 56000 alignments done
13:42:22.783 WARNING: Number of match columns too large. Only first 19999 match columns will be kept!
13:42:24.820 INFO: 58000 alignments done
13:42:26.784 INFO: 60000 alignments done
13:42:28.723 INFO: 62000 alignments done
13:42:30.772 INFO: 64000 alignments done
13:42:32.774 INFO: 66000 alignments done
13:42:34.843 INFO: 68000 alignments done
13:42:36.799 INFO: 70000 alignments done
13:42:38.719 INFO: 72000 alignments done
13:42:40.700 INFO: 74000 alignments done
13:42:42.624 INFO: 76000 alignments done
13:42:44.604 INFO: 78000 alignments done
13:42:46.505 INFO: 80000 alignments done
13:42:48.453 INFO: 82000 alignments done
13:42:50.438 INFO: 84000 alignments done
13:42:52.390 INFO: 86000 alignments done
13:42:54.348 INFO: 88000 alignments done
13:42:56.306 INFO: 90000 alignments done
13:42:58.295 INFO: 92000 alignments done
13:42:58.297 INFO: Stop after DB-HHM: 92000 because early stop 18.7068 < filter cutoff 20
13:42:58.300 INFO: Alternative alignment: 1
13:42:58.303 WARNING: Number of match columns too large. Only first 19999 match columns will be kept!
13:42:58.308 WARNING: Number of match columns too large. Only first 19999 match columns will be kept!
13:42:58.313 WARNING: Number of match columns too large. Only first 19999 match columns will be kept!
13:44:28.849 INFO: 87853 alignments done
13:44:28.985 INFO: Alternative alignment: 2
13:44:28.988 WARNING: Number of match columns too large. Only first 19999 match columns will be kept!
13:44:28.993 WARNING: Number of match columns too large. Only first 19999 match columns will be kept!
13:44:28.000 WARNING: Number of match columns too large. Only first 19999 match columns will be kept!
13:45:41.853 INFO: 61705 alignments done
13:45:41.914 INFO: Alternative alignment: 3
13:45:41.917 WARNING: Number of match columns too large. Only first 19999 match columns will be kept!
13:45:41.921 WARNING: Number of match columns too large. Only first 19999 match columns will be kept!
13:45:41.927 WARNING: Number of match columns too large. Only first 19999 match columns will be kept!
13:46:41.174 INFO: 47257 alignments done
Killed
Any help would be appreciated. I can provide an example FASTA file to reproduce the error if needed!
Thanks!
Scott
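On Linux, a bare `Killed` usually means the process received SIGKILL, very often from the kernel's out-of-memory killer, which would fit a 16 GB machine running two jobs at once. That is an inference, not something the log proves. One low-effort experiment is to run the sequences with fewer concurrent jobs. The sketch below bounds concurrency with a thread pool; the `PORTER5` path is a hypothetical placeholder, and only the `-i`, `--cpu` and `--fast` options that appear in this thread are used.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

PORTER5 = "Porter5/Porter5.py"  # hypothetical path; adjust to your installation


def build_command(fasta, cpu, fast=True):
    """Build a Porter5 command line using only options seen in this thread."""
    cmd = [sys.executable, PORTER5, "-i", str(fasta), "--cpu", str(cpu)]
    if fast:
        cmd.append("--fast")
    return cmd


def run_one(fasta, cpu, fast=True):
    """Run Porter5 on one FASTA file and return (file, return code).

    On POSIX, a negative return code -N means the process was killed by signal N
    (for example -9 for SIGKILL).
    """
    result = subprocess.run(build_command(fasta, cpu, fast))
    return fasta, result.returncode


def run_all(fasta_files, cpu=3, parallel=1, fast=True):
    """Run Porter5 on several files with at most `parallel` jobs at the same time."""
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        return list(pool.map(lambda f: run_one(f, cpu, fast), fasta_files))
```

For example, with `from pathlib import Path`, `run_all(sorted(Path(".").glob("*.fasta")), cpu=3, parallel=1)` runs one job at a time. Checking the kernel log (e.g. `dmesg`) for out-of-memory messages after a `Killed` run would confirm or rule out the OOM explanation.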
Here is the log I got after running the program in the Anaconda Prompt:
(base) C:\Users\Username\OneDrive\Desktop\Porter5>pwd
/c/Users/Username/OneDrive/Desktop/Porter5
(base) C:\Users\Username\OneDrive\Desktop\Porter5>Porter5.py -i C:\Users\Username\Desktop\Porter5\example\2FLGA.fasta --setup
Please insert the absolute path to psiblast (e.g., /home/username/psiblast): /c/Users/ Username/OneDrive/Desktop/psiblast
Please insert the absolute path to uniref90 (e.g., /home/username/UniProt/uniref90.fasta): ###
Please insert the call to HHblits (e.g., hhblits): hhblits
Please insert the absolute path to uniprot20 - DATABASE NAME INCLUDED (e.g., /home/username/uniprot20_2016_02/uniprot20_2016_02): /c/Users/ Username/OneDrive/Desktop/uniprot20_2016_02/uniprot20_2016_02
The filename, directory name, or volume label syntax is incorrect.
Setup completed successfully. If you encounter any problems in future, please run "python3 Porter5.py --setup". <<<<<
The system cannot find the path specified.
[main 2022-09-25T18:30:23.102Z] update#setState idle
[main 2022-09-25T18:30:47.275Z] Starting extension host with pid 1776 (fork() took 134 ms).
(node:12820) [DEP0005] DeprecationWarning: Buffer() is deprecated due to security and usability issues. Please use the Buffer.alloc(), Buffer.allocUnsafe(), or Buffer.from() methods instead.
(Use `Code --trace-deprecation ...` to show where the warning was created)
[main 2022-09-25T18:31:09.604Z] update#setState checking for updates
[main 2022-09-25T18:31:09.691Z] update#setState idle
[8132:0925/143215.472:ERROR:gpu_init.cc(481)] Passthrough is not supported, GL is disabled, ANGLE is
After this, it opens Visual Studio Code on a file called process-blast.pl and then seems to freeze.
When I then ran it again with the --fast flag added, the following happened, even though numpy is installed:
(base) C:\Users\Username\OneDrive\Desktop\Porter5>python3 Porter5.py -i C:\Users\Username\OneDrive\Desktop\Porter5\example\2FLGA.fasta --cpu 4 --fast
~~~~~~~~~ Prediction of C:\Users\Username\OneDrive\Desktop\Porter5\example\2FLGA.fasta started ~~~~~~~~~
The process cannot access the file because it is being used by another process.
HHblits executed in 0.61s
Traceback (most recent call last):
File "C:\Users\Username\OneDrive\Desktop\Porter5\scripts\process-alignment.py", line 12, in <module>
import numpy
ModuleNotFoundError: No module named 'numpy'
Traceback (most recent call last):
File "C:\Users\Username\OneDrive\Desktop\Porter5\Porter5.py", line 82, in <module>
flatpsi_ann = open(filename+".flatpsi.ann", "r").readlines()
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\Username\\OneDrive\\Desktop\\Porter5\\example\\2FLGA.fasta.flatpsi.ann'
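The traceback shows that the interpreter running process-alignment.py cannot import numpy, even though numpy is installed somewhere. One common reason, assumed here rather than confirmed by the log, is that `python3` resolves to a different interpreter from the Anaconda one where numpy was installed. Running the sketch below once with `python3` and once with `python` from the same prompt shows which executable each name uses and whether numpy is importable from it.

```python
import importlib.util
import shutil
import sys


def interpreter_report(module="numpy"):
    """Describe the running interpreter and whether `module` can be imported from it."""
    return {
        "executable": sys.executable,
        "version": sys.version.split()[0],
        "python3_on_path": shutil.which("python3"),
        # find_spec returns None when the top-level module is not installed.
        module: importlib.util.find_spec(module) is not None,
    }


if __name__ == "__main__":
    for key, value in interpreter_report().items():
        print(f"{key}: {value}")
```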
Hi!
I've been doing some tests with both the standalone version of Porter and the web version, and I'm looking into the differences between their outputs.
Web output: http://distilldeep.ucd.ie/~gianluca/distill_results/query160365365118818.html
Standalone output: https://github.com/mircare/Porter5/blob/master/example/2FLGA.ss8
Is there any way to convert the standalone output into something like the web output? What are the main differences between the two?
Thanks in advance!
Thank you for the quick response!
I tried python3 Porter5.py -i example/2FLGA.fasta --cpu 4 --setup; the setup completed successfully, but the FileNotFoundError persists.
I think I encountered almost the same problem described here: https://stackoverflow.com/questions/55283434/unable-to-run-porter5-generating-flatpsi-file-instead-of-psi
Here are my [DEFAULT] settings:
hhblits = '/home/username/hh-suite/build/bin/hhblits'
uniprot20 = '/home/username/Downloads/uniprot20_2016_02'
I commented out psiblast and uniref90 since I was running the fast version. By the way, I'm working on Ubuntu 18.04.
Here are the error messages:
wc: Porter5/example/2FLGA.fasta.psi: No such file or directory
awk: cannot open Porter5/example/2FLGA.fasta.psi (No such file or directory)
HHblits executed in 0.02s
Traceback (most recent call last):
File "/home/usr/Porter5/scripts/process-alignment.py", line 37, in
sequences = lines[0] = len(lines) - 1
IndexError: list assignment index out of range
Traceback (most recent call last):
File "Porter5/Porter5.py", line 86, in
flatpsi_ann = open(filename+".flatpsi.ann", "r").readlines()
FileNotFoundError: [Errno 2] No such file or directory: 'Porter5/example/2FLGA.fasta.flatpsi.ann'
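Assuming Porter5 reads config.ini with Python's standard configparser module (plausible given the [DEFAULT] section, but not confirmed here), the single quotes around these paths would become part of the values, because configparser does not strip quotes. A minimal check using the two values from the config above:

```python
import configparser

# Values copied from the config above; the quotes are part of the text.
SAMPLE = """
[DEFAULT]
hhblits = '/home/username/hh-suite/build/bin/hhblits'
uniprot20 = '/home/username/Downloads/uniprot20_2016_02'
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE)
hhblits = parser["DEFAULT"]["hhblits"]
print(repr(hhblits))  # the single quotes are kept as part of the value
```

Note also that the setup prompt asks for the uniprot20 path with the database name included (e.g. /home/username/uniprot20_2016_02/uniprot20_2016_02), whereas the value above stops at the directory.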
Hello! Please forgive me if this issue has already been addressed.
I just began using the Porter5 script, but I hit an error after PSI-BLAST finishes running. Porter5 outputs the following:
wc: PorterTest.fa.psi: open: No such file or directory
awk: can't open file PorterTest.fa.psi
source line number 1
sed: -i: No such file or directory
sed: -i: No such file or directory
Looking at the PSI-BLAST outputs, I have the following files:
PorterTest.fa.chk
PorterTest.fa.blastpgp
PorterTest.fa.flatblast
PorterTest.fa.flatpsi
PorterTest.fa.log
Any recommendations/advice would be appreciated. Thank you.
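The `sed: -i: No such file or directory` lines suggest that sed is reading `-i` as a file name rather than as GNU sed's in-place option. If this is macOS, whose sed is BSD-derived and stops parsing options at the first operand, an `-i` placed after the script would presumably be treated that way; this is an inference from the error text only. One way to avoid sed differences in your own scripts is to do the in-place substitution in Python. `replace_in_file` below is a hypothetical helper, not part of Porter5.

```python
import re
from pathlib import Path


def replace_in_file(path, pattern, replacement):
    """Apply a regex substitution to a file in place, like GNU `sed -i 's/p/r/g'`.

    Hypothetical helper for illustration; it is not part of Porter5.
    Returns the number of substitutions made.
    """
    file = Path(path)
    text = file.read_text()
    new_text, count = re.subn(pattern, replacement, text)
    file.write_text(new_text)
    return count
```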
Hi,
I would love to use Porter5, but I have some problems with the initial steps. These seem to be addressed in one or two previous issues; however, checking the paths to the databases did not help, so I guess my paths are somehow not fully correct. Here is my config.ini:
[DEFAULT]
psiblast = <path_to_packages>/packages/databases/psi-blast/ncbi-blast-2.8.1+
uniref90 = <path_to_packages>/packages/databases/uniref90
hhblits = <path_to_packages>/packages/hh-suite
uniprot20 = <path_to_packages>/packages/databases/uniprot20_2016_02
What should it look like?
Thanks in advance
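Judging by the setup prompts quoted elsewhere in this thread, psiblast should be the path to the psiblast executable itself, uniref90 the path to uniref90.fasta, hhblits the call to HHblits, and uniprot20 the database path with the database name included (e.g. .../uniprot20_2016_02/uniprot20_2016_02); the entries above point to directories instead. Below is a minimal checker sketch, assuming config.ini is a standard INI file with these four keys under [DEFAULT]. The `_cs219.ffindex` and `_a3m.ffindex` suffixes are taken from the HHblits log in the first issue on this page; `check_porter5_config` is hypothetical.

```python
import configparser
import os
import shutil


def check_porter5_config(path="config.ini"):
    """Return a list of human-readable problems found in a Porter5-style config.ini."""
    parser = configparser.ConfigParser()
    if not parser.read(path):
        return [f"could not read {path}"]
    cfg = parser["DEFAULT"]
    problems = []

    for key in ("psiblast", "hhblits"):
        value = cfg.get(key, "")
        # Accept either an executable file or a command found on PATH.
        is_exe = os.path.isfile(value) and os.access(value, os.X_OK)
        if value and not is_exe and shutil.which(value) is None:
            problems.append(f"{key}: '{value}' is not an executable or a command on PATH")

    uniref90 = cfg.get("uniref90", "")
    if uniref90 and not os.path.isfile(uniref90):
        problems.append(f"uniref90: '{uniref90}' is not a file")

    uniprot20 = cfg.get("uniprot20", "")
    if uniprot20:
        # uniprot20 is a database prefix, so the index files sit next to it.
        for suffix in ("_cs219.ffindex", "_a3m.ffindex"):
            if not os.path.isfile(uniprot20 + suffix):
                problems.append(f"uniprot20: missing '{uniprot20 + suffix}'")
    return problems


if __name__ == "__main__":
    for problem in check_porter5_config() or ["no problems found"]:
        print(problem)
```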
Hello! Is there a version of the Predict file that is compatible with macOS? Any advice or recommendations would be appreciated.
Hi, @mircare
I ran the command below and it did not work.
$ python3 /home/indigomad/Porter5/Porter5.py -i 2jsvX.fasta --cpu 8
~~~~~~~~~ Prediction of 2jsvX.fasta started ~~~~~~~~~
PSI-BLAST executed in 432.47s
HHblits executed in 19.60s
Alignments encoded in 0.61s
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
sh: Line 1: 13502 (Core Dump)/home/indigomad/Porter5/scripts/Predict_BRNN/Predict /home/indigomad/Porter5/scripts/Predict_BRNN/models/modelv8_ss3 2jsvX.fasta.flatpsi.ann > /dev/null
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
sh: Line 1: 13506 (Core Dump)/home/indigomad/Porter5/scripts/Predict_BRNN/Predict /home/indigomad/Porter5/scripts/Predict_BRNN/models/modelv7_ss3 2jsvX.fasta.flatblast.ann > /dev/null
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
sh: Line 1: 13510 (Core Dump)/home/indigomad/Porter5/scripts/Predict_BRNN/Predict /home/indigomad/Porter5/scripts/Predict_BRNN/models/modelv78_ss3 2jsvX.fasta.flatblastpsi.ann > /dev/null
Prediction in 3 classes made in 0.15s
Traceback (most recent call last):
File "/home/indigomad/Porter5/Porter5.py", line 122, in <module>
prob_hh = list(map(float, open(filename+".flatpsi.ann.probsF", "r").readlines()[3].split()))
FileNotFoundError: [Errno 2] No such file or directory: '2jsvX.fasta.flatpsi.ann.probsF'
Hi, I was trying to see whether the network could be trained (i.e., not only used for predictions) with the existing codebase.
I see functions related to backpropagation in various files; however, I am not quite sure where to start.
Any directions or insights are highly appreciated. Thank you!
Hello! I'm not entirely sure why, but Porter5 was working for my past few runs with the --fast flag. However, when I removed the flag to include PSI-BLAST, I began encountering problems, and these persisted when I went back to the --fast flag. I was wondering if you could help me out.
After originally running,
~/Porter5/Porter5.py -i PorterTest.fa --fast
I get the following message after the hhblits alignment:
sed: -i: No such file or directory
HHblits executed in 387.83s
sh: python3: command not found
Traceback (most recent call last):
File "/Users/labadmin/Porter5/Porter5.py", line 76, in
flatpsi_ann = open(filename+".flatpsi.ann", "r").readlines()
IOError: [Errno 2] No such file or directory: 'PorterTest.fa.flatpsi.ann'
I understand that the .ann file is created by process-alignment.py, so I tried running that script on its own.
STK-CHEM-Q1F8J8:PTestOutput labadmin$ python /Users/labadmin/Porter5/scripts/process-alignment.py PorterTest.fa.flatpsi flatpsi 1
Traceback (most recent call last):
File "/Users/labadmin/Porter5/scripts/process-alignment.py", line 94, in
weights[i] = weights[i] - math.log(frequencies[j][aa[lines[i+1][j]]])
ValueError: math domain error
Could this still be the problem with the sed -i flag discussed in the earlier issue? Please let me know if I should submit any files or outputs.
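Python's math.log raises `ValueError: math domain error` for zero or negative arguments, so at least one entry of `frequencies` reached the logarithm as 0 here; why it is 0 is not clear from the traceback. One possibility worth ruling out, a guess rather than a diagnosis, is the interpreter: the script was run with `python`, and the earlier `IOError` traceback suggests that name points to Python 2, where `/` between two integers truncates, so a small count divided by a total gives 0. The sketch only illustrates that failure mode; it is not Porter5's code.

```python
import math


def neg_log(freq):
    """Return -log(freq); math.log raises ValueError for freq <= 0."""
    return -math.log(freq)


count, total = 1, 40
true_freq = count / total        # Python 3 true division keeps the fraction (0.025)
truncated_freq = count // total  # floor division gives 0, as `1 / 40` on ints does in Python 2

print(neg_log(true_freq))
try:
    neg_log(truncated_freq)
except ValueError as err:
    print("ValueError:", err)
```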
Hi
I have a question
After running 'python Porter5/Porter5.py -i Porter5/example/2FLGA.fasta --cpu 4', I got the file '2FLGA.hhr':
No Hit Prob E-value P-value Score SS Cols Query HMM Template HMM
1 tr|Q9U712|Q9U712_9APIC Merozoi 99.8 4.5E-25 2.3E-30 140.3 0.0 46 3-48 4-50 (95)
No 1
>tr|Q9U712|Q9U712_9APIC Merozoite surface protein-1 (Fragment) OS=Plasmodium yoelii PE=4 SV=1
Probab=99.84 E-value=4.5e-25 Score=140.34 Aligned_cols=46 Identities=85% Similarity=1.506 Sum_probs=45.0 Template_Neff=1.900
Q 2FLGA 3 SQHQCVKK-QCPQNSGCFRHLDEREECKCLLNYKQEGDKCVENPNPT 48 (48)
Q Consensus 3 ~~H~Ci~t-~~p~NAgCyr~~~g~Ee~RCll~fk~~~~kCv~~~~~t 48 (48)
|+|+||++ ++|+|||||||+||+||||||||||+++++||++++||
T Consensus 4 s~hvCi~tr~~P~nagCfRyddg~EEwrCLLgfKk~~~~Cv~d~~pt 50 (95)
T tr|Q9U712|Q9U7 4 SQHQCVKTRQCPENAGCFRYLDGREEWRCLLNYKQEGDKCVENPNPT 50 (95)
Confidence 99999999 99999999999999999999999999999999999886
Where is the secondary structure prediction result?
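In this repository's example, the input example/2FLGA.fasta comes with the prediction example/2FLGA.ss8 (linked in an earlier issue on this page), and the HHblits hits go to 2FLGA.hhr, so the predictions presumably sit next to the .hhr file with .ss3 and .ss8 extensions. `find_predictions` below is a hypothetical helper that lists whichever candidates exist; because the naming scheme is inferred from the example files, it checks both `<name>.ss3` and `<name>.fasta.ss3`.

```python
from pathlib import Path


def find_predictions(fasta_path):
    """List existing .ss3/.ss8 files that look like predictions for `fasta_path`.

    The naming scheme is inferred from the example files, so both
    '<name>.ss3' and '<name>.fasta.ss3' (and the .ss8 variants) are checked.
    """
    fasta = Path(fasta_path)
    candidates = []
    for ext in (".ss3", ".ss8"):
        candidates.append(fasta.with_suffix(ext))             # 2FLGA.fasta -> 2FLGA.ss3
        candidates.append(fasta.with_name(fasta.name + ext))  # 2FLGA.fasta -> 2FLGA.fasta.ss3
    return [p for p in candidates if p.is_file()]
```

For example, `print(find_predictions("Porter5/example/2FLGA.fasta"))` lists the prediction files found next to the input, if any.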
Below is the complete log. Any suggestions on how to fix it?
$ python3 /Users/username/Porter5/Porter5.py -i /Users/username/Porter5/example/2FLGA.fasta --cpu 4 --fast
Please insert the absolute path to psiblast (e.g., /home/username/psiblast):
Please insert the absolute path to uniref90 (e.g., /home/username/UniProt/uniref90.fasta):
Please insert the call to HHblits (e.g., hhblits): /Users/username/hh-suite/build/bin/hhblits
Please insert the absolute path to uniprot20 - DATABASE NAME INCLUDED (e.g., /home/username/uniprot20_2016_02/uniprot20_2016_02): /Users/username/Databases/uniprot20_2016_02/uniprot20_2016_02
g++ -c -O3 Layer.cxx
Layer.cxx:11:10: fatal error: 'omp.h' file not found
#include <omp.h>
^~~~~~~
1 error generated.
make: *** [Layer.o] Error 1
Setup completed successfully. If you encounter any problems in future, please run "python3 Porter5.py --setup". <<<<<
HHblits executed in 39.15s
Alignments encoded in 0.13s
sh: /Users/username/Porter5/scripts/Predict_BRNN/Predict: Permission denied
Prediction in 3 classes made in 0.00s
Traceback (most recent call last):
File "/Users/username/Porter5/Porter5.py", line 120, in <module>
prob_hh = list(map(float, open(filename+".flatpsi.ann.probsF", "r").readlines()[3].split()))
FileNotFoundError: [Errno 2] No such file or directory: '/Users/username/Porter5/example/2FLGA.fasta.flatpsi.ann.probsF'
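Two things stand out in this log: compiling Layer.cxx failed because omp.h (the OpenMP header) was not found, yet setup still reported success, and the shell then refused to run scripts/Predict_BRNN/Predict with `Permission denied`. The sketch below checks whether that file exists and carries the execute permission for the current user; the path is taken from the log above.

```python
import os
import stat


def describe_executable(path):
    """Report whether `path` exists, is a regular file, and is executable by this user."""
    if not os.path.exists(path):
        return f"{path}: does not exist"
    mode = os.stat(path).st_mode
    return (
        f"{path}: regular file={stat.S_ISREG(mode)}, "
        f"executable={os.access(path, os.X_OK)}, mode={stat.filemode(mode)}"
    )


if __name__ == "__main__":
    print(describe_executable("/Users/username/Porter5/scripts/Predict_BRNN/Predict"))
```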
Hi,
I can run SS predictions normally on my local PC. However, when I deployed the tool on a cloud computing instance, the ss3 and ss8 predictions are all null, like the following:
# AA SS Helix Sheet Coil
1 M H 0.0 0.0 0.0
2 P H 0.0 0.0 0.0
3 T H 0.0 0.0 0.0
4 I H 0.0 0.0 0.0
5 L H 0.0 0.0 0.0
6 G H 0.0 0.0 0.0
7 Q H 0.0 0.0 0.0
8 N H 0.0 0.0 0.0
9 Q H 0.0 0.0 0.0
10 Y H 0.0 0.0 0.0
The corresponding .psi and .hhr files exist and have non-trivial content, which may indicate that HHblits works correctly.
Any idea why this happens? I'm running with ordinary arguments, e.g.: python Porter5.py -I ${fasta_file} --cpu 12 --fast
Thank you for developing this amazing tool!
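The prediction table above has a simple layout: a header line starting with #, then one row per residue with the position, the amino acid, the predicted class and one probability per class. A minimal sketch that parses such rows and reports positions whose probabilities are all zero, which can help screen a batch of outputs for this symptom; the layout is inferred from the excerpt shown here, and `parse_ss` and `all_zero_rows` are hypothetical helpers.

```python
def parse_ss(lines):
    """Parse SS rows laid out as: position, residue, class, then probabilities."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header
        fields = line.split()
        position, residue, ss = int(fields[0]), fields[1], fields[2]
        probs = [float(x) for x in fields[3:]]
        rows.append((position, residue, ss, probs))
    return rows


def all_zero_rows(rows):
    """Return the positions whose class probabilities are all 0.0."""
    return [pos for pos, _, _, probs in rows if probs and all(p == 0.0 for p in probs)]


# First rows of the output shown above.
example = """# AA SS Helix Sheet Coil
1 M H 0.0 0.0 0.0
2 P H 0.0 0.0 0.0
""".splitlines()
```

For a real file, something like `with open(path) as fh: print(all_zero_rows(parse_ss(fh)))` would list the affected positions.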