Comments (23)
Hello,
thank you for reporting this. This behavior is, to some extent, intended.
The internal alignment functions in hhblits are written with SIMD instructions.
This way, four templates can be processed simultaneously.
However, if one of the templates in such a batch does not have
secondary structure annotation, then no secondary structure score is calculated for that batch.
The responsible programmer (@martin-steinegger) can probably describe this better.
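To make the described effect concrete, here is a minimal toy model in Python. It is not the hh-suite code: the function name `batch_ss_scores`, the `score_fn` callback, and the use of `None` for a template without secondary structure annotation are all assumptions made for illustration. It only mirrors the behavior described above, where templates are handled in groups of four and one unannotated template disables the SS score for its whole group.

```python
from typing import Callable, List, Optional, Sequence

LANES = 4  # number of templates processed together, as described in this comment


def batch_ss_scores(
    templates_ss: Sequence[Optional[str]],
    score_fn: Callable[[str], float],
    lanes: int = LANES,
) -> List[float]:
    """Toy model: return one SS score per template.

    A template without SS annotation is represented by None (an assumption
    of this sketch). If any template in a batch is None, every template in
    that batch gets an SS score of 0.0.
    """
    scores: List[float] = []
    for start in range(0, len(templates_ss), lanes):
        batch = templates_ss[start:start + lanes]
        # SS scoring is only switched on if the whole batch is annotated.
        use_ss = all(ss is not None for ss in batch)
        for ss in batch:
            scores.append(score_fn(ss) if use_ss and ss is not None else 0.0)
    return scores


# Hypothetical stand-in for a real SS scoring function.
def toy_score(ss: str) -> float:
    return float(len(ss))


example = batch_ss_scores(["HHH", None, "EEE", "CCC", "HH", "EE"], toy_score)
```

In this toy run the first four templates all score 0.0 because one of them has no annotation, while the remaining two are scored normally. This would also explain why a mixed database gives different SS values than two homogeneous ones.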
To solve your problem, we might add secondary structure predictions to our Pfam database.
Cheers,
Markus
from hh-suite.
Understood. Do you know why we do not encounter this problem when running with both DBs on the web-based tool?
from hh-suite.
It would be great for us if you could add SS to Pfam. How long might this take?
from hh-suite.
Also, is there an option to turn off SIMD?
from hh-suite.
The web-based tool still uses an old version of hhblits. It runs two hhblits searches, the first against the first database and the second against the second database, and afterwards merges the results.
In the old hhblits version we did not have the problem with SIMD instructions and secondary structure scoring.
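How the web tool merges the two result sets is not described here, so the following is only a sketch of one plausible way to do it: concatenate the hit lists from both searches and re-rank them by probability. The `Hit` class and `merge_hits` function are hypothetical names, and the numbers are taken from the hit table posted further down in this thread purely for illustration. Note that E-values from separate searches depend on the size of each database, which is why this sketch ranks by probability only.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Hit:
    name: str
    prob: float      # "Prob" column of the hit table
    source_db: str   # which search the hit came from (label chosen for this sketch)


def merge_hits(*hit_lists: Sequence[Hit]) -> List[Hit]:
    """Concatenate hit lists from separate searches and sort by probability."""
    merged = [hit for hits in hit_lists for hit in hits]
    # sort() is stable, so ties keep the order of the input lists.
    merged.sort(key=lambda h: h.prob, reverse=True)
    return merged


pdb70_hits = [Hit("1sse_B", 98.8, "pdb70"), Hit("1gu4_A", 98.2, "pdb70")]
pfam_hits = [Hit("PF08601.9", 98.6, "pfam"), Hit("PF00170.20", 98.1, "pfam")]
merged = merge_hits(pdb70_hits, pfam_hits)
```

Because each search only ever sees one database, the SS scoring within a search is not affected by templates from the other database.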
Annotating Pfam with SS would take a couple of days (< 5, I assume). I will update the database pipeline so that future releases also include the secondary structure prediction.
There is no option to turn off SIMD, but you can limit the SIMD instructions to SSSE3 (the option is described in the manual).
from hh-suite.
Forgot to mention: the limitation of the SIMD instructions has to be set at build time with cmake.
from hh-suite.
OK, I don't think it's worth us getting a 4-fold slowdown to fix this, so I'll not try the no-SIMD approach. We'll wait until the pfam data is upgraded with SS. Will you let us know when this is ready?
J
from hh-suite.
The database is updated
from hh-suite.
Thanks very much
from hh-suite.
I get some negative SS values with the new database, is this expected behavior?
from hh-suite.
Can you show the alignment with the negative secondary structure scores?
from hh-suite.
The file header:
Query YPR199C Seqment 0
Match_columns 294
No_of_seqs 34 out of 1385
Neff 4.26402
Searched_HMMs 55100
Date Wed May 3 13:18:19 2017
Command hhsearch -remove_ss_cap -E 1000000000 -d /home/cceaiac/levine/databases/pdb70/pdb70 -ssm 4 -cpu 1 -o /home/cceaiac/Scratch/Levine/results/test_YPR199C/YPR199C.0.ssw11.hhr -i /home/cceaiac/Scratch/Levine/results/test_YPR199C/YPR199C.0.ss.a3m -v 2 -p 0 -cov 50 -ssw 0.11 -Z 5000 -d /home/cceaiac/levine/databases/pfamA_31/pfam
One example:
No Hit Prob E-value P-value Score SS Cols Query HMM Template HMM
9 PF08601.9 ; PAP1 ; Transcripti 97.9 5E-10 9.1E-15 109.2 -12.6 58 235-292 297-356 (356)
The alignment:
No 9
>PF08601.9 ; PAP1 ; Transcription factor PAP1
Probab=97.86 E-value=5e-10 Score=109.22 Aligned_cols=58 Identities=22% Similarity=0.399 Sum_probs=54.4 Template_Neff=4.100
Q ss_pred CCCCeeecHHHHHHHHHhCCccc--CCCHHHHHHHHHHhCccCCCCeeccHHHHHHHHhh
Q YPR199C 235 FGGDVLLSAMDIWSFMKVHPKVN--TFDLEILGTELKKSATCSNFDILISLKHFIKVFSS 292 (294)
Q Consensus 235 ~~g~~lLt~~atWeyi~~~~~~~--~fDv~~v~~kLKg~~~C~g~Gp~~~~~~i~~~~~s 292 (294)
..++.+||+.++|+||..|+.++ +|||+.|+++|+++++|+|+|+||.+.+|+.+|.+
T Consensus 297 ~~~~~lLTcvqaWd~IqshPkF~~gd~DLD~LCseLr~KAKCsGfGaVVee~dVd~iL~k 356 (356)
T Q0CHW7_ASPTN/2 297 EDKTQMLSCTKIWDRLQSMEKFRNGEIDVDNLCSELRTKARCSEGGVVVNQKDVDDIMGR 356 (356)
T ss_pred cCCCceecHHHHHHHHHhChhhhCCCCCHHHHHHHHhhcCccCCCCCCCCHHHHHHHhcC
Confidence 35789999999999999999998 89999999999999999999999999999998863
from hh-suite.
You are right. That is a bug. Can you give us your query?
from hh-suite.
I'm attaching a zip file with outputs of every step of the search. The headers should have the information you need.
YPR199C_output.zip
Let me know if you need more input!
from hh-suite.
Could you please add your input query?
At the moment I assume that you used: http://www.uniprot.org/uniprot/Q676V5.fasta
from hh-suite.
Ah! I think it's
YPR199C Seqment 0
MAKPRGRKGGRKPSLTPPKNKRAAQLRASQNAFRKRKLERLEELEKKEAQLTVTNDQIHILKKENELLHFMLRSLLTERNMPSDERNISKACCEEKPPTCNTLDGSVVLSSTYNSLEIQQCYVFFKQLLSVCVGKNCTVPSPLNSFDRSFYPIGCTNLSNDIPGYSFLNDAMSEIHTFGDFNGELDSTFLEFSGTEIKEPNNFITENTNAIETAAASMVIRQGFHPRQYYTVDAFGGDVLLSAMDIWSFMKVHPKVNTFDLEILGTELKKSATCSNFDILISLKHFIKVFSSKL*
from hh-suite.
You should get a warning with your hhsearch call:
- 17:25:49.101 WARNING: Ignoring unknown option -remove_ss_cap
Is that true? Where did you get your version of hhblits/hhsearch? What is this parameter supposed to do?
I could reproduce this bug. The responsible programmer will look into this.
Thank you for your patience.
from hh-suite.
@ilectra The problem with the negative score should now be fixed. Please let me know whether the problem persists.
from hh-suite.
@martin-steinegger , it did solve the negative SS scores, but there are still some zeros there (these are just the first 20 matches):
Query YPR199C Seqment 0
Match_columns 294
No_of_seqs 34 out of 1376
Neff 4.26402
Searched_HMMs 52837
Date Mon May 15 14:12:08 2017
Command hhsearch -remove_ss_cap -E 1000000000 -d /home/cceaiac/levine/databases/pdb70/pdb70 -ssm 2 -cpu 1 -o /home/cceaiac/Scratch/Levine/results/test_YPR199C/YPR199C.0.ssw11.hhr -i /home/cceaiac/Scratch/Levine/results/test_YPR199C/YPR199C.0.ss.a3m -v 2 -p 0 -cov 50 -ssw 0.11 -Z 5000 -d /home/cceaiac/levine/databases/pfamA_31/pfam
No Hit Prob E-value P-value Score SS Cols Query HMM Template HMM
1 1sse_B AP-1 like transcription 98.8 2.2E-11 4.3E-16 97.0 5.7 60 235-294 26-85 (86)
2 1gd2_E Transcription factor PA 98.7 7.5E-10 1.4E-14 81.8 10.3 68 11-81 1-68 (70)
3 PF08601.9 ; PAP1 ; Transcripti 98.6 4.6E-10 8.8E-15 109.3 6.4 58 235-292 297-356 (356)
4 1gu4_A CAAT/enhancer binding p 98.2 2.4E-07 4.6E-12 69.2 10.8 61 14-77 11-71 (78)
5 1ci6_A Transcription factor AT 98.1 2.8E-07 5.3E-12 65.6 9.1 57 18-77 2-58 (63)
6 PF00170.20 ; bZIP_1 ; bZIP tra 98.1 4.7E-07 8.9E-12 63.8 9.7 58 18-78 5-62 (64)
7 2dgc_A Protein (GCN4); basic d 98.0 4.2E-07 8E-12 64.4 8.8 57 15-74 6-62 (63)
8 2wt7_A Proto-oncogene protein 98.0 7E-07 1.3E-11 62.9 9.5 57 18-77 2-58 (63)
9 1jnm_A Proto-oncogene C-JUN; B 97.9 1.2E-06 2.2E-11 61.0 9.1 57 19-78 2-58 (62)
10 PF03131.16 ; bZIP_Maf ; bZIP M 97.9 1.6E-06 3E-11 66.8 9.7 59 18-79 30-88 (90)
11 1t2k_D Cyclic-AMP-dependent tr 97.8 2.9E-06 5.5E-11 58.9 9.1 55 19-76 2-56 (61)
12 PF07716.14 ; bZIP_2 ; Basic re 97.6 6.7E-06 1.3E-10 56.1 7.5 51 17-70 4-54 (55)
13 1hjb_A Ccaat/enhancer binding 97.2 2.3E-06 4.4E-11 65.8 0.0 61 15-78 12-72 (87)
14 1dh3_A Transcription factor CR 96.8 1.1E-05 2.1E-10 55.7 0.0 52 19-73 2-53 (55)
15 5apu_A General control protein 96.8 0.00018 3.5E-09 59.1 7.0 48 19-73 46-93 (95)
16 3a5t_A Transcription factor MA 96.8 1.5E-05 2.8E-10 64.7 0.0 62 18-82 37-98 (107)
17 4c46_A General control protein 95.7 0.0076 1.4E-07 47.9 6.6 51 18-72 26-76 (76)
18 2wt7_B Transcription factor MA 95.4 0.0011 2E-08 51.8 0.0 60 18-80 27-86 (90)
19 1deb_A APC protein, adenomatou 95.3 0.018 3.5E-07 42.7 6.1 43 38-83 2-44 (54)
20 1kd8_B GABH BLL, GCN4 acid bas 95.1 0.015 2.9E-07 40.2 4.5 35 39-76 1-35 (36)
from hh-suite.
I should mention that those were not zero before the fix.
from hh-suite.
And some of the SS scores are still negative, both when the search is run online, and in my local version - try
>C9orf72
MSTLCPPPSPAVAKTEIALSGKSPLLAATFAYWDNILGPRVRHIWAPKTE
QVLLSDGEITFLANHTLNGEILRNAESGAIDVKFFVLSEKGVIIVSLIFD
GNWNGDRSTYGLSIILPQTELSFYLPLHRVCVDRLTHIIRKGRIWMHKER
QENVQKIILEGTERMEDQGQSIIPMLTGEVIPVMELLSSMKSHSVPEEID
IADTVLNDDDIGDSCHEGFLLNAISSHLQTCGCSVVVGSSAEKVNKIVRT
LCLFLTPAERKCSRLCEAESSFKYESGLFVQGLLKDSTGSFVLPFRQVMY
APYPTTHIDVDVNTVKQMPPCHEHIYNQRRYMRSELTAFWRATSEEDMAQ
DTIIYTDESFTPDLNIFQDVLHRDTLVKAFLDQVFQLKPGLSLRSTFLAQ
FLLVLHRKALTLIKYIEDDTQKGKKPFKSLRNLKIDLDLTAEGDLNIIMA
LAEKIKPGLHSFIFGRPFYTSVQERDVLMTF
Just to make sure we're comparing the same thing, what are the exact software (git tag) and database versions used in the online tool?
from hh-suite.
@croth1 , @martin-steinegger , any news on that?
from hh-suite.
SS scores can be negative. You could check the secondary structure alignment of these negative-scoring hits.
An SS score of 0 can still occur when SS types are mixed in the target database (e.g. if some HMMs have no secondary structure, or if some have only DSSP annotation and others only predictions).
I'm currently busy writing my thesis. I might fix the zero-score problem afterwards.
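To find the hits worth inspecting, a small script can pull the SS column out of the hit table of an .hhr file and list the entries that are zero or negative. This is a minimal sketch, not part of hh-suite: the regular expression is an assumption based on the hit tables pasted in this thread, and the names `parse_hits` and `suspicious_ss` are hypothetical.

```python
import re
from typing import Iterable, List, NamedTuple, Tuple


class Hit(NamedTuple):
    number: int
    name: str
    prob: float
    ss: float


# Columns: No, Hit, Prob, E-value, P-value, Score, SS, Cols, Query HMM, Template HMM.
HIT_LINE = re.compile(
    r"^\s*(\d+)\s+(.+?)\s+(\d+\.\d+)\s+(\S+)\s+(\S+)\s+(-?\d+\.\d+)"
    r"\s+(-?\d+\.\d+)\s+(\d+)\s+(\d+-\d+)\s+(\d+-\d+)\s*\((\d+)\)\s*$"
)


def parse_hits(lines: Iterable[str]) -> List[Hit]:
    """Return the hits found in the summary table; other lines are skipped."""
    hits = []
    for line in lines:
        m = HIT_LINE.match(line)
        if m:
            hits.append(Hit(int(m.group(1)), m.group(2),
                            float(m.group(3)), float(m.group(7))))
    return hits


def suspicious_ss(hits: Iterable[Hit]) -> Tuple[List[Hit], List[Hit]]:
    """Split out hits whose SS score is exactly zero or below zero."""
    hits = list(hits)
    zero = [h for h in hits if h.ss == 0.0]
    negative = [h for h in hits if h.ss < 0.0]
    return zero, negative


# Lines copied from the tables posted earlier in this thread.
SAMPLE = [
    " No Hit Prob E-value P-value Score SS Cols Query HMM Template HMM",
    " 12 PF07716.14 ; bZIP_2 ; Basic re 97.6 6.7E-06 1.3E-10 56.1 7.5 51 17-70 4-54 (55)",
    " 13 1hjb_A Ccaat/enhancer binding 97.2 2.3E-06 4.4E-11 65.8 0.0 61 15-78 12-72 (87)",
    "  9 PF08601.9 ; PAP1 ; Transcripti 97.9 5E-10 9.1E-15 109.2 -12.6 58 235-292 297-356 (356)",
]
zero_hits, negative_hits = suspicious_ss(parse_hits(SAMPLE))
```

For a real file, pass the open file handle instead of `SAMPLE`, for example `parse_hits(open("result.hhr"))` with your own output path. The zero-scoring hits are the ones likely affected by mixed SS types, and the negative ones can be checked against their `ss_pred` lines in the alignment section.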
from hh-suite.