Comments (23)

meiermark commented on July 18, 2024

Hello,

thank you for reporting this. This behavior is kind of intended.
The internal alignment functions in hhblits are written with SIMD instructions,
so four templates can be processed simultaneously.
However, if one of the templates in such a batch does not have
secondary structure annotation, then no secondary structure score is calculated for that batch.
The responsible programmer (@martin-steinegger) can probably describe this better.

To address your problem, we could add secondary structure predictions to our Pfam database.

Cheers,
Markus

from hh-suite.
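
The batching behaviour described in the comment above can be pictured with a toy model. This is only a sketch, not the hhblits implementation: the name `batch_ss_scores`, the use of `None` for a template without SS annotation, and reporting a skipped score as `0.0` are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Tuple

BATCH_SIZE = 4  # four templates per SIMD batch, as stated in the comment above

# (template name, SS score it would receive, or None if it has no SS annotation)
Template = Tuple[str, Optional[float]]


def batch_ss_scores(
    templates: Sequence[Template], batch_size: int = BATCH_SIZE
) -> List[Tuple[str, float]]:
    """Toy model: SS scoring is skipped for a whole batch if any member lacks SS."""
    results: List[Tuple[str, float]] = []
    for start in range(0, len(templates), batch_size):
        batch = templates[start:start + batch_size]
        all_annotated = all(score is not None for _, score in batch)
        for name, score in batch:
            # Assumption: a skipped SS score is reported as 0.0.
            results.append((name, score if all_annotated else 0.0))
    return results


if __name__ == "__main__":
    # Hypothetical templates: one unannotated entry sits in the first batch.
    hits = [("pdb_a", 5.7), ("pdb_b", 10.3), ("pfam_c", None), ("pdb_d", 9.1),
            ("pdb_e", 8.8), ("pdb_f", 9.5)]
    for name, ss in batch_ss_scores(hits):
        print(f"{name}\t{ss}")
```

In this model a single unannotated template zeroes the SS score of its three batch neighbours, while templates in fully annotated batches keep theirs.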

jamespjh commented on July 18, 2024

Understood. Do you know why we do not encounter this problem when running with both DBs on the web-based tool?

jamespjh commented on July 18, 2024

It would be great for us if you could add SS to Pfam. How long might this take?

jamespjh commented on July 18, 2024

Also, is there an option to turn off SIMD?

meiermark commented on July 18, 2024

The web-based tool still uses an old version of hhblits. It runs two hhblits searches, the first against the first database and the second against the second database, and then merges the results.
In the old hhblits version we did not have the problem with SIMD instructions and secondary structure scoring.
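
The "search each database separately, then merge" approach could be sketched roughly as follows. This is an assumption-laden illustration: the thread does not say how the web tool merges its results, so re-ranking by probability is a guess, and the function names, the output naming scheme and the hit dictionaries are hypothetical. Only the `-i`, `-d` and `-o` options are taken from the hhsearch command shown later in this thread. The commands are built but not executed.

```python
from typing import Dict, List, Sequence

Hit = Dict[str, object]  # hypothetical parsed hit, e.g. {"hit": "1sse_B", "prob": 98.8}


def build_commands(query: str, databases: Sequence[str]) -> List[List[str]]:
    """Build one hhsearch command line per database (nothing is run here)."""
    commands = []
    for index, db in enumerate(databases):
        output = f"{query}.db{index}.hhr"  # hypothetical output naming scheme
        commands.append(["hhsearch", "-i", query, "-d", db, "-o", output])
    return commands


def merge_hits(*hit_lists: Sequence[Hit]) -> List[Hit]:
    """Concatenate hit lists and re-rank them by probability (an assumption)."""
    merged = [dict(hit) for hits in hit_lists for hit in hits]
    merged.sort(key=lambda hit: hit["prob"], reverse=True)
    for rank, hit in enumerate(merged, start=1):
        hit["no"] = rank  # renumber after merging
    return merged


if __name__ == "__main__":
    for command in build_commands("query.a3m", ["pdb70", "pfam"]):
        print(" ".join(command))
    pdb_hits = [{"hit": "1sse_B", "prob": 98.8}, {"hit": "1gd2_E", "prob": 98.7}]
    pfam_hits = [{"hit": "PF08601.9", "prob": 98.6}]
    for hit in merge_hits(pdb_hits, pfam_hits):
        print(hit["no"], hit["hit"], hit["prob"])
```

Because each search then sees only one database, every batch contains templates from a single source, which sidesteps the mixed-annotation case at the cost of running the search twice.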

Annotating Pfam with SS would take a couple of days (< 5, I assume). I will update the database pipeline so that future releases also include the secondary structure predictions.

There is no option to turn off SIMD, but you can limit the SIMD instructions to SSSE3 (the option is described in the manual).

meiermark commented on July 18, 2024

Forgot to mention: limiting the SIMD instructions has to be done at build time with cmake.

jamespjh commented on July 18, 2024

OK, I don't think it's worth us getting a 4-fold slowdown to fix this, so I'll not try the no-SIMD approach. We'll wait until the pfam data is upgraded with SS. Will you let us know when this is ready?

J

meiermark commented on July 18, 2024

The database has been updated.

jamespjh commented on July 18, 2024

Thanks very much

ilectra commented on July 18, 2024

I get some negative SS values with the new database, is this expected behavior?

meiermark commented on July 18, 2024

Can you show the alignment with the negative secondary structure scores?

ilectra commented on July 18, 2024

The file header:

Query         YPR199C Seqment 0
Match_columns 294
No_of_seqs    34 out of 1385
Neff          4.26402
Searched_HMMs 55100
Date          Wed May  3 13:18:19 2017
Command       hhsearch -remove_ss_cap -E 1000000000 -d /home/cceaiac/levine/databases/pdb70/pdb70 -ssm 4 -cpu 1 -o /home/cceaiac/Scratch/Levine/results/test_YPR199C/YPR199C.0.ssw11.hhr -i /home/cceaiac/Scratch/Levine/results/test_YPR199C/YPR199C.0.ss.a3m -v 2 -p 0 -cov 50 -ssw 0.11 -Z 5000 -d /home/cceaiac/levine/databases/pfamA_31/pfam 

One example:

 No Hit                              Prob E-value P-value  Score  SS   Cols Query HMM  Template HMM
   9 PF08601.9 ; PAP1 ; Transcripti  97.9   5E-10 9.1E-15  109.2 -12.6   58  235-292  297-356 (356)

The alignment:

No 9
>PF08601.9 ; PAP1 ; Transcription factor PAP1
Probab=97.86  E-value=5e-10  Score=109.22  Aligned_cols=58  Identities=22%  Similarity=0.399  Sum_probs=54.4  Template_Neff=4.100

Q ss_pred             CCCCeeecHHHHHHHHHhCCccc--CCCHHHHHHHHHHhCccCCCCeeccHHHHHHHHhh
Q YPR199C         235 FGGDVLLSAMDIWSFMKVHPKVN--TFDLEILGTELKKSATCSNFDILISLKHFIKVFSS  292 (294)
Q Consensus       235 ~~g~~lLt~~atWeyi~~~~~~~--~fDv~~v~~kLKg~~~C~g~Gp~~~~~~i~~~~~s  292 (294)
                      ..++.+||+.++|+||..|+.++  +|||+.|+++|+++++|+|+|+||.+.+|+.+|.+
T Consensus       297 ~~~~~lLTcvqaWd~IqshPkF~~gd~DLD~LCseLr~KAKCsGfGaVVee~dVd~iL~k  356 (356)
T Q0CHW7_ASPTN/2  297 EDKTQMLSCTKIWDRLQSMEKFRNGEIDVDNLCSELRTKARCSEGGVVVNQKDVDDIMGR  356 (356)
T ss_pred             cCCCceecHHHHHHHHHhChhhhCCCCCHHHHHHHHhhcCccCCCCCCCCHHHHHHHhcC
Confidence            35789999999999999999998  89999999999999999999999999999998863
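
One way to inspect a hit like this is to compare the `Q ss_pred` and `T ss_pred` lines column by column. The sketch below is only a rough sanity check and does not reproduce the SS score that hhsearch computes. The function name `ss_agreement` is hypothetical, and comparing states case-insensitively is an assumption.

```python
from typing import Tuple


def ss_agreement(query_ss: str, template_ss: str) -> Tuple[int, int]:
    """Return (matching columns, compared columns) for two aligned SS strings."""
    matches = 0
    compared = 0
    for q, t in zip(query_ss, template_ss):
        if q == "-" or t == "-":
            continue  # skip gap columns
        compared += 1
        if q.upper() == t.upper():  # assumption: letter case is ignored
            matches += 1
    return matches, compared


if __name__ == "__main__":
    # The two ss_pred lines from alignment No 9 above.
    q_ss = "CCCCeeecHHHHHHHHHhCCccc--CCCHHHHHHHHHHhCccCCCCeeccHHHHHHHHhh"
    t_ss = "cCCCceecHHHHHHHHHhChhhhCCCCCHHHHHHHHhhcCccCCCCCCCCHHHHHHHhcC"
    same, total = ss_agreement(q_ss, t_ss)
    print(f"{same} of {total} aligned columns share the same predicted state")
```

If most aligned columns agree, as they appear to here by eye, a strongly negative SS score looks suspicious.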

meiermark commented on July 18, 2024

You are right. That is a bug. Can you give us your query?

ilectra commented on July 18, 2024

I'm attaching a zip file with outputs of every step of the search. The headers should have the information you need.
YPR199C_output.zip

Let me know if you need more input!

meiermark commented on July 18, 2024

Could you please add your input query?
At the moment I assume that you are using: http://www.uniprot.org/uniprot/Q676V5.fasta

ilectra commented on July 18, 2024

Ah! I think it's

YPR199C Seqment 0
MAKPRGRKGGRKPSLTPPKNKRAAQLRASQNAFRKRKLERLEELEKKEAQLTVTNDQIHILKKENELLHFMLRSLLTERNMPSDERNISKACCEEKPPTCNTLDGSVVLSSTYNSLEIQQCYVFFKQLLSVCVGKNCTVPSPLNSFDRSFYPIGCTNLSNDIPGYSFLNDAMSEIHTFGDFNGELDSTFLEFSGTEIKEPNNFITENTNAIETAAASMVIRQGFHPRQYYTVDAFGGDVLLSAMDIWSFMKVHPKVNTFDLEILGTELKKSATCSNFDILISLKHFIKVFSSKL*

meiermark commented on July 18, 2024

You should get a warning with your hhsearch call:

  • 17:25:49.101 WARNING: Ignoring unknown option -remove_ss_cap

Is that true? Where did you get your version of hhblits/hhsearch? What is this parameter supposed to do?

I could reproduce this bug. The responsible programmer will look into this.
Thank you for your patience.

martin-steinegger commented on July 18, 2024

@ilectra I should have fixed the problem with negative scores. Please let me know whether the problem persists.

ilectra commented on July 18, 2024

@martin-steinegger , it did solve the negative SS scores, but there are still some zeros there (these are just the first 20 matches):

Query         YPR199C Seqment 0
Match_columns 294
No_of_seqs    34 out of 1376
Neff          4.26402
Searched_HMMs 52837
Date          Mon May 15 14:12:08 2017
Command       hhsearch -remove_ss_cap -E 1000000000 -d /home/cceaiac/levine/databases/pdb70/pdb70 -ssm 2 -cpu 1 -o /home/cceaiac/Scratch/Levine/results/test_YPR199C/YPR199C.0.ssw11.hhr -i /home/cceaiac/Scratch/Levine/results/test_YPR199C/YPR199C.0.ss.a3m -v 2 -p 0 -cov 50 -ssw 0.11 -Z 5000 -d /home/cceaiac/levine/databases/pfamA_31/pfam

 No Hit                             Prob E-value P-value  Score    SS Cols Query HMM  Template HMM
  1 1sse_B AP-1 like transcription  98.8 2.2E-11 4.3E-16   97.0   5.7   60  235-294    26-85  (86)
  2 1gd2_E Transcription factor PA  98.7 7.5E-10 1.4E-14   81.8  10.3   68   11-81      1-68  (70)
  3 PF08601.9 ; PAP1 ; Transcripti  98.6 4.6E-10 8.8E-15  109.3   6.4   58  235-292   297-356 (356)
  4 1gu4_A CAAT/enhancer binding p  98.2 2.4E-07 4.6E-12   69.2  10.8   61   14-77     11-71  (78)
  5 1ci6_A Transcription factor AT  98.1 2.8E-07 5.3E-12   65.6   9.1   57   18-77      2-58  (63)
  6 PF00170.20 ; bZIP_1 ; bZIP tra  98.1 4.7E-07 8.9E-12   63.8   9.7   58   18-78      5-62  (64)
  7 2dgc_A Protein (GCN4); basic d  98.0 4.2E-07   8E-12   64.4   8.8   57   15-74      6-62  (63)
  8 2wt7_A Proto-oncogene protein   98.0   7E-07 1.3E-11   62.9   9.5   57   18-77      2-58  (63)
  9 1jnm_A Proto-oncogene C-JUN; B  97.9 1.2E-06 2.2E-11   61.0   9.1   57   19-78      2-58  (62)
 10 PF03131.16 ; bZIP_Maf ; bZIP M  97.9 1.6E-06   3E-11   66.8   9.7   59   18-79     30-88  (90)
 11 1t2k_D Cyclic-AMP-dependent tr  97.8 2.9E-06 5.5E-11   58.9   9.1   55   19-76      2-56  (61)
 12 PF07716.14 ; bZIP_2 ; Basic re  97.6 6.7E-06 1.3E-10   56.1   7.5   51   17-70      4-54  (55)
 13 1hjb_A Ccaat/enhancer binding   97.2 2.3E-06 4.4E-11   65.8   0.0   61   15-78     12-72  (87)
 14 1dh3_A Transcription factor CR  96.8 1.1E-05 2.1E-10   55.7   0.0   52   19-73      2-53  (55)
 15 5apu_A General control protein  96.8 0.00018 3.5E-09   59.1   7.0   48   19-73     46-93  (95)
 16 3a5t_A Transcription factor MA  96.8 1.5E-05 2.8E-10   64.7   0.0   62   18-82     37-98  (107)
 17 4c46_A General control protein  95.7  0.0076 1.4E-07   47.9   6.6   51   18-72     26-76  (76)
 18 2wt7_B Transcription factor MA  95.4  0.0011   2E-08   51.8   0.0   60   18-80     27-86  (90)
 19 1deb_A APC protein, adenomatou  95.3   0.018 3.5E-07   42.7   6.1   43   38-83      2-44  (54)
 20 1kd8_B GABH BLL, GCN4 acid bas  95.1   0.015 2.9E-07   40.2   4.5   35   39-76      1-35  (36)
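
Hit lists like the one above can be screened automatically for zero or negative SS values. The following is a heuristic sketch, not an official parser: the names `HIT_LINE`, `parse_hit_line` and `flag_ss` are hypothetical, and the regular expression assumes that all columns are separated by whitespace, which may not hold if wide values run together.

```python
import re
from typing import Dict, Iterable, List, Optional

# Heuristic pattern for one line of the hit list, anchored on the numeric
# columns at the right-hand side (assumption: whitespace between all fields).
HIT_LINE = re.compile(
    r"^\s*(?P<no>\d+)\s+(?P<hit>.+?)\s+"
    r"(?P<prob>\d+(?:\.\d+)?)\s+(?P<evalue>\S+)\s+(?P<pvalue>\S+)\s+"
    r"(?P<score>-?\d+(?:\.\d+)?)\s+(?P<ss>-?\d+(?:\.\d+)?)\s+(?P<cols>\d+)\s+"
    r"(?P<query>\d+-\d+)\s+(?P<template>\d+-\d+)\s*\(\d+\)\s*$"
)


def parse_hit_line(line: str) -> Optional[Dict[str, object]]:
    """Parse one hit-list line; return None for headers and other lines."""
    match = HIT_LINE.match(line)
    if match is None:
        return None
    return {
        "no": int(match.group("no")),
        "hit": match.group("hit").strip(),
        "prob": float(match.group("prob")),
        "score": float(match.group("score")),
        "ss": float(match.group("ss")),
        "cols": int(match.group("cols")),
    }


def flag_ss(lines: Iterable[str]) -> List[Dict[str, object]]:
    """Return the hits whose SS column is zero or negative."""
    flagged = []
    for line in lines:
        hit = parse_hit_line(line)
        if hit is not None and hit["ss"] <= 0.0:
            flagged.append(hit)
    return flagged


if __name__ == "__main__":
    example = [
        " No Hit                             Prob E-value P-value  Score    SS Cols Query HMM  Template HMM",
        " 12 PF07716.14 ; bZIP_2 ; Basic re  97.6 6.7E-06 1.3E-10   56.1   7.5   51   17-70      4-54  (55)",
        " 13 1hjb_A Ccaat/enhancer binding   97.2 2.3E-06 4.4E-11   65.8   0.0   61   15-78     12-72  (87)",
    ]
    for hit in flag_ss(example):
        print(hit["no"], hit["hit"], hit["ss"])
```

Running it over the hit-list section of an `.hhr` file before and after a fix makes it easy to see which hits changed.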

ilectra commented on July 18, 2024

I should mention that those were not zero before the fix.

ilectra commented on July 18, 2024

And some of the SS scores are still negative, both when the search is run online and in my local version. Try:

>C9orf72

MSTLCPPPSPAVAKTEIALSGKSPLLAATFAYWDNILGPRVRHIWAPKTE
QVLLSDGEITFLANHTLNGEILRNAESGAIDVKFFVLSEKGVIIVSLIFD
GNWNGDRSTYGLSIILPQTELSFYLPLHRVCVDRLTHIIRKGRIWMHKER
QENVQKIILEGTERMEDQGQSIIPMLTGEVIPVMELLSSMKSHSVPEEID
IADTVLNDDDIGDSCHEGFLLNAISSHLQTCGCSVVVGSSAEKVNKIVRT
LCLFLTPAERKCSRLCEAESSFKYESGLFVQGLLKDSTGSFVLPFRQVMY
APYPTTHIDVDVNTVKQMPPCHEHIYNQRRYMRSELTAFWRATSEEDMAQ
DTIIYTDESFTPDLNIFQDVLHRDTLVKAFLDQVFQLKPGLSLRSTFLAQ
FLLVLHRKALTLIKYIEDDTQKGKKPFKSLRNLKIDLDLTAEGDLNIIMA
LAEKIKPGLHSFIFGRPFYTSVQERDVLMTF

Just to make sure we're comparing the same thing, what are the exact software (git tag) and database versions in the online tool?

ilectra commented on July 18, 2024

@croth1 , @martin-steinegger , any news on that?

martin-steinegger commented on July 18, 2024

SS scores can be negative. You could check the secondary structure alignment of these negative-scoring hits.

An SS score of 0 can still occur when SS types are mixed in the target db (e.g. if some HMMs don't have SS annotation, or if some have only DSSP and others only predictions).

I'm currently busy writing my thesis. I might fix the 0 score problem afterwards.
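
To see whether a target database mixes SS types, one could tally which annotation each entry carries. This is a minimal sketch under stated assumptions: entries are assumed to be already extracted as plain text (reading the database files themselves is not shown), SS annotations are assumed to appear as `>ss_pred` and `>ss_dssp` header lines, and the names `ss_annotation` and `summarize` are hypothetical.

```python
from collections import Counter
from typing import Dict


def ss_annotation(entry_text: str) -> str:
    """Classify one entry as 'both', 'pred', 'dssp' or 'none'."""
    has_pred = False
    has_dssp = False
    for line in entry_text.splitlines():
        if line.startswith(">ss_pred"):
            has_pred = True
        elif line.startswith(">ss_dssp"):
            has_dssp = True
    if has_pred and has_dssp:
        return "both"
    if has_pred:
        return "pred"
    if has_dssp:
        return "dssp"
    return "none"


def summarize(entries: Dict[str, str]) -> Counter:
    """Count how many entries fall into each SS annotation class."""
    return Counter(ss_annotation(text) for text in entries.values())


if __name__ == "__main__":
    # Hypothetical toy entries standing in for extracted database records.
    entries = {
        "entry_a": ">ss_pred\nCCHHHHCC\n>entry_a\nMAKPRGRK\n",
        "entry_b": ">ss_dssp\nCCHHHHCC\n>entry_b\nMAKPRGRK\n",
        "entry_c": ">entry_c\nMAKPRGRK\n",
    }
    for kind, count in sorted(summarize(entries).items()):
        print(kind, count)
```

More than one class in the summary would indicate the kind of mixed database in which zero SS scores can still appear.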
