Comments (5)
Hello @fslee62,
thanks a lot for evaluating HH-suite3.0 and sorry for this late answer.
To your observations:
(1) You mean that the hits from later (>2) iterations have a lower score? This could be explained by profile divergence: later iterations pick up a lot of diversity, which can dilute the profile.
(2) This is possible. The scoring scheme changed.
(3) There is a length limitation of 20,000 residues within hh-suite (without adjusting the -maxres parameter). So you probably found some very long target sequence. This is a problem: sequences longer than 20,000 residues can lead to corrupted memory. In the successor databases to the Uniprot20, the Uniclust, there are no sequences longer than 14,000 residues (http://gwdu111.gwdg.de/~compbiol/uniclust/2017_04/).
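[Editor's note] The 20,000-residue limit above can be checked for in advance. Below is a minimal sketch (not part of hh-suite) that scans FASTA text for sequences exceeding the default limit; the function name and FASTA layout are illustrative assumptions.

```python
# Sketch: pre-screen FASTA text for sequences longer than the default
# hhblits -maxres limit (20,000 residues), which the thread reports can
# corrupt memory. Hypothetical helper, not an hh-suite API.

MAXRES_DEFAULT = 20000

def long_sequences(fasta_text, limit=MAXRES_DEFAULT):
    """Return (header, length) for every sequence exceeding `limit`."""
    too_long = []
    header, length = None, 0
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            if header is not None and length > limit:
                too_long.append((header, length))
            header, length = line[1:].strip(), 0
        else:
            length += len(line.strip())
    if header is not None and length > limit:
        too_long.append((header, length))
    return too_long
```

Any sequence this flags would either need to be removed or require raising -maxres (at a memory cost) on the hhblits command line.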
To your questions:
(1) I don't think it's a usage error of the method. HH-suite3.0 and 2.0.16 simply perform differently because of the scoring scheme change.
(2) What do you mean by lower n? In our benchmarks, 3 iterations of HHblits3.0 (and 2.0.16) result in the best performance. HH-suite3.0 should create better models.
(3) I think hh-suite2.0.16 just did not print this error message, but the limitation still exists.
Cheers
Martin
from hh-suite.
hello martin,
sorry for my late reply also. thanks very much for the informative answers.
related to my observations:
(1) yes, i meant that when n (# of hhblits search iterations) went up (>1), the hhblits scores went down. this was true for both v2 and v3. i can understand that it was related to profile divergence.
(2) got it. the scoring schemes are different between v2 and v3.
(3) so if i use uniclust30 (instead of uniprot20), i can avoid the warnings and memory corruption??
related to my questions:
(2) yes, i wasn't clear there. i used the Antibody Modeling Assessment II (AMA-II) as my benchmark. i used in-house software to predict models for the 11 AMA-II cases. then i used RMSDs (predictions vs x-ray structures) as the main gauge of performance.
our in-house software uses hh-suite v2 OR v3 for this benchmark; all else (code-wise) was identical. the HHMs for the 11 targets were generated using n=3 (hhblits) for both the v2 and v3 cases. then we used hhsearch to find the templates (i.e., alignments). after that, we built the 11 models and calculated 11 RMSDs (for both the v2 and v3 cases). the average backbone RMSD for the v2 case was around 1.25A, whereas the one for the v3 case was around 1.65A; that is, 0.4A worse using v3.
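[Editor's note] The per-case backbone RMSD used above is conventionally computed after an optimal superposition. A minimal sketch using the Kabsch algorithm follows; it assumes the model and reference backbone atoms are already paired into two N x 3 arrays (extraction from PDB files is out of scope here, and this is not Fred's in-house code).

```python
import numpy as np

def backbone_rmsd(P, Q):
    """RMSD (Angstrom) between two N x 3 coordinate arrays after optimal
    superposition via the Kabsch algorithm. P is rotated onto Q."""
    P = P - P.mean(axis=0)          # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                     # covariance matrix
    U, S, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))
```

Averaging the 11 per-case values (e.g. `sum(rmsds) / len(rmsds)`) then gives the <b-RMSD> figure discussed in the thread.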
since i homed in on #iterations (n of hhblits), i have since performed the following experiments:
(1) all else the same except n=1, n=2, or n=3 during the HHM construction stage.
(2) calculate final average backbone RMSD (cf <b-RMSD>) as described above.
here are the results: as n went up (1 -> 3),
(A) <b-RMSD> went UP for v2: performance got worse with lower n consistent with your expectation.
(B) <b-RMSD> went DOWN for v3: performance got worse with higher n different from your expectation.
i did not use n>3.
in summary, v2 produced the best final models using n=3 during HHM constructions.
v3 produced the best final models using n=1 during HHM constructions.
overall, v3 still performed less well relative to v2 (all else the same). in particular, best v2 <b-RMSD> was around 1.25A (n=3). best v3 <b-RMSD> was around 1.37A (n=1).
IS IT POSSIBLE that hh-suite is not meant for HIGH homology cases like the variable domains of Fab, in general?? since v3 is even more "sensitive" than v2, would the negative impact be even greater for such cases??
fred
Hi Fred,
the model quality can be influenced a lot by various details of how you build your models, which might make your results meaningless.
(1) Your models could use either global or local alignments from HHblits for creating the distance restraints for Modeller (or some other homology modeling software).
(2) Your model can either contain all residues of the query sequence or it might only contain those residues that are part of the local query-template alignment.
To get useful results, you need to use global alignments (e.g. using option -mact 0.05) and include all residues in your model, because adding more residues to your model can only hurt your RMSD. The greediness (length) of the alignments (e.g. of hhblits v2 versus v3) could otherwise have a large influence on the RMSD of your model. Long, greedy alignments will give higher RMSDs than short, conservative alignments in which unreliably aligned parts are left out of the query-template alignment.
Second, you have a trade-off between specificity and sensitivity. Adding more iterations only helps if you need more sensitivity, i.e. when your template is very distantly related. As a simple rule of thumb: if your template is more closely related to your query than the most distantly related sequences in your query MSA are to the query itself, then you have added too much diversity to your query MSA and you should rather reduce the number of iterations.
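[Editor's note] The rule of thumb above can be sketched as a simple comparison of sequence identities. This is an illustrative toy, assuming gapless, equal-length sequences and a naive column-identity measure; real MSAs need gap handling and proper pairwise alignment.

```python
# Sketch of the rule of thumb: if the template is closer to the query
# than the most diverged sequence already in the query MSA, the MSA has
# picked up more diversity than this target needs. Hypothetical helpers.

def identity(a, b):
    """Fraction of identical characters between two equal-length,
    gapless sequences (naive; ignores gaps and alignment)."""
    assert len(a) == len(b)
    return sum(x == y for x, y in zip(a, b)) / len(a)

def msa_too_diverse(query, msa_seqs, template):
    """True if the template is more similar to the query than the most
    distantly related MSA member is, suggesting fewer iterations."""
    min_msa_id = min(identity(query, s) for s in msa_seqs)
    return identity(query, template) > min_msa_id
```

When this returns True, the advice in the thread would be to rebuild the query MSA with fewer hhblits iterations (lower -n).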
For a more detailed discussion and a method to find the optimal balance of sensitivity and specificity in building your query MSA, see our paper http://onlinelibrary.wiley.com/doi/10.1002/prot.22499/full
RMSD has a very bad reputation as a measure of protein structure model quality in the (CASP) community. Better to use TM-score, GDT-HA, or any other score used in recent CASP competitions.
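[Editor's note] For orientation, the core of the TM-score formula is small enough to sketch. The snippet below assumes the per-residue model/native distances of an aligned, already-superposed pair are given; the real TM-score program additionally searches over superpositions, so this is an illustration of the scoring term only.

```python
import math

def tm_score(dists, L_target):
    """TM-score term for per-residue distances (Angstrom) of an aligned,
    superposed model/native pair, normalized by target length. Uses the
    standard length-dependent scale d0 = 1.24*(L-15)^(1/3) - 1.8."""
    d0 = 1.24 * (L_target - 15) ** (1.0 / 3.0) - 1.8 if L_target > 15 else 0.5
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in dists) / L_target
```

Unlike RMSD, each residue's contribution is bounded, so a few badly modeled loops cannot dominate the score.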
I hope this helps.
Best wishes,
Johannes
hi johannes,
sorry for the slow reply. i appreciate very much your expert guidance. i will study the effect of local vs global alignment (i.e., the -mact values) on the overall quality of the final HM models in our benchmark.
however, i guess where i got stuck was trying to understand why using hhsuite v2 vs hhsuite v3 gave me noticeably different overall benchmark performance when ALL else was the same. i was not trying to get a better overall result using either v2 or v3. indeed, i am/was most interested in finding the conditions or parameters that would give me comparable performance between v2 and v3. i can see that v2 and v3 may have been "tuned" differently, thus needing slightly different parameter values.
finally, the backbone average RMSDs (referred to above as <b-RMSD>) were calculated according to the specs of AMA-II (i.e., using C=O instead of Ca) so that i may compare my results to those published. yes, i also calculated other gauges concurrently, such as GDT-TS, TM-score, MP-score, and our own internal "health" score. although i don't have the numbers in front of me, i take it that i would see a similar relative performance between v2 and v3 if i used, say, <TM-score> instead of <b-RMSD>.
most respectfully,
fred
POSTSCRIPT: i made a careless mistake describing my results a week ago: here is the correction:
>>>>>
here are the results: as n went up (1 -> 3),
(A) <b-RMSD> went DOWN for v2: performance got worse with lower n.
(B) <b-RMSD> went UP for v3: performance got worse with higher n.
<<<<<
all else being the same, using -mact=0.05 and n=1 (vs -mact=0.35 and n=1) did not have a huge impact on the overall model accuracy using AMA-II as a benchmark. the Fv mean backbone RMSD (average of 11 cases) went from 1.36A (mact=0.35) to 1.34A (mact=0.05). template alignments were done using hhsuite-v3.
using the same benchmark, pretty much all default hhsuite-v2 parameters (mact=0.35, n=3), and the same model building and model selection engines, the corresponding Fv mean backbone RMSD was 1.26A.
i have not yet determined a set of hhsuite-v3 parameters that will give me equal or better overall results relative to those obtained using hhsuite-v2. the one hhsuite-v3 parameter that generated the largest performance variation was n (the # of hhblits iterations used in making the query HHMs).
with that, i will conclude my question and close this ticket. thank you for all your help and suggestions.