Comments (8)
Hello Pauline,
thank you for using HHsuite.
There is a chapter in the hhsuite-userguide concerning 'Building customized databases' (Chapter 3.5).
The necessary steps to build an HHsuite database are explained there.
In your case, you have to use ffindex_build to build an ffindex database from your individual
a3m files. That database can then be passed to cstranslate to build an ffindex database of
cs219 sequences.
Did you use multiple sequence alignments in the fasta format as input for reformat.pl?
If not, you should probably redo your database with the instructions in the userguide.
HHblits uses hidden Markov profiles of the query and the template. For a single
FASTA sequence, the profile is trivial and does not contain as much evolutionary
information as the profile built from the corresponding multiple sequence alignment.
HHblits can exploit this additional evolutionary information.
In short: you will find more homologs and your results will be better.
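A quick way to check whether your a3m files really hold alignments rather than single sequences is to count the header lines in each file. This is only a minimal sketch: it assumes every sequence starts with a '>' header line, so annotation entries (for example secondary-structure lines, if you added any) are counted too. The function name count_sequences is made up for this example.

```shell
# Count the header lines (lines starting with '>') in one alignment file.
# Assumption: each sequence in the file has exactly one '>' header line.
count_sequences() {
    grep -c '^>' "$1" || true   # grep exits non-zero on a count of 0; ignore that
}

# Report every a3m file in the current directory that holds at most one sequence.
for f in *.a3m; do
    [ -e "$f" ] || continue     # nothing to do when no a3m files are present
    n=$(count_sequences "$f")
    if [ "$n" -le 1 ]; then
        echo "$f: only $n sequence(s), the profile will carry little evolutionary information"
    fi
done
```

Files reported by this loop are the ones worth rebuilding from a real multiple sequence alignment.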
If you have any further questions do not hesitate to ask.
Cheers,
Markus
from hh-suite.
Thank you for your quick reply.
I followed Chapter 3.5 to create my database.
If I understand correctly, do I have to merge all my a3m files into one? Or, when I create
my a3m.ffindex, do I build it from all my a3m files?
My fasta files (.fas) for reformat.pl were all multiple alignments.
For the call to cstranslate, I do not understand
OMP_NUM_THREADS=<number_threads>. I understand it is there to run cstranslate
several times in one call, but I do not understand why I have to do that.
Regards,
Pauline
(In reply to Markus Meier's message of 2016-07-20 16:17 GMT+02:00.)
from hh-suite.
You have to combine all your a3m files into one ffindex database.
You will probably need the following command:
ffindex_build -as test_a3m.ffdata test_a3m.ffindex *.a3m
This creates one ffindex database (test_a3m.ffdata + test_a3m.ffindex) containing all your a3m files (*.a3m). It is best to call ffindex_build in the directory that holds the individual a3m files. Otherwise, parts of the path end up in the entry names in your ffindex file, which can lead to truncated and therefore ambiguous file names in test_a3m.ffindex.
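As a rough sketch of that advice, you could change into the a3m directory, build the database there, and then look for duplicated entry names in the index. Two things here are assumptions and not taken from the user guide: the A3M_DIR variable is a placeholder for wherever your files live, and the index file is assumed to be tab-separated with the entry name in the first column.

```shell
# Placeholder for the directory holding the individual a3m files.
a3m_dir="${A3M_DIR:-.}"

# Print the entry names of an ffindex index file.
# Assumption: tab-separated columns with the name in column 1.
list_entry_names() {
    cut -f1 "$1"
}

if command -v ffindex_build >/dev/null 2>&1; then
    (
        cd "$a3m_dir" || exit 1
        # Same command as above, run from inside the a3m directory so that
        # the entry names carry no directory prefix.
        ffindex_build -as test_a3m.ffdata test_a3m.ffindex *.a3m
        # A name printed here occurs more than once, i.e. it is ambiguous.
        list_entry_names test_a3m.ffindex | sort | uniq -d
    )
else
    echo "ffindex_build not found on PATH" >&2
fi
```

If the last command prints nothing, every entry name in test_a3m.ffindex is unique.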
cstranslate will automatically create the ffindex database (test_cs219.ffdata + test_cs219.ffindex) with the cs219 column-state sequences for all a3m files in the corresponding ffindex database (test_a3m.ffdata + test_a3m.ffindex). Depending on your computer, you can use more threads to speed up the calculation. On Linux you can see the number of available cores/threads with the command nproc.
For example:
$ nproc
Sample output:
8
Therefore, you could use 8 threads for cstranslate:
OMP_NUM_THREADS=8
If speed is not relevant in your case, you can limit cstranslate to one thread:
OMP_NUM_THREADS=1
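Put together, the thread count can be taken from nproc and set for a single cstranslate call by prefixing the command with the variable. This is a sketch, not the canonical invocation: the cstranslate options are the ones used in the call quoted elsewhere in this thread, and the HHLIB default below is a placeholder path you would need to replace.

```shell
# Use every core nproc reports; fall back to one thread if nproc is unavailable.
threads=$(nproc 2>/dev/null || echo 1)

# Placeholder: point HHLIB at your own hhsuite installation.
hhlib="${HHLIB:-/path/to/hhsuite}"

if command -v cstranslate >/dev/null 2>&1 && [ -e test_a3m.ffindex ]; then
    # The variable applies to this one command only.
    OMP_NUM_THREADS="$threads" cstranslate \
        -A "$hhlib/data/cs219.lib" -D "$hhlib/data/context_data.lib" \
        -x 0.3 -c 4 -f -i test_a3m -o test_cs219 -I a3m -b
else
    echo "would run cstranslate with OMP_NUM_THREADS=$threads"
fi
```

The number of threads only affects how long the run takes; the resulting database is the same with one thread or eight.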
from hh-suite.
ffindex_build -as test_a3m.ffdata test_a3m.ffindex *.a3m worked and created the merged database.
Then I call cstranslate: OMP_NUM_THREADS=1 cstranslate -A /home/pauline/HHLIB/data/cs219.lib -D /home/pauline/HHLIB/data/context_data.lib -x 0.3 -c 4 -f -i test_a3m -o test_cs219 -I a3m -b
I still get the error: Unable to read input file 'test_a3m'!
So I tried passing test_a3m.ffdata instead and got this error: Sequence 2208 has 104 match columns but should have 64!
and test_cs219 is still not created.
Here are the files in my folder :
file1.a3m
file1.hhm
file2.a3m
file2.hhm
test_hhm.ffdata
test_hhm.ffindex
test_a3m.ffdata
test_a3m.ffindex
from hh-suite.
How did you install HHsuite?
If you cloned the project, which commit did you use?
It seems your version does not yet support the -f option of cstranslate.
Can you install it with a clone of the repository as described in the readme?
from hh-suite.
I re-installed it following the readme and I still get the same error. I think I do not have the right version of cstranslate, because -f is not among its options:
cstranslate version 2.1.2
Translate a sequence/alignment into an abstract state alphabet.
Copyright (c) 2010 Andreas Biegert, Johannes Soding, and LMU Munich
Usage: cstranslate -i -A [options]
Options:
-i, --infile Input file with alignment or sequence
-o, --outfile Output file for generated abstract state sequence (def: .as)
-a, --append Append generated abstract state sequence to this file
-I, --informat prf|seq|fas|... Input format: prf, seq, fas, a2m, or a3m (def=auto)
-O, --outformat seq|prf Outformat: abstract state sequence or profile (def=seq)
-M, --match-assign [0:100] Make all FASTA columns with less than X% gaps match columns
(def: make columns with residue in first sequence match columns)
-A, --alphabet Abstract state alphabet consisting of exactly 219 states (def=off)
-D, --context-data Add context-specific pseudocounts using given context-data (def=off)
-x, --pc-admix [0,1] Pseudocount admix for context-specific pseudocounts (def=0.30)
-c, --pc-ali [0,inf[ Constant in pseudocount calculation for alignments (def=4.0)
-w, --weight [0,inf[ Weight of abstract state column in emission calculation (def=1000.00)
from hh-suite.
Sorry, it actually worked. I was using the wrong version of cstranslate; when I give the path to the cstranslate from the hhsuite installed with git clone, it works.
Thank you for your help
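For anyone hitting the same problem, a small sketch for spotting a stale cstranslate: list every copy on your PATH in lookup order, then check a copy's help text for the -f option. The helper names find_on_path and has_f_option are invented for this example, and the pattern assumes the help text lists each option at the start of a line, as in the output pasted above.

```shell
# Print every executable named $1 found on PATH, in lookup order.
find_on_path() {
    old_ifs=$IFS
    IFS=:
    for dir in $PATH; do
        [ -n "$dir" ] || dir=.   # an empty PATH entry means the current directory
        if [ -f "$dir/$1" ] && [ -x "$dir/$1" ]; then
            echo "$dir/$1"
        fi
    done
    IFS=$old_ifs
}

# Read help text on stdin; succeed if a line starts with the -f option.
# Assumption: options appear at the start of a line, e.g. "-i, --infile ...".
has_f_option() {
    grep -q -e '^[[:space:]]*-f[,[:space:]]'
}

# The first line printed is the copy your shell runs by default.
find_on_path cstranslate
```

If more than one path is printed, pipe the help text of each copy into has_f_option, or simply call the freshly built binary by its full path.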
from hh-suite.
You are welcome.
If you have any further questions do not hesitate to open a new issue.
Cheers,
Markus
from hh-suite.