Hello, I am trying to build a database for HHblits. I already have the alignments

Problem with cstranslate for building HHblits database (hh-suite issue, closed)

soedinglab commented on July 18, 2024

Problem with cstranslate for building HHblits database


Comments (8)

meiermark commented on July 18, 2024

Hello Pauline,

thank you for using HHsuite.

There is a chapter in the hhsuite-userguide concerning 'Building customized databases' (Chapter 3.5).
The necessary steps to build an HHsuite database are explained there.

In your case, you have to use ffindex_build to build an ffindex database out of your individual
a3m files. This database can then be used with cstranslate to build an ffindex database with the
cs219 sequences.

Did you use multiple sequence alignments in the fasta format as input for reformat.pl?
If not, you should probably redo your database with the instructions in the userguide.
HHblits uses Hidden Markov profiles of the query and the template. In the case of single
fasta sequences, the Hidden Markov profile is trivial and contains less evolutionary
information than the Hidden Markov profile of the corresponding multiple sequence alignment.
HHblits can exploit this additional evolutionary information.
In short: you will find more homologs and your results will be better.

If you have any further questions do not hesitate to ask.

Cheers,
Markus


paulinefx commented on July 18, 2024

Thank you for your quick reply.

I was following Chapter 3.5 to create my database.
If I understand correctly, do I have to fuse all my a3m files into one? Or, when I create
my a3m.ffindex, do I do it for all my a3m files?

My fasta files (.fas) for reformat.pl were all multiple alignments.

For the call of cstranslate, I do not understand the
OMP_NUM_THREADS=<number_threads>. I understand it is to do cstranslate
multiple times in one call, but I do not understand why I have to do that.

Regards,

Pauline


meiermark commented on July 18, 2024

You have to fuse all your a3m files into one ffindex database.
You will probably need to use the following command:
ffindex_build -as test_a3m.ffdata test_a3m.ffindex *.a3m

This will create one ffindex database (test_a3m.ffdata + test_a3m.ffindex) containing all your a3m files (*.a3m). It is best to call ffindex_build in the directory that holds the individual a3m files. Otherwise, parts of the path end up in the entry names of your ffindex file, and those names may be truncated, which can cause problems with ambiguous file names in test_a3m.ffindex.
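The effect of where the `*.a3m` glob is expanded can be sketched in shell. This is only an illustration: the temporary directory, the `a3m/` subdirectory, and the dummy file contents are assumptions made up for the demo. The `ffindex_build` call is the one quoted above and runs only if the tool is installed.

```shell
# Scratch directory with two dummy alignments (hypothetical names and content).
workdir=$(mktemp -d)
mkdir "$workdir/a3m"
printf '>seq1\nACDE\n' > "$workdir/a3m/file1.a3m"
printf '>seq2\nACDF\n' > "$workdir/a3m/file2.a3m"

# Expanded from the parent directory, every argument carries a path prefix.
outside=$(cd "$workdir" && echo a3m/*.a3m)

# Expanded inside the directory, the arguments are bare file names.
inside=$(cd "$workdir/a3m" && echo *.a3m)

echo "from parent: $outside"
echo "from inside: $inside"

# Build the database from inside the directory, as recommended above.
if command -v ffindex_build >/dev/null 2>&1; then
  (cd "$workdir/a3m" && ffindex_build -as test_a3m.ffdata test_a3m.ffindex *.a3m) \
    || echo "ffindex_build reported an error" >&2
fi
```

The two `echo` lines show the difference in the names that ffindex_build would receive as arguments.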

cstranslate will automatically create the ffindex database (test_cs219.ffdata + test_cs219.ffindex) with the cs219 column state sequences for all your a3m files in the corresponding ffindex database (test_a3m.ffdata + test_a3m.ffindex). Depending on your computer, you can use more threads to speed up the calculation. On Linux you can see the available cores/threads with the command: nproc

For example:
$ nproc

Sample output:
8

Therefore, you could use 8 threads for cstranslate:
OMP_NUM_THREADS=8

If speed is not relevant in your case, you can limit cstranslate to one thread:
OMP_NUM_THREADS=1
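As a small sketch of how the two pieces fit together: a `VAR=value command` prefix sets the variable for that one command only, so the thread count applies to that single cstranslate run. The snippet below uses `sh -c` as a stand-in for cstranslate so that it runs anywhere. The fallback to one thread when `nproc` is missing is an assumption added for the sketch.

```shell
# Use every core nproc reports; fall back to 1 if nproc is not available.
threads=$(nproc 2>/dev/null || echo 1)

# A VAR=value prefix passes the variable to that one command only.
# 'sh -c' stands in for the real cstranslate call here.
seen=$(OMP_NUM_THREADS="$threads" sh -c 'echo "$OMP_NUM_THREADS"')
echo "the command would run with $seen thread(s)"

# The prefix does not change the variable in the calling shell.
echo "calling shell: ${OMP_NUM_THREADS:-unset}"
```

In other words, the setting does not run cstranslate several times; it tells one run how many threads it may use.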


paulinefx commented on July 18, 2024

ffindex_build -as test_a3m.ffdata test_a3m.ffindex *.a3m worked to create the fused database.
Then, when I call cstranslate: OMP_NUM_THREADS=1 cstranslate -A /home/pauline/HHLIB/data/cs219.lib -D /home/pauline/HHLIB/data/context_data.lib -x 0.3 -c 4 -f -i test_a3m -o test_cs219 -I a3m -b
I still get the error: Unable to read input file 'test_a3m'!
So I tried passing test_a3m.ffdata instead and got this error: Sequence 2208 has 104 match columns but should have 64!
and test_cs219 is still not created.
Here are the files in my folder:
file1.a3m
file1.hhm
file2.a3m
file2.hhm
test_hhm.ffdata
test_hhm.ffindex
test_a3m.ffdata
test_a3m.ffindex
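A quick sanity check before calling cstranslate can rule out a missing or empty database as the cause of a read error. The helper below is a hypothetical sketch, not part of HHsuite. It assumes that the `-i` argument is the common prefix of the `.ffdata`/`.ffindex` pair, as in the command above, and that the index is a plain-text file with one line per entry.

```shell
# check_ffindex_db PREFIX: hypothetical helper, not an HHsuite tool.
# Succeeds if PREFIX.ffdata and PREFIX.ffindex both exist and are non-empty.
check_ffindex_db() {
  prefix=$1
  for ext in ffdata ffindex; do
    if [ ! -s "$prefix.$ext" ]; then
      echo "missing or empty: $prefix.$ext" >&2
      return 1
    fi
  done
  # Assumption: the index holds one line per entry.
  entries=$(wc -l < "$prefix.ffindex" | tr -d ' ')
  echo "$prefix: $entries entries"
}

check_ffindex_db test_a3m || echo "test_a3m is not a complete ffindex database"
```

If the check passes and cstranslate still cannot read the prefix, the database files themselves are unlikely to be the problem.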


meiermark commented on July 18, 2024

How did you install HHsuite?
If you cloned the project, which commit did you use?

It seems your version does not yet support the -f option in cstranslate.
Can you install it with a clone of the repository as described in the readme?


paulinefx commented on July 18, 2024

I re-installed it following the readme files and I still have the same error. I think I do not have the right version of cstranslate, because the options do not include -f:
cstranslate version 2.1.2
Translate a sequence/alignment into an abstract state alphabet.
Copyright (c) 2010 Andreas Biegert, Johannes Soding, and LMU Munich

Usage: cstranslate -i -A [options]

Options:
-i, --infile Input file with alignment or sequence
-o, --outfile Output file for generated abstract state sequence (def: .as)
-a, --append Append generated abstract state sequence to this file
-I, --informat prf|seq|fas|... Input format: prf, seq, fas, a2m, or a3m (def=auto)
-O, --outformat seq|prf Outformat: abstract state sequence or profile (def=seq)
-M, --match-assign [0:100] Make all FASTA columns with less than X% gaps match columns
(def: make columns with residue in first sequence match columns)
-A, --alphabet Abstract state alphabet consisting of exactly 219 states (def=off)
-D, --context-data Add context-specific pseudocounts using given context-data (def=off)
-x, --pc-admix [0,1] Pseudocount admix for context-specific pseudocounts (def=0.30)
-c, --pc-ali [0,inf[ Constant in pseudocount calculation for alignments (def=4.0)
-w, --weight [0,inf[ Weight of abstract state column in emission calculation (def=1000.00)


paulinefx commented on July 18, 2024

Sorry, it actually worked. I was using the wrong version of cstranslate; when I give the path to the cstranslate of the hhsuite installed with git clone, it works.
Thank you for your help
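To see which cstranslate binaries a shell could pick up, one can walk through `PATH`. The function below is a hypothetical helper written for this sketch. It only lists candidates in lookup order and does not inspect their versions.

```shell
# find_on_path NAME: print every executable called NAME found in PATH, in lookup order.
# The body runs in a subshell, so the IFS and globbing changes stay local.
find_on_path() (
  name=$1
  IFS=:
  set -f   # do not expand glob characters that may occur in PATH
  for dir in $PATH; do
    # An empty PATH component means the current directory.
    candidate="${dir:-.}/$name"
    if [ -f "$candidate" ] && [ -x "$candidate" ]; then
      printf '%s\n' "$candidate"
    fi
  done
)

find_on_path cstranslate
```

The first line printed is normally the binary that a bare `cstranslate` resolves to. Calling the binary from the git clone by its full path, as described above, bypasses this lookup.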


meiermark commented on July 18, 2024

You are welcome.
If you have any further questions do not hesitate to open a new issue.

Cheers,
Markus
