soedinglab / hh-suite

Remote protein homology detection suite.

Home Page: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-3019-7

License: GNU General Public License v3.0

CMake 0.46% Perl 7.38% Python 2.04% C++ 19.12% C 70.86% Makefile 0.01% Shell 0.11% Dockerfile 0.02% SWIG 0.01%
bioinformatics hh-suite hhblits alignment sequence-search profile-search profile-profile-search opensource cpp hhsearch

hh-suite's People

Contributors

al42and, clovisg, danbuchan, dmiller423, dvs, garymacindoe, huhlim, jamespjh, jhcepas, jhuber6, martin-steinegger, meiermark, milot-mirdita, mr-c, smsaladi, sseemayer, wojdyr, xrobin, zacharyrs, zhujianwei31415, zy4

hh-suite's Issues

renumberpdb.pl has a bug to parse pdbfile variable

In the script, the operator "." in line 382 should be changed to ".=".
File structure:
./
./1a3a.pdb
./1a3a_A.a3m
Command:
renumberpdb.pl 1a3a_A.a3m -d ./
Error:
"ERROR in Align.pm: sequence x is empty"

wrong:
elsif (-e $pdbfile."$pdbcode.pdb") {$pdbfile."$pdbcode.pdb";}
right:
elsif (-e $pdbfile."$pdbcode.pdb") {$pdbfile.="$pdbcode.pdb";}

Query: 2 new files seen in source code.

Hi, I have been working on the hhsuite source code for some time, so my copy is slightly old (more than a month, I think). I cloned hhsuite yesterday because I wanted a local reference copy. While comparing the files I found 2 new files in the current codebase:
src/hhblits_mpi.cpp
lib/ffindex/src/ffindex_from_fasta_with_split.c

I did not see anything related to them in the CHANGES file either. The version strings (in the top-level CMakeLists.txt) are the same in both codebases. Are these 2 new files essential for hhsuite, or are they just utilities that are not necessary for the core functioning of hh-suite?

Thanks,
Atul.

Problem with cstranslate for building HHblits database

Hello,
I am trying to build a database for HHblits. I already have the alignments in fasta files. I converted the fasta files into a3m files and added the secondary structure prediction with reformat.pl and addss.pl. Then I created the hhm files with hhmake and the ff{index,data} files with ffindex_build.

I am now trying to create the cs219 files for the database with cstranslate. When I use the command

cstranslate -A /home/pauline/HHLIB/data/cs219.lib -D /home/pauline/HHLIB/data/context_data.lib -x 0.3 -c 4 -f -i test_a3m -o test_cs219 -I a3m -b

I get the following error:
Unable to read input file 'test_a3m'!

It is because I do not have a test_a3m file. Do I have to create one?

Thank you in advance for your help.

Pauline

Compiling from source gives an error in ffindex_apply_mpi

[81%] Linking C executable ffindex_apply_mpi
/opt/intel/mpi/5.0.0.028/intel64/lib/libmpicxx.so: undefined reference to 'operator delete[](void*)'
/opt/intel/impi/5.0.0.028/intel64/lib/libmpicxx.so: undefined reference to 'operator new(unsigned long)'
/opt/intel/impi/5.0.0.028/intel64/lib/libmpicxx.so: undefined reference to 'operator delete(void*)'
/opt/intel/impi/5.0.0.028/intel64/lib/libmpicxx.so: undefined reference to 'operator new[](unsigned long)'
/opt/intel/impi/5.0.0.028/intel64/lib/libmpicxx.so: undefined reference to '__cxa_pure_virtual'
/opt/intel/impi/5.0.0.028/intel64/lib/libmpicxx.so: undefined reference to '__cxa_allocate_exception'
/opt/intel/impi/5.0.0.028/intel64/lib/libmpicxx.so: undefined reference to '__gxx_personality_v0'
/opt/intel/impi/5.0.0.028/intel64/lib/libmpicxx.so: undefined reference to '__cxa_throw'
/opt/intel/impi/5.0.0.028/intel64/lib/libmpicxx.so: undefined reference to 'vtable for __cxxabiv1::__class_type_info'
/opt/intel/impi/5.0.0.028/intel64/lib/libmpicxx.so: undefined reference to 'vtable for __cxxabiv1::__si_class_type_info'
collect2: error: ld returned 1 exit status
make[2]: *** [lib/ffindex/src/ffindex_apply_mpi] Error 1
make[1]: *** [lib/ffindex/src/CMakeFiles/ffindex_apply_mpi.dir/all] Error 2
make: *** [all] Error 2

SIGSEGV (when using older version: munmap SIGABRT)

Hi,

I've built binaries from commit f1f3d9a using cmake version 3.6.0, and I'm getting consistent SIGSEGV on one of the input files (attached). I also have an older hhblits in use, and that one gives SIGABRT on the same file.

Running the newer version in GDB:

$ HHLIB=/path/to/hh-suite gdb ./hhblits
(gdb) run -cpu 1 -e 1e-06 -n 1 -Z 10 -B 10 -i uniclust30_2016_09_245.faa -o uniclust30_2016_09_245.hhr -d uniclust/uniclust30_2016_09/uniclust30_2016_09

- 14:27:53.196 INFO: Searching 10381269 column state sequences.
- 14:27:53.852 INFO: uniclust30_2016_09_245.faa is in A2M, A3M or FASTA format
- 14:27:53.852 INFO: Iteration 1
- 14:27:54.183 INFO: Prefiltering database
- 14:28:43.845 INFO: HMMs passed 1st prefilter (gapless profile-profile alignment)  : 175908
- 14:28:46.290 INFO: HMMs passed 2nd prefilter (gapped profile-profile alignment)   : 136
- 14:28:46.290 INFO: HMMs passed 2nd prefilter and not found in previous iterations : 136
- 14:28:46.290 INFO: Scoring 136 HMMs using HMM-HMM Viterbi alignment
- 14:28:46.305 INFO: Alternative alignment: 0

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff6e6242a in strlen () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff6e6242a in strlen () from /lib64/libc.so.6
#1  0x0000000000435921 in Hit::initHitFromHMM (this=this@entry=0x7ffffff52eb0, t=t@entry=0x165436d0, par_nseqdis=par_nseqdis@entry=1)
    at hh-suite/src/hhhit.cpp:283
#2  0x00000000004b03b3 in ViterbiConsumerThread::align (this=0x18ed5a30, maxres=4, nseqdis=1, smin=20) at hh-suite/src/hhviterbirunner.cpp:32
#3  0x00000000004b1585 in ViterbiRunner::alignment(Parameters&, HMMSimd*, std::vector<HHEntry*, std::allocator<HHEntry*> >, float, float*, float const (*) [20], float const (*) [20], float const (*) [20], int, float const (*) [4][11], float const (*) [11][4][11], float const (*) [11][8]) [clone ._omp_fn.0] ()
    at hh-suite/src/hhviterbirunner.cpp:162
#4  0x00000000004b2c47 in ViterbiRunner::alignment (this=this@entry=0x7ffffff549a0, par=..., q_simd=q_simd@entry=0x7ffffff540b0, 
    dbfiles=std::vector of length 136, capacity 136 = {...}, qsc=qsc@entry=-20, pb=pb@entry=0x7ffffff67280, S=S@entry=0x7ffffff66c40, Sim=Sim@entry=0x7ffffff66600, 
    R=R@entry=0x7ffffff65fc0, ssm_mode=ssm_mode@entry=2, S73=S73@entry=0x7ffffff672d4, S33=S33@entry=0x7ffffff67dd4, S37=S37@entry=0x7ffffff67854)
    at hh-suite/src/hhviterbirunner.cpp:117
#5  0x0000000000429be1 in HHblits::run (this=this@entry=0x7ffffff65960, query_fh=query_fh@entry=0x143d1c30, 
    query_path=query_path@entry=0x7ffffff56750 "uniclust30_2016_09_245.faa") at hh-suite/src/hhblits.cpp:1329
#6  0x00000000004165ba in main (argc=<optimized out>, argv=0x7fffffffb7f8) at hh-suite/src/hhblits_app.cpp:92

When using high verbosity, backtrace differs:

$ HHLIB=/path/to/hh-suite gdb ./hhblits
(gdb) run -v 4 -cpu 1 -e 1e-06 -n 1 -Z 10 -B 10 -i uniclust30_2016_09_245.faa -o uniclust30_2016_09_245.hhr -d uniclust/uniclust30_2016_09/uniclust30_2016_09
...
- 14:43:40.432 DEBUG1:  4.66049
- 14:43:40.432 DEBUG1:  7.10559
- 14:43:40.432 DEBUG1:  1.2472
- 14:43:40.432 DEBUG1:  3.94279
- 14:43:40.432 DEBUG1:  7.26569
score= 15.875  score_ss=  0.000
step  Q T    i    j  state   score    T Q cf ss-score

Program received signal SIGSEGV, Segmentation fault.
0x000000000048fddd in Viterbi::PrintDebug (this=this@entry=0x18ee4d20, q=q@entry=0x15d5a250, t=t@entry=0x16543730, backtraceScore=backtraceScore@entry=0x7ffffff52e60,
    backtraceResult=backtraceResult@entry=0x7ffffff52e40, ssm=2) at /gvusers/tokovebt/workspace/hh-suite/src/hhviterbi.cpp:300
300                     printf("%4i  %1c %1c ",step,q->seq[q->nfirst][i_steps[step]],t->seq[nfirst][j_steps[step]]);
(gdb) bt
#0  0x000000000048fddd in Viterbi::PrintDebug (this=this@entry=0x18ee4d20, q=q@entry=0x15d5a250, t=t@entry=0x16543730, backtraceScore=backtraceScore@entry=0x7ffffff52e60, 
    backtraceResult=backtraceResult@entry=0x7ffffff52e40, ssm=2) at hh-suite/src/hhviterbi.cpp:300
#1  0x00000000004903ae in Viterbi::ScoreForBacktrace (this=0x18ee4d20, q_four=<optimized out>, t_four=<optimized out>, elem=elem@entry=0, 
    backtraceResult=backtraceResult@entry=0x7ffffff52e40, alignmentScore=alignmentScore@entry=0x190ba010, ss_hmm_mode=ss_hmm_mode@entry=0)
    at hh-suite/src/hhviterbi.cpp:278
#2  0x00000000004b0375 in ViterbiConsumerThread::align (this=0x18ed5a90, maxres=4, nseqdis=1, smin=20) at hh-suite/src/hhviterbirunner.cpp:26
#3  0x00000000004b1585 in ViterbiRunner::alignment(Parameters&, HMMSimd*, std::vector<HHEntry*, std::allocator<HHEntry*> >, float, float*, float const (*) [20], float const (*) [20], float const (*) [20], int, float const (*) [4][11], float const (*) [11][4][11], float const (*) [11][8]) [clone ._omp_fn.0] ()
    at hh-suite/src/hhviterbirunner.cpp:162
#4  0x00000000004b2c47 in ViterbiRunner::alignment (this=this@entry=0x7ffffff54980, par=..., q_simd=q_simd@entry=0x7ffffff54090, 
    dbfiles=std::vector of length 136, capacity 136 = {...}, qsc=qsc@entry=-20, pb=pb@entry=0x7ffffff67260, S=S@entry=0x7ffffff66c20, Sim=Sim@entry=0x7ffffff665e0, 
    R=R@entry=0x7ffffff65fa0, ssm_mode=ssm_mode@entry=2, S73=S73@entry=0x7ffffff672b4, S33=S33@entry=0x7ffffff67db4, S37=S37@entry=0x7ffffff67834)
    at hh-suite/src/hhviterbirunner.cpp:117
#5  0x0000000000429be1 in HHblits::run (this=this@entry=0x7ffffff65940, query_fh=query_fh@entry=0x143d1fe0, 
    query_path=query_path@entry=0x7ffffff56730 "uniclust30_2016_09_245.faa") at hh-suite/src/hhblits.cpp:1329
#6  0x00000000004165ba in main (argc=<optimized out>, argv=0x7fffffffb7e8) at hh-suite/src/hhblits_app.cpp:92

I've also tried an older version (not sure which exact commit/build it is, describes itself as "HHblits 3.0.0 (15-03-2015)") on the same input file, and it fails a bit differently:

$ gdb hhblits
(gdb) run -cpu 1 -e 1e-06 -n 1 -Z 10 -B 10 -i uniclust30_2016_09_245.faa -o uniclust30_2016_09_245.hhr -d uniclust/uniclust30_2016_09/uniclust30_2016_09
Starting program: /opt/hhblits/hhsuite-3.0.1-Linux/bin/hhblits -cpu 1 -e 1e-06 -n 1 -Z 10 -B 10 -i uniclust30_2016_09_245.faa -o uniclust30_2016_09_245.hhr -d uniclust/uniclust30_2016_09/uniclust30_2016_09
- 14:22:12.059 INFO: Searching 10381269 column state sequences.
- 14:22:12.116 INFO: uniclust30_2016_09_245.faa is in A2M, A3M or FASTA format
- 14:22:12.117 INFO: Iteration 1
- 14:22:12.444 INFO: Prefiltering database
- 14:22:58.369 INFO: HMMs passed 1st prefilter (gapless profile-profile alignment)  : 175908
- 14:23:00.920 INFO: HMMs passed 2nd prefilter (gapped profile-profile alignment)   : 136
- 14:23:00.920 INFO: HMMs passed 2nd prefilter and not found in previous iterations : 136
- 14:23:00.920 INFO: Scoring 136 HMMs using HMM-HMM Viterbi alignment
2
- 14:23:00.934 INFO: Alternative alignment: 0
- 14:23:01.140 INFO: 136 alignments done
- 14:23:01.140 INFO: Alternative alignment: 1
- 14:23:01.214 INFO: 77 alignments done
- 14:23:01.214 INFO: Alternative alignment: 2
- 14:23:01.217 INFO: 2 alignments done
- 14:23:01.217 INFO: Alternative alignment: 3

*** Error in `/opt/hhblits/hhsuite-3.0.1-Linux/bin/hhblits': munmap_chunk(): invalid pointer: 0x00000000160f4780 ***

Program received signal SIGABRT, Aborted.
0x00000000005a5f69 in raise ()
(gdb) bt
#0  0x00000000005a5f69 in raise ()
#1  0x000000000053c7b8 in abort ()
#2  0x0000000000549520 in __libc_message ()
#3  0x0000000000550f73 in malloc_printerr ()
#4  0x000000000046b166 in HMM::~HMM() ()
#5  0x00000000004aaa26 in ViterbiRunner::alignment(Parameters&, HMMSimd*, std::vector<HHEntry*, std::allocator<HHEntry*> >, float, float*, float const (*) [20], float const (*) [20], float const (*) [20], int, float const (*) [4][11], float const (*) [11][4][11], float const (*) [11][8]) ()
#6  0x000000000041932c in HHblits::run(_IO_FILE*, char*) ()
#7  0x0000000000401e8d in main ()

sysinfo.txt
uniclust30_2016_09_245.txt

CMake Error at CMakeLists.txt:47

Hi everyone,

I'm having a problem running cmake.

cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo -G "Unix Makefiles" $HHLIB
-- The CXX compiler identification is GNU 5.3.0
-- Check for working CXX compiler: /cm/shared/apps/gcc/5.3.0/bin/g++
-- Check for working CXX compiler: /cm/shared/apps/gcc/5.3.0/bin/g++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Compiler is GNU 
-- Using CPU native flags for SSE optimization:  -march=native
-- Performing Test HAVE_MM_MALLOC
-- Performing Test HAVE_MM_MALLOC - Success
-- Performing Test HAVE_POSIX_MEMALIGN
-- Performing Test HAVE_POSIX_MEMALIGN - Success
-- Performing Test HAVE_AVX2_EXTENSIONS
-- Performing Test HAVE_AVX2_EXTENSIONS - Failed
-- Performing Test HAVE_AVX_EXTENSIONS
-- Performing Test HAVE_AVX_EXTENSIONS - Success
-- Performing Test HAVE_SSE4_2_EXTENSIONS
-- Performing Test HAVE_SSE4_2_EXTENSIONS - Success
-- Performing Test HAVE_SSE4_1_EXTENSIONS
-- Performing Test HAVE_SSE4_1_EXTENSIONS - Success
-- Performing Test HAVE_SSSE3_EXTENSIONS
-- Performing Test HAVE_SSSE3_EXTENSIONS - Success
-- Performing Test HAVE_SSE3_EXTENSIONS
-- Performing Test HAVE_SSE3_EXTENSIONS - Success
-- Performing Test HAVE_SSE2_EXTENSIONS
-- Performing Test HAVE_SSE2_EXTENSIONS - Success
-- Performing Test HAVE_SSE_EXTENSIONS
-- Performing Test HAVE_SSE_EXTENSIONS - Success
-- Found AVXextensions, using flags:  -march=native -mavx -mfpmath=sse -Wa,-q
-- Try OpenMP CXX flag = [-fopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Success
-- Found OpenMP: -fopenmp  
-- Found OpenMP
CMake Error at CMakeLists.txt:47 (add_subdirectory):
  The source directory

    /home/lv70640/c7701100/software/hh-suite/lib/ffindex

  does not contain a CMakeLists.txt file.


-- Configuring incomplete, errors occurred!
See also "/home/lv70640/c7701100/software/hh-suite/build/CMakeFiles/CMakeOutput.log".
See also "/home/lv70640/c7701100/software/hh-suite/build/CMakeFiles/CMakeError.log".

Any help?
Thanks a lot,
F

HHblits hitlist (hhhitlist.cpp) formatting problem

In HHsuite version 3.0 (and previous too, I believe), HHhitlist is formatted as:

    sprintf(line, "%-6.6s %5.1f %4i %4i-%-4i %4i-%-4i(%i)\n", str, hit.score_ss,
        hit.matched_cols, hit.i1, hit.i2, hit.j1, hit.j2, hit.L);

(lines 104-105)

For large inputs it becomes more difficult to parse HHblits/HHsearch output conveniently, e.g.

362 3cmu_A Protein RECA, recombina  92.2   0.022 5.9E-07   62.6   0.0   28  222-249  1080-1107(2050)

There is no space between the last 2 columns.

Is there an argument against rewriting the source code as:

    sprintf(line, "%-6.6s %5.1f %4i %4i-%-4i %4i-%-4i (%i)\n", str, hit.score_ss,
        hit.matched_cols, hit.i1, hit.i2, hit.j1, hit.j2, hit.L);

i.e. guarantee a space between the last 2 columns?
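
The effect of the two format strings can be reproduced in isolation. The following is only a minimal sketch: `format_hit_tail` is a hypothetical helper, not a function in hhhitlist.cpp, and it formats just the trailing columns (SS score, matched columns, query range, template range, template length) with the values from the example line above.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of hh-suite): formats only the trailing
// hit-list columns discussed in this issue. With spaced == false it uses the
// current "(%i)" tail, with spaced == true the proposed " (%i)" tail.
std::string format_hit_tail(float score_ss, int cols, int i1, int i2,
                            int j1, int j2, int L, bool spaced) {
    char line[128];
    if (spaced) {
        std::snprintf(line, sizeof(line), "%5.1f %4i %4i-%-4i %4i-%-4i (%i)\n",
                      score_ss, cols, i1, i2, j1, j2, L);
    } else {
        std::snprintf(line, sizeof(line), "%5.1f %4i %4i-%-4i %4i-%-4i(%i)\n",
                      score_ss, cols, i1, i2, j1, j2, L);
    }
    return line;
}
```

Calling `format_hit_tail(0.0f, 28, 222, 249, 1080, 1107, 2050, false)` yields the fused tail `1080-1107(2050)`, because `%-4i` adds no padding once `j2` has four digits. With `spaced` set to `true` the tail becomes `1080-1107 (2050)`. One consequence to weigh: rows with shorter template ranges, which are already padded, would also gain one extra blank.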

Initial ppc64le code available for review/comments.

Hi,

With the generous help of the hh-suite community and the Power8 community, I am done with the first phase of porting hh-suite to the Power8 platform.

The hh-suite code ported to ppc64le is available for review at https://github.com/asowani/hh-suite/tree/hpc for those who are curious, interested, or willing to help. This code is at a very primitive level; I can't even call it "beta". It is also not fully functional yet.

If you happen to check this code and find any issues, I would be very happy to hear your comments, corrections, suggestions, or feedback.

Thanks,
Atul.

Missing files after creating database

Hello,
I created a database following chapter 3.5 of the user guide, but at the end some files were missing.
I only got:
_a3m.ffdata
_a3m.ffindex
_hhm.ffdata
_hhm.ffindex
_cs219.ffdata
_cs219.ffindex

I saw that _a3m_db and _hhm_db were copies of the .ffdata files, and that _a3m_db.index and _hhm_db.index were copies of the .ffindex files with different file name extensions. So they are easy to create with a script. But I do not know how to create the .cs219 and .cs219.sizes files.

To create my database I already had alignments in fasta files, so I reformatted them into a3m files, added the secondary structure prediction, and created the hhm files. Then I built the ffdata and ffindex files for the a3m and hhm files. Then I used cstranslate to produce the cs219.ffdata and index, and followed every step of chapter 3.5.

Thank you in advance for your help.

Pauline

scop70_1.75_cs219.ffdata missing?

Hi,

I am testing hh-suite on ppc64le. I downloaded the following databases before I began testing.
From http://wwwuser.gwdg.de/~compbiol/data/hhsuite/databases/hhsuite_dbs/ :
pdb70_14Sep16.tgz
pfamA_29.0.tgz
uniprot20_2016_02.tgz
From http://wwwuser.gwdg.de/~compbiol/data/hhsuite/databases/hhsearch_dbs/ :
scop70_1.75.a3m.tar.gz
scop70_1.75.hhm.tar.gz

I extracted these databases to their appropriate locations, so that I could just copy-paste test commands from the User Guide and execute them. I have also set all the recommended environment variables and paths as per the User Guide. However, the very first command "hhsearch -cpu 4 -i data/query.a3m -d dbs/scop70_1.75 -o data/query.hhr" gives me the following error:

- 18:22:32.998 ERROR: In /root/hh-suite/src/hhdatabase.cpp:35: FFindexDatabase:
- 18:22:32.998 ERROR: could not open file 'dbs/scop70_1.75_cs219.ffdata'

I checked for the presence of scop70_1.75_cs219.ffdata file everywhere, but it is missing. User Guide section 2.4 on page 8 mentions that _cs219.ffdata is one of the eight files a database consists of. However this file seems to be missing.

Am I working with correct databases? Do I have to install something additional? Or do I have to "condition" existing database files to generate these kinds of files?

Thanks,
Atul.

usage of HMMer format profiles

hh-suite v2.0.16 from the Linux tar release

I would like to run hh-suite similarity searches using HMMer format files. hh-suite's documentation states that it is able to handle HMMer format files, but I have not been able to make it work.

I tried running hhsearch on HMMer format files as described in the documentation, but it gave me an ERROR and no output. The same command line does work for hh-suite profiles.

command used:
multithread.pl '*.hmm' 'hhsearch -i $file -d hmm_db' -cpu 15

ERROR code:
Use of HMMER format as input will result in severe loss of sensitivity!

So my questions are:
Can I use hh-suite with HMMer profiles? And how should I do this?
Do you have any further information or feedback about the reduced performance of hh-suite when using HMMer format profiles?

With kind regards,

Margo Schuller

errors in building customized databases

I installed hh-suite and get errors when I try to build customized databases.

mpirun -np 16 ffindex_apply_mpi bdd_a3m_wo_ss.ff{data,index} -i bdd_hhm.ffindex -d bdd_hhm.ffindex -- hhmake -i stdin -o stdout -v 0

It returns:


- 11:11:15.836 ERROR: Error in /home/fishteam/lich/hh-suite/src/hhfunc.cpp:83: ReadQueryFile:

- 11:11:15.836 ERROR:   unrecognized input file format in 'stdin'

- 11:11:15.836 ERROR:   line = Query         A1TKA9

I did not use PSIPRED-predicted secondary structure because of the following errors.
mpirun -np 20 ffindex_apply_mpi bdd_a3m_wo_ss.ff{data,index} -i bdd_a3m.ffindex -d bdd_a3m.ffindex -- addss.pl stdin stdout

The WARNINGs seem to cover all the positions in the *.a3m files.

- 11:21:32.381 WARNING: Ignoring invalid symbol '3' at pos. 55 in line 729 of /tmp/O_BQo61Lwq/82t7zubzSx.in.a3m

- 11:21:32.381 WARNING: Ignoring invalid symbol '=' at pos. 68 in line 729 of /tmp/O_BQo61Lwq/82t7zubzSx.in.a3m

- 11:21:32.381 WARNING: Ignoring invalid symbol '4' at pos. 69 in line 729 of /tmp/O_BQo61Lwq/82t7zubzSx.in.a3m

- 11:21:32.381 WARNING: Ignoring invalid symbol '3' at pos. 70 in line 729 of /tmp/O_BQo61Lwq/82t7zubzSx.in.a3m

- 11:21:32.381 WARNING: Ignoring invalid symbol '%' at pos. 71 in line 729 of /tmp/O_BQo61Lwq/82t7zubzSx.in.a3m

- 11:21:32.381 WARNING: Ignoring invalid symbol '=' at pos. 84 in line 729 of /tmp/O_BQo61Lwq/82t7zubzSx.in.a3m

- 11:21:32.381 WARNING: Ignoring invalid symbol '0' at pos. 85 in line 729 of /tmp/O_BQo61Lwq/82t7zubzSx.in.a3m

- 11:21:32.381 WARNING: Ignoring invalid symbol '9' at pos. 87 in line 729 of /tmp/O_BQo61Lwq/82t7zubzSx.in.a3m

- 11:21:32.381 WARNING: Ignoring invalid symbol '4' at pos. 88 in line 729 of /tmp/O_BQo61Lwq/82t7zubzSx.in.a3m

- 11:21:32.381 WARNING: Ignoring invalid symbol '3' at pos. 89 in line 729 of /tmp/O_BQo61Lwq/82t7zubzSx.in.a3m

Effective debugging of Prefilter::ungapped_sse_score().

Hi,

This is about the hh-suite Power8/LE porting work I am doing. While checking execution on ppc64le I found that the first database prefilter is not being applied as expected. I therefore started debugging and reached Prefilter::ungapped_sse_score().

To check whether the ported intrinsics behave identically on both platforms, I wrote a small test program that uses all the simd* calls this particular function uses (like simdi_setzero, simdi8_set, simdi_load, simdi_store etc.). Then I built this test program on x86_64 (which uses SSE intrinsics) and on ppc64le (which uses gcc AltiVec intrinsics). The output on both platforms was identical, so I am assuming that everything is working fine as far as the SSE/AltiVec intrinsics are concerned.

With this understanding, I came back to ungapped_sse_score() and I still see a difference: the score value at the end of this function is very different on ppc64le (205) from x86_64 (79). Basically, no filter is being applied.

My suspicion is on the behaviour of the code which loops over db sequence positions and query band positions (the "S" value). Since this loop mainly deals with addresses, they are of no use when comparing across platforms. Is there any optimal way I can ensure that the data fetched from DB is correct?

Thanks,
Atul.
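
One possible way to compare the fetched data across platforms without looking at addresses is to hash the bytes of each buffer and print the hash on both machines. The following is only a sketch under that assumption: `fnv1a64` is a hypothetical helper using the generic FNV-1a hash, not anything taken from hh-suite, and the buffer it is applied to (for example a database sequence right after it is loaded) is left to the reader.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical debugging helper (not part of hh-suite): 64-bit FNV-1a hash
// over a byte buffer. It processes one byte at a time, so the result depends
// only on the byte contents and their order, never on where the buffer lives.
std::uint64_t fnv1a64(const unsigned char* data, std::size_t len) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
    for (std::size_t i = 0; i < len; ++i) {
        h ^= data[i];               // mix in the next byte
        h *= 1099511628211ULL;      // FNV 64-bit prime, wraps modulo 2^64
    }
    return h;
}
```

Printing `fnv1a64(buffer, length)` at the same logical point on x86_64 and ppc64le (for example once per database sequence before the scoring loop) would show whether the inputs already differ or whether the divergence starts inside the loop.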

Problem building the hh-suite databases

Hi everyone,

I encounter problems building the hh-suite databases. Running ffindex_apply_mpi together with hhblits produces the output contained in the file have.txt. I know that the output should look like the content of should.txt.

It seems that ffindex_apply_mpi either does not terminate hhblits (the child process) correctly or that hhblits does not produce output on standard out. (The output "iteration 1" is too far ahead in the code of hhblits, so the child process does not terminate).

I looked into the code and found that the read call in ffindex_apply_mpi blocks, so I think this code here is problematic:

fcntl(pipefd_stdout[0], F_SETFL, flags); // Remove O_NONBLOCK
ssize_t r;
while ((r = read(pipefd_stdout[0], b, PIPE_BUF)) > 0) {
    b += r;
}

If I use the same binaries and databases, the problem may appear on one machine, but not necessarily on another. The problem appears on all of our cluster nodes, which is why I am unable to create the hh-suite databases.

Do you have any suggestions about this? I have not yet tried to use hhblits_mpi. That might also be an option. However, for me there seems to be a bug in ffindex_apply_mpi.
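
For reference, a blocking `read` on a pipe returns 0 (end of file) only once every write end of the pipe has been closed, including any copy still held by the reading process. Whether that is what happens here is not established. The following is a minimal sketch of a read loop that is bounded by the buffer size and retries on `EINTR`; `read_all` is a hypothetical helper, not the ffindex_apply_mpi code.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <unistd.h>

// Hypothetical helper (not the ffindex code): reads from fd until end of file
// or until cap bytes have been stored in buf. Returns the number of bytes
// read, or -1 on a read error other than EINTR.
ssize_t read_all(int fd, char* buf, std::size_t cap) {
    std::size_t total = 0;
    while (total < cap) {
        ssize_t r = read(fd, buf + total, cap - total);
        if (r == 0) {
            break;  // EOF: all write ends of the pipe are closed
        }
        if (r < 0) {
            if (errno == EINTR) {
                continue;  // interrupted by a signal, retry the read
            }
            return -1;
        }
        total += static_cast<std::size_t>(r);
    }
    return static_cast<ssize_t>(total);
}
```

With this shape, the loop cannot write past the end of the buffer, and it terminates as soon as the child closes its standard output, provided the parent has closed its own copy of the pipe's write end after forking.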

No output with customized database

Hi, I am using HHsuite 3.0. I used the pre-compiled version. The OS I am running is CentOS 7.
I followed the user guide to build a customized database. As a test, I only used 15 sequences. I did not get any error messages when building the attached database (although I got a lot of errors during the whole process). But when I used one of the 15 sequences as a query to test the database, the process got stuck at "INFO: Alternative alignment: 0".

Here is the only thing I got from the shell:

- 21:54:22.729 INFO: Searching 15 column state sequences.

- 21:54:22.830 INFO: ./test.fasta is in A2M, A3M or FASTA format

- 21:54:22.830 INFO: Iteration 1

- 21:54:23.029 INFO: Prefiltering database

- 21:54:23.222 INFO: HMMs passed 1st prefilter (gapless profile-profile alignment)  : 15

- 21:54:23.223 INFO: HMMs passed 2nd prefilter (gapped profile-profile alignment)   : 6

- 21:54:23.223 INFO: HMMs passed 2nd prefilter and not found in previous iterations : 6

- 21:54:23.223 INFO: Scoring 6 HMMs using HMM-HMM Viterbi alignment

- 21:54:23.335 INFO: Alternative alignment: 0

I tested the query.fasta on the Pfam database downloaded from your server, and it works correctly.
I hope you can give me some suggestions to make the customized database work.
After this 15-sequence test database, I would like to build a much larger one, with about 40,000 sequences.

The attachment is the built database and related code.

Another thing that confused me is that after adding the secondary structure information, the user guide suggested that we should

rm ${DB}_wo_ss_a3m.ff{data,index}

However, we did not generate those two files in the previous steps. Should that be

rm ${DB}_a3m_wo_ss.ff{data,index}

?

The last thing to mention is that I did not use DSSP secondary structure annotation. As a result, I left $dsspdir and $dssp in HHPath.pm unchanged.

Thank you very much in advance.

no_output.tar.gz

IBM-HPC: does not build on Power8 with GCC due to vector unit checking and intrinsics

We are trying to build hh-suite on Power8. Our configuration looks like:

git clone https://github.com/soedinglab/hh-suite.git
cd hh-suite
git submodule init
git submodule update
mkdir build && cd build

# requires a recent cmake; 3.5.1 was used
cmake -G "Unix Makefiles" -DCMAKE_INSTALL_PREFIX=/opt/um-ppc64le/hh-suite ..

It appears from the error message that cmake checks for SSSE3, and the code then uses abstracted versions of the different vector calls in src/simd.h.

We would like to add support for Power8 VSX when compiling on ppc64le. To do this, we assume that both of the following files need logic added for the VSX intrinsics:

cmake/CheckSSEFeatures.cmake
src/simd.h

In the end we want a cmake configuration and build that work on Power8.

$ cmake -G "Unix Makefiles" -DCMAKE_INSTALL_PREFIX=/opt/um-ppc64le/hh-suite ..
-- The CXX compiler identification is GNU 4.8.3
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Compiler is GNU 
-- Using CPU native flags for SSE optimization:  -march=native
-- Performing Test HAVE_MM_MALLOC
-- Performing Test HAVE_MM_MALLOC - Failed
-- Performing Test HAVE_POSIX_MEMALIGN
-- Performing Test HAVE_POSIX_MEMALIGN - Success
-- Performing Test HAVE_AVX2_EXTENSIONS
-- Performing Test HAVE_AVX2_EXTENSIONS - Failed
-- Performing Test HAVE_AVX_EXTENSIONS
-- Performing Test HAVE_AVX_EXTENSIONS - Failed
-- Performing Test HAVE_SSE4_2_EXTENSIONS
-- Performing Test HAVE_SSE4_2_EXTENSIONS - Failed
-- Performing Test HAVE_SSE4_1_EXTENSIONS
-- Performing Test HAVE_SSE4_1_EXTENSIONS - Failed
-- Performing Test HAVE_SSSE3_EXTENSIONS
-- Performing Test HAVE_SSSE3_EXTENSIONS - Failed
-- Performing Test HAVE_SSE3_EXTENSIONS
-- Performing Test HAVE_SSE3_EXTENSIONS - Failed
-- Performing Test HAVE_SSE2_EXTENSIONS
-- Performing Test HAVE_SSE2_EXTENSIONS - Failed
-- Performing Test HAVE_SSE_EXTENSIONS
-- Performing Test HAVE_SSE_EXTENSIONS - Failed
-- No SSE extensions found
CMake Error at src/CMakeLists.txt:26 (message):
  SSSE3 is needed to run compile! CMake will exit.

Compile error - cannot "make" due to problem with ffindex.c

Hi, I'm trying to compile the latest hhsuite 3.0 from source but am running into trouble. cmake completes, although I get error messages saying that OpenMP cannot be found. It still generates the "build" files, so I think that is okay, or at least not relevant to my main issue (?)

After running cmake and then make, I get this error:

[  2%] Built target ext
Scanning dependencies of target ffindex
[  3%] Building C object lib/ffindex/src/CMakeFiles/ffindex.dir/ffindex.c.o
/Users/HAB/hh-suite/lib/ffindex/src/ffindex.c:525:3: warning: implicit declaration
      of function 'twalkmisc' is invalid in C99 [-Wimplicit-function-declaration]
  twalkmisc(index->tree_root, action, (void *) index_file);
  ^
/Users/HAB/hh-suite/lib/ffindex/src/ffindex.c:527:3: error: function definition is
      not allowed here
  {
  ^
/Users/HAB/hh-suite/lib/ffindex/src/ffindex.c:543:27: warning: incompatible pointer
      types passing 'void (const void *, const VISIT, const int, void *)' to
      parameter of type 'void (*)(const void *, VISIT, int)'
      [-Wincompatible-pointer-types]
  twalk(index->tree_root, action);
                          ^~~~~~
/usr/include/search.h:59:34: note: passing argument to parameter here
void     twalk(const void *, void (*)(const void *, VISIT, int));
                                    ^
/Users/HAB/hh-suite/lib/ffindex/src/ffindex.c:658:1: error: unknown type name
      'Contact'
Contact GitHub API Training Shop Blog About
^
/Users/HAB/hh-suite/lib/ffindex/src/ffindex.c:658:15: error: expected ';' after top
      level declarator
Contact GitHub API Training Shop Blog About
              ^
              ;
/Users/HAB/hh-suite/lib/ffindex/src/ffindex.c:659:1: error: non-ASCII characters
      are not allowed outside of literals and identifiers
© 2017 GitHub, Inc. Terms Privacy Security Status Help
^~
2 warnings and 4 errors generated.
make[2]: *** [lib/ffindex/src/CMakeFiles/ffindex.dir/ffindex.c.o] Error 1
make[1]: *** [lib/ffindex/src/CMakeFiles/ffindex.dir/all] Error 2
make: *** [all] Error 2

Before performing cmake I ran "git pull && git submodule deinit && git submodule init && git submodule update" in the main hhsuite directory. It appeared successful... I got this message

Cloning into '/Users/HAB/hh-suite/lib/ffindex'...
Submodule path 'lib/ffindex': checked out '360e4176ece531be34a94298c808349916d016ac'

I am fairly new at C++ compilation/etc so I apologize if I'm missing something super obvious... but any help would be appreciated!!!

Changelog is out of date

It would be great if the CHANGES file could be kept up to date in order to know which fixes are more critical.
Also: are there any version bumps planned? There have been a lot of changes since the current version number was determined.

hhmakemodel.pl hangs with missing pdb files

HHblits 3.0.0 on Centos linux, 64-bit

If the hhsearch output file contains hits to pdbs which are not present in the pdb directory then the hhmakemodel.pl script does not exit cleanly and will hang. This can happen if the copy of the hhsearch databases are out of sync with the local copy of the pdb.

To replicate, hhmakemodel.pl was run with the following command over the attached output. PDB files for 3wt9, 5b2n and 5b0w were not available in the pdb dir provided.

hhmakemodel.pl -m 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 -d /scratch0/NOT_BACKED_UP/dbuchan/pdb -i B0R5N0.txt -ts /scratch0/NOT_BACKED_UP/dbuchan/hhblitsdb/pdb70

File: B0R5N0.txt
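
A pre-flight check along these lines could flag the missing templates before hhmakemodel.pl is started. This is only a minimal sketch: the helper names are hypothetical, it handles PDBCODE_CHAIN identifiers only, and it assumes (without knowing how hhmakemodel.pl names its inputs) that a PDB file counts as present when some file name in the directory contains the four-character code.

```python
import os

def pdb_code(template_id):
    """'3wt9_A' -> '3wt9' (PDBCODE_CHAIN identifiers only)."""
    return template_id.split("_")[0].lower()

def templates_without_files(template_ids, pdb_dir):
    """Return the template IDs for which no file name in pdb_dir contains the code.

    Assumption: any file whose name contains the PDB code counts as present.
    """
    names = [name.lower() for name in os.listdir(pdb_dir)]
    missing = []
    for template_id in template_ids:
        code = pdb_code(template_id)
        if not any(code in name for name in names):
            missing.append(template_id)
    return missing
```

Hits reported by this check could then be left out of the -m list.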

ERROR: In ~/hh-suite/src/hhdatabase.cpp:35: FFindexDatabase: could not open file 'uniprotdb/_cs219.ffdata'

I have installed hh-suite version 3.0 on a 64-bit Linux system on a computer cluster. When I run the ffindex_apply_mpi command, I get an error saying that the _cs219 file could not be opened, although all the files are there in the directory (the one given with the -d option). I suspect it is a path export problem. However, I have followed the procedure for setting the HHLIB path exactly as given in the user guide. Where could the source of the error be?

Example: mpirun -np 10 ffindex_apply_mpi test_fas.ff{data,index} -i test_a3m_wo_ss.ffindex -d test_a3m_wo_ss.ffdata -- hhblits -d ~/hh-suite/build -i stdin -oa3m stdout -n 2

  • 17:54:37.260 INFO: Search results will be written to stdin.hhr
  • 17:54:37.260 ERROR: In ~/hh-suite/src/hhdatabase.cpp:35: FFindexDatabase:
  • 17:54:37.260 ERROR: could not open file '~/hh-suite/build_cs219.ffdata'
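
The file name in the last log line suggests that the value given to -d is used as a database basename to which suffixes such as _cs219.ffdata are appended, so a directory path ends up as build_cs219.ffdata. A minimal sketch for checking a basename before a run; the suffix list is an assumption taken from the file names that appear in these reports, and the helper name is hypothetical.

```python
import os

# Suffixes assumed from the database file names seen in these reports.
SUFFIXES = [
    "_cs219.ffdata", "_cs219.ffindex",
    "_a3m.ffdata", "_a3m.ffindex",
    "_hhm.ffdata", "_hhm.ffindex",
]

def missing_db_files(basename):
    """Return the expected database files that do not exist for a -d basename."""
    base = os.path.expanduser(basename)
    return [base + suffix for suffix in SUFFIXES if not os.path.isfile(base + suffix)]
```

For example, missing_db_files("~/hh-suite/build") would list paths ending in build_cs219.ffdata, which matches the file named in the log above.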

Zero SS scores when using both pdb and pfam

hh-suite commit ID  b1aa73afae4811c7
Operating system: linux, compiler: gnu 4.9.2

Hi,

When using both pdb and pfam:

hhsearch -d /home/ucgajhe/levine/databases/pdb70/pdb70 -ssm 4 -cpu 12 -o /home/ucgajhe/Scratch/Levine/results/test_YPR199C/YPR199C.0.ssw11.hhr -i /home/ucgajhe/Scratch/Levine/results/test_YPR199C/YPR199C.0.ss.a3m -v 2 -p 0 -cov 50 -ssw 0.11 -Z 5000 -d /home/ucgajhe/levine/databases/pfamA_30/pfam

we observe zero secondary structure scores for both PDB matches and PFAM matches:

No Hit                             Prob E-value P-value  Score    SS Cols Query HMM  Template HMM
  1 1gd2_E Transcription factor PA  98.7 3.4E-10 6.3E-15   83.8   8.4   68   11-81      1-68  (70)
  2 1sse_B AP-1 like transcription  98.7 2.3E-11 4.2E-16   97.0   0.0   60  235-294    26-85  (86)
  3 1gu4_A CAAT/enhancer binding p  98.1 2.5E-07 4.5E-12   69.3   9.7   61   14-77     11-71  (78)
  4 PF08601.7 ; PAP1 ; Transcripti  97.9   3E-08 5.5E-13   97.1   0.0   56  236-291   298-355 (356)
  5 1hjb_A Ccaat/enhancer binding   97.7   1E-07 1.9E-12   73.4   0.0   61   15-78     12-72  (87)
  6 1ci6_A Transcription factor AT  97.5 2.9E-07 5.3E-12   65.6   0.0   57   18-77      2-58  (63)
  7 PF00170.18 ; bZIP_1 ; bZIP tra  97.5 3.5E-07 6.5E-12   64.3   0.0   58   18-78      5-62  (64)
  8 2dgc_A Protein (GCN4); basic d  97.5 4.4E-07   8E-12   64.4   0.0   57   15-74      6-62  (63)
  9 2wt7_A Proto-oncogene protein   97.4 6.9E-07 1.3E-11   63.0   0.0   57   18-77      2-58  (63)
 10 1jnm_A Proto-oncogene C-JUN; B  97.3 1.2E-06 2.2E-11   61.0   0.0   57   19-78      2-58  (62)
 11 1dh3_A Transcription factor CR  97.2 1.7E-06 3.1E-11   59.9   0.0   52   19-73      2-53  (55)
 12 PF07716.12 ; bZIP_2 ; Basic re  97.1 2.7E-06 4.9E-11   58.3   0.0   51   17-70      4-54  (55)
 13 1t2k_D Cyclic-AMP-dependent tr  97.1 2.8E-06 5.2E-11   59.0   0.0   55   19-76      2-56  (61)
 14 3a5t_A Transcription factor MA  97.1   4E-06 7.2E-11   68.2   0.0   62   18-82     37-98  (107)
 15 2wt7_B Transcription factor MA  97.0 6.5E-06 1.2E-10   64.3   0.0   60   18-80     27-86  (90)
 16 PF03131.14 ; bZIP_Maf ; bZIP M  96.9 9.6E-06 1.8E-10   62.1   0.0   58   18-78     30-87  (90)
 17 5apu_A General control protein  96.5 0.00019 3.5E-09   59.1   3.2   48   19-73     46-93  (95)
 18 2oxj_A Hybrid alpha/beta pepti  94.6   0.029 5.3E-07   38.0   4.4   32   39-73      1-32  (34)
 19 2r2v_A GCN4 leucine zipper; co  94.4   0.036 6.6E-07   38.0   4.2   32   39-73      1-32  (34)
 20 PF16689.2 ; APC_N_CC ; Coiled-  94.2  0.0059 1.1E-07   44.5   0.0   43   39-84      1-43  (52)
 21 4c46_A General control protein  94.0  0.0079 1.4E-07   47.9   0.0   51   18-72     26-76  (76)
 22 1kd8_B GABH BLL, GCN4 acid bas  93.7   0.011   2E-07   41.0   0.0   35   39-76      1-35  (36)
 23 3w92_A Thioester coiled coil p  93.3   0.015 2.8E-07   39.4   0.0   31   40-73      1-31  (32)
 24 1deb_A APC protein, adenomatou  93.3   0.015 2.8E-07   43.2   0.0   43   38-83      2-44  (54)
 25 2wq1_A General control protein  92.6   0.025 4.5E-07   38.6   0.0   31   40-73      1-31  (33)
 26 3c3g_A Alpha/beta peptide with  92.3    0.03 5.4E-07   38.2   0.0   31   40-73      1-31  (33)

but when running with PDB only, we get nonzero scores for all matches.

I note that the PDB database download includes SS data, but PFAM does not:


No 10
>2dgc_A Protein (GCN4); basic domain, leucine zipper, DNA binding, eukaryotic regulatory protein, transcription/DNA complex; HET: DNA; 2.20A {Saccharomyces cerevisiae} SCOP: h.1.3.1 PDB: 1dgc_A* 1ld4_E 1ysa_C* 3p8m_D
Probab=97.47  E-value=4.4e-07  Score=64.37  Aligned_cols=57  Identities=26%  Similarity=0.323  Sum_probs=45.1  Template_Neff=9.500

Q ss_pred             CCCHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
Q YPR199C          15 LTPPKNKRAAQLRASQNAFRKRKLERLEELEKKEAQLTVTNDQIHILKKENELLHFMLRS   74 (294)
Q Consensus        15 ~~~~k~KRKaQNRaAQkAFRERKE~rlkeLE~kl~ele~~~~~~~~L~~EnE~Lr~~n~e   74 (294)
                      ....+.+|+.+||.||+.+|+||..++.+||.++..|+..+   ..|..+++.|+..+..
T Consensus         6 ~~~~~~~kr~rnr~~~~~~R~rk~~~~~~le~~v~~l~~~~---~~l~~~~~~l~~~~~~   62 (63)
T 2dgc_A            6 SSDPAALKRARNTEAARRSRARKLQRMKQLEDKVEELLSKN---YHLENEVARLKKLVGE   62 (63)
T ss_dssp             -----CHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH---HHHHHHHHHHHHC---
T ss_pred             cccHHHHHHHHhHHHHHHHHHHHHHHHHHHHHHHHHHHHHH---HHHHHHHHHHHHHHHh
Confidence            34566778899999999999999999999999999999888   7888888888876654


No 11
>PF00170.19 ; bZIP_1 ; bZIP transcription factor
Probab=97.45  E-value=4.9e-07  Score=63.79  Aligned_cols=58  Identities=28%  Similarity=0.290  Sum_probs=51.1  Template_Neff=9.800

Q ss_pred             HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
Q YPR199C          18 PKNKRAAQLRASQNAFRKRKLERLEELEKKEAQLTVTNDQIHILKKENELLHFMLRSLLTE   78 (294)
Q Consensus        18 ~k~KRKaQNRaAQkAFRERKE~rlkeLE~kl~ele~~~~~~~~L~~EnE~Lr~~n~el~~e   78 (294)
                      ++.+|+.+||.||+.||+||..++.+||.++..|+...   ..|..+++.|+..+..|..+
T Consensus         5 k~~rr~~~nr~~~~~~R~rk~~~~~~Le~~~~~L~~~~---~~l~~~~~~l~~e~~~L~~~   62 (64)
T GBF1_ARATH/220    5 KRQKRKQSNRESARRSRLRKQAECEQLQQRVESLSNEN---QSLRDELQRLSSECDKLKSE   62 (64)
Confidence            56788999999999999999999999999999999888   77888888888887776554

We note that secondary structure nonzero matches are found in the web-search tool, but that the downloadable version of hh-pfam does not have any SS info in it.

Most confusing of all, though, is why PDB matches become zero SS score when PFAM is present.

I think this might have something to do with the code in https://github.com/soedinglab/hh-suite/blob/master/src/hhviterbirunner.cpp

   int ss_hmm_mode = HMM::computeScoreSSMode(q_simd->GetHMM(0), t_hmm_simd->GetHMM(0));
    for(size_t i = 1; i < maxres; i++){
        ss_hmm_mode = std::min(ss_hmm_mode,
                               HMM::computeScoreSSMode(q_simd->GetHMM(0), t_hmm_simd->GetHMM(i)));
    }

and this:

https://github.com/soedinglab/hh-suite/blob/master/src/hhhmm.cpp

int HMM::computeScoreSSMode( HMM *q,  HMM *t){
	int returnMode = HMM::NO_SS_INFORMATION;
	if      (q->nss_pred>=0 && t->nss_dssp>=0) returnMode=HMM::PRED_DSSP;
	else if (q->nss_dssp>=0 && t->nss_pred>=0) returnMode=HMM::DSSP_PRED;
	else if (q->nss_pred>=0 && t->nss_pred>=0) returnMode=HMM::PRED_PRED;
	return returnMode;
}

which takes a minimum across the available data, so would result in zero SS for PDB when PFAM is present.

Any thoughts?
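
The suspected behaviour can be reproduced in isolation. The sketch below is a Python transcription of the quoted computeScoreSSMode plus the min-over-templates loop; the numeric values of the mode constants are an assumption (NO_SS_INFORMATION is taken to be the smallest), and the dictionaries stand in for HMM objects.

```python
# Mode names mirror the quoted C++; the numeric values are an assumption.
NO_SS_INFORMATION, PRED_DSSP, DSSP_PRED, PRED_PRED = 0, 1, 2, 3

def compute_score_ss_mode(q, t):
    """Transcription of the quoted HMM::computeScoreSSMode."""
    if q["nss_pred"] >= 0 and t["nss_dssp"] >= 0:
        return PRED_DSSP
    if q["nss_dssp"] >= 0 and t["nss_pred"] >= 0:
        return DSSP_PRED
    if q["nss_pred"] >= 0 and t["nss_pred"] >= 0:
        return PRED_PRED
    return NO_SS_INFORMATION

def batch_mode(query, templates):
    """One mode for a whole batch of templates: the minimum over all of them."""
    return min(compute_score_ss_mode(query, t) for t in templates)

# -1 marks a missing secondary-structure line, as in the >=0 checks above.
query = {"nss_pred": 1, "nss_dssp": -1}     # query with predicted SS only
pdb_t = {"nss_pred": 2, "nss_dssp": 1}      # template with DSSP states
pfam_t = {"nss_pred": -1, "nss_dssp": -1}   # template without SS lines
```

Under these assumptions, batch_mode(query, [pdb_t]) gives PRED_DSSP, while batch_mode(query, [pdb_t, pfam_t]) gives NO_SS_INFORMATION, which would explain zero SS scores for PDB hits that share a batch with a PFAM template.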

hh-suite porting to Power8/LE.

Hi,

I am working on porting hh-suite to ppc64le. I would like to know if there is any CLA that needs to be signed to port hh-suite to ppc64le. This is because I need to change SSE specific code to AltiVec. To achieve this, I will need to comment out/write new code in simd.h file.

Also, I was checking the user guide which mentions that certain databases (uniprot20, nr20 etc.) are required for hh-suite. However I could not find those databases at wwwuser.gwdg.de/~compbiol/data/databases/hhsuite_dbs/ This link redirects to https://www.gwdg.de/ where I could not find any databases. So where do I find them?

Thanks,
Atul.

hhblits "scores" decrease as n increases

hello,
i started using hhsuite v3 last october for testing. i noticed then the qualities of my HM models (say rmsd differences) got worse when i used alignment results from v3 instead of v2. i didn't understand why and i left the project until recently.

after poking around a bit, i learned that the number of iterations n used in hhblits to create the HHM's could affect the final outcome quite a bit. to help me frame my issue, i prepared 2 very simple cases: (1) 1pga and (2) ab01_H.

all i was doing was running hhblits (v2 or v3) using the corresponding fasta files:

hhblits -n X -i 1pga.fasta -d {}/uniprot20
or
hhblits -n X -i ab01_H.fasta -d {}/uniprot20

here is what i have observed in general:
(1) when n increases (say X=1->3), overall hhblits scores decreases.
(2) hhblits v2 scores are, on average, higher than hhblits v3 scores.
(3) for the longer fasta (ab01_H) and for n>1, i started seeing warning messages from hhblits v3 like
WARNING: Number of match columns too large. Only first 19999 match columns will be kept!

with the above observations, i can understand why v3 did not do as well as v2 in HM test cases. using the same n, v2 hhblits produced higher scores and "better" alignments (??); thus, more accurate HM models.

so here are my questions:
(1) did i misuse v2 vs v3?
(2) what are the things i should worry about if i want to see similar alignment results using v2 vs v3?
i found that if i used a lower n for hhblits v3, i got better HM models at the end in selected test cases.
(3) why am i getting those warning messages when i used hhblits v3.
warm regards,
fred
ps: my uniprot20 was obtained from your server. the 2 fasta files mentioned above are attached.
1PGA.fasta.gz
ab01_H.fasta.gz

Validation/regression tests for hh-suite.

Hi,

Since I am done with the initial port of hh-suite to ppc64le and am able to run a few commands mentioned in the User Guide without crashing hh-suite, I am now looking to test the ported code for validity/accuracy. As I am a lay programmer without any knowledge of DNA or proteins, it is tough for me to know whether the output I got is correct. The crash-free execution of hh-suite makes me happy, but I have no idea whether the results it produces are correct.

At present I am planning to set up hh-suite on x86_64 machine, run test commands (again picked up from the User Guide) and generate a baseline to test hh-suite. Then by generating similar output on ppc64le, I will compare it with x86 results. However I am sure that this can not be claimed as "tested and validated" port.

I did not find any test cases packaged along with the source code. What could be the best way to validate the port?

Thanks,
Atul.
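
The baseline comparison described above could start from the hit table of the .hhr files. The following is a minimal sketch, not an official test harness: the regular expression is derived from the hit-table layout shown elsewhere on this page, the function names and the tolerance are hypothetical, and a template that is hit more than once keeps only its last row.

```python
import re

# Columns read from the right: Prob E-value P-value Score SS Cols Query Template (len)
HIT_RE = re.compile(
    r"^\s*(\d+)\s+(\S+).*?\s(\d+\.\d+)\s+(\S+)\s+(\S+)\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)"
    r"\s+(\d+)\s+\d+-\d+\s+\d+-\d+\s*\(\d+\)\s*$"
)

def parse_hits(text):
    """Map template ID -> prob/score/ss taken from the hit table of an .hhr file."""
    hits = {}
    for line in text.splitlines():
        m = HIT_RE.match(line)
        if m:
            hits[m.group(2)] = {
                "prob": float(m.group(3)),
                "score": float(m.group(6)),
                "ss": float(m.group(7)),
            }
    return hits

def compare_hits(baseline, candidate, tol=0.1):
    """Return readable differences between two parsed hit tables."""
    diffs = []
    for name in sorted(set(baseline) | set(candidate)):
        if name not in candidate:
            diffs.append(f"{name}: missing in candidate")
        elif name not in baseline:
            diffs.append(f"{name}: extra in candidate")
        else:
            for key in ("prob", "score", "ss"):
                if abs(baseline[name][key] - candidate[name][key]) > tol:
                    diffs.append(f"{name}: {key} {baseline[name][key]} vs {candidate[name][key]}")
    return diffs
```

An empty list from compare_hits for a set of test queries would indicate only that the two platforms agree within the tolerance, not that either result is biologically correct.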

Issue after updating to pdb70_from_mmcif_15Feb17

Hello,

We tried updating the pdb70 database to the newest version (pdb70_from_mmcif_15Feb17). HHblits (hhsuite 3.0.0) now fails. This is the command we use:

/scratch/mahmoud/apps/hh-suite/build/bin/hhblits -d /scratch/databases/hhblits/pdb70_latest -p 20 -z 0 -b 0 -i Data/2017-02-11_00000008_1_32/templates/2017-02-11_00000008_1_32.a3m -o Data/2017-02-11_00000008_1_32/templates/2017-02-11_00000008_1_32.hhr

This is the error we get:


- 11:21:13.860 WARNING: No sequence name preceding following line in 3C1A_A:
'------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------'

- 11:21:13.882 ERROR: Uncharacterized protein OS=Enterobius vermicularis PE=4 SV=1
- 11:21:13.882 ERROR: Error in /scratch/mahmoud/apps/hh-suite/src/hhalignment.cpp:1236: Compress:

- 11:21:13.882 ERROR: 	sequences in 3C1A_A do not all have the same number of columns, 

- 11:21:13.882 ERROR: 	
e.g. first sequence and sequence tr|A0A0N4VN87|A0A0N4VN87_ENTVE.

- 11:21:13.882 ERROR: Check input format for '-M a2m' option and consider using '-M first' or '-M 50'

Reverting to the old database version solves the issue.

Here is our system information:
Version used: hhsuite 3.0.0
Centos 6.3

Thank you for your help!

Yours,
Mahmoud

ffindex_apply_mpi doesn't exist.

I'm using the version 3.0.0 of hh-suite.

The hhsuitedb.py script calls ffindex_apply_mpi, but this binary doesn't exist in the bin folder; only ffindex_apply exists. I therefore modified the code to use ffindex_apply.

line 106

check_call(" ".join(["mpirun", "-np", threads, "ffindex_apply", a3m_base_path+".ffdata", large_a3m_index, "-d", hhm_base_path+".ffdata", "-i", hhm_base_path+".ffindex", "--", "hhmake", "-v", str(0), "-i", "stdin", "-o" ,"stdout"]), shell=True)

I think you should correct this.
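
One possible way to handle both kinds of builds is to pick the binary at run time. This is a minimal sketch rather than the script's actual code: the function name and its parameters are hypothetical, the arguments are copied from the check_call line quoted above, and it assumes that the serial ffindex_apply accepts the same arguments and is started without mpirun.

```python
import shutil

def build_hhmake_command(threads, a3m_base, a3m_index, hhm_base, which=shutil.which):
    """Build the hhmake-over-ffindex command as an argument list."""
    tool_args = [
        a3m_base + ".ffdata", a3m_index,
        "-d", hhm_base + ".ffdata", "-i", hhm_base + ".ffindex",
        "--", "hhmake", "-v", "0", "-i", "stdin", "-o", "stdout",
    ]
    if which("ffindex_apply_mpi"):
        # MPI build available: run under mpirun as the script does today.
        return ["mpirun", "-np", str(threads), "ffindex_apply_mpi"] + tool_args
    # Fallback (assumption): serial ffindex_apply with the same arguments.
    return ["ffindex_apply"] + tool_args
```

Returning a list allows the command to be passed to subprocess.check_call without shell=True.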

hhfilter 65535 sequence limit

It appears that hhfilter has an undocumented limit on the number of sequences it outputs (65535). This should be documented and a clear warning given to the user. I just lost 100,000s of compute hours because of this (filtered very large MSAs and then erased them, only to realize that they were being severely truncated).
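
Until the limit is documented, a sanity check before deleting the unfiltered alignment may help. A minimal sketch with hypothetical helper names; the 65535 figure is the one reported above, and every line starting with '>' is counted as a sequence, including any annotation records an A3M file may carry.

```python
def count_sequences(path):
    """Count FASTA/A3M records, i.e. lines starting with '>'."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))

def looks_truncated(n_in, n_out, limit=65535):
    """True if the output reached the reported limit while the input was larger."""
    return n_out >= limit and n_in > n_out
```

If looks_truncated(count_sequences(unfiltered), count_sequences(filtered)) is true, the unfiltered file is better kept.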

cstranslate parsing issue

hhmake outputs hhm files that begin with a first line which reads
"HHSearch 1.5"

If you build ffdata and ffindex files for a library of hhms and then run cstranslate to build the cs219 files, cstranslate throws an error that the hhm file is malformed. This can be remedied by removing this first line from the hhms prior to running ffindex_build, but really this should be handled gracefully in cstranslate.

For now, prior to running ffindex_build I have been using this to remove the first line from a dir of hhms

for i in *; do tail -n +2 "$i" > tmpfile && mv tmpfile "$i"; done
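
A slightly safer variant removes the first line only when it actually is the version line, so that running it twice does not eat a real record line. A minimal sketch with hypothetical function names; matching on a case-insensitive "hhsearch" prefix is an assumption based on the line quoted above.

```python
from pathlib import Path

def strip_version_line(text):
    """Drop the first line if it starts with 'HHsearch' (case-insensitive)."""
    first, _, rest = text.partition("\n")
    if first.lower().startswith("hhsearch"):
        return rest
    return text

def strip_directory(directory):
    """Rewrite every regular file in directory in place; return how many changed."""
    changed = 0
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        text = path.read_text()
        new_text = strip_version_line(text)
        if new_text != text:
            path.write_text(new_text)
            changed += 1
    return changed
```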

Merge hhmakemodel.pl and hhmakemodel.py

Hello everybody,

I'd be interested in the necessary steps to merge the files
hhmakemodel.pl
hhmakemodel.py

As far as I know, hhmakemodel.py supports mmCIF, while the Perl version does not.
Furthermore, hhmakemodel.py only supports PDBCODE_CHAIN identifiers, while hhmakemodel.pl
can also work with DALI and SCOP identifiers.

I think the Python version should be extended such that the Perl version becomes obsolete.
What do you think? What steps would this encompass? I'd be interested in working on that.

Best wishes
Lukas

libmpq needs 2 successive calls of cmake to be built

trying to build hhsuite-3.0-beta.2 from following archive: https://github.com/soedinglab/hh-suite/releases/download/v3.0-beta.2/hhsuite-3.0-beta.2-Source.tar.gz
on centos-6.9

build was done this way:

cmake -DCMAKE_INSTALL_PREFIX=/exe/hhsuite/3.0-beta.2 -G "Unix Makefiles" /src/hhsuite/hhsuite-3.0.1-Source 2>&1 
make

it breaks with

[ 73%] Building CXX object src/CMakeFiles/cstranslate_mpi.dir/cs/cstranslate_mpi_app.cc.o
Linking CXX executable ../bin/cstranslate_mpi
/usr/bin/ld: cannot find -lmpq
collect2: error: ld returned 1 exit status

we have to run cmake twice before calling make for lib/libmpq.a to be built and for the build to end successfully.

looks like first iteration does not generate rules for libmpq.a build
while second one does.

see:

-=-=- One cmake iteration -=-=-

bigmess:/tmp > tar xf hhsuite-3.0-beta.2-Source.tar.gz 
bigmess:/tmp > cd hhsuite-3.0.1-Source
bigmess:/tmp/hhsuite-3.0.1-Source > mkdir build-once
bigmess:/tmp/hhsuite-3.0.1-Source > cd build-once 
bigmess:hhsuite-3.0.1-Source/build-once > cmake -G "Unix Makefiles"  ../
-- The CXX compiler identification is GNU 4.9.0
-- Check for working CXX compiler: /local/gensoft2/exe/gcc/4.9.0/bin/c++
-- Check for working CXX compiler: /local/gensoft2/exe/gcc/4.9.0/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Compiler is GNU 
-- Using CPU native flags for SSE optimization:  -march=native
-- Performing Test HAVE_MM_MALLOC
-- Performing Test HAVE_MM_MALLOC - Success
-- Performing Test HAVE_POSIX_MEMALIGN
-- Performing Test HAVE_POSIX_MEMALIGN - Success
-- Performing Test HAVE_AVX2_EXTENSIONS
-- Performing Test HAVE_AVX2_EXTENSIONS - Failed
-- Performing Test HAVE_AVX_EXTENSIONS
-- Performing Test HAVE_AVX_EXTENSIONS - Success
-- Performing Test HAVE_SSE4_2_EXTENSIONS
-- Performing Test HAVE_SSE4_2_EXTENSIONS - Success
-- Performing Test HAVE_SSE4_1_EXTENSIONS
-- Performing Test HAVE_SSE4_1_EXTENSIONS - Success
-- Performing Test HAVE_SSSE3_EXTENSIONS
-- Performing Test HAVE_SSSE3_EXTENSIONS - Success
-- Performing Test HAVE_SSE3_EXTENSIONS
-- Performing Test HAVE_SSE3_EXTENSIONS - Success
-- Performing Test HAVE_SSE2_EXTENSIONS
-- Performing Test HAVE_SSE2_EXTENSIONS - Success
-- Performing Test HAVE_SSE_EXTENSIONS
-- Performing Test HAVE_SSE_EXTENSIONS - Success
-- Found AVXextensions, using flags:  -march=native -mavx -mfpmath=sse -Wa,-q
-- Try OpenMP CXX flag = [-fopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Success
-- Found OpenMP: -fopenmp  
-- Found OpenMP
-- The C compiler identification is GNU 4.9.0
-- Check for working C compiler: /local/gensoft2/exe/gcc/4.9.0/bin/cc
-- Check for working C compiler: /local/gensoft2/exe/gcc/4.9.0/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Looking for fmemopen
-- Looking for fmemopen - found
-- Could NOT find MPI_C (missing:  MPI_C_LIBRARIES) 
-- Found MPI_CXX: /local/gensoft2/exe/openmpi/2.0.1/lib/libmpi.so  
-- Configuring done
-- Generating done
-- Build files have been written to: /tmp/hhsuite-3.0.1-Source/build-once
bigmess:hhsuite-3.0.1-Source/build-once > make help
The following are some of the valid targets for this Makefile:
... all (the default if no target is provided)
... clean
... depend
... install/strip
... edit_cache
... rebuild_cache
... install
... list_install_components
... package
... test
... package_source
... install/local
... cstranslate
... hhviterbialgorithm_and_ss
... hhblits
... CS_OBJECTS
... hhviterbialgorithm_with_celloff
... hhmake
... a3m_extract
... hhblits_omp
... hhalign
... hhconsensus
... HH_OBJECTS
... a3m_database_filter
... A3M_COMPRESS_OBJECT
... a3m_database_extract
... hhfilter
... a3m_database_reduce
... hhviterbialgorithm_with_celloff_and_ss
... hhsearch
... cstranslate_mpi
... a3m_reduce
... ffindex_shared
... ffindex_apply_mpi
... ffindex_get
... ffindex_from_fasta
... ffindex_order
... ffindex_unpack
... ffindex_apply
... ffindex_modify
... ffindex
... ffindex_build

with this make ends with following error

[ 70%] Built target hhviterbialgorithm_and_ss
Scanning dependencies of target cstranslate
[ 72%] Building CXX object src/CMakeFiles/cstranslate.dir/cs/cstranslate_app.cc.o
Linking CXX executable ../bin/cstranslate
[ 72%] Built target cstranslate
Scanning dependencies of target cstranslate_mpi
[ 73%] Building CXX object src/CMakeFiles/cstranslate_mpi.dir/cs/cstranslate_mpi_app.cc.o
Linking CXX executable ../bin/cstranslate_mpi
/usr/bin/ld: cannot find -lmpq
collect2: error: ld returned 1 exit status

-=-=- two cmake iterations -=-=-
while 2 successive calls to cmake do the job

bigmess:hhsuite-3.0.1-Source/build-once > cd ../
bigmess:/tmp/hhsuite-3.0.1-Source > mkdir build-twice
bigmess:/tmp/hhsuite-3.0.1-Source > cd build-twice
bigmess:hhsuite-3.0.1-Source/build-twice > cmake -G "Unix Makefiles"  ../
-- The CXX compiler identification is GNU 4.9.0
-- Check for working CXX compiler: /local/gensoft2/exe/gcc/4.9.0/bin/c++
-- Check for working CXX compiler: /local/gensoft2/exe/gcc/4.9.0/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Compiler is GNU 
-- Using CPU native flags for SSE optimization:  -march=native
-- Performing Test HAVE_MM_MALLOC
-- Performing Test HAVE_MM_MALLOC - Success
-- Performing Test HAVE_POSIX_MEMALIGN
-- Performing Test HAVE_POSIX_MEMALIGN - Success
-- Performing Test HAVE_AVX2_EXTENSIONS
-- Performing Test HAVE_AVX2_EXTENSIONS - Failed
-- Performing Test HAVE_AVX_EXTENSIONS
-- Performing Test HAVE_AVX_EXTENSIONS - Success
-- Performing Test HAVE_SSE4_2_EXTENSIONS
-- Performing Test HAVE_SSE4_2_EXTENSIONS - Success
-- Performing Test HAVE_SSE4_1_EXTENSIONS
-- Performing Test HAVE_SSE4_1_EXTENSIONS - Success
-- Performing Test HAVE_SSSE3_EXTENSIONS
-- Performing Test HAVE_SSSE3_EXTENSIONS - Success
-- Performing Test HAVE_SSE3_EXTENSIONS
-- Performing Test HAVE_SSE3_EXTENSIONS - Success
-- Performing Test HAVE_SSE2_EXTENSIONS
-- Performing Test HAVE_SSE2_EXTENSIONS - Success
-- Performing Test HAVE_SSE_EXTENSIONS
-- Performing Test HAVE_SSE_EXTENSIONS - Success
-- Found AVXextensions, using flags:  -march=native -mavx -mfpmath=sse -Wa,-q
-- Try OpenMP CXX flag = [-fopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Success
-- Found OpenMP: -fopenmp  
-- Found OpenMP
-- The C compiler identification is GNU 4.9.0
-- Check for working C compiler: /local/gensoft2/exe/gcc/4.9.0/bin/cc
-- Check for working C compiler: /local/gensoft2/exe/gcc/4.9.0/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Looking for fmemopen
-- Looking for fmemopen - found
-- Could NOT find MPI_C (missing:  MPI_C_LIBRARIES) 
-- Found MPI_CXX: /local/gensoft2/exe/openmpi/2.0.1/lib/libmpi.so  
-- Configuring done
-- Generating done
-- Build files have been written to: /tmp/hhsuite-3.0.1-Source/build-twice
bigmess:hhsuite-3.0.1-Source/build-twice > make help 
The following are some of the valid targets for this Makefile:
... all (the default if no target is provided)
... clean
... depend
... install/strip
... edit_cache
... rebuild_cache
... install
... list_install_components
... package
... test
... package_source
... install/local
... cstranslate
... hhviterbialgorithm_and_ss
... hhblits
... CS_OBJECTS
... hhviterbialgorithm_with_celloff
... hhmake
... a3m_extract
... hhblits_omp
... hhalign
... hhconsensus
... HH_OBJECTS
... a3m_database_filter
... A3M_COMPRESS_OBJECT
... a3m_database_extract
... hhfilter
... a3m_database_reduce
... hhviterbialgorithm_with_celloff_and_ss
... hhsearch
... cstranslate_mpi
... a3m_reduce
... ffindex_shared
... ffindex_apply_mpi
... ffindex_get
... ffindex_from_fasta
... ffindex_order
... ffindex_unpack
... ffindex_apply
... ffindex_modify
... ffindex
... ffindex_build
bigmess:hhsuite-3.0.1-Source/build-twice > cmake -G "Unix Makefiles"  ../
-- Compiler is GNU 
-- Using CPU native flags for SSE optimization:  -march=native
-- Found AVXextensions, using flags:  -march=native -mavx -mfpmath=sse -Wa,-q
-- Found OpenMP
-- Found MPI_C: /local/gensoft2/exe/openmpi/2.0.1/lib/libmpi.so  
-- Configuring done
-- Generating done
-- Build files have been written to: /tmp/hhsuite-3.0.1-Source/build-twice
bigmess:hhsuite-3.0.1-Source/build-twice > make help 
The following are some of the valid targets for this Makefile:
... all (the default if no target is provided)
... clean
... depend
... install/strip
... edit_cache
... rebuild_cache
... install
... list_install_components
... package
... test
... package_source
... install/local
... cstranslate
... hhviterbialgorithm_and_ss
... hhblits
... CS_OBJECTS
... hhviterbialgorithm_with_celloff
... hhmake
... a3m_extract
... hhblits_omp
... hhalign
... hhconsensus
... HH_OBJECTS
... a3m_database_filter
... A3M_COMPRESS_OBJECT
... a3m_database_extract
... hhfilter
... a3m_database_reduce
... hhviterbialgorithm_with_celloff_and_ss
... hhsearch
... cstranslate_mpi
... a3m_reduce
... ffindex_shared
... ffindex_apply_mpi
... ffindex_get
... ffindex_from_fasta
... ffindex_order
... ffindex_unpack
... ffindex_apply
... ffindex_modify
... ffindex
... ffindex_build
... mpq

and then make successfully ends.

Scanning dependencies of target ffindex_unpack
[100%] Building C object lib/ffindex/src/CMakeFiles/ffindex_unpack.dir/ffindex_unpack.c.o
Linking C executable ../../../bin/ffindex_unpack
[100%] Built target ffindex_unpack

Error in hhsuitedb.py

I have a problem with the script hhsuitedb.py.
I want to say that I've recompiled hh-suite post installation of mpi and added the right paths HHLIB and PATH to my bashrc.

My problem is the following:

I have different alignment files in FASTA format; I converted them to a3m using the perl script reformat.pl in this way:
perl reformat.pl fas a3m ../alignments/31.aln /a3m/31.a3m
The script doesn’t complain and produces all the a3m files.

Then I move to hhsuitedb.py and use it like this:

user@machine:~/temp/hh-suite/scripts$ python hhsuitedb.py --ia3m="/home/user/temp/dbs_exp/a3m/A10.a3m" -o ~/temp/db.db --cpu=1

The error that I get is the following:

Warning: A3M A10.a3m is corrupted!
MPQ_Init: Needs at least one worker process.
Traceback (most recent call last):
  File "hhsuitedb.py", line 482, in <module>
    main()
  File "hhsuitedb.py", line 478, in main
    check_database(options.output_basename, options.nr_cores, options.force_mode)
  File "hhsuitedb.py", line 376, in check_database
    calculate_hhm(threads, output_basename+"_a3m", output_basename+"_hhm")
  File "hhsuitedb.py", line 106, in calculate_hhm
    check_call(" ".join(["mpirun", "-np", threads, "ffindex_apply_mpi", a3m_base_path+".ffdata", large_a3m_index, "-d", hhm_base_path+".ffdata", "-i", hhm_base_path+".ffindex", "--", "hhmake", "-v", str(0), "-i", "stdin", "-o" ,"stdout"]), shell=True)
  File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'mpirun -np 1 ffindex_apply_mpi /home/user/temp/db.db_a3m.ffdata /tmp/tmpf3GVy1/large.ffindex -d /home/user/temp/db.db_hhm.ffdata -i /home/user/temp/db.db_hhm.ffindex -- hhmake -v 0 -i stdin -o stdout' returned non-zero exit status 1

ffindex_order issue and observation

hello,
i am using a hh-suite v3 version from oct 2016. not sure if it matters for this following issue.
to optimize the a3m and hhm ffindices, i followed the instructions on page 17 of the manual using ffindex_order. the file "sorting.dat" only worked for the a3m case, but not the hhm case.

so i made a guess and replaced all the a3m extensions within the "sorting.dat" file with hhm extensions. using this new file (referred to here as sorting.hhm), i was able to get the hhm case to WORK. that is, it needed 2 separate "sorting.dat" files to optimize the a3m and hhm cases. is this an issue with the manual or something else?

in addition, i noticed that the number of entries before and after the optimization are the same for the a3m case but NOT the same for the hhm case using my trial procedure described above. after optimization, i had fewer hhm entries. is this ok or something is wrong?

-rw-r----- 1 fslee lee 3084984 Jul 5 16:39 test_cs219.ffdata
-rw-r----- 1 fslee lee 391634 Jul 5 16:39 test_cs219.ffindex
-rw-r----- 1 fslee lee 6552214670 Jul 5 16:40 test_a3m.ffdata
-rw-r----- 1 fslee lee 493334 Jul 5 16:40 test_a3m.ffindex
-rw-r----- 1 fslee lee 541748872 Jul 5 16:48 test_hhm.ffdata
-rw-r----- 1 fslee lee 467672 Jul 5 16:48 test_hhm.ffindex
-rw-r----- 1 fslee lee 200868 Jul 5 16:52 sorting.a3m
-rw-r----- 1 fslee lee 6552214670 Jul 5 17:03 test_a3m_new.ffdata
-rw-r----- 1 fslee lee 493334 Jul 5 17:03 test_a3m_new.ffindex
-rw-r----- 1 fslee lee 200868 Jul 5 17:05 sorting.hhm
-rw-r----- 1 fslee lee 430943602 Jul 5 17:36 test_hhm_new.ffdata
-rw-r----- 1 fslee lee 367354 Jul 5 17:36 test_hhm_new.ffindex
muon:/tmp/scope70_2.06.191 wc -l sorting.*
16739 sorting.a3m
16739 sorting.hhm
33478 total
muon:/tmp/scope70_2.06.192 wc -l *.ffindex
16739 test_a3m.ffindex
16739 test_a3m_new.ffindex
16739 test_cs219.ffindex
16861 test_hhm.ffindex
13400 test_hhm_new.ffindex
80478 total

fred
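
To see which hhm entries were dropped, the names in the index can be compared with the names in the sorting file. A minimal sketch with hypothetical helper names; it assumes that an .ffindex file is tab-separated with the entry name in the first column and that the sorting file holds one entry name per line.

```python
def read_names(path):
    """Entry names from an .ffindex or sorting file (first tab-separated field)."""
    with open(path) as handle:
        return [line.split("\t")[0].strip() for line in handle if line.strip()]

def missing_entries(index_path, sorting_path):
    """Names present in the index but absent from the sorting file."""
    wanted = set(read_names(sorting_path))
    return [name for name in read_names(index_path) if name not in wanted]
```

With the files listed above, missing_entries("test_hhm.ffindex", "sorting.hhm") would show which hhm entries the sorting file does not name.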

hhalign: -t switch is not optional

If hhalign is invoked like:
hhalign -i foo.fasta

The error message
ERROR: Template File does not exist:
appears.

Currently, you must invoke:
hhalign -i foo.fasta -t foo.fasta

hhalign -ssm switch is not documented

hh-suite 3.0.0

the switch -ssm is not documented for hhalign, but it still works, as long as a DSSP sec. structure prediction is present. If -ssm 1, the SS score is set to 0

Edit: I was wrong... it is documented

hhmake fails on example file

For the latest version of HHsuite, 3.0.beta.2, hhmake fails despite passing the tests from make test.

I am running on Arch Linux following the build here https://aur.archlinux.org/packages/hhsuite/

My processor has ssse3 as seen in the flags:

$ cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 21
model : 19
model name : AMD A10-5750M APU with Radeon(tm) HD Graphics
stepping : 1
microcode : 0x6001119
cpu MHz : 1400.000
cache size : 2048 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 2
apicid : 16
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb cpb hw_pstate vmmcall bmi1 arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
bugs : fxsave_leak sysret_ss_attrs null_seg
bogomips : 4992.61
TLB size : 1536 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro

The build process detects the following:

$ makepkg
==> Making package: hhsuite 3.0.beta.2-1 (Tue May 9 09:17:34 CDT 2017)
==> Checking runtime dependencies...
==> Checking buildtime dependencies...
==> Retrieving sources...
-> Downloading hhsuite-3.0-beta.2-Source.tar.gz...
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 608 0 608 0 0 1583 0 --:--:-- --:--:-- --:--:-- 1583
100 6571k 100 6571k 0 0 1341k 0 0:00:04 0:00:04 --:--:-- 1740k
-> Found hhsuite.sh
==> Validating source files with sha1sums...
hhsuite-3.0-beta.2-Source.tar.gz ... Passed
hhsuite.sh ... Passed
==> Extracting sources...
-> Extracting hhsuite-3.0-beta.2-Source.tar.gz with bsdtar
==> Starting build()...
-- The CXX compiler identification is GNU 6.3.1
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Compiler is GNU
-- Using CPU native flags for SSE optimization: -march=native
-- Performing Test HAVE_MM_MALLOC
-- Performing Test HAVE_MM_MALLOC - Success
-- Performing Test HAVE_POSIX_MEMALIGN
-- Performing Test HAVE_POSIX_MEMALIGN - Success
-- Performing Test HAVE_AVX2_EXTENSIONS
-- Performing Test HAVE_AVX2_EXTENSIONS - Success
-- Performing Test HAVE_AVX_EXTENSIONS
-- Performing Test HAVE_AVX_EXTENSIONS - Success
-- Performing Test HAVE_SSE4_2_EXTENSIONS
-- Performing Test HAVE_SSE4_2_EXTENSIONS - Success
-- Performing Test HAVE_SSE4_1_EXTENSIONS
-- Performing Test HAVE_SSE4_1_EXTENSIONS - Success
-- Performing Test HAVE_SSSE3_EXTENSIONS
-- Performing Test HAVE_SSSE3_EXTENSIONS - Success
-- Performing Test HAVE_SSE3_EXTENSIONS
-- Performing Test HAVE_SSE3_EXTENSIONS - Success
-- Performing Test HAVE_SSE2_EXTENSIONS
-- Performing Test HAVE_SSE2_EXTENSIONS - Success
-- Performing Test HAVE_SSE_EXTENSIONS
-- Performing Test HAVE_SSE_EXTENSIONS - Success
-- Found AVX2extensions, using flags: -march=native -mavx2 -mfpmath=sse -Wa,-q
-- Try OpenMP CXX flag = [-fopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Success
-- Found OpenMP: -fopenmp
-- Found OpenMP
-- The C compiler identification is GNU 6.3.1
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Looking for fmemopen
-- Looking for fmemopen - found
-- Found MPI_C: /usr/lib/openmpi/libmpi_cxx.so;/usr/lib/openmpi/libmpi.so
-- Found MPI_CXX: /usr/lib/openmpi/libmpi_cxx.so;/usr/lib/openmpi/libmpi.so
-- Configuring done
-- Generating done

I have been following the example in the hhsuite-userguide.pdf.

$ valgrind hhmake -i query.a3m
==14707== Memcheck, a memory error detector
==14707== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==14707== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==14707== Command: hhmake -i query.a3m
==14707==

- 09:36:53.091 INFO: query.a3m is in A2M, A3M or FASTA format

vex amd64->IR: unhandled instruction bytes: 0x8F 0xEA 0xF8 0x10 0xC9 0x3 0x1D 0x0 0x0 0x45
vex amd64->IR: REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR: VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
vex amd64->IR: PFX.66=0 PFX.F2=0 PFX.F3=0
==14707== valgrind: Unrecognised instruction at address 0x47a30b.
==14707== at 0x47A30B: Alignment::Amino_acid_frequencies_and_transitions_from_M_state(HMM*, char, char*, int, float const*) (in /usr/bin/hhmake)
==14707== by 0x47BB04: Alignment::FrequenciesAndTransitions(HMM*, char, char, char, char, int, float const*, float const () [20], char, bool) (in /usr/bin/hhmake)
==14707== by 0x4C5294: ReadQueryFile(Parameters&, _IO_FILE*, char&, char, HMM*, Alignment*, char*, float*, float const () [20], float const () [20]) (in /usr/bin/hhmake)
==14707== by 0x4C5FED: ReadQueryFile(Parameters&, char*, char&, char, HMM*, Alignment*, float*, float const () [20], float const () [20]) (in /usr/bin/hhmake)
==14707== by 0x414D8D: main (in /usr/bin/hhmake)
==14707== Your program just tried to execute an instruction that Valgrind
==14707== did not recognise. There are two possible reasons for this.
==14707== 1. Your program has a bug and erroneously jumped to a non-code
==14707== location. If you are running Memcheck and you just saw a
==14707== warning about a bad jump, it's probably your program's fault.
==14707== 2. The instruction is legitimate but Valgrind doesn't handle it,
==14707== i.e. it's Valgrind's fault. If you think this is the case or
==14707== you are not sure, please let us know and we'll try to fix it.
==14707== Either way, Valgrind will now raise a SIGILL signal which will
==14707== probably kill your program.
==14707==
==14707== Process terminating with default action of signal 4 (SIGILL): dumping core
==14707== Illegal opcode at address 0x47A30B
==14707== at 0x47A30B: Alignment::Amino_acid_frequencies_and_transitions_from_M_state(HMM*, char, char*, int, float const*) (in /usr/bin/hhmake)
==14707== by 0x47BB04: Alignment::FrequenciesAndTransitions(HMM*, char, char, char, char, int, float const*, float const () [20], char, bool) (in /usr/bin/hhmake)
==14707== by 0x4C5294: ReadQueryFile(Parameters&, _IO_FILE*, char&, char, HMM*, Alignment*, char*, float*, float const () [20], float const () [20]) (in /usr/bin/hhmake)
==14707== by 0x4C5FED: ReadQueryFile(Parameters&, char*, char&, char, HMM*, Alignment*, float*, float const () [20], float const () [20]) (in /usr/bin/hhmake)
==14707== by 0x414D8D: main (in /usr/bin/hhmake)
==14707==
==14707== HEAP SUMMARY:
==14707== in use at exit: 23,106,610 bytes in 97,163 blocks
==14707== total heap usage: 113,363 allocs, 16,200 frees, 33,876,011 bytes allocated
==14707==
==14707== LEAK SUMMARY:
==14707== definitely lost: 88 bytes in 3 blocks
==14707== indirectly lost: 10,784,040 bytes in 16,002 blocks
==14707== possibly lost: 0 bytes in 0 blocks
==14707== still reachable: 12,322,482 bytes in 81,158 blocks
==14707== suppressed: 0 bytes in 0 blocks
==14707== Rerun with --leak-check=full to see details of leaked memory
==14707==
==14707== For counts of detected and suppressed errors, rerun with: -v
==14707== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Illegal instruction (core dumped)

The output above shows hhmake hitting an illegal instruction and failing. Is there any advice on how to fix this?

However, the precompiled binaries seem to work.

Problem running HHSearch on Debian Testing

I am unable to run HHSearch anymore. I am not sure when it stopped working. I am running Debian Testing, and installed hhsuite from the repository. The following is a trace:

hhsearch -i HHSearch.hhm -d HHSearch2.hhm
- 17:20:44.102 INFO: Search results will be written to HHSearch.hhr!
- 17:20:44.102 ERROR: In /build/hhsuite-3.0~beta2+dfsg/src/hhdatabase.cpp:35: FFindexDatabase:
- 17:20:44.102 ERROR: could not open file 'HHSearch2.hhm_cs219.ffdata'
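The error message suggests that hhsearch treats the -d argument as a database basename and appends a suffix such as _cs219.ffdata to it, rather than opening HHSearch2.hhm as a single file. A minimal C++ sketch of that name expansion follows. Only the _cs219.ffdata suffix is taken from the trace above; the remaining suffixes and the helper name expected_db_files are assumptions about an HH-suite 3 style database layout, not code from hhdatabase.cpp.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: expands a database basename into the ffdata/ffindex
// file names that an HH-suite 3 style database is ASSUMED to consist of.
// Only "<base>_cs219.ffdata" is confirmed by the error message above.
inline std::vector<std::string> expected_db_files(const std::string& base) {
  std::vector<std::string> files;
  for (const char* kind : {"_cs219", "_a3m", "_hhm"}) {
    for (const char* ext : {".ffdata", ".ffindex"}) {
      files.push_back(base + kind + ext);
    }
  }
  return files;
}
```

Calling expected_db_files("HHSearch2.hhm") yields HHSearch2.hhm_cs219.ffdata as its first entry, which is exactly the file the trace reports as missing.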

<db>_<a3m|hhm>.ffindex VERSUS <db>_<a3m|hhm>_db.index

hello,
i noticed that ffindex files and index files are almost the same but not identical. when i used ffindex_build (distributed with version 3), i actually made files identical to index files (and not the ffindex files). does it matter? are the 2 interchangeable? i thought v2 uses the index files and v3 uses the ffindex files.

also when the pfamA_29.0.tgz file downloaded from your server was untar-ed, i noticed something UNusual in this particular file: pfam_hhm.ffindex. it has "a3m" inside all over. was that intended?
best regards,
fred
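To compare an .ffindex file with an .index file entry by entry, it can help to parse both into records first. The C++ sketch below assumes the common ffindex layout of one entry per line with whitespace-separated name, offset and length; the struct and function names are hypothetical and this is not the parser shipped with ffindex.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical record for one index line (ASSUMED layout: name, offset, length).
struct IndexEntry {
  std::string name;
  std::size_t offset = 0;
  std::size_t length = 0;
};

// Parses one line; returns false if any of the three fields is missing
// or if offset/length are not numeric.
inline bool parse_index_line(const std::string& line, IndexEntry& entry) {
  std::istringstream in(line);
  return static_cast<bool>(in >> entry.name >> entry.offset >> entry.length);
}
```

With both files parsed this way, entry names (for example names ending in .a3m inside an _hhm index) can be inspected separately from offsets and lengths.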

Wrong interpretation of '-ssm' option

Dear HH-Suite developers,

It seems that the '-ssm' option is not processed correctly by the HH-Suite tools.

When we compared HHalign v.2.0.16 with v.3.0.0, we observed that, in spite of explicitly identical settings
(-local -alt 10 -p 0 -norealign -pc_hhm_contxt_mode 2 -pc_hhm_contxt_a 1.0 -pc_hhm_contxt_b 1.5 -pc_hhm_contxt_c 1.0 -contxt /usr/.../data/context_data.lib), the new version does not take the SS score into account when the alignment is constructed.

It turned out that, due to a bug, '-ssm 1' has to be used instead of '-ssm 2' to select the mode in which the SS score is used during the Viterbi algorithm.

Indeed, the Hit::SCORE_ALIGNMENT constant is defined as 1 in hhhit.h. As a result, when '-ssm 2' is used (which is the default value), the comparison 'ss_mode == Hit::SCORE_ALIGNMENT' in Viterbi::Align() fails and the SS score is not used during the Viterbi algorithm.
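The mismatch can be illustrated with a minimal C++ sketch. Only the value SCORE_ALIGNMENT == 1, the default '-ssm 2' and the quoted comparison come from the report; the enum name SsMode and the helper ss_used_in_viterbi are hypothetical and do not reproduce the actual hhhit.h or Viterbi::Align().

```cpp
// Hypothetical stand-in for the constant in hhhit.h; only the value 1
// for SCORE_ALIGNMENT is taken from the report.
struct Hit {
  enum SsMode { SCORE_ALIGNMENT = 1 };
};

// Mirrors the comparison 'ss_mode == Hit::SCORE_ALIGNMENT' quoted in the report.
inline bool ss_used_in_viterbi(int ss_mode) {
  return ss_mode == Hit::SCORE_ALIGNMENT;
}
```

In this sketch ss_used_in_viterbi(2) is false and ss_used_in_viterbi(1) is true, which matches the observation that '-ssm 1' rather than '-ssm 2' enables SS scoring during the alignment.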

Wrong format in printf statement

Hi,

hhblits.cpp uses the %i printf(3) conversion to format a 'double' value (see the Debian bug tracking system).
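For illustration, here is a minimal C++ sketch of the matching conversion for a double. It does not reproduce the affected line in hhblits.cpp; the function name format_value is hypothetical. Passing a double where %i expects an int is undefined behaviour, whereas %g (or %f, %e) matches the argument type.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: formats a double with a conversion that matches its
// type. Using "%i" here instead would be undefined behaviour.
inline std::string format_value(double value) {
  char buf[64];
  std::snprintf(buf, sizeof(buf), "%g", value);
  return std::string(buf);
}
```

Compilers such as GCC can usually report this kind of mismatch at build time when format checking (-Wformat, included in -Wall) is enabled.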


Besides this, please note that there are a lot of patches available in the Debian packaging which should have been propagated by the maintainer earlier. I guess this has not happened yet, so I'm pointing you to this set of patches so that you can cherry-pick from it as far as it applies.
Kind regards, Andreas.

Where to find source code for older versions of hh-suite?

Hi! I tried to build hh-suite on Power8/LE but was getting a lot of SSE-related issues. I wanted to have a look at older code to see what was there before the SSE-related optimizations were implemented; however, only the 3.0-beta code is available on GitHub. I could not find any trace of older code in the existing code base either. So where can I get older code of hh-suite? I did obtain the version 2.0.16 code from one of the Ubuntu source repositories, but anything older than that would also be welcome.

Thanks!
Atul.

Degenerated viterbi alignments

Hello,

When running hhsearch with the following parameters it ends with a segmentation fault:
hhsearch -i 8221280.reduced.a3m -d pdb70 -o out.hhr -glob -norealign

Those parameters are used in the toolkit in Tuebingen.

We see the following output:

13 4CAY_A HISTONE H2A.Z, HISTONE   99.3   4E-15 7.4E-20  107.9   0.0    1  126-126   112-112 (111)
14 4CAY_A HISTONE H2A.Z, HISTONE   99.3 4.6E-15 8.4E-20  107.6   0.1    1  126-126   112-112 (111)

In previous debugging sessions, such degenerate alignments caused a segmentation fault. Possible explanations:

  • Either the alignment is simply so bad (unlikely, given the E-value) that it should be filtered out before other methods trigger the segmentation fault.
  • There might still be a bug in the SIMD Viterbi version (unlikely).
  • Or there is a bug in the Viterbi backtracing.
    Usually the Viterbi alignments are redone with the Maximum Accuracy alignment algorithm, so this bug does not appear in hhsearch/hhblits with default parameters.

Since the bug seems to be caused by the Viterbi code, could you please handle it, @martin-steinegger?
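In the two result rows quoted above, the alignment spans a single column (126-126 against 112-112), and the template position 112 lies beyond the template length of 111 shown in parentheses. A sanity check of the following kind could flag such hits before later stages touch them. This is a sketch only: it assumes the usual reading of the columns as query range, template range and template length, and the names AlignedRange and is_plausible are hypothetical rather than HH-suite code.

```cpp
// Hypothetical description of one hit's aligned region
// (1-based, inclusive positions, as printed in the result rows).
struct AlignedRange {
  int query_first;
  int query_last;
  int template_first;
  int template_last;
  int template_length;
};

// Returns false when a range is inverted, starts before position 1,
// or ends past the template length.
inline bool is_plausible(const AlignedRange& r) {
  return r.query_first >= 1 && r.query_first <= r.query_last &&
         r.template_first >= 1 && r.template_first <= r.template_last &&
         r.template_last <= r.template_length;
}
```

For the rows above, AlignedRange{126, 126, 112, 112, 111} fails the check because 112 exceeds 111.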

cstranslate questions [v3.0.0]

hello,
i was using hhsuite2 with database files built by people before me. all was well. lately, i have been trying to update to version 3. i am able to make packed files from a3m's and hhm's using ffindex_build. however, i have trouble making the corresponding _cs219 packed file and index using cstranslate.

(1) i used existing a3m's and hhm's (made from version 2) to make the _cs219 ffdata and ffindex files and i got errors like:

Processing entry: 4jdv_L.a3m
Could not read entry: 4jdv_L.a3m, Message: Sequence 2 has 0 match columns but should have 103!

what may have gone wrong here? i also tried using freshly made a3m's and hhm's (from v3) to make the cs219 files. i saw similar errors. perhaps these a3m's and hhm's were not prepared properly?

(2) do hhsuite v3 hhsearch and hhblits always require the corresponding cs219 db files to run? seemed so from my very limited experience with v3. i don't recall version 2 needing the cs219 files. simplistically speaking, what is being added here?

warm regards,
fred
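One way to narrow down the "0 match columns" message is to count match columns per sequence before running cstranslate. The C++ sketch below assumes the usual A3M convention that uppercase letters and '-' occupy match columns while lowercase letters are insertions and '.' is ignored; the function name count_match_columns is hypothetical and this is not the check cstranslate itself performs.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: counts match columns in one A3M sequence line,
// ASSUMING uppercase residues and '-' are match states and everything
// else (lowercase inserts, '.') is not.
inline std::size_t count_match_columns(const std::string& seq) {
  std::size_t n = 0;
  for (unsigned char c : seq) {
    if (std::isupper(c) || c == '-') ++n;
  }
  return n;
}
```

Under this assumption every sequence in an alignment should return the same count, and a sequence written entirely in lowercase would return 0.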
