hssp's People

Contributors

cbaakman, jonblack, tonyelewis, touwwouter

hssp's Issues

Compilation error on cmbi23

/usr/local/lib/libzeep.so: undefined reference to `boost_round'
/usr/local/lib/libzeep.so: undefined reference to `bool boost::math::tr1::isnan<double>(double)'

BOOST_FOREACH caches the end() iterator

BOOST_FOREACH caches the end() iterator, so modifying the container inside the loop is undefined behaviour.

The affected code is src/structure.cpp:2018.

Either use a standard for loop or the std::remove_if algorithm combined with erase.
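
A minimal sketch of the erase/remove_if approach, assuming the loop filters elements out of a std::vector. The Residue type, its fields and the RemoveIncomplete function are hypothetical stand-ins, not the actual code at src/structure.cpp:2018.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for whatever element type the real loop iterates over.
struct Residue {
    int number;
    bool complete;
};

// Removes every incomplete residue in one pass. The container is never
// modified while a loop is still iterating over it, so no cached end()
// iterator can be invalidated.
void RemoveIncomplete(std::vector<Residue>& residues) {
    residues.erase(
        std::remove_if(residues.begin(), residues.end(),
                       [](const Residue& r) { return !r.complete; }),
        residues.end());
}
```

std::remove_if only moves the kept elements to the front and returns the new logical end; the erase call is what actually shrinks the vector.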

Open source the project

The project should be open source. Before making it public, ensure the license information is correct in all source files and in COPYING.

mkhssp progress bar should be optional

When mkhssp is run, a progress bar for the BLAST step is shown. This interferes with the HSSP output when it's sent to stdout.

Make the progress bar output optional and turned off by default.

mkhssp/mkdssp missing residue

mkhssp and mkdssp both generate files with a missing residue when run on pdb/mmCIF files 1yv8 and 1yva. It's the first threonine that is missing. These files lack Thr-1's C-alpha atom.

stdout output option doesn't output to stdout

The help documentation states:

-o [ --output ] arg Output file, use 'stdout' to output to screen

Using stdout as the output file results in a file called stdout and not the output being sent to stdout.

Fix this by removing the help text. mkdssp (and probably also mkhssp) should by default output to stdout. Only when -o is specified should it output to a file.
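
A rough sketch of the proposed behaviour, assuming an empty path means that -o was not given. SelectOutput is a hypothetical helper, not an existing function in mkdssp or mkhssp.

```cpp
#include <fstream>
#include <iostream>
#include <stdexcept>
#include <string>

// Returns the stream to write to. An empty path means "no -o given", so
// the result goes to std::cout. Otherwise `file` is opened and returned;
// the caller owns `file` and must keep it alive while writing.
std::ostream& SelectOutput(const std::string& path, std::ofstream& file) {
    if (path.empty())
        return std::cout;
    file.open(path, std::ios::out | std::ios::trunc);
    if (!file.is_open())
        throw std::runtime_error("could not open output file " + path);
    return file;
}
```

With this shape the special file name 'stdout' is no longer needed, which matches the suggestion to drop it from the help text.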

"Not a valid id" parsing fasta file

The following command outputs the correct hssp file, but ends with the text "Not a valid id" and exits with an error code:

$ ./mkhssp -i ~/1crn.fasta -d /data/fasta/sprot.fasta -d /data/fasta/trembl.fasta
...
...
Not a valid id
$ echo $?
1

The equivalent command using the pdb file works fine:

$ ./mkhssp -i ~/1crn.pdb -d /data/fasta/sprot.fasta -d /data/fasta/trembl.fasta
...
...
$ echo $?
0

I've tried fasta files with and without a trailing newline, but both fail.

I suspect the error is caused by a bug in the parser. Fasta files can contain multiple sequences, where the first line of each sequence contains an id. I suspect the parser assumes another entry even when there isn't one.
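
To illustrate the suspected failure mode, here is a small sketch of a FASTA reader that only starts a new entry on a line beginning with '>', so a trailing newline or blank line never yields an extra entry with an empty id. FastaEntry, ReadFasta and ReadFastaFromString are hypothetical names; this is not the parser used by mkhssp.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct FastaEntry {
    std::string id;        // text after '>' up to the first space or tab
    std::string sequence;  // concatenated sequence lines
};

// Reads all entries from a FASTA stream. A new entry starts only at a
// line beginning with '>'; blank lines and end-of-file never start one.
// Sequence lines that appear before any header are ignored.
std::vector<FastaEntry> ReadFasta(std::istream& in) {
    std::vector<FastaEntry> entries;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r')
            line.pop_back();  // tolerate Windows line endings
        if (line.empty())
            continue;         // skip blank lines, including a trailing one
        if (line[0] == '>') {
            std::string::size_type end = line.find_first_of(" \t");
            std::string id = (end == std::string::npos)
                ? line.substr(1)
                : line.substr(1, end - 1);
            entries.push_back(FastaEntry{id, std::string()});
        } else if (!entries.empty()) {
            entries.back().sequence += line;
        }
    }
    return entries;
}

// Convenience wrapper for in-memory text.
std::vector<FastaEntry> ReadFastaFromString(const std::string& text) {
    std::istringstream in(text);
    return ReadFasta(in);
}
```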

Improve code coverage

There are not many unit tests for xssp, so it's hard to be sure that it's correct, and changing it is scary because a regression may be introduced.

Improve logging

HOPE receives a proxy error but there is no coredump, so the program may not be crashing. In order to find out what is happening, improve the logging so that more statements are logged and more often. It would also be beneficial to use log4cpp.

syntax error in autoconf's configure script

When running the autoconf-built configure script on some machines, one gets the following error:

./configure: line 4290: syntax error near unexpected token `1.54'
./configure: line 4290: `AX_BOOST_BASE(1.54)'

Remove limits setting code

There is code that sets limits when on Linux so coredumps are generated. This isn't required. Developers can set limits on the system directly.

RuntimeError: No home defined

In order to load settings from a configuration file, mkhssp tries to determine the user's home directory. xssp-rest, however, runs the command via subprocess which means no home directory exists, which causes mkhssp to exit.

If the home directory doesn't exist, mkhssp should continue. A debug log message should be given.
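
A sketch of the requested behaviour, assuming the home directory is looked up through the HOME environment variable. FindHomeDirectory is a hypothetical helper, and writing the debug message to std::clog is a placeholder for whatever logging the project ends up using.

```cpp
#include <cstdlib>
#include <iostream>
#include <string>

// Returns the user's home directory, or an empty string when HOME is not
// set (for example when started with a stripped environment). The caller
// skips the per-user configuration file in that case instead of exiting.
std::string FindHomeDirectory() {
    const char* home = std::getenv("HOME");
    if (home == nullptr || *home == '\0') {
        std::clog << "debug: HOME is not set, skipping user configuration\n";
        return std::string();
    }
    return home;
}
```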

Accept input from stdin in mkdssp

mkdssp accepts input from a file. Output can be either a file, or stdout. Add support for taking the input from stdin. This removes the need for scripts to save content to a temporary file before running mkdssp.

mkdssp does not accept gzipped mmCIF files

mkdssp does not accept gzipped mmCIF files when compiled from the master branch (currently release 2.2.3).
On cmbi4, mkdssp version 2.2.0 does accept gzipped mmCIF files.

  • Why the discrepancy?
  • How to fix mmCIF acceptance on the master branch?

Compilation fails on cmbi4

Compilation fails on cmbi4, which is running Ubuntu 12.04 LTS. There are multiple reasons:

  1. The version of autoconf is too old to parse the configure.ac script. It also fails to check the required version properly;
  2. The version of boost is 1.48 and the repository doesn't contain 1.54; however, it does compile with 1.48 after fixing the other problems;
  3. The version of gcc doesn't support -std=c++11; however, it does support -std=c++0x;
  4. Compilation errors due to warnings (strangely these only appear on cmbi4).

Investigate hsspsoap timeout cause

This is related to hope issue https://github.com/cmbi/hope/issues/4.

When hope makes a request to hsspsoap, it reaches the timeout of one hour. It's not clear what is causing it. The possibilities are:

  • hsspsoap has crashed;
  • the hsspsoap job takes longer than the timeout;
  • hsspsoap doesn't reply when finished.

The last one in the list is a suspect due to the following comment in HOPE, in HSSPServiceImpl.java:

HSSPSoap sometimes acts as a tarpit, meaning requests for the WSDL never return. Cope by using a timeout here

Fix cppcheck errors

cppcheck reports many errors when --enable=all is used. There are likely false positives in there, but they should be checked regardless.

Use autotools

The current makefile is hand-written and requires a lot of effort to maintain. GNU Autotools takes a lot of this work away. Moreover, the make clean target doesn't remove executables. See the wiwsd project for an example.

hsspsoap exits immediately

When hsspsoap is executed on cmbi23 it exits immediately. This change occurred after updating the machine from Ubuntu 12.04 LTS to Ubuntu 14.04 LTS.

mkdssp/mkhssp produce different chain IDs for cif files

Both mkdssp and mkhssp accept PDB files as well as mmCIF files as input. For the structure 1B2W, for example, the mmCIF file produces different output from the PDB file.

The difference is in the chain IDs. PDB produces chains L and H, while mmCIF produces A and B.

Inspection of the mmCIF file at http://rcsb.org/pdb/files/1B2W.cif shows that each chain has two IDs in the file. One column has IDs A and B, while the other column has L and H as in the PDB file. For some reason Maarten chose to take the leftmost column from mmCIF files.

Remove VERBOSE

VERBOSE is a global and globals are evil. It is getting in the way of writing tests because it means including another main, which doesn't play nice with Boost.Test.

When the logging is implemented as per #14, VERBOSE logging statements will be normal logging using a level such as trace.

Remove hsspsoap

A wrapper is being written in python that will replace hsspsoap. Once the wrapper has been written and is being used by all services instead of hsspsoap, hsspsoap can be removed. This makes a lot of issues in xssp redundant.

Cannot find mtrx/matrices.h

When compiling the develop branch from scratch, the following error occurs:

configure: error: cannot find sources (mtrx/matrices.h) in . or ..

Fasta formatted input file must have the fasta extension

This isn't clear at all. Either the documentation should be updated, or the code should determine the file type by checking for the fasta header.

The error message displayed is also silly:

empty protein, or no valid complete residues

I assume this is because by default files are assumed to be in PDB format.

Use a queue to manage hsspsoap jobs

When clients make a SOAP request via hsspsoap, the HTTP connection must stay open for a long time to wait for the response. This is not good practice, and also causes problems for the proxy server (see #13).

Assign a job ID to each request and reply immediately with that ID. Allow clients to query the status of a job by its ID.
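
The idea can be sketched as a small in-memory registry. JobRegistry, JobStatus and the method names are hypothetical; a real service would also need worker threads, result storage and persistence, none of which is shown here.

```cpp
#include <cstdint>
#include <map>
#include <mutex>

enum class JobStatus { Queued, Running, Finished, Failed, Unknown };

// Hypothetical in-memory job registry; all names are illustrative only.
class JobRegistry {
public:
    // Registers a new job and returns its id straight away, so the
    // request handler can reply without waiting for the result.
    std::uint64_t Submit() {
        std::lock_guard<std::mutex> lock(m_mutex);
        std::uint64_t id = m_next_id++;
        m_status[id] = JobStatus::Queued;
        return id;
    }

    // Called by a worker when a job changes state; unknown ids are ignored.
    void SetStatus(std::uint64_t id, JobStatus status) {
        std::lock_guard<std::mutex> lock(m_mutex);
        auto it = m_status.find(id);
        if (it != m_status.end())
            it->second = status;
    }

    // Lets a client poll; unknown ids are reported rather than throwing.
    JobStatus Query(std::uint64_t id) const {
        std::lock_guard<std::mutex> lock(m_mutex);
        auto it = m_status.find(id);
        return it == m_status.end() ? JobStatus::Unknown : it->second;
    }

private:
    mutable std::mutex m_mutex;
    std::uint64_t m_next_id = 1;
    std::map<std::uint64_t, JobStatus> m_status;
};
```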

mkhssp doesn't correctly count NDEL and NINS

Under the sequence profile section, mkhssp places zeros for NDEL and NINS for this PDB input example: 1f2i

Compiling under a different optimization level fixes the problem, but that's not a clean solution of course.

The NDEL and NINS values are determined in the MProfile::Align function, at https://github.com/cmbi/xssp/blob/master/src/hssp-nt.cpp#L733 and https://github.com/cmbi/xssp/blob/master/src/hssp-nt.cpp#L739 . The code, however, is not very clear.

Invalid formatted floating point number

mkdssp returns the following error when called from xssp-rest:

DSSP could not be created due to an error:
invalid formatted floating point number '  8   '

I will do some more investigation to figure out what input was used.

hsspsoap segmentation fault

The wsdl at www.cmbi.ru.nl/hsspsoap/wsdl becomes unavailable because hsspsoap is segfaulting. See the logs below:

messages.1:Aug  6 14:53:21 cmbi23 kernel: [2786239.939964] hsspsoap[20485]: segfault at 6ecbe0 ip 0000000000558f47 sp 00007fff7875b470 error 6 in hsspsoap[400000+395000]
messages.1:Aug  6 14:53:31 cmbi23 kernel: [2786249.446772] hsspsoap[17288]: segfault at 6ecbe0 ip 0000000000558f47 sp 00007fff142724a0 error 6 in hsspsoap[400000+395000]
messages.1:Aug  6 14:56:12 cmbi23 kernel: [2786410.667699] hsspsoap[1821]: segfault at 6ecbe0 ip 0000000000558f47 sp 00007fffe66826a0 error 6 in hsspsoap[400000+395000]

The hsspsoap service has to be restarted every time this happens.

HSSP deletion count

Suppose we have the alignment

0         1
1 23456789012
R.SDALTRHFRTE Query
R.SDALSRHFRTE
PDPSSLARHRHVH
NASDRAKHQNRTH
R.SDALSRHLRTE
R.PDNLQRHVRVH
R.LENLKTHLRSH
R.SDALSRHFRTE
R.SDALSRHFRTE
N....LKQHVLRH
E....LRKHLRVP 

The 2 insertions will be assigned to position 1, but the 2 deletions will also be assigned to position 1.
Does it make more sense to assign the deletions to position 2 in the query sequence?

Large hsspsoap jobs take over the process

When large jobs are submitted to hsspsoap, the entire application becomes unresponsive to other requests. The web service wsdl takes a long time to load, which causes a timeout in the browser/in HOPE.

Ambiguous command line option -a

The short option -a is assigned to both --address and --threads. Using it results in the following error:

jon@cmbipc85:~/Projects/xssp$ ./hsspsoap -a 127.0.0.1 -p 6789
terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::program_options::ambiguous_option> >'
  what():  option '-a' is ambiguous and matches '--address', and '--threads'
Aborted

A workaround is to use the long name variant.

Remove reading parameters from a config file in mkhssp

mkhssp allows command params to be read from a config file. This is not necessary as the command line arguments are small, and it causes problems on virtual machines when HOME isn't defined. Considering it can always be run using explicit arguments (and is as far as CMBI is concerned), remove this feature.

hsspsoap doesn't start automatically after reboot

When cmbi23 is restarted, hsspsoap is not started automatically. Moreover, neither sudo service hsspsoap start nor /etc/init.d/hsspsoap start does anything. Running the command from the init script manually works, so I suspect the init script is broken.

xssp error output isn't captured by celery task when an exception is raised

When the command mkhssp -i /tmp/1crn.pdb -d /data/fasta/sprot.fasta -d /data/fasta/trembl.fasta is run manually, you get the following message, which is correct:

Databank /data/fasta/sprot.fasta does not exist

However, when the celery task for xssp-rest runs, the captured output is empty:

[2014-08-07 02:47:51,447: ERROR/MainProcess] Task xssp_rest.tasks.mkhssp_from_pdb[b5b3a5f4-dbd1-49eb-95ac-9ddf4862ce61] raised unexpected: RuntimeError('',)

The error is caused by an exception, which is never handled. Perhaps this should just be a message to std::cerr?

Unused files

The following files are not used during compilation. Are they needed? If so, what for?

  • align-2d.cpp
  • align-3d.cpp
  • maxhom-hssp.cpp
  • mutualinformation.cpp
  • ioseq.cpp
  • guide.cpp
  • mkhssp.h

Segmentation fault in CalculateConservation for mkhssp

mkhssp crashes with a Segmentation fault when run with the pdb file /data/xssp_error_pdbs/989985db-0d97-4db3-bf92-d2466eefbae5.pdb located on cmbi23:

jon@cmbi23:/var/log/supervisor$ mkhssp -i /tmp/989985db-0d97-4db3-bf92-d2466eefbae5.pdb -d /data/fasta/sprot.fasta -d /data/fasta/trembl.fasta
blast done in 646s cpu / 43s wall
distance done in 94s cpu / 4s wall
aligning done in 93s cpu / 93s wall
Segmentation fault (core dumped)

The core dump shows that the problem is in hssp-nt.cpp on line 1276:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000000000047e620 in HSSP::MProfile::CalculateConservation (
    this=this@entry=0x7ffff13fc7b0, inThreads=inThreads@entry=32) at src/hssp-nt.cpp:1276
    1276            ri.m_dist[j] = m_seq[i] == j ? 1 : 0;
(gdb) bt
#0  0x000000000047e620 in HSSP::MProfile::CalculateConservation (
    this=this@entry=0x7ffff13fc7b0, inThreads=inThreads@entry=32) at src/hssp-nt.cpp:1276
#1  0x000000000047f3c7 in HSSP::MProfile::Process (this=this@entry=0x7ffff13fc7b0, 
    inHits=..., inGapOpen=inGapOpen@entry=30, inGapExtend=inGapExtend@entry=2, 
    inMaxHits=inMaxHits@entry=5000, inThreads=inThreads@entry=32) at src/hssp-nt.cpp:1112
#2  0x0000000000480359 in HSSP::CreateHSSP (inProtein=..., inDatabanks=..., 
    inMaxHits=inMaxHits@entry=5000, inMinSeqLength=inMinSeqLength@entry=25, 
    inGapOpen=inGapOpen@entry=30, inGapExtend=inGapExtend@entry=2, 
    inThreshold=inThreshold@entry=0.0500000007, 
    inFragmentCutOff=inFragmentCutOff@entry=0.75, inThreads=inThreads@entry=32, 
    inFetchDBRefs=inFetchDBRefs@entry=false, inOs=...) at src/hssp-nt.cpp:1406
#3  0x00000000004307a9 in main (argc=<optimized out>, argv=0x7ffff13fdaf8)
    at src/mkhssp.cpp:221

The coredump is on cmbi23 in /data/coredumps/core.mkhssp.9960.

mkhssp skips chain 0

Observed for PDB entries 1AL2, 1AR6, 1AR8, 1AR9 and 1AR7. The resulting HSSP files do not have chain 0, while the DSSP files do.
