cmbi / hssp
Create DSSP and HSSP files
License: GNU General Public License v3.0
/usr/local/lib/libzeep.so: undefined reference to `boost_round'
/usr/local/lib/libzeep.so: undefined reference to `bool boost::math::tr1::isnan<double>(double)'
'operator=' should check for assignment to self to avoid problems with dynamic memory. The affected code is src/matrix.h:169.
Writing a correct copy assignment operator is not trivial. Use the copy-and-swap idiom. See http://stackoverflow.com/questions/3279543/what-is-the-copy-and-swap-idiom for more details.
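A minimal sketch of the copy-and-swap idiom is given below. The `Matrix` class is hypothetical and is not the class in src/matrix.h; it only illustrates why passing the argument by value makes an explicit self-assignment check unnecessary.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>

// Hypothetical matrix type for illustration; NOT the class in src/matrix.h.
class Matrix {
 public:
  Matrix(std::size_t rows, std::size_t cols)
      : m_rows(rows), m_cols(cols), m_data(new double[rows * cols]()) {}

  Matrix(const Matrix& other)
      : m_rows(other.m_rows), m_cols(other.m_cols),
        m_data(new double[other.m_rows * other.m_cols]) {
    std::copy(other.m_data, other.m_data + m_rows * m_cols, m_data);
  }

  ~Matrix() { delete[] m_data; }

  // The argument is passed by value, so the copy is made before *this changes.
  // Self-assignment therefore needs no explicit check.
  Matrix& operator=(Matrix other) {
    swap(*this, other);
    return *this;
  }

  friend void swap(Matrix& a, Matrix& b) noexcept {
    using std::swap;
    swap(a.m_rows, b.m_rows);
    swap(a.m_cols, b.m_cols);
    swap(a.m_data, b.m_data);
  }

  double& operator()(std::size_t row, std::size_t col) {
    return m_data[row * m_cols + col];
  }

 private:
  std::size_t m_rows;
  std::size_t m_cols;
  double* m_data;
};
```

If the copy constructor throws, the left-hand side is left untouched, which is the other benefit usually attributed to this idiom.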
Much of the code is poorly aligned which makes it awful to read. Two spaces should be used instead of tabs.
BOOST_FOREACH caches the end() iterator and it is undefined behaviour if you modify the container inside the loop.
The affected code is src/structure.cpp:2018.
Either use a standard for loop or the std::remove_if algorithm.
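The std::remove_if approach could look roughly like the sketch below. The `Residue` type, its `complete` flag and the function name are assumptions made for illustration; the container and predicate in src/structure.cpp will differ.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical element type; the real container in src/structure.cpp differs.
struct Residue {
  int number;
  bool complete;
};

// Drops incomplete residues in one pass. The vector is never modified while
// a loop is iterating over it, so no cached end() iterator can be invalidated.
inline void RemoveIncomplete(std::vector<Residue>& residues) {
  residues.erase(
      std::remove_if(residues.begin(), residues.end(),
                     [](const Residue& r) { return !r.complete; }),
      residues.end());
}
```

The relative order of the remaining elements is preserved, which matters if later code relies on residue order.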
The project should be open source. Before making it public, ensure the license information is correct in all source files and in COPYING.
When mkhssp is run, a progress bar for the blast is shown. This interferes with the HSSP output when it's sent to stdout.
Make the progress bar output optional and turned off by default.
Add a version option to the command line so users can discover the version they are running.
mkhssp and mkdssp both generate files with a missing residue when run on pdb/mmCIF files 1yv8 and 1yva. It's the first threonine that is missing. These files lack Thr-1's C-alpha atom.
The help documentation states:
-o [ --output ] arg Output file, use 'stdout' to output to screen
Using stdout as the output file results in a file called stdout and not the output being sent to stdout.
Fix this by removing the help text. mkdssp (and probably also mkhssp) should by default output to stdout. Only when -o is specified should it output to a file.
The following command outputs the correct hssp file, but ends with the text "Not a valid id" and exits with an error code:
$ ./mkhssp -i ~/1crn.fasta -d /data/fasta/sprot.fasta -d /data/fasta/trembl.fasta
...
...
Not a valid id
$ echo $?
1
The equivalent command using the pdb file works fine:
$ ./mkhssp -i ~/1crn.pdb -d /data/fasta/sprot.fasta -d /data/fasta/trembl.fasta
...
...
$ echo $?
0
I've tried fasta files with and without a trailing newline, but both fail.
I suspect the error is caused by a bug in the parser. Fasta files can contain multiple sequences, where the first line of each sequence contains an id. I suspect the parser assumes another entry even when there isn't one.
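For comparison, a small FASTA reader that only creates an entry when it actually sees a header line is sketched below. This is not the parser used by mkhssp, and the names `FastaEntry`, `ParseFasta` and `ParseFastaString` are made up for the example.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct FastaEntry {
  std::string id;
  std::string sequence;
};

// Reads zero or more FASTA entries. A new entry is created only when a
// header line ('>') is seen, so trailing newlines or blank lines at the end
// of the file never produce an extra, empty entry.
inline std::vector<FastaEntry> ParseFasta(std::istream& in) {
  std::vector<FastaEntry> entries;
  std::string line;
  while (std::getline(in, line)) {
    // Tolerate Windows line endings.
    if (!line.empty() && line[line.size() - 1] == '\r')
      line.erase(line.size() - 1);
    if (line.empty())
      continue;  // blank lines never start a new entry
    if (line[0] == '>') {
      // The id is the text after '>' up to the first whitespace.
      const std::string::size_type end = line.find_first_of(" \t", 1);
      FastaEntry entry;
      entry.id = line.substr(1, end == std::string::npos ? end : end - 1);
      entries.push_back(entry);
    } else if (!entries.empty()) {
      // Sequence lines before any header are ignored.
      entries.back().sequence += line;
    }
  }
  return entries;
}

// Convenience wrapper for in-memory text.
inline std::vector<FastaEntry> ParseFastaString(const std::string& text) {
  std::istringstream in(text);
  return ParseFasta(in);
}
```

With this structure, a file with or without a trailing newline yields the same single entry.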
There are not many unit tests for xssp so it's hard to be sure if it's correct and scary to change because a regression may be introduced.
HOPE receives a proxy error but there is no coredump, so the program may not be crashing. In order to find out what is happening, improve the logging so that more statements are logged and more often. It would also be beneficial to use log4cpp.
When running the autoconf built configure script on some machines, one gets the following error:
./configure: line 4290: syntax error near unexpected token `1.54'
./configure: line 4290: `AX_BOOST_BASE(1.54)'
There is code that sets limits when on linux so coredumps are generated. This isn't required. Developers can set limits on the system directly.
In structures.cpp on line 1929, a container is modified inside its own BOOST_FOREACH loop.
This is undefined behaviour because BOOST_FOREACH contains an optimisation where the end iterator is cached (this is called hoisting). Making changes to the container invalidates that end iterator.
For more information see: http://www.boost.org/doc/libs/1_54_0/doc/html/foreach/pitfalls.html
In order to load settings from a configuration file, mkhssp tries to determine the user's home directory. xssp-rest, however, runs the command via subprocess which means no home directory exists, which causes mkhssp to exit.
If the home directory doesn't exist, mkhssp should continue. A debug log message should be given.
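A sketch of the tolerant lookup is shown below. It assumes the home directory is read from an environment variable such as HOME; the helper name is hypothetical, and the debug message goes to std::clog only as a placeholder for whatever logging is eventually used.

```cpp
#include <cstdlib>
#include <iostream>
#include <string>

// Returns the value of the environment variable `name`, or an empty string
// when it is unset or empty. A debug message is written instead of exiting.
inline std::string DirectoryFromEnv(const char* name) {
  const char* value = std::getenv(name);
  if (value == nullptr || *value == '\0') {
    std::clog << "debug: " << name
              << " is not set; skipping user configuration\n";
    return std::string();
  }
  return std::string(value);
}
```

The caller would skip loading the configuration file when `DirectoryFromEnv("HOME")` returns an empty string and carry on with the command-line arguments.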
The compiler flag USE_COMPRESSION should be ON by default. This helps to prevent problems like #38.
mkdssp accepts input from a file. Output can be either a file, or stdout. Add support for taking the input from stdin. This removes the need for scripts to save content to a temporary file before running mkdssp.
mkdssp does not accept gzipped mmCIF files when compiled from the master branch (currently release 2.2.3).
On cmbi4, mkdssp version 2.2.0 does accept gzipped mmCIF files.
- Why the discrepancy?
- How to fix mmCIF acceptance on master branch?
Compilation fails on cmbi4, which is running Ubuntu 12.04 LTS. There are multiple reasons:
std=c++11; however, it does support std=c++0x;
This is related to hope issue https://github.com/cmbi/hope/issues/4.
When hope makes a request to hsspsoap, it reaches the timeout of one hour. It's not clear what is causing it. The possibilities are:
The last one in the list is a suspect due to the following comment in HOPE, in HSSPServiceImpl.java:
HSSPSoap sometimes acts as a tarpit, meaning requests for the WSDL never return. Cope by using a timeout here
cppcheck reports many errors when --enable=all is used. There are likely false positives in there, but they should be checked regardless.
MSurfaceDots is a singleton where the object is created once in the Instance() function.
This is only thread-safe in c++11. The makefile states c++0x, which may or may not be up to the same standard.
See: http://stackoverflow.com/questions/1661529/is-meyers-implementation-of-singleton-pattern-thread-safe
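For reference, the pattern under discussion looks roughly like the sketch below. The class is a hypothetical stand-in and not the real MSurfaceDots; the thread-safety guarantee mentioned in the comment applies only when the code is compiled as standard C++11 or later.

```cpp
// Hypothetical stand-in for a singleton such as MSurfaceDots.
class SurfaceDotsSketch {
 public:
  static SurfaceDotsSketch& Instance() {
    // Since C++11, initialisation of a function-local static is guaranteed
    // to happen exactly once, even when several threads call Instance().
    static SurfaceDotsSketch instance;
    return instance;
  }

  // A singleton must not be copied.
  SurfaceDotsSketch(const SurfaceDotsSketch&) = delete;
  SurfaceDotsSketch& operator=(const SurfaceDotsSketch&) = delete;

  int DotCount() const { return m_dot_count; }

 private:
  SurfaceDotsSketch() : m_dot_count(0) {}
  int m_dot_count;
};
```

Every call to `Instance()` returns a reference to the same object, so any state it holds is shared by all callers.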
The current makefile is hand-written and requires a lot of effort to maintain. GNU autotools takes a lot of this work away. Moreover, the make clean target doesn't remove executables. See the wiwsd project for an example.
When hsspsoap is executed on cmbi23 it exits immediately. This change occurred after updating the machine from Ubuntu 12.04 LTS to Ubuntu 14.04 LTS.
Both mkdssp and mkhssp can take pdb files as input, as well as mmcif files. But for the structure 1B2W for example, the mmcif file produces a different output compared to the pdb file.
The difference is in the chain IDs. PDB produces chains L and H, while mmCIF produces A and B.
Inspection of the mmCIF file at http://rcsb.org/pdb/files/1B2W.cif shows that each chain has two IDs in the file. One column has IDs A and B, while the other column has L and H as in the PDB file. For some reason Maarten chose to take the leftmost column from mmCIF files.
VERBOSE is a global and globals are evil. It is getting in the way of writing tests because it means including another main, which doesn't play nice with Boost.Test.
When the logging is implemented as per #14, VERBOSE logging statements will be normal logging using a level such as trace.
A wrapper is being written in python that will replace hsspsoap. Once the wrapper has been written and is being used by all services instead of hsspsoap, hsspsoap can be removed. This makes a lot of issues in xssp redundant.
When compiling the develop branch from scratch, the following error occurs:
configure: error: cannot find sources (mtrx/matrices.h) in . or ..
hsspconv takes a stockholm file as input, but the help says it takes a PDB file.
This isn't clear at all. Either the documentation should be updated, or the code should determine the file type by checking for the fasta header.
The error message displayed is also silly:
empty protein, or no valid complete residues
I assume this is because by default files are assumed to be in PDB format.
There are a total of 2 tests, which isn't very good. Write more unit tests.
It would be good to do this before #47.
During a call to mkdssp using a pdb_file in xssp-rest, the following error was reported:
DSSP could not be created due to an error: basic_string::substr
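The message suggests a std::out_of_range exception from std::string::substr, which is thrown when the start position lies beyond the end of the string. Whether a short input line is the actual cause here has not been confirmed. A defensive helper for reading fixed-column fields, with a hypothetical name, could look like this:

```cpp
#include <string>

// Returns up to `length` characters of `line` starting at `start`, or an
// empty string when the line is too short. Unlike a bare substr call, this
// never throws std::out_of_range.
inline std::string SafeField(const std::string& line,
                             std::string::size_type start,
                             std::string::size_type length) {
  if (start >= line.size())
    return std::string();
  return line.substr(start, length);
}
```

The caller still has to decide whether an empty field is an error, but it can then report which line and column were involved.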
When running autotools for the first time it stops because INSTALL and COPYING are missing.
When clients make a SOAP request via hsspsoap, the HTTP connection must stay open for a long time to wait for the response. This is not good practice, and also causes problems for the proxy server (see #13).
Assign a job ID to each request and reply immediately with that job ID. Allow clients to query the status of the job by its ID.
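The bookkeeping behind such a scheme can be sketched as below. All names are hypothetical, the registry is single-threaded for brevity (a real service would need locking), and nothing here reflects how hsspsoap is implemented.

```cpp
#include <map>

enum JobStatus { kUnknown, kQueued, kRunning, kFinished, kFailed };

// Tracks the status of submitted jobs by id. Not thread-safe as written.
class JobRegistry {
 public:
  JobRegistry() : m_next_id(1) {}

  // Registers a new job and returns its id immediately.
  unsigned long Submit() {
    const unsigned long id = m_next_id++;
    m_jobs[id] = kQueued;
    return id;
  }

  // Updates a known job; returns false for an unknown id.
  bool SetStatus(unsigned long id, JobStatus status) {
    std::map<unsigned long, JobStatus>::iterator it = m_jobs.find(id);
    if (it == m_jobs.end())
      return false;
    it->second = status;
    return true;
  }

  // Returns kUnknown for ids that were never issued.
  JobStatus Status(unsigned long id) const {
    std::map<unsigned long, JobStatus>::const_iterator it = m_jobs.find(id);
    return it == m_jobs.end() ? kUnknown : it->second;
  }

 private:
  unsigned long m_next_id;
  std::map<unsigned long, JobStatus> m_jobs;
};
```

The request handler would call `Submit()` and return the id at once, while a worker updates the status and clients poll `Status(id)` over short-lived connections.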
Under the sequence profile section, mkhssp places zeros for NDEL and NINS for this PDB input example: 1f2i
Compiling under a different optimization level fixes the problem, but that's not a clean solution of course.
The NDEL and NINS values are determined in the MProfile::Align function, at https://github.com/cmbi/xssp/blob/master/src/hssp-nt.cpp#L733 and https://github.com/cmbi/xssp/blob/master/src/hssp-nt.cpp#L739 . The code however, is not very clear.
Refactor the code so that:
- using namespace std; is removed (see: http://stackoverflow.com/questions/1452721/why-is-using-namespace-std-considered-bad-practice);
- #define is replaced with anonymous namespace static const;
- boost::foreach is replaced with c++11 ranged for loop.
mkdssp returns the following error when called from xssp-rest:
DSSP could not be created due to an error:
invalid formatted floating point number ' 8 '
I will do some more investigation to figure out what input was used.
The wsdl at www.cmbi.ru.nl/hsspsoap/wsdl becomes unavailable because hsspsoap is segfaulting. See the logs below:
messages.1:Aug 6 14:53:21 cmbi23 kernel: [2786239.939964] hsspsoap[20485]: segfault at 6ecbe0 ip 0000000000558f47 sp 00007fff7875b470 error 6 in hsspsoap[400000+395000]
messages.1:Aug 6 14:53:31 cmbi23 kernel: [2786249.446772] hsspsoap[17288]: segfault at 6ecbe0 ip 0000000000558f47 sp 00007fff142724a0 error 6 in hsspsoap[400000+395000]
messages.1:Aug 6 14:56:12 cmbi23 kernel: [2786410.667699] hsspsoap[1821]: segfault at 6ecbe0 ip 0000000000558f47 sp 00007fffe66826a0 error 6 in hsspsoap[400000+395000]
The hsspsoap service has to be restarted every time this happens.
Suppose we have the alignment
0 1
1 23456789012
R.SDALTRHFRTE Query
R.SDALSRHFRTE
PDPSSLARHRHVH
NASDRAKHQNRTH
R.SDALSRHLRTE
R.PDNLQRHVRVH
R.LENLKTHLRSH
R.SDALSRHFRTE
R.SDALSRHFRTE
N....LKQHVLRH
E....LRKHLRVP
The 2 insertions will be assigned to position 1, but the 2 deletions will also be assigned to position 1.
Does it make more sense to assign the deletions to position 2 in the query sequence?
When large jobs are submitted to hsspsoap, the entire application becomes unresponsive to other requests. The web service wsdl takes a long time to load, which causes a timeout in the browser/in HOPE.
The -a option is available for both --address and --threads, which when used, results in the following error:
jon@cmbipc85:~/Projects/xssp$ ./hsspsoap -a 127.0.0.1 -p 6789
terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::program_options::ambiguous_option> >'
what(): option '-a' is ambiguous and matches '--address', and '--threads'
Aborted
A workaround is to use the long name variant.
mkhssp allows command params to be read from a config file. This is not necessary as the command line arguments are small, and it causes problems on virtual machines when HOME isn't defined. Considering it can always be run using explicit arguments (and is as far as CMBI is concerned), remove this feature.
When cmbi23 is restarted, hsspsoap is not started automatically. Moreover, running sudo service hsspsoap start and /etc/init.d/hsspsoap start does nothing. Running the command from the init script manually works, so I suspect the init script is broken.
When the command mkhssp -i /tmp/1crn.pdb -d /data/fasta/sprot.fasta -d /data/fasta/trembl.fasta is run manually, you get the following message, which is correct:
Databank /data/fasta/sprot.fasta does not exist
However, when the celery task for xssp-rest runs, the captured output is empty:
[2014-08-07 02:47:51,447: ERROR/MainProcess] Task xssp_rest.tasks.mkhssp_from_pdb[b5b3a5f4-dbd1-49eb-95ac-9ddf4862ce61] raised unexpected: RuntimeError('',)
The error is caused by an exception, which is never handled. Perhaps this should just be a message written to std::cerr?
The following files are not used during compilation. Are they needed? If so, what for?
The help text for mkhssp is not complete. For example, -i can be a fasta file, but the help text says 'Input PDB file (or PDB ID)'.
mkhssp crashes with a Segmentation fault when run with the pdb file /data/xssp_error_pdbs/989985db-0d97-4db3-bf92-d2466eefbae5.pdb located on cmbi23:
jon@cmbi23:/var/log/supervisor$ mkhssp -i /tmp/989985db-0d97-4db3-bf92-d2466eefbae5.pdb -d /data/fasta/sprot.fasta -d /data/fasta/trembl.fasta
blast done in 646s cpu / 43s wall
distance done in 94s cpu / 4s wall
aligning done in 93s cpu / 93s wall
Segmentation fault (core dumped)
The core dump shows that the problem is in hssp-nt.cpp on line 1276:
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x000000000047e620 in HSSP::MProfile::CalculateConservation (
this=this@entry=0x7ffff13fc7b0, inThreads=inThreads@entry=32) at src/hssp-nt.cpp:1276
1276 ri.m_dist[j] = m_seq[i] == j ? 1 : 0;
(gdb) bt
#0 0x000000000047e620 in HSSP::MProfile::CalculateConservation (
this=this@entry=0x7ffff13fc7b0, inThreads=inThreads@entry=32) at src/hssp-nt.cpp:1276
#1 0x000000000047f3c7 in HSSP::MProfile::Process (this=this@entry=0x7ffff13fc7b0,
inHits=..., inGapOpen=inGapOpen@entry=30, inGapExtend=inGapExtend@entry=2,
inMaxHits=inMaxHits@entry=5000, inThreads=inThreads@entry=32) at src/hssp-nt.cpp:1112
#2 0x0000000000480359 in HSSP::CreateHSSP (inProtein=..., inDatabanks=...,
inMaxHits=inMaxHits@entry=5000, inMinSeqLength=inMinSeqLength@entry=25,
inGapOpen=inGapOpen@entry=30, inGapExtend=inGapExtend@entry=2,
inThreshold=inThreshold@entry=0.0500000007,
inFragmentCutOff=inFragmentCutOff@entry=0.75, inThreads=inThreads@entry=32,
inFetchDBRefs=inFetchDBRefs@entry=false, inOs=...) at src/hssp-nt.cpp:1406
#3 0x00000000004307a9 in main (argc=<optimized out>, argv=0x7ffff13fdaf8)
at src/mkhssp.cpp:221
The coredump is on cmbi23 in /data/coredumps/core.mkhssp.9960.
Observed for pdb entries 1AL2, 1AR6, 1AR8, 1AR9 and 1AR7. The resulting hssp files do not have chain 0, while the dssp files do.