mrs's People

Contributors

cbaakman, jonblack

mrs's Issues

Entry 5kxi does not exist in databank pdbfinder2

This error occurs because HOPE users are creating reports for relatively new PDB accession codes and pdbfinder2 hasn't been updated since the PDB accession code was added.

mrs cannot do much about this given the current state of the databanks (application). Ideally when mrs requests an entry, if it doesn't exist, the databanks should generate it on-the-fly. This would require big changes in the databanks application to act as a web service rather than a background script that generates files on a schedule.

This issue is here for future reference and will also be referenced in the databanks repository.

high memory usage

In rare cases MRS databank updates can cause the server to become unresponsive due to high memory usage. MRS updates run as separate processes in the process list. Inspection with top shows that virtual memory usage can exceed 300 GB.

Perhaps there's a memory leak.

Limit CPU use

MRS can take over a machine because there's no limit on memory or CPU use. This is particularly bad for memory use because it ends up using the swap drive, even on chelonium which has 128GB of memory.

Limit the number of threads to 32. This is a command-line option to mrs.
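A minimal sketch of how such a cap could also be enforced inside the program, so that a missing command-line option does not leave the thread count unbounded. The helper name `clampThreads` is hypothetical and not taken from the MRS sources; only the limit of 32 comes from the text above.

```cpp
#include <algorithm>

// Hypothetical helper, not part of MRS.
// `detected` is the number of hardware threads reported by the system
// (0 means the value could not be determined); `cap` is the upper limit.
// The result is always at least 1 and never larger than `cap` (for cap >= 1).
unsigned int clampThreads(unsigned int detected, unsigned int cap = 32)
{
    return std::max(1u, std::min(detected, cap));
}
```

It could be called as `clampThreads(std::thread::hardware_concurrency())`. Note that this only bounds CPU use; memory use would still need a separate limit.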

kicking mrs daemon in docker

Whenever a databank is updated outside MRS (for example from cron), the daemon needs to receive a SIGHUP to make it reload the databanks. The problem is that the daemon now runs in a docker container, so its pid file is inaccessible to cron.

What might work is something like:

/usr/bin/docker restart `/usr/bin/docker ps | grep 'mrs server start' | /usr/bin/cut -d" " -f1`

improve logging

MRS occasionally exits with a segfault, without providing further details. To find out what's going on, we need to improve MRS' logging in some way.

Blasting doesn't support non-standard amino acids

MRS returns the following error to HOPE when blasting swissprot with the sequence shown below:

Query contains invalid characters

The web interface provides more information when you try to blast the same sequence:

Query contains invalid characters: 'O'

MEFVALGGPDAGSPTPFPDEAGAFLGLGGGPRTEAGGLLASYPPSGRVSLVPWADTOTLGTPQWVPPATQMEPPHYLELLQPPRGSPPHPSSGPLLPLSSGPPPCEARECVNCGATATPLWRRDGTGHYLCNACGLYHRLNGQNRPLIRPKKRLLVSKRAGTVCSNCQTSTTTLWRRSPMGDPVCNACGLYYKLHQVNRPLTMRKDGIQTRNRKVSSKGKKRRPPGGONPSATAGGGAPMGGGGDPSMPPPPPPPAAAPPQSDALYALGPVVLSGHFLPFGNSGGFFGGGAGGYTAPPGLSPQI

O is the letter for Pyrrolysine, a non-standard amino acid. Should MRS support non-standard amino acids or is it correct to reject the query? Perhaps it should be an option? If not, the API error message should match the error message shown in the web interface.
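A sketch of how the validation could be made optional while producing one message for both the API and the web interface. All names here are hypothetical, and the accepted alphabets are assumptions; the actual set of letters MRS accepts is not stated in this issue.

```cpp
#include <cstddef>
#include <string>

// Assumed alphabets (not taken from MRS): the 20 standard amino acids plus
// the ambiguity codes B, Z and X, and an extended set that also accepts
// U (selenocysteine) and O (pyrrolysine).
const std::string kStandardResidues = "ACDEFGHIKLMNPQRSTVWYBZX";
const std::string kExtendedResidues = kStandardResidues + "UO";

// Hypothetical helper: returns an empty string when the query is valid,
// otherwise an error message naming the first offending character.
std::string validateQuery(const std::string& query, bool allowNonStandard)
{
    const std::string& alphabet =
        allowNonStandard ? kExtendedResidues : kStandardResidues;

    // Position of the first character that is not in the alphabet.
    std::size_t pos = query.find_first_not_of(alphabet);
    if (pos == std::string::npos)
        return std::string();

    return std::string("Query contains invalid characters: '") + query[pos] + "'";
}
```

Routing both the SOAP/REST path and the web form through one function like this would keep the two error messages identical.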

Segmentation fault

The following error appears in the logs:

run.sh: line 2:     8 Segmentation fault      (core dumped) mrs server start -p /var/run/mrs.pid --no-daemon

There is no additional information, so it's very difficult to figure out the cause. See how far we can get; if it proves difficult, we can open an issue to improve the logging.

requests getting lost in MRS

During a course, MRS appeared to ignore certain blast submission requests. I've seen the same happening while sending repeated blast job status requests to the web service via suds. tshark shows that the missing requests get through to the machine. However, they don't arrive at the "M6Server::handle_request" function.

It's only observed on cmbi12, not on cmbi23.

I'm suspecting the underlying libraries: libzeep, boost::asio

setting rsync port number

The world wide pdb allows us to download pdb files over rsync. So MRS could fetch the files over rsync. However, the rsync server doesn't use the default port for sharing the files. The port number 33444 must be explicitly specified when downloading with rsync.

So MRS needs the option to specify a port number for rsync fetches.
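A minimal sketch of how a configurable port could be passed through to rsync. The function name and the idea of building an argument list are assumptions, not the way MRS currently invokes rsync; `--port=PORT` is the standard rsync option for a non-default daemon port.

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not part of MRS: builds the argument list for an
// rsync fetch. A port of 0 means "use rsync's default port".
std::vector<std::string> buildRsyncArgs(const std::string& source,
                                        const std::string& destination,
                                        unsigned int port)
{
    std::vector<std::string> args;
    args.push_back("rsync");
    args.push_back("-a");
    if (port != 0)
        args.push_back("--port=" + std::to_string(port));
    args.push_back(source);
    args.push_back(destination);
    return args;
}
```

For the case described above, the port argument would be 33444; the source and destination would come from the databank configuration.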

Show synopsis in command line usage help

The command line help doesn't show the synopsis so you need to guess which parameters are required and which are optional. An example is the mrs blast command.

ask user permission on cookies

MRS appears to use cookies to store blast job ids. I believe the law says that it should ask the user's permission for that. However, it doesn't. I think we need to add a confirmation message box here.

trembl jobs cause delay in blast queue

Occasionally, people submit trembl jobs in mrs blast with query lengths of 1000 and more. Such large searches have been shown to cause a delay in the queue, since only one job can run at a time.

Update code to use c++11 features instead of boost and tr1

Some of the features being used in mrs from boost and tr1 are now in c++11. Use these instead. Below is a non-exhaustive list:

  • range-based for loop instead of boost::foreach
  • std::tie instead of tr1::tie
  • std::tuple instead of tr1::tuple
  • std::thread instead of boost::thread (be careful!)
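As an illustration of the first three items, here is a small self-contained sketch. The functions and the hit data are hypothetical and do not come from the MRS sources. The `std::thread` item is left out on purpose: unlike `boost::thread`, `std::thread` has no interruption mechanism and its destructor calls `std::terminate` if the thread is still joinable, so that replacement is not mechanical.

```cpp
#include <string>
#include <tuple>
#include <vector>

// Hypothetical hit type: (identifier, score).
typedef std::tuple<std::string, int> Hit;

// Returns the hit with the highest score, or ("", -1) for an empty list.
Hit bestHit(const std::vector<Hit>& hits)
{
    Hit best("", -1);
    // Range-based for loop, replacing BOOST_FOREACH.
    for (const auto& hit : hits)
    {
        if (std::get<1>(hit) > std::get<1>(best))
            best = hit;
    }
    return best;
}

// Returns only the identifier of the best hit.
std::string bestHitId(const std::vector<Hit>& hits)
{
    std::string id;
    int score;
    // std::tie unpacks a std::tuple, replacing tr1::tie and tr1::tuple.
    std::tie(id, score) = bestHit(hits);
    return id;
}
```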

Set BASE_URL of container host

The base URL used by the soap service in MRS is set during compilation using the makefile variable MRS_BASE_URL; its value is copied into the generated configuration file.

The default was to set it to the hostname via $(shell hostname) but since MRS now runs in a container, this evaluates to the container's hostname, not that of the host running the container.

For now it's hardcoded. It can also be changed in the admin UI. It would be nice to have this generated correctly.

It's also hardcoded in the configure script, which was also using the same method to determine the hostname.

Moreover, the URL that is given in the Makefile is used for the mrs server address and public SOAP endpoint address, and these may not always be the same (in fact with docker, they never will be). For now this is worked around by editing the config file after compilation to change just the SOAP endpoint address.

mrs crashing after blast job removal

Today MRS crashed after several blast jobs had been removed from the queue on the admin page. The first job was successfully removed, but when the second cross was clicked, the job didn't go away. After several re-clicks, MRS crashed.

M6IntersectionIterator::GetCount returns the wrong quantity.

M6IntersectionIterator's 'mCount' is set in the M6IntersectionIterator::AddIterator function. However, the actual value can not be determined at that time.

To know the correct value, one should loop through all documents in all child iterators. Only documents present in all children should be counted.

One of the symptoms is that the server displays the wrong number of hits when the query 'ft:carbohyd AND de:alzheimer' is entered.
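The counting rule described above can be sketched as follows. This is not the M6IntersectionIterator implementation: the children are modelled as plain vectors of document numbers, assumed to be sorted in ascending order without duplicates, and the function name is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Counts the document numbers that occur in every child list.
// Assumption: each child is sorted ascending and contains no duplicates.
std::size_t countIntersection(const std::vector<std::vector<std::uint32_t>>& children)
{
    if (children.empty())
        return 0;

    std::vector<std::size_t> pos(children.size(), 0);
    std::size_t count = 0;

    for (;;)
    {
        // The candidate is the largest document number at the current heads.
        std::uint32_t candidate = 0;
        for (std::size_t i = 0; i < children.size(); ++i)
        {
            if (pos[i] == children[i].size())
                return count;           // one child is exhausted: done
            if (children[i][pos[i]] > candidate)
                candidate = children[i][pos[i]];
        }

        // Advance every child to its first document >= candidate.
        bool allMatch = true;
        for (std::size_t i = 0; i < children.size(); ++i)
        {
            while (pos[i] < children[i].size() && children[i][pos[i]] < candidate)
                ++pos[i];
            if (pos[i] == children[i].size())
                return count;
            if (children[i][pos[i]] != candidate)
                allMatch = false;
        }

        // Only documents present in all children are counted.
        if (allMatch)
        {
            ++count;
            for (std::size_t i = 0; i < children.size(); ++i)
                ++pos[i];
        }
    }
}
```

The point of the sketch is that the count only becomes known after walking the children, which is why a value fixed at AddIterator time cannot be exact.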

Error loading databank

The logs often contain messages about being unable to load databanks:

[09/Nov/2016:19:15:57 UTC] Restarting services...Error loading databank embl
 >> Invalid page number
Error loading databank oxford
 >> Invalid page number
Error loading databank pfamb
 >> Invalid page number
Error loading databank pmc
 >> Invalid page number

errors in tar parser

While indexing the pubmed central tar archives (pmc databank in mrs), checksum errors are frequently thrown from M6DataSource.cpp, even though the archives themselves appear to be readable by the tar archiver.

Another strange thing is the segfaults that occur in the boost decompressor classes used. This might have a relation to the checksum problems.

The boost version that mrs uses is old (1.48). However, mrs won't compile with boost version 1.5.

Make argument parser more robust

The error displayed from the following invocation isn't correct. The mistake is that the option -c is part of the username.

./mrs server -u jon-c config/mrs-config.xml --command start

mrs exited with an exception:
option '--command' cannot be specified more than once

Improve docker use

Update the dockerfiles and docker-compose files to match hope, also utilising the scripts for running a development and test environment. This means:

  • Use depends_on instead of links in docker-compose file
  • Add a dev docker-compose file and run_dev.sh script
  • Add a run_tests.sh script (ensure the correct container and packages are used)
  • Put the command in the docker-compose file

This should mimic how docker is used in hope/hommod/etc. See those projects for more information.

Seg fault processing pmc

root@41cb7db21f91:/app# mrs update pmc
update pmc
listing files done in 0s cpu / 30s wall
Fetching done in 103s cpu / 1453s wall

Error processing "/srv/mrs-data/raw/pmc/articles.O-Z.tar.gz"
Invalid checksum

Error processing "/srv/mrs-data/raw/pmc/articles.txt.0-9A-B.tar.gz"
Invalid checksum

Error processing "/srv/mrs-data/raw/pmc/articles.txt.C-H.tar.gz"
Invalid checksum

Error processing "/srv/mrs-data/raw/pmc/articles.txt.I-N.tar.gz"
Invalid checksum

Error processing "/srv/mrs-data/raw/pmc/articles.txt.O-Z.tar.gz"
Invalid checksum
ACS_Macro_Lett_20... [                                                    ]   0%
Error processing "/srv/mrs-data/raw/pmc/articles.C-H.tar.gz"
Invalid checksum
Segmentation fault (core dumped)

xml parser creates empty entries

For the pubmed central databank in mrs (pmc) the created entries are empty. pmc has no parser, thus the libzeep xml parser is used. However, the resulting entries created contain no data in mrs. The raw xml text is available though.

updates not truly disabled

Due to low disk space, all MRS updates are temporarily disabled. However, inspection of the disk and process list shows that MRS is still executing updates.

This has been shown to happen when the 'enabled' checkbox on the scheduler page is unchecked and the individual databanks are still enabled and updates for them are not set to 'never'.
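The behaviour one would expect can be written down as a single predicate. The sketch below uses hypothetical names and is not the MRS scheduler code; the "never" value mirrors the setting mentioned above, and the other schedule values are assumptions.

```cpp
#include <string>

// Hypothetical per-databank settings, not the MRS configuration types.
struct DatabankSchedule
{
    bool enabled;        // the databank's own 'enabled' flag
    std::string update;  // e.g. "never", "daily", "weekly" (assumed values)
};

// An update should run only if the scheduler as a whole is enabled,
// the databank is enabled, and its schedule is not "never".
bool shouldRunUpdate(bool schedulerEnabled, const DatabankSchedule& db)
{
    return schedulerEnabled && db.enabled && db.update != "never";
}
```

The reported symptom corresponds to the first condition being skipped: with the scheduler checkbox unchecked, an enabled databank with a schedule other than "never" is still updated.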

Improve logging

Logging is currently implemented using streams. It would be better to use the log4cpp library which gives more control over logging and is more efficient.

More logging statements should be added to aid in debugging.

rsync options

The way it works now, MRS only assumes a mirror to be rsync when the url starts with rsync. However, the rsync mirror from ncbi starts with ftp:

rsync -av --include="/" --include=".ptt" --exclude="*" ftp.ncbi.nih.gov::genomes/Bacteria/ /data/raw/ptt/Bacteria/

We would like to add the option to force MRS to use rsync on certain urls.
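A sketch of what such an override could look like. The enum, the function and the `forceRsync` flag are hypothetical, and the prefix checks are an assumption about how the scheme is detected; this is not the existing MRS fetch code.

```cpp
#include <string>

enum class FetchMethod { Rsync, Ftp, Other };

// Hypothetical helper: picks the fetch method for a mirror URL.
// `forceRsync` would come from a new per-databank configuration option.
FetchMethod chooseFetchMethod(const std::string& url, bool forceRsync)
{
    // The override wins, regardless of how the URL starts.
    if (forceRsync || url.compare(0, 8, "rsync://") == 0)
        return FetchMethod::Rsync;
    if (url.compare(0, 6, "ftp://") == 0)
        return FetchMethod::Ftp;
    return FetchMethod::Other;
}
```

With the flag set, a mirror whose address starts with ftp would still be fetched with rsync.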

embl hyperlinks

Since embl is no longer a database in mrs, the links to it in other mrs databases don't work anymore. We need to make mrs generate hyperlinks to the embl server instead. This might be possible by modifying the javascripts.

corrupted blast job files

Lately, MRS has been creating corrupted blast job files. Some files are empty, some files are incomplete or throw a bzip2 error while decompressing and for some files, the checkcache application reports the following exception: "putback buffer full".

Use autotools

Autotools is a standard method for building software and allows us to remove the custom scripts created by Maarten.

mrs doesn't respond to proxy-forwarded requests anymore

The latest version of MRS (6.1.0) doesn't work when behind an apache2 proxy server. The web requests simply keep hanging forever, as if they are being redirected infinitely.

An important difference between this MRS and the previous one is that it uses libzeep.3.0.3 instead of 3.0.2

Indexing EMBL takes up too many resources

When the EMBL indexing runs, rsyncing databanks from the server fails. We need to control how many resources MRS takes when performing the indexing. What options do we have?

MRS crash during search

MRS crashed with:

warning: Can't read pathname for load map: Invoer-/uitvoerfout.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/local/bin/m6 server start --pidfile=/var/run/m6.pid'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f265efccf2b in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&) ()
   from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
(gdb) bt
#0  0x00007f265efccf2b in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&) ()
   from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#1  0x0000000000542aaa in as_literal<std::basic_string<char> > (r=...) at /usr/include/boost/range/as_literal.hpp:109
#2  equals<std::basic_string<char>, std::basic_string<char>, boost::algorithm::is_iequal> (Comp=..., Test=..., Input=...)
    at /usr/include/boost/algorithm/string/predicate.hpp:290
#3  boost::algorithm::iequals<std::basic_string<char>, std::basic_string<char> > (Input=..., Test=..., Loc=...) at /usr/include/boost/algorithm/string/predicate.hpp:346
#4  0x0000000000543aa6 in M6DatabankImpl::FindString (this=<optimized out>, inIndex=..., inString=...) at src/M6Databank.cpp:1940
#5  0x000000000054428e in M6Databank::DocNrForID (this=0x23aa730, inID=...) at src/M6Databank.cpp:2323
#6  0x0000000000550632 in Fetch (inDocID=..., this=0x23aa730) at src/M6Databank.cpp:2311
#7  M6DatabankImpl::GetLinkedDocuments (this=0x244ce30, inDB=..., inID=...) at src/M6Databank.cpp:2044
#8  0x000000000064a96a in M6WSSearch::GetLinkedEx (this=0x242a260, db=..., linkedDb=..., ids=..., response=...) at src/M6WSSearch.cpp:586
#9  0x000000000065bb53 in invoke (response=..., arguments=<error reading variable: access outside bounds of object referenced via synthetic pointer>, method=<optimized out>, 
    object=<optimized out>) at /usr/local/include/zeep/dispatcher.hpp:462
#10 zeep::detail::handler<M6WSSearch, void (M6WSSearch::*)(std::string const&, std::string const&, std::vector<std::string, std::allocator<std::string> > const&, std::vector<WSSearchNS::GetLinkedExResult, std::allocator<WSSearchNS::GetLinkedExResult> >&)>::call (this=0x2441f70, in=0x7f25a53bff30) at /usr/local/include/zeep/dispatcher.hpp:191
#11 0x000000000061e3e6 in zeep::dispatcher::dispatch (this=0x242a260, action=..., in=0x7f25a53bff30) at /usr/local/include/zeep/dispatcher.hpp:343
#12 0x00000000005e878e in dispatch (this=0x242a260, in=0x7f25a53bff30) at /usr/local/include/zeep/dispatcher.hpp:327
#13 operator() (reply=..., request=..., this=0x242a760, scope=...) at src/M6Server.cpp:228
#14 boost::detail::function::void_function_obj_invoker3<M6Server::M6Server(const zeep::xml::element*)::<lambda(const zeep::http::request&, const zeep::http::el::scope&, zeep::http::reply&)>, void, const zeep::http::request&, const zeep::http::el::scope&, zeep::http::reply&>::invoke(boost::detail::function::function_buffer &, const zeep::http::request &, const zeep::http::el::scope &, zeep::http::reply &) (function_obj_ptr=..., a0=..., a1=..., a2=...) at /usr/include/boost/function/function_template.hpp:153
#15 0x00007f266140e7f1 in zeep::http::webapp::handle_request(zeep::http::request const&, zeep::http::reply&) () from /usr/local/lib/libzeep.so.3.0
#16 0x00000000005e8539 in M6Server::handle_request (this=0x7fff346efb00, req=..., rep=...) at src/M6Server.cpp:821
#17 0x00007f26613fbc82 in zeep::http::server::handle_request(boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::stream_socket_service<boost::asio::ip::tcp> >&, zeep::http::request const&, zeep::http::reply&) () from /usr/local/lib/libzeep.so.3.0
#18 0x00007f26613f4027 in zeep::http::connection::handle_read(boost::system::error_code const&, unsigned long) () from /usr/local/lib/libzeep.so.3.0
#19 0x00007f26613f8100 in boost::asio::detail::reactive_socket_recv_op<boost::asio::mutable_buffers_1, boost::_bi::bind_t<void, boost::_mfi::mf2<void, zeep::http::connection, boost::system::error_code const&, unsigned long>, boost::_bi::list3<boost::_bi::value<boost::shared_ptr<zeep::http::connection> >, boost::arg<1> (*)(), boost::arg<2> (*)()> > >::do_complete(boost::asio::detail::task_io_service*, boost::asio::detail::task_io_service_operation*, boost::system::error_code const&, unsigned long) ()
   from /usr/local/lib/libzeep.so.3.0
#20 0x000000000061c642 in complete (owner=..., this=0x7f25a58ab1d0, bytes_transferred=0, ec=...) at /usr/include/boost/asio/detail/task_io_service_operation.hpp:37
#21 do_run_one (ec=..., private_op_queue=..., this_thread=..., lock=..., this=<optimized out>) at /usr/include/boost/asio/detail/impl/task_io_service.ipp:366
#22 boost::asio::detail::task_io_service::run (this=0x67002f0, ec=...) at /usr/include/boost/asio/detail/impl/task_io_service.ipp:146
#23 0x00007f2661400985 in boost::asio::io_service::run() () from /usr/local/lib/libzeep.so.3.0
#24 0x00007f2660ca7da9 in ?? () from /usr/lib/libboost_thread.so.1.48.0
#25 0x00007f26618bfe9a in start_thread (arg=0x7f25cf31c700) at pthread_create.c:308
#26 0x00007f265ec6231d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#27 0x0000000000000000 in ?? ()

It refers to: https://github.com/cmbi/mrs/blob/develop/src/M6Databank.cpp#L1940

It occurred during a search action. However, the input is unknown, making it hard to reproduce the error.

MRS update sizes

Whenever MRS updates, it makes a copy of the index files on disk. On success, the copy will replace the original.
On failure however, the copy will remain and occupy almost the same amount of disk space as the original databank. On multiple failures, this can cause full disks.
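One possible way to avoid leftover copies is a scope guard that deletes the temporary copy unless the update was committed. The sketch below is an assumption-laden illustration: the class name is hypothetical, and it handles a single file, whereas the real index copy may consist of several files.

```cpp
#include <cstdio>
#include <string>

// Hypothetical guard, not part of MRS: removes the temporary file in its
// destructor unless commit() was called after a successful update.
class TempCopyGuard
{
public:
    explicit TempCopyGuard(const std::string& path)
        : mPath(path), mCommitted(false) {}

    ~TempCopyGuard()
    {
        if (!mCommitted)
            std::remove(mPath.c_str());  // failure path: drop the copy
    }

    // Call once the copy has replaced the original.
    void commit() { mCommitted = true; }

    TempCopyGuard(const TempCopyGuard&) = delete;
    TempCopyGuard& operator=(const TempCopyGuard&) = delete;

private:
    std::string mPath;
    bool mCommitted;
};
```

A guard like this covers failures reported through exceptions or early returns. It does not run when the process is killed or segfaults, so a cleanup of stale copies at the start of the next update would still be needed for those cases.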

mrs crashes on certain urls

When I called the url: "[base url]/images", the browser first displayed an error page:

Unfortunately, an error occurred. Maybe the following message may reveal what happened:

basic_filebuf::underflow error reading the file

Shortly after this, the daemon crashed and MRS became unavailable. The crash doesn't happen every time, however, so the error isn't reliably reproducible.

pmc build & memory corruption

Segmentation faults frequently occur while mrs is indexing the pubmed databank, using the raw xml files from ftp://ftp.ncbi.nih.gov/pub/pmc/articles*.tar.gz as input. The location of the segfaults varies, but they always co-occur with an allocation or deallocation. Inspection of some of the deallocation segfaults at https://github.com/cmbi/mrs/blob/master/src/M6Lexicon.cpp#L213 shows that the involved pointers are valid.

Facts about the update process:

  • MRS always reports invalid checksums for the pubmed tars, though they are readable by commandline tar.
  • pubmed updating occurs in multiple threads. One thread per archive.
  • MRS uses a custom XML parser for pubmed
  • The pubmed xml files are much larger than the average pdb file.

Uniprot BLAST result contains entry name as id in REST API

Uniprot BLAST result contains the entry name in the id field instead of the Uniprot id. This is not guaranteed to be unique:

Please note that it is highly recommended to use the accession number over the entry name, as the latter cannot be guaranteed to be stable:
http://www.uniprot.org/help/difference_accession_entryname
http://www.uniprot.org/help/entry_name :
The entry name is a useful mnemonic means of identifying a sequence, but, unlike the accession number, it is not a stable identifier. It is sometimes necessary, for reasons of consistency, to change the entry name (for instance to ensure that related entries have similar names or when a UniProtKB/TrEMBL entry is integrated into UniProtKB/Swiss-Prot). We remind users that they should always use the primary accession number of an entry in any citation and link since it is the only unique stable identifier for an entry.

hope has been translating this entry name into a uniprot id by using the uniprot REST API. Specifically, it relied on the API redirecting entry names to the URL with the id. This is no longer the case, and although it may be a bug in the uniprot service, it has highlighted an issue with MRS. See issue https://github.com/cmbi/hope-flask/issues/65.

It would be useful to at least add the uniprot id to the result so it can be used instead. So keep the id field and add another field called accession_code or something similar. This ensures that nothing will break if id is being used.

Test failures in test suite "BlastTest"

When I try to build the docker image it fails because the tests don't pass:

Running 1 test case...
blast done in 0s cpu / 0s wall                                                  
unit-tests/M6TestBlast.cpp(24): error in "TestBlast1": check r->mHits.size() > 0 failed
unknown location(0): fatal error in "TestBlast1": memory access violation at address: 0x00000071: no mapping at fault address
unit-tests/M6TestBlast.cpp(30): last checkpoint

*** 2 failures detected in test suite "BlastTest"
make: *** [install] Error 201
GNUmakefile:157: recipe for target 'install' failed
Removing intermediate container b198563148db
ERROR: Service 'mrs' failed to build: The command '/bin/sh -c ./configure && make -j && make install' returned a non-zero code: 2

SIGHUP being received regularly causing service restart

I noticed the following whilst looking through the mrs error logs:

[16/Jun/2014:21:05:03 UTC] listening at 0.0.0.0:18090
[17/Jun/2014:06:51:01 UTC] RunMainLoop recieved signal: SIGHUP
[17/Jun/2014:06:51:01 UTC] Restarting services... done
[17/Jun/2014:06:51:03 UTC] listening at 0.0.0.0:18090
[17/Jun/2014:21:00:39 UTC] RunMainLoop recieved signal: SIGHUP
[17/Jun/2014:21:00:40 UTC] Restarting services... done
[17/Jun/2014:21:00:51 UTC] listening at 0.0.0.0:18090
[18/Jun/2014:21:59:22 UTC] RunMainLoop recieved signal: SIGHUP
[18/Jun/2014:21:59:23 UTC] Restarting services... done
[18/Jun/2014:21:59:32 UTC] listening at 0.0.0.0:18090
[19/Jun/2014:21:16:15 UTC] RunMainLoop recieved signal: SIGHUP
[19/Jun/2014:21:16:16 UTC] Restarting services... done
[19/Jun/2014:21:16:24 UTC] listening at 0.0.0.0:18090

This doesn't seem like normal behaviour to me. The service should continually run. We should investigate why this is happening.

website doesn't update after commandline update

Whenever an update is executed manually from the commandline, the changes usually don't become visible on the website until it's restarted by admin.

So for example if the command:

mrs update pdb

increases the number of entries on disk, then the website won't show these new entries until it's restarted.

Error when downloading blast results

When you search against Uniprot and try to download all results, it crashes and gives an error message:

query: crambin (P01542)

Entry q43227_tulge does not exist in databank sprot

query: human sialoprotein (P21825)

Entry sial_human does not exist in databank trembl

So it seems like it's just trying to get the sequences from the wrong databases (q43227_tulge is an id from Trembl and sial_human is an id from SwissProt)

Job status request returns UNKNOWN

This is related to this issue in hope: https://github.com/cmbi/hope/issues/17.

A cache used to keep the latest jobs has a limit of 100. Once the job falls out of the cache, UNKNOWN is returned. In this case, it means that the job has to be resent. It's possible to increase the queue in MRS, but that's not a solution.
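The effect can be illustrated with a small bounded cache. This is a sketch, not the MRS implementation: the class name is hypothetical, and first-in-first-out eviction is an assumption, since the issue only states that the cache holds 100 jobs.

```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <unordered_map>

// Hypothetical bounded job-status cache with first-in-first-out eviction.
class JobStatusCache
{
public:
    explicit JobStatusCache(std::size_t capacity) : mCapacity(capacity) {}

    void put(const std::string& id, const std::string& status)
    {
        auto it = mStatus.find(id);
        if (it != mStatus.end())
        {
            it->second = status;        // known job: just update the status
            return;
        }
        if (mCapacity == 0)
            return;
        if (mOrder.size() == mCapacity)
        {
            mStatus.erase(mOrder.front());  // evict the oldest job
            mOrder.pop_front();
        }
        mOrder.push_back(id);
        mStatus[id] = status;
    }

    // A job that has been evicted is indistinguishable from one that
    // never existed, so both report UNKNOWN.
    std::string get(const std::string& id) const
    {
        auto it = mStatus.find(id);
        if (it == mStatus.end())
            return "UNKNOWN";
        return it->second;
    }

private:
    std::size_t mCapacity;
    std::deque<std::string> mOrder;
    std::unordered_map<std::string, std::string> mStatus;
};
```

Raising the capacity only delays the point at which a job falls out, which matches the remark above that increasing the queue is not a real solution.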

I'm able to reproduce this problem using accession code 1ltw and mutation P89G (sometimes).

Improve code comments

There are few comments in the code, making it more work to understand what is happening. Add comment headers to classes and functions.
