wells-wood-research / de-stress

DE-STRESS is a model evaluation pipeline that aims to make protein design more reliable and accessible.

Home Page: http://destressprotein.design

HTML 0.07% Elm 80.82% Dockerfile 0.13% Python 17.84% JavaScript 1.04% Shell 0.10%
protein protein-structure protein-design structural-biology bioinformatics pipeline webapp

de-stress's Introduction

de-stress

DEsigned STRucture Evaluation ServiceS

DE-STRESS is a web application that provides a suite of tools for evaluating protein designs. Our aim is to help make protein design more reliable, by providing tools to help you select the most promising designs to take into the lab.

The application is available for non-commercial use through the following URL:

https://pragmaticproteindesign.bio.ed.ac.uk/de-stress/

Citing DE-STRESS

If you use DE-STRESS, please cite the following article:

Stam MJ and Wood CW (2021) DE-STRESS: A user-friendly web application for the evaluation of protein designs, Protein Engineering, Design and Selection, 34, gzab029.

Contacting Us

If you find a bug or would like to request a feature, we'd really appreciate it if you report it as an issue. If you're stuck and need help or have any general feedback, please create a post on the discussion page.

For more information about our research group, check out our group website.

Local Deployment

Make sure you have all the relevant dependencies in de-stress/dependencies_for_de-stress/. Currently, these are:

  • Aggrescan3D
  • DFIRE 2 pair
  • DSSP
  • EvoEF2 (source)
  • Rosetta (source)

Create a .env file in the top-level de-stress folder. You can copy de-stress/.env-testing and update it.

Download big_structure.dump and place it in de-stress/database.

Next, from within de-stress/, build all the containers:

# use production-compose.yml if you're deploying in a production environment
docker-compose -f development-compose.yml build

Compile the dependencies in the container:

docker run \
    -it \
    --rm \
    -v /absolute/path/to/de-stress/dependencies_for_de-stress/:/dependencies_for_de-stress \
    de-stress_big-structure:latest \
    sh build_dependencies.sh

This compiles the software inside the container, but because a volume is used, the output is stored on the host machine. This means that you cannot move or delete this folder while the application is being served, or it will break.

Launch the application:

# Change rq-worker to however many processes you want to use for analysis
docker-compose -f development-compose.yml --env-file .env up -d --scale rq-worker=4

Navigate to de-stress/database and run import_db_dump.sh.

Headless DE-STRESS

The DE-STRESS webserver has a few limitations which are there to ensure the stability of the webserver. These limitations are listed below.

  • Only proteins with 500 residues or fewer can be uploaded.
  • Only 30 files can be uploaded at a time.
  • There is a max run time of 20 seconds for all the DE-STRESS metrics.

The headless version of DE-STRESS can be run locally, and the user can change the settings to process a larger set of PDB files. The code supports multiprocessing so that large numbers of files can be processed in a reasonable amount of time. In the .env-headless file, the MAX_RUN_TIME, HEADLESS_DESTRESS_BATCH_SIZE and HEADLESS_DESTRESS_WORKERS variables set how many seconds the DE-STRESS metrics are allowed to run, how many PDB files are in a batch, and how many processors are used, respectively.
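As a rough illustration of how a batch size and a worker count interact, the sketch below splits a folder of PDB files into batches and hands each batch to a process pool. It is not the DE-STRESS implementation: make_batches, analyse_batch and run_headless are hypothetical names, and the per-batch work is a placeholder that only reports file sizes.

```python
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path


def make_batches(items, batch_size):
    """Split a list into consecutive batches of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def analyse_batch(pdb_paths):
    """Placeholder for the per-batch analysis (hypothetical): report file sizes."""
    return [(path.name, path.stat().st_size) for path in pdb_paths]


def run_headless(input_dir, workers, batch_size):
    """Analyse every .pdb file in input_dir, one batch per worker task."""
    pdb_paths = sorted(Path(input_dir).glob("*.pdb"))
    batches = make_batches(pdb_paths, batch_size)
    results = []
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves batch order, so results follow the sorted file order.
        for batch_result in pool.map(analyse_batch, batches):
            results.extend(batch_result)
    return results
```

With this layout, a larger batch size means fewer tasks and less scheduling overhead, while a smaller one spreads uneven workloads more evenly across the workers.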

First, build the Docker image. The headless version has its own Docker Compose file, headless-compose.yml, which must be used instead of development-compose.yml.

docker compose -f headless-compose.yml build

After this, make sure the dependencies have been built. The path /absolute/path/to/de-stress/dependencies_for_de-stress/ needs to be replaced with the user's local path to the DE-STRESS dependencies.

docker run -it --rm -v /absolute/path/to/de-stress/dependencies_for_de-stress/:/dependencies_for_de-stress de-stress-big-structure:latest sh build_dependencies.sh

Finally, run headless DE-STRESS with the following command, replacing each /absolute/path/to/ with the local path to the corresponding folder.

docker run -it --rm --env-file .env-headless -v /absolute/path/to/de-stress/dependencies_for_de-stress/:/dependencies_for_de-stress -v /absolute/path/to/input_path/:/input_path de-stress-big-structure:latest poetry run headless_destress /input_path

de-stress's People

Contributors

chriswellswood, dependabot[bot], lunaprau, michaeljamesstam

Forkers

lunaprau

de-stress's Issues

Data export

The plan for the open beta is not to have server-side storage of designs. A limitation of this approach is that you can't share design metrics very easily. A solution to this is the ability to export design data to a text file. There are a couple of varieties of export that I'm interested in:

  1. A text dump of the designs that can be loaded back into another instance of the application.
  2. Nicely formatted data on the designs/reference sets that enables people to create plots in external software.
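
The two export varieties could look roughly like the sketch below: a versioned JSON dump that round-trips, and a CSV table for plotting. It is written in Python purely for illustration (the front end is Elm), and the function names, the version field and the example metric names are all assumptions rather than part of the application.

```python
import csv
import io
import json


def export_designs(designs):
    """Serialise designs to a JSON text dump that can be reloaded later."""
    return json.dumps({"version": 1, "designs": designs}, indent=2, sort_keys=True)


def import_designs(text):
    """Load designs back from a dump produced by export_designs."""
    payload = json.loads(text)
    if payload.get("version") != 1:
        raise ValueError("unsupported export version")
    return payload["designs"]


def designs_to_csv(designs, fields):
    """Write one row per design with the chosen metric columns, for plotting."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops any keys that are not in the chosen columns.
    writer = csv.DictWriter(buffer, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(designs)
    return buffer.getvalue()
```

Keeping a version number in the dump lets a later release reject or migrate files written by an older one.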

Add new metric fields to specifications

We haven't added the new metrics as optional requirements in specifications. This will need to be done retrospectively for EvoEF2, but should be incorporated into the PRs for the other methods (#28, #29, #30).

Headless DE-STRESS fix: README commands

In summary:

  • docker compose instead of docker-compose
  • docker container created by docker compose is named "de-stress-big-structure" not "de-stress_big-structure"
  • wrong .env file given in the last docker run command

Leading to fixed commands:

  1. docker compose -f headless-compose.yml build
  2. docker run -it --rm -v /absolute/path/to/de-stress/dependencies_for_de-stress/:/dependencies_for_de-stress de-stress-big-structure:latest sh build_dependencies.sh
  3. docker run -it --rm --env-file .env-headless -v /absolute/path/to/de-stress/dependencies_for_de-stress/:/dependencies_for_de-stress -v /absolute/path/to/input_path/:/input_path de-stress-big-structure:latest poetry run headless_destress /input_path

Fix stacked bar charts

I converted the composition plot to stacked bar charts to save space without realising that the results would be aggregated.

EvoEF2 hangs for some PDB files

EvoEF2 hangs for two PDB files, 2ht0.pdb and 4dyq.pdb. For 2ht0.pdb, some nan values are returned when running EvoEF2 from the command line. This could be causing a problem, as DE-STRESS hangs on "Job is running on server". Running EvoEF2 on 4dyq.pdb from the command line, on the other hand, causes a segmentation fault, and DE-STRESS hangs on "Job submitted to server".
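
One general way to stop a misbehaving external tool from hanging a job is to run it with a timeout and classify the outcome. The sketch below uses only the Python standard library; run_with_timeout and has_nan are hypothetical helpers, not functions from the DE-STRESS code base, and no EvoEF2 command-line arguments are assumed.

```python
import math
import subprocess


def run_with_timeout(command, timeout_seconds):
    """Run an external command and classify the outcome instead of hanging."""
    try:
        completed = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_seconds
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this exception.
        return {"status": "timeout", "stdout": ""}
    if completed.returncode < 0:
        # On POSIX, a negative return code means the process was killed by a
        # signal, for example -11 for a segmentation fault.
        return {"status": "crashed", "stdout": completed.stdout}
    if completed.returncode != 0:
        return {"status": "failed", "stdout": completed.stdout}
    return {"status": "ok", "stdout": completed.stdout}


def has_nan(values):
    """Return True if any parsed energy value is NaN."""
    return any(math.isnan(value) for value in values)
```

A caller could then report "timeout" or "crashed" back to the user rather than leaving the job in a running state.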

Fix plotting issues

We currently have issues with the Vega Lite plots not rendering correctly. I think that there are a couple of routes to fixing this:

  1. Change the VL JavaScript code to be a custom element that manages its own state and updates automatically based on changes in Elm.
  2. Use another library for plotting, preferably one that is Elm native.

I think that option 2 is probably more future-proof and will enable tighter integration between the plots and the application, more control over the formatting of the plots, and clearer code, mainly because we would not have to use Vega's DSL to create plots. The disadvantage is that we would be duplicating work we've already done, but we already have a fair amount of overhead fixing these plots after every update.

Incorporating DFIRE2 into DE-STRESS

A function in analysis.py needs to be added to run the DFIRE2 energy function on pdb files.
An object needs to be added to elm_types.py and big_structure_models.py.
schema.py and create_entry.py need to be updated.
Finally, Metrics.elm and Uuid_String.elm need to be updated to display the energy values.

Beta deployment

As we're getting closer to release, we need to try to deploy the application on production hardware. To do this, we need to:

  • Remove the usage of Debug from the Elm code so that we can compile with --optimize
  • Set up an Nginx reverse proxy
  • Set up a static file server for the front end
  • Create deployment script

Add residue level aggrescan3d score plot

Currently there is only summary level data that is shown for aggrescan3d but residue level data is captured as well. We could make a plot to show residue number vs aggrescan3d score to show these results to the user.

Domain name

We need to purchase a shorter domain name before publication, maybe de-stress.app. The default domain is pragmaticproteindesign.bio.ed.ac.uk/de-stress and the new domain will just forward to this location.

Slim down reference set data and download in batches

We're downloading more data than we need for the reference sets; we probably don't need any of the log information or even the detailed score breakdown. If we remove these fields, we should be able to have much larger reference sets. We should also download the data for a reference set in batches, so that the HTTP request does not time out.
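
The idea can be sketched as follows, in Python for illustration only since the front end is written in Elm. The field names in BULKY_FIELDS and the fetch_batch callback are hypothetical stand-ins for whatever the API actually returns and however it is queried.

```python
from itertools import islice

# Hypothetical names for the bulky fields mentioned above.
BULKY_FIELDS = {"log_info", "score_breakdown"}


def slim_record(record):
    """Return a copy of a design record without the bulky fields."""
    return {key: value for key, value in record.items() if key not in BULKY_FIELDS}


def fetch_in_batches(ids, fetch_batch, batch_size):
    """Yield slimmed records, requesting at most batch_size ids per call."""
    iterator = iter(ids)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        # fetch_batch is supplied by the caller and performs one request.
        for record in fetch_batch(batch):
            yield slim_record(record)
```

Because each request covers a bounded number of ids, a single slow response can no longer time out the whole reference set download.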

Add overview information for reference sets

Currently, if you click a reference set, there isn't much information about it contained in the page. I'd like to add:

  • Overview plots
  • Information about the number of designs
  • A way to view the names of files in the reference set
  • A way to export the stats of the reference set

Incorporating the Rosetta energy function into DE-STRESS

A function in analysis.py needs to be added to run the Rosetta energy function on pdb files.
An object needs to be added to elm_types.py and big_structure_models.py.
schema.py and create_entry.py need to be updated.
Finally, Metrics.elm and Uuid_String.elm need to be updated to display the energy values.

Fix design renaming

I broke this a while back in the design detail pages, I need to fix it before release.

Adding Glossary of DE-STRESS metrics

We received comments from the reviewers of the DE-STRESS submission to PEDS saying that we need a clear glossary of the metrics used in the application.

Collapsible Section

We're using collapsible sections in a couple of places now (/designs and EvoEF2 results on /designs/uuid-string), so it makes sense to make these have consistent formatting and replace the temporary styling. The generic code should be moved to Style.elm.

Add specification requirement relative to a reference set

One of the most obvious requirements that someone would want to set is to say that a metric must be within a given range relative to a reference set. A couple of thoughts on this:

  • Should be able to set the range in standard deviations or as a percentage
  • Should convert to a raw value so there is no dependency on the reference set after the specification is created
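
A minimal sketch of that conversion is shown below, in Python for illustration only. It assumes the range is centred on the reference set mean and uses the sample standard deviation; both choices, and all function names, are assumptions rather than decisions recorded in this issue.

```python
from statistics import mean, stdev


def range_from_std_devs(reference_values, n_std_devs):
    """Raw (low, high) bounds within n_std_devs of the reference mean."""
    centre = mean(reference_values)
    spread = stdev(reference_values) * n_std_devs  # sample standard deviation
    return (centre - spread, centre + spread)


def range_from_percentage(reference_values, percent):
    """Raw (low, high) bounds within percent of the reference mean."""
    centre = mean(reference_values)
    spread = abs(centre) * percent / 100.0
    return (centre - spread, centre + spread)


def meets_requirement(value, bounds):
    """Check a metric value against stored raw bounds."""
    low, high = bounds
    return low <= value <= high
```

Only the returned bounds would be stored in the specification, so later changes to the reference set, or its deletion, would not affect it.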

Providing descriptions for basic metrics

Some of the basic metrics like hydrophobic fitness and the secondary structure assignment don't have any explanations or pop ups at the moment. We need to provide a bit of info to explain to the user what these fields mean.

Tests are broken

The automated testing is currently broken, although the Python tests are passing on my system when running in the big-structure Docker container. The elm tests are also failing. After this fix, no PR that fails the tests should be merged.

Limit compute available for each session

Currently, a user could dump 1,000 structure files with 1,000,000 residues each into de-stress and the server would fall over. Although I trust our future users, it's probably not ideal behaviour. There are a few ways that we can tackle this:

  1. Limit the number of files that can be running on the server for an individual session
  2. Limit the number of residues allowed in a structure file
  3. Add a time limit to individual jobs running on the server

These are not mutually exclusive and I think we should probably use all of these.
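
The first two limits could be checked before any job is queued, roughly as in the sketch below. SessionLimits and check_submission are hypothetical names; the default values are the ones the README gives for the public web server (30 files, 500 residues, 20 seconds).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionLimits:
    max_files: int = 30
    max_residues: int = 500
    # Enforced by whatever runs the job, not checked in check_submission.
    max_run_time_seconds: int = 20


def check_submission(residue_counts, limits=SessionLimits()):
    """Return a list of human-readable problems; an empty list means accepted.

    residue_counts maps each file name to its number of residues.
    """
    problems = []
    if len(residue_counts) > limits.max_files:
        problems.append(f"too many files: {len(residue_counts)} > {limits.max_files}")
    for name, count in residue_counts.items():
        if count > limits.max_residues:
            problems.append(f"{name}: {count} residues > {limits.max_residues}")
    return problems
```

Returning every problem at once, rather than stopping at the first, lets the user fix a whole upload in one pass.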

Incorporating Aggrescan 3D into DE-STRESS

A function in analysis.py needs to be added to run Aggrescan3D on pdb files.
An object needs to be added to elm_types.py and big_structure_models.py.
schema.py and create_entry.py need to be updated.
Finally, Metrics.elm and Uuid_String.elm need to be updated to display the output.

Sequence does not show non-canonical amino acids

The sequence view on Designs/Dynamic does not show non-canonical amino acids. This must be a server side problem as the front end does not determine the sequences, they are returned with the design metrics.

Think deeply about `_weights.txt`

EvoEF2 produces a file called _weights.txt as it is running, what is this for? Do we need to return it to the user? Is it used as a cache?

Add option to download structures from PDB

I imagine that lots of people will test with, or include, structures from the PDB. As all the metrics have been precalculated and cached for these structures, it makes sense to have a separate input method for them so that we don't recompute them every time. I don't think this is super high priority, so I'm not putting it into the v0.1.0 milestone.

Headless DE-STRESS fix: .env-headless file

.env-headless file needs HEADLESS_DESTRESS_BATCH_SIZE defined. E.g.:

POSTGRES_PASSWORD=testpassword

GUNICORN_WORKERS=2
APP_PORT=8181

EVOEF2_BINARY_PATH=/dependencies_for_de-stress/EvoEF2/EvoEF2
DFIRE2_FOLDER_PATH=/dependencies_for_de-stress/DFIRE2-pair/
ROSETTA_BINARY_PATH=/dependencies_for_de-stress/rosetta_src_2020.08.61146_bundle/main/source/bin/score_jd2.linuxgccrelease
AGGRESCAN3D_SCRIPT_PATH=/dependencies_for_de-stress/Aggrescan3D/aggrescan3D_cli_run.py

RQ_DASHBOARD_REDIS_URL=redis://redis:6379
RQ_DASHBOARD_PORT=8182

MAX_RUN_TIME=200
HEADLESS_DESTRESS_WORKERS=3
HEADLESS_DESTRESS_BATCH_SIZE=10
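
A missing variable like this is easier to diagnose if the settings are validated up front. The sketch below is a hypothetical check, not code from DE-STRESS: it parses simple KEY=VALUE text and fails with a clear message when one of the three headless variables is absent.

```python
REQUIRED_INT_SETTINGS = (
    "MAX_RUN_TIME",
    "HEADLESS_DESTRESS_WORKERS",
    "HEADLESS_DESTRESS_BATCH_SIZE",
)


def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def load_headless_settings(text):
    """Return the required integer settings, or raise if one is missing."""
    settings = parse_env_text(text)
    missing = [key for key in REQUIRED_INT_SETTINGS if key not in settings]
    if missing:
        raise KeyError(f"missing settings: {', '.join(missing)}")
    return {key: int(settings[key]) for key in REQUIRED_INT_SETTINGS}
```

This parser handles only the plain KEY=VALUE form used in the example above; quoting and variable expansion are deliberately left out.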
