bioinfomachinelearning / deepinteract

A geometric deep learning pipeline for predicting protein interface contacts. (ICLR 2022)

Home Page: https://zenodo.org/record/6671582

License: GNU General Public License v3.0

bioinformatics proteins transformers geometric-deep-learning graph-neural-networks protein-protein-interactions deep-learning machine-learning docker

deepinteract's Introduction

Source code for Geometric Transformers for Protein Interface Contact Prediction (ICLR 2022)

Paper DOI

DeepInteract Architecture

Geometric Transformer

Description

A geometric deep learning pipeline for predicting protein interface contacts.

Citing this work

If you use the code or data associated with this package, please cite:

@inproceedings{morehead2022geometric,
  title={Geometric Transformers for Protein Interface Contact Prediction},
  author={Alex Morehead and Chen Chen and Jianlin Cheng},
  booktitle={International Conference on Learning Representations},
  year={2022},
  url={https://openreview.net/forum?id=CS4463zx6Hi}
}

First time setup

The following step is required in order to run DeepInteract:

Genetic databases

This step requires aria2c to be installed on your machine.
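For example, on Debian or Ubuntu systems, aria2c is typically provided by the aria2 package (a sketch; use your distribution's package manager as appropriate):

# Install aria2c on Debian/Ubuntu:
sudo apt-get update && sudo apt-get install -y aria2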

DeepInteract requires only one of the following genetic (sequence) databases compatible with HH-suite3:

Install the BFD for HH-suite3

# Following script originally from AlphaFold2 (https://github.com/deepmind/alphafold):
DOWNLOAD_DIR="$HOME/Data/Databases"
ROOT_DIR="${DOWNLOAD_DIR}/bfd"
# Mirror of:
# https://bfd.mmseqs.com/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt.tar.gz.
SOURCE_URL="https://storage.googleapis.com/alphafold-databases/casp14_versions/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt.tar.gz"
BASENAME=$(basename "${SOURCE_URL}")

mkdir --parents "${ROOT_DIR}"
aria2c "${SOURCE_URL}" --dir="${ROOT_DIR}"
tar --extract --verbose --file="${ROOT_DIR}/${BASENAME}" \
  --directory="${ROOT_DIR}"
rm "${ROOT_DIR}/${BASENAME}"

# The CLI argument --hhsuite_db for lit_model_predict.py
# should then become '~/Data/Databases/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt'

(Smaller Alternative) Install the Small BFD for HH-suite3

# Following script originally from AlphaFold2 (https://github.com/deepmind/alphafold):
DOWNLOAD_DIR="$HOME/Data/Databases"
ROOT_DIR="${DOWNLOAD_DIR}/small_bfd"
SOURCE_URL="https://storage.googleapis.com/alphafold-databases/reduced_dbs/bfd-first_non_consensus_sequences.fasta.gz"
BASENAME=$(basename "${SOURCE_URL}")

mkdir --parents "${ROOT_DIR}"
aria2c "${SOURCE_URL}" --dir="${ROOT_DIR}"
pushd "${ROOT_DIR}"
gunzip "${ROOT_DIR}/${BASENAME}"
popd

# The CLI argument --hhsuite_db for lit_model_predict.py
# should then become '~/Data/Databases/small_bfd/bfd-first_non_consensus_sequences.fasta'

(Smaller Alternative) Install Uniclust30 for HH-suite3

# Following script originally from AlphaFold2 (https://github.com/deepmind/alphafold):
DOWNLOAD_DIR="$HOME/Data/Databases"
ROOT_DIR="${DOWNLOAD_DIR}/uniclust30"
# Mirror of:
# http://wwwuser.gwdg.de/~compbiol/uniclust/2018_08/uniclust30_2018_08_hhsuite.tar.gz
SOURCE_URL="https://storage.googleapis.com/alphafold-databases/casp14_versions/uniclust30_2018_08_hhsuite.tar.gz"
BASENAME=$(basename "${SOURCE_URL}")

mkdir --parents "${ROOT_DIR}"
aria2c "${SOURCE_URL}" --dir="${ROOT_DIR}"
tar --extract --verbose --file="${ROOT_DIR}/${BASENAME}" \
  --directory="${ROOT_DIR}"
rm "${ROOT_DIR}/${BASENAME}"

# The CLI argument --hhsuite_db for lit_model_predict.py
# should then become '~/Data/Databases/uniclust30/uniclust30_2018_08/uniclust30_2018_08'

Repository Directory Structure

DeepInteract
│
└───docker
│
└───img
│
└───project
     │
     └───checkpoints
     │
     └───datasets
     │   │
     │   └───builder
     │   │
     │   └───DB5
     │   │   │
     │   │   └───final
     │   │   │   │
     │   │   │   └───processed
     │   │   │   │
     │   │   │   └───raw
     │   │   │
     │   │   db5_dgl_data_module.py
     │   │   db5_dgl_dataset.py
     │   │
     │   └───CASP_CAPRI
     │   │   │
     │   │   └───final
     │   │   │   │
     │   │   │   └───processed
     │   │   │   │
     │   │   │   └───raw
     │   │   │
     │   │   casp_capri_dgl_data_module.py
     │   │   casp_capri_dgl_dataset.py
     │   │
     │   └───DIPS
     │   │   │
     │   │   └───final
     │   │   │   │
     │   │   │   └───processed
     │   │   │   │
     │   │   │   └───raw
     │   │   │
     │   │   dips_dgl_data_module.py
     │   │   dips_dgl_dataset.py
     │   │
     │   └───Input
     │   │   │
     │   │   └───final
     │   │   │   │
     │   │   │   └───processed
     │   │   │   │
     │   │   │   └───raw
     │   │   │
     │   │   └───interim
     │   │   │   │
     │   │   │   └───complexes
     │   │   │   │
     │   │   │   └───external_feats
     │   │   │   │   │
     │   │   │   │   └───PSAIA
     │   │   │   │       │
     │   │   │   │       └───INPUT
     │   │   │   │
     │   │   │   └───pairs
     │   │   │   │
     │   │   │   └───parsed
     │   │   │
     │   │   └───raw
     │   │
     │   └───PICP
     │       picp_dgl_data_module.py
     │
     └───test_data
     │
     └───utils
     │   deepinteract_constants.py
     │   deepinteract_modules.py
     │   deepinteract_utils.py
     │   dips_plus_utils.py
     │   graph_utils.py
     │   protein_feature_utils.py
     │   vision_modules.py
     │
     lit_model_predict.py
     lit_model_predict_docker.py
     lit_model_train.py
.gitignore
CONTRIBUTING.md
environment.yml
LICENSE
README.md
requirements.txt
setup.cfg
setup.py

Running DeepInteract via Docker

The simplest way to run DeepInteract is using the provided Docker script.

The following steps are required in order to ensure Docker is installed and working correctly:

  1. Install Docker.

  2. Check that DeepInteract will be able to use a GPU by running:

    docker run --rm --gpus all nvidia/cuda:11.2.2-cudnn8-runtime-ubuntu20.04 nvidia-smi

    The output of this command should show a list of your GPUs. If it doesn't, check that you followed all steps correctly when setting up the NVIDIA Container Toolkit, or take a look at the corresponding NVIDIA Docker issue tracker.
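If no GPUs are listed, the NVIDIA Container Toolkit may be missing. A minimal sketch of installing it on Ubuntu (an assumption: NVIDIA drivers are already installed and NVIDIA's apt repository for the toolkit has been configured per their documentation):

# Install the NVIDIA Container Toolkit and restart the Docker daemon:
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker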

Now that we know Docker is functioning properly, we can begin building our Docker image for DeepInteract:

  1. Clone this repository and cd into it.

    git clone https://github.com/BioinfoMachineLearning/DeepInteract
    cd DeepInteract/
    DI_DIR=$(pwd)
  2. Download our trained model checkpoints.

    mkdir -p project/checkpoints
    wget -P project/checkpoints https://zenodo.org/record/6671582/files/LitGINI-GeoTran-DilResNet.ckpt
    wget -P project/checkpoints https://zenodo.org/record/6671582/files/LitGINI-GeoTran-DilResNet-DB5-Fine-Tuned.ckpt
  3. Build the Docker image (Warning: Requires ~13GB of Space):

    docker build -f docker/Dockerfile -t deepinteract .
  4. Install the run_docker.py dependencies. Note: You may optionally wish to create a Python Virtual Environment to prevent conflicts with your system's Python environment.

    pip3 install -r docker/requirements.txt
  5. Create directory in which to generate input features and outputs:

    mkdir -p project/datasets/Input
  6. Run run_docker.py pointing to two input PDB files containing the first and second chains of a complex for which you wish to predict the contact probability map. For example, for the DIPS-Plus test target with the PDB ID 4HEQ:

    python3 docker/run_docker.py --left_pdb_filepath "$DI_DIR"/project/test_data/4heq_l_u.pdb --right_pdb_filepath "$DI_DIR"/project/test_data/4heq_r_u.pdb --input_dataset_dir "$DI_DIR"/project/datasets/Input --ckpt_name "$DI_DIR"/project/checkpoints/LitGINI-GeoTran-DilResNet.ckpt --hhsuite_db ~/Data/Databases/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt --num_gpus 0

    This script will generate the predicted interface contact map, along with the Geometric Transformer's learned node and edge representations for both chain graphs, and save them as NumPy array files (e.g., test_data/4heq_contact_prob_map.npy) in the given input directory (see the loading sketch after this list).

  7. Note that with the default

    --num_gpus 0

    flag, run_docker.py will have the Docker container use only the system's available CPU(s) for prediction. By specifying

    --num_gpus 1

    instead, the container will use the first available GPU for prediction.
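After a successful run, the saved outputs can be inspected directly with NumPy. A minimal sketch for the 4HEQ example above (the output path follows the example in step 6; the axis order of the map is an assumption not confirmed by this README):

import numpy as np

# Load the predicted inter-chain contact probability map for the 4HEQ example
# (assumed to have one row per residue of the first chain and one column per
# residue of the second chain):
contact_map = np.load("project/test_data/4heq_contact_prob_map.npy")
print("Contact map shape:", contact_map.shape)
print("Maximum predicted contact probability:", float(contact_map.max()))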

Running DeepInteract via a Traditional Installation (for Linux-Based Operating Systems)

First, install and configure Conda environment:

# Clone this repository:
git clone https://github.com/BioinfoMachineLearning/DeepInteract

# Change to project directory:
cd DeepInteract
DI_DIR=$(pwd)

# Set up Conda environment locally
conda env create --name DeepInteract -f environment.yml

# Activate Conda environment located in the current directory:
conda activate DeepInteract

# (Optional) Perform a full install of the pip dependencies described in 'requirements.txt':
pip3 install -r requirements.txt

# (Optional) To remove the long Conda environment prefix in your shell prompt, modify the env_prompt setting in your .condarc file with:
conda config --set env_prompt '({name})'
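(Optional) A quick sanity check that the core dependencies import correctly (a sketch; the exact versions are pinned in environment.yml and requirements.txt):

# Verify that PyTorch, DGL, and PyTorch Lightning are importable in the new environment:
python3 -c "import torch, dgl, pytorch_lightning; print(torch.__version__, dgl.__version__, pytorch_lightning.__version__)"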

Installing PSAIA

Install GCC 10 for PSAIA:

# Install GCC 10 for Ubuntu 20.04
sudo apt install software-properties-common
sudo add-apt-repository ppa:ubuntu-toolchain-r/ppa
sudo apt update
sudo apt install gcc-10 g++-10

# Or install GCC 10 for Arch Linux/Manjaro
yay -S gcc10

Install QT4 for PSAIA:

# Install QT4 for Ubuntu 20.04:
sudo add-apt-repository ppa:rock-core/qt4
sudo apt update
sudo apt install libqt4* libqtcore4 libqtgui4 libqtwebkit4 qt4* libxext-dev

# Or install QT4 for Arch Linux/Manjaro
yay -S qt4

Compile PSAIA from source:

# Select the location to install the software:
MY_LOCAL=~/Programs

# Download and extract PSAIA's source code:
mkdir "$MY_LOCAL"
cd "$MY_LOCAL"
wget http://complex.zesoi.fer.hr/data/PSAIA-1.0-source.tar.gz
tar -xvzf PSAIA-1.0-source.tar.gz

# Compile PSAIA (i.e., a GUI for PSA):
cd PSAIA_1.0_source/make/linux/psaia/
qmake-qt4 psaia.pro
make

# Compile PSA (i.e., the protein structure analysis (PSA) program):
cd ../psa/
qmake-qt4 psa.pro
make

# Compile PIA (i.e., the protein interaction analysis (PIA) program):
cd ../pia/
qmake-qt4 pia.pro
make

# Test run any of the above-compiled programs:
cd "$MY_LOCAL"/PSAIA_1.0_source/bin/linux
# Test run PSA inside a GUI:
./psaia/psaia
# Test run PIA through a terminal:
./pia/pia
# Test run PSA through a terminal:
./psa/psa

Finally, substitute the absolute path of your local DeepInteract repository (i.e., where on your local storage device you downloaded the repository) anywhere the repository's path is referenced in project/datasets/builder/psaia_config_file_input.txt.
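For example, a sketch of that substitution with sed (the placeholder path /home/username/DeepInteract is hypothetical; open the config file first to see the actual paths it references):

# Hypothetical placeholder path; verify against the paths actually present in the file:
sed -i "s|/home/username/DeepInteract|${DI_DIR}|g" project/datasets/builder/psaia_config_file_input.txt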

Training

Download training and cross-validation DGLGraphs

To train, fine-tune, or test DeepInteract models using CASP-CAPRI, DB5-Plus, or DIPS-Plus targets, we first need to download the preprocessed DGLGraphs from Zenodo:

# Download and extract preprocessed DGLGraphs for CASP-CAPRI, DB5-Plus, and DIPS-Plus
# Requires ~55GB of free space
# Download CASP-CAPRI
mkdir -p project/datasets/CASP_CAPRI/final
cd project/datasets/CASP_CAPRI/final
wget https://zenodo.org/record/6671582/files/final_raw_casp_capri.tar.gz
wget https://zenodo.org/record/6671582/files/final_processed_casp_capri.tar.gz

# Extract CASP-CAPRI
tar -xzf final_raw_casp_capri.tar.gz
tar -xzf final_processed_casp_capri.tar.gz
rm final_raw_casp_capri.tar.gz final_processed_casp_capri.tar.gz

# Download DB5-Plus
mkdir -p ../../DB5/final
cd ../../DB5/final
wget https://zenodo.org/record/6671582/files/final_raw_db5.tar.gz
wget https://zenodo.org/record/6671582/files/final_processed_db5.tar.gz

# Extract DB5-Plus
tar -xzf final_raw_db5.tar.gz
tar -xzf final_processed_db5.tar.gz
rm final_raw_db5.tar.gz final_processed_db5.tar.gz

# Download DIPS-Plus
mkdir -p ../../DIPS/final
cd ../../DIPS/final
wget https://zenodo.org/record/6671582/files/final_raw_dips.tar.gz
wget https://zenodo.org/record/6671582/files/final_processed_dips.tar.gz.partaa
wget https://zenodo.org/record/6671582/files/final_processed_dips.tar.gz.partab

# First, reassemble all processed DGLGraphs
# We split the (tar.gz) archive into two separate parts with
# 'split -b 4096M final_processed_dips.tar.gz "final_processed_dips.tar.gz.part"'
# to upload it to Zenodo, so to recover the original archive:
cat final_processed_dips.tar.gz.parta* >final_processed_dips.tar.gz

# Extract DIPS-Plus
tar -xzf final_raw_dips.tar.gz
tar -xzf final_processed_dips.tar.gz
rm final_processed_dips.tar.gz.parta* final_raw_dips.tar.gz final_processed_dips.tar.gz

Navigate to the project directory and run the training script with the parameters desired:

# Hint: Run `python3 lit_model_train.py --help` to see all available CLI arguments
cd project
python3 lit_model_train.py --lr 1e-3 --weight_decay 1e-2
cd ..

Inference

Download trained model checkpoints

# Return to root directory of DeepInteract repository
cd "$DI_DIR"

# Download our trained model checkpoints
mkdir -p project/checkpoints
wget -P project/checkpoints https://zenodo.org/record/6671582/files/LitGINI-GeoTran-DilResNet.ckpt
wget -P project/checkpoints https://zenodo.org/record/6671582/files/LitGINI-GeoTran-DilResNet-DB5-Fine-Tuned.ckpt

Predict interface contact probability maps

Navigate to the project directory and run the prediction script with the filenames of the left and right PDB chains.

# Hint: Run `python3 lit_model_predict.py --help` to see all available CLI arguments
cd project
python3 lit_model_predict.py --left_pdb_filepath "$DI_DIR"/project/test_data/4heq_l_u.pdb --right_pdb_filepath "$DI_DIR"/project/test_data/4heq_r_u.pdb --ckpt_dir "$DI_DIR"/project/checkpoints --ckpt_name LitGINI-GeoTran-DilResNet.ckpt --hhsuite_db ~/Data/Databases/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt
cd ..

This script will generate the predicted interface contact map, along with the Geometric Transformer's learned node and edge representations for both chain graphs, and save them as NumPy array files (e.g., test_data/4heq_contact_prob_map.npy) in the given input directory.
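As a post-processing example, the following sketch lists the highest-scoring residue pairs (assuming the saved map has shape (len_chain1, len_chain2) with probabilities in [0, 1]; the output path mirrors the example above):

import numpy as np

contact_map = np.load("test_data/4heq_contact_prob_map.npy")  # path from the example above
top_k = 10
# Indices of the top-k probabilities in the flattened map, highest first:
flat_indices = np.argsort(contact_map, axis=None)[::-1][:top_k]
rows, cols = np.unravel_index(flat_indices, contact_map.shape)
for i, j in zip(rows, cols):
    print(f"chain-1 residue {i} / chain-2 residue {j}: p = {contact_map[i, j]:.3f}")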

Acknowledgements

DeepInteract communicates with and/or references several separate libraries and packages.

We thank all their contributors and maintainers!

License and Disclaimer

Copyright 2021 University of Missouri-Columbia Bioinformatics & Machine Learning (BML) Lab.

DeepInteract Code License

Licensed under the GNU General Public License, Version 3.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at https://www.gnu.org/licenses/gpl-3.0.en.html.

Third-party software

Use of the third-party software, libraries or code referred to in the Acknowledgements section above may be governed by separate terms and conditions or license provisions. Your use of the third-party software, libraries or code is subject to any such terms and you should check that you can comply with any applicable restrictions or terms and conditions before use.

deepinteract's People

Contributors

amorehead, jianlin-cheng


deepinteract's Issues

[Doc] Documentation Confusion

In project/utils/deepinteract_utils.py, line 378, the docstring for convert_df_to_dgl_graph says that the function returns edata['w'] and edata['a'], and line 860 uses edata['w'] and edata['a'], but convert_df_to_dgl_graph does not actually generate these two edge features.

About the feature generation process.

I noticed a strange situation: when I input 6cp8_a.pdb and 6cp8_c.pdb (the test data from CASP 13 & 14), I cannot reproduce the result in the test dataset you provide. Checking the differences, I found that the node features at indices [36:43] differ, so I suspect this discrepancy causes the differing result. Below are sample graph node features generated with this repository's lit_model_predict.py.
Here is the first residue's node feature vector from 6cp8_a.pdb, generated by myself:
[ 0.0000e+00, 1.0000e+00, -2.7021e-01, -9.8549e-01, 0.0000e+00, 9.6280e-01, -1.6976e-01, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 1.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 1.0000e+00, 1.0000e+00, 7.1768e-05, 9.1038e-01, 9.2546e-01, 6.3999e-01, 8.7012e-01, 1.0000e+00, 6.9612e-01, 0.0000e+00, 0.0000e+00, 5.0000e-01, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 2.5000e-01, 0.0000e+00, 0.0000e+00, 2.5000e-01, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 2.5000e-01, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 5.0000e-01, 0.0000e+00, 0.0000e+00, 0.0000e+00, 2.5000e-01, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 3.2309e-01, 0.0000e+00, 0.0000e+00, 0.0000e+00, 6.7689e-01, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 1.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00]
Here is the first residue's node feature vector from 6cp8_a.pdb in your test dataset:
[ 0.0000e+00, 1.0000e+00, -2.7021e-01, -9.8549e-01, 0.0000e+00, 9.6280e-01, -1.6976e-01, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 1.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 1.0000e+00, 1.0000e+00, 3.0420e-05, 1.0000e+00, 1.0000e+00, 1.0000e+00, 1.0000e+00, 1.0000e+00, 1.0000e+00, 0.0000e+00, 0.0000e+00, 5.0000e-01, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 2.5000e-01, 0.0000e+00, 0.0000e+00, 2.5000e-01, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 2.5000e-01, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 5.0000e-01, 0.0000e+00, 0.0000e+00, 0.0000e+00, 2.5000e-01, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 3.2309e-01, 0.0000e+00, 0.0000e+00, 0.0000e+00, 6.7689e-01, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 1.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00]

To make sure they come from the same PDB file (uploaded), I compared their positions and found that they are identical. So I would like to know how this happened. Thank you, @amorehead.
6CP8_A_C.zip
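(For illustration, a minimal sketch of the element-wise comparison described above, assuming both per-residue feature vectors have been saved as NumPy arrays; the filenames are hypothetical:)

import numpy as np

# Hypothetical filenames: compare locally generated features against the released test-set features
my_feats = np.load("my_6cp8_a_first_residue_feats.npy")
ref_feats = np.load("provided_6cp8_a_first_residue_feats.npy")
# Indices where the two feature vectors differ beyond a small tolerance
diff_idx = np.where(~np.isclose(my_feats, ref_feats, atol=1e-4))[0]
print("Differing feature indices:", diff_idx)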

what genetic databases did you use?

Hi, which genetic databases did you use when generating the DIPS-Plus dataset from the raw PDB files, and which did you use for the CASP and DB5 datasets?

[BUG?] List index out of range

When I run the line:

python3 docker/run_docker.py --left_pdb_filepath /storage/DeepInteract/project/test_data/4heq_l_u.pdb --right_pdb_filepath /storage/DeepInteract/project/test_data/4heq_r_u.pdb --input_dataset_dir /storage/DeepInteract/project/datasets/Input --ckpt_name /storage/DeepInteract/project/checkpoints/LitGINI-GeoTran-DilResNet.ckpt --hhsuite_db /storage/databases/Uniclust30/UniRef30_2021_06 --num_gpus 1

I get the following, terminating in a "list index out of range" and no output:

I1026 17:17:24.445479 140490422077248 run_docker.py:59] Mounting /storage/DeepInteract/project/test_data -> /mnt/input_pdbs
I1026 17:17:24.445564 140490422077248 run_docker.py:59] Mounting /storage/DeepInteract/project/test_data -> /mnt/input_pdbs
I1026 17:17:24.445607 140490422077248 run_docker.py:59] Mounting /storage/DeepInteract/project/datasets/Input -> /mnt/Input
I1026 17:17:24.445646 140490422077248 run_docker.py:59] Mounting /storage/DeepInteract/project/checkpoints -> /mnt/checkpoints
I1026 17:17:24.445684 140490422077248 run_docker.py:59] Mounting /storage/databases/Uniclust30 -> /mnt/hhsuite_db
I1026 17:17:26.138480 140490422077248 run_docker.py:135] DGL backend not selected or invalid. Assuming PyTorch for now.
I1026 17:17:26.138590 140490422077248 run_docker.py:135] Using backend: pytorch
I1026 17:17:26.141283 140490422077248 run_docker.py:135] I1026 15:17:26.141029 140696250648384 deepinteract_utils.py:1030] Seeding everything with random seed 42
I1026 17:17:26.141357 140490422077248 run_docker.py:135] Global seed set to 42
I1026 17:17:26.177383 140490422077248 run_docker.py:135] I1026 15:17:26.177001 140696250648384 deepinteract_utils.py:587] Making interim data set from raw data
I1026 17:17:26.178824 140490422077248 run_docker.py:135] I1026 15:17:26.178652 140696250648384 parse.py:43] 4 requested keys, 4 produced keys, 0 work keys
I1026 17:17:26.178916 140490422077248 run_docker.py:135] W1026 15:17:26.178736 140696250648384 complex.py:36] Complex file /mnt/Input/interim/complexes/complexes.dill already exists!
I1026 17:17:26.179392 140490422077248 run_docker.py:135] I1026 15:17:26.179221 140696250648384 pair.py:79] 0 requested keys, 0 produced keys, 0 work keys
I1026 17:17:26.179549 140490422077248 run_docker.py:135] I1026 15:17:26.179284 140696250648384 deepinteract_utils.py:608] Generating PSAIA features from PDB files in /mnt/Input/interim/parsed
I1026 17:17:26.179922 140490422077248 run_docker.py:135] I1026 15:17:26.179797 140696250648384 conservation.py:361] 0 PDB files to process with PSAIA
I1026 17:17:26.181284 140490422077248 run_docker.py:135] I1026 15:17:26.179910 140696250648384 parallel.py:46] Processing 1 inputs.
I1026 17:17:26.181358 140490422077248 run_docker.py:135] I1026 15:17:26.181147 140696250648384 parallel.py:62] Sequential Mode.
I1026 17:17:26.181491 140490422077248 run_docker.py:135] I1026 15:17:26.181194 140696250648384 conservation.py:43] PSAIA'ing /mnt/Input/interim/external_feats/PSAIA/INPUT/pdb_list.fls
I1026 17:17:26.199129 140490422077248 run_docker.py:135] I1026 15:17:26.198776 140696250648384 conservation.py:200] For generating protrusion indices, spent 00.02 PSAIA'ing, 00.00 writing, and 00.02 overall.
I1026 17:17:26.199361 140490422077248 run_docker.py:135] I1026 15:17:26.198991 140696250648384 deepinteract_utils.py:625] Generating profile HMM features from PDB files in /mnt/Input/interim/parsed
I1026 17:17:26.199785 140490422077248 run_docker.py:135] I1026 15:17:26.199542 140696250648384 conservation.py:458] 4 requested keys, 4 produced keys, 0 work filenames
I1026 17:17:26.199849 140490422077248 run_docker.py:135] I1026 15:17:26.199590 140696250648384 conservation.py:464] 0 work filenames
I1026 17:17:26.199900 140490422077248 run_docker.py:135] I1026 15:17:26.199645 140696250648384 deepinteract_utils.py:640] Starting postprocessing for all unprocessed pairs in /mnt/Input/interim/pairs
I1026 17:17:26.199948 140490422077248 run_docker.py:135] I1026 15:17:26.199685 140696250648384 deepinteract_utils.py:647] Looking for all pairs in /mnt/Input/interim/pairs
I1026 17:17:26.200107 140490422077248 run_docker.py:135] Setting the default backend to "pytorch". You can change it in the ~/.dgl/config.json file or export the DGLBACKEND environment variable. Valid options are: pytorch, mxnet, tensorflow (all lowercase)
I1026 17:17:26.200161 140490422077248 run_docker.py:135] I1026 15:17:26.199843 140696250648384 deepinteract_utils.py:660] Found 0 work pair(s) in /mnt/Input/interim/pairs
I1026 17:17:26.200797 140490422077248 run_docker.py:135] Traceback (most recent call last):
I1026 17:17:26.200864 140490422077248 run_docker.py:135] File "/app/DeepInteract/project/lit_model_predict_docker.py", line 326, in
I1026 17:17:26.200918 140490422077248 run_docker.py:135] app.run(main)
I1026 17:17:26.200968 140490422077248 run_docker.py:135] File "/opt/conda/lib/python3.8/site-packages/absl/app.py", line 312, in run
I1026 17:17:26.201017 140490422077248 run_docker.py:135] _run_main(main, args)
I1026 17:17:26.201066 140490422077248 run_docker.py:135] File "/opt/conda/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
I1026 17:17:26.201114 140490422077248 run_docker.py:135] sys.exit(main(argv))
I1026 17:17:26.201161 140490422077248 run_docker.py:135] File "/app/DeepInteract/project/lit_model_predict_docker.py", line 199, in main
I1026 17:17:26.201208 140490422077248 run_docker.py:135] input_dataset = InputDataset(left_pdb_filepath=FLAGS.left_pdb_filepath,
I1026 17:17:26.201254 140490422077248 run_docker.py:135] File "/app/DeepInteract/project/lit_model_predict_docker.py", line 95, in init
I1026 17:17:26.201300 140490422077248 run_docker.py:135] super(InputDataset, self).init(name='InputDataset',
I1026 17:17:26.201347 140490422077248 run_docker.py:135] File "/opt/conda/lib/python3.8/site-packages/dgl/data/dgl_dataset.py", line 94, in init
I1026 17:17:26.201393 140490422077248 run_docker.py:135] self._load()
I1026 17:17:26.201438 140490422077248 run_docker.py:135] File "/opt/conda/lib/python3.8/site-packages/dgl/data/dgl_dataset.py", line 179, in _load
I1026 17:17:26.201483 140490422077248 run_docker.py:135] self.process()
I1026 17:17:26.201529 140490422077248 run_docker.py:135] File "/app/DeepInteract/project/lit_model_predict_docker.py", line 109, in process
I1026 17:17:26.201575 140490422077248 run_docker.py:135] left_complex_graph, right_complex_graph = process_pdb_into_graph(self.left_pdb_filepath,
I1026 17:17:26.201622 140490422077248 run_docker.py:135] File "/app/DeepInteract/project/utils/deepinteract_utils.py", line 741, in process_pdb_into_graph
I1026 17:17:26.201667 140490422077248 run_docker.py:135] input_pair = convert_input_pdb_files_to_pair(left_pdb_filepath, right_pdb_filepath,
I1026 17:17:26.201713 140490422077248 run_docker.py:135] File "/app/DeepInteract/project/utils/deepinteract_utils.py", line 725, in convert_input_pdb_files_to_pair
I1026 17:17:26.201758 140490422077248 run_docker.py:135] pair_filepath = launch_postprocessing_of_pruned_pairs(
I1026 17:17:26.201883 140490422077248 run_docker.py:135] IndexError: list index out of range

A major bug???

It seems that only these few values are printed at the end:

self.log('med_test_acc', torch.median(test_accs)) # Log MedAccuracy of an epoch
self.log('med_test_prec', torch.median(test_precs)) # Log MedPrecision of an epoch
self.log('med_test_recall', torch.median(test_recalls)) # Log MedRecall of an epoch
self.log('med_test_f1', torch.median(test_f1s)) # Log MedF1 of an epoch
self.log('med_test_auroc', torch.median(test_aurocs)) # Log MedAUROC of an epoch
self.log('med_test_auprc', torch.median(test_auprcs)) # Log epoch MedAveragePrecision

And the printed precision and recall appear to be the values from only the last sample:

self.log(f'test_ce', loss, sync_dist=True)
self.log('test_top_10_prec', top_10_prec, sync_dist=True)
self.log('test_top_l_by_10_prec', top_l_by_10_prec, sync_dist=True)
self.log('test_top_l_by_5_prec', top_l_by_5_prec, sync_dist=True)
self.log('test_top_l_recall', top_l_recall, sync_dist=True)
self.log('test_top_l_by_2_recall', top_l_by_2_recall, sync_dist=True)
self.log('test_top_l_by_5_recall', top_l_by_5_recall, sync_dist=True)

what does this code mean?

What does this code mean? It seems to be used during data loading.

if self.train_viz:
    n = 5532  # Supports up to a world size of 5,532 GPUs (i.e., asserts that n >= total_num_gpus_used)
    self.filenames_frame = self.filenames_frame.head(n=1)
    self.filenames_frame = pd.DataFrame(np.repeat(self.filenames_frame.values, n, axis=0))
    mode = 'viz'

Empty Raw DIPS

Hi, @amorehead

I recently downloaded the DIPS archive final_raw_dips.tar.gz and noticed that it only contains some index files and statistics results.

Would it be possible to update the package to include all the ****.dill files?

[BUG?] RuntimeWarning: invalid value encountered in double_scalars & Normal vector missing

When I try to run:
python3 docker/run_docker.py --left_pdb_filepath project/test_data/4heq_l_u.pdb --right_pdb_filepath project/test_data/4heq_r_u.pdb --input_dataset_dir project/datasets/CASP_CAPRI --ckpt_name project/checkpoints/LitGINI-GeoTran-DilResNet.ckpt --hhsuite_db ~/Data/Databases/uniclust30/uniclust30_2018_08/uniclust30_2018_08

I get these logs:

I0621 12:54:27.512626 139977373710144 run_docker.py:59] Mounting /home/ryc/pro/DeepInteract/project/test_data -> /mnt/input_pdbs
I0621 12:54:27.512762 139977373710144 run_docker.py:59] Mounting /home/ryc/pro/DeepInteract/project/test_data -> /mnt/input_pdbs
I0621 12:54:27.512836 139977373710144 run_docker.py:59] Mounting /home/ryc/pro/DeepInteract/project/datasets/CASP_CAPRI -> /mnt/Input
I0621 12:54:27.512908 139977373710144 run_docker.py:59] Mounting /home/ryc/pro/DeepInteract/project/checkpoints -> /mnt/checkpoints
I0621 12:54:27.512977 139977373710144 run_docker.py:59] Mounting /home/ryc/Data/Databases/uniclust30/uniclust30_2018_08 -> /mnt/hhsuite_db
I0621 12:54:30.589913 139977373710144 run_docker.py:135] DGL backend not selected or invalid. Assuming PyTorch for now.
I0621 12:54:30.590292 139977373710144 run_docker.py:135] Using backend: pytorch
I0621 12:54:30.594311 139977373710144 run_docker.py:135] I0621 12:54:30.593440 140113106646848 deepinteract_utils.py:1098] Seeding everything with random seed 42
I0621 12:54:30.594596 139977373710144 run_docker.py:135] Global seed set to 42
I0621 12:54:30.643066 139977373710144 run_docker.py:135] cp: cannot stat '/mnt/input_pdbs/4heq_l_u.pdb': No such file or directory
I0621 12:54:30.654789 139977373710144 run_docker.py:135] cp: cannot stat '/mnt/input_pdbs/4heq_r_u.pdb': No such file or directory
I0621 12:54:30.655230 139977373710144 run_docker.py:135] I0621 12:54:30.654651 140113106646848 deepinteract_utils.py:608] Making interim data set from raw data
I0621 12:54:30.675874 139977373710144 run_docker.py:135] I0621 12:54:30.675035 140113106646848 parse.py:43] 62 requested keys, 60 produced keys, 2 work keys
I0621 12:54:30.676792 139977373710144 run_docker.py:135] I0621 12:54:30.675550 140113106646848 parallel.py:46] Processing 2 inputs.
I0621 12:54:30.676914 139977373710144 run_docker.py:135] I0621 12:54:30.676569 140113106646848 parallel.py:62] Sequential Mode.
I0621 12:54:30.677030 139977373710144 run_docker.py:135] I0621 12:54:30.676633 140113106646848 parse.py:63] Reading /mnt/Input/raw/he/4heq_r_u.pdb
I0621 12:54:30.711622 139977373710144 run_docker.py:135] I0621 12:54:30.710961 140113106646848 parse.py:65] Writing /mnt/Input/raw/he/4heq_r_u.pdb to /mnt/Input/interim/parsed/he/4heq_r_u.pdb.pkl
I0621 12:54:30.713438 139977373710144 run_docker.py:135] I0621 12:54:30.712913 140113106646848 parse.py:67] Done writing /mnt/Input/raw/he/4heq_r_u.pdb to /mnt/Input/interim/parsed/he/4heq_r_u.pdb.pkl
I0621 12:54:30.713546 139977373710144 run_docker.py:135] I0621 12:54:30.713084 140113106646848 parse.py:63] Reading /mnt/Input/raw/he/4heq_l_u.pdb
I0621 12:54:30.744368 139977373710144 run_docker.py:135] I0621 12:54:30.743873 140113106646848 parse.py:65] Writing /mnt/Input/raw/he/4heq_l_u.pdb to /mnt/Input/interim/parsed/he/4heq_l_u.pdb.pkl
I0621 12:54:30.745597 139977373710144 run_docker.py:135] I0621 12:54:30.745240 140113106646848 parse.py:67] Done writing /mnt/Input/raw/he/4heq_l_u.pdb to /mnt/Input/interim/parsed/he/4heq_l_u.pdb.pkl
I0621 12:54:30.745825 139977373710144 run_docker.py:135] I0621 12:54:30.745505 140113106646848 complex.py:38] Getting filenames...
I0621 12:54:30.749119 139977373710144 run_docker.py:135] I0621 12:54:30.748700 140113106646848 complex.py:40] Getting complexes...
I0621 12:54:30.770302 139977373710144 run_docker.py:135] I0621 12:54:30.769680 140113106646848 pair.py:79] 31 requested keys, 30 produced keys, 1 work keys
I0621 12:54:30.770423 139977373710144 run_docker.py:135] I0621 12:54:30.769779 140113106646848 parallel.py:46] Processing 1 inputs.
I0621 12:54:30.770527 139977373710144 run_docker.py:135] I0621 12:54:30.769842 140113106646848 parallel.py:62] Sequential Mode.
I0621 12:54:30.770629 139977373710144 run_docker.py:135] I0621 12:54:30.769901 140113106646848 pair.py:97] Working on 4heq
I0621 12:54:30.773638 139977373710144 run_docker.py:135] I0621 12:54:30.773111 140113106646848 pair.py:102] For complex 4heq found 1 pairs out of 2 chains
I0621 12:54:31.086785 139977373710144 run_docker.py:135] I0621 12:54:31.085926 140113106646848 deepinteract_utils.py:689] Generating PSAIA features from PDB files in /mnt/Input/interim/parsed
I0621 12:54:31.090075 139977373710144 run_docker.py:135] I0621 12:54:31.089508 140113106646848 conservation.py:361] 0 PDB files to process with PSAIA
I0621 12:54:31.090215 139977373710144 run_docker.py:135] I0621 12:54:31.089650 140113106646848 parallel.py:46] Processing 1 inputs.
I0621 12:54:31.090428 139977373710144 run_docker.py:135] I0621 12:54:31.089698 140113106646848 parallel.py:62] Sequential Mode.
I0621 12:54:31.090618 139977373710144 run_docker.py:135] I0621 12:54:31.089743 140113106646848 conservation.py:43] PSAIA'ing /mnt/Input/interim/external_feats/PSAIA/INPUT/pdb_list.fls
I0621 12:54:31.114144 139977373710144 run_docker.py:135] I0621 12:54:31.113151 140113106646848 conservation.py:200] For generating protrusion indices, spent 00.02 PSAIA'ing, 00.00 writing, and 00.02 overall.
I0621 12:54:31.114319 139977373710144 run_docker.py:135] I0621 12:54:31.113927 140113106646848 deepinteract_utils.py:706] Generating profile HMM features from PDB files in /mnt/Input/interim/parsed
I0621 12:54:31.125687 139977373710144 run_docker.py:135] I0621 12:54:31.125225 140113106646848 conservation.py:458] 62 requested keys, 60 produced keys, 2 work filenames
I0621 12:54:31.125820 139977373710144 run_docker.py:135] I0621 12:54:31.125341 140113106646848 conservation.py:464] 2 work filenames
I0621 12:54:31.126219 139977373710144 run_docker.py:135] I0621 12:54:31.125793 140113106646848 parallel.py:46] Processing 2 inputs.
I0621 12:54:31.126399 139977373710144 run_docker.py:135] I0621 12:54:31.125915 140113106646848 parallel.py:62] Sequential Mode.
I0621 12:54:31.160443 139977373710144 run_docker.py:135] I0621 12:54:31.159958 140113106646848 conservation.py:152] HHblits'ing /mnt/Input/interim/external_feats/he/work/4heq_l_u.pdb-1-A.fa
I0621 12:55:03.191800 139977373710144 run_docker.py:135] I0621 12:55:03.190688 140113106646848 conservation.py:238] For 1 profile HMMs generated from 4heq_l_u.pdb, spent 32.06 blitsing, 00.00 writing, and 32.06 overall.
I0621 12:55:03.224250 139977373710144 run_docker.py:135] I0621 12:55:03.223448 140113106646848 conservation.py:152] HHblits'ing /mnt/Input/interim/external_feats/he/work/4heq_r_u.pdb-1-B.fa
I0621 12:55:37.966540 139977373710144 run_docker.py:135] I0621 12:55:37.965222 140113106646848 conservation.py:238] For 1 profile HMMs generated from 4heq_r_u.pdb, spent 34.77 blitsing, 00.00 writing, and 34.77 overall.
I0621 12:55:37.966913 139977373710144 run_docker.py:135] I0621 12:55:37.965721 140113106646848 deepinteract_utils.py:722] Starting postprocessing for all unprocessed pairs in /mnt/Input/interim/pairs
I0621 12:55:37.967144 139977373710144 run_docker.py:135] I0621 12:55:37.965833 140113106646848 deepinteract_utils.py:729] Looking for all pairs in /mnt/Input/interim/pairs
I0621 12:55:37.972153 139977373710144 run_docker.py:135] I0621 12:55:37.971457 140113106646848 deepinteract_utils.py:743] Found 1 work pair(s) in /mnt/Input/interim/pairs
I0621 12:55:37.972460 139977373710144 run_docker.py:135] I0621 12:55:37.971827 140113106646848 parallel.py:46] Processing 1 inputs.
I0621 12:55:37.972671 139977373710144 run_docker.py:135] I0621 12:55:37.971918 140113106646848 parallel.py:62] Sequential Mode.
I0621 12:55:41.316108 139977373710144 run_docker.py:135] /opt/conda/lib/python3.8/site-packages/Bio/PDB/vectors.py:357: RuntimeWarning: invalid value encountered in double_scalars
I0621 12:55:41.316489 139977373710144 run_docker.py:135] c = (self * other) / (n1 * n2)
I0621 12:55:41.316720 139977373710144 run_docker.py:135] /opt/conda/lib/python3.8/site-packages/Bio/PDB/vectors.py:357: RuntimeWarning: invalid value encountered in double_scalars
I0621 12:55:41.316918 139977373710144 run_docker.py:135] c = (self * other) / (n1 * n2)
I0621 12:55:41.317103 139977373710144 run_docker.py:135] I0621 12:55:41.314721 140113106646848 dips_plus_utils.py:536] Normal vector missing for df0 residue 9 in chain A in file /mnt/Input/raw/he/4heq_l_u.pdb
I0621 12:55:41.336150 139977373710144 run_docker.py:135] I0621 12:55:41.335281 140113106646848 dips_plus_utils.py:536] Normal vector missing for df0 residue 13 in chain A in file /mnt/Input/raw/he/4heq_l_u.pdb
I0621 12:55:41.356255 139977373710144 run_docker.py:135] I0621 12:55:41.355384 140113106646848 dips_plus_utils.py:536] Normal vector missing for df0 residue 17 in chain A in file /mnt/Input/raw/he/4heq_l_u.pdb
I0621 12:55:41.427843 139977373710144 run_docker.py:135] I0621 12:55:41.426913 140113106646848 dips_plus_utils.py:536] Normal vector missing for df0 residue 30 in chain A in file /mnt/Input/raw/he/4heq_l_u.pdb
I0621 12:55:41.506287 139977373710144 run_docker.py:135] I0621 12:55:41.505459 140113106646848 dips_plus_utils.py:536] Normal vector missing for df0 residue 45 in chain A in file /mnt/Input/raw/he/4heq_l_u.pdb
I0621 12:55:41.526054 139977373710144 run_docker.py:135] I0621 12:55:41.525439 140113106646848 dips_plus_utils.py:536] Normal vector missing for df0 residue 49 in chain A in file /mnt/Input/raw/he/4heq_l_u.pdb
I0621 12:55:41.560204 139977373710144 run_docker.py:135] I0621 12:55:41.559483 140113106646848 dips_plus_utils.py:536] Normal vector missing for df0 residue 56 in chain A in file /mnt/Input/raw/he/4heq_l_u.pdb
I0621 12:55:41.585425 139977373710144 run_docker.py:135] I0621 12:55:41.584762 140113106646848 dips_plus_utils.py:536] Normal vector missing for df0 residue 61 in chain A in file /mnt/Input/raw/he/4heq_l_u.pdb
I0621 12:55:41.686717 139977373710144 run_docker.py:135] I0621 12:55:41.686072 140113106646848 dips_plus_utils.py:536] Normal vector missing for df0 residue 82 in chain A in file /mnt/Input/raw/he/4heq_l_u.pdb
I0621 12:55:41.720789 139977373710144 run_docker.py:135] I0621 12:55:41.720090 140113106646848 dips_plus_utils.py:536] Normal vector missing for df0 residue 89 in chain A in file /mnt/Input/raw/he/4heq_l_u.pdb
I0621 12:55:41.735666 139977373710144 run_docker.py:135] I0621 12:55:41.735008 140113106646848 dips_plus_utils.py:536] Normal vector missing for df0 residue 92 in chain A in file /mnt/Input/raw/he/4heq_l_u.pdb
I0621 12:55:41.745175 139977373710144 run_docker.py:135] I0621 12:55:41.744497 140113106646848 dips_plus_utils.py:536] Normal vector missing for df0 residue 94 in chain A in file /mnt/Input/raw/he/4heq_l_u.pdb
I0621 12:55:41.788900 139977373710144 run_docker.py:135] I0621 12:55:41.788190 140113106646848 dips_plus_utils.py:536] Normal vector missing for df0 residue 103 in chain A in file /mnt/Input/raw/he/4heq_l_u.pdb
I0621 12:55:41.852835 139977373710144 run_docker.py:135] I0621 12:55:41.852125 140113106646848 dips_plus_utils.py:536] Normal vector missing for df0 residue 116 in chain A in file /mnt/Input/raw/he/4heq_l_u.pdb
I0621 12:55:41.909060 139977373710144 run_docker.py:135] I0621 12:55:41.908362 140113106646848 dips_plus_utils.py:536] Normal vector missing for df0 residue 128 in chain A in file /mnt/Input/raw/he/4heq_l_u.pdb
I0621 12:55:42.041210 139977373710144 run_docker.py:135] I0621 12:55:42.040462 140113106646848 dips_plus_utils.py:630] Normal vector missing for df1 residue 9 in chain B in file /mnt/Input/raw/he/4heq_r_u.pdb
I0621 12:55:42.059900 139977373710144 run_docker.py:135] I0621 12:55:42.059195 140113106646848 dips_plus_utils.py:630] Normal vector missing for df1 residue 13 in chain B in file /mnt/Input/raw/he/4heq_r_u.pdb
I0621 12:55:42.079119 139977373710144 run_docker.py:135] I0621 12:55:42.078433 140113106646848 dips_plus_utils.py:630] Normal vector missing for df1 residue 17 in chain B in file /mnt/Input/raw/he/4heq_r_u.pdb
I0621 12:55:42.143947 139977373710144 run_docker.py:135] I0621 12:55:42.143125 140113106646848 dips_plus_utils.py:630] Normal vector missing for df1 residue 30 in chain B in file /mnt/Input/raw/he/4heq_r_u.pdb
I0621 12:55:42.215802 139977373710144 run_docker.py:135] I0621 12:55:42.215100 140113106646848 dips_plus_utils.py:630] Normal vector missing for df1 residue 45 in chain B in file /mnt/Input/raw/he/4heq_r_u.pdb
I0621 12:55:42.234920 139977373710144 run_docker.py:135] I0621 12:55:42.234243 140113106646848 dips_plus_utils.py:630] Normal vector missing for df1 residue 49 in chain B in file /mnt/Input/raw/he/4heq_r_u.pdb
I0621 12:55:42.267936 139977373710144 run_docker.py:135] I0621 12:55:42.267218 140113106646848 dips_plus_utils.py:630] Normal vector missing for df1 residue 56 in chain B in file /mnt/Input/raw/he/4heq_r_u.pdb
I0621 12:55:42.292104 139977373710144 run_docker.py:135] I0621 12:55:42.291366 140113106646848 dips_plus_utils.py:630] Normal vector missing for df1 residue 61 in chain B in file /mnt/Input/raw/he/4heq_r_u.pdb
I0621 12:55:42.392189 139977373710144 run_docker.py:135] I0621 12:55:42.391432 140113106646848 dips_plus_utils.py:630] Normal vector missing for df1 residue 82 in chain B in file /mnt/Input/raw/he/4heq_r_u.pdb
I0621 12:55:42.426449 139977373710144 run_docker.py:135] I0621 12:55:42.425676 140113106646848 dips_plus_utils.py:630] Normal vector missing for df1 residue 89 in chain B in file /mnt/Input/raw/he/4heq_r_u.pdb
I0621 12:55:42.440944 139977373710144 run_docker.py:135] I0621 12:55:42.440204 140113106646848 dips_plus_utils.py:630] Normal vector missing for df1 residue 92 in chain B in file /mnt/Input/raw/he/4heq_r_u.pdb
I0621 12:55:42.450548 139977373710144 run_docker.py:135] I0621 12:55:42.449811 140113106646848 dips_plus_utils.py:630] Normal vector missing for df1 residue 94 in chain B in file /mnt/Input/raw/he/4heq_r_u.pdb
I0621 12:55:42.494798 139977373710144 run_docker.py:135] I0621 12:55:42.493852 140113106646848 dips_plus_utils.py:630] Normal vector missing for df1 residue 103 in chain B in file /mnt/Input/raw/he/4heq_r_u.pdb
I0621 12:55:42.557198 139977373710144 run_docker.py:135] I0621 12:55:42.556405 140113106646848 dips_plus_utils.py:630] Normal vector missing for df1 residue 116 in chain B in file /mnt/Input/raw/he/4heq_r_u.pdb
I0621 12:55:42.614418 139977373710144 run_docker.py:135] I0621 12:55:42.613722 140113106646848 dips_plus_utils.py:630] Normal vector missing for df1 residue 128 in chain B in file /mnt/Input/raw/he/4heq_r_u.pdb
I0621 12:55:42.718224 139977373710144 run_docker.py:135] I0621 12:55:42.717459 140113106646848 deepinteract_utils.py:773] Imputing missing feature values for given inputs
I0621 12:55:42.719177 139977373710144 run_docker.py:135] I0621 12:55:42.718771 140113106646848 parallel.py:46] Processing 31 inputs.
I0621 12:55:42.719329 139977373710144 run_docker.py:135] I0621 12:55:42.718858 140113106646848 parallel.py:62] Sequential Mode.
I0621 12:55:48.405303 139977373710144 run_docker.py:135] I0621 12:55:48.404334 140113106646848 lit_model_predict_docker.py:99] Loading complex for prediction, l_chain: /mnt/input_pdbs/4heq_l_u.pdb, r_chain: /mnt/input_pdbs/4heq_r_u.pdb
I0621 12:55:49.322944 139977373710144 run_docker.py:135] /opt/conda/lib/python3.8/site-packages/torchmetrics/utilities/prints.py:36: UserWarning: Metric AUROC will save all targets and predictions in buffer. For large datasets this may lead to large memory footprint.
I0621 12:55:49.323281 139977373710144 run_docker.py:135] warnings.warn(*args, **kwargs)
I0621 12:55:49.323480 139977373710144 run_docker.py:135] /opt/conda/lib/python3.8/site-packages/torchmetrics/utilities/prints.py:36: UserWarning: Metric AveragePrecision will save all targets and predictions in buffer. For large datasets this may lead to large memory footprint.
I0621 12:55:49.323687 139977373710144 run_docker.py:135] warnings.warn(*args, **kwargs)
I0621 12:55:49.323897 139977373710144 run_docker.py:135] /opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py:792: UserWarning: You are running on single node with no parallelization, so distributed has no effect.
I0621 12:55:49.324064 139977373710144 run_docker.py:135] rank_zero_warn("You are running on single node with no parallelization, so distributed has no effect.")
I0621 12:55:49.324226 139977373710144 run_docker.py:135] GPU available: False, used: False
I0621 12:55:49.324383 139977373710144 run_docker.py:135] TPU available: False, using: 0 TPU cores
I0621 12:55:49.324540 139977373710144 run_docker.py:135] IPU available: False, using: 0 IPUs
I0621 12:55:49.377713 139977373710144 run_docker.py:135] Setting the default backend to "pytorch". You can change it in the ~/.dgl/config.json file or export the DGLBACKEND environment variable. Valid options are: pytorch, mxnet, tensorflow (all lowercase)
I0621 12:55:49.378072 139977373710144 run_docker.py:135] /opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/data_loading.py:105: UserWarning: The dataloader, predict dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the num_workers argument(try 64 which is the number of cpus on this machine) in theDataLoader` init to improve performance.
I0621 12:55:49.378295 139977373710144 run_docker.py:135] rank_zero_warn(
I0621 12:55:50.089681 139977373710144 run_docker.py:135] I0621 12:55:50.088823 140113106646848 lit_model_predict_docker.py:298] Saved predicted contact probability map for 4heq as /mnt/input_pdbs/4heq_contact_prob_map.npy
I0621 12:55:50.092932 139977373710144 run_docker.py:135] I0621 12:55:50.092391 140113106646848 lit_model_predict_docker.py:307] Saved learned node representations for the first chain graph of 4heq as /mnt/input_pdbs/4heq_graph1_node_feats.npy
I0621 12:55:50.093075 139977373710144 run_docker.py:135] I0621 12:55:50.092499 140113106646848 lit_model_predict_docker.py:308] Saved learned edge representations for the first chain graph of 4heq as /mnt/input_pdbs/4heq_graph1_edge_feats.npy
I0621 12:55:50.093154 139977373710144 run_docker.py:135] I0621 12:55:50.092547 140113106646848 lit_model_predict_docker.py:309] Saved learned node representations for the second chain graph of 4heq as /mnt/input_pdbs/4heq_graph2_node_feats.npy
I0621 12:55:50.093222 139977373710144 run_docker.py:135] I0621 12:55:50.092598 140113106646848 lit_model_predict_docker.py:310] Saved learned edge representations for the second chain graph of 4heq as /mnt/input_pdbs/4heq_graph2_edge_feats.npy
Predicting: 100%|██████████| 1/1 [00:00<00:00, 1.41it/s]Predicting: 0it [00:00, ?it/s]

This results in the final generated .dill file not working properly.

the memory in each GPU seems to be different

Hi authors, great work! When I used your code to train the model with DDP, the memory usage on each GPU was different. How can I modify the code so that each GPU uses the same amount of memory when training with DDP?

[Doc]About pdb files

Hi, @amorehead, can you provide the original 32 PDB files for the DIPS-Plus dataset and the 55 PDB files for the DB5 dataset?
Also, how are the original PDB files processed into pdb.dill files for this project?

Thanks!

About the training dataset

Hi, @amorehead

While reproducing this work, I have some questions about the dataset. I used the ndata["x"] of complex["graph1"] and complex["graph2"] to check the positive labels against the distances, but I got a confusing result: the positive labels derived from the distance map (< 6 Angstroms) are fewer than those in complex["examples"]. So I would like to know whether ndata["x"] holds the bound-complex coordinates.

Thanks!
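(For reference, a minimal sketch of the distance check described above, assuming ndata['x'] stores per-residue 3-D coordinates; the names and threshold are illustrative:)

import numpy as np

def count_contacts(coords1: np.ndarray, coords2: np.ndarray, threshold: float = 6.0) -> int:
    # Count inter-chain residue pairs whose coordinates lie within `threshold` Angstroms
    dists = np.linalg.norm(coords1[:, None, :] - coords2[None, :, :], axis=-1)
    return int((dists < threshold).sum())

# Example usage (assumed DGL node tensors): count_contacts(graph1.ndata['x'].numpy(), graph2.ndata['x'].numpy())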

[BUG?] Invalid key "graph1". Must be one of the edge types.

Thanks for the great DeepInteract!
When I run the line:

python3 lit_model_train.py --lr 1e-3 --weight_decay 1e-2

I get the following:

Traceback (most recent call last):
File "lit_model_train.py", line 223, in
main(args)
File "lit_model_train.py", line 174, in main
trainer.fit(model=model, datamodule=picp_data_module)
File "/home/user/miniconda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 552, in fit
self._run(model)
File "/home/user/miniconda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 917, in _run
self._dispatch()
File "/home/user/miniconda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 985, in _dispatch
self.accelerator.start_training(self)
File "/home/user/miniconda/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 92, in start_training
self.training_type_plugin.start_training(trainer)
File "/home/user/miniconda/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 161, in start_training
self._results = trainer.run_stage()
File "/home/user/miniconda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 995, in run_stage
return self._run_train()
File "/home/user/miniconda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1030, in _run_train
self._run_sanity_check(self.lightning_module)
File "/home/user/miniconda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1114, in _run_sanity_check
self._evaluation_loop.run()
File "/home/user/miniconda/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 111, in run
self.advance(*args, **kwargs)
File "/home/user/miniconda/lib/python3.8/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 109, in advance
dl_outputs = self.epoch_loop.run(
File "/home/user/miniconda/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 111, in run
self.advance(*args, **kwargs)
File "/home/user/miniconda/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 111, in advance
output = self.evaluation_step(batch, batch_idx, dataloader_idx)
File "/home/user/miniconda/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 158, in evaluation_step
output = self.trainer.accelerator.validation_step(step_kwargs)
File "/home/user/miniconda/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 211, in validation_step
return self.training_type_plugin.validation_step(*step_kwargs.values())
File "/home/user/miniconda/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/ddp.py", line 392, in validation_step
return self.model(*args, **kwargs)
File "/home/user/miniconda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/user/miniconda/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 624, in forward
output = self.module(*inputs, **kwargs)
File "/home/user/miniconda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/user/miniconda/lib/python3.8/site-packages/pytorch_lightning/overrides/base.py", line 93, in forward
output = self.module.validation_step(*inputs, **kwargs)
File "/ryc/DeepInteract/project/utils/deepinteract_modules.py", line 1923, in validation_step
graph1, graph2 = batch['graph1'], batch['graph2']
File "/home/user/miniconda/lib/python3.8/site-packages/dgl/heterograph.py", line 2152, in getitem
raise DGLError('Invalid key "{}". Must be one of the edge types.'.format(orig_key))
dgl._ffi.base.DGLError: Invalid key "graph1". Must be one of the edge types.
Exception in thread Thread-2:
Traceback (most recent call last):
File "/home/user/miniconda/lib/python3.8/threading.py", line 932, in _bootstrap_inner

How to finetune on a new dataset

Hello! Can you help point me to what I should do if I want to finetune this model on a novel set of complexes that are not in any of your datasets?
