I am trying to run the run_multimer_jobs on GPU using this command:

run_multimer_jobs issue about alphapulldown HOT 5 CLOSED

J-Held commented on September 15, 2024

run_multimer_jobs issue

from alphapulldown.

Comments (5)

Qrouger commented on September 15, 2024

Hi @J-Held, the first part of your error output says that the GPU can't be used because of a problem with your TensorRT. The script doesn't crash because of that, though; it probably crashes because of your command. Watch your backslashes. Personally, I prefer to write the command on a single line, with one space between arguments, to avoid typing errors. Like this:
run_multimer_jobs.py --mode=all_vs_all --num_cycle=3 --num_predictions_per_model=1 --output_path=/storage/home/jbh249/scratch/output/models/ --data_dir=/storage/home/jbh249/scratch/alphaDatabase/ --protein_lists=/storage/home/jbh249/scratch/candidates.txt --monomer_objects_dir=/storage/home/jbh249/scratch/output/features

Quentin


dingquanyu commented on September 15, 2024

Hi @J-Held

I agree with @Qrouger's suggestion. It's likely that your command was not formatted correctly, so protein_lists wasn't parsed. Whatever you wrote after the \ was not parsed at all.

Yours
Dingquan
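For anyone hitting the same thing: one common way a multi-line command breaks is a backslash followed by a trailing space. The shell then escapes the space instead of the newline, so the command ends on that line and the remaining flags are never passed. The thread does not show the original command, so this is only a guess at the cause. Below is a minimal sketch of a check for that pattern; the function name and the example command are hypothetical, not part of alphapulldown.

```python
import re

# A backslash followed by spaces/tabs before the line end: the shell escapes
# the space rather than the newline, so the command stops on that line.
# This is a simple heuristic and does not handle quoting or escaped backslashes.
_BROKEN_CONTINUATION = re.compile(r"\\[ \t]+$")


def find_broken_continuations(command_text: str) -> list[int]:
    """Return 1-based line numbers where a backslash is followed by trailing whitespace."""
    return [
        number
        for number, line in enumerate(command_text.splitlines(), start=1)
        if _BROKEN_CONTINUATION.search(line)
    ]


# Hypothetical example: the second line has a space after the backslash.
example = (
    "run_multimer_jobs.py --mode=all_vs_all \\\n"
    "  --num_cycle=3 \\ \n"
    "  --protein_lists=candidates.txt\n"
)
print(find_broken_continuations(example))  # [2]
```

Pasting the job script text into this check before submitting, or writing the command on one line as suggested above, avoids the problem.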


J-Held commented on September 15, 2024

Yes, that was it. Thank you @Qrouger and @dingquanyu!

Regarding the GPU, it looks like I'm getting many of the error messages brought up in #339, but the job appears to still be running. Is it just going to time out? Output log below:

I0521 10:54:40.655257 22582644975424 run_multimer_jobs.py:389] Modeling new interaction for /storage/home/jbh249/scratch/output/models/HrpN_and_WAK3
I0521 10:54:41.184001 22582644975424 xla_bridge.py:660] Unable to initialize backend 'cuda': Unable to load cuDNN. Is it installed?
I0521 10:54:41.203725 22582644975424 xla_bridge.py:660] Unable to initialize backend 'rocm': NOT_FOUND: Could not find registered platform with name: "rocm". Available platform names are: CUDA
I0521 10:54:41.204897 22582644975424 xla_bridge.py:660] Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory
W0521 10:54:41.205006 22582644975424 xla_bridge.py:724] CUDA backend failed to initialize: Unable to load cuDNN. Is it installed? (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
I0521 10:54:43.223712 22582644975424 utils.py:378] Model model_1_multimer_v3 is running 0 prediction with default MSA depth
I0521 10:54:44.160407 22582644975424 utils.py:378] Model model_2_multimer_v3 is running 0 prediction with default MSA depth
I0521 10:54:45.103848 22582644975424 utils.py:378] Model model_3_multimer_v3 is running 0 prediction with default MSA depth
I0521 10:54:46.035488 22582644975424 utils.py:378] Model model_4_multimer_v3 is running 0 prediction with default MSA depth
I0521 10:54:46.962665 22582644975424 utils.py:378] Model model_5_multimer_v3 is running 0 prediction with default MSA depth
I0521 10:54:46.962839 22582644975424 utils.py:384] Using random seed 1682205902281770834 for the data pipeline
I0521 10:54:47.012253 22582644975424 run_multimer_jobs.py:323] now running prediction on HrpN_and_WAK3
I0521 10:54:47.012355 22582644975424 run_multimer_jobs.py:324] output_path is /storage/home/jbh249/scratch/output/models/HrpN_and_WAK3
I0521 10:54:47.012434 22582644975424 predict_structure.py:125] Checking for existing results
I0521 10:54:47.012791 22582644975424 predict_structure.py:139] Running model model_1_multimer_v3_pred_0 on HrpN_and_WAK3
I0521 10:54:47.013137 22582644975424 model.py:165] Running predict with shape(feat) = {'aatype': (1144,), 'residue_index': (1144,), 'seq_length': (), 'msa': (2257, 1144), 'num_alignments': (), 'template_aatype': (4, 1144), 'template_all_atom_mask': (4, 1144, 37), 'template_all_atom_positions': (4, 1144, 37, 3), 'asym_id': (1144,), 'sym_id': (1144,), 'entity_id': (1144,), 'deletion_matrix': (2257, 1144), 'deletion_mean': (1144,), 'all_atom_mask': (1144, 37), 'all_atom_positions': (1144, 37, 3), 'assembly_num_chains': (), 'entity_mask': (1144,), 'num_templates': (), 'cluster_bias_mask': (2257,), 'bert_mask': (2257, 1144), 'seq_mask': (1144,), 'msa_mask': (2257, 1144)}


Qrouger commented on September 15, 2024

No, it will just run slowly on the CPU.

Quentin.


dingquanyu commented on September 15, 2024

Available platform names are: CUDA

Hi @J-Held

Glad it worked. These messages are not actually errors, just logs that reflect the status of your modelling job. Since Available platform names are: CUDA was printed, it should be running successfully on your GPU. I would still suggest running nvidia-smi to double-check that the programme is actually consuming your GPU RAM.

Yours
Dingquan
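Since the two replies above read the log differently, it may help to look for the one line that states the outcome. The sketch below scans a log for the warning "CUDA backend failed to initialize", which appears in the output posted earlier in this thread. Treating that warning as a sign of CPU fallback is an assumption consistent with Qrouger's reply, and the exact wording may differ between JAX versions. The function name is hypothetical.

```python
def cuda_backend_failed(log_text: str) -> bool:
    """Return True if any log line reports that the CUDA backend failed to initialize."""
    return any(
        "CUDA backend failed to initialize" in line
        for line in log_text.splitlines()
    )


# Two lines copied from the log posted in this thread.
sample_log = (
    "I0521 10:54:41.184001 22582644975424 xla_bridge.py:660] Unable to initialize "
    "backend 'cuda': Unable to load cuDNN. Is it installed?\n"
    "W0521 10:54:41.205006 22582644975424 xla_bridge.py:724] CUDA backend failed to "
    "initialize: Unable to load cuDNN. Is it installed? "
    "(Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)\n"
)
print(cuda_backend_failed(sample_log))  # True
```

If the warning is present, checking jax.default_backend() in the same environment, alongside nvidia-smi while the job runs, should show which device is actually in use.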
