kuhlman-lab / evopro

License: MIT License

Python 93.61% Shell 6.39%

evopro's Introduction

About EvoPro

EvoPro is a genetic algorithm-based protein binder optimization pipeline, used in published work for in silico evolution of highly accurate, tight protein binders.
Now includes multistate design, including our current unpublished work on conformational switch design!

PLEASE MAKE SURE TO USE THE "MAIN" BRANCH for the code used in the paper. The current working version is on the "dev" branch. A beta dockerized version is on the "dockerized" branch.

paper
preprint


Getting Started

Steps to set up a copy of EvoPro locally are detailed below.

Prerequisites

Anaconda (conda) is required to install the dependencies.

Installing conda: conda-install-link

EvoPro Installation

Clone the repo:

git clone https://github.com/Kuhlman-Lab/evopro.git

Clone our AF2 and ProteinMPNN repos:

git clone https://github.com/Kuhlman-Lab/alphafold.git
git clone https://github.com/Kuhlman-Lab/proteinmpnn.git

Download the AlphaFold2 model weights using this script: https://github.com/Kuhlman-Lab/alphafold/blob/main/setup/download_alphafold_params.sh

Set up conda environment:

conda env create -n evopro -f setup_conda.yaml
pip3 install --upgrade jax==0.3.25 jaxlib==0.3.25+cuda11.cudnn805 -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
python3 -m pip install /path/to/alphafold/alphafold/

Set the local paths at the top of each script in the evopro/run/ directory. (This is a temporary step that will be removed once Docker support is added.)


Usage

EvoPro requires GPUs. The running directory should contain a sequence file with the input sequences, plus flags files for the EvoPro options and the AF2 options. See below for flag options, and see evopro/examples/ for example directory setups. pd1_binder and pd1_binder_nochainB are old examples that use a deprecated script; see examples/binder for the updated version.

Generate the residue_specs.json file from the sequence file. Specify the sequence file and the residues to mutate in json.flags (include symmetry here if needed). Then, in the running directory:

python /path/to/evopro/run/generate_json.py @json.flags
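generate_json.py and the EvoPro runner both take their options from a flags file passed with an @ prefix. The sketch below shows one way such a file can be expanded with Python's standard argparse: fromfile_prefix_chars="@" plus an overridden convert_arg_line_to_args, so that a line like "--num_iter 10" becomes two tokens. This is an assumption about the mechanism, not EvoPro's actual parser. It declares only a handful of the flags documented below, and FlagsFileParser and build_parser are hypothetical names.

```python
import argparse


class FlagsFileParser(argparse.ArgumentParser):
    """ArgumentParser that expands @file arguments, whitespace-split per line.

    Hypothetical sketch; EvoPro's own argument handling may differ.
    """

    def convert_arg_line_to_args(self, arg_line):
        # "--num_iter 10" -> ["--num_iter", "10"]; blank and "#" lines are skipped.
        stripped = arg_line.strip()
        if not stripped or stripped.startswith("#"):
            return []
        return stripped.split()


def build_parser():
    parser = FlagsFileParser(fromfile_prefix_chars="@")
    # A few EvoPro flags with the defaults documented in this README.
    parser.add_argument("--num_iter", type=int, default=50)
    parser.add_argument("--pool_size", type=int, default=20)
    parser.add_argument("--num_gpus", type=int, default=1)
    parser.add_argument("--mpnn_temp", type=str, default="0.1")
    parser.add_argument("--plot_scores_avg", action="store_true")
    return parser


if __name__ == "__main__":
    # Equivalent to: python script.py @evopro.flags
    print(build_parser().parse_args(["@evopro.flags"]))
```

Flags omitted from the file keep their defaults, which is why a short flags file is enough for most runs.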

Set EvoPro options in the evopro.flags file. To run EvoPro, first confirm that GPUs are available. Then:

conda activate evopro
python /path/to/evopro/run/run_evopro_binder.py @evopro.flags
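Before launching a run, it can help to check how many GPUs JAX (which the AF2 predictions run on) can actually see, and to set --num_gpus to match. This is a minimal sketch, assuming JAX is installed in the evopro environment as in the setup steps above. count_visible_gpus is a hypothetical helper, not part of EvoPro.

```python
import importlib.util


def count_visible_gpus():
    """Return the number of GPUs visible to JAX, or 0 if none are usable."""
    if importlib.util.find_spec("jax") is None:
        return 0  # JAX is not installed in this environment
    import jax

    try:
        return len(jax.devices("gpu"))
    except RuntimeError:
        # Raised when no GPU backend is available (e.g. a CPU-only jaxlib).
        return 0


if __name__ == "__main__":
    print("GPUs visible to JAX:", count_visible_gpus())
```

A result of 0 inside the activated environment usually points to a jax/jaxlib or CUDA installation problem rather than to EvoPro itself.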


Generating the JSON file

Specify your flag options in json.flags, and provide either a PDB file or a sequence file from which to extract the starting sequences. All flag options are listed below:
--pdb
default=None, type=str,
Path to and name of PDB file to extract chains and sequences. Alternatively use sequence file (below).

--sequence_file
default=None, type=str,
Path to and name of text file to extract chains and sequences. Only provide if there is no PDB file.

--mut_res
default='', type=str,
PDB chain and residue numbers to mutate, separated by commas. (Make sure there are no extraneous spaces between commas.)
Here, you can use "*" to specify the whole chain (e.g. A*), or "<" to specify all residues in a chain after a specific residue (e.g. B17<).

--default_mutres_setting
default='all', type=str,
Default setting for the amino acids allowed at each residue to mutate. Individual residues can be changed manually after the file is generated. Default is "all". Other examples are “all-PC” (allow any mutation except proline and cysteine) and “AVILMFYW” (specify which amino acids are allowed).

--output
default='', type=str,
Path to and name of the output JSON file.

--symmetric_res
default='', type=str,
PDB chain and residue numbers to force to be symmetric, separated by a colon.
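To make the --mut_res and --default_mutres_setting syntax concrete, here is a hypothetical sketch of how those strings could be expanded. It assumes single-letter chain IDs, residues numbered from 1 within each chain, and that "B17<" includes B17 itself. generate_json.py is the authority on all of these details, and parse_mut_res and allowed_amino_acids are illustrative names only.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def parse_mut_res(spec, chain_lengths):
    """Expand a --mut_res string such as "A*,B17<,A5" into residue IDs.

    Hypothetical helper. chain_lengths maps chain ID -> number of residues;
    residues are assumed to be numbered 1..N, and "B17<" is assumed to
    include B17 itself.
    """
    residues, seen = [], set()
    for token in spec.split(","):
        token = token.strip()
        if not token:
            continue
        chain = token[0]
        if token.endswith("*"):
            # Whole chain, e.g. "A*"
            numbers = range(1, chain_lengths[chain] + 1)
        elif token.endswith("<"):
            # From the given residue to the end of the chain, e.g. "B17<"
            numbers = range(int(token[1:-1]), chain_lengths[chain] + 1)
        else:
            # Single residue, e.g. "A5"
            numbers = [int(token[1:])]
        for number in numbers:
            res_id = f"{chain}{number}"
            if res_id not in seen:
                seen.add(res_id)
                residues.append(res_id)
    return residues


def allowed_amino_acids(setting):
    """Translate a --default_mutres_setting value into the allowed amino acids."""
    if setting == "all":
        return AMINO_ACIDS
    if setting.startswith("all-"):
        # "all-PC": every amino acid except the listed ones
        excluded = set(setting[len("all-"):])
        return "".join(aa for aa in AMINO_ACIDS if aa not in excluded)
    # Otherwise the setting lists the allowed amino acids directly, e.g. "AVILMFYW"
    return "".join(aa for aa in AMINO_ACIDS if aa in setting)
```

For example, parse_mut_res("A*,B17<", {"A": 3, "B": 19}) expands to A1 through A3 followed by B17 through B19.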

Specifying the AlphaFold2 options

Specify your AF2 options in af2.flags.
We strongly suggest either turning off MSA generation or providing precomputed MSAs and templates, so that the AF2 runner does not have to query the MMseqs2 server for every prediction (which makes the runtime much longer and may cause errors).

To turn off MSA generation:
--msa_mode single_sequence

To use a precomputed MSA:
--msa_mode single_sequence
--custom_msa_path /path/to/directory/with/a3m/files/

To use custom templates:
--use_templates
--custom_template_path templates

Because we modified the original AF2 code to allow custom template databases, each .pdb file in the templates folder must have a four-character alphanumeric name, mimicking a PDB entry named by its PDB code (the name does not have to be a real PDB code). Examples: “temp.pdb”, “1uzg.pdb”, “PPPP.pdb”.
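A quick way to check this naming rule before a run: the hypothetical helper below lists every .pdb file in a templates folder whose name is not exactly four letters or digits followed by ".pdb". It checks only file names, not file contents.

```python
import re
from pathlib import Path

# Four letters/digits followed by ".pdb", mimicking a PDB code (e.g. 1uzg.pdb).
_TEMPLATE_NAME = re.compile(r"^[A-Za-z0-9]{4}\.pdb$")


def bad_template_names(template_dir):
    """Return the .pdb files in template_dir whose names break the 4-character rule."""
    return sorted(
        path.name
        for path in Path(template_dir).glob("*.pdb")
        if not _TEMPLATE_NAME.match(path.name)
    )


if __name__ == "__main__":
    print(bad_template_names("templates"))
```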

EvoPro flag options

--input_dir
default=current directory, type=str, Path to the directory that contains the input files.

--num_iter
default=50, type=int, Number of iterations of genetic algorithm. Default is 50.

--pool_size
default=20, type=int, Size of the "genetic pool", or the number of sequences evaluated per iteration. Default is 20.

--pool_size_variable
Specify a file pool_sizes.txt with pool sizes for every iteration (or until pool size stops changing).
Defaults to False and uses constant pool size.

--num_gpus
default=1, type=int, Number of GPUs available. Default is 1.

--score_file
type=str, Path and file name of the Python script containing the score function used to evaluate the fitness of the AlphaFold predictions. Required.

--score_func
type=str, Name of the score function in the score file used to evaluate the fitness of the AlphaFold predictions. Required.

--score_func_2
default=None, type=str, Name of the second score function in the score file used to evaluate the fitness of the AlphaFold predictions once the number of iterations set by score_func_2_iteration (below) has been reached. Optional.

--score_func_2_iteration
default=30, type=int, Number of iterations after which the scoring function is switched to score_func_2. Default is 30 if score_func_2 is provided.

--define_contact_area
default=None, type=str, User can specify residues on target interface to be targeted for contacts, passed to score function for parsing. Default is None.

--bonus_contacts
default=None, type=str, User can define residues on the target interface to be given a bonus for making contacts, followed by the distance cutoff. Defaults are None and 4 Å.

--penalize_contacts
default=None, type=str, User can define residues on the target interface to be given a penalty for making contacts, followed by the distance cutoff. Defaults are None and 8 Å.

--no_repeat_af2
Use this flag if you DO NOT want AF2 to be run multiple times on the same sequence with the scores averaged. By default (without this flag), sequences are rescored every iteration until each sequence has been scored 5 times. Default is False.

--dont_write_compressed_data
Default is False.

--write_pdbs
Default is False.

--mpnn_freq
default=10, type=int, ProteinMPNN is used to refill the pool once every mpnn_freq iterations. Default is 10.

--mpnn_iters
default=None, type=str, Iteration numbers at which MPNN is used to refill the pool. Defaults to mpnn_freq if not provided.

--skip_mpnn
default=None, type=str, Skip MPNN refilling in these iterations.

--mpnn_temp
default='0.1', type=str, Sampling temperature at which ProteinMPNN refills the pool. Default is 0.1.

--mpnn_temp_variable
Specify a file mpnn_temps.txt with temperatures for every call to MPNN (or until temp stops changing). Defaults to False and uses constant MPNN temp.

--mpnn_version
default="s_48_020", type=str, Model version used to run MPNN. Default is s_48_020 (soluble).

--mpnn_bias_AA
default=None, type=str, DOES NOT WORK YET. Path to json file containing bias dictionary for MPNN. Default is None.

--mpnn_bias_by_res
default=None, type=str, Path to json file containing per residue bias dictionary for MPNN. Default is None.

--mpnn_chains
default=None, type=str, Chains to concatenate into a single PDB for MPNN. Default is None, which uses the first AF2 prediction. Example: AB,B

--plot_confidences
Makes PAE and pLDDT plots for each output PDB. Default is False.

--plot_scores_avg
Plots average score value of pool over iterations. Default is False.

--plot_scores_median
Plots median score value of pool over iterations. Default is False.

--plot_scores_top
Plots highest score value of pool over iterations. Default is False.

--crossover_percent
default=0.2, type=float, Fraction of pool refilled by crossover. Default is 0.2 (20% crossover).

--vary_length
default=0, type=int, How much the length of mutable regions is allowed to vary. Default is 0.

--substitution_insertion_deletion_weights
default=None, type=str, Specify probability of substitutions, insertions, and deletions (in that order) during mutation. Default is 0.8,0.1,0.1.

--mutation_percents
default=0.125, type=str, Number of mutations made during random mutation only, as a fraction of the mutable sequence length. Default is 0.125 (12.5% of the mutable sequence length) for every iteration. If more than one value is provided, the iterations are split evenly among the values.

--af2_preds
default="AB", type=str, Chain ID permutations to run through individual AF2 runs, separated by commas. Only used for multistate design. Default is just the complex AB.

--af2_preds_extra
default="AB", type=str, DEPRECATED. Chain ID permutations to run through individual AF2 runs, separated by commas, in addition to the complex.

--rmsd_func
default=None, type=str, DEPRECATED. Name of the rmsd function in the score file used to evaluate fitness of the alphafold predictions. Optional, requires stabilize_binder=True.

--rmsd_to_starting
default=None, type=str, DEPRECATED. Name of the rmsd function in the score file and the path/filename to pdb for RMSD used to evaluate rmsd to the starting scaffold using a U-shaped potential.

--path_to_starting
default=None, type=str, DEPRECATED. path/filename to pdb to pass to the scoring function for RMSD to starting.
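Several of the flags above define per-iteration schedules. The sketch below illustrates one plausible reading of --mutation_percents, --mpnn_freq/--mpnn_iters/--skip_mpnn, --substitution_insertion_deletion_weights and --score_func_2_iteration. It is an illustration under stated assumptions, not EvoPro's code: iterations are counted from 1, "every N iterations" is read as multiples of N, leftover iterations go to the last mutation percentage, and the functions take already-parsed lists rather than the raw comma-separated flag strings. All function names are hypothetical.

```python
import random

MUTATION_TYPES = ("substitution", "insertion", "deletion")


def mutation_fraction(iteration, num_iter, mutation_percents):
    """Fraction of the mutable sequence to mutate at a 1-based iteration.

    Iterations are split into equal consecutive blocks, one per value;
    any remainder goes to the last value (an assumption).
    """
    block = max(1, num_iter // len(mutation_percents))
    index = min((iteration - 1) // block, len(mutation_percents) - 1)
    return mutation_percents[index]


def mpnn_iterations(num_iter, mpnn_freq=10, mpnn_iters=None, skip_mpnn=None):
    """Iterations (1-based) at which ProteinMPNN refills the pool.

    Explicit mpnn_iters take precedence over mpnn_freq; skip_mpnn removes
    iterations. Reading "every N iterations" as multiples of N is assumed.
    """
    if mpnn_iters:
        chosen = set(mpnn_iters)
    else:
        chosen = set(range(mpnn_freq, num_iter + 1, mpnn_freq))
    return sorted(chosen - set(skip_mpnn or ()))


def sample_mutation_type(weights=(0.8, 0.1, 0.1), rng=random):
    """Draw substitution/insertion/deletion with the given probabilities."""
    return rng.choices(MUTATION_TYPES, weights=weights, k=1)[0]


def active_score_func(iteration, score_func, score_func_2=None, switch_at=30):
    """Score function in effect at a 1-based iteration."""
    if score_func_2 is not None and iteration > switch_at:
        return score_func_2
    return score_func
```

For example, with the default 0.125 and a 40-residue mutable region, each random mutation step changes 0.125 × 40 = 5 residues.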


HOW TO WRITE AN EVOPRO SCORING FUNCTION

Score functions generate a “score” from the AlphaFold2 results for each sequence in the optimization pool. This score is used to rank and filter the sequences, so only its relative value matters. Each score function takes in the list of results dictionaries (one for each AF2 prediction made for the sequence) and parses components such as the predicted structures in PDB form, pLDDT, and PAE. It should then return a final overall score for the sequence, which can combine several weighted score components.

EvoPro includes a few prewritten functions that you can call within your score function to calculate score metrics, which can then be used as weighted components of the final overall score. Examples are a function that calculates the average pLDDT over a predicted PDB (score_plddt_confidence) and one that calculates the contacts at an interface, with each contact weighted by the PAE of the interaction (score_contacts_pae_weighted). You can import these from evopro.score_funcs.score_funcs, as is done at the top of score_binder.py, or use your own functions instead. Important: more negative scores are better, so negate the score terms you want to maximize and keep positive the terms you want to minimize.
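To illustrate the sign convention, here is a toy combination of score terms: quantities to maximize (pLDDT, contact score) are negated, a penalty stays positive, and sequences are ranked from most negative to least negative. The weights and the scaling of pLDDT by 10 (borrowed from the monomer example below) are illustrative only, and overall_score and rank_pool are hypothetical names, not EvoPro functions.

```python
def overall_score(plddt, contact_score, penalty,
                  plddt_scale=10.0, w_contacts=1.0, w_penalty=1.0):
    """Combine score components so that lower (more negative) is better.

    Terms to maximize (pLDDT, contacts) are negated; terms to minimize
    (penalties) stay positive. Weights are illustrative, not EvoPro defaults.
    """
    return -plddt / plddt_scale - w_contacts * contact_score + w_penalty * penalty


def rank_pool(scores):
    """Return sequence names ordered best-first (most negative score first)."""
    return sorted(scores, key=scores.get)


if __name__ == "__main__":
    pool = {
        "SEQ_A": overall_score(plddt=90.0, contact_score=12.0, penalty=0.0),
        "SEQ_B": overall_score(plddt=85.0, contact_score=5.0, penalty=3.0),
    }
    print(rank_pool(pool))
```

Because only the ranking matters, multiplying every term by the same positive constant would not change which sequences survive.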

See below for an example score function with annotated code:

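# Prewritten helpers can be imported from evopro.score_funcs.score_funcs, as at the top of score_binder.py.
# get_coordinates_pdb, score_binder_rmsd and score_binder_rmsd_to_starting must also be in scope
# (see the imports and definitions in score_binder.py).
from evopro.score_funcs.score_funcs import score_plddt_confidence, score_contacts_pae_weighted

def score_overall(results, dsobj, contacts=None, distance_cutoffs=None):
    #This is the primary scoring function that is called in the main code. The purpose of this
    #function is to determine whether it is the complex prediction (with both binder and
    #target) or the monomeric (just binder) prediction that is being evaluated and return the
    #corresponding score. The main code will then pair the complex and monomer predictions and
    #sum the scores from each.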
    from alphafold.common import protein
    #The results here are a list of results dictionaries
    #one for each AF2 prediction generated for the same design sequence
    #so this number should match the number of items in af2_preds
    print("Number of predictions being scored:", len(results))
    
    #change binder chain here if target protein has more than one chain
    binder_chain="B"
    #add path and filename to pdb here if you want to calculate RMSD of binder to something
    starting_pdb = None

    score=[]
    pdbs = []
    #parsing through each dictionary and scoring them based on which prediction it is
    for result in results:
        #Generate the pdb from the alphafold output
        pdb = protein.to_pdb(result['unrelaxed_protein'])
        pdbs.append(pdb)
        #Parse chains and residues from the pdb
        #chains will look like ["A", "B"] and residues like ["A1", "A2", ..., "B1", "B2", ...]
        chains, residues, resindices = get_coordinates_pdb(pdb)
        if len(chains)>1: #if we are scoring the complex
            #Here, contacts and distance_cutoffs come as arguments from the main code
            #(see more descriptions of both inside this function below)
            score.append(score_binder_complex(result, dsobj, contacts, distance_cutoffs, binder_chain=binder_chain))
        else:
            #If the prediction has only one chain call the monomer scoring function.
            score.append(score_binder_monomer(result, dsobj))
            if starting_pdb:
                score.append(score_binder_rmsd_to_starting(pdb, starting_pdb, dsobj=dsobj, binder_chain=binder_chain))
    
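    #Binder RMSD between the first two predictions; only possible if at least two were made
    if len(pdbs) > 1:
        score.append(score_binder_rmsd(pdbs[0], pdbs[1], dsobj=dsobj, binder_chain=binder_chain))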

    #you can weight individual values here too
    overall_score = sum([x[0] for x in score])

    #Return values. Here, first term is the overall score to use for ranking and second
    #term has the individual score components that will be printed out in the log file and used for plotting.
    #NOTE: pdbs and results MUST be returned and must always be the last 2 elements.
    return overall_score, score, pdbs, results
 
def score_binder_complex(results, dsobj, contacts, distance_cutoffs, binder_chain="B"):
    #This is the complex scoring function that is called from the primary score function.

    #Here, contacts is a tuple containing three items, all of which come from user inputs:
    #(interface_contacts, bonus_contacts, penalty_contacts)
    #Interface contacts come from the flag --define_contact_area, where the user defines
    #which residues on the target protein are being targeted for binding.
    #Bonus contacts come from the optional flag --bonus_contacts, where the user defines
    #residues that are given a "bonus" if contacted by the binder.
    #For this, we usually use a single residue in the middle of the target interface to help
    #select for binders that are in the right place.
    #Penalty contacts come from the optional flag --penalize_contacts, where the user defines
    #residues that are given a "penalty" if contacted by the binder.

    #distance_cutoffs also contains 3 elements representing the Angstrom cutoff for each
    #of these types of contacts. The default is (4,4,8) and can be changed through the flags.

    from alphafold.common import protein
    pdb = protein.to_pdb(results['unrelaxed_protein'])
    chains, residues, resindices = get_coordinates_pdb(pdb)
 
    #setting defaults if no user input
    if not contacts:
        contacts=(None,None,None)
    if not distance_cutoffs:
        distance_cutoffs=(4,4,8)
 
    #Get interface contacts on target protein
    reslist1 = contacts[0]
    #Get residues on binder (here, all of chain B) that can bind to the interface contacts
    reslist2 = [x for x in residues.keys() if x.startswith(binder_chain)]
   
    #Get list of interacting residues and PAE-weighted contact score
    contact_list, contactscore = score_contacts_pae_weighted(results, pdb, reslist1, reslist2, dsobj=dsobj, first_only=False, dist=distance_cutoffs[0])
   
    bonuses = 0
    #Get bonus contacts on target protein
    bonus_resids = contacts[1]
    if bonus_resids:
        bonus_contacts, bonus_contactscore = score_contacts_pae_weighted(results, pdb, bonus_resids, reslist2, dsobj=dsobj, first_only=False, dist=distance_cutoffs[1])
        for contact in bonus_contacts:
            if contact[0][0:1] == 'A' and int(contact[0][1:]) in bonus_resids:
                bonuses += 1
                print("bonus found at: " + str(contact[0]))
            if contact[1][0:1] == 'A' and int(contact[1][1:]) in bonus_resids:
                bonuses += 1
                print("bonus found at: " + str(contact[1]))
       
    #Give each bonus contact a score value of -3.
    bonus = -bonuses * 3
   
    penalties = 0
    penalty_resids = contacts[2]
    if penalty_resids:
        penalty_contacts, penalty_contactscore = score_contacts_pae_weighted(results, pdb, penalty_resids, reslist2, dsobj=dsobj, first_only=False, dist=distance_cutoffs[2])
        for contact in penalty_contacts:
            if contact[0][0:1] == 'A' and int(contact[0][1:]) in penalty_resids:
                penalties += 1
                print("penalty found at: " + str(contact[0]))
            if contact[1][0:1] == 'A' and int(contact[1][1:]) in penalty_resids:
                penalties += 1
                print("penalty found at: " + str(contact[1]))
     
    #Give each penalty contact a score value of +3
    penalty = penalties * 3
   
    #Get the average PAE of interface contacts (pae_interaction)
    #(count the contacts actually found, not the 3-item user input tuple)
    num_contacts = len(contact_list)
    pae_per_contact = 0
    if num_contacts > 0:
        pae_per_contact = (70.0-(70.0*contactscore)/num_contacts)/2
   
    #Calculate the weighted score to return
    score = -contactscore + penalty + bonus
 
    return score, (score, len(contact_list), contactscore, pae_per_contact, bonus, penalty), contact_list, pdb, results
 
def score_binder_monomer(results, dsobj):
    #This is the monomer scoring function that is called from the primary score function.
 
    from alphafold.common import protein
    pdb = protein.to_pdb(results['unrelaxed_protein'])
 
    #Parse chains and residues from the pdb
    chains, residues, resindices = get_coordinates_pdb(pdb)
 
    #Select residues to be included in average pLDDT score
    #Here, we select all residues in the binder-only AF2 prediction
    reslist2 = [x for x in residues.keys()]
    confscore2 = score_plddt_confidence(results, reslist2, resindices, dsobj=dsobj, first_only=False)
 
    #calculate overall score to be returned from this function
    #We divide by 10 since pLDDT values are generally in the range 80-100,
    #so this term does not dominate over the complex scores (in the 10s range)
    score = -confscore2/10
 
    return score, (score, confscore2), pdb, results
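score_binder_complex above only recognizes bonus and penalty residues on chain A and compares integer residue numbers. The hypothetical helper below sketches a chain-agnostic version of the same counting, assuming each contact's first two entries are residue IDs such as "A57" (as the example's contact[0] and contact[1] suggest) and that bonus or penalty residues are supplied as full IDs. As in the example, each hit would then be weighted by -3 (bonus) or +3 (penalty).

```python
def count_flagged_contacts(contact_list, flagged_residues):
    """Count contact endpoints that land on flagged target residues.

    contact_list: contacts whose first two entries are residue IDs such as
    ("A57", "B12") (assumed format). flagged_residues: IDs such as {"A57"}.
    """
    flagged = set(flagged_residues)
    hits = 0
    for contact in contact_list:
        # Check both ends of each contact, as score_binder_complex does.
        for res_id in contact[:2]:
            if res_id in flagged:
                hits += 1
    return hits


if __name__ == "__main__":
    contacts = [("A57", "B3"), ("B10", "A60"), ("A61", "B4")]
    bonus = -3 * count_flagged_contacts(contacts, {"A57", "A60"})
    penalty = 3 * count_flagged_contacts(contacts, {"A61"})
    print(bonus, penalty)
```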

Contact

Amrita Nallathambi - [email protected]
Since I am a grad student, this code may not be perfect. Please email me with any questions or concerns!



evopro's Issues

How to use multichain target?

Hello.
I am trying to work with a multichain target.
I tried generating the MSA of the target complex (2 chains), supplying it as a custom MSA, and running just the AF2 step (no GA).

--input_dir input
--output_dir AFmult/predictions
--params_dir /home/amin/softwares/Protein-Design/params-2022-12-06/
--compress_output
--use_ptm
--msa_mode single_sequence
--custom_msa_path /home/amin/Work/Project_42/msas_complex
--max_recycle 4
--save_timing
--num_models 5
--num_seeds 5

But this doesn't seem to work as the target is unfolded. If I use a polyglycine linker to link the two chains and use the msa of the joint construct, it works well, but I would like to use multimer to model the complex as it might be more accurate.
I see there is an example in the dev branch.
https://github.com/Kuhlman-Lab/evopro/blob/dev/evopro/examples/example/af2_flags.txt
From the example, it seems like I should provide templates.
Could you please tell me what format the msa should be in and how to specify the multi chain templates?
I would be really grateful for any suggestions.
Best,
Amin.

Include relaxation in the design process

Hello.
Thanks again for this awesome work.
I am wondering if it's possible to include a relaxation step in the design process.
My thought is that since MPNN is dependent on the backbone coordinates, the performance might improve if the backbone is allowed to relax in each generation.
This also seems to be indicated in https://www.nature.com/articles/s41467-023-38328-5

Could you give me some pointers on where I should start to include a relaxation step using amber or openmm in the design protocol?

Can vary_length be used with the stable branch?

Hello.
I would like to use the vary_length option to refine my binder.
It seems that the argument is not accepted by the stable branch but works with the dev branch.
I can change the script to accept it but I wanted to know if there is something wrong with using this with the stable branch and that's why this option has not been enabled in the stable branch.
I would really appreciate your suggestions.
Best,
Amin.

Format of custom-msa

Hello.
Thanks for making this awesome project available.
I can run the example scripts if I remove the
--msa_mode
--custom_msa_path
part.
But, this results in the receptor not folding properly.
I would like to use the protocol described in the paper and give a custom MSA for the receptor.
I tried generating an MSA for the receptor alone and supplying it by giving the path to the folder containing the a3m file, but I get an error

Traceback (most recent call last):
  File "/home/amin/anaconda3/envs/af2_mpnn2/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/amin/anaconda3/envs/af2_mpnn2/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/amin/softwares/EVOPRO/evopro/evopro/utils/distributor.py", line 128, in _worker_loop
    result = f(val)
  File "/home/amin/softwares/EVOPRO/alphafold/run/run_af2.py", line 243, in af2
    raw_inputs_from_sequence = getRawInputs(
  File "/home/amin/softwares/EVOPRO/alphafold/run/features.py", line 68, in getRawInputs
    custom_msas.update(getCustomMSADict(custom_msa_path))
  File "/home/amin/softwares/EVOPRO/alphafold/run/features.py", line 483, in getCustomMSADict
    custom_msa_dict[seq].append(line)
KeyError: None

My MSA file looks like this

#214    1
>101
AFTVTVPKDLYVVEYGSNMTIECKFPVEKQLDLAALIVYWEMEDKNIIQFVHGEEDLKVQHSSYRQRARLLKDQLSLGNAALQITDVKLQDAGVYRCMISYGGADYKRITVKVNAPYNKINQRILVVDPVTSEHELTCQAEGYPKAEVIWTSSDHQVLSGKTTTTNSKREEKLFNVTSTLRINTTTNEIFYCTFRRLDPEENHTAELVIPELPL
>UniRef100_UPI001B349335        187     0.771   9.977E-50       0       213     214     39      251     311
AFTITVPKDQYVVEYGSNVTIECKFPVEKPLDLNSLVVYWEKGGKQIIQFVHGKEDPKVQHGSYRQRARLLEDRFYEGIAALQITDVKLQDAGIYCCLISYGGADYKRITLKVNAPYHKINQRISV-DPVTSEHELTCQAEGYPEAEVIWTNRDHQILHGKTILTSSNREEQFFNVTSTLRVNATANEIYYCTFQRPGPEENNTAELIIPEPPV
>UniRef100_A4GW17       185     0.962   4.804E-49       0       213     214     17      230     290
AFTVTVPKDLYVVEYGSNMTIECKFPVEKQLDLTSLIVYWEMEDKNIIQFVHGEEDLKVQHSNYRQRAQLLKDQLSLGNAALRITDVKLQDAGVYRCMISYGGADYKRITVKVNAPYNKINQRILVVDPVTSEHELTCQAEGYPKAEVIWTSSDHQVLSGKTTTTNSKREEKLLNVTSTLRINTTANEIFYCIFRRLDPEENHTAELVIPELPL
>UniRef100_A0A8B7U6V9   184     0.822   1.689E-48       0       213     214     17      229     268

Could you please guide me on the format of the MSA file to be supplied? Or, if you can share the MSA that you used with the example given in the repository, I can try to understand the format.
Best,
Amin.

Add LICENSE file?

Hi,

Have you chosen a license for this code?
If so could you add a license file?
Apologies if I missed it.

Cheers,
Jason

Guidance Needed: Clarification on EvoPro Scaffold Handling and Multi-GPU Execution Strategy

Dear EvoPro Team,

Firstly, I'd like to extend my gratitude for your exceptional contributions to the field of computational protein design. I am currently engaging with EvoPro to develop my own protein binders and have encountered some uncertainties about the initial scaffold sequence processing and the optimal utilization of EvoPro on a dual-GPU workstation setup with 90 starting scaffolds.

Issue 1: Confusion Regarding Initial Scaffold Handling in EvoPro

Based on the available documentation and my understanding, there appear to be two interpretations of how EvoPro processes starting scaffold sequences:

Version 1: It seems that EvoPro begins with a mixed pool of all initial scaffolds (51 in the provided example), which are collectively processed in subsequent iterations. Here, scaffolds are not treated independently but as part of a common evolving pool.

Version 2: Alternatively, the documentation suggests that each scaffold is processed in separate trajectories, with each trajectory starting from a distinct scaffold, and no mixing occurs between these trajectories.

Given my experimentation, where I started with a single scaffold for 100 iterations and observed minimal variation in the output scaffold compared to the input, I am inclined to believe that Version 1 might be the actual approach. However, this contradicts my expectations based on Version 2's interpretation. Could you provide detailed clarification on this process?

Issue 2: Guidance on Running EvoPro with Dual GPUs

Furthermore, I seek guidance on efficiently running EvoPro on a workstation equipped with two GPUs, particularly when starting with a large number of scaffolds (90 in my case). Are there specific configurations or strategies recommended for such a setup to optimize computation and results?

Your detailed insights into these queries will be immensely helpful for my ongoing research projects. Thank you for your time and assistance.

Best regards,
Eric

How to use score_binder_rmsd_to_starting

Hello.
Thanks again for this awesome work.
Sorry for editing the issue. I posted the wrong file and error message earlier.
I am finding that for some complexes, the binding protein can move quite a bit from its starting position. So, I tried to use score_binder_rmsd_to_starting, but I am not able to figure out how exactly to use it.
I tried

--num_iter 10
--pool_size 20
--num_gpus 2
--mpnn_temp 0.1
--score_file /home/amin/softwares/Protein-Design/EVOPRO/evopro/evopro/score_funcs/score_binder.py
--score_func score_binder
--rmsd_to_starting score_binder_rmsd_to_starting ../../Structures/Ahly_loop_pep_renum.pdb
--path_to_starting ../../Structures/Ahly_loop_pep_renum.pdb
--define_contact_area A57,A58,A59,A60,A61,A62,A63,A198,A200,A201,A204,A205,A206,A207,A214
--af2_preds_extra B
--mpnn_freq 5
--plot_scores_avg
--plot_confidences

But I get an error

Traceback (most recent call last):
  File "/home/amin/softwares/Protein-Design/EVOPRO/evopro/evopro/run/run_geneticalg_gpus.py", line 474, in <module>
    run_genetic_alg_gpus(input_dir, input_dir + flagsfile, scorefunc, starting_seqs, poolsizes=pool_sizes, 
  File "/home/amin/softwares/Protein-Design/EVOPRO/evopro/evopro/run/run_geneticalg_gpus.py", line 186, in run_genetic_alg_gpus
    rmsd_score_list.append(rmsd_to_starting_func(bscore[-2], rmsd_to_starting_pdb, dsobj=dsobj))
  File "/home/amin/softwares/Protein-Design/EVOPRO/evopro/evopro/score_funcs/score_binder.py", line 99, in score_binder_rmsd_to_starting
    rmsd_to_starting = get_rmsd(reslist1, pdb_string_starting, reslist2, pdb, ca_only=True, dsobj=dsobj)
  File "/home/amin/softwares/Protein-Design/EVOPRO/evopro/evopro/score_funcs/score_funcs.py", line 217, in get_rmsd
    rmsd = kabsch_rmsd(A, B, translate=translate)
  File "/home/amin/softwares/Protein-Design/EVOPRO/evopro/evopro/score_funcs/calculate_rmsd.py", line 387, in kabsch_rmsd
    P = kabsch_rotate(P, Q)
  File "/home/amin/softwares/Protein-Design/EVOPRO/evopro/evopro/score_funcs/calculate_rmsd.py", line 409, in kabsch_rotate
    U = kabsch(P, Q)
  File "/home/amin/softwares/Protein-Design/EVOPRO/evopro/evopro/score_funcs/calculate_rmsd.py", line 471, in kabsch
    C = np.dot(np.transpose(P), Q)
  File "<__array_function__ internals>", line 5, in dot
ValueError: shapes (3,303) and (17,3) not aligned: 303 (dim 1) != 17 (dim 0)

Could you please tell me the correct way of using this function.

Best,
Amin
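For context on the error above: Kabsch superposition, which calculate_rmsd.py uses here, requires two coordinate sets with the same number of points in the same order, and the traceback shows 303 points being compared with 17. A minimal NumPy sketch of Kabsch RMSD with an explicit shape check makes that requirement visible. It is a standalone illustration, not EvoPro's implementation.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal superposition.

    Both arrays must describe the same N atoms in the same order; a mismatch
    in N is what produces the "shapes ... not aligned" error above.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.ndim != 2 or P.shape[1] != 3 or P.shape != Q.shape:
        raise ValueError(f"need matching (N, 3) arrays, got {P.shape} and {Q.shape}")
    # Center both sets on their centroids.
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    # SVD of the 3x3 covariance matrix.
    U, _, Vt = np.linalg.svd(P.T @ Q)
    # Flip the last axis if needed so R is a proper rotation (no reflection).
    d = np.sign(np.linalg.det(U @ Vt))
    R = U @ np.diag([1.0, 1.0, d]) @ Vt
    diff = P @ R - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))
```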

jaxlib.xla_extension.XlaRuntimeError: INTERNAL: Could not find the corresponding function

Hi, I am very interested in your wonderful project, but I encounter the error below:

 python   /dat1/apps/evopro/evopro-stable/evopro/run/run_geneticalg_gpus.py @evopro.flags
Reading ./residue_specs.json
No starting sequences file provided. Initial pool will be generated by random mutation.
initializing distributor
Iteration 1: Creating new sequences by random mutation.
initialization of process 0
WARNING:tensorflow:From /training/nong/app/miniconda3/envs/evopro/lib/python3.9/site-packages/tensorflow/python/autograph/pyct/static_analysis/liveness.py:83: Analyzer.lamba_check (from tensorflow.python.autograph.pyct.static_analysis.liveness) is deprecated and will be removed after 2023-09-23.
Instructions for updating:
Lambda fuctions will be no more assumed to be used in the statement where they are used, or at least in the same block. https://github.com/tensorflow/tensorflow/issues/56089
2023-07-20 09:28:24.561441: E external/org_tensorflow/tensorflow/compiler/xla/stream_executor/gpu/asm_compiler.cc:114] *** WARNING *** You are using ptxas 10.1.243, which is older than 11.1. ptxas before 11.1 is known to miscompile XLA code, leading to incorrect results or invalid-address errors.

2023-07-20 09:28:24.584896: W external/org_tensorflow/tensorflow/compiler/xla/stream_executor/gpu/asm_compiler.cc:231] Falling back to the CUDA driver for PTX compilation; ptxas does not support CC 8.6
2023-07-20 09:28:24.584928: W external/org_tensorflow/tensorflow/compiler/xla/stream_executor/gpu/asm_compiler.cc:234] Used ptxas at ptxas
2023-07-20 09:28:24.592690: E external/org_tensorflow/tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:628] failed to get PTX kernel "shift_right_logical" from module: CUDA_ERROR_NOT_FOUND: named symbol not found
2023-07-20 09:28:24.592757: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2153] Execution of replica 0 failed: INTERNAL: Could not find the corresponding function
Process Process-1:
Traceback (most recent call last):
  File "/training/nong/app/miniconda3/envs/evopro/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/training/nong/app/miniconda3/envs/evopro/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/dat1/apps/evopro/evopro-stable/evopro/utils/distributor.py", line 123, in _worker_loop
    f = f_init(proc_id, arg_file, lengths)
  File "/dat1/apps/evopro/alphafold-main/run/run_af2.py", line 139, in af2_init
    _ = predictStructure(
  File "/dat1/apps/evopro/alphafold-main/run/model.py", line 423, in predictStructure
    prediction = model_runner.predict(
  File "/training/nong/app/miniconda3/envs/evopro/lib/python3.9/site-packages/alphafold/model/model.py", line 169, in predict
    result = self.apply(self.params, jax.random.PRNGKey(random_seed), feat)
  File "/training/nong/app/miniconda3/envs/evopro/lib/python3.9/site-packages/jax/_src/random.py", line 132, in PRNGKey
    key = prng.seed_with_impl(impl, seed)
  File "/training/nong/app/miniconda3/envs/evopro/lib/python3.9/site-packages/jax/_src/prng.py", line 267, in seed_with_impl
    return random_seed(seed, impl=impl)
  File "/training/nong/app/miniconda3/envs/evopro/lib/python3.9/site-packages/jax/_src/prng.py", line 580, in random_seed
    return random_seed_p.bind(seeds_arr, impl=impl)
  File "/training/nong/app/miniconda3/envs/evopro/lib/python3.9/site-packages/jax/core.py", line 329, in bind
    return self.bind_with_trace(find_top_trace(args), args, params)
  File "/training/nong/app/miniconda3/envs/evopro/lib/python3.9/site-packages/jax/core.py", line 332, in bind_with_trace
    out = trace.process_primitive(self, map(trace.full_raise, args), params)
  File "/training/nong/app/miniconda3/envs/evopro/lib/python3.9/site-packages/jax/core.py", line 712, in process_primitive
    return primitive.impl(*tracers, **params)
  File "/training/nong/app/miniconda3/envs/evopro/lib/python3.9/site-packages/jax/_src/prng.py", line 592, in random_seed_impl
    base_arr = random_seed_impl_base(seeds, impl=impl)
  File "/training/nong/app/miniconda3/envs/evopro/lib/python3.9/site-packages/jax/_src/prng.py", line 597, in random_seed_impl_base
    return seed(seeds)
  File "/training/nong/app/miniconda3/envs/evopro/lib/python3.9/site-packages/jax/_src/prng.py", line 832, in threefry_seed
    lax.shift_right_logical(seed, lax_internal._const(seed, 32)))
  File "/training/nong/app/miniconda3/envs/evopro/lib/python3.9/site-packages/jax/_src/lax/lax.py", line 515, in shift_right_logical
    return shift_right_logical_p.bind(x, y)
  File "/training/nong/app/miniconda3/envs/evopro/lib/python3.9/site-packages/jax/core.py", line 329, in bind
    return self.bind_with_trace(find_top_trace(args), args, params)
  File "/training/nong/app/miniconda3/envs/evopro/lib/python3.9/site-packages/jax/core.py", line 332, in bind_with_trace
    out = trace.process_primitive(self, map(trace.full_raise, args), params)
  File "/training/nong/app/miniconda3/envs/evopro/lib/python3.9/site-packages/jax/core.py", line 712, in process_primitive
    return primitive.impl(*tracers, **params)
  File "/training/nong/app/miniconda3/envs/evopro/lib/python3.9/site-packages/jax/_src/dispatch.py", line 115, in apply_primitive
    return compiled_fun(*args)
  File "/training/nong/app/miniconda3/envs/evopro/lib/python3.9/site-packages/jax/_src/dispatch.py", line 200, in <lambda>
    return lambda *args, **kw: compiled(*args, **kw)[0]
  File "/training/nong/app/miniconda3/envs/evopro/lib/python3.9/site-packages/jax/_src/dispatch.py", line 895, in _execute_compiled
    out_flat = compiled.execute(in_flat)
jaxlib.xla_extension.XlaRuntimeError: INTERNAL: Could not find the corresponding function

raise BadZipFile("File is not a zip file")

Iteration 1: Creating new sequences.
work list [[['WNPPTFSPALLVVTEGDNATFTCSFSNTSESFHVVWHRESPSGQTDTLAAFPEDRSQPGQDSRFRVTQLPNGRDFHMSVVRARRNDSGTYVCGVISLAPKIQIKESLRAELRVTERRAE', 'DREEAKERIKELLELVKRVSEEERRELLREARRLAERVNDPEARRLVEELERLIKEL']], [['WNPPTFSPALLVVTEGDNATFTCSFSNTSESFHVVWHRESPSGQTDTLAAFPEDRSQPGQDSRFRVTQLPNGRDFHMSVVRARRNDSGTYVCGVISLAPKIQIKESLRAELRVTERRAE', 'DREEAKERIKEMLELVKRVSEEERRELLREEVKLAERVNDGEARNLVEELERLQKEL']], [['WNPPTFSPALLVVTEGDNATFTCSFSNTSESFHVVWHRESPSGQTDTLAAFPEDRSQPGQDSRFRVTQLPNGRDFHMSVVRARRNDSGTYVCGVISLAPKIQIKESLRAELRVTERRAE', 'DREEAKARIKEMLELVKRVSWEERRELLREARRLAERVNDFEARRLTEELERSIKEL']], [['WNPPTFSPALLVVTEGDNATFTCSFSNTSESFHVVWHRESPSGQTDTLAAFPEDRSQPGQDSRFRVTQLPNGRDFHMSVVRARRNDSGTYVCGVISLAPKIQIKESLRAELRVTERRAE', 'DREEAKERRKHLLELVKRVSLEERRELLREARRVAERVNDPEAARLVQELERLRKEL']], [['WNPPTFSPALLVVTEGDNATFTCSFSNTSESFHVVWHRESPSGQTDTLAAFPEDRSQPGQDSRFRVTQLPNGRDFHMSVVRARRNDSGTYVCGVISLAPKIQIKESLRAELRVTERRAE', 'DREEAKERIKELLELVKSVSEPERRELLFEMRRLAERVNDPEARRLVSSLESLIKGL']], [['WNPPTFSPALLVVTEGDNATFTCSFSNTSESFHVVWHRESPSGQTDTLAAFPEDRSQPGQDSRFRVTQLPNGRDFHMSVVRARRNDSGTYVCGVISLAPKIQIKESLRAELRVTERRAE', 'DREEAKERIKEELELVKRVSEEERRELLREARRIAERVNDPDSSRIVEELERLIKEL']]]
initialization of process 0
Process Process-1:
Traceback (most recent call last):
  File "/home/samuel/anaconda3/envs/evopro/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/samuel/anaconda3/envs/evopro/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/samuel/Documents/evopro/evopro/utils/distributor.py", line 116, in _worker_loop
    f = f_init(proc_id, arg_file, lengths)
  File "/home/samuel/Documents/alphafold/run/run_af2.py", line 125, in af2_init
    model_runner = getModelRunner(
  File "/home/samuel/Documents/alphafold/run/model.py", line 88, in getModelRunner
    params = data.get_model_haiku_params(model_name, params_dir)
  File "/home/samuel/anaconda3/envs/evopro/lib/python3.9/site-packages/alphafold/model/data.py", line 31, in get_model_haiku_params
    params = np.load(io.BytesIO(f.read()), allow_pickle=False)
  File "/home/samuel/anaconda3/envs/evopro/lib/python3.9/site-packages/numpy/lib/npyio.py", line 444, in load
    ret = NpzFile(fid, own_fid=own_fid, allow_pickle=allow_pickle,
  File "/home/samuel/anaconda3/envs/evopro/lib/python3.9/site-packages/numpy/lib/npyio.py", line 190, in __init__
    _zip = zipfile_factory(fid)
  File "/home/samuel/anaconda3/envs/evopro/lib/python3.9/site-packages/numpy/lib/npyio.py", line 103, in zipfile_factory
    return zipfile.ZipFile(file, *args, **kwargs)
  File "/home/samuel/anaconda3/envs/evopro/lib/python3.9/zipfile.py", line 1266, in __init__
    self._RealGetContents()
  File "/home/samuel/anaconda3/envs/evopro/lib/python3.9/zipfile.py", line 1333, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file

I get this error after running the EvoPro command.
af2.flags:
--output_dir outputs/predictions
--params_dir /home/samuel/Documents/alphafold/alphafold/model_weights
--compress_output
--use_ptm
--msa_mode single_sequence
--num_models 1
--max_recycle 1
--save_timing
--design_run
evopro.flags:
--num_iter 3
--pool_size 6
--num_gpus 1
--score_file /home/samuel/Documents/evopro/evopro/run/score_file.py
--score_func score_overall
--rmsd_func score_binder_rmsd
--af2_preds_extra B
--define_contact_area A31-A58,A88-A108
--mpnn_freq 2
--plot_scores_avg
--plot_confidences

I don't understand which file is supposed to be a zip file.
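The traceback ends in get_model_haiku_params in alphafold/model/data.py, which calls np.load on an AlphaFold2 weights file from --params_dir, so that weights file is the one expected to be a zip (.npz) archive. NumPy's np.load treats a file that starts with the zip signature as an .npz archive and opens it with zipfile, so BadZipFile usually points to a weights file that is incomplete or corrupted, for example after an interrupted download or extraction. The helper below is a hypothetical diagnostic sketch, not part of EvoPro or AlphaFold; it assumes the weights are the standard AlphaFold params_*.npz files stored somewhere under params_dir.

```python
import zipfile
from pathlib import Path
from typing import List


def find_bad_param_files(params_dir: str) -> List[Path]:
    """Return every .npz file under params_dir that is not a readable zip archive.

    Hypothetical helper: an .npz file is a zip archive, so any file listed here
    would likely make np.load() raise zipfile.BadZipFile.
    """
    bad = []
    # rglob searches params_dir and all of its subdirectories.
    for path in sorted(Path(params_dir).rglob("*.npz")):
        # is_zipfile looks for the zip end-of-archive record, which a
        # truncated download does not have.
        if not zipfile.is_zipfile(path):
            bad.append(path)
    return bad
```

For example, find_bad_param_files("/home/samuel/Documents/alphafold/alphafold/model_weights") uses the --params_dir value from the af2.flags above; any path it returns is a candidate for re-downloading. An empty result when the directory contains no .npz files at all would suggest the weights are somewhere else instead.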

Prediction of Multimers with AF2 Template Mode

Hello,

Thank you for this repository; it has been very helpful for running AlphaFold in single-sequence template mode. I've been using the script run_af2_dist.py directly for structure prediction. However, I have only gotten monomers to work. I would like to predict multimers using this script. What flags should I provide to AF2? And how should sequences.csv be formatted to predict complexes? I've copied my current flags below. Any help would be greatly appreciated. Thanks!

--output_dir /scratch/ja961/Test-nomsa/template_test/Output/
--params_dir /scratch/ja961/af2_params/
--use_ptm
--num_models 1
--max_recycle 1
--msa_mode single_sequence
--use_templates
--custom_template_path /scratch/ja961/evopro/evopro/run/pdz_template/
--design_run
--use_multimer_v2

Bug?

Traceback (most recent call last):
  File "/home/samuel/Documents/EvoPro/evopro/evopro/run/generate_json.py", line 263, in <module>
    pdbids, chain_seqs = parse_pdbfile(args.pdb)
  File "/home/samuel/Documents/EvoPro/evopro/evopro/run/generate_json.py", line 32, in parse_pdbfile
    chains, residues, resindices = get_coordinates_pdb_old(filename, fil=True)
  File "/home/samuel/Documents/EvoPro/evopro/evopro/utils/pdb_parser.py", line 185, in get_coordinates_pdb_old
    with open(pdb,"r") as f:
TypeError: expected str, bytes or os.PathLike object, not NoneType

When I try running python /path/to/evopro/run/generate_json.py @JSON.flags, I get this error. I've already set the local paths at the top of each script and downloaded everything.
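The last frame shows open() receiving None, and the first frame shows that value coming from args.pdb. argparse leaves an optional flag at its default (None here) when it is never supplied, so one plausible cause is that JSON.flags has no --pdb line, or spells it differently from what this version of generate_json.py expects. The sketch below is a hypothetical parser, not the code in generate_json.py: the --pdb flag name is inferred from args.pdb in the traceback, and splitting each flag-file line on whitespace is an assumption about how the @JSON.flags file is read. It shows how a missing flag ends up as None and how an early check turns that into a readable error.

```python
import argparse


class FlagFileParser(argparse.ArgumentParser):
    """Hypothetical parser that reads '@file' flag files."""

    def convert_arg_line_to_args(self, arg_line):
        # By default argparse treats each line of an @file as one argument;
        # splitting on whitespace lets a line like "--pdb input.pdb" work.
        return arg_line.split()


def get_args(argv=None):
    parser = FlagFileParser(fromfile_prefix_chars="@")
    # Assumed flag name, inferred from args.pdb in the traceback.
    parser.add_argument("--pdb", type=str, default=None)
    args = parser.parse_args(argv)
    if args.pdb is None:
        # Exit with a clear message instead of failing later in open(None).
        parser.error("--pdb was not set; add '--pdb /path/to/input.pdb' to the flags file")
    return args
```

With a flags file containing the line --pdb complex.pdb, get_args(["@JSON.flags"]).pdb returns "complex.pdb"; without that line the parser exits with a usage message rather than the TypeError above.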
