Comments (3)

jproney commented on July 21, 2024

Hello! Thanks for the question. The way to handle this scenario is to create a template with the full sequence length, but use the template_all_atom_masks feature to indicate which residues are fully or partially missing from the template. I believe the existing code should be able to handle this scenario, although clearly something is going wrong in this case. The following logic in the score_decoy function is meant to accommodate missing residues so long as the residues that do exist match the target sequence:

  decoy_seq_in = "".join([residue_constants.restypes[x] for x in decoy_prot.aatype]) # the sequence in the decoy PDB file

  mismatch = False
  if decoy_seq_in == target_seq:
    # full-length decoy: PDB residue numbers must run 1..len(target_seq)
    assert jnp.all(decoy_prot.residue_index - 1 == np.arange(len(target_seq)))
  else: # case when template is missing some residues
    if args.verbose:
      print("Sequence mismatch: {}".format(name))
    mismatch = True

    # each present residue, looked up by its 1-based PDB number, must match the target sequence
    assert "".join(target_seq[i-1] for i in decoy_prot.residue_index) == decoy_seq_in

Is one of these assertions failing? If so, there may be a problem with the residue numbering in the PDB file: each residue's number should match its (1-based) position in the target sequence, regardless of whether some residues are missing. Any more specific information you can provide about the issue would be helpful. Thanks!
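To make the full-length-template idea concrete, here is a minimal NumPy sketch that scatters the residues a PDB does contain into full-length arrays, so the all-atom mask (the role played by template_all_atom_masks) is zero wherever a residue is missing. The helper name `build_full_length_template`, the 37-atom-slot layout (modelled on AlphaFold's atom37 convention) and the input shapes are assumptions for illustration; this is not af2rank's actual code. It reuses the same 1-based numbering check as the second assertion above.

```python
import numpy as np

NUM_ATOM_TYPES = 37  # assumed: AlphaFold-style atom37 layout


def build_full_length_template(target_seq, present_seq, residue_index,
                               atom_positions, atom_mask):
    """Hypothetical helper: place resolved residues into full-length arrays.

    target_seq: full target sequence, length L
    present_seq: one-letter sequence of the residues present in the PDB
    residue_index: 1-based PDB residue numbers of those residues
    atom_positions: (n_present, 37, 3) coordinates
    atom_mask: (n_present, 37), 1.0 where an atom is resolved
    """
    residue_index = np.asarray(residue_index, dtype=int)
    L = len(target_seq)
    if np.any(residue_index < 1) or np.any(residue_index > L):
        raise ValueError("PDB residue number outside the target sequence")
    # Same consistency check as score_decoy: each present residue, looked up
    # by its PDB number, must match the target sequence.
    if "".join(target_seq[i - 1] for i in residue_index) != present_seq:
        raise ValueError("PDB residue numbering does not match target_seq")

    positions = np.zeros((L, NUM_ATOM_TYPES, 3), dtype=np.float32)
    masks = np.zeros((L, NUM_ATOM_TYPES), dtype=np.float32)
    rows = residue_index - 1  # 1-based PDB numbers -> 0-based rows
    positions[rows] = atom_positions
    masks[rows] = atom_mask
    # Missing residues keep an all-zero mask, i.e. they are "fully missing".
    return positions, masks
```

A partially resolved residue (e.g. missing side-chain atoms) is handled the same way: only the unresolved atom slots in its mask row are zero.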

from af2rank.

luhong88 commented on July 21, 2024

Okay, I guess there's a higher-level question here. So far I've been running af2rank through a local installation of the Colab notebook code (https://colab.research.google.com/github/sokrypton/ColabDesign/blob/main/af/examples/AF2Rank.ipynb#scrollTo=UCUZxJdbBjZt). A quick glance at test_templates.py suggests that it is not quite the same as what the Colab notebook provides. Should I switch over to test_templates.py?

Edit: after reading through the code more carefully, it seems to me that the main differences between test_templates.py and the colab notebook are:

  • test_templates.py directly calls the alphafold module, while the colab notebook goes through colabdesign
  • test_templates.py cannot take in multimers, while the notebook can.
  • test_templates.py can take in templates with missing residues and will produce a predicted structure with the missing residues filled in by AlphaFold, whereas the notebook ignores missing residues.

But in the end, both test_templates.py and the notebook will create fake CB coordinates for glycines, and both have the option to mask template sequence and sidechain atom coordinates. Is this accurate?
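For reference, a "fake" CB for glycine is commonly placed from the backbone N, CA and C atoms with fixed ideal-geometry coefficients. The sketch below uses the constants popularised by trRosetta; whether af2rank and the notebook use exactly these coefficients is an assumption, and `virtual_cb` is a hypothetical name.

```python
import numpy as np


def virtual_cb(n, ca, c):
    """Place an idealized CB from backbone N, CA, C coordinates.

    Accepts single residues (shape (3,)) or batches (shape (L, 3)).
    Coefficients are the trRosetta ideal-geometry constants (assumed here).
    """
    n = np.asarray(n, dtype=float)
    ca = np.asarray(ca, dtype=float)
    c = np.asarray(c, dtype=float)
    b = ca - n            # N -> CA bond vector
    c_vec = c - ca        # CA -> C bond vector
    a = np.cross(b, c_vec)  # normal to the N-CA-C plane
    return -0.58273431 * a + 0.56802827 * b - 0.54067466 * c_vec + ca
```

On an idealized backbone (N-CA about 1.46 Å, CA-C about 1.53 Å, N-CA-C about 111°) this places CB roughly 1.5 Å from CA, i.e. where a real C-beta would sit.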

jproney commented on July 21, 2024

I think your assessment is accurate, although the notebook can handle templates with missing residues with a few minor modifications! I've put together the following notebook to handle missing residues:

https://colab.research.google.com/drive/1lFg0zem4-dm70JdZEhXJuigZb0NFW3fH?usp=sharing

The notebook contains an example of ranking a template with deleted residues, which should give some insight into how to address this issue. In this new notebook, calling af.predict(pdb=pdb_path, seq=seq) ensures that seq is used as the target sequence. The template residues are then correctly indexed and masked within the native sequence, as long as the PDB residues are numbered correctly. Essentially, each residue's number in the PDB must be its index within the full sequence, so a PDB with unresolved residues will skip some numbers (see the example in the notebook for reference).
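Because the mapping depends entirely on residue numbering, it can help to check a PDB before ranking it. Below is a minimal pure-Python sketch, assuming a single-chain PDB with standard fixed-column ATOM records and no insertion codes; `check_pdb_numbering` is a hypothetical helper, not part of af2rank or ColabDesign. It reads each residue's number and name from its CA atom and verifies that residue number i is target_seq[i-1], so gaps in the numbering are allowed but shifted numbering is caught.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def check_pdb_numbering(pdb_text, target_seq):
    """Return the 1-based residue numbers that are missing from the PDB.

    Raises ValueError if a resolved residue's number does not point at
    the matching amino acid in target_seq.
    """
    present = {}
    for line in pdb_text.splitlines():
        # Fixed PDB columns: atom name 13-16, residue name 18-20, resSeq 23-26.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            resnum = int(line[22:26])
            present[resnum] = THREE_TO_ONE.get(line[17:20], "X")
    for resnum, aa in present.items():
        if not 1 <= resnum <= len(target_seq):
            raise ValueError(f"residue {resnum} is outside the target sequence")
        if target_seq[resnum - 1] != aa:
            raise ValueError(
                f"residue {resnum} is {aa} in the PDB but "
                f"{target_seq[resnum - 1]} in the target sequence")
    return [i for i in range(1, len(target_seq) + 1) if i not in present]
```

An empty return value means the template covers the whole sequence; otherwise the listed positions are the ones that will be masked out.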

@sokrypton, if all seems well, could you integrate this update to the notebook into the ColabDesign repo?
