
Comments (9)

ricomnl commented on June 10, 2024

@xinformatics The first section of the article "The AlphaFold2 Method Paper: A Fount of Good Ideas" suggests that s_i is the embedding you want to use. This corresponds to the 'single' key in the prediction_result['representations'] dict.

At every step of the process, {s_i} is kept updated, communicating back and forth with {z_{ij}}, so that whatever is built up in {z_{ij}} is made accessible to {s_i}. As a result {s_i} is front and center in all the major modules. And at the end, in the structure module, it is ultimately {s_i}, not {z_{ij}}, that encodes the structure (where the quaternions get extracted to generate the structure). This avoids the awkwardness of having to project the 2D representation onto 3D space.
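A minimal sketch of pulling out that per-residue embedding. The helper name `get_single_embedding` and the mock dictionary are hypothetical, and the channel widths (384 and 128) are assumptions used only to build a stand-in; a real `prediction_result` would come from the model runner.

```python
import numpy as np


def get_single_embedding(prediction_result):
    """Return the per-residue 'single' representation as a NumPy array."""
    try:
        reps = prediction_result["representations"]
    except KeyError:
        # The key is only present when the model is asked to return it.
        raise KeyError(
            "no 'representations' entry; was the model run with "
            "return_representations=True?"
        ) from None
    return np.asarray(reps["single"])


# Stand-in for a real prediction result (shapes are assumed, not measured).
num_res, single_dim, pair_dim = 10, 384, 128
mock_result = {
    "representations": {
        "single": np.zeros((num_res, single_dim), dtype=np.float32),
        "pair": np.zeros((num_res, num_res, pair_dim), dtype=np.float32),
    }
}

s = get_single_embedding(mock_result)
print(s.shape)  # one row per residue
```

The 'single' array has one row per residue, whereas 'pair' has one entry per residue pair, which is why the former is the more convenient starting point for a sequence-level embedding.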

from alphafold.

ptynecki commented on June 10, 2024

I would like to make the question more precise:

How can we run the AF2 pipeline to get a fixed-length numeric vector that represents a single AA sequence?
If that is possible, should we expect the AA sequence length to be capped at 512, 1280, or some other limit?
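One possible way to get a fixed-length vector, sketched here as an assumption rather than something this thread confirms, is to average a per-residue representation over the residue axis. The function name `pool_embedding` and the 384-channel width are hypothetical; the optional mask is there in case padded positions need to be excluded.

```python
import numpy as np


def pool_embedding(single, mask=None):
    """Average a (num_res, channels) array over residues -> (channels,)."""
    single = np.asarray(single, dtype=np.float64)
    if mask is None:
        return single.mean(axis=0)
    mask = np.asarray(mask, dtype=np.float64)
    total = mask.sum()
    if total == 0:
        raise ValueError("mask selects no residues")
    # Weight each residue row by its mask value before averaging.
    return (single * mask[:, None]).sum(axis=0) / total


# Two mock sequences of different lengths give vectors of the same size.
short = np.ones((50, 384))
long = np.ones((700, 384))
v_short = pool_embedding(short)
v_long = pool_embedding(long)
print(v_short.shape, v_long.shape)
```

Mean pooling discards positional information, so whether it is adequate depends on the downstream task. It does not by itself say anything about a maximum sequence length.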


xinformatics commented on June 10, 2024

@tfgg Could you suggest which representation would be a good choice as a protein embedding for downstream tasks? I get 5 different representations from the prediction result.


russbates commented on June 10, 2024

Hi,
Although returning the final representations/embeddings is not currently exposed in the RunModel container, it should be possible to enable it by adding a return_representations=True keyword argument here:
https://github.com/deepmind/alphafold/blob/d26287ea57e1c5a71372f42bf16f486bb9203068/alphafold/model/model.py#L64


ricomnl commented on June 10, 2024

Ah, interesting! I'm looking at a similar task. Two things I'll look at are 1) "turning off" the recycling step (doing only one pass) and 2) using only one of the models (instead of running all 7 and then selecting the best-scoring one, as they do in the provided AlphaFold.ipynb).

[...]
model_names = ['model_1', 'model_2', 'model_3', 'model_4', 'model_5', 'model_2_ptm']

[...]
for model_name in model_names:
   [...]

[...]
# Find the best model according to the mean pLDDT
best_model_name = max(plddts.keys(), key=lambda x: plddts[x].mean())

[...]
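The selection step in the excerpt above can be tried in isolation. This is a small sketch with made-up pLDDT arrays (the values are illustrative, not real predictions); the helper name `best_model` is hypothetical.

```python
import numpy as np


def best_model(plddts):
    """Return the model name whose per-residue pLDDT has the highest mean."""
    return max(plddts, key=lambda name: float(np.mean(plddts[name])))


# Made-up per-residue pLDDT values for three models.
plddts = {
    "model_1": np.array([70.0, 80.0, 90.0]),
    "model_2": np.array([85.0, 88.0, 91.0]),
    "model_3": np.array([60.0, 95.0, 75.0]),
}

best_model_name = best_model(plddts)
print(best_model_name)

# To run a single model instead, the list that the loop iterates over
# can simply be shortened; no ranking is needed in that case.
model_names = ["model_1"]
```

With only one entry in `model_names`, the loop runs once and the ranking step becomes a no-op, which is the cheaper option when only the embeddings are needed.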


xinformatics commented on June 10, 2024

I didn't run the actual model; I was using the Jupyter notebook provided by @sokrypton. He suggested editing class AlphaFold (located in alphafold/model/modules.py) to set return_representations=True.

In the Jupyter notebook he provided,

prediction_result = model_runner.predict(processed_feature_dict)

returns prediction_result as a dictionary that includes a 'representations' key:

prediction_result.keys()
dict_keys(['distogram', 'experimentally_resolved', 'masked_msa', 'predicted_lddt', 'representations', 'structure_module', 'plddt'])

The 'representations' entry is itself a dictionary:

prediction_result['representations'].keys()
dict_keys(['msa', 'msa_first_row', 'pair', 'single', 'structure_module'])

It contains the learned representations, although I am not sure which one to use. I hope this helps.
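When deciding between the five entries, it can help to look at their shapes first. The sketch below uses a mock dictionary with the same keys; the helper name `describe_representations` is hypothetical and the array sizes are assumptions chosen only to illustrate which entries are per-residue, per-pair, or per-MSA-row.

```python
import numpy as np


def describe_representations(representations):
    """Map each representation name to the shape of its array."""
    return {
        name: np.asarray(value).shape
        for name, value in representations.items()
    }


# Mock arrays: 4 MSA rows, 10 residues; channel widths are assumed.
num_seq, num_res = 4, 10
mock_representations = {
    "msa": np.zeros((num_seq, num_res, 256)),
    "msa_first_row": np.zeros((num_res, 256)),
    "pair": np.zeros((num_res, num_res, 128)),
    "single": np.zeros((num_res, 384)),
    "structure_module": np.zeros((num_res, 384)),
}

shapes = describe_representations(mock_representations)
for name, shape in shapes.items():
    print(f"{name:>17}: {shape}")
```

On a real result, calling `describe_representations(prediction_result['representations'])` would show the actual dimensions for the sequence at hand.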


ptynecki commented on June 10, 2024

@tfgg
Is there any reason why this thread was closed? @xinformatics shared some tips, but the main questions still haven't been answered.

Thank you for considering.


xinformatics commented on June 10, 2024

@rmeinl Thank you so much. I was thinking along similar lines. Actually, the problem in my case is that I only need the representations (not the final PDB output), and I cannot figure out how to run AF2 prediction in a loop. I have 964 sequences and want to avoid running AF2 manually on each one. The embedding extraction is available on my GitHub: Alphafold
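A rough sketch of what such a loop could look like. Everything here is hypothetical scaffolding: `read_fasta`, `embed_all`, and `fake_embed` are illustrative names, and `fake_embed` is a placeholder that returns zeros. In practice it would be replaced by a function that builds the features for one sequence, runs the model, and returns the chosen representation.

```python
import tempfile
from pathlib import Path

import numpy as np


def read_fasta(text):
    """Parse FASTA text into {record_id: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # id is the first token of the header
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def embed_all(records, embed_fn, out_dir):
    """Embed every sequence and save one .npy file per record."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, sequence in records.items():
        out_path = out_dir / f"{name}.npy"
        if out_path.exists():  # already done, so an interrupted run can resume
            continue
        np.save(out_path, embed_fn(sequence))
        written.append(out_path)
    return written


def fake_embed(sequence):
    # Placeholder only: a real version would run the model on `sequence`.
    return np.zeros((len(sequence), 384), dtype=np.float32)


fasta_text = ">seq1\nMKT\nAYI\n>seq2 example\nGSH\n"
records = read_fasta(fasta_text)
with tempfile.TemporaryDirectory() as tmp:
    paths = embed_all(records, fake_embed, tmp)
    saved_shapes = {p.stem: np.load(p).shape for p in paths}
print(saved_shapes)
```

Saving one file per sequence and skipping files that already exist means a crash partway through 964 sequences does not force a restart from the beginning.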


pykao commented on June 10, 2024

Hi @xinformatics,

I set return_representations=True in alphafold/model/modules.py, relaunched the Docker container, and ran the same experiment again. However, feature.pkl is still the same. Could you please point out which Jupyter notebook ColabFold uses to generate the protein embedding?
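One quick check is whether a given pickle holds a 'representations' entry at all. A pickle of input features would not be expected to contain one, since the features are built before the model runs; the sketch assumes, without confirmation from this thread, that model outputs are written to a separate pickle. The helper name `has_representations` and the file names below are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path


def has_representations(pickle_path):
    """Return True if the pickled object is a dict with 'representations'."""
    with open(pickle_path, "rb") as handle:
        data = pickle.load(handle)
    return isinstance(data, dict) and "representations" in data


# Demonstration with two stand-in files (contents are mock data).
with tempfile.TemporaryDirectory() as tmp:
    features_path = Path(tmp) / "features_example.pkl"
    result_path = Path(tmp) / "result_example.pkl"
    with open(features_path, "wb") as handle:
        pickle.dump({"aatype": [0, 1, 2]}, handle)
    with open(result_path, "wb") as handle:
        pickle.dump({"plddt": [90.0], "representations": {"single": []}}, handle)

    in_features = has_representations(features_path)
    in_result = has_representations(result_path)

print(in_features, in_result)
```

Only unpickle files from a trusted source, since loading a pickle can execute arbitrary code.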

Best,
Po-Yu

