Is it possible to obtain the last activation values using …



Protein Embedding with last activation layers? (alphafold issue, closed)

google-deepmind commented on June 10, 2024

Protein Embedding with last activation layers?


Comments (9)

ricomnl commented on June 10, 2024

@xinformatics The first section of the article "The AlphaFold2 Method Paper: A Fount of Good Ideas" suggests that s_i is the embedding you want to use. This corresponds to the 'single' key in the prediction_result['representations'] dict.

At every step of the process, {s_i} is kept updated, communicating back and forth with {z_{ij}}, so that whatever is built up in {z_{ij}} is made accessible to {s_i}. As a result {s_i} is front and center in all the major modules. And at the end, in the structure module, it is ultimately {s_i}, not {z_{ij}}, that encodes the structure (where the quaternions get extracted to generate the structure). This avoids the awkwardness of having to project the 2D representation onto 3D space.
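A minimal sketch of pulling out the per-residue s_i ('single') representation, assuming prediction_result is the dict returned by model_runner.predict with representations enabled. The helper name get_single_representation and the 384-channel width used in the demo are illustrative assumptions, and the demo uses random numbers rather than real model output:

```python
import numpy as np


def get_single_representation(prediction_result):
    """Return the per-residue 'single' representation (s_i) as a NumPy array.

    Hypothetical helper: assumes prediction_result is the dict produced by
    model_runner.predict(...) with representations enabled.
    """
    reps = prediction_result.get("representations")
    if reps is None:
        raise KeyError(
            "No 'representations' key in prediction_result; "
            "representations were probably not enabled for this run."
        )
    single = np.asarray(reps["single"])
    # Expect one row per residue: shape (num_residues, num_channels).
    if single.ndim != 2:
        raise ValueError(f"expected a 2-D array, got shape {single.shape}")
    return single


# Demo with a fake result (random numbers standing in for real model output).
num_residues, num_channels = 120, 384  # channel width is an assumption
fake_result = {
    "representations": {"single": np.random.rand(num_residues, num_channels)}
}
s_i = get_single_representation(fake_result)
print(s_i.shape)  # (120, 384)
```

The explicit KeyError makes it obvious when a run was made without representations, rather than failing later with a less helpful message.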


ptynecki commented on June 10, 2024

To make the question more precise:

How can we run the AF2 pipeline to get a fixed-length numeric vector that represents a single amino-acid (AA) sequence?
If that is possible, should we expect the AA sequence length to be limited to 512, 1280, or some other maximum?
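One common way to turn per-residue features into a fixed-length vector is to average over the residue axis; this pooling step is an assumption here, not something AlphaFold does itself. A sketch, assuming a (num_residues, num_channels) array such as the 'single' representation; the function name mean_pool and its optional mask argument are hypothetical:

```python
import numpy as np


def mean_pool(per_residue, mask=None):
    """Average an (L, C) per-residue array over L to get a length-C vector.

    If mask is given (shape (L,), 1 for real residues, 0 for padding),
    padded positions are excluded from the average.
    """
    per_residue = np.asarray(per_residue, dtype=np.float64)
    if mask is None:
        return per_residue.mean(axis=0)
    mask = np.asarray(mask, dtype=np.float64)
    total = (per_residue * mask[:, None]).sum(axis=0)
    return total / max(mask.sum(), 1.0)


# Sequences of different lengths map to vectors of the same length.
short = np.ones((50, 384))
long = np.full((900, 384), 3.0)
print(mean_pool(short).shape, mean_pool(long).shape)  # (384,) (384,)
print(mean_pool(long)[0])  # 3.0
```

Mean pooling discards positional detail, so whether it is adequate probably depends on the downstream task.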


xinformatics commented on June 10, 2024

@tfgg Could you suggest which representation would be a good choice as a protein embedding for downstream tasks? I get 5 different representations from the prediction result.


russbates commented on June 10, 2024

Hi,
Although the ability to return the final representations/embeddings is not currently exposed in the RunModel container, it should be possible to enable it by adding a return_representations=True keyword argument here:
https://github.com/deepmind/alphafold/blob/d26287ea57e1c5a71372f42bf16f486bb9203068/alphafold/model/model.py#L64


ricomnl commented on June 10, 2024

Ah, interesting! I'm looking at a similar task. Two things I'll look at are 1) "turning off" the recycling step (doing only one pass) and 2) using only one of the models (instead of running all of them and then selecting the best-scoring one, as the provided AlphaFold.ipynb does).

[...]
model_names = ['model_1', 'model_2', 'model_3', 'model_4', 'model_5', 'model_2_ptm']

[...]
for model_name in model_names:
   [...]

[...]
# Find the best model according to the mean pLDDT
best_model_name = max(plddts.keys(), key=lambda x: plddts[x].mean())

[...]
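If predictions are run for several models anyway, the same pLDDT-based selection can choose which model's embedding to keep. A hedged sketch, assuming results maps each model name to its prediction_result, and that each result has a per-residue 'plddt' array and a 'representations' dict; the helper name best_model_embedding is hypothetical and the demo uses made-up numbers:

```python
import numpy as np


def best_model_embedding(results):
    """Pick the model with the highest mean pLDDT and return (name, single rep)."""
    best_name = max(results, key=lambda name: np.mean(results[name]["plddt"]))
    return best_name, results[best_name]["representations"]["single"]


# Made-up results for two models on a 3-residue toy protein.
results = {
    "model_1": {"plddt": np.array([70.0, 80.0, 90.0]),
                "representations": {"single": np.zeros((3, 4))}},
    "model_2_ptm": {"plddt": np.array([85.0, 88.0, 91.0]),
                    "representations": {"single": np.ones((3, 4))}},
}
name, single = best_model_embedding(results)
print(name)  # model_2_ptm
```

If embeddings will be compared across proteins, always taking them from the same model is probably more consistent than selecting a different model per protein.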


xinformatics commented on June 10, 2024

I didn't run the actual model; I was using the Jupyter notebook provided by @sokrypton. He suggested editing class AlphaFold (located in alphafold/model/modules.py) to set return_representations=True.

In the Jupyter notebook he provided,

prediction_result = model_runner.predict(processed_feature_dict)

returns prediction_result as a dictionary that includes a 'representations' key:

prediction_result.keys()
dict_keys(['distogram', 'experimentally_resolved', 'masked_msa', 'predicted_lddt', 'representations', 'structure_module', 'plddt'])

The 'representations' entry is itself a dictionary, and

prediction_result['representations'].keys() outputs
dict_keys(['msa', 'msa_first_row', 'pair', 'single', 'structure_module'])

It contains the learned representations, although I am not sure which one to use. Hope it helps.
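A sketch for seeing what each of the five entries holds, by listing its shape. The shapes in the demo (MSA depth 8, 256/128/384 channels) are assumptions chosen for illustration; on a real result, call summarize on prediction_result['representations'] instead:

```python
import numpy as np


def summarize(representations):
    """Return {name: shape} for each array in a representations dict."""
    return {name: np.shape(value) for name, value in sorted(representations.items())}


# Fake arrays for a 60-residue protein; depths and channel widths are assumed.
L = 60
fake_reps = {
    "msa": np.zeros((8, L, 256)),            # (num_msa_rows, L, C): per sequence, per residue
    "msa_first_row": np.zeros((L, 256)),     # (L, C): first (query) row of the MSA stack
    "pair": np.zeros((L, L, 128)),           # (L, L, C): one vector per residue pair
    "single": np.zeros((L, 384)),            # (L, C): one vector per residue
    "structure_module": np.zeros((L, 384)),  # (L, C): one vector per residue
}
for name, shape in summarize(fake_reps).items():
    print(f"{name:18s} {shape}")
```

Under these assumed shapes, the (L, C) entries are per-residue vectors that can be pooled directly, while 'pair' and 'msa' carry an extra axis that would have to be reduced first.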


ptynecki commented on June 10, 2024

@tfgg
Is there any reason why this thread was closed? @xinformatics shared some tips, but the main questions still haven't been answered.

Thank you for considering this.


xinformatics commented on June 10, 2024

@rmeinl Thank you so much. I was thinking along similar lines. Actually, the problem in my case is that I only need the representations (not the final PDB output), and I can't figure out how to run AF2 prediction in a loop. I have 964 sequences and want to avoid running AF2 manually on each one. The embedding extraction is available in my GitHub repository, Alphafold.
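Running the pipeline over many sequences is mostly a batch-loop problem. A minimal sketch, assuming a multi-record FASTA file and a hypothetical embed_fn(sequence) that wraps whatever AF2 feature-generation and model_runner.predict steps are in use and returns a per-residue array. Each result is saved as <id>.npy, and IDs that already have a file are skipped so an interrupted run can resume; the fake_embed stand-in is for illustration only:

```python
import os
import tempfile

import numpy as np


def read_fasta(path):
    """Yield (record_id, sequence) pairs; assumes every header has an ID token."""
    record_id, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if record_id is not None:
                    yield record_id, "".join(chunks)
                record_id, chunks = line[1:].split()[0], []
            else:
                chunks.append(line)
    if record_id is not None:
        yield record_id, "".join(chunks)


def embed_all(fasta_path, out_dir, embed_fn):
    """Call embed_fn on every sequence and save each result to out_dir/<id>.npy."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for record_id, sequence in read_fasta(fasta_path):
        # Assumes IDs are safe file names (no "/" characters).
        out_path = os.path.join(out_dir, f"{record_id}.npy")
        if os.path.exists(out_path):  # already done in an earlier run
            continue
        np.save(out_path, np.asarray(embed_fn(sequence)))
        written.append(record_id)
    return written


def fake_embed(sequence):
    # Hypothetical stand-in for a real AF2 call: one 384-wide zero vector per residue.
    return np.zeros((len(sequence), 384))


work_dir = tempfile.mkdtemp()
fasta_path = os.path.join(work_dir, "seqs.fasta")
with open(fasta_path, "w") as handle:
    handle.write(">seqA\nMKT\nAYI\n>seqB some description\nGSS\n")
first = embed_all(fasta_path, os.path.join(work_dir, "emb"), fake_embed)
second = embed_all(fasta_path, os.path.join(work_dir, "emb"), fake_embed)
print(first, second)  # ['seqA', 'seqB'] []
```

Writing one file per sequence keeps memory flat across hundreds of sequences and makes partial progress survive a crash.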


pykao commented on June 10, 2024

Hi @xinformatics,

I set return_representations=True within alphafold/model/modules.py, relaunched the Docker container, and ran the same experiment again. However, features.pkl is still the same. Could you please point out which Jupyter notebook ColabFold uses to generate the protein embedding?
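For context, features.pkl is written by the data pipeline before the model runs, so it would not be expected to change; if representations are enabled, they should appear in the pickled prediction result instead. An edit made on the host may also not reach the copy of the source inside the Docker image until the image is rebuilt. A sketch, assuming the result file follows a result_<model_name>.pkl naming scheme (check your output directory, since names may differ between versions); the helper name load_representations is hypothetical and the demo writes its own fake pickle:

```python
import os
import pickle
import tempfile

import numpy as np


def load_representations(output_dir, model_name):
    """Load result_<model_name>.pkl from output_dir and return its 'representations' dict."""
    path = os.path.join(output_dir, f"result_{model_name}.pkl")  # file naming is an assumption
    with open(path, "rb") as handle:
        result = pickle.load(handle)
    if "representations" not in result:
        raise KeyError(f"{path} has no 'representations'; were they enabled for this run?")
    return result["representations"]


# Demo: write a fake result file, then read it back.
out_dir = tempfile.mkdtemp()
fake = {"plddt": np.full(5, 90.0), "representations": {"single": np.zeros((5, 384))}}
with open(os.path.join(out_dir, "result_model_1.pkl"), "wb") as handle:
    pickle.dump(fake, handle)
reps = load_representations(out_dir, "model_1")
print(sorted(reps), reps["single"].shape)  # ['single'] (5, 384)
```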

Best,
Po-Yu

