Comments (9)
@xinformatics The first section of the article "The AlphaFold2 Method Paper: A Fount of Good Ideas" suggests that s_i is the embedding you want to use. This would correspond to the 'single' key in the prediction_result['representations'] dict.
At every step of the process, {s_i} is kept updated, communicating back and forth with {z_{ij}}, so that whatever is built up in {z_{ij}} is made accessible to {s_i}. As a result {s_i} is front and center in all the major modules. And at the end, in the structure module, it is ultimately {s_i}, not {z_{ij}}, that encodes the structure (where the quaternions get extracted to generate the structure). This avoids the awkwardness of having to project the 2D representation onto 3D space.
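One way to turn the per-residue {s_i} into a single vector per protein is to average over residues. The following is a minimal sketch, not anything AlphaFold itself does: mean pooling is my own choice here, the helper name pool_single_representation is hypothetical, and the 384-channel width of the 'single' representation is an assumption. Random arrays stand in for real model output.

```python
import numpy as np


def pool_single_representation(single: np.ndarray) -> np.ndarray:
    """Average a per-residue (num_res, channels) array over the residue axis."""
    if single.ndim != 2:
        raise ValueError(f"expected a 2D array, got shape {single.shape}")
    return single.mean(axis=0)


# Stand-ins for prediction_result['representations']['single'].
# The channel width of 384 is an assumption, not read from a real run.
rng = np.random.default_rng(0)
short_protein = rng.normal(size=(50, 384)).astype(np.float32)
long_protein = rng.normal(size=(700, 384)).astype(np.float32)

emb_short = pool_single_representation(short_protein)
emb_long = pool_single_representation(long_protein)

# Both vectors have the same length regardless of sequence length.
print(emb_short.shape, emb_long.shape)
```

Because the average runs over the residue axis, the output length depends only on the channel width, so sequences of different lengths give vectors of the same size.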
from alphafold.
To make the question more precise:
How can we run the AF2 pipeline to get a fixed-length numeric vector that represents a single AA sequence?
If that is possible, should we expect the AA sequence length to be limited to 512, 1280, or some other maximum?
from alphafold.
@tfgg Could you suggest which representation would be a good choice as a protein embedding for downstream tasks? I get 5 different representations from the prediction result.
from alphafold.
Hi,
Although the ability to return the final representations/embeddings is not currently exposed in the RunModel container, it should be possible to enable it by adding a return_representations=True keyword argument here:
https://github.com/deepmind/alphafold/blob/d26287ea57e1c5a71372f42bf16f486bb9203068/alphafold/model/model.py#L64
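To illustrate where the extra keyword argument would go, here is a minimal sketch that uses a stub in place of the real model call, so it runs without AlphaFold installed. Only return_representations comes from the comment above; the other keyword names (is_training, compute_loss, ensemble_representations) are my recollection of the linked function and should be treated as assumptions. Both function names are hypothetical.

```python
def stub_model_call(batch, is_training, compute_loss=False,
                    ensemble_representations=False,
                    return_representations=False):
    """Hypothetical stand-in for the model call made at the linked line."""
    output = {"structure_module": {}}
    if return_representations:
        # The real model would return arrays here; None is a placeholder.
        output["representations"] = {"single": None, "pair": None}
    return output


def run_with_representations(model_call, batch):
    """Forward pass with the one extra keyword argument switched on."""
    return model_call(
        batch,
        is_training=False,
        compute_loss=False,
        ensemble_representations=True,
        return_representations=True,  # the addition suggested above
    )


result = run_with_representations(stub_model_call, batch={})
print(sorted(result.keys()))
```

With the flag left at its default, the stub returns no 'representations' entry, which mirrors the behaviour described above.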
from alphafold.
Ah interesting! I'm looking at a similar task. Two things I'll look at are 1) "turning off" the recycling step (doing one pass only) and 2) using only one of the models (instead of all 7 and then selecting the best-scoring one, as they do in the provided AlphaFold.ipynb).
[...]
model_names = ['model_1', 'model_2', 'model_3', 'model_4', 'model_5', 'model_2_ptm']
[...]
for model_name in model_names:
[...]
[...]
# Find the best model according to the mean pLDDT
best_model_name = max(plddts.keys(), key=lambda x: plddts[x].mean())
[...]
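The selection step in the excerpt above can be tried in isolation. This is a minimal sketch with made-up pLDDT arrays; the helper name best_model_by_mean_plddt and the numbers are illustrative only and do not come from a real run.

```python
import numpy as np


def best_model_by_mean_plddt(plddts):
    """Return the model name whose per-residue pLDDT has the highest mean."""
    if not plddts:
        raise ValueError("no pLDDT arrays were provided")
    return max(plddts, key=lambda name: float(np.mean(plddts[name])))


# Made-up per-residue pLDDT values for three models.
plddts = {
    "model_1": np.array([70.0, 80.0, 90.0]),
    "model_2": np.array([85.0, 88.0, 91.0]),
    "model_3": np.array([60.0, 95.0, 75.0]),
}

best_model_name = best_model_by_mean_plddt(plddts)
print(best_model_name)
```

If only one model is run, as proposed in point 2, the dictionary has a single entry and this step becomes a no-op.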
from alphafold.
I didn't run the actual model, but I was using the Jupyter notebook provided by @sokrypton. He suggested editing class AlphaFold (located in alphafold/model/modules.py) to set return_representations=True.
In the Jupyter notebook he provided,
prediction_result = model_runner.predict(processed_feature_dict)
returns prediction_result as a dictionary that includes a 'representations' key:
prediction_result.keys()
dict_keys(['distogram', 'experimentally_resolved', 'masked_msa', 'predicted_lddt', 'representations', 'structure_module', 'plddt'])
The 'representations' entry is itself a dictionary, so
prediction_result['representations'].keys()
outputs
dict_keys(['msa', 'msa_first_row', 'pair', 'single', 'structure_module'])
It contains the learned representations, although I am not sure which one to use. Hope it helps.
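A quick way to compare the five entries is to print their shapes. The sketch below uses zero-filled placeholder arrays rather than real output; the channel widths (256, 128, 384) and the axis layout are assumptions on my part, and describe_representations is a hypothetical helper.

```python
import numpy as np


def describe_representations(representations):
    """Map each representation name to the shape of its array."""
    return {name: tuple(np.shape(value))
            for name, value in representations.items()}


# Placeholder arrays; sizes and channel widths are assumed, not measured.
num_seq, num_res = 8, 120
representations = {
    "msa": np.zeros((num_seq, num_res, 256), dtype=np.float32),
    "msa_first_row": np.zeros((num_res, 256), dtype=np.float32),
    "pair": np.zeros((num_res, num_res, 128), dtype=np.float32),
    "single": np.zeros((num_res, 384), dtype=np.float32),
    "structure_module": np.zeros((num_res, 384), dtype=np.float32),
}

shapes = describe_representations(representations)
for name, shape in shapes.items():
    print(f"{name:>18}: {shape}")
```

Running the same helper on a real prediction_result['representations'] would show which entries are per-residue and which are per-residue-pair, which matters when choosing one as an embedding.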
from alphafold.
@tfgg
Is there any reason why this thread was closed? @xinformatics shared some tips, but the main questions still haven't been answered.
Thank you for considering.
from alphafold.
@rmeinl Thank you so much. I was thinking along similar lines. The problem in my case is that I only need the representations (not the final PDB output), and I have not been able to figure out how to run AF2 prediction in a loop. I have 964 sequences and would like to avoid running AF2 manually on each one. The embedding extraction is available on my GitHub (Alphafold).
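A loop over many sequences that keeps only the embeddings could be structured roughly as follows. This is a sketch under stated assumptions: predict_fn is a hypothetical callable that wraps feature generation and model_runner.predict for one sequence, fake_predict is a random stand-in so the example runs without AlphaFold, the 384-channel width is assumed, and mean pooling is my own choice rather than part of the pipeline.

```python
import numpy as np


def embed_sequences(sequences, predict_fn):
    """Run predict_fn per sequence and keep one pooled vector for each.

    sequences: dict mapping a name to an amino-acid string.
    predict_fn: hypothetical callable returning a prediction_result dict
        that has a ['representations']['single'] array.
    """
    embeddings = {}
    for name, sequence in sequences.items():
        result = predict_fn(sequence)
        single = np.asarray(result["representations"]["single"])
        # Keep only the pooled vector so the full result can be discarded.
        embeddings[name] = single.mean(axis=0)
    return embeddings


def fake_predict(sequence):
    """Random stand-in for a real prediction; 384 channels is an assumption."""
    rng = np.random.default_rng(len(sequence))
    single = rng.normal(size=(len(sequence), 384))
    return {"representations": {"single": single}}


sequences = {"seq_a": "MKTAYIAKQR", "seq_b": "GSHMLEDPVDAFK"}
embeddings = embed_sequences(sequences, fake_predict)
print({name: vector.shape for name, vector in embeddings.items()})
```

The resulting dictionary can be written to disk in one call, for example with np.savez("embeddings.npz", **embeddings).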
from alphafold.
Hi @xinformatics,
I set return_representations=True within alphafold/model/modules.py, relaunched the Docker container, and ran the same experiment again. However, the feature.pkl is still the same. Could you please point out which Jupyter notebook ColabFold uses to generate the protein embedding?
Best,
Po-Yu
from alphafold.