Implementing Eigenfold paper (https://arxiv.org/pdf/2304.02198.pdf)
- Download PDB data
- Process PDB data
- Generate OmegaFold Embeddings on processed pdb data
- Train on those embeddings
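As a sketch of the first two steps, alpha-carbon coordinates can be extracted from a PDB file's fixed-width `ATOM` records (x/y/z live in columns 31–54, the atom name in columns 13–16). The parser below and its two-residue sample are illustrative only, not the repo's actual processing code:

```python
import numpy as np

def parse_ca_coords(pdb_text):
    """Extract alpha-carbon coordinates from PDB-format text.
    PDB fixed columns: atom name in cols 13-16, x/y/z in cols 31-54."""
    coords = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            coords.append([float(line[30:38]),
                           float(line[38:46]),
                           float(line[46:54])])
    return np.array(coords)

# Minimal two-residue example (coordinates fabricated for illustration)
sample = """\
ATOM      1  N   ALA A   1      11.104   6.134  -6.504  1.00  0.00           N
ATOM      2  CA  ALA A   1      11.639   6.071  -5.147  1.00  0.00           C
ATOM      3  C   ALA A   1      12.697   7.147  -4.930  1.00  0.00           C
ATOM      4  N   GLY A   2      12.500   8.100  -4.100  1.00  0.00           N
ATOM      5  CA  GLY A   2      13.400   9.200  -3.900  1.00  0.00           C
"""
ca = parse_ca_coords(sample)
print(ca.shape)  # one (x, y, z) row per residue
```

The two sample alpha carbons sit roughly 3.8 Å apart, the typical CA–CA bond distance the diffusion prior below is built around.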
The protein is modeled as a structure graph G = (V, E), where V is the set of residues and E is the set of edges connecting neighboring residues. The model learns G-dependent probability distributions under a forward diffusion process:
dx = -1/2 H x dt + dw
x denotes the coordinates of the alpha carbons, and H is chosen such that undesired, chemically implausible structures have high energy E(x):
E(x) = 1/2 xᵀHx = a/2 Σ_{(i,j)∈E} ||x_i − x_j||², i.e. H = aL where L is the graph Laplacian of G.
To enforce an RMS distance of 3.8 Å between adjacent alpha carbons:
a = 3 / 3.8² Å⁻², so that E[||x_i − x_j||²] = 3.8² Å² for adjacent residues under the stationary distribution.
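A minimal numerical sketch of this harmonic prior, assuming H = aL with L the Laplacian of the chain graph: the SDE above is an Ornstein–Uhlenbeck process whose stationary distribution is N(0, H⁺) (the zero eigenvalue of L is the free translation mode), so we can sample it directly in the eigenbasis and check that adjacent alpha carbons end up ~3.8 Å apart in RMS:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 50                   # residues in the chain
a = 3.0 / 3.8**2         # spring constant enforcing 3.8 A RMS between neighbors

# Graph Laplacian of the chain graph (residue i bonded to i+1)
L = np.zeros((n, n))
for i in range(n - 1):
    L[i, i] += 1.0; L[i + 1, i + 1] += 1.0
    L[i, i + 1] -= 1.0; L[i + 1, i] -= 1.0
H = a * L

# Per eigenmode, dx = -1/2 lambda x dt + dw has stationary variance 1/lambda;
# the lambda = 0 translation mode is dropped (zero variance here).
lam, P = np.linalg.eigh(H)
std = np.where(lam > 1e-8, 1.0 / np.sqrt(np.maximum(lam, 1e-8)), 0.0)

# Sample m structures; the 3 spatial dimensions are independent.
m = 5000
z = rng.standard_normal((m, n, 3)) * std[None, :, None]
x = np.einsum('ij,mjd->mid', P, z)   # rotate eigenbasis samples back

# RMS distance between adjacent alpha carbons should be ~3.8 A
d2 = np.sum((x[:, 1:] - x[:, :-1]) ** 2, axis=-1)
rms = float(np.sqrt(d2.mean()))
print(round(rms, 2))
```

This works because for a chain graph the effective resistance between adjacent nodes is 1, so each of the 3 coordinate differences has variance 1/a and E[||x_{i+1} − x_i||²] = 3/a = 3.8².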
Now we have an SDE and can train a score model. The score model is a graph neural network with message-passing layers between all residues; its inputs include not only residue coordinates but also featurized OmegaFold embeddings. The message-passing layers come from Tensor field networks: Rotation- and translation-equivariant neural networks, so the score model is equivariant to 3D rotations and translations!
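Tensor field network layers are fairly involved, but the equivariance property itself can be illustrated with a much simpler EGNN-style layer (a stand-in for illustration, not the paper's architecture): edge weights are computed only from rotation/translation-invariant quantities (node embeddings and squared distances), and coordinates are updated along pairwise displacement vectors, so transforming the input transforms the output identically:

```python
import numpy as np

rng = np.random.default_rng(1)

def layer(x, h, W):
    """One simplified equivariant message-passing layer.
    x: (n, 3) coordinates, h: (n, d) invariant features
    (e.g. per-residue OmegaFold embeddings), W: (2d+1, 1) weights."""
    n, d = h.shape
    diff = x[:, None] - x[None, :]                 # (n, n, 3) displacements
    d2 = np.sum(diff ** 2, axis=-1)                # (n, n) invariant distances
    hi = np.repeat(h[:, None], n, axis=1)          # (n, n, d) sender features
    hj = np.repeat(h[None, :], n, axis=0)          # (n, n, d) receiver features
    feat = np.concatenate([hi, hj, d2[..., None]], axis=-1)
    w = np.tanh(feat @ W)                          # (n, n, 1) invariant weights
    # Equivariant update: move each node along weighted displacement vectors
    return x + (w * diff).sum(axis=1) / n

x = rng.standard_normal((5, 3))
h = rng.standard_normal((5, 4))
W = rng.standard_normal((9, 1))

# Random proper rotation via QR decomposition
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
if np.linalg.det(Q) < 0:
    Q[:, 0] *= -1
t = rng.standard_normal(3)

out_rot = layer(x @ Q, h, W)          # rotate, then run the layer
rot_out = layer(x, h, W) @ Q          # run the layer, then rotate
out_shift = layer(x + t, h, W)        # translate, then run the layer
shift_out = layer(x, h, W) + t        # run the layer, then translate
print(np.allclose(out_rot, rot_out), np.allclose(out_shift, shift_out))
```

Because the per-edge weights depend only on invariants, rotating or translating the input commutes with the layer; the real tensor field network layers extend this idea to higher-order (spherical-harmonic) features.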