Comments (4)
Update: after fixing these issues, my results are:
RMSE: 1.238, MAE: 0.978, SD: 1.23, R: 0.824
Paper results: RMSE: 1.316 (0.031), MAE: 1.027 (0.025), SD: 1.312 (0.035), R: 0.797 (0.012)
from paddlehelix.
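Thanks for pointing out these issues.
- This was indeed a bug in the data processing code. We fixed it today and reran the experiments; in our environment the results are close to those reported in the paper:
RMSE: 1.311 (0.021), MAE: 1.021 (0.009), SD: 1.312 (0.027), R: 0.798 (0.008)
- During feature extraction we tried different methods for different data formats, and the wrong function was included when we released the code. The preprocessed dataset we actually generated is not affected.
- Our code does include a step that removes the case shown in the figure, so that a_01 and a_10 are not connected. This was also overlooked when we prepared the code for release.
bond_graph_base[range(num_bonds), [indices.index([x[1],x[0]]) for x in indices]] = 0
We will push the code with the fixes for the issues above soon.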
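For readers following along, here is a minimal sketch of what the one-liner above appears to do, written as a self-contained function. The function name `build_bond_graph`, the orientation of the two assignment matrices, and the assumption that every directed bond has its reverse in `indices` are illustrative guesses, not the repository's actual code.

```python
import numpy as np

def build_bond_graph(indices, num_atoms):
    """Hypothetical helper: bond-to-bond adjacency without reverse-bond links.

    indices: list of [src, dst] directed bonds; the reverse of every
    bond is assumed to be present in the list as well.
    """
    pairs = [list(p) for p in indices]
    num_bonds = len(pairs)
    # Assumed convention: assignment_b2a[b, a] = 1 if bond b ends at atom a,
    # assignment_a2b[a, b] = 1 if bond b starts at atom a.
    assignment_b2a = np.zeros((num_bonds, num_atoms), dtype=np.int64)
    assignment_a2b = np.zeros((num_atoms, num_bonds), dtype=np.int64)
    for b, (src, dst) in enumerate(pairs):
        assignment_b2a[b, dst] = 1
        assignment_a2b[src, b] = 1
    # bond_graph_base[i, j] = 1 if bond i ends at the atom where bond j starts.
    bond_graph_base = assignment_b2a @ assignment_a2b
    # Drop the link between each bond and its own reverse (a_01 -> a_10).
    reverse = [pairs.index([dst, src]) for src, dst in pairs]
    bond_graph_base[range(num_bonds), reverse] = 0
    return bond_graph_base
```

For a three-atom chain 0-1-2 with directed bonds `[[0, 1], [1, 0], [1, 2], [2, 1]]`, only the links 0->1 to 1->2 and 2->1 to 1->0 remain.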
from paddlehelix.
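OK, I'll try removing the self-adjacent edges. Besides reducing computation, does this have any other benefit?
P.S. Two small suggestions that can speed up data preprocessing:
- The bond2bond adjacency matrix is the matrix product of the two assignment matrices, which can be computed on the GPU, for example with cupy:
import cupy as cp
bond_graph_base = cp.matmul(cp.array(assignment_b2a, dtype='int8'), cp.array(assignment_a2b, dtype='int8')).get()
- The angles between pairs of edges can be computed in a vectorized way with numpy.
With these two changes, preprocessing takes about 0.1 s per sample on average.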
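The second suggestion could look roughly like the sketch below. The function name `bond_angles`, the array layouts, and the angle convention (angle between the two bond direction vectors) are assumptions made for illustration. The repository may define the angle differently, for example at the shared atom.

```python
import numpy as np

def bond_angles(coords, indices, bond_pairs):
    """Hypothetical helper: angles (radians) for many bond pairs at once.

    coords:     (num_atoms, 3) atom positions
    indices:    (num_bonds, 2) [src, dst] atom ids per directed bond
    bond_pairs: (num_pairs, 2) ids of the two bonds in each pair
    Bonds are assumed to have non-zero length.
    """
    coords = np.asarray(coords, dtype=np.float64)
    indices = np.asarray(indices)
    bond_pairs = np.asarray(bond_pairs)
    # Direction vector of every bond, computed in one indexing step.
    vec = coords[indices[:, 1]] - coords[indices[:, 0]]
    v1 = vec[bond_pairs[:, 0]]
    v2 = vec[bond_pairs[:, 1]]
    # Row-wise dot products and norms, no Python loop over pairs.
    dot = np.einsum("ij,ij->i", v1, v2)
    norms = np.linalg.norm(v1, axis=1) * np.linalg.norm(v2, axis=1)
    # Clip to guard against rounding slightly outside [-1, 1].
    return np.arccos(np.clip(dot / norms, -1.0, 1.0))
```

Because all pairs are handled by array operations, the cost of the Python-level loop disappears. The actual speedup depends on the number of bond pairs per sample.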
from paddlehelix.
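This is similar to removing self-loops in the atom graph. The main goal is to let the model learn the spatial distribution of the neighbors around each target atom/bond. Alternatively, these "self-adjacent edges" could be added as a separate domain; we plan to try different strategies later.
Many thanks for the series of suggestions 👍🏻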
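As a small illustration of the self-loop removal mentioned here, assuming the bond graph is stored as a dense square adjacency matrix like `bond_graph_base` in the snippets earlier in this thread (the helper name `remove_self_loops` is hypothetical):

```python
import numpy as np

def remove_self_loops(adj):
    """Hypothetical helper: return a copy of adj with a zero diagonal."""
    adj = np.array(adj, copy=True)
    # Zeroing the diagonal means no bond (or atom) is its own neighbor.
    np.fill_diagonal(adj, 0)
    return adj
```

Returning a copy keeps the original matrix intact, so the self-adjacent edges could still be routed into a separate domain if that strategy is tried later.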
from paddlehelix.