GO Annotation Vector about protein_bert HOT 8 OPEN

rhysnewell commented on July 4, 2024

GO Annotation Vector

from protein_bert.

Comments (8)

nadavbra commented on July 4, 2024

Hi @rhysnewell,

Again, I really apologize for this, but we haven't yet been able to recover the ordering of the GO annotation vector or retrain the model. If you have time to investigate, you could feed the model many protein sequences whose correct GO annotations you know and use its outputs to recover which bit corresponds to which GO annotation (I'm not 100% sure it will work, but it may be worth trying before resorting to retraining the whole thing).
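A minimal sketch of that recovery idea, assuming you already have a matrix of model outputs for proteins with known annotations: correlate every output column with every known GO term column and keep the best match. The names `match_outputs_to_terms`, `pred`, and `truth` are hypothetical, and getting `pred` out of the model is not shown. It does not enforce a one-to-one assignment, and GO terms that always co-occur (for example a parent and its child) may be indistinguishable this way, so treat low or tied correlations with suspicion.

```python
import numpy as np

def match_outputs_to_terms(pred, truth):
    """For each model output column, find the best-correlated known GO term.

    pred:  (n_proteins, n_outputs) predicted probabilities in [0, 1]
    truth: (n_proteins, n_terms) known 0/1 annotations for the same proteins
    Returns (best_term, best_corr), each of length n_outputs.
    """
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)
    # Center each column so that a normalized dot product is a Pearson correlation.
    p = pred - pred.mean(axis=0)
    t = truth - truth.mean(axis=0)
    p_norm = np.linalg.norm(p, axis=0)
    t_norm = np.linalg.norm(t, axis=0)
    # Constant columns carry no signal; dividing by inf maps them to correlation 0.
    p_norm[p_norm == 0] = np.inf
    t_norm[t_norm == 0] = np.inf
    corr = (p / p_norm).T @ (t / t_norm)  # shape: (n_outputs, n_terms)
    best_term = corr.argmax(axis=1)
    best_corr = corr[np.arange(corr.shape[0]), best_term]
    return best_term, best_corr

# Synthetic check: shuffle the term order, add noise, and try to recover the shuffle.
rng = np.random.default_rng(0)
truth = rng.integers(0, 2, size=(200, 5))
perm = rng.permutation(5)  # hidden ordering of the output vector
pred = np.clip(0.1 + 0.8 * truth[:, perm] + rng.normal(0, 0.05, size=(200, 5)), 0, 1)
best_term, best_corr = match_outputs_to_terms(pred, truth)
print("hidden order:   ", perm.tolist())
print("recovered order:", best_term.tolist())
```

On real data the signal will be far noisier than in this synthetic check, so the correlation values in `best_corr` are worth inspecting before trusting any single match.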

You should think about these 0-1 numbers as probabilities. A value of 0.1 would indicate that the model assigns a 10% probability to the protein having that annotation. Studying these values was not part of the paper, so I can't recommend any specific cutoffs (I think the right cutoff depends more on what you use it for).
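To make the cutoff point concrete, here is a small sketch of turning the probabilities into yes/no calls. The function name `call_annotations` and the example values are made up for illustration, and none of the cutoffs shown is a recommendation.

```python
import numpy as np

def call_annotations(probs, cutoff=0.5):
    """Return the indices of outputs whose probability is at least `cutoff`."""
    probs = np.asarray(probs, dtype=float)
    return np.flatnonzero(probs >= cutoff)

# Illustrative probabilities for four positions of the output vector.
probs = np.array([0.10, 0.97, 0.50, 0.995])
for cutoff in (0.5, 0.95, 0.99):
    print(cutoff, call_annotations(probs, cutoff).tolist())
```

A stricter cutoff keeps fewer, more confident calls; which trade-off is right depends on the downstream use.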

What do you mean when you say that the model is stringent?

I hope that at least somewhat helps (and again, really sorry about this loss of data..)


rhysnewell commented on July 4, 2024

Thanks for clarifying. Unfortunately, I don't think that would work as there are just too many annotations to go over and determine one by one.

So in the paper how did you decide whether the model correctly predicted a GO annotation then? Did you not have a cutoff that was used (even if arbitrary like 95% or something)?

The stringency comment was just a mistake on my part I think. I passed it some more genes and it was providing much more sensible probabilities (i.e. > 99%).

Also, regarding the loss of data: are you or your collaborators not using the model to make your own predictions? Maybe someone you work with has access to it, or has re-generated the model and kept the GO annotation vector.

Cheers,
Rhys


nadavbra commented on July 4, 2024

Just to clarify, I didn't mean going over the GO annotations by hand, but rather writing a program to do it (though it could still be a lot of work, so I totally get you).
We never directly assessed the predicted GO annotations in the paper. We just looked at the loss (cross entropy) which uses the continuous probabilities (so we never had to choose any cutoff).
My co-author Dan is planning to retrain an updated version of the model at some point, but I'm not sure when it's going to happen.
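For reference, a sketch of how a cross-entropy loss scores continuous probabilities without any cutoff. This is the standard per-label binary cross-entropy written with NumPy; the exact form used for the paper (reduction, weighting, clipping) is an assumption here, and `binary_cross_entropy` is a hypothetical helper name.

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip so that log(0) never occurs for probabilities of exactly 0 or 1.
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)
    losses = -(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))
    return float(losses.mean())

labels = [1, 0, 1, 0]
confident = [0.99, 0.01, 0.95, 0.05]  # close to the labels: low loss
hesitant = [0.60, 0.40, 0.55, 0.45]   # right side of 0.5 but unsure: higher loss
print(binary_cross_entropy(labels, confident))
print(binary_cross_entropy(labels, hesitant))
```

Both prediction sets would give identical calls at a 0.5 cutoff, yet the loss tells them apart, which is why evaluating on the loss alone never requires choosing a threshold.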


TheLostLambda commented on July 4, 2024

Hi @nadavbra !

If it's any help, I'm considering some retraining myself and have access to my university's HPC cluster (with some very powerful GPUs), which could help speed up the process. Do you know if your co-author Dan has an updated version he'd like to retrain?

I'm happy to use my university's HPC resources to do so (which might speed up the training), and this might help recover the GO vector and also open the door to some slightly larger models (which you hypothesized in the paper could further increase its power).

If you could put me in touch with Dan, I'd love to help out!


ddofer commented on July 4, 2024

Hi @TheLostLambda
We'd love to retrain ProteinBERT with improvements: a larger model, another convolution layer, an updated UniProt/UniRef90 dump, and ideally removing the GO annotations as an input (while keeping them as an output).
Nadav and I don't have a ton of capacity, at least when it comes to improving the input format (I can do the model hyperparameter and architecture tweaks easily, though). Would you be interested in collaborating on that?


TheLostLambda commented on July 4, 2024

Hi @ddofer ! I'll admit I'm somewhat new to machine learning as a field, but I do have a pretty solid computing background overall, so I'd be happy to give things a swing and try to help out! I'm happy to try to implement all of the improvements you've both already thought up and to train the model using my university's resources!

I'm definitely interested in collaborating and helping out however I can!


TheLostLambda commented on July 4, 2024

@ddofer @nadavbra Let me know if you'd like a meeting at some point to set things in motion!


a-ill commented on July 4, 2024

@nadavbra Any progress with recovering the vector or retraining the model?

