
Comments (8)

nadavbra commented on July 4, 2024

Hi @rhysnewell,

Again I really apologize for this, but we haven't yet been able to recover the ordering of the GO annotation vector or retrain the model. If you have the time to do some investigation, maybe you can feed the model with many protein sequences for which you know the correct GO annotations and try to use it to recover which bit corresponds to which GO annotation (I'm not 100% sure it will work, but maybe it's worth trying before resorting to retraining the whole thing..)
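A minimal sketch of that recovery idea, assuming you already have a matrix of model outputs and a matrix of known annotations for the same proteins. The function name `match_outputs_to_terms` and the toy data are hypothetical and not part of protein_bert; the sketch just picks, for each output position, the known GO term whose labels correlate best with that position's predicted values.

```python
import numpy as np

def match_outputs_to_terms(pred, known):
    """For each model output column, find the best-correlated known-term column.

    pred:  (n_proteins, n_outputs) predicted probabilities in [0, 1]
    known: (n_proteins, n_terms) binary matrix of known GO annotations
    Returns (best_term_index_per_output, best_correlation_per_output).
    """
    pred = np.asarray(pred, dtype=float)
    known = np.asarray(known, dtype=float)
    # Standardize columns; the small epsilon avoids division by zero for constant columns.
    pz = (pred - pred.mean(axis=0)) / (pred.std(axis=0) + 1e-12)
    kz = (known - known.mean(axis=0)) / (known.std(axis=0) + 1e-12)
    corr = pz.T @ kz / pred.shape[0]  # (n_outputs, n_terms) Pearson correlations
    return corr.argmax(axis=1), corr.max(axis=1)

# Toy check (made-up data): shuffle the columns of a label matrix, add noise,
# and see whether the shuffle can be recovered from correlations alone.
rng = np.random.default_rng(0)
known = (rng.random((500, 6)) < 0.3).astype(float)
perm = rng.permutation(6)
pred = np.clip(known[:, perm] * 0.8 + rng.random((500, 6)) * 0.2, 0.0, 1.0)
best_term, best_corr = match_outputs_to_terms(pred, known)
```

With real data, `pred` would be the model's annotation outputs for proteins with trusted GO labels. Rare terms would likely need many proteins to match reliably, and nothing here forces the mapping to be one-to-one, so low-correlation matches should be treated with suspicion.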

You should think about these 0-1 numbers as probabilities. A value of 0.1 would indicate that the model assigns 10% probability for the protein having that annotation. Studying these values was not part of the paper, so I can't recommend any specific cutoffs (I think it's more dependent on what you use it for).
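To make the probability reading concrete, here is a small sketch that applies a cutoff to one protein's outputs. The cutoff values and the array are made up for illustration; they are not recommendations from the paper.

```python
import numpy as np

def annotations_above_cutoff(probs, cutoff):
    """Return the output positions whose predicted probability is at least `cutoff`."""
    probs = np.asarray(probs, dtype=float)
    return np.flatnonzero(probs >= cutoff)

# Made-up model outputs for a single protein (one value per annotation position).
probs = np.array([0.02, 0.10, 0.97, 0.55, 0.999])
lenient = annotations_above_cutoff(probs, 0.5)   # keeps anything the model rates at 50% or more
strict = annotations_above_cutoff(probs, 0.99)   # keeps only near-certain predictions
```

A lower cutoff trades precision for recall, which is why the right value depends on the downstream use.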

What do you mean when you say that the model is stringent?

I hope that at least somewhat helps (and again, really sorry about this loss of data..)

from protein_bert.

rhysnewell commented on July 4, 2024

Thanks for clarifying. Unfortunately, I don't think that would work as there are just too many annotations to go over and determine one by one.

So in the paper how did you decide whether the model correctly predicted a GO annotation then? Did you not have a cutoff that was used (even if arbitrary like 95% or something)?

The stringency comment was just a mistake on my part I think. I passed it some more genes and it was providing much more sensible probabilities (i.e. > 99%).

Also, regarding the loss of data. Are you or collaborators not using your model to make their own predictions? Maybe someone you work with has access or has re-generated the model and kept the GO annotation vector.

Cheers,
Rhys


nadavbra commented on July 4, 2024
  1. Just to clarify, I didn't mean going over the GO annotations by hand, but rather writing a program to do it (but it still could be a lot of work, so I totally get you)
  2. We never directly assessed the predicted GO annotations in the paper. We just looked at the loss (cross entropy) which uses the continuous probabilities (so we never had to choose any cutoff).
  3. My co-author Dan is planning to retrain an updated version of the model at some point, but I'm not sure when it's going to happen.
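As a generic illustration of point 2, binary cross-entropy can be computed directly from the continuous probabilities, so no cutoff is needed. This is a plain NumPy sketch with made-up numbers, not the training code from the paper, which may weight or aggregate the loss differently.

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy over all (protein, annotation) entries."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip so that log(0) never occurs.
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob)))

# Two proteins, three annotation positions (made-up labels and predictions).
y_true = np.array([[1, 0, 0], [0, 1, 0]])
confident = np.array([[0.9, 0.1, 0.1], [0.1, 0.9, 0.1]])
unsure = np.array([[0.6, 0.4, 0.4], [0.4, 0.6, 0.4]])
loss_confident = binary_cross_entropy(y_true, confident)
loss_unsure = binary_cross_entropy(y_true, unsure)
```

Both prediction sets would give the same annotations under a 0.5 cutoff, but the loss still distinguishes them: the more confident correct predictions get the lower value.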


TheLostLambda commented on July 4, 2024

Hi @nadavbra !

If it's any help, I'm considering some retraining myself and have access to my university's HPC cluster (with some very powerful GPUs), which could help speed up the process. Do you know if your co-author Dan has an updated version he'd like to retrain?

I'm happy to use my university's HPC resources to do so (which might speed up the training), and this might help recover the GO vector and also open the door to some slightly larger models (which you hypothesized in the paper could further increase its power).

If you could put me in touch with Dan, I'd love to help out!


ddofer commented on July 4, 2024

Hi @TheLostLambda
We'd love to retrain ProteinBERT with improvements: a larger model, another convolution layer, an updated UniProt/UniRef90 dump, and ideally removing the GO annotations as an input (while keeping them as an output).
Nadav and I don't have a ton of capacity, at least at the level of improving the input format (I can do the model hyperparameter and architecture tweaks easily though). Would you be interested in collaborating on that?


TheLostLambda commented on July 4, 2024

Hi @ddofer ! I'll admit I'm somewhat new to machine learning as a field, but I do have a pretty solid computing background overall, so I'd be happy to give things a swing and try to help out! I'm happy to try to implement all of the improvements you've both already thought up and to train the model using my university's resources!

I'm definitely interested in collaborating and helping out however I can!


TheLostLambda commented on July 4, 2024

@ddofer @nadavbra Let me know if you'd like a meeting at some point to set things in motion!


a-ill commented on July 4, 2024

@nadavbra Any progress with recovering the vector or retraining the model?

