
Comments (16)

amva13 commented on September 28, 2024

@abearab ^ you can have a look at this

from tdc.

miguelgondu commented on September 28, 2024

I'm having the same issue with GSK3B. Moreover, the behavior differs depending on whether I evaluate a list of SMILES or a single SMILES string. If I evaluate a single string, I get 0.0; if I evaluate a list, I get the error @amva13 is getting for JNK3.

I wonder whether something changed in sklearn's random forests and their formatting. That being said, I'm using sklearn==1.3.0, which is the version inside this project's requirements.txt.

miguelgondu commented on September 28, 2024

The culprit for the discrepancy between lists/individual SMILES is the try-except block in L656 of the implementation of oracles.

In other words, the loading of the oracle is failing silently, and thus the oracle returns the default value.

So we could try to solve two problems:

  1. Calling oracles on smile_str and [smile_str] should have the same behavior.
  2. Fixing the loading of the oracles for GSK3B and JNK3.

I'm happy to volunteer on any of those!
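
A minimal sketch of what point 1 could look like, assuming a lazily loaded model. The names `make_oracle` and `load_model` are hypothetical and this is not TDC's actual implementation; the idea is only that loading errors propagate instead of being swallowed, and that a string and a one-element list go through the same code path.

```python
from typing import Callable, List, Union


def make_oracle(load_model: Callable[[], Callable[[str], float]]):
    """Build an oracle that loads its model lazily and never hides load errors.

    `load_model` is a hypothetical zero-argument loader that returns a
    scoring function mapping one SMILES string to a float.
    """
    model = None

    def oracle(smiles: Union[str, List[str]]):
        nonlocal model
        if model is None:
            # No try-except here: a broken pickle should fail loudly
            # rather than fall back to a default score such as 0.0.
            model = load_model()
        if isinstance(smiles, str):
            return model(smiles)
        # A list is scored element by element with the same model,
        # so oracle(s) == oracle([s])[0] by construction.
        return [model(s) for s in smiles]

    return oracle


# Toy usage with a stand-in scoring function (string length as a fake score).
toy_oracle = make_oracle(lambda: (lambda s: float(len(s))))
single = toy_oracle("CCO")
batch = toy_oracle(["CCO"])
```

With this shape, a failing loader raises the same exception whether the caller passes a string or a list, which makes the underlying problem visible in both cases.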

amva13 commented on September 28, 2024

Hi @miguelgondu, thanks for the find! For clarity, changing the try-except block would only reveal the real error, not fix it. What version of the package are you using? Could you try 0.4.1?

miguelgondu commented on September 28, 2024

Hi @amva13,

Yes! Changing the try-except block only reveals the error. Fixing it would involve checking what changed with the pkl files/their loading, I imagine.

I've tried with both 0.4.1 and 0.4.6. Both have the same issue.

amva13 commented on September 28, 2024

Ok. That was to confirm the error is not due to recent release changes. I will be inspecting this error personally, starting now. In the meantime, here is one thing I'd try, since there might be something to your suspicion that sklearn==1.3.0 introduced a breaking change.

I would try building package 0.4.1 in a virtual environment (e.g. conda). 0.4.1 does not pin versions in requirements.txt, and this might fix the behavior.

amva13 commented on September 28, 2024

This error is indeed caused by a mismatch between the format of the pickled object and the format scikit-learn expects. This is due in part to a version upgrade in scikit-learn.

See the reverse issue here: yzhao062/pyod#519

I'm evaluating some fixes and will push a new version of the package asap.

EDIT: Downgrading scikit-learn fixes the dtype issue but does not solve the underlying problem.

amva13 commented on September 28, 2024

Hi @miguelgondu, I believe I've solved it. Would you mind sharing some of the input SMILES strings that produced a 0.0 value for these oracles for you?

miguelgondu commented on September 28, 2024

Hi @amva13, I used the one in the docs: 'CC(C)(C)[C@H]1CCc2c(sc(NC(=O)COc3ccc(Cl)cc3)c2C(N)=O)C1' should have a GSK3B score of 0.03 (at least according to the minimal example provided here)

amva13 commented on September 28, 2024

Hi @miguelgondu, I just pushed the fix and will be releasing the new package now. I'll let you know when you can install it.

miguelgondu commented on September 28, 2024

Thanks! Looking forward.

miguelgondu commented on September 28, 2024

Just FYI: I'm getting a warning on Thiothixene_Rediscovery that is similar in spirit to this issue:

InconsistentVersionWarning: Trying to unpickle estimator DecisionTreeClassifier from version 0.23.0 when using version 1.3.0. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
  https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
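
One way to keep a warning like this from going unnoticed is to escalate it to an exception while unpickling. The sketch below uses only the standard library; `load_pickle_strict` and its `loads` parameter are hypothetical names, not part of TDC or scikit-learn. With scikit-learn 1.3 or later, one could presumably pass `sklearn.exceptions.InconsistentVersionWarning` as `category`.

```python
import pickle
import warnings


def load_pickle_strict(blob: bytes, category: type = Warning, loads=pickle.loads):
    """Unpickle `blob`, raising instead of warning for `category`.

    `loads` is injectable so the behavior can be exercised without a
    real stale model file; by default it is `pickle.loads`.
    """
    with warnings.catch_warnings():
        # Inside this block, any warning of `category` becomes an exception.
        warnings.simplefilter("error", category)
        return loads(blob)


# A payload that unpickles cleanly is returned unchanged.
restored = load_pickle_strict(pickle.dumps({"score": 0.03}))
```

The filter change is scoped to the `with` block, so warnings elsewhere in the program keep their normal behavior.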

amva13 commented on September 28, 2024

Got it. Thanks for pointing this out. The best solution is to re-pickle these models with a more modern scikit-learn (or to invoke the models with a different method entirely, to avoid the dependency issues altogether). For now the downgrade seems to work, though that particular classifier came from version 0.23.0, so it's not great. I'll flag this as a longer-term issue to look at.
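
As a rough illustration of the re-pickling idea, a model file could carry the library version it was saved with, so that a mismatch is detected explicitly at load time. This is a sketch under assumptions: `dump_with_version` and `load_checked` are hypothetical helpers, the payload here is a plain dict rather than a real estimator, and in practice the version string would come from something like `sklearn.__version__`.

```python
import pickle


def dump_with_version(obj, lib_version: str) -> bytes:
    """Pickle `obj` together with the library version used to create it."""
    return pickle.dumps({"lib_version": lib_version, "payload": obj})


def load_checked(blob: bytes, runtime_version: str):
    """Unpickle a stamped payload, refusing it if major.minor versions differ."""
    record = pickle.loads(blob)
    saved = record["lib_version"]
    # Compare only the major and minor components, e.g. "1.3" vs "0.23".
    if saved.split(".")[:2] != runtime_version.split(".")[:2]:
        raise RuntimeError(
            f"model saved with version {saved}, running {runtime_version}"
        )
    return record["payload"]


# Stand-in payload; a real use would pass a fitted estimator.
blob = dump_with_version({"weights": [1, 2, 3]}, "1.3.0")
payload = load_checked(blob, "1.3.2")  # same major.minor, so this loads
```

Whether a major.minor match is the right threshold is a design choice; an exact match is stricter and closer to what the warning above checks.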

amva13 commented on September 28, 2024

@miguelgondu it's all fixed. You can install 0.4.7 for the working version.

Example:
https://colab.research.google.com/drive/17mGlLaVkfA2-0sqhbZlQ4cUI0JnFBpRq?usp=sharing

miguelgondu commented on September 28, 2024

Hi @amva13, thanks for the fix!

Checking the other oracles in that specific version, something seems to have broken in deco hop. With the first example in the documentation (the same one I provided above), I went from getting 0.5338... to getting 0.0. Weird!

The rest of the oracles seem to work as expected, except for the ones in the issue I raised recently (#244).

Thanks again for the hard work.

amva13 commented on September 28, 2024

Acknowledged, issue opened.

