Comments (16)
@abearab ^ you can have a look at this
from tdc.
I'm having the same issue with `GSK3B`. Moreover, there's a discrepancy depending on whether I evaluate a list of SMILES or a single SMILES string. If I evaluate a single SMILES string, I get 0.0; if I evaluate a list, I get the error @amva13 is getting for `JNK3`.
I wonder whether something changed in sklearn's random forests and their formatting. That being said, I'm using `sklearn==1.3.0`, which is the version inside this project's `requirements.txt`.
from tdc.
The culprit for the discrepancy between lists/individual SMILES is the try-except block in L656 of the implementation of oracles.
In other words, the loading of the oracle is failing silently, and thus the oracle returns the default value.
So we could try to solve two problems:
- Calling oracles on `smile_str` and `[smile_str]` should have the same behavior.
- Fixing the loading of the oracles for `GSK3B` and `JNK3`.
I'm happy to volunteer on any of those!
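A minimal sketch of both problems, assuming a hypothetical `load_model` loader and hypothetical `silent_oracle` / `strict_oracle` wrappers (none of these are the actual TDC code): a broad `try`/`except` hides the loading failure and returns a default, while normalizing the input first makes `smile_str` and `[smile_str]` follow the same path.

```python
from typing import Callable, List, Union

DEFAULT_SCORE = 0.0


def load_model() -> Callable[[str], float]:
    # Hypothetical loader standing in for unpickling the oracle model.
    # It always fails here, to mimic the broken pickle.
    raise ValueError("incompatible pickle format")


def silent_oracle(smiles: str) -> float:
    # The pattern described above: any failure is swallowed
    # and the default value is returned instead.
    try:
        model = load_model()
        return model(smiles)
    except Exception:
        return DEFAULT_SCORE


def strict_oracle(smiles: Union[str, List[str]]) -> Union[float, List[float]]:
    # Normalize to a list so both input shapes share one code path,
    # and let loading errors propagate instead of hiding them.
    is_single = isinstance(smiles, str)
    batch = [smiles] if is_single else list(smiles)
    model = load_model()
    scores = [model(s) for s in batch]
    return scores[0] if is_single else scores
```

With the failing loader, `silent_oracle` returns the default score, whereas `strict_oracle` raises the same `ValueError` for a string and for a one-element list.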
from tdc.
Hi @miguelgondu, thanks for the find! For clarity, changing the try-except block would only reveal the real error, not fix it. What version of the package are you using? Could you try 0.4.1?
from tdc.
Hi @amva13,
Yes! Changing the try-except block only reveals the error. Fixing it would involve checking what changed with the pkl files/their loading, I imagine.
I've tried with both 0.4.1 and 0.4.6. Both have the same issue.
from tdc.
OK. This was to confirm that the error is not due to recent release changes. I will be personally inspecting this error starting now. There might be something to your claim about `sklearn==1.3.0` causing a breaking change, so here is one thing I'd try while I'm looking into it:
I would try building package 0.4.1 in a virtual environment (e.g. conda). 0.4.1 does not specify versions in `requirements.txt`, and this might fix the behavior.
from tdc.
This error is indeed caused by a mismatch between the format of the pickled object and the format expected by scikit-learn. This is in part due to a version upgrade in scikit-learn.
See the reverse issue here: yzhao062/pyod#519
I'm evaluating some fixes and will push a new version of the package ASAP.
EDIT: Downgrading scikit-learn fixes the dtype issue but does not solve the underlying problem.
from tdc.
Hi @miguelgondu I believe I've solved it. Would you mind sharing some of the input SMILES strings which produced a 0.0 value for these oracles for you?
from tdc.
Hi @amva13, I used the one in the docs: `'CC(C)(C)[C@H]1CCc2c(sc(NC(=O)COc3ccc(Cl)cc3)c2C(N)=O)C1'` should have a `GSK3B` score of 0.03 (at least according to the minimal example provided here).
from tdc.
Hi @miguelgondu, I just pushed the fix and will be releasing the new package now. I'll let you know when you can install it.
from tdc.
Thanks! Looking forward.
from tdc.
Just FYI: I'm getting a warning on `Thiothixene_Rediscovery` that is similar in spirit to this issue:
InconsistentVersionWarning: Trying to unpickle estimator DecisionTreeClassifier from version 0.23.0 when using version 1.3.0. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
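A minimal sketch, assuming one wants such a warning to fail loudly (for example in a test suite) rather than scroll past: Python's `warnings` filters can escalate a single warning category to an exception. `StaleModelWarning`, `load_model`, and `load_model_strict` are hypothetical stand-ins so the snippet runs without scikit-learn; in practice the category would be the one named in the message above.

```python
import warnings


class StaleModelWarning(UserWarning):
    """Hypothetical stand-in for a version-mismatch warning."""


def load_model() -> str:
    # Hypothetical loader that warns the way an old pickle would.
    warnings.warn("model was saved with an older version", StaleModelWarning)
    return "model"


def load_model_strict() -> str:
    # Escalate only this warning category to an exception while loading;
    # the filter change is undone when the context manager exits.
    with warnings.catch_warnings():
        warnings.simplefilter("error", category=StaleModelWarning)
        return load_model()
```

Calling `load_model_strict()` raises `StaleModelWarning` instead of returning a model that may give invalid results.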
from tdc.
Got it. Thanks for pointing that out. The best solution is to re-pickle these models with a more modern scikit-learn (or invoke the models with a different method entirely, to avoid the dependency issues altogether). For now the downgrade seems to work, though that particular classifier came from version 0.23.0, so not great. I'll flag this as a longer-term issue to look at.
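As a rough sketch of one way to make such mismatches explicit, the producing library's version could be stored next to the serialized model and checked before the model is unpickled. The helper names `dump_with_version` and `load_with_version` are hypothetical, the "same major version" rule is an assumption, and this is not how TDC stores its oracles:

```python
import pickle
from typing import Any


def dump_with_version(obj: Any, library_version: str) -> bytes:
    # Serialize the model separately so the version can be read
    # without unpickling the model itself.
    record = {"version": library_version, "payload": pickle.dumps(obj)}
    return pickle.dumps(record)


def load_with_version(blob: bytes, current_version: str) -> Any:
    record = pickle.loads(blob)
    saved_major = record["version"].split(".")[0]
    current_major = current_version.split(".")[0]
    # Assumed policy: refuse to load across major versions.
    if saved_major != current_major:
        raise RuntimeError(
            f"saved with {record['version']}, running {current_version}"
        )
    return pickle.loads(record["payload"])
```

Under this policy a model saved with 0.23.0 would load under 0.24.1 but raise a `RuntimeError` under 1.3.0, rather than failing later with an opaque dtype error.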
from tdc.
@miguelgondu it's all fixed. You can install 0.4.7 for the working version.
Example: https://colab.research.google.com/drive/17mGlLaVkfA2-0sqhbZlQ4cUI0JnFBpRq?usp=sharing
from tdc.
Hi @amva13, thanks for the fix!
Checking with the other oracles in that specific version, something seems to break in `deco hop`. In the first example of the documentation (the same one I provided above) I went from getting `0.5338...` to getting `0.0`. Weird!
The rest of the oracles seem to work as expected, except for the ones in the issue I raised recently (#244).
Thanks again for the hard work.
from tdc.
ack'd issue opened
from tdc.