Comments (14)
The first error looks like all the scores from that round of optimization were invalid, causing molpal to calculate 0/0 and raising that error. That’s an edge case we can look at covering in the code, but it’s generally a cause for concern when every objective calculation failed. I’m undecided on how we should handle this in the code.
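One way to cover this edge case is to average only the scores that were actually calculated, and to return a sentinel instead of dividing by zero when none were. The sketch below assumes failed evaluations come back as None. `mean_of_valid` is a hypothetical helper for illustration, not MolPAL's code.

```python
from typing import Optional, Sequence


def mean_of_valid(scores: Sequence[Optional[float]]) -> Optional[float]:
    """Average only successfully calculated scores (failed ones are None).

    If every evaluation in the batch failed, return None instead of
    computing 0/0 and raising ZeroDivisionError.
    """
    valid = [s for s in scores if s is not None]
    if not valid:
        return None  # every objective calculation failed: nothing to average
    return sum(valid) / len(valid)


print(mean_of_valid([-4.0, None, -6.0]))  # -5.0
print(mean_of_valid([None, None]))        # None
```

Returning None pushes the decision ("skip this round" vs. "abort the run") to the caller, which matches the open question above about how the code should handle it.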
The second error looks to be a result of self.top_k_avg being None at the end of optimization. Again, this is likely due to there being too few valid scores from which to calculate a top-k average. This really shouldn't happen under normal conditions, so I'm curious why so many of your objective evaluations are failing.
from molpal.
@davidegraff Thanks for your advice.
For the lookup process, the scores in the CSV file were positive values recalculated from the docking scores.
I then removed "--minimize" from the objective options so that the optimization would maximize.
Could these parameter settings have caused the objective calculations to fail?
I am now trying a run in which I change the scores to negative values and add the "--minimize" option.
I'll let you know again whether the process completes without errors.
Have a good day!
Are you sure that your lookup objective is being constructed properly?
@davidegraff How do I check whether a lookup objective is constructed properly? Here is an excerpt of my lookup file:
smiles,score
C[C@@]1(c2ccccc2)OCCO[C@H]1C(=O)O,-3.961000
Cc1ncn(C[C@H]2CC(C)(C)CO2)c1C,-4.435000
CC(=O)N1C[C@H]2CNC[C@@]2(C(=O)N(C)Cc2ccoc2)C1,-5.111000
O[C@H]1C[C@@H]2CCCN(C1)C2,-4.209000
CNC(=O)c1cccc(Nc2nc(O)nc(O)c2C#N)c1,-5.455000
Cc1nc(CNc2ccc(F)c(N3CCCS3(=O)=O)c2)cs1,-6.235000
Cc1c(/N=N/c2cccc(C)c2C)c(-c2ccccc2)nn1C(=S)S,-6.337000
OCc1cc(-c2ccc(Cl)c(Cl)c2)ccn1,-3.907000
Cc1noc(C)c1COC(=O)c1ccc(Cl)cc1N1CCCC1=O,-5.407000
O=C(O)c1ccccc1S(=O)(=O)n1ccc(=O)[nH]c1=O,-5.967000
CC(C)CC(=O)NC[C@@]12CNC[C@@H]1COC2,-4.212000
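A quick sanity check on a lookup file like the one above is to parse it yourself and confirm that every score is numeric and that the library SMILES actually appear in it. The sketch below assumes the lookup is keyed by exact SMILES strings (so a differently written but equivalent SMILES would not match) and that a SMILES missing from the table cannot be scored. `load_lookup` and `missing_from_lookup` are hypothetical helpers; their `smiles_col` and `title_line` parameters only mirror the argument names printed in the run log below.

```python
import csv
import io
from typing import Dict, Iterable, List, TextIO, Tuple


def load_lookup(fh: TextIO, smiles_col: int = 0, score_col: int = 1,
                title_line: bool = True) -> Tuple[Dict[str, float], List[List[str]]]:
    """Parse a lookup CSV into {smiles: score} and collect rows that fail to parse."""
    reader = csv.reader(fh)
    if title_line:
        next(reader, None)  # skip the "smiles,score" header row
    table: Dict[str, float] = {}
    bad_rows: List[List[str]] = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        try:
            table[row[smiles_col]] = float(row[score_col])
        except (IndexError, ValueError):
            bad_rows.append(row)  # missing column or non-numeric score
    return table, bad_rows


def missing_from_lookup(library: Iterable[str], table: Dict[str, float]) -> List[str]:
    """Library SMILES with no exact-string match in the lookup table."""
    return [smi for smi in library if smi not in table]


# Two rows taken from the lookup file above, plus one deliberately malformed row
sample = io.StringIO(
    "smiles,score\n"
    "O[C@H]1C[C@@H]2CCCN(C1)C2,-4.209000\n"
    "OCc1cc(-c2ccc(Cl)c(Cl)c2)ccn1,-3.907000\n"
    "CCO,not_a_number\n"
)
table, bad = load_lookup(sample)
print(len(table), bad)  # 2 [['CCO', 'not_a_number']]
print(missing_from_lookup(["OCc1cc(-c2ccc(Cl)c(Cl)c2)ccn1", "CCN"], table))  # ['CCN']
```

If a large fraction of the library shows up in the "missing" list, that alone could explain why so many objective evaluations fail.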
After I changed all the docking scores in the CSV to negative values and ran with "--minimize", the process finished completely.
However, the top scores in all_explored_final.csv are positive values.
Am I setting the objective option incorrectly?
MolPAL will be run with the following arguments:
batch_sizes: [0.01]
budget: 1.0
cache: False
checkpoint_file: None
chkpt_freq: 0
cluster: False
conf_method: mve
config: njkoo_config.ini
cxsmiles: False
ddp: False
delimiter: ,
delta: 0.1
epsilon: 0.0
final_lr: 0.0001
fingerprint: pair
fps: /home/njgoo/Data1/program/molpal/libraries/ZINC20_Stock.h5
init_lr: 0.0001
init_size: 0.01
invalid_idxs: []
k: 0.0005
length: 2048
libraries: ['/home/njgoo/Data1/program/molpal/libraries/ZINC20_Stock.csv.gz']
max_iters: 50
max_lr: 0.001
metric: random
minimize: True
model: mpn
model_seed: None
ncpu: 20
objective: lookup
objective_config: njkoo_lookup.ini
output_dir: molpal_stock
pool: eager
precision: 32
previous_scores: None
radius: 2
retrain_from_scratch: True
scores_csvs: None
seed: None
smiles_col: 0
test_batch_size: None
title_line: True
verbose: 0
window_size: 10
write_final: True
write_intermediate: True
I would just add a print statement to see what sort of values you're getting out of objective.calc(...). If all of the values failed, then there's an issue with how you're constructing your MolPAL run.
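A minimal sketch of such a print-based check is shown below. It assumes the objective's calc method takes an iterable of SMILES and returns a dict mapping each SMILES to a score, with None for a failed evaluation; that return shape is an assumption, and `debug_calc` and `fake_calc` are hypothetical helpers, with `fake_calc` standing in for a real objective.

```python
from typing import Callable, Dict, Iterable, Optional

Scores = Dict[str, Optional[float]]


def debug_calc(calc: Callable[[Iterable[str]], Scores], smis: Iterable[str],
               n_show: int = 5) -> Scores:
    """Call a calc-like function, then print how many scores failed (None)."""
    scores = calc(smis)
    n_failed = sum(1 for s in scores.values() if s is None)
    print(f"{len(scores)} scores returned, {n_failed} failed (None)")
    for smi, s in list(scores.items())[:n_show]:  # peek at the first few results
        print(f"  {smi}: {s}")
    return scores


# Stand-in for a real objective: a lookup that returns None for unknown SMILES
fake_table = {"CCO": -4.0}


def fake_calc(smis: Iterable[str]) -> Scores:
    return {smi: fake_table.get(smi) for smi in smis}


scores = debug_calc(fake_calc, ["CCO", "CCN"])
# 2 scores returned, 1 failed (None)
#   CCO: -4.0
#   CCN: None
```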
I also noticed that my output files are filled with positive scores while my lookup file has negative scores (more negative = better compound). I think the sign just got swapped during processing.
It progressively explored compounds with more negative scores, so I think it was doing what it was supposed to.
The output files always use positive scores, regardless of the input lookup file
@davidegraff Should we convert the positive output scores back into negative scores? Because the docking score is a total energy, more negative values indicate better compounds.
Also, which file should I add the print statement to? I can't find the objective.calc() function. Thanks for the quick response!
Yes. MolPAL frames optimization as a maximization problem, so the output reflects that: the most positive score is the best.
I'm wondering what the --minimize option means.
Before your comment, I understood that running with the --minimize option would find more negative scores and running without it would find more positive scores.
In docking, a more negative score is better, so you want to --minimize it. Unless, of course, you were trying to find the worst possible binder for your target of interest, in which case you would want to maximize it (the default assumption). To perform a minimization, we multiply objective values by -1 under the hood, so that the rest of the program sees a maximization. You see the result of this multiplication in the output.
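The sign-flip trick described above can be sketched in a few lines, using docking scores from the lookup file earlier in this thread. `to_internal` is a hypothetical name; this only illustrates the idea and is not MolPAL's code.

```python
from typing import Dict, Optional

Scores = Dict[str, Optional[float]]


def to_internal(scores: Scores, minimize: bool) -> Scores:
    """Flip signs when minimizing so that 'higher is better' everywhere downstream.

    Failed evaluations (None) are passed through unchanged.
    """
    c = -1 if minimize else 1
    return {smi: (c * s if s is not None else None) for smi, s in scores.items()}


docking = {"A": -6.337, "B": -3.907, "C": None}  # more negative = better binder
internal = to_internal(docking, minimize=True)
valid = [smi for smi, s in internal.items() if s is not None]
best = max(valid, key=lambda smi: internal[smi])  # a plain maximization
print(best, internal[best])  # A 6.337
```

The best binder (most negative docking score) becomes the largest internal value, which is why a minimized run reports positive numbers.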
Thanks for the kind explanation! I understood it the same way you described.
However, I'm confused about how to interpret the output in all_explored_final.csv.
I got positive values from a run with the --minimize option, but negative values from a run without the --minimize option.
This is not consistent with what you said: "The output files always use positive scores, regardless of the input lookup file".
Also, the results were the opposite of what I expected.
Could you check the code that multiplies the objective values by -1?
For now, I just multiply the output values by -1.
I also found different default values for minimize in the objective classes:
minimize: bool = False in base.py
minimize: bool = True in lookup.py
Have a good day!
I misspoke earlier. The values in the output are not always positive, but they always follow the convention that more positive values are "better" in MolPAL's view. I.e., if you --minimize your objective, then you should multiply the output values by -1 to recover the true objective values. If you maximize, then you may take the output scores as-is. The different default values are overridden by the minimize value supplied from the arguments, which is False by default.
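Recovering the true objective values from an output file can be sketched as below. It assumes the output CSV has a header row followed by SMILES in the first column and score in the second, and that rows with a blank or non-numeric score should be skipped; the real column layout of all_explored_final.csv may differ, and `recover_true_scores` is a hypothetical helper. Pass the same minimize setting the run used (False unless --minimize was given).

```python
import csv
import io
from typing import List, TextIO, Tuple


def recover_true_scores(fh: TextIO, minimize: bool) -> List[Tuple[str, float]]:
    """Read an output CSV (assumed: header, then smiles,score) and undo the sign flip.

    If the run used --minimize, each score is multiplied by -1 to get back the
    original objective value; otherwise scores are returned as-is.
    """
    c = -1.0 if minimize else 1.0
    reader = csv.reader(fh)
    next(reader, None)  # skip the header row
    out: List[Tuple[str, float]] = []
    for row in reader:
        if len(row) < 2:
            continue  # blank or truncated line
        try:
            out.append((row[0], c * float(row[1])))
        except ValueError:
            continue  # e.g. an empty score for a failed evaluation
    return out


sample = io.StringIO("smiles,score\nA,6.337\nB,3.907\nC,\n")
print(recover_true_scores(sample, minimize=True))  # [('A', -6.337), ('B', -3.907)]
```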
Thank you very much!
Related Issues (20)
- Y_pred.npy file
- docking for objectives
- Fingerprints not generating
- bug in fingerprints.py
- test on in-house data, result is bad
- can you please provide an example config file that use docking rather than lookup?
- molpal crush due to invalid smiles
- [QUESTION]: Recommended paramaters for molecule screen
- what's your take on random sampling?
- [QUESTION]:
- how to handle tautomers?
- [QUESTION]: training dataloader is currently defined with shuffle=False. Is this intentional?
- [BUG]:
- [QUESTION]: Are docking scores for 2.1 million member HTS Collection (“Enamine HTS”) against 4UNN PBB available?
- [BUG]: unable to run molpal in docking objective
- [BUG]: Not compatible with the latest Ray/Pytorch-Lightning versions
- [QUESTION]: Basic MolPal usage
- [QUESTION]: Small dataset for training
- [QUESTION]: How can I build a larger datasets than Enamine HTS?
- OOM Errors when running EnamineHTS_single_batch.ini on a machine with RTX3090