Comments (17)
Sure! This is the link to this particular run:
https://wandb.ai/thomaswang/lambo_replicate/runs/3shmruby?workspace=user-thomaswang
from lambo.
thanks for sharing. there are three things that account for the discrepancy here:
- For consistency with PyMOO I followed the convention in the code that all objectives are minimized, so you need to account for the sign difference for maximized properties like penalized logP.
- The `candidates/obj_val_*` field shows the best objective value within each query batch over time, so to transform it into a plot like the one shown in the paper you need to apply a `cummin` transform to show the best-so-far as a function of time.
- I recall there being pretty substantial variance in performance across seeds (which is why I plotted quantiles), so you'd want to run at least 5 trials, apply the `cummin` transform, and compute the quantiles to reproduce the plot in the paper.
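The post-processing described above can be sketched in NumPy. The array below is made-up stand-in data; in practice you would pull the logged `candidates/obj_val_*` values from the wandb API, and the shapes here (5 seeds × 50 rounds) are illustrative assumptions, not the actual export format:

```python
import numpy as np

# Hypothetical per-batch best obj_val values for 5 seeds over 50 rounds.
# obj_val follows the minimization convention (negative penalized logP).
rng = np.random.default_rng(0)
obj_val = -rng.uniform(0.0, 10.0, size=(5, 50))

best_so_far = np.minimum.accumulate(obj_val, axis=1)   # cummin over rounds
pen_logp = -best_so_far                                # undo the sign flip
quantiles = np.quantile(pen_logp, [0.4, 0.6, 0.8], axis=0)  # across seeds
```

Each row of `best_so_far` is non-increasing by construction, so `pen_logp` gives the best penalized logP found so far, and `quantiles` has one row per requested quantile.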
I will see what I can do about getting you a notebook to reproduce, but it may have to wait a bit while I deal with other work on my plate. In any case, I'm delighted you're taking the time to reproduce these experiments; if you have any further questions, don't hesitate to ask :)
The reason I raise this issue is that I tried to train a single-objective LaMBO model with the exact command from README:
python scripts/black_box_opt.py optimizer=lambo optimizer.encoder_obj=lanmt task=chem_lsbo tokenizer=selfies surrogate=single_task_svgp acquisition=ei encoder=lanmt_cnn
but this is the wandb logging I get for the penalized logP metric:
The black-box evaluations totaled 64 × 50 = 3,200, but the best score was just above 6, which differs from the results in Figure 10, so I wonder if I have missed some extra processing steps needed to reproduce the results. Thanks!
Sorry for the delayed response, would you mind sharing the link to the wandb data for your run?
Thank you for your suggestion! I will first fix these differences.
sounds good. note that the seed is fixed in the config, so you'll want to be sure to override it, e.g.
python scripts/black_box_opt.py -m optimizer=lambo optimizer.encoder_obj=lanmt task=chem_lsbo tokenizer=selfies surrogate=single_task_svgp acquisition=ei encoder=lanmt_cnn seed=1,2,3,4
one more thing: `obj_val_0` is actually the negative penalized logP, so you'll want to either apply `cummin` and negate, or negate and then apply `cummax`. I've edited my previous response to reflect this.
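The two orderings are equivalent, which a tiny NumPy check confirms (the trace values below are made up for illustration):

```python
import numpy as np

# Toy trace of obj_val_0 (negative penalized logP); values are invented.
obj_val_0 = np.array([-2.0, -1.5, -4.0, -3.0, -6.5])

a = -np.minimum.accumulate(obj_val_0)   # cummin, then negate
b = np.maximum.accumulate(-obj_val_0)   # negate, then cummax
assert np.array_equal(a, b)             # both give best penalized logP so far
```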
https://github.com/samuelstanton/lambo/blob/main/lambo/tasks/chem/chem.py#L105
Sure, thanks for the reminder! The experiments are still running. I also wonder whether it's reasonable that the single-objective experiments need 1 day 12 hours to finish, while the multi-objective experiments need just 5 hours. Intuitively, shouldn't the single-objective runs be faster than the multi-objective ones?
fair question. the single-objective experiment collects bigger batches of data over more rounds than the multi-objective experiments, so using exact GP inference would require a lot of GPU memory and would likely be numerically unstable. Instead, for this task I use a variational GP, which has a constant memory footprint and is more numerically stable for large datasets. Unfortunately variational GPs are fairly slow to train, which leads to the dramatic increase in runtime. There is probably room for optimization here; the current training recipe is tuned more for stability than speed.
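A back-of-the-envelope calculation illustrates the memory argument. Exact GP inference materializes an n × n covariance matrix over all n observations, while a sparse variational GP only needs an m × m covariance over m inducing points. The n ≈ 3,200 below comes from the evaluation budget discussed in this thread; m = 500 is an illustrative choice, not the value used in the lambo configs:

```python
# Rough single-matrix memory comparison (float64 entries assumed).
n, m, bytes_per_float = 3200, 500, 8

exact_gp_bytes = n * n * bytes_per_float  # n x n covariance for exact inference
svgp_bytes = m * m * bytes_per_float      # m x m inducing covariance for SVGP

print(f"exact GP: {exact_gp_bytes / 1e6:.0f} MB")  # ~82 MB for one matrix
print(f"SVGP:     {svgp_bytes / 1e6:.0f} MB")      # ~2 MB, independent of n
```

The constants multiply further for Cholesky factors and gradients, but the key point is that the SVGP footprint stays fixed as the dataset grows.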
Thanks for the reply! It makes sense now.
However, as I re-ran the experiments with seeds 1, 2, 3, 4, I found that the optimization performance is still below expectations. The wandb logs are the runs with ids 12, 13, 14, 15 in the project:
https://wandb.ai/thomaswang/lambo_replicate/groups/test/table?workspace=user-thomaswang
I did not apply the `cummin` transform to the logged outputs, but we can see that the minimum values of `obj_val_0` are around -7 in all runs, which corresponds to a penalized logP of about 7.
I wonder if there is some problem with the default configuration of this setting? Would it be possible for you to double-check the configuration? On my side, I will also double-check whether there is something wrong with my reproduction.
Thanks very much!
hm ok I'll take a look, thanks for raising the issue
Hi! I am also interested in the single-objective use case for LaMBO. Is there any update on reproducing the published numbers?
@samuelstanton I'm also having some trouble reproducing the results, I ran the following line:
python scripts/black_box_opt.py -m optimizer=lambo optimizer.encoder_obj=lanmt task=chem_lsbo tokenizer=selfies surrogate=single_task_svgp acquisition=ei encoder=lanmt_cnn seed=1,2,3,4
and, while the script is still running, I'm getting results similar to @Thomaswbt's above (in fact slightly worse).
It would be great to get an update on this, thank you!
Thank you all for your patience. I've determined that some of the default hyperparameters were indeed misconfigured and have updated the command in the README. That being said, the results I'm getting now are not quite what I expect, and I will continue to investigate. Here's what I'm getting now:
[Plot: 40%, 60%, and 80% quantiles across 5 seeds (0-4)]
While this is much better than the results you were seeing, and the algorithm does "solve" the problem for 3/5 seeds (i.e. it learns to output long hydrocarbon chains), this is not as good as what I was seeing before and is more sensitive to the random seed than I'd like. In any case, I wanted to share an update while I continue looking into this. I've also pushed the notebook I used to create these plots to `notebooks/plot_lsbo_comparison.ipynb`.
The major hypers that have been corrected are:
- `optimizer.window_size=1` --> `optimizer.window_size=8`: this hyperparameter controls how many corruptions are made to the seed sequence and can have a major effect when the optimal solution requires large increases to the sequence length.
- `surrogate.bs=32` --> `surrogate.bs=256`: with a larger dataset, increasing the batch size decreases the run time significantly; I was seeing about 6 hours per seed on an A100 after this change.
- `optimizer.resampling_weight=1.0` --> `optimizer.resampling_weight=0.5`: this change makes the optimizer sample "good" seeds more aggressively when constructing batches of candidates.
Increasing the max context length to 256 (`task.max_len=256`) improves performance on this benchmark, as I noted in the paper, but variance across seeds is still an issue.
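Putting the corrected overrides together with the original multirun command gives something like the following sketch. This assumes the updated README hasn't already folded the new values into the defaults; if it has, the explicit overrides are redundant but harmless:

```shell
python scripts/black_box_opt.py -m optimizer=lambo optimizer.encoder_obj=lanmt \
    task=chem_lsbo tokenizer=selfies surrogate=single_task_svgp acquisition=ei \
    encoder=lanmt_cnn optimizer.window_size=8 surrogate.bs=256 \
    optimizer.resampling_weight=0.5 task.max_len=256 seed=0,1,2,3,4
```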
Sorry for the late response. Thank you for your effort! Previously I also found that the choice of starting sequences matters a lot for the final results. I think I will close the issue.