guacamol's People

Contributors

avaucher, bfabiandev, den-run-ai, dependabot[bot], joshuameyers, msgbai, sauravmaheshkar

guacamol's Issues

scipy imread

I installed guacamol in a fresh conda environment with only rdkit and pytorch preinstalled, so guacamol pulled in scipy itself. However, the scipy version it installs (1.4.1) no longer has the imread function, which was removed as of scipy 1.2.
Simply removing the import of imread in FCD.py, line 24, seems to fix the problem, since the function is not used anywhere in that file.
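
Before patching FCD.py, it can help to confirm that the installed scipy really lacks imread. The following is only a sketch: the helper name module_has_attr is made up for illustration, and probing scipy.misc is an assumption about where FCD.py imports imread from.

```python
import importlib


def module_has_attr(module_name: str, attr: str) -> bool:
    """Return True if the module can be imported and exposes `attr`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Module missing entirely (or removed in this version).
        return False
    return hasattr(module, attr)


if __name__ == "__main__":
    # Assumed location of imread; prints False on scipy versions without it.
    print(module_has_attr("scipy.misc", "imread"))
```

If this prints False, the import in FCD.py cannot succeed in that environment, which matches the behaviour described above.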

The hash value of the file is inconsistent with the hash value given in the code

When I run the following command:
python -m guacamol.data.get_data -o "/home/zh/桌面/project/git2/从头设计的分子基准模型测试/guacamol/data" --chembl

I get a different hash value:

Traceback (most recent call last):
  File "/home/zh/sda3/Anaconda3/envs/guac/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/zh/sda3/Anaconda3/envs/guac/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/project/git2/从头设计的分子基准模型测试/guacamol/guacamol/data/get_data.py", line 263, in <module>
    main()
  File "/project/git2/从头设计的分子基准模型测试/guacamol/guacamol/data/get_data.py", line 253, in main
    compare_hash(train_path, TRAIN_HASH)
  File "/project/git2/从头设计的分子基准模型测试/guacamol/guacamol/data/get_data.py", line 149, in compare_hash
    raise ValueError(f'{output_file} file has different hash {output_hash} than expected {correct_hash}!')
ValueError: /home/zh/桌面/project/git2/从头设计的分子基准模型测试/guacamol/guacamol/data/chembl24_canon_train.smiles file has different hash 75a644a29fdd347687f96aa65f1dbbce than expected 05ad85d871958a05c02ab51a4fde8530!

What could be causing this?
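
To check which file on disk actually differs, the digest can be recomputed by hand. This is a minimal sketch, assuming the 32-character hex values in the error message are MD5 digests; the function name file_md5 is illustrative and is not taken from get_data.py.

```python
import hashlib


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected_hash: str) -> bool:
    """Compare a file's digest with an expected hex digest."""
    return file_md5(path) == expected_hash.lower()
```

Running file_md5 on chembl24_canon_train.smiles and comparing the result with the expected value from the error message shows whether the generated file itself differs, as opposed to the comparison step going wrong.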

Unknown segfault

When using guacamol with PytorchLightning==1.6.5 and PyTorch==1.12.0 I get a mysterious segfault when running the following code:

import pytorch_lightning as pl
from guacamol import standard_benchmarks as sb
sb.valsartan_smarts()

However, the segfault does not occur with PyTorch==1.11.0. It is unclear what causes the issue.

For reproducibility I've attached the exports of my Conda environments for both the working configuration and the broken configuration. broken.yml is the environment that will segfault while working.yml is the environment that works.
Environments.zip
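
One way to narrow down a crash like this is Python's built-in faulthandler module, which dumps the Python-level stack when the interpreter receives a fatal signal. The sketch below is a suggestion rather than a fix; the helper name enable_crash_traces and the log file name are hypothetical.

```python
import faulthandler


def enable_crash_traces(log_path: str):
    """Write Python tracebacks for all threads to log_path on a fatal signal."""
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    # The caller must keep this handle open for as long as the handler is active.
    return log_file


if __name__ == "__main__":
    handle = enable_crash_traces("segfault_trace.log")
    # Import and call the crashing code only after the handler is installed,
    # for example the three lines from the report above.
```

The resulting log should show which import or call was executing when the segfault happened.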

ChemNet file name has changed in FCD version 1.2

Hi,

The ChemNet file name has changed in FCD version 1.2, causing a bug when evaluating this metric in assess_distribution_learning. The new name is 'ChemNet_v0.13_pretrained.pt' (see here).

The bug can be fixed simply by downgrading to FCD 1.1. Could you please update the dependencies or change the file name in your code?

Cheers
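
A tolerant lookup that accepts either file name could look roughly like the sketch below. The helper first_existing is hypothetical, the new name comes from this issue, and the older name 'ChemNet_v0.13_pretrained.h5' is an assumption about what FCD 1.1 shipped.

```python
from pathlib import Path
from typing import Iterable, Optional

# New name is from the issue text; the old name is an assumption.
CANDIDATE_NAMES = ("ChemNet_v0.13_pretrained.pt", "ChemNet_v0.13_pretrained.h5")


def first_existing(directory: str, candidates: Iterable[str]) -> Optional[Path]:
    """Return the first candidate file present in directory, or None."""
    for name in candidates:
        path = Path(directory) / name
        if path.is_file():
            return path
    return None
```

Note that finding the file is only half of the problem: a different extension may also imply a different model format, so the loading code would still have to match the installed FCD version.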

Error while assessing distribution learning benchmarks - FCD metric

Hi all,

I tried using the assess_distribution_learning() function to calculate the benchmark metrics for one of my models with a custom training dataset. I created a subclass of DistributionMatchingGenerator and wrote the sampling code to draw any number of molecules from my pre-trained model, as instructed. The code runs fine for a while, then fails during the FCD metric calculation with the following stack trace:

Traceback (most recent call last):
  File "benchmark_model_with_guacamol_v2.py", line 475, in <module>
    assess_distribution_learning(vae_model, chembl_training_file=training_data, json_output_file=json_file_path, benchmark_version="v1")
  File "/home/sowmya/anaconda3/envs/ddenv_new/lib/python3.6/site-packages/guacamol/assess_distribution_learning.py", line 34, in assess_distribution_learning
    number_samples=10000)
  File "/home/sowmya/anaconda3/envs/ddenv_new/lib/python3.6/site-packages/guacamol/assess_distribution_learning.py", line 51, in _assess_distribution_learning
    results = _evaluate_distribution_learning_benchmarks(model=model, benchmarks=benchmarks)
  File "/home/sowmya/anaconda3/envs/ddenv_new/lib/python3.6/site-packages/guacamol/assess_distribution_learning.py", line 83, in _evaluate_distribution_learning_benchmarks
    result = benchmark.assess_model(model)
  File "/home/sowmya/anaconda3/envs/ddenv_new/lib/python3.6/site-packages/guacamol/frechet_benchmark.py", line 53, in assess_model
    mu_ref, cov_ref = self._calculate_distribution_statistics(chemnet, self.reference_molecules)
  File "/home/sowmya/anaconda3/envs/ddenv_new/lib/python3.6/site-packages/guacamol/frechet_benchmark.py", line 94, in _calculate_distribution_statistics
    gen_mol_act = fcd.get_predictions(model, sample_std)
  File "/home/sowmya/anaconda3/envs/ddenv_new/lib/python3.6/site-packages/fcd/FCD.py", line 196, in get_predictions
    steps=np.ceil(len(gen_mol) / 128))
  File "/home/sowmya/anaconda3/envs/ddenv_new/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1915, in predict_generator
    callbacks=callbacks)
  File "/home/sowmya/anaconda3/envs/ddenv_new/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1629, in predict
    tmp_batch_outputs = self.predict_function(iterator)
  File "/home/sowmya/anaconda3/envs/ddenv_new/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 828, in __call__
    result = self._call(*args, **kwds)
  File "/home/sowmya/anaconda3/envs/ddenv_new/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 862, in _call
    results = self._stateful_fn(*args, **kwds)
  File "/home/sowmya/anaconda3/envs/ddenv_new/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 2943, in __call__
    filtered_flat_args, captured_inputs=graph_function.captured_inputs)  # pylint: disable=protected-access
  File "/home/sowmya/anaconda3/envs/ddenv_new/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 1919, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/home/sowmya/anaconda3/envs/ddenv_new/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 560, in call
    ctx=ctx)
  File "/home/sowmya/anaconda3/envs/ddenv_new/lib/python3.6/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError:  TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'
Traceback (most recent call last):

  File "/home/sowmya/anaconda3/envs/ddenv_new/lib/python3.6/site-packages/tensorflow/python/ops/script_ops.py", line 249, in __call__
    ret = func(*args)

  File "/home/sowmya/anaconda3/envs/ddenv_new/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 620, in wrapper
    return func(*args, **kwargs)

  File "/home/sowmya/anaconda3/envs/ddenv_new/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 891, in generator_py_func
    values = next(generator_state.get_iterator(iterator_id))

  File "/home/sowmya/anaconda3/envs/ddenv_new/lib/python3.6/site-packages/tensorflow/python/keras/engine/data_adapter.py", line 807, in wrapped_generator
    for data in generator_fn():

  File "/home/sowmya/anaconda3/envs/ddenv_new/lib/python3.6/site-packages/fcd/FCD.py", line 156, in myGenerator_predict
    smiEnc = get_one_hot(currentSmiles, pad_len=nn)

  File "/home/sowmya/anaconda3/envs/ddenv_new/lib/python3.6/site-packages/fcd/FCD.py", line 118, in get_one_hot
    smiles = smiles + '.'

TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'


	 [[{{node PyFunc}}]]
	 [[IteratorGetNext]] [Op:__inference_predict_function_2225]

Function call stack:
predict_function

I cannot work out why this error appears at this stage, and any suggestions would be very helpful. I used the same code with the ChEMBL dataset a couple of months ago to benchmark another model, and it worked fine then. I am not sure whether some package versions are no longer compatible, so here are the versions I am using:

Tensorflow: v2.4.0
Keras: v2.4.3
GuacaMol: v0.5.2
Python: v3.6.13

Thanks in advance!
Sowmya
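
Reading the traceback, the final TypeError comes from smiles + '.' inside get_one_hot, which suggests that a None entry reached the FCD code. The failing call is the one computing statistics for self.reference_molecules, so the None may come from the custom training file rather than from the generated samples. A minimal sketch for screening a SMILES list before it is handed over, with drop_missing_smiles being an illustrative name that is not part of guacamol or fcd:

```python
from typing import Iterable, List, Optional


def drop_missing_smiles(smiles: Iterable[Optional[str]]) -> List[str]:
    """Keep only non-empty string entries; drop None and blank strings."""
    return [s for s in smiles if isinstance(s, str) and s.strip()]


def count_missing(smiles: Iterable[Optional[str]]) -> int:
    """Count entries that are None or blank, to see how many would be dropped."""
    return sum(1 for s in smiles if not (isinstance(s, str) and s.strip()))
```

If count_missing reports a non-zero value for the training molecules, those entries (for example molecules that failed to parse or canonicalize) are a likely source of the error.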

Something wrong in ranolazine_mpo()?

The description says 'Make start_pop_ranolazine more polar and add a fluorine',
but the code is:
logP_under_4 = RdkitScoringFunction(descriptor=logP, score_modifier=MaxGaussianModifier(mu=7, sigma=1))
I assume the name logP_under_4 means 'push logP down until it is under 4', but the function uses MaxGaussianModifier with mu=7. Shouldn't that be MinGaussianModifier with mu=4?
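
To make the difference concrete, here is a stand-alone sketch of the two half-Gaussian shapes. It assumes that MaxGaussianModifier scores 1 at or above mu and falls off below it, and that MinGaussianModifier does the reverse; the functions below are a re-implementation for illustration, not the library's code.

```python
import math


def max_gaussian(x: float, mu: float, sigma: float) -> float:
    """Half Gaussian that rewards large values: 1.0 for x >= mu."""
    clipped = min(x, mu)  # values above mu are treated as mu
    return math.exp(-0.5 * ((clipped - mu) / sigma) ** 2)


def min_gaussian(x: float, mu: float, sigma: float) -> float:
    """Half Gaussian that rewards small values: 1.0 for x <= mu."""
    clipped = max(x, mu)  # values below mu are treated as mu
    return math.exp(-0.5 * ((clipped - mu) / sigma) ** 2)


if __name__ == "__main__":
    for logp in (3.0, 5.0, 7.0):
        print(logp, max_gaussian(logp, mu=7, sigma=1), min_gaussian(logp, mu=4, sigma=1))
```

Under these assumptions, a molecule with logP of 3 scores close to zero with max_gaussian(mu=7) but scores 1 with min_gaussian(mu=4), which is the mismatch the question points at.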

isomericSmiles=False

From what I understand you set isomericSmiles = False in your preprocessing (filter_and_canonicalize function).

This means you don't take into account any isomeric information. Do you think this might be an issue, especially since isomers don't necessarily have similar chemical or physical properties?

Latest scipy does not support histogram

utils.chemistry imports histogram from scipy, which no longer exists in the latest release.

I replaced it with from numpy import histogram and it seems to work.

Is it possible to update guacamol to support this?

How the FCD value changes with the sample size of the reference molecule set and the padding length

This is not really an issue, more of a "might be useful to know". The graph linked below shows the effect of two variables on the FCD value: the sample size of the molecule reference set (GuacaMol uses 10,000 afaik) and the padding length of the molecules before they go into the ChemNet model (fcd uses 350). A bit more background is in this repo: https://github.com/hogru/GuacaMolEval

Main result/diagram: https://github.com/hogru/GuacaMolEval/blob/main/figures/fcd_values.jpg

Support for input files of generated molecules

I would like to use Guacamol to benchmark 3rd party products for generative chemistry. I realize that some default Guacamol benchmarks may be unsuitable for this, such as those that measure training data distributions (which we cannot see) against generated molecule distributions. However, we’d still like to do our best evaluating these tools in the Guacamol framework.

Do you have any advice around this? I have explored usage of Guacamol as a Python library that integrates with my generative code, but these 3rd party tools instead typically yield molecules via web browser interfaces or minimal web APIs. Would it be best for me to create Python subroutines that can mock molecule generation for Guacamol, but are really reading from a file containing molecules generated by these tools? Or are there other options you suggest? Many thanks in advance!
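
The file-backed approach described above could be sketched as follows. This is an illustration under assumptions: the class FileBackedGenerator is hypothetical, it expects one SMILES string per line, and for real use it would presumably subclass guacamol's DistributionMatchingGenerator, which is omitted here so that the snippet runs without guacamol installed.

```python
import random
from typing import List


class FileBackedGenerator:
    """Serves pre-generated molecules from a text file (one SMILES per line)."""

    def __init__(self, smiles_path: str, seed: int = 0):
        with open(smiles_path) as handle:
            self.molecules = [line.strip() for line in handle if line.strip()]
        self.rng = random.Random(seed)

    def generate(self, number_samples: int) -> List[str]:
        if not self.molecules:
            raise ValueError("no molecules were loaded from the file")
        if number_samples <= len(self.molecules):
            # Enough molecules available: sample without replacement.
            return self.rng.sample(self.molecules, number_samples)
        # Too few molecules: fall back to sampling with replacement,
        # which introduces duplicates and would lower uniqueness scores.
        return [self.rng.choice(self.molecules) for _ in range(number_samples)]
```

Exporting at least as many molecules from the third-party tool as the benchmark requests avoids the with-replacement fallback and the duplicates it creates.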

TypeError: generate() missing 1 required positional argument: 'self'

After I finished subclassing DistributionMatchingGenerator and tried to use assess_distribution_learning to assess my model, I got an error that I can't figure out. Could you please give me some advice? Thanks. The error is as follows:

  File "D:\Anaconcada3\envs\my-rdkit-env\Lib\site-packages\guacamol\main_analysis.py", line 20, in
    benchmark_version='v1')
  File "D:\Anaconcada3\envs\my-rdkit-env\Lib\site-packages\guacamol\assess_distribution_learning.py", line 34, in assess_distribution_learning
    number_samples=10000)
  File "D:\Anaconcada3\envs\my-rdkit-env\Lib\site-packages\guacamol\assess_distribution_learning.py", line 51, in _assess_distribution_learning
    results = _evaluate_distribution_learning_benchmarks(model=model, benchmarks=benchmarks)
  File "D:\Anaconcada3\envs\my-rdkit-env\Lib\site-packages\guacamol\assess_distribution_learning.py", line 83, in _evaluate_distribution_learning_benchmarks
    result = benchmark.assess_model(model)
  File "D:\Anaconcada3\envs\my-rdkit-env\lib\site-packages\guacamol\distribution_learning_benchmark.py", line 69, in assess_model
    molecules = model.generate(number_samples=self.number_samples)
TypeError: generate() missing 1 required positional argument: 'self'
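
Python raises this message when generate is called on the class itself rather than on an instance, so one possible cause (not confirmed in the issue) is that the class was passed to assess_distribution_learning without being instantiated. The sketch below reproduces the effect with a stand-in class; MyGenerator and call_generate are illustrative names only.

```python
class MyGenerator:
    """Stand-in for a DistributionMatchingGenerator subclass."""

    def generate(self, number_samples):
        return ["CCO"] * number_samples


def call_generate(model, number_samples=3):
    # Mirrors the call shown in the traceback: model.generate(number_samples=...)
    return model.generate(number_samples=number_samples)


def passing_the_class_fails() -> bool:
    """Return True if passing the class (not an instance) raises TypeError."""
    try:
        call_generate(MyGenerator)  # note: no parentheses, so this is the class
    except TypeError:
        return True
    return False


if __name__ == "__main__":
    print(passing_the_class_fails())      # class passed: TypeError is raised
    print(call_generate(MyGenerator()))   # instance passed: works
```

If this is the cause, passing MyGenerator() instead of MyGenerator as the model argument should resolve it.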
