leojklarner / gauche



A Library for Gaussian Processes in Chemistry

Home Page: https://leojklarner.github.io/gauche/

License: MIT License

Python 18.74% Makefile 0.01% Jupyter Notebook 81.21% TeX 0.05%

gauche's Issues

Write tests for DataLoader classes

Parent:

splitting and scaling

Molecular property prediction

loading benchmarks
validating SMILES
featurisation to fingerprints, fragments and fragprints

Protein ligand binding affinity

loading benchmarks
validating pdb/sdf files
featurisation to graph-based features, interaction-based features
check whether ligand extraction is correct by comparing extracted ligand IDs to those in PDBbind

Inconsistent / unclear python version requirements?

Currently the Python version requirements of this project are unclear: the README has no info, the supplied conda env says Python 3.7, while setup.py says python>=3.8. Maybe this should be standardized? It looks like any Python version should do as long as the dependencies support it, no?

Buckminsterfullerene has stopped rotating

The buckminsterfullerene gif has stopped working in the README :)

Separation of core and optional dependencies

I'd recommend splitting dependencies into groups like this

This makes it easier to keep track of what is needed for testing, dev, docs, prod, etc.

Relevant lines for setup.py

Google Colab Links in README Don't have Instructions for Installing Dependencies

The Google colab links for the notebook tutorials have no instructions for installing dependencies.

Refactor data loader

include additional variable for internal representation which is not overwritten upon featurisation

DataLoader class for PL binding affinity

load benchmark sets (such as PDBbind)
add download and splitting of arbitrary pdb complexes
cluster by ligand and protein similarity
add PLEC fingerprints, BINANA+Vina features, RFScore features
add explicit hydrogens during bond order augmentation

BO Hackathon for chemistry and materials

Apologies for the ad-like post. Figured this was the easiest way to bring it on your radar though. It's a 2-day event happening next week, and details are at https://ac-bo-hackathon.github.io/. It would be great to have a Gauche contribution! Feel free to close the issue at any point.

Need a tutorial for the fingerprint kernel and the graph kernel

Investigate Convolutional Kernel Networks

It may be worth investigating whether convolutional kernel networks [1] can be integrated as a GP graph kernel.

[1] Chen, D., Jacob, L. and Mairal, J., Convolutional kernel networks for graph-structured data. ICML, 2020.

Difficulty saving and loading models using `NonTensorialInputs` data

First off, great work, this is a really cool package!

I've been feeding graph representation inputs (built with graphein) to a model that builds on SIGP (some examples in your codebase call it GraphGP) and have been getting some really great performance out of it. However, I'm struggling to understand how to correctly save the model and then load it back into memory for inference after training. If I save the state dict and then re-initialize the model from it, the model performs as if it had been randomly initialized. When I also tried pickling the model (not the ideal solution), I got the following exception:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[88], line 4
      1 import pickle
      3 with open('model.pkl', 'wb') as file:
----> 4     pickle.dump(model, file)

RuntimeError: Pickling of "rdkit.Chem.rdchem.Atom" instances is not enabled (http://www.boost.org/libs/python/doc/v2/pickle.html)

I tried setting train_inputs to None before saving. This took care of the exception; however, I'm back to the original issue where the model seems to be randomly initialized.

I was wondering if you had any guidance here, or if there was something in the docs that I missed. Thanks!
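One possible cause, assuming the model is a GPyTorch exact GP: the state dict holds the hyperparameters but not the training inputs that predictions are conditioned on, so a reloaded model has to be rebuilt with the same training data. A general pattern that avoids pickling RDKit objects is to persist a picklable source representation (for example SMILES strings) and re-featurise on load. The sketch below shows that pattern with the standard library only; `UnpicklableGraph`, `featurise`, and `ToyModel` are hypothetical stand-ins, not part of gauche, graphein, or GPyTorch.

```python
import pickle


class UnpicklableGraph:
    """Stand-in for an object that refuses pickling (e.g. one wrapping RDKit atoms)."""

    def __init__(self, smiles):
        self.smiles = smiles

    def __reduce__(self):
        raise RuntimeError("Pickling of UnpicklableGraph instances is not enabled")


def featurise(smiles_list):
    """Hypothetical featuriser: rebuilds graph objects from picklable SMILES strings."""
    return [UnpicklableGraph(s) for s in smiles_list]


class ToyModel:
    """Hypothetical model holding hyperparameters and non-picklable training inputs."""

    def __init__(self, train_smiles, hyperparameters):
        self.train_smiles = list(train_smiles)        # picklable source of truth
        self.hyperparameters = dict(hyperparameters)  # analogous to a state dict
        self.train_inputs = featurise(self.train_smiles)

    def __getstate__(self):
        # Drop the non-picklable graphs; keep everything needed to rebuild them.
        state = self.__dict__.copy()
        del state["train_inputs"]
        return state

    def __setstate__(self, state):
        # Restore the saved attributes, then re-featurise the training inputs.
        self.__dict__.update(state)
        self.train_inputs = featurise(self.train_smiles)


model = ToyModel(["CCO", "c1ccccc1"], {"lengthscale": 0.7})
restored = pickle.loads(pickle.dumps(model))
```

With a real model the same idea would apply to the saving step: store the state dict together with the raw molecules, then construct the model from the re-featurised training set before loading the state dict into it.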

How to install gauche as a library?

Hi, thanks for your work! I'm wondering how exactly I install gauche as a library. It seems that the instructions in the README are only for installing dependencies. Meanwhile, when I do pip install git+https://github.com/leojklarner/gauche.git, I get an error:

error: subprocess-exited-with-error

  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [1 lines of output]
      error in gauche setup command: 'extras_require' must be a dictionary whose values are strings or lists of strings containing valid project/version requirement specifiers.
      [end of output]

Thanks!
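The error message spells out the shape setuptools expects: `extras_require` must be a dict that maps each extra's name to a requirement string or a list of requirement strings. A minimal sketch of that shape follows. The group names and package lists are placeholders, not the project's actual dependencies, and `is_valid_extras` is a hypothetical helper that checks only the container types, not whether each specifier parses.

```python
# Hypothetical dependency groups; names and packages are placeholders.
EXTRAS = {
    "test": ["pytest"],
    "docs": ["sphinx"],
    "dev": ["black", "flake8"],
}
# Convenience group that pulls in every package listed above.
EXTRAS["all"] = sorted({pkg for group in EXTRAS.values() for pkg in group})


def is_valid_extras(extras):
    """Return True if extras is a dict of str -> str or list of str."""
    if not isinstance(extras, dict):
        return False
    for name, reqs in extras.items():
        if not isinstance(name, str):
            return False
        if isinstance(reqs, str):
            continue
        if not (isinstance(reqs, list) and all(isinstance(r, str) for r in reqs)):
            return False
    return True
```

A dict like this can be passed as `setup(extras_require=EXTRAS)`, after which a command of the form `pip install ".[test]"` from a checkout installs the package plus that group. A nested list or a non-string value in the dict is the kind of input that triggers the error quoted above.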

Warnings from Graphein need to be suppressed

@a-r-j will address this in the next release of graphein I believe. Just leaving a TODO note here.

It would be nice for non pip install suggestions

All dependencies mentioned in the README are available as conda packages.
It's always nicer to avoid pip when using Anaconda, imho. Also, wouldn't it be much easier to prepare a yaml file?
You already have a requirements file, so then you wouldn't need a setup.py file?
you know, a simple
Just a thought, not an issue per se.

Move benchmark models into Gprotorch codebase

I think moving the contents of the benchmarks directory into the codebase (e.g. gprotorch.benchmarks) will make organisation and docs clearer. It also helps (modestly) to enforce a consistent API across the library.

Also, we should rename from gprotorch to gauche.
