hips / molecule-autoencoder
A project to enable optimization of molecules by transforming them to and from a continuous representation.
Could you please provide some details about the data included in all_drugs.smi?
What is the source of this data (PubChem, DrugBank?)?
What inclusion criteria were used to select the molecules?
Thank you!
Hi,
Are the details (and/or the code) for implementing the baselines, especially the genetic algorithm, available? I can't find them here or in the paper.
Thank you very much for the information!
As I proposed in maxhodak/keras-molecules#54, I am curious why the charset is designed this way. It is not straightforward: from a chemical point of view, chlorine ("Cl") should not be split into "C" and "l". Redesigning the charset might bring some improvement. I used the implementation from keras-molecules and tried to interpolate between two chemical structures (CC=C(C(=CC)c1ccc(O)cc1)c1ccc(O)cc1 and CN1C(=O)CCS(=O)(=O)C1c1ccc(Cl)cc1). I got invalid structures like the ones below, so I suspect the charset is the cause.
CC(C)(O)CCC1CCC(Cr)So2c1ccc(C)cc1
CCNC(=O)CN(CC1((l)CN1c1ccc(OC)cc1
CN1C(=O)CN(CC1((#)CN1c1ccc(OC)cc1
CN1C(=O)CC(CC**()(=O)C1c1ccc(Cl)cc1
CN1C(=O)CC(NC()(=O)C1**c1ccc(Cl)cc1
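The concern about "Cl" can be illustrated with a tokenizer that treats multi-character atoms as single symbols. This is a sketch under assumptions, not the charset used by this repository or by keras-molecules: `SMILES_TOKEN`, `tokenize` and `build_charset` are hypothetical names, and the pattern only covers bracket atoms (e.g. `[nH]`), the two-letter halogens `Cl` and `Br`, and two-digit `%nn` ring labels, falling back to single characters for everything else.

```python
import re

# Hypothetical token pattern (an assumption, not this repo's charset).
# Alternatives are tried left to right, so bracket atoms, Br/Cl and %nn
# ring labels win over the single-character fallback ".".
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize(smiles):
    """Split a SMILES string into chemically meaningful tokens."""
    return SMILES_TOKEN.findall(smiles)


def build_charset(smiles_list):
    """Collect the sorted set of tokens seen in a corpus of SMILES strings."""
    return sorted({tok for s in smiles_list for tok in tokenize(s)})


if __name__ == "__main__":
    # The second interpolation endpoint from the issue above.
    print(tokenize("CN1C(=O)CCS(=O)(=O)C1c1ccc(Cl)cc1"))
```

If every chlorine in the training data is tokenized as "Cl", a lone "l" never enters the charset, so the decoder cannot produce fragments such as "(l)"; other errors, like unbalanced parentheses, are still possible.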
Hi, thanks for releasing the code. However, how can I train "the best model" myself? That configuration is missing from hyperparams.py (simple_params does not correspond to it and makes many random choices anyway). Also, it cannot be fully reconstructed from best_vae_model.json, especially the optimization details.
Hi,
I tried to perform Bayesian optimization on SMILES decoded from 292-dimensional latent vectors. Following your paper, I first used the latent vector of the SMILES 'CCN(CC)C(=O)Cc1ccc(S(=O)(=O)N2CCCc3ccccc32)cc1' as the inducing point. I then modified five dimensions of this vector (with values in the range [-0.8, 1]) to obtain new SMILES, but the SMILES I got (listed below, after the original) are not valid. Do you have any suggestions on how to modify the vectors to get new, valid SMILES? Thanks a lot.
[['CCN(CC)C(=O)Cc1ccc(S(=O)(=O)N2CCCc3ccccc32)cc1'], ['CCN(CC)(=O)CSc1ccc(S(=O)(=O)N2CCCc3ccccc33)cc1C'], ['CCN(CC(C(=O)Oc1cccc1S(])(==)NCCCcc2ccccc2)2C1'], ['CC1CCN1C(=O)Nc2cccc1C((=O)=O)N1CCc2ccccc3)cc1'], ['CCCCCN1C(=O)Nc2cccc1S(C)((=O)N(CCc2ccccc3))c21']]
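Decoded strings like these can be screened cheaply before any chemistry toolkit is involved. The helper below is hypothetical, not code from this repository: `looks_syntactically_valid` only checks that bracket atoms close, parentheses balance, and every ring-closure label is both opened and closed. It ignores valence, so a string such as CN(C)(C)(C)C still passes; for a real validity test, RDKit's `Chem.MolFromSmiles` returns `None` for strings it cannot parse or sanitize.

```python
DIGITS = "0123456789"


def looks_syntactically_valid(smiles):
    """Crude SMILES syntax check (hypothetical helper, not a full parser).

    Checks bracket atoms, parenthesis balance and ring-closure pairing only;
    valence and aromaticity are NOT checked.
    """
    depth = 0
    open_rings = set()
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            end = smiles.find("]", i)
            if end == -1:
                return False  # bracket atom never closed
            i = end + 1  # skip the bracket atom, including charges/H counts
            continue
        if ch == "]":
            return False  # closing bracket without an opening one
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False  # branch closed before it was opened
        elif ch == "%":
            label = smiles[i + 1:i + 3]
            if len(label) != 2 or not all(c in DIGITS for c in label):
                return False  # malformed two-digit ring label
            open_rings ^= {int(label)}  # toggle: open if closed, close if open
            i += 3
            continue
        elif ch in DIGITS:
            open_rings ^= {int(ch)}
        i += 1
    return depth == 0 and not open_rings


# The original SMILES followed by the four decoded strings from the issue.
decoded = [
    "CCN(CC)C(=O)Cc1ccc(S(=O)(=O)N2CCCc3ccccc32)cc1",
    "CCN(CC)(=O)CSc1ccc(S(=O)(=O)N2CCCc3ccccc33)cc1C",
    "CCN(CC(C(=O)Oc1cccc1S(])(==)NCCCcc2ccccc2)2C1",
    "CC1CCN1C(=O)Nc2cccc1C((=O)=O)N1CCc2ccccc3)cc1",
    "CCCCCN1C(=O)Nc2cccc1S(C)((=O)N(CCc2ccccc3))c21",
]
results = [looks_syntactically_valid(s) for s in decoded]
print(results)
```

On this list the check accepts the original and rejects all four decoded strings (unpaired ring labels, a stray "]", and unbalanced parentheses), which suggests the perturbed points land in regions where the decoder has not learned SMILES grammar.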
Could you share the Bayesian optimization code used in the paper?
Hi, I noticed that you put the softmax activation inside the GRU cell. As I understand it, in this case the activations at each timestep will not sum to 1. Here is the link for the GRU cell; the terminal GRU has the same situation: https://github.com/HIPS/molecule-autoencoder/blob/master/autoencoder/train_autoencoder.py#L225
I also checked with your version of Keras that the activations do not sum to 1; here is a link to the gist: https://gist.github.com/fgvbrt/1f2e1828c6d8c0eb88614f14c60874ad
Was this done on purpose, or is it a mistake?
Thanks in advance.
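A small numeric sketch makes the question concrete. It assumes the standard GRU update h_t = z * h_prev + (1 - z) * softmax(a), applied element-wise with a per-unit update gate z, where the softmax is the candidate-state activation; `gru_output` and all the numbers are illustrative assumptions, not values taken from train_autoencoder.py. Because each unit is mixed with its own gate value, the output is generally not a probability vector even when both inputs are.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gru_output(h_prev, z, candidate_logits):
    """GRU-style update with softmax on the candidate state (illustrative).

    h_t[i] = z[i] * h_prev[i] + (1 - z[i]) * softmax(candidate_logits)[i]
    """
    h_cand = softmax(candidate_logits)
    return [zi * hp + (1.0 - zi) * hc for zi, hp, hc in zip(z, h_prev, h_cand)]


h_prev = [0.2, 0.3, 0.5]   # previous output, sums to 1
z = [0.9, 0.1, 0.5]        # per-unit update gate (illustrative values)
logits = [2.0, 0.0, -1.0]  # pre-activation of the candidate state

h = gru_output(h_prev, z, logits)
print(sum(softmax(logits)))  # approximately 1.0
print(sum(h))                # approximately 0.668, not 1
```

With one shared gate value for all units, the sum would stay at 1 whenever h_prev also sums to 1, because z * 1 + (1 - z) * 1 = 1; it is the per-unit gate that breaks this.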
I was following the instructions on the homepage of this GitHub repository and tried to run sample_autoencoder.py exactly as in the example. However, this is what I got:
python sample_autoencoder.py ../data/best_vae_model.json ../data/best_vae_annealed_weights.h5 ../data/250k_rndm_zinc_drugs_clean.smi ../data/zinc_char_list.json -l5000
Using Theano backend.
Traceback (most recent call last):
  File "sample_autoencoder.py", line 97, in <module>
    model = model_from_json(json.dumps(model_dict))
  File "/usr/local/lib/python2.7/dist-packages/keras/models.py", line 213, in model_from_json
    return layer_from_config(config, custom_objects=custom_objects)
  File "/usr/local/lib/python2.7/dist-packages/keras/utils/layer_utils.py", line 27, in layer_from_config
    class_name = config['class_name']
KeyError: 'class_name'
Can anyone please tell me what is going on here? Thank you very much!
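The traceback shows that Keras's `layer_from_config` looks up `config['class_name']`, so the JSON passed to `model_from_json` has no such top-level key. One plausible explanation, which is an assumption rather than something confirmed in this thread, is that best_vae_model.json was written by a different Keras version than the one installed. The hypothetical helpers below only inspect the file so that mismatch can be confirmed; they do not convert anything.

```python
import json


def describe_model_config(config):
    """Summarize the top-level layout of a deserialized model config.

    Per the traceback, layer_from_config reads config['class_name'], so
    has_class_name=False means model_from_json will fail the same way.
    """
    keys = sorted(config) if isinstance(config, dict) else []
    return {"top_level_keys": keys, "has_class_name": "class_name" in keys}


def describe_model_json(path):
    """Load a model JSON file (e.g. ../data/best_vae_model.json) and summarize it."""
    with open(path) as f:
        return describe_model_config(json.load(f))
```

Running `print(describe_model_json("../data/best_vae_model.json"))` lists the keys the file actually uses; if `has_class_name` is False, comparing the installed Keras version with the one the repository was developed against is a reasonable next step.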