
midi's People

Contributors

cvignac

midi's Issues

Missing "dataloaders" attribute in datamodule

Hello Clément,

I'm trying to use your code with the QM9 dataset, but I get this error:

                                                             ...
File "D:\Users\antoine\Documents\Sorbonne\Stage_PS\CTDGG\src\datasets\abstract_dataset.py", line 23, in __getitem__
    return self.dataloaders['train'][idx]
AttributeError: 'QM9DataModule' object has no attribute 'dataloaders'

It seems that the dataloaders attribute is not defined anywhere, which raises an error when the Mixin's __getitem__ method is called. In the DiGress implementation it was defined in the prepare_data method, which is no longer present in this implementation. How do you create this attribute now?
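
For reference, here is a rough sketch of the DiGress-style pattern I was expecting, where prepare_data fills a dict of dataloaders keyed by split (this is not the actual MiDi code; the toy datasets and batch size are placeholders):

    import torch
    import pytorch_lightning as pl
    from torch.utils.data import DataLoader, TensorDataset

    class ToyDataModule(pl.LightningDataModule):
        """Placeholder datamodule illustrating the DiGress-style pattern."""

        def __init__(self, batch_size: int = 32):
            super().__init__()
            self.batch_size = batch_size
            self.dataloaders = None

        def prepare_data(self):
            # The real module would load/process QM9 here; these are toy tensors.
            datasets = {split: TensorDataset(torch.randn(128, 4))
                        for split in ('train', 'val', 'test')}
            # This is the attribute the Mixin's __getitem__ indexes into.
            self.dataloaders = {split: DataLoader(ds, batch_size=self.batch_size)
                                for split, ds in datasets.items()}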

Note that I'm not using the ray or ray-lightning packages for now; could that interfere with the dataloader creation?

Nice update of the DiGress code, by the way; it's great that you added multi-GPU support!

Best regards,

Antoine

Problems in resuming/testing + same seed for all test runs

Hi Clément,

When trying to use the testing procedure, I encountered a bug related to input_dims. It seems that in the current version of the code they are updated twice: once in get_resume/load_from_checkpoint and once when the model is created.

I don't know exactly where it comes from, but the following fix worked for me (at least for testing; I haven't tried resuming):

    # Build the model exactly once: either restored from a checkpoint
    # (for testing or resuming) or created from scratch.
    if cfg.general.test_only:
        cfg, model = get_resume(cfg, dataset_infos, train_smiles, to_absolute_path(cfg.general.test_only), test=True)
    elif cfg.general.resume is not None:
        # When resuming, we can override some parts of the previous configuration
        print("Resuming from {}".format(to_absolute_path(cfg.general.resume)))
        cfg, model = get_resume(cfg, dataset_infos, train_smiles, to_absolute_path(cfg.general.resume), test=False)
    else:
        model = DiffusionModel(cfg=cfg, dataset_infos=dataset_infos, train_smiles=train_smiles)

Also, it seems that all runs share the same seed when num_final_sampling is greater than 1, so they all yield the same results.
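
A possible workaround for the seed issue would be to derive a different seed for each final sampling run. A minimal sketch (not the actual MiDi code; the variable names are assumptions, only seed_everything is a real PyTorch Lightning call):

    import pytorch_lightning as pl

    base_seed = 0           # hypothetical base seed from the config
    num_final_sampling = 5  # hypothetical value of num_final_sampling
    for run_idx in range(num_final_sampling):
        # Offset the seed per run so the runs don't all produce the same samples.
        pl.seed_everything(base_seed + run_idx)
        # ... run one sampling/test pass here ...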

Cheers,

Antoine

Please add a license

Hi, thanks for this very nice work and code.

I noticed this project does not have an explicit license. Could you please add a license stating the terms of use?

pyg incompatible with rdkit version 2023

When setting up the environment with conda, pyg cannot be installed. To solve this, pin the RDKit version when creating the environment:

conda create -c conda-forge -n MoleculeDiffusion rdkit=2022.03.5 python=3.9

Alternatively, @cvignac, it might be worth adding an environment.yml so we can install exactly the package versions you are using.

Is it possible to release the GEOMDrugsDataset processed files?

Hello,

I'm trying to use MiDi to generate molecules based on the model trained on GEOM with explicit H.
The trained model requires dataset_infos as input, which needs the datamodule to compute the statistics. However, I don't currently have enough RAM on my machine to load the GEOM training set from the pickle file you provide.
I was thinking that having the processed files for the GEOMDrugsDataset would avoid running the process() function (which is called when the processed files don't exist), and these files might be lighter than the whole pickle file containing the molecules. Could you provide them?
Or, if you see another workaround (e.g. storing the statistics/configuration required for dataset_infos in separate files that don't always require the datamodule), please let me know.
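
To make the suggestion concrete, here is roughly what I had in mind (a sketch only; the statistics attribute and file name are hypothetical, not the actual MiDi API):

    import torch

    def save_dataset_statistics(dataset_infos, path='geom_statistics.pt'):
        # Suppose dataset_infos exposes its precomputed statistics as a plain
        # dict of tensors/numbers; save only that, not the full dataset.
        torch.save(dataset_infos.statistics, path)

    def load_dataset_statistics(path='geom_statistics.pt'):
        # Rebuilding dataset_infos from this file would avoid instantiating the
        # full GEOM datamodule (and loading the huge pickle) just to sample.
        return torch.load(path, map_location='cpu')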

Thank you very much,

Best,
Benoit
