
gnn-project's Introduction

Comments about the code

This is the code for this video series: https://www.youtube.com/watch?v=nAEb1lOf_4o

Installing RDKIT

You will need rdkit to run this code.

Follow these instructions to install rdkit. https://www.rdkit.org/docs/Install.html

If you run on Ubuntu / WSL you can simply run:

sudo apt-get install python3-rdkit

Ideally, run the code in an Anaconda environment; that is the easiest way to get rdkit working.

Installing the other packages

For pytorch geometric follow this tutorial: https://pytorch-geometric.readthedocs.io/en/latest/notes/installation.html

Make sure your CUDA and torch versions match the PyG version you install. I used torch 1.6.0 because it seemed to be the most stable with the other libraries.
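To see which versions are actually installed before matching them, a small report like the one below can help. It is only a sketch: `installed_versions` is a hypothetical helper, and the distribution names in the list are assumptions about how the packages were installed (conda installs in particular may register different names).

```python
from importlib import metadata


def installed_versions(packages):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Assumed distribution names; adjust them to your environment.
report = installed_versions(
    ["torch", "torch-geometric", "rdkit", "deepchem", "mlflow", "streamlit"]
)
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

Anything reported as "not installed" is either missing or registered under a different distribution name.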

Further things

Dashboard (MLflow + Streamlit)

This setup requires conda, e.g.:

wget https://repo.continuum.io/archive/Anaconda3-5.3.1-Linux-x86_64.sh

You need to start the following things:

  • Streamlit server
streamlit run dashboard.py
  • MLflow server
mlflow server \
    --backend-store-uri sqlite:///mlflow.db \
    --default-artifact-root ./artifacts \
    --host 0.0.0.0 \
    --port 5000
  • MLflow served model
export MLFLOW_TRACKING_URI=http://localhost:5000
mlflow models serve -m "models:/YourModelName/Staging" -p 1234
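Once the served model is running on port 1234, a request against its /invocations endpoint could be built roughly as follows. This is a sketch under assumptions: `build_invocation_request` is a hypothetical helper, the column names are placeholders, and the `dataframe_split` payload key is what newer MLflow versions expect (older versions accept a different JSON layout, so check the version you have installed).

```python
import json
import urllib.request


def build_invocation_request(rows, columns, host="localhost", port=1234):
    """Build (but do not send) a POST request for an MLflow model server."""
    # Assumption: the server accepts the "dataframe_split" JSON layout.
    payload = {"dataframe_split": {"columns": columns, "data": rows}}
    return urllib.request.Request(
        url=f"http://{host}:{port}/invocations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder feature names and values, for illustration only.
request = build_invocation_request(
    rows=[[0.1, 0.2]], columns=["feature_a", "feature_b"]
)
print(request.full_url)

# With the server running, send it with:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```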

TODO: Check if multi-input models work for MLFLOW!!!

gnn-project's People

Contributors

deepfindr, shi-kejian


gnn-project's Issues

The error in the calculation of AUC

Hi,

Thank you for this excellent GNN project. However, line 106 of train.py reads:

roc= roc_auc_score(y_pred,y_true)

According to sklearn, the first argument should be the true labels and the second the predicted scores, so is something wrong here? The correct version could be:

roc=roc_auc_score(y_true,all_preds_raw)
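To illustrate why the order matters, here is a dependency-free sketch. `roc_auc` is a hypothetical stand-in that follows the same (y_true, y_score) argument order as sklearn's roc_auc_score; it is not sklearn's implementation, and the toy data is made up.

```python
def roc_auc(y_true, y_score):
    """AUC as the chance that a random positive outscores a random negative.

    Ties count as half a win. Labels come first, scores second.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]                    # raw model outputs
y_pred = [1 if s >= 0.5 else 0 for s in y_score]   # thresholded labels

correct = roc_auc(y_true, y_score)   # labels first, raw scores second: 0.75
swapped = roc_auc(y_pred, y_true)    # arguments in the wrong roles: about 0.833
print(correct, swapped)
```

On this toy data the swapped call silently returns a different number instead of failing, which is what makes the bug easy to miss.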

Doubt regarding code execution

Hello sir, can you please explain how to execute the code? While running the train.py file I am getting this error:

[Screenshot 2023-09-26 012101]

Hope you will reply to this ASAP.

Following videos

Hi,
First, thank you for all these amazing materials!
I am having a lot of difficulty following the videos with this code, which seems to be the final project.
Is there a way to get the scripts for each video, please?
Thank you!

Error in f.to_pyg_graph(): TypeError: type object got multiple values for keyword argument 'pos'

Hello, I am trying to run your code, but I am facing a problem when creating the dataset. Specifically, the problem is on line 54 of dataset_featurizer.py, where the data is converted to a PyTorch Geometric graph using:
data = f.to_pyg_graph()
I run into this error:
TypeError: type object got multiple values for keyword argument 'pos'

The f in this case looks like this:
GraphData(node_features=[46, 30], edge_index=[2, 108], edge_features=[108, 11], pos=[0])

This has been created from row = ('level_0', 0) ('Unnamed: 0', 3999) ('index', 3999) ('smiles', 'CSc1cc2[n+]3c(c1)-c1cccc[n+]1[Zn-4]314([n+]3ccccc3-2)[n+]2ccccc2-c2cc(SC)cc([n+]21)-c1cccc[n+]14.[O-]Cl+3([O-])[O-]') ('activity', 'CI') ('HIV_active', 0)

I don't really know what to do or how to solve it.
Maybe you could also upload the content in /data/processed, so that this issue is solved.

Help please, I am very stuck with this issue and I cannot run your code.

Thank you for your time.

dataset_featurizer.py referencing a base Class of DeepChem MolGraphConvFeaturizer?

Hi! Thanks for the great effort.

self.process()
  File "/..../dataset_featurizer.py", line 53, in process
    f = featurizer.featurize(mol["smiles"])
  > data = f[0].to_pyg_graph()
AttributeError: 'numpy.ndarray' object has no attribute 'to_pyg_graph'

It seems that featurizer.featurize returns a NumPy array, not a GraphData object.

Preprocessing with deepchem. Issue with positions

I was running train.py with a recent installation of the libraries. I think there is a version mismatch, because I am getting:

 File "../venv2023/lib/python3.8/site-packages/deepchem/feat/graph_data.py", line 151, in to_pyg_graph
    return Data(x=torch.from_numpy(self.node_features).float(),
TypeError: type object got multiple values for keyword argument 'pos'

I found a workaround that ignores the positional information, since f = featurizer._featurize(mol) shows:

>>>f
GraphData(node_features=[75, 30], edge_index=[2, 162], edge_features=[162, 11], pos=[0])

The workaround is to write a custom function that converts f into a PyG graph:

    def _custom_to_pyg_graph(self,graph_data):
        from torch_geometric.data import Data
        return Data(x=torch.from_numpy(graph_data.node_features).float(),
                    edge_index=torch.from_numpy(graph_data.edge_index).long(),
                    edge_attr=torch.from_numpy(graph_data.edge_features).float())

    def process(self):
        self.data = pd.read_csv(self.raw_paths[0]).reset_index()
        featurizer = dc.feat.MolGraphConvFeaturizer(use_edges=True)
        for index, row in tqdm(self.data.iterrows(), total=self.data.shape[0]):
            # Featurize molecule
            mol = Chem.MolFromSmiles(row["smiles"])
            f = featurizer._featurize(mol)
            data = self._custom_to_pyg_graph(f)
            # data = f.to_pyg_graph()
            data.y = self._get_label(row["HIV_active"])
            data.smiles = row["smiles"]
            if self.test:
                torch.save(data, 
                    os.path.join(self.processed_dir, 
                                 f'data_test_{index}.pt'))
            else:
                torch.save(data, 
                    os.path.join(self.processed_dir, 
                                 f'data_{index}.pt'))

So far this works for processing, but a general-purpose solution would need to handle future versions that include positions.

Also, the requirements seem to be missing some of the tools, such as deepchem. Pinning a version for each tool, or dockerizing the venv you used, could help.

sklearn.metrics.confusion_matrix

Thank you for this great work!
I just wanted to make a remark about the confusion matrix function – the y_true comes before the y_pred in the sklearn.metrics.confusion_matrix function's signature.
sklearn.metrics.confusion_matrix(y_true, y_pred, *, labels=None, sample_weight=None, normalize=None)
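A small dependency-free sketch of what the argument order changes: `confusion_matrix_2x2` is a hypothetical helper that follows sklearn's convention (rows are true classes, columns are predicted classes), and the toy labels are made up.

```python
def confusion_matrix_2x2(y_true, y_pred):
    """Binary confusion matrix: rows = true class, columns = predicted class."""
    matrix = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


y_true = [0, 0, 0, 1, 1]
y_pred = [0, 1, 1, 1, 0]

correct = confusion_matrix_2x2(y_true, y_pred)   # [[1, 2], [1, 1]]
swapped = confusion_matrix_2x2(y_pred, y_true)   # [[1, 1], [2, 1]]

# Swapping the arguments transposes the matrix, so false positives and
# false negatives trade places.
print(correct, swapped)
```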

Where does the 0 come from?

Hi
Thank you for all your effort on the Gnn-project.

Once the training is done:

Processing...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3999/3999 [00:27<00:00, 145.31it/s]
Done!
Loading model...
...
...
...
   if i % self.top_k_every_n == 0:
ZeroDivisionError: integer division or modulo by zero

Any insights, @deepfindr?

Thanks in advance
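For reference, Python raises this error whenever the right-hand side of % is 0, so the traceback implies that top_k_every_n was 0 at that point. One possible guard is sketched below. `should_pool` is a hypothetical helper, and treating 0 as "never pool" is an assumption, not something the project confirms.

```python
def should_pool(layer_index, top_k_every_n):
    """Return True if pooling should run after this layer.

    Assumption: a value of 0 (or less) means pooling is disabled.
    """
    if top_k_every_n <= 0:
        return False
    return layer_index % top_k_every_n == 0


every_second_layer = [i for i in range(6) if should_pool(i, 2)]   # [0, 2, 4]
pooling_disabled = [i for i in range(6) if should_pool(i, 0)]     # []
print(every_second_layer, pooling_disabled)
```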

question on TransformerConv

Hi Deepfindr,

Thank you so much for your great video and code!

I have a question about the TransformerConv you used. You defined the layer as:

self.conv1 = TransformerConv(feature_size, 
                                    embedding_size, 
                                    heads=n_heads, 
                                    dropout=dropout_rate,
                                    edge_dim=edge_dim,
                                    beta=True) 

However, according to the PyG website, the first parameter should be in_channels, which is either a tuple defining the shape of the input or -1, which derives the size from the first input(s) to the forward method.

But feature_size is neither of these. Is it a version issue? Which version of PyG did you use for the tutorial?

Thank you so much!

Best,
Nicole

all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 3 and the array at index 1 has size 20

Hi!

I've been exploring some self-made datasets and I've managed to get the project up and running fine. The training runs well, except that sometimes this error happens:

F1 Score: 0.764872521246459
Accuracy: 0.7331189710610932
MCC: 0.48423538939278077
Precision: 0.6835443037974683
Recall: 0.8681672025723473
ROC AUC: 0.7331189710610932
Epoch 215 | Test Loss 0.5649742603302002
Early stopping due to no improvement.
  0%|                                                                        | 0/100 [14:09<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 191, in <module>
    results = tuner.minimize()
  File "/opt/conda/lib/python3.7/site-packages/mango/tuner.py", line 153, in minimize
    return self.run()
  File "/opt/conda/lib/python3.7/site-packages/mango/tuner.py", line 140, in run
    self.results = self.runBayesianOptimizer()
  File "/opt/conda/lib/python3.7/site-packages/mango/tuner.py", line 263, in runBayesianOptimizer
    X_sample = np.vstack((X_sample, X_next_batch))
  File "<__array_function__ internals>", line 6, in vstack
  File "/opt/conda/lib/python3.7/site-packages/numpy/core/shape_base.py", line 282, in vstack
    return _nx.concatenate(arrs, 0)
  File "<__array_function__ internals>", line 6, in concatenate
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 3 and the array at index 1 has size 20

As you can see, this happens at the 0th epoch.
