GNN

In this repo we try to implement a graph neural network that should ideally predict the chemical composition of organisms across the tree of life.

To reproduce this work, first download the latest version of LOTUS here or from the command line:

wget https://zenodo.org/record/7534071/files/230106_frozen_metadata.csv.gz
mkdir -p ./data
mv 230106_frozen_metadata.csv.gz ./data/

The algorithm implements a link classification task on a graph connecting species nodes and molecule nodes. We use HinSAGE with a mean aggregator from the StellarGraph library.
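To give an intuition for what a mean aggregator does, here is a minimal pure-Python sketch. It is not StellarGraph's implementation: the real HinSAGE layer also samples neighbours and applies learned weights and an activation. The toy node names, feature values and the mean_aggregate helper are all hypothetical.

```python
# Toy heterogeneous graph: species and molecule nodes have different feature sizes.
# All names and values below are made up for illustration.
species_features = {"sp_a": [1.0, 0.0], "sp_b": [0.0, 1.0]}
molecule_features = {"mol_x": [0.2, 0.4, 0.6], "mol_y": [0.8, 0.6, 0.4]}

# species -> molecules reported in that species
edges = {"sp_a": ["mol_x", "mol_y"], "sp_b": []}


def mean_aggregate(node, own_features, neighbour_features, adjacency, neighbour_dim):
    """Concatenate a node's own features with the mean of its neighbours' features."""
    neighbours = adjacency.get(node, [])
    if neighbours:
        # Average each feature column over all neighbours.
        columns = zip(*(neighbour_features[n] for n in neighbours))
        pooled = [sum(column) / len(neighbours) for column in columns]
    else:
        # A node without neighbours gets a zero vector of the neighbour size.
        pooled = [0.0] * neighbour_dim
    return own_features[node] + pooled


embedding_a = mean_aggregate("sp_a", species_features, molecule_features, edges, 3)
embedding_b = mean_aggregate("sp_b", species_features, molecule_features, edges, 3)
print(embedding_a)
print(embedding_b)
```

In the actual model, a vector like this would then pass through the trained dense layers before a species-molecule pair is scored.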

Graph creation

To reproduce the model, first create and activate the conda environment:

conda env create -f environment.yml
conda activate stellargraph

We first parse the LOTUS database and retrieve the taxonomy of each species from GBIF. The taxonomy serves as the species features. We then build a graph from LOTUS and split it into training and testing datasets (for now a 70-30 split):

python ./scripts/gbif_taxo.py
python ./scripts/graph_creation_train.py
python ./scripts/graph_creation_test.py
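As an illustration of a 70-30 split, the sketch below shuffles a list of species-molecule pairs and cuts it at 70%. This is only an assumption about the general idea: the scripts above may split differently (for example per species or per molecule), and the split_edges helper and the toy pairs are hypothetical.

```python
import random


def split_edges(edges, train_fraction=0.7, seed=42):
    """Shuffle the edges reproducibly and split them into train and test lists."""
    shuffled = list(edges)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical species-molecule pairs standing in for LOTUS records.
pairs = [(f"species_{i % 5}", f"molecule_{i}") for i in range(10)]
train_pairs, test_pairs = split_edges(pairs)
print(len(train_pairs), len(test_pairs))
```

Fixing the seed keeps the split identical between runs, which matters when the train and test graphs are written by separate scripts.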

After a grid search for the best parameters, we set up the neural network with two hidden layers of 1024 neurons each, using "elu" and "selu" activations respectively. Training is shown in the HinSAGE_mol_to_species.ipynb and HinSAGE_species_to_mol.ipynb notebooks. Testing on unseen data is in the HinSAGE_test_*.ipynb notebooks.
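For reference, the two activations can be written out in a few lines of plain Python. This is a sketch of their standard scalar definitions, not the code used in the notebooks, which rely on the deep learning library's built-in versions.

```python
import math

# Standard SELU constants from the self-normalizing networks formulation.
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def elu(x, alpha=1.0):
    """ELU: identity for positive inputs, alpha * (exp(x) - 1) otherwise."""
    return x if x > 0 else alpha * math.expm1(x)


def selu(x):
    """SELU: a scaled ELU with fixed alpha and scale constants."""
    return SELU_SCALE * (x if x > 0 else SELU_ALPHA * math.expm1(x))


for value in (-2.0, 0.0, 2.0):
    print(value, elu(value), selu(value))
```

Both functions are linear for positive inputs and saturate smoothly for negative ones; SELU only adds the fixed scaling.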

To recreate the entire LOTUS database as a single graph, simply run:

import networkx as nx

# Load the training and testing graphs, then merge them into one graph.
g_train = nx.read_graphml("./graph/train_graph.gml")
g_test = nx.read_graphml("./graph/test_graph.gml")
g = nx.compose(g_train, g_test)

Training

Since HinSAGE can only predict one edge type at a time, we created two models: one that predicts unknown molecules in known species, and one that predicts unknown species for known molecules.

To train the models, you can run the two Jupyter Notebooks, HinSAGE_mol_to_species.ipynb and HinSAGE_species_to_mol.ipynb.

Please note that for now the model works best for molecules and species already present in the graph.

Molecules to species

The model currently overfits slightly; we might need to switch back to using only Classyfire as features.

Testing

Molecules to species

With known species but unknown molecules, the model has an accuracy of 0.92 (predictions of 0.5 or above are considered as present).

Species to molecules

With known molecules but unknown species, the model has an accuracy of 0.82 (same threshold).
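The accuracy figures above depend on how predicted probabilities are turned into present/absent labels. A minimal sketch of that computation, assuming binary labels and using made-up scores (the accuracy_at_threshold helper is hypothetical, not taken from the notebooks):

```python
def accuracy_at_threshold(probabilities, labels, threshold=0.5):
    """Fraction of pairs whose thresholded prediction matches the true label."""
    if not labels or len(probabilities) != len(labels):
        raise ValueError("probabilities and labels must be non-empty and equally long")
    # A score at or above the threshold counts as "present" (1).
    predictions = [1 if p >= threshold else 0 for p in probabilities]
    correct = sum(1 for pred, truth in zip(predictions, labels) if pred == truth)
    return correct / len(labels)


# Made-up scores and labels for five species-molecule pairs.
scores = [0.91, 0.50, 0.49, 0.10, 0.75]
truth = [1, 1, 1, 0, 0]
acc = accuracy_at_threshold(scores, truth)
print(acc)
```

Note that a score of exactly 0.5 is counted as present, matching the "0.5 or above" rule.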

For more details and explanations, please see here.

Anticipate LOTUS

  • To make predictions, have a look at test_pipeline.ipynb.
  • ⚠️ You can also create all possible pairs in the LOTUS database. To do so, run ./scripts/create_all_pairs.py and ./scripts/chunk_file.py. Note that this should be run on a cluster, since the files created are in the hundreds of gigabytes. ⚠️
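To illustrate why the pairs are chunked, here is a rough sketch that generates every species-molecule pair lazily and writes them out in fixed-size CSV files. It is not the code of create_all_pairs.py or chunk_file.py; the function names, file naming scheme and column headers are assumptions made for the example.

```python
import csv
import itertools
from pathlib import Path


def iter_chunks(iterable, chunk_size):
    """Yield successive lists of at most chunk_size items from any iterable."""
    iterator = iter(iterable)
    while True:
        chunk = list(itertools.islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def write_all_pairs(species, molecules, out_dir, chunk_size):
    """Write every (species, molecule) pair to numbered CSV files in out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    # product() keeps both input lists in memory but yields the pairs one by one,
    # so only one chunk of pairs is held at a time.
    pairs = itertools.product(species, molecules)
    for index, chunk in enumerate(iter_chunks(pairs, chunk_size)):
        path = out_dir / f"pairs_{index:05d}.csv"
        with path.open("w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(["species", "molecule"])
            writer.writerows(chunk)
        paths.append(path)
    return paths
```

For example, write_all_pairs(["a", "b"], ["x", "y", "z"], "./pairs", chunk_size=4) would produce two files holding four and two pairs. The number of pairs grows as the product of the two list sizes, which is why the full LOTUS output reaches hundreds of gigabytes.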
