Cluster patients based on their multi-omics data using graph autoencoders. Adapted from "Simple and Effective Graph Autoencoders with One-Hop Linear Models" (Salha et al., 2020) in PyTorch Geometric.
The project is based on pytorch-geometric. It uses clinical EHR, gene expression, and somatic mutation data from the TCGA study.
Omics data are transformed into binary or numerical features, which are then preselected. Each patient becomes a node, and two patient nodes are connected by an edge if their distance in feature space is below a set threshold. The feature matrix and the adjacency matrix are stored in a PyTorch Geometric Data object.
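The thresholding step can be illustrated with a minimal, standard-library sketch. The Euclidean metric, the threshold value, the toy feature values, and the helper names `build_edges` and `to_edge_index` are all assumptions made for illustration; the repository may use a different distance and a different implementation.

```python
import math
from itertools import combinations


def build_edges(features, threshold):
    """Return undirected edges (i, j), i < j, for patients closer than threshold."""
    edges = []
    for i, j in combinations(range(len(features)), 2):
        # Euclidean distance is an assumption; the README does not name the metric.
        if math.dist(features[i], features[j]) < threshold:
            edges.append((i, j))
    return edges


def to_edge_index(edges):
    """Convert undirected pairs to a 2 x E COO layout that lists both directions."""
    sources = [i for i, j in edges] + [j for i, j in edges]
    targets = [j for i, j in edges] + [i for i, j in edges]
    return [sources, targets]


# Toy feature matrix: 4 patients, 3 binary features (made-up values).
features = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 1],
    [0, 1, 1],
]

edges = build_edges(features, threshold=1.5)
edge_index = to_edge_index(edges)
```

With PyTorch Geometric installed, a list in this layout can be wrapped as `torch.tensor(edge_index, dtype=torch.long)` and passed together with the feature tensor to `torch_geometric.data.Data(x=..., edge_index=...)`.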
GAEs are graph convolutional networks that integrate feature and adjacency information. The resulting latent representation is decoded to reconstruct the adjacency information, and the loss is the mean squared error between the original adjacency matrix and the reconstructed one. Several architectures from the PyTorch Geometric project are included, and each yields a latent representation after training. The main ones are the simple linear AE, GAE, VGAE, and the variational simple linear AE.
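To make the encode/decode/loss cycle concrete, here is a minimal NumPy sketch of a single forward pass through a one-hop linear graph autoencoder in the spirit of Salha et al. It is not the repository's training code: the weight matrix is random and untrained, the toy adjacency and feature values are made up, the inner-product decoder is an assumption, and the helper names `normalize_adjacency`, `encode`, and `decode` are hypothetical.

```python
import numpy as np


def normalize_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2, with self-loops added."""
    adj_loops = adj + np.eye(adj.shape[0])
    deg_inv_sqrt = 1.0 / np.sqrt(adj_loops.sum(axis=1))
    return adj_loops * deg_inv_sqrt[:, None] * deg_inv_sqrt[None, :]


def encode(adj_norm, x, weight):
    """One-hop linear encoder: Z = A_norm X W, with no activation function."""
    return adj_norm @ x @ weight


def decode(z):
    """Inner-product decoder (assumed): sigmoid(Z Z^T) gives edge scores in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-(z @ z.T)))


rng = np.random.default_rng(0)

# Toy graph: 4 patients, 3 features (made-up values).
adj = np.array(
    [[0, 1, 1, 1],
     [1, 0, 0, 0],
     [1, 0, 0, 1],
     [1, 0, 1, 0]],
    dtype=float,
)
x = np.array(
    [[1, 0, 1],
     [1, 0, 0],
     [0, 1, 1],
     [0, 1, 1]],
    dtype=float,
)
weight = rng.normal(scale=0.1, size=(3, 2))  # untrained weights, latent dimension 2

z = encode(normalize_adjacency(adj), x, weight)  # latent representation, shape (4, 2)
adj_rec = decode(z)                              # reconstructed adjacency, shape (4, 4)
loss = np.mean((adj - adj_rec) ** 2)             # mean squared reconstruction error
```

Training would adjust `weight` to reduce `loss`; the rows of `z` are the per-patient latent vectors used in the later clustering step.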
The latent representation can then be projected with a dimensionality reduction method (UMAP) and clustered (DBSCAN). A survival analysis is then performed on the clustered patients.
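The README does not say which survival method is used. As one common choice, the sketch below computes a Kaplan-Meier curve per cluster using only the standard library. The patient records, the cluster labels, and the helper name `kaplan_meier` are hypothetical and serve only as an illustration.

```python
from collections import defaultdict


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each time with an observed event."""
    n_at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        deaths = sum(1 for d, e in zip(durations, events) if d == t and e)
        leaving = sum(1 for d in durations if d == t)  # events plus censored
        if deaths:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve


# Hypothetical records: (cluster label, follow-up in months, event observed?)
patients = [
    (0, 5, True), (0, 8, False), (0, 12, True),
    (1, 2, True), (1, 3, True), (1, 10, False),
]

# Group durations and event flags by cluster label.
by_cluster = defaultdict(lambda: ([], []))
for label, months, event in patients:
    by_cluster[label][0].append(months)
    by_cluster[label][1].append(event)

curves = {label: kaplan_meier(d, e) for label, (d, e) in by_cluster.items()}
```

When the labels come from scikit-learn's DBSCAN, the label -1 marks noise points, so those patients would usually be excluded or reported separately before comparing curves.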
For GPU usage, check the CUDA distribution (minimum version 10.1) against the dependencies and requirements in the links below. A Conda environment is preferred. Follow the installation steps for PyTorch (minimum version 1.4.0): Pytorch Installation
Follow the installation steps for PyTorch Geometric (minimum version 1.6): PyG Docs and PyG Installation
The remaining required packages are listed under Dependencies.
Single runs can be executed by running
pytorch_linearVAE.py
Multiple runs with different parameters can be executed by running
run.sh
The output of the runs is visualized in TensorBoard (an HTML-based dashboard), which can be launched as in this example:
Terminal command:
tensorboard --logdir=./Deepan/runs/2021-03-18
This repository is licensed under the MIT license. The AGE graph clustering implementation can be found under ferdinand-popp/AGE and uses the PyTorch dataset generated by this repository.