HiCoEx

Prediction of Gene Co-expression from Chromatin Contacts with Graph Attention Network framework.

The paper will be published in Bioinformatics.

Data preprocessing

The files in this folder preprocess all RNA-seq and Hi-C data. The datasets come from two sources. The first is pancreatic islet data, with Hi-C data from Greenwald et al. 2019 and RNA-seq data from Fadista et al. 2014. The second is a dataset of 12 tissues and cell lines, following Marco et al. 2020.

Change dataset_path in each script to the directory used for storing input and output data. Two data files are already provided: the GTEx_Analysis_v8_Annotations_SampleAttributesDS file (for the RNA-seq data downloaded from GTEx) and the GRCh37_p13_gene_info file (genome annotation). Other data can be downloaded using the accession numbers given in the paper. For RNA-seq data downloaded from GTEx, run run_split_tissues.sh first to split the samples by tissue type.
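The tissue split can be pictured as grouping sample IDs by tissue. The sketch below is not the logic of run_split_tissues.sh, which is not reproduced here. It assumes the annotation file is tab-separated with a SAMPID column (sample ID) and an SMTSD column (detailed tissue type), and it uses a small made-up in-memory table in place of the real file. The function name group_samples_by_tissue is hypothetical.

```python
import csv
import io
from collections import defaultdict


def group_samples_by_tissue(handle, id_col="SAMPID", tissue_col="SMTSD"):
    """Map each tissue label to the list of sample IDs annotated with it."""
    groups = defaultdict(list)
    # Assumption: the annotation table is tab-separated with a header row.
    for row in csv.DictReader(handle, delimiter="\t"):
        groups[row[tissue_col]].append(row[id_col])
    return dict(groups)


# Made-up rows for illustration only; real GTEx sample IDs and labels differ.
example = io.StringIO(
    "SAMPID\tSMTSD\n"
    "S1\tPancreas\n"
    "S2\tLiver\n"
    "S3\tPancreas\n"
)
tissue_to_samples = group_samples_by_tissue(example)
for tissue, samples in sorted(tissue_to_samples.items()):
    print(tissue, samples)
```

To try this on the provided annotation file, the StringIO object would be replaced by an open file handle, provided the column names match the assumption above.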

01_gene_expression_islet.py and 02_hic_islet.py are specific to the pancreatic islet data; the other scripts are adapted from Marco et al. 2020.

Network construction

The files in this folder construct the gene co-expression network and the gene contact network. Setting --chr-src and --chr-tgt to the same value (from 1 to 22) constructs the network for that chromosome. In 02_coexpression_network.py and 04_chromatin_network.py, setting --single-chrom to False constructs a genome-wide network of all intra-chromosomal co-expression/contact relations.
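To build the per-chromosome networks for all 22 autosomes, the command lines can be generated in a loop. This sketch only assembles argument lists from the --chr-src and --chr-tgt flags; the scripts may need further arguments (for example to select the dataset) that are not covered here, and the helper name build_commands is hypothetical.

```python
import sys


def build_commands(script, chromosomes=range(1, 23)):
    """Return one argument list per chromosome, with --chr-src equal to --chr-tgt."""
    return [
        [sys.executable, script, "--chr-src", str(c), "--chr-tgt", str(c)]
        for c in chromosomes
    ]


commands = build_commands("02_coexpression_network.py")
for cmd in commands[:2]:
    print(" ".join(cmd))
# Each list could be passed to subprocess.run(cmd, check=True) once any
# additional arguments required by the script have been appended.
```

Building the lists first, rather than running the scripts directly, makes it easy to inspect the commands before anything is executed.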

Link prediction

The files in this folder implement link prediction with HiCoEx and all baselines. train_GNN.py and gnn_model.py are used to train all GNN models. Setting --classifier to 'mlp' uses the FF (feed-forward) layer as the classifier, and setting --classifier to 'direct' uses the dot product of the edge embedding as the classifier.
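The difference between the two classifier options can be illustrated with a toy example. This is a minimal sketch, not the code in gnn_model.py: it assumes that 'direct' scores a gene pair by the dot product of the two gene embeddings, that 'mlp' feeds the concatenated embeddings through one hidden layer, and that both outputs are squashed with a sigmoid. All function names and parameter values are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def direct_score(z_u, z_v):
    """'direct'-style scoring: sigmoid of the dot product of two gene embeddings."""
    return sigmoid(sum(a * b for a, b in zip(z_u, z_v)))


def mlp_score(z_u, z_v, weights, bias, out_weights, out_bias):
    """'mlp'-style scoring: one ReLU hidden layer on the concatenated embeddings."""
    x = list(z_u) + list(z_v)
    hidden = [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, bias)
    ]
    return sigmoid(sum(w * h for w, h in zip(out_weights, hidden)) + out_bias)


# Toy 2-dimensional embeddings and hand-picked (untrained) parameters.
z_a, z_b = [1.0, 0.0], [0.0, 1.0]
print(direct_score(z_a, z_a))  # same direction: score above 0.5
print(direct_score(z_a, z_b))  # orthogonal: dot product is 0, so the score is 0.5
W = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
mlp_out = mlp_score(z_a, z_b, W, [0.0, 0.0], [1.0, -1.0], 0.0)
print(mlp_out)
```

The dot-product variant has no trainable parameters of its own, so it relies entirely on the learned embeddings, whereas the feed-forward variant can learn a non-symmetric, non-linear decision function on top of them.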

For all baselines, first run matrix_factorization.py and random_walk.py to generate the gene embeddings from the gene contact network (this requires the BioNEV library).

01_link_prediction_chromosome.py implements co-expression prediction for each chromosome separately, while 02_link_prediction_chrosome.py implements co-expression prediction on the genome-wide dataset of intra-chromosomal relations.

Model explanation

The files in this folder explain the gene embeddings learned by HiCoEx and analyze the specific subgraphs of gene pairs. After training the model, the results of Figure 5 in the paper can be reproduced with biological_explanation_reproduce.ipynb.

Acknowledgement

The code for data preprocessing and for link prediction with the baseline methods (random-rf, topological-rf, svd-rf, node2vec-rf) is adapted from Marco et al. 2020; see the repository https://github.com/marcovarrone/gene-expression-chromatin.
