Time-ordered Gene Coexpression Network (TO-GCN)

A pipeline for constructing time-ordered gene coexpression networks (TO-GCNs) from three-dimensional (gene expression, condition, and time) data

The pipeline contains three steps: (1) Determining the cutoff values, (2) constructing eight GCNs for different coexpression types, and (3) determining time-ordered levels for nodes in a GCN of interest.

Prepare the gene expression data

Before running the pipeline, you need to prepare two lists of RPKM values (one for TF genes and one for all genes) at different sample points under two conditions (the three-dimensional data). In addition to the data files, you also need to know the number of samples under condition 1 (n1) and under condition 2 (n2).

The example folder (example_data) contains two data files from the study "A Comparative Transcriptomics Method to Infer Time-ordered Gene Coexpression Networks and its Applications". Each data file should be in tab-separated values (.tsv) format with m rows and n+1 columns. Here m is the number of genes (TF genes or all genes), and n = n1 + n2 is the total number of samples under condition 1 (n1) and condition 2 (n2). The gene IDs are listed in the first column. For each gene, the RPKM values of the sample points under condition 1 are listed in columns 2 to n1+1, and those under condition 2 are listed in columns n1+2 to n1+n2+1. In the example file TFs_1718.tsv, there are 1718 rows for 1718 TF genes and 27 columns: one gene ID column, 13 samples of condition 1, and 13 samples of condition 2.
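
The column layout above can be illustrated with a short C++ sketch. It splits one line of a data file into its gene ID and the two per-condition RPKM vectors. The names GeneRow and parseGeneRow are hypothetical and are not taken from the TO-GCN source code. The sketch only assumes the layout described above, with one gene per tab-separated line.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for one row of an input .tsv file (sketch only):
// a gene ID followed by n1 RPKM values (condition 1) and n2 values (condition 2).
struct GeneRow {
    std::string id;
    std::vector<double> cond1;  // columns 2 .. n1+1
    std::vector<double> cond2;  // columns n1+2 .. n1+n2+1
};

// Split one tab-separated line into a GeneRow.
// Returns false (leaving `out` untouched) if the line does not have exactly
// 1 + n1 + n2 fields. std::stod throws if a value field is not numeric.
bool parseGeneRow(const std::string& line, int n1, int n2, GeneRow& out) {
    assert(n1 > 0 && n2 > 0);
    std::vector<std::string> fields;
    std::istringstream ss(line);
    std::string field;
    while (std::getline(ss, field, '\t')) fields.push_back(field);
    if (fields.size() != static_cast<std::size_t>(1 + n1 + n2)) return false;

    GeneRow row;
    row.id = fields[0];
    for (int i = 1; i <= n1; ++i) row.cond1.push_back(std::stod(fields[i]));
    for (int i = n1 + 1; i <= n1 + n2; ++i) row.cond2.push_back(std::stod(fields[i]));
    out = row;  // only overwrite the caller's row after every field parsed
    return true;
}
```

For TFs_1718.tsv, you would call parseGeneRow(line, 13, 13, row) on each line read with std::getline from a std::ifstream.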

Run the pipeline programs

As mentioned above, the pipeline has three steps, so we provide one program for each step: (1) Cutoff, (2) GCN, and (3) TO-GCN. You can run the programs directly by downloading the precompiled binaries for your platform (Linux, MacOSX, or Windows). Alternatively, you can download the C++ source code (.cpp) and compile it yourself with the following commands:

g++ Cutoff.cpp -o Cutoff
g++ GCN.cpp -o GCN
g++ TO-GCN.cpp -o TO-GCN

(1) Cutoff: Determining the cutoff values

First, you need positive and negative cutoff values of Pearson correlation coefficients (PCCs) under each of the two conditions to construct the GCNs. Our method calculates the PCC for every TF-gene pair under each condition. From all the PCC values, we generate the probability density function (PDF) and the cumulative distribution function (CDF). Based on the CDF, the program suggests positive and negative cutoff values at p < 0.05 for each condition. To run the Cutoff program, you must supply four parameters: the number of samples under condition 1, the number of samples under condition 2, the data file of TF genes, and the data file of all genes. Here is the example from our study:

Cutoff 13 13 example_data/TFs_1718.tsv example_data/All_genes_25489.tsv
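
The core computation of this step can be sketched in C++. The sketch computes a Pearson correlation over the samples of one condition and then applies a cutoff rule to the collected PCC values. It is not the Cutoff program's code. The text above does not say how the program turns its CDF into cutoffs "with p < 0.05". The hypothetical helper empiricalCutoffs therefore simply assumes that the negative and positive cutoffs are the alpha and 1 - alpha quantiles of the observed PCCs.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Pearson correlation coefficient of two equal-length expression profiles.
// Returns 0 when either profile is constant (zero variance).
double pearson(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size() && !x.empty());
    const double n = static_cast<double>(x.size());
    double mx = 0.0, my = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) { mx += x[i]; my += y[i]; }
    mx /= n;
    my /= n;
    double sxy = 0.0, sxx = 0.0, syy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double dx = x[i] - mx, dy = y[i] - my;
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    if (sxx == 0.0 || syy == 0.0) return 0.0;
    return sxy / std::sqrt(sxx * syy);
}

// Hypothetical cutoff rule (assumption, not the Cutoff program's rule):
// negative cutoff = alpha-quantile, positive cutoff = (1 - alpha)-quantile
// of all PCC values observed under one condition.
std::pair<double, double> empiricalCutoffs(std::vector<double> pccs, double alpha) {
    assert(!pccs.empty() && alpha > 0.0 && alpha < 0.5);
    std::sort(pccs.begin(), pccs.end());
    const std::size_t last = pccs.size() - 1;
    const std::size_t lo = static_cast<std::size_t>(std::floor(alpha * last));
    const std::size_t hi = static_cast<std::size_t>(std::ceil((1.0 - alpha) * last));
    return {pccs[lo], pccs[hi]};  // {negative cutoff, positive cutoff}
}
```

To use the sketch, call pearson once per TF-gene pair and per condition, for example on the cond1 vectors of the two genes. Then collect that condition's results and pass them to empiricalCutoffs with alpha = 0.05.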

In addition to the suggested cutoff values, the program also writes the PCC value distribution to a .tsv file, which you can use to draw a histogram in Microsoft Excel or R.
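
If you want to reproduce such a distribution yourself, the following C++ sketch bins PCC values into equal-width bins on [-1, 1] and accumulates the PDF and CDF for each bin. The bin count and the Bin and pccHistogram names are assumptions for illustration. The columns of the program's actual output file may differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One histogram bin over the PCC range [-1, 1] (illustrative layout only).
struct Bin {
    double lower;  // lower edge of the bin
    double pdf;    // fraction of PCC values falling in this bin
    double cdf;    // fraction of PCC values in this bin or any lower bin
};

// Bin PCC values into `nbins` equal-width bins on [-1, 1] and accumulate the
// empirical PDF and CDF. Values outside the range are clamped to the end bins,
// so a PCC of exactly 1.0 lands in the last bin.
std::vector<Bin> pccHistogram(const std::vector<double>& pccs, int nbins) {
    assert(nbins > 0 && !pccs.empty());
    const double width = 2.0 / nbins;
    std::vector<std::size_t> counts(static_cast<std::size_t>(nbins), 0);
    for (double r : pccs) {
        int b = static_cast<int>(std::floor((r + 1.0) / width));
        if (b < 0) b = 0;
        if (b >= nbins) b = nbins - 1;
        ++counts[b];
    }
    std::vector<Bin> bins;
    const double total = static_cast<double>(pccs.size());
    double running = 0.0;
    for (int b = 0; b < nbins; ++b) {
        const double p = counts[b] / total;
        running += p;
        bins.push_back({-1.0 + b * width, p, running});
    }
    return bins;
}
```

You could write each Bin as its lower, pdf, and cdf values separated by tabs to get a .tsv file for charting. With 200 bins, each bin spans 0.01.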

(2) GCN: Constructing eight GCNs for different coexpression types

In the second step, we construct GCNs of eight coexpression types under two conditions (C1 and C2): C1+C2+, C1+C20, C1+C2–, C10C2+, C1–C2+, C1–C2–, C1–C20, and C10C2–. Here +, –, and 0 denote positive, negative, and no coexpression, respectively, under the condition they follow. Each GCN is written to a comma-separated values (.csv) file with five columns: TF gene ID, coexpression type, gene ID, PCC under condition 1, and PCC under condition 2. You can import these gene pairs into a network visualization tool such as Cytoscape to visualize the GCN. The GCN program takes the same four parameters as Cutoff plus four more (eight in total). The four additional parameters are the positive cutoff values for conditions 1 and 2, followed by the negative cutoff values for conditions 1 and 2. Here is an example:

GCN 13 13 example_data/TFs_1718.tsv example_data/TFs_1718.tsv 0.84 0.84 -0.75 -0.75
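
The mapping from a pair of PCCs to a coexpression type, and the five-column output line, can be sketched in C++. The function names are hypothetical, and the cutoff comparisons are assumed to be inclusive (>= and <=). ASCII '-' stands in for the minus sign. The GCN program's exact rules and number formatting may differ.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Sign for one condition (assumed inclusive cutoffs): '+' if the PCC reaches
// the positive cutoff, '-' if it reaches the negative cutoff, '0' otherwise.
char coexpressionSign(double pcc, double posCut, double negCut) {
    assert(negCut < posCut);
    if (pcc >= posCut) return '+';
    if (pcc <= negCut) return '-';
    return '0';
}

// Build a type label such as "C1+C20" from the PCCs under both conditions.
std::string coexpressionType(double pcc1, double pcc2,
                             double pos1, double pos2,
                             double neg1, double neg2) {
    std::string type = "C1";
    type += coexpressionSign(pcc1, pos1, neg1);
    type += "C2";
    type += coexpressionSign(pcc2, pos2, neg2);
    return type;
}

// One CSV row in the five-column order described above:
// TF gene ID, coexpression type, gene ID, PCC (condition 1), PCC (condition 2).
std::string csvLine(const std::string& tf, const std::string& type,
                    const std::string& gene, double pcc1, double pcc2) {
    std::ostringstream os;
    os << tf << ',' << type << ',' << gene << ',' << pcc1 << ',' << pcc2;
    return os.str();
}
```

Each condition contributes one of three signs, giving 3 × 3 = 9 combinations. The only combination missing from the eight GCN types listed above is C10C20, where the pair shows no coexpression under either condition.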

(3) TO-GCN: Determining time-ordered levels in a GCN of interest

The final step determines the time order (level) of nodes in the GCN. Levels are assigned by breadth-first search (BFS), starting from a set of seed nodes that you choose (listed in seeds.txt). In most cases, we choose as seeds genes that are highly expressed at the first time point and lowly expressed at the following time points. In our study, we selected the gene Zm00001d041056 as the seed and ran the TO-GCN program to assign levels to the nodes of the C1+C2+ GCN. For this step, only the positive cutoffs for conditions 1 and 2 are needed, plus two further parameters. The first is the seed file containing the seed gene ID(s). The second is the coexpression type (0, 1, or 2), where 0, 1, and 2 represent C1+C2+, C1+C20, and C10C2+, respectively. Both the level of each node and the GCN (obtained in step 2) can be imported into Cytoscape.

TO-GCN 13 13 example_data/TFs_1718.tsv example_data/TFs_1718.tsv 0.84 0.84 example_data/seeds.txt 0
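
The level assignment described above is a multi-source BFS, and the following C++ sketch shows one way to implement it. It assumes an undirected graph built from the edges of the chosen GCN and a seeds.txt file that lists whitespace-separated gene IDs. It also assumes that seeds are assigned level 1. These conventions and all function names are assumptions and are not taken from the TO-GCN program.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <queue>
#include <sstream>
#include <string>
#include <vector>

// Adjacency list keyed by gene ID (hypothetical representation).
using Graph = std::map<std::string, std::vector<std::string>>;

// Add an undirected edge, since coexpression between two genes is symmetric.
void addEdge(Graph& g, const std::string& a, const std::string& b) {
    g[a].push_back(b);
    g[b].push_back(a);
}

// Assumed seed-file format: gene IDs separated by whitespace or newlines.
std::vector<std::string> readSeeds(std::istream& in) {
    std::vector<std::string> seeds;
    std::string id;
    while (in >> id) seeds.push_back(id);
    return seeds;
}

// Multi-source BFS: seeds get level 1, their unvisited neighbours level 2,
// and so on. Nodes unreachable from every seed are absent from the result.
std::map<std::string, int> bfsLevels(const Graph& g, const std::vector<std::string>& seeds) {
    assert(!seeds.empty());
    std::map<std::string, int> level;
    std::queue<std::string> q;
    for (const auto& s : seeds) {
        if (level.emplace(s, 1).second) q.push(s);  // skip duplicate seeds
    }
    while (!q.empty()) {
        const std::string u = q.front();
        q.pop();
        const auto it = g.find(u);
        if (it == g.end()) continue;  // seed with no edges in this GCN
        const int next = level.at(u) + 1;
        for (const auto& v : it->second) {
            if (level.emplace(v, next).second) q.push(v);  // first visit only
        }
    }
    return level;
}
```

For the example above, you would add one edge per row of the C1+C2+ output from step 2. You would then read the seed IDs (in our study, Zm00001d041056) with readSeeds and call bfsLevels. Genes that are not connected to any seed receive no level.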

Contributors

petitmingchang
