CODC CLI: Command-line interface to calculate copula-based differential gene co-expression

Brief Description

The CODC CLI tool analyzes gene expression data to calculate differential co-expression using a copula-based approach. It is a Python reimplementation of the R implementation by Ray et al., written for better performance and with support for parallel processing. The tool computes differential co-expression networks and provides additional commands for downstream analysis and performance measurement. It can be installed via Docker or locally using PDM, a Python package manager. Input files must be in TSV format, and the co-expression network is also written as a TSV file.

Reference to the Publication

This tool implements the method proposed by Ray, S., Lall, S., & Bandyopadhyay, S. in "CODC: a Copula-based model to identify differential co-expression".

Methodology

The methodology for computing copula-based differential co-expression, together with its mathematical explanation, is detailed here

Available Commands

The CLI includes several commands. This README explains the copula-based differential co-expression calculation (codc).

Installation Instructions

Clone the repository and go to the project root dir

Before installing and running the CLI tool, clone the repository and navigate to the project's root directory.

git clone git@github.com:bionetslab/grn-benchmark.git && cd grn-benchmark/src/codc-cli-tool

Using Docker

docker build -t codc-tool .

OR Using Locally

Install PDM (Python package manager) if not already installed:

pip install pdm

Then, install the packages using PDM:

pdm install

Execution of codc Using BRCA Data

The commands below write network.tsv to the ./data/ directory.

Using Docker

docker run --rm -v ./data:/data codc-tool codc --input_file_1 /data/BRCA_normal.tsv --input_file_2 /data/BRCA_tumor.tsv --output_path /data --batch_size 100

OR Using Locally

pdm run cli codc --input_file_1 ./data/BRCA_normal.tsv --input_file_2 ./data/BRCA_tumor.tsv --output_path ./data --batch_size 100

Explanation of the Relevant Parameters

--input_file_1

  • Description: Path to the TSV file containing gene expression data for the first condition.
  • Required: Yes
  • Example: --input_file_1 /path/to/condition1.tsv

--input_file_2

  • Description: Path to the TSV file containing gene expression data for the second condition.
  • Required: Yes
  • Example: --input_file_2 /path/to/condition2.tsv

--output_path

  • Description: The directory where the output TSV file will be saved. This file contains the differential co-expression network computed with the copula-based approach.
  • Required: Yes
  • Example: --output_path /path/to/output
  • Output Details: The output is a TSV file named network.tsv, which includes columns for target gene, regulator gene, condition, and the weight as the co-expression difference.

--ties_method

  • Description: Method to handle ties in data ranking within the pseudo-observations calculation.
  • Required: No (default is "average")
  • Options:
    • average: Average ranks of ties.
    • max: Use the maximum rank for ties.
  • Example: --ties_method max
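
The two tie-handling options can be illustrated with a small standard-library sketch. It is not the tool's own code: the helper names `rank_with_ties` and `pseudo_observations` are hypothetical, and scaling ranks by n + 1 is an assumption borrowed from the usual pseudo-observation convention.

```python
def rank_with_ties(values, ties_method="average"):
    """Return 1-based ranks; tied values share the average or the maximum rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend `end` over the run of sorted positions holding the same value.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        if ties_method == "average":
            shared = (start + 1 + end + 1) / 2
        elif ties_method == "max":
            shared = float(end + 1)
        else:
            raise ValueError(f"unsupported ties_method: {ties_method}")
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def pseudo_observations(values, ties_method="average"):
    """Scale ranks into (0, 1) by dividing by n + 1 (assumed convention)."""
    n = len(values)
    return [r / (n + 1) for r in rank_with_ties(values, ties_method)]


expression = [0.0, 2.5, 2.5, 7.1]  # made-up expression values with one tie
print(rank_with_ties(expression, "average"))  # [1.0, 2.5, 2.5, 4.0]
print(rank_with_ties(expression, "max"))      # [1.0, 3.0, 3.0, 4.0]
print(pseudo_observations(expression, "max"))
```

With many tied values (for example genes with zero expression in several samples), the choice between the two options shifts the pseudo-observations noticeably, which is why it is exposed as a parameter.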

--smoothing

  • Description: Specifies the smoothing technique applied to the empirical copula calculation.
  • Required: No (default is "none")
  • Options:
    • none: No smoothing applied.
    • beta: Use a beta smoothing approach.
    • checkerboard: Apply checkerboard smoothing.
  • Example: --smoothing beta
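
As a rough illustration of the unsmoothed case (`none`), the empirical copula at a point (u, v) is the fraction of pseudo-observation pairs that fall at or below that point in both coordinates. The function name `empirical_copula` and the sample values are hypothetical; the `beta` and `checkerboard` variants replace the hard indicator with smoothed weights and are not shown here.

```python
def empirical_copula(u_obs, v_obs, u, v):
    """Fraction of pseudo-observation pairs (U_i, V_i) with U_i <= u and V_i <= v."""
    if len(u_obs) != len(v_obs) or not u_obs:
        raise ValueError("u_obs and v_obs must be non-empty and of equal length")
    hits = sum(1 for ui, vi in zip(u_obs, v_obs) if ui <= u and vi <= v)
    return hits / len(u_obs)


# Made-up pseudo-observations for two genes across four samples.
u_obs = [0.2, 0.4, 0.6, 0.8]
v_obs = [0.4, 0.2, 0.8, 0.6]
print(empirical_copula(u_obs, v_obs, 0.5, 0.5))  # 0.5
print(empirical_copula(u_obs, v_obs, 1.0, 1.0))  # 1.0
```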

--ks_stat_method

  • Description: Determines the method used for computing the Kolmogorov-Smirnov statistic, which quantifies the differential co-expression.
  • Required: No (default is "asymp")
  • Options:
    • asymp: Use asymptotic properties of the KS statistic.
    • auto: Automatically determine the best method based on data characteristics.
    • exact: Compute an exact KS statistic.
  • Example: --ks_stat_method exact
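
For orientation, the two-sample Kolmogorov-Smirnov statistic is the largest absolute gap between two empirical distribution functions. The sketch below computes only that distance with the standard library; the function name `ks_statistic` and the inputs are hypothetical, and how the tool feeds copula values into the test is not shown. Options with the same names exist in SciPy's `ks_2samp`, where they select how the p-value is derived; whether the tool delegates to SciPy is an assumption.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Maximum absolute difference between the two empirical CDFs."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    # The supremum is attained at one of the observed points.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Made-up values standing in for one gene pair under each condition.
condition_1 = [0.1, 0.2, 0.3, 0.4]
condition_2 = [0.3, 0.4, 0.5, 0.6]
print(ks_statistic(condition_1, condition_2))  # 0.5
```

A value near 0 means the two distributions are nearly identical, while a value near 1 means they barely overlap.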

--batch_size

  • Description: Number of gene pairs processed in each batch during parallel execution.
  • Required: No (default is 100)
  • Example: --batch_size 100
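
To make the effect of the batch size concrete, the sketch below splits all unordered gene pairs into fixed-size batches. It is only an illustration under assumptions: the generator name `gene_pair_batches` is hypothetical, `TP53` and `BRCA1` are added as placeholder gene names, and the tool's actual pairing and scheduling logic may differ.

```python
from itertools import combinations, islice


def gene_pair_batches(genes, batch_size=100):
    """Yield lists of at most `batch_size` unordered gene pairs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    pairs = combinations(genes, 2)
    while True:
        batch = list(islice(pairs, batch_size))
        if not batch:
            return
        yield batch


genes = ["ACTA1", "MYL2", "TP53", "BRCA1"]  # 4 genes give 6 unordered pairs
batches = list(gene_pair_batches(genes, batch_size=4))
print([len(batch) for batch in batches])  # [4, 2]
```

Because the number of pairs grows quadratically with the number of genes, larger batches mean fewer scheduling round trips, while smaller batches spread the work more evenly across workers.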

Input File Format Specification

Input files must be in a tab-separated format with gene names in rows and sample IDs in columns. Example:

Gene TCGA-A7-A0CE TCGA-A7-A0CH
ACTA1 6.872032023 4.947203749
MYL2 0.415445555 0.0
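
A file in this layout can be checked before running the tool with a few lines of standard-library Python. This is a minimal sketch: the function name `read_expression_tsv` is hypothetical and it is not the parser the tool uses.

```python
import csv
import io


def read_expression_tsv(handle):
    """Return (sample_ids, {gene: [expression values]}) from a tab-separated table."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]  # the first column holds gene names
    expression = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        if len(row) != len(header):
            raise ValueError(f"row for {row[0]!r} has {len(row) - 1} values, expected {len(samples)}")
        expression[row[0]] = [float(value) for value in row[1:]]
    return samples, expression


example = (
    "Gene\tTCGA-A7-A0CE\tTCGA-A7-A0CH\n"
    "ACTA1\t6.872032023\t4.947203749\n"
    "MYL2\t0.415445555\t0.0\n"
)
samples, expression = read_expression_tsv(io.StringIO(example))
print(samples)             # ['TCGA-A7-A0CE', 'TCGA-A7-A0CH']
print(expression["MYL2"])  # [0.415445555, 0.0]
```

For a real file, pass `open("./data/BRCA_normal.tsv", newline="")` instead of the in-memory example.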

Output File Format Specification

The output network.tsv is a tab-separated file that includes:

  • Target: Target gene of the edge.
  • Regulator: Source gene of the edge.
  • Condition: Describes the differential co-expression across conditions.
  • Weight: Numerical value indicating the strength of the relationship.

Example output:

Target Regulator Condition Weight
MYL2 ACTA1 Diff Co-Exp between both Condition 0.1111
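
One simple way to explore the result is to sort the edges by weight. The sketch below assumes the column names shown in the example output and that a larger weight means a stronger difference in co-expression; the function name `top_edges` and the second data row are hypothetical.

```python
import csv
import io


def top_edges(handle, k=10):
    """Return the k edges with the largest weight as (target, regulator, weight)."""
    reader = csv.DictReader(handle, delimiter="\t")
    edges = [(row["Target"], row["Regulator"], float(row["Weight"])) for row in reader]
    return sorted(edges, key=lambda edge: edge[2], reverse=True)[:k]


example = (
    "Target\tRegulator\tCondition\tWeight\n"
    "MYL2\tACTA1\tDiff Co-Exp between both Condition\t0.1111\n"
    # The next row is made up for illustration only.
    "GENE_B\tGENE_A\tDiff Co-Exp between both Condition\t0.4200\n"
)
for target, regulator, weight in top_edges(io.StringIO(example), k=2):
    print(target, regulator, weight)
```

For a real run, pass `open("./data/network.tsv", newline="")` instead of the in-memory example.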

Explanation and Interpretation of the Output

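The network.tsv output file lists gene pairs that are differentially co-expressed between the two conditions. Each weight is the co-expression difference for that pair, so it indicates how strongly the relationship between the two genes changes from one condition to the other.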

Recommended Hyperparameters by the Authors

The authors did not recommend specific hyperparameters. The default parameters follow the typical settings of the authors' R implementation:

  • ks_stat_method = asymp
  • ties_method = average
  • smoothing = none

Md Badiuzzaman Pranto, Friedrich-Alexander-Universität, Erlangen-Nürnberg.
