geneiase's Introduction

About

GeneiASE is software for detecting condition-dependent allele-specific expression (ASE) in single individuals. GeneiASE does not require haplotype phasing and performs consistently over a range of read depths and ASE effect sizes. See the paper for further information.

Copyright © 2015 Daniel Edsgärd, Olof Emanuelsson
GeneiASE is available free to use, under the GNU GPL version 3 license.

Download

You can find the latest stable release here.

Installation

Prerequisites

GeneiASE depends on several R packages, which can be installed from within R with:

install.packages(c('getopt', 'binom', 'VGAM'))

Test the installation

GeneiASE can be run from the shell prompt by entering the downloaded and unzipped directory (geneiase) and then executing the program, residing in the bin directory:

cd geneiase
bin/geneiase -t static -i test/static.test.input.tab -b 100
bin/geneiase -t icd -i test/icd.test.input.tab -b 100

Optionally add geneiase to your shell PATH

By adding the 'geneiase/bin' directory to your shell PATH you can execute the program without entering the directory where it resides or typing the full path.
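
For a Bourne-style shell such as bash, this could look like the sketch below. The install location `$HOME/geneiase` and the variable name `GENEIASE_HOME` are assumptions made for illustration; substitute the directory where you unzipped the bundle. To keep the setting across sessions, the export line can be placed in a shell startup file such as `~/.bashrc`.

```shell
# Assumed install location (hypothetical); adjust to where geneiase was unzipped.
GENEIASE_HOME="$HOME/geneiase"

# Append the bin directory to PATH for the current shell session.
export PATH="$PATH:$GENEIASE_HOME/bin"

# Afterwards the program can be called by name from any directory, e.g.:
#   geneiase -h
```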

Running GeneiASE

See Test the installation above for an example.

Required arguments

Only two arguments are required.

  1. -t, followed by either "static" or "icd", specifying whether static or individual condition-dependent (icd) ASE is tested.
  2. -i, followed by the input file name. The input file should contain tab-separated columns. For static ASE there should be four columns: featureID, snpID, alternative allele count, reference allele count. For icd-ASE there should be six columns: featureID, snpID, untreated alternative allele count, untreated reference allele count, treated alternative allele count, treated reference allele count.
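
As a rough illustration of the two layouts, the sketch below writes a small table for each mode and checks the column counts. All identifiers and counts are invented, the file names and the `check_table` helper are hypothetical, and the tables are written without a header line, which is an assumption; the files in the 'test' directory are the authoritative reference for the exact format.

```shell
# Static ASE: featureID, snpID, alt count, ref count (values invented).
printf 'geneA\tsnp1\t12\t30\n' >  example.static.input.tab
printf 'geneA\tsnp2\t25\t27\n' >> example.static.input.tab
printf 'geneB\tsnp3\t40\t8\n'  >> example.static.input.tab

# icd-ASE: featureID, snpID, untreated alt, untreated ref, treated alt, treated ref.
printf 'geneA\tsnp1\t12\t30\t28\t14\n' >  example.icd.input.tab
printf 'geneA\tsnp2\t25\t27\t31\t20\n' >> example.icd.input.tab

# Hypothetical helper: verify that every row has the expected number of
# tab-separated fields and that columns 3 onwards are non-negative integers.
check_table() {
    awk -F'\t' -v want="$2" '
        NF != want { print FILENAME ": line " NR ": expected " want " fields"; bad = 1; next }
        {
            for (i = 3; i <= NF; i++)
                if ($i !~ /^[0-9]+$/) { print FILENAME ": line " NR ": non-integer count"; bad = 1 }
        }
        END { exit bad + 0 }
    ' "$1"
}

check_table example.static.input.tab 4
check_table example.icd.input.tab 6
```

The helper prints a message for each malformed line and returns a non-zero status if any were found, so it can be used as a quick guard before a long run.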

Description of arguments

For detailed help on each available argument run the program with the -h flag:

geneiase -h

Input files

Examples of correctly formatted input files are in the 'test' directory of the code bundle.

Output description

A test statistic, s, is generated for each variant, reflecting the effect size (see paper). In a meta-analysis approach the effect sizes for all variants within a gene are combined. We provide several simple gene-wise measures based on the variant effect sizes. The p-value is based on the Liptak-Stouffer method (column liptak.s) and a null distribution generated by resampling from a parametric model (see paper). The fields in the tab-separated output file are:

  • feat: FeatureID as specified in the input file (typically a gene identifier)
  • n.vars: Number of variants within the gene
  • mean.s: Mean of s across the variants within the gene
  • median.s: Median of s across the variants within the gene
  • sd.s: Standard deviation of s across the variants within the gene
  • cv.s: Coefficient of variation of s across the variants within the gene
  • liptak.s: Stouffer-Liptak combination of s
  • p.nom: Nominal p-value
  • fdr: Benjamini-Hochberg corrected p-value
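
A common follow-up step is to keep only the genes below an FDR cutoff. The sketch below does this with awk on a mock output file; the values, the file names, and the 0.05 cutoff are invented for illustration, and the presence of a header line with the column names listed above is an assumption.

```shell
# Mock output with the documented columns; all values are invented.
printf 'feat\tn.vars\tmean.s\tmedian.s\tsd.s\tcv.s\tliptak.s\tp.nom\tfdr\n' >  example.output.tab
printf 'geneA\t4\t7.8\t6.2\t6.9\t0.88\t15.6\t0.00001\t0.0004\n'           >> example.output.tab
printf 'geneB\t3\t0.9\t0.8\t0.5\t0.56\t1.2\t0.21\t0.35\n'                  >> example.output.tab

# Locate the fdr column from the header, then keep rows with fdr <= cutoff.
awk -F'\t' -v cutoff=0.05 '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "fdr") col = i; print; next }
    col && ($col + 0) <= cutoff
' example.output.tab > example.significant.tab
```

Looking the column up by name rather than by position keeps the filter working if the column order differs from the list above.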

Typical workflow for generation of the input table containing allelic read counts

We are not aware of any pipeline that on a genome-wide scale is designed for the particular purpose of generating a table with allelic read counts. The generation of allelic counts starting from raw reads can be specific to sequencing technology and desired tools (such as choice of read mapping and variant calling software). However, to facilitate the generation of such a table we list a possible workflow below.

  1. Quality control of reads, such as trimming (FASTQ)
  2. Map reads (BAM)
  3. Quality control of mapped reads, such as PCR duplicate removal
  4. Call variants (VCF)
  5. Filter variants on heterozygosity, read depth, and on satisfying other mapping quality and variant calling criteria.
  6. Optional: Assess mapping bias for each variant and either use the mapping bias estimate of each variant in downstream analysis or filter out variants exhibiting mapping bias.
  7. Optional: If there are several samples from the same individual, such as in the case of cd-ASE, filter out variants which differ between the samples.
  8. Annotate variants
  9. Filter on relevant annotation, such as within a gene or presence in dbSNP.
  10. Get allelic counts for filtered variants based on a pileup of mapped reads.
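
As a simplified stand-in for the last step, the sketch below builds a static-mode input table from a single-sample VCF, under several assumptions that may not hold for your data: per-allele read depths are already stored in the FORMAT key `AD` (reference first, then alternative), each variant carries its gene in a hypothetical INFO key `GENE`, and only biallelic heterozygous calls are kept. The mock VCF records are invented. A real workflow would typically count alleles from a pileup of the mapped reads instead.

```shell
# Tiny mock VCF (header line plus three invented records).
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1\n' >  example.vcf
printf 'chr1\t100\trs1\tA\tG\t50\tPASS\tGENE=geneA\tGT:AD\t0/1:30,12\n'   >> example.vcf
printf 'chr1\t200\trs2\tC\tT\t50\tPASS\tGENE=geneA\tGT:AD\t0/1:27,25\n'   >> example.vcf
printf 'chr2\t300\trs3\tG\tA\t50\tPASS\tGENE=geneB\tGT:AD\t1/1:0,40\n'    >> example.vcf

awk -F'\t' -v OFS='\t' '
    /^#/ { next }                                  # skip header lines
    {
        # Pair FORMAT keys with the sample values to find GT and AD.
        split($9, keys, ":"); split($10, vals, ":")
        gt = ""; ad = ""
        for (k in keys) {
            if (keys[k] == "GT") gt = vals[k]
            if (keys[k] == "AD") ad = vals[k]
        }
        # Keep heterozygous calls with exactly two allele depths.
        if (gt != "0/1" && gt != "0|1" && gt != "1|0") next
        if (split(ad, depth, ",") != 2) next

        # Hypothetical annotation: gene name stored as GENE=... in INFO.
        gene = ""
        n = split($8, info, ";")
        for (i = 1; i <= n; i++) if (info[i] ~ /^GENE=/) gene = substr(info[i], 6)
        if (gene == "") next

        # GeneiASE column order: featureID, snpID, alt count, ref count.
        print gene, $3, depth[2], depth[1]
    }
' example.vcf > example.from_vcf.input.tab
```

Note that the alternative allele count is written before the reference count, the reverse of the order in `AD`.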

Citing GeneiASE

If you use GeneiASE, please cite it as follows:

Edsgärd D. et al., GeneiASE: Detection of condition-dependent and static allele-specific expression from RNA-seq data without haplotype information, Scientific Reports, 2016

Contact

Please contact the corresponding author if you have technical issues or other comments or questions:
Olof Emanuelsson

geneiase's Issues

GeneiASE runs very slow on SGE scheduler

I'm trying to use GeneiASE on an HPC setup (2 cores + 40 GB shared memory), and the job goes through 18K-19K features. The jobs take almost 4 days to complete (with absolutely no errors). Here is the head of the output:
++++++++++++++++++++++++++++++++++++++
Program parameters:
-t: static
-i: ASERC.txt
-o: static.txt
-p: 0.49
-r: 0.012
-b: 100000
-m: 2
-x: 100
Reading input data from file...
Input: 19278 features (after filtered on -m 2)
Calculating p-values for features...
++++++++++++++++++++++++++++++++++++++
The head of error:
Loading required package: methods
Loading required package: stats4
Loading required package: splines
+++++++++++++++++++++++++++++++++++++++

I have even tried the run with Microsoft R Open (MKL multithreaded libraries), but progress is still very slow. Any help or tips to speed up the run would be appreciated.

Cheers.

Geneiase is not executed by job scheduler (Slurm)

Geneiase has been properly installed, and I tested it with the provided test files.

I need to run it on 100 samples. When I try running it directly on the command line, geneiase starts without problems. But then I have to stop it, since I am working on a cluster and should submit my jobs through a scheduler. We use Slurm.

The problem is that when I submit a job, Slurm does not return any error or standard output, even though the call worked on the command line.

Do zero p.nom and fdr values indicate significant ASE?

Hello,

I have a simple question. After running geneiase in static mode on my dataset, I see lots of genes reporting p.nom == 0 and fdr == 0. I also see these zero p.nom and fdr values in the output from the test datasets provided in the package. Here is an example from the test dataset:

    feat n.vars mean.s median.s  sd.s  cv.s liptak.s p.nom   fdr
   <dbl>  <dbl>  <dbl>    <dbl> <dbl> <dbl>    <dbl> <dbl> <dbl>
 1  12.1     11   4.72     4.10  4.58 0.970    15.7      0     0
 2 124.       4   7.81     6.20  6.86 0.878    15.6      0     0
 3 151.       4   7.51     5.58  7.10 0.946    15.0      0     0

Do these genes with zero values have significant allele-specific expression? I'm curious whether the zero values reflect a lack of machine precision for representing very small p-values, or whether they are placeholders indicating that the statistical test failed for these genes.

Thank you for your help.

Error in rep_len(size, use.n) : cannot replicate NULL to a non-zero length

Hello. I'm trying to use geneiase, but I'm getting an error from R and I don't know how to debug it.

The command I'm using is
bin/geneiase -t static -i test.tab -b 10

And the program's output is:


Loading required package: stats4
Loading required package: splines
Program parameters:
-t: static
-i: test.tab
-o: test.tab.static.gene.pval.tab
-p: 0.49
-r: 0.012
-b: 10
-m: 2
-x: 100
Reading input data from file...
Input: 371 features (after filtered on -m 2)
Calculating p-values for features...
Error in rep_len(size, use.n) :
  cannot replicate NULL to a non-zero length
Calls: main ... get.gene.null -> get.snv.null -> rbetabinom -> rbetabinom.ab
Execution halted

My R session info is:

R version 4.0.4 (2021-02-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS

Matrix products: default
BLAS/LAPACK: /opt/intel/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64_lin/libmkl_gf_lp64.so

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.0.4
