GeneiASE (edsgard/geneiase)

About

GeneiASE is software for detecting condition-dependent allele-specific expression (ASE) in single individuals. GeneiASE does not require haplotype phasing and performs consistently over a range of read depths and ASE effect sizes. See the paper for further information.

Copyright © 2015 Daniel Edsgärd, Olof Emanuelsson
GeneiASE is free to use under the GNU GPL version 3 license.

Download

You can find the latest stable release here.

Installation

Prerequisites

GeneiASE depends on a number of R packages. They can be installed from within R by:

install.packages(c('getopt', 'binom', 'VGAM'))

Test the installation

To run GeneiASE from the shell prompt, change into the downloaded and unzipped directory (geneiase) and execute the program, which resides in the bin directory:

cd geneiase
bin/geneiase -t static -i test/static.test.input.tab -b 100
bin/geneiase -t icd -i test/icd.test.input.tab -b 100

Optionally add geneiase to your shell PATH

By adding the 'geneiase/bin' directory to your shell PATH you can execute the program without entering the directory where it resides or typing the full path.
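As a minimal sketch, assuming the bundle was unzipped to $HOME/geneiase (the variable name GENEIASE_HOME and that location are only assumptions for illustration), the PATH could be extended like this:

```shell
# Assumption: the bundle was unzipped to $HOME/geneiase; adjust to your location.
GENEIASE_HOME="$HOME/geneiase"

# Prepend the bin directory so that 'geneiase' resolves from any directory.
export PATH="$GENEIASE_HOME/bin:$PATH"
```

This only affects the current shell session. To make it permanent, the same export line can be added to your shell startup file, such as ~/.bashrc for bash.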

Running GeneiASE

See Test the installation above for an example.

Required arguments

Only two arguments are required.

-t, followed by a string with the allowed values "static" or "icd", specifying whether static or individual condition-dependent ASE (icd-ASE) is to be tested.
-i, followed by the input file name. The input file should contain tab-separated columns. For static ASE there should be four columns: featureID, snpID, alternative allele count, reference allele count. For icd-ASE there should be six columns: featureID, snpID, Untreated alternative allele count, Untreated reference allele count, Treated alternative allele count, Treated reference allele count.
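Before running the program it can be useful to confirm that every line of an input file has the expected number of tab-separated columns. The sketch below is not part of GeneiASE: check_columns is a hypothetical helper, and the feature IDs, SNP IDs and counts in the demo file are made up. For the authoritative layout, compare against the files in the 'test' directory.

```shell
# Hypothetical helper (not part of GeneiASE): report lines whose number of
# tab-separated fields differs from the expected count.
check_columns() {
    # $1 = input file, $2 = expected number of columns (4 for static, 6 for icd)
    awk -F'\t' -v n="$2" '
        NF != n { printf "line %d: %d columns\n", NR, NF; bad = 1 }
        END { exit bad + 0 }
    ' "$1"
}

# Illustrative static-ASE rows: featureID, snpID, alternative count, reference count.
demo=$(mktemp)
printf 'geneA\tsnp1\t12\t30\n' > "$demo"
printf 'geneA\tsnp2\t8\t25\n' >> "$demo"

check_columns "$demo" 4 && echo "static input looks well-formed"
```

For an icd-ASE file the same helper would be called with 6 as the second argument.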

Description of arguments

For detailed help on each available argument run the program with the -h flag:

geneiase -h

Input files

Examples of correctly formatted input files can be found in the 'test' directory of the code bundle.

Output description

A test statistic, s, is generated for each variant, reflecting the effect size (see paper). In a meta-analysis approach, the effect sizes for all variants within a gene are combined. We provide several simple gene-wise measures based on the variant effect sizes. The p-value is based on the Liptak-Stouffer method (column liptak.s) and on a null distribution generated by resampling from a parametric model (see paper). The fields in the tab-separated output file are:

feat: FeatureID as specified in the input file (typically a gene identifier)
n.vars: Number of variants within the gene
mean.s: Mean of s across the variants within the gene
median.s: Median of s across the variants within the gene
sd.s: Standard deviation of s across the variants within the gene
cv.s: Coefficient of variation of s across the variants within the gene
liptak.s: Liptak-Stouffer combination of s
p.nom: Nominal p-value
fdr: Benjamini-Hochberg corrected p-value
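
A common next step is to list the features that pass an FDR threshold. The following is a sketch under stated assumptions: it assumes the output file begins with a header line carrying the field names listed above, significant_features is a hypothetical helper, and the rows in the demo file are invented for illustration rather than real results.

```shell
# Assumption: the output file starts with a header line naming the columns
# (feat, n.vars, ..., p.nom, fdr). The values below are made up.
out=$(mktemp)
printf 'feat\tn.vars\tmean.s\tmedian.s\tsd.s\tcv.s\tliptak.s\tp.nom\tfdr\n' > "$out"
printf 'geneA\t3\t2.1\t2.0\t0.4\t0.19\t3.6\t0.0003\t0.004\n' >> "$out"
printf 'geneB\t2\t0.3\t0.3\t0.1\t0.33\t0.4\t0.62\t0.81\n' >> "$out"

# Hypothetical helper: print the feature IDs whose fdr is below a threshold.
significant_features() {
    # $1 = output file, $2 = FDR threshold
    awk -F'\t' -v thr="$2" '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "fdr") col = i; next }
        col && ($col + 0) < (thr + 0) { print $1 }
    ' "$1"
}

significant_features "$out" 0.05
```

Looking up the fdr column by name rather than by position keeps the filter working if the column order differs from what is assumed here.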

Typical workflow for generation of the input table containing allelic read counts

We are not aware of any pipeline that on a genome-wide scale is designed for the particular purpose of generating a table with allelic read counts. The generation of allelic counts starting from raw reads can be specific to sequencing technology and desired tools (such as choice of read mapping and variant calling software). However, to facilitate the generation of such a table we list a possible workflow below.

Quality control of reads, such as trimming (FASTQ)
Map reads (BAM)
Quality control of mapped reads, such as PCR duplicate removal
Call variants (VCF)
Filter variants on heterozygosity, read depth, and on satisfying other mapping quality and variant calling criteria.
Optional: Assess mapping bias for each variant and either use the mapping bias estimate of each variant in downstream analysis or filter out variants exhibiting mapping bias.
Optional: If there are several samples from the same individual, such as in the case of cd-ASE, filter out variants which differ between the samples.
Annotate variants
Filter on relevant annotation, such as within a gene or presence in dbSNP.
Get allelic counts for filtered variants based on a pileup of mapped reads.
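
The last step can be sketched as follows, assuming you already have an intermediate tab-separated table with a feature ID, a variant ID, and the allelic depths in the VCF AD convention ("ref,alt"). That intermediate table and the helper ad_to_geneiase are assumptions for illustration; how you produce the table depends on your variant caller and annotation tools. Note that the AD field lists the reference count first, whereas the static input format expects the alternative count first.

```shell
# Hypothetical helper: convert rows of featureID, snpID, AD ("ref,alt")
# into the static input layout: featureID, snpID, alt count, ref count.
ad_to_geneiase() {
    awk -F'\t' 'BEGIN { OFS = "\t" }
        {
            n = split($3, ad, ",")
            # Keep biallelic sites only (exactly one ref and one alt depth).
            if (n == 2) print $1, $2, ad[2], ad[1]
        }'
}

# Illustrative rows with made-up IDs and depths.
printf 'geneA\trs1\t30,12\ngeneA\trs2\t25,8\n' | ad_to_geneiase
```

Multi-allelic sites are silently dropped in this sketch; whether to drop or handle them differently is a choice for your own pipeline.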

Citing GeneiASE

If you use GeneiASE, please cite it as follows:

Edsgärd D. et al., GeneiASE: Detection of condition-dependent and static allele-specific expression from RNA-seq data without haplotype information, Scientific Reports, 2016

Contact

Please contact the corresponding author if you have technical issues or other comments or questions:
Olof Emanuelsson

Error in rep_len(size, use.n) : cannot replicate NULL to a non-zero length

Hello. I'm trying to use GeneiASE, but I'm getting an error from R and I don't know how to debug it.

The command I'm using is
bin/geneiase -t static -i test.tab -b 10

And the program's output is:


Loading required package: stats4
Loading required package: splines
Program parameters:
-t: static
-i: test.tab
-o: test.tab.static.gene.pval.tab
-p: 0.49
-r: 0.012
-b: 10
-m: 2
-x: 100
Reading input data from file...
Input: 371 features (after filtered on -m 2)
Calculating p-values for features...
Error in rep_len(size, use.n) :
  cannot replicate NULL to a non-zero length
Calls: main ... get.gene.null -> get.snv.null -> rbetabinom -> rbetabinom.ab
Execution halted

My R session info is:

R version 4.0.4 (2021-02-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS

Matrix products: default
BLAS/LAPACK: /opt/intel/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64_lin/libmkl_gf_lp64.so

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.0.4
