aFC-n

aFC-n calculates the effect size of conditionally independent Expression Quantitative Trait Loci (eQTLs) based on allelic Fold Change (aFC), which can be used to predict genetically driven gene expression and allelic imbalance using conditional eQTL data. This script calculates aFCs using least squares optimization (Levenberg-Marquardt) for a set of eQTLs, given a set of gene expressions and a phased VCF file. See the manuscript for a description of the method.
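As a rough illustration of how aFC estimates translate into predictions, the sketch below assumes a multi-eQTL aFC model in which each haplotype's expression is a shared baseline multiplied by the fold change of every alternative allele it carries. The function names, the baseline term `c0`, and the 0/1 haplotype encoding are illustrative assumptions, not the tool's actual code; the manuscript remains the authoritative description of the model.

```python
import math

def haplotype_log2_expr(log2_afcs, alleles, c0=0.0):
    """log2 expression of one haplotype: baseline plus the log2 aFC of each ALT allele it carries."""
    return c0 + sum(s * a for s, a in zip(log2_afcs, alleles))

def predict(log2_afcs, hap1, hap2, c0=0.0):
    """Return (log2 total expression, log2 allelic imbalance of hap1 over hap2)."""
    e1 = haplotype_log2_expr(log2_afcs, hap1, c0)
    e2 = haplotype_log2_expr(log2_afcs, hap2, c0)
    total = math.log2(2.0 ** e1 + 2.0 ** e2)  # the two haplotypes add on the linear scale
    return total, e1 - e2

# Hypothetical gene with two eQTLs (log2 aFC of 1.0 and -0.5).
# Haplotype 1 carries ALT at the first variant only, haplotype 2 at the second only.
total, imbalance = predict([1.0, -0.5], hap1=[1, 0], hap2=[0, 1])
print(round(total, 3), imbalance)
```

Under this assumed model, an individual homozygous for the reference allele at every eQTL has a log2 total expression of `c0 + 1`, because two identical haplotypes contribute equally.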

Installation

First install all the dependencies by doing:

pip3 install -r requirements.txt

You will also need gcc and the python 3 development headers. If you are on a Debian based distro, do:

sudo apt install gcc python3-dev

Or on RHEL derivatives:

sudo dnf install gcc python3-devel

Finally, compile the cython files:

bash compile.sh

Running the code effectively

Running the least squares optimization on several cores is highly recommended; see the usage examples below for details.

Usage Examples

Calculating aFCs without confidence intervals on a single core:

python3 afcn.py --vcf input_vcf.gz --expr input_expressions.gz --eqtl eqtls.txt --output output.txt

Calculating aFCs with confidence intervals on a single core:

python3 afcn.py --conf --vcf input_vcf.gz --expr input_expressions.gz --eqtl eqtls.txt --output output.txt

Calculating aFCs with confidence intervals using 12 cores:

python3 afcn.py -j 12 --conf --vcf input_vcf.gz --expr input_expressions.gz --eqtl eqtls.txt --output output.txt

Calculating aFCs with confidence intervals using 12 cores, with the expressions being in GCT format:

python3 afcn.py --gct -j 12 --conf --vcf input_vcf.gz --expr input_expressions.gz --eqtl eqtls.txt --output output.txt

Calculating aFCs with confidence intervals using 12 cores, with the expressions being log transformed and normalized (--logtransform applies a log transform; --normalize applies a log transform and normalization):

python3 afcn.py --normalize --logtransform -j 12 --conf --vcf input_vcf.gz --expr input_expressions.gz --eqtl eqtls.txt --output output.txt
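When the same analysis has to be repeated over many input sets, it can help to assemble these command lines programmatically. The helper below is a hypothetical convenience wrapper (its name and keyword arguments are not part of aFC-n); it only uses flags that appear in the usage examples above.

```python
import shlex

def build_afcn_command(vcf, expr, eqtl, output, cores=1, conf=False,
                       gct=False, normalize=False, logtransform=False):
    """Assemble an afcn.py argument list from the flags shown in the usage examples."""
    cmd = ["python3", "afcn.py"]
    if normalize:
        cmd.append("--normalize")
    if logtransform:
        cmd.append("--logtransform")
    if gct:
        cmd.append("--gct")
    if cores > 1:
        cmd += ["-j", str(cores)]  # number of cores, as in the 12-core examples
    if conf:
        cmd.append("--conf")
    cmd += ["--vcf", vcf, "--expr", expr, "--eqtl", eqtl, "--output", output]
    return cmd

cmd = build_afcn_command("input_vcf.gz", "input_expressions.gz", "eqtls.txt",
                         "output.txt", cores=12, conf=True)
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to launch the run.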

Use flags

Required

--vcf VCF-FILE Genotype VCF

--expr EXPR-FILE Expressions file

--eqtl eQTL-FILE File containing QTL to calculate allelic fold change

--output OUT-FILE Output file name

Optional

--nthreads N Number of threads to do fitting on

--conf Calculate confidence intervals for aFC estimates

--normalize Expressions matrix has not been normalized yet

--logtransform Expressions matrix has not been log transformed yet

--splitexpr If set, the individual names in the expressions file will be split on “-” characters, and the parts of the name on the two sides of the first “-” character will be retained.

--gct If set, it will be assumed that the Expressions file is in gct format
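The effect of --splitexpr on sample names can be pictured with the short sketch below. It reflects one reading of the flag description (keep the first two "-"-separated parts and rejoin them), so treat it as an assumption rather than the tool's exact behavior; the sample name is a made-up example.

```python
def split_sample_name(name):
    """Keep the parts on either side of the first "-" and drop everything after."""
    parts = name.split("-")
    return "-".join(parts[:2])

# Hypothetical long sample identifier reduced to an individual-level identifier.
print(split_sample_name("GTEX-1117F-0226-SM-5GZZ7"))
```

Names that contain no "-" pass through unchanged under this reading.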

Input formats

IMPORTANT: The REF and ALT information in the VCF file should match the REF and ALT information in the EQTL matrix

Expressions file

The script expects a gzipped file as an input for gene counts. The input gene counts should be in the format:

Name, sample_id1, sample_id2..

Where Name is a column that has the gene ID (such as ENSG00000224533), which should be in the same format as the gene IDs in the EQTL file.

eQTLs

This file must contain gene IDs that match those in the expressions file and variant IDs that match those in the VCF file. It can contain other columns as well, but it needs at least these two:

gene_id	variant_id other_stuff1 other_stuff2...
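Because gene IDs must agree between the two files, a quick overlap check before a long run can save time. The sketch below assumes both files are tab-delimited with a header row, that the expressions file is gzipped with a `Name` column, and that the eQTL file is plain text with a `gene_id` column; the function names are illustrative and not part of aFC-n.

```python
import csv
import gzip

def expression_gene_ids(path):
    """Gene IDs from the Name column of a gzipped, tab-delimited expressions file."""
    with gzip.open(path, "rt", newline="") as handle:
        return {row["Name"] for row in csv.DictReader(handle, delimiter="\t")}

def eqtl_gene_ids(path):
    """Gene IDs from the gene_id column of a tab-delimited eQTL file."""
    with open(path, newline="") as handle:
        return {row["gene_id"] for row in csv.DictReader(handle, delimiter="\t")}

def shared_genes(expr_path, eqtl_path):
    """Gene IDs present in both files; an empty set means the IDs do not line up."""
    return expression_gene_ids(expr_path) & eqtl_gene_ids(eqtl_path)
```

Calling `shared_genes("input_expressions.gz", "eqtls.txt")` and finding an empty set usually points to a formatting difference, such as a version suffix present in only one file or a different delimiter.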

VCF file

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO  sample_id1  sample_id2...

where #CHROM is the chromosome, POS is the position, ID is the variant ID in the format chr1_13550_G_A_b38, REF is the reference allele, and ALT is the alternative allele. A tabix index needs to be generated for the gzipped VCF file:

bgzip vcf_file.vcf && tabix -p vcf vcf_file.vcf.gz
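Since the REF and ALT alleles must agree between the VCF and the eQTL file, and the example variant ID appears to encode them, a simple consistency check is possible. The sketch below assumes IDs follow the pattern chromosome_position_REF_ALT_build, as in chr1_13550_G_A_b38; the function names are hypothetical and the record is a made-up example.

```python
def parse_variant_id(variant_id):
    """Split an ID such as chr1_13550_G_A_b38 into (chrom, pos, ref, alt, build)."""
    # rsplit keeps chromosome names that themselves contain underscores intact.
    chrom, pos, ref, alt, build = variant_id.rsplit("_", 4)
    return chrom, int(pos), ref, alt, build

def alleles_match(vcf_line):
    """True if the CHROM, POS, REF and ALT columns agree with the ID column."""
    chrom, pos, variant_id, ref, alt = vcf_line.rstrip("\n").split("\t")[:5]
    id_chrom, id_pos, id_ref, id_alt, _ = parse_variant_id(variant_id)
    return (chrom, int(pos), ref, alt) == (id_chrom, id_pos, id_ref, id_alt)

# Made-up record showing only the fixed VCF columns.
record = "chr1\t13550\tchr1_13550_G_A_b38\tG\tA\t.\tPASS\t."
print(alleles_match(record))
```

A record whose REF and ALT are swapped relative to its ID fails this check, which is the kind of mismatch the warning under Input formats is about.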

Output file

The output is your input EQTL matrix with the result columns appended at the end.

By default

log2_aFC - the resulting aFC values

log2_aFC_error - Standard error of the aFCs

log2_aFC_c0 - The residual from the fit

With the --conf flag

log2_aFC_min_95_interv - The lower bound of the 95% confidence interval for the aFC value

log2_aFC_plus_95_interv - The upper bound of the 95% confidence interval for the aFC value

log2_aFC_c0_min_95_interv - The lower bound of the 95% confidence interval for the residual

log2_aFC_c0_plus_95_interv - The upper bound of the 95% confidence interval for the residual
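One way to work with these columns downstream is sketched below: it keeps eQTLs whose 95% confidence interval excludes zero and converts log2_aFC to a linear fold change. It assumes the output is tab-delimited and was produced with --conf; the function name and the example rows are made up for illustration.

```python
import csv
import io

def significant_afcs(handle):
    """Yield (gene_id, variant_id, fold_change) where the 95% interval excludes 0."""
    for row in csv.DictReader(handle, delimiter="\t"):
        low = float(row["log2_aFC_min_95_interv"])
        high = float(row["log2_aFC_plus_95_interv"])
        if low > 0 or high < 0:
            yield row["gene_id"], row["variant_id"], 2.0 ** float(row["log2_aFC"])

# Made-up rows standing in for an output file.
example = io.StringIO(
    "gene_id\tvariant_id\tlog2_aFC\tlog2_aFC_min_95_interv\tlog2_aFC_plus_95_interv\n"
    "geneA\tvar1\t1.0\t0.5\t1.5\n"
    "geneB\tvar2\t0.1\t-0.2\t0.4\n"
)
hits = list(significant_afcs(example))
print(hits)
```

For a real run, replace the `io.StringIO` object with `open("output.txt", newline="")`.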

afcn's People

Contributors

navaehsan


afcn's Issues

NameError: name 'read_eqtls' is not defined

Hello, thank you very much for providing this tool. I have run into the problem below. Is there an easy way to solve it? Thank you.
Traceback (most recent call last):
  File "/public/home/yangjie/yangys/biosoft/aFCn-main/src/afcn.py", line 219, in <module>
    main()
  File "/public/home/yangjie/yangys/biosoft/aFCn-main/src/afcn.py", line 55, in main
    eqtl_dataframe = read_eqtls(eqtl_filename)
                     ^^^^^^^^^^
NameError: name 'read_eqtls' is not defined

No matching genes found between the EQTL and expressions files

I am getting the error below even though the genes in the eQTL file are the same as those in the expressions file. I am attaching the two files for your attention. I hope you can help me find out what could be wrong.
exp.txt.gz
tt.tsv.gz

Done reading eqtls
Traceback (most recent call last):
  File "software/aFCn/src/afcn.py", line 218, in <module>
    main()
  File "software/aFCn/src/afcn.py", line 60, in main
    expr_dataframe = read_expressions(expressions_filename, eqtl_dataframe, args)
  File "parse.pyx", line 87, in parse.read_expressions
    raise Exception("No matching genes found between the EQTL and expressions files")
Exception: No matching genes found between the EQTL and expressions files

Correcting for covariates

Hello! I am interested in calculating aFC for genes with multiple causal variants and was pointed to this tool by Stephane Castel.

This seems like a really great extension of the aFC tool! I was curious if you had recommendations for how to handle covariates (e.g. PEER factors, genotyping PCs, etc.) when using this tool. I believe in the original aFC tool, covariates that were found to be associated with expression were regressed out of the expression values and then those corrected expression counts were used when calculating aFC.

Would something similar work here (passing the covariate-corrected expression values to aFCn)?
