A SCons script for annotating and filtering leukemia VCFs.
Launch the script from the directory containing launch_annotator.scons, passing all the required parameters. The only input directory needed must be called 00_starting_vcfs, and it must contain all the VCFs to annotate. Example:
scons -f launch_annotator.scons SNPSIFT_PATH=~/local/snpEff/ GNOMAD_ANNOTATION_FILE_PATH=~/annotations/gnomad.exomes.r2.0.2.sites.vcf.bgz DBSNP_ANNOTATION_FILE_PATH=~/annotations/All_20180423.vcf.gz DBNSFP_ANNOTATION_FILE_PATH=~/annotations/dbNSFP2.9.3_lite.txt.gz FATHMM_RANKSCORE=0.3 GENOME_VERSION=GRCh37.75 AF_VALUE=0.05 AF=AF_NFE CLINVAR_ANNOTATION_FILE_PATH=~/annotations/clinvar_20180603.vcf JUNK_GENES_FILE_PATH=~/annotations/junk_genes.txt LEUKEMIA_GENES=~/annotations/leukemia_genes.txt COSMIC_FILE_PATH=~/annotations/CosmicCodingMuts.vcf.gz NX=2 RNA_TISSUE_CONSENSUS=~/annotations/rna_tissue_consensus.tsv.zip DEBUG=T
-
Add GT to VCFs produced by Strelka2. Directory of results: 01_strelka2_gt
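Strelka2 somatic VCFs do not carry a GT entry in their sample columns, which some downstream tools expect. The following is a minimal sketch of what this step could look like, not the script's actual code. It assumes the sample columns are ordered NORMAL, TUMOR and that the genotypes 0/0 and 0/1 are acceptable placeholders; the function name `add_gt` and the constant `GT_HEADER` are hypothetical.

```python
GT_HEADER = '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">'


def add_gt(lines, normal_gt="0/0", tumor_gt="0/1"):
    """Yield VCF lines with GT prepended to FORMAT and to both sample columns.

    Assumption: column 10 is NORMAL and column 11 is TUMOR.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#CHROM"):
            # Declare the GT field just before the column header line.
            yield GT_HEADER
            yield line
        elif line.startswith("#") or not line:
            yield line
        else:
            fields = line.split("\t")
            fields[8] = "GT:" + fields[8]
            fields[9] = normal_gt + ":" + fields[9]
            fields[10] = tumor_gt + ":" + fields[10]
            yield "\t".join(fields)
```

The placeholder genotypes only make the file parseable by tools that require GT; they are not real genotype calls.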
-
SnpEff annotation. Directory of results: 02_snpeff
-
gnomAD annotation. Directory of results: 03_gnomad
-
dbSNP annotation. Directory of results: 04_dbsnp
-
dbNSFP annotation. Directory of results: 05_dbnsfp
-
ClinVar annotation. Directory of results: 06_clinvar
-
HIGH and MODERATE filter. Directory of results: 07_high_moderate
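The logic of this filter can be sketched in plain Python, independently of how the script itself implements it. The sketch relies on the SnpEff ANN format, where the impact is the third `|`-separated subfield of each annotation; the function name `has_kept_impact` is hypothetical.

```python
KEPT_IMPACTS = {"HIGH", "MODERATE"}


def has_kept_impact(info):
    """Return True if any ANN entry in a VCF INFO string has a kept impact."""
    for entry in info.split(";"):
        if entry.startswith("ANN="):
            for annotation in entry[len("ANN="):].split(","):
                parts = annotation.split("|")
                # In the SnpEff ANN format the impact is the third subfield.
                if len(parts) > 2 and parts[2] in KEPT_IMPACTS:
                    return True
    return False
```

A variant is kept as soon as one of its annotations is HIGH or MODERATE; variants without an ANN entry are dropped.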
-
PASS filter. Directory of results: 08_pass
-
AF filter: filter by alternate allele frequency. Directory of results: 09_population_af
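As an illustration of the frequency test, here is a minimal sketch that checks one INFO string against the AF and AF_VALUE parameters. Two choices are assumptions, not documented behavior: variants without a frequency for the chosen population are kept, and multi-allelic sites are judged by their highest frequency. The function name `passes_af_filter` is hypothetical.

```python
def passes_af_filter(info, af_key="AF_NFE", af_value=0.05):
    """Return True if the population allele frequency does not exceed af_value.

    Assumption: a missing frequency (absent key or ".") keeps the variant.
    """
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        if key == af_key:
            # Multi-allelic sites list one frequency per ALT allele.
            freqs = [float(v) for v in value.split(",") if v not in ("", ".")]
            if not freqs:
                return True
            return max(freqs) <= af_value
    return True
```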
-
No Junk filter: filter out junk genes. Directory of results: 10_no_junk
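A gene-list filter of this kind can be sketched as follows. The sketch assumes gene names are read from the fourth subfield of the SnpEff ANN annotation and that a variant is discarded only when every gene it is annotated with is in the junk list; both points are assumptions, and the function names are hypothetical.

```python
def load_gene_list(lines):
    """Build a set of gene symbols from an iterable with one gene per line."""
    return {line.strip() for line in lines if line.strip()}


def annotated_genes(info):
    """Collect the gene names (fourth ANN subfield) from a VCF INFO string."""
    genes = set()
    for entry in info.split(";"):
        if entry.startswith("ANN="):
            for annotation in entry[len("ANN="):].split(","):
                parts = annotation.split("|")
                if len(parts) > 3 and parts[3]:
                    genes.add(parts[3])
    return genes


def is_junk(info, junk_genes):
    """Assumption: junk means all annotated genes are in the junk list."""
    genes = annotated_genes(info)
    return bool(genes) and genes <= junk_genes
```

The same helpers, with the test inverted, would express a keep-only filter such as the leukemia genes step.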
-
No SNPs filter: filter out SNPs using the COSMIC VCF file of all coding mutations in the current release. Directory of results: 11_no_snps
-
Leukemia genes filter: keep only genes associated with leukemia or cancer. The list includes the genes from the Cancer Gene Census and the Leukemia Gene Literature Database. Directory of results: 12_leukemia
-
Filter out germline variants from VarScan2. It uses the VarScan2 VCF produced by vs_format_converter.py and the remove_germline_variants_from_varscan.py script from iSeqs2. Directory of results: 13_no_varscan_germline
-
Build tab separated files from VCFs found in the 13_no_varscan_germline directory. Directory of results: 14_tables
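To illustrate what such a table might contain, the sketch below flattens VCF records into tab-separated rows with the fixed columns CHROM, POS, REF and ALT plus a caller-chosen set of INFO keys. The column selection and the function name `vcf_to_table` are assumptions; the script's actual output columns are not listed here.

```python
import csv
import io


def vcf_to_table(vcf_lines, info_keys):
    """Return tab-separated text with one header row and one row per variant."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["CHROM", "POS", "REF", "ALT"] + list(info_keys))
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        # Turn "K1=V1;K2=V2;FLAG" into {"K1": "V1", "K2": "V2", "FLAG": ""}.
        info = dict(e.partition("=")[::2] for e in fields[7].split(";"))
        row = [fields[0], fields[1], fields[3], fields[4]]
        row += [info.get(key, "") for key in info_keys]
        writer.writerow(row)
    return out.getvalue()
```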
-
Gene Expression Filter: filter out variants based on the level of expression in B, NK, T, bone marrow, dendritic, granulocyte and monocyte normal cells. Data come from the rna_tissue_consensus.tsv.zip file obtained from the Human Protein Atlas. Directory of results: 15_filtered_by_gene_expression
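One way to derive the set of genes that pass such a filter is sketched below. It assumes the consensus table has the columns "Gene name", "Tissue" and "NX", and that a gene counts as expressed when its NX reaches the threshold in at least one of the selected cell types. The column names, the decision rule and the function name `expressed_genes` are all assumptions and should be checked against the downloaded file and the script.

```python
import csv


def expressed_genes(tsv_lines, tissues, nx_threshold):
    """Return gene names whose NX reaches nx_threshold in at least one tissue.

    Assumed (hypothetical) columns: "Gene name", "Tissue", "NX".
    """
    wanted = set(tissues)
    genes = set()
    for row in csv.DictReader(tsv_lines, delimiter="\t"):
        if row["Tissue"] in wanted and float(row["NX"]) >= nx_threshold:
            genes.add(row["Gene name"])
    return genes
```

The function accepts any iterable of text lines, so it can be fed from a file opened inside the zip archive with the standard `zipfile` and `io.TextIOWrapper` modules.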
-
Build tab separated files from VCFs found in the 15_filtered_by_gene_expression directory. Directory of results: 16_tables
-
Filter tab separated files by FATHMM_rankscore. Directory of results: 17_filtered_by_fathmm_tables
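A row filter of this kind could look like the sketch below. It assumes the score sits in a column named `dbNSFP_FATHMM_rankscore`, that rows at or above the threshold are kept, and that rows without a numeric score are dropped; the column name, the direction of the comparison and the function name `filter_by_rankscore` are assumptions.

```python
import csv
import io


def filter_by_rankscore(table_text, threshold, column="dbNSFP_FATHMM_rankscore"):
    """Return the table restricted to rows whose score is >= threshold.

    Assumption: rows with a missing or non-numeric score are dropped.
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        try:
            score = float(row[column])
        except (TypeError, ValueError):
            continue  # no usable score in this row
        if score >= threshold:
            writer.writerow(row)
    return out.getvalue()
```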
-
SNPSIFT_PATH: the directory containing the SnpSift.jar program
-
GNOMAD_ANNOTATION_FILE_PATH: the path for the gnomad.exomes file (example: ~/annotations/gnomad.exomes.r2.0.2.sites.vcf.bgz)
-
DBSNP_ANNOTATION_FILE_PATH: the path for the dbSNP file (example: ~/annotations/All_20180423.vcf.gz)
-
DBNSFP_ANNOTATION_FILE_PATH: the path for the dbNSFP file (example: ~/annotations/dbNSFP2.9.3_lite.txt.gz)
-
CLINVAR_ANNOTATION_FILE_PATH: the path for the ClinVar file (example: ~/annotations/clinvar_20180603.vcf)
-
FATHMM_RANKSCORE: the FATHMM_rankscore threshold, taken from dbNSFP, used for filtering (example: 0.75; must be between 0 and 1)
-
SNPEFF_DATA_DIR: the directory where SnpEff will download the annotation files (example: ~/annotations)
-
AF_VALUE: the maximum alternate allele frequency allowed in the chosen population (default: 0.05)
-
AF: the population to be used (default: AF_NFE, Non-Finnish European genotypes)
-
JUNK_GENES_FILE_PATH: a file containing the list of junk genes, one gene per line
-
LEUKEMIA_GENES: a file containing the list of genes associated with leukemia or cancer in general, one gene per line
-
COSMIC_FILE_PATH: the path to the CosmicCodingMuts.vcf.gz file downloaded from COSMIC (example: ~/annotations/CosmicCodingMuts.vcf.gz)
-
RNA_TISSUE_CONSENSUS: the path to the rna_tissue_consensus.tsv.zip file downloaded from the Human Protein Atlas (example: ~/annotations/rna_tissue_consensus.tsv.zip)
-
NX: the NX (normalized expression) threshold used for filtering with the Human Protein Atlas data (example: 2)
-
DEBUG: T or F, show or hide debug information about the launched commands (default: F)
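SCons exposes KEY=VALUE command-line arguments to the build script through its `ARGUMENTS` dictionary. How launch_annotator.scons validates them is not described here, so the following is only a standalone sketch of the documented constraints (defaults for AF_VALUE, AF and DEBUG, the 0 to 1 range of FATHMM_RANKSCORE, and T or F for DEBUG). The function name `parse_params` is hypothetical.

```python
def parse_params(args):
    """Parse KEY=VALUE strings into a dict and check the documented constraints."""
    # Defaults stated in the parameter descriptions.
    params = {"AF_VALUE": "0.05", "AF": "AF_NFE", "DEBUG": "F"}
    for arg in args:
        key, sep, value = arg.partition("=")
        if not sep:
            raise ValueError("expected KEY=VALUE, got %r" % arg)
        params[key] = value
    if "FATHMM_RANKSCORE" in params:
        score = float(params["FATHMM_RANKSCORE"])
        if not 0 <= score <= 1:
            raise ValueError("FATHMM_RANKSCORE must be between 0 and 1")
    if params["DEBUG"] not in ("T", "F"):
        raise ValueError("DEBUG must be T or F")
    return params
```

Values stay strings, as they do on the command line; each consumer converts the ones it needs.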